All That Git Talk in The Rails World, What Gives?

Published: February 1st, 2008

I’ve been noticing a heavy upheaval in the source control world of Ruby on Rails, and the spotlight is on Git. So Git, what gives?

Introduction to Git

If you don’t know what Git is, it’s supposedly the next big step in source control, much as SVN was to CVS. It was originally created by Linus Torvalds, the same guy who brought you Linux [1]. Git is most notably used for the Linux kernel.

Things to note

There are a couple things that I’ve read that are supposed to make Git very powerful, which I somewhat agree with, but at the same time don’t understand:

  • Speed: I don’t have any graphs on me, but the ones I’ve seen show it being many times faster than other version-control systems. This shows up especially when dealing with thousands of small files (like most open-source projects). Of course, the criticism of those graphs was that they were showing results on ‘old versions of the version-control system’.
  • Superior Branching Capabilities: This is the only one I truly understand and absolutely give Git credit for. As you all probably know, branching in SVN is an incredible pain. In Git, it’s as easy as 1, 2, 3. I’d provide an example if I knew it off the top of my head, but I’m sure someone will comment on this for further insight.
  • Distributed Repositories: This is probably the biggest feature of Git itself. When you get code from a repository, the code you checked out is actually a full repository itself. What does this mean? It means that potentially anyone can pull the repository from you. This is the feature I don’t understand: why do you need distributed repositories? Why would you need something other than a central repository with read/write access? Maybe I’m just so used to the Subversion workflow that I don’t see why a distributed repository system would be useful. There is one small thing, and I’ll mention that in the next bullet.
  • In relation to above, when a user checks out a repository, they are given a full repository themselves; this leads to an amazing benefit: they get to commit locally without syncing with a central repository. That means you can be offline and still commit your code and have it logged.
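
For reference, here is a minimal sketch of the branching and local-commit points above. The scratch directory, the demo identity, and the branch name `experiment` are all made up for illustration, and nothing in it touches a network or a central server.

```shell
set -eu

# Hypothetical scratch repository; every step below works offline.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # pin the branch name for this demo
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "hello" > README
git add README
git commit -q -m "Initial commit"

# 1, 2, 3: create a branch, commit on it, merge it back.
git checkout -q -b experiment
echo "an experimental change" >> README
git commit -q -a -m "Try something out"

git checkout -q master
git merge -q experiment

# Both commits are recorded locally, with no server involved.
git --no-pager log --oneline
```

Every commit here is logged in the local repository; syncing with anyone else is a separate, later step.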

My Views

That’s the basic understanding I have of Git, and obviously you can see I’m a bit confused and don’t really understand why it’s so superior to SVN and other version-control systems. I’ll give it props for being faster and easier to branch with, but Subversion could gain that as well with an update. Another thing to note is that if your current project is using Subversion, you could potentially use Git as a wrapper, which gives you access to the local repository features, but I don’t think it’ll be any faster because it’s still a hook to SVN commands. One other downside to using Git as a wrapper for SVN is that the commits you made with Git will not be pushed to Subversion as one big commit; each one lands separately. Here’s an example:

Dave, your ever-so-loving coding partner who uses Git instead of Subversion to be the cool developer on the block, somehow committed 500 changes in less than a minute! This changed the Subversion change-set from 500 to 1000 in less time than it takes the big hand on the clock to fully turn!

Hopefully you understand that example. I personally dislike Git on Subversion projects because of that outcome. I actually use those change-sets as a milestone in my mind of how things are progressing; if it changed 200% in one minute, it would really throw off the point of a change-set.

The Summary

So, in summary: Git is superior in branching, and for the most part, speed. But, why Distributed Repositories? I hear it can be useful in open-source projects, but I really have no idea why, and why would this be useful for closed projects as well?

I’d really appreciate it if anyone can shed some light on why it’s otherwise superior. I love to stay on top of the best technology, so kick my ass on my lame knowledge and educate me! Then I’ll be able to convert my team to Git :)

P.S. A lot of Rails projects are popping up on http://github.com/.

  1. http://www.linux.org/

The distributed nature of git makes collaboration easier than with svn. Consider an open source project like Rails. Under svn, you check out the Rails source, make the changes you like, produce one monolithic patch, and submit that back to the core team for evaluation. During this time you can never commit those changes anywhere or mark waypoints in development.

With GitHub and git, the workflow is much nicer. Let’s assume that Rails core maintains their version of Rails on GitHub. You would fork the Rails project into a repo under your name. You then clone that to your machine (which is a FULL repository with all history and can be pulled down surprisingly quickly). You make your changes, being able to commit as you please (commit messages and all). When you’re ready for Rails core to evaluate your modifications, you push those changes to your fork of Rails on GitHub and inform the core team that they should pull from your repository. It’s trivial for a core member to evaluate your code in a new branch of their local repository. They don’t have to worry about a patch not applying cleanly or being incompatible with their current revision. If they like the changes, they can merge them back into the mainline which would then show each of your commits as you had made them, complete with you as the author and placed correctly in the timeline. You complete the circle by updating your local repo from the new Rails master. You don’t even need to pull down the commits you authored, you already have them!
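
The round trip above can be sketched entirely on one machine. This is only an illustration under stated assumptions: local directories stand in for GitHub-hosted repositories (`core` for the core team’s copy, `fork.git` for your fork), and the names, identities, and the `my-feature` and `review` branches are hypothetical.

```shell
set -eu
work=$(mktemp -d)
cd "$work"

# A core member's repository with one commit (stands in for Rails core).
git init -q core
cd core
git symbolic-ref HEAD refs/heads/master
git config user.name "Core Member"
git config user.email "core@example.com"
echo "framework" > lib.txt
git add lib.txt
git commit -q -m "Initial import"
cd ..

# "Fork" the project: a bare copy under your own name.
git clone -q --bare core fork.git

# Clone your fork and commit as often as you like.
git clone -q fork.git mine
cd mine
git config user.name "Contributor"
git config user.email "contributor@example.com"
git checkout -q -b my-feature
echo "first step" >> lib.txt
git commit -q -a -m "First step of my feature"
echo "second step" >> lib.txt
git commit -q -a -m "Second step of my feature"
git push -q origin my-feature
cd ..

# The core member fetches the fork into a review branch, then merges it.
cd core
git fetch -q ../fork.git my-feature:review
git merge -q review

# Each commit keeps its original author and message.
git --no-pager log --format='%an: %s'
```

The final log lists the contributor’s two commits individually, on top of the core member’s initial import, rather than one anonymous patch.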

You can use the same workflow for closed source projects. You don’t need GitHub to make this possible, but it definitely makes the process more streamlined (and gives you a nice online interface to your code and changesets).

Another thing I love about git is the ability to create repositories without a central server. When I start a new project, a single ‘git init’ in that directory gives me a complete, self-contained repo. You can even use git to revision your /etc directory so you never lose a config file again! Oh, did you know that git repos only need a single .git dir in the top level of the repo? Sure beats dealing with a .svn in EVERY directory.
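
A small sketch of that, using a throwaway directory rather than a real /etc (the file names and identity are invented for the example):

```shell
set -eu

# Hypothetical project directory with a nested config file.
project=$(mktemp -d)
cd "$project"
mkdir -p config/environments
echo "setting = 1" > config/environments/app.conf

# One command turns the directory into a complete repository.
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
git add .
git commit -q -m "Start tracking this directory"

# All of git's bookkeeping lives in a single top-level directory.
find . -type d -name .git
```

However deep the tree goes, the `find` turns up only the one `.git` directory at the top.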

Even without all this good stuff, I’d still use git just for the local commits and cheap and easy branches. Any time I want to try out a new feature I just branch off and start coding like mad. No worrying about what revision I branched at, or special directory structures (i.e. trunk, tags, branches) to maintain. Just my code at whatever revision I want. Moving between branches is so fast you sometimes wonder if it actually changed.

Git has some very powerful tools for merging branches and messing around with prior commits. Need to combine several commits into one before you submit back to the mainline? No problem, git will help you do that. Need to merge just a single commit from one branch to the next? Look up git-cherry-pick. Updating branch B with changes from branch A by applying each branch B specific commit on top of branch A can be done with git-rebase. It even pauses after a conflicting merge and lets you fix it at the point it first occurred, then continue once you’ve fixed it!
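
To make the cherry-pick and rebase cases concrete, here is a rough sketch in a scratch repository. The branch names `branch-a` and `branch-b`, the file names, and the identity are hypothetical, and each commit touches its own file so nothing conflicts.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"
echo base > base.txt
git add base.txt
git commit -q -m "Base"

# Branch A gets two commits, each in its own file.
git checkout -q -b branch-a
echo one > a1.txt
git add a1.txt
git commit -q -m "A: first"
echo two > a2.txt
git add a2.txt
git commit -q -m "A: second"

# Copy only the tip commit of branch-a ("A: second") onto master.
git checkout -q master
git cherry-pick branch-a

# Branch B starts from "Base" and has one commit of its own.
git checkout -q -b branch-b master~1
echo bee > b.txt
git add b.txt
git commit -q -m "B: only"

# Replay branch B's own commit on top of everything in branch A.
git rebase -q branch-a
git --no-pager log --format=%s
```

After the rebase, the history of `branch-b` reads as if the work had been started from the tip of `branch-a`, while `master` holds just "Base" plus the single cherry-picked commit.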

I’m going to stop, but these are just a FEW of the things that make git an improvement over centralized SCMs. As with any powerful tool, git has a bit of a learning curve. What really opened my eyes was sitting down for a few hours with the docs to REALLY understand the underlying organization of the object database, the difference between the working directory and index, and how to manipulate commits.

Hope this helps!

[Disclaimer: I am co-founder of GitHub]

gravatar

Regarding your concern about the individual commits to SVN - you can avoid that pretty easily by doing your work in a branch and merging it into master with the --squash flag before committing to SVN. You might have to tweak the commit message to get rid of a lot of the git-specific metadata, but you can preserve the original commits and you only bump the revision number by 1.
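
Roughly, that looks like the sketch below. It runs in a plain scratch repository with a hypothetical `work` branch and invented commit messages; the Subversion step appears only as a comment, on the assumption that the real repository was cloned with git-svn.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"
echo base > base.txt
git add base.txt
git commit -q -m "Base"

# Lots of small local commits on a work branch.
git checkout -q -b work
for i in 1 2 3; do
  echo "change $i" >> notes.txt
  git add notes.txt
  git commit -q -m "Small step $i"
done

# Squash them into a single commit on master.
git checkout -q master
git merge -q --squash work   # stages the combined changes without committing
git commit -q -m "Add notes (squashed from 3 local commits)"

# master gained one commit; the work branch still has all three.
git rev-list --count master
git rev-list --count work

# In a git-svn clone, `git svn dcommit` at this point would send that
# single commit to Subversion as one revision.
```

The individual steps stay available on the `work` branch for as long as you keep it around.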

gravatar

The Err guys talk a good deal about git in the latest Ruby On Rails Podcast episode. Christ does a good job of explaining how git changes open source. Check it out: http://podcast.rubyonrails.com/programs/1/episodes/err-free

gravatar

TYPO: *Chris* does a good job of explaining…

But I suppose Christ could also appreciate the benefits of git.

gravatar

Hey Fisch.

One great thing about distributed repositories with git that makes sense even with closed source projects is the ability to do commits even while disconnected.

So I am on a train with no internet but of course I am still working. No problem. I just continue making my small changesets, committing each time I do something that works as a whole.

Once I get to a connection I do a push to the main repository (also at github!).

With svn the push and commit are bound, so if you are working in a disconnected state you have to ball up a shit load of stuff together while you are offline and commit when you get back on. Anyone who values small changes and commits can see why this approach sucks ass.

I hate going back to svn now that I am all gitted up. :)

gravatar

You should read the Advogato article “Git is the next Unix” - http://www.advogato.org/person/apenwarr/diary/371.html - it really helped me understand what a big departure Git is from traditional source control systems.

gravatar

If I can chime in on the benefits of the distributed nature of Git which you seem to wonder about. Let me give a few scenarios.

When we all live in a world where everyone has a complete copy of the history of a project in a compact and fast local repos, it’s easy to pick up where someone else left off.

For example, say I developed a really cool gem. However I tired of it after one release and never touched it again. In the centralized SVN world that plugin would probably sit and rot on my server. Or worse, I might take it offline and no one can ever see the history again. All they may have in this scenario is the latest snapshot of current code pulled from SVN.

In a Git world, where we all have equal clones of the full history of the project, any Rails developer can announce that they are the new master and any interested contributors can send patches or push to that new master. This changing of the guard can happen at any time and is trivial. In fact, if you think you can do it better than me then we can both announce we are masters and fork our projects. The project that wins is the one that people decide to push to/pull from.

Another major benefit is that we are all no longer dependent on that one master SVN repos. If the machine the master hub repository sits on gets nuked it will be a major pain for developers to figure out how they will share their work while someone tries to un-nuke the server. This is not to say that Git doesn’t work well in this centralized model. It does if you want it to (just designate one repos as the place for all to push to). But if the ‘central’ repos gets nuked in a Git world, someone else just announces that their copy is the new central hub for everyone to push to and pull from. Problem solved with one email and a simple config change.
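
As a sketch of that recovery (all names here are hypothetical: `central.git` is the hub, `alice` and `bob` are two developers’ clones, and everything lives in one scratch directory instead of on separate machines):

```shell
set -eu
work=$(mktemp -d)
cd "$work"

# Seed a project and publish it as the central hub.
git init -q seed
cd seed
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "code" > app.txt
git add app.txt
git commit -q -m "Initial commit"
cd ..
git clone -q --bare seed central.git

# Two developers clone the hub.
git clone -q central.git alice
git clone -q central.git bob

# The central repository gets nuked.
rm -rf central.git

# Alice publishes her clone as the new hub...
git clone -q --bare alice new-central.git

# ...and Bob's "simple config change" is a single command.
cd bob
git remote set-url origin "$work/new-central.git"
git fetch -q origin
git remote -v
```

No history is lost in the switch, because Alice’s clone already contained all of it.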

Lastly, even if you’re a single developer working alone you can benefit from Git’s decentralized nature. I can keep one clone on my laptop, and one on my desktop, and as long as I have SSH access between machines I can always push and pull code between them. I can even do a backup push to an offsite repos on any machine that has Git installed and SSH access. The Git protocol is so fast and compact I can pull the entire Rails repository in just a couple of minutes, with all history of every commit ever made to Rails tucked neatly inside.

Happy Git user. Decentralized is good. Really happy Rails community is moving in this direction.

Glenn

gravatar
  • Glenn Rempe
  • Feb 2nd

I agree that Git can help even for a single developer; I use it myself.
As for private development, it’s easy to work locally and then share the work when it’s more stable.
Being distributed means it’s safer. If anything happens to the central repo (it happened to me), you can simply clone your own, because there is no central repository, just one that you declare as such.

Besides GitHub there’s also Gitorious, a free git hosting service for open source. Even the site is open source.

gravatar

The best way to see why git is so hot is to give it a try. It’s more than just easy branches and speed. It’s the whole concept that git embodies. I use git even for “shared” directories between all my laptops. It simply solves a lot of problems that other tools don’t.

gravatar

Nice discussion on Git. Let me showcase something that I did just this week that would be a pain to do in SVN but was trivial to do with Git.

I had a blog hosted over Mephisto 0.7.3. I am using the SVN server of my hosting service. Capistrano to deploy, all good and dandy.

Finally Rick Olson announces Mephisto 0.8. Great! I have to have it! But … how?? I’ve made a few changes, tweaked here and there.

And one piece of good news: Mephisto is now hosted over Git, so I just clone it. Then I clone my SVN repository using Git-SVN (and this is one hell of a nice tool). Finally, I merge both branches. Lots of conflicts, that’s for sure. But from now on I have a common ancestor to both branches and the next merges should be trivial.

Sometimes I am not a good enough hacker to send things back to the original master, or I just made changes that matter only to my work (like my blog), so I maintain 2 different branches: one with the original source, and the other with all my changes. I can now just merge everything from the master, or cherry-pick just the changes I need, and Git makes it utterly trivial to do so.

gravatar

I think the problem you describe with the revision number bumping up 200% is a matter of bad source control practice. You would have experienced the same bump if the developer had committed each time to Subversion as well, just over a longer period of time; the end result would be the same.
I.e., the problem seems to be too many small commits.
But perhaps the design of Git invites that kind of behavior a little more than SVN etc.

gravatar
  • Tobias
  • Feb 6th

You are not really “git”ing the point of git. You did mention fast, easy branching and distributed. But then you go into “My Views” and say that Subversion can branch easily too (which I disagree with). Subversion branches have to happen on the server and then have to be re-downloaded to the developer. Then they might propagate to ALL developers’ machines depending on how much of the SVN tree they have checked out. Git branching happens in microseconds.

Subversion has so many flaws in its repository format too. Read here about Repository Formats Mattering http://keithp.com/blogs/Repository_Formats_Matter/ They do! Most projects in Git are 50% the size of their Subversion repositories.

As to your argument about 500 commits at once, it’s just a matter of education. You could take all 500 of those commits and roll them up as one with git merge --squash

Git, BZR, HG - they are all the wave of the future. Resistance is futile. Git is just one of the first.

:)

gravatar

e.g.
Let’s say you’ve got 5 developers. With a distributed repo, each developer can create as many branches as they want in their local repo, trying a bunch of different things out.

With a central repo system, the developer might hesitate to create so many branches on the server, since they don’t want to clutter up the server with branches that only they care about, and that aren’t likely to be merged into the code base in the end.

Also, (I think) svn doesn’t have the ability to obliterate (yet)…

With git, you can branch, do some funky experimental code while still enjoying the benefits of version control, and not think about who else is going to see your experimental code, which you might not want to have to explain to the team yet, until it looks like something you might keep and merge in.

So essentially, branching and merging becomes a much more natural and casual thing to do with a distributed system.

Cheers,
Rajesh Duggal.

gravatar
  • Rajesh Dugga...
  • Mar 11th