Talk:Git (software)
From Wikipedia, the free encyclopedia
[edit] Git.kernel.org Web Service
Feels to me like we should give more prominent mention to the public web service
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git
I'm not yet skilled enough as a Wikipedian to be sure if this is an encyclopedic idea or not, nor how to accomplish it idiomatically.
Wikipedia trails today include:
page http://en.wikipedia.org/wiki/Linux_kernel
page http://en.wikipedia.org/wiki/Git_%28software%29
footnote http://git.or.cz/gitwiki/GitProjects
"""
Linux Kernel
http://kernel.org
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git (gitweb)
"""
I imagine we lose most of the audience at the transition to footnote from page.
-- Pelavarre 15:48, 2 December 2007 (UTC)
[edit] software configuration management
At the start of the tech talk (see article references), Linus starts by saying that git is a source control management system, and not a software configuration management system, as the wikipedia article says on the first line. Who's right? -- 62.166.203.162 09:26, 13 November 2007 (UTC)
[edit] Missing info?
I'm semi-technical, and this encyclopedia entry leaves a lot to be desired -- what language was git programmed in? I see that there is a C implementation, which implies that it was written in something else. If this info could be added by somebody knowledgeable, that would be great. Also, the "bullet-format" entry does not make for an easy read at all. —The preceding unsigned comment was added by 72.77.106.253 (talk) 23:34, August 20, 2007 (UTC)
[edit] Missing link
The link for gct, "a GUI enabled commit tool for Git and Mercurial", points to a 404 error at http://www.cyd.liu.se/~freku045/gct/ —Preceding unsigned comment added by 81.57.182.250 (talk • contribs) 14:25, 27 November 2006
- Removed. qwe 16:45, 27 November 2006 (UTC)
[edit] Trivia
[edit] When did git self-host?
I found this interesting thread on the git mailing list: Trivia: When did git self-host?. Maybe someone wants to incorporate that information into the article. --88.134.67.237 14:08, 27 February 2007 (UTC)
- Done. 71.41.210.146 05:10, 5 March 2007 (UTC)
[edit] Meaning of the word 'git'
This isn't worth putting on the article page, but just FYI, "git" does not (at least directly) mean a stupid or unpleasant person. It's actually the slang pronunciation of the word "get" which you may be more familiar with in the context of "the get of my loins", or "he's a misbegotten son of a ...". In other words, it's an alternative for the word 'bastard'. Which actually fits quite well with the quotation from Linus: Linus Torvalds has quipped about the name "git", which is a British slang for a stupid or unpleasant person: “I'm an egotistical bastard, and I name all my projects after myself. First 'Linux', now 'git'.” --129.113.28.125 13:51, 13 August 2007 (UTC)
[edit] Useful references
This is a list of good mailing list postings that could be useful for the article at some future date. (Please delete when they're merged into the article.) [1] a very informative example of how "git bisect" works with non-linear history. —Preceding unsigned comment added by 71.41.210.146 (talk • contribs) 05:10, 5 March 2007
[edit] Serious cleanup needed
Urgh. This is practically unreadable just now. I'll try to tackle some of the more egregious bits straight away, probably by deleting them. Chris Cunningham 11:54, 13 March 2007 (UTC)
- Done for now. The internal gubbins section needs reordered, and it looks like there's massive ref duplication, but can't do everything at once. Chris Cunningham 13:17, 13 March 2007 (UTC)
[edit] External links
I nuked the expansive mailing list portal section, along with things which had been dead for ~18 months and some tutorial links. There were some links which may contribute useful info to the article; for now I've commented them out. I'll move them to talk on the next pass. Please try to find them homes in the article itself if they contain notable info. Chris Cunningham 13:17, 13 March 2007 (UTC)
[edit] git's development process explained
Junio C Hamano, A note from the maintainer, 2007-02-16
I just found this interesting mailing list post, sent out every few months by git's maintainer. It describes how to retrieve git, how to contribute, and also how development takes place. I don't know if it's worth mentioning in the article, but I think it's useful at least on this page for people like me who always wondered what these topic branches (master, maint, next, pu) were all about. --88.134.67.237 14:29, 31 March 2007 (UTC)
[edit] What should be done with the "Using Git" section?
I notice that this change deleted the example: http://en.wikipedia.org/w/index.php?title=Git_%28software%29&diff=114782916&oldid=114782573 leaving the section confusingly incomplete. Should the section be deleted entirely, or what should it contain and how big should it be? I'm willing to do the writing, but could use guidance on the goals. (Please purge this section from the discussion page if/when the issue is settled.) 71.41.210.146 06:25, 16 May 2007 (UTC)
[edit] GNU Interactive Tools
The GIT FAQ says about the name that it is a
- “random three-letter combination that is pronounceable, and not actually used by any common UNIX command”
The latter part is not true. For years GIT has been GNU Interactive Tools, a sort of a file manager:
- http://directory.fsf.org/git.html
- http://hulubei.net/tudor/git/
- http://packages.debian.org/git
- http://packages.ubuntu.com/git
--Blanu 06:04, 17 May 2007 (UTC)
GNU is an acronym for GNU Is Not UNIX, and thus GNU Interactive Tools expands to GNU Is Not Unix Interactive Tools. Therefore, the statement is quite correct. Git is NOT actually used by any common UNIX command. —Preceding unsigned comment added by 65.217.43.131 (talk) 01:09, August 25, 2007 (UTC)
- indeed, and on Debian at least, the package name "git" is taken by GNU IT. -- Jon Dowland 13:05, 12 July 2007 (UTC)
[edit] POV
The "Unique Characteristics" and "Using GIT" sections are laden with language that fails Wikipedia's requirements of Wikipedia:Neutral point of view. This needs to be cleaned up. -/- Warren 06:00, 3 June 2007 (UTC)
- "Using Git", as mentioned above on this discussion page, is just broken. But as for "Unique characteristics", I'm trying to figure out precisely what's wrong. I think it goes on a bit, and could be tightened up, but I'd hate to perpetuate NPOV problems and I'm not quite grasping how the problem applies in this specific instance.
- I just read through the NPOV policy and most of the associated articles, but they're mostly about edit wars and political hot buttons and situations where there are strongly polarized opinions. The problem here is I'm not seeing what the alternate point of view is!
- Most articles are written by fans, and no difference here, but I don't see significant egregious gushing. There are a number of factual assertions, and while there aren't specific citations for each one, as far as I can tell they're all objectively true. The strongest assertions are specifically footnoted.
- The first two paragraphs could probably be toned down; I'll do that.
- The most questionable fact is "toolkit design"; as part of the MS-Windows porting effort, there's an effort underway (and nearing completion as of 1 June 2007) to eliminate all the shell scripts, so this could stand an update. It's a bit tricky, though—it's still designed that way; only the implementation has changed.
- The section title could perhaps be "Comparison to other version control systems", since some of the points are things that are not truly unique, but the current one has the general idea and is shorter. "Distinguishing characteristics", perhaps?
- Maybe it's a pervasive tone thing and not specific words. Still, could you (or anyone) point out a specific example to make the flaws clearer? Thanks.
- 71.41.210.146 17:59, 3 June 2007 (UTC)
-
- Statements like "Git supports rapid and convenient branching and merging" are simply not acceptable. More precisely, words like "rapid" and "convenient" are not words that an encyclopedia should use to describe something, unless such a claim can be quantified. These are assertions of opinion, not statements of fact. Here are some other similar statements with inappropriate phrases bolded: "Repositories can be easily published"; "shell scripts that provide convenient wrappers. It is easy to chain the components together to do other clever things."; "It is thus easy to experiment with new merge algorithms"; "communication for the merge is small and efficient". It's fine to state that a piece of software is designed to be "fast" or "easy", if you can find someone authoritative who has stated that; you would also need to say who made that statement. Research papers or other studies that quantitatively measure Git in comparison with other SCM systems is a good basis for content, too. However, unchallenged statements of a product's awesomeness are WP:NOT acceptable per Wikipedia's policy on advertising. -/- Warren 18:54, 3 June 2007 (UTC)
- Thanks, I'll fix! Note that in many cases, the adjectives can be justified by links to benchmarks. The shell-script wrappers part is trying to say "more convenient than the primitives", which I think is a sufficiently obvious statement to not need specific citation, but I can certainly clarify it. 71.41.210.146 02:40, 4 June 2007 (UTC)
Something specific: 'Efficient handling of large projects. Git is very fast, and scales well. It is commonly an order of magnitude faster than other revision control systems, and several orders of magnitude faster on some operations.[cited]'
- Apart from the aforementioned problems (note the Bold items), the phrase 'Other revision control systems' is very general, bordering on the deceptive. You have to go to the linked article (blog) to notice that it is only talking about Bazaar and Mercurial, in a test that the cited article itself calls 'unscientific'. And despite giving timings to the millisecond, the comments note 'The units for those tests are seconds, and they're all wall clock times.'
- For this sort of statement to be asserted, the methodology needs more rigour, and the comparison should cover a wider range of systems, such as CVS, Subversion, and commercial systems such as ClearCase (and Perforce -- Humg 23:17, 31 August 2007 (UTC)).
- i.e. to make assertions about Git's speed, especially that it is 'orders of magnitude' faster, better citations are necessary.
- $0.02. EasyTarget 10:06, 5 June 2007 (UTC)
- I'll dig them up; the statement is objectively true, just needs better citation. Mercurial is the only VCS that's close. Here's a first pass at relevant links. OpenSolaris SCM evaluation FreeBSD evaluation of VCS speed Jst's evaluation for Mozilla (notice the > 100x speed difference on several operations) a subjective blog post Some very fragmentary benchmarks at the openoffice.org wiki An Hg vs. bzr comparison and this and this and this dead link that gets cited a lot (and I've found relocated here) that can be used to link hg vs. git observations like this. Too bad that omits numbers, as does this observation that only says "blazingly fast" vs. "unworkably slow". Likewise this complaint about SVK.
- Oh! I missed this link farm of VCS comparisons. And here are some 1-10 rankings of 7 VCSs including "speed", but it doesn't quote figures.
- You know, a direct comparison with CVS is hard to find. Here's a space comparison (and I know I've seen others), but we want time comparisons. This fairly famous blog post says "We've all gotten very spoiled by Git; many operations which take minutes under CVS now complete fast enough to leave you wondering if anything happened at all." and this blog post about moving from CVS to git says "The first thing you will notice is that it’s damn fast!", but neither quote numbers. Relative to CVS, it appears sufficiently lopsided that people don't generally quantify it. I'll have to search mailing list archives for info.
- a lot of comparisons only talk about features and not performance. CVS vs. SVN speed comparison.
- here's an observation to support the "it's easy to use git in scripts" point. Not relevant to speed, but useful for later.
- Of course, I could always do benchmarks myself, but that would be "original research". 71.41.210.146 22:56, 5 June 2007 (UTC)
- Another speed comparison (where he creates a branch in a very non-optimal way with git). Here's a not very scientific git vs. BitKeeper comparison.
- Thanks! Those are much more informative comparisons. An observation:
- - They are still limited in the systems they compare to; only making comparisons to other (mostly FOSS) CM tools with a similar use model and architecture. There are also very successful and widely used commercial CM systems; eg. ClearCase, StarTeam and Synergy, that I do not see covered in any of those references.
- I'm not going to make the change myself, but the statement should be put in context; GIT's speed advantage has only (so far) been demonstrated when compared to similarly orientated tools, and only on POSIX architectures.
- EasyTarget 09:53, 13 June 2007 (UTC)
It should be noted that "commonly", from 'commonly an order of magnitude faster than other revision control systems,' is a term of art in computer science that refers to runtime execution speed with respect to time (see any Computer Science/Mathematics article on big-O notation). It specifically refers to the "average case" execution time. While such a claim may require citation (typically via benchmarks) it does not indicate bias, or an intent to advertise. James Martinez 5:53, 25 July 2007 (UTC)
How about adding a 'criticisms' section too? Compared to other versioning systems like svn or even sccs, git is terribly hard to use and the documentation is not particularly helpful. Basically it suffers from the same weaknesses that Linux suffered from for a decade or so of its early life: high technical quality and powerful options, but terrible usability for a new user or a non-super user. IMO it should have stayed as an engine; it really needs a front-end to be less than maddening to use. 128.97.68.15 (talk) 21:14, 23 May 2008 (UTC)
[edit] "Unique characteristics" is a bad section header
In particular, the word "unique" is troublesome. You can't guarantee that any of the characteristics are unique, and some of them demonstrably are not.
All it takes is for a new VCS to come out that includes some of these features, and suddenly the article is making a false claim.
Really this section is about what makes Git interesting compared with other VCSs.
Some viable replacements for the section header:
- Distinguishing features
- Notable characteristics
- Differentiating features
- Idiosyncrasies
- Comparison to similar projects
- Peculiarities
Or, simply
- Characteristics
Direvus 22:04, 14 August 2007 (UTC)
Also, within this is at least one thing that is not a designed feature, as designated:
- Garbage accumulates unless collected. Aborting operations or backing out changes will leave useless dangling objects in the database. These are generally a small fraction of the continuously growing history of wanted objects, but reclaiming the space using git-gc --prune can be slow.[17]
This should be moved somewhere else if this section stays with its current intent. 205.209.73.247 (talk) 16:37, 9 April 2008 (UTC)
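For anyone trying to picture what "dangling objects" means in the item quoted above, here is a minimal sketch in Python. It is a toy model, not Git's implementation: the object names, the `store` dictionary, and the helper functions are all hypothetical. It only illustrates the idea that objects no longer reachable from a live branch tip stay in the database until something collects them.

```python
from typing import Dict, List, Set

# Toy object database: each object id maps to the ids it refers to.
# All names here are hypothetical; real Git objects are named by hash.
ObjectStore = Dict[str, List[str]]


def reachable(objects: ObjectStore, roots: List[str]) -> Set[str]:
    """Return every object that can be reached from the given roots."""
    seen: Set[str] = set()
    stack = list(roots)
    while stack:
        oid = stack.pop()
        if oid in seen or oid not in objects:
            continue
        seen.add(oid)
        stack.extend(objects[oid])
    return seen


def dangling(objects: ObjectStore, roots: List[str]) -> Set[str]:
    """Objects still stored but no longer reachable from any root."""
    return set(objects) - reachable(objects, roots)


store: ObjectStore = {
    "commit2": ["commit1", "tree2"],
    "commit1": ["tree1"],
    "tree1": ["blobA"],
    "tree2": ["blobA", "blobB"],
    "blobA": [],
    "blobB": [],
    # Left behind by a backed-out change: nothing points at these.
    "commit_aborted": ["tree_aborted"],
    "tree_aborted": ["blobC"],
    "blobC": [],
}

print(sorted(dangling(store, ["commit2"])))
```

In this model the only root is the branch tip `commit2`, so the three objects created by the aborted commit are reported as garbage, while `blobA` survives because two trees share it.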
[edit] Interview with Junio Hamano
Episode 19 of FLOSS Weekly has an extended interview with Junio Hamano.
http://www.twit.tv/floss19
Dvandelay 21:56, 2 September 2007 (UTC)
[edit] Confusingly ambiguous thing about snapshooting
The text says: "One property of Git that has led to considerable controversy is that it snapshots directory trees of files." This appears to say: "of all the things that can be done to directory trees of files, snapshotting is the thing (or one of the things) that git does." What I think it means to say is "The things that Git snapshots are directory trees of files rather than individual files." This could perhaps do with clarification. AMackenzie (talk) 21:44, 23 January 2008 (UTC)
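To make the second reading concrete, here is a rough Python sketch of snapshotting a whole directory tree as one content-addressed object. The `blob`/`tree` header and SHA-1 naming follow Git's object format as I understand it, but everything else is an assumption for illustration: the in-memory `Tree` type, the function names, and the restriction to plain files with mode 100644 (no executables, symlinks, or submodules).

```python
import hashlib
from typing import Dict, Union

# Hypothetical in-memory directory: name -> file content or nested directory.
Tree = Dict[str, Union[bytes, "Tree"]]


def object_id(kind: str, body: bytes) -> bytes:
    """Name an object by hashing a '<kind> <size>' header plus its body."""
    header = f"{kind} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).digest()


def blob_id(content: bytes) -> bytes:
    return object_id("blob", content)


def tree_id(tree: Tree) -> bytes:
    """Id of a directory snapshot; it depends on every file beneath it."""
    entries = []
    for name, value in tree.items():
        raw = name.encode()
        if isinstance(value, dict):
            # Subdirectories sort as if their name ended in '/'.
            entries.append((raw + b"/", b"40000", raw, tree_id(value)))
        else:
            entries.append((raw, b"100644", raw, blob_id(value)))
    body = b"".join(
        mode + b" " + raw + b"\0" + oid for _, mode, raw, oid in sorted(entries)
    )
    return object_id("tree", body)


before: Tree = {"README": b"hello\n", "src": {"main.c": b"int main(void){return 0;}\n"}}
after: Tree = {"README": b"hello\n", "src": {"main.c": b"int main(void){return 1;}\n"}}

# Changing one file deep in the tree changes the id of the whole snapshot.
print(tree_id(before).hex() != tree_id(after).hex())
```

The point of the sketch is that the unit being named is the directory tree: a change to `src/main.c` produces a new id for `src` and therefore for the root, while the unchanged `README` blob keeps its id.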
[edit] Wrong source link
Can someone who knows which one is correct fix the source link for the Linus comment? The link says [43] but it goes to #42. This probably isn't right. —Preceding unsigned comment added by 195.148.99.9 (talk) 12:32, 24 January 2008 (UTC)
[edit] Article status
I removed the cleanup tag that had been there since August 2007. This article is much better than that now! The article covers quite a lot and is very well referenced, even if there are still things to do. Most importantly, some editing is still needed to convert the list style to more fluid text. We need to isolate even fewer really characteristic features of git to describe. As a git user I know that it might be hard to reduce the number of unique features. -- Sverdrup (talk) 20:48, 8 March 2008 (UTC)
[edit] "git bisect" is worthy of some text some time...
Since copied, it's a dearly loved "killer feature" by those who use it. References:
- Andreas Ericsson (2008-03-11). Re: Mercurial's only true "plugin" extension: inotify... and can it be done in Git?. git mailing list. “Clearly, git is the most innovative tool here, since its developers managed to cook up something so immensely useful as "git bisect" (which by the way is well-nigh single-handedly responsible for reducing our average bugreport-to-fix time from 4 days to 6 hours).”
- Michal Piotrowski; Maciej Rutecki, Rafael J. Wysocki (2007-06-17). "Chapter 4: Git, quilt and binary searching", Linux Kernel Tester's Guide, version 0.3-rc1. Retrieved on 2008-03-11.
- Bowes, James (2007-02-18). "git bisect: A practical example with yum". James Bowes' blog. “I used git bisect to track down a bug in yum last night. It was so easy and practical that I figured I should record it here, so that others might want to give git a try.”
Oh, and another Git talk, this one post-1.5-release:
- David Nusinow. (2007, 06-18). Maintaining Packages with Git (ogg). Debconf 2007. Retrieved on 2008-03-12. Event occurs at 02:34. "You don't realize how slow [subversion] is until you're not having to hit the network every single time you do an operation."
- (Addendum: Just listened to the talk; it's truncated at 15 minutes and not very interesting. Ah, well.)
71.41.210.146 (talk) 03:46, 12 March 2008 (UTC)
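If article text on "git bisect" does get written, the core idea could be illustrated with something like the Python sketch below. It is only a simplified model under stated assumptions: the history is strictly linear, the oldest commit is known good and the newest known bad, and `first_bad` and `is_bad` are hypothetical names. The real command also handles non-linear history, which this sketch does not attempt.

```python
from typing import Callable, List, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Binary-search a linear history for the first bad commit.

    Assumes commits are ordered oldest to newest, commits[0] is known
    good, commits[-1] is known bad, and every commit after the first
    bad one is also bad.
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]


history: List[str] = [f"c{i}" for i in range(100)]
tested: List[str] = []


def is_bad(commit: str) -> bool:
    """Hypothetical test: the bug is present from c37 onwards."""
    tested.append(commit)
    return int(commit[1:]) >= 37


print(first_bad(history, is_bad), "found after", len(tested), "tests")
```

With 100 commits the search needs at most 7 test runs rather than up to 98, which is the kind of saving the quoted posts are enthusiastic about.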
[edit] More references... Linus bragging
Placed here for possible reference when more article text gets written.
- Linus Torvalds (2006-11-28). git and bzr. bazaar mailing list. Retrieved on 2008-03-13. “Such a "multiple sources" case can actually be found by doing
git blame -C tree-walk.c
which (correctly) figures out that the code comes from both merge-tree.c (the "entry compare/extract" functions) _and_ from sha1_name.c (the "find_tree_entry()" function).
So yes, "git blame" is a _hell_ of a lot more powerful than anybody else's "annotate", as far as I know. I literally suspect that nobody else comes even close.”
- Linus Torvalds (2005-06-22). Do a cross-project merge of Paul Mackerras' gitk visualizer. Retrieved on 2008-03-13. “This merge itself is pretty interesting too, since it shows off a feature of git itself that is incredibly cool: you can merge a separate git project into another git project. Not only does this keep all the history of the original project, it also makes it possible to continue to merge with the original project and the union of the two projects.”
And a "Linus is proud of his performance" link:
- Linus Torvalds (2006-11-28). git and bzr. bazaar mailing list. Retrieved on 2008-03-13. “Performance is important to git, but it's important not in the sense of "let's not do it because it performs badly", but in the sense of "things should be so fast that people don't even realize that they are done". You guys may count commit times in seconds. I still want to commit multiple patches _per_second_ to the kernel tree. THAT is performance.”
[edit] Two inaccuracies
I have encountered two inaccuracies in the article. Can anyone correct them? Below, I have explained what is wrong with those statements.
Statement 1: Git is a set of primitive programs written in C, and a large number of shell scripts that provide convenient wrappers.[13] It is easy to chain the components together to do other clever things.[14]
There is ongoing work to convert the existing shell scripts to C. Currently, most git commands are implemented in C, with less than 25% implemented as shell script wrappers. However, this does not mean that the ability to write convenient wrappers as shell scripts has diminished over time.
Maybe it should be rewritten as: "Git started as a set of primitive programs written in C, and a large number of shell scripts that provide convenient wrappers.[13] Most of those shell scripts have since been converted to C, but it is still easy to chain the components together to do other clever things.[14]"
Statement 2: Git on Windows is noticeably slower,[46] due to Git's heavy use of file system features that are particularly fast on Linux.[47]
The main problem with the Cygwin version of Git is the emulation of fork(). Describing fork() as a file system feature is incorrect, as it has nothing to do with the filesystem; it is about process creation.
Fork is available on any POSIX-compliant operating system, and it is reasonably fast on any system using a copy-on-write technique. The Cygwin emulation, by contrast, has to perform a full copy, plus other overhead caused by the lack of support in the kernel, which is what makes it so expensive.
The problem is most noticeable for git commands implemented as shell scripts, because shell scripts often create new processes to do their job, which involves fork().
--Dpotapov (talk) 22:17, 23 March 2008 (UTC)
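As a rough way to see why commands that start many processes are sensitive to process-creation cost, here is a small Python timing sketch. It is an illustration only: the function names are hypothetical, and it measures whatever process-creation mechanism Python's `subprocess` uses on the host platform, not Cygwin's fork() emulation specifically. No numbers are quoted because they vary by machine.

```python
import subprocess
import sys
import time


def noop() -> None:
    """Stand-in for a step done inside the current process."""


def seconds_per_spawn(n: int) -> float:
    """Average wall-clock time to start and wait for a trivial child process."""
    start = time.perf_counter()
    for _ in range(n):
        subprocess.run([sys.executable, "-c", "pass"], check=True)
    return (time.perf_counter() - start) / n


def seconds_per_call(n: int) -> float:
    """Average wall-clock time for an equivalent in-process function call."""
    start = time.perf_counter()
    for _ in range(n):
        noop()
    return (time.perf_counter() - start) / n


if __name__ == "__main__":
    print(f"new process:     {seconds_per_spawn(20):.6f} s each")
    print(f"in-process call: {seconds_per_call(100_000):.9f} s each")
```

A shell script that runs dozens of helper programs pays the first cost dozens of times, whereas the same logic rewritten inside one C program pays something closer to the second, so any platform where process creation is unusually slow penalises the scripted commands most.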
- Thanks for the suggestions. Why not be bold and go ahead and make the changes? Your first point is definitely well-taken (and I'll make the change), but I question the second. While fork is one problem, the really nasty one is the lstat(2) equivalent GetFileAttributesExA(). On Linux, due to the dcache, it's blazingly fast, and git relies on being able to stat() every file in the source tree very rapidly. But it's far slower on Windows. Some recent work has eliminated some redundant stat() calls and eliminated some emulation overhead,[2] but there's still a difference.
- Oh, here's a reference to the MinGW port nearing completion.[3]
- 71.41.210.146 (talk) 05:08, 26 March 2008 (UTC)
Why I think that the statement #2 is incorrect:
1. Fork is the *major* problem for Git on Windows. Its emulation by Cygwin is *very* slow, and fork() is NOT a filesystem feature.
To demonstrate how slow shell scripts can be on Windows, here is one example [4]. When git-fetch was rewritten from shell to C, Windows users reported a 25x or greater speedup, while Linux users did not notice any difference. (Probably because on Linux the speed was bound by network communication and the time needed for the server to respond.) So the loss of performance from fork() emulation is really huge on Windows.
2. Indeed, Git does stat() on every file in the working tree, but practically every other version control system does so too, because before checking in changes, showing changes, or performing many other operations, you have to find which files have changed. The only exception I have heard about is Mercurial, which optionally allows you to run a special daemon that monitors changes in your work tree and thus avoids the need to scan the whole directory to find changes. So doing stat() on every file in your work tree is pretty normal.
While the emulation of lstat() in Cygwin has some overhead compared with using the native Windows API, the overhead is not big enough to be easily noticeable. You can try running the Windows-native version of Subversion and then the Cygwin version of Subversion, and see what difference it makes...
You said stat() is blazingly fast on Linux and far slower on Windows. IMHO, what is "blazing fast" and "far slower" is very subjective. Do you have any numbers?
Here are the results of what happened when the number of lstat() calls was halved: on Linux, performance increased by 57% [5], while on Windows (MinGW version) it increased by only 39% [6].
If lstat() were the main reason why Git is slower on Windows than on Linux, you would expect a bigger gain on Windows than on Linux, but in reality the opposite is true.
Thus, the idea of stat() being the cause is completely discredited. It seems the original idea was based solely on some speculation by Linus a long time ago, but Linus has never run Git on Windows. Besides, later, after seeing the result, he openly admitted: "I have absolutely no idea how to do performance analysis or even something simple as getting a list of system calls from Windows (even if I had a Windows machine, which I obviously don't ;), so I'm afraid I have no clue why git might be considered slow there. I was hoping this was it" [7].
So, I believe the phrase "due to Git's heavy use of file system features that are particularly fast on Linux." should be removed as lacking any basis.
3. I think the phrase "Git on Windows is noticeably slower" is correct, but the reference associated with this phrase is a bit misleading and requires further explanation. In cases where Git commands written in C are used, Git performs only slightly slower than on Linux (it still may be noticeable, but I have also noticed that SVN can be slower on Windows than on Linux in some cases). The real slowdown happens when a Git command written in shell is used. On Windows these commands can be 10 or more times slower. (See my example above about rewriting git-fetch in C.) The reference attached to the phrase about Git being slower on Windows refers precisely to the case where a shell script was used. Unfortunately for Windows users, git-merge is still a shell script, so it is not surprising that it is considerably slower on Windows than on Linux.
--Dpotapov (talk) 11:19, 26 March 2008 (UTC)
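For readers unfamiliar with why a version control system would stat() every file, the following Python sketch shows the general technique discussed in point 2 above. It is a simplified assumption of how such a scan can work, not Git's actual index format or comparison rules: the `Index` type, `scan`, and `changed` are hypothetical names, and only file size and modification time are compared.

```python
import os
from typing import Dict, List, Tuple

# Hypothetical cache: relative path -> (size in bytes, mtime in nanoseconds).
Index = Dict[str, Tuple[int, int]]


def scan(root: str) -> Index:
    """Call lstat() on every file under root and record size and mtime."""
    index: Index = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            index[os.path.relpath(path, root)] = (st.st_size, st.st_mtime_ns)
    return index


def changed(old: Index, new: Index) -> List[str]:
    """Paths that were added, removed, or whose recorded metadata differs."""
    return sorted(p for p in old.keys() | new.keys() if old.get(p) != new.get(p))
```

A tool would keep the result of one `scan` and compare it with a fresh one before showing or committing changes, so the cost of the operation is roughly one lstat() call per file in the working tree, whatever the platform charges for that call.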

