Talk:B-tree
From Wikipedia, the free encyclopedia
Shouldn't the majority of this be under (a,b)-tree? B-trees are a variant of (a,b) trees, so the (a,b) tree article should contain pretty much everything except stuff like the fact that a = B/4 and b = B/2 in a B-tree.
Moved this text to the discussion section from the main definition. Chadloder 18:34 Jan 22, 2003 (UTC)
The original author interchanged between B+-tree and B-tree. If I remember correctly the only difference is that one only stores values in leaf nodes.
Also, should this article be titled B-tree instead of B tree? And does any know what the B stands for? I think it is balanced.
-
- German Wikipedia states that it is unclear what the B stands for and that the inventors didn't gave an explanation for it. It could be Balanced, Bayer, Broad, Bushy, or Boeing. (One of the inventors of the B-Tree is Rudolf Bayer, who worked for the Boeing Scientific Research Labs.)
- I moved the article from B tree to B-tree. The old title might have been that way because the author was trying to write about B+trees and the '+' got gobbled up. --Mrwojo 15:32 Feb 15, 2003 (UTC)
- Sometimes B+Trees are also named B*Trees.
There is some disagreement on what the B stands for. Some people think it stands for 'balanced', 'bushy', or even 'Bayer' after one of the authors. Chadloder 18:35 Jan 22, 2003 (UTC)
Contents |
[edit] duplicates
B-trees can have duplicates, right? This could be added to the article. Meonkeys 19:16, May 16, 2005 (UTC)
Yes. The B-tree representing the "name index" in a phone book database probably has lots of identical duplicate "John Smith" names.
Also, is a B-tree the same as an n-ary tree? Meonkeys 19:16, May 16, 2005 (UTC)
A B-tree can be seen as a very specialized form of n-ary tree, one that adds a pile of special restrictions. The restrictions make insertion and deletion a little more complicated -- but, in return, the B-tree can guarantee very fast searches.
- Some nodes in some n-ary trees have only 1 or 2 children. B-trees are constructed so every internal node has at least L children.
- Some n-ary trees do not have all the leaves on the same level. (For example, it's impossible for a binary tree with 5 leaf nodes to put them all on the same level). B-trees are constructed so that every leaf is at the same level.
--DavidCary 20:52, 30 July 2005 (UTC)
[edit] Nodes structure
I 'm wondering if the U (maximum of elements) in the article is not an 2L (minimum of elements).
Other sources from goole tell U =2L (minimization factor): http://www.bluerwhite.org/btree/ (the structure of B-tree), http://www.semaphorecorp.com/btp/algo.html (3 paragraph). --Loading 07:29, 18 Jun 2005 (UTC)
Certainly a programmer *could* pick U=2L. That type of B-tree is the simplest kind for a teacher to try to explain. But often programmers pick some maximum number of elements U that is is an odd number, such as 2-3 trees, L=2, U=3. Donald Knuth seems to like U = L * 3/2, in the form of a B*-tree. --DavidCary 20:52, 30 July 2005 (UTC)
I am teaching B-trees at San Jose State University. I have three sources (other than the Wikipedia article) that pairwise disagree on this L-U issue. Here are what they say:
(1) The well-known algorithms textbook, Introduction to Algorithms, by Corment, Leiserson, Rivest, and Stein (CLRS), defines B-trees in terms of one parameter t, the "minimal degree" (p. 439). Thus L = t and U = 2t. It is required that t be an integer, so in particular U has to be even.
(2) The B-tree animation applet of slady (linked from the article) also uses a single parameter, which he calls the "order". He doesn't explain it, but experiments with the applet give, for order 1, L=2 and U = 3, and for order 2, L=3 and U = 5. In particular you always get U odd, so the animation applet can't duplicate any of the trees discussed in CLRS. (Do you suppose this confuses my students?!)
(3) Knuth's "Art of Computer Programming", vol. 3, p. 473, defines B-trees in terms of the "order" m. Comparing it to the Wikipedia articla, we see U=m and L = ceiling(m/2); we need ceiling since m is not required to be even. So CLRS covers the case when m is even, m = 2t; but Knuth also allows the case L=t and U = 2t-1; this seems to be the case the applet covers, where the "order" you enter in the applet is t-1.
Some restriction on the relation between L and U has to be made; when we split a full node (containing U-1 elements) upon inserting a new key, we will get two new nodes containing U/2 elements each (for even U) or one with floor(U/2) and one with ceiling(U/2) elements, for odd U. We must require that L be at most floor(U/2) so that these nodes are legal. Moreover, L can't be too small either, or deletion won't work right. On 1 April 2006 I edited the article to correct this point, and also to correct the description of the deletion algorithm. --User and date unknown
In the following I'll try to explain the definition tought at Technical University of Munich where Bayer used to work. This definition seems to make sense, and is consistent with the 3 definitions above. Only the terms used differ slightly. As above, I'll use the letters L and U for the minimum and maximum number of children of a node, though the letters a and b are commonly used at TUM.
Any search-tree that works the way explained in the article needs to comply to U >= 2L-1. Otherwise splitting up a node upon insertion will not work. A balanced search tree complying this rule is called (a,b)-tree. An (a,b)-tree where U = 2L-1 is called a B-tree.
These definitions have the following implications:
- The above definitions 2 and 3 comply to this definition of a B-tree. (In the second edition of Knuth's "Art of Computer Programming", vol. 3 it's on page 483, and it's a bit vague: does m/2 mean integer division here?)
- The above definition 1 is not valid for a B-tree in this strict sense, but for another specialization of an (a,b)-tree.
- The values U = L * 3/2 for B*-trees do not comply to this definition of a B-tree or an (a,b)-tree. Therefore a B*-tree is not a specialization of a B-tree, but a different though related datastructure.
- The term (a,b)-tree gives another possible explaination for the meaning of the B in B-tree. For a B-tree, only the b needs to be known, and it's always odd. The a can be calculated as ceiling(b/2).
I don't know if the above nomenclature is common outside of TUM. I'm not an expert on the topic. The only references I can provide are two lecture notes in German: ead_ws9697.ps.gz and ead_ws9798.ps.gz.
Maybe someone should check the original publications! However, this node structure issue needs to be clarified in the article. In particular the rule U >= 2L-1 should be mentioned, even if it turns out that all (a,b)-trees are commonly called B-trees.
Michael R. 22:55, 16 August 2006 (UTC)
[edit] variants
Should *this* article B-tree have a section that explains variants such as B*-tree, B+ tree, etc.? Then I want a section that explains pros and cons -- why would someone use one variant instead of another, or instead of a simple binary tree ? Some people find that preferable to having a bunch of very short stub articles that say "just like a B-tree except that ..." -- or, worse, a bunch of fully-fleshed out articles that repeat the preliminary explanation over and over. --DavidCary 20:52, 30 July 2005 (UTC)
Are 2-3-4 trees a kind of B-tree with L=2, U=4 ?
[edit] Error in illustration of B+ tree
It appears that there is an error in the illustration. According to the text, all values in the leftmost subtree should be less than a1. In the illustration, however, the leftmost subtree contains the value 3, which is equal to a1.
I agree, from "Database Systems: The Complete Book"
-- if j pointers are used then there will be j - 1 keys, e.g. K_1, K_2, ..., K_j-1 so the first pointer points to a part of the B tree where some of the records with values < K_1 will be found; the second pointer points to where records with values >= K_1 and < K_2 will be found and so on.
132.205.155.150
This ordering is also used in 'Silberschatz, Korth, Sudarshan: Database System Concepts, 4th Edition'. Schmid 12:39, 13 August 2006 (UTC)
[edit] Concurrent access
The paragraph at the end of the insertion section suggests that an improvement involves splitting nodes that might fill on a leaf node split on the way down the tree. This buys nothing. It only reduces the effective node size by one.
Unless the nodes are locked there is no guarantee the nodes will remain non-full on the way back up during a split. To allow maximal concurrency, you don't want to lock nodes on the insert descent.
--Karl Malbrain 20:52, 18 May 2006 (UTC)
[edit] Repeated info
A lot of information is repeated redundantly throughout the article. Please consider rewriting it to make it a bit more concise. --221.134.26.57 17:37, 21 May 2006 (UTC)
[edit] Picture
I think the picture should be removed. Why would you put a picture of a B+tree on the B-tree page. It would be like looking up the US capital and finding a picture of the eiffel tower. Not to mention the B+tree picture is incorrect. 3 should be in the middle node and 5 in the last. The pictures are great but poeple look at the picture instead of reading and the info in the picture is wrong. I would recommend removing the pictures untell they can be corrected. —The preceding unsigned comment was added by 216.239.10.103 (talk • contribs).
[edit] weight-balancing scheme
Could someone explain the weight-balancing schema invented by Arge and Vitter? It seems to have better performance than the 2-3 B-trees.
[edit] Relaxed Balance
There has been quite a bit of work by Kim S Larsen and Rolf Fagerberg (e.g. http://citeseer.ist.psu.edu/51205.html ) showing how the balance of a btree can be maintained lazily with several benefits. For the purposes of elucidation relaxed balance is particularly good because you can describe insertion or deletion and separate the basic steps of the algorithm and the steps for the maintenance of balance. Another benefit is that all modifications to the structure of the tree are local, so concurrency is more easily exploited.
It'd be nice for the text to include a description of this work. Dr Tom Conway (talk) 01:20, 3 January 2008 (UTC)

