Talk:Bootstrapping (statistics)

From Wikipedia, the free encyclopedia

This article is within the scope of WikiProject Statistics, which collaborates to improve Wikipedia's coverage of statistics. If you would like to participate, please visit the project page.

WikiProject Mathematics
This article is within the scope of WikiProject Mathematics, which collaborates on articles related to mathematics.
Mathematics rating: Start Class Low Priority  Field: Probability and statistics
One of the 500 most frequently viewed mathematics articles.
Please update this rating as the article progresses, or if the rating is inaccurate. Please also add comments to suggest improvements to the article.


[edit] Merging with Bagging

(see below for discussion of contents)

Yes, this page should be merged. Gpeilon 15:01, 10 January 2007 (UTC)

I agree. Tolstoy the Cat 19:11, 22 January 2007 (UTC)

I agree. --Bikestats 13:08, 9 February 2007 (UTC)

I also agree. Tom Joseph 20:53, 13 February 2007 (UTC)

I also agree.

I do not agree because I was looking for an explanation of the word 'bootstrapping', not 'bootstrap'

I agree Eagon 14:49, 13 March 2007 (UTC)

I Agree


I do not agree. Bagging is now one of the most famous ensemble methods in machine learning and have their own many unique properties. Nowadays, the reasons why bagging work very well in various situations are still mystery, and there are many theoretical explanations trying to explain bagging click here for a survey.

IMO, merging bagging with Bootstrapping (statistics) is rather similar to merging maximum entropy with information entropy which is not appropriate.

To sum up, bagging has its own unique place in literatures, and should also have their own page here. -- Jung dalglish 03:12, 7 May 2007 (UTC)

--- I agree completely with this point; bagging is one of the key approaches to ensemble-based machine learning, and it certainly has its own life entirely apart from the origins of bootstrapping in statistics. From a machine learning point of view, it would be meaningless to remove it to a statistics based article; machine learners would not find it, because they would not look there.

---

I do not agree that they should be merged. Bagging is a sufficiently unique and well-defined method that it warrants its own page. I was looking for bagging as a machine learning method, and would not have immediately thought to look under boostrapping.

--

Bagging is a specific application of bootstrapping, which is different enough from the usual applications that it deserves its own page: You are using the bootstrap sample of estimators to create another estimator, rather than using it merely to estimate the distribution of estimators. --Olethros 15:10, 6 June 2007 (UTC)
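The distinction drawn above can be sketched in a few lines of Python (a hypothetical illustration using the sample median as the estimator; the data and the choice of statistic are arbitrary, not taken from either article):

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.exponential(scale=2.0, size=100)  # toy sample; any data would do

def bootstrap_estimates(data, estimator, n_boot=1000, rng=rng):
    """Recompute the estimator on n_boot resamples drawn with replacement."""
    n = len(data)
    return np.array([estimator(data[rng.integers(0, n, size=n)])
                     for _ in range(n_boot)])

boots = bootstrap_estimates(data, np.median)

# Ordinary bootstrap use: quantify the estimator's sampling variability.
std_error = boots.std(ddof=1)

# "Bagged" use: the bootstrap replicates themselves are combined into
# a new estimator, rather than merely describing the old one.
bagged_median = boots.mean()
```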

--

I do not agree that they should be merged. This article provided a quick and readily absorbed reference for me today, and if it had been buried in a lengthy broad discussion I probably would not have found it and benefitted from the information.

--

I think they should not be merged as "bagging" seems a particular specific application that should not appear in a mainstream initial discussion of bootstrapping. A brief description with cross-reference would be more suitable. Melcombe 13:21, 16 July 2007 (UTC)


-- There seem to be two separate discussions on this page. The first relates to "bootstrap" and "bootstrapping", the second to merging "bagging" into the bootstrap article. Like others, I don't think bagging should be merged in. As others have said, it is one particular application. Tolstoy the Little Black Cat 16:50, 19 August 2007 (UTC)

--

I don't think Bootstrap aggregating (bagging) should be merged in with Bootstrapping. The current bootstrapping page is simple and general. To merge in a relatively large, highly specific, relatively atypical application (the page on bagging) will confuse those looking for a basic understanding of what statistical bootstrapping is, and the basic bootstrapping information will be mostly irrelevant for the typical person looking for bagging. Each article should certainly link to the other, but I think merging will drastically reduce the value. Glenbarnett 03:18, 27 September 2007 (UTC)

--

I also disagree about merging these. Bootstrap methods are great for inference, but bootstrap aggregation is a method for ensemble learning, i.e., for aggregating collections of models built on subsamples of the data so that the combined model is more robust. To fold bagging into bootstrapping is to misunderstand the use of bagging. —Preceding unsigned comment added by 71.132.132.11 (talk) 05:32, 27 September 2007 (UTC)

I also disagree about merging the Bootstrap and the Bootstrap Aggregating (Bagging) pages; the former is a resampling method for estimating the properties of an estimator, while the latter, although it uses bootstrap methodology, is an ensemble learning technique from statistical learning and/or data mining. In my opinion they are only related by the fact that Bagging uses a modified bootstrap technique to achieve its goal.

Gérald Jean —Preceding unsigned comment added by 206.47.217.67 (talk) 20:06, 22 November 2007 (UTC)

--

I disagree with merging these. The primary use of bootstrapping is in inferential statistics, providing information about the distribution of an estimator - its bias, standard error, confidence intervals, etc. It is not usually used in its own right as an estimation method. It is tempting for beginners to do so - to use the average of bootstrap statistics as an estimator in place of the statistic calculated on the original data. But this is dangerous, as it typically gives about double the bias.

In contrast, bootstrap aggregation is a randomization method, suitable for use with low-bias, high-variability tools such as trees: by averaging across trees, the variability is reduced. Yes, the mechanism is the same as what beginners often do, but I don't want to encourage that mistake. Yes, the randomization method happens to use the same sampling mechanism as the simple nonparametric bootstrap, but that is accidental. The intent is different: reducing variability by averaging across random draws, versus quantifying the sampling variation of an estimator.

Tim Hesterberg --Tim Hesterberg (talk) 05:30, 6 December 2007 (UTC)
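The bias-doubling caution above can be checked numerically. A hedged sketch in Python, using the plug-in (ddof=0) variance estimator as a stand-in for a generic biased statistic: its bias is about -σ²/n, and the average of its bootstrap replicates has bias of roughly -2σ²/n. The sample sizes and replication counts here are arbitrary choices for a quick simulation:

```python
import numpy as np

rng = np.random.default_rng(1)
n, n_sims, n_boot = 20, 1000, 200
true_var = 1.0  # data ~ N(0, 1), so the plug-in variance has bias about -1/n

plugin, bagged = [], []
for _ in range(n_sims):
    data = rng.standard_normal(n)
    plugin.append(data.var())  # plug-in (ddof=0) variance of the sample
    idx = rng.integers(0, n, size=(n_boot, n))  # bootstrap resample indices
    bagged.append(data[idx].var(axis=1).mean())  # average of bootstrap replicates

bias_plugin = np.mean(plugin) - true_var  # roughly -1/n
bias_bagged = np.mean(bagged) - true_var  # roughly -2/n: about double the bias
```

The averaged bootstrap statistic inherits the original estimator's bias and then adds a bootstrap-world copy of it on top, which is exactly the beginner's mistake described in the comment.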

Can we now agree that merging is not appropriate and remove this from the discussion, or at least from the top of the article page? Melcombe (talk) 11:35, 12 February 2008 (UTC)

[edit] Discussion of contents

[edit] mediation

I would like to raise an issue with the mention of "mediation" in the intro material. Should there be a minor subsection for this, explaining what "mediation" means, giving some brief details of how bootstrapping applies, and possibly with its own example to show the contrast with the ordinary single-sample case? Melcombe 13:21, 16 July 2007 (UTC)


[edit] pivots

This page needs to mention pivotal statistics, which are critical to bootstrapping. —Preceding unsigned comment added by 129.2.18.171 (talk) 22:18, 11 February 2008 (UTC)

Now added a new section, but possibly there is a need for a much more technical description of bootstrapping overall in order to provide enough context/information. This need for a more formal specification would perhaps also benefit other parts. Melcombe (talk) 11:31, 12 February 2008 (UTC)


[edit] unclear

I was looking for a definition of the bootstrap method, and couldn't understand the definition given here, in the 2nd sentence: "Bootstrapping is the practice of estimating properties of an estimator (such as its variance) by measuring those properties when sampling from an approximating distribution. One standard choice for an approximating distribution is the empirical distribution of the observed data." Since I do not know what bootstrapping is, I cannot change it myself, so I wrote it here instead. Setreset (talk) 15:27, 8 April 2008 (UTC)
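For what it's worth, the quoted definition becomes concrete once you note that sampling from the empirical distribution is simply resampling the observed data with replacement. A minimal sketch in Python (the data values are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)
observed = np.array([2.1, 3.4, 1.8, 5.0, 2.7])  # hypothetical observed data

# The empirical distribution puts mass 1/n on each observed value,
# so sampling from it is just sampling the data with replacement.
resample = rng.choice(observed, size=observed.size, replace=True)

# "Estimating properties of an estimator": here we estimate the variance
# of the sample mean by measuring it under repeated resampling.
boot_means = np.array([rng.choice(observed, size=observed.size, replace=True).mean()
                       for _ in range(2000)])
est_variance_of_mean = boot_means.var(ddof=1)
```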