Talk:Kernel density estimation
Incorrect caption
Note that the figure shows $\sum_{i=1}^{n} K_h(x - x_i)$ rather than $\hat{f}_h(x) = \frac{1}{n}\sum_{i=1}^{n} K_h(x - x_i)$ as the caption says. --anon
- How do you know, as there is no y-axis in the picture? Oleg Alexandrov (talk) 03:31, 1 March 2006 (UTC)
- $\hat{f}_h(x)$ is an average. An average is never greater than the largest component. If you look at the graph, the blue curve is clearly the sum of the component curves. Zik 03:40, 5 March 2006 (UTC)
- You are right, I fixed the caption. I have no idea how I had missed that. :) Oleg Alexandrov (talk) 01:09, 6 March 2006 (UTC)
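For reference, here is a minimal sketch of the two quantities being distinguished above, writing $K_h(u) = \tfrac{1}{h}K(u/h)$ for a kernel $K$ that integrates to one (the notation is assumed for illustration, not quoted from the article):

    S(x) = \sum_{i=1}^{n} K_h(x - x_i)       % the sum of the component bumps (the blue curve)
    \hat{f}_h(x) = \frac{1}{n} S(x)          % the estimate itself: an average of the bumps

Because $\hat{f}_h$ is an average of $n$ non-negative bumps, it can never rise above the tallest individual bump, which is how the sum and the average can be told apart in the plot.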
name
In my experience calling the technique Parzen windowing is limited specifically to time-series analysis, and mainly in engineering fields. In general statistics (and in statistical machine learning), the term kernel density estimation is much more common. Therefore I'd propose it be moved there. As an aside, the attribution to Parzen is also historically problematic, since Rosenblatt introduced the technique into the statistics literature in 1956, and it had been used in several more obscure papers as early as the 1870s, and again in the early 1950s. --Delirium 22:59, 26 August 2006 (UTC)
x
What is x in the equation? --11:06, 5 October 2006 (UTC)
- It is a real number, I guess. Oleg Alexandrov (talk) 02:55, 6 October 2006 (UTC)
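For context, in the usual way of writing the estimator (a sketch of the standard form, not necessarily the article's exact notation), $x$ is the point at which the density estimate is evaluated, while the $x_i$ are the observed data:

    \hat{f}_h(x) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right), \qquad x \in \mathbb{R}.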
Changing the name of this page
The technique called Parzen window here is called kernel density estimation in nonparametric statistics. It seems to me to be a much more general term and much clearer for people searching for it. The comment above states the same problem. I also agree that the article should refer to the Parzen-Rosenblatt notion of a kernel, and not just to Parzen. The definition of a Parzen-Rosenblatt kernel should later be added to the kernel (statistics) page. —The preceding unsigned comment was added by Gpeilon (talk • contribs).
- That's fine with me. If you move the page, you should also fix the double redirects. That is, after the move, while viewing the article at the new name, click on "what links here" on the left, and any redirects which point to redirects need to be made to point to the new name. Cheers, Oleg Alexandrov (talk) 03:18, 9 January 2007 (UTC)
Formula for optimal bandwidth
Hi, I just noticed that the optimal global bandwidth in Rosenblatt, M., The Annals of Mathematical Statistics, Vol. 42, No. 6 (Dec. 1971), pp. 1815–1842, has an additional factor of […]. Just an oversight, or is there a reason for the difference that I'm missing? Best, Yeteez 18:34, 24 May 2007 (UTC)
In addition, what is the lower-case n in the optimal bandwidth? It is undefined. CnlPepper (talk) 17:18, 13 December 2007 (UTC)
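For reference, a sketch of the usual AMISE-style expression for the optimal global bandwidth (the standard textbook form, which may differ by constant factors from the expression in Rosenblatt 1971 or in the article); the lower-case n in it is the sample size:

    h_{\mathrm{opt}} = \left[ \frac{R(K)}{\mu_2(K)^2 \, R(f'') \, n} \right]^{1/5},
    \qquad R(g) = \int g(u)^2 \, du, \quad \mu_2(K) = \int u^2 K(u) \, du.

With a Gaussian kernel and a Gaussian reference density this reduces to the familiar rule of thumb $h \approx 1.06\,\hat{\sigma}\, n^{-1/5}$.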
Scaling factor
Shouldn't the σ in the formula for K(x) be dropped, on the grounds that it is already there in the form of h in the formula for $\hat{f}_h(x)$?
--Santaclaus 15:45, 7 June 2007 (UTC)
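As a quick sketch of the point being raised (assuming K is meant to be the standard normal density): the bandwidth h already supplies the scale, so a separate σ inside K would be redundant,

    \frac{1}{h} K\!\left(\frac{x - x_i}{h}\right) = \frac{1}{h\sqrt{2\pi}} \exp\!\left(-\frac{(x - x_i)^2}{2h^2}\right),

which is exactly a normal density centred at $x_i$ with standard deviation $h$.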
Stata
Though I'm not sure whether it violates the guidelines of what Wikipedia is, I like the example section. But I would like to see the commands in some non-proprietary language, e.g. R. --Ben T/C 14:41, 2 July 2007 (UTC)
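In the meantime, here is a minimal sketch of the same kind of computation in Python with NumPy (the function name gaussian_kde, the sample values, and the bandwidth are illustrative, not taken from the article's Stata example):

    import numpy as np

    def gaussian_kde(data, grid, h):
        """Gaussian kernel density estimate of data, evaluated at the points in grid."""
        data = np.asarray(data, dtype=float)
        grid = np.asarray(grid, dtype=float)
        n = data.size
        # One Gaussian bump per observation; the bumps are averaged (not summed)
        # and each is scaled by 1/h so the estimate integrates to one.
        u = (grid[:, None] - data[None, :]) / h
        return np.exp(-0.5 * u ** 2).sum(axis=1) / (n * h * np.sqrt(2.0 * np.pi))

    # Usage: estimate a density from a small illustrative sample.
    sample = [1.2, 2.4, 2.9, 4.7, 5.1, 6.8]
    xs = np.linspace(-2.0, 10.0, 200)
    density = gaussian_kde(sample, xs, h=1.0)

R users could get the equivalent with the built-in density() function.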
Practical Use
Can somebody please add a paragraph on what the practical use of kernel density estimation is, and provide an example from statistics or econometrics? Thanks!
Kernel?
Isn't a Gaussian with a variance of 1 totally arbitrary? On the other hand, using the PDF of your measurement tool as a kernel seems quite meaningful. For example, if you are measuring people's heights and know you can measure to a std. dev. of 1/4", then convolving the set of measured heights with a Gaussian with a std. dev. of 1/4" seems like it captures everything you know about the data set. For example, in the limit of one sample, the estimate would reflect our best guess of the distribution for that one person. 155.212.242.34 22:07, 6 November 2007 (UTC)
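A sketch of that suggestion, reusing the illustrative gaussian_kde helper from the Stata section above (the height values are made up, and h = 0.25 simply encodes the 1/4-inch measurement error mentioned here):

    # Measured heights in inches (illustrative), smoothed with the measurement
    # standard deviation as the bandwidth.
    heights = [64.0, 65.5, 67.2, 70.1, 71.3]
    xs = np.linspace(60.0, 76.0, 400)
    density = gaussian_kde(heights, xs, h=0.25)

With a single observation this reduces to one Gaussian bump centred on that measurement, matching the intuition in the comment above.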

