User:TheFearow/PopularWords
From Wikipedia, the free encyclopedia
I am currently working on some statistics on the most popular words used on Wikipedia. At the moment my main studies are on titles, partly because a dump of the titles is a 20 MB download, while a dump of the articles is just over 2 GB.
Data Source
I am using the slightly outdated database dumps, because screen scraping all 1.8 million entries, even at 100 entries per page, would take over 18,000 page views (which I'm not sure I would be loved for).
Processing
I am doing the processing with a custom-written Java application. I will consider publishing the source at a later date, once I have worked out the bugs and tidied it up.
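The application itself is not published, so the following is only an illustrative sketch of how word counting over titles could be done in Java. The class name `TitleWordCounter`, the method names, the sample titles, and the splitting rule (underscores or whitespace, case-insensitive) are all assumptions made for the example, not details of the real program.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: not the actual application described above.
class TitleWordCounter {

    // Counts how often each word occurs across the given titles.
    // Assumption: words are separated by underscores or whitespace
    // and are compared case-insensitively.
    static Map<String, Integer> countWords(List<String> titles) {
        Map<String, Integer> counts = new HashMap<>();
        for (String title : titles) {
            for (String word : title.toLowerCase(Locale.ROOT).split("[_\\s]+")) {
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    // Returns the entries sorted by descending count, most popular first.
    static List<Map.Entry<String, Integer>> mostPopular(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(Map.Entry.<String, Integer>comparingByValue().reversed());
        return entries;
    }

    public static void main(String[] args) {
        // Made-up sample titles; a real run would read them from the title dump.
        List<String> titles = List.of("History_of_France", "History_of_Spain", "France");
        for (Map.Entry<String, Integer> e : mostPopular(countWords(titles))) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

For a dump of 1.8 million titles, an in-memory `HashMap` like this should be sufficient, since the number of distinct words is far smaller than the number of titles.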
Results
The results will be on the following pages:
- User:TheFearow/PopularWords/Title
- User:TheFearow/PopularWords/TitleBig (same as above, but only words longer than 5 characters)
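The "over 5 characters" restriction for the second page amounts to a simple length filter on the counted words. A minimal sketch, assuming the counts are already held in a map; the class `LongWordFilter`, the method `longerThan`, and the sample counts are hypothetical and only serve the illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the length filter; not the actual application.
class LongWordFilter {

    // Keeps only the words longer than minLength characters,
    // preserving the iteration order of the input map.
    static Map<String, Integer> longerThan(Map<String, Integer> counts, int minLength) {
        Map<String, Integer> result = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getKey().length() > minLength) {
                result.put(e.getKey(), e.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up counts purely for demonstration.
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("history", 2);
        counts.put("of", 2);
        counts.put("france", 2);
        counts.put("spain", 1);

        // "Over 5 characters" means a length of 6 or more.
        System.out.println(longerThan(counts, 5));
    }
}
```

Note that a five-letter word such as "spain" is excluded, since the filter keeps only words strictly longer than 5 characters.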

