Talk:Most common words in English

Length of the lists

Since I began this article and reworked this topic, two editors have attempted to add words to the lists (without sources, too). I have reverted these additions for the following reasons:

The lists that are in place come from a calculation done by Ask Oxford, which I'd consider a reliable source. I think it's reasonable that any additions to the lists must either come from the same study or replace the whole list at once. We should not be mixing sources or adding unsourced material to an already sourced list.

Furthermore, do we really need or want more than 25 words per type (do we especially care about the 40th?), or more than 100 lemmas overall? I don't see why we should. If better, longer lists are found, it's probably best that we link to them rather than include them directly: this is a general-purpose encyclopedia for the average curious man (who I think will be satisfied with the length of these), not a source of data for the aspiring linguist.

Before increasing the length of these lists (very tempting, I know), please consider the above arguments and respond here so that we may discuss it. -- Rmrfstar 09:23, 27 October 2006 (UTC)

The words and their ranks I added are based on the same corpus. -- Dissident (Talk) 17:33, 27 October 2006 (UTC)
As for your second point, Wikipedia is not a paper encyclopedia, so there is no reason to arbitrarily limit oneself as long as the supplied info remains verifiable. -- Dissident (Talk) 15:22, 2 November 2006 (UTC)
Is there any reason to extend the lists? Perhaps linking to them would be more practical. -- Rmrfstar 00:01, 14 November 2006 (UTC)

A/an?

If this list covers lemmas, and not individual words, why are a and an listed separately?--CJGB (Chris) 19:50, 18 February 2007 (UTC)

I dunno. It's that way on the site, though, so the problem is not here. -- Rmrfstar 02:36, 14 June 2007 (UTC)


Most common words in Spoken English

The 50 most common words, with their frequency per 1,000,000 words.

Source: Oxford Corpus, but analysed by me. -- User:Ca_Woodcock 19:57, 26 September 2007 (UTC)

Note that this list is quite different and, in my opinion, more important (note the prominence of I and you).

  • the — 38009
  • I — 22103
  • you — 21063
  • to — 20914
  • and — 20230
  • that — 19216
  • it — 18246
  • a — 18039
  • of — 16043
  • 's — 15543
  • in — 11345
  • we — 10825
  • is — 10244
  • n't — 8152
  • do — 7559
  • er — 7300
  • they — 6720
  • have — 6543
  • be — 6035
  • on — 6007
  • for — 5746
  • was — 5407
  • there — 5360
  • what — 5171
  • this — 5098
  • erm — 4977
  • one — 4869
  • are — 4767
  • 've — 4578
  • if — 4529
  • 're — 4214
  • with — 4197
  • Yeah — 4166
  • think — 4109
  • not — 3950
  • but — 3876
  • know — 3791
  • at — 3579
  • got — 3568
  • can — 3489
  • would — 3447
  • or — 3412
  • And — 3399
  • about — 3367
  • so — 3286
  • just — 3155
  • as — 3146
  • all — 3098
  • your — 2773
  • like — 2724
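For anyone wanting to reproduce figures of this kind, here is a minimal sketch of a per-million frequency count. It is not the analysis used for the list above, whose method is not described. The function name `per_million` and the toy sample are made up for illustration, and whitespace splitting stands in for whatever tokenisation the corpus actually uses. Case is deliberately left alone, since the list above appears to count "and" and "And" separately.

```python
from collections import Counter

def per_million(tokens):
    """Return {token: occurrences per 1,000,000 tokens}, most frequent first."""
    counts = Counter(tokens)
    total = len(tokens)
    # Scale each raw count to a rate per million running words.
    return {word: count * 1_000_000 / total
            for word, count in counts.most_common()}

# Toy sample only; these are not real corpus figures.
sample = "you know I think you know it".split()
rates = per_million(sample)

for word, rate in rates.items():
    print(f"{word}: {rate:.0f}")
```

With a real corpus, `tokens` would be the full token stream, and clitics such as 's and n't would need to be split off by the tokeniser to match the entries above.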


Also, it is worth pointing out that in spoken English adverbs are more common than adjectives, so why not a list of those as well? Frequencies by word type in spoken English:

  • singular nouns — 94549
  • personal pronouns — 94441
  • articles — 60964
  • adverbs — 60494
  • prepositions — 54922
  • adjectives — 37896
  • determiners — 32857
  • co-ordinating conjunctions — 32771
  • plural nouns — 28980
  • lexical verb infinitives — 28971
  • lexical verb base forms — 26112
  • lexical verb 's forms — 23722
  • modal auxiliaries — 20212
  • interjections — 18613
  • cardinal numerals — 18440
  • 'of' — 15940
  • subordinating conjunctions — 15772
  • to — 15731
  • lexical verb 3rd form — 14709
  • lexical verb -ing form — 13216
  • n't — 12402
  • be — 11619
  • proper nouns — 10325
  • adverb particles — 10104
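As a quick check of the adverb/adjective comparison, the two figures from the list above can be compared directly. This is only arithmetic on the numbers as posted; the unit of those counts is not stated, and the variable names are illustrative.

```python
# Figures copied from the word-type list above (unit not stated in the post).
word_types = {
    "adverbs": 60494,
    "adjectives": 37896,
}

# Ratio of adverb frequency to adjective frequency.
ratio = word_types["adverbs"] / word_types["adjectives"]
print(f"adverbs are about {ratio:.1f} times as frequent as adjectives")
```

On these figures adverbs come out at roughly 1.6 times the frequency of adjectives.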