Frequently used English words list?
Xing Qiu - 24 Oct 2005 00:56 GMT
Hi,
I just wonder is there a list of 20,000 most frequently used English words? I think my vocabulary is around 15,000 and I have no problem reading English literature in general. However, from time to time I met unfamiliar words, so I think maybe I should simply spend sometime memorize another couple of thousand words. I searched the Internet but couldn't find exact what I want (most word lists I've found contain only 1,000~5,000 words).
Thanks, Xing
John Ramsay - 24 Oct 2005 01:05 GMT
> Hi,
> [quoted text clipped - 8 lines]
> Thanks,
> Xing
Lorge Thorndike - Most common 30,000 words. Used to develop vocabulary section of IQ test 1954
See article below.
Vocabulary Resources for Material Writers
From The Materials Writers Newsletter, The Newsletter of the Materials Writers' National Special Interest Group of the Japan Association of Language Teachers, Vol. IV, No. 3, October 1996
John Bauman, Enterprise Training Group
Material written for ESL students needs to use somewhat simplified vocabulary and structure if it is to be accessible to lower and intermediate level students. In terms of vocabulary, a writer can try to "keep it simple" while writing, but a more rigorous approach is to compare a text with a list of words prepared for this purpose. A variety of lists of words are available, as well as different ways to use them. In this article, I will briefly list and describe some lists. I'll also discuss a program that will analyze a text and give some links for further exploration of this topic on the internet. Links to sites mentioned are given in the "Web Links" section at the end of this article.
Teaching and Learning Vocabulary (Nation 1990) contains a good general discussion of this topic. Nation doesn't hesitate to quantify the issue. His model of an ideal vocabulary teaching sequence starts with the most frequent 2,000 words, which he calls general service vocabulary. Everybody needs to know these words; they make up about 87% of an average written text. After this point, general frequency becomes less useful as a guide to what words to teach. Students are better off studying a list of words specific to their field of interest or need, if one can be found. For the student aiming at English-language higher education, Nation's 800 word University Word List is appropriate. After this, the remaining vocabulary of English is of too little frequency to merit direct study. Skills such as analyzing word parts, context guessing, etc. can be taught. The number of different words used will depend on the level of the text.
Writers of material for ESL learners also have to decide which words to use or, in a larger sense, to which population of words they should restrict themselves. Here a list becomes necessary. Many have been developed over the years. The following remain relevant.
The General Service List
The General Service List (GSL) (West 1953) is the specific list of 2,000 words that Nation refers to when he writes about the "first 2,000 words." It's based on written texts, it's old, and it's not in frequency order, though frequency numbers are given. The source of the frequency information is even earlier than the publication date, being derived from Thorndike and Lorge (1944). But the list was not compiled based on frequency alone. It was created to be an ideal vocabulary for ESL students to start out with. Through the 1970s, a lot of material, particularly graded readers, was based on this list. Even today, much of this material is sold and used. The GSL is out of print, and somewhat out of favor. The list is available as a component of the Vocabprofile program described below and, in a slightly different form, on this web page.
Thorndike and Lorge
The Teacher's Word Book of 30,000 Words (Thorndike and Lorge, 1944) was created as a resource for elementary and high school teachers in the United States. It is still frequently cited, though computer-produced corpora have largely replaced it as an authority on the frequency of words. For example, it's the source of the words above the 2,000 word level in the vocabulary test in Nation (1990). It's old: it's based on a compilation of pre-WW2, non-computerized word counts totaling about 18 million written words. As published, it's not in frequency order, but frequency ranks are given for each word.
The University Word List
The University Word List (UWL) (in Nation, 1990) is a list of academic vocabulary composed of about 800 words. It's designed for students who plan to study in an English-language college or university. Essentially, it's the most common 800 words in academic texts, excluding the 2,000 words of the GSL. This list is structurally linked to the GSL. A student who studies the GSL, followed by the UWL, will find no repetition of words. The list is divided into 11 parts. Part 1 has the greatest frequency and range, part 2 next, etc. This list is also a component of the Vocabprofile program.
The Brown Corpus
The Brown Corpus (Francis and Kucera, 1982) is the earliest computerized study of English vocabulary. It is an analysis of 1 million words published in the United States in 1961. It's also kind of old, but it's more consistent in its definition of "word" (as a lemma) than the earlier lists. The 1982 publication, which includes both alphabetical and frequency order lists of the words, is a very useful resource.
The LOB Corpus
The LOB Corpus (Hofland and Johansson, 1982) is a study of 1 million words of British text published in 1961. It was designed to be a British counterpart to the Brown Corpus.
The Cambridge English Lexicon
The Cambridge English Lexicon (CEL) (Hindmarsh, 1980) is a list of 4470 words, prepared with reference to the GSL, Thorndike and Lorge, Brown, other sources, and the author's experience as an ESL teacher and material developer. Each item is graded from 1 to 5. The most useful aspect of the list is that the different meanings of the words are also graded on the same scale. Only the CEL and the GSL give separate information on the different meanings of common words (though, of course, dictionaries do also). The GSL gives actual frequency numbers for the different meanings, but the age of the data and the fact that it was gathered by hand may make the CEL a more reliable source for an indication of the relative importance to students of different meanings of words. The grading in the CEL is not based solely on frequency.
Modern Corpora
These days, much is heard about corpora from dictionary publishers, who all boast about the enormous corpora that their learner dictionaries are based on. The British publishers are particularly enthusiastic about this, using either the CoBuild corpus or the British National Corpus (BNC) as a source of lexicographic information. Both of these corpora contain more than 100 million words. Limited access to them is possible through the internet; see the links on the Collocations Homepage listed below. Depending on your purpose, it may be more useful to access these corpora in pre-digested form through the dictionaries based on them. A lemmatized frequency list of the BNC has been prepared by Adam Kilgarriff and is available for FTP.
Vocabprofile
Vocabprofile is a freeware program for PCs that will compare a given text with any properly formatted list. Three lists can be done at a time. The output will report what percent of the words in the text are on each of the lists. It will also print the text with the words marked to indicate which list they are on, or if they aren't on a list. Vocabprofile is available for FTP at the URL below.
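As a rough illustration of the kind of comparison described above, here is a minimal Python sketch. It is not Vocabprofile's own code or list format: the function names `tokenize` and `profile`, the bracket marking style, and the tiny placeholder word sets are all assumptions made for the example. Real use would load the full lists from files, and the sketch does no handling of inflected forms.

```python
import re


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def profile(text, word_lists):
    """Compare a text against several word lists.

    word_lists maps a list name to a set of lowercase words. A token is
    credited to the first list that contains it; tokens found on no list
    are counted as "off-list".
    Returns (percent of tokens per list, text with each token marked).
    """
    tokens = tokenize(text)
    counts = {name: 0 for name in word_lists}
    counts["off-list"] = 0
    marked = []
    for tok in tokens:
        label = next(
            (name for name, words in word_lists.items() if tok in words),
            "off-list",
        )
        counts[label] += 1
        marked.append(f"{tok}[{label}]")
    total = len(tokens) or 1  # avoid division by zero on empty input
    percents = {name: 100.0 * n / total for name, n in counts.items()}
    return percents, " ".join(marked)


# Placeholder lists for illustration only; these are NOT the real GSL or UWL.
example_lists = {
    "first_1000": {"the", "cat"},
    "second_1000": {"mat"},
    "uwl": {"analyze"},
}
```

Calling `profile("The cat analyze zebras", example_lists)` credits two tokens to the first list, one to the third, and leaves one off-list, so a writer can see at a glance which words fall outside the chosen vocabulary.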
The three lists that come with the program are the first 1,000 words of the GSL, the second 1,000 words of the GSL and the UWL.
Concluding Remarks
None of these resources is ideal. Thorndike and Lorge and the GSL are old, old enough that the English of today surely differs significantly. However, the core vocabulary of English changes more slowly, so at the frequency level of the first 2,000 words this may be less of a problem. The GSL offers some advantages as a standard. It was specifically designed as a teaching vocabulary list. It has a long history of use, both in teaching materials and in second language acquisition research. A program to compare it with a given text is readily available.
Of the lists above, only the CEL was also compiled for the purpose of facilitating the creation of teaching materials. It's more modern than the GSL, but appears to have had less impact. It is not conveniently available for computerized text comparison.
The Brown Corpus, the LOB Corpus and the lemmatized list from the BNC are useful because they give the lists in frequency order. This allows a population of words to be defined much more precisely, and individual words to be compared with each other. But these lists were prepared for linguistic research, not teachers. They're lists of lemmas, which means that words are listed more than once if they can act as more than one part of speech. Some derived forms are also considered as separate lemmas, such as comparative and superlative forms of adjectives. These factors affect both the frequency rankings of words and the number of words that appear on a list. In other words, a list of 1,000 words taken from the GSL or CEL would contain more than 1,000 lemmas. These corpus-based lists need substantial adjustment to make them appropriate as vocabulary standards. These adjustments have already been made to the GSL and CEL.
An author of EFL material has many vocabulary options available.
I hope this discussion of resources is useful and that the bibliography and the internet sites below will be helpful in finding the items that will serve your specific needs.
Links to sites mentioned
Adam Kilgarriff http://www.itri.brighton.ac.uk/~Adam.Kilgarriff/ Links to his lemmatized, frequency order version of the BNC are here.
John Higgins http://www.marlodge.supanet.com/index.html Here you can find Vocabprofile as well as links to other programs.
Bibliography
Francis, W.N. and Kucera, H. (1982). Frequency Analysis of English Usage. Houghton Mifflin, Boston.
Hindmarsh, R. (1980). Cambridge English Lexicon. Cambridge University Press, Cambridge.
Hofland, K. and Johansson, S. (1982). Word Frequencies in British and American English. NAVF, Bergen.
Nation, I.S.P. (1990). Teaching and Learning Vocabulary. Newbury House, New York.
Thorndike, E.L. and Lorge, I. (1944). The Teacher's Word Book of 30,000 Words. Teachers College, Columbia University, New York.
West, M. (1953). A General Service List of English Words. Longman, London.
CV - 24 Oct 2005 21:41 GMT
> Hi,
> [quoted text clipped - 8 lines]
> Thanks,
> Xing
Hmmm, I wonder is there anywhere on the web you can test your vocabulary, like answering a quiz and getting an approximate word count as a result? Might be interesting to try.
CV
Einde O'Callaghan - 24 Oct 2005 23:21 GMT
> Hi,
> [quoted text clipped - 5 lines]
> couldn't find exact what I want (most word lists I've found contain only
> 1,000~5,000 words).
I believe one of the commissions of the EU has developed lists of core words and functions that are essential at various levels of language acquisition for all the major EU languages. They are designed to help prepare textbooks and as guidelines for foreign language examinations in the different EU countries. I'm not certain whether they are available on-line - I'll check during the next few days.
However, at your level of competence I don't think memorising a few thousand random words is the solution to your "problem" - I don't really think it IS a problem. I think it would be far more useful to read a large quantity of general literature (and specialist literature in your various areas of interest) and note the words you don't know. This will enable you to develop your own list of vocabulary that is useful for you to learn.
Regards, Einde O'Callaghan
Jan - 25 Oct 2005 08:16 GMT
Einde,
Are you talking about the Common European Framework?
If you are, then it is a rather vague list of 'can do' statements: a functional view of the language. People in some countries have taken it further for the different languages, but I don't know how far. And in any case, functional language like this is apparently sometimes hard to pin down. I heard of a Collins COBUILD experiment in which native speakers were recorded in conversations that should bring up the language of 'recommending' and 'advice' (talking about visiting a holiday place). Never once did the speakers use 'should'!
Xing Qiu,
Why don't you just read a bit more in English? It's surely more fun than learning an abstract list of words, especially because you get more information about how the word is used. Words don't usually like being alone: they hang out with the other words, contexts and register they belong with.
Some learners I know make marks in their dictionary every time they look up a word when they are reading: if a word has three or more marks, they probably need to spend a bit more time trying to remember it. And there are plenty of books now that use simple English, for example the Penguin Readers, or the Oxford Bookworms.
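The dictionary-marking habit described above can also be kept electronically. The following is only a sketch of that idea in Python: the class name `LookupLog` and its methods are made up for illustration, and the default threshold of three marks simply follows the rule of thumb in the paragraph above.

```python
from collections import Counter


class LookupLog:
    """Tally dictionary look-ups, one 'mark' per look-up."""

    def __init__(self, threshold=3):
        self.threshold = threshold  # marks needed before a word is flagged
        self.marks = Counter()

    def look_up(self, word):
        """Record one look-up of a word (case is ignored)."""
        self.marks[word.lower()] += 1

    def needs_review(self):
        """Return, in alphabetical order, the words with enough marks."""
        return sorted(w for w, n in self.marks.items() if n >= self.threshold)
```

Each call to `look_up` adds a mark, and `needs_review` lists the words that have reached the threshold and so probably deserve extra study.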
Good luck!
Jan
Einde O'Callaghan - 25 Oct 2005 21:37 GMT
> Einde,
>
> Are you talking about the Common European Framework?
I think that's what they're called.
> If you are, then it is a rather vague list of 'can do' statements: a
> functional view of the language. People in some countries have taken it
[quoted text clipped - 4 lines]
> language of 'recommending' and 'advice' (talking about visiting a
> holiday place). Never once did the speakers use 'should'!
I'm sure I saw lists of English WORDS, not just functions, while attending a presentation of a new series of books that were based on this framework.
Regards, Einde O'Callaghan
P.S. I agree with your suggestion about reading and dictionary work.
credoquaabsurdum - 27 Oct 2005 03:58 GMT
Here in Greece, it's been the Common European Framework this and the CEF that for the last two years. I paid forty bucks for the commission report and still haven't gotten past page five.
Many, many publishers have jumped on the CEF bandwagon and promised that their books are based on it. The recognized experts then disagree and point out circuitously that the publishers are simple thieves bent on commercial success and devil-take-the-hindmost (new information, indeed). Quite frankly, for all practical purposes, it seems to be a lot of dignified talk and no action, in the classic European tradition...
The "can-do statements" might as well be called "I-think-I-can, I-think-I-can statements," for all the real-world effectiveness we've seen to date.
> > Einde,
> > [quoted text clipped - 18 lines]
> > P.S. I agree with your suggestion about reading and dictionary work.