About the Collins Corpus and the Bank of English™


The Collins Corpus is a 2.5-billion-word analytical database of English. It contains written material from websites, newspapers, magazines and books published around the world, and spoken material from radio, TV and everyday conversations. New data is fed into the corpus every month to help the Collins dictionary editors identify new words and meanings from the moment they are first used.

The Bank of English™ forms part of the Collins Corpus. It contains 650 million words from a carefully chosen selection of sources, to give a balanced and accurate reflection of English as it is used every day.

All COBUILD dictionaries are based on the information we find in the Bank of English™ and the Collins Corpus. Because the corpus is so large, we can look at lots of examples of how people really use words. The data tells us how words are used, what they mean, which words are used together, and how often words are used. This information on frequency helps us decide which words to include in the COBUILD dictionaries. Did you know, for example, that around 90% of English speech and writing is made up of approximately 3,500 words? The corpus tells us which words these are, so that when you use a COBUILD dictionary you can be sure that you are learning the words you really need to know.
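The idea of frequency coverage can be illustrated with a minimal sketch. It is not the Collins editors' actual tooling: the function name `words_needed_for_coverage`, the simple tokenizer and the sample sentence are all assumptions made for illustration. The sketch counts how often each word occurs in a text and reports how many of the most frequent words are needed to account for a given share of all the words used.

```python
import re
from collections import Counter


def words_needed_for_coverage(text, target=0.9):
    """Return how many of the most frequent distinct words are needed
    to cover at least `target` (a share between 0 and 1) of all tokens."""
    # Very rough tokenizer (an assumption): lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens)

    covered = 0
    # Walk through the words from most to least frequent, accumulating their counts.
    for rank, (_word, n) in enumerate(counts.most_common(), start=1):
        covered += n
        if covered / total >= target:
            return rank
    # Only reached for an empty text, where there is nothing to cover.
    return len(counts)


# Hypothetical sample text, far smaller than any real corpus.
sample = "the cat sat on the mat and the dog sat on the rug"
print(words_needed_for_coverage(sample, target=0.5))
```

On a real corpus the same calculation would be run over billions of tokens, and the result depends heavily on how words are tokenized and whether inflected forms are grouped together.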

When a dictionary editor wants to add a new word to COBUILD, they search the corpus for every example of the word. The examples appear on the computer screen as a long list of sentences, and the editor can arrange the lines in different ways depending on what they want to look at. For example, if they want to see how the word ‘agree’ is used, they can look at examples in the corpus. The examples below show them that you can ‘agree on something’, ‘agree with somebody’ and ‘agree to something’.

[Image: corpus examples of the word ‘agree’]
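The way an editor lists and rearranges corpus lines can be sketched as a small keyword-in-context listing. This is an illustrative sketch only, not the software used by the COBUILD editors: the function names `concordance` and `sort_by_right_context`, the `width` parameter and the sample text are assumptions. Sorting by the words to the right of the keyword groups together patterns such as ‘agree on’, ‘agree to’ and ‘agree with’.

```python
import re


def concordance(text, keyword, width=30):
    """Find every occurrence of `keyword` as a whole word and return
    (left context, matched word, right context) tuples."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        lines.append((left, m.group(0), right))
    return lines


def sort_by_right_context(lines):
    """Arrange concordance lines alphabetically by what follows the keyword."""
    return sorted(lines, key=lambda line: line[2].strip().lower())


# Hypothetical sample text standing in for corpus data.
sample_text = "We agree with you. They agree on a price. I agree to the terms."

for left, word, right in sort_by_right_context(concordance(sample_text, "agree")):
    # Right-align the left context so the keyword lines up in a column.
    print(f"{left:>30} {word} {right}")
```

Sorting by the left context instead would bring together the words that typically come before the keyword, which is one of the other arrangements an editor might choose.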

All of the examples in COBUILD dictionaries are examples of real English, taken from the Bank of English™. myCOBUILD.com has over 75,000 examples which show the user how words are really used. The examples have been carefully chosen from the Bank of English™ to demonstrate typical grammatical patterns, typical vocabulary and typical contexts for your word.

The corpus lies at the heart of COBUILD, and you can be confident that COBUILD will show you what you need to know to be able to communicate easily and accurately in English.

Wordbanks online

Wordbanks online contains 57 million words of written and spoken English, from both American and British sources, taken from the Bank of English™. It is available online for teachers and students.