
What’s that supposed to mean: chunking – part two

It’s generally accepted that “chunking” – the tendency of words to form combinations which are both recurrent and non-random – is an important feature of language. But in some of the recent discussion of this topic, doubts have been raised as to how far these combinations are worth teaching. Some argue that learning large numbers of “chunks” imposes too great a burden on students, when they could get by well enough with less idiomatic expressions. One well-known sceptic, Michael Swan, cautions against “formulaic expressions” being given “more attention than they deserve”, arguing that this may distract from the central tasks of mastering grammar and vocabulary.

But surely this is a false distinction – as if chunks on the one hand, and grammar and vocabulary on the other, could be viewed as separate parts of the language system. Corpus linguistics shows that this is not a convincing model of how language works. As linguists like Palmer, Firth and Sinclair have demonstrated, phraseology is an integral part of language, and you can’t really “know” a word unless you know how it typically combines with other words.

Brett Reynolds makes a distinction between “fun facts” about language and “useful facts [and] skills”, and implies that most chunks are fun rather than useful. One of the important types of chunk is collocation, and Reynolds claims that “few collocations are useful enough to bother teaching”, because “collocations tend to be rare”. This will come as a surprise to lexicographers and corpus linguists. Language data tells us that collocation is an essential part of the way we communicate.

Take the example of the word crime. It occurs around 150,000 times in our corpus of almost 2 billion words, and in 33,000 of these examples, it is the object of a verb. The evidence shows that relatively few verbs regularly combine with crime, and these include:

commit: almost 7000 instances
: 3500
: 1800
: 1500
: 1000
: 950
: 725
: 650

The numbers are approximate, but they show that these eight verbs account for over half of all verb+crime combinations. Similar patterns appear in other grammatical relations, such as crime+noun (where words like prevention, rate and scene are very common), and adjective+crime (where common collocates include serious, petty, violent and organized). Some of these combinations are predictable (a learner might get them right by guesswork) but most are not.
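
The “over half” claim can be checked with simple arithmetic. The following is a minimal sketch in Python that uses only the approximate figures quoted above. The variable names are illustrative, and since only commit is named in the list, the other seven verbs are identified by rank alone.

```python
# Approximate verb+crime counts as quoted in the post, in descending order.
# Only the first ("commit") is named in the text; the rest are known by rank.
counts = [7000, 3500, 1800, 1500, 1000, 950, 725, 650]

# Number of corpus examples in which "crime" is the object of a verb.
verb_object_total = 33_000

top_eight = sum(counts)
share = top_eight / verb_object_total
commit_share = counts[0] / verb_object_total

print(f"top eight verbs: {top_eight} instances")
print(f"share of verb+crime combinations: {share:.1%}")
print(f"'commit' alone: {commit_share:.1%}")
```

On these rounded figures the eight verbs cover a little over half of the 33,000 examples, with commit alone accounting for roughly a fifth.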

Brett Reynolds suggests that learners can get by with saying “heavy wind” rather than the more natural-sounding “strong wind”. If we apply this argument to crime, a speaker might say that someone “did a bad crime” (rather than the more idiomatic “committed a serious crime”), and s/he would still be understood. Well, yes – but is this what learners really want? The fact is that it is almost impossible to talk about crime without knowing its most frequent collocates, so the idea that collocations are too rare to be worth teaching is hard to accept.

Some will argue that for simple communicative purposes (especially in an ELF context), collocation is a luxury we can do without. What I will suggest in the third and last post on this topic is that it is often difficult to understand what someone means without a good grasp of collocation.


About the author

Michael Rundell


  • You have repeated the “heavy wind” example a second time, but it is a perfectly acceptable “chunk”! At first I thought it might be US English, but the first listing from a Google search of “heavy wind” produced a headline from The Telegraph (12 April 2008): “Northern England hit with heavy wind and rain.”

  • While 7,000 instances of ‘commit’ + ‘crime’ certainly sounds like a large number, it has to be put in context. It comes out to about 3.5 instances per million words (PMW). In the Corpus of Contemporary American English, the collocation occurs about 3.0 times PMW. This is indeed rare: not so rare that native speakers of English will find it in any way unfamiliar, but I think most folks would agree that three in a million is rare.

    Michael confounds the idea that “collocation [in general] is an essential part of the way we communicate” with the idea that a particular collocation is therefore also important. Nevertheless, I can accept that his sense of rare is not what mine is. But if Michael argues that collocations occurring at the rate of three in a million must be taught, then I take it he would certainly contend that most collocations more common than that (those that aren’t predictable at least) must also be taught. And by the same criteria, individual words occurring more commonly than that should also be taught.

    Well, it depends on how you count words, but this probably means something like 10,000 word families or so should be taught. And if we estimate that each has three collocations in this frequency range, that requires the teaching of 40,000 items (10,000 word families plus 30,000 collocations). Surely, the impracticalities of this endeavour begin to become clear.
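
The arithmetic in this comment can be reproduced in a few lines. This is a rough sketch in Python, assuming the corpus is rounded up to exactly 2 billion words (the post says “almost 2 billion”) and taking the commenter’s estimates of 10,000 word families and three collocations each at face value. The variable names are illustrative.

```python
# Frequency of 'commit' + 'crime' per million words (PMW).
corpus_size = 2_000_000_000   # assumption: "almost 2 billion" rounded up
commit_crime = 7_000          # approximate count quoted in the post

per_million = commit_crime * 1_000_000 / corpus_size

# Teaching load implied by the commenter's estimates.
word_families = 10_000
collocations_per_family = 3

collocations = word_families * collocations_per_family
total_items = word_families + collocations

print(f"'commit' + 'crime': {per_million} per million words")
print(f"collocations to teach: {collocations}")
print(f"total items (words plus collocations): {total_items}")
```

The 40,000 figure is therefore the sum of the word families themselves and their estimated collocations, not the number of collocations alone.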
