Dictionary signals vs. noise

Posted by on October 02, 2012

There has been much discussion lately about crowd-sourced dictionary-making. For example, slang lexicographer Jonathon Green, in an article in the Guardian, observes ironically that everyone “seems to have a novel in them. Maybe there’s a dictionary too.”

Some of the commentary on this topic gives the impression that dictionaries’ future could lie in the hands of the general public, instead of being entrusted to trained lexicographers. But does anyone consider this a serious possibility – Urban Dictionary–style websites aside? It seems to me more a matter of dictionaries finding different ways to integrate public input, and this is something they’ve always done to varying degrees.

In a recent post on whether the wisdom of crowds can work for dictionaries, Michael assesses Macmillan’s experience with its crowd-sourced Open Dictionary. He finds that the most fruitful areas for user-generated content are “neologisms, regional varieties, and technical terms” – words beyond what James Murray called the “well-defined centre” of English vocabulary – and he acknowledges that “enlisting enthusiastic amateurs and subject-field specialists could help us to develop even better resources in the long run”.

Neologisms also appear regularly in Macmillan Dictionary’s weekly BuzzWord column, where Kerry writes about usages that are new or newly popular. Some of the featured words (such as medal and troll) already exist in the dictionary but under more familiar senses; most (twitchfork, humblebrag) have no entries there at all. The BuzzWord articles allow topical expressions – few of them destined for imminent dictionary inclusion – to be explained in detail to curious readers.

Urban Dictionary is an extreme case in that its entries are entirely user-generated; it is therefore best consulted with a certain scepticism. This is not to say UD is unhelpful: it’s sometimes the best or even the only place to find a plausible explanation for contemporary slang, especially the more faddish or explicit sort. But unless several definitions converge on a sense, a pinch of salt or a confirming source tends to be necessary.

The problem, as Orin said in a comment to Michael’s post, is “the lack of any curatorial aspect”. User-generated dictionaries allow for meanings to be invented willy-nilly, or intuited based on scant evidence, then published on a whim. Entertainment value alone may then boost their standing. Signals must compete with noise. No one seriously enquiring about word usage will rely exclusively on such sources, though they may appreciate their supplementary and niche value.

This is where mainstream dictionaries excel (or should, ideally). They can vet material and analyse it against evidence of usage, for example in huge corpora of language as it has been written and spoken. People turn to trusted dictionaries, online and off, for guidance, authority and reliability. They know their entries have been composed systematically, with care and deliberation, by people appropriately trained in the art.

But the notion that it must be one or the other is a false dilemma. Authoritative dictionaries will continue to invite content from readers, but this doesn’t mean they’re handing over the keys. The curation should, and will, continue.

Comments (7)
  • Well said, Stan. I think English benefits tremendously by not having a governing body, e.g. like L’Académie française is for French. But because of that it’s vital that English have neutral gatekeepers, and that’s what lexicographers do. Without professional lexicography there would be a danger of English veering into an Orwellian direction: the public is not neutral, not expert, and can be manipulated.

    Posted by Orin Hargraves on 2nd October, 2012
  • Thanks, Orin. I agree: English is well served by the absence of an academy, the existence of which would inevitably go against the language’s natural tendencies. The work lexicographers do is very valuable, since crowds are seldom wise or impartial — or sufficiently qualified to perform this specialised task.

    Posted by Stan on 2nd October, 2012
  • My take on crowd-sourcing might be a little different. The point about Murray’s ‘well-defined centre’ (mentioned by Stan, and close enough to what Macmillan calls its ‘core’ or ‘red’ vocabulary) is precisely that it is well-defined…in the sense of having been quite thoroughly analysed and described by skilled lexicographers. With this area well covered, we don’t generally need input from ‘the crowd’. But crowd-sourcing could have a lot to offer for all the other kinds of vocabulary. The Urban Dictionary has arguably given crowd-sourcing a bad name, with its highly subjective and/or scatological content – though even here there are bright spots, as Stan points out. But I think it has great potential in areas like creating multilingual technical dictionaries, and this is already happening. The key is how you manage these projects, in providing well-thought-out guidelines and robust templates for contributors to use. The recent experiment by Collins looked poorly managed to me, and they ended up with far too much noise and not enough signal (I think they collected about 50 ‘usable’ words from many thousands of mainly useless submissions). But it doesn’t have to be that way. With smart management, we should all be able to benefit from people’s willingness to contribute.

    Posted by Michael Rundell on 3rd October, 2012
  • Thanks for your considered comment on this, Michael. I’m sure you’re right that how these projects are managed is the most important thing, and that certain areas will benefit particularly.
    A word or usage might be amazingly popular one month or year, and all but forgotten a few short years later. Good lexicographers observing these trends and vogues know when to resist them and when and how to incorporate them, sorting the wheat from the chaff for the benefit of the rest of us.

    Posted by Stan on 3rd October, 2012
  • [...] Dictionary signals vs. noise looks at the business of crowd-sourcing in dictionary-making. (Crowd-sourcing means outsourcing a task to the general public or another unspecified group.) Some recent discussion about this might give the impression that the field of lexicography is destined for an Urban Dictionary–style makeover. This won’t happen. It seems to me more a matter of dictionaries finding different ways to integrate public input, and this is something they’ve always done to varying degrees. [...]

    Posted by Crowd-sourced dictionaries and rare portmanteaus « Sentence first on 25th October, 2012
  • Well, wuddaya know. I never saw willy-nilly in the sense of ‘in a careless way without planning’ before. I was trying to understand how meanings could be invented whether their inventors wanted to invent them or not!

    Posted by John Cowan on 25th October, 2012
  • John: It’s a good thing I linked to the definition, then! The ‘haphazard’ sense of willy-nilly is the more familiar one to me. I wonder if that’s unusual.

    Posted by Stan on 26th October, 2012