Man vs machine: dictionaries and LT
Posted by Michael Rundell on February 17, 2011
Macmillan runs a series of webinars, which are a bit like interactive lectures that anyone can join in. Coming up in 2011 are speakers such as Lindsay Clandfield and Simon Greenall, and from the same page you can watch sessions from the archive featuring well-known language-teaching experts like Scott Thornbury, Ken Wilson and Sam McCarter. A couple of weeks ago I did a webinar on the subject of language technology and its impact on dictionaries, which you can find here (you can also download the PowerPoint presentation that the webinar was based around). A word of warning: the first five minutes or so are a little messy (we had a few teething problems with the technology), but it gets better after that.
While computer technology means we can put our dictionary online, hold webinars, and discuss language issues in this blog, language technology is more about the data and software we use in the background, to help us decide what to say about words. Language technology (LT) is big business. It’s what powers search engines like Google or sites offering automatic translation. Some LT tools are pretty simple: sites like Google Fight or Google’s Ngram Viewer work by counting the number of times a particular word or phrase is used – and counting is something that computers are good at. That’s also how we can identify, very reliably, the ‘core’ vocabulary of English, which is shown as the red words in the Macmillan Dictionary. But for more sophisticated tasks like translating, computers have to be trained to understand human language – or at least to perform as if they understand it.
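To make the point about counting concrete, here is a minimal sketch in Python of how a program might tally word frequencies in a text. It only illustrates the general idea: the function name and the tiny sample text are made up for the example, and it is not a description of how the red words in the Macmillan Dictionary are actually selected.

```python
import re
from collections import Counter

def word_frequencies(text):
    """Count how often each word occurs, ignoring case and punctuation."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)

# A toy "corpus" (invented for this example); a real corpus would
# contain many millions of words.
corpus = "The bank is by the river. The bank lends money to the town."
freq = word_frequencies(corpus)

# The most frequent words are the candidates for a 'core' vocabulary list.
for word, count in freq.most_common(2):
    print(word, count)
```

On a corpus of realistic size, the same kind of count would rank every word in the language by frequency, which is the sort of job computers do quickly and reliably.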
This is hard for computers. Language is full of ambiguities because most words have more than one meaning. Human beings are good at dealing with this, and in most situations misunderstandings are rare. If I say ‘I’m going to the bank’, the person I’m talking to doesn’t need to ask if I mean ‘the financial institution’ or ‘the side of the river’. Context will tell them which sense of bank I’m referring to. But the only thing a machine knows is that bank has two possible meanings (which have different equivalents in other languages), and it has to decide which one fits best. So the main goal of language technology is to enable computers to do what humans do so effortlessly when they communicate with one another. Progress towards this goal is slow but steady. Just this week, an IBM computer has beaten two champion (human) contestants in the popular US quiz show Jeopardy.
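One very simple way a program might try to choose between the two senses of bank is to look for tell-tale words in the surrounding context. The sketch below, in Python, assumes a small hand-picked list of cue words for each sense (both the lists and the function name are hypothetical, invented for this example). Real systems are far more sophisticated, so treat this as an illustration of the problem rather than of any actual product.

```python
import re

# Hypothetical cue words for each sense of "bank", hand-picked for illustration.
SENSES = {
    "financial institution": {"money", "account", "loan", "cash", "deposit"},
    "side of a river": {"river", "water", "fishing", "boat", "stream"},
}

def guess_sense(sentence):
    """Pick the sense whose cue words overlap most with the sentence."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    scores = {sense: len(cues & words) for sense, cues in SENSES.items()}
    best = max(scores, key=scores.get)
    # With no cue words in the context, the program has nothing to go on.
    return best if scores[best] > 0 else None

print(guess_sense("I'm going to the bank to deposit some cash"))
print(guess_sense("We sat on the bank fishing in the river"))
print(guess_sense("I'm going to the bank"))
```

The last call returns None: with nothing but the bare sentence, this program cannot decide, whereas a human listener would draw on the wider situation without even noticing the ambiguity.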
Thanks to research in this field, lexicographers now have powerful software that reveals far more about how words behave and work together than we knew just ten or even five years ago – and that’s the main theme of my webinar. I’m afraid I probably talked for too long, and when the session ended there was no time to answer anyone’s questions. So this is another opportunity: if there’s anything you’d like to ask about this subject, use the Comments here and I’ll get back to you.
Thank you for this. The PowerPoint presentation was very interesting even in the absence of a presenter.
On your point about computers’ difficulties with language, you might enjoy a browse here; it’s an online archive of HAL’s Legacy, a book I read quite a few years ago containing essays on the technical aspects of AI, including speech recognition and understanding. Inevitably it has dated, but it remains interesting.
I thought that the RED WORDS were words that are used most often when humans speak. I don’t recall using IBM and Google Fight the last time I told my husband that I loved him.
There doesn’t seem to have been a lot done since Fillmore’s Case for Case. Comparable. Collocations and something opposite: that’s what I’m interested in. Haven’t run across anything of the sort. Am I right?