A preposition problem
Posted by Orin Hargraves on November 19, 2012
To begin, and so that you won’t feel you have yet another language problem on your plate, the preposition problem here is not a problem for you; it’s a problem for computers. You remember computers—those machines we rely on increasingly to do a huge amount of work for us.
A big job for computers today is data mining, and within data mining, the tasks of natural language processing (NLP) figure prominently: getting computers to “understand” language the way that people do and do things with it like summarize, translate, answer questions, and identify pertinent information. Within NLP, a central task is word sense disambiguation (WSD): deciding what a particular word in a particular context means. This is where prepositions often throw a monkey wrench (or spanner, if you prefer) into the works.
Computers start with the disadvantage of not having the innate capacity and learned ability that humans have to understand language. The first step that programmers often take to address this deficiency in computers is to break down natural language into chunks that a computer can deal with algorithmically. Programs parse text and assign part-of-speech labels to all the elements in sentences: noun, verb, adjective, adverb, and so forth. This is where the preposition problem may rear its head.
Most programmatic language parsers take a traditional view of grammar and work from rules about language that are centuries old. For example, a small word like at, to, with, in, or up, occurring after a verb and before a noun or noun phrase, is typically labeled as a preposition, and the noun phrase following it is labeled as the object of the preposition. Consider a simple case first:
We agreed to all their demands.
I don’t agree with you.
We agree on most things involving the children.
In these sentences, agree means roughly the same thing and in each case it is followed by a prepositional phrase that completes or expands the idea of the verb. Computers are happy with sentences like these and can deal with them almost as effectively as humans do.
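The traditional rule described above can be sketched in a few lines of Python. This is only a toy illustration: the word lists and the function name `label_prepositions` are invented for the example, and real parsers rely on far richer lexicons and statistical models. Everything after the small word is crudely treated as its object.

```python
# Toy illustration of the "small word after a verb" rule described above.
# SMALL_WORDS and VERBS are hypothetical mini-lexicons, not taken from any real parser.
SMALL_WORDS = {"at", "to", "with", "in", "up", "on", "by"}
VERBS = {"agree", "agreed", "abide", "abides"}


def label_prepositions(sentence):
    """Return (preposition, object phrase) pairs found by the naive rule.

    A small word that directly follows a known verb is labeled a preposition,
    and the rest of the sentence is (crudely) taken as its object.
    """
    tokens = sentence.rstrip(".!?").split()
    found = []
    # Stop before the last token: a preposition needs something after it.
    for i, token in enumerate(tokens[:-1]):
        follows_verb = i > 0 and tokens[i - 1].lower() in VERBS
        if token.lower() in SMALL_WORDS and follows_verb:
            found.append((token.lower(), " ".join(tokens[i + 1:])))
    return found


for example in (
    "We agreed to all their demands.",
    "I don’t agree with you.",
    "We agree on most things involving the children.",
):
    print(example, "->", label_prepositions(example))
```

A rule like this handles the agree sentences well enough, because the verb keeps roughly the same meaning whichever preposition follows it.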
Now consider these sentences:
You must abide by the rules.
Smith persuaded his employers and lenders to abide with him.
Our created being abides in the eternal essence and is one with it in its essential existence.
In these sentences, abide has three different meanings. What clues you in to this is the preposition that follows the verb. But does the preposition have an identifiable meaning in these sentences? You probably don’t pause to think: you just know that abide means something particular when by comes after it, not the same thing as when in or with comes after it. Perhaps you understand abide by as a transitive idea, with what follows it as its object. Perhaps you regard it as a phrasal verb, as some dictionaries do. If you do either of these things, you are light years ahead of most computers, which don’t normally consult dictionary sense inventories at this stage of analysis. It isn’t the case that the meaning of abide by the rules is not compositional (that is, understandable from its parts), but it makes more sense to treat the parts here as abide by and the rules, not abide and by the rules. Most computer programs are not equipped to do this.
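To make the alternative grouping concrete, here is a minimal Python sketch that consults a tiny sense inventory keyed on the verb together with its preposition, so that abide by is treated as one unit and the rules as its object. The inventory, the glosses, and the function name `split_verb_unit` are assumptions made up for illustration; the glosses are rough paraphrases, not dictionary definitions.

```python
# Hypothetical mini sense inventory keyed on (verb lemma, following small word).
# The glosses are rough paraphrases supplied for illustration only.
SENSES = {
    ("abide", "by"): "comply with",
    ("abide", "with"): "remain with",
    ("abide", "in"): "dwell in",
}

# Map inflected forms to a lemma; only the forms needed for the examples.
LEMMAS = {"abide": "abide", "abides": "abide"}


def split_verb_unit(sentence):
    """Group a verb with its following small word when the pair is in SENSES.

    Returns a dict with the verb unit, its gloss, and the remaining text
    (treated as the object), or None if no known pair is found.
    """
    tokens = sentence.rstrip(".!?").split()
    for i, token in enumerate(tokens[:-1]):
        lemma = LEMMAS.get(token.lower())
        if lemma is None:
            continue
        key = (lemma, tokens[i + 1].lower())
        if key in SENSES:
            return {
                "unit": f"{key[0]} {key[1]}",
                "sense": SENSES[key],
                "object": " ".join(tokens[i + 2:]),
            }
    return None


print(split_verb_unit("You must abide by the rules."))
```

The design choice here is simply to look up the pair before deciding where the phrase boundary falls, which is the step the paragraph above says most programs skip.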
Now look at these sentence pairs:
1a) She plays the piano beautifully.
1b) He plays at the guitar but has never mastered it.
2a) You guessed the answer!
2b) Read the clues and have the children guess at the solution.
3a) He struck me with a 2 x 4.
3b) He struck at me several times but missed.
Most computers will parse the b) sentences as having intransitive verbs followed by prepositional phrases beginning with at. Are they? And if so, what exactly does at mean in these sentences? Traditional grammar holds with the computational view of the b) sentences but it’s also possible to interpret the b) sentences as simple variations on the a) sentences, still transitive, in which the particle at—perhaps not genuinely functioning as a preposition—has a consistent semantic effect on the meaning of the verb.
Most people learn to understand these sentences and distinguish their meanings before they are introduced to any notion of the grammar involved. So you wonder, what would an NLP program look like that tried to start with what sentences mean, rather than with how 20th-century grammarians diagram them?
Yes, prepositions are a challenge. Four-year-old native speakers almost always get them right, while very advanced learners can make mistakes. But, as you say, they’re an even bigger problem for computers, and getting computers to learn how to do ‘prepositional attachment’ (attaching the preposition to the right noun, for example) is very difficult. When Cherie Blair (wife of the former prime minister) was asked what her most embarrassing moment was when she lived in Downing Street, she said ‘opening the door in my nightie the morning after we moved in’. One commentator joked ‘Funny place for a door!’, but to a computer this would be the most obvious interpretation of ‘the door in my nightie’.
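The ambiguity in that example can be written out explicitly. The Python sketch below only enumerates the two possible bracketings of a verb, a noun phrase, and a prepositional phrase; it does not choose between them, which is the hard part. The function name `attachments` and the bracket notation are invented for this illustration.

```python
def attachments(verb, noun_phrase, prep_phrase):
    """List both structural readings of 'verb + noun phrase + prepositional phrase'.

    Noun attachment: the prepositional phrase modifies the noun phrase.
    Verb attachment: the prepositional phrase modifies the verb.
    """
    return {
        "noun attachment": f"[{verb} [{noun_phrase} [{prep_phrase}]]]",
        "verb attachment": f"[{verb} [{noun_phrase}] [{prep_phrase}]]",
    }


readings = attachments("opening", "the door", "in my nightie")
for name, bracketing in readings.items():
    print(f"{name}: {bracketing}")
```

The noun attachment is the ‘funny place for a door’ reading; the verb attachment is the one the speaker intended.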