
What can language research tell us about the ‘real world’? Part 3

We have seen in previous posts (in part 1 and part 2) that the real world tends to be defined and conceived in opposition to other concepts – such as academic life, childhood, or the online world. Here are a couple of more general examples from the ukWaC corpus, in which the real world is contrasted with that which is perfect or ideal:

We also know that in the real world none of us has enough time to run our homes as perfectly as we would like.

With a good quality radio signal carrying a well-adjusted audio signal to a decent quality radio receiver, it sounds fine to most listeners. But the real world is not an ideal place.

However, this is not quite the whole story. I was struck by this headline in The Guardian from a few months ago (March 10th):

Graduates of Hogwarts look for grown-up glory in the real world.
After finding fame and fortune as teenagers, the Harry Potter stars may find it hard to forge successful careers as adult actors.

Here we find once more the contrast with academic life and with childhood, but there is also a third opposition in play here, between the real world and art or fiction. This same opposition can be found in other corpus examples:

With its lively engagement with the real world as well as the world of private creativity, this anthology will contribute to an ongoing sense of Scottish cultural identity.

Express yourself through art and escape from the real world for two hours on a Thursday afternoon.

However, there are a few curious fiction-related examples that do not seem to conform to this opposition. Consider the following:

Mark Twain creates a fascinating experience of a man and a woman discovering each other, learning to live together in the real world, growing up toward being a whole being.

Hard Rain seamlessly blends the real world of the USS Enterprise with the fictional version of San Francisco 1941, as seen by Captain Jean-Luc Picard’s own holodeck creation, detective Dixon Hill.

Mark Twain was a novelist, a writer of fiction, and the first example evokes the real world in connection with characters inside a novel. The second example is inspired by the science fiction TV series Star Trek; the USS Enterprise is a starship, and the holodeck is a kind of 3D entertainment zone for the crew. Here we are dealing with a fiction within a fiction; although San Francisco 1941 may seem more ‘real’ to us than a 24th-century starship, this is not necessarily the case for the crew; the polarity is reversed (as Captain Picard might say).

A similar science-fiction scenario occurs in the film The Matrix, in which our familiar, everyday world turns out to be a computer-generated simulation and the ‘real world’ is a nightmarish, devastated earth controlled by evil machines (or something like that). The implications of this can get very confusing for fans of the film:

The “Real World” is another matrix: This one strikes me as being almost too obvious. It explains so much stuff. It explains why it is relatively simple for people to move from the real world into the matrix. It explains how Agent Smith is able to get out of the matrix. Most of all, it explains how Neo can have super powers in the real world.

(It’s not real, you know!) All of which tends to show that the real world is a very relative and very slippery concept. It all depends on your standpoint.

My final point brings me back to the title of the conference that originally inspired this series of posts: Language in the Real World. Indeed, the corpus data does suggest an opposition between language and the real world:

He can now see the gap between his ideas and language and the real world.

The real world imposes its structure on language, and language does not, cannot, impose some different, arbitrary structure on the real world.

If that is so, and given that the conference was concerned with the very serious business of preparing graduates for employment, one might object that this focus on language serves no great purpose, that it is just an academic linguist, divorced from the real world, fooling around in his ivory tower. However, I’m not just any old linguist, I’m a corpus linguist, and this is the first line of the Wikipedia entry for corpus linguistics:

Corpus linguistics is the study of language as expressed in samples (corpora) or “real world” text.

So there. Slippery indeed.


About the author


John Williams


  • As promised many moons ago on June 25th, here are the top 20 two-word expressions to be found in quotation marks in the giant ukWaC corpus (1,318,612,719 words). The numbers represent the raw frequency in the corpus:

    1. 413 as is
    2. 319 The Prisoner
    3. 315 thank you
    4. 294 real world
    5. 285 hands on
    6. 239 Third Way
    7. 231 best practice
    8. 226 at risk
    9. 202 out there
    10. 194 Plymouth Brethren
    11. 193 Contact Us
    12. 183 AS IS
    13. 180 must have
    14. 180 how to
    15. 164 I AM
    16. 156 regime change
    17. 129 what if
    18. 128 real life
    19. 127 Thank You
    20. 123 Thank you

    There are quite a few interesting expressions in this list, but also a few oddities. What on earth are the Plymouth Brethren doing there for instance? This may be an effect of the composition of the corpus. I hope to return to this list in future blog posts.

    For the technically-minded, the original corpus search – for two-word expressions enclosed in quotation marks – consisted of a CQL query in SketchEngine:

    [lemma = "\""] [word = "[a-zA-Z][a-zA-Z]*"] [word = "[a-zA-Z][a-zA-Z]*"] [lemma = "\""]

    The sequence [word = "[a-zA-Z][a-zA-Z]*"] represents any single alphabetic word. I needed it twice because I was searching for two-word expressions. (In theory, two sets of square brackets alone [] [] should have done the trick, but when I tried it the query also found single words in quotation marks. Perhaps someone at SketchEngine can tell me why.)

    I then downloaded and saved the concordances to a file (let’s call it File.txt) and (after much head scratching) applied the following Unix command to it:

    cat File.txt | grep ‘< \"' | sed 's/^.*.*$//g’ | sort | uniq -c | sort -nr > FileSorted.txt

    If anyone would like to know what each part of that string of algebra is doing, please contact me privately. If you’ve got Linux or cygwin on your computer, you can try it for yourself.

    And so to the prize. We promised a Macmillan Collocations Dictionary to the first person to correctly predict any five of the above top 20 expressions. The undoubted winner is Alexander Bochkov with 7 correct predictions (but Alexander, were you actually predicting, or counting?). Alexander – could you email your address details to medoblog.admin -at- , and a dictionary will find its way to you sometime after August 13th.

    Honourable mentions go to Monika Sobejko and to Caroline, not only for their correct *predictions* and near misses, but also for their excellent comments, nailing down why we actually write some expressions in quotation marks.

    Thank you to everyone who took part.

  • The Unix command did not appear correctly in the above comment. This is the correct version:

    cat File.txt | grep ‘< \"' | sed 's/^.*.*$//g’ | sort | uniq -c | sort -nr > FileSorted.txt


  • No that’s not the right command either – it’s just the same as before. Something is happening between Copy, Paste, and Submit. Ah computers. Please email me privately if you would like the correct Unix command. j

  • Great news! Thank you very much. I sent an email to the above-mentioned address but I haven’t received a reply yet. Also, how can I contact you to get the Unix command?
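
For readers who would like to try the counting step themselves: the Unix command quoted in the comments above did not survive posting, so what follows is only a minimal sketch of the same idea, not the author's original command. It assumes (hypothetically) that each line of File.txt contains a hit whose two-word expression sits in straight double quotes directly next to the words, and that your grep supports the -o option (GNU and BSD grep both do). The function name count_quoted_pairs is made up for illustration.

```shell
#!/bin/sh
# Sketch only: count two-word expressions in straight double quotes.
# Assumed input: one concordance line per hit, e.g.  ... "real world" ...
count_quoted_pairs() {
    # -o prints only the matching part of each line;
    # the pattern is: quote, word, single space, word, quote.
    grep -oE '"[A-Za-z]+ [A-Za-z]+"' "$1" |
        tr -d '"' |          # drop the quotation marks
        sort |               # bring identical expressions together
        uniq -c |            # prefix each distinct expression with its count
        sort -k1,1nr -k2     # highest count first, ties in alphabetical order
}
```

Running count_quoted_pairs File.txt > FileSorted.txt would then write a frequency list with the most common expressions at the top. If the concordance export puts spaces between the quotation marks and the words, the grep pattern would need adjusting accordingly.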
