Real Vocabulary Quiz, Question 8: should I say “the data is…” or “the data are…”?

Posted on April 20, 2016

Our Real Grammar series showed how the evidence of language in use often undermines or contradicts the made-up or outdated “rules” which some people insist on.

In this series on Real Vocabulary, with Scott Thornbury, we’re bringing you blog posts, videos and a quiz that give evidence-based answers to frequently asked questions about vocabulary.


In the eighth question in our Real Vocabulary quiz, we asked which of these two sentences was correct:

The data was collected over a period of six months.
The data were collected over a period of six months.

In other words, should we treat the noun data as plural or singular?

You may be wondering why we are even asking this question. The reason is that data came into English from Latin, where it was the plural form of the noun datum — so in Latin data would always be followed by a plural verb. Some traditionalists believe that, for this reason, it should take a plural verb in English too, but there is no longer much support for this view. The Wall Street Journal used to insist that its writers treated data as plural, but in 2012, it accepted (somewhat grudgingly) that the time had come for a more lenient policy:

Most style guides and dictionaries have come to accept the use of the noun data with either singular or plural verbs, and we hereby join the majority.

As we have pointed out before, words that entered the English language from Latin or Greek don’t necessarily behave the same way, or mean the same thing, as they did in the language they came from. (In Latin, data meant “things given” or “gifts”, not “information”.) If we look at other cases of Latin words adopted into English, we find a range of different usages. The word agenda meant (in Latin) “things that need to be done” and it was plural (its singular form was agendum). But in English agenda is singular, and it is a countable noun with a regular English plural (agendas); the form agendum is practically never used. On the other hand, the word media retains its Latin singular form, medium, in English, and this singular is quite common:

The Web represents a new medium in which electronic information will diverge from its print counterpart.

But when “the media” is used for referring, collectively, to information channels such as newspapers, television and the internet, it can be followed by either a singular or plural verb — and the corpus data doesn’t show a strong preference for either. In a corpus of 1.6 billion words, the combination “the media is” occurs around 3500 times, compared with just under 3000 instances of “the media are”.

Going back to data, the evidence suggests that, in scientific and technical texts, writers still favour the Latin model. Thus the Style Guide of the UK’s Office for National Statistics says bluntly:

Use ‘data’ as a plural: The data are for 2012 to 2013.

And the singular form datum – though far less frequent than data – can be found in this kind of discourse:

People tend to think that a scientist’s job is to gather every single datum about something in nature–a mountain, a species of jellyfish, a neutron star. (Carl Zimmer, National Geographic, March 2004)

But in general use, data is three times more likely to be followed by a singular verb than by a plural. When singular, it behaves like nouns such as rice. Although rice consists of many individual items, we don’t talk about “a rice” (we say “a grain of rice”), and we think of rice as a single mass: data is also made up of many separate bits but seen as a single whole.

Our conclusion is that both sentences in the quiz are acceptable. In non-specialized texts, “the data was” would be more usual, but in more technical contexts, we are just as likely to say “the data were”.

Finally, a note on pronunciation. In British and American English, we usually say /ˈdeɪtə/ — so that it rhymes with “later”. But English speakers from Australia, New Zealand and South Africa say /ˈdɑːtə/, rhyming with “starter”. 


