The first one does sound correct, but the second one actually is, because 'data' is a plural.
I'm rather fussy about grammar, but I must disagree on this point. Yes, it is easy to find prescriptivist grammarians who will insist that since "data" is a plural noun in Latin it can only ever be used as such in English (world without end, Amen). Therefore "data is" must be considered wrong or at best "colloquial" in any context.
Fortunately, many grammarians are willing to recognize that language changes, and that Latin rules need not always apply to English words derived from Latin, especially when the English word begins to take on a new sense.
Logically, "the data are" is correct if we are thinking of discrete bits or facts. There are certainly scientific uses in which the word must be treated as a plural. Yet there are many contexts in which the form "data" is used to refer to a whole collection, with no thought of each individual "datum". Thus the word functions as a "mass noun" like "furniture", which is treated as a collective singular, as opposed to a "count noun" like "chair(s)."
A closely related example is the word "media". In contexts where a distinction between one medium and a second (etc.) is in view, it should certainly be treated as a grammatical plural (e.g., when discussing an artist who "uses a variety of media, but is especially fond of watercolors"). But if one speaks of "the mass media", viewed as a collective whole, not distinct types, it is perfectly acceptable (or preferable!) to say "the media is".
An extreme example of a Latin plural form shifting usage is the word "opera". It was originally the plural form of "opus", but when referring to a particular type of musical theater the word forms a new singular.
Usage Note: The word data is the plural of Latin datum, “something given,” but it is not always treated as a plural noun in English. The plural usage is still common, as this headline from the New York Times attests: “Data Are Elusive on the Homeless.” Sometimes scientists think of data as plural, as in “These data do not support the conclusions.” But more often scientists and researchers think of data as a singular mass entity like information, and most people now follow this in general usage. Sixty percent of the Usage Panel accepts the use of data with a singular verb and pronoun in the sentence “Once the data is in, we can begin to analyze it.” A still larger number, 77 percent, accepts the sentence “We have very little data on the efficacy of such programs,” where the quantifier “very little,” which is not used with similar plural nouns such as facts and results, implies that data here is indeed singular.