The effect of “big data” on the humanities is a hot topic in intellectual circles these days, and every so often the shifting sound of popular music is at the center of a big data story. The research results may seem counter-intuitive (is pop music getting sadder and slower?) or confirm what you thought all along (is pop music getting louder and more monotonous?). Either way, as reporters rush to assure us, they are newsworthy because, for the first time, the conclusions are backed with hard data, not squishy aesthetic theorizing. The numbers don’t lie.
But research can only be as good as the encoded data it’s based on; look under the surface of recently reported computer-enabled analyses of pop music and you’ll find that the old programmer’s dictum—“garbage in, garbage out”—is still the last word.
1. The Million Song Dataset
Take, for instance, the research that gave rise to the “pop music is getting louder and more monotonous” meme. A group of Spanish researchers in artificial intelligence led by Dr. Joan Serrà ran statistical analyses of the “Million Song Dataset,” a machine-readable collection of metadata correlated to about that many songs, chosen to provide a representative random sample of the music consumers might be looking for online. (Full information about the MSD can be found here.)
Yes, I said consumers—the Million Song Dataset is a collaboration between sound researchers at Columbia University and a commercial start-up called The Echo Nest, working together on better methods of Music Information Retrieval (MIR). MIR is the algorithmic back-end behind the preference engines that drive music streaming services like Pandora, swinging into action when you ask the computer to show you “more songs like that last one.” The basic data in the MSD is catalog, artist, and genre. But there is also computer-generated data about musical structure. Some of the information is basic (length of track), some would require a little processing (number of beats), and some could only be the result of complex Echo Nest algorithms whose working assumptions are proprietary (key and mode, time signature, timbre).
The point to remember is that the MSD does not have any actual audio in it; the data on musical characteristics like key, mode, meter, and timbre are represented by arrays of pre-crunched numbers. Thus any conclusions drawn from the MSD are already constrained by the assumptions and mindset of the industry-research teams that created the database, and by the technical limitations of the DSP that turns sound files into numerical arrays. Of course, digital sound files are themselves nothing more than huge arrays of numbers, but, as the MSD FAQ freely admits, “there is a lot of information lost when going from audio to these [numerical representations of musical] features.”
2. All That Stuff Sounds the Same to Me
As long as the data in the MSD is being used to sharpen the wits of online preference engines, this rough numerical representation of a few chosen musical features is probably good enough—and if it isn’t, market competition will obsolete the methodology. Unfortunately, researchers’ attempts to transform marketing data into musicology just don’t work anywhere near as well. It’s a relatively simple task to correlate the date field of each record in the MSD with the number between zero and one that represents its “loudness.” The results do support an “evolutionary” hypothesis: adapting to an increasingly noisy environment, popular music has indeed been getting louder over time. Bolstered by this positive finding, the Spanish team tried a more complex correlation exercise. The following discussion is going to get a bit technical, but we need to track the actual methodology of the study, found only in a supplement to the published research paper, to see if its news-making conclusions actually hold up.
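To make the “relatively simple task” concrete, here is a minimal sketch of a date-versus-loudness correlation in Python. It assumes each record has been boiled down to a (year, loudness) pair; the four records are invented purely for illustration and are not taken from the MSD, and a plain Pearson coefficient is my stand-in, not necessarily the statistic the Spanish team used.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and of squared deviations from each mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical (year, loudness) records, invented for this example only.
toy_records = [(1960, 0.40), (1975, 0.48), (1990, 0.55), (2005, 0.66)]
years = [year for year, _ in toy_records]
loudness = [level for _, level in toy_records]

r = pearson_r(years, loudness)
print(f"correlation between year and loudness: {r:.3f}")
```

A coefficient near +1 on real data would be consistent with the “getting louder” claim. It would say nothing about why, and nothing about what the loudness number leaves out.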
The MSD has numerical arrays representing pitch and timbre for every “segment” (= one beat); the pitch arrays store values between zero and one for each of the twelve chromatic pitches used in Western music, representing their relative presence. Serrà and his colleagues threw away most of this information, settling for a simpler set of twelve binaries: effectively, each pitch was either “on” or “off” during the segment in question. This generated 4096 (2¹²) discrete “codewords” (i.e., chords) which could then be tracked by the researchers to see how often they occurred and whether they changed from beat to beat.
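A short Python sketch shows how much this on/off reduction discards. It assumes a cutoff of 0.5 and packs the twelve binaries into a single integer; both the threshold and the bit layout are my assumptions for illustration, not details confirmed by the study's supplement, and the two input vectors are invented.

```python
def pitch_codeword(chroma, threshold=0.5):
    """Collapse 12 pitch strengths (each 0..1) into one 12-bit integer.

    Bit k is set when pitch class k (0 = C, 1 = C sharp, ...) is "on".
    The 0.5 threshold is an assumed value for this sketch.
    """
    if len(chroma) != 12:
        raise ValueError("expected 12 pitch-class values")
    code = 0
    for pitch_class, strength in enumerate(chroma):
        if strength >= threshold:
            code |= 1 << pitch_class
    return code

# Two invented segments: a clean C major triad, and the same triad
# with much more energy smeared across the other nine pitch classes.
clean_triad = [1.0, 0.1, 0.2, 0.0, 0.8, 0.1, 0.0, 0.9, 0.1, 0.2, 0.0, 0.1]
murky_triad = [1.0, 0.4, 0.3, 0.2, 0.6, 0.4, 0.1, 0.7, 0.3, 0.4, 0.2, 0.3]

print(pitch_codeword(clean_triad))  # 145 (bits 0, 4 and 7: C, E, G)
print(pitch_codeword(murky_triad))  # 145 again
print(2 ** 12)                      # 4096 possible codewords
```

Both segments land on the same codeword even though they would sound quite different, which is exactly the kind of detail the binary model cannot see.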
Timbre is represented in the MSD by an array of floating-point values for loudness and eleven other abstract features of perceived sound (brightness, flatness, sharpness of attack, etc.). In Serrà’s model this immense field of sound is run through a high-medium-low statistical filter and reduced to 3¹¹ (177,147) possibilities. (I told you this was going to get technical.)
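The high-medium-low reduction can be sketched the same way. This Python fragment assumes eleven timbre values per segment and a single pair of cut points shared by every feature; the cut points and the sample values are hypothetical, and the study's actual filter (which is statistical, so its boundaries presumably come from the data) may well differ.

```python
def timbre_codeword(features, low_cut, high_cut):
    """Map 11 timbre values to low/medium/high and pack them as a base-3 number.

    low_cut and high_cut are assumed, hand-picked boundaries for this sketch.
    """
    if len(features) != 11:
        raise ValueError("expected 11 timbre values")
    code = 0
    for value in features:
        if value < low_cut:
            level = 0   # low
        elif value < high_cut:
            level = 1   # medium
        else:
            level = 2   # high
        code = code * 3 + level
    return code

# Invented values: every feature sits somewhere in the "medium" band.
quiet_segment = [0.0, 0.3, -0.2, 0.1, 0.4, -0.4, 0.2, 0.0, -0.1, 0.3, 0.1]
busier_segment = [0.9, -0.9, 0.8, -0.7, 0.6, 0.9, -0.8, 0.7, -0.6, 0.9, -0.9]

print(timbre_codeword(quiet_segment, -1.0, 1.0))
print(timbre_codeword(busier_segment, -1.0, 1.0))
print(3 ** 11)  # 177147 possible codewords
```

With these cut points the two segments receive the same codeword, although every one of their eleven values differs. Three levels per feature is a very coarse net.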
So, do you believe that there are precisely 4096 chords and 177,147 sounds in Western music, ready to be distributed, one per beat, across the entirety of popular song? If so, you’ll be happy to know that their general distribution follows Zipf’s Law, the same inverse relation between rank and frequency that describes the words in a corpus of language, a satisfying scientific result and not particularly newsworthy. But then Serrà tracked the transitions between pitch and timbre codewords across the entire corpus, and found that, for both pitch and timbre, the number of non-identical transitions has decreased over time since 1955.
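For readers who want to see what counting “non-identical transitions” might look like, here is a minimal Python sketch. It treats a song as a list of integer codewords, one per beat, and counts each ordered pair of consecutive codewords that differ. This is my reading of the idea, not the study's published procedure, and the two sequences are invented.

```python
from collections import Counter

def transition_counts(codewords):
    """Count ordered pairs (a, b) of consecutive codewords where a != b."""
    pairs = zip(codewords, codewords[1:])
    return Counter((a, b) for a, b in pairs if a != b)

# Invented beat-by-beat codeword sequences for two hypothetical songs.
restless_song = [145, 137, 274, 145, 548, 137]
static_song = [145, 145, 145, 137, 145, 145]

for name, song in [("restless", restless_song), ("static", static_song)]:
    counts = transition_counts(song)
    print(name, "distinct transitions:", len(counts),
          "changes:", sum(counts.values()))
```

Note what the count depends on: two beats register as a “change” only if they fall on different codewords, so everything the quantization has already flattened is invisible at this stage too.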
Or, as one headline writer summarized: “Modern Music too Loud, All Sounds the Same.”
It seems to me that “modern music” has little to do with it. At what point during the crunching down of sound into numbers, numbers into codewords, and codewords into transition networks, do we accept that so much information has been lost and regenerated that the model itself is showing its monotony, not the corpus? It’s as if you were to convert all the CD-quality audio in your collection to 64 kbps MP3, then expand it again using interpolation, crunch it back down to 64 kbps again, interpolate and reduce, over and over again, until the end product had the sonic texture of processed cheese. Would you then blame “the music”?
3. Most of My Heroes Don’t Appear in No Dataset
But all my musicological second-guessing to this point is just preamble to the one really stunning failure of the Million Song Dataset. What, you may ask, does the MSD tell us about syncopation, polyrhythm, groove, swing, flow? The answer: absolutely nothing. (Say it again!)
This is a classic example of how any map, no matter how big, differs from the territory. In the numerical world of the MSD, musical rhythm is nothing more than a mapmaker’s grid, like longitude and latitude, to orient the pitch and timbre algorithms as they do their work. The dataset shows where, and with what degree of confidence, a computer can locate the fall of a song’s basic beat, but that evidence is used only to define the boundaries of significant units. The complicated distribution of attacks and releases between the beats is not quantitatively analyzed in any way that I can see.
This is not a technical problem, since existing beat-slicing software for DJs could easily do the job. It is, rather, a problem of cultural perspective. Our own language betrays us: what is “monotony,” literally? In European languages, listening too long to “one tone” is a deep-rooted metaphor for existential boredom; is there an analogous figure of Western speech whose vehicle is rhythmically impoverished music? (Evidently not, since at least one online dictionary gives the following usage: “That song has a monotonous rhythm.”)
Do I have to spell it out? Music isn’t getting stupider, it’s getting funkier. Increasing rhythmic complexity, increasing overall volume, and decreasing pitch circulation with homogenization of timbre means that popular music has indeed “evolved,” away from the European model of expansive melody and big ensembles (middle-period Elvis) toward a leaner, more African ideal based on small groups emphasizing repetition and tightness (middle-period Run-DMC). If you want to confirm this hypothesis, I’m afraid the Million Song Dataset will be of no use; you’ll need another, less ethnocentric set of data, and, in the meantime, why not listen to some actual contemporary popular music? (How about this; or this; or this; or even this?)
I think you’ll find it anything but monotonous.