We're Wired for Rhythm in the Bass and Melody Up Top

Here's an interesting bit of research I stumbled on: "Superior time perception for lower musical pitch explains why bass-ranged instruments lay down musical rhythms." A long title for something you'll probably just intuitively get, but interesting nevertheless because it appears we're hard-wired for it. Bottom line: research says we hear the beat in the bass and, according to another paper, we hear the melody in the highest note. Moreover—and this is the surprising bit—these perceptions appear to occur very early in the peripheral auditory process, well before we cognitively appreciate the music.

The tests were mainly performed by hooking people up to an EEG to measure a particular sort of brainwave called the mismatch negativity response (MMN), a signal indicating that the brain noticed something it didn't expect.

MMN is generated primarily in the auditory cortex and represents the brain’s automatic detection of unexpected changes in a stream of sounds. MMN has been measured in response to changes in features such as pitch, timbre, intensity, location, and timing. It occurs between 120 and 250 ms after onset of the deviance. That MMN represents a violation of expectation is supported by the fact that MMN amplitude increases as changes become less frequent.

The cool thing about it is that it's automatic...you don't have to think about it. In fact, the researchers explain, "We recorded electroencephalography (EEG) while participants watched a silent movie and analyzed the mismatch negativity (MMN) response." And in parts of the pitch study they even tested babies 7 and 3 months old.

In the pitch study, subjects listened to a series of tone-pair bursts wherein, every once in a while, the high or low note was changed in pitch. The researchers found:

MMN was generated to pitch deviants in both the high-pitch and low-pitch voices, but that it was larger to deviants in the higher-pitch voice in both musician and nonmusician adults (2–4) and in infants (5, 6). Furthermore, modeling results suggest that this effect originates early in the auditory pathway in nonlinear cochlear dynamics (10). Thus, we concluded that pitch encoding is more robust for the higher compared with lower of two simultaneous voices.
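To make the oddball paradigm concrete, here's a minimal Python sketch of a stream like the one described above: repeated simultaneous high/low tone pairs, with the occasional pair mistuned in one voice. The frequencies, burst duration, deviant probability, and one-semitone shift are my own illustrative choices, not the values used in the study.

```python
import numpy as np

SR = 44100          # sample rate (Hz)
TONE_MS = 75        # tone-burst duration -- illustrative, not the paper's value
LOW_F, HIGH_F = 220.0, 440.0   # illustrative "low voice" / "high voice" pitches
DEVIANT_PROB = 0.1  # oddball probability -- illustrative choice

def tone(freq, dur_ms, sr=SR):
    """Sine burst with short linear ramps to avoid onset/offset clicks."""
    n = int(sr * dur_ms / 1000)
    t = np.arange(n) / sr
    burst = np.sin(2 * np.pi * freq * t)
    ramp = min(n // 10, 220)
    env = np.ones(n)
    env[:ramp] = np.linspace(0.0, 1.0, ramp)
    env[-ramp:] = np.linspace(1.0, 0.0, ramp)
    return burst * env

def pitch_oddball_pair(rng):
    """One tone pair; occasionally mistune the high or the low voice."""
    low_f, high_f = LOW_F, HIGH_F
    kind = "standard"
    if rng.random() < DEVIANT_PROB:
        if rng.random() < 0.5:
            high_f *= 2 ** (1 / 12)   # shift the high voice one semitone
            kind = "high-deviant"
        else:
            low_f *= 2 ** (1 / 12)    # shift the low voice one semitone
            kind = "low-deviant"
    pair = tone(low_f, TONE_MS) + tone(high_f, TONE_MS)
    return pair, kind

rng = np.random.default_rng(0)
stream = [pitch_oddball_pair(rng) for _ in range(20)]
print([kind for _, kind in stream])
```

The MMN is then computed by averaging the EEG responses to the rare deviant pairs and subtracting the average response to the standards; the code above only sketches the stimulus side of that design.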

In the rhythm studies, subjects listened to a series of tone-pair bursts wherein occasionally either the low or high tone came too early or too late relative to the overall beat. In a second experiment, people were asked to tap along to the beat with a finger, and the researchers measured how the motor response was affected by early or late arrivals of the high and low notes. They found:

The results show a low-voice superiority effect for timing. We presented simultaneous high-pitched and low-pitched tones in an isochronous stream that set up temporal expectations about the onset of the next presentation, and occasionally presented either the higher or the lower tone 50 ms earlier than expected, while leaving the other tone at the expected time. MMN was larger in response to timing deviants for the lower than the higher tone, indicating better encoding for the timing of lower-pitched compared with higher-pitch tones at the level of the auditory cortex. A separate behavioral study showed that tapping responses were more influenced by timing deviants to the lower- than higher-pitched tone, indicating that auditory–motor synchronization is also more influenced by the lower of two simultaneous pitch streams. Together, these results indicate that the lower tone has greater influence than the high tone on determining both the perception of timing and the entrainment of motor movements to a beat. To our knowledge, these results provide the first evidence showing a general timing advantage for low relative pitch, and the results are consistent with the widespread musical practice of most often carrying the rhythm or pulse in bass-ranged instruments.
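The timing version of the paradigm is even simpler to sketch. In the toy schedule below, the 50 ms early shift matches the study's description; the 600 ms beat interval and 10% deviant rate are my own illustrative choices.

```python
import random

SOA_MS = 600        # interval between pair onsets -- illustrative choice
EARLY_MS = 50       # deviant tone arrives 50 ms early, as in the study
DEVIANT_PROB = 0.1  # oddball probability -- illustrative choice

def onset_schedule(n_pairs, rng):
    """Onset time (ms) of the low and high tone in each pair.

    Standards: both tones start together on the beat. Deviants: one
    voice starts EARLY_MS early while the other stays on the beat.
    """
    schedule = []
    for i in range(n_pairs):
        beat = i * SOA_MS
        low_on, high_on, kind = beat, beat, "standard"
        if rng.random() < DEVIANT_PROB:
            if rng.random() < 0.5:
                low_on, kind = beat - EARLY_MS, "low-early"
            else:
                high_on, kind = beat - EARLY_MS, "high-early"
        schedule.append((low_on, high_on, kind))
    return schedule

for low_on, high_on, kind in onset_schedule(8, random.Random(1)):
    print(f"{kind:>10}: low @ {low_on} ms, high @ {high_on} ms")
```

The regular beat sets up the temporal expectation; the finding above is that "low-early" deviants in such a schedule produce larger MMN responses and bigger tapping adjustments than "high-early" ones.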

Furthermore, speaking of the measurement image at the top of this page:

Because the deviant tones occurred sooner than expected, one obvious candidate to explain the low-voice superiority effect for timing is forward masking. In forward masking, the presentation of one sound makes a subsequent sound more difficult to detect. The well-known phenomenon of the upward spread of masking, in which lower tones mask higher tones more than the reverse (30), suggests that, in the present experiments, when the lower tone occurred early, it might have masked the subsequent higher tone more than the higher tone masked the lower tone when the higher tone occurred early. We examined evidence for such a peripheral mechanism by inputting the stimuli used in the present experiment into the biologically plausible model of the auditory periphery of Bruce, Carney, and coworkers (27, 28). Because timing precision is likely reflected by spike counts in the auditory nerve, we used spike counts as the output measure rather than the pitch salience measure used in Trainor et al. (10). As can be seen in Fig. 3A, when the lower tone began 50 ms earlier than the higher tone, the spike count at the onset of the lower tone was similar to the spike count when both tones began simultaneously in the standard stimulus, suggesting that the onsets of the low-tone early and standard (simultaneous onsets) stimuli are similarly salient at the level of the auditory nerve. Furthermore, in this low-tone early stimulus, when the higher tone began 50 ms after the lower tone, Fig. 3A shows that there was no accompanying increase in the spike count at the onset of the higher tone, because of forward masking. Thus, the model indicates that the timing of the low-tone early stimulus is unambiguously represented as the onset of the lower tone. However, the situation was different for the high-tone early stimulus. When the higher tone began before the lower tone, Fig. 3A shows that the spike count at the onset of the higher tone was a little lower than for the standard stimulus where both tones began simultaneously. Furthermore, in the high-tone early stimulus, when the low tone entered 50 ms later, the spike count increased at the onset of the low tone. Thus, in the high-tone early stimulus, the spike count increased at both the onset of the high tone and at the onset of the low tone. The timing onset of this stimulus is thereby more ambiguous compared with the case where the low tone began first.

This pattern of results can also be seen in the time-frequency representation shown in Fig. 3B. The Top plot shows the spike count across frequency for the simultaneous (standard) case. A clear onset is shown across frequency channels. The Middle plot shows the spike count across frequency for the low-tone early stimulus. Here, a clear onset is shown across frequency channels 50 ms earlier (at the onset of the lower tone) and no subsequent increase when the second higher tone begins. The lack of subsequent increase is likely because the harmonics of both tones extend from their fundamental frequencies to beyond 4 kHz so the frequency channels excited by the high tone are already excited by the lower tone. Thus, a change in the exact pattern of neural activity is observed at the onset of the high tone but the spatial extent of excitation in the nerve does not change. Finally, the Bottom plot shows the spike count across frequency for the high-tone early stimulus. Note that, at the onset of the higher tone, spikes occur at and above its fundamental; however, when the lower tone enters, an additional increase in spikes occurs at the lower frequencies. Thus, in the case where the higher tone begins first, two onset responses can be seen, making the timing of the stimulus more ambiguous. Greater ambiguity in the onset of the high-tone early stimulus compared with the low-tone early stimulus may have contributed to the low-voice superiority effect for timing as seen in larger MMN responses and greater tapping adjustment for the low-tone early compared with high-tone early stimuli.

And so, as we perform what these researchers call "auditory scene analysis" while listening to music, the workings of our peripheral auditory system (the mechanical ear and its neural structures) make us more aware of pitch in the higher notes, and more aware of timing in the lower notes. As the researchers conclude:

Together, these studies suggest that widespread musical practices of placing the most important melodic information in the highest-pitched voices, and carrying the most important rhythmic information in the lowest-pitched voices, might have their roots in basic properties of the auditory system that evolved for auditory-scene analysis.

Or, put another way, the structure of our auditory system has shaped our aesthetic sense such that we perceive the beat in the bass and the melody on top as more pleasing.

Bass soloists and triangle percussionists wept.

John Grandberg wrote:

"Bass soloists and triangle percussionists wept."

That line had me cracking up.

ThePianoMan17 wrote:

Hi Tyll. As a musician and a student of cognitive science, I'd like to point out a few things. First, thanks for posting this article! I think there's too little talk about perceptual studies in the hifi world. Second, this study, while interesting, doesn't really prove the point. In fact, we also know that musical preference in kids and adults is learned as much as it is innate. The practice of even having melody and harmony, and arranging them as such, has a lot to do with the sensitivity of the ear to these frequencies, but it also applies a lot less if you grew up on non-Western music. The exposure we have actually changes and conditions the way we hear. For example, most musicians don't have the usual left/right ear discrepancy of hearing tones better with the left ear and conversation better with the right. If you grow up speaking a tonal language like Mandarin, you are more likely to retain perfect pitch as you grow up. If the culture you grew up in places greater emphasis on rhythm, you're more likely to hear that before you hear tone.
Additionally, it's never possible to be absolutely sure in cognitive science. For example, in this study: a burst of tones is not musical information, which we know gets processed differently in some very puzzling ways. But if one were to use musical information, it would be conditioning and changing the subjects' preferences at such a young developmental age... a bit of a catch-22.
In the end, I think the potential of neuro-cognitive science to examine our capabilities and our capacity for being trained will tell us much more about consciousness and how we hear than innate properties will. The current cognitive literature on what is innate is not well supported or conclusive. It's simply too hard (maybe impossible) to test, because all specimens (people!) are different and no lab is a vacuum. Empirical method is brought to its knees in the face of so many variables.
But all of us are trained listeners in some way, whether consciously or not. And those are actually variables we can more easily account for (qualitatively).
So in the end, the triangle players and bass soloists are still important.... : )
Although the image you painted is quite amusing.
Keep up the great work Tyll.

germay0653 wrote:

Could our heightened timing sensitivity to events that manifest in lower frequencies have developed as a primal response for survival, fight or flight, early in our species' development? Perhaps the awareness of more abstract concepts such as melody came later, with less sensitivity to timing.

Tyll Hertsens wrote:
Dunno. The thought that kept coming to my mind was that listening to the mother's heartbeat prior to birth probably involves a lot of low-frequency information.
ADU wrote:

But I think this study goes over my head.

Rhythm is (mostly) learned imo. I was having a conversation with a drummer recently at a music store. We were sort of noodling around on some of the instruments there, and I asked him if he felt he was good enough to sit in with just about anyone. He replied in the affirmative.

ADU wrote:

...so I played a few syncopated latin rhythms for him, to see if he could keep up, and he could not. Because it wasn't the usual "4 on the floor" he was used to.

MRC01 wrote:

This reminds me of a non-scientific investigation I did a while back to help understand why musicians who play high-pitched instruments have a harder time staying in tune.

It is easier to hear pitch intonation differences at higher frequencies.