The Speech-to-Song Illusion was discovered by Deutsch in 1995, when she was fine-tuning the spoken commentary on her CD ‘Musical Illusions and Paradoxes’ [1]. She had the phrase ‘sometimes behave so strangely’ on a loop, and noticed that after a number of repetitions, the phrase sounded as though sung rather than spoken. Later she included this illusion in the CD ‘Phantom Words and Other Curiosities’ [2], accompanied by the following commentary:
In our final demonstration, speech is made to be heard as song, and this is achieved without transforming the sounds in any way, or by adding any musical context, but simply by repeating a phrase several times over. The demonstration is based on a sentence at the beginning of the CD Musical Illusions and Paradoxes. When you listen to this sentence in the usual way, it appears to be spoken normally - as indeed it is. However, when you play the phrase that is embedded in it: 'sometimes behave so strangely' over and over again, a curious thing happens. At some point, instead of appearing to be spoken, the words appear to be sung, rather as in the figure below.
Here is the full sentence followed by the phrase played repeatedly:
And here is the notated phrase as it is generally heard after it has been played repeatedly:
Figure 1. The phrase ‘sometimes behave so strangely’ as it appears to be sung after it has been repeated several times.
Now here again is the exact same sentence as you just heard. You will probably find that it begins by sounding as speech, just as before. But when you come to the phrase that had been repeated, it suddenly appears to burst into song.
My colleagues and I investigated this effect in detail [3, 4]. In our first experiment we tested three matched groups of subjects, and presented each group with a different condition. The subjects listened to the full sentence and then to ten presentations of the phrase. During each pause between presentations the subjects judged on a five-point scale whether they heard the phrase as exactly like speech, like speech, like either speech or song, like song, or exactly like song.
In all conditions, the first and last presentations of the phrase were identical, and we examined the effects of two manipulations of the intervening presentations on the subjects' judgments. In the first condition, the intervening presentations were exactly as the original. In the second, they were transposed slightly, so that the pitches differed but the pitch relationships were preserved. In the third, the intervening presentations were not transposed, but the syllables were presented in jumbled orderings.
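The three conditions can be sketched in code. This is only an illustration of the design, not the procedure used to prepare the actual stimuli: the syllable pitches in `PHRASE`, the transposition amounts in `shifts`, and all function names are hypothetical placeholders. The phrase is represented symbolically as (syllable, pitch in Hz) pairs. Transposition multiplies every pitch by the same ratio, which changes the pitches but preserves the relationships between them.

```python
import random

# Hypothetical stand-in for the phrase: (syllable, nominal pitch in Hz).
# These pitch values are placeholders, not measurements of the recording.
PHRASE = [("some", 220.0), ("times", 262.0), ("be", 247.0), ("have", 294.0),
          ("so", 247.0), ("strange", 330.0), ("ly", 220.0)]


def transpose(phrase, semitones):
    """Shift every pitch by the same interval, so pitch ratios are unchanged."""
    ratio = 2.0 ** (semitones / 12.0)
    return [(syllable, hz * ratio) for syllable, hz in phrase]


def jumble(phrase, rng):
    """Return the same syllables in a different order."""
    original = list(phrase)
    if len(set(original)) < 2:
        return original  # nothing to reorder
    shuffled = list(original)
    while shuffled == original:
        rng.shuffle(shuffled)
    return shuffled


def build_sequence(phrase, condition, n_presentations=10,
                   shifts=(0.5, -0.5, 1.0, -1.0), rng=None):
    """Build the presentations for one condition.

    The first and last presentations are always the untouched phrase;
    only the intervening ones are manipulated. The default `shifts`
    (in semitones) are assumed values chosen for illustration.
    """
    rng = rng or random.Random(0)
    middle = []
    for i in range(n_presentations - 2):
        if condition == "exact":
            middle.append(list(phrase))
        elif condition == "transposed":
            middle.append(transpose(phrase, shifts[i % len(shifts)]))
        elif condition == "jumbled":
            middle.append(jumble(phrase, rng))
        else:
            raise ValueError(f"unknown condition: {condition}")
    return [list(phrase)] + middle + [list(phrase)]
```

Calling `build_sequence(PHRASE, "transposed")` returns ten presentations in which only the eight intervening ones are shifted in pitch.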
Figure 2. Subjects’ judgments of the spoken phrase after ten repetitions. When the repetitions were exact, the phrase was heard solidly as song. When the phrase was transposed slightly on each repetition, the phrase continued to be judged as speech.
Figure 2 compares the effects of having the intervening repetitions exactly as the original with the effects of having them transposed slightly. We can see that when the repetitions were exact, perception moved solidly from speech to song. However, when the repetitions were transposed slightly, ratings moved slightly towards song but remained solidly in the speech region.
Figure 3. Subjects’ judgments of the spoken phrase after ten repetitions. When the repetitions were exact, the phrase was heard solidly as song. But when the same syllables were presented in jumbled orderings, the phrase continued to be judged as speech.
Figure 3 shows what happened when the intervening repetitions consisted of the exact same syllables but in jumbled orderings, again compared with having them exactly as the original. We can see that when the syllables were jumbled, there was no transformation from speech to song. So it seems that, in order for this transformation to occur, the phrase needs to be repeated exactly, without transposition, and without changing the ordering of the syllables.
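The judgments in this rating experiment can be summarized along the following lines. This is a minimal sketch only: the numeric coding from 1 to 5, the midpoint of 3 used to separate the speech and song regions, and the function names are assumptions made for illustration, and the sketch does not reproduce the statistical analysis reported in the published paper.

```python
from statistics import mean

# Assumed numeric coding of the five response categories (1 = most speech-like).
SCALE = {
    "exactly like speech": 1,
    "like speech": 2,
    "like either speech or song": 3,
    "like song": 4,
    "exactly like song": 5,
}


def mean_rating(responses):
    """Average the coded judgments given by a group for one presentation."""
    return mean(SCALE[response] for response in responses)


def region(score, midpoint=3.0):
    """Say which side of the scale midpoint an average rating falls on."""
    if score < midpoint:
        return "speech"
    if score > midpoint:
        return "song"
    return "ambiguous"
```

For example, a group that answers "like speech" and "exactly like speech" has a mean rating of 1.5, which `region` places on the speech side of the scale.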
So we can then ask: What do the subjects actually hear when they say that they are hearing song? To find out, we recruited 11 female subjects who had had experience with singing in choirs or choruses, and tested each subject in isolation from the others. We had them listen to the full sentence and then to the phrase repeated ten times, and asked them to reproduce the phrase exactly as they had heard it.
Here are the reproductions of six of the subjects played in sequence. As is evident, although the phrase was spoken, the subjects reproduced it as song.
And here are the reproductions of all 11 subjects, digitally mixed together so that they are played as a chorus. (A small amount of reverberation has been added, but otherwise the sounds are exactly as they were recorded.)
But one might then wonder whether these subjects could have heard the phrase as sung the first time they heard it. So we recruited another set of 11 subjects on the same basis, and also tested them in isolation from each other. This time we played them the full sentence followed by the phrase presented only once, and asked them to reproduce the phrase exactly as they heard it. Here are the reproductions of six of these subjects played in sequence.
And here are the reproductions of all 11 subjects, again digitally mixed together so that they are played as a chorus. This confirms our finding from the rating experiment that when the phrase is heard only once, it is perceived as speech rather than song.
To make sure that these subjects were able to repeat the pitches after a single hearing, we then had them listen only once to the phrase as sung rather than spoken, and again asked them to repeat back exactly what they had heard. Here are the reproductions of the same six subjects that you just heard, and you can hear that they had no problem reproducing the sung melody.
Figure 4. Average pitch of each syllable, averaged over all the subjects who repeated back the spoken phrase after hearing it ten times (red line), and averaged over all the subjects who repeated back the spoken phrase after hearing it only once (blue line).
The red line in Figure 4 shows the average pitch of each syllable, averaged over the 11 subjects who repeated back the spoken phrase after having heard it 10 times. The blue line shows the average pitch of each syllable, averaged over the other set of 11 subjects, who repeated back the same spoken phrase after having heard it only once. As we can see, the reproductions of the two groups were very different.
Figure 5. Average pitch of each syllable, averaged over all the subjects who repeated back the spoken phrase after hearing it ten times (red line) and averaged over all the subjects who repeated back the sung phrase after hearing it only once (green line).
The red line in Figure 5 again shows the average pitch of each syllable in the spoken phrase, averaged over the 11 subjects who repeated it back after having heard it 10 times. The green line shows the average pitch of each syllable, averaged over the other set of 11 subjects, who repeated back the sung phrase after having heard it only once. Notice the remarkable correspondence between these two plots. It shows that these subjects' perceptions of the sung phrase were very similar to the perceptions of the subjects who had instead heard the spoken phrase repeated 10 times, and quite different from their own perceptions of the spoken phrase when they had heard it only once.
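The comparisons in Figures 4 and 5 rest on an average pitch per syllable for each group. A minimal sketch of that kind of calculation follows. It assumes that each subject's reproduction has already been reduced to one pitch estimate in Hz per syllable, and that averaging is done on a semitone (logarithmic) scale. Both are assumptions made for illustration, as are the function names and the simple distance measure; the source does not specify how the averages were computed.

```python
import math


def hz_to_semitones(hz, ref_hz=440.0):
    """Express a frequency as semitones above or below a reference pitch."""
    return 12.0 * math.log2(hz / ref_hz)


def average_contour(reproductions, ref_hz=440.0):
    """Average pitch of each syllable across subjects, in semitones.

    `reproductions` holds one list per subject, each containing one
    pitch estimate in Hz per syllable.
    """
    n_syllables = len(reproductions[0])
    if any(len(r) != n_syllables for r in reproductions):
        raise ValueError("every subject needs one pitch per syllable")
    return [
        sum(hz_to_semitones(r[i], ref_hz) for r in reproductions)
        / len(reproductions)
        for i in range(n_syllables)
    ]


def contour_distance(a, b):
    """Mean absolute difference, in semitones, between two average contours."""
    if len(a) != len(b):
        raise ValueError("contours must have the same number of syllables")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)
```

With this measure, a small distance between two groups' contours would correspond to closely matching lines of the kind seen in Figure 5, and a large distance to diverging lines of the kind seen in Figure 4.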
To conclude, this illusion is in line with what philosophers and musicians have been arguing for centuries: that strong linkages must exist between speech and music. We are just beginning to determine the neural processes that are responsible for this striking perceptual transformation [5]. However, the present experiments show that for a phrase to be heard as spoken or as sung, it does not need to have a set of physical properties that are unique to speech, or a different set of physical properties that are unique to song. Rather, we must conclude that, assuming the neural circuitries underlying speech and song are at some point distinct and separate, they can accept the same input but process the information in different ways so as to produce different outputs. As a further point, this illusion provides a striking example of very rapid and highly specific perceptual reorganization, and so shows an extreme form of short-term neural plasticity in the auditory system.
Listen to the WNYC Radio Lab interview (NPR) with Jad Abumrad and Robert Krulwich about "Sometimes Behave So Strangely" and perfect pitch.
The illusion has also been featured in several videos. For example, the video below features the illusion being experienced by the fifth graders of Atwater School, Shorewood, Wisconsin.
Deutsch's illusion "Sometimes Behave So Strangely" experienced by the fifth graders of Atwater School, Shorewood, Wisconsin. Video created by their music teacher Walt Boyer, posted with permission.
3. Deutsch, D., Lapidis, R., and Henthorn, T. The speech-to-song illusion. Invited lay language paper presented at the 156th meeting of the Acoustical Society of America. Journal of the Acoustical Society of America, 2008, November, Miami. [Lay language version]
4. Deutsch, D., Henthorn, T., and Lapidis, R. Illusory transformation from speech to song. Journal of the Acoustical Society of America, 2011, 129, 2245-2252. [PDF Document]
5. Tierney, A., Dick, F., Deutsch, D., and Sereno, M. Speech versus song: Multiple pitch-sensitive areas revealed by a naturally occurring musical illusion. Cerebral Cortex, 2012. [PDF Document]