Inattentional deafness, sound editing and auditory gorillas

A lovely little paper is now in press in Cognition – Gorillas we have missed: Sustained inattentional deafness for dynamic events, by a couple of ex-colleagues of mine at Royal Holloway – Polly Dalton and Nick Fraenkel. I thought I’d do a brief write-up, as it describes a couple of great experiments that involved some nifty bits of audio recording and editing, something I’ve been meaning to write about for some time.

The paper is based on an older, visual, effect described by Simons and Chabris (1999) (PDF here) and termed ‘inattentional blindness’. Essentially, this paper demonstrated that participants can fail to notice a highly salient visual stimulus, if their attention is directed towards some other aspect of the visual scene. The stimulus that these authors used was a video of six people passing basketballs to each other in a complex sequence, and the task for the participants was to count the number of passes made. During the movie, a person in a gorilla suit walked through the middle of the basketball players. Despite the bizarre nature of the manipulation, a substantial proportion of participants (between 30% and 50% depending on the exact condition) simply failed to notice the very obvious ‘gorilla in the midst’. You can see one of the videos used in the experiment below, and there’s also a nice interview with Daniel Simons where he talks about the experiment here.

Schematic of the auditory stimulus used in the experiment, reproduced from Figure 1 of Dalton and Fraenkel (2012).

So, what Polly and Nick did in their new paper was to take this visual effect and cleverly translate it into the auditory domain. They made recordings of a complex auditory scene with two conversations happening at once – one between a pair of female voices and one between a pair of male voices – with both conversation pairs moving around the auditory ‘space’ during the recording. Also present during the recording was an additional (male) voice that walked through the scene repeatedly saying “I’m a gorilla, I’m a gorilla…” for 19 seconds. The majority of participants (90%) who were cued to listen to the male conversation did notice the ‘auditory gorilla’; however, when people were cued to listen to the female conversation, only 30% reported noticing the gorilla. The implication is that when we are attending to one category of stimulus (e.g. female voices) we can fail to notice even prominent stimuli which belong to an unattended category (male voices). You can try it yourself, using the video below, which contains an edited version of their stimulus. For the full effect you’ll need to use headphones:

This is clearly a complex auditory stimulus, and creating it involved some really interesting techniques. The recordings were made using an ‘artificial head’ – a (roughly) human-head-shaped recording device, with high-quality microphones positioned in each ear. Using such a device for binaural recordings is important, because the shape of the head (and the outer ear) produces subtle frequency-based distortions in perceived sounds, and the brain uses these cues to localise sounds in 3D space. The separate tracks from the two microphones form a single stereo track, and when listened to on headphones, recordings of this type tend to produce a very natural-sounding audio environment. You can read more about this technique here. The two conversations were recorded separately from the “I’m a gorilla” stimulus, and the two recordings were then mixed together to create the final stimulus – this enabled independent manipulation of the spatial placement of the gorilla stimulus within the scene (which was reversed in experiment 2).
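To make the mixing step a bit more concrete, here's a minimal Python sketch of what combining two stereo recordings amounts to: adding them together sample by sample, separately for the left and right channels. This is only an illustration and not the authors' actual procedure. The function names (mix, swap_channels), the gorilla_gain parameter and the representation of audio as lists of (left, right) floats between -1 and 1 are all my own assumptions. In particular, swapping the left and right channels is just one plausible way of mirroring where a separately recorded sound appears in the scene; the paper may have reversed the gorilla's placement differently.

```python
def clip(value):
    """Keep a sample inside the valid [-1.0, 1.0] range."""
    return max(-1.0, min(1.0, value))


def swap_channels(stereo):
    """Mirror a stereo recording left-to-right by swapping its channels.

    Assumption: this is one simple way to flip the apparent side of a
    binaural recording, not necessarily what was done in the paper.
    """
    return [(right, left) for (left, right) in stereo]


def mix(scene, gorilla, gorilla_gain=1.0):
    """Sum two stereo recordings sample by sample.

    Both inputs are lists of (left, right) float pairs. The shorter
    recording is padded with silence, and the result is clipped so it
    stays within [-1.0, 1.0].
    """
    silence = (0.0, 0.0)
    length = max(len(scene), len(gorilla))
    mixed = []
    for i in range(length):
        scene_l, scene_r = scene[i] if i < len(scene) else silence
        gor_l, gor_r = gorilla[i] if i < len(gorilla) else silence
        mixed.append((clip(scene_l + gorilla_gain * gor_l),
                      clip(scene_r + gorilla_gain * gor_r)))
    return mixed


# Toy data: a two-sample "scene" and a one-sample "gorilla" on the left.
scene = [(0.25, 0.5), (0.125, 0.0)]
gorilla = [(0.5, 0.0)]

original = mix(scene, gorilla)
mirrored = mix(scene, swap_channels(gorilla))
```

Because the gorilla lives in its own recording, its position can be changed (here, mirrored) without touching the two conversations at all, which is the practical benefit of recording the elements separately.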

This mixing of the two separate recordings was done using Reaper, a piece of software classed as a Digital Audio Workstation (DAW). DAWs used to be primarily hardware-based, and a digital audio lab would include racks of equipment: samplers, sequencers, etc. The vast majority of these functions can now be reproduced in software. I haven’t used it myself, but Reaper looks to be a fantastic piece of professional-grade software, and is available very cheaply ($60 for an individual/educational licence). DAW software allows almost endless recording and editing possibilities for sound recordings, including studio-based recording of music, applying effects and filters, changing pitch and tempo, mixing and mastering of recordings, and even synthesis (e.g. of pure tones, for use as auditory cues in experiments).
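As an aside, pure-tone synthesis is simple enough that you don't strictly need a DAW for it. Here's a rough Python sketch using only the standard library that generates a sine-wave tone and saves it as a 16-bit mono WAV file. The function names, the default values (44.1 kHz sample rate, half amplitude, 10 ms ramps) and the output filename are all my own choices for illustration. The short linear fade at each end is there because a tone that starts or stops abruptly tends to produce an audible click.

```python
import math
import struct
import wave


def pure_tone(freq_hz, duration_s, sample_rate=44100, amplitude=0.5,
              ramp_s=0.01):
    """Return a sine tone as a list of floats in [-1, 1].

    A linear fade of ramp_s seconds is applied to the start and end
    of the tone to avoid clicks.
    """
    n = int(round(duration_s * sample_rate))
    ramp_n = min(int(ramp_s * sample_rate), n // 2)
    samples = []
    for i in range(n):
        value = amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate)
        if i < ramp_n:
            value *= i / ramp_n                 # fade in
        elif i >= n - ramp_n:
            value *= (n - 1 - i) / ramp_n       # fade out
        samples.append(value)
    return samples


def to_pcm16(samples):
    """Convert floats in [-1, 1] to little-endian 16-bit PCM bytes."""
    ints = [int(round(max(-1.0, min(1.0, s)) * 32767)) for s in samples]
    return struct.pack("<%dh" % len(ints), *ints)


def write_wav(path, samples, sample_rate=44100):
    """Write mono samples to a 16-bit WAV file."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)       # mono
        wf.setsampwidth(2)       # 2 bytes = 16 bits per sample
        wf.setframerate(sample_rate)
        wf.writeframes(to_pcm16(samples))


if __name__ == "__main__":
    # A half-second 440 Hz cue tone; the filename is just an example.
    write_wav("cue_440hz.wav", pure_tone(440.0, 0.5))
```

The resulting file can be opened in Audacity or any DAW for checking or further editing.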

While Reaper looks great, my recommendation for this kind of software is Audacity, an incredibly full-featured, cross-platform (Windows, Mac and Linux), and entirely free audio editor/recorder. I’ve used Audacity a lot for really basic editing/synthesis tasks, but it has an impressive array of features and has (apparently) been used to record, mix and master entire albums. If you have some sound editing task to accomplish, it would definitely be worth investigating whether you can easily achieve it with Audacity before you splurge on some more expensive, professional software. A good list of other free sound-related software is here.

That’s all for now – happy sound editing! TTFN.

PS. For more details of the Royal Holloway attention lab’s research see their webpage here.


About Matt Wall

I do brains. BRAINZZZZ.

Posted on June 29, 2012, in Commentary, Experimental techniques, Software. 12 Comments.

  1. Technologically neat, but conceptually isn’t this just a replication of the dichotic listening findings from the 1960s that were the inspiration for Neisser’s and (then, much later) Simons’s visual “gorilla” analogues?

    • Partly, but I think it’s an important extension to the ‘standard’ dichotic listening paradigm. I’m not super-familiar with that literature, but in general I think in those experiments there’s an attended audio stream presented to one ear and an unattended one in the other. This experiment, by contrast, uses a more naturalistic 3D auditory ‘scene’ with three main elements – the attended conversation, the unattended conversation, and the ‘gorilla’. All three move around the scene during the stimulus, and what’s attended is delineated by a stimulus feature (i.e. male/female voices) rather than location. This seems like a much more naturalistic stimulus than having two entirely separate audio streams, and the inclusion of the third ‘gorilla’ stimulus exactly parallels the (counter-intuitive and surprising) visual effect seen in the Simons and Chabris (1999) paper.

    • Of course, it’s very possible that others have used a similar set-up in the past in dichotic-listening tasks and I’m just unaware of it… please post some links if so!

  2. Thanks for the reply. This work certainly seems to me to reflect an advance in naturalism and stimulus complexity. However, the theoretical conclusion, that people will miss otherwise salient “distractor” stimuli (e.g., the same word repeated again and again; the subject’s first name) while they track one auditory message and ignore another, is an ancient one by cognitive-psychology standards. (Some of those old dichotic listening studies didn’t separate auditory streams by physical location/channel, but rather just by voice, for example, with both a male and female voice presented to both ears.) It would be most accurate, I think, to describe the Simons gorilla finding (and Neisser’s previous opaque-woman-with-umbrella finding) as visual instantiations of this earlier auditory work.

    • Fair enough! I bow to your clearly superior knowledge on this topic; I am merely a humble tech-obsessed methodologist etc. etc. ;o)

      I did email the authors of the new paper (I know them fairly well) and am hoping they’ll glance over this post at some point, and perhaps even leave a comment – perhaps they can shed some additional light.

  3. Great! Please post what they have to say. They are certainly more expert on the relevant literatures than I am, so I’m very interested in their take. Best regards, MKane

    • Thanks Matt for the post and both for the interesting discussion.

      I fully agree with MKane’s chronology, in the sense that Neisser’s original visual work (which inspired Simons & Chabris’s gorilla demo) was itself based on the early dichotic listening studies. However, I do think that our findings go beyond those studies, and this issue is discussed in some detail in the paper.

      One of the most important differences is that our use of a binaural scene allowed us to present three separable scene elements (men, women, gorilla), whereas dichotic presentation allows for only two streams. In our scene, the task of attending to the women while ignoring the men is very much like a dichotic listening task. Based on the dichotic listening findings, you would therefore expect people to process the basics of the unattended men’s stream (e.g. they should be able to tell you about the gender of the speakers) but not its semantic contents. However, the real focus of our study was to ask what then happens if you add an unexpected and clearly separable third scene element (the gorilla) into this set-up. I would argue that the inattentional deafness people experienced for the gorilla (with most remaining completely unaware of his presence) constitutes a more extreme effect of selective attention than those observed within the dichotic listening paradigm (in which people were typically aware of the presence and basic characteristics of the unattended stream, even if they did not recall its semantic contents).

      Anyway, thanks again for this discussion — I would be very interested in any further comments.

  4. Yes, thanks from me, too!

