Some notes on the use of voice keys in reaction time experiments

Cedrus SV-1 voice key device

Somebody asked me about using a voice key device the other day, and I realised it’s not something I’d ever addressed on here. A voice key is often used in experiments where you need to obtain a vocal response time, for instance in a vocal Stroop experiment, or a picture-naming task.

There are broadly two ways of doing this. The first is easy, but expensive, and not very good. The second is time-consuming, but cheap and very reliable.

The first method involves using a bit of dedicated hardware, essentially a microphone pre-amp, which detects the onset of a vocal response and sends out a signal when it occurs. The Cedrus SV-1 device pictured above is a good example. This is easy, because your vocal reaction times are logged for you. However, it is not totally reliable, because you have to pre-set a loudness threshold on the box. It might miss some responses if the person just talks quietly, or be triggered by some unexpected background noise. It should be relatively simple to get whatever stimulus software you’re running to recognise the input from the device and log it as a response.
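To make the threshold problem concrete, here’s a toy Python sketch of a fixed-threshold voice key working on a buffer of audio samples. It only illustrates the idea. I don’t know how the SV-1’s circuitry actually does it, and the function name, the -1.0 to 1.0 sample range and the threshold value are all assumptions.

```python
def voice_key_onset_ms(samples, sample_rate, threshold):
    """Toy fixed-threshold voice key (illustrative assumption, not a real device).

    Returns the time in ms of the first sample whose absolute amplitude
    exceeds `threshold`, or None if the threshold is never crossed.
    `samples` are assumed to be floats in -1.0..1.0, with sample 0
    aligned to stimulus onset.
    """
    for i, s in enumerate(samples):
        if abs(s) > threshold:
            return 1000.0 * i / sample_rate
    return None


if __name__ == "__main__":
    rate = 1000  # 1 kHz keeps the toy example small; real audio is e.g. 44.1 kHz
    silence = [0.0] * 500
    loud_response = silence + [0.3, -0.3] * 100
    quiet_response = silence + [0.05, -0.05] * 100
    click = [0.0] * 120 + [0.4] + [0.0] * 879  # brief noise spike at 120 ms

    print(voice_key_onset_ms(loud_response, rate, threshold=0.1))   # 500.0
    print(voice_key_onset_ms(quiet_response, rate, threshold=0.1))  # None
    print(voice_key_onset_ms(click, rate, threshold=0.1))           # 120.0
```

The quiet response is missed entirely, while a single click at 120 ms gets logged as a 120 ms “response”. Those are the two failure modes described above.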

The other way is very simple to set up: you just plug a microphone into the sound card of your stimulus computer and record the vocal response on each trial as a .wav file. Stimulus software like PsychoPy can do this very easily. The downside is that you then have to examine those sound files in some way to get the reaction time data out. This could mean literally examining the waveform for each trial in a sound editor (such as Audacity), manually putting a marker at the start of the speech, and calculating vocal RTs relative to the start of the file/trial. This is very reliable and precise, but obviously reasonably time-consuming. Manually marking sound files is still the ‘gold standard’ for voice-onset reaction times. Ideally, you should get someone else to do this for you, so they’ll be ‘blind’ to which trials are which, and unbiased in calculating the reaction times. You can also possibly automate the process using a bit of software called SayWhen (paper here).

Example of a speech waveform, viewed in Audacity
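If you do mark onsets by hand, the subtraction step is easy to script. Audacity can export a label track as a plain text file. The sketch below assumes that file uses a tab-separated start/end/label layout with times in seconds. It also assumes one label file per trial and a known offset between the start of the recording and stimulus onset. The file naming and the `rts_from_label_files` helper are hypothetical, so check them against your own setup.

```python
from pathlib import Path


def parse_audacity_labels(text):
    """Parse the text of an Audacity label export.

    Assumes one label per line: start time, end time (both in seconds)
    and label text, separated by tabs. Lines that don't parse are skipped.
    """
    labels = []
    for line in text.splitlines():
        fields = line.split("\t")
        try:
            start, end = float(fields[0]), float(fields[1])
        except (IndexError, ValueError):
            continue  # blank or unexpected line: skip it
        label = fields[2] if len(fields) > 2 else ""
        labels.append((start, end, label))
    return labels


def vocal_rt_ms(labels, stim_onset_in_file_s=0.0):
    """RT in ms from the earliest label, relative to stimulus onset.

    `stim_onset_in_file_s` is how far into the recording the stimulus
    appeared (0.0 if recording started exactly at stimulus onset).
    Returns None when there are no labels, e.g. a no-response trial.
    """
    if not labels:
        return None
    onset_s = min(start for start, _, _ in labels)
    return 1000.0 * (onset_s - stim_onset_in_file_s)


def rts_from_label_files(folder, stim_onset_in_file_s=0.0):
    """Hypothetical batch helper: one label file per trial, e.g.
    trial_001.txt next to trial_001.wav. Returns {file stem: RT in ms}."""
    return {
        path.stem: vocal_rt_ms(
            parse_audacity_labels(path.read_text(encoding="utf-8")),
            stim_onset_in_file_s,
        )
        for path in sorted(Path(folder).glob("*.txt"))
    }
```

This assumes a single point label at the start of speech on each trial. If you also label errors or false starts, you’d want to filter on the label text rather than just taking the earliest label.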

Which method is best depends largely on the number of trials in your experiment. The second method is definitely superior (as well as cheaper and easier to set up), but if you have eleventy-billion trials in your experiment, manually examining them all post hoc may not be very practical, and a more automatic solution might be worthwhile. If you were really clever, you could try to do both at once. You’d have two computers set up: the first running the stimulus program, and the second recording the voice responses while also running a bit of code that signals the first computer when it detects a voice onset. It might be tricky to set up and get working, but once it was, you’d have all your RTs logged automatically on the first computer. You’d also have the .wav files recorded on the second for post hoc analysis, data cleaning, error-checking, etc. if necessary.
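For the two-computer idea, the recording side needs something that watches the incoming audio and sends a message the moment it detects an onset. Here’s a rough sketch of that logic. It assumes your audio-input library hands you successive chunks of samples as floats (how you get them is left out). The RMS threshold, the UDP message format, host and port are all placeholders. Any network or buffering delay would add to the timing error, so it would be worth checking the online RTs against the recorded .wav files.

```python
import math
import socket


class StreamingVoiceKey:
    """Watch successive audio chunks and call `on_onset` once per trial,
    the first time a chunk's RMS amplitude exceeds `threshold`.

    Chunks are assumed to be lists of floats in -1.0..1.0, delivered in
    order from the moment the trial's recording starts.
    """

    def __init__(self, sample_rate, threshold, on_onset):
        self.sample_rate = sample_rate
        self.threshold = threshold
        self.on_onset = on_onset
        self.samples_seen = 0
        self.triggered = False

    def feed(self, chunk):
        if not self.triggered and chunk:
            rms = math.sqrt(sum(s * s for s in chunk) / len(chunk))
            if rms > self.threshold:
                self.triggered = True
                # Onset is stamped at the start of the triggering chunk,
                # so timing resolution is limited to one chunk length.
                onset_ms = 1000.0 * self.samples_seen / self.sample_rate
                self.on_onset(onset_ms)
        self.samples_seen += len(chunk)

    def reset(self):
        """Call at the start of each new trial."""
        self.samples_seen = 0
        self.triggered = False


def udp_notifier(host, port):
    """Return a callback that sends the onset time to the stimulus computer
    as a UDP datagram. Host, port and message format are placeholders."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def notify(onset_ms):
        sock.sendto(f"VOICE_ONSET {onset_ms:.1f}".encode("ascii"), (host, port))

    return notify
```

In use you’d create something like `StreamingVoiceKey(44100, 0.05, udp_notifier("192.168.0.10", 5005))` (made-up values), call `feed()` from your audio callback, and `reset()` between trials. Smaller chunks give finer timing at the cost of more processing.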

Happy vocalising!


Two researchers have pointed out in the comments that a system for automatically generating response times from sound files already exists, called CheckVocal. It seems to be designed to work with the DMDX experimental programming system (free software that uses Microsoft’s DirectX system to present stimuli). I’m not sure whether it’ll work with other systems, but it’s worth looking at. I’ve also added the information to my Links page.


About Matt Wall

I do brains. BRAINZZZZ.

Posted on September 5, 2013, in Experimental techniques, Hardware, Software. 6 Comments.

  1. Voice keys are often not great if you really want decent RT data; see Rastle & Davis, 2002.
    But if you do use them, and you want to check just some of your individual files for mistriggering (e.g. participants breathing, coughing, beard crackling against the microphone), CheckVocal is very good, because you can use it on a bunch of .wavs that you’ve got from anywhere (as well as from DMDX, an unfriendly but free way of presenting stimuli with great control over timing). In fact, you can use DMDX in exactly the way you describe, using the voice trigger to schedule the next trial while still recording the response. CheckVocal can also give you a spectrogram view as well as the waveform, which is very useful, and hand-coding with CheckVocal’s queuing system is MUCH quicker than opening individual responses in Audacity one by one. It is often good to have the original responses available in case reviewers ask about things you didn’t think of at the time of submission, e.g. “were the vowel durations in Condition B responses longer?” or “what kind of errors did people make?”

  2. I really need to know about this hardware.

  3. Thanks for this informative post! For something so widely used, there is not enough information available on voice keys, and this post is a welcome addition.
    A short note: CheckVocal is not only useful for checking preprocessed sound files; it also provides an easy-to-use, fast interface for manually finding voice onsets in a large data set. I used it with a large data set and was able to process about 500 trials an hour.
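Following on from the first comment’s point about mistriggering: if you have both online voice-key RTs and hand-coded onsets for the same trials, a few lines of Python can pick out the trials worth re-checking. This is a minimal sketch. The 20 ms tolerance and the suggested causes in the messages are guesses, not established cut-offs, and the dictionary-of-RTs input format is an assumption.

```python
def flag_mistriggers(voice_key_rts, hand_coded_rts, tolerance_ms=20.0):
    """Compare online voice-key RTs with hand-coded onsets, trial by trial.

    Both arguments map a trial ID to an RT in ms (None = no onset found).
    `tolerance_ms` is an arbitrary assumed cut-off. Returns a sorted list
    of (trial_id, reason) pairs worth re-checking by ear or by eye.
    """
    flagged = []
    for trial in sorted(set(voice_key_rts) | set(hand_coded_rts)):
        vk = voice_key_rts.get(trial)
        hc = hand_coded_rts.get(trial)
        if vk is None and hc is None:
            continue  # no response either way: nothing to compare
        if vk is None:
            flagged.append((trial, "voice key missed the response"))
        elif hc is None:
            flagged.append((trial, "voice key fired but no speech was coded"))
        elif vk < hc - tolerance_ms:
            flagged.append((trial, "voice key fired early (breath, click, cough?)"))
        elif vk > hc + tolerance_ms:
            flagged.append((trial, "voice key fired late (quiet or soft onset?)"))
    return flagged
```

Only the flagged trials then need opening in CheckVocal or Audacity, which is the “check just some of your files” workflow described in the first comment.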
