Some notes on the use of voice keys in reaction time experiments
Somebody asked me about using a voice key device the other day, and I realised it’s not something I’d ever addressed on here. A voice key is often used in experiments where you need to obtain a vocal response time, for instance in a vocal Stroop experiment, or a picture-naming task.
There are broadly two ways of doing this. The first is easy, but expensive, and not very good. The second is time-consuming, but cheap and very reliable.
The first method involves using a bit of dedicated hardware, essentially a microphone pre-amp, which detects the onset of a vocal response, and sends out a signal when it occurs. The Cedrus SV-1 device pictured above is a good example. This is easy, because you have all your vocal reaction times logged for you, but not totally reliable because you have to pre-set a loudness threshold for the box, and it might miss some responses, if the person just talks quietly, or there’s some unexpected background noise. It should be relatively simple to get whatever stimulus software you’re running to recognise the input from the device and log it as a response.
The other way is very simple to set up, in that you just plug a microphone into the sound card of your stimulus computer and record the vocal responses on each trial as .wav files. Stimulus software like PsychoPy can do this very easily. The downside to this is that you then have to take those sound files and examine them in some way in order to get the reaction time data out – this could mean literally examining the waveforms for each trial in a sound editor (such as Audacity), putting markers on the start of the speech manually, and calculating vocal RTs relative to the start of the file/trial. This is very reliable and precise, but obviously reasonably time-consuming. Manually putting markers on sound files is still the ‘gold standard’ for voice-onset reaction times. Ideally, you should get someone else to do this for you, so they’ll be ‘blind’ to which trials are which, and unbiased in calculating the reaction times. You can also possibly automate the process using a bit of software called SayWhen (paper here).
Which method is best depends largely on the number of trials you have in your experiment. The second method is definitely superior (and cheaper, easier to set up) but if you have eleventy-billion trials in your experiment, manually examining them all post hoc may not be very practical, and a more automatic solution might be worthwhile. If you were really clever you could try and do both at once – have two computers set up, the first running the stimulus program, and the second recording the voice responses, but also running a bit of code that signals the first computer when it detects a voice onset. Might be tricky to set up and get working, but once it was, you’d have all your RTs logged automatically on the first computer, plus the .wav files recorded on the second for post hoc analysis/data-cleaning/error-checking etc. if necessary.
Two researchers have pointed out in the comments, that a system for automatically generating response times from sound-files already exists, called CheckVocal. It seems to be designed to work with the DMDX experimental programming system (free software that uses Microsoft’s DirectX system to present stimuli). Not sure if it’ll work with other systems or not, but worth looking at… Have also added the information to my Links page.