A new paper just out in PLoS One (thanks to Neuroskeptic for pointing it out on Twitter) shows the results of some tests conducted on three common pieces of software used in psychology and neuroscience research: DMDX, E-Prime, and PsychoPy. The paper is open-access, and you can read it here. The aim was to test the timing accuracy of the software when presenting simple visual stimuli (alternating black and white screens). As I’ve written about before, accurate timing in experiments can often be of great importance and is by no means guaranteed, so it’s good to see some objective work on this, conducted using modern hardware and software.
The authors followed a pretty standard procedure for this kind of thing, which is to use a piece of external recording equipment (in this case the Black Box Tool Kit) connected to a photo-diode. The photodiode is placed on the screen of the computer being tested and detects every black-to-white or white-to-black transition, and the data is logged by the BBTK device. This provides objective information about when exactly the visual stimulus was displayed on the screen (as opposed to when the software running the display thinks it was displayed on the screen, which is not necessarily the same thing). In this paper the authors tested this flickering black/white stimulus at various speeds from 1000ms, down to 16.6667ms (one screen refresh or ‘tick’ on a standard 60Hz monitor).
It’s a nice little paper and I would urge anyone interested to read it in full; the introduction has a really nice review of some of the issues involved, particularly about different display hardware. At first glance the results are a little disappointing though, particularly if (like me) you’re a fan of PsychoPy. All three bits of software were highly accurate at the slower black/white switching times, but as the stimulus got faster (and was therefore more demanding on the hardware) errors began to creep in. At times less than 100ms, errors (in the form of dropped, or added frames) begin to creep in to E-Prime and PsychoPy; DMDX on the other hand just keeps on truckin’, and is highly accurate even at the fastest-switch conditions. PsychoPy is particularly poor under the most demanding conditions, with only about a third of ‘trials’ being presented ‘correctly’ under the fastest condition.
Why does this happen? The authors suggest that DMDX is so accurate because it uses Microsoft’s DirectX graphics libraries, which are highly optimised for accurate performance on Windows. Likewise, E-Prime uses other features of Windows to optimise its timing. PsychoPy on the other hand is platform-agnostic (it will run natively on Windows, Mac OS X, and various flavours of Linux) and therefore uses a fairly high-level language (Python). In simple terms, PsychoPy can’t get quite as close to the hardware as the others, because it’s designed to work on any operating system; there are more layers of software abstraction between a PsychoPy script and the hardware.
Is this a problem? Yes, and no. Because of the way that the PsychoPy ‘coder’ interface is designed, advanced users who require highly accurate timing have the opportunity to optimise their code, based on the hardware that they happen to be using. There’s no reason why a Python script couldn’t take advantage of the timing features in Windows that make E-Prime accurate too – they’re just not included in PsychoPy by default, because it’s designed to work on unix-based systems too. For most applications, dropping/adding a couple of frames in a 100ms stimulus presentation is nothing in particular to worry about anyway; certainly not for the applications I mostly use it for (e.g. fMRI experiments where, of course, timing is important in many ways, but the variability in the haemodynamic response function tends to render a lot of experimental precision somewhat moot). The authors of this paper agree, and conclude that all three systems are suitable for the majority of experimental paradigms used in cognitive research. For me, the benefits of PsychoPy (cross-platform, free licensing, user-friendly interface) far outweigh the (potential) compromise in accuracy. I haven’t noticed any timing issues with PsychoPy under general usage, but I’ve never had a need to push it as hard as these authors did for their testing purposes. It’s worth noting that all the testing in this paper was done using a single hardware platform; possibly other hardware would give very different results.
Those who are into doing very accurate experiments with very short display times (e.g. research on sub-conscious priming, or visual psychophysics) tend to use pretty specialised and highly-optimised hardware and software, anyway. If I ever had a need for such accuracy, I’d definitely undertake some extensive testing of the kind that these authors performed, no matter what hardware/software I ended up using. As always, the really important thing is to be aware of the potential issues with your experimental set-up, and do the required testing before collecting your data. Never take anything for granted; careful testing, piloting, examination of log files etc. is a potential life-saver.
We have the tools, we have the talent!
– Winston Zeddemore
In the last post I talked about timing of experiments in general, and mentioned that timing of responses is critical for success in a reaction time-type experiment. This topic will discuss some of the options for hardware that’s available for collecting responses.
The simplest solution is just to use the human-interface hardware which you’ve already got i.e. get your participants to click a mouse button, or a key on the keyboard as their response. However, there are several reasons why this will generally be undesirable. Most mice and keyboards on modern computers connect via USB, and this introduces a slight lag. The computer ‘looks at’ or ‘polls’ its USB-connected peripherals at a standard rate of 125Hz (meaning 125 times per second, or every 8 milliseconds). This means that if you make a response, there may be a (variable) lag of anywhere between 0 and 8 ms between the response and the computer actually ‘seeing’ it. USB-keyboards have a similar problem. In addition, mice (and keyboards) have a lot of internal circuitry which can also introduce timing lags of variable durations. This paper (Plant et al., 2003) presents the results of some bench-testing of a set of mice. The best mouse they tested had a minimum button-down latency of just 0.4ms, whereas the worst one showed a minimum lag of 48ms! Running timing-critical experiments with the latter mouse (where the effect you might be chasing could be in the range of 20-30ms) would clearly be disastrous. Read the rest of this entry
There is a tide in the affairs of men.
Which, taken at the flood, leads on to fortune;
Julius Caeser, Act 4, scene III
This will be the first in a (probably fairly lengthy) series of posts about how to design, program and run a successful psychology experiment. For this initial post I want to go over some basics about how computers work, and what that means for running successful experiments. Of course there are many different kinds of experiments that it’s possible to run, but for the purposes of the present discussion I’m going to use as an example a canonical kind of cognitive experiment, where the dependent variable is reaction time, measured using a button-press. This kind of experiment is still widely-used, in paradigms like the dot-probe attentional task, and the Implicit Association Test.
The first thing you need to understand is that modern operating systems are very, very complicated. When you boot up Windows (for example) all you see at first is a nice, clean, uncluttered desktop, but examining the Windows Task Manager reveals a whole host of ‘background’ processes which are buzzing away invisibly all the time. These might be search indexers, printer services, network connections, anti-virus software, firewalls, and a whole mess of other stuff. If you then open a few different applications (a web browser with a few tabs open, Microsoft Word, an email program, some instant messaging application) the computer has a few more processes to juggle, as well as all the background stuff. This is normally fine, as long as you have enough RAM to handle all the requirements of these different processes – modern OSs are multi-tasking, meaning they can handle having lots of things going on at once. Read the rest of this entry