Category Archives: Experimental techniques
A few days ago I reported on a new paper that tested the timing accuracy of experimental software packages. The paper suggested that PsychoPy had some issues with presenting very fast stimuli that only lasted one (or a few) screen refreshes.
However, the creator of PsychoPy (Jon Peirce, of Nottingham University) has already responded to the data in the paper. The authors of the paper made all the data and (importantly) the program scripts used to generate it available on the Open Science Framework site. As a result it was possible for Jon to examine the scripts and see what the issue was. It turns out the authors used an older version of PsychoPy, where the ability to set a stimulus duration in frames (or screen refreshes) wasn’t available. Up-to-date versions have this feature, and as a result are now much more reliable. Those who need the full story should read Jon’s comment (and the response by the authors) on the PLoS comments page for the article.
So, great news if, like me, you’re a fan of PsychoPy. However, there’s a wider picture that’s revealed by this little episode, and a very interesting one I think. As a result of the authors posting their code up to the OSF website, and the fact that PLoS One allows comments to be posted on its articles, the issue could be readily identified and clarified, in a completely open and public manner, and within a matter of days. This is one of the best examples I’ve seen of the power of an open approach to science, and the ability of post-publication review to have a real impact.
Interesting times, fo’ shiz.
A new paper just out in PLoS One (thanks to Neuroskeptic for pointing it out on Twitter) shows the results of some tests conducted on three common pieces of software used in psychology and neuroscience research: DMDX, E-Prime, and PsychoPy. The paper is open-access, and you can read it here. The aim was to test the timing accuracy of the software when presenting simple visual stimuli (alternating black and white screens). As I’ve written about before, accurate timing in experiments can often be of great importance and is by no means guaranteed, so it’s good to see some objective work on this, conducted using modern hardware and software.
The authors followed a pretty standard procedure for this kind of thing, which is to use a piece of external recording equipment (in this case the Black Box Tool Kit) connected to a photo-diode. The photodiode is placed on the screen of the computer being tested and detects every black-to-white or white-to-black transition, and the data is logged by the BBTK device. This provides objective information about when exactly the visual stimulus was displayed on the screen (as opposed to when the software running the display thinks it was displayed on the screen, which is not necessarily the same thing). In this paper the authors tested this flickering black/white stimulus at various speeds from 1000ms, down to 16.6667ms (one screen refresh or ‘tick’ on a standard 60Hz monitor).
It’s a nice little paper and I would urge anyone interested to read it in full; the introduction has a really nice review of some of the issues involved, particularly about different display hardware. At first glance the results are a little disappointing though, particularly if (like me) you’re a fan of PsychoPy. All three bits of software were highly accurate at the slower black/white switching times, but as the stimulus got faster (and was therefore more demanding on the hardware) errors began to creep in. At times less than 100ms, errors (in the form of dropped, or added frames) begin to creep in to E-Prime and PsychoPy; DMDX on the other hand just keeps on truckin’, and is highly accurate even at the fastest-switch conditions. PsychoPy is particularly poor under the most demanding conditions, with only about a third of ‘trials’ being presented ‘correctly’ under the fastest condition.
Why does this happen? The authors suggest that DMDX is so accurate because it uses Microsoft’s DirectX graphics libraries, which are highly optimised for accurate performance on Windows. Likewise, E-Prime uses other features of Windows to optimise its timing. PsychoPy on the other hand is platform-agnostic (it will run natively on Windows, Mac OS X, and various flavours of Linux) and therefore uses a fairly high-level language (Python). In simple terms, PsychoPy can’t get quite as close to the hardware as the others, because it’s designed to work on any operating system; there are more layers of software abstraction between a PsychoPy script and the hardware.
Is this a problem? Yes, and no. Because of the way that the PsychoPy ‘coder’ interface is designed, advanced users who require highly accurate timing have the opportunity to optimise their code, based on the hardware that they happen to be using. There’s no reason why a Python script couldn’t take advantage of the timing features in Windows that make E-Prime accurate too – they’re just not included in PsychoPy by default, because it’s designed to work on unix-based systems too. For most applications, dropping/adding a couple of frames in a 100ms stimulus presentation is nothing in particular to worry about anyway; certainly not for the applications I mostly use it for (e.g. fMRI experiments where, of course, timing is important in many ways, but the variability in the haemodynamic response function tends to render a lot of experimental precision somewhat moot). The authors of this paper agree, and conclude that all three systems are suitable for the majority of experimental paradigms used in cognitive research. For me, the benefits of PsychoPy (cross-platform, free licensing, user-friendly interface) far outweigh the (potential) compromise in accuracy. I haven’t noticed any timing issues with PsychoPy under general usage, but I’ve never had a need to push it as hard as these authors did for their testing purposes. It’s worth noting that all the testing in this paper was done using a single hardware platform; possibly other hardware would give very different results.
Those who are into doing very accurate experiments with very short display times (e.g. research on sub-conscious priming, or visual psychophysics) tend to use pretty specialised and highly-optimised hardware and software, anyway. If I ever had a need for such accuracy, I’d definitely undertake some extensive testing of the kind that these authors performed, no matter what hardware/software I ended up using. As always, the really important thing is to be aware of the potential issues with your experimental set-up, and do the required testing before collecting your data. Never take anything for granted; careful testing, piloting, examination of log files etc. is a potential life-saver.
Regular readers will know that I’m a big fan of PsychoPy, which (for non-regular readers; *tsk*) is a piece of free, open-source software for designing and programming experiments, built on the Python language. I’ve been using it a lot recently, and I’m happy to report my initial ardour for it is still lambently undimmed.
The PsychoPy ‘builder’ interface (a generally brilliant, friendly, GUI front-end) does have one pretty substantial drawback though; it doesn’t support conditional branching. In programming logic, a ‘branch’ is a point in a program which causes the computer to start executing a different set of instructions. A ‘conditional branch’ is where the computer decides what to do out of two of more alternatives (i.e. which branch to follow) based on some value or ‘condition’. Essentially the program says, ‘if A is true: do X, otherwise (or ‘else’ in programming jargon) if B is true; do Y’. One common use of conditional branching in psychology experiments is to repeat trials that the subject got incorrect; for instance, one might want one’s subjects to achieve 90% correct on a block of trials before they continue to the next one, so the program would have something in it which said ‘if (correct trials > 90%); then continue to the next block, else if (correct trials < 90%); repeat the incorrect trials’.
At the bottom of the PsychoPy builder is a time-line graphic (the ‘flow panel’) which shows the parts of the experiment:
The experiment proceeds from left to right, and each part of the flow panel is executed in turn. The loops around parts of the flow panel indicate that the bits inside the loops are run multiple times (i.e. they’re the trial blocks). This is an extremely powerful interface, but there’s no option to ‘skip’ part of the flow diagram – everything is run in the order in which it appears from left-to-right.
This is a slight issue for programming fMRI experiments that use block-designs. In block-design experiments, typically two (or more) blocks of about 15-20 seconds are alternated. They might be a ‘rest’ block (no stimuli) alternated with a visual stimulus, or two different kinds of stimuli, say, household objects vs. faces. For a simple two-condition-alternating experiment one could just produce routines for two conditions, and throw a loop around them for as many repeats as needed. The problem arises when there are more than two block-types in your experiment, and you want to randomise them (i.e. have a sequence which goes ABCBCACAB…etc.). There’s no easy way of doing this in the builder. In an experiment lasting 10 minutes one might have 40 15-second blocks, and the only way to produce the (psuedo-random) sequence you want is with 40 separate elements in the flow panel that all executed one-by-one (with no loops). Building such a task would be very tedious, and more importantly, crashingly inelegant. Furthermore, you probably wouldn’t want to use the same sequence for every participant, so you’d have to laboriously build different versions with different sequences of the blocks. There’s a good reason for why this kind of functionality hasn’t been implemented; it would make the builder interface much more complicated and the PsychoPy developers are (rightly) concerned with keeping the builder as clean and simple as possible.
Fortunately, there’s an easy little hack which was actually suggested by Jon Peirce (and others) on the PsychoPy users forum. You can in fact get PsychoPy to ‘skip’ routines in the flow panel, by the use of loops, and a tiny bit of coding magic. I thought it was worth elaborating the solution on here somewhat, and I even created a simple little demo program which you can download and peruse/modify.
So, this is how it works. I’ve set up my flow panel like this:
So, there are two blocks, each of which have their own loop, a ‘blockSelect’ routine, and a ‘blockSelectLoop’ enclosing the whole thing. The two blocks can contain any kind of (different) stimulus element; one could have pictures, and one could have sounds, for instance – I’ve just put some simple text in each one for demo purposes. The two block-level loops have no condition files associated with them, but in the ‘nReps’ field of their properties box I’ve put a variable ‘nRepsblock1’ for block1 and ‘nRepsblock2’ for block2. This tells the program how many times to go around that loop. The values of these variables are set by the blockSelect routine which contains a code element, which looks like this:
The full code in the ‘Begin Routine’ box above is this:
if selectBlock==1: nRepsblock1=1 nRepsblock2=0 elif selectBlock==2: nRepsblock1=0 nRepsblock2=1
This is a conditional branching statement which says ‘if selectBlock=1, do X, else if selectBlock=2, do Y’. The variable ‘selectBlock’ is derived from the conditions file (an excel workbook) for the blockSelectLoop, which is very simple and looks like this:
So, at the beginning of the experiment I define the two variables for the number of repetitions for the two blocks, then on every go around the big blockSelectLoop, the code in the blockSelect routine sets one of the number of repetitions of each of the small block-level loops to 0, and the other to 1. Setting the number of repetitions for a loop to 0 basically means ‘skip that loop’, so one is always skipped, and the other one is always executed. The blockSelectLoop sequentially executes the conditions in the excel file, so the upshot is that this program runs block1, then block2, then block1 again, then block2 again. Now all I have to do if I want to create a different sequence of blocks is to edit the column in the conditions excel file, to produce any kind of sequence I want.
Hopefully it should be clear how to extend this very simple example to use three (or more) block/trial types. I’ve actually used this technique to program a rapid event-related experiment based on this paper, that includes about 10 different trial types, randomly presented, and it works well. I also hope that this little program is a good example of what can be achieved by using code-snippets in the builder interface; this is a tremendously powerful feature, and really extends the capabilities of the builder well beyond what’s achievable through the GUI. It’s a really good halfway step between relying completely on the builder GUI and the scaryness of working with ‘raw’ python code in the coder interface too.
If you want to download this code, run it yourself and poke it with a stick a little bit, I’ve made it available to download as a zip file with everything you need here. Annoyingly, WordPress doesn’t allow the upload of zip files to its blogs, so I had to change the file extension to .pdf; just download (right-click the link and ‘Save link as…’) and then rename the .pdf bit to .zip and it should work fine. Of course, you’ll also need to have PsychoPy installed as well. Your mileage may vary, any disasters that occur as a result of you using this program are your own fault, etc. etc. blah blah.
Happy coding! TTFN.
Since I’ve switched to using PsychoPy for programming my behavioural and fMRI experiments (and if you spend time coding experiments, I strongly suggest you check it out too, it’s brilliant) I’ve been slowly getting up to speed with the Python programming language and syntax. Even though the PsychoPy GUI ‘Builder’ interface is very powerful and user-friendly, one inevitably needs to start learning to use a bit of code in order to get the best out of the system.
Before, when people occasionally asked me questions like “What programming language should I learn?” I used to give a somewhat vague answer, and say that it depended largely on what exactly they wanted to achieve. Nowadays, I’m happy to recommend that people learn Python, for practically any purpose. There are an incredible number of libraries available that enable you to do almost anything with it, and it’s flexible and powerful enough to fit a wide variety of use cases. Many people are now using it as a free alternative to Matlab, and even using it for ‘standard’ statistical analyses too. The syntax is incredibly straight-forward and sensible; even if an individual then goes on to use a different language, I think Python is a great place to start with programming for a novice. Python seems to have been rapidly adopted by scientists, and there are some terrific resources out there for learning Python in general, and its scientific applications in particular.
For those getting started there are a number of good introductory resources. This ‘Crash Course in Python for Scientists’ is a great and fairly brief introduction which starts from first principles and doesn’t assume any prior knowledge. ‘A Non-Programmers Tutorial for Python 2.6′ is similarly introductory, but covers a bit more material. ‘Learn Python the Hard Way’ is also a well-regarded introductory course which is free to view online, but has a paid option ($29.95) which gives you access to additional PDFs and video material. The ‘official’ Python documentation is also pretty useful, and very comprehensive, and starts off at a basic level. Yet another good option is Google’s Python Classes.
For those who prefer a more interactive experience, CodeAcademy has a fantastic set of interactive tutorials which guide you through from the complete beginning, up to fairly advanced topics. PythonMonk and TryPython.org also have similar systems, and all three are completely free to access – well worth checking out.
For Neuroimagers, there are some interesting Python tools out there, or currently under development. The NIPY (Neuroimaging in Python) community site is well worth a browse. Most interestingly (to me, anyway) is the nipype package, which is a tool that provides a standard interface and workflow for several fMRI analysis packages (FSL, SPM and FreeSurfer) and facilitates interaction between them – very cool. fMRI people might also be very interested in the PyMVPA project which has implemented various Multivariate Pattern Analysis algorithms.
People who want to do some 3D programming for game-like interfaces or experimental tasks will also want to check out VPython (“3D Programming for ordinary mortals”!).
Finally, those readers who are invested in the Apple ecosystem and own an iPhone/iPad will definitely want to check out Pythonista – a full featured development environment for iOS, with a lot of cool features, including exporting directly to XCode (thanks to @aechase for pointing this one out on Twitter). There looks to be a similar app called QPython for Android, though it’s probably not as full-featured; if you’re an Android user, you’re probably fairly used to dealing with that kind of disappointment though. ;o)
Anything I’ve missed? Let me know in the comments and I’ll update the post.
Somebody asked me about using a voice key device the other day, and I realised it’s not something I’d ever addressed on here. A voice key is often used in experiments where you need to obtain a vocal response time, for instance in a vocal Stroop experiment, or a picture-naming task.
There are broadly two ways of doing this. The first is easy, but expensive, and not very good. The second is time-consuming, but cheap and very reliable.
The first method involves using a bit of dedicated hardware, essentially a microphone pre-amp, which detects the onset of a vocal response, and sends out a signal when it occurs. The Cedrus SV-1 device pictured above is a good example. This is easy, because you have all your vocal reaction times logged for you, but not totally reliable because you have to pre-set a loudness threshold for the box, and it might miss some responses, if the person just talks quietly, or there’s some unexpected background noise. It should be relatively simple to get whatever stimulus software you’re running to recognise the input from the device and log it as a response.
The other way is very simple to set up, in that you just plug a microphone into the sound card of your stimulus computer and record the vocal responses on each trial as .wav files. Stimulus software like PsychoPy can do this very easily. The downside to this is that you then have to take those sound files and examine them in some way in order to get the reaction time data out – this could mean literally examining the waveforms for each trial in a sound editor (such as Audacity), putting markers on the start of the speech manually, and calculating vocal RTs relative to the start of the file/trial. This is very reliable and precise, but obviously reasonably time-consuming. Manually putting markers on sound files is still the ‘gold standard’ for voice-onset reaction times. Ideally, you should get someone else to do this for you, so they’ll be ‘blind’ to which trials are which, and unbiased in calculating the reaction times. You can also possibly automate the process using a bit of software called SayWhen (paper here).
Which method is best depends largely on the number of trials you have in your experiment. The second method is definitely superior (and cheaper, easier to set up) but if you have eleventy-billion trials in your experiment, manually examining them all post hoc may not be very practical, and a more automatic solution might be worthwhile. If you were really clever you could try and do both at once – have two computers set up, the first running the stimulus program, and the second recording the voice responses, but also running a bit of code that signals the first computer when it detects a voice onset. Might be tricky to set up and get working, but once it was, you’d have all your RTs logged automatically on the first computer, plus the .wav files recorded on the second for post hoc analysis/data-cleaning/error-checking etc. if necessary.
Two researchers have pointed out in the comments, that a system for automatically generating response times from sound-files already exists, called CheckVocal. It seems to be designed to work with the DMDX experimental programming system (free software that uses Microsoft’s DirectX system to present stimuli). Not sure if it’ll work with other systems or not, but worth looking at… Have also added the information to my Links page.
It’s always exciting when you find a new piece of cool software to play with, and even more so when what you’ve found is totally free, open-source, and available on all platforms. So it is with MakeHuman – an utterly awesome bit of kit. I wrote a piece before about FaceGen, which is also pretty cool, but MakeHuman takes it to the next level, by modelling all kinds of body characteristics as well as faces, and, of course, doing it all for free.
I’ve just downloaded it and played with it for a few minutes, but I’m already impressed by the range of options available. Through a very simple slider and radio-button based interface you have very fine control over all kinds of variables, including gender, weight, age, height, and many more, with endless fine tweak-ability possible of body and face if you dig through the options. There are also basic libraries of clothes and poses included. Here’s a well, a human, I made in just a couple of minutes:
And here’s a close-up of the face, after I added some hair and gave him a nasty expression:
Pretty cool indeed. This could potentially be a massively useful tool for people interested in face/body perception – using this, one could generate a large number of highly-controlled experimental stimuli that just differ in one aspect (say, weight, or race… whatever) very easily and quickly. Download it and have a play around!
I’ve just come across two outstanding tutorial videos over on xiph.org – an open-source organisation dedicated to developing multimedia protocols and tools. So, the first one covers the fundamental principles of digital sampling for audio and video, and discusses sampling rates, bit depth and lots of other fun stuff – if you’ve ever wondered what a 16-bit, 128kbps mp3 is, this is for you.
The second one focusses on audio and gets on to some more advanced topics, about how audio behaves in the real world.
They’re both fairly long (30 mins and 23 mins respectively) but well worth watching. If you’re just getting started with digital audio and/or video editing and production, these could be really useful.
A quick post to point you towards a great website with a lot of really cool content (if you’re into that kind of thing, which if you’re reading this blog, then I assume you probably are… anyway, I digress; I apologise, it was my lab’s Christmas party last night and I’m in a somewhat rambling mood. Anyway, back to the point).
So, the website is called cogsci.nl, and is run by a post-doc at the University of Aix-Marseille called Sebastiaan Mathôt. It’s notable in that it’s the homepage of OpenSesame – a very nice-looking, Python-based graphical experiment builder that I’ve mentioned before on these very pages. There’s a lot of other cool stuff on the site though, including more software (featuring a really cool online tool for instantly creating Gabor patch stimuli), a list of links to stimulus sets, and a selection of really-cool optical illusions. Really worth spending 20 minutes of your time poking around a little and seeing what’s there.
I’ll leave you with a video of Sebastiaan demonstrating an experimental program, written in his OpenSesame system, running on a Google Nexus 7 Tablet (using Ubuntu linux as an OS). The future! It’s here!
Reaction time tasks have been a mainstay of psychology since the technology to accurately time and record such responses became widely available in the 70s. RT tasks have been applied in a bewildering array of research areas and (when used properly) can provide information about memory, attention, emotion and even social behaviour.
This post will focus on the best way to handle such data, which is perhaps not as straightforward as might be assumed. Despite the title, I’m not really going to cover the actual analysis; there’s a lot of literature already out there about what particular statistical tests to use, and in any case, general advice of that kind is not much use as it depends largely on your experimental design. What I’m intending to focus on are the techniques the stats books don’t normally cover – data cleaning, formatting and transformation techniques which are essential to know about if you’re going to get the best out of your data-set.
For the purposes of this discussion I’ll use a simple made-up data-set, like this:
This table is formatted in the way that a lot of common psychology software (i.e. PsychoPy, Inquisit, E-Prime) records response data. From left-to-right, you can see we have three participants’ data here (1, 2, and 3 in column A), four trials for each subject (column B), two experimental conditions (C; presented in a random order), and then the actual reaction times (column D) and then a final column which codes whether the response was correct or not (1=correct, 0= error).
I created the data table using Microsoft Excel, and will do the processing with it too, however I really want to stress that Excel is definitely not the best way of doing this. It suits the present purpose because I’m doing this ‘by hand’ for the purposes of illustration. With a real data-set which might be thousands of lines long, these procedures would be much more easily accomplished by using the functions in your statistics weapon-of-choice (SPSS, R, Matlab, whatever). Needless to say, if you regularly have to deal with RT data it’s well worth putting the time into writing some general-purpose code which can be tweaked and re-used for subsequent data sets.
The procedures we’re going to follow with these data are:
- Remove reaction times on error trials
- Do some basic data-cleaning (removal of outlying data)
- Re-format the data for analysis
1. Remove reaction times on error trials
As a general rule, reaction times from trials on which the participant made an error should not be used in subsequent analysis. The exceptions to this rule are some particular tasks where the error trials might be of particular interest (Go/No-Go tasks, and some others). Generally though, RTs from error trials are thought to be unreliable, since there’s an additional component process operating on error trials (i.e. whatever it was that produced the error). The easiest way of accomplishing this is to insert an additional column, and code all trials with errors as ‘0’, and all trials without an error as the original reaction time. This can be a simple IF/ELSE statement of the form:
IF (error=1) RT=RT,
In this excel-based illustration I entered the formula: =IF(E2=1, D2,0) in cell F2, and then copied it down the rest of the column to apply to all the subsequent rows. Here’s the new data sheet:
2. Data-cleaning – Removal of outlying data
The whole topic of removing outliers from reaction time data is a fairly involved one, and difficult to illustrate with the simple example I’m using here. However, It’s a very important procedure, and something I’m going to return to in a later post, using a ‘real’ data-set. From a theoretical perspective, it’s usually desirable to remove both short and long outliers. Most people cannot push a button in response to, say, a visual stimulus in less than about 300ms, so it can be safely assumed that short RTs of, say, less than 250ms were probably initiated before the stimulus; that is, they were anticipatory. Long outliers are somewhat trickier conceptually – some tasks that involve a lot of effortful cognitive processing before a response (say a task involving doing difficult arithmetic) might have reaction times of several seconds, or even longer. However, very broadly, the mean RT for most ‘simple’ tasks tends to be around 400-700ms; this means that RTs longer than say, 1000ms might reflect some other kind of process. For instance, it might reflect the fact that the participant was bored, became distracted, temporarily forgot which button to push, etc. For these reasons then, it’s generally thought to be desirable to remove outlying reaction times from further analysis.
One (fairly simple-minded, but definitely valid) approach to removing outliers then, is to simply remove all values that fall below 250ms, or above 1000ms. This is what I’ve done in the example data-sheet in columns G and H, using simple IF statements of a similar form used for removal of the error trials:
You can see that two short RTs and one long one have been removed and recoded as 0.
3. Re-format the data for analysis
The structure that most psychology experimental systems use for their data logging (similar to the one we’ve been using as an illustration) is not really appropriate for direct import into standard stats packages like SPSS. SPSS requires that one row on the data sheet is used for each participant, whereas we have one row-per-trial. In order to get our data in the right format we first need to sort the data, first by subject (column A), and then by condition (column C). Doing this sort procedure ensures that we know which entries in the final column are which – the first two rows of each subject’s data are always condition 1, and the second two are always condition 2:
We can then restructure the data from the final column, like so:
I’ve done this ‘by hand’ in Excel by cutting-and-pasting the values for each subject into a new sheet and using the paste-special > transpose function, however this is a stupid way of doing it – the ‘restructure’ functions in SPSS can accomplish this kind of thing very nicely. So, our condition 1 values are now in columns B:C and condition 2 values are in columns D:E. All that remains to do now would be to calculate summary statistics (means, variance, standard deviations, whatever; taking care that our 0 values are coded as missing, and not included in the calculations) for each set of columns (i.e. each condition) and perform the inferential tests of your choice (in this case, with only two within-subject conditions, it would be a paired t-test).
Next time, I’ll use a set of real reaction time data and do these procedures (and others) using SPSS, in order to illustrate some more sophisticated ways of handling outliers than just the simple high and low cutoffs detailed above.