I’ve mentioned OpenSesame briefly on here before, but for those of you who weren’t keeping up, it’s a pretty awesome, free psychology experiment-developing application, built using the Python programming language, and it has a lot in common with PsychoPy (which is also awesome).
The recently-released new version of OpenSesame has just taken an important step, in that it now supports the Android mobile operating system, meaning that it can run natively on Android tablets and smartphones. As far as I’m aware, this is the first time that a psychology-experimental application has been compiled (and released to the masses) for a mobile OS.
This is cool for lots of reasons. It’s an interesting technical achievement; Android is a very different implementation to a desktop OS, being focused heavily on touch interfaces. Such interfaces are now ubiquitous, and are much more accessible, in the sense that people who may struggle with a traditional mouse/keyboard can use them relatively easily. Running psychology experiments on touch-tablets may enable the study of populations (e.g., the very young, very old, or various patient groups) that would be very difficult with a more ‘traditional’ system. Similarly, conducting ‘field’ studies might be much more effective; I can imagine handing a participant a tablet for them to complete some kind of task in the street, or in a shopping mall, for instance. Also, it may open up the possibility of using the variety of sensors in modern mobile devices (light, proximity, accelerometers, magnetometers) in interesting and creative ways. Finally, the hardware is relatively cheap, and (of course) portable.
I’m itching to try this out, but unfortunately don’t have an Android tablet. I love my iPad mini for lots of reasons, but the more restricted nature of Apple’s OS means that it’s unlikely we’ll see a similar system on iOS anytime soon.
So, very exciting times. Here’s a brief demo video of OpenSesame running on a Google Nexus 7 tablet (in the demo the tablet is actually running a version of Ubuntu Linux, but with the new version of OpenSesame it shouldn’t be necessary to replace the Android OS). Let me know in the comments if you have any experience with tablet-experiments, or if you can think of any other creative ways they could be used.
Reaction time tasks have been a mainstay of psychology since the technology to accurately time and record such responses became widely available in the 70s. RT tasks have been applied in a bewildering array of research areas and (when used properly) can provide information about memory, attention, emotion and even social behaviour.
This post will focus on the best way to handle such data, which is perhaps not as straightforward as might be assumed. Despite the title, I’m not really going to cover the actual analysis; there’s a lot of literature already out there about what particular statistical tests to use, and in any case, general advice of that kind is not much use as it depends largely on your experimental design. What I’m intending to focus on are the techniques the stats books don’t normally cover – data cleaning, formatting and transformation techniques which are essential to know about if you’re going to get the best out of your data-set.
For the purposes of this discussion I’ll use a simple made-up data-set, like this:
This table is formatted in the way that a lot of common psychology software (i.e. PsychoPy, Inquisit, E-Prime) records response data. From left-to-right, you can see we have three participants’ data here (1, 2, and 3 in column A), four trials for each subject (column B), two experimental conditions (C; presented in a random order), and then the actual reaction times (column D) and then a final column which codes whether the response was correct or not (1=correct, 0= error).
I created the data table using Microsoft Excel, and will do the processing with it too, however I really want to stress that Excel is definitely not the best way of doing this. It suits the present purpose because I’m doing this ‘by hand’ for the purposes of illustration. With a real data-set which might be thousands of lines long, these procedures would be much more easily accomplished by using the functions in your statistics weapon-of-choice (SPSS, R, Matlab, whatever). Needless to say, if you regularly have to deal with RT data it’s well worth putting the time into writing some general-purpose code which can be tweaked and re-used for subsequent data sets.
The procedures we’re going to follow with these data are:
- Remove reaction times on error trials
- Do some basic data-cleaning (removal of outlying data)
- Re-format the data for analysis
1. Remove reaction times on error trials
As a general rule, reaction times from trials on which the participant made an error should not be used in subsequent analysis. The exceptions to this rule are some particular tasks where the error trials might be of particular interest (Go/No-Go tasks, and some others). Generally though, RTs from error trials are thought to be unreliable, since there’s an additional component process operating on error trials (i.e. whatever it was that produced the error). The easiest way of accomplishing this is to insert an additional column, and code all trials with errors as ‘0’, and all trials without an error as the original reaction time. This can be a simple IF/ELSE statement of the form:
IF (error=1) RT=RT,
In this excel-based illustration I entered the formula: =IF(E2=1, D2,0) in cell F2, and then copied it down the rest of the column to apply to all the subsequent rows. Here’s the new data sheet:
2. Data-cleaning – Removal of outlying data
The whole topic of removing outliers from reaction time data is a fairly involved one, and difficult to illustrate with the simple example I’m using here. However, It’s a very important procedure, and something I’m going to return to in a later post, using a ‘real’ data-set. From a theoretical perspective, it’s usually desirable to remove both short and long outliers. Most people cannot push a button in response to, say, a visual stimulus in less than about 300ms, so it can be safely assumed that short RTs of, say, less than 250ms were probably initiated before the stimulus; that is, they were anticipatory. Long outliers are somewhat trickier conceptually – some tasks that involve a lot of effortful cognitive processing before a response (say a task involving doing difficult arithmetic) might have reaction times of several seconds, or even longer. However, very broadly, the mean RT for most ‘simple’ tasks tends to be around 400-700ms; this means that RTs longer than say, 1000ms might reflect some other kind of process. For instance, it might reflect the fact that the participant was bored, became distracted, temporarily forgot which button to push, etc. For these reasons then, it’s generally thought to be desirable to remove outlying reaction times from further analysis.
One (fairly simple-minded, but definitely valid) approach to removing outliers then, is to simply remove all values that fall below 250ms, or above 1000ms. This is what I’ve done in the example data-sheet in columns G and H, using simple IF statements of a similar form used for removal of the error trials:
You can see that two short RTs and one long one have been removed and recoded as 0.
3. Re-format the data for analysis
The structure that most psychology experimental systems use for their data logging (similar to the one we’ve been using as an illustration) is not really appropriate for direct import into standard stats packages like SPSS. SPSS requires that one row on the data sheet is used for each participant, whereas we have one row-per-trial. In order to get our data in the right format we first need to sort the data, first by subject (column A), and then by condition (column C). Doing this sort procedure ensures that we know which entries in the final column are which – the first two rows of each subject’s data are always condition 1, and the second two are always condition 2:
We can then restructure the data from the final column, like so:
I’ve done this ‘by hand’ in Excel by cutting-and-pasting the values for each subject into a new sheet and using the paste-special > transpose function, however this is a stupid way of doing it – the ‘restructure’ functions in SPSS can accomplish this kind of thing very nicely. So, our condition 1 values are now in columns B:C and condition 2 values are in columns D:E. All that remains to do now would be to calculate summary statistics (means, variance, standard deviations, whatever; taking care that our 0 values are coded as missing, and not included in the calculations) for each set of columns (i.e. each condition) and perform the inferential tests of your choice (in this case, with only two within-subject conditions, it would be a paired t-test).
Next time, I’ll use a set of real reaction time data and do these procedures (and others) using SPSS, in order to illustrate some more sophisticated ways of handling outliers than just the simple high and low cutoffs detailed above.
I got a serious question for you: What the fuck are you doing? This is not shit for you to be messin’ with. Are you ready to hear something? I want you to see if this sounds familiar: any time you try a decent crime, you got fifty ways you’re gonna fuck up. If you think of twenty-five of them, then you’re a genius… and you ain’t no genius.
Body Heat (1981, Lawrence Kasdan)
To consult the statistician after an experiment is finished is often merely to ask him to conduct a post-mortem examination. He can perhaps say what the experiment died of.
R.A. Fisher (1938)
Doing a pilot run of a new psychology experiment is vital. No matter how well you think you’ve designed and programmed your task, there are (almost) always things that you didn’t think of. Going ahead and spending a lot of time and effort collecting a set of data without running a proper pilot is (potentially) a recipe for disaster. Several times I’ve seen data-sets where there was some subtle issue with the data logging, or the counter-balancing, or something else, which meant that the results were, at best, compromised, and at worst completely useless.
All of the resultant suffering, agony, and sobbing could have been avoided by running a pilot study in the right way. It’s not sufficient to run through the experimental program a couple of times; a comprehensive test of an experiment has to include a test of the analysis as well. This is particularly true of any experiment involving methods like fMRI/MEG/EEG where a poor design can lead to a data-set that’s essentially uninterpretable, or perhaps even un-analysable. You may think you’ve logged all the data variables you think you’ll need for the analysis, and your design is a work of art, but you can’t be absolutely sure unless you actually do a test of the analysis.
This might seem like over-kill, or a waste of effort, however, you’re going to have to design your analysis at some point anyway, so why not do it at the beginning? Analyse your pilot data in exactly the way you’re planning on analysing your main data, save the details (using SPSS syntax, R code, SPM batch jobs – or whatever you’re using) and when you have your ‘proper’ data set, all you’ll (in theory) have to do is plug it in to your existing analysis setup.
These are the steps I normally go through when getting a new experiment up and running. Not all will be appropriate for all experiments, your mileage may vary etc. etc.
1. Test the stimulus program. Run through it a couple of times yourself, and get a friend/colleague to do it once too, and ask for feedback. Make sure it looks like it’s doing what you think it should be doing.
2. Check the timing of the stimulus program. This is almost always essential for a fMRI experiment, but may not be desperately important for some kinds of behavioural studies. Run through it with a stopwatch (the stopwatch function on your ‘phone is probably accurate enough). If you’re doing any kind of experiment involving rapid presentation of stimuli (visual masking, RSVP paradigms) you’ll want to do some more extensive testing to make sure your stimuli are being presented in the way that you think – this might involve plugging a light-sensitive diode into an oscilloscope, sticking it to your monitor with a bit of blu-tack and measuring the waveforms produced by your stimuli. For fMRI experiments the timing is critical. Even though the Haemodynamic Response Function (HRF) is slow (and somewhat variable) you’re almost always fighting to pull enough signal out of the noise, so why introduce more? A cumulative error of only a few tens of milliseconds per trial can mean that your experiment is a few seconds out by the end of a 10 minute scan – this means that your model regressors will be way off – and your results will likely suck.*
3. Look at the behavioural data files. I don’t mean do the analysis (yet), I mean just look at the data. First make sure all the variables you want logged are actually there, then dump it into Excel and get busy with the sort function. For instance, if you have 40 trials and 20 stimuli (each presented twice) make sure that each one really is being presented twice, and not some of them once, and some of them three times; sorting by the stimulus ID should make it instantly clear what’s going on. Make sure the correct responses and any errors are being logged correctly. Make sure the counter-balancing is working correctly by sorting on appropriate variables.
4. Do the analysis. Really do it. You’re obviously not looking for any significant results from the data, you’re just trying to validate your analysis pipeline and make sure you have all the information you need to do the stats. For fMRI experiments – look at your design matrix to see that it makes sense and that you’re not getting warnings about non-orthogonality of the regressors from the software. For fMRI data using visual stimuli, you could look at some basic effects (i.e. all stimuli > baseline) to make sure you get activity in the visual cortex. Button-pushing responses should also be visible as activity in the motor cortex in a single subject too – these kinds of sanity checks can be a good indicator of data quality. If you really want to be punctilious, bang it through a quick ICA routine and see if you get a) component(s) that look stimulus-related, b) something that looks like the default-mode network, and c) any suspiciously nasty-looking noise components (a and b = good, c = bad, obviously).
5. After all that, the rest is easy. Collect your proper set of data, analyse it using the routines you developed in point 4. above, write it up, and then send it to Nature.
And that, ladeez and gennulmen, is how to do it. Doing a proper pilot can only save you time and stress in the long run, and you can go ahead with your experiment in the certain knowledge that you’ve done everything in your power to make sure your data is as good as it can possibly be. Of course, it still might be total and utter crap, but that’ll probably be your participants’ fault, not yours.
Happy piloting! TTFN.
*Making sure your responses are being logged with a reasonable level of accuracy is also pretty important for many experiments, although this is a little harder to objectively verify. Hopefully if you’re using some reasonably well-validated piece of software and decent response device you shouldn’t have too many problems.
Another quickie post (it’s been ages since I’ve written anything substantive I know, bear with me just a little while longer…) with some links-of-interest for you.
First up is Open Sesame – this is an experiment-builder application with a nice graphical front-end, which also supports scripting in Python – nice. Looks like a possible alternative to PsychoPy with a fair few similar features. Also, it’s cross-platform, open-source and free – my three favourite things!
Next up is Inkscape – this is a free vector graphics editor (or drawing package), with similar features to Adobe Illustrator or Corel Draw. I tend to use Adobe Illustrator for a few specialised tasks, such as making posters for conferences, and this looks like a potentially really good free alternative.
Neuroimaging Made Easy is a blog I found a while ago that I’ve been meaning to share; it’s mostly a collection of tips and downloadable scripts to accomplish fairly specific tasks. They’re all pretty much optimised for Mac users (using AppleScript) and people who use BrainVoyager or FSL for their neuroimaging – SPM users are likely to be disappointed here (but they’re pretty used to that anyway, right?! Heh…). Really worth digging through the previous posts if you fall in the right segments of that Venn diagram though – I’ve been using a couple of their scripts for a while now.
Penultimately, I thought this recent article on Mind Hacks was really terrific – titled: “Psychological self-defence for the age of email”. It covers several relevant psychological principles and shows how they can be used to better cope with the onslaught of e-mail that many of us are often buried under.
Lastly, I hope you’ll pardon a modicum of self-promotion, but I recently did an interview over Skype with the lovely Ben Thomas of http://the-connectome.com/. Unfortunately the skype connection between London and Los Angeles was less than perfect which meant he couldn’t put it up as a podcast, but he heroically transcribed it instead – if you are so inclined, you can read it here.
I’ve previously written about the importance of response hardware for doing timing-accurate experiments – in a nutshell, anything you connect via USB is likely to be sub-optimal,* because the operating system only polls (or looks for an input) on the USB port some of the time – in Windows the USB polling rate is 125Hz, or every 8ms.
So, for accurate timing of responses, we want to use something other than the USB port. There are various options – my personal favourite is to use the 25-pin parallel (or ‘printer’) port. Some newer desktop models unfortunately don’t have parallel ports anymore, as they’re largely obsolete for connecting peripherals, however any model older than two or three years should have one – and you generally don’t need a super-fast, up-to-date computer for running stimulus programs and collecting data.
I needed a couple of response boxes for a project recently, and decided to just make them up myself. I came across this fantastic little paper (PDF) which describes a simple method of taking apart a couple of standard computer mice, and rewiring the switches into a parallel port plug – this gives you up to six buttons. The circuit diagram is really, really simple:
It’s just the switches, and a 100-ohm resistor for each one, wired up to different pins on the data register of the port, with a common ground (pin 18). Honestly, if you were being lazy, you could even just forget about the resistors and it would still work fine. I decided not to take apart any mice, but just to use some buttons I bought off-the-shelf, as I only needed two for each box. Getting the right buttons is really important for this kind of thing – you want them to be a decent size, and have a good clicky-action, without being too difficult to depress. I also got some small plastic boxes, some multi-core cable (I used standard network cable as it’s quite stiff and robust, but almost anything will do), and some parallel port plugs. You can buy everything you need from Maplin or Radio Spares (Radio Shack, if you’re in the US) for about £10-15. I just drilled some holes in the boxes fairly roughly and secured the buttons there with a dab of epoxy resin, but you can get as fancy as you want in that respect.
The only really tricky bit is deciding which pins on the parallel port you want to wire your switches up to. This will largely be determined by which pins the software you’re going to use can read from. Psychology software like Inquisit or E-prime is able to read inputs from pins 2-9 on the data register (see below diagram) but it’s worth doing a bit of reading about the different pins on the parallel port and what they’re used for. A good place to start is here. Probably what you want to do is use one of the data pins for one pole of each switch, and wire the other pole to a common ground pin, as in the above diagram.
So there you have it – the most simple, inexpensive and accurate solution for recording response times in cognitive experiments. If you’re at all handy with a soldering iron you can probably knock up a couple of these in half an hour or so. If you’ve never done any electronics or soldering before, then this would be an ideal first project to get started with! This was my finished article:
Nice, huh? Happy soldering! TTFN.
*Not quite true – some of the expensive button-boxes you can buy from psychology software companies are USB, but have their own electronics inside them to get around this and time things accurately.
We have the tools, we have the talent!
– Winston Zeddemore
In the last post I talked about timing of experiments in general, and mentioned that timing of responses is critical for success in a reaction time-type experiment. This topic will discuss some of the options for hardware that’s available for collecting responses.
The simplest solution is just to use the human-interface hardware which you’ve already got i.e. get your participants to click a mouse button, or a key on the keyboard as their response. However, there are several reasons why this will generally be undesirable. Most mice and keyboards on modern computers connect via USB, and this introduces a slight lag. The computer ‘looks at’ or ‘polls’ its USB-connected peripherals at a standard rate of 125Hz (meaning 125 times per second, or every 8 milliseconds). This means that if you make a response, there may be a (variable) lag of anywhere between 0 and 8 ms between the response and the computer actually ‘seeing’ it. USB-keyboards have a similar problem. In addition, mice (and keyboards) have a lot of internal circuitry which can also introduce timing lags of variable durations. This paper (Plant et al., 2003) presents the results of some bench-testing of a set of mice. The best mouse they tested had a minimum button-down latency of just 0.4ms, whereas the worst one showed a minimum lag of 48ms! Running timing-critical experiments with the latter mouse (where the effect you might be chasing could be in the range of 20-30ms) would clearly be disastrous. Read the rest of this entry