Very exciting news here: I’ve just been invited to the first British Psychological Society (Maths, Statistics and Computing Section) Psychology open textbook hackathon!
Inspired by this event (where people got together and wrote an open-source maths textbook in a weekend) the day aims to raise awareness and skills, as well as perhaps produce some usable output.
The organisers are Thom Baguley of Nottingham Trent University (and the Serious Stats blog and book) and Sol Nte of Manchester University. They’ve very kindly invited me as a guest, so I’ll be hanging out and learning some new tricks myself, I’m sure.
Here’s the flyer for the event, with sign-up details etc. It’s free, but strictly limited to 20 places – if you’re keen, best be quick… (click the pic below for a bigger version):
Reaction time tasks have been a mainstay of psychology since the technology to accurately time and record such responses became widely available in the 70s. RT tasks have been applied in a bewildering array of research areas and (when used properly) can provide information about memory, attention, emotion and even social behaviour.
This post will focus on the best way to handle such data, which is perhaps not as straightforward as might be assumed. Despite the title, I’m not really going to cover the actual analysis; there’s a lot of literature already out there about what particular statistical tests to use, and in any case, general advice of that kind is not much use as it depends largely on your experimental design. What I’m intending to focus on are the techniques the stats books don’t normally cover – data cleaning, formatting and transformation techniques which are essential to know about if you’re going to get the best out of your data-set.
For the purposes of this discussion I’ll use a simple made-up data-set, like this:
This table is formatted in the way that a lot of common psychology software (i.e. PsychoPy, Inquisit, E-Prime) records response data. From left-to-right, you can see we have three participants’ data here (1, 2, and 3 in column A), four trials for each subject (column B), two experimental conditions (C; presented in a random order), and then the actual reaction times (column D) and then a final column which codes whether the response was correct or not (1=correct, 0= error).
I created the data table using Microsoft Excel, and will do the processing with it too, however I really want to stress that Excel is definitely not the best way of doing this. It suits the present purpose because I’m doing this ‘by hand’ for the purposes of illustration. With a real data-set which might be thousands of lines long, these procedures would be much more easily accomplished by using the functions in your statistics weapon-of-choice (SPSS, R, Matlab, whatever). Needless to say, if you regularly have to deal with RT data it’s well worth putting the time into writing some general-purpose code which can be tweaked and re-used for subsequent data sets.
The procedures we’re going to follow with these data are:
- Remove reaction times on error trials
- Do some basic data-cleaning (removal of outlying data)
- Re-format the data for analysis
1. Remove reaction times on error trials
As a general rule, reaction times from trials on which the participant made an error should not be used in subsequent analysis. The exceptions to this rule are some particular tasks where the error trials might be of particular interest (Go/No-Go tasks, and some others). Generally though, RTs from error trials are thought to be unreliable, since there’s an additional component process operating on error trials (i.e. whatever it was that produced the error). The easiest way of accomplishing this is to insert an additional column, and code all trials with errors as ‘0’, and all trials without an error as the original reaction time. This can be a simple IF/ELSE statement of the form:
IF (error=1) RT=RT,
In this excel-based illustration I entered the formula: =IF(E2=1, D2,0) in cell F2, and then copied it down the rest of the column to apply to all the subsequent rows. Here’s the new data sheet:
2. Data-cleaning – Removal of outlying data
The whole topic of removing outliers from reaction time data is a fairly involved one, and difficult to illustrate with the simple example I’m using here. However, It’s a very important procedure, and something I’m going to return to in a later post, using a ‘real’ data-set. From a theoretical perspective, it’s usually desirable to remove both short and long outliers. Most people cannot push a button in response to, say, a visual stimulus in less than about 300ms, so it can be safely assumed that short RTs of, say, less than 250ms were probably initiated before the stimulus; that is, they were anticipatory. Long outliers are somewhat trickier conceptually – some tasks that involve a lot of effortful cognitive processing before a response (say a task involving doing difficult arithmetic) might have reaction times of several seconds, or even longer. However, very broadly, the mean RT for most ‘simple’ tasks tends to be around 400-700ms; this means that RTs longer than say, 1000ms might reflect some other kind of process. For instance, it might reflect the fact that the participant was bored, became distracted, temporarily forgot which button to push, etc. For these reasons then, it’s generally thought to be desirable to remove outlying reaction times from further analysis.
One (fairly simple-minded, but definitely valid) approach to removing outliers then, is to simply remove all values that fall below 250ms, or above 1000ms. This is what I’ve done in the example data-sheet in columns G and H, using simple IF statements of a similar form used for removal of the error trials:
You can see that two short RTs and one long one have been removed and recoded as 0.
3. Re-format the data for analysis
The structure that most psychology experimental systems use for their data logging (similar to the one we’ve been using as an illustration) is not really appropriate for direct import into standard stats packages like SPSS. SPSS requires that one row on the data sheet is used for each participant, whereas we have one row-per-trial. In order to get our data in the right format we first need to sort the data, first by subject (column A), and then by condition (column C). Doing this sort procedure ensures that we know which entries in the final column are which – the first two rows of each subject’s data are always condition 1, and the second two are always condition 2:
We can then restructure the data from the final column, like so:
I’ve done this ‘by hand’ in Excel by cutting-and-pasting the values for each subject into a new sheet and using the paste-special > transpose function, however this is a stupid way of doing it – the ‘restructure’ functions in SPSS can accomplish this kind of thing very nicely. So, our condition 1 values are now in columns B:C and condition 2 values are in columns D:E. All that remains to do now would be to calculate summary statistics (means, variance, standard deviations, whatever; taking care that our 0 values are coded as missing, and not included in the calculations) for each set of columns (i.e. each condition) and perform the inferential tests of your choice (in this case, with only two within-subject conditions, it would be a paired t-test).
Next time, I’ll use a set of real reaction time data and do these procedures (and others) using SPSS, in order to illustrate some more sophisticated ways of handling outliers than just the simple high and low cutoffs detailed above.
I got a serious question for you: What the fuck are you doing? This is not shit for you to be messin’ with. Are you ready to hear something? I want you to see if this sounds familiar: any time you try a decent crime, you got fifty ways you’re gonna fuck up. If you think of twenty-five of them, then you’re a genius… and you ain’t no genius.
Body Heat (1981, Lawrence Kasdan)
To consult the statistician after an experiment is finished is often merely to ask him to conduct a post-mortem examination. He can perhaps say what the experiment died of.
R.A. Fisher (1938)
Doing a pilot run of a new psychology experiment is vital. No matter how well you think you’ve designed and programmed your task, there are (almost) always things that you didn’t think of. Going ahead and spending a lot of time and effort collecting a set of data without running a proper pilot is (potentially) a recipe for disaster. Several times I’ve seen data-sets where there was some subtle issue with the data logging, or the counter-balancing, or something else, which meant that the results were, at best, compromised, and at worst completely useless.
All of the resultant suffering, agony, and sobbing could have been avoided by running a pilot study in the right way. It’s not sufficient to run through the experimental program a couple of times; a comprehensive test of an experiment has to include a test of the analysis as well. This is particularly true of any experiment involving methods like fMRI/MEG/EEG where a poor design can lead to a data-set that’s essentially uninterpretable, or perhaps even un-analysable. You may think you’ve logged all the data variables you think you’ll need for the analysis, and your design is a work of art, but you can’t be absolutely sure unless you actually do a test of the analysis.
This might seem like over-kill, or a waste of effort, however, you’re going to have to design your analysis at some point anyway, so why not do it at the beginning? Analyse your pilot data in exactly the way you’re planning on analysing your main data, save the details (using SPSS syntax, R code, SPM batch jobs – or whatever you’re using) and when you have your ‘proper’ data set, all you’ll (in theory) have to do is plug it in to your existing analysis setup.
These are the steps I normally go through when getting a new experiment up and running. Not all will be appropriate for all experiments, your mileage may vary etc. etc.
1. Test the stimulus program. Run through it a couple of times yourself, and get a friend/colleague to do it once too, and ask for feedback. Make sure it looks like it’s doing what you think it should be doing.
2. Check the timing of the stimulus program. This is almost always essential for a fMRI experiment, but may not be desperately important for some kinds of behavioural studies. Run through it with a stopwatch (the stopwatch function on your ‘phone is probably accurate enough). If you’re doing any kind of experiment involving rapid presentation of stimuli (visual masking, RSVP paradigms) you’ll want to do some more extensive testing to make sure your stimuli are being presented in the way that you think – this might involve plugging a light-sensitive diode into an oscilloscope, sticking it to your monitor with a bit of blu-tack and measuring the waveforms produced by your stimuli. For fMRI experiments the timing is critical. Even though the Haemodynamic Response Function (HRF) is slow (and somewhat variable) you’re almost always fighting to pull enough signal out of the noise, so why introduce more? A cumulative error of only a few tens of milliseconds per trial can mean that your experiment is a few seconds out by the end of a 10 minute scan – this means that your model regressors will be way off – and your results will likely suck.*
3. Look at the behavioural data files. I don’t mean do the analysis (yet), I mean just look at the data. First make sure all the variables you want logged are actually there, then dump it into Excel and get busy with the sort function. For instance, if you have 40 trials and 20 stimuli (each presented twice) make sure that each one really is being presented twice, and not some of them once, and some of them three times; sorting by the stimulus ID should make it instantly clear what’s going on. Make sure the correct responses and any errors are being logged correctly. Make sure the counter-balancing is working correctly by sorting on appropriate variables.
4. Do the analysis. Really do it. You’re obviously not looking for any significant results from the data, you’re just trying to validate your analysis pipeline and make sure you have all the information you need to do the stats. For fMRI experiments – look at your design matrix to see that it makes sense and that you’re not getting warnings about non-orthogonality of the regressors from the software. For fMRI data using visual stimuli, you could look at some basic effects (i.e. all stimuli > baseline) to make sure you get activity in the visual cortex. Button-pushing responses should also be visible as activity in the motor cortex in a single subject too – these kinds of sanity checks can be a good indicator of data quality. If you really want to be punctilious, bang it through a quick ICA routine and see if you get a) component(s) that look stimulus-related, b) something that looks like the default-mode network, and c) any suspiciously nasty-looking noise components (a and b = good, c = bad, obviously).
5. After all that, the rest is easy. Collect your proper set of data, analyse it using the routines you developed in point 4. above, write it up, and then send it to Nature.
And that, ladeez and gennulmen, is how to do it. Doing a proper pilot can only save you time and stress in the long run, and you can go ahead with your experiment in the certain knowledge that you’ve done everything in your power to make sure your data is as good as it can possibly be. Of course, it still might be total and utter crap, but that’ll probably be your participants’ fault, not yours.
Happy piloting! TTFN.
*Making sure your responses are being logged with a reasonable level of accuracy is also pretty important for many experiments, although this is a little harder to objectively verify. Hopefully if you’re using some reasonably well-validated piece of software and decent response device you shouldn’t have too many problems.
A quickie just to point you to an outstanding post by the never-less-than-excellent Dorothy Bishop (or @deevybee, and yes, you should be following her on twitter) on basic data simulation using Excel. Some brilliant and quite creative ways to use Excel for demos of simple statistical concepts – will definitely be incorporating this into some of my presentations/teaching on stats.
Statistics – the very word is guaranteed to bring a shudder of terror to the average undergraduate, and even full-grown lecturers have been known to quake in fear before its awesome power. Most psychology undergraduates don’t come from a hard-science or mathematics background, and statistics are probably the number one thing that they struggle with during their psychology courses. Personally, I got through my undergraduate stats exams with a mixture of vague understanding and rote memorisation, and it was only during my PhD that I actually started learning how to do things properly and, more importantly, actually understanding what I was doing, and why.
This is not the place to give any detailed information on the basics of statistics. That kind of material has been covered many, many times before by people infinitely more qualified than I. For that kind of stuff, a good place to start would be Andy Field’s book, available here. Andy explains things very clearly and is actually a very nice chap as well. What I’d like to do instead is do a quick run-down of popular stats software, and point out some resources which can help if you run into trouble. Read the rest of this entry