Monthly Archives: December 2012
A quick post to point you towards a great website with a lot of really cool content (if you’re into that kind of thing, which if you’re reading this blog, then I assume you probably are… anyway, I digress; I apologise, it was my lab’s Christmas party last night and I’m in a somewhat rambling mood. Anyway, back to the point).
So, the website is called cogsci.nl, and is run by a post-doc at the University of Aix-Marseille called Sebastiaan Mathôt. It’s notable in that it’s the homepage of OpenSesame – a very nice-looking, Python-based graphical experiment builder that I’ve mentioned before on these very pages. There’s a lot of other cool stuff on the site though, including more software (featuring a really cool online tool for instantly creating Gabor patch stimuli), a list of links to stimulus sets, and a selection of really-cool optical illusions. Really worth spending 20 minutes of your time poking around a little and seeing what’s there.
I’ll leave you with a video of Sebastiaan demonstrating an experimental program, written in his OpenSesame system, running on a Google Nexus 7 tablet (using Ubuntu Linux as an OS). The future! It’s here!
Reaction time tasks have been a mainstay of psychology since the technology to accurately time and record such responses became widely available in the 70s. RT tasks have been applied in a bewildering array of research areas and (when used properly) can provide information about memory, attention, emotion and even social behaviour.
This post will focus on the best way to handle such data, which is perhaps not as straightforward as might be assumed. Despite the title, I’m not really going to cover the actual analysis; there’s a lot of literature already out there about what particular statistical tests to use, and in any case, general advice of that kind is not much use as it depends largely on your experimental design. What I’m intending to focus on are the techniques the stats books don’t normally cover – data cleaning, formatting and transformation techniques which are essential to know about if you’re going to get the best out of your data-set.
For the purposes of this discussion I’ll use a simple made-up data-set, like this:
This table is formatted in the way that a lot of common psychology software (e.g. PsychoPy, Inquisit, E-Prime) records response data. From left to right, you can see we have three participants’ data here (1, 2, and 3 in column A), four trials for each subject (column B), two experimental conditions (column C; presented in a random order), the actual reaction times (column D), and a final column (E) which codes whether the response was correct or not (1 = correct, 0 = error).
I created the data table using Microsoft Excel, and will do the processing with it too, however I really want to stress that Excel is definitely not the best way of doing this. It suits the present purpose because I’m doing this ‘by hand’ for the purposes of illustration. With a real data-set which might be thousands of lines long, these procedures would be much more easily accomplished by using the functions in your statistics weapon-of-choice (SPSS, R, Matlab, whatever). Needless to say, if you regularly have to deal with RT data it’s well worth putting the time into writing some general-purpose code which can be tweaked and re-used for subsequent data sets.
The procedures we’re going to follow with these data are:
- Remove reaction times on error trials
- Do some basic data-cleaning (removal of outlying data)
- Re-format the data for analysis
1. Remove reaction times on error trials
As a general rule, reaction times from trials on which the participant made an error should not be used in subsequent analysis. The exceptions to this rule are some particular tasks where the error trials might be of particular interest (Go/No-Go tasks, and some others). Generally though, RTs from error trials are thought to be unreliable, since there’s an additional component process operating on error trials (i.e. whatever it was that produced the error). The easiest way of accomplishing this is to insert an additional column, and code all trials with errors as ‘0’, and all trials without an error as the original reaction time. This can be a simple IF/ELSE statement of the form:
IF (correct=1) RT=RT, ELSE RT=0
In this Excel-based illustration I entered the formula =IF(E2=1, D2, 0) in cell F2, and then copied it down the rest of the column to apply it to all the subsequent rows. Here’s the new data sheet:
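For anyone doing this step in code rather than Excel, here is a minimal Python sketch of the same idea. The trial values and the function name `mask_error_rts` are made up for illustration, and the column order simply follows the table described above. One deliberate difference from the Excel formula: error trials are marked as `None` rather than 0, so they can't accidentally be averaged in later.

```python
# Hypothetical trials, one tuple per row: (subject, trial, condition, rt_ms, correct)
trials = [
    (1, 1, 1, 512, 1),
    (1, 2, 2, 640, 0),  # error trial
    (1, 3, 2, 455, 1),
    (1, 4, 1, 580, 1),
]


def mask_error_rts(rows):
    """Append a cleaned RT to each row: the original RT on correct trials,
    None on error trials (the Excel version in the post uses 0 instead)."""
    return [row + (row[3] if row[4] == 1 else None,) for row in rows]


cleaned = mask_error_rts(trials)
for row in cleaned:
    print(row)
```

The original RT column is left untouched, so you can always go back and check what was excluded.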
2. Data-cleaning – Removal of outlying data
The whole topic of removing outliers from reaction time data is a fairly involved one, and difficult to illustrate with the simple example I’m using here. However, it’s a very important procedure, and something I’m going to return to in a later post, using a ‘real’ data-set. From a theoretical perspective, it’s usually desirable to remove both short and long outliers. Most people cannot push a button in response to, say, a visual stimulus in less than about 300ms, so it can be safely assumed that short RTs of, say, less than 250ms were probably initiated before the stimulus; that is, they were anticipatory. Long outliers are somewhat trickier conceptually – some tasks that involve a lot of effortful cognitive processing before a response (say a task involving doing difficult arithmetic) might have reaction times of several seconds, or even longer. However, very broadly, the mean RT for most ‘simple’ tasks tends to be around 400-700ms; this means that RTs longer than say, 1000ms might reflect some other kind of process. For instance, it might reflect the fact that the participant was bored, became distracted, temporarily forgot which button to push, etc. For these reasons then, it’s generally thought to be desirable to remove outlying reaction times from further analysis.
One (fairly simple-minded, but definitely valid) approach to removing outliers then, is to simply remove all values that fall below 250ms, or above 1000ms. This is what I’ve done in the example data-sheet in columns G and H, using simple IF statements of a similar form to the one used for removal of the error trials:
You can see that two short RTs and one long one have been removed and recoded as 0.
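The same cutoffs are easy to express in code. This is only a sketch: the RT values and the name `trim_rt` are invented for illustration, the 250ms and 1000ms limits are the ones used above, and (as an assumption of this sketch) excluded values are marked as `None` rather than recoded as 0.

```python
LOW_CUTOFF_MS = 250    # anything faster is treated as anticipatory
HIGH_CUTOFF_MS = 1000  # anything slower is treated as a lapse of some kind


def trim_rt(rt, low=LOW_CUTOFF_MS, high=HIGH_CUTOFF_MS):
    """Return rt unchanged if it lies within [low, high], otherwise None.
    Values that are already None (e.g. error trials) stay None."""
    if rt is None or rt < low or rt > high:
        return None
    return rt


# Hypothetical RTs in ms; None marks a trial already excluded as an error
rts = [512, 180, None, 1350, 250, 1000]
trimmed = [trim_rt(rt) for rt in rts]
print(trimmed)
```

Note that values exactly on a cutoff are kept here, since the rule is to remove values below 250ms or above 1000ms. Keeping the limits as parameters makes it easy to tweak them for a task with slower responses.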
3. Re-format the data for analysis
The structure that most psychology experimental systems use for their data logging (similar to the one we’ve been using as an illustration) is not really appropriate for direct import into standard stats packages like SPSS. SPSS requires that one row on the data sheet is used for each participant, whereas we have one row-per-trial. In order to get our data in the right format we first need to sort the data, first by subject (column A), and then by condition (column C). Doing this sort procedure ensures that we know which entries in the final column are which – the first two rows of each subject’s data are always condition 1, and the second two are always condition 2:
We can then restructure the data from the final column, like so:
I’ve done this ‘by hand’ in Excel by cutting-and-pasting the values for each subject into a new sheet and using the paste-special > transpose function, however this is a stupid way of doing it – the ‘restructure’ functions in SPSS can accomplish this kind of thing very nicely. So, our condition 1 values are now in columns B:C and condition 2 values are in columns D:E. All that remains to do now would be to calculate summary statistics (means, variance, standard deviations, whatever; taking care that our 0 values are coded as missing, and not included in the calculations) for each set of columns (i.e. each condition) and perform the inferential tests of your choice (in this case, with only two within-subject conditions, it would be a paired t-test).
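If you’d rather script the restructuring and summary steps, the following Python sketch shows one way it could look using only the standard library. All of the data values and function names (`to_wide`, `condition_means`, `paired_t`) are hypothetical, and missing RTs are assumed to be coded as `None`. It groups the trials by subject and condition, takes each subject’s mean per condition, and then computes a paired t statistic from the per-subject differences.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev

# Hypothetical cleaned trials: (subject, condition, rt_ms or None if excluded)
trials = [
    (1, 1, 500), (1, 1, 520), (1, 2, 600), (1, 2, None),
    (2, 1, 450), (2, 1, None), (2, 2, 560), (2, 2, 580),
    (3, 1, 610), (3, 1, 590), (3, 2, 640), (3, 2, 700),
]


def to_wide(rows):
    """Group valid RTs as {subject: {condition: [rts]}}, skipping missing values."""
    wide = defaultdict(lambda: defaultdict(list))
    for subject, condition, rt in rows:
        if rt is not None:
            wide[subject][condition].append(rt)
    return wide


def condition_means(wide):
    """Mean RT per subject and condition, i.e. one 'row' per participant."""
    return {s: {c: mean(rts) for c, rts in conds.items()}
            for s, conds in wide.items()}


def paired_t(means, cond_a=1, cond_b=2):
    """Paired t statistic on per-subject differences (cond_b minus cond_a)."""
    diffs = [m[cond_b] - m[cond_a] for m in means.values()]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


means = condition_means(to_wide(trials))
t_stat = paired_t(means)
print(means)
print(t_stat)
```

This sketch will fail with a `KeyError` if a subject has no valid RTs left in one of the conditions, which is probably what you want: that participant needs a decision from you, not a silent default. For a p-value you’d still hand the per-subject means to your stats package of choice.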
Next time, I’ll use a set of real reaction time data and do these procedures (and others) using SPSS, in order to illustrate some more sophisticated ways of handling outliers than just the simple high and low cutoffs detailed above.
This post might seem a trifle umm… politically insensitive after recent revelations in the UK about exactly how much corporation tax Google pays (answer – basically none), but I’ve been planning it for a while, and unlike Starbucks (which should be boycotted at all costs, because their coffee sucks) Google is a little harder to avoid, and actually provides a whole slew of incredibly worthwhile, and mostly free, services. One of the first things you should do when you start an undergraduate course at a college/university is sign up for a Google account. Here’s why:
1. Gmail
You’ve probably already got an email address, but if you’re not using Gmail then you need to switch. The interface is brilliantly usable and customisable, and you get a massive 10Gb of storage for all your mail – more than you’ll likely ever need. The most important benefit though, is Gmail’s ability to pull all your current and future email accounts together in one place. Gmail can be set up as a POP3 client (here’s how) meaning it can pull email in from several different accounts and present it all in one inbox. You’ve probably got an account already, you’ll definitely get an account on your university’s servers, and when you leave and either go on to postgraduate study (maybe at a different university) or get a job, you’ll almost certainly get given yet another account. Gmail can centralize everything, and mean that you only have to check one inbox for all your accounts. You can even configure it so that it sends mail through, say, your university account by default, so people you contact see your ‘official’ email address. I’ve currently got five email accounts configured to read through Gmail, and I honestly couldn’t manage without it. Additionally, if you start using Gmail from day one, all your contacts and mail are saved in your Gmail account, and won’t be lost when you complete your course and your university account inevitably gets cancelled/deleted. Another benefit of Gmail is its ease of use with various smartphone platforms. Android (obviously) and iOS devices are designed to sync up with Google accounts pretty much seamlessly.
So, set up a Gmail account, and assume it’ll be your email address for life. Be sensible. Don’t choose a jokey or embarrassing username – choose something you’ll be happy to put on a CV when you leave college, i.e. something that pretty much consists of your name.
2. Google Drive
In one sense, Google Drive is a simple online storage locker for any kinds of files you like, a bit like Dropbox, or any of the other similar services which have proliferated recently. You get 5Gb of free space, and it’s easy to set up file sharing for specific other users, or to make your files available for download to anyone you send a link to. In another sense, it’s a full-featured web-based alternative to Microsoft Office, with the ability to create/edit documents, spreadsheets or presentations online, collaborate on them simultaneously with other users, and download them in a variety of the usual formats. Use it for just backing important things up, or use the full ‘Docs’ features – it’s up to you.
One other incredibly powerful feature of Google Docs is the forms tool. It can be used to create online forms – the best way I currently know of to create online questionnaires for research purposes. The data from the questionnaires all gets dumped into a Google Docs spreadsheet for easy analysis too – very cool. This page has some good tips.
3. Google Scholar
Google Scholar is pretty much my first port-of-call for literature searches these days, and is often the best way of looking up papers quickly and easily. Yes, for in-depth research on a particular topic then you still need to look at more specialised databases, but as a first-pass tool, it’s fantastic. You can use it without being logged in with a Google account, but if you’re a researcher, you can get a Google Scholar profile page – like this: Isaac Newton’s Google Scholar profile page (only an h-index of 33 Isaac? Better get your thumb out of your arse for the REF old boy). This is the best way to keep track of your publications and some simple citation metrics.
4. Google Calendar
Yes, you need to start using a calendar. Google Calendar can pull several calendars together into one, sync seamlessly with your ‘phone, and send you alerts and emails to make sure you never miss a tutorial or lecture again. Or at least, you never miss one because you just forgot about it.
5. Blogger
Blogger is owned by Google, so if you want to start a blog (and it’s something you should definitely think about), all you need to do is go to Blogger and hit a few buttons – simples. That’s the easy bit – then you actually have to write something of course…
6. Google Sites
Probably the easiest way to create free websites – as for Blogger above, you can literally create a site with a few clicks. Lots of good free templates that you can use and customise.
7. Google+
Yes, I know you use Facebook, but Google+ is the future. Maybe. The video hangouts are cool, anyway.
8. Other things
Use your Google account to post videos to YouTube, save maps/locations/addresses in Google Maps, find like-minded weirdos who are into the same things as you on Google Groups, read RSS feeds using Google Reader, and oooh… lots of other things.
Honestly, the features of Gmail alone should be inducement enough for everyone to sign up for a Google account; the rest is just a bonus. Get to it people – it’s never too late to switch.
Following a couple of comments (below, and on Twitter) I feel it necessary to qualify somewhat my effusive recommendation of Google. Use of Google services inevitably involves surrendering personal information and other data to Google, which is a large corporation, and despite these services being free at the point of use, it should always be remembered that the business of corporations is to deliver profits. Locking oneself into a corporate system should be considered carefully, no matter how ‘convenient’ it might be. This article from Gizmodo is worth a read, as is this blog post from a former Google employee.