Towards open-source psychology research

A couple of interesting things have come along recently which have got me thinking about the ways in which research is conducted, and how software is used in psychology research.

The first is some recent publicity around the Many Labs replication project - a fantastic effort to try and perform replications of some key psychological effects, with large samples, and in labs spread around the world. Ed Yong has written a really great piece on it here for those who are interested. The Many Labs project is part of the Open Science Framework - a free service for archiving and sharing research materials (data, experimental designs, papers, whatever).

The second was a recent paper by Tom Stafford and Mike Dewar in Psychological Science. This is a really impressive piece of research from a very large sample of participants (854,064!) who played an online game. Data from the game was analysed to provide metrics of perception, attention and motor skills, and to see how these improved with training (i.e. more time spent playing the game). The original paper is here (paywalled, unfortunately), but Tom has also written about it on the Mind Hacks site and on his academic blog. The latter piece is interesting (for me anyway) as Tom says that he found his normal approach to analysis just wouldn’t work with this large a dataset and he was obliged to learn Python in order to analyse the data. Python FTW!

Anyway, the other really nice thing about this piece of work is that the authors have made all the data, and the code used to analyse it, publicly available in a GitHub repository here. This is a great thing to do, particularly for a large, probably very rich dataset like this; potentially there are a lot of other analyses that could be run on these data, and making them available enables other researchers to use them.

These two things crystallised an important realisation for me: it’s now possible, and I would argue even preferable, for the majority of not-particularly-technically-minded psychology researchers to perform their research in a completely open manner. Solid, free, user-friendly cross-platform software now exists to facilitate pretty much every stage of the research process, from conception to analysis.

Some examples: PsychoPy is (in my opinion) one of the best pieces of experiment-building software around at the moment, and it’s completely free, cross-platform, and open-source. The R language for statistical computing is becoming extremely popular, and is likewise free, cross-platform, etc. For analysis of neuroimaging studies, there are several open-source options, including FSL and Nipype.

It’s not hard to envision a scenario where researchers who use these kinds of tools could upload all their experimental files (experimental stimulus programs, resulting data files, and analysis code) to GitHub or a similar service. This would enable anyone else in the world who had suitable (now utterly ubiquitous) hardware to perform a near-as-dammit exact replication of the experiment, or (more likely) tweak the experiment in an interesting way (with minimal effort) in order to run their own version. This could potentially really help accelerate the pace of research, and the issue of poorly-described and ambiguous methods in papers would become a thing of the past, as anyone who was interested could simply download and demo the experiment themselves in order to understand what was done.

There are some issues with uploading very large datasets (e.g. fMRI or MEG data) but initiatives are springing up, and the problem seems like it should be a very tractable one.

The benefit for researchers should hopefully be greater visibility and awareness of their work (indexed in whatever manner; citations, downloads, page-views etc.). Clearly some researchers (like the authors of the above-mentioned paper) have taken the initiative and are already doing this kind of thing. They should be applauded for taking the lead, but they’ll likely remain a minority unless researchers can be persuaded that this is a good idea. One obvious prod would be if journals started encouraging this kind of open sharing of data and code, or even made it a condition of accepting papers for publication.

One of the general tenets of the open-source movement (that open software benefits everyone, including the developers) is doubly true of open science. I look forward to a time when the majority of research code, data, and results are made public in this way and the research community as a whole can benefit from it.

About Matt Wall

I do brains. BRAINZZZZ.

Posted on January 8, 2014, in Commentary, Programming, Software. 8 Comments.

  1. Just thought I could mention my own recently published Python package called psychopy_ext (http://psychopy_ext.klab.lt/) that builds on PsychoPy with an explicit aim to simplify and improve experiment and analysis reproducibility and sharing.

    By the way, I think it would have been really cool if Stafford and Dewar had shared the source of that game too. I think gaming could vastly improve the amount and quality of data we obtain, yet it is not so easy to create games by yourself. Having a “reference” code might encourage more people to gamify their paradigms.

  2. Computational modellers should be leading by example but in reality are some of the worst offenders. Our research is all about our source code, but it is rarely shared with the community, much less in a way that encourages reuse. I believe our research suffers as a result. I’ve said as much here:

    Addyman, C., & French, R. M. (2012). Computational Modeling in Cognitive Science: A Manifesto for Change. Topics in Cognitive Science, 4(3), 332–341.
    http://onlinelibrary.wiley.com/doi/10.1111/j.1756-8765.2012.01206.x/abstract

  3. Very interesting article. I now plan to make my future research open-source. Are there any security/confidentiality concerns with open-sourcing data? I guess as long as it is properly de-identified it should be fine?

    • Excellent point, Matt. Of course, all data uploaded to any kind of public space should be completely anonymised, and should not contain any PII (Personally Identifiable Information) relating to your participants. In addition, it might be worthwhile including something in the consent procedure to let your participants know that their data might be used in this way, and to give them an opt-out if necessary.
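
    For anyone wondering what that first step might look like in practice, here is a minimal Python sketch. It is only an illustration under some assumptions: the field names (`participant`, `name`, `rt_ms`), the example rows, and the helper functions `pseudonymise` and `deidentify` are all hypothetical, and replacing IDs with a salted hash is pseudonymisation rather than a guarantee of full anonymity. Whether it is sufficient will depend on your data and your ethics approval.

```python
import hashlib


def pseudonymise(participant_id: str, salt: str) -> str:
    """Map a real identifier to a stable, non-obvious participant code."""
    digest = hashlib.sha256((salt + participant_id).encode("utf-8")).hexdigest()
    return "P" + digest[:10]


def deidentify(rows, id_field, pii_fields, salt):
    """Return copies of the rows with PII columns dropped and IDs replaced.

    rows       -- list of dicts, e.g. as produced by csv.DictReader
    id_field   -- name of the column holding the participant identifier
    pii_fields -- set of column names to remove entirely
    salt       -- secret string; keep it OUT of the public repository
    """
    cleaned = []
    for row in rows:
        # Keep only the columns that are not flagged as PII.
        kept = {key: value for key, value in row.items() if key not in pii_fields}
        # Replace the real identifier with its pseudonym.
        kept[id_field] = pseudonymise(row[id_field], salt)
        cleaned.append(kept)
    return cleaned


if __name__ == "__main__":
    # Hypothetical example data: two trials from one participant, one from another.
    raw = [
        {"participant": "alice@example.com", "name": "Alice", "rt_ms": "512"},
        {"participant": "alice@example.com", "name": "Alice", "rt_ms": "498"},
        {"participant": "bob@example.com", "name": "Bob", "rt_ms": "640"},
    ]
    for row in deidentify(raw, "participant", {"name"}, salt="a-private-salt"):
        print(row)
```

    The same participant always gets the same code, so trial-level analyses still work, but the original identifier never appears in the shared file. Free-text responses, dates of birth, and rare demographic combinations can still identify people, so those need checking by hand.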

