Blog Archives

Comment on the Button et al. (2013) neuroscience ‘power-failure’ article in NRN

Statistical Spidey knows the score.

Statistical Spidey knows the score.

An article was published in Nature Reviews Neuroscience yesterday which caused a bit of a stir among neuroscientists (or at least among neuroscientists on Twitter, anyway). The authors cleverly used meta-analytic papers to estimate the ‘true’ power of an effect, and then (using the G*Power software) calculated the power for each individual study that made up the meta-analysis, based on the sample size of each one. Their conclusions are pretty damning for the field as a whole: an overall value of 21%, dropping to 8% in some sub-fields. This means that out of 100 studies that are conducted into a genuine effect, only 21 will actually demonstrate it.

The article has been discussed and summarised at length by Ed Yong, Christian Jarrett, and by Kate Button (the study’s first author) on Suzy Gage’s Guardian blog, so I’m not going to re-hash it any more here. The original paper is actually very accessible and well-written, and I encourage interested readers to start there. It’s definitely an important contribution to the debate, however (as always) there are alternative perspectives. I generally have a problem with over-reliance on power analyses (they’re often required for grant applications, and other project proposals). Prospective power analyses (i.e. those conducted before a piece of research is conducted, in order to tell you how many subjects you need) use an estimate of the effect size you expect to achieve – usually derived from previous work that has examined a (broadly) similar problem using (broadly) similar methods. This estimate is essentially a wild shot in the dark (especially because of some of the issues and biases discussed by Button et al., that are likely to operate in the literature), and the resulting power analysis therefore tells you (in my opinion) nothing very useful. Button et al. get around this issue by using the effect size from meta-analyses to estimate the ‘true’ effect size in a given literature area – a neat trick.

The remainder of this post deals with power-issues in fMRI, since it’s my area of expertise, and necessarily gets a bit technical. Readers who don’t have a somewhat nerdy interest in fMRI-methods are advised to check out some of the more accessible summaries linked to above. Braver readers – press on!

An alternative approach used in the fMRI field, and one that I’ve been following when planning projects for years, is a more empirical method. Murphy and Garavan (2004) took a large sample of 58 subjects who had completed a Go/No-Go task and analysed sub-sets of different sizes to look at the reproducibility of the results, with different sample sizes. They showed that reproducibility (assessed by correlation of the statistical maps with the ‘gold standard’ of the entire dataset; Fig. 4) reaches 80% at about 24 or 25 subjects. By this criterion, many fMRI studies are underpowered.

While I like this empirical approach to the issue, there are of course caveats and other things to consider. fMRI is a complex, highly technical research area, and heavily influenced by the advance of technology. MRI scanners have significantly improved in the last ten years, with 32 or even 64-channel head-coils becoming common, faster gradient switching, shorter TRs, higher field strength, and better field/data stability all meaning that the signal-to-noise has improved considerably. This serves to cut down one source of noise in fMRI data – intra-subject variance. The inter-subject variance of course remains the same as it always was, but that’s something that can’t really be mitigated against, and may even be of interest in some (between-group) studies. On the analysis side, new multivariate methods are much more sensitive to detecting differences than the standard mass-univariate approach. This improvement in effective SNR means that the Murphy and Garavan (2004) estimate of 25 subjects for 80% reproducibility may be somewhat inflated, and with modern techniques one could perhaps get away with less.

The other issue with the Murphy and Garavan (2004) approach is that it’s not very generalisable. The Go/No-Go task is widely used and is a ‘standard’ cognitive/attentional task that activates a well-described brain network, but other tasks may produce more or less activation, in different brain regions. Signal-to-noise varies widely across the brain, and across task-paradigms, with simple visual or motor experiments producing very large signal changes and complex cognitive tasks smaller ones. Yet another factor is the experimental design (blocked stimuli, or event-related),  the overall number of trials/stimuli presented, and the total scanning time for each subject, all of which can vary widely.

The upshot is that there are no easy answers, and this is something I try to impress upon people at every opportunity; particularly the statisticians who read my project proposals and object to me not including power analyses. I think prospective power analyses are not only uninformative, but give a false sense of security, and for that reason should be treated with caution. Ultimately the decision about how many subjects to test is generally highly influenced by other factors anyway (most notably, time, and money). You should test as many subjects as you reasonably can, and regard power analysis results as, at best, a rough guide.

Free, interactive MRI courses from Imaios.com (plus lots of other medical/anatomy material too)

A very quick post to point you towards a really fantastic set of online, interactive courses on MRI from a website called Imaios.com – a very nice, very slick set of material. The MRI courses are all free, but you’ll need to register to see the animations. Lots of other medical/anatomy-related courses on the site too – some free, some ‘premium’, and some nice looking mobile apps too.

How to pilot an experiment

I got a serious question for you: What the fuck are you doing? This is not shit for you to be messin’ with. Are you ready to hear something? I want you to see if this sounds familiar: any time you try a decent crime, you got fifty ways you’re gonna fuck up. If you think of twenty-five of them, then you’re a genius… and you ain’t no genius.
Body Heat (1981, Lawrence Kasdan)

To consult the statistician after an experiment is finished is often merely to ask him to conduct a post-mortem examination. He can perhaps say what the experiment died of.
R.A. Fisher (1938)

Don’t crash and burn your experiment.

Doing a pilot run of a new psychology experiment is vital. No matter how well you think you’ve designed and programmed your task, there are (almost) always things that you didn’t think of. Going ahead and spending a lot of time and effort collecting a set of data without running a proper pilot is (potentially) a recipe for disaster. Several times I’ve seen data-sets where there was some subtle issue with the data logging, or the counter-balancing, or something else, which meant that the results were, at best,  compromised, and at worst completely useless.

All of the resultant suffering, agony, and sobbing could have been avoided by running a pilot study in the right way. It’s not sufficient to run through the experimental program a couple of times; a comprehensive test of an experiment has to include a test of the analysis as well. This is particularly true of any experiment involving methods like fMRI/MEG/EEG where a poor design can lead to a data-set that’s essentially uninterpretable, or perhaps even un-analysable. You may think you’ve logged all the data variables you think you’ll need for the analysis, and your design is a work of art, but you can’t be absolutely sure unless you actually do a test of the analysis.

This might seem like over-kill, or a waste of effort, however, you’re going to have to design your analysis at some point anyway, so why not do it at the beginning? Analyse your pilot data in exactly the way you’re planning on analysing your main data, save the details (using SPSS syntax, R code, SPM batch jobs – or whatever you’re using) and when you have your ‘proper’ data set, all you’ll (in theory) have to do is plug it in to your existing analysis setup.

These are the steps I normally go through when getting a new experiment up and running. Not all will be appropriate for all experiments, your mileage may vary etc. etc.

1. Test the stimulus program. Run through it a couple of times yourself, and get a friend/colleague to do it once too, and ask for feedback. Make sure it looks like it’s doing what you think it should be doing.

2. Check the timing of the stimulus program. This is almost always essential for a fMRI experiment, but may not be desperately important for some kinds of behavioural studies. Run through it with a stopwatch (the stopwatch function on your ‘phone is probably accurate enough). If you’re doing any kind of experiment involving rapid presentation of stimuli (visual masking, RSVP paradigms) you’ll want to do some more extensive testing to make sure your stimuli are being presented in the way that you think – this might involve plugging a light-sensitive diode into an oscilloscope, sticking it to your monitor with a bit of blu-tack and measuring the waveforms produced by your stimuli. For fMRI experiments the timing is critical. Even though the Haemodynamic Response Function (HRF) is slow (and somewhat variable) you’re almost always fighting to pull enough signal out of the noise, so why introduce more? A cumulative error of only a few tens of milliseconds per trial can mean that your experiment is a few seconds out by the end of a 10 minute scan – this means that your model regressors will be way off – and your results will likely suck.*

3. Look at the behavioural data files. I don’t mean do the analysis (yet), I mean just look at the data. First make sure all the variables you want logged are actually there, then dump it into Excel and get busy with the sort function. For instance, if you have 40 trials and 20 stimuli (each presented twice) make sure that each one really is being presented twice, and not some of them once, and some of them three times; sorting by the stimulus ID should make it instantly clear what’s going on. Make sure the correct responses and any errors are being logged correctly. Make sure the counter-balancing is working correctly by sorting on appropriate variables.

4. Do the analysis. Really do it. You’re obviously not looking for any significant results from the data, you’re just trying to validate your analysis pipeline and make sure you have all the information you need to do the stats. For fMRI experiments – look at your design matrix to see that it makes sense and that you’re not getting warnings about non-orthogonality of the regressors from the software. For fMRI data using visual stimuli, you could look at some basic effects (i.e. all stimuli > baseline) to make sure you get activity in the visual cortex. Button-pushing responses should also be visible as activity in the motor cortex in a single subject too – these kinds of sanity checks can be a good indicator of data quality. If you really want to be punctilious, bang it through a quick ICA routine and see if you get a) component(s) that look stimulus-related, b) something that looks like the default-mode network, and c) any suspiciously nasty-looking noise components (a and b = good, c = bad, obviously).

5. After all that, the rest is easy. Collect your proper set of data, analyse it using the routines you developed in point 4. above, write it up, and then send it to Nature.

And that, ladeez and gennulmen, is how to do it. Doing a proper pilot can only save you time and stress in the long run, and you can go ahead with your experiment in the certain knowledge that you’ve done everything in your power to make sure your data is as good as it can possibly be. Of course, it still might be total and utter crap, but that’ll probably be your participants’ fault, not yours.

Happy piloting! TTFN.

*Making sure your responses are being logged with a reasonable level of accuracy is also pretty important for many experiments, although this is a little harder to objectively verify. Hopefully if you’re using some reasonably well-validated piece of software and decent response device you shouldn’t have too many problems.

Two like, *totes* awesome websites: ViperLib and mindhive

I’ve come across a couple of more web-links which I thought were important enough to share with you straight away rather than saving them up for a massive splurge of links.

The first is ViperLib, a site which focusses (geddit?) on visual perception and is run by Peter Thompson and Rob Stone of the University of York, with additional input (apparently) from Barry the snake. This is essentially a library of images and movies related to vision science, and currently contains a total of 1850 images – illusions, brain scans, anatomical diagrams, and much more. Registration is required to view the images, but it’s free and easily done, and I would encourage anyone to spend an hour or so of their time poking around amongst the treasures there. I shall be digging through my old hard drives when I get a chance and contributing some optic-flow stimuli from my old vision work to the database.

The second is for the (f)MRI types out there; a fantastic ‘Imaging Knowledge Base’ from the McGovern Institute for Brain Research at MIT. The page has a huge range of great information about fMRI design and analysis, from the basics of Matlab, to how to perform ROI analyses, and all presented in a very friendly, introductory format. If you’re just getting started with neuroimaging, this is one of the best resources I’ve seen for beginners.

The effects of hardware, software, and operating system on brain imaging results

A recent paper (Gronenschild et al., 2012) has caused a modicum of concern amongst neuroimaging researchers. The paper documents a set of results based on analysis of anatomical MRI images using a popular free software tool called FreeSurfer, and essentially reports that there are (sometimes quite substantive) differences in the results that it produces, depending on the exact version of the software used, and whether the analyses were carried out on a Mac (running OS X) or a Hewlett Packard PC (running Linux). In fact, even the exact version of OS X on the Mac systems was also shown to be important in replicating results precisely.

Figure 3 of Gronenschild et al. (2012) showing the effect of different versions of FreeSurfer on obtained grey-matter volume results. Percentage scale at the top, p-values on the bottom.

The fact that results differ from one version of FreeSurfer to another is perhaps not so surprising – after all, we expect that newer versions of software should be ‘improved’ in important ways, otherwise, what would be the point in releasing them? However, the fact that results differ between operating systems is a little more worrying – in theory any operating system capable of running the software should produce the same result. The authors recommendations are that 1) Researchers should not switch from one version/operating system/platform to another in the middle of a research project, and 2) that when reporting results software version numbers, and the workstation/OS used should all be documented. This seems broadly sensible.

It got me thinking about neuroimaging software more generally as well though. In general, people don’t do detailed evaluations of software of the kind reported by Gronenschild et al. (2012).  As an enthusiastic user of several fMRI-related packages (I’m currently using SPM, FSL and BrainVoyager, all on different projects) I’ve often wondered what the real differences were between them, in terms of the results they produce. Given how many people around the world use brain imaging software, you might think that some detailed evaluations would be floating around, but in fact there are very few.

I think there are several reasons for this:

1. It’s (perhaps understandably) regarded as a waste of time. After all, we (meaning researchers who use this software) are generally more interested in how the brain works, than by how software works. Neuroimaging is difficult and time-consuming and we all need to publish papers to survive – it makes more sense to spend our time on ‘real’ brain-related research.

2. Most people have one (or at most two) pieces of software that they like to use for neuroimaging, and they stick with it; I’m somewhat unusual in this respect. The fact that most people use just one package more-or-less exclusively means there’s a dearth of people who actually have the skills necessary to do cross-evaluation of packages. Again, this is understandable – why take the time to learn a new system, if you’re happy with the one you’re using?

3. The differences between the packages make precise comparison of end-results difficult. Even though all the packages use an application of the General Linear Model for basic analysis, other differences in pre-processing conceivably play a role. For instance, FSL handles the spatial transformation of functional data somewhat differently to other packages.

Having said that, there have been a few papers which have tried to do these kind of evaluations. Two examples are here (on motion correction) and here (on segmentation). Another somewhat instructive paper is this one, which summarises the results of a functional-imaging analysis contest held as part of the Human Brain Mapping meeting in Toronto in 2005; developers of popular neuroimaging software were all given the same set of data and asked to analyse it as best they could. Interesting stuff, but as the contestants all used somewhat different methods to get the most out of the data, it’s hard to draw direct comparisons.

If there’s a moral to this story, it’s that (as the recent Gronenschild et al. paper demonstrates) we need to pay close attention to this kind of thing. As responsible researchers we cannot simply assume our results will be replicable with different hardware and software, and detailed reporting of not just the analysis procedures, but also the tools used to achieve the results seems a simple and robust way of at least acknowledging the issue and enabling more precise replicability. Actually solving the issues involved is a substantially more difficult problem, and may be a job for future generations of researchers and developers.

See also:
My previous post on comparisons of different fMRI software: Herehere and here.
Neuroskeptic has also written a short piece on the recent paper mentioned above.

TTFN.

Video analysis software – Tracker

I just came across a gosh-darn drop-dead cool (and free!) piece of software that I just had to write a quick post on. It’s called Tracker, it’s cross-platform, open-source and freely available here.

In a nutshell, it’s designed for analysis of videos, and can do various things, like track the motion of an object across frames (yielding position, velocity and acceleration data) and generate dynamic RGB colour profiles. Very cool. As an example of the kinds of things it can do, see this post on Wired.com where a physicist uses it to analyse the speed of blaster bolts in Star Wars: Episode IV. Super-geeky I know, but I love it.

An example of some motion analyses conducted using Tracker

Whenever I see a piece of software like this I immediately think about what I could use it for in psychology/neuroscience. In this case, I immediately thought about using it for kinematic analysis – that is, tracking the position/velocity/acceleration of the hand as it performed movements or manipulates objects. Another great application would be for analysis of movie stimuli for use in fMRI experiments. Complex and dynamic movies could be analysed in terms of the movement (or colour) stimuli they contain and measures produced which represent movement across time. Sub-sampled versions of these measures could then be entered into a fMRI-GLM analysis as parametric regressors to examine how the visual cortex responds; with careful selection of stimuli, this could be quite a neat and interesting experiment.

Not sure I’ll ever actually need to use it in my own experiments, but it looks like a really neat piece of software which could be a good solution for somebody with a relevant problem.

TTFN.

The immediate and wide-ranging power of Twitter – discussion with Vaughn Bell and others on fMRI

I’ve only started using Twitter reasonably recently, but I’m finding that it’s very quickly become a near-essential part of my daily online routine, and I’d urge anyone who has a modicum of curiosity to give it a serious try. The ability to connect nigh-instantly to people around the world who share similar interests and occupations is incredible, and the 140-character limit is rarely too limiting once you get used to it. In fact, I’m constantly amazed at the high level of discussion which can be conducted within such a constraint.

As an example, this weekend there was an article in the Observer on fMRI by Vaughan Bell of Mindhacks fame. As i blearily groped my way through my regular Sunday morning routine of a cup of tea followed by several espressos, I was also engaged in an interesting debate about the article with the author (@vaughnbell), Chris Chambers (@chrisdc77), Tom Hartley (@tom_hartley), David Dobbs (@David_Dobbs) and a few others. The discussion was very kindly storify-ed for posterity by @creiner, and you can read it here.

I can’t think of any other way in which this discussion could have happened. I’ve never met Vaughn, or any of the other participants for that matter, and would have no direct way of contacting him/them otherwise. The great power of twitter (for me, anyway) is enabling groups of specialists like this to discuss in a public forum where anyone can chip in. If you’ve been on the fence about Twitter for a while, I urge you to give it a try – the more people who are active users, the richer the discussions will be!

TTFN.

PS. More additional thoughts on the original article can also be found on this blogpost.

UPDATE:
PPS. Tom Hartley has also just posted up a fantastic discussion related to the same issues on his blog too.

More on coding for students, plus a teeny bit of recursive self-indulgence

My lovely, lovely friends at TheNeuronClub have just put up a new post, where they say some very kind things about this blog – thanks guys! However, this post is not just about a bit of self-back-slappery – they also mention a few really great-looking resources to help you get started with programming using Python – if you’re interested in coding, this would be a great place to start. Check out their piece here. You can also (*ahem* self-promotion *ahem*), if you felt so inclined, read my previous guest piece on their site (on whether to average functionally or anatomically in fMRI) here.

TTFN.

SPM vs. FSL by Chris Rorden, and some miscellaneous fMRI tips

I’ve noticed that my previous postings (here and here) about the differences between popular bits of neuroimaging software have been pretty popular in terms of traffic, so I thought I’d just point you towards a similar powerpoint presentation I’ve just found authored by Chris Rorden. It’s fairly brief, but does contain a lot of good information about the key differences between FSL and SPM, particularly in terms of their different approaches to spatial normalisation and  You can download the .ppt file here.

For those who might already be a bit more advanced in their practice of the dark art of fMRI analysis, Chris also has an great set of scripts for SPM8 here, plus of course, his MRICron software is outstandingly useful.

Finally, for FSL users (see – something for everyone at this blog!) I have this little tip, from the mysteriously-named “neuroimager”, which is a fantastic and beautiful method of displaying functional results on a 3D-rendered brain image, using freesurfer.

Just found a new blog…

…by a fMRI researcher in Dundee named Akira O’Connor, find it here. Lots of deeply cool academic/tech-related stuff on there, but in particular I really enjoyed his presentation (done using Prezi) on internet tools for academics, and the post on student e-mail etiquette.

That is all.

Follow

Get every new post delivered to your Inbox.

Join 122 other followers