Category Archives: Neuroimaging

Comment on the Button et al. (2013) neuroscience ‘power-failure’ article in NRN

Statistical Spidey knows the score.


An article was published in Nature Reviews Neuroscience yesterday which caused a bit of a stir among neuroscientists (or at least among neuroscientists on Twitter, anyway). The authors cleverly used meta-analytic papers to estimate the ‘true’ size of an effect, and then (using the G*Power software) calculated the power of each individual study that made up the meta-analysis, based on its sample size. Their conclusions are pretty damning for the field as a whole: an overall power of 21%, dropping to 8% in some sub-fields. This means that out of 100 studies that are conducted into a genuine effect, only 21 will actually demonstrate it.
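To make the idea of per-study power concrete, here is a minimal sketch of the calculation for a simple two-group comparison. It uses a normal approximation rather than the noncentral t-distribution that G*Power uses, so the numbers will differ slightly from that software, and the function name and the example effect size (d = 0.5) are my own assumptions, not values from the paper.

```python
from statistics import NormalDist


def approx_power(effect_size_d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group comparison.

    Normal approximation: an assumption made for simplicity, not the
    exact noncentral-t calculation.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Expected z-score of the group difference for this effect and sample size
    ncp = effect_size_d * (n_per_group / 2) ** 0.5
    # Probability that the test statistic falls beyond either critical value
    return (1 - z.cdf(z_crit - ncp)) + z.cdf(-z_crit - ncp)


# Power for a hypothetical 'true' effect of d = 0.5 at several sample sizes
for n in (10, 20, 50, 100):
    print(n, round(approx_power(0.5, n), 2))
```

A power of 0.21 read off a calculation like this is exactly the "21 out of 100 studies" figure: the proportion of studies of a genuine effect that would be expected to reach significance.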

The article has been discussed and summarised at length by Ed Yong, Christian Jarrett, and by Kate Button (the study’s first author) on Suzy Gage’s Guardian blog, so I’m not going to re-hash it any more here. The original paper is actually very accessible and well-written, and I encourage interested readers to start there. It’s definitely an important contribution to the debate; however (as always) there are alternative perspectives. I generally have a problem with over-reliance on power analyses (they’re often required for grant applications, and other project proposals). Prospective power analyses (i.e. those carried out before a piece of research is conducted, in order to tell you how many subjects you need) use an estimate of the effect size you expect to achieve – usually derived from previous work that has examined a (broadly) similar problem using (broadly) similar methods. This estimate is essentially a wild shot in the dark (especially because of some of the issues and biases discussed by Button et al. that are likely to operate in the literature), and the resulting power analysis therefore tells you (in my opinion) nothing very useful. Button et al. get around this issue by using the effect size from meta-analyses to estimate the ‘true’ effect size in a given literature area – a neat trick.
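A quick sketch shows why the effect-size guess matters so much in a prospective power analysis. It uses the standard normal-approximation formula for a two-group comparison, n per group = 2((z_(1-alpha/2) + z_power) / d)^2; the function name and the candidate effect sizes are assumptions chosen purely for illustration.

```python
from math import ceil
from statistics import NormalDist


def required_n_per_group(effect_size_d, power=0.80, alpha=0.05):
    """Subjects needed per group for a two-sided, two-group comparison.

    Normal approximation; exact t-based calculations give slightly
    larger numbers.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / effect_size_d) ** 2)


# The same study, planned under four different guesses at the effect size
for d in (0.2, 0.3, 0.5, 0.8):
    print(d, required_n_per_group(d))
```

Because the required sample size scales with 1/d², a modest change in the guessed effect size changes the answer several-fold, which is the sense in which a guess taken from a biased literature gives an unreliable plan.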

The remainder of this post deals with power-issues in fMRI, since it’s my area of expertise, and necessarily gets a bit technical. Readers who don’t have a somewhat nerdy interest in fMRI-methods are advised to check out some of the more accessible summaries linked to above. Braver readers – press on!

An alternative approach used in the fMRI field, and one that I’ve been following when planning projects for years, is a more empirical method. Murphy and Garavan (2004) took a large sample of 58 subjects who had completed a Go/No-Go task and analysed sub-sets of different sizes to look at the reproducibility of the results, with different sample sizes. They showed that reproducibility (assessed by correlation of the statistical maps with the ‘gold standard’ of the entire dataset; Fig. 4) reaches 80% at about 24 or 25 subjects. By this criterion, many fMRI studies are underpowered.
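The logic of that sub-sampling approach can be sketched with simulated data. To be clear about the assumptions: this is not Murphy and Garavan's data or pipeline. The 'maps' here are lists of made-up voxel values, the noise level is arbitrary, and simple group-mean maps stand in for their statistical maps.

```python
import random
from statistics import fmean


def pearson(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def group_map(subject_maps):
    """Voxel-wise mean across subjects."""
    n_voxels = len(subject_maps[0])
    return [fmean(s[v] for s in subject_maps) for v in range(n_voxels)]


def reproducibility(subject_maps, subset_size, n_repeats, rng):
    """Mean correlation between random-subset maps and the full-sample map."""
    gold = group_map(subject_maps)  # 'gold standard': everyone included
    scores = []
    for _ in range(n_repeats):
        subset = rng.sample(subject_maps, subset_size)
        scores.append(pearson(group_map(subset), gold))
    return fmean(scores)


# Simulated data: 58 subjects, 200 voxels, true effect plus per-subject noise
rng = random.Random(0)
true_effect = [rng.gauss(0, 1) for _ in range(200)]
subjects = [[e + rng.gauss(0, 4) for e in true_effect] for _ in range(58)]

for n in (5, 10, 25, 40):
    print(n, round(reproducibility(subjects, n, 20, rng), 2))
```

The shape of the curve (steep gains at small sample sizes, flattening later) is the point of the exercise; where it crosses any given threshold depends entirely on the simulated noise level, so the specific numbers should not be read as a recommendation.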

While I like this empirical approach to the issue, there are of course caveats and other things to consider. fMRI is a complex, highly technical research area, and heavily influenced by the advance of technology. MRI scanners have significantly improved in the last ten years, with 32 or even 64-channel head-coils becoming common, faster gradient switching, shorter TRs, higher field strength, and better field/data stability all meaning that the signal-to-noise ratio has improved considerably. This serves to cut down one source of noise in fMRI data – intra-subject variance. The inter-subject variance of course remains the same as it always was, but that’s something that can’t really be mitigated, and may even be of interest in some (between-group) studies. On the analysis side, new multivariate methods are much more sensitive to differences than the standard mass-univariate approach. This improvement in effective SNR means that the Murphy and Garavan (2004) estimate of 25 subjects for 80% reproducibility may be somewhat inflated, and with modern techniques one could perhaps get away with fewer.

The other issue with the Murphy and Garavan (2004) approach is that it’s not very generalisable. The Go/No-Go task is widely used and is a ‘standard’ cognitive/attentional task that activates a well-described brain network, but other tasks may produce more or less activation, in different brain regions. Signal-to-noise varies widely across the brain, and across task-paradigms, with simple visual or motor experiments producing very large signal changes and complex cognitive tasks smaller ones. Further factors are the experimental design (blocked stimuli, or event-related), the overall number of trials/stimuli presented, and the total scanning time for each subject, all of which can vary widely.

The upshot is that there are no easy answers, and this is something I try to impress upon people at every opportunity; particularly the statisticians who read my project proposals and object to me not including power analyses. I think prospective power analyses are not only uninformative, but give a false sense of security, and for that reason should be treated with caution. Ultimately the decision about how many subjects to test is generally highly influenced by other factors anyway (most notably, time, and money). You should test as many subjects as you reasonably can, and regard power analysis results as, at best, a rough guide.

Warhol brains.

Here is a pretty Warhol-esque picture I made using a) my own head, b) a Siemens Verio MRI scanner, c) Osirix and d) GIMP.

(Clicky for bigness)

Free, interactive MRI courses from Imaios.com (plus lots of other medical/anatomy material too)

A very quick post to point you towards a really fantastic set of online, interactive courses on MRI from a website called Imaios.com – a very nice, very slick set of material. The MRI courses are all free, but you’ll need to register to see the animations. Lots of other medical/anatomy-related courses on the site too – some free, some ‘premium’, and some nice looking mobile apps too.

Whither forensic psychology software?

Good nutrition's given you some length of bone.


Forensic and criminal psychology are somewhat odd disciplines; they sit at the cross-roads between abnormal psychology, law, criminology, and sociology. Students seem to love forensic psychology courses, and the number of books, movies, and TV shows which feature psychologists cooperating with police (usually in some kind of offender-profiling manner) attests to the fascination that the general public has for it too. Within hours of the Newtown, CT shooting spree last December, ‘expert’ psychologists were being recruited by the news media to deliver soundbites attesting to the probable mental state of the perpetrator. Whether or not this kind of armchair diagnosis is appropriate or useful (hint: it’s really not), it’s a testament to the acceptance of such ideas within society at large.

Back in the late 80s and early 90s there were two opposing approaches to offender profiling, rather neatly personified by American and British practitioners. A ‘top-down’ (or deductive) approach was developed by the FBI Behavioral Sciences Unit, and involved interviewing convicted offenders, attempting to derive (somewhat subjective) general principles in order to ‘think like a criminal’. By contrast, the British approach (developed principally by David Canter and colleagues) took a much more ‘bottom-up’ (or inductive) approach focused on empirical research, and more precisely quantifiable aspects of criminal behaviour.

Interestingly, the latter approach was ideally suited to standardised analysis methods, and duly spawned a number of computer-based tools. The most prominent among them was a spatial/geographical profiling tool, developed by Canter’s Centre for Investigative Psychology, and named ‘Dragnet’. The idea behind it was relatively simple – that the most likely location of the residence of a perpetrator of a number of similar crimes could be deduced from the locations of the crimes themselves. For example, a burglar doesn’t tend to rob his next-door neighbours, nor does he tend to travel too far from familiar locations to ply his trade – he commits burglaries at a medium distance from home, and generally roughly the same distance. General caution might also prevent him from returning to the exact same location twice, so an idealised pattern of burglary might include a central point (the perpetrator’s home) with a number of crime locations forming the points of a circle around it. For an investigator, of course, the location of the central point isn’t known a priori; however, it can easily be deduced simply by looking at the size and shape of the circle.
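For the idealised circular pattern described above, the deduction is a couple of lines of arithmetic. The sketch below is only an illustration of that toy case, not Dragnet's actual algorithm (which I'm not reproducing here); the function name, the coordinates, and the use of a plain centroid are all my assumptions.

```python
from math import cos, hypot, pi, sin
from statistics import fmean


def estimate_home(crime_sites):
    """Estimate a central point and typical travel distance from crime sites.

    Uses the centroid of the sites, which recovers the centre only when
    the sites are spread fairly evenly around it.
    """
    cx = fmean(x for x, _ in crime_sites)
    cy = fmean(y for _, y in crime_sites)
    radius = fmean(hypot(x - cx, y - cy) for x, y in crime_sites)
    return (cx, cy), radius


# Hypothetical offender living at (3, 4) who offends 2 km from home,
# at six evenly spaced points around a circle
home = (3.0, 4.0)
angles = [k * 2 * pi / 6 for k in range(6)]
sites = [(home[0] + 2 * cos(a), home[1] + 2 * sin(a)) for a in angles]

centre, radius = estimate_home(sites)
print(centre, radius)
```

If the crimes cluster on one side of the home (say, because a river or motorway blocks the other side), the centroid is pulled towards the cluster, which is one reason the real tools need more than this.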


In practice of course, it’s never this neat, but modern techniques incorporate various other features (terrain, social geography, etc.) to build statistical models and have met with some success. Ex-police officer Kim Rossmo has been the leading figure in geographic profiling in recent years, and founded the Center for Geospatial Intelligence and Investigation at Texas State University.

Software like this seems like it should be useful, but by and large has failed to deliver on its promises in a major way. At one point it was thought that the future police service would incorporate these tools (and others) routinely in order to solve, and perhaps even predict, crimes. With the sheer amount and richness of data available on the general populace (through online search histories, social networking sites, insurance company/credit card databases, CCTV images, mobile-phone histories, licence-plate-reading traffic cameras, etc. etc.) and on urban environments (e.g. Google Maps), you might expect that crime-solving software would now be highly developed, and use all these sources of information. However, it seems to have largely stalled in recent years; the Centre for Investigative Psychology’s website has clearly not been updated in several years, and it seems no-one has even bothered producing versions of their software for modern operating systems.

Some others seem to be pursuing similar ideas with more modern methods (e.g. this company), yet still we’re nowhere near any kind of system like the (fictional) one portrayed in the TV series ‘Person of Interest’, which can predict crimes by analysis of CCTV footage and behaviour patterns derived therefrom. Whether or not this will ever be possible, there is certainly relevant data out there, freely accessible to law-enforcement agencies; the issue is building the right kind of data-mining algorithms to make sense of it all – clearly, not a trivial endeavour.

Something that will undoubtedly help is the fairly recent development of pretty sophisticated facial recognition technology. Crude face-recognition technology is now embedded in most modern digital cameras, can be used as ID-verification (i.e. instead of a passcode) to unlock smartphones, and is used for ‘tagging’ pictures on websites like Facebook and Flickr. Researchers have been rapidly refining the techniques, including some very impressive methods of generating interpolated high-resolution images from low-quality sources (this paper describes an impressive ‘face hallucination’ method; PDF here). These advancements, while impressive, are essentially a somewhat dry problem in computer vision; there’s no real ‘psychology’ involved here.

'Face hallucination' -  Creating high quality face images from low-resolution inputs, by using algorithms with prior information about typical facial features.


One other ‘growth area’ in criminal/legal psychology over the last few years has been in fMRI lie-detection. Two companies (the stupidly-or-maybe-ingeniously-named No Lie MRI, and Cephos) have been aggressively pushing for their lie-detection procedures to be introduced as admissible evidence in US courts. So far they’ve only had minor success, but frankly, it’s only a matter of time. Most serious commentators (e.g. this bunch of imaging heavy-hitters) still strike an extremely cautious tone on such technologies, but they may be fighting a losing battle.

Despite these two very technical areas then, in general, the early promise of a systematic scientific approach to forensic psychology that could be instantiated in formal systems has not been fulfilled. I’m not sure if this is because of a lack of investment, expertise, interest, or just because the problem turned out to be substantively harder to address than people originally supposed. There is an alternative explanation of course – that governments and law enforcement agencies have indeed developed sophisticated software that ties together all the major databases of personal information, integrates it with CCTV and traffic-camera footage, and produces robust models of the behaviour of the general public, both as a whole, and at an individual level. A conspiracy theorist might suppose that if such a system existed, information about it would have to be suppressed, and that’s the likely reason for the apparent lack of development in this area in recent years. Far-fetched? Maybe.

TTFN, and remember – they’re probably (not?) watching you…

 

New ‘Links’ page

Just a quick notification to say that I’ve just put up a ‘Links’ page, accessible from the top-level menu on this site, or by clicking here. There’s a couple of hundred categorised and (more or less) colour-coded links there, all more-or-less relevant to psychology and/or computing. Hope it’s useful to someone, because it took me bloody ages… ;o)

More to come on the links page as I find more stuff/get around to it.

TTFN.

How to pilot an experiment

I got a serious question for you: What the fuck are you doing? This is not shit for you to be messin’ with. Are you ready to hear something? I want you to see if this sounds familiar: any time you try a decent crime, you got fifty ways you’re gonna fuck up. If you think of twenty-five of them, then you’re a genius… and you ain’t no genius.
Body Heat (1981, Lawrence Kasdan)

To consult the statistician after an experiment is finished is often merely to ask him to conduct a post-mortem examination. He can perhaps say what the experiment died of.
R.A. Fisher (1938)

Don’t crash and burn your experiment.

Doing a pilot run of a new psychology experiment is vital. No matter how well you think you’ve designed and programmed your task, there are (almost) always things that you didn’t think of. Going ahead and spending a lot of time and effort collecting a set of data without running a proper pilot is (potentially) a recipe for disaster. Several times I’ve seen data-sets where there was some subtle issue with the data logging, or the counter-balancing, or something else, which meant that the results were, at best, compromised, and at worst completely useless.

All of the resultant suffering, agony, and sobbing could have been avoided by running a pilot study in the right way. It’s not sufficient to run through the experimental program a couple of times; a comprehensive test of an experiment has to include a test of the analysis as well. This is particularly true of any experiment involving methods like fMRI/MEG/EEG where a poor design can lead to a data-set that’s essentially uninterpretable, or perhaps even un-analysable. You may think you’ve logged all the data variables you think you’ll need for the analysis, and your design is a work of art, but you can’t be absolutely sure unless you actually do a test of the analysis.

This might seem like over-kill, or a waste of effort. However, you’re going to have to design your analysis at some point anyway, so why not do it at the beginning? Analyse your pilot data in exactly the way you’re planning on analysing your main data, save the details (using SPSS syntax, R code, SPM batch jobs – or whatever you’re using) and when you have your ‘proper’ data set, all you’ll (in theory) have to do is plug it in to your existing analysis setup.

These are the steps I normally go through when getting a new experiment up and running. Not all will be appropriate for all experiments, your mileage may vary etc. etc.

1. Test the stimulus program. Run through it a couple of times yourself, and get a friend/colleague to do it once too, and ask for feedback. Make sure it looks like it’s doing what you think it should be doing.

2. Check the timing of the stimulus program. This is almost always essential for an fMRI experiment, but may not be desperately important for some kinds of behavioural studies. Run through it with a stopwatch (the stopwatch function on your ‘phone is probably accurate enough). If you’re doing any kind of experiment involving rapid presentation of stimuli (visual masking, RSVP paradigms) you’ll want to do some more extensive testing to make sure your stimuli are being presented in the way that you think – this might involve plugging a light-sensitive diode into an oscilloscope, sticking it to your monitor with a bit of blu-tack and measuring the waveforms produced by your stimuli. For fMRI experiments the timing is critical. Even though the Haemodynamic Response Function (HRF) is slow (and somewhat variable) you’re almost always fighting to pull enough signal out of the noise, so why introduce more? A cumulative error of only a few tens of milliseconds per trial can mean that your experiment is a few seconds out by the end of a 10 minute scan – this means that your model regressors will be way off – and your results will likely suck.*

3. Look at the behavioural data files. I don’t mean do the analysis (yet), I mean just look at the data. First make sure all the variables you want logged are actually there, then dump it into Excel and get busy with the sort function. For instance, if you have 40 trials and 20 stimuli (each presented twice) make sure that each one really is being presented twice, and not some of them once, and some of them three times; sorting by the stimulus ID should make it instantly clear what’s going on. Make sure the correct responses and any errors are being logged correctly. Make sure the counter-balancing is working correctly by sorting on appropriate variables.

4. Do the analysis. Really do it. You’re obviously not looking for any significant results from the data, you’re just trying to validate your analysis pipeline and make sure you have all the information you need to do the stats. For fMRI experiments – look at your design matrix to see that it makes sense and that you’re not getting warnings about non-orthogonality of the regressors from the software. For fMRI data using visual stimuli, you could look at some basic effects (i.e. all stimuli > baseline) to make sure you get activity in the visual cortex. Button-pushing responses should also be visible as activity in the motor cortex in a single subject too – these kinds of sanity checks can be a good indicator of data quality. If you really want to be punctilious, bang it through a quick ICA routine and see if you get a) component(s) that look stimulus-related, b) something that looks like the default-mode network, and c) any suspiciously nasty-looking noise components (a and b = good, c = bad, obviously).

5. After all that, the rest is easy. Collect your proper set of data, analyse it using the routines you developed in point 4. above, write it up, and then send it to Nature.
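Two of the checks in the list above (the cumulative timing error in step 2 and the presentation counts in step 3) are easy to script rather than eyeball. This is a minimal sketch: the function names, the `stimulus_id` field, the stimulus labels, and the 3-second trial length are all hypothetical and would need to match whatever your own stimulus software logs.

```python
from collections import Counter


def cumulative_drift_seconds(error_ms_per_trial, n_trials):
    """Total timing error by the end of a run, if the error adds up each trial."""
    return error_ms_per_trial * n_trials / 1000.0


def miscounted_stimuli(trial_log, expected_stimuli, expected_count):
    """Return {stimulus: actual count} for anything not shown the expected number of times."""
    counts = Counter(row["stimulus_id"] for row in trial_log)
    # Counter returns 0 for stimuli that never appear, so omissions are caught too
    return {s: counts[s] for s in expected_stimuli if counts[s] != expected_count}


# Step 2: a 10-minute scan with one trial every 3 s is 200 trials;
# a 20 ms error on each adds up to several seconds by the end
drift = cumulative_drift_seconds(20, 200)
print(drift)

# Step 3: 20 stimuli that should each appear twice, with a deliberate bug
# where one presentation of s07 has been replaced by an extra s03
stimuli = [f"s{i:02d}" for i in range(1, 21)]
ids = stimuli * 2
ids[ids.index("s07")] = "s03"
log = [{"stimulus_id": s} for s in ids]

problems = miscounted_stimuli(log, stimuli, expected_count=2)
print(problems)
```

An empty result from the second function is what you want to see; anything else tells you which stimuli to go and look at before collecting real data.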

And that, ladeez and gennulmen, is how to do it. Doing a proper pilot can only save you time and stress in the long run, and you can go ahead with your experiment in the certain knowledge that you’ve done everything in your power to make sure your data is as good as it can possibly be. Of course, it still might be total and utter crap, but that’ll probably be your participants’ fault, not yours.

Happy piloting! TTFN.

*Making sure your responses are being logged with a reasonable level of accuracy is also pretty important for many experiments, although this is a little harder to objectively verify. Hopefully if you’re using some reasonably well-validated piece of software and decent response device you shouldn’t have too many problems.

More useful links… Open Sesame, the psychology of email, Inkscape, and others.

Another quickie post (it’s been ages since I’ve written anything substantive I know, bear with me just a little while longer…) with some links-of-interest for you.

First up is Open Sesame – this is an experiment-builder application with a nice graphical front-end, which also supports scripting in Python – nice. Looks like a possible alternative to PsychoPy with a fair few similar features. Also, it’s cross-platform, open-source and free – my three favourite things!

Next up is Inkscape – this is a free vector graphics editor (or drawing package), with similar features to Adobe Illustrator or Corel Draw. I tend to use Adobe Illustrator for a few specialised tasks, such as making posters for conferences, and this looks like a potentially really good free alternative.

Neuroimaging Made Easy is a blog I found a while ago that I’ve been meaning to share; it’s mostly a collection of tips and downloadable scripts to accomplish fairly specific tasks. They’re all pretty much optimised for Mac users (using AppleScript) and people who use BrainVoyager or FSL for their neuroimaging – SPM users are likely to be disappointed here (but they’re pretty used to that anyway, right?! Heh…). Really worth digging through the previous posts if you fall in the right segments of that Venn diagram though – I’ve been using a couple of their scripts for a while now.

Penultimately, I thought this recent article on Mind Hacks was really terrific – titled: “Psychological self-defence for the age of email”. It covers several relevant psychological principles and shows how they can be used to better cope with the onslaught of e-mail that many of us are often buried under.

Lastly, I hope you’ll pardon a modicum of self-promotion, but I recently did an interview over Skype with the lovely Ben Thomas of http://the-connectome.com/. Unfortunately the Skype connection between London and Los Angeles was less than perfect which meant he couldn’t put it up as a podcast, but he heroically transcribed it instead – if you are so inclined, you can read it here.

TTFN.

Some mild pimpage about the Channel 4 program on MDMA: Drugs Live

So, there’s been a bit of press recently about an upcoming (UK) Channel 4 program called Drugs Live. The show will be broadcast next week, on Wednesday and Thursday (that’s the 26th and 27th of September) at 10pm. The reason I’m mentioning it here is because for the last 9 months or so I’ve been heavily involved in an experiment which has involved MRI-scanning volunteers while they’re under the influence of a dose of MDMA, commonly known as ecstasy, and this is what the program will substantially focus on. I’ve been a collaborator on the project, helping out with bits of task-programming, scanning and analysis of data, but the real stars are the project leaders Prof. David Nutt, Prof. Val Curran and Dr Robin Carhart-Harris. I do have to admit to a little ‘squee!’ of excitement when I saw this article on the Guardian website (that’s me in the picture! On the left! Squeee!).

So… if you’re in the UK, be sure to tune in next Wednesday/Thursday for the program. There’ll be a live panel discussion hosted by the always interestingly be-socked-and-tied Jon Snow of Channel 4 News, presentation of some of the results from the experiments and ooh… all kinds of other interesting things. Also, there was a fascinating edition of the (always excellent) BBC radio program ‘The Life Scientific’ with Jim Al-Khalili interviewing David Nutt, where he talks about the current research at one point; for anyone interested, it’s well worth a listen. Available on the BBC iPlayer here.

For those outside the UK – you may well be out of luck, I’ve no idea if the program will ever be ‘properly’ broadcast anywhere else. Some altruistic soul might record it and put it up on a torrent site I suppose, but I certainly couldn’t endorse anyone downloading it from an illegal source (*cough*).

More UK press:

The BBC

The Mirror

The Metro (Can’t believe something I’m involved in is in the Metro – this is the absolute pinnacle of my scientific career – it’s all downhill from here.)

Wired (This is a cool article with some other fun videos of people taking drugs on camera.)

Mixmag (Yes! Mixmag! Hahahahaha… *dies laughing*)

And for the sake of balance, here’s a fairly negative take from The Evening Standard (Headline ‘Are they raving mad?’ Good one guys. How long did it take you to come up with that?)

And finally, the Channel 4 trailer for the program:

So… Channel 4 are obviously taking it very seriously and not sensationalising it at all. *Sigh* Don’t forget – next Wednesday/Thursday! 10pm! Channel 4! Be there, or be… I dunno… in the pub?

Oh, and if anyone wants to update my IMDB page for me after the program, that’d be great. Ta.

Bye for now, my lovelies *air kiss, flounces off*.

Two like, *totes* awesome websites: ViperLib and mindhive

I’ve come across a couple more web-links which I thought were important enough to share with you straight away rather than saving them up for a massive splurge of links.

The first is ViperLib, a site which focusses (geddit?) on visual perception and is run by Peter Thompson and Rob Stone of the University of York, with additional input (apparently) from Barry the snake. This is essentially a library of images and movies related to vision science, and currently contains a total of 1850 images – illusions, brain scans, anatomical diagrams, and much more. Registration is required to view the images, but it’s free and easily done, and I would encourage anyone to spend an hour or so of their time poking around amongst the treasures there. I shall be digging through my old hard drives when I get a chance and contributing some optic-flow stimuli from my old vision work to the database.

The second is for the (f)MRI types out there; a fantastic ‘Imaging Knowledge Base’ from the McGovern Institute for Brain Research at MIT. The page has a huge range of great information about fMRI design and analysis, from the basics of Matlab, to how to perform ROI analyses, and all presented in a very friendly, introductory format. If you’re just getting started with neuroimaging, this is one of the best resources I’ve seen for beginners.

The effects of hardware, software, and operating system on brain imaging results

A recent paper (Gronenschild et al., 2012) has caused a modicum of concern amongst neuroimaging researchers. The paper documents a set of results based on analysis of anatomical MRI images using a popular free software tool called FreeSurfer, and essentially reports that there are (sometimes quite substantive) differences in the results that it produces, depending on the exact version of the software used, and whether the analyses were carried out on a Mac (running OS X) or a Hewlett Packard PC (running Linux). In fact, even the exact version of OS X on the Mac systems was also shown to be important in replicating results precisely.

Figure 3 of Gronenschild et al. (2012) showing the effect of different versions of FreeSurfer on obtained grey-matter volume results. Percentage scale at the top, p-values on the bottom.

The fact that results differ from one version of FreeSurfer to another is perhaps not so surprising – after all, we expect that newer versions of software should be ‘improved’ in important ways, otherwise, what would be the point in releasing them? However, the fact that results differ between operating systems is a little more worrying – in theory any operating system capable of running the software should produce the same result. The authors’ recommendations are that 1) researchers should not switch from one version/operating system/platform to another in the middle of a research project, and 2) when reporting results, software version numbers and the workstation/OS used should all be documented. This seems broadly sensible.
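The second recommendation is cheap to automate. Here is a minimal sketch using Python's standard `platform` module to capture the operating system and hardware details alongside an analysis; the function name and record layout are my own, and the tool version strings are placeholders to be filled in by hand (or from each package's own version report), since I'm not assuming any particular way of querying them.

```python
import json
import platform
from datetime import date


def environment_record(tool_versions):
    """Collect OS, hardware and tool-version details for a methods section."""
    return {
        "date": date.today().isoformat(),
        "os": platform.system(),           # e.g. 'Darwin' or 'Linux'
        "os_release": platform.release(),  # kernel/OS release string
        "machine": platform.machine(),     # CPU architecture
        "python": platform.python_version(),
        "tools": dict(tool_versions),      # entered manually: placeholders here
    }


record = environment_record({"FreeSurfer": "x.y.z (fill in)"})
print(json.dumps(record, indent=2))
```

Saving a record like this next to the results at the start of a project also makes it obvious if the environment has changed part-way through, which is the situation the first recommendation warns against.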

It got me thinking about neuroimaging software more generally as well though. In general, people don’t do detailed evaluations of software of the kind reported by Gronenschild et al. (2012).  As an enthusiastic user of several fMRI-related packages (I’m currently using SPM, FSL and BrainVoyager, all on different projects) I’ve often wondered what the real differences were between them, in terms of the results they produce. Given how many people around the world use brain imaging software, you might think that some detailed evaluations would be floating around, but in fact there are very few.

I think there are several reasons for this:

1. It’s (perhaps understandably) regarded as a waste of time. After all, we (meaning researchers who use this software) are generally more interested in how the brain works than in how software works. Neuroimaging is difficult and time-consuming and we all need to publish papers to survive – it makes more sense to spend our time on ‘real’ brain-related research.

2. Most people have one (or at most two) pieces of software that they like to use for neuroimaging, and they stick with it; I’m somewhat unusual in this respect. The fact that most people use just one package more-or-less exclusively means there’s a dearth of people who actually have the skills necessary to do cross-evaluation of packages. Again, this is understandable – why take the time to learn a new system, if you’re happy with the one you’re using?

3. The differences between the packages make precise comparison of end-results difficult. Even though all the packages use an application of the General Linear Model for basic analysis, other differences in pre-processing conceivably play a role. For instance, FSL handles the spatial transformation of functional data somewhat differently to other packages.

Having said that, there have been a few papers which have tried to do this kind of evaluation. Two examples are here (on motion correction) and here (on segmentation). Another somewhat instructive paper is this one, which summarises the results of a functional-imaging analysis contest held as part of the Human Brain Mapping meeting in Toronto in 2005; developers of popular neuroimaging software were all given the same set of data and asked to analyse it as best they could. Interesting stuff, but as the contestants all used somewhat different methods to get the most out of the data, it’s hard to draw direct comparisons.

If there’s a moral to this story, it’s that (as the recent Gronenschild et al. paper demonstrates) we need to pay close attention to this kind of thing. As responsible researchers we cannot simply assume our results will be replicable with different hardware and software, and detailed reporting of not just the analysis procedures, but also the tools used to achieve the results seems a simple and robust way of at least acknowledging the issue and enabling more precise replicability. Actually solving the issues involved is a substantially more difficult problem, and may be a job for future generations of researchers and developers.

See also:
My previous posts on comparisons of different fMRI software: Here, here and here.
Neuroskeptic has also written a short piece on the recent paper mentioned above.

TTFN.
