Data-mining in neuroscience – the next great frontier?
The really-very-excellent Ben Thomas (of The Connectome) recently posted something on Facebook which got me thinking; it was a link to a project called NeuroSynth, an ongoing collaboration between several high-profile brain researchers and groups (details here) that provides an easy method for performing automated large-scale analyses (or meta-analyses) across a large portion of the neuroimaging literature. Briefly, the builders of this system have developed a way of automatically parsing the full text of published articles and extracting 1) the parts of the brain reported as active (via the commonly-used 3-axis coordinate system) and 2) the topic of the paper (by looking at which terms occur with high frequency in its text). Using these two bits of information, a huge meta-analysis is then conducted, producing brain maps of the areas reliably associated with particular terms in the literature. Wonderfully, they’ve made the brain maps available on the web, and you can even download these maps in the standard NIFTI (*.nii) format.
Give it a try with some common terms.
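To get a feel for the parsing approach, here’s a toy sketch (in Python, and nothing to do with NeuroSynth’s actual code) of the two extraction steps described above: pulling (x, y, z) coordinate triples out of a paper’s text with a regular expression, and tagging the paper by its high-frequency terms:

```python
import re
from collections import Counter

# Toy stand-in for the full text of a paper; NeuroSynth parses real
# articles, so this only illustrates the general idea, not its pipeline.
paper_text = """
Activation was observed in the left amygdala (-24, -2, -18) and in the
anterior cingulate (2, 34, 22) during the emotion task. Emotion-related
responses in the amygdala were stronger for fearful faces.
"""

# 1) Extract reported activation peaks as (x, y, z) coordinate triples.
coord_pattern = re.compile(r"\((-?\d+),\s*(-?\d+),\s*(-?\d+)\)")
coordinates = [tuple(map(int, m)) for m in coord_pattern.findall(paper_text)]

# 2) Tag the paper by its high-frequency terms (crude word counting).
words = re.findall(r"[a-z]+", paper_text.lower())
term_counts = Counter(w for w in words if len(w) > 4)  # skip short words

print(coordinates)  # [(-24, -2, -18), (2, 34, 22)]
```

The real system obviously has to cope with tables, figure captions and messy PDF text, but the basic coordinate-plus-terms representation is this simple.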
Fun, huh? One of the best applications that immediately springs to mind is using these brain maps to constrain the search-space in new brain-imaging experiments – for instance, to define ROIs for hypothesis-driven analyses (something which I’m very keen on), or to define regions for multi-voxel pattern analysis (MVPA).
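As a rough illustration of the ROI idea, here’s a sketch using a synthetic NumPy array in place of a real downloaded map (with an actual *.nii file you’d first load the image, e.g. with nibabel, and the threshold of 3.0 is just an arbitrary example value):

```python
import numpy as np

# Synthetic 3D "association map" standing in for a downloaded NeuroSynth
# image; with real data you'd load the *.nii file first, e.g. via
# nibabel: zmap = nibabel.load("some_map.nii").get_fdata()
rng = np.random.default_rng(0)
zmap = rng.normal(0.0, 1.0, size=(8, 8, 8))
zmap[2:4, 2:4, 2:4] = 5.0          # a small cluster of strong association

# Threshold the map into a binary ROI mask for hypothesis-driven analysis.
roi_mask = zmap > 3.0

# Extract a subject's data within the ROI, e.g. as MVPA features or to
# average into a single ROI summary statistic.
subject_data = rng.normal(0.0, 1.0, size=zmap.shape)
roi_values = subject_data[roi_mask]        # one value per ROI voxel
roi_mean = roi_values.mean()

print(int(roi_mask.sum()))   # number of voxels in the ROI
```

Because the NeuroSynth maps are independent of your own data, masks defined this way sidestep the circularity worries that come with defining ROIs from the same data-set you then analyse.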
However, it’s the general approach used here which I think is more interesting. This kind of large-scale analysis of meta-data derived from a large number of individual research projects is quite a new thing in neuroscience, and like most new developments it’s only been enabled relatively recently by advances in data storage, algorithms, and statistical methods. Data-mining as a discipline of computer science is a rapidly developing field that’s still, to an extent, defining its own boundaries; the general idea, however, is to use ‘intelligent’ algorithms to interrogate large data-sets in order to generate new results or conclusions that wouldn’t otherwise have been evident. It has potential applications wherever large-scale systems generate a lot of data – for example agricultural and meteorological modelling, surveillance (textual analysis of e-mail and SMS messages, among other things), and the analysis of various aspects of customer behaviour.
Modern neuroscience methods generate huge volumes of data – often many gigabytes per subject – and also many different types of data (fMRI scans, PET scans, EEG data, spectroscopy, etc.). Since the brain seems to operate at multiple spatial and temporal scales, all of these data sources are potentially of use, and a full understanding of ‘how the brain works’ is probably only possible by synthesising information across individual studies and methods. These (semi-)intelligent data-mining approaches might well be of use in providing the synthesis and overview that we need in order to really come to strong conclusions about what’s going on in the brain.
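As a minimal example of what cross-study synthesis can look like numerically, here’s one classic trick – Stouffer’s method – for pooling z-scores from several independent studies (the numbers below are made up) into a single combined statistic:

```python
import math

# Hypothetical z-scores for the same effect from five independent studies;
# Stouffer's method pools them: z_combined = sum(z_i) / sqrt(k).
study_z = [1.8, 2.1, 1.2, 2.5, 1.6]

combined_z = sum(study_z) / math.sqrt(len(study_z))

# Two-tailed p-value for the combined z, via the normal CDF (using erf).
p_value = 2 * (1 - 0.5 * (1 + math.erf(combined_z / math.sqrt(2))))

print(round(combined_z, 3))  # 4.114
```

Five individually modest effects combine into very strong pooled evidence – which is exactly the appeal of meta-analytic approaches like NeuroSynth’s (though its actual statistics are more sophisticated than this sketch).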
Fortunately, there are lots of initiatives currently underway in this direction. This neuroimaging paper-with-the-most-authors-ever and the associated 1000 Functional Connectomes Project outlined a proposal for a ‘discovery science of human brain function’ based on analyses of resting-state fMRI data from multiple labs. An even more ambitious project is the NIH’s Human Connectome Project, which aims to map the human connectome by integrating across several different kinds of data sources (Diffusion Tensor Imaging, RS-fMRI, task-fMRI, EEG, MEG). Bradley and Jessica Voytek’s ‘BrainSCANr’ website seeks to identify relationships between concepts by analysing their co-occurrence in the literature, and presents the results with some very nice interactive graphics. The OpenfMRI project aims to provide a method for fMRI data to be shared and made publicly accessible (though as yet there seem to be relatively few data-sets available).
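The co-occurrence idea behind BrainSCANr can be sketched in a few lines – this toy example (my own, not theirs) simply counts how often pairs of terms appear together in a handful of invented ‘abstracts’:

```python
from collections import Counter
from itertools import combinations

# Invented abstracts and a small term list; BrainSCANr mines real
# publication data, so this is only a sketch of the co-occurrence idea.
abstracts = [
    "the amygdala and hippocampus support emotional memory",
    "hippocampus activity predicts memory performance",
    "amygdala responses during fear conditioning",
]
terms = ["amygdala", "hippocampus", "memory", "fear"]

# Count each unordered pair of terms once per abstract they share.
cooccurrence = Counter()
for text in abstracts:
    present = [t for t in terms if t in text]
    for pair in combinations(present, 2):
        cooccurrence[pair] += 1

print(cooccurrence[("hippocampus", "memory")])  # 2
```

Pairs that co-occur far more often than their individual frequencies would predict are candidate ‘relationships between concepts’ – and conspicuous non-co-occurrences can suggest unexplored questions.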
Most of these resources are at a relatively early stage of development, and it’s tremendously exciting to imagine how they might grow and what novel kinds of results they might generate in the future. As more (and different kinds of) data are added to the databases, and as algorithmic and statistical analysis methods develop, it’s almost certain that these kinds of approaches will produce results of great value and interest. In general, as a scientist I often feel awash and adrift in data, papers and results, with little opportunity to step back and try to come to some kind of overview. These methods might provide a convenient way for all researchers to do just that, and perhaps discover something new as well.
Posted on November 19, 2011, in Cool new tech, Internet, Programming, Software and tagged collaboration, connectome, connectomics, data, data-mining, database, EEG, fMRI, MEG, Neuroscience, PET, resting-state, web.