Following on from fMRI in 1000 words, which seemed to go down well, here’s the next step: how to analyze the data.
There are many software packages available for fMRI analysis, such as FSL, SPM, AFNI, and BrainVoyager. The following principles, however, apply to most. The first step is pre-processing, which involves:
Motion Correction – over the course of the experiment, subjects often move their heads slightly; during realignment, all of the volumes are automatically adjusted to compensate for this movement.
Smoothing – all MRI signals contain some degree of random noise. During smoothing, the image of the whole brain is blurred, which tends to smooth out random fluctuations. The degree of smoothing is given by the “Full Width at Half Maximum” (FWHM) of the Gaussian smoothing kernel; between 5 and 8 mm is most common (a worked FWHM conversion appears in the sketch after this list).
Spatial Normalization – everyone’s brain has a unique shape and size. In order to compare activations between two or more people, you need to eliminate these differences. Each subject’s brain is warped so that it fits a standard template (the Montreal Neurological Institute, or MNI, template is the most popular).
Other techniques are also sometimes used, depending on the user’s preference and the software package.
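To make the FWHM idea concrete, here is a minimal sketch of Gaussian smoothing in Python. The numbers are illustrative assumptions, not the defaults of any particular package; the key step is converting a FWHM in millimetres into the kernel’s standard deviation in voxels.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

# Hypothetical volume: 64 x 64 x 40 voxels of made-up signal
volume = np.random.randn(64, 64, 40)

fwhm_mm = 6.0        # target smoothing, within the common 5-8 mm range
voxel_size_mm = 3.0  # assumed isotropic voxel size

# FWHM and the Gaussian's standard deviation are related by
# FWHM = sigma * 2 * sqrt(2 * ln 2), i.e. roughly sigma * 2.355
sigma_voxels = (fwhm_mm / voxel_size_mm) / (2 * np.sqrt(2 * np.log(2)))

smoothed = gaussian_filter(volume, sigma=sigma_voxels)
```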
Then the real fun begins: the stats. By far the most common statistical approach for detecting task-related neural activation is that based upon the General Linear Model (GLM), though there are alternatives.
We first need to define a model of what responses we’re looking for, which makes predictions as to what the neural signal should look like. The simplest model would be that the brain is more active at certain times, say, when a picture is on the screen. So our model would simply be a record of when the stimulus was on the screen. This is called a “boxcar” function (guess why).
In fact, we know that the neural response has a certain time lag. So we can improve our model by adding the canonical (meaning “standard”) haemodynamic response function (HRF).
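Continuing the sketch, a canonical double-gamma HRF can be written down directly (the parameters below are the widely used SPM-style defaults, giving a peak around 5 s and an undershoot around 15 s) and convolved with the boxcar:

```python
from scipy.stats import gamma

# Canonical double-gamma HRF sampled at the TR: a positive peak
# around 5 s followed by a smaller undershoot around 15 s
t = np.arange(0, 32, tr)
hrf = gamma.pdf(t, 6) - gamma.pdf(t, 16) / 6.0
hrf /= hrf.sum()

# The predicted BOLD signal is the boxcar convolved with the HRF
predicted = np.convolve(boxcar, hrf)[:n_scans]
```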
Now consider a single voxel. The MRI signal in this voxel (the brightness) varies over time. If there were no particular neural activation in this area, we’d expect the variation to be purely noise.
Now suppose that this voxel was responding to a stimulus present from time-point 40 to 80. While the signal is on average higher during this period of activation, there’s still a lot of noise, so the data doesn’t fit with the model exactly.
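Continuing the toy example, those two situations might be simulated like this (the baseline, amplitude, and noise level are all arbitrary):

```python
rng = np.random.default_rng(0)

# A voxel with no task-related activity: just a noisy baseline
noise_voxel = 100 + rng.normal(0, 2, n_scans)

# A responding voxel: the same kind of noise, plus a scaled copy
# of the predicted response to the stimulus at time-points 40-80
active_voxel = 100 + 3.0 * predicted + rng.normal(0, 2, n_scans)
```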
The GLM is a way of asking, for each voxel, how closely it fits a particular model. It estimates a parameter, β, representing the “goodness-of-fit” of the model at that voxel, relative to noise: the higher the β, the better the fit. Note that a model could be more complex than the one above. For example, we could have two kinds of pictures, Faces and Houses, presented on the screen at different times.
In this case, we are estimating two β scores for each voxel, β-faces and β-houses. Each stimulus type is called an explanatory variable (EV). But how do we decide which β scores are high enough to qualify as “activations”? Just by chance, some voxels which contain pure noise will have quite high β scores (even a stopped clock’s right twice a day!)
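In code, fitting a GLM at one voxel is just least squares on a design matrix with one column per EV. Here the “Houses” timings are invented, and a constant column models the baseline; this continues the running example above.

```python
# Second EV: a differently-timed boxcar for "Houses", convolved
# with the same HRF (timings invented for illustration)
houses_box = np.zeros(n_scans)
houses_box[90:110] = 1.0
houses_pred = np.convolve(houses_box, hrf)[:n_scans]

# Design matrix: Faces EV, Houses EV, constant baseline
X = np.column_stack([predicted, houses_pred, np.ones(n_scans)])

# Ordinary least squares: the betas that best fit this voxel's data
beta, _, _, _ = np.linalg.lstsq(X, active_voxel, rcond=None)
beta_faces, beta_houses = beta[0], beta[1]
```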
The answer is to calculate the t score, which for each voxel is β divided by the standard error of that β estimate (a measure of how uncertain it is, given the noise). The higher the t score, the more unlikely it is that the model would fit that well by chance alone. It’s conventional, finally, to convert the t score into the closely related z score.
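A minimal sketch of that calculation, under the usual GLM assumptions and continuing the example above:

```python
from scipy import stats

# Residual variance left over after the model fit
dof = n_scans - X.shape[1]
sigma2 = np.sum((active_voxel - X @ beta) ** 2) / dof

# The standard error of beta_faces follows from the design matrix
XtX_inv = np.linalg.inv(X.T @ X)
se_faces = np.sqrt(sigma2 * XtX_inv[0, 0])

t_faces = beta_faces / se_faces
# Convert t to the equivalent z by matching tail probabilities
z_faces = stats.norm.isf(stats.t.sf(t_faces, dof))
```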
We therefore end up with a map of the brain in terms of z. z is a statistical parameter, so fMRI analysis is a form of statistical parametric mapping (even if you don’t use the “SPM” software!) Higher z scores mean more likely activation.
Note also that we are often interested in the difference or contrast between two EVs. For example, we might be interested in areas that respond to Faces more than Houses. In this case, rather than comparing β scores to zero, we compare them to each other – but we still end up with a z score. In fact, even an analysis with just one EV is still a contrast: it’s a contrast between the EV, and an “implicit baseline”, which is that nothing happens.
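In the GLM this is expressed as a contrast vector weighting the betas. A sketch for “Faces greater than Houses”, reusing the fit above:

```python
# Contrast "Faces > Houses": +1 on Faces, -1 on Houses, 0 on baseline
c = np.array([1.0, -1.0, 0.0])

t_contrast = (c @ beta) / np.sqrt(sigma2 * (c @ XtX_inv @ c))
z_contrast = stats.norm.isf(stats.t.sf(t_contrast, dof))

# A single-EV analysis is just the special case c = [1, 0, 0]:
# the EV contrasted against the implicit baseline
```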
Now we still need to decide how high a z score we consider “high enough”; in other words, we need to set a threshold. We could use the conventional criterion for significance: p less than 0.05. But there are 10,000 voxels in a typical fMRI scan, so that would leave us with, on average, 500 false positives.
We could go for a p value 10,000 times smaller, but that would be far too conservative. Luckily, real brain activations tend to happen in clusters of connected voxels, especially when you’ve smoothed the data, and large clusters are unlikely to occur by chance. So the solution is to threshold clusters, not voxels.
A typical threshold would be “z greater than 2.3, p less than 0.05”, meaning that you’re searching for clusters of voxels, all of which have a z score of at least 2.3, where there’s only a 5% chance of finding a cluster that size by chance (based on random field theory). This is called a cluster corrected analysis. Not everyone uses cluster correction, but they should. This is what happens if you don’t.
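The mechanics of cluster thresholding can be sketched as follows. The part that actually assigns a p value to a cluster size (via random field theory or permutation testing) is glossed over here, and the z map is just random noise for illustration:

```python
from scipy import ndimage

# Stand-in z map; in practice this holds one z score per voxel
rng = np.random.default_rng(1)
z_map = rng.normal(0, 1, (64, 64, 40))

# Step 1: keep only voxels with z greater than 2.3
suprathreshold = z_map > 2.3

# Step 2: group surviving voxels into connected clusters
labels, n_clusters = ndimage.label(suprathreshold)
sizes = ndimage.sum(suprathreshold, labels, range(1, n_clusters + 1))

# Step 3 (not shown): keep only clusters big enough that a cluster
# of that size has less than a 5% chance of arising from noise
```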
Thus, after all that, we hopefully get some nice colourful blobs for each subject, each blob representing a cluster and colour representing voxel z scores.
This is called a first-level, or single-subject, analysis. Comparing the activations across multiple subjects is called the second-level or group-level analysis, and it relies on similar principles to find clusters which significantly activate across most people.
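A sketch of the simplest group-level approach, a one-sample t-test on each voxel’s contrast estimate across subjects (the subject count and numbers here are fabricated purely for illustration):

```python
import numpy as np
from scipy import stats

# Hypothetical contrast estimates at one voxel from 12 subjects
rng = np.random.default_rng(2)
subject_betas = rng.normal(0.5, 1.0, size=12)

# Simplest group-level test: is the average beta across subjects
# reliably different from zero?
t_group, p_group = stats.ttest_1samp(subject_betas, 0.0)
```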
This discussion has focused on the most common method: model-based detection of activations. There are other “data-driven” or “model-free” approaches, such as this. There are also ways of analyzing fMRI data to find connections and patterns rather than just activations. But that’s another story…