A highly acclaimed neuroscientist whose work offered hope for many patients with brain injury has fallen from grace.
Prof. Niels Birbaumer, of the Eberhard-Karls University of Tübingen in Germany, came under investigation earlier this year. The probe began after researcher Martin Spüler raised serious concerns over a 2017 paper in PLoS Biology by Ujwal Chaudhary et al. Birbaumer was the senior author.
Now, the Eberhard-Karls University of Tübingen has announced that two neuroscientists have been found to have committed misconduct. The two offenders are not named, but based on the information given, one is clearly Birbaumer, and the other, I think, is Chaudhary.
According to Google’s translation of the press release, the university investigative commission has recommended retraction of the 2017 PLoS Biology paper after finding several concerns, namely:
“Selective data selection during data collection”
“Lack of disclosure of data and scripts”
“Missing data”
“Possible data corruption due to incorrect analysis”
So what went wrong?
The 2017 paper reported that communication with paralyzed patients in the complete locked-in state (CLIS) was possible with the help of functional near-infrared spectroscopy (fNIRS), a method to record brain activity. CLIS patients are conscious, but lack any way to communicate as they are totally paralyzed.
Using fNIRS, Chaudhary et al.’s key finding was that a different pattern of brain activity occurred when the CLIS patients were thinking “yes” as opposed to “no”. This implied that fNIRS could provide a way for the patients to answer yes-no questions without moving a muscle. Four patients were included. Here are the key results for one of them, Patient F, showing the time-course of fNIRS response for 20 channels across the head:
According to Martin Spüler (also of Eberhard-Karls University of Tübingen), the central flaw of the 2017 paper was the statistical analysis – and the problem is right there in the image above. As Spüler put it:
[The authors] averaged the data first over all trials and then over all sessions and performed a t test on those averages. The problem with this kind of analysis is that the variance over trials/sessions is removed by the averaging, and only the variance over the channels is retained.
Performing a statistical test will then compare the mean of yes-trials with the mean of no-trials while considering the variance over all channels. As the channels are highly correlated (not independent), the variance is very low and will lead to the wrong result, that the difference is significant.
It’s easy to see the non-independence in the figure I highlighted above: all 20 lines under “YES response” are very similar, and the 20 “NO response” lines are very similar to each other. The channels are highly correlated with each other.
This is an instance of the old statistical pitfall of treating non-independent measures as independent, a pitfall I’ve called out in plenty of other papers myself. The little simulation below shows just how badly it can inflate significance.
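To make this concrete, here’s a minimal simulation sketch. This is my own illustration, not code from the paper or from Spüler’s critique, and all the numbers are made up: 20 channels that share a common signal, and no real difference between “yes” and “no” trials. A t-test across the channel averages nonetheless declares a “significant” difference almost every time, while a test that treats trials as the independent unit stays near the nominal 5% false-positive rate.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_channels, n_trials, n_sims = 20, 100, 2000
false_pos_channels = 0   # flawed test: t-test across channel averages
false_pos_trials = 0     # sounder test: t-test across trials

for _ in range(n_sims):
    # Null world: "yes" and "no" trials are generated identically.
    # Each trial has a component common to all channels (so channels are
    # highly correlated) plus a small amount of channel-specific noise.
    common_yes = rng.normal(0.0, 1.0, size=(n_trials, 1))
    common_no = rng.normal(0.0, 1.0, size=(n_trials, 1))
    yes = common_yes + rng.normal(0.0, 0.1, size=(n_trials, n_channels))
    no = common_no + rng.normal(0.0, 0.1, size=(n_trials, n_channels))

    # Flawed analysis, as Spüler describes it: average over trials first,
    # then t-test the 20 "yes" channel means against the 20 "no" channel means.
    # The correlated channel means barely differ from one another, so the
    # variance entering the test is tiny and "significance" emerges from noise.
    _, p_channels = stats.ttest_rel(yes.mean(axis=0), no.mean(axis=0))
    false_pos_channels += p_channels < 0.05

    # One valid alternative: keep trials as the independent unit, e.g.
    # compare the per-trial channel averages between the two conditions.
    _, p_trials = stats.ttest_ind(yes.mean(axis=1), no.mean(axis=1))
    false_pos_trials += p_trials < 0.05

print("False-positive rate, test over channels:", false_pos_channels / n_sims)  # far above 0.05
print("False-positive rate, test over trials:  ", false_pos_trials / n_sims)    # ~0.05
```

Averaging over trials isn’t the sin in itself; the sin is then running the test over the 20 correlated channel means, so that the variance across channels, rather than across independent trials or sessions, determines the p-value.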
The story is a bit more complicated, however. Birbaumer and some of the authors of the 2017 paper published a rebuttal to Spüler defending their analysis. In my view, it’s a very weak rebuttal, but after reading it, I am not sure what analysis the authors actually did in the original paper. The paper was confusingly written, and the rebuttal did little to clarify matters. So it’s possible that Chaudhary et al. didn’t do exactly what Spüler accused them of doing, but I am quite certain that, whatever they did, it wasn’t a valid analysis.
Interestingly, these statistical problems are not explicitly mentioned in the university report, although the “Possible data corruption due to incorrect analysis” paragraph might allude to them.
So what did the university commission find? As far as I can see (from the Google translation), the university report doesn’t accuse Birbaumer or colleagues of the most serious kind of fraud, i.e. data fabrication, or making up results.
Rather, the commission found that some of the data underlying the paper were missing, e.g. in one patient, “results were reported for twelve days. However, the Commission had only data available for eight days.” EEG data were also reported in the paper, but these were missing too.
Without knowing the details, it is hard to say how serious this is. The most benign interpretation is that Birbaumer and his team were extremely sloppy. Alternatively, the fact that so much data is missing could be taken to raise questions over whether it ever existed in the first place.
The investigation also found that some of the data collected during the study had been excluded from analysis, partly “due to technical problems in the survey, partly due to personal decisions of the senior professor”, without disclosure. Again, this could just be poor scientific practice, or it could be more sinister.
Yet in my opinion, the stats in the 2017 paper are so questionable that the missing and excluded data are rather irrelevant. The paper’s results couldn’t be trusted even if the data were all present and correct. In fact, the committee noted that a former colleague of Birbaumer told him, back in November 2015, that (if this is the correct translation) “no significant results could be found from the data with a statistically correct analysis.”
Overall, this is a sad case, because Birbaumer’s work offered so much hope to patients. His other work may or may not suffer from the same problems as the 2017 paper, but I would say it now needs to be subjected to close scrutiny.