Decoding Thoughts: AI Translates Brain Scans into Text
How useful it would be to know what the person standing next to you on the train was thinking. Or what your boss was going to offer as a pay rise, or what a potential partner thought of you.
This ability is entirely futuristic, of course. But the groundwork is being laid now. Various groups have demonstrated the ability to decode certain kinds of thoughts, particularly what people are looking at, based on functional MRI brain scan images. This is difficult work and the results have been, well, let’s say developmental.
But this capability is now a step closer thanks to the work of Weikang Qiu at Yale University in New Haven, and colleagues, who have developed an AI system capable of decoding fMRI scans. The machine, called MindLLM, produces a text description of what a subject is thinking while looking at an image.
The work paves the way to better understand the human brain and its thought processes. It also substantially improves on what has been done before. "MindLLM outperforms the baselines, improving downstream tasks by 12.0%, unseen subject generalization by 16.4%, and novel task adaptation by 25.0%," say Qiu and co.
Mind Mapping
Functional magnetic resonance imaging (fMRI) measures brain activity indirectly by detecting changes in blood oxygenation levels, known as the hemodynamic response. This response, which lags neural activity by several seconds, provides a spatial map of brain activation, albeit with limited temporal resolution. The technique has provided numerous insights into the role that various parts of the brain play. But decoding complex thoughts and ideas from these scans has been a long-standing challenge.
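To make that lag concrete, here is a minimal sketch (not from the paper) that convolves a toy train of neural events with the canonical double-gamma HRF widely used in the fMRI literature. The parameter choices follow common convention and are assumptions, not anything specific to MindLLM's preprocessing.

```python
import numpy as np
from scipy.stats import gamma

# Canonical double-gamma hemodynamic response function (SPM-style):
# a peak roughly 5 s after a neural event, followed by an undershoot.
def hrf(t):
    return gamma.pdf(t, 6) - gamma.pdf(t, 16) / 6.0

tr = 1.0                       # sampling interval in seconds
t = np.arange(0, 30, tr)       # 30 s response window
kernel = hrf(t)

# A toy neural event train: brief stimuli at t = 0 s and t = 12 s.
neural = np.zeros(60)
neural[0] = 1.0
neural[12] = 1.0

# The BOLD signal that fMRI measures is (approximately) the neural
# activity convolved with the HRF -- note the multi-second lag.
bold = np.convolve(neural, kernel)[: len(neural)]
print("neural events at t =", np.flatnonzero(neural))
print("BOLD peak at t ≈", bold.argmax(), "s")  # ~5 s after the first event
```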
Previous methods have struggled with accuracy, limited task variety, and difficulty generalizing across different individuals. The variability in brain structures and activation patterns between people makes it challenging to develop a universal decoding model.
MindLLM tackles these challenges head-on. It consists of two main components: an fMRI encoder and a large language model (LLM). The fMRI encoder processes the scan data and converts it into a format the LLM can understand, while the LLM has been pre-trained on a wide range of images paired with text descriptions.
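In outline, the wiring might look like the following PyTorch sketch. It is a simplified stand-in: the class name, the single linear projection, and the token count are illustrative assumptions, and the paper's actual encoder is considerably more sophisticated.

```python
import torch
import torch.nn as nn

class FMRIEncoder(nn.Module):
    """Illustrative sketch: map a flattened fMRI volume to a short
    sequence of 'brain tokens' living in the LLM's embedding space.
    (MindLLM's real encoder is more elaborate; this is a stand-in.)"""
    def __init__(self, n_voxels: int, n_tokens: int = 32, d_model: int = 4096):
        super().__init__()
        self.n_tokens, self.d_model = n_tokens, d_model
        self.proj = nn.Linear(n_voxels, n_tokens * d_model)

    def forward(self, voxels: torch.Tensor) -> torch.Tensor:
        # voxels: (batch, n_voxels) -> brain tokens: (batch, n_tokens, d_model)
        return self.proj(voxels).view(-1, self.n_tokens, self.d_model)

# The brain tokens are then prepended to the embeddings of a text
# prompt and handed to the language model, which generates its answer
# token by token, just as it would for ordinary text input.
```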
The fMRI data consists of brain scans of subjects undertaking tasks such as looking at an image and answering a simple question about it. For example, given an image of a clock next to some lettering, the task might be to determine the letters in the image; given an image of a baseball player in the act of throwing, the question might be what object is being thrown. Given only the fMRI data, MindLLM must generate the corresponding text, decoded from the brain activity captured in the scan.
One of the key innovations of MindLLM is an attention mechanism that lets the model focus on the most relevant parts of the fMRI data, improving its accuracy and efficiency. Another critical aspect is a technique known as Brain Instruction Tuning (BIT). This involves training the model on a diverse dataset of images and text, enabling it to capture a wide range of representations from fMRI signals. The BIT dataset includes tasks related to perception, memory, language processing, and complex reasoning, ensuring that MindLLM can decode various aspects of human thought.
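A simplified sketch of what one instruction-tuning step could look like appears below. It assumes a Hugging Face-style model API, a frozen LLM, and a trainable encoder; the function name, batching, and loss details are illustrative guesses rather than the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

def bit_training_step(encoder, llm, tokenizer, voxels, question, answer):
    """Hypothetical Brain Instruction Tuning step (simplified):
    the LLM is assumed frozen (llm.requires_grad_(False)), so
    gradients only update the fMRI encoder."""
    brain_tokens = encoder(voxels)  # (1, n_tokens, d_model)

    q_ids = tokenizer(question, return_tensors="pt").input_ids
    a_ids = tokenizer(answer, return_tensors="pt",
                      add_special_tokens=False).input_ids
    text_emb = llm.get_input_embeddings()(torch.cat([q_ids, a_ids], dim=1))

    # Prepend brain tokens to the question + answer embeddings.
    inputs = torch.cat([brain_tokens, text_emb], dim=1)
    logits = llm(inputs_embeds=inputs).logits

    # Supervise only the answer tokens: predict each answer token
    # from everything that precedes it (brain tokens + question).
    n_prefix = brain_tokens.shape[1] + q_ids.shape[1]
    shift_logits = logits[:, n_prefix - 1 : -1, :]
    loss = F.cross_entropy(shift_logits.reshape(-1, shift_logits.size(-1)),
                           a_ids.reshape(-1))
    loss.backward()  # gradients flow back into the encoder only
    return loss
```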
The potential applications of MindLLM are significant. Qiu and co say it could be used to develop brain-computer interfaces that allow people to control devices with their thoughts, revolutionizing assistive technology for individuals with disabilities. The model could also provide insights into cognitive processes, helping researchers better understand how the brain works.
Moreover, MindLLM's ability to decode thoughts has ethical implications that society will have to consider. The possibility of decoding private thoughts raises concerns about privacy and security, and it will be essential to establish ethical guidelines for the development and deployment of such technology. Qiu and co acknowledge that they cannot know in advance how the system will be used. "It is common that users want to adapt the MindLLM to their own specific use cases," they say.
Fact or Fiction
Despite its impressive performance, MindLLM is still in its early stages. One limitation is that fMRI is not a real-time imaging technique and requires significant processing time, as well as expensive, bulky equipment. Future research could explore faster and more portable brain imaging techniques, such as electroencephalography (EEG) or functional near-infrared spectroscopy (fNIRS), to complement or replace fMRI in practical applications.
Additionally, the researchers aim to investigate the relationship between fMRI data and other modalities, such as videos, to gain a more comprehensive understanding of brain activity.
That’s interesting work showing how mind-reading techniques are progressing in leaps and bounds. The ability to decode thoughts has long been a staple of science fiction, but it may not be long before it acquires the status of science fact.
Ref: MindLLM: A Subject-Agnostic and Versatile Model for fMRI-to-Text Decoding: arxiv.org/abs/2502.15786