You know when a thought pops into your head and you wonder if anyone can read it? Well, today it’s no longer just science fiction. A group of Japanese researchers just found a way to translate brain activity into written text, and the result is surprisingly precise.
The project is called “Mind Captioning,” and it promises to change the way we understand the human mind, even if, let’s face it, the idea is a little unsettling.
How the technology that transforms thoughts into words works
The idea was born in the communication science laboratories of NTT in Japan, where researcher Tomoyasu Horikawa combined functional magnetic resonance imaging (fMRI) and artificial intelligence language models to “listen” to the brain. In practice, the system does not directly decode words; it interprets the patterns of brain activity that form before thought becomes language. A bit like reading the mental draft of what we are about to say.
The study, published in Science Advances, shows that even when a person watches or recalls silent videos, the brain contains enough information for the AI to describe the scene coherently, producing natural-language sentences that capture with surprising fidelity what the person is seeing or remembering.
The brain “speaks” even without words
Horikawa has dubbed his method “mind captioning”: a two-step approach. First, brain activity is translated into semantic features, i.e. representations of meaning, thanks to a deep language model called DeBERTa. Then a second model, RoBERTa, uses that information to reconstruct complete sentences. Through an iterative process of trial and correction, the AI generates descriptions that grow increasingly consistent with the original thought.
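To make the two-step idea concrete, here is a minimal sketch in Python. Everything in it is a toy stand-in: random vectors play the role of DeBERTa caption embeddings, a ridge regression plays the role of the brain-to-features decoder, and the second stage is simplified from iterative sentence editing with a masked language model down to retrieving the best-matching candidate caption. None of the numbers or names come from the actual study.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Stage 1: brain activity -> semantic features ---
# Toy stand-ins: 200 fMRI "trials" of 500 voxels each, paired with a
# 64-dim semantic embedding (in the study, deep-language-model features
# of the video captions; here, random vectors).
n_trials, n_voxels, n_dims = 200, 500, 64
embeddings = rng.normal(size=(n_trials, n_dims))
voxel_map = rng.normal(size=(n_voxels, n_dims))  # hidden voxel-to-meaning map
fmri = embeddings @ voxel_map.T + 0.1 * rng.normal(size=(n_trials, n_voxels))

# Ridge regression: learn a linear decoder from voxel patterns to features.
lam = 1.0
W = np.linalg.solve(fmri.T @ fmri + lam * np.eye(n_voxels), fmri.T @ embeddings)

# --- Stage 2: semantic features -> sentence ---
# The study iteratively edits a candidate sentence with a masked language
# model; this sketch simplifies that to retrieval: score every candidate
# caption's embedding against the decoded features and keep the best match.
def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

decoded = fmri[0] @ W                       # decode trial 0's features
scores = [cosine(decoded, e) for e in embeddings]
best = int(np.argmax(scores))
print(best)                                 # index of the best-matching caption
```

The point of the sketch is the division of labor: a simple linear map does the neuroscience-facing work (voxels to meaning), while the language model does the linguistic work (meaning to words).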
The experiment involved six Japanese volunteers who watched thousands of short videos without sound. During viewing and the subsequent mental-recall phase, their brain activity was recorded with fMRI. The system then generated sentences that, when compared with the real captions, matched in up to 50% of cases, a remarkable result considering that chance level would have been just 1%.
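The 1% chance level implies an identification task among roughly 100 candidate captions (the pool size is my reading of that figure, not a detail given in the article). A quick simulation of random guessing shows why 50% is so far above baseline:

```python
import numpy as np

rng = np.random.default_rng(42)

# Random guessing: for each trial, pick one of 100 candidate captions
# uniformly at random and check whether it matches the correct one.
n_trials, n_candidates = 100_000, 100
guesses = rng.integers(0, n_candidates, size=n_trials)
targets = rng.integers(0, n_candidates, size=n_trials)
chance_accuracy = float(np.mean(guesses == targets))
print(f"chance accuracy = {chance_accuracy:.3%}")  # hovers around 1%
```

A decoder that picks the right caption half the time is therefore performing about fifty times better than blind guessing.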
From mind to text: what they really discovered

What makes everything even more fascinating is that the system does not simply list objects, but recognizes relationships and actions: who does what, where and how. When the researchers mixed up the order of the words, the accuracy of the descriptions plummeted – a sign that the model had actually understood the logical structure of the scene.
What’s more, the decoder works even without involving the brain’s traditional language areas. This means that meaning is distributed across many brain regions, not just language regions, and that our thoughts have an understandable structure even before being translated into words.
Ethics, limits and future of “mind captioning”
Obviously, reading thoughts is not possible (yet). Horikawa himself clarifies that this technology does not “read the mind” but interprets non-linguistic mental representations. And for now, the method only works on people who undergo long fMRI sessions, as many as 17 hours per participant, and requires a huge amount of data.
Then there is the question of mental privacy, a topic that the scientist approaches with caution:
“We need cooperation, consent and the ethical use of brain data. We should never treat these results as a person’s true thoughts.”
Despite its limitations, mind captioning opens up fascinating scenarios: it could help those who cannot communicate with language, such as people with aphasia or paralysis, or improve the understanding of complex mental states, such as dreams, emotions or memories.
Ultimately, this research reminds us that the brain is a natural narrator: even before speaking, it constructs stories.
And perhaps, one day, we will be able to use technology not to violate the mind, but to give voice to those who cannot express themselves. As Horikawa himself says:
“The real goal is to understand how consciousness constructs meaning, even before it becomes words.”