In a windowless basement in San Francisco, a small AI startup has quietly assembled what may be the largest collection of human brain-language data ever recorded. Over the past six months, Conduit says it has gathered around 10,000 hours of non-invasive neural recordings from thousands of volunteers, all with a single aim: teaching machines to translate thoughts into text.
The effort, which unfolded largely out of public view, relied on a steady stream of participants rotating through compact recording booths for two-hour sessions. Inside, they talked or typed freely while wearing custom-built headsets designed to capture subtle neural signals in the moments before words were spoken or typed. The resulting dataset, Conduit believes, surpasses anything previously collected for neuro-language research.
Rather than treating the sessions as clinical experiments, the company leaned into conversation. Early on, participants were guided through structured tasks, but the team quickly noticed a problem: rigid prompts drained energy and produced flatter data. The setup was redesigned to allow open-ended dialogue with a large language model, giving participants room to speak naturally. That shift, engineers say, led to richer language output and cleaner alignment between brain activity, audio, and text.
To make the recordings possible, Conduit built its own hardware from scratch. Off-the-shelf headsets, the team found, could not capture enough signals at once. Their solution combined electroencephalography (EEG), functional near-infrared spectroscopy (fNIRS), and additional sensors into heavy, 3D-printed rigs weighing about four pounds. These training headsets were never meant to be comfortable; they were designed to pull in as much data as possible. Lighter versions intended for everyday use will come later, shaped by what the models actually need.
Data from the various sensors is fed into a unified storage system that keeps everything precisely synchronised. That timing matters. The models are trained to look at brain activity just seconds before a person speaks or types, searching for patterns that hint at meaning before language takes physical form.
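Conduit has not published its pipeline, but the core idea of slicing a pre-speech window out of synchronised streams can be sketched in a few lines. In this hypothetical Python example, the sample rates, window length, and timestamps are illustrative assumptions, not the company's actual parameters.

```python
import numpy as np

# Hypothetical parameters (assumed for illustration, not Conduit's values).
EEG_HZ = 500        # EEG samples per second
FNIRS_HZ = 10       # fNIRS samples per second
PRE_WINDOW_S = 2.0  # seconds of brain activity to keep before each word

def slice_pre_speech(signal: np.ndarray, rate_hz: float,
                     stream_t0: float, word_onset: float) -> np.ndarray:
    """Return the chunk of `signal` covering the PRE_WINDOW_S seconds
    immediately before a word's onset time.

    All timestamps live on one shared clock, mirroring the point that
    every sensor is kept precisely synchronised in a unified store.
    """
    end = int(round((word_onset - stream_t0) * rate_hz))
    start = max(0, end - int(round(PRE_WINDOW_S * rate_hz)))
    return signal[start:end]

# Toy demo: two streams that both started at t = 0.0 on the shared clock,
# and a word spoken at t = 12.4 s (values are invented for the example).
eeg = np.random.randn(60 * EEG_HZ)      # one minute of fake EEG
fnirs = np.random.randn(60 * FNIRS_HZ)  # one minute of fake fNIRS

eeg_window = slice_pre_speech(eeg, EEG_HZ, 0.0, 12.4)
fnirs_window = slice_pre_speech(fnirs, FNIRS_HZ, 0.0, 12.4)
print(eeg_window.shape, fnirs_window.shape)  # (1000,) (20,)
```

Because both slices are cut on the same clock, a model can pair a two-second burst of brain activity with the exact word that followed it, which is the alignment the training depends on.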
The team’s biggest early headache was electrical noise. Power-line interference distorted the signals, so staff wrapped cables, experimented with filters, and even shut off the building’s mains electricity, running the lab entirely on batteries. The workaround helped, but introduced new problems, from dropped data to the logistics of swapping heavy battery packs. In time, scale itself became the solution. Once the dataset had passed several thousand hours, the models began to generalise across individuals and recording setups, making extreme noise suppression less critical.
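The company has not said which filters it tried, but suppressing mains hum is a textbook signal-processing problem. The sketch below applies a standard notch filter from SciPy; the sample rate, the 60 Hz mains frequency, and the toy signals are assumptions made for illustration.

```python
import numpy as np
from scipy.signal import iirnotch, filtfilt

FS = 500.0       # sample rate in Hz (assumed)
MAINS_HZ = 60.0  # US mains frequency, the interference described above
Q = 30.0         # quality factor: higher means a narrower notch

# Fake one channel of EEG: a slow "brain" oscillation plus injected 60 Hz hum.
t = np.arange(0, 10, 1 / FS)
clean = np.sin(2 * np.pi * 8 * t)            # 8 Hz alpha-band stand-in
noisy = clean + 0.5 * np.sin(2 * np.pi * MAINS_HZ * t)

# Design a notch centred on the mains frequency and apply it zero-phase,
# so the filter does not shift the signal in time (timing matters here).
b, a = iirnotch(MAINS_HZ, Q, fs=FS)
filtered = filtfilt(b, a, noisy)

print(np.abs(filtered - clean).max())  # residual error after the notch
```

A notch this narrow removes the hum while leaving nearby frequencies largely intact, which is why it is usually the first tool reached for before anyone resorts to running a lab on batteries.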
As the project grew, so did efficiency. Backend systems were rebuilt to flag corrupted sessions instantly, and a small group of supervisors began monitoring multiple booths at once. A custom scheduling system kept headsets in near-constant use, sometimes operating for up to 20 hours a day. Conduit says these changes cut the cost of each usable hour of data by roughly 40 per cent over the course of the project.
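Conduit has not described its quality checks, but flagging a corrupted session the moment it ends is straightforward to automate. Here is a minimal, hypothetical sketch of two such checks; the thresholds, channel counts, and function name are invented for illustration.

```python
import numpy as np

def looks_corrupted(session: np.ndarray, fs: float,
                    expected_seconds: float) -> bool:
    """Crude checks of the kind an automated pipeline might run:
    are samples missing, and has any channel flatlined?
    Thresholds are illustrative, not Conduit's.
    """
    n_channels, n_samples = session.shape
    # Dropped-data check: far fewer samples than the session should hold.
    if n_samples < 0.95 * expected_seconds * fs:
        return True
    # Flatline check: a channel with near-zero variance is probably dead.
    if np.any(session.std(axis=1) < 1e-6):
        return True
    return False

# Toy example: 8 channels, 10 seconds at 500 Hz.
fs, secs = 500.0, 10.0
good = np.random.randn(8, int(fs * secs))
bad = good.copy()
bad[3] = 0.0  # one dead channel

print(looks_corrupted(good, fs, secs))  # False
print(looks_corrupted(bad, fs, secs))   # True
```

Running checks like these as a session closes means a bad recording can be redone while the volunteer is still in the booth, which is one plausible way instant flagging would feed the cost savings the company describes.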
With data collection largely complete, the company is now turning its attention inward, training and refining its decoding models. Details about how accurately those systems can reconstruct meaning from brain signals are, however, still under wraps.