A first-principles approach to building with AI in healthcare.

Hey everyone,

I think we are all on the same page: AI will redefine several industries, including healthcare. The low-hanging fruit is operational inefficiency, followed by diagnostics and treatment modalities.

As we go deeper into an AI-first reality, a challenge is starting to emerge: the limitation of training AI solely on human-formatted or human-interpreted inputs. I realize this isn't immediately obvious, so I'll explain my thinking below.

Our current scientific understanding and sensory limitations act as a filter, preventing AI from accessing the raw, unfettered signals that could unlock genuinely groundbreaking insights.

Throughout our history, medical science has advanced by observing, measuring, and interpreting the world with tools designed for human comprehension. We have developed standardized formats to simplify complex biological processes and data. This created an invisible data wall for AI, and it's where some of the most promising opportunities lie. A shift is now underway: we are starting to feed AI raw data that we ourselves cannot interpret.

How our perception limits AI

1. Blood samples vs lab values: When a patient's blood is drawn, it contains a vast universe of biological information. Yet what we typically feed AI are human-defined "lab values" – specific measurements like glucose levels, white blood cell counts, or cholesterol ratios. These are pre-selected, aggregated, and interpreted data points. What if AI had direct access to the raw spectral, photometric, and electrical data from a blood sample, or the unfiltered cellular morphology? It might detect subtle patterns, novel biomarkers, or early disease indicators that our current assays and human interpretations simply aren't designed to see or quantify.

2. Electrode data vs ECGs: An electrocardiogram (ECG) is a human-formatted visual representation of the heart's electrical activity. It's a powerful diagnostic tool, but it's a processed output. What if AI could analyze the raw, high-fidelity electrical signals directly from electrodes? It might discern minute fluctuations, complex wave interactions, or early arrhythmia signatures that are imperceptible to the human eye on an ECG strip, or even beyond the current parameters of standard ECG analysis.

3. Audio frequencies we cannot hear: The human ear is limited to a specific frequency range. We record and analyze sounds within these human-detectable frequencies. But what if a patient's body, or even a pathogen, emits acoustic signals outside the frequencies we can detect? An AI given access to the full spectrum of audio data could potentially identify early indicators of disease, the most obvious being conditions that audibly change our voice or breathing, but also disease processes we cannot perceive at all, long before they manifest in human-perceptible symptoms or standard diagnostic tests.

4. Raw imaging data vs annotated images: When a patient undergoes an MRI, CT scan, or X-ray, the machine captures a massive amount of raw signal data. What we typically use for AI training are often radiologist-annotated images or their written reports, which summarize human interpretations. If AI could directly analyze the unfiltered, high-resolution data from these scans (k-space for MRI, acoustic echoes for ultrasound, raw detector readings for X-rays, and raw projection measurements for CT), it might detect micro-lesions, subtle textural changes, or relationships that are too faint or complex for human perception to consistently identify, leading to earlier and more precise diagnoses. A short sketch after this list shows how even the image a radiologist reads is already a reconstruction of the raw MRI signal.

5. Continuous wearable sensor data vs vital signs: Modern wearables collect continuous streams of physiological data – raw PPG signals, accelerometer traces, skin temperature fluctuations. However, AI is often fed human-defined, aggregated metrics like "average heart rate," "steps per day," or "sleep score." If AI could analyze the raw, high-frequency, multi-modal sensor data directly, it could identify highly personalized baseline deviations, predict acute events, or even detect subtle shifts indicating the onset of chronic conditions, far beyond what simple averages can reveal (see the second sketch after this list).

6. Raw sequencing reads vs known mutations: When DNA is sequenced, it produces vast amounts of raw "reads" – the individual fragments of genetic code. Historically, AI was typically trained on databases of known variants and curated gene expression profiles rather than the raw, unprocessed sequencing reads. We are now seeing AI gain access to the raw sequencing data itself, including epigenetic markers (like methylation patterns) and complex structural variations. It has the potential to uncover novel genetic predispositions, identify previously unknown disease-causing variants, or reveal how environmental factors interact with gene expression in ways we're only beginning to understand.
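
To make the imaging example concrete, here is a minimal sketch, assuming a synthetic NumPy array stands in for real scanner output: the image a radiologist reads is itself a reconstruction of the raw MRI signal, an inverse Fourier transform of k-space with the phase information discarded. A model trained only on the reconstructed magnitude image never sees the complex-valued signal the scanner actually acquired.

```python
import numpy as np

# Hypothetical example: a synthetic 256x256 complex-valued matrix stands in
# for the raw k-space data an MRI scanner acquires (real data arrives in
# vendor-specific formats and is far more structured than random noise).
rng = np.random.default_rng(0)
kspace = rng.standard_normal((256, 256)) + 1j * rng.standard_normal((256, 256))

# Standard reconstruction: inverse 2D FFT of k-space, keeping only the
# magnitude for display. This is the "human-formatted" image.
complex_image = np.fft.fftshift(np.fft.ifft2(np.fft.ifftshift(kspace)))
magnitude_image = np.abs(complex_image)   # what the radiologist (and most models) see
phase_image = np.angle(complex_image)     # information routinely discarded

print(kspace.dtype)           # complex128, the raw signal
print(magnitude_image.dtype)  # float64, the human-formatted view
```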
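
And for the wearables example, the sketch below (hypothetical function names, assuming a 64 Hz PPG sampling rate and synthetic data) contrasts the single aggregated number most pipelines keep with the windowed raw waveform a model could learn from directly.

```python
import numpy as np

FS = 64  # assumed PPG sampling rate in Hz (device-dependent)

def average_heart_rate(beats_detected: int, seconds: float) -> float:
    """The human-formatted view: one scalar summarising an entire recording."""
    return 60.0 * beats_detected / seconds

def raw_windows(ppg: np.ndarray, window_s: int = 30) -> np.ndarray:
    """The raw view: consecutive 30-second windows of the full waveform,
    preserving beat-to-beat morphology for a model to learn from directly."""
    samples_per_window = window_s * FS
    usable = (len(ppg) // samples_per_window) * samples_per_window
    return ppg[:usable].reshape(-1, samples_per_window)

# Hypothetical example: 10 minutes of synthetic PPG standing in for a real
# stream (a ~1.2 Hz oscillation, roughly 72 beats per minute, plus noise).
t = np.arange(0, 600, 1 / FS)
ppg = np.sin(2 * np.pi * 1.2 * t) + 0.05 * np.random.default_rng(1).standard_normal(t.size)

print(average_heart_rate(beats_detected=720, seconds=600))  # 72.0: one number
print(raw_windows(ppg).shape)  # (20, 1920): twenty raw 30-second windows
```

The aggregated metric collapses ten minutes of waveform into a single number; the raw windows keep every beat's shape, which is where subtle, personalized deviations would show up.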

The cost of keeping AI behind the human filter

By limiting AI to human-formatted inputs, we risk:

  • Overlooking early disease indicators: The earliest, most nuanced indicators of disease may exist in the raw data but be filtered out or averaged away by human interpretation.

  • Perpetuating late diagnosis: If AI can only learn from symptoms and test results that are already apparent to humans, its ability to provide truly early diagnosis is inherently limited.

  • Stifling novel discoveries: The most disruptive breakthroughs often come from seeing patterns no one else has. If AI is confined to human-defined data structures, it's less likely to make these "out-of-the-box" connections.

  • Delaying precision medicine: Achieving genuine precision medicine, where treatments are tailored to an individual's unique biological blueprint (like epigenetics), requires an understanding of raw, multi-modal data far beyond current lab values or standard imaging.

Unlocking AI's Full Potential

To truly unleash AI's transformative power in healthcare, we must embrace an open-source approach to raw, unfiltered data:

  1. Enable raw data capture: Invest in technologies and methodologies that capture the highest fidelity, rawest forms of biological, physiological, and environmental data.

  2. Focus on unstructured data: Train AI to process and find patterns directly within these complex, unstructured, and multi-modal raw datasets, rather than relying on human pre-processing.

  3. Challenge our assumptions: Encourage AI to explore correlations and insights that may contradict existing human paradigms, fostering a true "idea meritocracy" of data.

  4. Focus on feeding AI signals we cannot perceive: Develop AI that can detect and interpret signals that are currently imperceptible to human senses or unquantifiable by current human-designed metrics.

  5. Build explainability into AI systems: When AI identifies a novel pattern in raw data, it must be able to explain its reasoning in a way that allows human experts to validate, understand, and then integrate these new insights into medical knowledge.

By breaking free from the "human filter" and giving AI access to raw healthcare data, we can move beyond merely optimizing existing processes. We can empower AI to discover the unseen, accelerate breakthroughs in pathophysiology and disease reversal, and ultimately build a future where healthcare is truly predictive, preventative, and personalized.

Thank you for reading,

Rishad
