Scientific American

Posted for Google Cloud

March 21, 2025


Your Digital Twin May Be Arriving Soon

Researchers are building AI-powered models of human biology to help us live our healthiest lives

Scott Penberthy

Animated image of a computerized human body. Credit: Phil Wheeler

This series was created for Google, the Buck Institute, Optispan and Phenome Health by Scientific American Custom Media, a division separate from the magazine’s board of editors.


Imagine your life as a vast landscape, with many pathways representing the myriad possibilities for your future health and well-being. Each choice you make, each genetic trait you’ve inherited and every environmental factor you encounter guides you along this landscape’s infinite intersecting paths. Some lead toward vitality and longevity, others veer off into illness and decline.

What if you could map such a landscape for your own life, understand its contours and use that knowledge to steer yourself toward the healthiest possible future?

This vision describes the frontier of artificial intelligence and human health, where data becomes insight, and insight becomes action. By converting vast amounts of diverse data into representations of extremely complex possibilities—high-dimensional vector spaces, in the language of computer science—AI is helping scientists visualize this landscape of life, and potentially helping each of us steer toward health and longevity. Being a coder, I prefer to think of this landscape as a manifold—“a collection of points forming … a topologically closed surface or an analog of this in three or more dimensions,” as Oxford defines it. The analogy of a manifold captures the complex, multidimensional character of these AI models.

What does navigating the manifold of life mean for you and me? It suggests a shift from reactive to proactive healthcare—instead of waiting for diseases to manifest, we anticipate and prevent them. AI can detect subtle deviations from our optimal path and suggest interventions—lifestyle changes, medical treatments or environmental adjustments—to guide us back to health.

Imagine receiving personalized health recommendations based on a comprehensive analysis of your genetic makeup, daily habits and environmental exposures. Wearable devices and health apps could continuously monitor your well-being, providing real-time feedback. These insights could help you avoid chronic illnesses, optimize well-being and perhaps extend your healthy years well beyond current expectations.

AI is no longer just a tool for crunching numbers or predicting stock markets; it’s becoming a compass that can guide us through the complex terrain of our biological existence.

The symphony of data

Think of your body as an orchestra, with each cell, gene and biomolecule playing its own instrument. Traditional medicine often listens to these instruments in isolation—a gene here, a symptom there.

The true music of health emerges only when we consider the harmony between all these elements. Recent advances make it possible to integrate data from our genomes, microbiomes, lifestyles and even real-time health metrics like heart rate and activity levels. AI allows us to listen to the entire symphony.

This integration is made possible by a mathematical tool known as embeddings—a way of representing real-world objects so that machine-learning algorithms can digest them. A real-world object can be a word, an image, an audio clip and so forth. In healthcare, real-world objects can be genomic sequences, MRI scans and other biological data.

Each real-world object is assigned a series of numbers, collectively called a vector. Each number in the vector corresponds to a dimension that defines some characteristic of the object. Together, the values of each dimension represent a point in very high-dimensional space. That point is a computer representation of an idea that captures the relevant information inherent in the original data.
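As a toy sketch of this idea, the snippet below assigns each object a short vector (all numbers and dimension labels here are invented for illustration; real embeddings have hundreds or thousands of dimensions learned from data). Objects with similar characteristics end up as nearby points:

```python
# Hypothetical 4-dimensional embeddings; each component encodes one
# invented characteristic: [muscle_mass, resting_heart_rate, age, activity]
embeddings = {
    "healthy_adult":   [0.80, 0.3, 0.4, 0.9],
    "sedentary_adult": [0.40, 0.6, 0.4, 0.2],
    "elite_athlete":   [0.95, 0.2, 0.3, 1.0],
}

def distance(u, v):
    """Euclidean distance between two embedding vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

# Nearby points in the embedding space represent similar objects.
d_athlete = distance(embeddings["healthy_adult"], embeddings["elite_athlete"])
d_sedentary = distance(embeddings["healthy_adult"], embeddings["sedentary_adult"])
print(d_athlete < d_sedentary)  # True: the healthy adult sits nearer the athlete
```

The values are made up, but the mechanism is the real one: similarity between objects becomes geometric distance between points.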


The virtue of defining an object as a vector is that the math we know and use today—points, lines and curves in two and three dimensions—generalizes well to higher dimensions. Smaller AI models now reason in 768 dimensions; larger ones use 11,000 dimensions or more. Using classic statistics, machines can then find relationships between these points and shapes.
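To see how familiar geometry carries over unchanged, here is a minimal sketch: the cosine-similarity formula from two-dimensional trigonometry applied, as written, to 768-dimensional vectors (the vectors here are random stand-ins, not real model embeddings):

```python
import math
import random

def cosine_similarity(u, v):
    """The 2-D angle-based similarity formula, unchanged in any dimension."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

random.seed(0)
dim = 768  # dimensionality used by smaller embedding models
u = [random.gauss(0, 1) for _ in range(dim)]
v = [random.gauss(0, 1) for _ in range(dim)]

# A vector is maximally similar to itself...
print(round(cosine_similarity(u, u), 3))  # 1.0
# ...while two unrelated (random) high-dimensional vectors are nearly orthogonal.
print(abs(cosine_similarity(u, v)) < 0.2)  # True
```

In a trained model, high cosine similarity between two embeddings signals that the underlying objects are related; near-orthogonality signals they are not.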

We can then represent a healthy life as a larger mathematical structure made up of many vectors, called a point cloud, which represents our biology and health as we progress through life. This point cloud exists within the larger manifold, which, as indicated earlier, represents all possible health outcomes. As an individual moves through life, the point cloud shifts from one day to the next. It can stay in the region of health, or drift toward the part of the manifold that represents disease.
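The drift described above can be sketched numerically. In this toy example, the "manifold" is flattened to two dimensions, and the healthy and disease regions, centroids and day labels are all invented for illustration:

```python
import random
random.seed(1)

# Hypothetical 2-D slice of the manifold: a "healthy region" centered at
# (0, 0) and a "disease region" centered at (5, 5); coordinates are invented.
HEALTHY, DISEASE = (0.0, 0.0), (5.0, 5.0)

def centroid(cloud):
    xs, ys = zip(*cloud)
    return (sum(xs) / len(xs), sum(ys) / len(ys))

def dist(p, q):
    return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5

def sample_cloud(center, n=50, spread=0.5):
    """One day's biological measurements, scattered around a central state."""
    return [(random.gauss(center[0], spread), random.gauss(center[1], spread))
            for _ in range(n)]

day_1 = sample_cloud((0.2, 0.1))   # starts inside the healthy region
day_90 = sample_cloud((3.5, 3.6))  # has drifted toward the disease region

for label, cloud in (("day 1", day_1), ("day 90", day_90)):
    c = centroid(cloud)
    nearer = "healthy" if dist(c, HEALTHY) < dist(c, DISEASE) else "disease"
    print(label, "is nearer the", nearer, "region")
```

Tracking which region the cloud's centroid sits nearest to, day after day, is the geometric version of "anticipating illness before it manifests."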

With these models, based on math and data, we are learning what simple actions we can take to reverse the process of approaching illness, and steer our point cloud back to the healthy regions of the manifold.

Clarity from complexity

Transforming the complexity of human health into something that an algorithm can handle would not be possible without some powerful mathematical tools.

Variational autoencoders (VAEs) play an important role by compressing intricate data about real-world objects, such as genomic sequences or MRI scans, into simpler representations that preserve essential patterns while discarding unnecessary details. Without VAEs, making sense of the myriad objects in an individual’s point cloud would be like trying to take in a complex mural by examining each brushstroke individually. VAEs allow us to step back and see the entire painting.

VAEs do this by taking high-dimensional vectors, which represent the full datasets describing real-world objects, and encoding them into lower-dimensional vectors that still retain the core features most relevant for analysis—a process akin to compressing a high-resolution photograph to a thumbnail. By applying math to these vectors, machines can discover relationships and draw inferences. To borrow an example from large language models, such as ChatGPT or Gemini, it means we can do math on words, such that: king - male + female = queen.
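The king − male + female = queen arithmetic can be demonstrated with toy vectors. The three dimensions and their values below are invented for illustration; real word embeddings are learned from data, not hand-assigned:

```python
# Toy word vectors with invented dimensions: [royalty, maleness, femaleness]
vectors = {
    "king":   [0.9, 0.8, 0.1],
    "queen":  [0.9, 0.1, 0.8],
    "male":   [0.0, 0.8, 0.1],
    "female": [0.0, 0.1, 0.8],
    "apple":  [0.0, 0.0, 0.0],  # unrelated word, for contrast
}

def add(u, v): return [a + b for a, b in zip(u, v)]
def sub(u, v): return [a - b for a, b in zip(u, v)]
def dist(u, v): return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

# king - male + female lands nearest to queen
result = add(sub(vectors["king"], vectors["male"]), vectors["female"])
nearest = min(vectors, key=lambda w: dist(vectors[w], result))
print(nearest)  # queen
```

Subtracting "male" removes the maleness component while leaving royalty intact; adding "female" restores the missing component, and the resulting point coincides with "queen."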

VAEs also introduce mathematical constraints that keep the compressed space smooth and well-behaved, making it suitable for large neural networks to learn from. To continue with the example of large language models, these networks can be expanded so that the input is not just a word, but a sequence of words.

Scientists have expanded on these ideas and generalized the technique to proteins, DNA, sound, images and more. In healthcare applications, the latent space becomes a common language for different types of biological data, enabling AI to find correlations across diverse datasets. For example, a VAE can help us understand how a specific genetic variant might influence brain structure, linking genomic data with imaging studies in a way that was previously unattainable.

VAEs are also used in analyzing genomic data to predict disease risk. By embedding genetic information into a vector space, AI models can identify patterns associated with conditions like heart disease or diabetes. This allows for earlier interventions and personalized treatment plans, moving us closer to the ideal of precision medicine.

VAEs are not without challenges. Compressing data risks losing subtle details that could be clinically significant. Researchers are working on enhancing VAEs to preserve crucial information while still benefiting from the reduction in dimensions. Advances like hierarchical VAEs, which combine vectors at multiple scales (for instance, using vectors for words, sentences and paragraphs), aim to retain more nuanced features, improving the utility of these models in healthcare.

Sculpting insights from noise

While VAEs provide a point cloud of embeddings for the computational models, diffusion models are like skilled artists who bring that point cloud to life.

In physics, a diffusion model explains how particles, like molecules in a gas or liquid, spread out over time from an area of high concentration to an area of low concentration by moving randomly, bumping into each other and slowly moving away from one another. Likewise, a diffusion model in AI starts with a point cloud filled with random numbers, like white noise from an old TV. The diffusion models iteratively refine that image, taking away just a bit of noise at a time, while the embeddings guide the diffusion process to produce what we want.

Diffusion models are trained on pairs of text descriptions and images. They learn how to map from input embeddings, produced by the VAEs, to noise reduction, such that anywhere from 50 to 1,000 steps will produce a pixel-perfect output. I like to imagine a diffusion model as Michelangelo, starting with a block of marble (the noisy data) and chiseling away to reveal a statue (the meaningful insight). Our text embeddings guide the sculpture, “a standing, athletic man named David.”
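The step-by-step noise removal can be sketched in a few lines. In a real diffusion model the denoising step is a learned neural network; here, purely for illustration, a closed-form step stands in for it, nudging a noisy 1-D "image" toward a known clean signal:

```python
import math
import random
random.seed(2)

# A 1-D "image": the clean signal a (hypothetical) trained model would
# know how to recover.
clean = [math.sin(i / 4) for i in range(32)]

# Start from pure noise, like static on an old TV.
x = [random.gauss(0, 1) for _ in range(32)]

def mse(a, b):
    """Mean squared error between two signals."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)

start_error = mse(x, clean)
for _ in range(50):
    # Reverse process: each step removes a little noise, moving the sample
    # toward the signal. A trained network would predict this step instead.
    x = [xi + 0.1 * (ci - xi) for xi, ci in zip(x, clean)]

print(mse(x, clean) < start_error)  # True: refinement shrinks the error
```

Each pass removes only 10 percent of the remaining noise, yet after 50 passes the static has all but vanished—the same iterative logic that carries a real model from white noise to a finished image in 50 to 1,000 steps.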

As we saw with VAEs, the technique generalizes from text and images. In healthcare, diffusion models can reconstruct high-resolution images from imperfect data, enhance medical imaging or predict the folding patterns of proteins (a critical factor in drug development). They have been used to improve the quality of MRI scans, allowing for earlier and more accurate diagnosis of tumors or neurodegenerative diseases.

They’ve also enhanced low-dose CT scans, in which the reduction of radiation exposure usually leads to lower image quality. Diffusion models can take these grainy images and refine them, producing clear visuals that help radiologists detect abnormalities without subjecting patients to higher radiation levels.

By gradually removing noise and focusing on the underlying structures, these models help us visualize complex biological processes with unprecedented clarity. This capability is transforming fields like radiology and pathology, where image quality and detail are paramount.

Control nets: guiding AI with precision

Even the most sophisticated models need guidance to ensure they produce meaningful and accurate results. Sometimes, text isn’t enough. It’s far easier to show a model where you want something, or place constraints (for instance, “two atoms cannot occupy the same space”), than to describe the outcome.

This is where control nets come into play. These mathematical tools act like a blueprint or wireframe, steering AI models to adhere to specific constraints or desired outcomes. If VAEs and diffusion models are explorers, control nets are the compass, ensuring they stay on course.

For example, in generating a model of a protein, a control net can impose physical and chemical constraints that reflect real-world biological rules. This ensures that the AI produces not only a plausible structure, but also one that is biologically feasible and useful for practical applications like drug design.
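The "two atoms cannot occupy the same space" constraint can be illustrated as a projection step: after a generative step proposes positions, overlapping pairs are pushed apart until the rule holds. The function names, positions and minimum-distance value below are illustrative stand-ins, not taken from any real model:

```python
MIN_DIST = 1.0  # hypothetical minimum separation between atom centers

def dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

def enforce_min_distance(atoms, min_dist=MIN_DIST, iters=50):
    """Push any overlapping pair of atoms apart along the line joining them."""
    atoms = [list(a) for a in atoms]
    for _ in range(iters):
        for i in range(len(atoms)):
            for j in range(i + 1, len(atoms)):
                d = dist(atoms[i], atoms[j])
                if 0 < d < min_dist:
                    # Overshoot slightly so the pair ends just past min_dist.
                    push = (min_dist * 1.001 - d) / (2 * d)
                    for k in range(len(atoms[i])):
                        delta = (atoms[j][k] - atoms[i][k]) * push
                        atoms[i][k] -= delta
                        atoms[j][k] += delta
    return atoms

# A generative step proposes positions; the first two atoms overlap.
proposed = [(0.0, 0.0, 0.0), (0.2, 0.1, 0.0), (3.0, 0.0, 0.0)]
fixed = enforce_min_distance(proposed)
print(all(dist(fixed[i], fixed[j]) >= MIN_DIST
          for i in range(3) for j in range(i + 1, 3)))  # True
```

A real control net applies such guidance inside the model's denoising steps rather than as a post-hoc fix, but the principle is the same: hard constraints steer generation toward physically feasible outputs.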

In the realm of medical imaging, control nets can guide AI to focus on areas of interest, such as highlighting potential tumor regions in an MRI scan. By integrating expert knowledge into the AI’s processing, control nets enhance the accuracy and reliability of the results, providing clinicians with actionable insights.

By integrating text descriptions or sample images, control nets guide the AI to generate outputs that align with specific criteria. This precision is crucial when developing treatments or diagnostic tools, where even minor inaccuracies can have significant consequences.

Orchestrating the ensemble

Integrating these powerful tools requires a well-coordinated approach, much like conducting an orchestra. Workflows serve as the conductor, orchestrating the sequence in which VAEs, diffusion models and control nets operate. They ensure that data flows seamlessly from one stage to the next, maintaining accuracy and efficiency throughout the process.

Workflows have been common in medicine for decades. They are well-documented, tested, multistep plans to produce medicines, run lab equipment, perform surgery, practice medicine and more.

In practical terms, workflows enable us to encode diverse data into a unified format using VAEs; analyze and interpret this data within the manifold of life; diagnose potential issues and predict outcomes; and generate actionable insights or interventions, guided by control nets to ensure feasibility. In drug discovery, for instance, a workflow might begin by encoding molecular structures into vector spaces using VAEs. Diffusion models then explore potential modifications to these molecules, generating new compounds. Control nets ensure these compounds adhere to chemical and biological constraints. Finally, the workflow evaluates the effectiveness and safety of these compounds with robotic labs, accelerating the development of new medications. By automating and structuring these steps, workflows make complex AI processes accessible and applicable in real-world healthcare settings.
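The drug-discovery workflow described above can be sketched as a chain of stages, each a placeholder function. Every name, scoring rule and numeric choice here is an illustrative stand-in, not a real pipeline:

```python
def encode(molecule):
    """Stand-in for the VAE stage: compress a structure to a short vector."""
    vec = [0.0] * 8
    for i, ch in enumerate(molecule):
        vec[i % 8] += ord(ch) / 1000  # toy hashing, not real chemistry
    return vec

def generate_variants(vec, n=5):
    """Stand-in for the diffusion stage: propose candidates near the input."""
    return [[x * (1 + 0.01 * (k + 1)) for x in vec] for k in range(n)]

def satisfies_constraints(vec):
    """Stand-in for the control-net stage: keep candidates inside bounds."""
    return all(0.0 <= x <= 2.0 for x in vec)

def evaluate(vec):
    """Stand-in for the robotic-lab stage: score a candidate."""
    return -sum((x - 0.5) ** 2 for x in vec)

def workflow(molecule):
    """Encode, generate, constrain, evaluate — in sequence."""
    candidates = generate_variants(encode(molecule))
    feasible = [c for c in candidates if satisfies_constraints(c)]
    return max(feasible, key=evaluate) if feasible else None

best = workflow("CC(=O)Oc1ccccc1C(=O)O")  # aspirin, written as a SMILES string
print(best is not None)  # True
```

The value of the workflow is the orchestration itself: each stage's output is the next stage's input, so the whole sequence can be automated, documented and tested, just like the multistep plans medicine has long relied on.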

Real-world applications

These concepts might sound abstract, but they’re already making tangible impacts. Many new companies have begun to apply them to real-world problems.

For instance, Ginkgo Bioworks, a startup, is pioneering the use of AI in synthetic biology. By using VAEs to embed extensive protein and genetic data into vector spaces, they can design novel organisms and biological processes. This approach is streamlining the creation of custom enzymes and microorganisms for pharmaceuticals, agriculture and even environmental remediation. And Every Cure, a nonprofit organization, is leveraging AI to uncover new applications for existing drugs. By mapping drugs within a high-dimensional manifold of biological effects, their AI systems can identify potential treatments for diseases lacking effective therapies. This approach not only reduces development time but also cuts costs, making treatments more accessible.

Hospitals are adopting AI workflows to improve diagnostic imaging. For example, Siemens Healthineers uses AI to enhance MRI and CT scans, improving image quality and reducing scan times. Diffusion models and control nets work together to produce clearer images, aiding in the early detection of diseases like cancer and improving patient outcomes.

Researchers at MIT and McMaster University have used AI models to sift through millions of chemical compounds, identifying new antibiotics capable of combating drug-resistant bacteria. Using deep neural networks, they rapidly discovered a molecule named halicin, which has shown effectiveness in the lab against a range of pathogens, including those resistant to existing antibiotics. Versions of halicin are now in clinical trials.

Google DeepMind’s AlphaFold project has transformed our ability to predict protein structures. By employing advanced AI techniques, AlphaFold can determine, with remarkable accuracy, the three-dimensional shape of proteins based on their amino acid sequences. This breakthrough is accelerating research in areas ranging from drug development to genetic diseases. Google has created a company, Isomorphic Labs, to pursue commercial applications.

Ethical considerations and future outlook

As we embrace these advanced technologies, at least three ethical challenges need attention. First, handling sensitive health data requires stringent security measures to protect patient confidentiality. In this regard, ensuring compliance with regulations like HIPAA and GDPR is essential. Second, it is also essential to make the benefits of AI-driven healthcare accessible to all, regardless of socioeconomic status or geographic location. Bridging this digital divide is necessary to prevent widening health disparities. Third, AI models can be complex and opaque. Developing methods to interpret and explain AI decisions builds trust among clinicians and patients, paving the way for adoption.

As we look ahead, the integration of AI into healthcare promises a future in which personalized medicine is the norm. Continuous learning from new data will make AI systems more accurate and adaptable. The potential to predict and prevent diseases before they manifest could transform healthcare from a reactive to a proactive paradigm.

As these technologies continue to evolve, they hold the promise of demystifying diseases, personalizing treatments and ultimately enhancing the human experience. The journey through the manifold of life is one of discovery, not just of the world around us, but of ourselves. And in this journey, AI doesn’t replace the human element—doctors and other healthcare providers—it enriches it, providing us with deeper insights and empowering us to make informed choices about our health and well-being.

Through AI, we’re not just collecting data; we’re gaining insight into the very fabric of life. Carl Sagan, the astrophysicist and science communicator, once said: “We are a way for the cosmos to know itself.” We are now learning to read the complex code that shapes our existence, and with that knowledge comes the power to shape our future.


Explore the emerging science of healthspan in other stories in this special report.

Scott Penberthy is chief technology officer of healthcare and life sciences at Google.
