    April 22, 2024 · updated May 8, 2026 · 3 min read

    Three reasons the 'health-data-as-an-asset' framing rings hollow.

    By Thomas Jankowski, aided by AI
    Three stations, three failures— TJ x AI

    The "patient health data is the new oil" framing has compounded since 2018. It rings hollow for three reasons that map cleanly to the three things any asset has to be: collectible at a quality that makes it useful, accessible to the parties that would buy it, and useful enough at the point of sale to clear its price. Health data fails on all three, and the failure is structural rather than incidental. The framing keeps producing the same disappointing quarterly investor calls because it was never describing an asset.

    Reason one: data quality

    The data is bad. Not bad in the polite, qualified sense. Bad in the operational sense that any team that has tried to build a real product on it can describe in concrete failure modes. Vital signs are entered as free text in fields that should be numeric. ICD-10 codes are entered to maximize billing rather than to record the clinical truth. Medication lists carry forward across encounters without ever being reconciled against what the patient is actually taking. Lab results are stored under the encounter where they were ordered rather than the encounter where they were resulted, which produces phantom results in some systems and orphaned results in others. The patient identifier hops across three master-patient-index systems on the way from registration to discharge, and each hop introduces collisions, duplicates, and orphans at a rate that is small per encounter and substantial in aggregate.

    A team building on this data spends most of its early product cycle building data-cleanup infrastructure rather than building the product. The cleanup is not a one-time exercise; it is a recurring operating cost because each new source of data brings new failure modes. The framing that treats the data as an asset implicitly assumes the data is in a state where it can be used. The data is in a state where it has to be repaired.
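
    To make the free-text vital-sign problem concrete, here is a minimal sketch of the kind of cleanup code such a team ends up writing. Everything in it is an assumption for illustration: the function names, the sample entries, and the plausibility bounds are hypothetical and are not drawn from any real system or clinical guideline.

```python
import re
from typing import Optional, Sequence

# Hypothetical pattern: the first number in the entry, with optional decimals.
_NUMBER = re.compile(r"\d+(?:\.\d+)?")


def parse_heart_rate(raw: str) -> Optional[float]:
    """Return a heart rate in beats per minute, or None if the entry is unusable."""
    match = _NUMBER.search(raw)
    if match is None:
        return None  # no number at all, e.g. "pt refused"
    value = float(match.group(0))
    # Illustrative plausibility bounds only, not clinical guidance.
    if not 20 <= value <= 300:
        return None
    return value


def usable_fraction(entries: Sequence[str]) -> float:
    """Share of free-text entries that survive parsing."""
    if not entries:
        return 0.0
    usable = [e for e in entries if parse_heart_rate(e) is not None]
    return len(usable) / len(entries)


# Made-up sample entries of the kind described above.
sample = ["72 bpm", "HR: 88", "pt refused", ""]
print(usable_fraction(sample))
```

    Even this toy version has to decide what counts as a number and what counts as plausible, and it silently takes the first number from an entry such as "65-70". Each new data source tends to force another round of decisions like these, which is why the cleanup behaves like an operating cost rather than a one-time project.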

    Reason two: data access

    The buyer cannot get the data. This is the framing's most-discussed failure mode and the one that has drawn the most regulatory attention (HIPAA business-associate agreements, the 21st Century Cures Act information-blocking rule, ONC's USCDI versioning). That attention has produced incremental progress without producing structural change. The dominant pattern is that a buyer who would pay to access a population-level dataset has to negotiate dozens of data-use agreements with dozens of provider organizations, each with its own legal review cycle, each with its own technical ingestion pattern, each with its own price. The transaction cost of assembling a dataset large enough to be useful is high enough that the buyer either gives up or builds a separate vertical product to sit closer to the data and capture some of the value the data was supposed to produce.

    The information-blocking rules have helped at the margin. The structural problem they have not solved is that the data is held by tens of thousands of legal entities with no common access pattern and no common interface for selling. An asset that requires a separate negotiation per holder is not an asset in the way the framing implies. It is a tax on every transaction that uses the asset.

    Reason three: data utility at the point of sale

    The data is not as useful as the framing suggests when the buyer finally gets it. The buyer who pays to access a clinical dataset typically wants to do one of three things: run a population-health analytic, train or evaluate a clinical-AI model, or recruit patients for a clinical trial. Each of those workloads has discovered, slowly and at significant expense, that the data needs additional context the dataset does not carry (practice patterns at the source institution, patient-level adherence, social-determinants context, the unstructured note content that captures what the structured fields did not), and the additional context is either not available or not transmissible under the original data-use agreement.

    The result is that the buyer paid for data and received a partial dataset that requires a follow-on engagement with the source to be useful. The follow-on engagement is where the actual value transfer happens. The data was the introduction. The asset is the follow-on relationship, and the relationship is held by the provider rather than by the data broker who sold the introduction.

    The three reasons compound. The data is bad enough that the buyer has to clean it. The data is hard enough to access that the buyer pays a transaction-cost premium. The data is incomplete enough at the point of use that the buyer has to engage the source to make it useful. Each of those is a structural feature of healthcare data, not an incidental defect that better infrastructure or better policy will eliminate. The framing that calls health data an asset is treating the data as oil. Oil's price clears because the buyer can ship it, refine it, and burn it without a follow-on conversation with the well operator. Health data does not work that way and cannot be made to work that way without restructuring the underlying healthcare delivery system, which is the kind of restructuring that takes a generation rather than a quarterly results cycle.

    The framing is not exactly wrong: there is real value latent in clinical data, and the parties capturing it are the parties that can resolve all three frictions at once (the integrated systems, the payer-providers, the small set of vendors that own a vertical workflow end-to-end). The framing is wrong in its packaging. The packaging implies an asset with a market, a price, and a clearing mechanism. The reality is a relationship that takes years to build and produces value at the boundary where the data finally stops being data and starts being an answer to a clinical question. That boundary is where the value is. Everything upstream of it is friction the framing keeps under-accounting for.

    —TJ