Field Notes

Observations from the gap between authorization and outcome.

Refill #14 just got approved. Nobody looked. The gap between authorization and continuation doesn't feel like a failure. It feels like nobody's job.

I spent four years watching specialty pharmacy data before I could name what I was seeing. A prior authorization gets approved. The medication ships. Twelve months later, the same patient is still filling the same medication, still getting approved, and nobody in the authorization chain has asked a single clinical question since month one.

For most of that time, I assumed the system was working and I was just measuring poorly. That's the honest version. I kept building dashboards, refining cohort definitions, adding filters, convinced that if I sliced the data the right way, the reassessment activity would show up somewhere. It didn't. What showed up instead was the finding: 60% of specialty patients on continued therapy had no documented clinical reassessment tied to their ongoing authorization. Sixty percent. A majority of the book was running on autopilot.

I remember the pharmacy director who sat across from me at a regional plan and said, "Wait. Nobody checks after PA?" She wasn't being rhetorical. She genuinely didn't know. Her team processed the approvals. Her team dispensed the medication. But nobody on her team owned the question of whether month fourteen still looked like month one. The data existed in six different systems. Nobody's dashboard put it together.

Structural gaps are strange. They don't announce themselves. The prescriber assumes the payer is watching. The payer assumes the prescriber is managing. And the patient on refill fourteen exists in the space between both assumptions, where nobody is looking because everybody thinks somebody else already did.

That created Cadence. Specifically: the realization that the continuation problem was an ownership problem. Somebody had to decide that the space between authorization one and authorization fourteen was their job. I spent two years thinking about it before I built anything. Probably should have been one.

28.4% stop in the first month. I blamed the patients for two years before I looked at the charts.

When I ran SUD operations across thirteen locations, I watched the retention numbers every week. 28.4% of buprenorphine patients dropped out in the first thirty days. I had all the clinical explanations. The patient wasn't ready. The medication wasn't right. The dosage needed adjustment. Those explanations were comfortable. They let the system off the hook.

Then I pulled the records of the patients who left.

Most of them had no documented follow-up attempt within 72 hours of their last visit. The EMR tracked appointments. The billing system tracked claims. Nobody tracked the silence between the last visit and the disappearance. There was a woman in our Lexington program who called me directly, eleven days after her last appointment, to ask if anyone was going to call her. Eleven days. She'd been waiting. We hadn't noticed she was gone.

I remember the specific week I stopped blaming the patients. I was reviewing a cohort of 340 people who had all started buprenorphine in the same month. By day 30, 97 were gone. I pulled every chart. In 61 of those 97 cases, there was no outreach attempt documented. The system didn't fail to retain them. The system didn't try.

The retention cliff was architectural, not clinical. The system that initiated treatment and the system that monitored engagement were owned by two different teams with two different incentive structures. Between those two systems was a 72-hour window where a patient could vanish and nobody would notice for weeks. I spent the next year building the navigator model that eventually became Continuum. The insight was embarrassingly simple: someone has to own the episode, start to finish, or nobody does. I'm not sure why it took me two years of watching people leave to figure that out.
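The 72-hour check described above can be written down in a few lines. What follows is a minimal sketch, not the Continuum implementation: the record shapes, the patient IDs, and the function name `patients_without_outreach` are all hypothetical, and a real EMR export would look different.

```python
from datetime import datetime, timedelta

# Assumed follow-up window, taken from the 72-hour figure in the text.
FOLLOW_UP_WINDOW = timedelta(hours=72)

def patients_without_outreach(last_visits, outreach_attempts, window=FOLLOW_UP_WINDOW):
    """Return IDs of patients with no documented outreach within `window` of their last visit.

    last_visits: dict of patient_id -> datetime of the last visit
    outreach_attempts: dict of patient_id -> list of datetimes of documented attempts
    """
    silent = []
    for patient_id, last_visit in last_visits.items():
        attempts = outreach_attempts.get(patient_id, [])
        # An attempt counts only if it falls inside the window after the last visit.
        reached = any(last_visit <= t <= last_visit + window for t in attempts)
        if not reached:
            silent.append(patient_id)
    return sorted(silent)

# Made-up records for illustration only.
last_visits = {
    "p1": datetime(2024, 3, 1, 10, 0),
    "p2": datetime(2024, 3, 1, 14, 0),
    "p3": datetime(2024, 3, 2, 9, 0),
}
outreach_attempts = {
    "p1": [datetime(2024, 3, 2, 11, 0)],  # about a day later, inside the window
    "p2": [datetime(2024, 3, 12, 9, 0)],  # roughly eleven days later, outside it
    # "p3" has no documented attempt at all
}

silent_patients = patients_without_outreach(last_visits, outreach_attempts)
print(silent_patients)
```

The point of the sketch is that the rule is not hard to compute. The hard part is that the visit data and the outreach data have to sit in one place, with one owner, before anyone can run it.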

Every public company audits its books. Self-funded employers don't audit their claims. The leverage disappears the moment the check clears, and almost nobody thinks about that.

Self-funded employers pay their own claims. What most of them don't do is look at the claims before they pay them. The TPA processes the claim, it hits the account, the check clears. Maybe six months later someone runs a retrospective audit and discovers that a portion of what they paid shouldn't have been paid. Recovery rates on post-payment audits hover around 1-2% of what was overpaid. The rest is gone.

I used to think the solution was better auditing. Faster recovery. Smarter analytics on the back end. I built a whole product around that idea and it worked, technically. ClearBill returned $9.2 million to payers in its first six months of full deployment. But something about it kept bothering me. We were celebrating finding money that never should have left. The whole model was predicated on paying first and looking later.

The moment that reframed everything was a meeting with a mid-size manufacturer. 4,200 covered lives. Their benefits consultant had just presented the annual claims audit. $2.1 million in findings. Charges that shouldn't have been paid, duplicate claims, pricing errors. The consultant presented this as good news. The employer's CFO looked at the room and said, "Why didn't someone find this before we paid it?"

Nobody had a good answer. I certainly didn't, even though I'd built the tool that found it. The CFO wasn't asking a technical question. She was asking a structural one. Why is the default in this industry to pay first? In every other category of business expense, the audit happens before the check. In health benefits, it happens after, and we call the recovery a win.

That question became Caliber. The original ClearBill engine moved from post-payment to pre-payment. Same verification logic, different moment. Before the check clears, not after. I still think about how long I spent building a better recovery system before it occurred to me that recovery is the wrong frame entirely. The moment of maximum leverage is the moment before payment. Everything after that is collection.

Two residential SUD programs. Same city. Same network. Same reimbursement. One completes 70% of patients. The other completes 35%. The employer's benefits team has no mechanism to tell them apart.

I spent three years routing members to behavioral health programs before I understood what I was actually doing. The network directory said "in-network." The authorization said "approved." I assumed those two words meant something about quality. They don't. They mean the provider filed the right paperwork and agreed to the contracted rate.

The moment that broke the assumption was a spreadsheet I built in 2023. I pulled completion rates and 30-day readmission data for every residential SUD program in a single state. The range was staggering. One facility completed 70% of admissions with single-digit readmission. The one eleven miles away completed 35% with readmission above 30%. Both in-network, both credentialed, both paid identically.

I showed it to a benefits director at a mid-size manufacturer. She stared at it for maybe ten seconds and said, "We've been sending people to both of these?" Yes. For years. Her plan had no quality layer between "in-network" and "authorized." Nobody's did. The carrier's network team verified credentials. The UM team verified medical necessity. Nobody verified whether the program actually worked.

That's not a clinical failure. It's a routing failure. The clinical infrastructure exists. Programs that complete 70% of patients are doing something measurably right. But the routing infrastructure, the layer that would steer a member toward the 70% program instead of the 35% program, doesn't exist in any standard benefits architecture I've seen. The information to build it is available. The PHQ-9 is validated. The GAD-7 takes four minutes. Completion rates are calculable from claims. Nobody in the payment chain collects any of it independently because nobody's economics depend on it.
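To make "calculable from claims" concrete, here is a small sketch of the calculation. It assumes each residential episode has already been reduced to a facility ID and a completed flag. How that flag is derived from raw claims (discharge status, length of stay, or something else) is left open, and the facility labels and counts are invented to mirror the 70% and 35% contrast.

```python
from collections import defaultdict

def completion_rates(episodes):
    """Compute the completion rate per facility.

    episodes: iterable of (facility_id, completed) pairs, where `completed`
    is a bool already derived from claims (derivation is an assumption here).
    """
    totals = defaultdict(int)
    completed = defaultdict(int)
    for facility_id, did_complete in episodes:
        totals[facility_id] += 1
        if did_complete:
            completed[facility_id] += 1
    return {facility: completed[facility] / totals[facility] for facility in totals}

# Invented episode counts for two hypothetical facilities.
episodes = (
    [("facility_a", True)] * 7 + [("facility_a", False)] * 3
    + [("facility_b", True)] * 7 + [("facility_b", False)] * 13
)

rates = completion_rates(episodes)
for facility, rate in sorted(rates.items()):
    print(f"{facility}: {rate:.0%} completed")
```

Both facilities complete the same number of patients in this toy data. Only the denominator separates them, which is exactly what a directory that says "in-network" cannot show.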

That's what Curated is. Not a better network. A quality layer that sits between the member and the network and asks the question nobody else is paid to ask: does this program actually work, and can I prove it?

Hospital A: 19% C-section rate. Hospital B: 34%. Same metro. Same OB privileges at both. The member picks based on which one her coworker liked. Nobody shows her the numbers.

I manage a maternity book at Carelon (Elevance Health), one of six specialty risk books I own, and it has taught me something uncomfortable: most of the cost variation in maternity care is visible before delivery. The data exists. Hospital-level C-section rates, NICU admission patterns, average episode costs by facility. All of it calculable from claims data the employer already owns. None of it reaches the member making the decision.

I remember pulling facility-level data for a single MSA. Fourteen hospitals. C-section rates ranged from 13% to 41%. Average maternity episode cost ranged from $14,000 to $31,000. These weren't different populations. Same commercial lives, same benefit design, same OB groups with privileges at multiple facilities. The variation was the facility, not the patient.

I sat with that data for a while. The uncomfortable part wasn't the variation. I expected variation. The uncomfortable part was that every member in that MSA chose their delivery hospital based on proximity, a friend's recommendation, or which facility their OB mentioned first. Nobody showed them the numbers. Not the OB, not the payer, not the employer. The most expensive healthcare decision most families will make in a given year, and the member has less comparison data than she'd get buying a dishwasher.

The standard industry response is "we can't steer to facilities." But that's not what I'm talking about. I'm talking about showing people information that already exists and letting them make a decision with it. A maternity navigator who says, "Your OB delivers at two hospitals. Here's what each one looks like on C-section rates and NICU outcomes." That's not steering. It's the baseline transparency that every other industry considers obvious.

Waybright started from that gap. The clinical data exists. The member never sees it. Somebody has to be the layer that connects the two, and their economics have to depend on getting it right. Otherwise the data just sits in a warehouse while the member picks the hospital with the nicer lobby.

I asked a regional health system how they modeled their value-based contract. One person built it in Excel. He left in March. Nobody can explain the assumptions.

Every value-based contract I've seen in the last five years was modeled in a spreadsheet. Usually one spreadsheet, built by one person, and that person is the only one who understands what the cells are doing.

I learned this the hard way during a consulting engagement with a regional system entering a shared-savings arrangement. They'd been negotiating for four months. I asked to see the financial model. It was an Excel file with 14 tabs, no documentation, and a circular reference in the risk corridor calculation that the analyst had solved by turning on iterative calculation and hoping. The CFO had signed the LOI based on outputs from this file. When I asked the analyst to walk me through the savings rate assumptions, he said, "Those are from Dave." Dave had left the organization in March.

This is not an unusual story. I've seen versions of it at health plans, provider organizations, and employer consultancies. The model that determines whether an organization takes on downside risk for a 50,000-member population is built the same way someone would build a household budget. One person, one file, no version control, no shared assumptions between the negotiating parties.

The problem isn't the math. Actuaries know the math. The problem is that the math lives in a format that one person understands, that can't be stress-tested in front of both parties, and that produces a single point estimate instead of showing the range of outcomes. A value-based contract that looks profitable at a 3% savings rate looks catastrophic at 1.5%. Both are plausible. The spreadsheet shows one of them.
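A toy version of that sensitivity shows how quickly the sign flips. This is a deliberately simplified sketch, not how Compass or any real contract works: it ignores risk corridors, minimum savings thresholds, and downside sharing, and every number in it (spend per member, the 50% share, the program cost) is an assumption chosen for illustration.

```python
def net_result(savings_rate, benchmark_spend, share_of_savings, program_cost):
    """Net position for the risk-bearing party under a simple upside-only share.

    All inputs are illustrative assumptions, not terms from a real contract.
    """
    gross_savings = savings_rate * benchmark_spend
    return share_of_savings * gross_savings - program_cost

# Hypothetical contract: 50,000 members at an assumed $6,000 annual spend each.
BENCHMARK_SPEND = 50_000 * 6_000
SHARE_OF_SAVINGS = 0.5    # assumed split of gross savings
PROGRAM_COST = 3_500_000  # assumed cost of care management and infrastructure

# Show the range of outcomes instead of a single point estimate.
scenarios = {
    rate: net_result(rate, BENCHMARK_SPEND, SHARE_OF_SAVINGS, PROGRAM_COST)
    for rate in (0.015, 0.02, 0.025, 0.03)
}

for rate, net in scenarios.items():
    print(f"savings rate {rate:.1%}: net {net:,.0f}")
```

Under these assumptions the contract nets about $1.0 million at a 3% savings rate and loses about $1.25 million at 1.5%. Printing the whole range puts the loss case on the table next to the pitch case.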

I built Compass because I kept watching smart people make eight-figure decisions on models they couldn't interrogate. A shared tool that shows the loss case, not just the pitch case, becomes the thing both sides trust. That's a low bar. But nobody had cleared it.

A residential program bills at psychiatrist rates for medication management. Its PHQ-9 improvement runs about 3 points behind programs using actual psychiatrists. Nobody connects those two datasets because they live in different buildings.

This one came to me sideways. I was reviewing credential verification data on a set of behavioral health providers. Standard billing hygiene, checking whether the billed provider level matched the actual clinician delivering care. Separately, I had outcome data on some of the same programs. PHQ-9 deltas, completion rates, 30-day readmission.

I put them next to each other on a screen. Not because I expected to find anything. Because they were open at the same time.

The pattern was there immediately. Programs where the billed credential matched the actual clinician had measurably better PHQ-9 improvement than programs where there was a mismatch, where a program billed at psychiatrist rates but the clinical notes showed an NP or PA managing medication. The delta was small but consistent. About 3 points on the PHQ-9 across the cohort.

I don't know what that means yet. It could mean that programs investing in actual psychiatrist involvement produce better outcomes. It could mean that programs willing to misrepresent their staffing are also cutting corners clinically. It could be confounded by something I haven't measured. I want to be honest about that.

But here's what I do know: nobody else is looking at this. Billing accuracy data lives in the claims department. Clinical outcome data lives in the quality department. In most organizations, those departments don't share a hallway, let alone a dataset. The intersection, what billing patterns reveal about clinical quality, is a question that requires both datasets, and no single team owns both.

That's not an insight I've built a product around. It's an observation that connects two products I've already built. Caliber produces the billing truth. Curated produces the outcome truth. Together they can ask a question neither can ask alone. I don't know where it leads yet. But I know nobody else is standing in the place where both datasets are visible at the same time.

A health plan delegates prior authorization to a vendor. The vendor approves or denies. The plan reports the vendor's numbers as its own. Nobody checks whether the vendor actually followed the clinical criteria.

Delegation is the word health plans use when they pay someone else to make clinical decisions on their behalf. Prior auth, utilization management, case management. Plans delegate these functions to specialized vendors all the time. The vendor runs the program. The plan reports the results to regulators. The member never knows the difference.

I spent two years inside that reporting chain and the thing that struck me wasn't the quality of the decisions. Most vendors are competent. It was the quality of the oversight. The plan delegates a function, receives a monthly report from the vendor, and treats the report as evidence that the function is being performed correctly. But the report is produced by the entity being overseen. That's like asking a student to grade their own exam.

I watched a plan receive 18 months of delegation reports showing 98% compliance with turnaround time requirements. When an external audit pulled a random sample of 200 cases, 31% had documentation gaps that would have changed the compliance calculation. The report said 98%. The reality was somewhere around 67%. Nobody caught it because nobody was looking at the underlying cases. They were looking at the report.
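The arithmetic behind "somewhere around 67%" can be laid out explicitly. The sketch below makes two assumptions the audit itself may not have made: every case with a documentation gap is treated as non-compliant, and none of those cases overlaps with the 2% the vendor already reported as non-compliant. The function name is hypothetical.

```python
def adjusted_compliance(sample_size, reported_noncompliant, gap_cases):
    """Share of sampled cases that remain verifiably compliant.

    Assumes gap cases and reported non-compliant cases do not overlap,
    and that every gap case fails the compliance test.
    """
    verified = sample_size - reported_noncompliant - gap_cases
    return verified / sample_size

sample_size = 200
reported_noncompliant = round(sample_size * 0.02)  # 4 cases, from the 98% reported rate
gap_cases = round(sample_size * 0.31)              # 62 cases with documentation gaps

rate = adjusted_compliance(sample_size, reported_noncompliant, gap_cases)
print(f"verified compliance in sample: {rate:.0%}")
```

If some gap cases would still pass on closer review, the true figure sits somewhere between this number and the reported 98%. Either way, the calculation needs the underlying cases, not the vendor's report.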

The structural problem is straightforward: the plan retains regulatory accountability for delegated functions but has no independent mechanism to verify that the delegate is performing them correctly. NCQA requires a delegation oversight plan. Most plans have one. Most plans also treat it as a compliance document rather than an operational tool. The oversight exists on paper. The verification doesn't happen in practice.

I built the Delegation Governance Standard because the gap between "we delegate this" and "we verified the delegate did it right" is where regulatory risk accumulates silently. By the time it surfaces (in an audit, a lawsuit, a member complaint), the exposure has been compounding for years.

Major payers are pulling prior authorization back. UnitedHealthcare cut 30% of PA requirements. Gold card programs auto-approve. CMS is compressing timelines. Nobody is asking what governs the therapy after the gate is gone.

I've been watching the prior authorization news for the last eighteen months with a feeling I can only describe as darkly validating. UnitedHealthcare eliminated PA requirements for 231 procedures. Cigna dropped PA for nearly 100 services. Gold card programs are auto-approving high-performing providers. CMS compressed standard PA turnaround from 14 days to 7. Fifty insurers signed a pledge to reduce PA volume. States are passing reform bills by the dozen.

All of it is about initiation. Getting the patient onto the therapy faster. Less friction at the front door. Politically popular, clinically defensible, long overdue. I don't disagree with any of it.

But here's what nobody is talking about: the PA was the only structured governance touchpoint most payers had on specialty therapy. It was blunt, slow, adversarial, and often clinically inappropriate. But it existed. When a biologic got approved, the PA process at least forced someone to document why. When that gate comes down, what's left? The prescription refills. The pharmacy dispenses. The claim pays. Month after month, with no structured reassessment, no documented clinical checkpoint, no independent verification that month fourteen still looks like month one.

The industry is solving the problem patients and providers hate, the front door friction, without building anything to replace the governance function that friction accidentally provided. They're tearing down the fence without asking what it was keeping in.

I built Cadence for the continuation layer before the PA conversation reached this pitch. Now the argument is simpler than it was two years ago. If you remove the only gate you had, the continuation layer isn't optional anymore. It's the only governance you've got left.

We watch the front door and ignore the living room. Every product I've built lives in the space between what gets approved and what actually happens after.

Sometime around the third product, I realized I was building the same thing over and over. Not the same tool. The same observation, applied to different populations and different cost centers, producing different operational layers that all live in the same structural gap.

Cadence exists because specialty pharmacy has aggressive initiation governance and zero continuation governance. Continuum, because SUD treatment has rigid intake protocols and no retention architecture. Caliber started when I realized claims had aggressive processing workflows but nobody verified the bill before it paid. Curated came from watching BH networks credential aggressively while not measuring outcomes at all. Waybright, from seeing maternity programs screen prenatally and then go silent on facility-level transparency. Compass, from watching VBC contracts get negotiated hard and modeled badly.

Every one of those has the same structure. Aggressive front door. Empty living room.

I didn't plan it that way. I built Cadence because I found the continuation gap in specialty pharmacy data and couldn't stop thinking about it. I built Continuum because I watched patients disappear from SUD treatment and realized nobody was tracking the silence. Each product felt like a separate insight at the time. It took four of them before the pattern became obvious enough to name.

Healthcare systems are extraordinarily good at gatekeeping. Prior authorization, step therapy, network credentialing, medical necessity review. The amount of energy and infrastructure devoted to the moment of initiation is immense. But the moment something clears the gate, the system relaxes. The therapy refills, the episode continues, the claim pays. And the space between initiation and outcome, where most of the money moves and most of the quality variance lives, is structurally unattended.

I don't think this is incompetence. I think it's a design artifact. The systems were built to say yes or no at the gate. Nobody designed the system that watches what happens after yes. That's the living room. I work there.