The HIPAA Questions to Ask Any AI Healthcare Vendor

The most useful conversation a clinical IT leader can have with an AI vendor is not the technical demo. It's the compliance one. Not because the demo doesn't matter — it does — but because the technical capabilities of competing AI tools have largely converged, while their compliance postures vary widely. The compliance conversation is what separates vendors who treat patient data with appropriate care from those who treat it as a training resource.

Here's the working checklist for that conversation.

Start with the BAA

The first question, before any product evaluation, is whether the vendor will sign a Business Associate Agreement covering all uses of patient data through their platform. The answer should be an unequivocal yes, with the BAA available for review before procurement, not after.

What to actually look for in the BAA:

Scope. Does the BAA cover all data flows through the platform, including any third-party services the vendor uses (cloud providers, model providers, analytics services)? A BAA that covers the vendor's first-party services but not their subcontractors leaves gaps.

Permitted uses. The BAA should specify what the vendor is allowed to do with PHI. "Provide the contracted service" is the standard. Anything broader — improving products, training models, sharing with affiliates — should be explicitly disallowed or, at minimum, surfaced clearly.

Data deletion obligations. What happens to PHI when the relationship ends? The BAA should commit the vendor to returning or destroying all PHI upon termination, with documentation of the destruction.

The Department of Health and Human Services maintains accessible guidance on BAA requirements, and the framework has been stable enough for long enough that there's no excuse for vendors who treat BAA negotiation as a friction point rather than a core part of healthcare procurement.

The training data question

This is where AI healthcare vendors meaningfully differ.

The clearest, most important question to ask: "Is patient data — anywhere in your pipeline — used to train, fine-tune, or evaluate your models?"

The answer should be no. Not "anonymized data may be used." Not "with patient consent." Not "for quality improvement." No.

The reason is straightforward. Anonymizing clinical data is harder than it looks. The combination of clinical findings, dates, geographic information, and demographics often allows re-identification even after standard de-identification. A vendor who tells you their models improve by training on your patient data is a vendor whose data practices warrant additional scrutiny, even if the data is technically de-identified.
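To make the re-identification risk concrete, here is a minimal sketch in Python. The records are fabricated for illustration, and the field names (zip3, birth_year, sex, admitted) are assumptions rather than any vendor's schema. Names and record numbers are already stripped; the function simply counts how many rows are still unique on the fields that remain.

```python
from collections import Counter

# Fabricated records with direct identifiers (name, MRN) already removed.
# Only quasi-identifiers remain; the field names are hypothetical.
records = [
    {"zip3": "021", "birth_year": 1954, "sex": "F", "admitted": "2023-03-14"},
    {"zip3": "021", "birth_year": 1954, "sex": "F", "admitted": "2023-03-14"},
    {"zip3": "021", "birth_year": 1987, "sex": "M", "admitted": "2023-03-15"},
    {"zip3": "100", "birth_year": 1962, "sex": "F", "admitted": "2023-04-02"},
    {"zip3": "100", "birth_year": 1962, "sex": "M", "admitted": "2023-04-02"},
]

QUASI_IDENTIFIERS = ("zip3", "birth_year", "sex", "admitted")


def unique_record_count(rows, keys=QUASI_IDENTIFIERS):
    """Count rows whose combination of quasi-identifiers appears exactly once."""
    combos = Counter(tuple(row[k] for k in keys) for row in rows)
    return sum(1 for row in rows if combos[tuple(row[k] for k in keys)] == 1)


unique = unique_record_count(records)
print(f"{unique} of {len(records)} records are unique on {QUASI_IDENTIFIERS}")
```

Any row that is unique on these fields can potentially be matched against an outside source that carries the same fields alongside a name. A reasonable follow-up for a vendor is whether they measure this kind of uniqueness at all before calling a dataset de-identified.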

Reputable AI healthcare vendors design their architecture to make this impossible by default: customer data flows through inference but never into training. The technical architecture matters here because policies can be changed; architectural choices are durable.

Coverage of this issue in STAT News and JAMA Network has been particularly thorough, and procurement teams should expect vendors to be able to discuss their architecture in concrete terms.

Encryption and data flow

Standard but worth verifying:

Encryption at rest and in transit. TLS for all transit. Encrypted storage for all data at rest. Modern key management with regular rotation. These should be table stakes, but vendors vary in how rigorously they implement them, especially newer entrants without enterprise security maturity.

Geographic data residency. Where does the patient data physically live? US-resident vendors typically use US data centers, but vendors that use international cloud regions or international processing services may be exposing data to legal regimes other than US law. How much this matters depends on the health system's own legal and contractual obligations.

Network architecture. Is patient data ever sent to third-party APIs? If the vendor uses a third-party language model provider, is that flow covered by a BAA? What happens to the data on that provider's systems?

Access controls and audit

The audit and access posture of a vendor reveals how seriously they take security operations:

Who at the vendor can access patient data? Most modern tools should provide a clear answer. "No one routinely; access is logged and requires explicit justification" is the right answer. "Engineers occasionally access for debugging" is a yellow flag worth probing.

Audit logs. Are access logs maintained? Can the customer review them on request? Are they retained for the period required by HIPAA?

Single sign-on and session management. Modern enterprise security expects SAML/OIDC-based SSO, MFA, and session controls. Tools that authenticate via username/password without enterprise integration options are not enterprise-ready.

Data retention and deletion

Default retention windows. How long does the vendor keep patient data by default? The shortest defensible window is the right answer. Recordings of clinical encounters should be deleted as soon as the documentation is generated. Generated notes should not persist longer than the customer needs them.
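A retention policy of this kind can be expressed as a small, testable rule. The sketch below is illustrative only: the artifact kinds, the 30-day window for notes, and the Artifact type are assumptions made for the example, not values taken from any vendor or regulation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical retention windows; real values come from the contract.
RETENTION = {
    "encounter_recording": timedelta(0),    # eligible as soon as the note exists
    "generated_note": timedelta(days=30),   # assumed customer-chosen window
}


@dataclass
class Artifact:
    kind: str
    created_at: datetime
    note_generated: bool = True  # only meaningful for recordings


def is_past_retention(artifact: Artifact, now: datetime) -> bool:
    """Return True when the artifact should already have been deleted."""
    # A recording must be kept until its documentation has been generated.
    if artifact.kind == "encounter_recording" and not artifact.note_generated:
        return False
    return now - artifact.created_at >= RETENTION[artifact.kind]


now = datetime(2024, 1, 31, tzinfo=timezone.utc)
old_note = Artifact("generated_note", datetime(2023, 12, 1, tzinfo=timezone.utc))
print(is_past_retention(old_note, now))
```

Asking a vendor to describe their retention logic at roughly this level of detail (which artifact types exist, what window applies to each, and what triggers deletion) tends to reveal whether the policy is actually enforced in code or only stated in a document.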

Customer-controlled deletion. Can the customer delete their own data on demand, without involving vendor support? Tools that require a vendor request for routine data deletion add friction and make deletion depend on the vendor's responsiveness.

Deletion verification. When data is deleted, is there a record of the deletion? Are backups purged as well? Is the deletion cryptographically verifiable?

Subcontractors and data flows

Most modern AI vendors are not single-stack — they use cloud providers, model providers, observability tools, and analytics. Each represents a place where patient data could flow.

Subcontractor list. Reasonable vendors maintain and disclose a current list of subcontractors that have access to PHI. They sign BAAs with each. They notify customers when they add or change subcontractors.

Model providers. If the vendor uses a third-party model (which is increasingly common), the relationship between the vendor and the model provider matters. A vendor who runs models in their own infrastructure has cleaner data isolation than one who calls a third-party API. Both can be done compliantly, but the architecture should be transparent.

Incident response

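Breach notification SLAs. What's the contractual obligation for notification in the event of a data breach? HIPAA requires notification without unreasonable delay and no later than 60 calendar days after discovery; treat that as the minimum standard. Better vendors commit to faster notification.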
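The deadline arithmetic is simple enough to sketch. The example below assumes the clock starts on the discovery date and counts calendar days; the function name and the contractual_days parameter are hypothetical, and a real contract may define the trigger differently.

```python
from datetime import date, timedelta
from typing import Optional

HIPAA_MAX_DAYS = 60  # outer limit in calendar days after discovery


def notification_deadline(discovered: date,
                          contractual_days: Optional[int] = None) -> date:
    """Latest permissible notification date under the stricter of the two limits."""
    if contractual_days is None:
        days = HIPAA_MAX_DAYS
    else:
        # A contract can shorten the window but cannot extend it past 60 days.
        days = min(contractual_days, HIPAA_MAX_DAYS)
    return discovered + timedelta(days=days)


print(notification_deadline(date(2024, 3, 1)))                       # regulatory limit only
print(notification_deadline(date(2024, 3, 1), contractual_days=10))  # stricter contract term
```

Note that "without unreasonable delay" still applies inside the window, so the computed date is an outer bound and not a target.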

Incident response history. Has the vendor had reportable incidents? How were they handled? Vendors with no history are typically vendors who have not been operating at scale.

Independent verification

Annual SOC 2 Type II reports, HITRUST certifications, and ISO 27001 audits are the lingua franca of enterprise security. They're not perfect — they verify processes, not outcomes — but they signal a vendor who has invested in operational security maturity.

The American Medical Association has published procurement guidance for AI tools that incorporates many of these checks.

The shortest version

If you only ask three questions:

1. Will you sign a BAA covering all data flows including subcontractors?
2. Is patient data ever used to train, fine-tune, or evaluate your models?
3. What's your data retention policy by default, and can we override it?

The answers to those three reveal more about a vendor's seriousness than any product demo. If a vendor is hesitant on any of them, that's the answer to whether they're ready for enterprise healthcare deployment.
