Synthetic Claims Datasets

This page shows the structure of the synthetic datasets in this category: the variable schema, a data preview, and how to reproduce the data with the Syntherx SDK.

Example Dataset

Example Claims Schema

Schema and example preview for datasets in this category, fetched from the blueprint API.

Variable Schema

Column Name | Type | Description
patient_id | string | Unique synthetic patient identifier
age | number | Patient age
sex | string | Patient sex
diagnosis_code | string | ICD diagnosis code
procedure_code | string | CPT/HCPCS procedure code
claim_amount | number | Total claim cost in USD
insurance_type | string | Payer type (Medicare, Medicaid, Private)
visit_type | string | Visit type (Inpatient, Outpatient, Emergency)
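As a minimal sketch, the schema above can be written as a Python mapping and used to type-check individual records. `CLAIMS_SCHEMA`, `validate_row`, and `_matches` are hypothetical helpers, not part of the Syntherx SDK, and "number" is assumed to mean any int or float.

```python
# Hypothetical helpers, not part of the Syntherx SDK: the variable schema
# above expressed as column name -> schema type.
CLAIMS_SCHEMA = {
    "patient_id": "string",
    "age": "number",
    "sex": "string",
    "diagnosis_code": "string",
    "procedure_code": "string",
    "claim_amount": "number",
    "insurance_type": "string",
    "visit_type": "string",
}


def _matches(value, schema_type):
    """True if a Python value fits the schema type."""
    if schema_type == "string":
        return isinstance(value, str)
    # "number": int or float, but not bool (bool is a subclass of int)
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate_row(row: dict) -> list[str]:
    """Return the problems found in one record (an empty list if it conforms)."""
    problems = []
    for column, schema_type in CLAIMS_SCHEMA.items():
        if column not in row:
            problems.append(f"missing column: {column}")
        elif not _matches(row[column], schema_type):
            problems.append(f"{column}: expected {schema_type}")
    # sorted() keeps the report order deterministic
    for column in sorted(row.keys() - CLAIMS_SCHEMA.keys()):
        problems.append(f"unexpected column: {column}")
    return problems


# First preview row (P000001)
row = {
    "patient_id": "P000001", "age": 68, "sex": "Female",
    "diagnosis_code": "I10", "procedure_code": "99213",
    "claim_amount": 245.5, "insurance_type": "Medicare",
    "visit_type": "Outpatient",
}
print(validate_row(row))  # → []
```

Keeping diagnosis and procedure codes as strings, as the schema does, preserves letter prefixes and any leading zeros that a numeric type would drop.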

Data Preview

First 9 rows (preview only)

patient_id | age | sex | diagnosis_code | procedure_code | claim_amount | insurance_type | visit_type
P000001 | 68 | Female | I10 | 99213 | 245.5 | Medicare | Outpatient
P000002 | 55 | Male | E11.9 | 83036 | 89.2 | Private | Outpatient
P000003 | 72 | Female | J18.9 | 99223 | 1450.75 | Medicare | Inpatient
P000004 | 60 | Male | I21.3 | 92928 | 18250 | Private | Inpatient
P000005 | 47 | Female | M54.5 | 97110 | 120 | Private | Outpatient
P000006 | 80 | Male | I50.9 | 99285 | 980.4 | Medicare | Emergency
P000007 | 66 | Female | E78.5 | 80061 | 75.3 | Medicare | Outpatient
P000008 | 52 | Male | K21.9 | 43239 | 1350 | Private | Outpatient
P000009 | 70 | Female | N18.9 | 90935 | 450 | Medicare | Outpatient
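The same cost summaries used for payer analytics on a full export can be tried on the preview. A minimal standard-library sketch, assuming the nine preview rows above (reduced to the columns needed here) are available as CSV text; `PREVIEW_CSV` and `mean_claim_by` are illustrative names.

```python
import csv
import io
from collections import defaultdict

# The nine preview rows above, reduced to the columns used in this summary.
PREVIEW_CSV = """patient_id,claim_amount,insurance_type,visit_type
P000001,245.5,Medicare,Outpatient
P000002,89.2,Private,Outpatient
P000003,1450.75,Medicare,Inpatient
P000004,18250,Private,Inpatient
P000005,120,Private,Outpatient
P000006,980.4,Medicare,Emergency
P000007,75.3,Medicare,Outpatient
P000008,1350,Private,Outpatient
P000009,450,Medicare,Outpatient
"""


def mean_claim_by(rows, key):
    """Average claim_amount for each distinct value of `key`, rounded to cents."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        totals[row[key]] += float(row["claim_amount"])
        counts[row[key]] += 1
    return {group: round(totals[group] / counts[group], 2) for group in totals}


rows = list(csv.DictReader(io.StringIO(PREVIEW_CSV)))
print(mean_claim_by(rows, "insurance_type"))  # → {'Medicare': 640.39, 'Private': 4952.3}
```

Passing `"visit_type"` as the key gives the utilization view by setting instead. Floats are fine for a quick summary; `decimal.Decimal` is the safer choice if exact cents matter.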

Reproduce This Dataset

Recreate this dataset in Python (Jupyter, Kaggle, or Google Colab) using the Syntherx SDK.

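# Install the Syntherx SDK (notebook magic; in a terminal, run: pip install syntherx)
%pip install syntherx

from syntherx import generate_dataset

# Generate 5,000 synthetic claim rows from the claims_utilization blueprint
df = generate_dataset(
    blueprint="claims_utilization",
    rows=5000
)

# Write to CSV without the DataFrame's row index
df.to_csv("claims_utilization.csv", index=False)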
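After exporting, a quick check that the CSV header matches the documented schema can catch missing or extra columns. A minimal standard-library sketch; `EXPECTED_COLUMNS` and `check_header` are hypothetical names, and the comparison uses sets, so column order does not matter.

```python
import csv

# Column names from the Variable Schema table.
EXPECTED_COLUMNS = [
    "patient_id", "age", "sex", "diagnosis_code", "procedure_code",
    "claim_amount", "insurance_type", "visit_type",
]


def check_header(path, expected=EXPECTED_COLUMNS):
    """Compare a CSV file's header row with the expected column names.

    Returns (missing, unexpected) as sorted lists; both are empty when the
    file contains exactly the expected columns.
    """
    with open(path, newline="") as f:
        header = next(csv.reader(f), [])  # first row, or [] for an empty file
    missing = sorted(set(expected) - set(header))
    unexpected = sorted(set(header) - set(expected))
    return missing, unexpected


# Example call against the exported file:
# missing, unexpected = check_header("claims_utilization.csv")
```

If the file was written with the pandas default `index=True`, the unnamed index column shows up as an empty string `""` in `unexpected`.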

Use Cases

  • Healthcare cost analysis
  • Claims-based machine learning models
  • Utilization and payer analytics
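For claims-based machine learning, the categorical columns usually need a numeric encoding first. A minimal one-hot sketch in plain Python, assuming the category levels listed in the schema descriptions (Medicare/Medicaid/Private and Inpatient/Outpatient/Emergency); `one_hot` and `encode_claim` are hypothetical helpers. In practice a library encoder such as scikit-learn's `OneHotEncoder` does the same job.

```python
# Category levels taken from the schema descriptions above.
INSURANCE_TYPES = ["Medicare", "Medicaid", "Private"]
VISIT_TYPES = ["Inpatient", "Outpatient", "Emergency"]


def one_hot(value, levels):
    """Return a 0/1 indicator list with a 1 at the position of `value`."""
    if value not in levels:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == level else 0 for level in levels]


def encode_claim(row):
    """Turn one claim record into a flat numeric feature vector:
    [age, insurance_type indicators..., visit_type indicators...]."""
    return (
        [float(row["age"])]
        + one_hot(row["insurance_type"], INSURANCE_TYPES)
        + one_hot(row["visit_type"], VISIT_TYPES)
    )


# Fourth preview row (P000004); claim_amount is kept out of the features
# so it can serve as the prediction target.
row = {"age": 60, "insurance_type": "Private", "visit_type": "Inpatient", "claim_amount": 18250}
features, target = encode_claim(row), float(row["claim_amount"])
print(features, target)  # → [60.0, 0, 0, 1, 1, 0, 0] 18250.0
```

Raising on an unknown level, rather than silently emitting all zeros, surfaces payer or visit categories that the fixed level lists do not cover.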

Privacy-Safe Synthetic Dataset

  • Contains no real patient data
  • Generated using statistical simulation
  • Designed for machine learning research


Unlock the Syntherx Platform

Generate custom datasets tailored to your research and AI needs.
