Claims Utilization

Name: Claims Utilization
Creator: Syntherx

Synthetic healthcare claims and utilization dataset. Includes visit types, ICD diagnosis codes, CPT/HCPCS procedure codes, claim amounts, and payer types for utilization analysis.

This is a fully synthetic cohort modeled on claims scenarios. It contains no real patients or protected health information, only statistically plausible records for method development and reproducible benchmarks.

Each row has eight variables: patient_id, age, sex, diagnosis_code, procedure_code, claim_amount, insurance_type, and visit_type. You can inspect the full schema below before downloading the dataset or generating a fresh cohort with the Syntherx SDK.

Teams use datasets like this for AI and statistical modeling, digital twin and pathway simulation, curriculum and sandbox environments, and cross-institutional collaborations where sharing real data is impractical.

Research Dataset — $99

Includes CSV, JSON, and Parquet — ready for ML pipelines

Variable Schema

Column Name	Type	Description
patient_id	string	Unique synthetic patient identifier
age	number	Patient age
sex	string	Patient sex
diagnosis_code	string	ICD diagnosis code
procedure_code	string	CPT/HCPCS procedure code
claim_amount	number	Total claim cost in USD
insurance_type	string	Payer type (Medicare, Medicaid, Private)
visit_type	string	Inpatient, Outpatient, Emergency
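
The schema above can be checked mechanically before a file enters a pipeline. The following is a minimal sketch using only the Python standard library. The `validate_rows` helper and the sample rows are hypothetical, invented for illustration, and the treatment of `number` columns as anything `float()` accepts is an assumption, not something the Syntherx documentation specifies.

```python
import csv
import io

# Expected columns and coarse types, copied from the schema table above.
# "number" columns are mapped to float; this mapping is an assumption.
SCHEMA = {
    "patient_id": str,
    "age": float,
    "sex": str,
    "diagnosis_code": str,
    "procedure_code": str,
    "claim_amount": float,
    "insurance_type": str,
    "visit_type": str,
}


def validate_rows(csv_text):
    """Return a list of (row_number, problem) tuples; an empty list means no problems."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [col for col in SCHEMA if col not in (reader.fieldnames or [])]
    if missing:
        return [(0, f"missing columns: {missing}")]

    problems = []
    for row_number, row in enumerate(reader, start=1):
        for col, expected in SCHEMA.items():
            value = row[col]
            if expected is float:
                try:
                    float(value)
                except (TypeError, ValueError):
                    problems.append((row_number, f"{col} is not numeric: {value!r}"))
            elif not value:
                problems.append((row_number, f"{col} is empty"))
    return problems


# Hypothetical sample rows, invented for illustration only.
SAMPLE = """patient_id,age,sex,diagnosis_code,procedure_code,claim_amount,insurance_type,visit_type
P0001,67,F,E11.9,99213,182.50,Medicare,Outpatient
P0002,abc,M,I10,99284,940.00,Private,Emergency
"""

problems = validate_rows(SAMPLE)
for row_number, problem in problems:
    print(f"row {row_number}: {problem}")
```

The same function can be pointed at a downloaded file by passing the result of `open(path).read()` as `csv_text`.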

Data Preview

No preview data available.

Reproduce This Dataset

Recreate this dataset in Python (Jupyter, Kaggle, or Google Colab) using the Syntherx SDK.

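# Install the Syntherx SDK (run in a shell; in a notebook cell, use %pip instead)
pip install syntherx

# Generate a synthetic cohort from the claims_utilization blueprint
from syntherx import generate_dataset

df = generate_dataset(
    blueprint="claims_utilization",
    rows=5000,
)

# Write the cohort to CSV; index=False assumes df is a pandas DataFrame
df.to_csv("claims_utilization.csv", index=False)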

Use Cases

Build and validate AI/ML pipelines for claims scenarios without using real patient data.
Train and evaluate models on structured fields such as age, sex, diagnosis_code, and claim_amount.
Run simulations, power analyses, and exploratory analytics in a privacy-safe sandbox.
Prototype dashboards, ETL flows, and feature stores before touching production systems.
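
As one illustration of the exploratory analytics mentioned above, a basic utilization summary groups claims by a categorical field and reports the count and mean claim_amount per group. This is a minimal standard-library sketch. The `summarize_by` helper and the five records are hypothetical and invented for illustration; they are not drawn from the dataset.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records using three of the schema's columns; values are made up.
records = [
    {"visit_type": "Inpatient", "insurance_type": "Medicare", "claim_amount": 12000.0},
    {"visit_type": "Inpatient", "insurance_type": "Private", "claim_amount": 8000.0},
    {"visit_type": "Outpatient", "insurance_type": "Medicaid", "claim_amount": 250.0},
    {"visit_type": "Emergency", "insurance_type": "Private", "claim_amount": 1500.0},
    {"visit_type": "Outpatient", "insurance_type": "Medicare", "claim_amount": 350.0},
]


def summarize_by(rows, key):
    """Group rows by `key` and return the claim count and mean amount per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row["claim_amount"])
    return {
        group: {"claims": len(amounts), "mean_amount": mean(amounts)}
        for group, amounts in groups.items()
    }


summary = summarize_by(records, "visit_type")
for group, stats in sorted(summary.items()):
    print(group, stats["claims"], stats["mean_amount"])
```

Passing `"insurance_type"` as the key gives the same summary by payer.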

Dataset Characteristics

Fully synthetic — no PHI; suitable for sharing, teaching, and external collaboration.
Schema includes 8 variables: patient_id, age, sex, diagnosis_code, procedure_code, claim_amount, insurance_type, visit_type.
Delivered in researcher-friendly formats (CSV, JSON, Parquet) for downstream tooling.
Generated with the Syntherx simulation engine for reproducible cohort-scale draws.
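
If only one of the delivered formats is on hand, converting between them is straightforward. The sketch below turns CSV text into a JSON array of records using only the standard library. The `csv_to_json_records` helper and the sample row are hypothetical, and the choice to cast age and claim_amount to floats follows the `number` type in the schema as an assumption. Parquet is left out because it needs a third-party library.

```python
import csv
import io
import json

# Columns the schema lists as "number"; all other columns stay as strings.
NUMERIC_COLUMNS = {"age", "claim_amount"}


def csv_to_json_records(csv_text):
    """Convert CSV text into a JSON array with one object per row."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        rows.append(
            {
                column: float(value) if column in NUMERIC_COLUMNS else value
                for column, value in row.items()
            }
        )
    return json.dumps(rows, indent=2)


# Hypothetical sample row, invented for illustration only.
SAMPLE = """patient_id,age,sex,diagnosis_code,procedure_code,claim_amount,insurance_type,visit_type
P0001,67,F,E11.9,99213,182.50,Medicare,Outpatient
"""

json_text = csv_to_json_records(SAMPLE)
print(json_text)
```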

Privacy-Safe Synthetic Dataset

Contains no real patient data
Generated using statistical simulation
Designed for machine learning research
