Claims

Name: Claims
Creator: Syntherx

Synthetic patient-level rows with fields: patient_id, age, sex, diagnosis_code, procedure_code, claim_amount, ….

This resource is a fully synthetic cohort patterned after insurance claims scenarios. It contains no real patients and no protected health information (PHI), only statistically plausible records for method development and reproducible benchmarks.

Each row includes patient_id, age, sex, diagnosis_code, procedure_code, claim_amount, insurance_type, and visit_type. You can inspect the full schema and a representative preview below before downloading the files or generating a fresh cohort with the Syntherx SDK.

Teams use datasets like this for AI and statistical modeling, digital twin and pathway simulation, curriculum and sandbox environments, and cross-institutional collaborations where sharing real data is impractical.

Research Dataset — $99

Secure checkout via Stripe.

Includes CSV, JSON, and Parquet — ready for ML pipelines

Variable Schema

Column Name	Type	Description
patient_id	string	Unique synthetic patient identifier
age	number	Patient age
sex	string	Patient sex
diagnosis_code	string	ICD diagnosis code
procedure_code	string	CPT/HCPCS procedure code
claim_amount	number	Total claim cost in USD
insurance_type	string	Payer type (Medicare, Medicaid, Private)
visit_type	string	Visit setting (Inpatient, Outpatient, Emergency)
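A minimal sketch for checking a downloaded CSV against the schema above, using only the Python standard library. The `SCHEMA` and `ALLOWED` mappings are transcribed from the table; the helper `validate_row` is hypothetical (it is not part of the Syntherx SDK), and treating `number` as "the CSV text parses as a float" is an assumption.

```python
import csv
import io

# Expected columns and types, transcribed from the Variable Schema table.
# Assumption: "number" means the CSV text must parse as a float.
SCHEMA = {
    "patient_id": str,
    "age": float,
    "sex": str,
    "diagnosis_code": str,
    "procedure_code": str,
    "claim_amount": float,
    "insurance_type": str,
    "visit_type": str,
}

# Category levels listed in the table's descriptions
ALLOWED = {
    "insurance_type": {"Medicare", "Medicaid", "Private"},
    "visit_type": {"Inpatient", "Outpatient", "Emergency"},
}


def validate_row(row: dict) -> list:
    """Return a list of problems in one CSV row; an empty list means it conforms."""
    problems = []
    missing = set(SCHEMA) - set(row)
    if missing:
        problems.append(f"missing columns: {sorted(missing)}")
    for col, expected in SCHEMA.items():
        if col not in row:
            continue
        if expected is float:
            try:
                float(row[col])
            except (TypeError, ValueError):
                problems.append(f"{col} is not numeric: {row[col]!r}")
        if col in ALLOWED and row[col] not in ALLOWED[col]:
            problems.append(f"{col} not in allowed set: {row[col]!r}")
    return problems


# Hypothetical sample: the first preview row plus a deliberately malformed row
sample = io.StringIO(
    "patient_id,age,sex,diagnosis_code,procedure_code,claim_amount,insurance_type,visit_type\n"
    "P000001,68,Female,I10,99213,245.5,Medicare,Outpatient\n"
    "P999999,abc,Male,I10,99213,10,Private,Telehealth\n"
)
for row in csv.DictReader(sample):
    print(row["patient_id"], validate_row(row))
```

The first row passes with an empty list; the second is flagged for its non-numeric age and its unlisted visit type.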

Data Preview

First 9 rows (preview only)


patient_id	age	sex	diagnosis_code	procedure_code	claim_amount	insurance_type	visit_type
P000001	68	Female	I10	99213	245.5	Medicare	Outpatient
P000002	55	Male	E11.9	83036	89.2	Private	Outpatient
P000003	72	Female	J18.9	99223	1450.75	Medicare	Inpatient
P000004	60	Male	I21.3	92928	18250	Private	Inpatient
P000005	47	Female	M54.5	97110	120	Private	Outpatient
P000006	80	Male	I50.9	99285	980.4	Medicare	Emergency
P000007	66	Female	E78.5	80061	75.3	Medicare	Outpatient
P000008	52	Male	K21.9	43239	1350	Private	Outpatient
P000009	70	Female	N18.9	90935	450	Medicare	Outpatient
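As a small worked example, the nine preview rows can be summarized with the standard library alone. This sketch totals claim_amount by visit_type; the rows are copied from the preview and re-keyed as comma-separated text. Nine rows are far too few for any real inference, so this only shows the aggregation pattern.

```python
import csv
import io
from collections import defaultdict

# The nine preview rows, re-keyed as comma-separated text
PREVIEW = """patient_id,age,sex,diagnosis_code,procedure_code,claim_amount,insurance_type,visit_type
P000001,68,Female,I10,99213,245.5,Medicare,Outpatient
P000002,55,Male,E11.9,83036,89.2,Private,Outpatient
P000003,72,Female,J18.9,99223,1450.75,Medicare,Inpatient
P000004,60,Male,I21.3,92928,18250,Private,Inpatient
P000005,47,Female,M54.5,97110,120,Private,Outpatient
P000006,80,Male,I50.9,99285,980.4,Medicare,Emergency
P000007,66,Female,E78.5,80061,75.3,Medicare,Outpatient
P000008,52,Male,K21.9,43239,1350,Private,Outpatient
P000009,70,Female,N18.9,90935,450,Medicare,Outpatient
"""

totals = defaultdict(float)  # summed claim_amount (USD) per visit_type
counts = defaultdict(int)    # number of rows per visit_type
for row in csv.DictReader(io.StringIO(PREVIEW)):
    totals[row["visit_type"]] += float(row["claim_amount"])
    counts[row["visit_type"]] += 1

for visit_type in sorted(totals):
    print(f"{visit_type}: n={counts[visit_type]}, total=${totals[visit_type]:,.2f}")
# Emergency: n=1, total=$980.40
# Inpatient: n=2, total=$19,700.75
# Outpatient: n=6, total=$2,330.00
```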

Reproduce This Dataset

Recreate this dataset in Python (Jupyter, Kaggle, or Google Colab) using the Syntherx SDK.

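# Install the Syntherx SDK (the %pip magic works in Jupyter, Kaggle, and Colab cells;
# from a terminal, run: pip install syntherx)
%pip install syntherx

from syntherx import generate_dataset

# Draw 5,000 synthetic rows from the "claims" blueprint
df = generate_dataset(
    blueprint="claims",
    rows=5000
)

# Save as CSV; index=False (assuming df is a pandas DataFrame) omits the row-index column
df.to_csv("claims.csv", index=False)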

Use Cases

Build and validate AI/ML pipelines for claims scenarios without using real patient data.
Train and evaluate models on structured fields such as age, sex, diagnosis_code, and procedure_code.
Run simulations, power analyses, and exploratory analytics in a privacy-safe sandbox.
Prototype dashboards, ETL flows, and feature stores before touching production systems.
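For model training or a feature store, the categorical fields usually need a numeric encoding. The following is a minimal one-hot sketch in plain Python, assuming the category levels listed in the schema table are exhaustive. `encode_row` and `FEATURE_NAMES` are hypothetical names. The identifier patient_id, the high-cardinality code fields, and sex (whose levels are not documented beyond the preview) are deliberately left out.

```python
# Numeric columns passed through as floats
NUMERIC = ["age", "claim_amount"]

# Assumption: the levels listed in the schema table are the only values that occur
CATEGORICAL = {
    "insurance_type": ["Medicare", "Medicaid", "Private"],
    "visit_type": ["Inpatient", "Outpatient", "Emergency"],
}

FEATURE_NAMES = NUMERIC + [
    f"{col}={level}" for col, levels in CATEGORICAL.items() for level in levels
]


def encode_row(row: dict) -> list:
    """Map one claims row (CSV strings) to a numeric vector aligned with FEATURE_NAMES."""
    features = [float(row[col]) for col in NUMERIC]
    for col, levels in CATEGORICAL.items():
        value = row[col]
        if value not in levels:
            raise ValueError(f"unexpected {col}: {value!r}")
        # One indicator per level: 1.0 for the observed level, 0.0 otherwise
        features.extend(1.0 if value == level else 0.0 for level in levels)
    return features


# Third preview row, as it would arrive from csv.DictReader (all values are strings)
row = {
    "patient_id": "P000003", "age": "72", "sex": "Female",
    "diagnosis_code": "J18.9", "procedure_code": "99223",
    "claim_amount": "1450.75", "insurance_type": "Medicare", "visit_type": "Inpatient",
}
for name, value in zip(FEATURE_NAMES, encode_row(row)):
    print(f"{name}: {value}")
```

Raising on an unseen level, rather than silently emitting all zeros, makes schema drift between generated cohorts visible early.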

Dataset Characteristics

Fully synthetic — no PHI; suitable for sharing, teaching, and external collaboration.
Schema includes 8 variables: patient_id, age, sex, diagnosis_code, procedure_code, claim_amount, insurance_type, visit_type.
Delivered in researcher-friendly formats (CSV, JSON, Parquet) for downstream tooling.
Generated with the Syntherx simulation engine for reproducible cohort-scale draws.
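This page does not describe how the Syntherx engine works, so the following is a hypothetical stand-in, not Syntherx's method. It only illustrates what reproducible draws mean in practice: a generator seeded with a fixed value returns identical rows on every run. The distributions (uniform ages 18 to 90, lognormal claim amounts) are arbitrary placeholders, and whether `generate_dataset` accepts a seed argument is not stated here.

```python
import random

# Hypothetical stand-in generator, NOT the Syntherx simulation engine.
# Levels come from the schema table; the distributions are arbitrary placeholders.
INSURANCE_TYPES = ["Medicare", "Medicaid", "Private"]
VISIT_TYPES = ["Inpatient", "Outpatient", "Emergency"]


def draw_cohort(n_rows: int, seed: int) -> list:
    """Draw n_rows synthetic claim rows; the same seed always yields the same rows."""
    rng = random.Random(seed)  # private generator: global random state is untouched
    rows = []
    for i in range(1, n_rows + 1):
        rows.append({
            "patient_id": f"P{i:06d}",
            "age": rng.randint(18, 90),
            "insurance_type": rng.choice(INSURANCE_TYPES),
            "visit_type": rng.choice(VISIT_TYPES),
            "claim_amount": round(rng.lognormvariate(6.0, 1.2), 2),
        })
    return rows


first = draw_cohort(1000, seed=42)
second = draw_cohort(1000, seed=42)
print(first == second)  # True: identical seed, identical draw
```

Recording the seed and software versions alongside a benchmark is what lets collaborators regenerate the same cohort.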

Privacy-Safe Synthetic Dataset

Contains no real patient data
Generated using statistical simulation
Designed for machine learning research
