Claims
Synthetic patient-level rows with fields: patient_id, age, sex, diagnosis_code, procedure_code, claim_amount, ….
This resource represents a fully synthetic cohort patterned after claims scenarios: there are no real patients or protected health information, only statistically plausible records for method development and reproducible benchmarks.
Each row contains eight variables: patient_id, age, sex, diagnosis_code, procedure_code, claim_amount, insurance_type, and visit_type. You can inspect the full schema and a representative preview below before downloading the dataset or generating a fresh cohort with the Syntherx SDK.
Teams use datasets like this for AI and statistical modeling, digital twin and pathway simulation, curriculum and sandbox environments, and cross-institutional collaborations where sharing real data is impractical.
Research Dataset — $99
Includes CSV, JSON, and Parquet — ready for ML pipelines
Variable Schema
| Column Name | Type | Description |
|---|---|---|
| patient_id | string | Unique synthetic patient identifier |
| age | number | Patient age in years |
| sex | string | Patient sex |
| diagnosis_code | string | ICD-10 diagnosis code |
| procedure_code | string | CPT/HCPCS procedure code |
| claim_amount | number | Total claim cost in USD |
| insurance_type | string | Payer type (Medicare, Medicaid, Private) |
| visit_type | string | Care setting (Inpatient, Outpatient, Emergency) |
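Before feeding a downloaded or generated file into a pipeline, it can help to check each record against the schema above. The sketch below is a minimal, stdlib-only validator. The names `SCHEMA`, `ALLOWED`, and `validate_row` are hypothetical (they are not part of the Syntherx SDK), and it assumes numeric columns have already been parsed into Python `int`/`float` values.

```python
from typing import Any

# Column types transcribed from the Variable Schema table. Treating "number"
# as int or float is an assumption about how values were parsed.
SCHEMA: dict[str, tuple[type, ...]] = {
    "patient_id": (str,),
    "age": (int, float),
    "sex": (str,),
    "diagnosis_code": (str,),
    "procedure_code": (str,),
    "claim_amount": (int, float),
    "insurance_type": (str,),
    "visit_type": (str,),
}

# Allowed categories, taken from the schema descriptions.
ALLOWED: dict[str, set[str]] = {
    "insurance_type": {"Medicare", "Medicaid", "Private"},
    "visit_type": {"Inpatient", "Outpatient", "Emergency"},
}


def validate_row(row: dict[str, Any]) -> list[str]:
    """Return a list of schema problems for one record (empty if it conforms)."""
    problems = [f"missing column: {c}" for c in sorted(SCHEMA.keys() - row.keys())]
    problems += [f"unexpected column: {c}" for c in sorted(row.keys() - SCHEMA.keys())]
    for col, types in SCHEMA.items():
        if col in row and not isinstance(row[col], types):
            expected = " or ".join(t.__name__ for t in types)
            problems.append(f"{col}: expected {expected}, got {type(row[col]).__name__}")
    for col, allowed in ALLOWED.items():
        if col in row and row[col] not in allowed:
            problems.append(f"{col}: unexpected value {row[col]!r}")
    return problems


# First preview row; a conforming record yields an empty list.
row = {
    "patient_id": "P000001", "age": 68, "sex": "Female",
    "diagnosis_code": "I10", "procedure_code": "99213", "claim_amount": 245.5,
    "insurance_type": "Medicare", "visit_type": "Outpatient",
}
print(validate_row(row))  # → []
```

Returning a list of messages rather than raising on the first problem makes it easy to collect and report every schema issue across a whole file in one pass.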
Data Preview
First 9 rows (preview only)
| patient_id | age | sex | diagnosis_code | procedure_code | claim_amount | insurance_type | visit_type |
|---|---|---|---|---|---|---|---|
| P000001 | 68 | Female | I10 | 99213 | 245.5 | Medicare | Outpatient |
| P000002 | 55 | Male | E11.9 | 83036 | 89.2 | Private | Outpatient |
| P000003 | 72 | Female | J18.9 | 99223 | 1450.75 | Medicare | Inpatient |
| P000004 | 60 | Male | I21.3 | 92928 | 18250 | Private | Inpatient |
| P000005 | 47 | Female | M54.5 | 97110 | 120 | Private | Outpatient |
| P000006 | 80 | Male | I50.9 | 99285 | 980.4 | Medicare | Emergency |
| P000007 | 66 | Female | E78.5 | 80061 | 75.3 | Medicare | Outpatient |
| P000008 | 52 | Male | K21.9 | 43239 | 1350 | Private | Outpatient |
| P000009 | 70 | Female | N18.9 | 90935 | 450 | Medicare | Outpatient |
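Grouped summaries are a common first step when exploring claims data. As a small illustration, the sketch below averages `claim_amount` by `visit_type` over the nine preview rows, copied from the table above. Nine rows are far too few to say anything about the full cohort. With the SDK output, `df.groupby("visit_type")["claim_amount"].mean()` would be the usual pandas equivalent, assuming `generate_dataset` returns a pandas DataFrame.

```python
from collections import defaultdict

# (claim_amount, visit_type) pairs copied from the 9-row preview above.
PREVIEW = [
    (245.5, "Outpatient"),
    (89.2, "Outpatient"),
    (1450.75, "Inpatient"),
    (18250, "Inpatient"),
    (120, "Outpatient"),
    (980.4, "Emergency"),
    (75.3, "Outpatient"),
    (1350, "Outpatient"),
    (450, "Outpatient"),
]


def mean_claim_by_visit(rows):
    """Return the average claim_amount for each visit_type."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for amount, visit in rows:
        totals[visit] += amount
        counts[visit] += 1
    return {visit: totals[visit] / counts[visit] for visit in totals}


# Print one line per visit type, sorted alphabetically.
for visit, mean in sorted(mean_claim_by_visit(PREVIEW).items()):
    print(f"{visit:<11} {mean:>10.2f}")
```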
Reproduce This Dataset
Recreate this dataset in Python (Jupyter, Kaggle, or Google Colab) using the Syntherx SDK.
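```bash
# Install the Syntherx SDK (in Jupyter or Colab, run: %pip install syntherx)
pip install syntherx
```

```python
from syntherx import generate_dataset

# Generate a 5,000-row synthetic cohort from the "claims" blueprint
df = generate_dataset(
    blueprint="claims",
    rows=5000,
)

# Write to CSV without the DataFrame index column
df.to_csv("claims.csv", index=False)
```

Use Cases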
- Build and validate AI/ML pipelines for claims scenarios without using real patient data.
- Train and evaluate models on structured fields such as age, sex, diagnosis_code, procedure_code, and claim_amount.
- Run simulations, power analyses, and exploratory analytics in a privacy-safe sandbox.
- Prototype dashboards, ETL flows, and feature stores before touching production systems.
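Most of the modeling use cases above require turning the mixed-type rows into numeric features. The sketch below shows one minimal encoding under stated assumptions: `patient_id` is dropped because it is an identifier rather than a predictor, the categorical columns are one-hot encoded using the levels listed in the schema (the `sex` levels are assumed from the preview), the high-cardinality code columns are left out for brevity, and `claim_amount` is treated as a log-scaled cost target. `LEVELS` and `encode_row` are hypothetical names, not SDK functions.

```python
import math
from typing import Any

# Category levels from the schema; the "sex" levels are assumed from the preview.
LEVELS: dict[str, list[str]] = {
    "sex": ["Female", "Male"],
    "insurance_type": ["Medicare", "Medicaid", "Private"],
    "visit_type": ["Inpatient", "Outpatient", "Emergency"],
}


def encode_row(row: dict[str, Any]) -> tuple[list[float], float]:
    """Return (features, target) for one claims record.

    Features: age, then one one-hot block per column in LEVELS.
    Target: log1p(claim_amount), which compresses the wide range of claim costs.
    patient_id, diagnosis_code, and procedure_code are deliberately ignored.
    """
    features = [float(row["age"])]
    for col, levels in LEVELS.items():
        # An unseen level produces an all-zero block instead of an error.
        features.extend(1.0 if row[col] == level else 0.0 for level in levels)
    target = math.log1p(row["claim_amount"])
    return features, target


x, y = encode_row({
    "patient_id": "P000003", "age": 72, "sex": "Female",
    "diagnosis_code": "J18.9", "procedure_code": "99223", "claim_amount": 1450.75,
    "insurance_type": "Medicare", "visit_type": "Inpatient",
})
print(x)  # → [72.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0]
```

If the task is something other than cost prediction, `claim_amount` would move into the feature vector and a different column would become the target.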
Dataset Characteristics
- Fully synthetic — no PHI; suitable for sharing, teaching, and external collaboration.
- Schema includes 8 variables: patient_id, age, sex, diagnosis_code, procedure_code, claim_amount, insurance_type, visit_type
- Delivered in researcher-friendly formats (CSV, JSON, Parquet) for downstream tooling.
- Generated with the Syntherx simulation engine for reproducible cohort-scale draws.
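The three delivery formats do not preserve types equally. CSV stores everything as text, so numeric columns must be parsed on load, and code columns such as `procedure_code` should stay strings (some CPT codes have leading zeros). JSON keeps the number/string distinction. Parquet stores typed columns natively but needs a third-party reader such as pyarrow, so it is not shown here. The stdlib-only sketch below round-trips two preview rows through CSV and JSON; `to_csv` and `from_csv` are hypothetical helpers, and the integer parsing of `age` assumes whole-number ages as in the preview.

```python
import csv
import io
import json

COLUMNS = ["patient_id", "age", "sex", "diagnosis_code", "procedure_code",
           "claim_amount", "insurance_type", "visit_type"]

# Two rows copied from the data preview.
ROWS = [
    {"patient_id": "P000001", "age": 68, "sex": "Female", "diagnosis_code": "I10",
     "procedure_code": "99213", "claim_amount": 245.5,
     "insurance_type": "Medicare", "visit_type": "Outpatient"},
    {"patient_id": "P000002", "age": 55, "sex": "Male", "diagnosis_code": "E11.9",
     "procedure_code": "83036", "claim_amount": 89.2,
     "insurance_type": "Private", "visit_type": "Outpatient"},
]


def to_csv(rows: list[dict]) -> str:
    """Serialize records to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def from_csv(text: str) -> list[dict]:
    """Parse CSV text, restoring numeric columns and keeping codes as strings."""
    records = []
    for rec in csv.DictReader(io.StringIO(text)):
        rec["age"] = int(rec["age"])  # assumes integer ages, as in the preview
        rec["claim_amount"] = float(rec["claim_amount"])
        records.append(rec)
    return records


assert from_csv(to_csv(ROWS)) == ROWS        # CSV needs explicit type parsing
assert json.loads(json.dumps(ROWS)) == ROWS  # JSON keeps numbers and strings
```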
Privacy-Safe Synthetic Dataset
- Contains no real patient data
- Generated using statistical simulation
- Designed for machine learning research
Related Datasets
Explore adjacent synthetic cohorts in the same domain or browse nearby clinical themes.
- Claims: Synthetic patient-level rows with fields: patient_id, age, sex, diagnosis_code, procedure_code, claim_amount, ….
- Cardiology Outcomes: Synthetic patient-level cardiovascular risk factors and biomarkers for ML and outcomes research.
- Clinical Trial Outcomes: Synthetic patient-level rows with fields: patient_id, age, sex, trial_arm, baseline_value, endpoint_value, ….
- EHR: Synthetic patient-level rows with fields: patient_id, visit_date, age, sex, diagnosis, medication, ….
- Diabetes HbA1c Trial: Synthetic patient-level rows with fields: patient_id, age, sex, treatment_group, baseline_measure, outcome_measure.
Unlock the Syntherx Platform
Generate custom datasets tailored to your research and AI needs.