EHR
Synthetic patient-level rows with fields: patient_id, visit_date, age, sex, diagnosis, medication, ….
This resource is a fully synthetic cohort patterned after EHR (electronic health record) scenarios. It contains no real patients and no protected health information, only statistically plausible records for method development and reproducible benchmarks.
Rows include variables such as patient_id, visit_date, age, sex, diagnosis, medication, lab_result_type, and lab_result_value. You can inspect the full schema and a representative preview below before downloading the files or generating a fresh cohort with the Syntherx SDK.
Teams use datasets like this for AI and statistical modeling, digital twin and pathway simulation, curriculum and sandbox environments, and cross-institutional collaborations where sharing real data is impractical.
Research Dataset — $99
Secure checkout via Stripe.
Includes CSV, JSON, and Parquet — ready for ML pipelines
Variable Schema
| Column Name | Type | Description |
|---|---|---|
| patient_id | string | Unique patient identifier |
| visit_date | string | Date of visit (YYYY-MM-DD, as in the preview) |
| age | number | Patient age in years |
| sex | string | Patient sex |
| diagnosis | string | Primary diagnosis |
| medication | string | Prescribed medication |
| lab_result_type | string | Type of lab test (e.g., HbA1c, Glucose) |
| lab_result_value | number | Lab result value; units depend on lab_result_type |
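Because the schema stores visit_date as a string and age and lab_result_value as generic numbers, it can help to parse rows into typed records before analysis. The following is a minimal standard-library sketch, assuming a CSV export whose header row uses the column names above, ISO dates, and whole-number ages as in the preview; `EhrRow` and `parse_rows` are hypothetical names, not part of the Syntherx SDK.

```python
import csv
import io
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class EhrRow:
    """One row of the schema: one lab result recorded at one visit."""
    patient_id: str
    visit_date: date
    age: int
    sex: str
    diagnosis: str
    medication: str
    lab_result_type: str
    lab_result_value: float


def parse_rows(csv_text: str) -> list[EhrRow]:
    """Parse CSV text whose header row uses the schema's column names."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        EhrRow(
            patient_id=r["patient_id"],
            # Assumes ISO 8601 dates (YYYY-MM-DD), as shown in the preview
            visit_date=date.fromisoformat(r["visit_date"]),
            # Assumes whole-number ages, as shown in the preview
            age=int(r["age"]),
            sex=r["sex"],
            diagnosis=r["diagnosis"],
            medication=r["medication"],
            lab_result_type=r["lab_result_type"],
            lab_result_value=float(r["lab_result_value"]),
        )
        for r in reader
    ]
```

Parsing visit_date into a `date` makes chronological sorting and date arithmetic explicit rather than relying on string order.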
Data Preview
First 9 rows (preview only)
| patient_id | visit_date | age | sex | diagnosis | medication | lab_result_type | lab_result_value |
|---|---|---|---|---|---|---|---|
| P000001 | 2024-01-10 | 65 | Female | Type 2 Diabetes | Metformin | HbA1c | 8.2 |
| P000001 | 2024-03-15 | 65 | Female | Type 2 Diabetes | Metformin | HbA1c | 7.5 |
| P000002 | 2024-02-20 | 58 | Male | Hypertension | Lisinopril | Blood Pressure | 140 |
| P000003 | 2024-01-05 | 72 | Female | COPD | Albuterol | Oxygen Saturation | 92 |
| P000004 | 2024-02-12 | 60 | Male | Hyperlipidemia | Atorvastatin | LDL | 155 |
| P000005 | 2024-03-08 | 50 | Female | Type 2 Diabetes | Insulin | Glucose | 180 |
| P000006 | 2024-01-25 | 72 | Male | Heart Failure | Furosemide | BNP | 450 |
| P000007 | 2024-04-02 | 45 | Female | Asthma | Albuterol | Peak Flow | 350 |
| P000008 | 2024-02-28 | 67 | Male | Chronic Kidney Disease | Losartan | Creatinine | 2.1 |
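The preview is in long format: each row is one visit with one lab result, so a patient such as P000001 appears on several rows. A minimal sketch of turning those rows into per-patient trajectories, assuming dict rows keyed by the schema's column names; `lab_deltas` is a hypothetical helper, not part of the Syntherx SDK.

```python
from collections import defaultdict


def lab_deltas(rows):
    """Change from first to last visit for each (patient_id, lab_result_type).

    ISO dates (YYYY-MM-DD) sort correctly as strings, so no date parsing is
    needed. Pairs observed at only one visit are skipped.
    """
    series = defaultdict(list)
    for r in rows:
        key = (r["patient_id"], r["lab_result_type"])
        series[key].append((r["visit_date"], float(r["lab_result_value"])))
    deltas = {}
    for key, points in series.items():
        if len(points) < 2:
            continue  # a single visit has no trajectory
        points.sort()  # chronological order by visit_date
        deltas[key] = points[-1][1] - points[0][1]
    return deltas


# Three rows from the preview, deliberately out of date order
preview = [
    {"patient_id": "P000001", "visit_date": "2024-03-15",
     "lab_result_type": "HbA1c", "lab_result_value": 7.5},
    {"patient_id": "P000001", "visit_date": "2024-01-10",
     "lab_result_type": "HbA1c", "lab_result_value": 8.2},
    {"patient_id": "P000002", "visit_date": "2024-02-20",
     "lab_result_type": "Blood Pressure", "lab_result_value": 140},
]
deltas = lab_deltas(preview)  # P000001's HbA1c falls by about 0.7
```

Keying on lab_result_type as well as patient_id avoids subtracting values measured in different units.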
Reproduce This Dataset
Recreate this dataset in Python (Jupyter, Kaggle, or Google Colab) using the Syntherx SDK.
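```bash
# Install the Syntherx SDK (in a notebook cell, use %pip install syntherx)
pip install syntherx
```

```python
from syntherx import generate_dataset

# Draw 5,000 rows from the longitudinal EHR blueprint
df = generate_dataset(
    blueprint="ehr_longitudinal",
    rows=5000,
)

# Write to CSV without an extra index column
df.to_csv("ehr_longitudinal.csv", index=False)
```

Use Cases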
- Build and validate AI/ML pipelines for EHR scenarios without using real patient data.
- Train and evaluate models on structured fields such as age, sex, diagnosis, medication, and lab results, with patient_id and visit_date linking each patient's repeat visits.
- Run simulations, power analyses, and exploratory analytics in a privacy-safe sandbox.
- Prototype dashboards, ETL flows, and feature stores before touching production systems.
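When training and evaluating models on these rows, a patient's repeat visits (for example P000001's two HbA1c measurements) should not be split across training and test sets, or the evaluation can overstate performance. A minimal standard-library sketch of a deterministic patient-level split; `in_test_split`, `split_rows`, and the salt value are hypothetical, not part of the Syntherx SDK.

```python
import hashlib


def in_test_split(patient_id: str, test_fraction: float = 0.2,
                  salt: str = "ehr-split") -> bool:
    """Deterministically assign a patient to the test split by hashing its ID."""
    digest = hashlib.sha256(f"{salt}:{patient_id}".encode("utf-8")).digest()
    # Keep 53 bits so the division is exact and bucket lies in [0, 1)
    bucket = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    return bucket < test_fraction


def split_rows(rows, test_fraction=0.2):
    """Split dict rows into (train, test); all rows of a patient stay together."""
    train, test = [], []
    for r in rows:
        target = test if in_test_split(r["patient_id"], test_fraction) else train
        target.append(r)
    return train, test
```

Hashing the ID instead of shuffling keeps each patient's assignment stable across runs and across regenerated files, as long as the salt and fraction stay fixed.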
Dataset Characteristics
- Fully synthetic — no PHI; suitable for sharing, teaching, and external collaboration.
- Schema includes 8 variables: patient_id, visit_date, age, sex, diagnosis, medication, lab_result_type, lab_result_value
- Delivered in researcher-friendly formats (CSV, JSON, Parquet) for downstream tooling.
- Generated with the Syntherx simulation engine for reproducible cohort-scale draws.
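To make "reproducible draws" concrete: a seeded simulator returns identical rows on every run with the same seed. The toy generator below only illustrates that idea and is not the Syntherx engine. The condition, medication, and lab pairings are taken from the preview, but the means and standard deviations are made-up placeholders, and `toy_cohort` is a hypothetical name. This page does not say whether `generate_dataset` accepts a seed.

```python
import random
from datetime import date, timedelta

# Pairings from the preview; means and SDs are made-up placeholders,
# NOT the distributions used by the Syntherx engine.
CONDITIONS = [
    ("Type 2 Diabetes", "Metformin", "HbA1c", 7.5, 1.0),
    ("Hypertension", "Lisinopril", "Blood Pressure", 135.0, 12.0),
    ("Hyperlipidemia", "Atorvastatin", "LDL", 140.0, 25.0),
]


def toy_cohort(n_patients: int, seed: int) -> list[dict]:
    """Draw one visit per patient from a seeded RNG; the same seed gives the same rows."""
    rng = random.Random(seed)  # local RNG, so global random state is untouched
    rows = []
    for i in range(1, n_patients + 1):
        diagnosis, medication, lab, mean, sd = rng.choice(CONDITIONS)
        visit = date(2024, 1, 1) + timedelta(days=rng.randrange(120))
        rows.append({
            "patient_id": f"P{i:06d}",
            "visit_date": visit.isoformat(),
            "age": rng.randint(18, 90),
            "sex": rng.choice(["Female", "Male"]),
            "diagnosis": diagnosis,
            "medication": medication,
            "lab_result_type": lab,
            "lab_result_value": round(rng.gauss(mean, sd), 1),
        })
    return rows
```

Using a dedicated `random.Random` instance, rather than the module-level functions, keeps the draw reproducible even if other code consumes global random numbers.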
Privacy-Safe Synthetic Dataset
- Contains no real patient data
- Generated using statistical simulation
- Designed for machine learning research
Related Datasets
Explore adjacent synthetic cohorts in the same domain or browse nearby clinical themes.
- Cardiology Outcomes: Synthetic patient-level cardiovascular risk factors and biomarkers for ML and outcomes research.
- Claims: Synthetic patient-level rows with fields patient_id, age, sex, diagnosis_code, procedure_code, claim_amount, …
- Clinical Trial Outcomes: Synthetic patient-level rows with fields patient_id, age, sex, trial_arm, baseline_value, endpoint_value, …
- Diabetes Hba1c Trial: Synthetic patient-level rows with fields patient_id, age, sex, treatment_group, baseline_measure, outcome_measure.
Unlock the Syntherx Platform
Generate custom datasets tailored to your research and AI needs.