Synthetic EHR Datasets
This page shows how the synthetic datasets in this category are structured: the variable schema, a data preview, and how to reproduce the dataset with the Syntherx SDK.
Example Dataset
EHR
Variable schema and example preview from the blueprint definition.
Variable Schema
| Column Name | Type | Description |
|---|---|---|
| patient_id | string | Unique patient identifier |
| visit_date | string | Date of visit (YYYY-MM-DD) |
| age | number | Patient age |
| sex | string | Patient sex |
| diagnosis | string | Primary diagnosis |
| medication | string | Prescribed medication |
| lab_result_type | string | Type of lab test |
| lab_result_value | number | Lab result value (no unit column; interpret by lab_result_type) |
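The schema above can be checked mechanically before a record enters a pipeline. The following is a minimal sketch using only the Python standard library. The mapping of "number" to `int` or `float`, the `SCHEMA` dictionary, and the `validate_record` helper are illustrative assumptions and are not part of the Syntherx SDK.

```python
from datetime import date

# Column name -> expected Python type, transcribed from the schema table.
# Mapping "number" to (int, float) is an assumption made for this sketch.
SCHEMA = {
    "patient_id": str,
    "visit_date": str,
    "age": (int, float),
    "sex": str,
    "diagnosis": str,
    "medication": str,
    "lab_result_type": str,
    "lab_result_value": (int, float),
}


def validate_record(record: dict) -> list:
    """Return a list of problems found in one record (empty if it conforms)."""
    problems = []
    for column, expected in SCHEMA.items():
        if column not in record:
            problems.append(f"missing column: {column}")
            continue
        value = record[column]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            problems.append(f"wrong type for {column}: {type(value).__name__}")
    for column in sorted(set(record) - set(SCHEMA)):
        problems.append(f"unexpected column: {column}")
    # visit_date is typed as a string; the preview uses YYYY-MM-DD, so check that.
    if isinstance(record.get("visit_date"), str):
        try:
            date.fromisoformat(record["visit_date"])
        except ValueError:
            problems.append("visit_date is not an ISO date (YYYY-MM-DD)")
    return problems


# First row of the data preview, typed according to the schema.
row = {
    "patient_id": "P000001",
    "visit_date": "2024-01-10",
    "age": 65,
    "sex": "Female",
    "diagnosis": "Type 2 Diabetes",
    "medication": "Metformin",
    "lab_result_type": "HbA1c",
    "lab_result_value": 8.2,
}
print(validate_record(row))  # prints []
```

Returning a list of problems rather than raising on the first one makes it easier to report every issue in a malformed record at once.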
Data Preview
First 10 rows (preview only)
| patient_id | visit_date | age | sex | diagnosis | medication | lab_result_type | lab_result_value |
|---|---|---|---|---|---|---|---|
| P000001 | 2024-01-10 | 65 | Female | Type 2 Diabetes | Metformin | HbA1c | 8.2 |
| P000001 | 2024-03-15 | 65 | Female | Type 2 Diabetes | Metformin | HbA1c | 7.5 |
| P000002 | 2024-02-20 | 58 | Male | Hypertension | Lisinopril | Blood Pressure | 140 |
| P000003 | 2024-01-05 | 72 | Female | COPD | Albuterol | Oxygen Saturation | 92 |
| P000004 | 2024-02-12 | 60 | Male | Hyperlipidemia | Atorvastatin | LDL | 155 |
| P000005 | 2024-03-08 | 50 | Female | Type 2 Diabetes | Insulin | Glucose | 180 |
| P000006 | 2024-01-25 | 72 | Male | Heart Failure | Furosemide | BNP | 450 |
| P000007 | 2024-04-02 | 45 | Female | Asthma | Albuterol | Peak Flow | 350 |
| P000008 | 2024-02-28 | 67 | Male | Chronic Kidney Disease | Losartan | Creatinine | 2.1 |
| P000009 | 2024-04-10 | 63 | Female | Type 2 Diabetes | Metformin | HbA1c | 7.2 |
Includes CSV, JSON, and Parquet — ready for ML pipelines
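A CSV file stores every value as text, so the numeric and date columns need converting on load. The sketch below parses two preview rows with the standard library. It assumes the CSV export is comma-delimited with a header row matching the schema column names, which this page does not confirm, and `load_rows` is a hypothetical helper.

```python
import csv
import io
from datetime import date

# Two rows copied from the preview. The comma-delimited layout with a header
# row is an assumption about the CSV export, not a documented format.
CSV_TEXT = """patient_id,visit_date,age,sex,diagnosis,medication,lab_result_type,lab_result_value
P000001,2024-01-10,65,Female,Type 2 Diabetes,Metformin,HbA1c,8.2
P000002,2024-02-20,58,Male,Hypertension,Lisinopril,Blood Pressure,140
"""


def load_rows(text: str) -> list:
    """Parse CSV text and convert the date and number columns to Python types."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        rows.append(
            {
                **raw,
                "visit_date": date.fromisoformat(raw["visit_date"]),
                "age": int(raw["age"]),
                "lab_result_value": float(raw["lab_result_value"]),
            }
        )
    return rows


rows = load_rows(CSV_TEXT)
print(rows[0]["visit_date"], rows[0]["lab_result_value"])  # prints 2024-01-10 8.2
```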
Reproduce This Dataset
Recreate this dataset in Python (Jupyter, Kaggle, or Google Colab) using the Syntherx SDK
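# Install the Syntherx SDK (shell command; in a notebook cell, use %pip install syntherx)
pip install syntherx

# Generate 5,000 rows from the ehr_longitudinal blueprint and save them as CSV
from syntherx import generate_dataset

df = generate_dataset(
    blueprint="ehr_longitudinal",
    rows=5000,
)
df.to_csv("ehr_longitudinal.csv")
Use Cases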
- Longitudinal patient trajectory modeling
- Disease progression analysis
- Clinical decision support simulations
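As a rough illustration of the longitudinal use case, the sketch below groups visits by patient and lab type and reports the change between the first and last measurement. The values come from the preview rows. The `lab_changes` helper is hypothetical, and first-to-last difference is only one simple trajectory summary, not a method prescribed by the dataset.

```python
from collections import defaultdict
from datetime import date

# (patient_id, visit_date, lab_result_type, lab_result_value) taken from the
# preview rows, deliberately listed out of date order.
VISITS = [
    ("P000001", "2024-03-15", "HbA1c", 7.5),
    ("P000001", "2024-01-10", "HbA1c", 8.2),
    ("P000009", "2024-04-10", "HbA1c", 7.2),
]


def lab_changes(visits):
    """Return last-minus-first lab value per (patient, lab type) with 2+ visits."""
    series = defaultdict(list)
    for patient_id, visit_date, lab_type, value in visits:
        series[(patient_id, lab_type)].append((date.fromisoformat(visit_date), value))

    changes = {}
    for key, points in series.items():
        if len(points) < 2:
            continue  # a single visit has no trajectory
        points.sort()  # order by visit date
        changes[key] = round(points[-1][1] - points[0][1], 2)
    return changes


print(lab_changes(VISITS))  # prints {('P000001', 'HbA1c'): -0.7}
```

Patients with a single visit, such as P000009 in the preview, are skipped because no change can be computed for them.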
Privacy-Safe Synthetic Dataset
- Contains no real patient data
- Generated using statistical simulation
- Designed for machine learning research
Purchase Dataset
Research Dataset — $99
Secure checkout via Stripe.
Unlock the Syntherx Platform
Generate custom datasets tailored to your research and AI needs.