Ehr

Name: Ehr
Creator: Syntherx

Synthetic patient-level rows with fields: patient_id, visit_date, age, sex, diagnosis, medication, ….

This resource is a fully synthetic cohort modeled on electronic health record (EHR) scenarios. It contains no real patients and no protected health information (PHI), only statistically plausible records for method development and reproducible benchmarks.

Each row includes the variables patient_id, visit_date, age, sex, diagnosis, medication, lab_result_type, and lab_result_value. You can inspect the full schema and a representative preview below before downloading the dataset or generating a fresh cohort with the Syntherx SDK.

Teams use datasets like this for AI and statistical modeling, digital twin and pathway simulation, curriculum and sandbox environments, and cross-institutional collaborations where sharing real data is impractical.

Research Dataset — $99

Includes CSV, JSON, and Parquet — ready for ML pipelines

Variable Schema

Column Name	Type	Description
patient_id	string	Unique patient identifier
visit_date	string	Date of visit (YYYY-MM-DD)
age	number	Patient age
sex	string	Patient sex
diagnosis	string	Primary diagnosis
medication	string	Prescribed medication
lab_result_type	string	Type of lab test (e.g., HbA1c, Glucose)
lab_result_value	number	Lab result value (units depend on lab_result_type)
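A minimal loading sketch, assuming pandas is installed, the CSV export uses exactly the column names above, and dates are ISO strings as in the Data Preview. SCHEMA_DTYPES and load_ehr_csv are illustrative names, not part of the Syntherx SDK; the two sample rows are copied from the preview and stand in for the real file.

```python
import io

import pandas as pd

# Column types mirrored from the Variable Schema table.
# Assumption: both "number" columns are read as floats, and visit_date
# is parsed separately as a date rather than kept as a string.
SCHEMA_DTYPES = {
    "patient_id": "string",
    "age": "float64",
    "sex": "category",
    "diagnosis": "category",
    "medication": "category",
    "lab_result_type": "category",
    "lab_result_value": "float64",
}


def load_ehr_csv(source) -> pd.DataFrame:
    """Read a CSV export (path or file-like) with schema-aligned dtypes."""
    return pd.read_csv(source, dtype=SCHEMA_DTYPES, parse_dates=["visit_date"])


# Two rows copied from the Data Preview, used as a stand-in file.
sample = io.StringIO(
    "patient_id,visit_date,age,sex,diagnosis,medication,lab_result_type,lab_result_value\n"
    "P000001,2024-01-10,65,Female,Type 2 Diabetes,Metformin,HbA1c,8.2\n"
    "P000002,2024-02-20,58,Male,Hypertension,Lisinopril,Blood Pressure,140\n"
)
df = load_ehr_csv(sample)
print(df.dtypes)
```

Storing low-cardinality text fields such as sex and diagnosis as categoricals keeps memory use down and makes them straightforward to one-hot encode later.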

Data Preview

First 9 rows (preview only)

patient_id	visit_date	age	sex	diagnosis	medication	lab_result_type	lab_result_value
P000001	2024-01-10	65	Female	Type 2 Diabetes	Metformin	HbA1c	8.2
P000001	2024-03-15	65	Female	Type 2 Diabetes	Metformin	HbA1c	7.5
P000002	2024-02-20	58	Male	Hypertension	Lisinopril	Blood Pressure	140
P000003	2024-01-05	72	Female	COPD	Albuterol	Oxygen Saturation	92
P000004	2024-02-12	60	Male	Hyperlipidemia	Atorvastatin	LDL	155
P000005	2024-03-08	50	Female	Type 2 Diabetes	Insulin	Glucose	180
P000006	2024-01-25	72	Male	Heart Failure	Furosemide	BNP	450
P000007	2024-04-02	45	Female	Asthma	Albuterol	Peak Flow	350
P000008	2024-02-28	67	Male	Chronic Kidney Disease	Losartan	Creatinine	2.1
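The preview is in long format: one row per patient visit and lab test, so a patient can appear more than once (P000001 has two HbA1c measurements). The sketch below, assuming pandas, collapses those repeated rows into a per-patient trajectory summary; the column names n_visits, first_value, last_value, and change are illustrative, not fields of the dataset.

```python
import pandas as pd

# A subset of rows copied from the Data Preview.
rows = [
    ("P000001", "2024-01-10", "HbA1c", 8.2),
    ("P000001", "2024-03-15", "HbA1c", 7.5),
    ("P000005", "2024-03-08", "Glucose", 180.0),
]
df = pd.DataFrame(
    rows, columns=["patient_id", "visit_date", "lab_result_type", "lab_result_value"]
)
df["visit_date"] = pd.to_datetime(df["visit_date"])

# Sort chronologically so "first" and "last" mean earliest and latest visit.
df = df.sort_values(["patient_id", "lab_result_type", "visit_date"])

# One summary row per (patient, lab test) pair.
trajectory = (
    df.groupby(["patient_id", "lab_result_type"])
    .agg(
        n_visits=("visit_date", "count"),
        first_value=("lab_result_value", "first"),
        last_value=("lab_result_value", "last"),
    )
    .reset_index()
)
trajectory["change"] = trajectory["last_value"] - trajectory["first_value"]
print(trajectory)
```

Patients with a single visit (P000005 here) get a change of zero, so downstream models may want to use n_visits to tell "stable" apart from "measured once".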

Reproduce This Dataset

Recreate this dataset in Python (Jupyter, Kaggle, or Google Colab) using the Syntherx SDK.

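# Install the Syntherx SDK from a terminal
# (in Jupyter, Kaggle, or Colab, run "%pip install syntherx" in a cell instead)
pip install syntherx

from syntherx import generate_dataset

# Draw a synthetic longitudinal EHR cohort with 5,000 rows
df = generate_dataset(
    blueprint="ehr_longitudinal",
    rows=5000
)

# Save as CSV; index=False omits the DataFrame index column
# (assumes generate_dataset returns a pandas DataFrame)
df.to_csv("ehr_longitudinal.csv", index=False)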
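Before feeding a freshly generated cohort into a pipeline, it can help to confirm it still matches the published schema. Below is a sketch assuming pandas; check_schema is a hypothetical helper, not part of the Syntherx SDK, and the single test row is copied from the Data Preview so the example runs without the SDK installed.

```python
import pandas as pd

# Expected columns from the Variable Schema table.
# check_schema is a hypothetical helper, not a Syntherx SDK function.
EXPECTED_COLUMNS = [
    "patient_id", "visit_date", "age", "sex",
    "diagnosis", "medication", "lab_result_type", "lab_result_value",
]
NUMERIC_COLUMNS = ["age", "lab_result_value"]  # typed "number" in the schema


def check_schema(df: pd.DataFrame) -> list[str]:
    """Return a list of schema problems; an empty list means the frame matches."""
    problems = []
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if missing:
        problems.append(f"missing columns: {missing}")
    for col in NUMERIC_COLUMNS:
        if col in df.columns and not pd.api.types.is_numeric_dtype(df[col]):
            problems.append(f"{col} is not numeric")
    if "visit_date" in df.columns:
        # Preview dates are YYYY-MM-DD; anything else becomes NaT here.
        parsed = pd.to_datetime(df["visit_date"], format="%Y-%m-%d", errors="coerce")
        if parsed.isna().any():
            problems.append("visit_date has unparseable values")
    return problems


good = pd.DataFrame([{
    "patient_id": "P000001", "visit_date": "2024-01-10", "age": 65,
    "sex": "Female", "diagnosis": "Type 2 Diabetes", "medication": "Metformin",
    "lab_result_type": "HbA1c", "lab_result_value": 8.2,
}])
print(check_schema(good))  # → []
```

Returning a list of messages rather than raising on the first failure makes it easier to see every mismatch at once when a blueprint changes.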

Use Cases

Build and validate AI/ML pipelines for EHR scenarios without using real patient data.
Train and evaluate models on structured fields such as age, sex, diagnosis, medication, and lab results, with patient_id and visit_date linking each patient's visits over time.
Run simulations, power analyses, and exploratory analytics in a privacy-safe sandbox.
Prototype dashboards, ETL flows, and feature stores before touching production systems.
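Because one patient can contribute several visits, a plain row-level random split can put the same patient in both the training and the test set, which inflates evaluation scores. A minimal sketch of a patient-grouped split, assuming pandas and NumPy, is below; split_by_patient is an illustrative name, and scikit-learn's GroupShuffleSplit implements the same idea if you already use that library.

```python
import numpy as np
import pandas as pd


def split_by_patient(
    df: pd.DataFrame, test_fraction: float = 0.2, seed: int = 0
) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Split rows into train/test sets so no patient_id appears in both."""
    rng = np.random.default_rng(seed)  # fixed seed gives a repeatable split
    # Shuffle the distinct patient IDs, not the individual rows.
    patients = rng.permutation(df["patient_id"].drop_duplicates().to_numpy())
    n_test = max(1, round(len(patients) * test_fraction))
    is_test = df["patient_id"].isin(patients[:n_test])
    return df[~is_test], df[is_test]


# Patient IDs and ages from the Data Preview; P000001 has two visits.
preview = pd.DataFrame({
    "patient_id": ["P000001", "P000001", "P000002", "P000003", "P000004",
                   "P000005", "P000006", "P000007", "P000008"],
    "age": [65, 65, 58, 72, 60, 50, 72, 45, 67],
})
train, test = split_by_patient(preview, test_fraction=0.25, seed=42)
print(len(train), len(test))
```

Note that test_fraction applies to patients, not rows, so the row counts of the two sets depend on how many visits each held-out patient has.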

Dataset Characteristics

Fully synthetic — no PHI; suitable for sharing, teaching, and external collaboration.
Schema includes 8 variables: patient_id, visit_date, age, sex, diagnosis, medication, lab_result_type, lab_result_value
Delivered in researcher-friendly formats (CSV, JSON, Parquet) for downstream tooling.
Generated with the Syntherx simulation engine for reproducible cohort-scale draws.

Privacy-Safe Synthetic Dataset

Contains no real patient data
Generated using statistical simulation
Designed for machine learning research

Related Datasets

Explore adjacent synthetic cohorts in the same domain or browse nearby clinical themes.
