EHR

Synthetic patient-level rows with fields: patient_id, visit_date, age, sex, diagnosis, medication, …

This resource is a fully synthetic cohort patterned after EHR scenarios: it contains no real patients or protected health information, only statistically plausible records for method development and reproducible benchmarks.

Rows include variables such as patient_id, visit_date, age, sex, diagnosis, medication, lab_result_type, lab_result_value. You can inspect the full schema and representative preview below before downloading or generating a fresh cohort with the Syntherx SDK.

Teams use datasets like this for AI and statistical modeling, digital twin and pathway simulation, curriculum and sandbox environments, and cross-institutional collaborations where sharing real data is impractical.

Research Dataset — $99

Secure checkout via Stripe.

Includes CSV, JSON, and Parquet — ready for ML pipelines

Variable Schema

Column Name        Type     Description
patient_id         string   Unique patient identifier
visit_date         string   Date of visit
age                number   Patient age
sex                string   Patient sex
diagnosis          string   Primary diagnosis
medication         string   Prescribed medication
lab_result_type    string   Type of lab test (e.g., HbA1c, Glucose)
lab_result_value   number   Lab result value
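
A downloaded or generated CSV can be checked against the schema above before it enters a pipeline. The following is a minimal sketch using only the Python standard library. The `SCHEMA` mapping and the `validate_rows` helper are hypothetical names introduced here, not part of the Syntherx SDK, and the ISO date format (YYYY-MM-DD) is an assumption based on the preview rows.

```python
import csv
import io
from datetime import date

# Expected columns and the Python type each value should parse to,
# taken from the schema table. "number" is treated as float here.
SCHEMA = {
    "patient_id": str,
    "visit_date": str,  # assumed ISO format (YYYY-MM-DD), as in the preview
    "age": float,
    "sex": str,
    "diagnosis": str,
    "medication": str,
    "lab_result_type": str,
    "lab_result_value": float,
}


def validate_rows(csv_text):
    """Return a list of human-readable problems found in the CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in SCHEMA if c not in (reader.fieldnames or [])]
    if missing:
        return [f"missing columns: {missing}"]
    problems = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        for column, expected in SCHEMA.items():
            if expected is float:
                try:
                    float(row[column])
                except (TypeError, ValueError):
                    problems.append(f"line {line_no}: {column}={row[column]!r} is not numeric")
        try:
            date.fromisoformat(row["visit_date"])
        except (TypeError, ValueError):
            problems.append(f"line {line_no}: visit_date={row['visit_date']!r} is not an ISO date")
    return problems


# Two rows copied from the preview, written as CSV text.
sample = (
    "patient_id,visit_date,age,sex,diagnosis,medication,lab_result_type,lab_result_value\n"
    "P000001,2024-01-10,65,Female,Type 2 Diabetes,Metformin,HbA1c,8.2\n"
    "P000002,2024-02-20,58,Male,Hypertension,Lisinopril,Blood Pressure,140\n"
)
print(validate_rows(sample))  # an empty list means no problems were found
```

Returning a list of problems rather than raising on the first one makes it easier to review an entire file in a single pass.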

Data Preview

First 9 rows (preview only)

patient_id  visit_date  age  sex     diagnosis               medication    lab_result_type    lab_result_value
P000001     2024-01-10  65   Female  Type 2 Diabetes         Metformin     HbA1c              8.2
P000001     2024-03-15  65   Female  Type 2 Diabetes         Metformin     HbA1c              7.5
P000002     2024-02-20  58   Male    Hypertension            Lisinopril    Blood Pressure     140
P000003     2024-01-05  72   Female  COPD                    Albuterol     Oxygen Saturation  92
P000004     2024-02-12  60   Male    Hyperlipidemia          Atorvastatin  LDL                155
P000005     2024-03-08  50   Female  Type 2 Diabetes         Insulin       Glucose            180
P000006     2024-01-25  72   Male    Heart Failure           Furosemide    BNP                450
P000007     2024-04-02  45   Female  Asthma                  Albuterol     Peak Flow          350
P000008     2024-02-28  67   Male    Chronic Kidney Disease  Losartan      Creatinine         2.1
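
Because a patient can appear on more than one row (P000001 has two visits in the preview), longitudinal analyses usually start by grouping rows per patient and lab type. The sketch below illustrates that step on three preview rows with the standard library only. The `lab_changes` helper is a hypothetical name used for illustration.

```python
from collections import defaultdict
from datetime import date

# Three rows copied from the preview:
# (patient_id, visit_date, lab_result_type, lab_result_value)
rows = [
    ("P000001", "2024-01-10", "HbA1c", 8.2),
    ("P000001", "2024-03-15", "HbA1c", 7.5),
    ("P000002", "2024-02-20", "Blood Pressure", 140.0),
]

# Group measurements by (patient, lab type) so each series is comparable.
visits = defaultdict(list)
for patient_id, visit_date, lab_type, value in rows:
    visits[(patient_id, lab_type)].append((date.fromisoformat(visit_date), value))


def lab_changes(visits):
    """For each series with two or more visits, return (days elapsed, change in value)."""
    changes = {}
    for key, series in visits.items():
        series = sorted(series)  # order by visit date
        if len(series) < 2:
            continue  # a single visit has no change to report
        (first_day, first_value), (last_day, last_value) = series[0], series[-1]
        changes[key] = ((last_day - first_day).days, round(last_value - first_value, 2))
    return changes


print(lab_changes(visits))
# → {('P000001', 'HbA1c'): (65, -0.7)}
```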

Reproduce This Dataset

Recreate this dataset in Python (Jupyter, Kaggle, or Google Colab) using the Syntherx SDK.

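# Install the Syntherx SDK (run in a shell; in a notebook, prefix with % or !)
pip install syntherx

# Generate the cohort in Python
from syntherx import generate_dataset

df = generate_dataset(
    blueprint="ehr_longitudinal",
    rows=5000,
)

# Assuming df is a pandas DataFrame, index=False keeps the row index out of the CSV
df.to_csv("ehr_longitudinal.csv", index=False)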

Use Cases

  • Build and validate AI/ML pipelines for EHR scenarios without using real patient data.
  • Train and evaluate models on structured fields such as patient_id, visit_date, age, sex.
  • Run simulations, power analyses, and exploratory analytics in a privacy-safe sandbox.
  • Prototype dashboards, ETL flows, and feature stores before touching production systems.
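
When training and evaluating models on this kind of data, rows from the same patient should usually land on the same side of a train/test split, otherwise repeat visits can leak information into evaluation. One possible approach, sketched here with the standard library, is to derive the split from a hash of patient_id. The `assign_split` helper and the 20% test fraction are illustrative assumptions, not part of the Syntherx SDK.

```python
import hashlib


def assign_split(patient_id, test_fraction=0.2):
    """Deterministically map a patient_id to 'train' or 'test'."""
    digest = hashlib.sha256(patient_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes as a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "test" if bucket < test_fraction else "train"


# Patient IDs as they appear in the preview; P000001 occurs on two rows.
row_patient_ids = ["P000001", "P000001", "P000002", "P000003"]
splits = [assign_split(pid) for pid in row_patient_ids]

# Both P000001 rows always receive the same label.
print(splits[0] == splits[1])
```

Hashing keeps the assignment stable across runs and across regenerated cohorts, as long as patient identifiers stay the same.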

Dataset Characteristics

  • Fully synthetic — no PHI; suitable for sharing, teaching, and external collaboration.
  • Schema includes 8 variables: patient_id, visit_date, age, sex, diagnosis, medication, lab_result_type, lab_result_value.
  • Delivered in researcher-friendly formats (CSV, JSON, Parquet) for downstream tooling.
  • Generated with the Syntherx simulation engine for reproducible cohort-scale draws.

Privacy-Safe Synthetic Dataset

  • Contains no real patient data
  • Generated using statistical simulation
  • Designed for machine learning research

Unlock the Syntherx Platform

Generate custom datasets tailored to your research and AI needs.