# SAPS-II and Sepsis-3 on vanilla PostgreSQL MIMIC-III v1.3 This folder contains the SQL scripts needed to compute, on a stock PostgreSQL restore of the MIMIC-III v1.3 dump: - the **Simplified Acute Physiology Score II (SAPS-II)** — a one-row-per-ICU-stay severity score computed on day 1; and - the **Sepsis-3 onset time** (Singer et al., JAMA 2016) — the earliest hour at which a patient has SOFA ≥ 2 within ±48 h of suspicion of infection. The SAPS-II scripts are mirrored verbatim from the upstream repository [`MIT-LCP/mimic-code`](https://github.com/MIT-LCP/mimic-code) (specifically the `mimic-iii/concepts_postgres/` tree, the auto-generated PostgreSQL port of the BigQuery concepts). The Sepsis-3 pipeline is bespoke: upstream ships only an empty stub for MIMIC-III, so we adapted the MIMIC-IV scripts under `mimic-iv/concepts/{score,sepsis,medication}/` to MIMIC-III's schema (see "Sepsis-3 port notes" below). ## Files ``` sql/ ├── build_sapsii.sql <-- master runner for SAPS-II ├── build_sepsis3.sql <-- master runner for Sepsis-3 ├── postgres-functions.sql <-- PL/pgSQL shims for DATETIME_DIFF / _ADD / _SUB ├── echo_data.sql <-- weight imputation helper ├── durations/ │ ├── ventilation_classification.sql │ ├── ventilation_durations.sql │ ├── weight_durations.sql │ ├── dobutamine_dose.sql │ ├── dopamine_dose.sql │ ├── epinephrine_dose.sql │ └── norepinephrine_dose.sql ├── firstday/ │ ├── blood_gas_first_day.sql │ ├── blood_gas_first_day_arterial.sql │ ├── gcs_first_day.sql │ ├── labs_first_day.sql │ ├── urine_output_first_day.sql │ └── vitals_first_day.sql ├── fluid_balance/ │ └── urine_output.sql <-- all-time UO; used by Sepsis-3 ├── severityscores/ │ └── sapsii.sql └── sepsis/ <-- bespoke Sepsis-3 pipeline ├── blood_gas_arterial.sql <-- all-time arterial BG (PaO2/FiO2) ├── gcs_all.sql <-- all-time GCS ├── sofa_hourly.sql <-- staged hourly SOFA pipeline ├── antibiotic.sql <-- filtered antibiotic prescriptions ├── suspicion_of_infection.sql <-- abx <-> culture pairing └── sepsis3.sql <-- final onset table ``` ## Dependency graph (SAPS-II) ``` postgres-functions.sql (DATETIME_* shims, used everywhere below) │ ├── durations/ventilation_classification.sql │ └── durations/ventilation_durations.sql │ ├── firstday/blood_gas_first_day.sql │ └── firstday/blood_gas_first_day_arterial.sql │ ├── firstday/gcs_first_day.sql ├── firstday/labs_first_day.sql ├── firstday/urine_output_first_day.sql └── firstday/vitals_first_day.sql │ └── severityscores/sapsii.sql <-- final table: `sapsii` ``` ## Dependency graph (Sepsis-3) ``` postgres-functions.sql │ ├── echo_data.sql │ └── durations/weight_durations.sql │ ├── durations/dobutamine_dose.sql │ ├── durations/dopamine_dose.sql │ ├── durations/epinephrine_dose.sql │ └── durations/norepinephrine_dose.sql │ ├── fluid_balance/urine_output.sql (all-time UO; consumed by sofa_hourly) ├── durations/ventilation_classification.sql │ └── durations/ventilation_durations.sql │ ├── sepsis/blood_gas_arterial.sql (all-time arterial PaO2/FiO2) ├── sepsis/gcs_all.sql (all-time GCS) │ │ │ └── sepsis/sofa_hourly.sql <-- table: `sofa_hourly` │ (one row per ICU hour) │ ├── sepsis/antibiotic.sql │ └── sepsis/suspicion_of_infection.sql │ └── sepsis/sepsis3.sql <-- final table: `sepsis3` (one row per ICU stay) ``` `fluid_balance/urine_output.sql` is included because it appears in the upstream `postgres-make-concepts.sql` and is harmless to build for SAPS-II; for Sepsis-3 it is required (the renal-SOFA branch reads it). ## Required base MIMIC-III tables The scripts assume the standard MIMIC-III v1.3 schema with these tables already present and accessible via `search_path`: `admissions`, `chartevents`, `diagnoses_icd`, `icustays`, `inputevents_cv`, `inputevents_mv`, `labevents`, `microbiologyevents`, `noteevents` (for `echo_data`), `outputevents`, `patients`, `prescriptions`, `procedureevents_mv`, `services`. ## Running SAPS-II 1. Restore the MIMIC-III v1.3 dump into a PostgreSQL database (the usual `postgres_create_tables.sql` / `postgres_load_data.sql` flow from the `mimic-iii/buildmimic/postgres/` directory of the upstream repo). 2. Make sure the schema containing those tables is on your `search_path` (commonly `mimiciii`). 3. Run the master script from this directory: ```bash psql -d mimic -v ON_ERROR_STOP=1 \ -c 'SET search_path TO mimiciii, public;' \ -f sql/build_sapsii.sql ``` 4. Query the result: ```sql SELECT subject_id, hadm_id, icustay_id, sapsii, sapsii_prob FROM sapsii ORDER BY icustay_id LIMIT 10; ``` ## Running Sepsis-3 1. Same prerequisites as SAPS-II. 2. Run: ```bash psql -d mimic -v ON_ERROR_STOP=1 \ -c 'SET search_path TO mimiciii, public;' \ -f sql/build_sepsis3.sql ``` Expect a few hours runtime on stock PostgreSQL with default indexes. Most of that is the per-measurement scans of `chartevents` (~330 M rows), `labevents` (~30 M rows), and `outputevents` (~4 M rows) that drive the hourly SOFA pipeline. 3. Query the result: ```sql SELECT icustay_id, suspected_infection_time, sofa_time, sofa_score, sepsis3 FROM sepsis3 WHERE sepsis3 = TRUE ORDER BY icustay_id LIMIT 10; ``` For per-hour SOFA across the whole stay (e.g. for time-series modelling), query `sofa_hourly` directly. ### Sepsis-3 port notes The Sepsis-3 onset definition is *time-stamped*: it requires a SOFA score at any moment during the ICU stay, not just on day 1, because suspicion of infection can fall anywhere in the stay. MIMIC-IV provides a hourly SOFA via a stack of pre-built `mimic_derived.*` tables (`icustay_hourly`, `bg`, `vitalsign`, `gcs`, `urine_output_rate`, `chemistry`, `enzyme`, `complete_blood_count`, `epinephrine`, `norepinephrine`, `dopamine`, `dobutamine`, `ventilation` with `ventilation_status='InvasiveVent'`). Almost none of those exist in `mimic-iii/concepts_postgres/`, so we ported the needed bits inline. Notable differences from MIMIC-IV behaviour: 1. **Hourly grid** is built inline with `generate_series` on `icustays.intime` / `outtime` (no `icustay_hourly` table). 2. **Ventilation status granularity** is missing. MIMIC-IV's `InvasiveVent` filter (used to split `pao2fio2ratio_vent` from `pao2fio2ratio_novent` so a ventilated patient doesn't get a stratospheric SOFA respiratory score from a noisy unventilated PaO2:FiO2) doesn't have a clean MIMIC-III equivalent. We use the lumped `ventilation_durations`, treating any active ventilation row as invasive ventilation. 3. **Urine output adjustment** is replicated inline in `sofa_uo`. We materialise both `uo_24hr` (sum) and `uo_tm_24hr` (count of distinct hours within the rolling 24 h window that actually had a UO observation), and the renal CASE in `sofa_components` uses `GREATEST(uo_24hr, 0) * 24.0 / uo_tm_24hr` only when `uo_tm_24hr BETWEEN 22 AND 30`, falling back to creatinine alone otherwise. The `GREATEST(_, 0)` clip prevents patients on continuous bladder irrigation (which `fluid_balance/urine_output.sql` subtracts as a negative volume) from being mis-scored as oliguric. Effect: at hours where the stay has lasted less than ~22 h or where charting is sparse, renal SOFA is computed from creatinine only, instead of being over-scored from a partial UO sum. 4. **Vasopressor rates** come from upstream `durations/{epinephrine,norepinephrine,dopamine,dobutamine}_dose.sql`, which already merge CareVue + MetaVision and convert to mcg/kg/min. Weight-based unit conversion uses `weight_durations.sql`, which itself depends on `echo_data.sql` for weight imputation. 5. **Antibiotic time precision is one day.** MIMIC-III `prescriptions.startdate` is DATE-precision only, so `antibiotic_time` (and any sepsis-3 onset time driven by it) lands on midnight of the prescription start date. Suspicion of infection driven by a culture with charttime is still timestamped to the minute. The hourly SOFA pipeline is built **column-by-column into staged tables** (`sofa_grid`, `sofa_vs`, `sofa_gcs`, `sofa_bili`, `sofa_cr`, `sofa_plt`, `sofa_pf`, `sofa_uo`, `sofa_vaso`, then `sofa_wide`, `sofa_components`, `sofa_hourly`). Each stage scans exactly one giant raw table, so each stage can be `EXPLAIN ANALYZE`d independently and re-run in isolation if you crash partway through. The intermediates are kept (not dropped) so you can inspect them. ## Re-fetching from upstream Re-run `fetch.ps1` (PowerShell, Windows) to pull the latest versions of all upstream-mirrored scripts from the `main` branch of `MIT-LCP/mimic-code`. Files under `sql/sepsis/` are bespoke and not fetched. ## Caveats - Severity scores (SAPS-II, SOFA) and sepsis-3 are computed for **every** ICU stay, including neonates, re-admissions, and short stays. Filter `icustay_id` yourself to match your study population (the canonical adult-ICU sepsis-3 cohort excludes patients < 18 and ICU re-admissions, but we do not enforce that here). - The PostgreSQL scripts in `concepts_postgres/` are auto-generated from the BigQuery-flavored originals in `concepts/`; the `DATETIME_*` function calls you'll see in the SQL are resolved by the wrappers in `postgres-functions.sql`, so it must be sourced first (the master scripts do this for you).