10 KiB
SAPS-II and Sepsis-3 on vanilla PostgreSQL MIMIC-III v1.3
This folder contains the SQL scripts needed to compute, on a stock PostgreSQL restore of the MIMIC-III v1.3 dump:
- the Simplified Acute Physiology Score II (SAPS-II) — a one-row-per-ICU-stay severity score computed on day 1; and
- the Sepsis-3 onset time (Singer et al., JAMA 2016) — the earliest hour at which a patient has SOFA ≥ 2 within ±48 h of suspicion of infection.
The SAPS-II scripts are mirrored verbatim from the upstream repository
MIT-LCP/mimic-code (specifically
the mimic-iii/concepts_postgres/ tree, the auto-generated PostgreSQL
port of the BigQuery concepts). The Sepsis-3 pipeline is bespoke:
upstream ships only an empty stub for MIMIC-III, so we adapted the
MIMIC-IV scripts under mimic-iv/concepts/{score,sepsis,medication}/
to MIMIC-III's schema (see "Sepsis-3 port notes" below).
Files
sql/
├── build_sapsii.sql <-- master runner for SAPS-II
├── build_sepsis3.sql <-- master runner for Sepsis-3
├── postgres-functions.sql <-- PL/pgSQL shims for DATETIME_DIFF / _ADD / _SUB
├── echo_data.sql <-- weight imputation helper
├── durations/
│ ├── ventilation_classification.sql
│ ├── ventilation_durations.sql
│ ├── weight_durations.sql
│ ├── dobutamine_dose.sql
│ ├── dopamine_dose.sql
│ ├── epinephrine_dose.sql
│ └── norepinephrine_dose.sql
├── firstday/
│ ├── blood_gas_first_day.sql
│ ├── blood_gas_first_day_arterial.sql
│ ├── gcs_first_day.sql
│ ├── labs_first_day.sql
│ ├── urine_output_first_day.sql
│ └── vitals_first_day.sql
├── fluid_balance/
│ └── urine_output.sql <-- all-time UO; used by Sepsis-3
├── severityscores/
│ └── sapsii.sql
└── sepsis/ <-- bespoke Sepsis-3 pipeline
├── blood_gas_arterial.sql <-- all-time arterial BG (PaO2/FiO2)
├── gcs_all.sql <-- all-time GCS
├── sofa_hourly.sql <-- staged hourly SOFA pipeline
├── antibiotic.sql <-- filtered antibiotic prescriptions
├── suspicion_of_infection.sql <-- abx <-> culture pairing
└── sepsis3.sql <-- final onset table
Dependency graph (SAPS-II)
postgres-functions.sql (DATETIME_* shims, used everywhere below)
│
├── durations/ventilation_classification.sql
│ └── durations/ventilation_durations.sql
│
├── firstday/blood_gas_first_day.sql
│ └── firstday/blood_gas_first_day_arterial.sql
│
├── firstday/gcs_first_day.sql
├── firstday/labs_first_day.sql
├── firstday/urine_output_first_day.sql
└── firstday/vitals_first_day.sql
│
└── severityscores/sapsii.sql <-- final table: `sapsii`
Dependency graph (Sepsis-3)
postgres-functions.sql
│
├── echo_data.sql
│ └── durations/weight_durations.sql
│ ├── durations/dobutamine_dose.sql
│ ├── durations/dopamine_dose.sql
│ ├── durations/epinephrine_dose.sql
│ └── durations/norepinephrine_dose.sql
│
├── fluid_balance/urine_output.sql (all-time UO; consumed by sofa_hourly)
├── durations/ventilation_classification.sql
│ └── durations/ventilation_durations.sql
│
├── sepsis/blood_gas_arterial.sql (all-time arterial PaO2/FiO2)
├── sepsis/gcs_all.sql (all-time GCS)
│ │
│ └── sepsis/sofa_hourly.sql <-- table: `sofa_hourly`
│ (one row per ICU hour)
│
├── sepsis/antibiotic.sql
│ └── sepsis/suspicion_of_infection.sql
│
└── sepsis/sepsis3.sql <-- final table: `sepsis3`
(one row per ICU stay)
fluid_balance/urine_output.sql is included because it appears in the
upstream postgres-make-concepts.sql and is harmless to build for
SAPS-II; for Sepsis-3 it is required (the renal-SOFA branch reads it).
Required base MIMIC-III tables
The scripts assume the standard MIMIC-III v1.3 schema with these tables
already present and accessible via search_path:
admissions, chartevents, diagnoses_icd, icustays,
inputevents_cv, inputevents_mv, labevents, microbiologyevents,
noteevents (for echo_data), outputevents, patients,
prescriptions, procedureevents_mv, services.
Running SAPS-II
-
Restore the MIMIC-III v1.3 dump into a PostgreSQL database (the usual
postgres_create_tables.sql/postgres_load_data.sqlflow from themimic-iii/buildmimic/postgres/directory of the upstream repo). -
Make sure the schema containing those tables is on your
search_path(commonlymimiciii). -
Run the master script from this directory:
psql -d mimic -v ON_ERROR_STOP=1 \ -c 'SET search_path TO mimiciii, public;' \ -f sql/build_sapsii.sql -
Query the result:
SELECT subject_id, hadm_id, icustay_id, sapsii, sapsii_prob FROM sapsii ORDER BY icustay_id LIMIT 10;
Running Sepsis-3
-
Same prerequisites as SAPS-II.
-
Run:
psql -d mimic -v ON_ERROR_STOP=1 \ -c 'SET search_path TO mimiciii, public;' \ -f sql/build_sepsis3.sqlExpect a few hours runtime on stock PostgreSQL with default indexes. Most of that is the per-measurement scans of
chartevents(~330 M rows),labevents(~30 M rows), andoutputevents(~4 M rows) that drive the hourly SOFA pipeline. -
Query the result:
SELECT icustay_id, suspected_infection_time, sofa_time, sofa_score, sepsis3 FROM sepsis3 WHERE sepsis3 = TRUE ORDER BY icustay_id LIMIT 10;For per-hour SOFA across the whole stay (e.g. for time-series modelling), query
sofa_hourlydirectly.
Sepsis-3 port notes
The Sepsis-3 onset definition is time-stamped: it requires a SOFA
score at any moment during the ICU stay, not just on day 1, because
suspicion of infection can fall anywhere in the stay. MIMIC-IV
provides a hourly SOFA via a stack of pre-built mimic_derived.*
tables (icustay_hourly, bg, vitalsign, gcs,
urine_output_rate, chemistry, enzyme, complete_blood_count,
epinephrine, norepinephrine, dopamine, dobutamine,
ventilation with ventilation_status='InvasiveVent'). Almost none
of those exist in mimic-iii/concepts_postgres/, so we ported the
needed bits inline. Notable differences from MIMIC-IV behaviour:
- Hourly grid is built inline with
generate_seriesonicustays.intime/outtime(noicustay_hourlytable). - Ventilation status granularity is missing. MIMIC-IV's
InvasiveVentfilter (used to splitpao2fio2ratio_ventfrompao2fio2ratio_noventso a ventilated patient doesn't get a stratospheric SOFA respiratory score from a noisy unventilated PaO2:FiO2) doesn't have a clean MIMIC-III equivalent. We use the lumpedventilation_durations, treating any active ventilation row as invasive ventilation. - Urine output adjustment is replicated inline in
sofa_uo. We materialise bothuo_24hr(sum) anduo_tm_24hr(count of distinct hours within the rolling 24 h window that actually had a UO observation), and the renal CASE insofa_componentsusesGREATEST(uo_24hr, 0) * 24.0 / uo_tm_24hronly whenuo_tm_24hr BETWEEN 22 AND 30, falling back to creatinine alone otherwise. TheGREATEST(_, 0)clip prevents patients on continuous bladder irrigation (whichfluid_balance/urine_output.sqlsubtracts as a negative volume) from being mis-scored as oliguric. Effect: at hours where the stay has lasted less than ~22 h or where charting is sparse, renal SOFA is computed from creatinine only, instead of being over-scored from a partial UO sum. - Vasopressor rates come from upstream
durations/{epinephrine,norepinephrine,dopamine,dobutamine}_dose.sql, which already merge CareVue + MetaVision and convert to mcg/kg/min. Weight-based unit conversion usesweight_durations.sql, which itself depends onecho_data.sqlfor weight imputation. - Antibiotic time precision is one day. MIMIC-III
prescriptions.startdateis DATE-precision only, soantibiotic_time(and any sepsis-3 onset time driven by it) lands on midnight of the prescription start date. Suspicion of infection driven by a culture with charttime is still timestamped to the minute.
The hourly SOFA pipeline is built column-by-column into staged
tables (sofa_grid, sofa_vs, sofa_gcs, sofa_bili, sofa_cr,
sofa_plt, sofa_pf, sofa_uo, sofa_vaso, then sofa_wide,
sofa_components, sofa_hourly). Each stage scans exactly one
giant raw table, so each stage can be EXPLAIN ANALYZEd
independently and re-run in isolation if you crash partway through.
The intermediates are kept (not dropped) so you can inspect them.
Re-fetching from upstream
Re-run fetch.ps1 (PowerShell, Windows) to pull the latest versions of
all upstream-mirrored scripts from the main branch of
MIT-LCP/mimic-code. Files under sql/sepsis/ are bespoke and not
fetched.
Caveats
- Severity scores (SAPS-II, SOFA) and sepsis-3 are computed for
every ICU stay, including neonates, re-admissions, and short
stays. Filter
icustay_idyourself to match your study population (the canonical adult-ICU sepsis-3 cohort excludes patients < 18 and ICU re-admissions, but we do not enforce that here). - The PostgreSQL scripts in
concepts_postgres/are auto-generated from the BigQuery-flavored originals inconcepts/; theDATETIME_*function calls you'll see in the SQL are resolved by the wrappers inpostgres-functions.sql, so it must be sourced first (the master scripts do this for you).