2026-05-05 10:22:17 +02:00
2026-05-05 10:22:17 +02:00
2026-05-05 10:22:17 +02:00
2026-05-05 10:22:17 +02:00

SAPS-II and Sepsis-3 on vanilla PostgreSQL MIMIC-III v1.3

This folder contains the SQL scripts needed to compute, on a stock PostgreSQL restore of the MIMIC-III v1.3 dump:

  • the Simplified Acute Physiology Score II (SAPS-II) — a one-row-per-ICU-stay severity score computed on day 1; and
  • the Sepsis-3 onset time (Singer et al., JAMA 2016) — the earliest hour at which a patient has SOFA ≥ 2 within ±48 h of suspicion of infection.

The SAPS-II scripts are mirrored verbatim from the upstream repository MIT-LCP/mimic-code (specifically the mimic-iii/concepts_postgres/ tree, the auto-generated PostgreSQL port of the BigQuery concepts). The Sepsis-3 pipeline is bespoke: upstream ships only an empty stub for MIMIC-III, so we adapted the MIMIC-IV scripts under mimic-iv/concepts/{score,sepsis,medication}/ to MIMIC-III's schema (see "Sepsis-3 port notes" below).

Files

sql/
├── build_sapsii.sql                       <-- master runner for SAPS-II
├── build_sepsis3.sql                      <-- master runner for Sepsis-3
├── postgres-functions.sql                 <-- PL/pgSQL shims for DATETIME_DIFF / _ADD / _SUB
├── echo_data.sql                          <-- weight imputation helper
├── durations/
│   ├── ventilation_classification.sql
│   ├── ventilation_durations.sql
│   ├── weight_durations.sql
│   ├── dobutamine_dose.sql
│   ├── dopamine_dose.sql
│   ├── epinephrine_dose.sql
│   └── norepinephrine_dose.sql
├── firstday/
│   ├── blood_gas_first_day.sql
│   ├── blood_gas_first_day_arterial.sql
│   ├── gcs_first_day.sql
│   ├── labs_first_day.sql
│   ├── urine_output_first_day.sql
│   └── vitals_first_day.sql
├── fluid_balance/
│   └── urine_output.sql                   <-- all-time UO; used by Sepsis-3
├── severityscores/
│   └── sapsii.sql
└── sepsis/                                <-- bespoke Sepsis-3 pipeline
    ├── blood_gas_arterial.sql             <-- all-time arterial BG (PaO2/FiO2)
    ├── gcs_all.sql                        <-- all-time GCS
    ├── sofa_hourly.sql                    <-- staged hourly SOFA pipeline
    ├── antibiotic.sql                     <-- filtered antibiotic prescriptions
    ├── suspicion_of_infection.sql         <-- abx <-> culture pairing
    └── sepsis3.sql                        <-- final onset table

Dependency graph (SAPS-II)

postgres-functions.sql                 (DATETIME_* shims, used everywhere below)
│
├── durations/ventilation_classification.sql
│       └── durations/ventilation_durations.sql
│
├── firstday/blood_gas_first_day.sql
│       └── firstday/blood_gas_first_day_arterial.sql
│
├── firstday/gcs_first_day.sql
├── firstday/labs_first_day.sql
├── firstday/urine_output_first_day.sql
└── firstday/vitals_first_day.sql
        │
        └── severityscores/sapsii.sql      <-- final table: `sapsii`

Dependency graph (Sepsis-3)

postgres-functions.sql
│
├── echo_data.sql
│       └── durations/weight_durations.sql
│               ├── durations/dobutamine_dose.sql
│               ├── durations/dopamine_dose.sql
│               ├── durations/epinephrine_dose.sql
│               └── durations/norepinephrine_dose.sql
│
├── fluid_balance/urine_output.sql        (all-time UO; consumed by sofa_hourly)
├── durations/ventilation_classification.sql
│       └── durations/ventilation_durations.sql
│
├── sepsis/blood_gas_arterial.sql         (all-time arterial PaO2/FiO2)
├── sepsis/gcs_all.sql                    (all-time GCS)
│       │
│       └── sepsis/sofa_hourly.sql        <-- table: `sofa_hourly`
│                                            (one row per ICU hour)
│
├── sepsis/antibiotic.sql
│       └── sepsis/suspicion_of_infection.sql
│
└── sepsis/sepsis3.sql                    <-- final table: `sepsis3`
                                              (one row per ICU stay)

fluid_balance/urine_output.sql is included because it appears in the upstream postgres-make-concepts.sql and is harmless to build for SAPS-II; for Sepsis-3 it is required (the renal-SOFA branch reads it).

Required base MIMIC-III tables

The scripts assume the standard MIMIC-III v1.3 schema with these tables already present and accessible via search_path:

admissions, chartevents, diagnoses_icd, icustays, inputevents_cv, inputevents_mv, labevents, microbiologyevents, noteevents (for echo_data), outputevents, patients, prescriptions, procedureevents_mv, services.

Running SAPS-II

  1. Restore the MIMIC-III v1.3 dump into a PostgreSQL database (the usual postgres_create_tables.sql / postgres_load_data.sql flow from the mimic-iii/buildmimic/postgres/ directory of the upstream repo).

  2. Make sure the schema containing those tables is on your search_path (commonly mimiciii).

  3. Run the master script from this directory:

    psql -d mimic -v ON_ERROR_STOP=1 \
         -c 'SET search_path TO mimiciii, public;' \
         -f sql/build_sapsii.sql
    
  4. Query the result:

    SELECT subject_id, hadm_id, icustay_id, sapsii, sapsii_prob
    FROM   sapsii
    ORDER  BY icustay_id
    LIMIT  10;
    

Running Sepsis-3

  1. Same prerequisites as SAPS-II.

  2. Run:

    psql -d mimic -v ON_ERROR_STOP=1 \
         -c 'SET search_path TO mimiciii, public;' \
         -f sql/build_sepsis3.sql
    

    Expect a few hours runtime on stock PostgreSQL with default indexes. Most of that is the per-measurement scans of chartevents (~330 M rows), labevents (~30 M rows), and outputevents (~4 M rows) that drive the hourly SOFA pipeline.

  3. Query the result:

    SELECT icustay_id, suspected_infection_time,
           sofa_time, sofa_score, sepsis3
    FROM   sepsis3
    WHERE  sepsis3 = TRUE
    ORDER  BY icustay_id
    LIMIT  10;
    

    For per-hour SOFA across the whole stay (e.g. for time-series modelling), query sofa_hourly directly.

Sepsis-3 port notes

The Sepsis-3 onset definition is time-stamped: it requires a SOFA score at any moment during the ICU stay, not just on day 1, because suspicion of infection can fall anywhere in the stay. MIMIC-IV provides a hourly SOFA via a stack of pre-built mimic_derived.* tables (icustay_hourly, bg, vitalsign, gcs, urine_output_rate, chemistry, enzyme, complete_blood_count, epinephrine, norepinephrine, dopamine, dobutamine, ventilation with ventilation_status='InvasiveVent'). Almost none of those exist in mimic-iii/concepts_postgres/, so we ported the needed bits inline. Notable differences from MIMIC-IV behaviour:

  1. Hourly grid is built inline with generate_series on icustays.intime / outtime (no icustay_hourly table).
  2. Ventilation status granularity is missing. MIMIC-IV's InvasiveVent filter (used to split pao2fio2ratio_vent from pao2fio2ratio_novent so a ventilated patient doesn't get a stratospheric SOFA respiratory score from a noisy unventilated PaO2:FiO2) doesn't have a clean MIMIC-III equivalent. We use the lumped ventilation_durations, treating any active ventilation row as invasive ventilation.
  3. Urine output adjustment is replicated inline in sofa_uo. We materialise both uo_24hr (sum) and uo_tm_24hr (count of distinct hours within the rolling 24 h window that actually had a UO observation), and the renal CASE in sofa_components uses GREATEST(uo_24hr, 0) * 24.0 / uo_tm_24hr only when uo_tm_24hr BETWEEN 22 AND 30, falling back to creatinine alone otherwise. The GREATEST(_, 0) clip prevents patients on continuous bladder irrigation (which fluid_balance/urine_output.sql subtracts as a negative volume) from being mis-scored as oliguric. Effect: at hours where the stay has lasted less than ~22 h or where charting is sparse, renal SOFA is computed from creatinine only, instead of being over-scored from a partial UO sum.
  4. Vasopressor rates come from upstream durations/{epinephrine,norepinephrine,dopamine,dobutamine}_dose.sql, which already merge CareVue + MetaVision and convert to mcg/kg/min. Weight-based unit conversion uses weight_durations.sql, which itself depends on echo_data.sql for weight imputation.
  5. Antibiotic time precision is one day. MIMIC-III prescriptions.startdate is DATE-precision only, so antibiotic_time (and any sepsis-3 onset time driven by it) lands on midnight of the prescription start date. Suspicion of infection driven by a culture with charttime is still timestamped to the minute.

The hourly SOFA pipeline is built column-by-column into staged tables (sofa_grid, sofa_vs, sofa_gcs, sofa_bili, sofa_cr, sofa_plt, sofa_pf, sofa_uo, sofa_vaso, then sofa_wide, sofa_components, sofa_hourly). Each stage scans exactly one giant raw table, so each stage can be EXPLAIN ANALYZEd independently and re-run in isolation if you crash partway through. The intermediates are kept (not dropped) so you can inspect them.

Re-fetching from upstream

Re-run fetch.ps1 (PowerShell, Windows) to pull the latest versions of all upstream-mirrored scripts from the main branch of MIT-LCP/mimic-code. Files under sql/sepsis/ are bespoke and not fetched.

Caveats

  • Severity scores (SAPS-II, SOFA) and sepsis-3 are computed for every ICU stay, including neonates, re-admissions, and short stays. Filter icustay_id yourself to match your study population (the canonical adult-ICU sepsis-3 cohort excludes patients < 18 and ICU re-admissions, but we do not enforce that here).
  • The PostgreSQL scripts in concepts_postgres/ are auto-generated from the BigQuery-flavored originals in concepts/; the DATETIME_* function calls you'll see in the SQL are resolved by the wrappers in postgres-functions.sql, so it must be sourced first (the master scripts do this for you).
Description
Ported saps-ii and sepsis-3 to MIMIC-III v1.3 from MIMIC-IV using AI, with caveats
Readme 94 KiB
Languages
PLpgSQL 89.2%
PowerShell 10.8%