Files

247 lines
10 KiB
Markdown
Raw Permalink Normal View History

2026-05-05 10:22:17 +02:00
# SAPS-II and Sepsis-3 on vanilla PostgreSQL MIMIC-III v1.3
This folder contains the SQL scripts needed to compute, on a stock
PostgreSQL restore of the MIMIC-III v1.3 dump:
- the **Simplified Acute Physiology Score II (SAPS-II)** — a
one-row-per-ICU-stay severity score computed on day 1; and
- the **Sepsis-3 onset time** (Singer et al., JAMA 2016) — the
earliest hour at which a patient has SOFA ≥ 2 within ±48 h
of suspicion of infection.
The SAPS-II scripts are mirrored verbatim from the upstream repository
[`MIT-LCP/mimic-code`](https://github.com/MIT-LCP/mimic-code) (specifically
the `mimic-iii/concepts_postgres/` tree, the auto-generated PostgreSQL
port of the BigQuery concepts). The Sepsis-3 pipeline is bespoke:
upstream ships only an empty stub for MIMIC-III, so we adapted the
MIMIC-IV scripts under `mimic-iv/concepts/{score,sepsis,medication}/`
to MIMIC-III's schema (see "Sepsis-3 port notes" below).
## Files
```
sql/
├── build_sapsii.sql <-- master runner for SAPS-II
├── build_sepsis3.sql <-- master runner for Sepsis-3
├── postgres-functions.sql <-- PL/pgSQL shims for DATETIME_DIFF / _ADD / _SUB
├── echo_data.sql <-- weight imputation helper
├── durations/
│ ├── ventilation_classification.sql
│ ├── ventilation_durations.sql
│ ├── weight_durations.sql
│ ├── dobutamine_dose.sql
│ ├── dopamine_dose.sql
│ ├── epinephrine_dose.sql
│ └── norepinephrine_dose.sql
├── firstday/
│ ├── blood_gas_first_day.sql
│ ├── blood_gas_first_day_arterial.sql
│ ├── gcs_first_day.sql
│ ├── labs_first_day.sql
│ ├── urine_output_first_day.sql
│ └── vitals_first_day.sql
├── fluid_balance/
│ └── urine_output.sql <-- all-time UO; used by Sepsis-3
├── severityscores/
│ └── sapsii.sql
└── sepsis/ <-- bespoke Sepsis-3 pipeline
├── blood_gas_arterial.sql <-- all-time arterial BG (PaO2/FiO2)
├── gcs_all.sql <-- all-time GCS
├── sofa_hourly.sql <-- staged hourly SOFA pipeline
├── antibiotic.sql <-- filtered antibiotic prescriptions
├── suspicion_of_infection.sql <-- abx <-> culture pairing
└── sepsis3.sql <-- final onset table
```
## Dependency graph (SAPS-II)
```
postgres-functions.sql (DATETIME_* shims, used everywhere below)
├── durations/ventilation_classification.sql
│ └── durations/ventilation_durations.sql
├── firstday/blood_gas_first_day.sql
│ └── firstday/blood_gas_first_day_arterial.sql
├── firstday/gcs_first_day.sql
├── firstday/labs_first_day.sql
├── firstday/urine_output_first_day.sql
└── firstday/vitals_first_day.sql
└── severityscores/sapsii.sql <-- final table: `sapsii`
```
## Dependency graph (Sepsis-3)
```
postgres-functions.sql
├── echo_data.sql
│ └── durations/weight_durations.sql
│ ├── durations/dobutamine_dose.sql
│ ├── durations/dopamine_dose.sql
│ ├── durations/epinephrine_dose.sql
│ └── durations/norepinephrine_dose.sql
├── fluid_balance/urine_output.sql (all-time UO; consumed by sofa_hourly)
├── durations/ventilation_classification.sql
│ └── durations/ventilation_durations.sql
├── sepsis/blood_gas_arterial.sql (all-time arterial PaO2/FiO2)
├── sepsis/gcs_all.sql (all-time GCS)
│ │
│ └── sepsis/sofa_hourly.sql <-- table: `sofa_hourly`
│ (one row per ICU hour)
├── sepsis/antibiotic.sql
│ └── sepsis/suspicion_of_infection.sql
└── sepsis/sepsis3.sql <-- final table: `sepsis3`
(one row per ICU stay)
```
`fluid_balance/urine_output.sql` is included because it appears in the
upstream `postgres-make-concepts.sql` and is harmless to build for
SAPS-II; for Sepsis-3 it is required (the renal-SOFA branch reads it).
## Required base MIMIC-III tables
The scripts assume the standard MIMIC-III v1.3 schema with these tables
already present and accessible via `search_path`:
`admissions`, `chartevents`, `diagnoses_icd`, `icustays`,
`inputevents_cv`, `inputevents_mv`, `labevents`, `microbiologyevents`,
`noteevents` (for `echo_data`), `outputevents`, `patients`,
`prescriptions`, `procedureevents_mv`, `services`.
## Running SAPS-II
1. Restore the MIMIC-III v1.3 dump into a PostgreSQL database (the usual
`postgres_create_tables.sql` / `postgres_load_data.sql` flow from the
`mimic-iii/buildmimic/postgres/` directory of the upstream repo).
2. Make sure the schema containing those tables is on your `search_path`
(commonly `mimiciii`).
3. Run the master script from this directory:
```bash
psql -d mimic -v ON_ERROR_STOP=1 \
-c 'SET search_path TO mimiciii, public;' \
-f sql/build_sapsii.sql
```
4. Query the result:
```sql
SELECT subject_id, hadm_id, icustay_id, sapsii, sapsii_prob
FROM sapsii
ORDER BY icustay_id
LIMIT 10;
```
## Running Sepsis-3
1. Same prerequisites as SAPS-II.
2. Run:
```bash
psql -d mimic -v ON_ERROR_STOP=1 \
-c 'SET search_path TO mimiciii, public;' \
-f sql/build_sepsis3.sql
```
Expect a few hours runtime on stock PostgreSQL with default
indexes. Most of that is the per-measurement scans of
`chartevents` (~330 M rows), `labevents` (~30 M rows), and
`outputevents` (~4 M rows) that drive the hourly SOFA pipeline.
3. Query the result:
```sql
SELECT icustay_id, suspected_infection_time,
sofa_time, sofa_score, sepsis3
FROM sepsis3
WHERE sepsis3 = TRUE
ORDER BY icustay_id
LIMIT 10;
```
For per-hour SOFA across the whole stay (e.g. for time-series
modelling), query `sofa_hourly` directly.
### Sepsis-3 port notes
The Sepsis-3 onset definition is *time-stamped*: it requires a SOFA
score at any moment during the ICU stay, not just on day 1, because
suspicion of infection can fall anywhere in the stay. MIMIC-IV
provides a hourly SOFA via a stack of pre-built `mimic_derived.*`
tables (`icustay_hourly`, `bg`, `vitalsign`, `gcs`,
`urine_output_rate`, `chemistry`, `enzyme`, `complete_blood_count`,
`epinephrine`, `norepinephrine`, `dopamine`, `dobutamine`,
`ventilation` with `ventilation_status='InvasiveVent'`). Almost none
of those exist in `mimic-iii/concepts_postgres/`, so we ported the
needed bits inline. Notable differences from MIMIC-IV behaviour:
1. **Hourly grid** is built inline with `generate_series` on
`icustays.intime` / `outtime` (no `icustay_hourly` table).
2. **Ventilation status granularity** is missing. MIMIC-IV's
`InvasiveVent` filter (used to split `pao2fio2ratio_vent` from
`pao2fio2ratio_novent` so a ventilated patient doesn't get a
stratospheric SOFA respiratory score from a noisy unventilated
PaO2:FiO2) doesn't have a clean MIMIC-III equivalent. We use the
lumped `ventilation_durations`, treating any active ventilation
row as invasive ventilation.
3. **Urine output adjustment** is replicated inline in `sofa_uo`.
We materialise both `uo_24hr` (sum) and `uo_tm_24hr` (count of
distinct hours within the rolling 24 h window that actually had
a UO observation), and the renal CASE in `sofa_components` uses
`GREATEST(uo_24hr, 0) * 24.0 / uo_tm_24hr` only when
`uo_tm_24hr BETWEEN 22 AND 30`, falling back to creatinine alone
otherwise. The `GREATEST(_, 0)` clip prevents patients on
continuous bladder irrigation (which `fluid_balance/urine_output.sql`
subtracts as a negative volume) from being mis-scored as
oliguric. Effect: at hours where the stay has lasted less than
~22 h or where charting is sparse, renal SOFA is computed from
creatinine only, instead of being over-scored from a partial UO
sum.
4. **Vasopressor rates** come from upstream
`durations/{epinephrine,norepinephrine,dopamine,dobutamine}_dose.sql`,
which already merge CareVue + MetaVision and convert to
mcg/kg/min. Weight-based unit conversion uses
`weight_durations.sql`, which itself depends on `echo_data.sql`
for weight imputation.
5. **Antibiotic time precision is one day.** MIMIC-III
`prescriptions.startdate` is DATE-precision only, so
`antibiotic_time` (and any sepsis-3 onset time driven by it)
lands on midnight of the prescription start date. Suspicion of
infection driven by a culture with charttime is still
timestamped to the minute.
The hourly SOFA pipeline is built **column-by-column into staged
tables** (`sofa_grid`, `sofa_vs`, `sofa_gcs`, `sofa_bili`, `sofa_cr`,
`sofa_plt`, `sofa_pf`, `sofa_uo`, `sofa_vaso`, then `sofa_wide`,
`sofa_components`, `sofa_hourly`). Each stage scans exactly one
giant raw table, so each stage can be `EXPLAIN ANALYZE`d
independently and re-run in isolation if you crash partway through.
The intermediates are kept (not dropped) so you can inspect them.
## Re-fetching from upstream
Re-run `fetch.ps1` (PowerShell, Windows) to pull the latest versions of
all upstream-mirrored scripts from the `main` branch of
`MIT-LCP/mimic-code`. Files under `sql/sepsis/` are bespoke and not
fetched.
## Caveats
- Severity scores (SAPS-II, SOFA) and sepsis-3 are computed for
**every** ICU stay, including neonates, re-admissions, and short
stays. Filter `icustay_id` yourself to match your study population
(the canonical adult-ICU sepsis-3 cohort excludes patients < 18
and ICU re-admissions, but we do not enforce that here).
- The PostgreSQL scripts in `concepts_postgres/` are auto-generated from
the BigQuery-flavored originals in `concepts/`; the `DATETIME_*`
function calls you'll see in the SQL are resolved by the wrappers in
`postgres-functions.sql`, so it must be sourced first (the master
scripts do this for you).