initial
This commit is contained in:
246
README.md
Normal file
246
README.md
Normal file
@@ -0,0 +1,246 @@
|
||||
# SAPS-II and Sepsis-3 on vanilla PostgreSQL MIMIC-III v1.3
|
||||
|
||||
This folder contains the SQL scripts needed to compute, on a stock
|
||||
PostgreSQL restore of the MIMIC-III v1.3 dump:
|
||||
|
||||
- the **Simplified Acute Physiology Score II (SAPS-II)** — a
|
||||
one-row-per-ICU-stay severity score computed on day 1; and
|
||||
- the **Sepsis-3 onset time** (Singer et al., JAMA 2016) — the
|
||||
earliest hour at which a patient has SOFA ≥ 2 within ±48 h
|
||||
of suspicion of infection.
|
||||
|
||||
The SAPS-II scripts are mirrored verbatim from the upstream repository
|
||||
[`MIT-LCP/mimic-code`](https://github.com/MIT-LCP/mimic-code) (specifically
|
||||
the `mimic-iii/concepts_postgres/` tree, the auto-generated PostgreSQL
|
||||
port of the BigQuery concepts). The Sepsis-3 pipeline is bespoke:
|
||||
upstream ships only an empty stub for MIMIC-III, so we adapted the
|
||||
MIMIC-IV scripts under `mimic-iv/concepts/{score,sepsis,medication}/`
|
||||
to MIMIC-III's schema (see "Sepsis-3 port notes" below).
|
||||
|
||||
## Files
|
||||
|
||||
```
|
||||
sql/
|
||||
├── build_sapsii.sql <-- master runner for SAPS-II
|
||||
├── build_sepsis3.sql <-- master runner for Sepsis-3
|
||||
├── postgres-functions.sql <-- PL/pgSQL shims for DATETIME_DIFF / _ADD / _SUB
|
||||
├── echo_data.sql <-- weight imputation helper
|
||||
├── durations/
|
||||
│ ├── ventilation_classification.sql
|
||||
│ ├── ventilation_durations.sql
|
||||
│ ├── weight_durations.sql
|
||||
│ ├── dobutamine_dose.sql
|
||||
│ ├── dopamine_dose.sql
|
||||
│ ├── epinephrine_dose.sql
|
||||
│ └── norepinephrine_dose.sql
|
||||
├── firstday/
|
||||
│ ├── blood_gas_first_day.sql
|
||||
│ ├── blood_gas_first_day_arterial.sql
|
||||
│ ├── gcs_first_day.sql
|
||||
│ ├── labs_first_day.sql
|
||||
│ ├── urine_output_first_day.sql
|
||||
│ └── vitals_first_day.sql
|
||||
├── fluid_balance/
|
||||
│ └── urine_output.sql <-- all-time UO; used by Sepsis-3
|
||||
├── severityscores/
|
||||
│ └── sapsii.sql
|
||||
└── sepsis/ <-- bespoke Sepsis-3 pipeline
|
||||
├── blood_gas_arterial.sql <-- all-time arterial BG (PaO2/FiO2)
|
||||
├── gcs_all.sql <-- all-time GCS
|
||||
├── sofa_hourly.sql <-- staged hourly SOFA pipeline
|
||||
├── antibiotic.sql <-- filtered antibiotic prescriptions
|
||||
├── suspicion_of_infection.sql <-- abx <-> culture pairing
|
||||
└── sepsis3.sql <-- final onset table
|
||||
```
|
||||
|
||||
## Dependency graph (SAPS-II)
|
||||
|
||||
```
|
||||
postgres-functions.sql (DATETIME_* shims, used everywhere below)
|
||||
│
|
||||
├── durations/ventilation_classification.sql
|
||||
│ └── durations/ventilation_durations.sql
|
||||
│
|
||||
├── firstday/blood_gas_first_day.sql
|
||||
│ └── firstday/blood_gas_first_day_arterial.sql
|
||||
│
|
||||
├── firstday/gcs_first_day.sql
|
||||
├── firstday/labs_first_day.sql
|
||||
├── firstday/urine_output_first_day.sql
|
||||
└── firstday/vitals_first_day.sql
|
||||
│
|
||||
└── severityscores/sapsii.sql <-- final table: `sapsii`
|
||||
```
|
||||
|
||||
## Dependency graph (Sepsis-3)
|
||||
|
||||
```
|
||||
postgres-functions.sql
|
||||
│
|
||||
├── echo_data.sql
|
||||
│ └── durations/weight_durations.sql
|
||||
│ ├── durations/dobutamine_dose.sql
|
||||
│ ├── durations/dopamine_dose.sql
|
||||
│ ├── durations/epinephrine_dose.sql
|
||||
│ └── durations/norepinephrine_dose.sql
|
||||
│
|
||||
├── fluid_balance/urine_output.sql (all-time UO; consumed by sofa_hourly)
|
||||
├── durations/ventilation_classification.sql
|
||||
│ └── durations/ventilation_durations.sql
|
||||
│
|
||||
├── sepsis/blood_gas_arterial.sql (all-time arterial PaO2/FiO2)
|
||||
├── sepsis/gcs_all.sql (all-time GCS)
|
||||
│ │
|
||||
│ └── sepsis/sofa_hourly.sql <-- table: `sofa_hourly`
|
||||
│ (one row per ICU hour)
|
||||
│
|
||||
├── sepsis/antibiotic.sql
|
||||
│ └── sepsis/suspicion_of_infection.sql
|
||||
│
|
||||
└── sepsis/sepsis3.sql <-- final table: `sepsis3`
|
||||
(one row per ICU stay)
|
||||
```
|
||||
|
||||
`fluid_balance/urine_output.sql` is included because it appears in the
|
||||
upstream `postgres-make-concepts.sql` and is harmless to build for
|
||||
SAPS-II; for Sepsis-3 it is required (the renal-SOFA branch reads it).
|
||||
|
||||
## Required base MIMIC-III tables
|
||||
|
||||
The scripts assume the standard MIMIC-III v1.3 schema with these tables
|
||||
already present and accessible via `search_path`:
|
||||
|
||||
`admissions`, `chartevents`, `diagnoses_icd`, `icustays`,
|
||||
`inputevents_cv`, `inputevents_mv`, `labevents`, `microbiologyevents`,
|
||||
`noteevents` (for `echo_data`), `outputevents`, `patients`,
|
||||
`prescriptions`, `procedureevents_mv`, `services`.
|
||||
|
||||
## Running SAPS-II
|
||||
|
||||
1. Restore the MIMIC-III v1.3 dump into a PostgreSQL database (the usual
|
||||
`postgres_create_tables.sql` / `postgres_load_data.sql` flow from the
|
||||
`mimic-iii/buildmimic/postgres/` directory of the upstream repo).
|
||||
2. Make sure the schema containing those tables is on your `search_path`
|
||||
(commonly `mimiciii`).
|
||||
3. Run the master script from this directory:
|
||||
|
||||
```bash
|
||||
psql -d mimic -v ON_ERROR_STOP=1 \
|
||||
-c 'SET search_path TO mimiciii, public;' \
|
||||
-f sql/build_sapsii.sql
|
||||
```
|
||||
|
||||
4. Query the result:
|
||||
|
||||
```sql
|
||||
SELECT subject_id, hadm_id, icustay_id, sapsii, sapsii_prob
|
||||
FROM sapsii
|
||||
ORDER BY icustay_id
|
||||
LIMIT 10;
|
||||
```
|
||||
|
||||
## Running Sepsis-3
|
||||
|
||||
1. Same prerequisites as SAPS-II.
|
||||
2. Run:
|
||||
|
||||
```bash
|
||||
psql -d mimic -v ON_ERROR_STOP=1 \
|
||||
-c 'SET search_path TO mimiciii, public;' \
|
||||
-f sql/build_sepsis3.sql
|
||||
```
|
||||
|
||||
Expect a few hours runtime on stock PostgreSQL with default
|
||||
indexes. Most of that is the per-measurement scans of
|
||||
`chartevents` (~330 M rows), `labevents` (~30 M rows), and
|
||||
`outputevents` (~4 M rows) that drive the hourly SOFA pipeline.
|
||||
|
||||
3. Query the result:
|
||||
|
||||
```sql
|
||||
SELECT icustay_id, suspected_infection_time,
|
||||
sofa_time, sofa_score, sepsis3
|
||||
FROM sepsis3
|
||||
WHERE sepsis3 = TRUE
|
||||
ORDER BY icustay_id
|
||||
LIMIT 10;
|
||||
```
|
||||
|
||||
For per-hour SOFA across the whole stay (e.g. for time-series
|
||||
modelling), query `sofa_hourly` directly.
|
||||
|
||||
### Sepsis-3 port notes
|
||||
|
||||
The Sepsis-3 onset definition is *time-stamped*: it requires a SOFA
|
||||
score at any moment during the ICU stay, not just on day 1, because
|
||||
suspicion of infection can fall anywhere in the stay. MIMIC-IV
|
||||
provides a hourly SOFA via a stack of pre-built `mimic_derived.*`
|
||||
tables (`icustay_hourly`, `bg`, `vitalsign`, `gcs`,
|
||||
`urine_output_rate`, `chemistry`, `enzyme`, `complete_blood_count`,
|
||||
`epinephrine`, `norepinephrine`, `dopamine`, `dobutamine`,
|
||||
`ventilation` with `ventilation_status='InvasiveVent'`). Almost none
|
||||
of those exist in `mimic-iii/concepts_postgres/`, so we ported the
|
||||
needed bits inline. Notable differences from MIMIC-IV behaviour:
|
||||
|
||||
1. **Hourly grid** is built inline with `generate_series` on
|
||||
`icustays.intime` / `outtime` (no `icustay_hourly` table).
|
||||
2. **Ventilation status granularity** is missing. MIMIC-IV's
|
||||
`InvasiveVent` filter (used to split `pao2fio2ratio_vent` from
|
||||
`pao2fio2ratio_novent` so a ventilated patient doesn't get a
|
||||
stratospheric SOFA respiratory score from a noisy unventilated
|
||||
PaO2:FiO2) doesn't have a clean MIMIC-III equivalent. We use the
|
||||
lumped `ventilation_durations`, treating any active ventilation
|
||||
row as invasive ventilation.
|
||||
3. **Urine output adjustment** is replicated inline in `sofa_uo`.
|
||||
We materialise both `uo_24hr` (sum) and `uo_tm_24hr` (count of
|
||||
distinct hours within the rolling 24 h window that actually had
|
||||
a UO observation), and the renal CASE in `sofa_components` uses
|
||||
`GREATEST(uo_24hr, 0) * 24.0 / uo_tm_24hr` only when
|
||||
`uo_tm_24hr BETWEEN 22 AND 30`, falling back to creatinine alone
|
||||
otherwise. The `GREATEST(_, 0)` clip prevents patients on
|
||||
continuous bladder irrigation (which `fluid_balance/urine_output.sql`
|
||||
subtracts as a negative volume) from being mis-scored as
|
||||
oliguric. Effect: at hours where the stay has lasted less than
|
||||
~22 h or where charting is sparse, renal SOFA is computed from
|
||||
creatinine only, instead of being over-scored from a partial UO
|
||||
sum.
|
||||
4. **Vasopressor rates** come from upstream
|
||||
`durations/{epinephrine,norepinephrine,dopamine,dobutamine}_dose.sql`,
|
||||
which already merge CareVue + MetaVision and convert to
|
||||
mcg/kg/min. Weight-based unit conversion uses
|
||||
`weight_durations.sql`, which itself depends on `echo_data.sql`
|
||||
for weight imputation.
|
||||
5. **Antibiotic time precision is one day.** MIMIC-III
|
||||
`prescriptions.startdate` is DATE-precision only, so
|
||||
`antibiotic_time` (and any sepsis-3 onset time driven by it)
|
||||
lands on midnight of the prescription start date. Suspicion of
|
||||
infection driven by a culture with charttime is still
|
||||
timestamped to the minute.
|
||||
|
||||
The hourly SOFA pipeline is built **column-by-column into staged
|
||||
tables** (`sofa_grid`, `sofa_vs`, `sofa_gcs`, `sofa_bili`, `sofa_cr`,
|
||||
`sofa_plt`, `sofa_pf`, `sofa_uo`, `sofa_vaso`, then `sofa_wide`,
|
||||
`sofa_components`, `sofa_hourly`). Each stage scans exactly one
|
||||
giant raw table, so each stage can be `EXPLAIN ANALYZE`d
|
||||
independently and re-run in isolation if you crash partway through.
|
||||
The intermediates are kept (not dropped) so you can inspect them.
|
||||
|
||||
## Re-fetching from upstream
|
||||
|
||||
Re-run `fetch.ps1` (PowerShell, Windows) to pull the latest versions of
|
||||
all upstream-mirrored scripts from the `main` branch of
|
||||
`MIT-LCP/mimic-code`. Files under `sql/sepsis/` are bespoke and not
|
||||
fetched.
|
||||
|
||||
## Caveats
|
||||
|
||||
- Severity scores (SAPS-II, SOFA) and sepsis-3 are computed for
|
||||
**every** ICU stay, including neonates, re-admissions, and short
|
||||
stays. Filter `icustay_id` yourself to match your study population
|
||||
(the canonical adult-ICU sepsis-3 cohort excludes patients < 18
|
||||
and ICU re-admissions, but we do not enforce that here).
|
||||
- The PostgreSQL scripts in `concepts_postgres/` are auto-generated from
|
||||
the BigQuery-flavored originals in `concepts/`; the `DATETIME_*`
|
||||
function calls you'll see in the SQL are resolved by the wrappers in
|
||||
`postgres-functions.sql`, so it must be sourced first (the master
|
||||
scripts do this for you).
|
||||
Reference in New Issue
Block a user