diff --git a/monai/tutorials/health_kpi_analysis/health_kpi_analysis.ipynb b/monai/tutorials/health_kpi_analysis/health_kpi_analysis.ipynb new file mode 100644 index 0000000000..e60aef6b4a --- /dev/null +++ b/monai/tutorials/health_kpi_analysis/health_kpi_analysis.ipynb @@ -0,0 +1,250 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Healthcare KPI Analysis with MONAI\n", + "\n", + "## Overview\n", + "\n", + "In this tutorial we demonstrate how to compute common healthcare Key Performance Indicators (KPIs) using synthetic inpatient admissions data. We illustrate how tabular health data can be integrated into analytical workflows relevant to medical AI research.\n", + "\n", + "The KPIs computed in this tutorial are:\n", + "\n", + "- **Average Length of Stay (LOS)**\n", + "- **30-day Readmission Rate**\n", + "- **Daily Bed Occupancy**\n", + "\n", + "Although this tutorial does not use real patient data, the methodology is representative of analytics performed in hospital operations, population health management and clinical ML evaluation contexts.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Motivation\n", + "\n", + "Healthcare operations and clinical pathways generate complex tabular datasets that capture admission, diagnosis and discharge patterns.
These datasets complement medical imaging and can support:\n", + "\n", + "- risk stratification\n", + "- resource allocation\n", + "- quality metrics\n", + "- patient flow analysis\n", + "- clinical outcome modelling\n", + "\n", + "Using synthetic EHR-like tabular data, which can complement MONAI imaging workflows, we show how such metrics can be derived reproducibly and without exposing real patient records, for research and prototyping.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Install required dependencies if running standalone\n", + "# !pip install health-analytics-toolkit pandas matplotlib seaborn\n", + "\n", + "import pandas as pd\n", + "import seaborn as sns\n", + "import matplotlib.pyplot as plt\n", + "import health_analytics_toolkit as hat\n", + "\n", + "sns.set(style=\"whitegrid\")\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 1. Generate Synthetic Data\n", + "\n", + "We generate a synthetic inpatient admission dataset using the `health-analytics-toolkit` package. The dataset includes:\n", + "\n", + "- demographic attributes\n", + "- diagnosis codes\n", + "- admission and discharge timestamps\n", + "- hospital site codes\n", + "- readmission indicators\n", + "\n", + "Synthetic data avoids patient privacy concerns while preserving the structure of real admissions records.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "df = hat.generate_synthetic_patients(n=2000)\n", + "df.head()\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 2. Define Cohorts\n", + "\n", + "To emulate real analytic workflows, we define a series of nested patient cohorts.
Cohorts can be defined by:\n", + "\n", + "- age thresholds\n", + "- diagnosis categories\n", + "- hospital sites\n", + "- admission period\n", + "\n", + "In practice, cohort definitions like these support service line analysis and operational decision-making.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Example: patients aged 65+ (elderly cohort)\n", + "elderly = hat.filter_by_age(df, min_age=65)\n", + "\n", + "# Example: elderly patients with selected chronic diagnoses (ICD-10 codes)\n", + "chronic_codes = [\"I10\", \"E11\", \"N18\"]  # essential hypertension, type 2 diabetes, chronic kidney disease\n", + "chronic = hat.filter_by_diagnosis_codes(elderly, chronic_codes)\n", + "\n", + "# Example: chronic elderly patients admitted to a specific hospital site\n", + "siteA = hat.filter_by_hospital_site(chronic, [\"NHS-TRUST-A\"])\n", + "\n", + "print(f\"Original dataset: {len(df)} patients\")\n", + "print(f\"Elderly cohort: {len(elderly)} patients\")\n", + "print(f\"Chronic elderly cohort: {len(chronic)} patients\")\n", + "print(f\"Site A chronic elderly cohort: {len(siteA)} patients\")\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 3.
Compute Healthcare KPIs\n", + "\n", + "We compute three common hospital operations KPIs for the Site A chronic elderly cohort:\n", + "\n", + "### **Average Length of Stay (LOS)** \n", + "Mean number of days from admission to discharge; reflects case acuity and informs throughput and discharge planning.\n", + "\n", + "### **30-day Readmission Rate** \n", + "Proportion of discharges followed by a readmission within 30 days; a common proxy for care quality and care coordination, routinely monitored in public healthcare systems.\n", + "\n", + "### **Daily Bed Occupancy** \n", + "Number of beds occupied by the cohort on each day; compared with available bed capacity, it indicates how heavily inpatient wards are utilised.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "alos = hat.average_length_of_stay(siteA)\n", + "readmit_rate = hat.readmission_rate(siteA)\n", + "bed_occ = hat.daily_bed_occupancy(siteA)\n", + "\n", + "print(f\"Average LOS: {alos:.2f} days\")\n", + "print(f\"30-day Readmission Rate: {readmit_rate:.1%}\")\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 4. Visualise Outputs\n", + "\n", + "Operational analytics frequently rely on visualisation to communicate patterns to clinical and administrative stakeholders.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "plt.figure(figsize=(12, 4))\n", + "bed_occ.plot()\n", + "plt.title(\"Daily Bed Occupancy (Site A Chronic Elderly Cohort)\")\n", + "plt.ylabel(\"Occupied Beds\")\n", + "plt.xlabel(\"Date\")\n", + "plt.show()\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "plt.figure(figsize=(6, 4))\n", + "df_los = siteA.copy()\n", + "df_los[\"los\"] = (pd.to_datetime(df_los[\"discharge_date\"]) - pd.to_datetime(df_los[\"admission_date\"])).dt.days  # whole days; same-day discharge = 0\n", + "\n", + "sns.histplot(df_los[\"los\"], bins=20, kde=False)\n", + "plt.title(\"Distribution of Length of Stay\")\n", + "plt.xlabel(\"Length of Stay (days)\")\n", + "plt.ylabel(\"Count\")\n", + "plt.show()\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {},
+ "source": [ + "## 5. Discussion\n", + "\n", + "This workflow demonstrates that synthetic EHR-like tabular data can support healthcare analytics tasks such as:\n", + "\n", + "- resource planning (bed occupancy)\n", + "- quality benchmarking (readmission)\n", + "- pathway efficiency measurement (LOS)\n", + "- cohort stratification (diagnosis, age, site)\n", + "\n", + "These metrics can complement MONAI workflows that analyse medical imaging datasets, for example as tabular inputs alongside imaging features in multimodal clinical ML models.\n", + "\n", + "In real-world settings such analyses may contribute to:\n", + "\n", + "- population health management\n", + "- service line optimisation\n", + "- discharge planning\n", + "- clinical commissioning\n", + "- digital clinical transformation programmes\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 6. Reproducibility & Notes\n", + "\n", + "- This tutorial uses synthetic data only, so no real patient records are involved.\n", + "- Underlying distributions are configurable and can be adapted for benchmarking scenarios.\n", + "- No Protected Health Information (PHI) is used.\n", + "- All cells run on a standard CPU; no GPU or other specialised hardware is required.\n", + "\n", + "### Dependencies\n", + "\n", + "- Python \u2265 3.9\n", + "- pandas \u2265 1.5\n", + "- health-analytics-toolkit \u2265 0.1.0\n", + "- matplotlib, seaborn (imported in the setup cell and required for the plots)\n", + "\n", + "### Suggested Extensions\n", + "\n", + "- incorporate imaging-derived features (e.g., MONAI outputs)\n", + "- integrate survival analysis packages for clinical outcomes research\n", + "- map the data to a FHIR-based schema for interoperability\n" + ] + } + ], + "metadata": { + "language_info": { + "name": "python" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +}
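The notebook delegates the KPI arithmetic to `health_analytics_toolkit`, and the exact definitions that package applies are not shown. For readers who want to sanity-check the numbers, below is a minimal pandas sketch of the three KPIs that does not depend on the toolkit. It is not the package's implementation. Several details are assumptions made for illustration only:

- the toy rows;
- the `patient_id` column name;
- the convention that every stay counts as an index stay for readmission;
- the midnight-census rule for occupancy.

```python
import pandas as pd

# Hypothetical toy admissions table (not toolkit output). The admission_date and
# discharge_date columns mirror the notebook; patient_id is an assumed column name.
admissions = pd.DataFrame(
    {
        "patient_id": [1, 1, 2, 3],
        "admission_date": pd.to_datetime(
            ["2024-01-01", "2024-01-20", "2024-01-03", "2024-01-05"]
        ),
        "discharge_date": pd.to_datetime(
            ["2024-01-04", "2024-01-22", "2024-01-03", "2024-01-10"]
        ),
    }
)


def average_length_of_stay(df: pd.DataFrame) -> float:
    """Mean LOS in whole days (discharge date minus admission date)."""
    los_days = (df["discharge_date"] - df["admission_date"]).dt.days
    return float(los_days.mean())


def readmission_rate(df: pd.DataFrame, window_days: int = 30) -> float:
    """Share of stays followed by another admission of the same patient within
    `window_days` days of discharge. Assumed convention: every stay, including
    a readmission itself, counts as an index stay."""
    ordered = df.sort_values(["patient_id", "admission_date"])
    # Admission date of the same patient's next stay (NaT if there is none).
    next_admission = ordered.groupby("patient_id")["admission_date"].shift(-1)
    gap_days = (next_admission - ordered["discharge_date"]).dt.days
    # Stays without a later admission give NaN, which between() treats as False.
    readmitted = gap_days.between(0, window_days)
    return float(readmitted.mean())


def daily_bed_occupancy(df: pd.DataFrame) -> pd.Series:
    """Beds occupied per calendar day, midnight-census style (assumed rule):
    a patient counts on each day from admission up to, but not including,
    the discharge day, so same-day discharges contribute no bed-days."""
    days = pd.date_range(
        df["admission_date"].min(), df["discharge_date"].max(), freq="D"
    )
    counts = [
        int(((df["admission_date"] <= day) & (df["discharge_date"] > day)).sum())
        for day in days
    ]
    return pd.Series(counts, index=days, name="occupied_beds")


print(f"Average LOS: {average_length_of_stay(admissions):.2f} days")
print(f"30-day Readmission Rate: {readmission_rate(admissions):.1%}")
occupancy = daily_bed_occupancy(admissions)
print(f"Peak daily occupancy: {occupancy.max()} beds")
```

Under the midnight-census rule, the daily occupancy series sums to the same total as the summed LOS. On the toy table both come to 10 bed-days, which gives a quick consistency check when comparing against `hat.daily_bed_occupancy`. Real readmission definitions vary by programme (for example, whether planned readmissions or transfers are excluded). The toolkit's figures may therefore differ from this sketch.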