Add healthcare KPI analysis tutorial using synthetic inpatient admissions data #8713
Open: Rifa-111 wants to merge 3 commits into Project-MONAI:dev from Rifa-111:tutorial-health-kpis
252 changes: 252 additions & 0 deletions
monai/tutorials/health_kpi_analysis/health_kpi_analysis.ipynb
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,252 @@ | ||
| { | ||
| "cells": [ | ||
| { | ||
| "cell_type": "code", | ||
| "execution_count": null, | ||
| "metadata": {}, | ||
| "outputs": [], | ||
| "source": [ | ||
| "# Healthcare KPI Analysis with MONAI\n", | ||
| "\n", | ||
| "## Overview\n", | ||
| "\n", | ||
| "In this tutorial we demonstrate how to compute common healthcare Key Performance Indicators (KPIs) using synthetic inpatient admissions data. We illustrate how tabular health data can be integrated into analytical workflows relevant to medical AI research.\n", | ||
| "\n", | ||
| "The KPIs computed in this tutorial include:\n", | ||
| "\n", | ||
| "- **Average Length of Stay (LOS)**\n", | ||
| "- **30-day Readmission Rate**\n", | ||
| "- **Daily Bed Occupancy**\n", | ||
| "\n", | ||
| "Although this tutorial does not use real patient data, the methodology is representative of analytics performed in hospital operations, population health management and clinical ML evaluation contexts.\n" | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "markdown", | ||
| "metadata": {}, | ||
| "source": [ | ||
| "## Motivation\n", | ||
| "\n", | ||
| "Healthcare operations and clinical pathways generate complex tabular datasets that include admission, diagnosis and discharge patterns. These datasets complement medical imaging and can support:\n", | ||
| "\n", | ||
| "- risk stratification\n", | ||
| "- resource allocation\n", | ||
| "- quality metrics\n", | ||
| "- patient flow analysis\n", | ||
| "- clinical outcome modelling\n", | ||
| "\n", | ||
| "By combining synthetic EHR-like tabular data with MONAI workflows, we demonstrate how such metrics can be derived reproducibly and ethically for research and prototyping.\n" | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "code", | ||
| "execution_count": null, | ||
| "metadata": {}, | ||
| "outputs": [], | ||
| "source": [ | ||
| "# Install required dependencies if running standalone\n", | ||
| "# !pip install health-analytics-toolkit pandas matplotlib seaborn\n", | ||
| "\n", | ||
| "import pandas as pd\n", | ||
| "import seaborn as sns\n", | ||
| "import matplotlib.pyplot as plt\n", | ||
| "import health_analytics_toolkit as hat\n", | ||
| "\n", | ||
| "sns.set_theme(style=\"whitegrid\")\n" | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "markdown", | ||
| "metadata": {}, | ||
| "source": [ | ||
| "## 1. Generate Synthetic Data\n", | ||
| "\n", | ||
| "We generate a synthetic inpatient admission dataset using the `health-analytics-toolkit` package. The dataset includes:\n", | ||
| "\n", | ||
| "- demographic attributes\n", | ||
| "- diagnosis codes\n", | ||
| "- admission and discharge timestamps\n", | ||
| "- hospital site codes\n", | ||
| "- readmission indicators\n", | ||
| "\n", | ||
| "Synthetic data avoids patient privacy concerns while preserving realistic structure.\n" | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "code", | ||
| "execution_count": null, | ||
| "metadata": {}, | ||
| "outputs": [], | ||
| "source": [ | ||
| "df = hat.generate_synthetic_patients(n=2000)\n", | ||
| "df.head()\n" | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "markdown", | ||
| "metadata": {}, | ||
| "source": [ | ||
| "## 2. Define Cohorts\n", | ||
| "\n", | ||
| "To emulate analytic workflows, we create specific patient cohorts. Cohorts can be defined by:\n", | ||
| "\n", | ||
| "- age thresholds\n", | ||
| "- diagnosis categories\n", | ||
| "- hospital sites\n", | ||
| "- admission period\n", | ||
| "\n", | ||
| "In real environments this supports service line analysis and operational decision-making.\n" | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "code", | ||
| "execution_count": null, | ||
| "metadata": {}, | ||
| "outputs": [], | ||
| "source": [ | ||
| "# Example: patients aged 65+ (elderly cohort)\n", | ||
| "elderly = hat.filter_by_age(df, min_age=65)\n", | ||
| "\n", | ||
| "# Example: patients with specific chronic diagnoses\n", | ||
| "chronic_codes = [\"I10\", \"E11\", \"N18\"] # hypertension, diabetes, kidney disease\n", | ||
| "chronic = hat.filter_by_diagnosis_codes(elderly, chronic_codes)\n", | ||
| "\n", | ||
| "# Example: admissions to a specific hospital site\n", | ||
| "siteA = hat.filter_by_hospital_site(chronic, [\"NHS-TRUST-A\"])\n", | ||
| "\n", | ||
| "print(f\"Original dataset: {len(df)} patients\")\n", | ||
| "print(f\"Elderly cohort: {len(elderly)} patients\")\n", | ||
| "print(f\"Chronic elderly cohort: {len(chronic)} patients\")\n", | ||
| "print(f\"Site A chronic elderly cohort: {len(siteA)} patients\")\n" | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "markdown", | ||
| "metadata": {}, | ||
| "source": [ | ||
| "## 3. Compute Healthcare KPIs\n", | ||
| "\n", | ||
| "We compute three common hospital operations KPIs:\n", | ||
| "\n", | ||
| "### **Average Length of Stay (LOS)** \n", | ||
| "Measures inpatient duration and informs acuity, throughput and discharge planning.\n", | ||
| "\n", | ||
| "### **30-day Readmission Rate** \n", | ||
| "Proxy for care quality and care coordination, commonly monitored in public healthcare systems.\n", | ||
| "\n", | ||
| "### **Daily Bed Occupancy** \n", | ||
| "Estimates operational capacity utilisation across inpatient wards.\n" | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "code", | ||
| "execution_count": null, | ||
| "metadata": {}, | ||
| "outputs": [], | ||
| "source": [ | ||
| "alos = hat.average_length_of_stay(siteA)\n", | ||
| "readmit_rate = hat.readmission_rate(siteA)\n", | ||
| "bed_occ = hat.daily_bed_occupancy(siteA)\n", | ||
| "\n", | ||
| "print(f\"Average LOS: {alos:.2f} days\")\n", | ||
| "print(f\"30-day Readmission Rate: {readmit_rate:.1%}\")\n" | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "markdown", | ||
| "metadata": {}, | ||
| "source": [ | ||
| "## 4. Visualise Outputs\n", | ||
| "\n", | ||
| "Operational analytics frequently rely on visualisation to communicate patterns to clinical and administrative stakeholders.\n" | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "code", | ||
| "execution_count": null, | ||
| "metadata": {}, | ||
| "outputs": [], | ||
| "source": [ | ||
| "plt.figure(figsize=(12, 4))\n", | ||
| "bed_occ.plot()\n", | ||
| "plt.title(\"Daily Bed Occupancy (Site A Chronic Elderly Cohort)\")\n", | ||
| "plt.ylabel(\"Occupied Beds\")\n", | ||
| "plt.xlabel(\"Date\")\n", | ||
| "plt.show()\n" | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "code", | ||
| "execution_count": null, | ||
| "metadata": {}, | ||
| "outputs": [], | ||
| "source": [ | ||
| "plt.figure(figsize=(6,4))\n", | ||
| "df_los = siteA.copy()\n", | ||
| "df_los[\"los\"] = (pd.to_datetime(df_los.discharge_date) - pd.to_datetime(df_los.admission_date)).dt.days\n", | ||
| "\n", | ||
| "sns.histplot(df_los[\"los\"], bins=20, kde=False)\n", | ||
| "plt.title(\"Distribution of Length of Stay (days)\")\n", | ||
| "plt.xlabel(\"Length of Stay\")\n", | ||
| "plt.ylabel(\"Count\")\n", | ||
| "plt.show()\n" | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "markdown", | ||
| "metadata": {}, | ||
| "source": [ | ||
| "## 5. Discussion\n", | ||
| "\n", | ||
| "This workflow demonstrates that synthetic EHR-like tabular data can support healthcare analytics tasks such as:\n", | ||
| "\n", | ||
| "- resource planning (bed occupancy)\n", | ||
| "- quality benchmarking (readmission)\n", | ||
| "- pathway efficiency measurement (LOS)\n", | ||
| "- cohort stratification (diagnosis, age, site)\n", | ||
| "\n", | ||
| "These metrics can complement MONAI workflows that analyse medical imaging datasets, enabling multimodal clinical ML model development.\n", | ||
| "\n", | ||
| "In real-world settings such analyses may contribute to:\n", | ||
| "\n", | ||
| "- population health management\n", | ||
| "- service line optimisation\n", | ||
| "- discharge planning\n", | ||
| "- clinical commissioning\n", | ||
| "- digital clinical transformation programmes\n" | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "markdown", | ||
| "metadata": {}, | ||
| "source": [ | ||
| "## 6. Reproducibility & Notes\n", | ||
| "\n", | ||
| "- This tutorial uses synthetic data to ensure full privacy compliance.\n", | ||
| "- Underlying distributions are configurable and can be adapted for benchmarking scenarios.\n", | ||
| "- No Protected Health Information (PHI) is used.\n", | ||
| "- All code is executable on standard CPUs without specialised hardware.\n", | ||
| "\n", | ||
| "### Dependencies\n", | ||
| "\n", | ||
| "- Python \u2265 3.9\n", | ||
| "- pandas \u2265 1.5\n", | ||
| "- health-analytics-toolkit \u2265 0.1.0\n", | ||
| "- matplotlib, seaborn (optional for plots)\n", | ||
| "\n", | ||
| "### Suggested Extensions\n", | ||
| "\n", | ||
| "- incorporate imaging-derived features (e.g., MONAI outputs)\n", | ||
| "- integrate survival analysis packages for clinical outcomes research\n", | ||
| "- link with FHIR-like schema for interoperability\n" | ||
| ] | ||
| } | ||
| ], | ||
| "metadata": { | ||
| "language_info": { | ||
| "name": "python" | ||
| } | ||
| }, | ||
| "nbformat": 4, | ||
| "nbformat_minor": 2 | ||
| } | ||
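
A note on the KPI definitions in section 3 of the notebook: the diff calls `hat.average_length_of_stay`, `hat.readmission_rate` and `hat.daily_bed_occupancy` without showing how they are computed. The sketch below is one plausible, dependency-free reading of those three metrics, not the toolkit's implementation. The record layout, the function names, the 30-day window measured from discharge to the same patient's next admission, the use of all admissions as the denominator, and the convention that the discharge day does not occupy a bed are all assumptions made for illustration.

```python
from collections import Counter
from datetime import date, timedelta

# Hypothetical admission records: (patient_id, admission_date, discharge_date).
admissions = [
    ("p1", date(2024, 1, 1), date(2024, 1, 4)),
    ("p1", date(2024, 1, 20), date(2024, 1, 22)),
    ("p2", date(2024, 1, 2), date(2024, 1, 3)),
    ("p3", date(2024, 1, 3), date(2024, 1, 8)),
]


def mean_los_days(rows):
    """Mean of (discharge - admission) in whole days."""
    stays = [(discharge - admit).days for _, admit, discharge in rows]
    return sum(stays) / len(stays)


def readmission_share(rows, window_days=30):
    """Share of admissions followed by the same patient's next
    admission within `window_days` of discharge (assumed definition)."""
    by_patient = {}
    for patient_id, admit, discharge in rows:
        by_patient.setdefault(patient_id, []).append((admit, discharge))

    readmitted = 0
    for stays in by_patient.values():
        stays.sort()  # chronological order by admission date
        for (_, discharge), (next_admit, _) in zip(stays, stays[1:]):
            if 0 <= (next_admit - discharge).days <= window_days:
                readmitted += 1
    return readmitted / len(rows)


def occupancy_by_day(rows):
    """Occupied beds per calendar day; the discharge day is not counted."""
    occupancy = Counter()
    for _, admit, discharge in rows:
        for offset in range((discharge - admit).days):
            occupancy[admit + timedelta(days=offset)] += 1
    return dict(sorted(occupancy.items()))


alos = mean_los_days(admissions)
rate = readmission_share(admissions)
beds = occupancy_by_day(admissions)

print(f"Average LOS: {alos:.2f} days")        # 2.75
print(f"30-day readmission share: {rate:.1%}")  # 25.0%
print(f"Beds occupied on 2024-01-03: {beds[date(2024, 1, 3)]}")  # 2
```

Different conventions (for example, counting only index admissions in the denominator, or counting the discharge day as occupied) would give different numbers, so it would help if the tutorial stated which ones the toolkit uses.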
Make the intro cell markdown.
Lines 4-8 define a code cell, but the content is markdown, so the headings are treated as Python comments instead of rendering as headings.
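
One way to apply this, sketched with only the standard library: switch the cell's `cell_type` to `markdown` and drop the `execution_count` and `outputs` keys, which belong to code cells in the nbformat 4 layout. The helper name `code_cell_to_markdown` and the in-memory notebook dict are hypothetical. In practice the dict would come from `json.load` on the `.ipynb` file, and editing the JSON by hand works just as well.

```python
import json


def code_cell_to_markdown(notebook: dict, index: int) -> dict:
    """Convert the cell at `index` from a code cell to a markdown cell."""
    cell = notebook["cells"][index]
    cell["cell_type"] = "markdown"
    # These fields are specific to code cells, so remove them if present.
    cell.pop("execution_count", None)
    cell.pop("outputs", None)
    return notebook


# Minimal stand-in for the notebook in this PR (first cell only).
nb = {
    "cells": [
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": [
                "# Healthcare KPI Analysis with MONAI\n",
                "\n",
                "## Overview\n",
            ],
        }
    ],
    "metadata": {"language_info": {"name": "python"}},
    "nbformat": 4,
    "nbformat_minor": 2,
}

fixed = code_cell_to_markdown(nb, 0)
print(json.dumps(fixed["cells"][0], indent=1))
```

The cell's `source` is left untouched, so the existing headings render as headings once the cell type changes.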