{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "a5af592b",
   "metadata": {},
   "source": [
    "# Example 2: Data Storage Modes — Precomputed QoIs vs Raw MCDS\n",
    "\n",
    "[(GitHub link)](https://github.com/heberlr/UQ_PhysiCell/tree/main/examples/ex2_storage_modes.ipynb)\n",
    "\n",
    "`ModelAnalysisContext` supports two storage strategies, controlled by whether `qois_info` is empty or not:\n",
    "\n",
    "| | **Mode A — Precomputed QoIs** | **Mode B — Raw MCDS** |\n",
    "|---|---|---|\n",
    "| `qois_info` | defined (lambdas) | `{}` (empty) |\n",
    "| Stored in DB | small QoI DataFrame | full `list[pcdl.TimeStep]` |\n",
    "| DB size | small (~KB) | large (~MB per run) |\n",
    "| Query later | only pre-defined QoIs | **any** QoI, no re-run |\n",
    "| Best for | SA with many runs (ex3+) | exploratory / uncertain QoIs |\n",
    "\n",
    "The remarkable thing: **`calculate_qoi_statistics` accepts both** — it detects what is stored, handles the computation transparently, and always returns a **long-format `(SampleID, time)` MultiIndex DataFrame** with QoI names as columns."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "ffc446d4",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "PhysiCell already exists at: PhysiCell-master\n",
      "Skipping download. Use force_download=True to override.\n"
     ]
    }
   ],
   "source": [
    "import os, warnings\n",
    "warnings.filterwarnings('ignore')\n",
    "\n",
    "from uq_physicell import get_physicell\n",
    "from uq_physicell.model_analysis import ModelAnalysisContext, calculate_qoi_statistics\n",
    "\n",
    "get_physicell(target_dir=\".\")\n",
    "\n",
    "model_config = {\"ini_path\": \"Model_Struct.ini\", \"struc_name\": \"physicell_model_2\"}\n",
    "\n",
    "# QoI functions we want to measure — used in Mode A at run time, and in Mode B for post-hoc query\n",
    "qoi_funcs = {\n",
    "    \"live_cells\":      lambda df_cell: len(df_cell[df_cell['dead'] == False]),\n",
    "    \"interferon_mean\": lambda df_subs: df_subs['interferon'].mean(),\n",
    "}\n",
    "\n",
    "# A single parameter set used for both modes (same simulation, different storage)\n",
    "samples = {0: {\"viral_replication_rate\": 0.125, \"min_virion_count\": 1.0}}"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f4c5c1df",
   "metadata": {},
   "source": [
    "## Mode A — Precomputed QoIs\n",
    "\n",
    "QoI functions are passed to the context. At each timestep the functions run on the PhysiCell output, the resulting DataFrame is stored in the database, and the raw output folder is deleted. You commit to specific QoIs at run time but keep storage minimal."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "b41ffb6c",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Inserting {'live_cells': \"lambda df_cell: len(df_cell[df_cell['dead'] == False])\", 'interferon_mean': \"lambda df_subs: df_subs['interferon'].mean()\"} QoIs into the database\n",
      "Simulations completed and results stored in the database: ex2_mode_a.db.\n",
      "Mode A DB size: 24.0 KB\n"
     ]
    }
   ],
   "source": [
    "context_a = ModelAnalysisContext(\n",
    "    \"ex2_mode_a.db\", model_config,\n",
    "    sampler='User-defined',\n",
    "    params_info={},\n",
    "    qois_info=qoi_funcs,   # QoIs computed at run time → DataFrame stored\n",
    "    num_workers=1,\n",
    ")\n",
    "context_a.set_samples(samples)\n",
    "context_a.run()\n",
    "\n",
    "print(f\"Mode A DB size: {os.path.getsize('ex2_mode_a.db') / 1024:.1f} KB\")"
   ]
  },
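  {
   "cell_type": "markdown",
   "id": "ex2-sketch-mode-a",
   "metadata": {},
   "source": [
    "The Mode A idea, reducing each timestep to one row of scalar QoIs, can be sketched in plain pandas. This is a toy illustration and not the `uq_physicell` implementation: the `timesteps` list, the `toy_qois` dict, and the `precompute_qois` helper are hypothetical, and only cell-level QoIs are handled.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical per-timestep cell tables standing in for PhysiCell output\n",
    "timesteps = [\n",
    "    (0.0, pd.DataFrame({'dead': [False, False, False]})),\n",
    "    (360.0, pd.DataFrame({'dead': [False, True, False]})),\n",
    "]\n",
    "\n",
    "# QoI functions in the same dict-of-callables shape as qoi_funcs\n",
    "toy_qois = {\n",
    "    'live_cells': lambda df_cell: int((~df_cell['dead']).sum()),\n",
    "    'dead_cells': lambda df_cell: int(df_cell['dead'].sum()),\n",
    "}\n",
    "\n",
    "def precompute_qois(timesteps, qois):\n",
    "    # Hypothetical helper: evaluate every QoI on every timestep table\n",
    "    rows = [\n",
    "        {'time': t, **{name: func(df) for name, func in qois.items()}}\n",
    "        for t, df in timesteps\n",
    "    ]\n",
    "    return pd.DataFrame(rows).set_index('time')\n",
    "\n",
    "qoi_table = precompute_qois(timesteps, toy_qois)\n",
    "print(qoi_table)\n",
    "```\n",
    "\n",
    "In this sketch only `qoi_table` would need to be kept, and the per-timestep tables could be discarded afterwards. That is the trade-off Mode A makes: small storage, but only the QoIs chosen up front."
   ]
  },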
  {
   "cell_type": "markdown",
   "id": "50c7e79b",
   "metadata": {},
   "source": [
    "## Mode B — Raw MCDS Storage\n",
    "\n",
    "`qois_info={}` tells the context to store the full `list[pcdl.TimeStep]` objects instead of computing QoIs. The output folder is still cleaned up, but the complete simulation state is preserved in the database. Larger storage, but you can compute **any** QoI after the fact without re-running."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "a85aba4e",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Inserting {} QoIs into the database\n",
      "Simulations completed and results stored in the database: ex2_mode_b.db.\n",
      "Mode A (precomputed QoIs): 24.0 KB\n",
      "Mode B (raw MCDS):         3468.0 KB  (144× larger)\n"
     ]
    }
   ],
   "source": [
    "context_b = ModelAnalysisContext(\n",
    "    \"ex2_mode_b.db\", model_config,\n",
    "    sampler='User-defined',\n",
    "    params_info={},\n",
    "    qois_info={},          # empty → raw MCDS list stored\n",
    "    num_workers=1,\n",
    ")\n",
    "context_b.set_samples(samples)\n",
    "context_b.run()\n",
    "\n",
    "size_a = os.path.getsize('ex2_mode_a.db') / 1024\n",
    "size_b = os.path.getsize('ex2_mode_b.db') / 1024\n",
    "print(f\"Mode A (precomputed QoIs): {size_a:.1f} KB\")\n",
    "print(f\"Mode B (raw MCDS):         {size_b:.1f} KB  ({size_b/size_a:.0f}× larger)\")"
   ]
  },
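  {
   "cell_type": "markdown",
   "id": "ex2-sketch-db-size",
   "metadata": {},
   "source": [
    "To get a feel for how the two sizes printed above might grow with the number of parameter sets, here is a rough back-of-the-envelope sketch. It assumes the database grows linearly with the number of samples and ignores fixed SQLite overhead, so treat the numbers as order-of-magnitude guesses. The `estimate_db_size_mb` helper is hypothetical and not part of `uq_physicell`.\n",
    "\n",
    "```python\n",
    "# Sizes printed above for a single parameter set, in KB\n",
    "size_a_kb = 24.0\n",
    "size_b_kb = 3468.0\n",
    "\n",
    "def estimate_db_size_mb(per_sample_kb, n_samples):\n",
    "    # Linear extrapolation (assumption): total = per-sample size x sample count\n",
    "    return per_sample_kb * n_samples / 1024\n",
    "\n",
    "for n_samples in (10, 100, 1000):\n",
    "    est_a = estimate_db_size_mb(size_a_kb, n_samples)\n",
    "    est_b = estimate_db_size_mb(size_b_kb, n_samples)\n",
    "    print(f'{n_samples:>5} samples: Mode A ~{est_a:.1f} MB, Mode B ~{est_b:.1f} MB')\n",
    "```\n",
    "\n",
    "Under these assumptions, Mode B reaches the gigabyte range at around a thousand samples, while Mode A stays in the tens of megabytes."
   ]
  },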
  {
   "cell_type": "markdown",
   "id": "fc7e4eb9",
   "metadata": {},
   "source": [
    "## Querying both databases with the same call\n",
    "\n",
    "`calculate_qoi_statistics` detects what is stored and handles the computation automatically — no code change needed on the query side.\n",
    "\n",
    "Both modes now return a **long-format DataFrame** with a `(SampleID, time)` MultiIndex and QoI names as columns.\n",
    "\n",
    "**Mode B bonus:** you can query a QoI you never defined at run time, with no re-simulation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "389da677",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "No QoI data provided, calculating QoIs from the database...\n",
      "All samples in Samples table have corresponding entries in Output table.\n",
      "Extracting QoIs from DataFrame...\n",
      "No QoI data provided, calculating QoIs from the database...\n",
      "All samples in Samples table have corresponding entries in Output table.\n",
      "Calculating QoIs from mcds list...\n",
      "Mode A — mean QoIs:\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>live_cells</th>\n",
       "      <th>interferon_mean</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>SampleID</th>\n",
       "      <th>time</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"9\" valign=\"top\">0</th>\n",
       "      <th>0.0</th>\n",
       "      <td>1060.0</td>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>360.0</th>\n",
       "      <td>1056.8</td>\n",
       "      <td>0.000845</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>720.0</th>\n",
       "      <td>1053.2</td>\n",
       "      <td>0.000398</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1080.0</th>\n",
       "      <td>1023.0</td>\n",
       "      <td>0.002876</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1440.0</th>\n",
       "      <td>1008.8</td>\n",
       "      <td>0.001156</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1800.0</th>\n",
       "      <td>957.8</td>\n",
       "      <td>0.006680</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2160.0</th>\n",
       "      <td>916.0</td>\n",
       "      <td>0.003641</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2520.0</th>\n",
       "      <td>761.6</td>\n",
       "      <td>0.005035</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2880.0</th>\n",
       "      <td>711.6</td>\n",
       "      <td>0.003222</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                 live_cells  interferon_mean\n",
       "SampleID time                               \n",
       "0        0.0         1060.0         0.000000\n",
       "         360.0       1056.8         0.000845\n",
       "         720.0       1053.2         0.000398\n",
       "         1080.0      1023.0         0.002876\n",
       "         1440.0      1008.8         0.001156\n",
       "         1800.0       957.8         0.006680\n",
       "         2160.0       916.0         0.003641\n",
       "         2520.0       761.6         0.005035\n",
       "         2880.0       711.6         0.003222"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Mode B — mean QoIs (computed post-hoc from raw MCDS):\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>interferon_mean</th>\n",
       "      <th>live_cells</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>SampleID</th>\n",
       "      <th>time</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"9\" valign=\"top\">0</th>\n",
       "      <th>0.0</th>\n",
       "      <td>0.000000</td>\n",
       "      <td>1060.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>360.0</th>\n",
       "      <td>0.000820</td>\n",
       "      <td>1056.6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>720.0</th>\n",
       "      <td>0.000497</td>\n",
       "      <td>1054.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1080.0</th>\n",
       "      <td>0.003034</td>\n",
       "      <td>1025.4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1440.0</th>\n",
       "      <td>0.001482</td>\n",
       "      <td>998.8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1800.0</th>\n",
       "      <td>0.005664</td>\n",
       "      <td>923.8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2160.0</th>\n",
       "      <td>0.007086</td>\n",
       "      <td>717.8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2520.0</th>\n",
       "      <td>0.001539</td>\n",
       "      <td>602.2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2880.0</th>\n",
       "      <td>0.000851</td>\n",
       "      <td>596.4</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                 interferon_mean  live_cells\n",
       "SampleID time                               \n",
       "0        0.0            0.000000      1060.0\n",
       "         360.0          0.000820      1056.6\n",
       "         720.0          0.000497      1054.0\n",
       "         1080.0         0.003034      1025.4\n",
       "         1440.0         0.001482       998.8\n",
       "         1800.0         0.005664       923.8\n",
       "         2160.0         0.007086       717.8\n",
       "         2520.0         0.001539       602.2\n",
       "         2880.0         0.000851       596.4"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "No QoI data provided, calculating QoIs from the database...\n",
      "All samples in Samples table have corresponding entries in Output table.\n",
      "Calculating QoIs from mcds list...\n",
      "Mode B — new QoI computed without re-running:\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>dead_cells</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>SampleID</th>\n",
       "      <th>time</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"9\" valign=\"top\">0</th>\n",
       "      <th>0.0</th>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>360.0</th>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>720.0</th>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1080.0</th>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1440.0</th>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1800.0</th>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2160.0</th>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2520.0</th>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2880.0</th>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                 dead_cells\n",
       "SampleID time              \n",
       "0        0.0            0.0\n",
       "         360.0          0.0\n",
       "         720.0          0.0\n",
       "         1080.0         0.0\n",
       "         1440.0         0.0\n",
       "         1800.0         0.0\n",
       "         2160.0         0.0\n",
       "         2520.0         0.0\n",
       "         2880.0         0.0"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Mode A: reads pre-stored QoI DataFrame directly\n",
    "df_mean_a, _, _ = calculate_qoi_statistics(\"ex2_mode_a.db\", qoi_funcs)\n",
    "\n",
    "# Mode B: recomputes qoi_funcs from stored MCDS objects\n",
    "df_mean_b, _, _ = calculate_qoi_statistics(\"ex2_mode_b.db\", qoi_funcs)\n",
    "\n",
    "print(\"Mode A — mean QoIs:\")\n",
    "display(df_mean_a)\n",
    "\n",
    "print(\"Mode B — mean QoIs (computed post-hoc from raw MCDS):\")\n",
    "display(df_mean_b)\n",
    "\n",
    "# Mode B only: compute a QoI that was never defined at run time\n",
    "new_qoi = {\"dead_cells\": lambda df_cell: len(df_cell[df_cell['dead'] == True])}\n",
    "df_mean_new, _, _ = calculate_qoi_statistics(\"ex2_mode_b.db\", new_qoi)\n",
    "print(\"Mode B — new QoI computed without re-running:\")\n",
    "display(df_mean_new)"
   ]
  },
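  {
   "cell_type": "markdown",
   "id": "ex2-sketch-slicing",
   "metadata": {},
   "source": [
    "The two result tables above share the same `(SampleID, time)` index but list their columns in a different order. A minimal pandas sketch of aligning and slicing frames of this shape follows. The `toy_a` and `toy_b` frames are hand-built stand-ins that copy the first three time points from the outputs above; they are not produced by `calculate_qoi_statistics`.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Toy frames shaped like the outputs above (first three time points only)\n",
    "index = pd.MultiIndex.from_product(\n",
    "    [[0], [0.0, 360.0, 720.0]], names=['SampleID', 'time']\n",
    ")\n",
    "toy_a = pd.DataFrame(\n",
    "    {'live_cells': [1060.0, 1056.8, 1053.2],\n",
    "     'interferon_mean': [0.000000, 0.000845, 0.000398]},\n",
    "    index=index,\n",
    ")\n",
    "toy_b = pd.DataFrame(\n",
    "    {'interferon_mean': [0.000000, 0.000820, 0.000497],\n",
    "     'live_cells': [1060.0, 1056.6, 1054.0]},\n",
    "    index=index,\n",
    ")\n",
    "\n",
    "# Give the Mode B frame the same column order as the Mode A frame\n",
    "toy_b = toy_b[toy_a.columns]\n",
    "\n",
    "# Time series of one QoI for sample 0 (xs drops the SampleID level)\n",
    "live_a = toy_a.xs(0, level='SampleID')['live_cells']\n",
    "\n",
    "# Absolute difference between the two tables at each time point\n",
    "diff = (toy_a - toy_b).abs()\n",
    "print(diff)\n",
    "```\n",
    "\n",
    "Subtracting the frames aligns them on both index and column labels, so the reordering step matters for display rather than for correctness."
   ]
  },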
  {
   "cell_type": "markdown",
   "id": "5af2ffdc",
   "metadata": {},
   "source": [
    "---\n",
    "**Rule of thumb:**\n",
    "- Use **Mode A** when you already know your QoIs and plan to run many simulations (ex3+). Storage stays small and `calculate_qoi_statistics` is fastest.\n",
    "- Use **Mode B** during early exploration when you are not sure what to measure, or when you want to apply multiple analysis approaches (sensitivity analysis, calibration, topology) to the same runs without repeating them.\n",
    "\n",
    "**Next:** [ex3](ex3_runSA_MultiTask.ipynb) shows how to scale Mode A to a full Sobol sensitivity analysis with `generate_samples(N=8)` and multi-process parallelization."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "pcvenv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}