{ "cells": [ { "cell_type": "markdown", "id": "a5af592b", "metadata": {}, "source": [ "# Example 2: Data Storage Modes — Precomputed QoIs vs Raw MCDS\n", "\n", "[(GitHub link)](https://github.com/heberlr/UQ_PhysiCell/tree/main/examples/ex2_storage_modes.ipynb)\n", "\n", "`ModelAnalysisContext` supports two storage strategies, selected by whether `qois_info` is empty:\n", "\n", "| | **Mode A — Precomputed QoIs** | **Mode B — Raw MCDS** |\n", "|---|---|---|\n", "| `qois_info` | defined (lambdas) | `{}` (empty) |\n", "| Stored in DB | small QoI DataFrame | full `list[pcdl.TimeStep]` |\n", "| DB size | small (~KB) | large (~MB per run) |\n", "| Query later | only pre-defined QoIs | **any** QoI, no re-run |\n", "| Best for | sensitivity analysis (SA) with many runs (ex3+) | exploratory / uncertain QoIs |\n", "\n", "**`calculate_qoi_statistics` accepts both**: it detects what is stored, runs whatever computation is needed, and always returns a **long-format `(SampleID, time)` MultiIndex DataFrame** with QoI names as columns." ] }, { "cell_type": "code", "execution_count": 1, "id": "ffc446d4", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "PhysiCell already exists at: PhysiCell-master\n", "Skipping download. 
Use force_download=True to override.\n" ] } ], "source": [ "import os, warnings\n", "warnings.filterwarnings('ignore')\n", "\n", "from uq_physicell import get_physicell\n", "from uq_physicell.model_analysis import ModelAnalysisContext, calculate_qoi_statistics\n", "\n", "get_physicell(target_dir=\".\")\n", "\n", "model_config = {\"ini_path\": \"Model_Struct.ini\", \"struc_name\": \"physicell_model_2\"}\n", "\n", "# QoI functions we want to measure — used in Mode A at run time, and in Mode B for post-hoc query\n", "qoi_funcs = {\n", " \"live_cells\": lambda df_cell: len(df_cell[df_cell['dead'] == False]),\n", " \"interferon_mean\": lambda df_subs: df_subs['interferon'].mean(),\n", "}\n", "\n", "# A single parameter set used for both modes (same simulation, different storage)\n", "samples = {0: {\"viral_replication_rate\": 0.125, \"min_virion_count\": 1.0}}" ] }, { "cell_type": "markdown", "id": "f4c5c1df", "metadata": {}, "source": [ "## Mode A — Precomputed QoIs\n", "\n", "QoI functions are passed to the context. At each timestep the functions run on the PhysiCell output, the resulting DataFrame is stored in the database, and the raw output folder is deleted. You commit to specific QoIs at run time but keep storage minimal." 
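, "\n",
"\n",
"As a rough illustration of what precomputing means, the sketch below applies the `live_cells` lambda to hand-made per-timestep cell tables and collects the results in a long-format table. The toy DataFrames and the `snapshots` dict are assumptions made for illustration; this is not the library's internal storage code.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical per-timestep cell tables (stand-ins for PhysiCell output)\n",
"snapshots = {\n",
"    0.0: pd.DataFrame({'dead': [False, False, False]}),\n",
"    360.0: pd.DataFrame({'dead': [False, True, False]}),\n",
"}\n",
"\n",
"# Same QoI definition as in the first code cell\n",
"qoi_funcs = {'live_cells': lambda df_cell: len(df_cell[df_cell['dead'] == False])}\n",
"\n",
"# One row per (SampleID, time), one column per QoI\n",
"rows = [\n",
"    {'SampleID': 0, 'time': t, **{name: f(df) for name, f in qoi_funcs.items()}}\n",
"    for t, df in snapshots.items()\n",
"]\n",
"qoi_table = pd.DataFrame(rows).set_index(['SampleID', 'time'])\n",
"print(qoi_table)\n",
"```\n",
"\n",
"Only a small table like `qoi_table` needs to be kept in this mode, which is why the database stays in the KB range."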
] }, { "cell_type": "code", "execution_count": 2, "id": "b41ffb6c", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Inserting {'live_cells': \"lambda df_cell: len(df_cell[df_cell['dead'] == False])\", 'interferon_mean': \"lambda df_subs: df_subs['interferon'].mean()\"} QoIs into the database\n", "Simulations completed and results stored in the database: ex2_mode_a.db.\n", "Mode A DB size: 24.0 KB\n" ] } ], "source": [ "context_a = ModelAnalysisContext(\n", " \"ex2_mode_a.db\", model_config,\n", " sampler='User-defined',\n", " params_info={},\n", " qois_info=qoi_funcs, # QoIs computed at run time → DataFrame stored\n", " num_workers=1,\n", ")\n", "context_a.set_samples(samples)\n", "context_a.run()\n", "\n", "print(f\"Mode A DB size: {os.path.getsize('ex2_mode_a.db') / 1024:.1f} KB\")" ] }, { "cell_type": "markdown", "id": "50c7e79b", "metadata": {}, "source": [ "## Mode B — Raw MCDS Storage\n", "\n", "`qois_info={}` tells the context to store the full `list[pcdl.TimeStep]` objects instead of computing QoIs. The output folder is still cleaned up, but the complete simulation state is preserved in the database. Larger storage, but you can compute **any** QoI after the fact without re-running." 
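, "\n",
"\n",
"To make the storage trade-off concrete, the sketch below extrapolates the database sizes this notebook reports for its single-sample example (24.0 KB for Mode A, 3468.0 KB for Mode B) to larger sample counts. It assumes size grows linearly with the number of samples, which is a simplification that ignores fixed database overhead and per-run variation; the sample counts are arbitrary.\n",
"\n",
"```python\n",
"# Sizes reported by this notebook for one sample (KB)\n",
"size_a_kb = 24.0     # Mode A, precomputed QoIs\n",
"size_b_kb = 3468.0   # Mode B, raw MCDS\n",
"\n",
"# Assumption: database size scales linearly with the number of samples\n",
"for n_samples in (10, 100, 1000):\n",
"    est_a_mb = size_a_kb * n_samples / 1024\n",
"    est_b_mb = size_b_kb * n_samples / 1024\n",
"    print(f'{n_samples:>5} samples: Mode A ~{est_a_mb:.1f} MB, Mode B ~{est_b_mb:.1f} MB')\n",
"```"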
] }, { "cell_type": "code", "execution_count": 3, "id": "a85aba4e", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Inserting {} QoIs into the database\n", "Simulations completed and results stored in the database: ex2_mode_b.db.\n", "Mode A (precomputed QoIs): 24.0 KB\n", "Mode B (raw MCDS): 3468.0 KB (144× larger)\n" ] } ], "source": [ "context_b = ModelAnalysisContext(\n", " \"ex2_mode_b.db\", model_config,\n", " sampler='User-defined',\n", " params_info={},\n", " qois_info={}, # empty → raw MCDS list stored\n", " num_workers=1,\n", ")\n", "context_b.set_samples(samples)\n", "context_b.run()\n", "\n", "size_a = os.path.getsize('ex2_mode_a.db') / 1024\n", "size_b = os.path.getsize('ex2_mode_b.db') / 1024\n", "print(f\"Mode A (precomputed QoIs): {size_a:.1f} KB\")\n", "print(f\"Mode B (raw MCDS): {size_b:.1f} KB ({size_b/size_a:.0f}× larger)\")" ] }, { "cell_type": "markdown", "id": "fc7e4eb9", "metadata": {}, "source": [ "## Querying both databases with the same call\n", "\n", "`calculate_qoi_statistics` detects what is stored and handles the computation automatically, so the query code is identical for both databases.\n", "\n", "Both modes return a **long-format DataFrame** with a `(SampleID, time)` MultiIndex and QoI names as columns. The numbers in the two tables are not expected to match: each mode ran its own simulation, and PhysiCell simulations are stochastic.\n", "\n", "**Mode B bonus:** you can query a QoI you never defined at run time, with no re-simulation." ] }, { "cell_type": "code", "execution_count": 4, "id": "389da677", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "No QoI data provided, calculating QoIs from the database...\n", "All samples in Samples table have corresponding entries in Output table.\n", "Extracting QoIs from DataFrame...\n", "No QoI data provided, calculating QoIs from the database...\n", "All samples in Samples table have corresponding entries in Output table.\n", "Calculating QoIs from mcds list...\n", "Mode A — mean QoIs:\n" ] }, { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th></th><th>live_cells</th><th>interferon_mean</th></tr>\n",
"    <tr><th>SampleID</th><th>time</th><th></th><th></th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th rowspan=\"9\" valign=\"top\">0</th><th>0.0</th><td>1060.0</td><td>0.000000</td></tr>\n",
"    <tr><th>360.0</th><td>1056.8</td><td>0.000845</td></tr>\n",
"    <tr><th>720.0</th><td>1053.2</td><td>0.000398</td></tr>\n",
"    <tr><th>1080.0</th><td>1023.0</td><td>0.002876</td></tr>\n",
"    <tr><th>1440.0</th><td>1008.8</td><td>0.001156</td></tr>\n",
"    <tr><th>1800.0</th><td>957.8</td><td>0.006680</td></tr>\n",
"    <tr><th>2160.0</th><td>916.0</td><td>0.003641</td></tr>\n",
"    <tr><th>2520.0</th><td>761.6</td><td>0.005035</td></tr>\n",
"    <tr><th>2880.0</th><td>711.6</td><td>0.003222</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " live_cells interferon_mean\n", "SampleID time \n", "0 0.0 1060.0 0.000000\n", " 360.0 1056.8 0.000845\n", " 720.0 1053.2 0.000398\n", " 1080.0 1023.0 0.002876\n", " 1440.0 1008.8 0.001156\n", " 1800.0 957.8 0.006680\n", " 2160.0 916.0 0.003641\n", " 2520.0 761.6 0.005035\n", " 2880.0 711.6 0.003222" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Mode B — mean QoIs (computed post-hoc from raw MCDS):\n" ] }, { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th></th><th>interferon_mean</th><th>live_cells</th></tr>\n",
"    <tr><th>SampleID</th><th>time</th><th></th><th></th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th rowspan=\"9\" valign=\"top\">0</th><th>0.0</th><td>0.000000</td><td>1060.0</td></tr>\n",
"    <tr><th>360.0</th><td>0.000820</td><td>1056.6</td></tr>\n",
"    <tr><th>720.0</th><td>0.000497</td><td>1054.0</td></tr>\n",
"    <tr><th>1080.0</th><td>0.003034</td><td>1025.4</td></tr>\n",
"    <tr><th>1440.0</th><td>0.001482</td><td>998.8</td></tr>\n",
"    <tr><th>1800.0</th><td>0.005664</td><td>923.8</td></tr>\n",
"    <tr><th>2160.0</th><td>0.007086</td><td>717.8</td></tr>\n",
"    <tr><th>2520.0</th><td>0.001539</td><td>602.2</td></tr>\n",
"    <tr><th>2880.0</th><td>0.000851</td><td>596.4</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " interferon_mean live_cells\n", "SampleID time \n", "0 0.0 0.000000 1060.0\n", " 360.0 0.000820 1056.6\n", " 720.0 0.000497 1054.0\n", " 1080.0 0.003034 1025.4\n", " 1440.0 0.001482 998.8\n", " 1800.0 0.005664 923.8\n", " 2160.0 0.007086 717.8\n", " 2520.0 0.001539 602.2\n", " 2880.0 0.000851 596.4" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "No QoI data provided, calculating QoIs from the database...\n", "All samples in Samples table have corresponding entries in Output table.\n", "Calculating QoIs from mcds list...\n", "Mode B — new QoI computed without re-running:\n" ] }, { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th></th><th>dead_cells</th></tr>\n",
"    <tr><th>SampleID</th><th>time</th><th></th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th rowspan=\"9\" valign=\"top\">0</th><th>0.0</th><td>0.0</td></tr>\n",
"    <tr><th>360.0</th><td>0.0</td></tr>\n",
"    <tr><th>720.0</th><td>0.0</td></tr>\n",
"    <tr><th>1080.0</th><td>0.0</td></tr>\n",
"    <tr><th>1440.0</th><td>0.0</td></tr>\n",
"    <tr><th>1800.0</th><td>0.0</td></tr>\n",
"    <tr><th>2160.0</th><td>0.0</td></tr>\n",
"    <tr><th>2520.0</th><td>0.0</td></tr>\n",
"    <tr><th>2880.0</th><td>0.0</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " dead_cells\n", "SampleID time \n", "0 0.0 0.0\n", " 360.0 0.0\n", " 720.0 0.0\n", " 1080.0 0.0\n", " 1440.0 0.0\n", " 1800.0 0.0\n", " 2160.0 0.0\n", " 2520.0 0.0\n", " 2880.0 0.0" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Mode A: reads pre-stored QoI DataFrame directly\n", "df_mean_a, _, _ = calculate_qoi_statistics(\"ex2_mode_a.db\", qoi_funcs)\n", "\n", "# Mode B: recomputes qoi_funcs from stored MCDS objects\n", "df_mean_b, _, _ = calculate_qoi_statistics(\"ex2_mode_b.db\", qoi_funcs)\n", "\n", "print(\"Mode A — mean QoIs:\")\n", "display(df_mean_a)\n", "\n", "print(\"Mode B — mean QoIs (computed post-hoc from raw MCDS):\")\n", "display(df_mean_b)\n", "\n", "# Mode B only: compute a QoI that was never defined at run time\n", "new_qoi = {\"dead_cells\": lambda df_cell: len(df_cell[df_cell['dead'] == True])}\n", "df_mean_new, _, _ = calculate_qoi_statistics(\"ex2_mode_b.db\", new_qoi)\n", "print(\"Mode B — new QoI computed without re-running:\")\n", "display(df_mean_new)" ] }, { "cell_type": "markdown", "id": "5af2ffdc", "metadata": {}, "source": [ "---\n", "**Rule of thumb:**\n", "- Use **Mode A** when you already know your QoIs and plan to run many simulations (ex3+). Storage stays small and `calculate_qoi_statistics` is fastest.\n", "- Use **Mode B** during early exploration when you are not sure what to measure, or when you want to apply multiple analysis approaches (sensitivity analysis, calibration, topology) to the same runs without repeating them.\n", "\n", "**Next:** [ex3](ex3_runSA_MultiTask.ipynb) shows how to scale Mode A to a full Sobol sensitivity analysis with `generate_samples(N=8)` and multi-process parallelization." 
] } ], "metadata": { "kernelspec": { "display_name": "pcvenv", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.3" } }, "nbformat": 4, "nbformat_minor": 5 }