Example 2: Data Storage Modes — Precomputed QoIs vs Raw MCDS
ModelAnalysisContext supports two storage strategies, controlled by whether qois_info is empty or not:
| | Mode A (precomputed QoIs) | Mode B (raw MCDS) |
|---|---|---|
| qois_info | defined (lambdas) | {} (empty) |
| Stored in DB | small QoI DataFrame | full list[pcdl.TimeStep] |
| DB size | small (~KB) | large (~MB per run) |
| Query later | only pre-defined QoIs | any QoI, no re-run |
| Best for | SA with many runs (ex3+) | exploratory / uncertain QoIs |
calculate_qoi_statistics accepts both: it detects what is stored, computes the QoIs from the raw data when they were not precomputed, and always returns a long-format DataFrame with a (SampleID, time) MultiIndex and QoI names as columns.
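The difference between the two strategies can be sketched with plain pandas. This is a toy illustration, not the uq_physicell implementation: raw_output, toy_qois and reduce_to_qois are hypothetical names, and the cell tables are made up. It only shows why reducing at run time and reducing at query time can yield the same long-format table.

```python
import pandas as pd

# Hypothetical stand-in for raw simulation output: one cell table per (SampleID, time).
raw_output = {
    (0, 0.0): pd.DataFrame({"dead": [False, False, False]}),
    (0, 360.0): pd.DataFrame({"dead": [False, True, False]}),
}

# Hypothetical QoI functions: each reduces one cell table to a single number.
toy_qois = {
    "live_cells": lambda df_cell: int((~df_cell["dead"]).sum()),
    "dead_cells": lambda df_cell: int(df_cell["dead"].sum()),
}


def reduce_to_qois(frames, funcs):
    """Apply every QoI function to every frame and return a long-format DataFrame."""
    rows = {key: {name: f(df) for name, f in funcs.items()} for key, df in frames.items()}
    index = pd.MultiIndex.from_tuples(list(rows), names=["SampleID", "time"])
    return pd.DataFrame(list(rows.values()), index=index)


# Strategy A: reduce at run time and keep only the small QoI table.
stored_a = reduce_to_qois(raw_output, toy_qois)

# Strategy B: keep the raw frames and reduce only when queried.
stored_b = raw_output
queried_b = reduce_to_qois(stored_b, toy_qois)

print(stored_a.equals(queried_b))  # True
```

With strategy B, a different dictionary of functions can be passed at query time, which is what makes post-hoc QoIs possible.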
import os, warnings
warnings.filterwarnings('ignore')
from uq_physicell import get_physicell
from uq_physicell.model_analysis import ModelAnalysisContext, calculate_qoi_statistics
get_physicell(target_dir=".")
model_config = {"ini_path": "Model_Struct.ini", "struc_name": "physicell_model_2"}
# QoI functions we want to measure — used in Mode A at run time, and in Mode B for post-hoc query
qoi_funcs = {
"live_cells": lambda df_cell: len(df_cell[df_cell['dead'] == False]),
"interferon_mean": lambda df_subs: df_subs['interferon'].mean(),
}
# A single parameter set used for both modes (same simulation, different storage)
samples = {0: {"viral_replication_rate": 0.125, "min_virion_count": 1.0}}
PhysiCell already exists at: PhysiCell-master
Skipping download. Use force_download=True to override.
Mode A — Precomputed QoIs
QoI functions are passed to the context. At each timestep the functions run on the PhysiCell output, the resulting DataFrame is stored in the database, and the raw output folder is deleted. You commit to specific QoIs at run time but keep storage minimal.
context_a = ModelAnalysisContext(
"ex2_mode_a.db", model_config,
sampler='User-defined',
params_info={},
qois_info=qoi_funcs, # QoIs computed at run time → DataFrame stored
num_workers=1,
)
context_a.set_samples(samples)
context_a.run()
print(f"Mode A DB size: {os.path.getsize('ex2_mode_a.db') / 1024:.1f} KB")
Inserting {'live_cells': "lambda df_cell: len(df_cell[df_cell['dead'] == False])", 'interferon_mean': "lambda df_subs: df_subs['interferon'].mean()"} QoIs into the database
Simulations completed and results stored in the database: ex2_mode_a.db.
Mode A DB size: 24.0 KB
Mode B — Raw MCDS Storage
qois_info={} tells the context to store the full list[pcdl.TimeStep] objects instead of computing QoIs. The output folder is still cleaned up, but the complete simulation state is preserved in the database. The database is larger, but you can compute any QoI after the fact without re-running the simulation.
context_b = ModelAnalysisContext(
"ex2_mode_b.db", model_config,
sampler='User-defined',
params_info={},
qois_info={}, # empty → raw MCDS list stored
num_workers=1,
)
context_b.set_samples(samples)
context_b.run()
size_a = os.path.getsize('ex2_mode_a.db') / 1024
size_b = os.path.getsize('ex2_mode_b.db') / 1024
print(f"Mode A (precomputed QoIs): {size_a:.1f} KB")
print(f"Mode B (raw MCDS): {size_b:.1f} KB ({size_b/size_a:.0f}× larger)")
Inserting {} QoIs into the database
Simulations completed and results stored in the database: ex2_mode_b.db.
Mode A (precomputed QoIs): 24.0 KB
Mode B (raw MCDS): 3468.0 KB (144× larger)
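To get a feel for what these numbers mean for a larger study, the sketch below projects the two measured sizes to more samples. It assumes, purely for illustration, that the database grows linearly with the number of samples. That is an assumption rather than a measured property: the 24 KB figure may include fixed database overhead, in which case Mode A would grow more slowly than shown. The helper projected_size_mb is hypothetical.

```python
# Sizes reported above for a single parameter set.
size_a_kb = 24.0    # Mode A, precomputed QoIs
size_b_kb = 3468.0  # Mode B, raw MCDS


def projected_size_mb(size_per_sample_kb, n_samples):
    """Project the total size in MB, assuming linear growth per sample (an assumption)."""
    return size_per_sample_kb * n_samples / 1024


for n_samples in (1, 100, 1000):
    a = projected_size_mb(size_a_kb, n_samples)
    b = projected_size_mb(size_b_kb, n_samples)
    print(f"{n_samples:>5} samples: Mode A ~{a:8.1f} MB, Mode B ~{b:8.1f} MB")
```

Under that assumption, a thousand Mode B samples would take a few gigabytes, while Mode A would stay in the tens of megabytes.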
Querying both databases with the same call
calculate_qoi_statistics detects what is stored and handles the computation automatically, so the query code is identical for both modes.
Both modes return a long-format DataFrame with a (SampleID, time) MultiIndex and QoI names as columns.
Mode B bonus: you can query a QoI you never defined at run time, with no re-simulation.
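A long-format frame with a (SampleID, time) MultiIndex can be sliced with standard pandas calls. The sketch below builds a toy frame in that layout. The sample 0 values are the first three live_cells means from the Mode A output in this example; sample 1 is hypothetical and exists only to make the slicing visible.

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [[0, 1], [0.0, 360.0, 720.0]], names=["SampleID", "time"]
)
df = pd.DataFrame(
    # Sample 0: values from this example. Sample 1: made-up values for illustration.
    {"live_cells": [1060.0, 1056.8, 1053.2, 1060.0, 1050.0, 1040.0]},
    index=index,
)

# Time series of one sample: drops the SampleID level, leaving time as the index.
trajectory = df.xs(0, level="SampleID")

# Wide layout: one row per time point, one column per sample.
wide = df["live_cells"].unstack("SampleID")

# Single value for a given sample and time.
value = df.loc[(0, 720.0), "live_cells"]

print(trajectory)
print(wide)
print(value)
```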
# Mode A: reads pre-stored QoI DataFrame directly
df_mean_a, _, _ = calculate_qoi_statistics("ex2_mode_a.db", qoi_funcs)
# Mode B: recomputes qoi_funcs from stored MCDS objects
df_mean_b, _, _ = calculate_qoi_statistics("ex2_mode_b.db", qoi_funcs)
print("Mode A — mean QoIs:")
display(df_mean_a)
print("Mode B — mean QoIs (computed post-hoc from raw MCDS):")
display(df_mean_b)
# Mode B only: compute a QoI that was never defined at run time
new_qoi = {"dead_cells": lambda df_cell: len(df_cell[df_cell['dead'] == True])}
df_mean_new, _, _ = calculate_qoi_statistics("ex2_mode_b.db", new_qoi)
print("Mode B — new QoI computed without re-running:")
display(df_mean_new)
No QoI data provided, calculating QoIs from the database...
All samples in Samples table have corresponding entries in Output table.
Extracting QoIs from DataFrame...
No QoI data provided, calculating QoIs from the database...
All samples in Samples table have corresponding entries in Output table.
Calculating QoIs from mcds list...
Mode A — mean QoIs:
| SampleID | time | live_cells | interferon_mean |
|---|---|---|---|
| 0 | 0.0 | 1060.0 | 0.000000 |
| 0 | 360.0 | 1056.8 | 0.000845 |
| 0 | 720.0 | 1053.2 | 0.000398 |
| 0 | 1080.0 | 1023.0 | 0.002876 |
| 0 | 1440.0 | 1008.8 | 0.001156 |
| 0 | 1800.0 | 957.8 | 0.006680 |
| 0 | 2160.0 | 916.0 | 0.003641 |
| 0 | 2520.0 | 761.6 | 0.005035 |
| 0 | 2880.0 | 711.6 | 0.003222 |
Mode B — mean QoIs (computed post-hoc from raw MCDS):
| SampleID | time | interferon_mean | live_cells |
|---|---|---|---|
| 0 | 0.0 | 0.000000 | 1060.0 |
| 0 | 360.0 | 0.000820 | 1056.6 |
| 0 | 720.0 | 0.000497 | 1054.0 |
| 0 | 1080.0 | 0.003034 | 1025.4 |
| 0 | 1440.0 | 0.001482 | 998.8 |
| 0 | 1800.0 | 0.005664 | 923.8 |
| 0 | 2160.0 | 0.007086 | 717.8 |
| 0 | 2520.0 | 0.001539 | 602.2 |
| 0 | 2880.0 | 0.000851 | 596.4 |
No QoI data provided, calculating QoIs from the database...
All samples in Samples table have corresponding entries in Output table.
Calculating QoIs from mcds list...
Mode B — new QoI computed without re-running:
| SampleID | time | dead_cells |
|---|---|---|
| 0 | 0.0 | 0.0 |
| 0 | 360.0 | 0.0 |
| 0 | 720.0 | 0.0 |
| 0 | 1080.0 | 0.0 |
| 0 | 1440.0 | 0.0 |
| 0 | 1800.0 | 0.0 |
| 0 | 2160.0 | 0.0 |
| 0 | 2520.0 | 0.0 |
| 0 | 2880.0 | 0.0 |
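The Mode A and Mode B tables above list their QoI columns in a different order, and their values differ from the second time point onward. The source does not state why the values differ; separate stochastic runs would be one possible explanation, but that is an assumption. The sketch below uses the first three rows of each table to show how the columns could be aligned and the difference quantified with plain pandas.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [(0, 0.0), (0, 360.0), (0, 720.0)], names=["SampleID", "time"]
)

# First three rows of the two tables above, keeping each table's own column order.
df_a = pd.DataFrame(
    {"live_cells": [1060.0, 1056.8, 1053.2], "interferon_mean": [0.0, 0.000845, 0.000398]},
    index=index,
)
df_b = pd.DataFrame(
    {"interferon_mean": [0.0, 0.000820, 0.000497], "live_cells": [1060.0, 1056.6, 1054.0]},
    index=index,
)

# Reorder Mode B's columns to match Mode A. pandas arithmetic aligns on labels anyway,
# but positional access (.iloc, .to_numpy()) and side-by-side display depend on the order.
df_b_aligned = df_b[df_a.columns]

# Absolute difference per QoI and time point.
diff = (df_a - df_b_aligned).abs()
print(diff)
```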
Rule of thumb:
- Use Mode A when you already know your QoIs and plan to run many simulations (ex3+). Storage stays small and calculate_qoi_statistics is fastest.
- Use Mode B during early exploration when you are not sure what to measure, or when you want to apply multiple analysis approaches (sensitivity analysis, calibration, topology) to the same runs without repeating them.
Next: ex3 shows how to scale Mode A to a full Sobol sensitivity analysis with generate_samples(N=8) and multi-process parallelization.