Example 2: Data Storage Modes — Precomputed QoIs vs Raw MCDS

ModelAnalysisContext supports two storage strategies, controlled by whether qois_info is empty or not:

	Mode A — Precomputed QoIs	Mode B — Raw MCDS
`qois_info`	defined (lambdas)	`{}` (empty)
Stored in DB	small QoI DataFrame	full `list[pcdl.TimeStep]`
DB size	small (~KB)	large (~MB per run)
Query later	only pre-defined QoIs	any QoI, no re-run
Best for	SA with many runs (ex3+)	exploratory / uncertain QoIs

calculate_qoi_statistics accepts databases from either mode: it detects what is stored, computes the QoIs if they are not already there, and returns a long-format DataFrame with a (SampleID, time) MultiIndex and QoI names as columns.

import os, warnings
warnings.filterwarnings('ignore')

from uq_physicell import get_physicell
from uq_physicell.model_analysis import ModelAnalysisContext, calculate_qoi_statistics

get_physicell(target_dir=".")

model_config = {"ini_path": "Model_Struct.ini", "struc_name": "physicell_model_2"}

# QoI functions we want to measure — used in Mode A at run time, and in Mode B for post-hoc query
qoi_funcs = {
    "live_cells":      lambda df_cell: len(df_cell[df_cell['dead'] == False]),
    "interferon_mean": lambda df_subs: df_subs['interferon'].mean(),
}

# A single parameter set used for both modes (same model and parameters, different storage)
samples = {0: {"viral_replication_rate": 0.125, "min_virion_count": 1.0}}

PhysiCell already exists at: PhysiCell-master
Skipping download. Use force_download=True to override.

Mode A — Precomputed QoIs

QoI functions are passed to the context. At each timestep the functions run on the PhysiCell output, the resulting DataFrame is stored in the database, and the raw output folder is deleted. You commit to specific QoIs at run time but keep storage minimal.

context_a = ModelAnalysisContext(
    "ex2_mode_a.db", model_config,
    sampler='User-defined',
    params_info={},
    qois_info=qoi_funcs,   # QoIs computed at run time → DataFrame stored
    num_workers=1,
)
context_a.set_samples(samples)
context_a.run()

print(f"Mode A DB size: {os.path.getsize('ex2_mode_a.db') / 1024:.1f} KB")

Inserting {'live_cells': "lambda df_cell: len(df_cell[df_cell['dead'] == False])", 'interferon_mean': "lambda df_subs: df_subs['interferon'].mean()"} QoIs into the database
Simulations completed and results stored in the database: ex2_mode_a.db.
Mode A DB size: 24.0 KB
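
The per-timestep reduction that Mode A performs can be sketched with plain pandas. This is a minimal illustration, not uq_physicell's implementation: the reduce_timesteps helper and the toy timesteps data are hypothetical, and routing each function to a table by its argument name (df_cell or df_subs) is an assumption suggested by the lambda signatures above.

```python
import inspect
import pandas as pd

# Toy stand-ins for two saved timesteps (hypothetical data, not PhysiCell output).
timesteps = [
    {"time": 0.0,
     "df_cell": pd.DataFrame({"dead": [False, False, False]}),
     "df_subs": pd.DataFrame({"interferon": [0.0, 0.0]})},
    {"time": 360.0,
     "df_cell": pd.DataFrame({"dead": [False, True, False]}),
     "df_subs": pd.DataFrame({"interferon": [0.5, 1.5]})},
]

qoi_funcs = {
    "live_cells":      lambda df_cell: len(df_cell[df_cell["dead"] == False]),
    "interferon_mean": lambda df_subs: df_subs["interferon"].mean(),
}

def reduce_timesteps(timesteps, qoi_funcs):
    """Apply every QoI function to every timestep; return one row per timestep."""
    rows = []
    for step in timesteps:
        row = {"time": step["time"]}
        for name, func in qoi_funcs.items():
            # Assumption: the function's first argument name selects the input table.
            table_name = next(iter(inspect.signature(func).parameters))
            row[name] = func(step[table_name])
        rows.append(row)
    return pd.DataFrame(rows).set_index("time")

qoi_table = reduce_timesteps(timesteps, qoi_funcs)
print(qoi_table)
```

Only qoi_table would need to be stored in this sketch, which is why the Mode A database stays small: each timestep collapses to one number per QoI.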

Mode B — Raw MCDS Storage

qois_info={} tells the context to store the full list[pcdl.TimeStep] objects instead of computing QoIs. The output folder is still cleaned up, but the complete simulation state is preserved in the database. Storage is larger, but any QoI can be computed after the fact without re-running the simulation.
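
The trade-off between storing a summary and storing raw state can be illustrated with the standard library alone. The sketch below is not how uq_physicell serializes its data, which this example does not show; it assumes, purely for illustration, a SQLite table named output holding pickled blobs, and it uses plain dicts as hypothetical stand-ins for pcdl.TimeStep objects.

```python
import pickle
import sqlite3

# Hypothetical raw state: 9 timesteps, each with 1000 cells and 3 attributes.
raw_timesteps = [
    {"time": 360.0 * i,
     "cells": [{"id": c, "dead": False, "volume": 2494.0} for c in range(1000)]}
    for i in range(9)
]

# Summary of the same run: one (time, live cell count) pair per timestep.
summary = [(step["time"], sum(not c["dead"] for c in step["cells"]))
           for step in raw_timesteps]

# Assumed layout: one blob per stored object in an in-memory SQLite table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE output (sample_id INTEGER, kind TEXT, data BLOB)")
for kind, payload in (("summary", summary), ("raw", raw_timesteps)):
    conn.execute("INSERT INTO output VALUES (?, ?, ?)",
                 (0, kind, pickle.dumps(payload)))

# Blob sizes in bytes, keyed by kind.
sizes = dict(conn.execute("SELECT kind, length(data) FROM output"))
print(sizes)

# Only the raw blob can answer a question that was not asked at run time.
(raw_blob,) = conn.execute("SELECT data FROM output WHERE kind = 'raw'").fetchone()
restored = pickle.loads(raw_blob)
dead_cells = [sum(c["dead"] for c in step["cells"]) for step in restored]
print(dead_cells)
```

The summary blob is far smaller, but it can only answer the question it was built for, whereas the raw blob supports new queries such as dead_cells at the cost of size.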

context_b = ModelAnalysisContext(
    "ex2_mode_b.db", model_config,
    sampler='User-defined',
    params_info={},
    qois_info={},          # empty → raw MCDS list stored
    num_workers=1,
)
context_b.set_samples(samples)
context_b.run()

size_a = os.path.getsize('ex2_mode_a.db') / 1024
size_b = os.path.getsize('ex2_mode_b.db') / 1024
print(f"Mode A (precomputed QoIs): {size_a:.1f} KB")
print(f"Mode B (raw MCDS):         {size_b:.1f} KB  ({size_b/size_a:.0f}× larger)")

Inserting {} QoIs into the database
Simulations completed and results stored in the database: ex2_mode_b.db.
Mode A (precomputed QoIs): 24.0 KB
Mode B (raw MCDS):         3468.0 KB  (144× larger)
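
To see what this ratio means at scale, the sizes printed above can be projected to larger studies. This is a rough back-of-the-envelope sketch: the sample counts are hypothetical, the projection assumes size grows linearly with the number of samples, and the 24 KB Mode A figure may include fixed database overhead, so the Mode A column is likely an overestimate.

```python
# Sizes reported above, in KB, for a single parameter sample.
size_a_kb = 24.0
size_b_kb = 3468.0

ratio = size_b_kb / size_a_kb
print(f"Mode B / Mode A: {ratio:.1f}x")

def projected_size_mb(per_sample_kb, n_samples):
    """Naive linear projection in MB; ignores fixed database overhead."""
    return per_sample_kb * n_samples / 1024

# Hypothetical study sizes, chosen only to show the trend.
for n in (10, 100, 1000):
    print(f"{n:>5} samples: "
          f"Mode A ~{projected_size_mb(size_a_kb, n):8.1f} MB, "
          f"Mode B ~{projected_size_mb(size_b_kb, n):8.1f} MB")
```

Under these assumptions, Mode B reaches gigabytes at around a thousand samples while Mode A stays in the tens of megabytes.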

Querying both databases with the same call

calculate_qoi_statistics detects what is stored and handles the computation automatically — no code change needed on the query side.

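Both modes return a long-format DataFrame with a (SampleID, time) MultiIndex and QoI names as columns. The values need not match between the two modes: each context ran its own simulations, and the model is stochastic.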

Mode B bonus: you can query a QoI you never defined at run time, with no re-simulation.

# Mode A: reads pre-stored QoI DataFrame directly
df_mean_a, _, _ = calculate_qoi_statistics("ex2_mode_a.db", qoi_funcs)

# Mode B: recomputes qoi_funcs from stored MCDS objects
df_mean_b, _, _ = calculate_qoi_statistics("ex2_mode_b.db", qoi_funcs)

print("Mode A — mean QoIs:")
display(df_mean_a)

print("Mode B — mean QoIs (computed post-hoc from raw MCDS):")
display(df_mean_b)

# Mode B only: compute a QoI that was never defined at run time
new_qoi = {"dead_cells": lambda df_cell: len(df_cell[df_cell['dead'] == True])}
df_mean_new, _, _ = calculate_qoi_statistics("ex2_mode_b.db", new_qoi)
print("Mode B — new QoI computed without re-running:")
display(df_mean_new)

No QoI data provided, calculating QoIs from the database...
All samples in Samples table have corresponding entries in Output table.
Extracting QoIs from DataFrame...
No QoI data provided, calculating QoIs from the database...
All samples in Samples table have corresponding entries in Output table.
Calculating QoIs from mcds list...
Mode A — mean QoIs:

		live_cells	interferon_mean
SampleID	time
0	0.0	1060.0	0.000000
	360.0	1056.8	0.000845
	720.0	1053.2	0.000398
	1080.0	1023.0	0.002876
	1440.0	1008.8	0.001156
	1800.0	957.8	0.006680
	2160.0	916.0	0.003641
	2520.0	761.6	0.005035
	2880.0	711.6	0.003222

Mode B — mean QoIs (computed post-hoc from raw MCDS):

		interferon_mean	live_cells
SampleID	time
0	0.0	0.000000	1060.0
	360.0	0.000820	1056.6
	720.0	0.000497	1054.0
	1080.0	0.003034	1025.4
	1440.0	0.001482	998.8
	1800.0	0.005664	923.8
	2160.0	0.007086	717.8
	2520.0	0.001539	602.2
	2880.0	0.000851	596.4

No QoI data provided, calculating QoIs from the database...
All samples in Samples table have corresponding entries in Output table.
Calculating QoIs from mcds list...
Mode B — new QoI computed without re-running:

		dead_cells
SampleID	time
0	0.0	0.0
	360.0	0.0
	720.0	0.0
	1080.0	0.0
	1440.0	0.0
	1800.0	0.0
	2160.0	0.0
	2520.0	0.0
	2880.0	0.0
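
The DataFrames shown above share the same (SampleID, time) MultiIndex, so they can be sliced with standard pandas indexing. The sketch below rebuilds the first three rows of the Mode A table by hand rather than calling calculate_qoi_statistics; the variable names are illustrative only.

```python
import pandas as pd

# First three rows of the Mode A output shown above, typed in by hand.
index = pd.MultiIndex.from_tuples(
    [(0, 0.0), (0, 360.0), (0, 720.0)], names=["SampleID", "time"]
)
df_mean = pd.DataFrame(
    {"live_cells": [1060.0, 1056.8, 1053.2],
     "interferon_mean": [0.000000, 0.000845, 0.000398]},
    index=index,
)

# Time series of one sample: select SampleID 0 and drop that index level.
series_sample0 = df_mean.xs(0, level="SampleID")["live_cells"]

# A single value at a given (SampleID, time) pair.
value = df_mean.loc[(0, 360.0), "live_cells"]

# Wide layout: one row per sample, one column per time point.
wide = df_mean["live_cells"].unstack("time")

print(series_sample0)
print(value)
print(wide)
```

The wide layout is convenient once there are many samples, since each row then holds one sample's full trajectory.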

Rule of thumb:

Use Mode A when you already know your QoIs and plan to run many simulations (ex3+). Storage stays small, and calculate_qoi_statistics is fastest because it reads the stored QoI DataFrame directly instead of recomputing it.
Use Mode B during early exploration when you are not sure what to measure, or when you want to apply multiple analysis approaches (sensitivity analysis, calibration, topology) to the same runs without repeating them.

Next: ex3 shows how to scale Mode A to a full Sobol sensitivity analysis with generate_samples(N=8) and multi-process parallelization.