Example 2: Data Storage Modes — Precomputed QoIs vs Raw MCDS


ModelAnalysisContext supports two storage strategies, selected by whether qois_info is empty:

|              | Mode A (precomputed QoIs) | Mode B (raw MCDS)            |
|--------------|---------------------------|------------------------------|
| qois_info    | defined (lambdas)         | {} (empty)                   |
| Stored in DB | small QoI DataFrame       | full list[pcdl.TimeStep]     |
| DB size      | small (~KB)               | large (~MB per run)          |
| Query later  | only pre-defined QoIs     | any QoI, no re-run           |
| Best for     | SA with many runs (ex3+)  | exploratory / uncertain QoIs |

The key point: calculate_qoi_statistics accepts a database written in either mode. It detects what is stored, computes the QoIs if they were not precomputed, and always returns a long-format DataFrame with a (SampleID, time) MultiIndex and QoI names as columns.

import os, warnings
warnings.filterwarnings('ignore')

from uq_physicell import get_physicell
from uq_physicell.model_analysis import ModelAnalysisContext, calculate_qoi_statistics

get_physicell(target_dir=".")

model_config = {"ini_path": "Model_Struct.ini", "struc_name": "physicell_model_2"}

# QoI functions we want to measure — used in Mode A at run time, and in Mode B for post-hoc query
qoi_funcs = {
    "live_cells":      lambda df_cell: len(df_cell[df_cell['dead'] == False]),
    "interferon_mean": lambda df_subs: df_subs['interferon'].mean(),
}

# A single parameter set used for both modes (same parameters, different storage)
samples = {0: {"viral_replication_rate": 0.125, "min_virion_count": 1.0}}
PhysiCell already exists at: PhysiCell-master
Skipping download. Use force_download=True to override.
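
To make the QoI lambdas concrete, the sketch below applies them to tiny hand-made tables. The toy DataFrames are assumptions for illustration only: real PhysiCell cell and substrate tables have many more columns, and only the `dead` and `interferon` column names are taken from the lambdas above.

```python
import pandas as pd

# Same QoI definitions as in the example above.
qoi_funcs = {
    "live_cells":      lambda df_cell: len(df_cell[df_cell['dead'] == False]),
    "interferon_mean": lambda df_subs: df_subs['interferon'].mean(),
}

# Hypothetical stand-ins for one timestep of output (not real PhysiCell data).
toy_cells = pd.DataFrame({"dead": [False, True, False, False]})
toy_substrates = pd.DataFrame({"interferon": [0.0, 0.002, 0.004]})

# Each QoI maps one table to a single scalar for that timestep.
live = qoi_funcs["live_cells"](toy_cells)
ifn = qoi_funcs["interferon_mean"](toy_substrates)
print(live, round(ifn, 6))
```

Each function reduces one table to one number per timestep, which is why the stored Mode A result stays small.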

Mode A — Precomputed QoIs

QoI functions are passed to the context. At each timestep the functions run on the PhysiCell output, the resulting DataFrame is stored in the database, and the raw output folder is deleted. You commit to specific QoIs at run time but keep storage minimal.

context_a = ModelAnalysisContext(
    "ex2_mode_a.db", model_config,
    sampler='User-defined',
    params_info={},
    qois_info=qoi_funcs,   # QoIs computed at run time → DataFrame stored
    num_workers=1,
)
context_a.set_samples(samples)
context_a.run()

print(f"Mode A DB size: {os.path.getsize('ex2_mode_a.db') / 1024:.1f} KB")
Inserting {'live_cells': "lambda df_cell: len(df_cell[df_cell['dead'] == False])", 'interferon_mean': "lambda df_subs: df_subs['interferon'].mean()"} QoIs into the database
Simulations completed and results stored in the database: ex2_mode_a.db.
Mode A DB size: 24.0 KB
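
The log output later in this example refers to a Samples table and an Output table, which suggests a relational layout. Assuming the .db file is an SQLite database (an assumption, not something this example states), a sketch for listing its tables with the standard library could look like the following. The helper name `list_tables` is hypothetical, and the demo runs against a throwaway database rather than ex2_mode_a.db.

```python
import os
import sqlite3
import tempfile
from contextlib import closing

def list_tables(db_path):
    """Return the table names of an SQLite file, sorted alphabetically."""
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]

# Demo on a throwaway database. The table names mirror the log messages,
# but the columns are made up for illustration.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.db")
    with closing(sqlite3.connect(demo_path)) as conn:
        conn.execute("CREATE TABLE Samples (SampleID INTEGER PRIMARY KEY)")
        conn.execute("CREATE TABLE Output (SampleID INTEGER, Data BLOB)")
        conn.commit()
    tables = list_tables(demo_path)

print(tables)
```

If the SQLite assumption holds, calling `list_tables("ex2_mode_a.db")` would show what the context actually created.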

Mode B — Raw MCDS Storage

Passing qois_info={} tells the context to store the full list[pcdl.TimeStep] instead of computing QoIs. The output folder is still cleaned up, but the complete simulation state is preserved in the database. Storage is larger, but any QoI can be computed after the fact without re-running the simulation.

context_b = ModelAnalysisContext(
    "ex2_mode_b.db", model_config,
    sampler='User-defined',
    params_info={},
    qois_info={},          # empty → raw MCDS list stored
    num_workers=1,
)
context_b.set_samples(samples)
context_b.run()

size_a = os.path.getsize('ex2_mode_a.db') / 1024
size_b = os.path.getsize('ex2_mode_b.db') / 1024
print(f"Mode A (precomputed QoIs): {size_a:.1f} KB")
print(f"Mode B (raw MCDS):         {size_b:.1f} KB  ({size_b/size_a:.0f}× larger)")
Inserting {} QoIs into the database
Simulations completed and results stored in the database: ex2_mode_b.db.
Mode A (precomputed QoIs): 24.0 KB
Mode B (raw MCDS):         3468.0 KB  (144× larger)
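
The size gap matters once the number of runs grows. Below is a back-of-the-envelope sketch that uses the two file sizes printed above and assumes, simplistically, that database size scales linearly with the number of runs. `n_runs` is a hypothetical campaign size, not a value from this example.

```python
# File sizes reported above for this example's single parameter set.
size_a_kb = 24.0
size_b_kb = 3468.0

# 144.5; the :.0f format used above prints this as "144".
ratio = size_b_kb / size_a_kb

# Hypothetical campaign size. Linear scaling is an assumption, since part of
# each file is fixed overhead (schema, metadata) rather than per-run data.
n_runs = 1000
est_a_mb = size_a_kb * n_runs / 1024
est_b_mb = size_b_kb * n_runs / 1024

print(f"ratio: {ratio:.1f}")
print(f"Mode A: ~{est_a_mb:.1f} MB, Mode B: ~{est_b_mb:.1f} MB")
```

Under these assumptions a thousand runs would take roughly 23 MB in Mode A and over 3 GB in Mode B, so treat the estimate as an order-of-magnitude guide only.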

Querying both databases with the same call

calculate_qoi_statistics detects what is stored and handles the computation automatically, so the query code is identical for both databases. In both cases it returns a long-format DataFrame with a (SampleID, time) MultiIndex and QoI names as columns.

The two databases come from separate runs, so the values in the two result tables need not match exactly.

Mode B bonus: you can query a QoI you never defined at run time, with no re-simulation.
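
The log messages in the output below ("Extracting QoIs from DataFrame..." versus "Calculating QoIs from mcds list...") suggest a branch on the type of the stored payload. The following is a minimal plain-Python sketch of that idea, not the library's implementation. Precomputed results are modeled as a dict of QoI values per timestep, and raw output as a list of per-timestep records. All names here (`summarize`, the record layout) are hypothetical.

```python
def summarize(stored, qoi_funcs):
    """Return {time: {qoi_name: value}} from either storage layout."""
    if isinstance(stored, dict):
        # Mode A-like payload: QoIs were computed at run time, so only
        # the names that were stored can be returned.
        return {
            t: {name: row[name] for name in qoi_funcs if name in row}
            for t, row in stored.items()
        }
    # Mode B-like payload: raw per-timestep records, so QoIs are computed now.
    return {
        rec["time"]: {name: func(rec["cells"]) for name, func in qoi_funcs.items()}
        for rec in stored
    }

# In this toy model a QoI operates on a list of cell dicts.
qois = {"live_cells": lambda cells: sum(not c["dead"] for c in cells)}

precomputed = {0.0: {"live_cells": 3}, 360.0: {"live_cells": 2}}
raw = [
    {"time": 0.0, "cells": [{"dead": False}] * 3},
    {"time": 360.0, "cells": [{"dead": False}, {"dead": False}, {"dead": True}]},
]

result_a = summarize(precomputed, qois)
result_b = summarize(raw, qois)
print(result_a == result_b)

# A QoI defined after the fact only works on the raw payload.
new_qoi = {"dead_cells": lambda cells: sum(c["dead"] for c in cells)}
result_new = summarize(raw, new_qoi)
print(result_new)
```

The same call serves both payloads, and only the raw payload can answer a question that was not asked at run time.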

# Mode A: reads pre-stored QoI DataFrame directly
df_mean_a, _, _ = calculate_qoi_statistics("ex2_mode_a.db", qoi_funcs)

# Mode B: recomputes qoi_funcs from stored MCDS objects
df_mean_b, _, _ = calculate_qoi_statistics("ex2_mode_b.db", qoi_funcs)

print("Mode A — mean QoIs:")
display(df_mean_a)

print("Mode B — mean QoIs (computed post-hoc from raw MCDS):")
display(df_mean_b)

# Mode B only: compute a QoI that was never defined at run time
new_qoi = {"dead_cells": lambda df_cell: len(df_cell[df_cell['dead'] == True])}
df_mean_new, _, _ = calculate_qoi_statistics("ex2_mode_b.db", new_qoi)
print("Mode B — new QoI computed without re-running:")
display(df_mean_new)
No QoI data provided, calculating QoIs from the database...
All samples in Samples table have corresponding entries in Output table.
Extracting QoIs from DataFrame...
No QoI data provided, calculating QoIs from the database...
All samples in Samples table have corresponding entries in Output table.
Calculating QoIs from mcds list...
Mode A — mean QoIs:
                 live_cells  interferon_mean
SampleID time
0        0.0         1060.0         0.000000
         360.0       1056.8         0.000845
         720.0       1053.2         0.000398
         1080.0      1023.0         0.002876
         1440.0      1008.8         0.001156
         1800.0       957.8         0.006680
         2160.0       916.0         0.003641
         2520.0       761.6         0.005035
         2880.0       711.6         0.003222
Mode B — mean QoIs (computed post-hoc from raw MCDS):
                 interferon_mean  live_cells
SampleID time
0        0.0            0.000000      1060.0
         360.0          0.000820      1056.6
         720.0          0.000497      1054.0
         1080.0         0.003034      1025.4
         1440.0         0.001482       998.8
         1800.0         0.005664       923.8
         2160.0         0.007086       717.8
         2520.0         0.001539       602.2
         2880.0         0.000851       596.4
No QoI data provided, calculating QoIs from the database...
All samples in Samples table have corresponding entries in Output table.
Calculating QoIs from mcds list...
Mode B — new QoI computed without re-running:
                 dead_cells
SampleID time
0        0.0            0.0
         360.0          0.0
         720.0          0.0
         1080.0         0.0
         1440.0         0.0
         1800.0         0.0
         2160.0         0.0
         2520.0         0.0
         2880.0         0.0
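
Because the result is an ordinary pandas DataFrame, standard MultiIndex tools apply. The sketch below rebuilds the first three Mode B rows by hand (values copied from the output above) and shows two common selections. With a real run you would use df_mean_b directly instead of the hand-built frame.

```python
import pandas as pd

# Hand-built copy of the first three Mode B rows shown above.
index = pd.MultiIndex.from_tuples(
    [(0, 0.0), (0, 360.0), (0, 720.0)], names=["SampleID", "time"]
)
df = pd.DataFrame(
    {
        "interferon_mean": [0.000000, 0.000820, 0.000497],
        "live_cells": [1060.0, 1056.6, 1054.0],
    },
    index=index,
)

# Time series of all QoIs for one sample (drops the SampleID level).
sample0 = df.xs(0, level="SampleID")

# Live cells relative to the initial count of each sample.
initial = df.groupby(level="SampleID")["live_cells"].transform("first")
live_fraction = df["live_cells"] / initial

print(sample0)
print(live_fraction.round(4))
```

The groupby keeps the calculation correct when the frame holds more than one SampleID, as it will in a sensitivity analysis.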

Rule of thumb:

  • Use Mode A when you already know your QoIs and plan to run many simulations (ex3+). Storage stays small, and calculate_qoi_statistics is fastest because it reads the stored QoI values instead of recomputing them.

  • Use Mode B during early exploration when you are not sure what to measure, or when you want to apply multiple analysis approaches (sensitivity analysis, calibration, topology) to the same runs without repeating them.

Next: ex3 shows how to scale Mode A to a full Sobol sensitivity analysis with generate_samples(N=8) and multi-process parallelization.