API Reference

Core UQ-PhysiCell Module

The main module provides the core PhysiCell model interface and utilities.

UQ-PhysiCell: Uncertainty Quantification for PhysiCell Models

This package provides tools for uncertainty quantification, sensitivity analysis, and Bayesian optimization of PhysiCell models.

class uq_physicell.PhysiCell_Model(configFilePath: str, keyModel: str, verbose: bool = False)[source]

Bases: object

A class to manage PhysiCell model configurations and executions.

This class handles the setup of PhysiCell models, including reading configuration files, generating XML files, and running simulations with specified parameters.

Parameters:

configFilePath (str) – Path to the configuration file (INI format).
keyModel (str) – Key in the configuration file to identify the model.
verbose (bool) – If True, prints detailed information during execution.

info() → None[source]: Print model configuration information.

RunModel(SampleID: int, ReplicateID: int, Parameters: ndarray | dict = {}, ParametersRules: ndarray | dict = {}, RemoveConfigFile: bool = True, SummaryFunction: None | str = None) → None | DataFrame[source]

Run a single simulation with specified parameters.

Parameters:

SampleID (int) – Identifier for the parameter sample
ReplicateID (int) – Identifier for the simulation replicate
Parameters (np.ndarray or dict, optional) – Parameter values for XML configuration
ParametersRules (np.ndarray or dict, optional) – Parameter values for RULES configuration
RemoveConfigFile (bool, optional) – If True, removes the generated XML and RULES files after simulation
SummaryFunction (function, optional) – Function to summarize simulation output

run_simulation_subprocess(XMLFile, sample_id=None, replicate_id=None)[source]

Start the simulation as a subprocess and return the process handle.

Parameters:

XMLFile (str) – Path to the XML configuration file for the simulation
sample_id (int, optional) – Identifier for the parameter sample
replicate_id (int, optional) – Identifier for the simulation replicate

Returns:

Process handle for the running simulation

Return type:

subprocess.Popen

terminate_all_simulations()[source]

Terminate all active simulation processes.

Returns:

Dictionary of process IDs and their termination return codes

Return type:

dict

get_active_processes_info()[source]

Get information about all active processes.

Returns:

Dictionary containing information about all active processes

Return type:

dict
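The process-handling methods above follow a familiar pattern: keep each subprocess.Popen handle in a registry, then terminate the tracked processes and collect their return codes. The following is a minimal sketch of that pattern, not the package's implementation; the registry `active` and the helpers `start` and `terminate_all` are hypothetical names, and a sleeping Python interpreter stands in for a PhysiCell executable.

```python
import subprocess
import sys

# Hypothetical registry of running processes, keyed by process ID.
active = {}

def start(cmd):
    """Start a command as a subprocess and remember its handle."""
    proc = subprocess.Popen(cmd)
    active[proc.pid] = proc
    return proc

def terminate_all():
    """Terminate every tracked process and return {pid: return code}."""
    codes = {}
    for pid, proc in list(active.items()):
        if proc.poll() is None:          # None means the process is still running
            proc.terminate()
        codes[pid] = proc.wait(timeout=10)
        del active[pid]
    return codes

# Stand-in for a long-running simulation (assumption: any executable works here).
handle = start([sys.executable, "-c", "import time; time.sleep(30)"])
codes = terminate_all()
```

The return code of a terminated process is platform dependent, so callers should treat any non-None value as "finished" rather than compare against a specific number.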

remove_io_folders()[source]

Model Analysis Module

Tools for sensitivity analysis, parameter sampling, and model analysis.

class uq_physicell.model_analysis.ma_context.ModelAnalysisContext(db_path: str, model_config: dict, sampler: str, params_info: dict, qois_info: dict, qoi_def: dict = {}, parallel_method: str = 'inter-process', num_workers: int = 1, summary_function=None, logger: Logger = None)[source]

Bases: object

Context manager for running PhysiCell model analysis simulations.

This class manages the configuration, database setup, and execution context for running sensitivity analysis and uncertainty quantification simulations on PhysiCell models.

Parameters:

db_path (str) – Path to the SQLite database file for storing results.
model_config (dict) – Dictionary containing PhysiCell model configuration. Must include ‘ini_path’ and ‘struc_name’ keys.
sampler (str) – Name of the sampling method to use (e.g., ‘LHS’, ‘Sobol’, ‘OAT’).
params_info (dict) – Dictionary containing parameter definitions with keys for each parameter name and values containing ‘ref_value’, ‘lower_bound’, ‘upper_bound’, and ‘perturbation’ information.
qois_info (dict) – Dictionary containing Quantities of Interest definitions.
qoi_def (dict) – Dictionary mapping names to first-class objects (for example, functions) that can be referenced inside qoi_functions lambda strings. Defaults to an empty dict.
parallel_method (str, optional) – Parallelization method. Options are: ‘inter-process’ (single node), ‘inter-node’ (MPI), or ‘serial’. Defaults to ‘inter-process’.
num_workers (int, optional) – Number of parallel workers for inter-process execution. Defaults to 1.
summary_function (callable, optional) – Custom function for summarizing simulation output. Defaults to None.

Raises:

ImportError – If required parallelization libraries are not available.
ValueError – If invalid parallel_method is specified.
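As an illustration of the dictionary arguments described above, the sketch below builds a model_config and a params_info dict and checks them before they would be handed to ModelAnalysisContext. The key names come from the parameter descriptions; the paths, parameter names, values, and the helper `check_inputs` are hypothetical, and the check itself is an assumed pre-flight step, not something the package is documented to perform.

```python
# All paths, names, and values below are hypothetical placeholders.
model_config = {"ini_path": "config/model.ini", "struc_name": "my_model"}

params_info = {
    "cell_cycle_rate": {
        "ref_value": 1.0e-3,
        "lower_bound": 5.0e-4,
        "upper_bound": 2.0e-3,
        "perturbation": [1.0, 5.0, 10.0],
    },
    "migration_speed": {
        "ref_value": 0.5,
        "lower_bound": 0.1,
        "upper_bound": 1.0,
        "perturbation": 10.0,
    },
}

REQUIRED_MODEL_KEYS = {"ini_path", "struc_name"}
REQUIRED_PARAM_KEYS = {"ref_value", "lower_bound", "upper_bound", "perturbation"}

def check_inputs(model_config, params_info):
    """Raise ValueError if a documented key is missing or bounds are inconsistent."""
    missing = REQUIRED_MODEL_KEYS - model_config.keys()
    if missing:
        raise ValueError(f"model_config is missing keys: {sorted(missing)}")
    for name, info in params_info.items():
        missing = REQUIRED_PARAM_KEYS - info.keys()
        if missing:
            raise ValueError(f"{name} is missing keys: {sorted(missing)}")
        if not info["lower_bound"] <= info["ref_value"] <= info["upper_bound"]:
            raise ValueError(f"{name}: ref_value lies outside its bounds")
    return True

ok = check_inputs(model_config, params_info)
```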

generate_samples(N: int = None, M: int = 4, seed: int = 42)[source]

set_samples(samples)[source]

Set user-defined parameter combinations for the ‘User-defined’ sampler.

Parameters:

samples – Either a dict mapping integer sample IDs to parameter dicts, e.g. {0: {'param1': 1.0, 'param2': 2.0}, 1: {'param1': 1.5, 'param2': 3.0}}, or a list of parameter dicts (IDs are assigned automatically starting from 0), e.g. [{'param1': 1.0}, {'param1': 1.5}].
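The two accepted input forms can be reduced to a single dictionary keyed by sample ID. The sketch below shows one way to do that; `normalize_samples` is a hypothetical helper written for illustration and is not part of the package.

```python
def normalize_samples(samples):
    """Return {sample_id: parameter_dict} for either accepted input form."""
    if isinstance(samples, dict):
        # Already keyed by sample ID; copy so the caller's data is not shared.
        return {int(sample_id): dict(params) for sample_id, params in samples.items()}
    # A list: IDs are assigned automatically starting from 0.
    return {sample_id: dict(params) for sample_id, params in enumerate(samples)}

from_dict = normalize_samples({0: {"param1": 1.0, "param2": 2.0},
                               1: {"param1": 1.5, "param2": 3.0}})
from_list = normalize_samples([{"param1": 1.0}, {"param1": 1.5}])
```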

run()[source]

Run simulations — convenience alias for run_simulations(context).

Allows the fluent pattern:

context.generate_samples(N=8)
context.run()

cancelled()[source]

Check if cancellation has been requested.

Returns:

True if cancellation was requested, False otherwise

Return type:

bool

request_cancellation()[source]

Request cancellation of all simulations.

This sets the internal cancellation flag to True, which will be checked by the simulation process at various points.

Returns:

Always returns True

Return type:

bool

uq_physicell.model_analysis.ma_context.run_simulations(context: ModelAnalysisContext)[source]

Run PhysiCell simulations based on the provided analysis context.

This function executes sensitivity analysis simulations using the specified parallelization method (serial, inter-process, or MPI). It manages database initialization, parameter sampling, simulation execution, and result storage.

Parameters:

context (ModelAnalysisContext) – The analysis context containing model configuration, sampling parameters, parallelization settings, and database information.

Raises:

ValueError – If there are issues with PhysiCell model initialization, database operations, or simulation execution.
ImportError – If required parallelization libraries are missing.

Note

This function handles three execution modes:

Serial: Single-threaded execution for small analyses
Inter-process: Multi-processing on a single node using concurrent.futures
Inter-node: Distributed execution across multiple nodes using MPI
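The dispatch between execution modes can be pictured roughly as follows. This is a sketch under stated assumptions, not the package's code: `run_all` and `fake_simulation` are hypothetical, a thread pool from concurrent.futures stands in for the process pool so the example stays self-contained, and the MPI branch is left unimplemented.

```python
from concurrent.futures import ThreadPoolExecutor

def fake_simulation(task):
    """Hypothetical stand-in for one simulation run."""
    sample_id, replicate_id = task
    return sample_id * 10 + replicate_id

def run_all(tasks, simulate, parallel_method="inter-process", num_workers=1):
    """Run every (sample_id, replicate_id) task with the chosen method."""
    if parallel_method == "serial":
        return [simulate(task) for task in tasks]
    if parallel_method == "inter-process":
        # Assumption: a thread pool replaces the process pool in this sketch.
        with ThreadPoolExecutor(max_workers=num_workers) as pool:
            return list(pool.map(simulate, tasks))
    if parallel_method == "inter-node":
        raise NotImplementedError("MPI dispatch is outside the scope of this sketch")
    raise ValueError(f"Invalid parallel_method: {parallel_method!r}")

tasks = [(sample_id, replicate_id) for sample_id in range(3) for replicate_id in range(2)]
serial = run_all(tasks, fake_simulation, "serial")
pooled = run_all(tasks, fake_simulation, "inter-process", num_workers=2)
```

Because map preserves input order, both modes return results in the same order as the task list.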

Sampling Methods

uq_physicell.model_analysis.samplers.run_global_sampler(params_dict: dict, sampler: str, N: int = None, M: int = 4, seed: int = 42) → dict[source]

Generate parameter samples using global sampling methods.

This function creates parameter samples using various global sampling strategies implemented in SALib, suitable for global sensitivity analysis methods.

Parameters:

params_dict (dict) – Dictionary containing parameter definitions with ‘lower_bound’ and ‘upper_bound’ for each parameter.
sampler (str) – Sampling method to use. Supported methods include:
- ‘Fast’: FAST sampling for Fourier Amplitude Sensitivity Test
- ‘Fractional Factorial’: Fractional factorial design
- ‘Finite Difference’: Finite difference sampling
- ‘Latin hypercube sampling (LHS)’: Latin hypercube sampling
- ‘Sobol’: Sobol sequence sampling
N (int, optional) – Number of samples to generate. If None, method-specific defaults are used. Defaults to None.
M (int, optional) – Number of harmonics for FAST sampler. Only used with ‘Fast’ method. Defaults to 4.
seed (int, optional) – Random seed for reproducible sampling. Defaults to 42.

Returns:

Dictionary with sample IDs as keys and parameter dictionaries as values. Each sample dictionary contains parameter names as keys and sampled values as values.

Return type:

dict

Raises:

ValueError – If an unsupported sampling method is specified or if required parameters are missing.
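To make the input and output shapes concrete, here is a plain-Python sketch of Latin hypercube sampling that accepts the same kind of params_dict and returns the same kind of result dictionary. It is a stand-in written for illustration, not the SALib routine the package uses; `latin_hypercube` and the example parameter names are hypothetical.

```python
import random

def latin_hypercube(params_dict, N, seed=42):
    """Return {sample_id: {param_name: value}} with one draw per stratum."""
    rng = random.Random(seed)
    samples = {sample_id: {} for sample_id in range(N)}
    for name, info in params_dict.items():
        low, high = info["lower_bound"], info["upper_bound"]
        # One uniform draw inside each of N equal-width strata of [0, 1).
        points = [(stratum + rng.random()) / N for stratum in range(N)]
        # Shuffle so strata are paired at random across parameters.
        rng.shuffle(points)
        for sample_id, unit_value in enumerate(points):
            samples[sample_id][name] = low + unit_value * (high - low)
    return samples

params_dict = {
    "rate": {"lower_bound": 0.0, "upper_bound": 10.0},
    "speed": {"lower_bound": 1.0, "upper_bound": 2.0},
}
lhs_samples = latin_hypercube(params_dict, N=4, seed=42)
```

Fixing the seed makes repeated calls return identical samples, which mirrors the purpose of the seed argument described above.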

uq_physicell.model_analysis.samplers.run_local_sampler(params_dict: dict, sampler: str = 'OAT') → dict[source]

Generate parameter samples using local sampling methods for sensitivity analysis.

This function creates parameter samples using local sampling strategies, particularly the One-At-a-Time (OAT) method, where parameters are perturbed individually around reference values.

Parameters:

params_dict (dict) – Dictionary containing parameter definitions. Each parameter should have:
- ‘ref_value’: Reference value for the parameter
- ‘perturbation’: Single value or list of perturbation percentages
sampler (str, optional) – Local sampling method to use. Currently only ‘OAT’ (One-At-a-Time) is supported. Defaults to ‘OAT’.

Returns:

Dictionary with sample IDs as keys and parameter dictionaries as values. Sample 0 contains the reference values, and subsequent samples contain perturbations of individual parameters.

Return type:

dict

Note

For OAT sampling, the first sample (ID 0) contains all reference values. Subsequent samples perturb one parameter at a time while keeping others at their reference values. If multiple perturbations are specified for a parameter, multiple samples are generated for that parameter.
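The sample layout described in the note can be sketched as below. The helper `oat_samples` is hypothetical, and the way a percentage is applied (scaling the reference value by 1 + pct/100) is an assumption made for this example; the package may apply perturbations differently, for instance in both directions.

```python
def oat_samples(params_dict):
    """Return {sample_id: {param_name: value}} for One-At-a-Time sampling."""
    reference = {name: info["ref_value"] for name, info in params_dict.items()}
    samples = {0: dict(reference)}          # sample 0 holds all reference values
    sample_id = 1
    for name, info in params_dict.items():
        perturbations = info["perturbation"]
        if not isinstance(perturbations, (list, tuple)):
            perturbations = [perturbations]  # accept a single value
        for pct in perturbations:
            perturbed = dict(reference)      # all other parameters stay at reference
            # Assumption: pct percent scales the reference value by (1 + pct / 100).
            perturbed[name] = info["ref_value"] * (1.0 + pct / 100.0)
            samples[sample_id] = perturbed
            sample_id += 1
    return samples

oat = oat_samples({
    "a": {"ref_value": 2.0, "perturbation": [10.0, -10.0]},
    "b": {"ref_value": 5.0, "perturbation": 20.0},
})
```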

Sensitivity Analysis

uq_physicell.model_analysis.sensitivity_analysis.get_global_SA_parameters(db_file)[source]

uq_physicell.model_analysis.sensitivity_analysis.get_local_SA_parameters(db_file)[source]

uq_physicell.model_analysis.sensitivity_analysis.run_global_sa(params_dict: dict, qoi_names: list, df_qois: DataFrame, method: str, qoi_time_values: dict = None) → tuple[source]

Run global sensitivity analysis using the specified method.

Parameters:

params_dict (dict) – Dictionary containing parameter names, properties, and sample values. Must include a ‘samples’ key with parameter sample dictionaries.
qoi_names (list) – List of QoI names to analyze.
df_qois (pd.DataFrame) – DataFrame containing QoI values with columns formatted as ‘{qoi_name}_{time_index}’ or ‘{qoi_name}’ and time as an index.
method (str) – Name of the sensitivity analysis method to use. Supported methods include ‘FAST - Fourier Amplitude Sensitivity Test’, ‘Sobol Sensitivity Analysis’, ‘PAWN Sensitivity Analysis’, etc.
qoi_time_values (dict, optional) – Dictionary mapping time labels to their values. If None, time labels will be extracted from df_qois. Defaults to None.

Returns:

A tuple containing:

sa_results_dict (dict): Nested dictionary with sensitivity analysis results. Structure: {qoi_name: {time_label: analysis_results}}
qoi_time_values (dict): Dictionary mapping time labels to their values, sorted by time.

Return type:

tuple

Raises:

ValueError – If there’s a mismatch between number of samples and QoI results, or if the specified method fails during analysis.
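The df_qois column convention, ‘{qoi_name}_{time_index}’ or a bare ‘{qoi_name}’, can be decoded as in the sketch below. `split_qoi_column` is a hypothetical helper, and it assumes the time index is a non-negative integer.

```python
def split_qoi_column(column, qoi_names):
    """Return (qoi_name, time_index); time_index is None for a bare QoI column."""
    # Try longer names first so that one QoI name being a prefix of another is handled.
    for name in sorted(qoi_names, key=len, reverse=True):
        if column == name:
            return name, None
        if column.startswith(name + "_"):
            suffix = column[len(name) + 1:]
            if suffix.isdigit():             # assumption: integer time index
                return name, int(suffix)
    raise ValueError(f"Column {column!r} does not match any QoI name")

qoi_names = ["live_cells", "live_cells_total"]
parsed = [split_qoi_column(column, qoi_names)
          for column in ["live_cells_2", "live_cells_total_3", "live_cells_total"]]
```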

uq_physicell.model_analysis.sensitivity_analysis.OAT_analyze(dic_samples: dict, dic_qoi: dict, sample_ref: int = 0) → dict[source]

Perform One-At-a-Time (OAT) analysis on the simulation results.

Parameters:

dic_samples (dict) – Dictionary of parameter sample dictionaries, where each key is a sample ID and each value is a dictionary of parameter names and values.
dic_qoi (dict) – Dictionary of Quantities of Interest (QoI) values, where each key is a sample ID and each value is the corresponding QoI result.
sample_ref (int, optional) – Sample ID to use as the reference for OAT analysis. Defaults to 0.

Returns:

Dictionary containing sensitivity indices for each parameter. Keys are parameter names and values are arrays of sensitivity indices for each perturbation.

Return type:

dict

Note

The specified sample is treated as the reference sample. All other samples are compared against this reference to compute sensitivity indices.
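The exact index formula is not stated here, so the sketch below uses one common normalized definition as an assumption: the relative change in the QoI divided by the relative change in the perturbed parameter. `oat_indices` is a hypothetical helper that takes the same kind of inputs; the package's OAT_analyze may compute its indices differently.

```python
def oat_indices(dic_samples, dic_qoi, sample_ref=0):
    """Return {param_name: [index per perturbation]} relative to the reference sample."""
    ref_params = dic_samples[sample_ref]
    ref_qoi = dic_qoi[sample_ref]
    indices = {name: [] for name in ref_params}
    for sample_id, params in dic_samples.items():
        if sample_id == sample_ref:
            continue
        changed = [name for name in params if params[name] != ref_params[name]]
        if len(changed) != 1:
            raise ValueError(f"Sample {sample_id} does not perturb exactly one parameter")
        name = changed[0]
        # Assumed definition: relative QoI change per relative parameter change.
        relative_param = (params[name] - ref_params[name]) / ref_params[name]
        relative_qoi = (dic_qoi[sample_id] - ref_qoi) / ref_qoi
        indices[name].append(relative_qoi / relative_param)
    return indices

dic_samples = {0: {"a": 1.0, "b": 2.0}, 1: {"a": 1.1, "b": 2.0}, 2: {"a": 1.0, "b": 2.5}}
dic_qoi = {0: 100.0, 1: 120.0, 2: 100.0}
sensitivity = oat_indices(dic_samples, dic_qoi)
```

In this toy data a 10% increase in a raises the QoI by 20%, while changing b has no effect.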

uq_physicell.model_analysis.sensitivity_analysis.run_local_sa(params_dict: dict, qoi_names: list, df_qois: DataFrame, method: str = 'OAT', sample_ref: int = 0) → tuple[source]

Run local sensitivity analysis using the One-At-a-Time (OAT) method.

Parameters:

params_dict (dict) – Dictionary containing parameter names, properties, and sample values. Must include a ‘samples’ key with parameter sample dictionaries.
qoi_names (list) – List of QoI names to analyze.
df_qois (pd.DataFrame) – DataFrame containing QoI values with columns formatted as ‘{qoi_name}_{time_index}’ or ‘{qoi_name}’ and time as an index.
method (str, optional) – Local sensitivity analysis method. Currently only ‘OAT’ (One-At-a-Time) is supported. Defaults to “OAT”.
sample_ref (int, optional) – Sample ID to use as the reference for OAT analysis. Defaults to 0.

Returns:

A tuple containing:

sa_results_dict (dict): Nested dictionary with sensitivity analysis results. Structure: {qoi_name: {time_label: {param_name: sensitivity_index}}}
qoi_time_values (dict): Dictionary mapping time labels to their values, sorted by time.

Return type:

tuple

Note

The OAT method computes sensitivity indices by comparing parameter perturbations against a reference sample (given by sample_ref, sample 0 by default). Results are summed across all perturbations for each parameter.

uq_physicell.model_analysis.sensitivity_analysis.get_sa_results(dbfile: str, qoi_names: list, df_qois: DataFrame, method: str, sample_ref: int = 0, qoi_time_values: dict = None) → tuple[source]

Get sensitivity analysis results for Global and Local methods.

Parameters:

dbfile (str) – Path to the database file containing simulation results.
qoi_names (list) – List of QoI names to analyze.
df_qois (pd.DataFrame) – DataFrame containing QoI values with columns formatted as ‘{qoi_name}_{time_index}’ or ‘{qoi_name}’ and time as an index.
method (str) – Name of the sensitivity analysis method to use. Supported methods include ‘FAST - Fourier Amplitude Sensitivity Test’, ‘Sobol Sensitivity Analysis’, ‘PAWN Sensitivity Analysis’, etc. For local sensitivity analysis, use “OAT”.
sample_ref (int, optional) – Sample ID to use as the reference for OAT analysis. Defaults to 0. Only used for local sensitivity analysis with method “OAT”.
qoi_time_values (dict, optional) – Dictionary mapping time labels to their values. If None, time labels will be extracted from df_qois. Defaults to None.

Returns:

A tuple containing:

sa_results_dict (dict): Nested dictionary with sensitivity analysis results. Structure: {qoi_name: {time_label: analysis_results}}
qoi_time_values (dict): Dictionary mapping time labels to their values, sorted by time.

Return type:

tuple

Utils

uq_physicell.model_analysis.utils.mcds_list_to_qoi_df_for_sa(recreated_qoi_funcs, all_sample_ids, chunk_size, db_file, verbose=False) → DataFrame[source]

Convert a list of MCDS objects to a DataFrame of quantities of interest for sensitivity analysis.

This function processes a list of MCDS simulation results, extracting relevant quantities of interest (QoIs) at each time point and organizing them into a structured DataFrame suitable for sensitivity analysis.

Parameters:

recreated_qoi_funcs (dict) – Dictionary of QoI functions where keys are QoI names and values are callable functions.
all_sample_ids (list) – List of all sample IDs to process.
chunk_size (int) – Number of samples to process in each chunk to manage memory usage.
db_file (str) – Path to the database file containing simulation output.

Returns:

DataFrame with calculated QoI values indexed by SampleID and ReplicateID, with one column for each combination of QoI and time point.

Return type:

pd.DataFrame

uq_physicell.model_analysis.utils.mcds_list_to_qoi_df_long(recreated_qoi_funcs, all_sample_ids, chunk_size, db_file, verbose=False) → DataFrame[source]

Convert a list of MCDS objects to a DataFrame of quantities of interest in long format.

This function processes a list of MCDS simulation results, extracting relevant quantities of interest (QoIs) at each time point and organizing them into a long structured DataFrame.

Parameters:

recreated_qoi_funcs (dict) – Dictionary of QoI functions where keys are QoI names and values are callable functions.
all_sample_ids (list) – List of all sample IDs to process.
chunk_size (int) – Number of samples to process in each chunk to manage memory usage.
db_file (str) – Path to the database file containing simulation output.

Returns:

DataFrame with calculated QoI values indexed by SampleID and ReplicateID, with columns for each QoI - columns combined with time points.

Return type:

pd.DataFrame

uq_physicell.model_analysis.utils.mcds_list_to_qoi_df_for_calib(recreated_qoi_funcs, all_sample_ids, chunk_size, db_file, verbose=False) → DataFrame[source]

Convert a list of MCDS objects to a DataFrame of quantities of interest for calibration.

This function processes a list of MCDS simulation results, extracting relevant quantities of interest (QoIs) and organizing them into a structured DataFrame suitable for calibration tasks.

Parameters:

recreated_qoi_funcs (dict) – Dictionary of QoI functions where keys are QoI names and values are callable functions.
all_sample_ids (list) – List of all sample IDs to process.
chunk_size (int) – Number of samples to process in each chunk to manage memory usage.
db_file (str) – Path to the database file containing simulation output.

Returns:

DataFrame with calculated QoI values indexed by SampleID and ReplicateID, with one column for each QoI. The columns are not combined with time points.

Return type:

pd.DataFrame

uq_physicell.model_analysis.utils.get_qoi_from_db_file(db_file: str, qoi_names: list) → DataFrame[source]

Extract quantities of interest (QoIs) from a database file containing simulation results.

Returns a long-format DataFrame with one row per (SampleID, time, ReplicateID), consistent with the raw-MCDS storage path. Old Mode-A databases (Data column contains precomputed QoI DataFrames) are handled transparently.

Parameters:

db_file (str) – Path to the SQLite database containing simulation results.
qoi_names (list) – List of QoI names to extract.

Returns:

Long-format DataFrame with columns [SampleID, time, ReplicateID, <qoi_names>], sorted by (SampleID, time, ReplicateID).

Return type:

pd.DataFrame
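To relate the long format to the ‘{qoi_name}_{time_index}’ column layout expected by the sensitivity analysis functions, the sketch below pivots long-format rows into one record per (SampleID, ReplicateID). Plain dictionaries stand in for DataFrames, and `long_to_wide` is a hypothetical helper, not a package function.

```python
def long_to_wide(rows, qoi_names):
    """Pivot long rows into {(SampleID, ReplicateID): {'<qoi>_<time_index>': value}}."""
    times = sorted({row["time"] for row in rows})
    time_index = {time: index for index, time in enumerate(times)}
    wide = {}
    for row in rows:
        key = (row["SampleID"], row["ReplicateID"])
        record = wide.setdefault(key, {})
        index = time_index[row["time"]]
        for qoi in qoi_names:
            record[f"{qoi}_{index}"] = row[qoi]
    return wide

rows = [
    {"SampleID": 0, "time": 0.0, "ReplicateID": 0, "live_cells": 100},
    {"SampleID": 0, "time": 60.0, "ReplicateID": 0, "live_cells": 130},
    {"SampleID": 1, "time": 0.0, "ReplicateID": 0, "live_cells": 100},
    {"SampleID": 1, "time": 60.0, "ReplicateID": 0, "live_cells": 90},
]
wide = long_to_wide(rows, ["live_cells"])
```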

uq_physicell.model_analysis.utils.calculate_qoi_from_db_file(db_file: str, qoi_functions: dict, qoi_def: dict = {}, chunk_size: int = 10, mode='long', verbose=False) → DataFrame[source]

Calculate quantities of interest from sensitivity analysis database results.

This function loads simulation results from a database in chunks and applies QoI functions to extract meaningful metrics from the time-series data. Processing in chunks helps avoid excessive memory usage for large databases.

Parameters:

db_file (str) – Path to the SQLite database containing simulation results.
qoi_functions (dict) – Dictionary of QoI functions where keys are QoI names and values are lambda functions or string representations.
qoi_def (dict) – Dictionary mapping names to first-class objects (for example, functions) that can be referenced in qoi_functions lambda strings. For example, given the function definition:

    def my_func():
        print('hello world!')
        return 0

the qoi_def dict would be: {'my_func': my_func}
chunk_size (int, optional) – Number of samples to process at a time. Default is 10. Adjust based on available memory and data size.
mode – Specify the form of the result dataframe. Possible modes are sa, calib, and long. The default is long.

Returns:

DataFrame with calculated QoI values indexed by SampleID and ReplicateID, with columns for each QoI.

Return type:

pd.DataFrame

Example

>>> qoi_funcs = {
...     'live_cells': lambda df: len(df[df['dead'] == False]),
...     'dead_cells': lambda df: len(df[df['dead'] == True])
... }
>>> qoi_df = calculate_qoi_from_db_file('study.db', qoi_funcs, chunk_size=20)
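The interplay between lambda strings and qoi_def can be illustrated as follows. This is a sketch of one way such strings could be turned into callables, not necessarily how the package does it; `build_qoi_function` and `count_above` are hypothetical, and a plain list stands in for the DataFrame a real QoI function would receive.

```python
def build_qoi_function(spec, qoi_def=None):
    """Return a callable from either a callable or a lambda string."""
    if callable(spec):
        return spec
    # The names in qoi_def must be the lambda's globals: a lambda body resolves
    # free names through its globals when called, not through eval's locals.
    namespace = dict(qoi_def or {})
    return eval(spec, namespace)  # only evaluate strings from a trusted source

def count_above(values, threshold):
    """Helper referenced by name inside a lambda string."""
    return sum(1 for value in values if value > threshold)

qoi_def = {"count_above": count_above}
qoi_functions = {
    "n_large": "lambda values: count_above(values, 10)",
    "total": lambda values: sum(values),
}
compiled = {name: build_qoi_function(spec, qoi_def) for name, spec in qoi_functions.items()}
result = {name: function([5, 12, 30]) for name, function in compiled.items()}
```

Without the 'count_above' entry in qoi_def, calling the first QoI function would raise a NameError, which is why helper functions have to be registered by name.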

uq_physicell.model_analysis.utils.get_mean_std_qois(df_qois: DataFrame, filter_columns: list = []) → DataFrame[source]

Calculate the mean and standard deviation of quantities of interest (QoIs) across replicates.

This function computes the mean and standard deviation QoI values for each parameter sample across all replicates, providing a central tendency and variability measure for the QoI estimates.

Parameters:

df_qois (pd.DataFrame) – DataFrame containing QoI values with SampleID, ReplicateID, and QoI columns.
filter_columns (list, optional) – List of columns to include in the calculation. If None, all numeric columns are used.

Returns:

A tuple containing:

pd.DataFrame: DataFrame containing the mean QoI values for each SampleID, indexed by SampleID and ReplicateID.
pd.DataFrame: DataFrame containing the standard deviation QoI values for each SampleID, indexed by SampleID and ReplicateID.

Return type:

tuple

Note

The mean QoI values are calculated by averaging the QoI estimates across all replicates for each parameter sample. This provides a central tendency measure for the QoI estimates, which can be used for sensitivity analysis and comparison between different parameter samples.

uq_physicell.model_analysis.utils.get_relative_mcse_qois(df_mean: DataFrame, df_std: DataFrame, num_replicates: int, time_columns: list) → DataFrame[source]

Calculate the relative Monte Carlo Standard Error (MCSE) for QoI estimates.

This function computes the relative MCSE for each QoI across replicates, providing insight into the uncertainty of the QoI estimates due to finite sampling in the simulations.

Parameters:

df_mean (pd.DataFrame) – DataFrame containing the mean QoI values for each SampleID, indexed by SampleID and ReplicateID.
df_std (pd.DataFrame) – DataFrame containing the standard deviation QoI values for each SampleID, indexed by SampleID and ReplicateID.
num_replicates (int) – The number of replicates used to calculate the mean and standard deviation of the QoIs.
time_columns (list) – List of column names corresponding to time points in the DataFrame, which should not be included in the MCSE calculation.

Returns:

DataFrame containing the relative MCSE values for each QoI, indexed by SampleID and ReplicateID.

Return type:

pd.DataFrame

Note

Relative Monte Carlo Standard Error (MCSE) is calculated as the standard deviation of the QoI across replicates divided by the square root of the number of replicates, and then normalized by the mean QoI value to express it as a percentage. This provides insight into the uncertainty of the QoI estimates due to finite sampling in the simulations.

< 1% (Excellent): This is the gold standard. If your relative MCSE is under 1%, your mean estimate is highly stable and precise. You can confidently use this metric for sensitivity analysis or publication.

1% to 5% (Good / Acceptable): For stochastic biological simulations like PhysiCell, getting under 5% is generally considered reliable and practical.

5% to 10% (Caution): You can use these metrics to observe broad trends, but small differences between parameters might just be noise. You likely need to run more replicates.

> 10% (Unreliable): The metric is too noisy. If you are running 50+ replicates and still have a relative MCSE > 10%, that specific QoI (Quantity of Interest) is likely a poor choice, or your biological system is fundamentally chaotic in that aspect.
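As a worked example of the calculation and thresholds in this note, the sketch below computes the relative MCSE for the replicate values of one QoI at one sample and assigns it to a band. The helpers `relative_mcse` and `classify` are hypothetical; using the sample standard deviation (n - 1 denominator) and the absolute value of the mean are assumptions made for this example.

```python
import math
import statistics

def relative_mcse(replicate_values):
    """Relative MCSE in percent: (std / sqrt(n)) / |mean| * 100."""
    n = len(replicate_values)
    mean = statistics.mean(replicate_values)
    std = statistics.stdev(replicate_values)  # assumption: sample std (n - 1)
    # Undefined when the mean is zero; a real implementation must guard this case.
    return 100.0 * (std / math.sqrt(n)) / abs(mean)

def classify(percent):
    """Map a relative MCSE (in percent) to the bands listed in the note."""
    if percent < 1.0:
        return "Excellent"
    if percent <= 5.0:
        return "Good / Acceptable"
    if percent <= 10.0:
        return "Caution"
    return "Unreliable"

stable = relative_mcse([98, 102, 100, 100])   # tight replicates
noisy = relative_mcse([50, 150, 100, 100])    # widely scattered replicates
labels = (classify(stable), classify(noisy))
```

Because the standard error shrinks with the square root of the replicate count, halving the relative MCSE takes roughly four times as many replicates.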

uq_physicell.model_analysis.utils.get_summary_statistics_qois(df_qois: DataFrame) → tuple[source]

Calculate summary statistics (mean, standard deviation, and relative MCSE) of quantities of interest.

Parameters:

df_qois (pd.DataFrame) – DataFrame containing QoI values with SampleID and ReplicateID columns.

Returns:

A tuple containing:

pd.DataFrame: DataFrame with statistical summaries (mean) of QoIs grouped by SampleID, with columns for each QoI statistic.
pd.DataFrame: DataFrame with standard deviation of QoIs grouped by SampleID, with columns for each QoI statistic.
pd.DataFrame: DataFrame with relative Monte Carlo Standard Error (MCSE) of QoIs grouped by SampleID, with columns for each QoI statistic.

Return type:

tuple

uq_physicell.model_analysis.utils.calculate_qoi_statistics(db_file_path: str, qoi_funcs: dict, df_qois_data: DataFrame = None, ignore_db_consistency: bool = False, qoi_def: dict = {}, chunk_size: int = 10) → tuple[source]

Calculate statistical summaries (mean, standard deviation, and relative MCSE) of quantities of interest across replicates.

This function computes the mean, standard deviation, and relative Monte Carlo Standard Error (MCSE) of QoI values across simulation replicates for each parameter sample, enabling uncertainty quantification.

Parameters:

db_file_path (str) – Path to the database file for context.
qoi_funcs (dict) – Dictionary of QoI functions where keys are QoI names and values are lambda functions or None.
df_qois_data (pd.DataFrame) – DataFrame containing QoI values with SampleID, ReplicateID, and QoI columns. Default is None, in which case the function will attempt to load QoI data from the database.
ignore_db_consistency (bool) – If True, bypasses the database consistency check.
qoi_def (dict) – Dictionary mapping names to first-class objects (for example, functions) that can be referenced in qoi_funcs lambda strings. For example, given the function definition:

    def my_func():
        print('hello world!')
        return 0

the qoi_def dict would be: {'my_func': my_func}
chunk_size (int, optional) – Number of samples to process at a time when loading from the database. Default is 10. Adjust based on available memory and data size.

Returns:

A tuple containing:

df_mean (pd.DataFrame): DataFrame with statistical summaries (mean) of QoIs grouped by SampleID, with columns for each QoI statistic.
df_std (pd.DataFrame): DataFrame with standard deviation of QoIs grouped by SampleID, with columns for each QoI statistic.
df_relative_mcse (pd.DataFrame): DataFrame with relative Monte Carlo Standard Error (MCSE) of QoIs grouped by SampleID, with columns for each QoI statistic.

Return type:

tuple

Note

Relative Monte Carlo Standard Error (MCSE) is calculated as the standard deviation of the QoI across replicates divided by the square root of the number of replicates, and then normalized by the mean QoI value to express it as a percentage. This provides insight into the uncertainty of the QoI estimates due to finite sampling in the simulations.

< 1% (Excellent): This is the gold standard. If your relative MCSE is under 1%, your mean estimate is highly stable and precise. You can confidently use this metric for sensitivity analysis or publication.

1% to 5% (Good / Acceptable): For stochastic biological simulations like PhysiCell, getting under 5% is generally considered reliable and practical.

5% to 10% (Caution): You can use these metrics to observe broad trends, but small differences between parameters might just be noise. You likely need to run more replicates.

> 10% (Unreliable): The metric is too noisy. If you are running 50+ replicates and still have a relative MCSE > 10%, that specific QoI (Quantity of Interest) is likely a poor choice, or your biological system is fundamentally chaotic in that aspect.

Raises:

ValueError – If no QoI functions are defined or data format is invalid.

Example

>>> qoi_funcs = {
...     'live_cells': lambda df: len(df[df['dead'] == False]),
...     'dead_cells': lambda df: len(df[df['dead'] == True])
... }
>>> df_mean, df_std, df_mcse = calculate_qoi_statistics('study.db', qoi_funcs, df_qois_data=qoi_data)

uq_physicell.model_analysis.utils.apply_pca_to_qois(df_mean: DataFrame, latent_dim: int = 3, seed: int = None)[source]

Apply PCA to the selected QoIs.

This function uses PCA (scikit-learn) as a linear dimensionality reduction technique on the QoI matrix (samples x features).

Parameters:

df_mean (pd.DataFrame) – DataFrame with QoI mean values indexed by SampleID.
latent_dim (int) – Dimension of the latent encoding.
seed (int) – Random seed for reproducibility (used if PCA requires it). If None, no seed is set.

Returns:

Dictionary with the following keys:

'method': 'pca'
'encoder_output': np.ndarray (n_samples x latent_dim)
'reconstruction': np.ndarray (n_samples x n_features)
'model': PCA object
'scaler': StandardScaler object

Return type:

dict

uq_physicell.model_analysis.utils.apply_autoencoder_to_qois(df_mean: DataFrame, latent_dim: int = 2, epochs: int = 200, batch_size: int = 32, verbose: bool = False, seed: int = None)[source]

Apply an autoencoder to the selected QoIs.

This function attempts to use PyTorch to train a small autoencoder on the QoI matrix (samples x features). If a seed is provided, results will be reproducible.

Parameters:

df_mean (pd.DataFrame) – DataFrame with QoI mean values indexed by SampleID.
latent_dim (int) – Dimension of the latent encoding.
epochs (int) – Training epochs for the PyTorch autoencoder.
batch_size (int) – Batch size for training.
verbose (bool) – Verbosity flag.
seed (int) – Random seed for reproducibility. If None, no seed is set.

Returns:

Dictionary with the following keys:

'method': 'torch'
'encoder_output': np.ndarray (n_samples x latent_dim)
'reconstruction': np.ndarray (n_samples x n_features)
'model': trained model object
'scaler': StandardScaler object

Return type:

dict

uq_physicell.model_analysis.utils.regression_accuracy_parameters(df_parameters: DataFrame, encoder_output: ndarray) → DataFrame[source]

Calculate regression accuracy of parameters from the latent encoding.

This function trains a regression model (e.g., Random Forest) to predict each parameter from the latent encoding and evaluates the R² score for each parameter.

Parameters:

df_parameters (pd.DataFrame) – DataFrame containing parameter values indexed by SampleID.
encoder_output (np.ndarray) – Latent encoding of the QoI data (n_samples x latent_dim).

Returns:

DataFrame containing R² scores for each parameter, indexed by parameter name.

R² ≈ 1.0: Perfect fit. The model explains all the variance in the parameter values from the latent encoding.
R² < 0.5: Poor fit. The model does not capture the relationship between the latent encoding and the parameter values well.
R² ≈ 0: No fit. The model does not explain any of the variance in the parameter values from the latent encoding, indicating no relationship.
R² < 0: Worse than no fit. The model performs worse than simply predicting the mean parameter value for all samples, suggesting a very poor relationship between the latent encoding and the parameter values.

Return type:

pd.DataFrame
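The R² interpretation above follows from the standard definition, R² = 1 - SS_res / SS_tot. The sketch below evaluates it for three sets of predictions to show the perfect, zero, and negative cases; `r_squared` is a hypothetical helper written in plain Python.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((true - pred) ** 2 for true, pred in zip(y_true, y_pred))
    ss_tot = sum((true - mean) ** 2 for true in y_true)
    return 1.0 - ss_res / ss_tot

y_true = [1.0, 2.0, 3.0, 4.0]
perfect = r_squared(y_true, [1.0, 2.0, 3.0, 4.0])   # predictions match exactly
baseline = r_squared(y_true, [2.5, 2.5, 2.5, 2.5])  # always predict the mean
reversed_fit = r_squared(y_true, [4.0, 3.0, 2.0, 1.0])  # worse than the mean
```

Predicting the mean for every sample gives exactly zero, which is why a negative score means the regression is worse than that baseline.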

uq_physicell.model_analysis.utils.align_params_to_qois(df_params: DataFrame, df_qois: DataFrame) → DataFrame[source]

Align parameter DataFrame to match the multi-index structure of QoI DataFrame.

If df_qois has a multi-index (SampleID, time) but df_params only has SampleID, this function replicates parameter rows for each time point so that both DataFrames can be joined on (SampleID, time).

String/categorical columns in df_params are automatically encoded to numeric values using LabelEncoder to ensure compatibility with scikit-learn regression models.

Parameters:

df_params (pd.DataFrame) – Parameter DataFrame indexed by SampleID.
df_qois (pd.DataFrame) – QoI DataFrame, possibly with multi-index (SampleID, time).

Returns:

Parameter DataFrame aligned to match df_qois index structure, with categorical columns encoded as numeric.

Return type:

pd.DataFrame
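The two steps described above, encoding string columns and replicating parameter rows per time point, can be sketched with plain dictionaries standing in for DataFrames. `encode_categoricals` and `align_params` are hypothetical helpers; the encoding mimics LabelEncoder by numbering the sorted unique values from 0, and it assumes a categorical column contains only strings.

```python
def encode_categoricals(params):
    """Replace string-valued parameters with integer codes (sorted classes, from 0)."""
    encoded = {sample_id: dict(row) for sample_id, row in params.items()}
    names = sorted({name for row in params.values() for name in row})
    for name in names:
        column = [row[name] for row in params.values()]
        if any(isinstance(value, str) for value in column):
            codes = {label: code for code, label in enumerate(sorted(set(column)))}
            for sample_id in encoded:
                encoded[sample_id][name] = codes[encoded[sample_id][name]]
    return encoded

def align_params(params, qoi_index):
    """Repeat each sample's parameters for every (SampleID, time) pair in qoi_index."""
    encoded = encode_categoricals(params)
    return {(sample_id, time): dict(encoded[sample_id]) for sample_id, time in qoi_index}

params = {0: {"rate": 0.1, "mode": "slow"}, 1: {"rate": 0.2, "mode": "fast"}}
qoi_index = [(0, 0.0), (0, 60.0), (1, 0.0), (1, 60.0)]
aligned = align_params(params, qoi_index)
```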

uq_physicell.model_analysis.utils.find_optimal_qoi_set(df_qois: DataFrame, df_params: DataFrame) → list[source]

Use Recursive Feature Elimination with Cross-Validation (RFECV) to find the optimal set of QoIs that best predict the parameters.

This function trains a regression model (e.g., Random Forest) to predict parameters from the QoIs and uses RFECV to identify the most important QoIs for accurate parameter prediction. The optimal set of QoIs is determined based on the R² score obtained through cross-validation.

Parameters:

df_qois (pd.DataFrame) – DataFrame containing QoI mean values indexed by SampleID and time.
df_params (pd.DataFrame) – DataFrame containing parameter values indexed by SampleID and time.

Returns:

List of optimal QoI names that best predict the parameters based on RFECV analysis.

Return type:

list

uq_physicell.model_analysis.utils.recursive_feature_elimination(df_qois_mean, df_qois_mcse, df_params, autoencoder_params={}, mcse_threshold=0.1, correlation_threshold=0.95, verbose=False)[source]

Perform smart feature elimination using a balanced autoencoder-first approach.

Philosophy (Hybrid): remove only pathological features and preserve useful noise:

1. Remove EXTREME outliers (MCSE > 10% by default, configurable)
2. Remove highly correlated redundancy (>95%, configurable)
3. Apply Autoencoder on cleaned feature set
4. Use RFECV on latent space to find optimal encoding
5. Validate with Synthetic Recovery Test

This avoids treating every noisy feature as non-informative while still removing truly problematic features.

Parameters:

df_qois_mean (pd.DataFrame) – DataFrame containing mean QoI values indexed by SampleID and time.
df_qois_mcse (pd.DataFrame) – DataFrame containing relative MCSE values for QoIs indexed by SampleID and time.
df_params (pd.DataFrame) – DataFrame containing parameter values indexed by SampleID.
autoencoder_params (dict) – Parameters for autoencoder (latent_dim, epochs, batch_size, seed).
mcse_threshold (float) – Threshold for removing EXTREME noise (default 0.10 = 10%). Only removes pathological features.
correlation_threshold (float) – Threshold for removing redundant correlations (default 0.95 = 95%). Only removes near-duplicates.
verbose (bool) – Print detailed pipeline stages.

Returns:

Dictionary with the following keys:

‘removed_extreme_noise’: list of features removed as outliers
‘removed_redundant’: list of highly correlated features removed
‘cleaned_qois’: list of QoIs kept for autoencoder
‘correlation_matrix’: full pairwise correlation matrix
‘full_autoencoder_results’: autoencoder results on cleaned set
‘full_regression_r2’: R² from full set
‘final_qois’: QoI names selected by RFECV
‘reduced_autoencoder_results’: autoencoder on final selected QoIs
‘reduced_regression_r2’: R² from reduced set
‘synthetic_recovery_test’: Full vs Reduced R² comparison

Return type:

dict
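
The two cleaning stages that precede the autoencoder (the MCSE filter and the correlation filter) can be sketched in pandas as follows. This is an illustration under stated assumptions: the helper `prefilter_qois` is hypothetical, a QoI is assumed to be dropped when its mean relative MCSE exceeds the threshold, and the later column of each highly correlated pair is the one removed. The library may aggregate MCSE or break ties differently.

```python
import numpy as np
import pandas as pd

def prefilter_qois(df_mean, df_mcse, mcse_threshold=0.1, correlation_threshold=0.95):
    """Hypothetical helper: drop extreme-noise QoIs, then near-duplicate QoIs."""
    # Stage 1 (assumed rule): mean relative MCSE above the threshold.
    noisy = [c for c in df_mean.columns if df_mcse[c].mean() > mcse_threshold]
    kept = df_mean.drop(columns=noisy)

    # Stage 2: inspect each pair once via the strict upper triangle
    # of the absolute correlation matrix; drop the later column of a pair.
    corr = kept.corr().abs()
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    redundant = [c for c in upper.columns if (upper[c] > correlation_threshold).any()]
    return noisy, redundant, kept.drop(columns=redundant).columns.tolist()

# Toy QoI means and relative MCSE values (all values are made up).
df_mean = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.1],  # nearly a scaled copy of "a"
    "c": [4.0, 1.0, 3.0, 2.0],
    "d": [0.5, 0.1, 0.9, 0.3],
})
df_mcse = pd.DataFrame({
    "a": [0.01] * 4, "b": [0.02] * 4, "c": [0.03] * 4, "d": [0.5] * 4,
})
noisy, redundant, cleaned = prefilter_qois(df_mean, df_mcse)
```

With these toy inputs, "d" is removed for extreme noise and "b" as a near-duplicate of "a", which leaves "a" and "c" as the cleaned set passed on to later stages.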

Bayesian Optimization Module

Multi-objective Bayesian optimization for model calibration.

class uq_physicell.bo.bo_context.CalibrationContext(db_path: str, obsData: str | dict, obsData_columns: dict, model_config: dict, qoi_functions: dict, bo_options: dict, distance_functions: dict = None, search_space: dict = None, qoi_def: dict = {}, logger: Logger = None)[source]

Bases: object

Context for single-objective or multi-objective Bayesian optimization calibration.

Parameters:

db_path (str) – Path to the database file for storing and retrieving samples.
obsData (str or dict) – Path or dict containing the observed data.
obsData_columns (dict) – Dictionary mapping QoI names to their corresponding columns in the observed data.
model_config (dict) – Configuration dictionary for the PhysiCell model, including paths and structure names.
qoi_functions (dict) – Dictionary of functions to compute quantities of interest (QoIs) from model outputs.
qoi_def (dict) – Dictionary mapping names to first-class objects that can be referenced in qoi_functions lambda strings.
distance_functions (dict) – Dictionary of functions to compute distances between model outputs and observed data.
search_space (dict) – Dictionary defining the search space for parameters, including bounds and types.
bo_options (dict) – Options for Bayesian Optimization, including sampling parameters:
- ‘use_correlated_gp’ (bool): If True, use MultiTaskGP to model correlations between objectives. If False (default), use independent GPs per objective.
- Other options: num_initial_samples, num_iterations, batch_size_per_iteration, etc.
logger (logging.Logger) – Logger instance for logging messages during the calibration process.

default_run_single_replicate(sample_id: int, replicate_id: int, params: dict) → dict[source]

Run a single replicate of the PhysiCell model. This function is responsible for executing the model with the given parameters and returning the results.

Parameters:

sample_id (int) – Unique identifier for the sample being processed.
replicate_id (int) – Unique identifier for the replicate being processed.
params (dict) – Dictionary of parameters to be used in the model run.

Returns:

Dictionary of model outputs.

Return type:

dict

default_aggregation_func(replicate_results: list, sample_id: int) → tuple[source]

Aggregate results from multiple replicates. This function computes the mean and standard deviation for each key in the results.

Parameters:

replicate_results (list) – List of dictionaries containing the results from each replicate.

Returns:

A tuple containing the aggregated results, noise estimates, and a dictionary of all results.

Return type:

tuple
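
As a rough illustration of this kind of aggregation, the sketch below computes a per-key mean and sample standard deviation. It assumes each replicate returns scalar values; the helper name `aggregate_replicates` and the toy keys are hypothetical, and the actual method may operate on richer outputs such as time series.

```python
from statistics import mean, stdev

def aggregate_replicates(replicate_results):
    """Per-key mean and sample standard deviation (scalar outputs assumed)."""
    keys = replicate_results[0].keys()
    # Collect every replicate's value for each key.
    all_results = {k: [r[k] for r in replicate_results] for k in keys}
    means = {k: mean(v) for k, v in all_results.items()}
    # stdev needs at least two values; report 0.0 for a single replicate.
    stds = {k: (stdev(v) if len(v) > 1 else 0.0) for k, v in all_results.items()}
    return means, stds, all_results

replicates = [
    {"live_cells": 100.0, "dead_cells": 10.0},
    {"live_cells": 110.0, "dead_cells": 14.0},
    {"live_cells": 90.0, "dead_cells": 12.0},
]
means, stds, all_results = aggregate_replicates(replicates)
```

The three returned objects mirror the documented tuple: aggregated results, noise estimates, and the per-key collection of all replicate values.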

evaluate_params(params, sample_index)[source]

Evaluate a single parameter set by running replicates in parallel, aggregating outputs, and computing multi-objective metrics.

Parameters:

params (dict) – Dictionary of parameters to be evaluated.
sample_index (int) – Index of the sample being evaluated.

Returns:

A tuple of three dictionaries:

objectives (dict) – The computed objectives for the given parameters.
obj_noise (dict) – The standard deviation of the objectives across replicates.
dic_results (dict) – The results from all replicates.

Return type:

tuple
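
The overall flow (parallel replicates, then objectives and their spread) might look like the sketch below. Everything in it is a stand-in: `toy_replicate` replaces a real simulation, absolute error replaces the configured distance functions, a thread pool replaces whatever parallel backend the library uses, and `dic_results` is returned as a plain list rather than a dictionary.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean, stdev

def toy_replicate(params, replicate_id):
    # Hypothetical stand-in for one simulation run; the replicate ID
    # adds a deterministic offset in place of stochastic variation.
    return {"final_count": params["growth"] * 100.0 + replicate_id}

def evaluate(params, observed, n_replicates=3):
    # Run all replicates concurrently and keep results in replicate order.
    with ThreadPoolExecutor(max_workers=n_replicates) as pool:
        results = list(
            pool.map(lambda r: toy_replicate(params, r), range(n_replicates))
        )
    objectives, obj_noise = {}, {}
    for qoi, target in observed.items():
        # Assumed distance: absolute error of each replicate to the target.
        distances = [abs(res[qoi] - target) for res in results]
        objectives[qoi] = mean(distances)
        obj_noise[qoi] = stdev(distances)
    return objectives, obj_noise, results

objectives, obj_noise, dic_results = evaluate(
    {"growth": 0.5}, {"final_count": 50.0}
)
```

Reporting the spread across replicates alongside each objective is what lets a downstream surrogate model treat the objective values as noisy observations.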

save_results_to_db(sample_index: int, objectives: dict, noise_std: dict, dic_results: dict)[source]

Save results to database.

Parameters:

sample_index (int) – Index of the sample being saved.
objectives (dict) – Dictionary containing the objective values.
noise_std (dict) – Dictionary containing the noise standard deviations.
dic_results (dict) – Dictionary containing the results to save.

generate_initial_samples_from_db(start_sample_id: int = 0, iteration_id: int = 0) → tuple[source]

Generate initial samples from existing database for Bayesian optimization.

This function retrieves initial samples from the database, evaluates them with the QoI functions, and prepares the training tensors for the BO pipeline. Used for resume functionality.

Parameters:

start_sample_id (int, optional) – Starting sample ID. Defaults to 0 for new databases.
iteration_id (int, optional) – Current iteration ID for tracking.

Returns:

(train_x, train_obj, train_obj_std) - Tensors ready for BO pipeline.

Return type:

tuple

generate_and_evaluate_samples(start_sample_id: int = 0, iteration_id: int = 0) → tuple[source]

Generate and evaluate samples for Bayesian optimization using Sobol sequences.

This function generates samples in the search space using Sobol sequences, evaluates them with the model, and saves results to the database. Used for both initial sampling and restart functionality.

Parameters:

start_sample_id (int, optional) – Starting sample ID. Defaults to 0 for new databases; otherwise the caller should provide the appropriate starting ID.
iteration_id (int, optional) – Current iteration ID for tracking.

Returns:

(train_x, train_obj, train_obj_std) - Tensors ready for BO pipeline.

Return type:

tuple
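
To illustrate the Sobol sampling step on its own, the sketch below draws points with SciPy's quasi-Monte Carlo module and scales them to parameter bounds. The search-space names and bounds are hypothetical, and the library itself may generate Sobol points through a different backend (for example, its torch-based dependencies).

```python
import numpy as np
from scipy.stats import qmc

# Hypothetical search space: parameter name -> (lower bound, upper bound).
search_space = {"cycle_rate": (1e-4, 1e-2), "migration_speed": (0.1, 2.0)}
lower = np.array([bounds[0] for bounds in search_space.values()])
upper = np.array([bounds[1] for bounds in search_space.values()])

# Draw 2**3 = 8 scrambled Sobol points in the unit hypercube,
# then rescale each dimension to its parameter bounds.
sampler = qmc.Sobol(d=len(search_space), scramble=True)
unit_samples = sampler.random_base2(m=3)
samples = qmc.scale(unit_samples, lower, upper)
```

Each row of `samples` is one candidate parameter set. Drawing a power-of-two number of points preserves the balance properties of the Sobol sequence.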

load_existing_data() → tuple[source]

Load existing data from the database for resume functionality.

Returns:

A tuple containing training data tensors, latest iteration, and hypervolume.

Return type:

tuple

update_bo_iterations(additional_iterations: int)[source]: Update the number of BO iterations to include additional iterations for resume.

analyze_convergence(hvs_list: list, train_obj: Any, train_obj_std: Any, train_x: Any, iteration: int) → dict[source]

Noise-aware convergence analysis that distinguishes between:

1. True convergence (optimization found optimal solutions)
2. Noise-limited convergence (converged given noise level)
3. Stagnation with good coverage (likely converged to optimal region)
4. Stagnation with poor coverage (stuck in suboptimal region)
5. Still in progress

Parameters:

hvs_list (list) – History of hypervolume values
train_obj (torch.Tensor) – Objective values (used for Pareto analysis)
train_obj_std (torch.Tensor) – Standard deviation across replicates (noise estimate)
train_x (torch.Tensor) – Parameter values (normalized)
iteration (int) – Current iteration

Returns:

Convergence analysis results with recommendations

Return type:

dict
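
A much-simplified view of noise-aware convergence checking is sketched below. It only compares the recent relative hypervolume gain against a tolerance and a noise level, and it collapses the five documented categories into three. The function name, window size, thresholds, and status labels are illustrative assumptions, not the method's actual logic.

```python
def classify_hypervolume_trend(hvs_list, noise_level, window=3, tol=1e-3):
    """Rough status from recent hypervolume gains (illustrative thresholds)."""
    if len(hvs_list) <= window:
        return "in_progress"  # too little history to judge
    previous, latest = hvs_list[-window - 1], hvs_list[-1]
    rel_gain = (latest - previous) / max(abs(previous), 1e-12)
    if rel_gain > max(tol, noise_level):
        return "in_progress"  # gain clearly exceeds the noise level
    # Small gains: either within the noise band, or essentially flat.
    return "noise_limited" if rel_gain > tol else "stagnated"

growing = classify_hypervolume_trend([0.1, 0.2, 0.4, 0.8, 1.6], noise_level=0.05)
within_noise = classify_hypervolume_trend([1.0, 1.0, 1.0, 1.0, 1.02], noise_level=0.05)
flat = classify_hypervolume_trend([1.0] * 6, noise_level=0.05)
```

Separating stagnation with good coverage from stagnation with poor coverage would additionally require looking at how the evaluated points are spread over the normalized parameter space, which this sketch omits.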

uq_physicell.bo.bo_context.single_objective_bayesian_optimization(calib_context, train_x, train_obj, train_obj_std, start_iteration)[source]: Single-objective Bayesian optimization loop.

uq_physicell.bo.bo_context.multi_objective_bayesian_optimization(calib_context, train_x, train_obj, train_obj_std, start_iteration, latest_hypervolume, resume_from_db)[source]: Multi-objective Bayesian optimization loop.

uq_physicell.bo.bo_context.run_bayesian_optimization(calib_context: CalibrationContext, additional_iterations: int | None = None)[source]

Execute the complete Bayesian optimization process.

Parameters:

calib_context (CalibrationContext) – The calibration context containing all configuration
additional_iterations (Optional[int]) – Additional iterations for resume functionality

Note

The Bayesian Optimization module requires additional dependencies (botorch, gpytorch, torch). Install them with: pip install botorch gpytorch torch
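
Before importing the module, the presence of these optional packages can be checked with the standard library. The helper below is a hypothetical convenience and not part of uq_physicell.

```python
from importlib.util import find_spec

BO_DEPENDENCIES = ("botorch", "gpytorch", "torch")

def missing_bo_dependencies(packages=BO_DEPENDENCIES):
    """Return the names of packages that cannot be found on the import path."""
    return [name for name in packages if find_spec(name) is None]

missing = missing_bo_dependencies()
if missing:
    print("Missing optional dependencies:", ", ".join(missing))
```

An empty list means all three packages are importable in the current environment.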

Plotting and Visualization

uq_physicell.bo.plots.plot_parameter_space(df_samples: DataFrame, df_param_space: DataFrame, params: dict = None, real_value: dict = None, axis=None)[source]

Plot the parameter space from the samples DataFrame.

Parameters:

df_samples – DataFrame containing the samples.
df_param_space – DataFrame defining the search space for each parameter.
params – Dictionary with parameter names as keys and their best values as values (optional).
real_value – Dictionary with real parameter values to plot (optional).
axis – Matplotlib axis to plot on (optional).

Returns: