Model Calibration
This section covers the calibration methods available in UQ-PhysiCell: Bayesian Optimization and Approximate Bayesian Computation (ABC). The package also supports other forms of calibration through its API, including optimization algorithms such as least squares, Nelder-Mead, and genetic algorithms, as well as MCMC sampling, among others.
Bayesian Optimization
This module provides a Bayesian optimization framework for calibrating PhysiCell models. It uses multi-objective optimization to learn the Pareto front (see the BoTorch documentation for details). The framework is designed to efficiently find parameter configurations that minimize the discrepancy between model predictions and observed experimental data.
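The optimization problem is formulated as a multi-objective maximization over \(m\) quantities of interest:
\[ \max_{\theta \in \Theta} \; \big(f_1(\theta), f_2(\theta), \ldots, f_m(\theta)\big) \]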
Where:
\(\Theta\) is the feasible parameter space defined by bounds
\(\theta\) is the parameter vector to be optimized
\(f_i(\theta)\) is the fitness value for the i-th quantity of interest (QoI), measuring agreement between model predictions and observed data
The goal is to find the Pareto-optimal set of parameters that maximizes agreement across all QoIs
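To make the notion of Pareto optimality concrete, here is a minimal sketch, not the package's implementation. A candidate is Pareto-optimal when no other candidate is at least as good on every QoI and strictly better on at least one. The function name `pareto_mask` and the fitness values are hypothetical.

```python
import numpy as np


def pareto_mask(fitness):
    """Return a boolean mask marking non-dominated rows (maximization).

    fitness has shape (n_candidates, n_objectives); each column holds the
    fitness f_i(theta) of one QoI.
    """
    fitness = np.asarray(fitness, dtype=float)
    mask = np.ones(fitness.shape[0], dtype=bool)
    for i in range(fitness.shape[0]):
        # A row dominates row i if it is >= on all objectives and > on one.
        at_least_as_good = np.all(fitness >= fitness[i], axis=1)
        strictly_better = np.any(fitness > fitness[i], axis=1)
        if np.any(at_least_as_good & strictly_better):
            mask[i] = False
    return mask


# Hypothetical fitness values for five parameter vectors and two QoIs.
fitness = np.array([
    [0.9, 0.2],
    [0.6, 0.6],
    [0.2, 0.9],
    [0.5, 0.5],  # dominated by [0.6, 0.6]
    [0.1, 0.1],  # dominated by every other row
])
front = pareto_mask(fitness)
```

The first three rows trade one objective against the other, so none of them dominates another and all three belong to the front.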
1. Parameter Space (\(\Theta\))
The search space is defined by parameter bounds and types:
Real parameters: Continuous variables with lower and upper bounds
Integer parameters: Discrete variables with specified ranges
Categorical parameters: Discrete choices from predefined sets
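The three parameter types can be illustrated with a small, library-independent sketch. The dictionary layout, the parameter names, and the `sample_point` helper are assumptions made for illustration only; they are not the UQ-PhysiCell or BoTorch configuration format.

```python
import numpy as np

# Hypothetical search space with one parameter of each type.
search_space = {
    "cycle_rate": {"type": "real", "bounds": (1e-4, 1e-2)},
    "max_neighbors": {"type": "integer", "bounds": (1, 10)},
    "substrate": {"type": "categorical", "choices": ["oxygen", "glucose"]},
}


def sample_point(space, rng):
    """Draw one random parameter vector theta from the search space."""
    point = {}
    for name, spec in space.items():
        if spec["type"] == "real":
            low, high = spec["bounds"]
            point[name] = float(rng.uniform(low, high))
        elif spec["type"] == "integer":
            low, high = spec["bounds"]
            # endpoint=True makes the upper bound inclusive.
            point[name] = int(rng.integers(low, high, endpoint=True))
        elif spec["type"] == "categorical":
            point[name] = str(rng.choice(spec["choices"]))
        else:
            raise ValueError(f"Unknown parameter type: {spec['type']}")
    return point


rng = np.random.default_rng(0)
theta = sample_point(search_space, rng)
```

Random sampling like this is typically used only to seed an optimizer with initial points; the Bayesian optimization step then proposes new points from its surrogate model.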
2. Quantities of Interest (QoIs)
QoIs are model outputs that correspond to experimental observables:
Time series data: Cell counts, concentrations, spatial metrics over time
Aggregate metrics: Final values, peak values, areas under curves
Derived quantities: Ratios, differences, or complex functions of raw outputs
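As an illustration of these three kinds of QoI, the sketch below derives aggregate and derived quantities from a time series. The cell counts are made-up numbers, not PhysiCell output, and the variable names are hypothetical.

```python
import numpy as np

# Hypothetical time series: cell counts sampled every 60 minutes.
time_min = np.array([0.0, 60.0, 120.0, 180.0, 240.0])
live_cells = np.array([100.0, 130.0, 170.0, 160.0, 150.0])
dead_cells = np.array([0.0, 5.0, 15.0, 40.0, 60.0])

# Aggregate metrics computed from the live-cell time series.
final_value = live_cells[-1]
peak_value = live_cells.max()
time_of_peak = time_min[live_cells.argmax()]
# Area under the curve via the trapezoidal rule.
auc = np.sum(0.5 * (live_cells[1:] + live_cells[:-1]) * np.diff(time_min))

# Derived quantity: fraction of dead cells at each time point.
dead_fraction = dead_cells / (live_cells + dead_cells)
```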
3. Distance Metrics
The discrepancy between model predictions and observed data is quantified using distance metrics:
Sum of Squared Differences (\(L_2^2\) norm): \(d(\text{QoI}, \text{Obs}) = \sum_{i=1}^{n}(\text{QoI}_i - \text{Obs}_i)^2\). Penalizes large deviations heavily.
Manhattan Distance (\(L_1\) norm): \(d(\text{QoI}, \text{Obs}) = \sum_{i=1}^{n}|\text{QoI}_i - \text{Obs}_i|\). Robust to outliers.
Chebyshev Distance (\(L_\infty\) norm): \(d(\text{QoI}, \text{Obs}) = \max_{i=1,...,n}|\text{QoI}_i - \text{Obs}_i|\). Focuses on the maximum deviation.
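The three metrics translate directly into NumPy. This is a minimal sketch with hypothetical function names and made-up data, showing how a single outlying point affects each metric differently.

```python
import numpy as np


def sum_squared_differences(qoi, obs):
    """Squared L2 distance: sum of squared pointwise differences."""
    return float(np.sum((np.asarray(qoi) - np.asarray(obs)) ** 2))


def manhattan_distance(qoi, obs):
    """L1 distance: sum of absolute pointwise differences."""
    return float(np.sum(np.abs(np.asarray(qoi) - np.asarray(obs))))


def chebyshev_distance(qoi, obs):
    """L-infinity distance: largest absolute pointwise difference."""
    return float(np.max(np.abs(np.asarray(qoi) - np.asarray(obs))))


# Made-up model output and observations; the last point is an outlier.
qoi = np.array([10.0, 12.0, 15.0, 30.0])
obs = np.array([11.0, 12.0, 14.0, 20.0])

d_ssd = sum_squared_differences(qoi, obs)   # 1 + 0 + 1 + 100
d_l1 = manhattan_distance(qoi, obs)         # 1 + 0 + 1 + 10
d_linf = chebyshev_distance(qoi, obs)       # max(1, 0, 1, 10)
```

The outlier contributes 100 of the 102 units of the squared distance but only 10 of the 12 units of the Manhattan distance, which is why the latter is described as more robust to outliers.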
4. Fitness Functions
Distance values are transformed into fitness values (to be maximized) using one of two methods:
Standard Transformation (default): \( f(\theta) = \frac{1}{1 + d(\text{QoI}, \text{Obs})} \)
Exponential Transformation: \( f(\theta) = \exp(-d(\text{QoI}, \text{Obs})) \)
Both transformations ensure:
All objectives lie in the range \((0, 1]\) (values closer to 1 indicate a better fit, values closer to 0 a worse fit)
Numerical stability for optimization algorithms
Proper hypervolume computation for multi-objective optimization
The exponential transformation provides steeper gradients for small distances, which can be beneficial when fine-tuning parameters near optimal values. Enable the exponential transformation by setting use_exponential_fitness: True in bo_options.
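The two transformations can be compared side by side with a short sketch. The helper `fitness_from_distance` is hypothetical, and representing bo_options as a Python dictionary holding only the use_exponential_fitness key is an assumption made for illustration; the actual options structure may contain other settings.

```python
import numpy as np


def fitness_from_distance(distance, use_exponential_fitness=False):
    """Map a non-negative distance to a fitness value in (0, 1]."""
    d = np.asarray(distance, dtype=float)
    if use_exponential_fitness:
        return np.exp(-d)        # exponential transformation
    return 1.0 / (1.0 + d)       # standard transformation (default)


# Assumed layout: only the key named in the text is shown here.
bo_options = {"use_exponential_fitness": True}

distances = np.array([0.0, 0.5, 1.0, 10.0])
standard = fitness_from_distance(distances)
exponential = fitness_from_distance(distances, **bo_options)
```

Both transformations give a fitness of 1 at zero distance. Because \(e^{d} \geq 1 + d\) for \(d \geq 0\), the exponential fitness is never larger than the standard one, and it decays much faster for large distances.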
Approximate Bayesian Computation (ABC)
This module performs parameter inference using a Sequential Monte Carlo scheme (see the pyabc documentation for details). ABC-SMC constructs a sequence of intermediate posterior distributions that gradually approach the true, intractable posterior distribution. The ABC posterior at iteration \(t\) is:
\[ \pi_{\epsilon_t}(\theta|y_0) \propto f_{\epsilon_t}(y_0|\theta)\, \pi(\theta) \]
Where:
\(\pi(\theta)\) is the prior distribution of parameters.
\(\theta\) is the parameter vector to be inferred.
\(f_{\epsilon_t}(y_0|\theta)\) is the approximate likelihood based on the distance between simulated and observed data, with tolerance threshold \(\epsilon_t\).
\(y_0\) is the observed data.
\(\epsilon_t\) is the tolerance threshold at iteration \(t\), which typically decreases as \(t\) increases.
The goal is to obtain an approximation of the posterior distribution of parameters given the observational data \(y_0\).
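The scheme can be illustrated with a self-contained toy example. This is a minimal sketch of a population-based ABC-SMC sampler written with NumPy only; it does not use pyabc or the UQ-PhysiCell API. The toy model (a normal distribution with unknown mean), the uniform prior, the tolerance schedule, and the Gaussian perturbation kernel with twice the weighted particle variance are all assumptions chosen for illustration.

```python
import numpy as np


def simulate(theta, rng, n_obs=50):
    """Toy stochastic model: n_obs draws from Normal(theta, 1)."""
    return rng.normal(theta, 1.0, size=n_obs)


def distance(y, y0):
    """Distance between simulated and observed data (sample means)."""
    return abs(y.mean() - y0.mean())


def abc_smc(y0, epsilons, n_particles, rng, prior_bounds=(-5.0, 5.0)):
    """Return weighted particles approximating the ABC posterior."""
    low, high = prior_bounds
    prior_density = 1.0 / (high - low)  # uniform prior pi(theta)
    particles, weights = None, None
    for t, eps in enumerate(epsilons):
        if t > 0:
            # Kernel width: twice the weighted variance of the last population.
            sigma = np.sqrt(2.0 * np.cov(particles, aweights=weights))
        new_particles = np.empty(n_particles)
        new_weights = np.empty(n_particles)
        accepted = 0
        while accepted < n_particles:
            if t == 0:
                theta = rng.uniform(low, high)  # sample from the prior
            else:
                # Resample a previous particle and perturb it.
                theta = rng.choice(particles, p=weights) + sigma * rng.normal()
                if not (low <= theta <= high):
                    continue  # zero prior density
            if distance(simulate(theta, rng), y0) > eps:
                continue  # reject: simulation too far from the data
            if t == 0:
                weight = 1.0
            else:
                kernel = np.exp(-0.5 * ((theta - particles) / sigma) ** 2)
                kernel /= sigma * np.sqrt(2.0 * np.pi)
                # Importance weight: prior over the proposal mixture density.
                weight = prior_density / np.sum(weights * kernel)
            new_particles[accepted] = theta
            new_weights[accepted] = weight
            accepted += 1
        particles = new_particles
        weights = new_weights / new_weights.sum()
    return particles, weights


rng = np.random.default_rng(0)
y0 = simulate(2.0, rng)  # synthetic "observed" data with true theta = 2
particles, weights = abc_smc(y0, epsilons=[1.0, 0.5, 0.2], n_particles=200, rng=rng)
posterior_mean = float(np.sum(weights * particles))
```

Each population is accepted under a smaller tolerance than the previous one, so the weighted particles concentrate around parameter values whose simulations reproduce the observed sample mean.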