# Bayesian Optimization for Hyperparameter Tuning: A Practical Guide
Grid search is dead. Random search is better but still wasteful. Bayesian optimization finds better hyperparameters in fewer evaluations by building a probabilistic model of the objective function.
## The Core Idea
Instead of evaluating at predetermined points, we:
- Build a surrogate model (typically a Gaussian process) of the objective function
- Use an acquisition function to decide where to evaluate next
- Update the surrogate with the new observation
- Repeat until budget exhausted
$$x_{n+1} = \arg\max_x \; \alpha(x; \mathcal{D}_n)$$

where $\alpha$ is the acquisition function and $\mathcal{D}_n = \{(x_i, y_i)\}_{i=1}^n$ are our observations.
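The loop above is only a few lines in practice. Here is a minimal sketch on a hypothetical 1D objective, using scikit-learn's `GaussianProcessRegressor` as the surrogate and Expected Improvement (minimization convention) as the acquisition function; the objective and all names are illustrative, not from any real tuning task:

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def objective(x):
    # Hypothetical expensive objective: a bumpy 1D loss surface
    return np.sin(3 * x) + 0.5 * (x - 2) ** 2

rng = np.random.default_rng(0)
X = rng.uniform(0, 4, size=(3, 1))            # a few initial random observations
y = objective(X).ravel()
candidates = np.linspace(0, 4, 200).reshape(-1, 1)

for _ in range(10):
    # 1. Fit the surrogate to everything observed so far
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True).fit(X, y)
    # 2. Maximize the acquisition function over a candidate grid
    mu, sigma = gp.predict(candidates, return_std=True)
    best = y.min()
    z = (best - mu) / (sigma + 1e-8)          # minimization convention
    ei = (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)
    x_next = candidates[np.argmax(ei)].reshape(1, -1)
    # 3. Evaluate and update the observation set
    X = np.vstack([X, x_next])
    y = np.append(y, objective(x_next).ravel())
```

A grid scan over candidates is fine in 1D; in higher dimensions the acquisition function is usually optimized with restarts of a local optimizer instead.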
## Gaussian Process Priors
A GP defines a distribution over functions:

$$f(x) \sim \mathcal{GP}\big(m(x),\, k(x, x')\big)$$

specified by a mean function $m$ and a covariance kernel $k$. The posterior after observing $\mathcal{D}_n = \{(x_i, y_i)\}_{i=1}^n$ gives us both a mean prediction and uncertainty:

$$\mu_n(x) = k(x, X)\,[K + \sigma_\varepsilon^2 I]^{-1} y, \qquad \sigma_n^2(x) = k(x, x) - k(x, X)\,[K + \sigma_\varepsilon^2 I]^{-1} k(X, x)$$

where $K$ is the kernel matrix over the observed inputs $X$ and $\sigma_\varepsilon^2$ is the observation noise.
The uncertainty estimate is what makes Bayesian optimization sample-efficient. High uncertainty regions are unexplored — the acquisition function balances exploitation (low predicted loss) with exploration (high uncertainty).
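This behavior is easy to see directly with scikit-learn's `GaussianProcessRegressor`: the predictive standard deviation collapses near observed points and reverts toward the prior far from them. A minimal sketch with toy data (the kernel is held fixed for illustration):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Three hypothetical (hyperparameter, loss) observations
X_train = np.array([[0.0], [1.0], [2.0]])
y_train = np.array([1.0, 0.2, 0.5])

# optimizer=None keeps the length scale fixed so the example is deterministic
gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), optimizer=None)
gp.fit(X_train, y_train)

# Predictive std is near zero at an observed point, large far from the data
_, sigma_at_data = gp.predict(np.array([[1.0]]), return_std=True)
_, sigma_far = gp.predict(np.array([[5.0]]), return_std=True)
```

The point at `x = 5.0` sits well outside the data, so its predictive standard deviation is close to the prior's, which is exactly what draws the acquisition function toward unexplored regions.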
## Expected Improvement
The most common acquisition function, Expected Improvement, measures the expected amount by which we'll improve over the current best $y^+$ (for loss minimization, $y^+ = \min_i y_i$):

$$\mathrm{EI}(x) = \mathbb{E}\big[\max(y^+ - f(x),\, 0)\big] = (y^+ - \mu(x))\,\Phi(z) + \sigma(x)\,\phi(z), \qquad z = \frac{y^+ - \mu(x)}{\sigma(x)}$$
This has a closed-form solution under the GP posterior, making it cheap to optimize.
```python
import numpy as np
from scipy.stats import norm

def expected_improvement(X, gp_model, best_y):
    """EI for minimization: expected reduction below the current best loss."""
    mu, sigma = gp_model.predict(X, return_std=True)
    z = (best_y - mu) / (sigma + 1e-8)  # small constant guards against sigma = 0
    ei = (best_y - mu) * norm.cdf(z) + sigma * norm.pdf(z)
    return ei
```
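The closed form can be sanity-checked against a brute-force Monte Carlo estimate of the improvement under a single Gaussian posterior. The numbers here are arbitrary toy values, using the minimization convention:

```python
import numpy as np
from scipy.stats import norm

# Toy posterior at one point: mean 0.5, std 0.3; current best loss 0.4
mu, sigma, best_y = 0.5, 0.3, 0.4

# Closed-form EI
z = (best_y - mu) / sigma
ei_closed = (best_y - mu) * norm.cdf(z) + sigma * norm.pdf(z)

# Monte Carlo estimate of E[max(best_y - f, 0)] under the same posterior
samples = np.random.default_rng(0).normal(mu, sigma, 1_000_000)
ei_mc = np.maximum(best_y - samples, 0).mean()
```

Note that EI is positive even though the posterior mean (0.5) is worse than the current best (0.4): the uncertainty leaves real probability mass below the incumbent, which is the exploration term at work.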
## When to Use What
| Method | Evaluations | Best For |
|---|---|---|
| Grid Search | Exponential in dimensions | ≤3 hyperparameters |
| Random Search | Any budget | Initial exploration |
| Bayesian Opt | 10-200 | Expensive evaluations |
| Multi-fidelity | 100-1000 | Cheap approximations available |
For most deep learning tasks, I recommend starting with random search for the first 20 evaluations, then switching to Bayesian optimization with Expected Improvement for refinement.
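That hybrid schedule is straightforward to wire up by hand. A sketch under stated assumptions: `train_model` is a hypothetical stand-in for an expensive training run returning validation loss, searched over learning rate on a log scale:

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

def train_model(lr):
    # Hypothetical validation loss with an optimum near lr = 1e-3
    return (np.log10(lr) + 3) ** 2 + 0.1

rng = np.random.default_rng(1)

# Phase 1: 20 random evaluations over log10(lr) in [-6, 0]
log_lrs = rng.uniform(-6, 0, size=(20, 1))
losses = np.array([train_model(10 ** x[0]) for x in log_lrs])

# Phase 2: Expected Improvement refinement over a candidate grid
grid = np.linspace(-6, 0, 300).reshape(-1, 1)
for _ in range(10):
    gp = GaussianProcessRegressor(normalize_y=True).fit(log_lrs, losses)
    mu, sigma = gp.predict(grid, return_std=True)
    z = (losses.min() - mu) / (sigma + 1e-8)
    ei = (losses.min() - mu) * norm.cdf(z) + sigma * norm.pdf(z)
    x_next = grid[np.argmax(ei)].reshape(1, -1)
    log_lrs = np.vstack([log_lrs, x_next])
    losses = np.append(losses, train_model(10 ** x_next[0, 0]))
```

Searching in log space matters here: learning rates span orders of magnitude, and a GP with a stationary kernel models the log-transformed axis far better than the raw one.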