Noel Jeffrey Pinton
Department of Computer Science
University of the Philippines Cebu
What is parameter estimation?
Match sample to theory
Optimal estimation
Real-world ML examples
Inferring unknown distribution parameters from observed data samples.
A slightly biased estimator can have lower overall (mean squared) error than an unbiased one.
Match sample moments to theoretical moments to estimate parameters.
Theoretical moments: $m_k(\theta) = E[X^k]$
Sample moments: $\hat{m}_k = \frac{1}{n}\sum x_i^k$
Estimate $\mu$ and $\sigma^2$ for $N(\mu, \sigma^2)$
$m_1 = \mu$
$m_2 = \mu^2 + \sigma^2$
$\hat{\mu} = \bar{x}$
$\hat{\sigma}^2 = \frac{1}{n}\sum(x_i - \bar{x})^2$
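The Normal MoM estimates above can be checked numerically; a minimal sketch (simulated data, seed and sample size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=100_000)  # true mu = 5, sigma^2 = 4

# Match the first two sample moments to the theoretical ones
mu_hat = x.mean()                           # m1 = mu
sigma2_hat = ((x - x.mean()) ** 2).mean()   # m2 - m1^2 = sigma^2

print(mu_hat, sigma2_hat)
```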
Theory: $E[X] = \lambda$, so $\hat{\lambda} = \bar{x}$
$E[X] = \alpha\beta$, $Var(X) = \alpha\beta^2$
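Inverting these two equations gives $\hat{\alpha} = \bar{x}^2/s^2$ and $\hat{\beta} = s^2/\bar{x}$; a quick numerical check (simulated data with illustrative parameter values):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.gamma(shape=3.0, scale=2.0, size=200_000)  # true alpha = 3, beta = 2

xbar = x.mean()
s2 = x.var()               # (1/n) sample variance, matching the MoM moments

alpha_hat = xbar**2 / s2   # from E[X] = alpha*beta, Var(X) = alpha*beta^2
beta_hat = s2 / xbar

print(alpha_hat, beta_hat)
```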
Simple, consistent, general
Not optimal, may give invalid estimates
Find parameters that make observed data most likely.
Method: Solve $\frac{d\ell}{d\theta} = 0$, where $\ell(\theta)$ is the log-likelihood
Estimate $\mu$ and $\sigma^2$ for $N(\mu, \sigma^2)$
$\ell(\lambda) = (\sum x_i)\log\lambda - n\lambda + \text{const}$
$\frac{d\ell}{d\lambda} = \frac{\sum x_i}{\lambda} - n = 0 \Rightarrow \hat{\lambda} = \bar{x}$
Same as MoM for Poisson!
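A generic optimizer recovers the same closed-form answer $\hat{\lambda} = \bar{x}$; a sketch using scipy (simulated data, bounds are arbitrary):

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(2)
x = rng.poisson(lam=4.0, size=10_000)

# Negative log-likelihood, dropping the constant sum(log x_i!)
def nll(lam):
    return -(x.sum() * np.log(lam) - len(x) * lam)

res = minimize_scalar(nll, bounds=(1e-6, 100), method="bounded")
print(res.x, x.mean())  # the optimizer lands on the sample mean
```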
Exponential: $f(x) = \lambda e^{-\lambda x}$, MLE: $\hat{\lambda} = 1/\bar{x}$
Cramér–Rao bound: $Var(\hat{\theta}) \geq \frac{1}{I(\theta)}$, where $I(\theta)$ is the Fisher information
Higher information = lower variance
No closed-form solution
| | MoM | MLE |
|---|---|---|
| Speed | Fast | Varies |
| Efficiency | Lower | Optimal |
| Complexity | Simple | Complex |
Quick estimates, starting values, simple distributions
Optimal estimates, inference, model comparison
Use MoM estimates as starting values for MLE optimization.
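For example, a Gamma MLE has no closed form, but a numerical fit seeded with the MoM estimates converges quickly; a sketch (scipy's Nelder-Mead and the simulated data are assumptions):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

rng = np.random.default_rng(3)
x = rng.gamma(shape=2.5, scale=1.5, size=50_000)

# MoM starting values: alpha = xbar^2/s2, beta = s2/xbar
xbar, s2 = x.mean(), x.var()
start = np.array([xbar**2 / s2, s2 / xbar])

# Negative Gamma log-likelihood
def nll(params):
    alpha, beta = params
    if alpha <= 0 or beta <= 0:
        return np.inf
    return -np.sum((alpha - 1) * np.log(x) - x / beta
                   - gammaln(alpha) - alpha * np.log(beta))

res = minimize(nll, start, method="Nelder-Mead")
print(res.x)  # MLE, refined from the MoM start
```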
$y = \beta_0 + \beta_1 x + \epsilon$
$\hat{\beta}_1 = \frac{\sum(x_i-\bar{x})(y_i-\bar{y})}{\sum(x_i-\bar{x})^2}$
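The closed-form slope above can be evaluated directly; a sketch on simulated data (true coefficients are illustrative):

```python
import numpy as np

rng = np.random.default_rng(8)
x = rng.uniform(0, 10, size=1000)
y = 2.0 + 0.7 * x + rng.normal(0, 1, size=1000)  # true beta0 = 2, beta1 = 0.7

# OLS slope and intercept from the closed-form estimator
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
print(b0, b1)
```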
$P(Y=1) = \frac{1}{1+e^{-(\beta_0+\beta_1 X)}}$
No closed-form; requires numerical optimization.
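A minimal sketch of that numerical fit (simulated data; scipy's BFGS as the optimizer is an assumption):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)
n = 20_000
x = rng.normal(size=n)
p = 1 / (1 + np.exp(-(-0.5 + 1.2 * x)))  # true beta0 = -0.5, beta1 = 1.2
y = rng.binomial(1, p)

# Bernoulli negative log-likelihood, written stably with logaddexp
def nll(beta):
    z = beta[0] + beta[1] * x
    return np.sum(np.logaddexp(0, z) - y * z)

res = minimize(nll, x0=np.zeros(2), method="BFGS")
print(res.x)
```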
$f(x) = \sum_k \pi_k N(x|\mu_k, \sigma_k^2)$
E-step: compute soft cluster assignments (responsibilities)
M-step: update $\pi_k, \mu_k, \sigma_k^2$ from the weighted data
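A minimal 1-D, two-component EM sketch (initialization and iteration count are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(5)
# Simulated mixture: 40% N(0,1), 60% N(5,1)
x = np.concatenate([rng.normal(0, 1, 4000), rng.normal(5, 1, 6000)])

def normal_pdf(x, mu, var):
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

pi = np.array([0.5, 0.5])
mu = np.array([x.min(), x.max()])
var = np.array([1.0, 1.0])
for _ in range(50):
    # E-step: posterior responsibilities, shape (n, 2)
    dens = pi * np.stack([normal_pdf(x, mu[k], var[k]) for k in range(2)], axis=1)
    r = dens / dens.sum(axis=1, keepdims=True)
    # M-step: weighted updates of weights, means, variances
    nk = r.sum(axis=0)
    pi = nk / len(x)
    mu = (r * x[:, None]).sum(axis=0) / nk
    var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk

print(pi, mu, var)
```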
Yule-Walker equations
Kalman filter + optimization
MLE sensitive to outliers
Resample from data to estimate uncertainty.
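A bootstrap sketch for the standard error of a mean (resample count and data are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(6)
x = rng.exponential(scale=2.0, size=500)

# Resample with replacement and recompute the statistic each time
boot_means = np.array([
    rng.choice(x, size=len(x), replace=True).mean()
    for _ in range(2000)
])

se_boot = boot_means.std()
se_theory = x.std() / np.sqrt(len(x))  # analytic SE of the mean, for comparison
print(se_boot, se_theory)
```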
$AIC = -2\ell + 2k$
$BIC = -2\ell + k\log n$
Lower is better
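A sketch comparing two candidate models on the same data via AIC/BIC (the distributions are chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.exponential(scale=2.0, size=1000)
n = len(x)

# Exponential MLE: lambda_hat = 1/xbar, k = 1 parameter
lam = 1 / x.mean()
ll_exp = np.sum(np.log(lam) - lam * x)

# Normal MLE: mu_hat = xbar, sigma2_hat = (1/n) sum (x - xbar)^2, k = 2
mu, s2 = x.mean(), x.var()
ll_norm = np.sum(-0.5 * np.log(2 * np.pi * s2) - (x - mu) ** 2 / (2 * s2))

aic_exp, aic_norm = -2 * ll_exp + 2 * 1, -2 * ll_norm + 2 * 2
bic_exp, bic_norm = -2 * ll_exp + 1 * np.log(n), -2 * ll_norm + 2 * np.log(n)
print(aic_exp, aic_norm)  # lower for the exponential model on this data
```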
Use EDA and goodness-of-fit tests
Use bootstrap or Bayesian
Use robust methods
Use AIC/BIC, cross-validation
Match moments. Simple, quick, good for starting values.
Maximize likelihood. Optimal, efficient, asymptotically normal.
Parameter estimation is fundamental to statistical modeling and ML!
GMM, Regularization, Bayesian MCMC
scipy.optimize, statsmodels, PyMC
Ready for Module 2: Linear Regression
Questions?