CMSC 178DA | Week 10 — Session 1

Predicting the Future with Data

Time Series Fundamentals

Department of Computer Science

University of the Philippines Cebu

Lecture 19: Fundamentals & Smoothing

Every Quarter, $8.9 Billion Flows into the Philippines

OFW remittances follow the same seasonal pattern year after year.

~8% of Philippine GDP
2.2M OFWs abroad
Dec peak every year

This repeating pattern is a time series. Today we learn to analyze it.

Agenda

Session 1 Objectives

Components

Decompose time series into trend, seasonality, and residuals.

Stationarity

Test and transform data for forecasting readiness using ADF and differencing.

Smoothing

Apply moving average and exponential smoothing methods to extract signal from noise.

Part I

What Makes Time Unique

Unlike cross-sectional data, time series carries memory. Today depends on yesterday.

This section covers time series structure, decomposition, and resampling.

Part I · Fundamentals

Time Series Data Has Memory

A time series is a sequence of data points indexed in time order where each observation depends on previous ones.

Key Characteristics

  • Temporal dependence — today affects tomorrow
  • Trend — long-term direction
  • Seasonality — repeating patterns
  • Noise — random variation
Philippine OFW remittances monthly data showing trend and seasonal December peaks
Part I · Fundamentals

Four Components Hide Inside Every Series

Time series with annotated trend, seasonal, and noise components
Part I · Fundamentals

Decomposition Reveals the Hidden Structure

Part I · Code

Three Lines Decompose Any Series

Python's statsmodels handles decomposition automatically.

Parameters

  • model: 'additive' or 'multiplicative'
  • period: seasonal cycle length (12 for monthly)
python
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

import matplotlib.pyplot as plt

# Load PH remittance data
df = pd.read_csv('bsp_remittances.csv',
                  parse_dates=['date'])
df.set_index('date', inplace=True)

# Decompose (monthly, annual cycle)
decomp = seasonal_decompose(
    df['remittances_usd'],
    model='additive',
    period=12
)

# Plot the four panels: observed, trend, seasonal, residual
decomp.plot()
plt.tight_layout()
plt.show()
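
To make the decomposition less of a black box, here is a minimal pure-Python sketch of a classical additive decomposition: a centered moving average estimates the trend, and the average detrended value at each position in the cycle estimates the seasonal effect. The function names and the toy series are illustrative assumptions; this follows the textbook procedure in spirit and is not a copy of the statsmodels implementation.

```python
def centered_ma(y, period):
    """Centered moving average; None where the window does not fit."""
    n, half = len(y), period // 2
    trend = [None] * n
    for t in range(half, n - half):
        if period % 2:
            # Odd period: plain symmetric window
            trend[t] = sum(y[t - half:t + half + 1]) / period
        else:
            # Even period: half weight on the two end points
            inner = sum(y[t - half + 1:t + half])
            trend[t] = (0.5 * y[t - half] + inner + 0.5 * y[t + half]) / period
    return trend


def additive_decompose(y, period):
    """Split y into trend + seasonal + residual (additive model)."""
    trend = centered_ma(y, period)
    # Collect detrended values by position in the seasonal cycle
    buckets = [[] for _ in range(period)]
    for t, (obs, tr) in enumerate(zip(y, trend)):
        if tr is not None:
            buckets[t % period].append(obs - tr)
    means = [sum(b) / len(b) for b in buckets]
    offset = sum(means) / period  # centre so seasonal effects sum to zero
    seasonal = [means[t % period] - offset for t in range(len(y))]
    resid = [None if tr is None else obs - tr - s
             for obs, tr, s in zip(y, trend, seasonal)]
    return trend, seasonal, resid


# Toy series (hypothetical): linear trend plus a 4-period seasonal pattern
pattern = [3, -1, -2, 0]
y = [10 + t + pattern[t % 4] for t in range(12)]
trend, seasonal, resid = additive_decompose(y, period=4)
```

On this noise-free toy series the recovered trend is the straight line 10 + t, the seasonal component is the original pattern, and the residuals are zero wherever the trend is defined.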
Part I · Fundamentals

Resampling Changes the Granularity

Same data at daily, weekly, monthly, and quarterly granularity
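
As a rough illustration of what resampling does, the sketch below aggregates daily values into monthly and quarterly totals using only the standard library. The helper names and the constant toy data are assumptions made for the example; with a DatetimeIndex, the pandas `resample` method performs the same grouping in one call.

```python
from collections import defaultdict
from datetime import date, timedelta


def resample_sum(daily, key):
    """Aggregate (date, value) pairs into coarser buckets chosen by `key`."""
    totals = defaultdict(float)
    for day, value in daily:
        totals[key(day)] += value
    return dict(totals)


def month_key(d):
    return (d.year, d.month)


def quarter_key(d):
    return (d.year, (d.month - 1) // 3 + 1)


# Hypothetical data: one unit per day for all of 2023
start = date(2023, 1, 1)
daily = [(start + timedelta(days=i), 1.0) for i in range(365)]

monthly = resample_sum(daily, month_key)      # 12 buckets
quarterly = resample_sum(daily, quarter_key)  # 4 buckets
```

Moving to a coarser granularity reduces noise but also hides short-lived patterns, so the choice of bucket should match the forecast horizon.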
Part II

The Stationarity Requirement

Most forecasting models assume the future looks statistically like the past.

If the mean or variance drifts over time, predictions break down.

Part II · Stationarity

Forecasting Breaks When the Rules Keep Changing

Side-by-side stationary noise vs non-stationary random walk
Part II · Stationarity

The ADF Test Catches Non-Stationarity

Augmented Dickey-Fuller Test

  • H0: Series has unit root (non-stationary)
  • H1: Series is stationary

Decision Rule

p < 0.05 → Reject H0; treat the series as stationary.
p ≥ 0.05 → Cannot reject H0; treat as non-stationary and try differencing.

python
from statsmodels.tsa.stattools import adfuller

# Drop missing values first; adfuller expects a series without NaNs
result = adfuller(df['remittances_usd'].dropna())

print(f'ADF Statistic: {result[0]:.4f}')
print(f'p-value:       {result[1]:.4f}')

# Interpret at the 5% level
if result[1] < 0.05:
    print("Reject H0: treat as stationary.")
else:
    print("Cannot reject H0: difference and retest.")
Part II · Stationarity

Differencing Removes the Trend

Three-panel differencing: original, first diff, second diff with ADF p-values
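
A minimal sketch of why differencing removes trend, using toy series chosen for the example: one difference turns a linear trend into a constant, and two differences do the same for a quadratic trend. The function name is an illustrative assumption.

```python
def difference(y):
    """First difference: y[t] - y[t-1]. The result is one element shorter."""
    return [y[t] - y[t - 1] for t in range(1, len(y))]


linear = [5 + 2 * t for t in range(6)]    # 5, 7, 9, 11, 13, 15
quadratic = [t * t for t in range(6)]     # 0, 1, 4, 9, 16, 25

first = difference(linear)                   # constant: slope of the trend
second = difference(difference(quadratic))   # constant after two passes
```

Each pass costs one observation, and over-differencing adds noise, so the usual advice is to stop at the first d for which the ADF test rejects the unit root.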
Knowledge Check

Stationarity Quiz

A PSEi closing price series has a clear upward trend over 5 years. What should you do before applying ARIMA?

A) Nothing, ARIMA handles trends
B) Apply first differencing
C) Remove all outliers first
D) Use a larger rolling window

Answer

B) Apply first differencing

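An upward trend means the series is non-stationary. First differencing (d=1) removes a linear trend and makes the series suitable for ARIMA. Setting d=1 in the ARIMA order applies this differencing inside the model, so do not also difference by hand.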

Part III

Reading the Autocorrelation Signature

ACF and PACF plots are the fingerprint of a time series: they suggest which model orders to try first.

Part III · Autocorrelation

Past Values Predict Future Values

ACF (Autocorrelation Function)

Correlation between Y_t and Y_{t-k} at each lag k. Includes indirect effects through intermediate lags.

PACF (Partial Autocorrelation)

Direct correlation between Y_t and Y_{t-k} after removing effects of intervening lags.

ρ_k = Cov(Y_t, Y_{t-k}) / Var(Y_t)
ACF showing exponential decay and PACF showing cutoff at lag 2 for AR(2) process
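
The ACF formula can be sketched directly in a few lines. This uses the common sample estimator (deviations from the overall mean, divided by the total sum of squares); the function name and the alternating toy series are assumptions for illustration.

```python
def acf(y, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    n = len(y)
    mean = sum(y) / n
    dev = [v - mean for v in y]
    denom = sum(d * d for d in dev)  # n times the sample variance
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        for k in range(max_lag + 1)
    ]


# Toy series that flips sign every step
alternating = [1, -1] * 4
rho = acf(alternating, max_lag=2)
```

Lag 0 is always 1. For the alternating series, lag 1 is strongly negative and lag 2 strongly positive, which is the sawtooth signature you would see in its ACF plot.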
Part III · Autocorrelation

ACF/PACF Patterns Guide Model Choice

ACF Pattern            PACF Pattern           Suggested Model   Interpretation
Cuts off at lag q      Exponential decay      MA(q)             Past errors drive the series
Exponential decay      Cuts off at lag p      AR(p)             Past values drive the series
Exponential decay      Exponential decay      ARMA(p,q)         Both values and errors matter
Significant at lag s   Significant at lag s   Seasonal          Calendar-driven pattern
Part IV

Smoothing the Signal

Before forecasting, we need to separate signal from noise.

Smoothing techniques reveal underlying patterns by reducing random variation.

Part IV · Smoothing

Moving Averages Trade Detail for Clarity

Window sizes: 7 and 30
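
A minimal sketch of a trailing moving average, written without pandas so the mechanics are visible. The function name and sample data are illustrative assumptions; a running sum keeps the cost linear in the series length.

```python
def moving_average(y, window):
    """Trailing moving average; the first window-1 points have no value."""
    if not 1 <= window <= len(y):
        raise ValueError("window must be between 1 and len(y)")
    out = [None] * (window - 1)
    running = sum(y[:window])
    out.append(running / window)
    for t in range(window, len(y)):
        # Slide the window: add the newest point, drop the oldest
        running += y[t] - y[t - window]
        out.append(running / window)
    return out


smoothed = moving_average([1, 2, 3, 4, 5, 6], window=3)
```

A wider window averages over more points, so it smooths more but reacts later to genuine turning points.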
Part IV · Smoothing

Exponential Smoothing Weights Recent Data More

S_t = α · Y_t + (1 − α) · S_{t-1}

With α = 0.30: S_t = 0.30 · Y_t + 0.70 · S_{t-1}

The α Parameter

  • α → 0: Smooth, slow to react
  • α → 1: Reactive, follows every wiggle
  • Common starting range: 0.2–0.3; tune α by minimizing forecast error on held-out data
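
The smoothing recursion can be sketched as follows. Starting the recursion at the first observation is an assumption made here for simplicity (other initializations are common), and the function name is illustrative.

```python
def simple_exp_smoothing(y, alpha):
    """Apply S_t = alpha * Y_t + (1 - alpha) * S_{t-1} along the series."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = [y[0]]  # assumption: initialise S_0 with the first observation
    for obs in y[1:]:
        smoothed.append(alpha * obs + (1 - alpha) * smoothed[-1])
    return smoothed


# A level shift from 10 to 20: the smoothed value catches up gradually
s = simple_exp_smoothing([10, 20, 20], alpha=0.3)
```

With α = 0.3 the smoothed series moves 30% of the remaining gap at each step; with α = 1 it reproduces the input exactly.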
Part IV · Smoothing

From SES to Holt-Winters: Handling Trend and Seasonality

Diagram showing progression from SES to Holt's to Holt-Winters method
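
As a sketch of the middle step in that progression, Holt's linear method keeps a second smoothed quantity for the trend. The initialization (level from the first point, trend from the first difference) and the parameter values are assumptions chosen to keep the example short.

```python
def holt_linear(y, alpha, beta, horizon):
    """Holt's linear trend method: smooth a level and a trend, then extrapolate."""
    level, trend = y[0], y[1] - y[0]  # simple initialisation (assumption)
    for obs in y[1:]:
        prev_level = level
        # Level: blend the new observation with the previous one-step forecast
        level = alpha * obs + (1 - alpha) * (prev_level + trend)
        # Trend: blend the latest change in level with the previous trend
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return [level + h * trend for h in range(1, horizon + 1)]


forecast = holt_linear([1, 2, 3, 4, 5], alpha=0.5, beta=0.5, horizon=3)
```

Holt-Winters adds a third smoothing equation for the seasonal component. In practice, the `ExponentialSmoothing` class in `statsmodels.tsa.holtwinters` covers all three variants and estimates the smoothing parameters for you.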

Session 1 Key Takeaways

  1. Time series = ordered data where today depends on yesterday
  2. Decomposition separates trend, seasonality, and noise
  3. Stationarity is required for ARIMA — test with ADF, fix with differencing
  4. ACF/PACF plots are your model selection guide
  5. Exponential smoothing extends from SES to trend (Holt) and seasonality (Holt-Winters)

Next: Session 2 — Forecasting Methods (ARIMA, Prophet, Evaluation)

CMSC 178DA | Week 10 — Session 2

From Understanding to Prediction

ARIMA, Prophet & Evaluation

Lecture 20: Forecasting & Evaluation

Jollibee Group Opens 700+ Stores Per Year

Every new location needs a multi-year sales forecast before opening day.

10,000+ outlets (all brands)
700+ new stores/year
5 yr strategic plan horizon

The tools they need? ARIMA and Prophet.

Agenda

Session 2 Objectives

ARIMA

Build ARIMA/SARIMA models and choose p, d, q parameters systematically.

Prophet

Use Meta Prophet for business forecasting with holidays and changepoints.

Evaluation

Measure forecast accuracy with MAE, RMSE, MAPE and proper temporal splits.

Part I

ARIMA: The Workhorse of Forecasting

Three ideas from Session 1 — autoregression, differencing, and moving average — combined into one powerful model.

Part I · ARIMA

ARIMA Combines Three Ideas You Already Know

Diagram showing AR(p) + I(d) + MA(q) combining into ARIMA(p,d,q)
Part I · ARIMA

The ARIMA Equation in Plain English

Y_t = c + φ_1 Y_{t-1} + … + φ_p Y_{t-p} + θ_1 ε_{t-1} + … + θ_q ε_{t-q} + ε_t

In words: "Today's value = constant + weighted past values + weighted past errors + new shock."

φ_1, φ_2, …, φ_p

AR coefficients: how much past values influence the present.

θ_1, θ_2, …, θ_q

MA coefficients: how much past errors correct the present. These are weights on past forecast errors, not the moving-average smoother from Session 1.

ε_t

White noise: the unpredictable random shock at time t.

d (Integration)

Not in the equation directly — it's the number of times you differenced before fitting.
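
To make the equation concrete, the sketch below evaluates its right-hand side for one step ahead. The coefficients and history are made-up numbers, not fitted values, and the function name is hypothetical. The new shock ε_t has mean zero, so it drops out of the expected value.

```python
def arma_one_step(c, phi, theta, past_values, past_errors):
    """Expected next value of an ARMA(p, q) process.

    past_values and past_errors are ordered most recent first, so
    past_values[0] is Y_{t-1} and past_errors[0] is eps_{t-1}.
    """
    ar_part = sum(p * v for p, v in zip(phi, past_values))
    ma_part = sum(th * e for th, e in zip(theta, past_errors))
    return c + ar_part + ma_part


# Hypothetical ARMA(2, 1): 1.0 + 0.5*4.0 + 0.25*8.0 + 0.5*2.0
pred = arma_one_step(
    c=1.0,
    phi=[0.5, 0.25],
    theta=[0.5],
    past_values=[4.0, 8.0],
    past_errors=[2.0],
)
```

When d > 0, the values fed into this equation are the differenced series, and the prediction must be cumulatively summed back to the original scale.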

Part I · ARIMA

Building ARIMA in Python

The statsmodels ARIMA class handles fitting, diagnostics, and forecasting.

Workflow

  1. Choose p, d, q (from ACF/PACF or auto)
  2. Fit model & check summary
  3. Run diagnostics (residual plots)
  4. Forecast with confidence intervals
python
from statsmodels.tsa.arima.model import ARIMA

# Fit ARIMA(2,1,1)
model = ARIMA(df['sales'], order=(2, 1, 1))
results = model.fit()

# Summary table
print(results.summary())

# Diagnostic plots (residuals)
results.plot_diagnostics(figsize=(12, 8))

# Forecast 30 steps ahead
forecast = results.get_forecast(steps=30)
mean = forecast.predicted_mean
ci = forecast.conf_int()  # 95% CI
Part I · ARIMA

Choosing p, d, q: Two Approaches

Box-Jenkins method flowchart for choosing ARIMA parameters

Manual (Box-Jenkins Method)

  1. Test stationarity (ADF) → determine d
  2. Plot PACF → cutoff lag = p
  3. Plot ACF → cutoff lag = q

Automatic (pmdarima)

By default runs a stepwise search (not every combination) and keeps the model with the lowest AIC; pass stepwise=False for an exhaustive grid.

python
from pmdarima import auto_arima

auto_model = auto_arima(
    df['sales'],
    start_p=0, max_p=5,
    start_q=0, max_q=5,
    d=None,  # auto-detect
    seasonal=False,
    trace=True
)
print(auto_model.summary())
Part I · ARIMA

Forecasting with Confidence Intervals

Forecast horizon: 30 steps
Part I · ARIMA

SARIMA Adds Seasonal Intelligence

ARIMA(p,d,q)(P,D,Q)_s

Seasonal Parameters

  • P: Seasonal AR order
  • D: Seasonal differencing
  • Q: Seasonal MA order
  • s: Seasonal period (12 for monthly data with a yearly cycle, 7 for daily data with a weekly cycle)

Example: SARIMA(1,1,1)(1,1,1)_12

python
from pmdarima import auto_arima
from statsmodels.tsa.statespace.sarimax import SARIMAX

# SARIMA on monthly data with a yearly cycle (s=12)
model = SARIMAX(
    df['remittances_usd'],
    order=(1, 1, 1),
    seasonal_order=(1, 1, 1, 12)
)
results = model.fit(disp=False)

# Forecast next 12 months
forecast = results.forecast(steps=12)

# Or let auto_arima search the seasonal orders (m = seasonal period)
auto = auto_arima(
    df['remittances_usd'],
    seasonal=True, m=12,
    trace=True
)
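
The seasonal differencing parameter D works like ordinary differencing but subtracts the value one full season back. A minimal sketch on a made-up quarterly series (s = 4) that grows by 10 per year; the function name is an illustrative assumption.

```python
def seasonal_difference(y, s):
    """Seasonal difference: y[t] - y[t-s]. The result is s elements shorter."""
    return [y[t] - y[t - s] for t in range(s, len(y))]


# Hypothetical quarterly data: a Q4 spike of 5, plus 10 more each year
pattern = [0, 0, 0, 5]
series = [pattern[t % 4] + 10 * (t // 4) for t in range(12)]

diffed = seasonal_difference(series, s=4)
```

The repeating Q4 spike cancels out and only the year-over-year change remains, which is why D=1 is often enough for strongly seasonal data.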
Knowledge Check

ARIMA Quiz

Your ADF test gives p=0.03 after first differencing. PACF cuts off at lag 2 and ACF decays exponentially. What ARIMA order should you try?

A) ARIMA(0, 1, 2)
B) ARIMA(2, 1, 0)
C) ARIMA(1, 0, 1)
D) ARIMA(2, 0, 2)

Answer

B) ARIMA(2, 1, 0)

PACF cutoff at 2 → p=2. The series is stationary after one round of differencing (ADF p=0.03 < 0.05) → d=1. ACF decays (doesn't cut off) → q=0. This is a pure AR(2) model on differenced data.

Part II

Prophet: Built for Business

Meta's open-source tool handles missing data and trend changepoints automatically, and lets you add holiday effects.

Designed for analysts who need good forecasts fast, not ARIMA experts.

Part II · Prophet

Prophet Solves Real Business Problems

Missing Data

Handles gaps automatically — no imputation needed.

Outlier Robust

Tolerates moderate outliers; extreme anomalies such as COVID-era spikes are best set to missing before fitting.

Changepoints

Detects trend shifts automatically (e.g., policy changes).

Holiday Effects

Add Christmas, Undas, or any custom event.

Part II · Prophet

Prophet Setup and Forecasting

Prophet-style forecast with trend, weekly, and yearly components
python
from prophet import Prophet

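# Prepare data: Prophet expects exactly two columns, ds (date) and y (value)
# (assumes daily data in the 'sales' column used in the ARIMA example)
df_p = df[['sales']].reset_index()
df_p.columns = ['ds', 'y']

# Create and fit model
model = Prophet(
    yearly_seasonality=True,
    weekly_seasonality=True,
    daily_seasonality=False
)
model.fit(df_p)

# Create future dates: 30 more days
future = model.make_future_dataframe(
    periods=30, freq='D'
)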

# Predict
forecast = model.predict(future)

# Visualize
model.plot(forecast)
model.plot_components(forecast)
Part II · Prophet

Philippine Holidays Make Forecasts Smarter

Forecast comparison with and without Philippine holiday effects
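
One way to supply such effects is a custom holiday table. The sketch below builds the rows in plain Python using the column names Prophet's holiday table expects (`holiday`, `ds`, `lower_window`, `upper_window`). The window lengths are assumptions chosen for illustration, not recommended values, and the function name is hypothetical.

```python
from datetime import date


def ph_holiday_rows(years):
    """Build holiday rows for Christmas and Undas (All Saints' Day)."""
    rows = []
    for year in years:
        rows.append({
            "holiday": "christmas",
            "ds": date(year, 12, 25),
            "lower_window": -7,  # assumption: effect starts a week earlier
            "upper_window": 1,
        })
        rows.append({
            "holiday": "undas",
            "ds": date(year, 11, 1),
            "lower_window": -1,  # assumption: effect starts the day before
            "upper_window": 1,
        })
    return rows


rows = ph_holiday_rows([2023, 2024])
```

Wrapping the rows in `pandas.DataFrame(rows)` and passing the result as `Prophet(holidays=...)` registers them with the model. For the official national calendar, calling `model.add_country_holidays(country_name='PH')` before `fit` is the built-in alternative.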
Part III

Measuring Forecast Quality

A forecast without an error estimate is just a guess.

This section covers metrics, temporal splits, and model comparison.

Part III · Evaluation

Four Metrics Every Analyst Must Know

Metric   Formula                     Interpretation                                       When to Use
MAE      mean(|y − ŷ|)               Average absolute error in original units             General purpose
RMSE     √mean((y − ŷ)²)             Penalizes large errors more                          When big misses are costly
MAPE     mean(|y − ŷ| / |y|) × 100   Percentage error, scale-free; undefined when y = 0   Stakeholder reports
MASE     MAE / naive_MAE             <1 means better than naive forecast                  Comparing across datasets
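
The four formulas can be sketched in plain Python as below. For MASE, the naive benchmark is taken to be the one-step "repeat the last value" forecast on the training data, which is a common convention but an assumption here since the table does not specify it. All numbers are made up.

```python
import math


def mae(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def mape(actual, predicted):
    # Undefined if any actual value is zero
    return 100 * sum(
        abs(a - p) / abs(a) for a, p in zip(actual, predicted)
    ) / len(actual)


def mase(actual, predicted, train):
    # Scale by the in-sample MAE of the naive forecast y_hat[t] = y[t-1]
    naive_mae = sum(
        abs(train[t] - train[t - 1]) for t in range(1, len(train))
    ) / (len(train) - 1)
    return mae(actual, predicted) / naive_mae


train = [80, 90, 100, 120]
actual = [100, 200, 400]
predicted = [110, 190, 360]

scores = {
    "MAE": mae(actual, predicted),
    "RMSE": rmse(actual, predicted),
    "MAPE": mape(actual, predicted),
    "MASE": mase(actual, predicted, train),
}
```

RMSE exceeds MAE here because the single 40-unit miss is squared before averaging, which is exactly the "penalizes large errors" behavior in the table.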
Part III · Evaluation

Time Series Train-Test Split: Never Shuffle

Correct temporal split vs wrong random shuffle comparison
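
A minimal sketch of a temporal split, plus a rolling-origin variant that produces several train/test folds without ever letting the model see the future. The function names and toy series are illustrative assumptions.

```python
def temporal_split(series, test_size):
    """Hold out the last test_size points; order is preserved."""
    if not 0 < test_size < len(series):
        raise ValueError("test_size must be between 1 and len(series) - 1")
    return series[:-test_size], series[-test_size:]


def rolling_origin(series, initial, horizon):
    """Yield (train, test) folds where train always ends before test begins."""
    for end in range(initial, len(series) - horizon + 1):
        yield series[:end], series[end:end + horizon]


series = list(range(10))  # stand-in for 10 time-ordered observations
train, test = temporal_split(series, test_size=3)
folds = list(rolling_origin(series, initial=6, horizon=2))
```

Every test point comes strictly after every training point in its fold. scikit-learn's `TimeSeriesSplit` offers a similar expanding-window scheme if you are already using that library.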
Part III · Capstone

Philippine Remittances: Complete Forecasting Pipeline

Complete pipeline comparing ARIMA vs Prophet on Philippine remittance data

Session 2 Key Takeaways

  1. ARIMA(p,d,q) = AR + differencing + MA in one model
  2. Use ACF/PACF or auto_arima to choose parameters
  3. SARIMA adds seasonal (P,D,Q)s for periodic data
  4. Prophet is ideal for business forecasting with holidays and missing data
  5. Always use temporal train-test splits, never random shuffle
  6. MAPE is the most intuitive metric for business stakeholders

Lab 10: Time Series Forecasting Project

Forecast a Philippine economic indicator. Compare ARIMA vs Prophet. Present results to "management."