Stationarity, autocorrelation and partial autocorrelation functions. AR, MA, ARMA, ARIMA models. Box-Jenkins methodology.
Many real-world datasets are sequences in time — from daily stock prices to hourly sensor readings — and understanding their time-dependent patterns lets you forecast, detect anomalies, and make decisions with confidence.
Time series foundations give you tools to tell whether a sequence is stable over time, measure how past values influence future ones (autocorrelation and partial autocorrelation), build AR/MA/ARMA/ARIMA models, and apply the Box–Jenkins cycle to identify, estimate, and validate models for forecasting.
Time series analysis studies observations indexed by time: $x_1, x_2, \ldots, x_n$, where $x_t$ is the value at time $t$. The foundations of time series cover three core concepts: stationarity (is the process stable over time?), dependence (how does $x_{t-k}$ affect $x_t$?), and parsimonious models that capture that dependence (AR, MA, ARMA, ARIMA). These let you move from raw data to forecasts and inference.
Why care? Because many applied problems are sequential: forecasting electricity load, modeling GDP, or controlling processes in engineering. A model that respects time-dependence is usually far more accurate than treating observations as iid. The prerequisites you already know — Covariance and Correlation (measuring linear relation) and Linear Regression (estimating linear coefficients via least squares) — are used heavily: sample autocovariances extend covariance to lags, and regression ideas underpin estimation of autoregressive parameters.
Stationarity: Intuitively, a stationary time series 'behaves the same' through time. We commonly use weak (covariance) stationarity: 1) constant mean $E[x_t] = \mu$ for all $t$, 2) autocovariance depends only on the lag $k$, i.e. $\mathrm{Cov}(x_t, x_{t+k}) = \gamma(k)$. Strict stationarity (distribution invariance under time shifts) is stronger, but weak stationarity is used more often in practice because many models are specified only up to second-order properties.
Concrete example: Random walk vs AR(1). A random walk $x_t = x_{t-1} + \varepsilon_t$ has $\mathrm{Var}(x_t) = t\sigma^2$, which grows with $t$, so it is not stationary. A stationary AR(1), $x_t = \phi x_{t-1} + \varepsilon_t$ with $|\phi| < 1$, has $\mathrm{Var}(x_t) = \sigma^2/(1-\phi^2)$.
Numeric example: if $\phi = 0.5$ and $\sigma^2 = 1$, then $\mathrm{Var}(x_t) = 1/(1-0.25) = 4/3 \approx 1.33$. This shows a finite, time-invariant variance.
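A tiny sketch of this contrast, assuming an AR(1) with $\phi = 0.5$ and $\sigma^2 = 1$: the AR(1) variance is a fixed closed-form constant, while the random walk variance keeps growing with $t$.

```python
# Sketch (assumed parameters): stationary AR(1) variance vs random walk variance.
phi, sigma2 = 0.5, 1.0

# AR(1) with |phi| < 1: Var(x_t) = sigma^2 / (1 - phi^2), constant over time.
ar1_var = sigma2 / (1 - phi ** 2)

def rw_var(t):
    """Random walk started at 0: Var(x_t) = t * sigma^2, grows without bound."""
    return t * sigma2

print(round(ar1_var, 3))   # 1.333 (time-invariant)
print(rw_var(100))         # 100.0 (keeps growing with t)
```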
Autocorrelation and autocovariance: the autocovariance at lag $k$ is
$\gamma(k) = \mathrm{Cov}(x_t, x_{t+k}) = E[(x_t - \mu)(x_{t+k} - \mu)],$
and the autocorrelation is
$\rho(k) = \gamma(k)/\gamma(0),$
where $\gamma(0) = \mathrm{Var}(x_t)$. These generalize the concepts from Covariance and Correlation. A quick sample calculation: suppose the observed series is $x = (2, 4, 6, 8)$. The sample mean is $\bar{x} = 5$. The sample lag-1 autocovariance (using denominator $n$ for simplicity) is
$\hat\gamma(1) = \frac{1}{n}\sum_{t=1}^{n-1}(x_t - \bar{x})(x_{t+1} - \bar{x}).$
Compute terms: $(2-5)(4-5) = 3$, $(4-5)(6-5) = -1$, $(6-5)(8-5) = 3$. Sum $= 3 - 1 + 3 = 5$. So $\hat\gamma(1) = 5/4 = 1.25$. The sample variance is $\hat\gamma(0) = 20/4 = 5$. Thus the sample autocorrelation at lag 1 is $\hat\rho(1) = 1.25/5 = 0.25$.
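The same sample calculation can be reproduced in a few lines of plain Python (a minimal sketch using the series from the text):

```python
# Sample mean, lag-k autocovariance (denominator n), and lag-1 autocorrelation.
x = [2, 4, 6, 8]
n = len(x)
mean = sum(x) / n

def acov(k):
    """Sample autocovariance at lag k, denominator n."""
    return sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k)) / n

gamma0, gamma1 = acov(0), acov(1)
rho1 = gamma1 / gamma0
print(mean, gamma0, gamma1, rho1)   # 5.0 5.0 1.25 0.25
```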
Partial Autocorrelation (PACF): the PACF at lag $k$, denoted $\phi_{kk}$ or $\alpha(k)$, measures the correlation between $x_t$ and $x_{t-k}$ after removing the linear effects of the intermediate lags $x_{t-1}, \ldots, x_{t-k+1}$. A simple way to compute the PACF is to fit the linear regression
$x_t = \beta_1 x_{t-1} + \beta_2 x_{t-2} + \cdots + \beta_k x_{t-k} + e_t$
and take the coefficient on $x_{t-k}$. This uses Linear Regression skills directly. Numeric example: for an AR(1) with coefficient $\phi = 0.6$, the PACF at lag 1 is $0.6$ and at higher lags it is (theoretically) 0.
Model types (intuition): AR(p) regresses $x_t$ on its own past $p$ values, $x_t = \phi_1 x_{t-1} + \cdots + \phi_p x_{t-p} + \varepsilon_t$; MA(q) is a linear filter of the $q$ most recent shocks, $x_t = \varepsilon_t + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q}$; ARMA(p,q) combines both; ARIMA(p,d,q) applies ARMA after differencing $d$ times to remove nonstationarity.
These building blocks let you represent many time series parsimoniously and prepare for forecasting. The rest of the lesson develops the main mechanical tools: ACF/PACF, estimation via Yule–Walker and least squares, and the Box–Jenkins cycle for practical modeling.
ACF and PACF are the primary diagnostic tools for identifying the dependence structure in a stationary time series. They are analogous to the correlation and partial correlation concepts from Covariance and Correlation and use Linear Regression for PACF computations.
Definitions and sample formulas: the sample autocovariance and autocorrelation at lag $k$ are
$\hat\gamma(k) = \frac{1}{n}\sum_{t=1}^{n-k}(x_t - \bar{x})(x_{t+k} - \bar{x}), \qquad \hat\rho(k) = \hat\gamma(k)/\hat\gamma(0).$
Example: with $x = (2, 4, 6, 8)$ we computed $\hat\gamma(0) = 5$ and $\hat\gamma(1) = 1.25$, so $\hat\rho(1) = 0.25$ (see Section 1). If you prefer the unbiased denominator, replace $n$ with $n-k$; both are used in practice.
Properties of ACF for simple models (critical identification clues): for a stationary AR(1) with coefficient $\phi$, $\rho(k) = \phi^k$; for an MA(1) with coefficient $\theta$, $\rho(1) = \theta/(1+\theta^2)$ and $\rho(k) = 0$ for $k > 1$.
Thus the ACF of AR(1) decays exponentially, while the ACF of MA(1) cuts off after lag 1. This distinction is foundational for identification.
Partial Autocorrelation Function (PACF): to obtain the PACF at lag $k$, fit the regression
$x_t = \phi_{k1} x_{t-1} + \cdots + \phi_{kk} x_{t-k} + e_t,$
then $\alpha(k) = \hat\phi_{kk}$ from that regression. For a stationary AR(p), the PACF cuts off after lag $p$ (i.e., $\alpha(k) = 0$ for $k > p$). For MA(q), the PACF decays gradually.
Computational methods: in practice the PACF is computed either by solving the Yule–Walker equations recursively (the Durbin–Levinson algorithm) or by running the successive OLS regressions described above.
Small numeric demonstration of PACF via regression: Data: $x = (1.2, 0.9, 1.1, 1.4, 1.3)$, with sample mean $\bar{x} = 1.18$ (for simplicity the sketch below regresses without demeaning). To estimate the PACF at lag 2 (i.e., the coefficient on $x_{t-2}$ when regressing $x_t$ on lags 1 and 2), form the design matrix for $t = 3, \ldots, 5$:
X = [[x_2, x_1], [x_3, x_2], [x_4, x_3]] = [[0.9, 1.2], [1.1, 0.9], [1.4, 1.1]].
Response y = [x_3, x_4, x_5] = [1.1, 1.4, 1.3].
Run a small OLS: $\hat\beta = (X^T X)^{-1} X^T y$. Numeric calculation (sketch): compute $X^T X$. Upper-left entry $0.9^2 + 1.1^2 + 1.4^2 = 3.98$, off-diagonal $0.9(1.2) + 1.1(0.9) + 1.4(1.1) = 3.61$, lower-right $1.2^2 + 0.9^2 + 1.1^2 = 3.46$. Solve $(X^T X)\hat\beta = X^T y$ to obtain the coefficients; the lag-2 coefficient is the PACF estimate. This step explicitly uses Linear Regression.
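The normal-equations solve can be carried out by hand for this 2-by-2 case; a minimal sketch using the toy data from the demonstration (no demeaning, matching the text):

```python
# PACF at lag 2 via OLS on lags 1 and 2, solved through the normal equations.
X = [[0.9, 1.2], [1.1, 0.9], [1.4, 1.1]]   # rows [x_{t-1}, x_{t-2}] for t = 3..5
y = [1.1, 1.4, 1.3]                        # responses x_t

# Entries of X^T X and X^T y.
a = sum(r[0] * r[0] for r in X)            # upper-left, = 3.98
b = sum(r[0] * r[1] for r in X)            # off-diagonal, = 3.61
c = sum(r[1] * r[1] for r in X)            # lower-right, = 3.46
u = sum(r[0] * yt for r, yt in zip(X, y))
v = sum(r[1] * yt for r, yt in zip(X, y))

# Cramer's rule for the 2x2 system (X^T X) beta = X^T y.
det = a * c - b * b
beta1 = (c * u - b * v) / det              # coefficient on lag 1
beta2 = (a * v - b * u) / det              # coefficient on lag 2 = PACF(2) estimate
print(round(beta2, 3))
```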
Interpreting ACF/PACF plots
Statistical significance: For large $n$, sample autocorrelations under white noise are approximately $N(0, 1/n)$. A rough 95% confidence band is $\pm 1.96/\sqrt{n}$. Example: if $n = 100$, the band is $\pm 0.196$. Any sample ACF outside that band suggests significant autocorrelation.
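The white-noise band is one line of arithmetic; a minimal sketch:

```python
# Approximate 95% white-noise confidence band for sample autocorrelations.
import math

def acf_band(n):
    """Half-width of the +/- 1.96/sqrt(n) band under the white-noise null."""
    return 1.96 / math.sqrt(n)

print(round(acf_band(100), 3))   # 0.196, matching the n = 100 example
```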
Thus ACF and PACF, combined with numerical rules and plots, are your first discovery tools in the Box–Jenkins cycle (coming later). They depend directly on Covariance and Correlation ideas and on Linear Regression for PACF estimation.
We now define the principal parametric families and show core formulas and estimation strategies. These models are linear in either past values or past shocks and are the workhorses of time series forecasting.
Autoregressive (AR) models: an AR(p) model is
$x_t = \phi_1 x_{t-1} + \cdots + \phi_p x_{t-p} + \varepsilon_t,$
where $\varepsilon_t$ is white noise with mean 0 and variance $\sigma^2$. This is analogous to regressing $x_t$ on its past $p$ values (Linear Regression). To fit AR(p), one can use OLS on lagged regressors when the series is mean zero, or include an intercept if needed. Another method uses the Yule–Walker equations derived from autocovariances.
Yule–Walker equations (for AR(p))
The theoretical autocovariances satisfy:
$\gamma(k) = \phi_1 \gamma(k-1) + \cdots + \phi_p \gamma(k-p), \quad k = 1, \ldots, p.$
A practical way is to solve the linear system $\Gamma \phi = \gamma$, where $\Gamma$ is the $p \times p$ matrix with entries $\Gamma_{ij} = \gamma(|i-j|)$ and $\gamma = (\gamma(1), \ldots, \gamma(p))^T$.
Use sample autocovariances to get Yule–Walker estimators. Example: for AR(1), Yule–Walker reduces to $\gamma(1) = \phi\,\gamma(0)$, so $\hat\phi = \hat\gamma(1)/\hat\gamma(0) = \hat\rho(1)$. Numerically, using the sample values from Section 1 where $\hat\gamma(0) = 5$ and $\hat\gamma(1) = 1.25$, the Yule–Walker estimate is $\hat\phi = 0.25$.
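For AR(1) the Yule–Walker estimator needs no matrix algebra at all; a minimal sketch using the Section 1 series:

```python
# Yule-Walker estimate for AR(1): phi_hat = gamma_hat(1) / gamma_hat(0).
x = [2, 4, 6, 8]
n = len(x)
mean = sum(x) / n

gamma0 = sum((v - mean) ** 2 for v in x) / n
gamma1 = sum((x[t] - mean) * (x[t + 1] - mean) for t in range(n - 1)) / n

phi_hat = gamma1 / gamma0
print(phi_hat)   # 0.25
```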
Moving Average (MA) models: an MA(q) model is
$x_t = \varepsilon_t + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q}.$
The parameters multiply the unobserved past shocks $\varepsilon_{t-1}, \ldots, \varepsilon_{t-q}$, making direct linear regression impossible. Estimation typically proceeds via maximum likelihood or nonlinear least squares. However, the MA autocovariances have closed-form expressions in terms of $\theta$ and $\sigma^2$, which you can equate to sample autocovariances for method-of-moments estimation in small orders. Example: for MA(1), $\rho(1) = \theta/(1+\theta^2)$. If the sample value is $\hat\rho(1) = 0.4$, solving $\theta/(1+\theta^2) = 0.4$ yields the quadratic $0.4\theta^2 - \theta + 0.4 = 0$. Solve numerically: discriminant $= 1 - 4(0.4)(0.4) = 0.36$, so $\sqrt{0.36} = 0.6$. The roots are $\theta = (1 \pm 0.6)/0.8$, i.e. $\theta = 2$ or $\theta = 0.5$; the invertibility constraint $|\theta| < 1$ selects $\theta = 0.5$.
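The MA(1) method-of-moments step is just solving that quadratic and keeping the invertible root; a minimal sketch:

```python
# Method of moments for MA(1): solve theta/(1 + theta^2) = rho1, i.e.
# rho1*theta^2 - theta + rho1 = 0, and keep the invertible root (|theta| < 1).
import math

def ma1_theta(rho1):
    """Assumes rho1 != 0 and |rho1| <= 0.5 (otherwise no real MA(1) solution)."""
    disc = 1 - 4 * rho1 * rho1
    if disc < 0:
        raise ValueError("|rho1| must be <= 0.5 for a real solution")
    roots = [(1 + math.sqrt(disc)) / (2 * rho1),
             (1 - math.sqrt(disc)) / (2 * rho1)]
    return min(roots, key=abs)   # the smaller-magnitude root is the invertible one

theta = ma1_theta(0.4)
print(round(theta, 6))   # 0.5, matching the worked example
```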
ARMA(p,q) models combine both parts:
$x_t = \phi_1 x_{t-1} + \cdots + \phi_p x_{t-p} + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q}.$
Estimation is typically via maximum likelihood (often numerically executed) or conditional least squares.
ARIMA(p,d,q): dealing with nonstationarity. If $x_t$ is nonstationary (e.g., trending), difference it $d$ times, $y_t = (1-B)^d x_t$ with backshift operator $B x_t = x_{t-1}$, and fit an ARMA(p,q) model to the differenced series $y_t$.
Forecasting with AR models (closed-form): for a mean-zero AR(1), the $h$-step-ahead forecast from time $n$ is $\hat{x}_{n+h} = \phi^h x_n$, with forecast error variance $\sigma^2(1 + \phi^2 + \cdots + \phi^{2(h-1)})$. Higher-order AR forecasts are computed by iterating the recursion forward, replacing unknown future values by their own forecasts.
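A minimal sketch of these closed-form AR(1) forecasts, assuming the estimates from the worked examples ($\hat\phi = 0.25$, $\hat\sigma^2 = 4.6875$) and a hypothetical last demeaned observation $x_n = 3.0$ (the value 3.0 is an assumption for illustration):

```python
# h-step AR(1) forecast: x_hat(n+h) = phi^h * x_n, with forecast error
# variance sigma^2 * (1 + phi^2 + ... + phi^(2(h-1))).
phi, sigma2 = 0.25, 4.6875   # estimates from the text
x_n = 3.0                    # assumed last (demeaned) observation

def forecast(h):
    point = (phi ** h) * x_n
    var = sigma2 * sum(phi ** (2 * j) for j in range(h))
    return point, var

p1, v1 = forecast(1)
print(p1, v1)   # 0.75 4.6875 (one step ahead)
```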
Estimation practicalities and model selection: compare candidate orders using information criteria (AIC/BIC) and out-of-sample forecast error, preferring the most parsimonious model that passes residual diagnostics.
Identifiability and invertibility: an AR(p) is stationary when the roots of its AR polynomial lie outside the unit circle, and an MA(q) is invertible when the roots of its MA polynomial do; invertibility pins down a unique MA representation (recall the MA(1) example, where only one of the two roots satisfies $|\theta| < 1$).
Concrete example illustrating estimation differences: suppose the sample ACF shows strong decay and the PACF cuts off after 2 lags: this suggests AR(2). Using sample autocovariances, solve the $2 \times 2$ Yule–Walker linear system to get $\hat\phi_1, \hat\phi_2$, compute residuals $\hat\varepsilon_t = x_t - \hat\phi_1 x_{t-1} - \hat\phi_2 x_{t-2}$, then estimate $\hat\sigma^2$ as the mean squared residual. This combines Covariance and Correlation (autocovariances) and Linear Regression (residual analysis).
In short, AR/MA/ARMA/ARIMA provide interpretable, mathematically tractable families; Yule–Walker and OLS give simple estimation for AR models, while MA parameters often need likelihood-based methods.
The Box–Jenkins (BJ) methodology is a structured four-step practical workflow for building ARIMA-class models. It turns the theoretical tools (ACF/PACF, AR/MA definitions, stationarity tests) into an applied recipe. Each step references ideas you've seen in previous sections and relies on Covariance and Correlation and Linear Regression for computations.
Box–Jenkins steps
1) Identification: Use plots of the time series, ACF, and PACF to decide whether differencing is needed and to propose model orders p and q. Look for patterns: slow ACF decay suggests nonstationarity or AR behavior; ACF cutoffs help detect MA terms. Example: If ACF decays slowly and first differences look stationary, set d=1 and then examine the ACF/PACF of differenced series.
2) Estimation: Fit candidate ARIMA(p,d,q) models via maximum likelihood or conditional least squares; estimate parameters . For AR parts you might use Yule–Walker to get initial estimates (a closed-form start) and then refine via ML. Example: Fit ARIMA(1,1,1) to data; the differencing makes the series stationary before estimation.
3) Diagnostic checking: Analyze residuals from the fitted model. Residual ACF should show no significant autocorrelation (randomness). Perform Ljung–Box test for joint autocorrelation up to lag m. Residuals should be approximately uncorrelated with constant variance and mean zero. If diagnostics fail, go back to step 1 and refine model orders.
4) Forecasting: With a validated model, compute forecasts and forecast intervals. ARIMA models allow iterative computation of point forecasts and mean squared error estimates for prediction intervals.
Concrete real-world examples and use cases
Seasonality and SARIMA: When periodic patterns exist, include seasonal terms. For seasonal period $s$, a SARIMA model includes factors like $(1 - \Phi B^s)$ for seasonal AR and seasonal differencing $(1 - B^s)$. Numeric example: for monthly data with annual seasonality ($s = 12$), seasonal differencing with $D = 1$ is $y_t = x_t - x_{t-12}$.
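Seasonal differencing is a one-line transform; a minimal sketch on an assumed toy monthly series (the linear ramp below is purely illustrative):

```python
# Seasonal differencing (1 - B^s) with s = 12: y_t = x_t - x_{t-12}.
s = 12
x = list(range(24))   # assumed toy "monthly" series: a linear ramp 0..23

y = [x[t] - x[t - s] for t in range(s, len(x))]
print(y)   # a constant series: differencing removed the linear component
```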
Model evaluation and selection
Connections to machine learning and further topics
Practical pitfalls and tips
A short end-to-end numeric illustration
Suppose you have a monthly sales series with a linear upward trend. First difference to remove the trend ($d = 1$). The differenced series shows an ACF cutting off at lag 1 and a decaying PACF: candidate ARIMA(0,1,1). Fit an MA(1) on the differenced data; estimate $\hat\theta_1$ and check that the residuals look white. Forecast by integrating predicted differences forward to recover level forecasts. This is the Box–Jenkins loop in action.
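The final "integrate forward" step just cumulatively sums predicted differences onto the last observed level. A minimal sketch with assumed numbers (the last level 120.0 and the difference forecasts are hypothetical):

```python
# Recover level forecasts from forecasts of the first differences.
last_level = 120.0                   # assumed last observed sales value
diff_forecasts = [2.0, 1.5, 1.5]     # assumed model forecasts of x_t - x_{t-1}

levels = []
level = last_level
for d in diff_forecasts:
    level += d                       # integrate: undo the d = 1 differencing
    levels.append(level)

print(levels)   # [122.0, 123.5, 125.0]
```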
In summary, Box–Jenkins ties together stationarity tests, ACF/PACF-guided identification, estimation (Yule–Walker, OLS, ML), and diagnostic checking to produce reliable forecasts. The concrete algebra and numerical examples throughout this lesson show how concepts from Covariance and Correlation and Linear Regression are repurposed to handle temporal dependence and forecasting tasks.
You observe the series x = [2, 4, 6, 8]. Compute the sample mean, sample autocovariance at lags 0 and 1 using denominator n=4, and the sample autocorrelation at lag 1.
Compute the sample mean: $\bar{x} = (2 + 4 + 6 + 8)/4 = 5$.
Compute the sample autocovariance at lag 0: $\hat\gamma(0) = \frac{1}{4}\sum_{t=1}^{4}(x_t - 5)^2$. Differences squared: $(2-5)^2 = 9$, $(4-5)^2 = 1$, $(6-5)^2 = 1$, $(8-5)^2 = 9$. Sum $= 20$. So $\hat\gamma(0) = 20/4 = 5$.
Compute the sample autocovariance at lag 1: $\hat\gamma(1) = \frac{1}{4}\sum_{t=1}^{3}(x_{t+1} - 5)(x_t - 5)$. Terms: $(4-5)(2-5) = (-1)(-3) = 3$, $(6-5)(4-5) = (1)(-1) = -1$, $(8-5)(6-5) = (3)(1) = 3$. Sum $= 5$. So $\hat\gamma(1) = 5/4 = 1.25$.
Compute the sample autocorrelation at lag 1: $\hat\rho(1) = \hat\gamma(1)/\hat\gamma(0) = 1.25/5 = 0.25$.
Summary: $\bar{x} = 5$, $\hat\gamma(0) = 5$, $\hat\gamma(1) = 1.25$, $\hat\rho(1) = 0.25$.
Insight: This example shows directly how the sample autocovariance and autocorrelation are computed from data and connects to the Covariance and Correlation prerequisite: lagged pairs play the same role as paired variables in a standard correlation computation.
Consider two processes: (A) an AR(1) with coefficient $\phi = 0.6$, (B) an MA(1) with coefficient $\theta = 0.5$ (the noise variance does not affect the ACF). Compute the theoretical ACF at lags 1, 2, 3 for both and use these to argue which pattern each would show on an ACF plot.
AR(1) theoretical ACF: $\rho_k = \phi^k$.
Compute numeric values with $\phi = 0.6$: $\rho_1 = 0.6$, $\rho_2 = 0.36$, $\rho_3 = 0.216$.
MA(1) theoretical ACF: $\rho_1 = \theta/(1+\theta^2)$ and $\rho_k = 0$ for $k \geq 2$.
Compute numeric values with $\theta = 0.5$: $\rho_1 = 0.5/1.25 = 0.4$, $\rho_2 = 0$, $\rho_3 = 0$.
Interpretation: AR(1) ACF decays exponentially (0.6, 0.36, 0.216, ...). MA(1) ACF has a single non-zero lag (0.4 at lag 1) then zeros. So on a sample ACF plot, AR(1) would show gradually decreasing bars while MA(1) would show a significant bar at lag 1 and non-significant bars beyond.
Thus, if you observe ACF cutting off after lag 1, MA(1) is likely; if it decays, AR(1) is likely. PACF patterns provide complementary evidence.
Insight: This demonstrates the identification rule-of-thumb used in Box–Jenkins: ACF decay vs cut-off helps distinguish AR vs MA behavior. The numeric computations show concrete numbers you would expect in plots.
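The two theoretical ACF patterns from this example can be generated with a couple of list comprehensions (a minimal sketch using $\phi = 0.6$ and $\theta = 0.5$ as in the worked example):

```python
# Theoretical ACFs at lags 1..3: AR(1) decays geometrically, MA(1) cuts off.
phi, theta = 0.6, 0.5

ar1_acf = [phi ** k for k in range(1, 4)]            # [0.6, 0.36, 0.216]
ma1_acf = [theta / (1 + theta ** 2), 0.0, 0.0]       # [0.4, 0.0, 0.0]

print([round(r, 3) for r in ar1_acf])
print(ma1_acf)
```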
Using the sample autocovariances from worked example 1 ($\hat\gamma(0) = 5$, $\hat\gamma(1) = 1.25$), estimate the AR(1) coefficient via Yule–Walker and compute the implied residual variance (using $\hat\sigma^2 = \hat\gamma(0)(1 - \hat\phi^2)$).
Yule–Walker for AR(1) gives $\hat\phi = \hat\gamma(1)/\hat\gamma(0)$. Substitute the sample values: $\hat\phi = 1.25/5 = 0.25$.
Compute the residual variance estimate: $\hat\sigma^2 = \hat\gamma(0)(1 - \hat\phi^2)$ (this follows from the decomposition of variance for AR(1)). Plug in the numbers: $\hat\sigma^2 = 5(1 - 0.25^2)$.
Compute $0.25^2 = 0.0625$, so $\hat\sigma^2 = 5 \times 0.9375 = 4.6875$.
Thus the estimated AR(1) model is $x_t = 0.25\,x_{t-1} + \varepsilon_t$ (for the demeaned series) with estimated noise variance $4.6875$.
Check interpretation: the relatively small sample and short series make estimates noisy; in practice we use longer series and refine via maximum likelihood.
Insight: This example shows how Yule–Walker turns autocovariances into AR coefficients. It leverages the link between autocovariance structure and AR parameters and provides a direct, algebraic estimator without optimization.
Stationarity (especially weak stationarity) means constant mean and autocovariance depends only on lag; AR(1) is stationary iff |φ|<1, while a random walk is not.
Autocorrelation (ACF) and partial autocorrelation (PACF) are diagnostic tools: ACF measures linear dependence at lags; PACF measures direct dependence removing intermediates and is computed via regressions (use Linear Regression).
AR(p) models regress current value on past p values; MA(q) models are linear filters of past shocks; ARMA combines them; ARIMA adds differencing to handle nonstationarity.
Identification: AR models imply PACF cuts off after p and ACF decays; MA models imply ACF cuts off after q and PACF decays—use these rules in the Box–Jenkins identification step.
Yule–Walker and OLS give closed-form or simple linear-system estimators for AR parameters; MA and ARMA often require likelihood-based numerical optimization.
Box–Jenkins cycle (identify, estimate, diagnose, forecast) is a practical, iterative approach for building ARIMA-class models with checks via residual analysis and information criteria.
Every computational step builds on Covariance and Correlation (autocovariances) and Linear Regression (estimating lagged relationships or residuals).
Confusing stationarity with no trend: stationarity requires constant variance and autocovariances dependent only on lag, not merely detrending. Example: a process with stable seasonal variance but drifting mean is non-stationary unless differenced.
Using ACF/PACF rules mechanically with very small samples: sample ACF/PACF are noisy for small n; false cutoffs or apparent decays can mislead identification.
Treating MA parameters via OLS directly: MA models involve unobserved shocks so you cannot regress on past shocks; estimation requires likelihood or specialized methods.
Over-differencing a series: differencing more times than needed can introduce moving-average-like structure and degrade forecasts. Use tests (e.g., ADF) and visual inspection before differencing.
Easy: Given a stationary AR(1) process X_t = 0.8 X_{t-1} + ε_t with Var(ε_t)=1, compute the theoretical autocorrelation ρ_1 and ρ_3, and the stationary variance Var(X_t).
Hint: Use ρ_k = φ^k and Var(X_t)=σ^2/(1-φ^2).
ρ_1 = 0.8, ρ_3 = 0.8^3 = 0.512. Var(X_t) = 1/(1-0.8^2)=1/(1-0.64)=1/0.36 ≈ 2.7778.
Medium: You observe sample autocorrelations \hatρ1=0.45 and \hatρ2=0.2 from a long series (n large). Using the Yule–Walker equations for AR(2), solve approximately for φ1 and φ2 by solving the linear system [\hatρ1,\hatρ2]^T = [[1,\hatρ1],[\hatρ1,1]][φ1,φ2]^T.
Hint: Solve the 2×2 linear system: invert the matrix [[1,\hatρ1],[\hatρ1,1]] and multiply by [\hatρ1,\hatρ2].
Matrix A = [[1, 0.45], [0.45, 1]]. Determinant = 1 - 0.45^2 = 1 - 0.2025 = 0.7975. Inverse = (1/0.7975)[[1, -0.45], [-0.45, 1]]. Multiply by the vector r = [0.45, 0.2]: φ = (1/0.7975)[1*0.45 - 0.45*0.2, -0.45*0.45 + 1*0.2] = (1/0.7975)[0.36, -0.0025] ≈ [0.4514, -0.0031]. So φ1 ≈ 0.451, φ2 ≈ -0.003.
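The same 2-by-2 solve in code (a minimal sketch applying the explicit inverse of [[1, r], [r, 1]]):

```python
# AR(2) Yule-Walker in correlation form:
# [rho1, rho2]^T = [[1, rho1], [rho1, 1]] [phi1, phi2]^T.
rho1, rho2 = 0.45, 0.2

det = 1 - rho1 * rho1                 # determinant of [[1, rho1], [rho1, 1]]
phi1 = (1 * rho1 - rho1 * rho2) / det
phi2 = (-rho1 * rho1 + 1 * rho2) / det

print(round(phi1, 4), round(phi2, 4))   # approximately 0.4514 and -0.0031
```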
Hard: Suppose you have monthly data with a strong annual seasonality (period s=12) and a linear upward trend. Outline and justify a Box–Jenkins modeling plan: preprocessing steps, identification choices (including possible seasonal orders), and how you would check residuals. Be explicit about differencing and SARIMA components.
Hint: Remove trend via differencing (first difference). Consider seasonal differencing (1-B^12). After differencing, examine ACF/PACF at seasonal lags (12,24) and nonseasonal lags to propose (p,d,q)(P,D,Q)_12. Use residual ACF/Ljung–Box to check.
Plan:
1) Take the first difference to remove the linear trend: Y_t = X_t - X_{t-1} (d=1).
2) Check seasonality: compute the ACF of Y_t; if strong spikes appear at lags 12, 24, ..., take a seasonal difference (D=1): Z_t = (1-B^12)Y_t = (1-B)(1-B^{12})X_t, with total differencing d=1, D=1.
3) On the stationary Z_t, plot the ACF/PACF. If the ACF shows a spike at lag 12 and the PACF decays, consider a seasonal MA term (Q=1); if the PACF spikes at lag 12 and the ACF decays, consider seasonal AR (P=1). For nonseasonal structure, use the ACF/PACF at low lags to suggest p and q, giving a candidate SARIMA(p,1,q)(P,1,Q)_12.
4) Fit candidate models (e.g., SARIMA(0,1,1)(0,1,1)_12), estimate parameters via ML, and inspect residuals: plot the residual ACF (it should lie within ±1.96/√n), perform a Ljung–Box test up to lag 24 or more to test joint correlation, and check a residual histogram/QQ-plot if normality is assumed.
5) If diagnostics fail, iterate: try different p, q, P, Q, or additional differencing, cautiously.
6) Validate via out-of-sample forecast performance.
This plan explicitly uses seasonal differencing and SARIMA components and relies on ACF/PACF identification and diagnostic checks.
Looking back: This lesson reuses core ideas from your prerequisites. From Covariance and Correlation we extended covariance to the autocovariance $\gamma(k)$ and defined the autocorrelation $\rho(k) = \gamma(k)/\gamma(0)$, which are the numerical heart of the ACF. From Linear Regression we borrowed the idea of projecting onto explanatory variables: the PACF at lag $k$ is the coefficient on $x_{t-k}$ when regressing $x_t$ on lags $1, \ldots, k$, and OLS/residual calculations underpin AR estimation and diagnostics.
Looking forward: mastering these foundations enables work on forecasting (point and interval forecasts), state-space models and the Kalman filter (used in engineering and econometrics), volatility modeling (GARCH models in finance), seasonal modeling (SARIMA), and modern probabilistic sequence models. Many machine learning sequence methods (RNNs, LSTMs) require careful feature engineering and stationarity-aware preprocessing that derive from the ideas covered here. Specific downstream concepts that require this foundation include Kalman filtering for state-space inference, GARCH for volatility clustering (which presumes understanding of serial dependence), and structural time series for decomposing trend and seasonality; all of these use the ACF/PACF intuition, differencing/ARMA parametrizations, and residual diagnostics taught in this lesson.