Time Series Foundations

Probability & Statistics · Difficulty: ███░░ · Depth: 10 · Unlocks: 3

Stationarity, autocorrelation and partial autocorrelation functions. AR, MA, ARMA, ARIMA models. Box-Jenkins methodology.

Many real-world datasets are sequences in time — from daily stock prices to hourly sensor readings — and understanding their time-dependent patterns lets you forecast, detect anomalies, and make decisions with confidence.

TL;DR:

Time series foundations give you tools to tell whether a sequence is stable over time, measure how past values influence future ones (autocorrelation and partial autocorrelation), build AR/MA/ARMA/ARIMA models, and apply the Box–Jenkins cycle to identify, estimate, and validate models for forecasting.

What Is Time Series Foundations?

Time series analysis studies observations indexed by time: $\{X_t : t \in \mathbb{Z}\}$, where $X_t$ is the value at time $t$. The foundations of time series cover three core concepts: stationarity (is the process stable over time?), dependence (how does $X_{t-k}$ affect $X_t$?), and parsimonious models that capture that dependence (AR, MA, ARMA, ARIMA). These let you move from raw data to forecasts and inference.

Why care? Because many applied problems are sequential: forecasting electricity load, modeling GDP, or controlling processes in engineering. A model that respects time-dependence is usually far more accurate than treating observations as iid. The prerequisites you already know — Covariance and Correlation (measuring linear relation) and Linear Regression (estimating linear coefficients via least squares) — are used heavily: sample autocovariances extend covariance to lags, and regression ideas underpin estimation of autoregressive parameters.

Stationarity: Intuitively, a stationary time series 'behaves the same' through time. We commonly use weak (covariance) stationarity: 1) constant mean $\mathbb{E}[X_t]=\mu$ for all $t$; 2) the autocovariance $\gamma(s,t)=\operatorname{Cov}(X_s,X_t)$ depends only on the lag $|t-s|$, i.e. $\gamma(h)=\operatorname{Cov}(X_t,X_{t-h})$. Strict stationarity (invariance of the entire distribution under time shifts) is a stronger condition, but weak stationarity usually suffices in practice because many models are specified only through second moments.

Concrete example: Random walk vs AR(1)

  • Random walk: $X_t = X_{t-1} + \varepsilon_t$ with $\varepsilon_t \sim (0,\sigma^2)$. Starting from $X_0=0$, the variance $\operatorname{Var}(X_t)=t\sigma^2$ grows without bound (the accumulated noise never settles), so the process is non-stationary.
  • AR(1): $X_t = \phi X_{t-1} + \varepsilon_t$ with $|\phi|<1$ is stationary. For $\varepsilon_t$ white noise with variance $\sigma^2$, the stationary variance is

$$\operatorname{Var}(X_t)=\frac{\sigma^2}{1-\phi^2}.$$

Numeric example: if $\phi=0.6$ and $\sigma^2=1$, then $\operatorname{Var}(X_t)=1/(1-0.36)=1/0.64=1.5625$, a finite, time-invariant variance.
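The stationary-variance formula can be sanity-checked by simulation. A minimal numpy sketch (series length and seed are arbitrary choices for illustration):

```python
import numpy as np

# Simulate a long AR(1) path with phi = 0.6, sigma^2 = 1 and compare the
# sample variance to the theoretical stationary variance 1/(1-0.36) = 1.5625.
rng = np.random.default_rng(0)
phi, n = 0.6, 100_000
eps = rng.standard_normal(n)
x = np.empty(n)
x[0] = eps[0]
for t in range(1, n):
    x[t] = phi * x[t - 1] + eps[t]

theoretical = 1 / (1 - phi**2)   # 1.5625
print(theoretical, x.var())      # sample variance should be close to 1.5625
```

With a path this long, the sample variance typically lands within a few hundredths of the theoretical value.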

Autocorrelation and autocovariance: the autocovariance at lag $k$ is

$$\gamma_k = \operatorname{Cov}(X_t,X_{t-k}),$$

and the autocorrelation is

$$\rho_k = \frac{\gamma_k}{\gamma_0},$$

where $\gamma_0=\operatorname{Var}(X_t)$. These generalize the concepts from Covariance and Correlation. A quick sample calculation: suppose the observed series is $[2,4,6,8]$. The sample mean is $\bar X=5$. The sample lag-1 autocovariance (using denominator $n$ for simplicity) is

$$\hat\gamma_1 = \frac{1}{4}\sum_{t=2}^4 (X_t-\bar X)(X_{t-1}-\bar X).$$

Compute the terms: $(4-5)(2-5)=(-1)(-3)=3$, $(6-5)(4-5)=(1)(-1)=-1$, $(8-5)(6-5)=(3)(1)=3$. Sum $=3-1+3=5$, so $\hat\gamma_1 = 5/4 = 1.25$. The sample variance is $\hat\gamma_0 = \frac{1}{4}[(2-5)^2+(4-5)^2+(6-5)^2+(8-5)^2] = (9+1+1+9)/4 = 20/4 = 5$. Thus the sample autocorrelation at lag 1 is $\hat\rho_1 = 1.25/5 = 0.25$.
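The lag-k calculation above can be packaged as a small helper. A sketch using the denominator-$n$ convention from the text:

```python
import numpy as np

def sample_acvf(x, k):
    # Sample autocovariance at lag k, with denominator n (the biased version).
    x = np.asarray(x, dtype=float)
    n, xbar = len(x), x.mean()
    return np.sum((x[k:] - xbar) * (x[:n - k] - xbar)) / n

x = [2, 4, 6, 8]
g0, g1 = sample_acvf(x, 0), sample_acvf(x, 1)
rho1 = g1 / g0
print(g0, g1, rho1)   # 5.0 1.25 0.25, matching the hand calculation
```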

Partial Autocorrelation (PACF): the PACF at lag $k$, denoted $\phi_{kk}$ or $\alpha_k$, measures the correlation between $X_t$ and $X_{t-k}$ after removing the linear effects of the intermediate lags $1,\ldots,k-1$. A simple way to compute the PACF is to fit the linear regression

$$X_t = \beta_0 + \beta_1 X_{t-1} + \dots + \beta_k X_{t-k} + \varepsilon_t$$

and take the coefficient on $X_{t-k}$. This uses Linear Regression skills directly. Numeric example: for an AR(1) with coefficient $\phi=0.6$, the PACF at lag 1 is $0.6$ and at higher lags it is (theoretically) 0.

Model types (intuition):

  • AR(p) (autoregressive of order p): $X_t = \phi_1 X_{t-1} + \dots + \phi_p X_{t-p} + \varepsilon_t$. Like regressing the current value on the p past values. If p=1, this is the AR(1) above.
  • MA(q) (moving average of order q): $X_t = \varepsilon_t + \theta_1 \varepsilon_{t-1} + \dots + \theta_q \varepsilon_{t-q}$. This is a linear filter of white noise: its ACF cuts off after lag q, while its PACF decays gradually.
  • ARMA(p,q): combines both.
  • ARIMA(p,d,q): difference the non-stationary series d times, then fit ARMA(p,q) to the result.

These building blocks let you represent many time series parsimoniously and prepare for forecasting. The rest of the lesson develops the main mechanical tools: ACF/PACF, estimation via Yule–Walker and least squares, and the Box–Jenkins cycle for practical modeling.

Core Mechanic 1: Autocorrelation and Partial Autocorrelation (ACF & PACF)

ACF and PACF are the primary diagnostic tools for identifying the dependence structure in a stationary time series. They are analogous to the correlation and partial correlation concepts from Covariance and Correlation and use Linear Regression for PACF computations.

Definitions and sample formulas

  • The theoretical autocovariance function (ACVF) is $\gamma_k = \operatorname{Cov}(X_t,X_{t-k})$. The theoretical autocorrelation function (ACF) is $\rho_k = \gamma_k/\gamma_0$.
  • For an observed series $x_1,\dots,x_n$, the sample autocovariance at lag $k$ is often taken as

$$\hat\gamma_k = \frac{1}{n}\sum_{t=k+1}^{n} (x_t-\bar x)(x_{t-k}-\bar x).$$

Example: with $[2,4,6,8]$ we computed $\hat\gamma_1=1.25$ and $\hat\gamma_0=5$, so $\hat\rho_1=0.25$ (see Section 1). If you prefer the denominator $n-k$, replace $1/n$ with $1/(n-k)$; both conventions are used in practice.

Properties of ACF for simple models (critical identification clues)

  • AR(1): $X_t=\phi X_{t-1}+\varepsilon_t$ with $|\phi|<1$ has theoretical autocorrelation $\rho_k = \phi^k$. Numeric example: $\phi=0.6$ gives $\rho_1=0.6$, $\rho_2=0.36$, $\rho_3=0.216$, etc.
  • MA(1): $X_t=\varepsilon_t+\theta\varepsilon_{t-1}$ has $\rho_1=\theta/(1+\theta^2)$ and $\rho_k=0$ for $k>1$. Numeric example: $\theta=0.5$ gives $\rho_1=0.5/(1+0.25)=0.5/1.25=0.4$, and $\rho_2=0$.

Thus the ACF of AR(1) decays exponentially, while the ACF of MA(1) cuts off after lag 1. This distinction is foundational for identification.
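The two patterns can be tabulated directly from the formulas. A minimal numpy sketch:

```python
import numpy as np

# Theoretical ACFs for AR(1) with phi = 0.6 and MA(1) with theta = 0.5:
# the AR(1) ACF decays geometrically; the MA(1) ACF cuts off after lag 1.
phi, theta = 0.6, 0.5
lags = np.arange(1, 4)
acf_ar = phi ** lags                                       # [0.6, 0.36, 0.216]
acf_ma = np.where(lags == 1, theta / (1 + theta**2), 0.0)  # [0.4, 0.0, 0.0]
print(acf_ar, acf_ma)
```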

Partial Autocorrelation Function (PACF)

  • The PACF at lag $k$, denoted $\alpha_k$ or $\phi_{kk}$, is the coefficient on $X_{t-k}$ in the best linear predictor of $X_t$ using $X_{t-1},\dots,X_{t-k}$:

$$X_t = \beta_0 + \beta_1 X_{t-1} + \dots + \beta_k X_{t-k} + u_t,$$

so $\alpha_k = \beta_k$ from that regression. For a stationary AR(p), the PACF cuts off after lag $p$ (i.e., $\alpha_k=0$ for $k>p$). For MA(q), the PACF decays gradually.

Computational methods

  • The sample ACF is computed directly from $\hat\gamma_k$ as $\hat\rho_k = \hat\gamma_k/\hat\gamma_0$.
  • The sample PACF can be estimated by fitting the regression above for each $k$ (this uses Linear Regression knowledge). Another route uses the Durbin–Levinson recursion or solves the Yule–Walker linear systems; both rely on sample autocovariances.

Small numeric demonstration of PACF via regression. Data: $x=[1.2, 0.9, 1.1, 1.4, 1.3]$, with $\bar x=1.18$. To estimate the PACF at lag 2 (i.e., the coefficient on $X_{t-2}$ when regressing on lags 1 and 2), form the design matrix and response for $t=3,\dots,5$:

$$X = \begin{pmatrix} x_2 & x_1 \\ x_3 & x_2 \\ x_4 & x_3 \end{pmatrix} = \begin{pmatrix} 0.9 & 1.2 \\ 1.1 & 0.9 \\ 1.4 & 1.1 \end{pmatrix}, \qquad y = \begin{pmatrix} x_3 \\ x_4 \\ x_5 \end{pmatrix} = \begin{pmatrix} 1.1 \\ 1.4 \\ 1.3 \end{pmatrix}.$$

Run a small OLS: $\beta=(X^TX)^{-1}X^Ty$. Numeric sketch: $X^TX$ has upper-left entry $0.9^2+1.1^2+1.4^2 = 0.81+1.21+1.96 = 3.98$, off-diagonal entry $0.9\cdot1.2+1.1\cdot0.9+1.4\cdot1.1 = 1.08+0.99+1.54 = 3.61$, and lower-right entry $1.2^2+0.9^2+1.1^2 = 1.44+0.81+1.21 = 3.46$. Solve for the coefficients; the lag-2 coefficient is the PACF estimate. This step explicitly uses Linear Regression.
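The OLS sketch above can be carried out numerically, following the text's no-intercept design; `np.linalg.lstsq` solves the least-squares problem:

```python
import numpy as np

# Lag-2 PACF regression from the text: regress x_t on (x_{t-1}, x_{t-2})
# for t = 3..5 (no intercept, as in the sketch) and read off the
# coefficient on x_{t-2}.
x = np.array([1.2, 0.9, 1.1, 1.4, 1.3])
X = np.column_stack([x[1:4], x[0:3]])   # rows: [x_{t-1}, x_{t-2}]
y = x[2:5]                              # [1.1, 1.4, 1.3]
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta[1])   # coefficient on x_{t-2}, the PACF(2) estimate (about 0.347)
```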

Interpreting ACF/PACF plots

  • If ACF decays slowly and PACF cuts off after p lags: suggest AR(p).
  • If ACF cuts off after q lags and PACF decays: suggest MA(q).
  • Mixed decay patterns suggest ARMA models.

Statistical significance: for large $n$, sample autocorrelations under white noise are approximately $N(0,1/n)$. A rough 95% confidence band is $\pm 1.96/\sqrt{n}$. Example: if $n=100$, the band is $\pm 0.196$. Any sample ACF outside that band suggests significant autocorrelation.

Thus ACF and PACF, combined with numerical rules and plots, are your first discovery tools in the Box–Jenkins cycle (coming later). They depend directly on Covariance and Correlation ideas and on Linear Regression for PACF estimation.

Core Mechanic 2: AR, MA, ARMA, ARIMA Models and Estimation

We now define the principal parametric families and show core formulas and estimation strategies. These models are linear in either past values or past shocks and are the workhorses of time series forecasting.

Autoregressive (AR) models

  • AR(p) model:

$$X_t = \phi_1 X_{t-1} + \phi_2 X_{t-2} + \dots + \phi_p X_{t-p} + \varepsilon_t,$$

where $\varepsilon_t$ is white noise with mean 0 and variance $\sigma^2$. This is analogous to regressing $X_t$ on its past p values (Linear Regression). To fit AR(p), one can use OLS on lagged regressors when the series is mean zero, or include an intercept if needed. Another method uses the Yule–Walker equations derived from autocovariances.

Yule–Walker equations (for AR(p))

The theoretical autocovariances of a stationary AR(p) satisfy

$$\gamma_k = \phi_1 \gamma_{k-1} + \dots + \phi_p \gamma_{k-p}, \qquad k \ge 1,$$

together with $\gamma_0 = \phi_1 \gamma_1 + \dots + \phi_p \gamma_p + \sigma^2$. Stacking the equations for $k=1,\dots,p$ gives the linear system

$$\begin{pmatrix}\gamma_1 \\ \gamma_2 \\ \vdots \\ \gamma_p\end{pmatrix} = \begin{pmatrix}\gamma_0 & \gamma_1 & \dots & \gamma_{p-1} \\ \gamma_1 & \gamma_0 & \dots & \gamma_{p-2} \\ \vdots & & \ddots & \\ \gamma_{p-1} & \gamma_{p-2} & \dots & \gamma_0\end{pmatrix} \begin{pmatrix}\phi_1 \\ \phi_2 \\ \vdots \\ \phi_p\end{pmatrix}.$$

Substituting sample autocovariances and solving yields the Yule–Walker estimators. Example: for AR(1) the system reduces to $\gamma_1 = \phi \gamma_0$, so $\phi = \gamma_1/\gamma_0$. Numerically, using the sample values from Section 1 where $\hat\gamma_1=1.25$ and $\hat\gamma_0=5$, the Yule–Walker estimate is $\hat\phi = 1.25/5 = 0.25$.
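A generic Yule–Walker solver is a few lines of numpy. A sketch in which the Toeplitz matrix is built from sample autocovariances with the denominator-$n$ convention:

```python
import numpy as np

def yule_walker(x, p):
    # Yule-Walker AR(p) estimate from sample autocovariances (denominator n).
    x = np.asarray(x, dtype=float)
    n, xbar = len(x), x.mean()
    g = np.array([np.sum((x[k:] - xbar) * (x[:n - k] - xbar)) / n
                  for k in range(p + 1)])
    # Toeplitz matrix of autocovariances: R[i, j] = gamma_{|i-j|}.
    R = np.array([[g[abs(i - j)] for j in range(p)] for i in range(p)])
    return np.linalg.solve(R, g[1:p + 1])

phi_hat = yule_walker([2, 4, 6, 8], p=1)
print(phi_hat)   # [0.25], i.e. gamma1/gamma0 = 1.25/5
```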

Moving Average (MA) models

  • MA(q) model:

$$X_t = \varepsilon_t + \theta_1 \varepsilon_{t-1} + \dots + \theta_q \varepsilon_{t-q}.$$

The parameters $\theta_j$ multiply the unobserved past shocks $\varepsilon_{t-j}$, making direct linear regression impossible. Estimation typically proceeds via maximum likelihood or nonlinear least squares. However, the MA autocovariances have closed-form expressions in terms of $\theta$ and $\sigma^2$, which you can equate to sample autocovariances for method-of-moments estimation at small orders. Example: for MA(1), $\rho_1 = \theta/(1+\theta^2)$. If the sample value is $\hat\rho_1=0.4$, solving $0.4 = \theta/(1+\theta^2)$ gives $0.4\theta^2 - \theta + 0.4=0$. The discriminant is $1-4\cdot0.4^2=1-0.64=0.36$, so $\theta=(1\pm 0.6)/0.8$: either $(1-0.6)/0.8=0.5$ or $(1+0.6)/0.8=2$. The invertibility constraint $|\theta|<1$ selects $\theta=0.5$.
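The quadratic and the root selection can be done with `np.roots`. A minimal sketch:

```python
import numpy as np

# Method-of-moments for MA(1): solve rho1 = theta/(1 + theta^2), i.e.
# rho1*theta^2 - theta + rho1 = 0, and keep the invertible root |theta| < 1.
rho1 = 0.4
roots = np.roots([rho1, -1.0, rho1])     # roots of 0.4 t^2 - t + 0.4
theta = roots[np.abs(roots) < 1][0]
print(roots, theta)   # roots 2.0 and 0.5; invertible root 0.5
```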

ARMA(p,q) models

  • ARMA(p,q) combines the AR and MA parts:

$$X_t = \phi_1 X_{t-1} + \dots + \phi_p X_{t-p} + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \dots + \theta_q \varepsilon_{t-q}.$$

Estimation is typically via maximum likelihood (often numerically executed) or conditional least squares.

ARIMA(p,d,q): dealing with nonstationarity

  • If the series is non-stationary, differencing can remove trends. A first difference is $Y_t = X_t - X_{t-1}$. The ARIMA(p,d,q) model says the d-th difference of $X_t$ follows an ARMA(p,q). For example, a random walk $X_t = X_{t-1} + \varepsilon_t$ is ARIMA(0,1,0).

Forecasting with AR models (closed-form)

  • For AR(1) with mean zero, $X_t = \phi X_{t-1} + \varepsilon_t$, the h-step-ahead forecast from time t is

$$\hat X_{t+h|t} = \phi^h X_t.$$

Numeric example: with $\phi=0.6$ and $X_t=10$, the 3-step forecast is $0.6^3\cdot 10 = 0.216\cdot 10 = 2.16$. With a nonzero mean $\mu$, the forecast becomes

$$\hat X_{t+h|t} = \mu + \phi^h (X_t-\mu).$$
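The forecast formula collapses to a one-liner. A sketch covering both the zero-mean and nonzero-mean cases:

```python
def ar1_forecast(x_t, phi, h, mu=0.0):
    # h-step-ahead AR(1) point forecast: mu + phi^h (x_t - mu).
    return mu + phi**h * (x_t - mu)

print(ar1_forecast(10.0, 0.6, 3))         # 0.216 * 10, approximately 2.16
print(ar1_forecast(10.0, 0.6, 3, mu=2.0)) # 2 + 0.216 * 8, approximately 3.728
```

As h grows, $\phi^h \to 0$ and the forecast reverts to the mean $\mu$, which is the hallmark of a stationary AR(1).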

Estimation practicalities and model selection

  • Fit candidate models (identified via ACF/PACF), estimate parameters (Yule–Walker/OLS/ML), check diagnostics (residuals should resemble white noise), and compare models with AIC/BIC. Use overfitting caution: more parameters reduce residual variance but can harm out-of-sample forecasts.

Identifiability and invertibility

  • AR models require that the roots of the characteristic polynomial $1-\phi_1 z - \dots - \phi_p z^p$ lie outside the unit circle for stationarity. MA models require the invertibility condition: the roots of $1+\theta_1 z + \dots + \theta_q z^q$ lie outside the unit circle. Checking these ensures unique parameterizations.

Concrete example illustrating the estimation workflow: suppose the sample ACF shows strong decay and the PACF cuts off after 2 lags, suggesting AR(2). Using the sample autocovariances $\hat\gamma_1,\hat\gamma_2$ we solve the Yule–Walker linear system for $\hat\phi_1,\hat\phi_2$, compute residuals $\hat\varepsilon_t$, and estimate $\hat\sigma^2$ as the mean squared residual. This combines Covariance and Correlation (autocovariances) and Linear Regression (residual analysis).

In short, AR/MA/ARMA/ARIMA provide interpretable, mathematically tractable families; Yule–Walker and OLS give simple estimation for AR models, while MA parameters often need likelihood-based methods.

Applications and Connections: Box–Jenkins Methodology and Real-World Use

The Box–Jenkins (BJ) methodology is a structured four-step practical workflow for building ARIMA-class models. It turns the theoretical tools (ACF/PACF, AR/MA definitions, stationarity tests) into an applied recipe. Each step references ideas you've seen in previous sections and relies on Covariance and Correlation and Linear Regression for computations.

Box–Jenkins steps

1) Identification: Use plots of the time series, ACF, and PACF to decide whether differencing is needed and to propose model orders p and q. Look for patterns: slow ACF decay suggests nonstationarity or AR behavior; ACF cutoffs help detect MA terms. Example: If ACF decays slowly and first differences look stationary, set d=1 and then examine the ACF/PACF of differenced series.

2) Estimation: Fit candidate ARIMA(p,d,q) models via maximum likelihood or conditional least squares; estimate the parameters $\phi,\theta,\sigma^2$. For AR parts you might use Yule–Walker to get initial estimates (a closed-form start) and then refine via ML. Example: fit ARIMA(1,1,1) to data; the differencing makes the series stationary before estimation.

3) Diagnostic checking: Analyze the residuals $\hat\varepsilon_t$ from the fitted model. The residual ACF should show no significant autocorrelation. Perform the Ljung–Box test for joint autocorrelation up to lag m. Residuals should be approximately uncorrelated with constant variance and mean zero. If diagnostics fail, go back to step 1 and refine the model orders.

4) Forecasting: With a validated model, compute forecasts and forecast intervals. ARIMA models allow iterative computation of point forecasts and mean squared error estimates for prediction intervals.

Concrete real-world examples and use cases

  • Economics: GDP and inflation series are often modeled with ARIMA for medium-term forecasting. Example: GDP growth series are often stationary after first differencing (d=1), motivating ARIMA(p,1,q) fits.
  • Finance: Daily returns are often roughly stationary and may require ARMA to capture serial dependence; however, volatility clustering is typically modeled with GARCH, which builds on AR foundations.
  • Energy: Electricity load often displays strong daily/weekly seasonality. Seasonal ARIMA (SARIMA) extends ARIMA with seasonal differencing and seasonal AR/MA terms: SARIMA(p,d,q)(P,D,Q)_s. Example: hourly load with daily seasonality s=24 might need D=1 or SAR terms.

Seasonality and SARIMA: when periodic patterns exist, include seasonal terms. For seasonal period s, a SARIMA model includes factors like $(1 - \Phi_1 B^s - \dots)$ for seasonal AR and seasonal differencing $(1-B^s)^D$. Numeric example: for monthly data with annual seasonality (s=12), seasonal differencing with D=1 is $Y_t = (1-B^{12})X_t = X_t - X_{t-12}$.
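Seasonal differencing is a single array operation. The sketch below applies $(1-B^{12})$ to a hypothetical trend-plus-seasonal series (the series itself is made up for illustration):

```python
import numpy as np

# Hypothetical monthly series: linear trend plus an annual sine pattern.
s = 12
t = np.arange(36)
x = 10 + 0.5 * t + 3 * np.sin(2 * np.pi * t / s)

# Seasonal difference (1 - B^12): Y_t = X_t - X_{t-12}.
y = x[s:] - x[:-s]
print(y)   # the seasonal component cancels; the constant 0.5 * 12 = 6 remains
```

Note that seasonal differencing alone also removed the linear trend here (it reduced it to a constant); in general you choose d and D based on what the differenced series looks like.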

Model evaluation and selection

  • Use information criteria: AIC $= -2\log L + 2k$ and BIC $= -2\log L + k\log n$, where $k$ is the number of parameters. Lower values are preferred. Example: compare AR(1) (small $k$) vs ARMA(1,1) (larger $k$); pick the model minimizing AIC, with parsimony in mind.
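The two criteria are one-line formulas. A sketch in which the log-likelihood values are hypothetical, purely for illustration:

```python
import numpy as np

def aic(loglik, k):
    # AIC = -2 log L + 2k
    return -2 * loglik + 2 * k

def bic(loglik, k, n):
    # BIC = -2 log L + k log n
    return -2 * loglik + k * np.log(n)

# Hypothetical fits on n = 200 points:
# AR(1) with logL = -150, k = 2 vs ARMA(1,1) with logL = -148, k = 3.
print(aic(-150, 2), aic(-148, 3))          # 304 vs 302: ARMA(1,1) wins on AIC
print(bic(-150, 2, 200), bic(-148, 3, 200))
```

Because BIC penalizes parameters more heavily (log n > 2 for n > 7), it can prefer the smaller model even when AIC does not, as happens here.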

Connections to machine learning and further topics

  • State-space models and the Kalman filter generalize ARIMA and provide efficient likelihood computation and handling of missing data; they underpin many time-series ML methods. Recurrent neural networks and sequence models are flexible alternatives but lack statistical interpretability.

Practical pitfalls and tips

  • Over-differencing can destroy structure and induce invertibility issues: only difference to achieve stationarity (use tests like the augmented Dickey–Fuller test).
  • Inspect residuals visually and with tests; small sample sizes make ACF/PACF noisy.
  • Start with simple models (parsimony) and gradually increase complexity.

A short end-to-end numeric illustration

Suppose you have monthly sales series and see a linear upward trend. First difference to remove trend (d=1). The differenced series shows ACF cutting off at lag 1 and PACF decaying: candidate ARIMA(0,1,1). Fit MA(1) on differenced data; estimate θ0.5\theta\approx0.5 and residuals look white. Forecast by integrating predicted differences forward to recover level forecasts. This is the Box–Jenkins loop in action.

In summary, Box–Jenkins ties together stationarity tests, ACF/PACF-guided identification, estimation (Yule–Walker, OLS, ML), and diagnostic checking to produce reliable forecasts. The concrete algebra and numerical examples throughout this lesson show how concepts from Covariance and Correlation and Linear Regression are repurposed to handle temporal dependence and forecasting tasks.

Worked Examples (3)

Compute sample ACF for a small series

You observe the series x = [2, 4, 6, 8]. Compute the sample mean, sample autocovariance at lags 0 and 1 using denominator n=4, and the sample autocorrelation at lag 1.

  1. Compute the sample mean: $\bar x = (2+4+6+8)/4 = 20/4 = 5$.

  2. Compute the sample autocovariance at lag 0: $\hat\gamma_0 = \frac{1}{4}\sum_{t=1}^4 (x_t-\bar x)^2$. Squared deviations: $(2-5)^2=9$, $(4-5)^2=1$, $(6-5)^2=1$, $(8-5)^2=9$. Sum $=20$, so $\hat\gamma_0 = 20/4 = 5$.

  3. Compute the sample autocovariance at lag 1: $\hat\gamma_1 = \frac{1}{4}\sum_{t=2}^4 (x_t-\bar x)(x_{t-1}-\bar x)$. Terms: $(4-5)(2-5)=3$, $(6-5)(4-5)=-1$, $(8-5)(6-5)=3$. Sum $=5$, so $\hat\gamma_1 = 5/4 = 1.25$.

  4. Compute the sample autocorrelation at lag 1: $\hat\rho_1 = \hat\gamma_1 / \hat\gamma_0 = 1.25/5 = 0.25$.

  5. Summary: $\bar x=5$, $\hat\gamma_0=5$, $\hat\gamma_1=1.25$, $\hat\rho_1=0.25$.

Insight: This example shows directly how the sample autocovariance and autocorrelation are computed from data and connects to the Covariance and Correlation prerequisite: lagged pairs play the same role as paired variables in a standard correlation computation.

Identify AR(1) vs MA(1) from theoretical ACF

Consider two processes: (A) AR(1) with $\phi=0.6$ and noise variance $\sigma^2=1$; (B) MA(1) with $\theta=0.5$ and noise variance $\sigma^2=1$. Compute the theoretical ACF at lags 1, 2, 3 for both and use these to argue which pattern each would show on an ACF plot.

  1. AR(1) theoretical ACF: $\rho_k = \phi^k$.

    With $\phi=0.6$: $\rho_1=0.6$, $\rho_2=0.6^2=0.36$, $\rho_3=0.6^3=0.216$.

  2. MA(1) theoretical ACF: $\rho_1 = \theta/(1+\theta^2)$ and $\rho_k=0$ for $k>1$.

    With $\theta=0.5$: $\rho_1=0.5/(1+0.25)=0.5/1.25=0.4$, $\rho_2=0$, $\rho_3=0$.

  3. Interpretation: AR(1) ACF decays exponentially (0.6, 0.36, 0.216, ...). MA(1) ACF has a single non-zero lag (0.4 at lag 1) then zeros. So on a sample ACF plot, AR(1) would show gradually decreasing bars while MA(1) would show a significant bar at lag 1 and non-significant bars beyond.

  4. Thus, if you observe ACF cutting off after lag 1, MA(1) is likely; if it decays, AR(1) is likely. PACF patterns provide complementary evidence.

Insight: This demonstrates the identification rule-of-thumb used in Box–Jenkins: ACF decay vs cut-off helps distinguish AR vs MA behavior. The numeric computations show concrete numbers you would expect in plots.

Yule–Walker estimation for AR(1)

Using the sample autocovariances from worked example 1 ($\hat\gamma_0=5$, $\hat\gamma_1=1.25$), estimate the AR(1) coefficient $\phi$ via Yule–Walker and compute the implied residual variance using $\hat\sigma^2 = \hat\gamma_0 - \hat\phi\hat\gamma_1$.

  1. Yule–Walker for AR(1) gives $\phi = \gamma_1 / \gamma_0$. Substituting the sample values: $\hat\phi = 1.25 / 5 = 0.25$.

  2. Compute the residual variance estimate: $\hat\sigma^2 = \hat\gamma_0 - \hat\phi \hat\gamma_1$ (this follows from the variance decomposition for AR(1)). Plug in the numbers: $\hat\sigma^2 = 5 - 0.25 \cdot 1.25$.

  3. Since $0.25\cdot 1.25 = 0.3125$, we get $\hat\sigma^2 = 5 - 0.3125 = 4.6875$.

  4. Thus the estimated AR(1) model is $X_t = 0.25 X_{t-1} + \varepsilon_t$ with estimated noise variance $4.6875$.

  5. Check interpretation: the relatively small sample and short series make estimates noisy; in practice we use longer series and refine via maximum likelihood.

Insight: This example shows how Yule–Walker turns autocovariances into AR coefficients. It leverages the link between autocovariance structure and AR parameters and provides a direct, algebraic estimator without optimization.

Key Takeaways

  • Stationarity (especially weak stationarity) means constant mean and autocovariance depends only on lag; AR(1) is stationary iff |φ|<1, while a random walk is not.

  • Autocorrelation (ACF) and partial autocorrelation (PACF) are diagnostic tools: ACF measures linear dependence at lags; PACF measures direct dependence removing intermediates and is computed via regressions (use Linear Regression).

  • AR(p) models regress current value on past p values; MA(q) models are linear filters of past shocks; ARMA combines them; ARIMA adds differencing to handle nonstationarity.

  • Identification: AR models imply PACF cuts off after p and ACF decays; MA models imply ACF cuts off after q and PACF decays—use these rules in the Box–Jenkins identification step.

  • Yule–Walker and OLS give closed-form or simple linear-system estimators for AR parameters; MA and ARMA often require likelihood-based numerical optimization.

  • Box–Jenkins cycle (identify, estimate, diagnose, forecast) is a practical, iterative approach for building ARIMA-class models with checks via residual analysis and information criteria.

  • Every computational step builds on Covariance and Correlation (autocovariances) and Linear Regression (estimating lagged relationships or residuals).

Common Mistakes

  • Confusing stationarity with no trend: stationarity requires constant variance and autocovariances dependent only on lag, not merely detrending. Example: a process with stable seasonal variance but drifting mean is non-stationary unless differenced.

  • Using ACF/PACF rules mechanically with very small samples: sample ACF/PACF are noisy for small n; false cutoffs or apparent decays can mislead identification.

  • Treating MA parameters via OLS directly: MA models involve unobserved shocks $\varepsilon_t$, so you cannot regress $X_t$ on past shocks; estimation requires likelihood-based or specialized methods.

  • Over-differencing a series: differencing more times than needed can introduce moving-average-like structure and degrade forecasts. Use tests (e.g., ADF) and visual inspection before differencing.

Practice

easy

Easy: Given a stationary AR(1) process X_t = 0.8 X_{t-1} + ε_t with Var(ε_t)=1, compute the theoretical autocorrelation ρ_1 and ρ_3, and the stationary variance Var(X_t).

Hint: Use ρ_k = φ^k and Var(X_t)=σ^2/(1-φ^2).

Show solution

ρ_1 = 0.8, ρ_3 = 0.8^3 = 0.512. Var(X_t) = 1/(1-0.8^2)=1/(1-0.64)=1/0.36 ≈ 2.7778.

medium

Medium: You observe sample autocorrelations $\hat\rho_1=0.45$ and $\hat\rho_2=0.2$ from a long series (large n). Using the Yule–Walker equations for AR(2), solve approximately for $\phi_1$ and $\phi_2$ from the linear system $[\hat\rho_1,\hat\rho_2]^T = \begin{pmatrix}1 & \hat\rho_1 \\ \hat\rho_1 & 1\end{pmatrix}[\phi_1,\phi_2]^T$.

Hint: Solve the 2×2 linear system: invert the matrix $\begin{pmatrix}1 & \hat\rho_1 \\ \hat\rho_1 & 1\end{pmatrix}$ and multiply by $[\hat\rho_1,\hat\rho_2]^T$.

Show solution

Matrix A = [[1, 0.45], [0.45, 1]]. Determinant = 1 − 0.45² = 1 − 0.2025 = 0.7975. Inverse = (1/0.7975)·[[1, −0.45], [−0.45, 1]]. Multiply by the vector r = [0.45, 0.2]: φ = (1/0.7975)·[1·0.45 − 0.45·0.2, −0.45·0.45 + 1·0.2] = (1/0.7975)·[0.45 − 0.09, −0.2025 + 0.2] = (1/0.7975)·[0.36, −0.0025] ≈ [0.4514, −0.0031]. So φ1 ≈ 0.451, φ2 ≈ −0.003.
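The same 2×2 solve in numpy, mirroring the hand calculation:

```python
import numpy as np

# AR(2) Yule-Walker system in correlation form:
# [rho1, rho2]^T = [[1, rho1], [rho1, 1]] @ [phi1, phi2]^T
rho1, rho2 = 0.45, 0.2
A = np.array([[1.0, rho1], [rho1, 1.0]])
phi = np.linalg.solve(A, np.array([rho1, rho2]))
print(phi)   # approximately [0.451, -0.003]
```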

hard

Hard: Suppose you have monthly data with a strong annual seasonality (period s=12) and a linear upward trend. Outline and justify a Box–Jenkins modeling plan: preprocessing steps, identification choices (including possible seasonal orders), and how you would check residuals. Be explicit about differencing and SARIMA components.

Hint: Remove trend via differencing (first difference). Consider seasonal differencing (1-B^12). After differencing, examine ACF/PACF at seasonal lags (12,24) and nonseasonal lags to propose (p,d,q)(P,D,Q)_12. Use residual ACF/Ljung–Box to check.

Show solution

Plan: 1) Take first difference to remove linear trend: Y_t = X_t - X_{t-1} (d=1). 2) Check seasonality: compute ACF of Y_t; if strong spikes at lags 12,24,... take seasonal difference (D=1): Z_t=(1-B^12)Y_t = (1-B)(1-B^{12})X_t with total differencing d=1,D=1. 3) On stationary Z_t, plot ACF/PACF. If ACF shows spike at lag 12 and PACF decays, consider seasonal MA term Q=1; if PACF spikes at lag 12 and ACF decays, consider seasonal AR P=1. For nonseasonal structure, use ACF/PACF at low lags to suggest p and q. For example candidate SARIMA(p,1,q)(P,1,Q)_12. 4) Fit candidate models (e.g., SARIMA(0,1,1)(0,1,1)_12), estimate parameters via ML, and inspect residuals: plot residual ACF (should be within ±1.96/√n), perform Ljung–Box test up to lag 24 or more to test joint correlation, check residual histogram/QQ-plot for normality assumption if needed. 5) If diagnostics fail, iterate: try different p,q,P,Q or additional differencing cautiously. 6) Validate via out-of-sample forecast performance. This plan explicitly uses seasonal differencing and SARIMA components and relies on ACF/PACF identification and diagnostic checks.

Connections

Looking back: This lesson reuses core ideas from your prerequisites. From Covariance and Correlation we extended covariance to the autocovariance $\gamma_k=\operatorname{Cov}(X_t,X_{t-k})$ and defined the autocorrelation $\rho_k$, which are the numerical heart of the ACF. From Linear Regression we borrowed the idea of projecting onto explanatory variables: the PACF at lag k is the coefficient on $X_{t-k}$ when regressing $X_t$ on lags $1,\dots,k$, and OLS/residual calculations underpin AR estimation and diagnostics. Looking forward: mastering these foundations enables work on forecasting (point and interval forecasts), state-space models and the Kalman filter (used in engineering and econometrics), volatility modeling (GARCH models in finance), seasonal modeling (SARIMA), and modern probabilistic sequence models. Many machine learning sequence methods (RNNs, LSTMs) often require careful feature engineering and stationarity-aware preprocessing that derive from the ideas covered here. Specific downstream concepts that require this foundation include: Kalman filtering for state-space inference, GARCH for volatility clustering (which presumes understanding of serial dependence), and structural time series for decomposing trend/seasonality — all use ACF/PACF intuition, differencing/ARMA parametrizations, and residual diagnostics taught in this lesson.
