Notes on Stochastic Systems

Combined notes for part of EECS 501 & MECHENG 549.


Prerequisites: Knowledge of Elementary Calculus, Linear Algebra and Probability

Discrete-Time Stochastic System

Stochastic Sequences

  • Definition: Given an index $k\in\mathbb{K}\subseteq\mathbb{Z}$ ranging over a set of integers, a map $\mathcal{X}(k,\omega): (\Omega,\mathcal{F},\mathbb{P})\to(\mathbb{R}^n,\mathcal{F}_ \mathcal{X},\mathbb{P}_ \mathcal{X})$ is called a random/stochastic sequence.
  • Uncertainties: Consider a causal system $F$ that relates scalar inputs $u(k)$ to an output $x(k)$
    • Epistemic/Model uncertainty: $\mathcal{X}(k,\omega)=F(k,u(k),u(k-1),\ldots,\omega)$ (the system is stochastic and the input is deterministic).
    • Aleatoric/Input uncertainty: $\mathcal{X}(k,\omega)=f(k,U(k,\omega),U(k-1,\omega),\ldots)$ (the system is deterministic and the input is stochastic).
  • Realization: An outcome $\mathcal{X}(k,\omega)=x(k)$ for a given $\omega$ is called a realization of the stochastic sequence $\mathcal{X}$.
  • Terminology and Convention
    • $\mathcal{X}(k,\omega)$ is often written as $\mathcal{X}(k)$ when there’s no ambiguity.
    • $\mathbb{K}=\mathbb{Z}$ if not specified.
    • The subsequence over a set $\mathcal{K}_1\subseteq\mathbb{K}$ is denoted $\mathcal{X}(\mathcal{K}_1)$.
    • $\mathcal{X}$ denotes $\mathcal{X}(\mathbb{K})$ if not specified.
    • Consecutive subsequence: $$\mathcal{X}(k:l)=\{\mathcal{X}(k),\mathcal{X}(k+1),\ldots,\mathcal{X}(l)\},\;x(k:l)=\{x(k),x(k+1),\ldots,x(l)\}$$
    • Abbreviations:
      • SS - stochastic sequence
      • IID - independent and identically distributed

Probabilistic characterization

  • Distribution and density: $$F_ \mathcal{X}\left(k:l;x(k:l)\right)\equiv\mathbb{P}((\mathcal{X}_ i(k)\leqslant x_i(k))\cap\cdots\cap(\mathcal{X}_ i(l)\leqslant x_ i(l)),\;i=1\ldots n)$$ $$f_ \mathcal{X}\left(k:l;x(k:l)\right)\equiv \frac{\partial^{n(l-k+1)}}{\partial x_ 1(k)\cdots\partial x_ n(l)}F_ \mathcal{X}(k:l;x(k:l))$$ Here $k:l$ denotes a set of consecutive integers; it can also be replaced by an arbitrary index set $\{k,l\}$ or a single index $k$.
  • Ensemble Average: $\mathbb{E}[\psi(\mathcal{X}(k))]$, averaging over different realizations at the same time $k$
    • Mean: $\mu_ \mathcal{X}(k)\equiv\mathbb{E}[\mathcal{X}(k)]=\int^\infty_{-\infty}x(k)f_ \mathcal{X}(k;x(k))\mathrm{d}x(k)$
    • Conditional Mean: $\mu_ \mathcal{X}(l|k)\equiv\mathbb{E}[\mathcal{X}(l)|\mathcal{X}(k)=x(k)]$
  • Time Average: $\frac{1}{2N+1}\sum^N_{k=-N}\psi(\mathcal{X}(k))$, averaging over different times $k$ of the same realization
  • Autocorrelation:
    • Scalar case: $r_ \mathcal{X}(k,l)\equiv\mathbb{E}[\mathcal{X}(k)\mathcal{X}(l)]=\int^\infty_{-\infty}\int^\infty_{-\infty}x(k)x(l)f_ \mathcal{X}(k,l;x(k,l))\mathrm{d}x(k)\mathrm{d}x(l)$
    • Vector case: $R_ \mathcal{X}(k,l)\equiv\mathbb{E}[\mathcal{X}(k)\mathcal{X}^\top(l)]$
    • Conditional autocorrelation: $R_ \mathcal{X}(k,l|q)\equiv\mathbb{E}[\mathcal{X}(k)\mathcal{X}^\top(l)|\mathcal{X}(q)=x(q)]$
    • Often we denote $C_ \mathcal{X}(k)=R_ \mathcal{X}(k,k)$
  • Autocovariance:
    • Scalar case: $\kappa_ \mathcal{X}(k,l)\equiv\mathbb{E}[(\mathcal{X}(k)-\mu_ \mathcal{X}(k))(\mathcal{X}(l)-\mu_ \mathcal{X}(l))]$
    • Vector case: $\mathrm{K}_ \mathcal{X}(k,l)\equiv\mathbb{E}[(\mathcal{X}(k)-\mu_ \mathcal{X}(k))(\mathcal{X}(l)-\mu_ \mathcal{X}(l))^\top]$
    • Conditional autocovariance: $\mathrm{K}_ \mathcal{X}(k,l|q)\equiv\mathbb{E}[(\mathcal{X}(k)-\mu_ \mathcal{X}(k|q))(\mathcal{X}(l)-\mu_ \mathcal{X}(l|q))^\top]$
    • Often we denote $S_ \mathcal{X}(k|q)=\mathrm{K}_ \mathcal{X}(k,k|q)$
    • Useful conclusion: $\mathrm{K}(a,b)=\mathrm{K}(b,a)^T$
    • Normalized (autocorrelation coefficient): $\rho_ \mathcal{X}(k,l)\equiv\mathrm{K}_ \mathcal{X}(k,l)/(\sigma_{\mathcal{X}(k)}\sigma_{\mathcal{X}(l)})$
  • Strong stationarity (a.k.a. strict-sense stationarity; in particular, identically distributed over time): $$\forall x(k:l)\in\mathbb{R}^{n(l-k+1)},\;\forall s\in\mathbb{Z},\;f_ \mathcal{X}(k:l;x(k:l))=f_ \mathcal{X}(k+s:l+s;x(k:l))$$
  • Weak stationarity (a.k.a. wide-sense stationarity): for all $k,l,s$, $$\mu_ \mathcal{X}(k)=\mu_ \mathcal{X}(l)\;\text{and}\;\mathrm{K}_ \mathcal{X}(k,l)=\mathrm{K}_ \mathcal{X}(k+s,l+s)\equiv\bar{\mathrm{K}}_ \mathcal{X}(l-k)$$ Weak stationarity is a necessary condition for strong stationarity (the two are equivalent for Gaussian sequences).
  • Ergodicity: $\mathcal{X}$ is called ergodic in $\psi$ if
    1. $\mathbb{E}[\psi(\mathcal{X})]$ is stationary
    2. The ensemble average is equal to the time average, that is $$ \frac{1}{2N+1}\sum^N_{k=-N}\psi(\mathcal{X}(k))\to\mathbb{E}[\psi(\mathcal{X})]\;\text{as}\;N\to \infty$$
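
A small numerical illustration of the ensemble/time-average distinction and of ergodicity in the mean: the sketch below uses a made-up stable scalar AR(1) sequence (an assumption for illustration, not an example from the course), for which the two averages should agree.

```python
import numpy as np

# Hypothetical example: a stable scalar AR(1) sequence X(k+1) = a X(k) + W(k),
# with W(k) iid N(0,1), is stationary and ergodic in the mean, so the time
# average of one long realization should match the ensemble average.
rng = np.random.default_rng(0)
a, n_steps, n_runs = 0.8, 20_000, 2_000

# Ensemble average: many realizations, evaluated at the same (late) time index.
X = np.zeros(n_runs)
for _ in range(n_steps):
    X = a * X + rng.standard_normal(n_runs)
ensemble_mean = X.mean()

# Time average: a single long realization.
x, total = 0.0, 0.0
for _ in range(n_steps):
    x = a * x + rng.standard_normal()
    total += x
time_mean = total / n_steps

print(ensemble_mean, time_mean)   # both close to the true mean 0
```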

Markov Sequence

  • Markov Sequence: A ss. $\mathcal{X}(k)$ is called a (discrete-time) Markov sequence if $$\begin{split}f_ \mathcal{X}\left(k;x(k)\middle|\mathcal{X}(k-1)=x(k-1),\mathcal{X}(k-2)=x(k-2),\ldots\right) \\=f_ \mathcal{X}(k;x(k)|\mathcal{X}(k-1)=x(k-1))\end{split}$$
    • We often make some assumption on the initial condition $\mathcal{X}(0)$, such as known, deterministic, or uniformly distributed within a certain domain.
  • Markov Chains: Markov sequences with a discrete set of values (states) $\{x_1,\ldots,x_m\}$
  • Hidden Markov model: Sequence $\mathcal{Y}$ is called a Hidden Markov Model if it’s modeled by a system of the form $$\begin{align}\mathcal{X}(k+1)&=g(k,\mathcal{X}(k),\mathcal{W}(k)) \\ \mathcal{Y}(k)&=h(k,\mathcal{X}(k),\mathcal{W}(k))\end{align}$$ We also say that $\mathcal{Y}$ has a (discrete-time) stochastic state space.
  • Gaussian-Markov Sequence (GMS): $\mathcal{X}(k+1)=g(k,\mathcal{X}(k),\mathcal{W}(k))$ where $\mathcal{W}(k)$ is i.i.d. Gaussian

Linear Stochastic Sequence

$$\begin{align}\mathcal{X}(k+1)&=A(k)\mathcal{X}(k)+B(k)\mathcal{W}(k) \\ \mathcal{Y}(k)&=C(k)\mathcal{X}(k)+D(k)\mathcal{W}(k)\end{align}$$

For linear Markov sequences, the deterministic mean sequence and the centered uncertain sequence decouple completely, so we often assume that $\mathcal{X}(k)$ and $\mathcal{Y}(k)$ are centered and handle deterministic inputs separately. The equation with deterministic inputs is often written as $$\mathcal{X}(k+1)=A(k)\mathcal{X}(k)+B_u(k)u(k)+B_\mathcal{W}(k)\mathcal{W}(k)$$

  • Recursive-form expectations:

    • Mean: $\mu_ \mathcal{X}(k+1|q)=A(k)\mu_ \mathcal{X}(k|q)+B(k)\mu_\mathcal{W}(k)$
    • Covariance (discrete-time Lyapunov/Stein difference equation): $$S_ \mathcal{X}(k+1|q)=A(k)S_ \mathcal{X}(k|q)A^\top(k)+B(k)S_\mathcal{W}(k)B^\top(k)$$

      The stationary limit of this recursion can be solved with dlyap in MATLAB (see the Python sketch at the end of this list)

  • Convergence when $k\to\infty$:

    • Mean convergence: convergence of $\mu_ \mathcal{X}(k|q)$ requires $\max_i|\lambda_i(A)|<1$ ($\lambda_i$ denotes the eigenvalues of $A$)
    • Covariance convergence: convergence of $S_ \mathcal{X}(k|q)$ requires $\max_i|\lambda_i(A)|<1$
    • (Discrete-time) Lyapunov equation (Stein equation): $\bar{S}_ \mathcal{X}=A\bar{S}_ \mathcal{X}A^\top+B\bar{S}_\mathcal{W}B^\top$. A solution exists iff $A$ is asymptotically stable (and it characterizes the stationary sequence).
  • Explicit state transition: By recursive substitution, $$\mathcal{X}(k)=\Psi(k,q)\mathcal{X}(q)+\sum^{k-1}_ {i=q}\Gamma(k,i)\mathcal{W}(i)$$ where state transition matrix $\Psi(k,q)=\begin{cases}I, &k=q \\ \prod^{k-1}_ {i=q}A(i),& k>q\end{cases}$ and $\Gamma(k,i)=\Psi(k,i+1)B(i)$.

    • Conditioned Mean sequence: $\mu_ \mathcal{X}(k|q)=\Psi(k,q)\mu_ \mathcal{X}(q)+\sum^{k-1}_ {i=q}\Gamma(k,i)\mu_\mathcal{W}(i)$
    • Conditioned Autocovariance Matrix: $$\mathrm{K}_ \mathcal{X}(k,l|q)=\Psi(k,q)S_ \mathcal{X}(q)\Psi^\top(l,q)+\sum^{\min\{k,l\}-1}_ {i=q}\Gamma(k,i)S_\mathcal{W}(i)\Gamma^\top(l,i)$$
      • A special case: $S_ \mathcal{X}(k|q)=\mathrm{K}_ \mathcal{X}(k,k|q)=\sum^{k-1}_ {i=q}\Gamma(k,i)S_\mathcal{W}(i)\Gamma^\top(k,i)$, $S_ \mathcal{X}(k|k)=0$
      • Useful equation (stationary centered case): $$\mathrm{K}_ \mathcal{X}(k,l)=\begin{cases} S_ \mathcal{X}(k)\cdot(A^\top)^{(l-k)}&,l>k\\ S_ \mathcal{X}(k)&,l=k\\ A^{(k-l)}S_ \mathcal{X}(k)&,l< k\end{cases}$$
  • Conditional Autocorrelation Matrix: $R_ \mathcal{X}(k,l|q)=\mathrm{K}_ \mathcal{X}(k,l|q)+\mu_ \mathcal{X}(k|q)\mu_ \mathcal{X}^\top(l|q)$

  • Observation $\mathcal{Y}$ property:

    • Mean: $\mu_ \mathcal{Y}(k|q)=C(k)\mu_ \mathcal{X}(k|q)+D(k)\mu_\mathcal{W}(k)$
    • Covariance: $$\mathrm{K}_ \mathcal{Y}(k,l|q)=\begin{cases}C(k)\mathrm{K}_ \mathcal{X}(k,l|q)C^\top(l)+C(k)\Gamma(k,l)S_\mathcal{W}(l)D^\top(l)&:k>l \\C(k)S_ \mathcal{X}(k|q)C^\top(k)+D(k)S_\mathcal{W}(k)D^\top(k)&:k=l\\ C(k)\mathrm{K}_ \mathcal{X}(k,l|q)C^\top(l)+D(k)S_\mathcal{W}(k)\Gamma^\top(l,k)C^\top(l)&:k< l\end{cases}$$
    • Stationary time-invariant covariance: $$\mathrm{K}_ \mathcal{Y}(s)=\begin{cases}CA^{|s|}\bar{S}_ \mathcal{X}C^\top+CA^{|s|-1}B\bar{S}_ WD^\top&:s\neq 0 \\C\bar{S}_ \mathcal{X}C^\top+D\bar{S}_ WD^\top&:s=0\end{cases}$$
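
The covariance recursion above and its stationary (Stein/Lyapunov) limit can be checked numerically. Below is a minimal sketch with made-up matrices; scipy.linalg.solve_discrete_lyapunov plays the role of MATLAB's dlyap here.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

# Hypothetical LTI example: X(k+1) = A X(k) + B W(k), W iid with covariance Sw.
A = np.array([[0.9, 0.2],
              [0.0, 0.5]])
B = np.array([[1.0],
              [0.3]])
Sw = np.array([[1.0]])

# Propagate the recursion S(k+1) = A S(k) A' + B Sw B' from S(0) = 0.
S = np.zeros((2, 2))
for _ in range(500):
    S = A @ S @ A.T + B @ Sw @ B.T

# Stationary limit from the Stein/discrete Lyapunov equation (dlyap in MATLAB).
S_bar = solve_discrete_lyapunov(A, B @ Sw @ B.T)
print(np.allclose(S, S_bar))   # True, since max |eig(A)| < 1
```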

Gaussian Stochastic Sequence

  • Jointly Gaussian $\Rightarrow, \nLeftarrow$ Marginally Gaussian
  • $c^\top X$ is Gaussian for every deterministic $c$ $\Leftrightarrow$ $X$ is jointly Gaussian
  • Conditional Gaussian: if $X$ and $Y$ are jointly Gaussian, then $X|Y \sim \mathcal{N}(\mu_{X|Y},S_{X|Y})$ where $\mu_{X|Y}=\mu_X+S_{XY}S_Y^{-1}(Y-\mu_Y)$, $S_{X|Y}=S_X-S_{XY}S_Y^{-1}S_{YX}$
  • A linear controllable GMS $\mathcal{X}(k)$ is stationary iff. $A(k)=A, B(k)=B$ (time-invariant) and $A$ is asymptotically stable.
  • All LTI stationary GMS are also ergodic in all finite moments
  • Solve $\mathcal{X}(k+1)=A\mathcal{X}(k)+B\mathcal{W}(k)$ when $\mathcal{X}$ is stationary ($\max_i|\lambda_i(A)|<1$)
    1. solve $\bar{\mu}_ \mathcal{X}=A\bar{\mu}_ \mathcal{X}+B\bar{\mu}_\mathcal{W}$ for $\bar{\mu}$
    2. solve $\bar{S}_ \mathcal{X}=A\bar{S}_ \mathcal{X}A^\top+B\bar{S}_ \mathcal{W}B^\top$ for $\bar{S}_ \mathcal{X}$
    3. calculate $\Sigma_ \mathcal{X}(k:l|q)=\begin{bmatrix}\mathrm{K}_ \mathcal{X}(k,k|q)&\cdots&\mathrm{K}_ \mathcal{X}(k,l|q)\\ \vdots&\ddots&\vdots\\ \mathrm{K}_ \mathcal{X}(l,k|q)&\cdots&\mathrm{K}_ \mathcal{X}(l,l|q)\end{bmatrix}$ using $\bar{S}_ \mathcal{X}$
    4. Then $f_ \mathcal{X}(k:l;x(k:l))$ is determined with $\mu_ \mathcal{X}(k:l)=\{\bar{\mu}_ \mathcal{X},\bar{\mu}_ \mathcal{X}\ldots\bar{\mu}_ \mathcal{X}\}$ and $\Sigma_ \mathcal{X}(k:l)$ above.
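
The four steps above can be carried out numerically. Below is a minimal sketch for a made-up stationary LTI Gauss-Markov sequence (matrices chosen only for illustration): it computes $\bar{\mu}_\mathcal{X}$ and $\bar{S}_\mathcal{X}$ and assembles the joint covariance $\Sigma_\mathcal{X}$ of a window using the relation $\mathrm{K}_\mathcal{X}(k,l)$ given earlier.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

# Hypothetical stationary Gauss-Markov sequence X(k+1) = A X(k) + B W(k),
# with W iid N(mu_w, Sw); the matrices below are only for illustration.
A = np.array([[0.7, 0.1], [0.0, 0.6]])
B = np.eye(2)
mu_w = np.array([0.5, 0.0])
Sw = 0.1 * np.eye(2)
n, window = 2, 4                       # state dimension, window length l-k+1

# Step 1: stationary mean  mu = A mu + B mu_w  =>  (I - A) mu = B mu_w
mu_bar = np.linalg.solve(np.eye(n) - A, B @ mu_w)

# Step 2: stationary covariance  S = A S A' + B Sw B'
S_bar = solve_discrete_lyapunov(A, B @ Sw @ B.T)

# Step 3: joint covariance of X(k:k+window-1), using K(k,l) = A^(k-l) S_bar for k >= l
Sigma = np.zeros((n * window, n * window))
for i in range(window):
    for j in range(window):
        blk = (np.linalg.matrix_power(A, i - j) @ S_bar if i >= j
               else S_bar @ np.linalg.matrix_power(A.T, j - i))
        Sigma[i*n:(i+1)*n, j*n:(j+1)*n] = blk

# Step 4: the window is jointly Gaussian with mean tile(mu_bar) and covariance Sigma.
mu_window = np.tile(mu_bar, window)
print(mu_window, Sigma.shape)
```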

Observation & Filtering

For the deterministic version, please check my notes on control systems.

  • (LTI) Luenberger observer: $$\begin{align}\hat{\mathcal{X}}(k+1)&=A\hat{\mathcal{X}}(k)+L(\hat{\mathcal{Y}}(k)-\mathcal{Y}(k))+B\bar{\mu}_ \mathcal{W} \\ \hat{\mathcal{Y}}(k)&=C\hat{\mathcal{X}}(k)+D\bar{\mu}_\mathcal{W}\end{align}$$ where $L$ is the observer gain.
    • Note: We often assume that the process & measurement noise are decoupled and independent
    • Combined form: $\hat{\mathcal{X}}(k+1)=[A+LC]\hat{\mathcal{X}}(k)-L\mathcal{Y}(k)+[B+LD]\bar{\mu}_\mathcal{W}$
  • State estimation residual $r(k)=\mathcal{X}(k)-\hat{\mathcal{X}}(k)$
    • Combined form: $r(k+1)=[A+LC]r(k)+[B+LD]\tilde{\mathcal{W}}(k)$
    • Stationary covariance can be solved by a Lyapunov equation $$\bar{S}_ r=[A+LC]\bar{S}_ r[A+LC]^T+[LD+B]\bar{S}_ \mathcal{W}[LD+B]^T$$
  • A common objective: minimize $\bar{S}_r=\mathbb{E}(rr^\top)$
    • Solution: $L=-A\bar{S}_ rC^\top[C\bar{S}_ rC^\top+D\bar{S}_ \mathcal{W}D^\top]^{-1}$ (Kalman observer gain)
    • Or solve the (discrete-time) algebraic Riccati equation $$\bar{S}_ r=A\bar{S}_ rA^\top+B\bar{S}_ \mathcal{W}B^\top-A\bar{S}_ rC^\top[C\bar{S}_ rC^\top+D\bar{S}_ \mathcal{W}D^\top]^{-1}C\bar{S}_ rA^\top$$ (a numerical sketch follows this list)
  • Innovation sequence $e(k)=\hat{\mathcal{Y}}(k)-\mathcal{Y}(k)$ when $L$ is optimal
    • We can find that $\mu_e=0$ and $\mathrm{K}_e(k+s,k)=\begin{cases}C\bar{S}_rC^\top+D\bar{S}_\mathcal{W}D^\top&:s=0 \\0&:s\neq0\end{cases}$. So the innovation sequence is white (i.i.d. in the Gaussian case); this holds only for the Kalman observer.
  • (Output) Probabilistically-equivalent model: $$\begin{align}\mathcal{X}(k+1)&=A\mathcal{X}(k)+Le(k) \\ \mathcal{Y}(k)&=C\mathcal{X}(k)-e(k)\end{align}$$
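
Assuming the usual decoupling of process and measurement noise ($B\bar{S}_\mathcal{W}D^\top=0$), the stationary residual covariance and the observer gain can be obtained from the Riccati equation above. A minimal sketch with made-up matrices, using scipy.linalg.solve_discrete_are:

```python
import numpy as np
from scipy.linalg import solve_discrete_are

# Hypothetical system  X(k+1) = A X(k) + B W(k),  Y(k) = C X(k) + D W(k),
# with W iid, Sw = I, and B Sw D' = 0 (process and measurement noise decoupled).
A = np.array([[0.95, 0.1],
              [0.0,  0.8]])
B = np.array([[0.0, 0.0],
              [0.1, 0.0]])            # first noise channel drives the state
C = np.array([[1.0, 0.0]])
D = np.array([[0.0, 0.5]])            # second noise channel corrupts the output
Sw = np.eye(2)

Q = B @ Sw @ B.T                      # B Sw B'
R = D @ Sw @ D.T                      # D Sw D'

# solve_discrete_are(A.T, C.T, Q, R) returns S_r satisfying
#   S_r = A S_r A' + Q - A S_r C' (C S_r C' + R)^-1 C S_r A'
S_r = solve_discrete_are(A.T, C.T, Q, R)

# Observer gain in this note's sign convention (the observer adds +L*(Yhat - Y)):
L = -A @ S_r @ C.T @ np.linalg.inv(C @ S_r @ C.T + R)
print(S_r)
print(L)
```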

Markov Chains

Content in this section comes from EECS 501. In this specific field, we often use stand-alone analysis methods.

Basic definitions

  • State distribution: We denote the row vector $\pi_t$ with $\pi_t(x)=\mathbb{P}(\mathcal{X}_t=x),\; x\in S$ ($S$ is the set of states). Directly we have $\sum_x\pi_t(x)=1$
  • Time homogeneous: $\mathbb{P}(\mathcal{X}_ {t+1}=y|\mathcal{X}_ t=x)=\mathbb{P}(\mathcal{X}_{s+1}=y|\mathcal{X}_s=x)\;\forall s,t$
  • One-step transition probability matrix: $P_t=[P_{xy,t}]$ where $P_{xy,t}=\mathbb{P}(\mathcal{X}_ {t+1}=y|\mathcal{X}_ t=x)$.
    • Time-homogeneous case: $P=[P_{xy}]$ where $P_{xy}=p(y|x)$
    • Rows of the matrix sum up to 1.
    • This matrix is also called a stochastic matrix.
    • For continuous state spaces, the matrix is replaced by a transition kernel $T(x,y)$ that describes the transition probability.
  • m-step transition probability matrix: $P_{xy,t}^{(m)}=\mathbb{P}(\mathcal{X}_ {t+m}=y|\mathcal{X}_ t=x)$
    • Chapman-Kolmogorov Equation: $P^{(n+m)}_t=P^{(n)}_t P^{(m)}_t$
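
A quick numerical check on a made-up 3-state chain that the m-step transition matrix of a time-homogeneous chain is the m-th matrix power of $P$, consistent with the Chapman-Kolmogorov equation:

```python
import numpy as np

# Hypothetical 3-state time-homogeneous chain; rows of P sum to 1.
P = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.6, 0.3],
              [0.0, 0.4, 0.6]])

P5 = np.linalg.matrix_power(P, 5)               # 5-step transition matrix
# Chapman-Kolmogorov: P^(2+3) = P^(2) P^(3)
lhs = P5
rhs = np.linalg.matrix_power(P, 2) @ np.linalg.matrix_power(P, 3)
print(np.allclose(lhs, rhs))                    # True

# The state distribution propagates as a row vector: pi_{t+1} = pi_t P
pi0 = np.array([1.0, 0.0, 0.0])
print(pi0 @ P5)                                 # distribution after 5 steps
```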

State Variables

  • Hitting time: $T_1(y)=\min\{n\geq 1:\mathcal{X}_ n=y\},\;T_k(y)=\min\{n>T_{k-1}(y):\mathcal{X}_ n=y\}$ where $n\in\mathbb{N}$
    • Period: For a state $i\in S$, its period is the greatest common divisor of $\{n\geq 1: P^{(n)}_{ii}>0\}$. A Markov chain is aperiodic if all states have period 1.
  • Return probability: $f_{xy}=\mathbb{P}(T_1(y)<\infty|\mathcal{X}_0=x)$
  • Occupation time: $V(y)=\sum^\infty_{n=1}\unicode{x1D7D9}_{\mathcal{X}_n}(y)$
  • Some properties
    • $f_{xy}=\mathbb{P}(V(y)\geqslant 1|\mathcal{X}_0=x)$
    • $\mathbb{P}(V(y)=m|\mathcal{X}_ 0=x)=\begin{cases} 1-f_{xy}&:m=0\\f_{xy}f_{yy}^{m-1}(1-f_{yy})&:m\geqslant 1\end{cases}$
    • $\mathbb{E}[ V(y)|\mathcal{X}_ 0=x]=\begin{cases} 0&:f_{xy}=0\\\infty&:f_{xy}>0,\ f_{yy}=1\\f_{xy}/(1-f_{yy})&:f_{xy}>0,\ f_{yy}<1\end{cases}$
    • $\mathbb{E}[ V(y)|\mathcal{X}_ 0=x]=\sum^\infty_{n=1}P^{(n)}_{xy}$

State Classification

Here we usually consider only time-homogeneous Markov chains.

  • Accessible ($x\to y$): $\exists n\;\text{s.t.}\ P_{xy}^{(n)}>0$
    • $x\to y \Leftrightarrow f_{xy}>0$
  • Communicate ($x\leftrightarrow y$): $x\to y\;\text{and}\;y\to x$. This is an equivalence relation.
  • Equivalence class: a set of states that communicate with each other
  • Irreducible: a Markov chain with only one communicating class
  • Absorbing/Closed state: $P_{xx}=1$
    • Absorbing class: set $C$ is absorbing iff $\forall x \in C, \sum_{y\in C}P_{xy}=1$
  • Transient state: $f_{xx}<1 \Leftrightarrow \mathbb{E}[V(x)|\mathcal{X}_0=x]<\infty$
  • Recurrent state: $f_{xx}=1 \Leftrightarrow \mathbb{E}[V(x)|\mathcal{X}_0=x]=\infty$
    • Positive recurrent: $\mathbb{E}[T_1(x)|\mathcal{X}_0=x]<\infty$
    • Null recurrent: recurrent with $\mathbb{E}[T_1(x)|\mathcal{X}_0=x]=\infty$
    • These two kinds of recurrence are also class properties (shared by all states of a communicating class)
  • Some properties
    • If $x$ is (positive) recurrent and $x\to y$, then $y$ is also (positive) recurrent and $f_{xy}=f_{yx}=1$
    • Every closed and finite subset of $S$ contains at least one (positive) recurrent state.
    • All states of a communicating class are either positive recurrent, null recurrent or transient.
    • Method to determine whether class $C$ is recurrent/transient
      1. If $C$ is not closed, it is transient
      2. If $C$ is closed and finite, then $C$ is positive recurrent
      3. If $C$ is closed and infinite, then $C$ can be either positive/null recurrent or transient

    A typical example is the birth-death chain

  • Stationary distribution: $\bar{\pi}\;\text{s.t.}\;\bar{\pi}=\bar{\pi} P$
    • Stationary in limit form: $\bar{\pi}=\lim_{n\to \infty} \frac{1}{n}\sum^{n-1}_{t=0} \pi_t$
      • This is a Cesàro limit; we use it to deal with the problem that $\lim_{t\to \infty} \pi_t$ might not exist if the chain is periodic.
    • Reversibility criterion for stationarity: if $\forall i,j \in S,\ \pi_i P_{ij} = \pi_j P_{ji}$ (a.k.a. the detailed balance condition), then the process is reversible and therefore stationary.
    • Existence of a stationary distribution satisfying $\bar{\pi}=\bar{\pi} P$
      1. If the chain has a single positive recurrent class, then a unique solution exists, with $\bar{\pi}(x)=0$ for every transient or null recurrent state $x$
      2. If the chain has multiple positive recurrent classes, then there are multiple solutions: each positive recurrent class $i$ gives a solution $\bar{\pi}^i$
      3. If the chain has only transient and null recurrent states, there is no solution
    • Convergence of stationary distribution
      • If the chain has a positive recurrent class, then $\frac{1}{n}\sum^n_{t=1}\mathbb{P}(\mathcal{X}_t=j)\xrightarrow[n\to \infty]{}\bar{\pi}_j$ for every initial distribution $\mu_0$
      • (Ergodic) If the chain is positive recurrent, then $\frac{1}{n}\sum^n_{t=1}\unicode{x1D7D9}_{\mathcal{X}_t}(j)\xrightarrow[n\to \infty]{\text{a.s.}}\bar{\pi}_j$
      • If the chain is positive recurrent and aperiodic, then $\mathbb{P}(\mathcal{X}_n=j)\xrightarrow[n\to \infty]{}\bar{\pi}_j$
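
A minimal sketch (reusing the same made-up 3-state chain as above) that computes $\bar{\pi}$ from $\bar{\pi}=\bar{\pi}P$ together with the normalization constraint, and checks it against the Cesàro average of $\pi_t$:

```python
import numpy as np

# Made-up irreducible, aperiodic 3-state chain; solve pi = pi P with sum(pi) = 1.
P = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.6, 0.3],
              [0.0, 0.4, 0.6]])
m = P.shape[0]

# Stack (P' - I) pi' = 0 with the normalization row and solve by least squares.
A = np.vstack([P.T - np.eye(m), np.ones(m)])
b = np.concatenate([np.zeros(m), [1.0]])
pi_bar, *_ = np.linalg.lstsq(A, b, rcond=None)

# Cesaro-limit check: the time average of pi_t converges to pi_bar.
pi_t = np.array([1.0, 0.0, 0.0])
cesaro = np.zeros(m)
n = 5000
for _ in range(n):
    cesaro += pi_t
    pi_t = pi_t @ P
print(pi_bar, cesaro / n)
```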

Continuous-Time Stochastic System

Stochastic Sequences

  • Definition: Given $t\in\mathbb{T}\subset\mathbb{R}$ ranging over a set of times, $\mathcal{X}(t,\omega): (\Omega,\mathcal{F},\mathbb{P})\to(\mathbb{R}^n,\mathcal{F}_ \mathcal{X},\mathbb{P}_ \mathcal{X})$ is a continuous-time stochastic sequence (process).
  • Most definitions are similar to the discrete case (including stationarity), while the sequence is often evaluated on a finite set $\mathcal{X}(\mathcal{G}),\;\mathcal{G}=\{t_1,t_2,\ldots,t_N\}\subset\mathbb{T},\ t_i< t_{i+1}$
  • Stochastic Differential Equation (SDE): $$\mathrm{d}\mathcal{X}(t)=F(\mathcal{X}(t),t)\mathrm{d}t+\sum^r_{i=1}G_i(\mathcal{X}(t),t)\mathrm{d}\mathcal{W}_i(t)$$ Here $\mathcal{W}$ is often a certain kind of noise random process.

Markov Sequence

  • Markov Sequence: A ss. $\mathcal{X}(t)$ is called a (continuous-time) Markov sequence if for every set of times $\mathcal{G}=\{t_1,t_2,\ldots,t_N\}$ with $t_i < t_{i+1}$, we have $$\begin{split}f_ \mathcal{X}\left(t_N;x(t_N)\middle|\mathcal{X}(t_{N-1})=x(t_{N-1}),\mathcal{X}(t_{N-2})=x(t_{N-2}),\ldots\right)\qquad \\ =f_ \mathcal{X}(t_N;x(t_N)|\mathcal{X}(t_{N-1})=x(t_{N-1}))\end{split}$$
  • Hidden Markov Model: Definition similar to discrete case. A continuous-time stochastic state space is of the form $$\begin{align}\dot{\mathcal{X}}(t)&=F(\mathcal{X},t)+G(\mathcal{X},t)\mathrm{d}\mathcal{W}(t)/\mathrm{d}t \\ \mathcal{Y}(t)&=H(\mathcal{X},t)+J(\mathcal{X},t)\mathrm{d}\mathcal{W}(t)/\mathrm{d}t\end{align}$$ where $\mathcal{W}$ is usually Wiener process and white noise $\mathrm{d}\mathcal{W}(t)/\mathrm{d}t$ is often written as $\mathcal{U}(t)$.
    • The state space is affine if $F(\mathcal{X}(t),t)=A(t)\mathcal{X}(t)+u(t),\;H(\mathcal{X},t)=C(t)\mathcal{X}(t)+v(t)$
    • The state space is bilinear if $G(\mathcal{X},t)\mathrm{d}\mathcal{W}(t)=\sum^r_{i=1}B_i(t)\mathcal{X}\mathrm{d}\mathcal{W}_i(t)$

Poisson Counters

  • Definition: It is a stochastic Markov process $\mathcal{N}(t)\in\{0,1,2,\ldots\}$ with the characteristic that it jumps up by only one integer at a time.
    • Characteristics: transition times are random, transition step is always 1.
  • Transition rule: $\frac{\partial}{\partial t}p_\mathcal{N}(t;n)=\begin{cases}-\lambda p_\mathcal{N}(t;n)&:n=0\\-\lambda p_\mathcal{N}(t;n)+\lambda p_\mathcal{N}(t;n-1)&:n>0\end{cases}$
    • From the rule we can conclude that $p_\mathcal{N}(t;n)=\frac{1}{n!}(\lambda t)^n e^{-\lambda t}$
  • Bidirectional counter: $\mathcal{N}(t)$ as defined is one-directional; the difference $\mathcal{N}_1(t)-\mathcal{N}_2(t)$ of two independent counters is called a bidirectional Poisson counter.
  • Expectation: $\mathbb{E}[\mathcal{N}(t)]=\lambda t,\;\mathbb{E}[\mathrm{d}\mathcal{N}(t)]=\lambda\mathrm{d}t$
  • Ito calculus for Poisson counter:
    • Ito sense: $n(t)$ is a realization of the Poisson counter $\mathcal{N}$. $x(t)$ is a solution in the Ito sense to $\mathrm{d}x(t)=F(x(t),t)\mathrm{d}t+G(x(t),t)\mathrm{d}n(t)$ if
      1. On all intervals where $n(t)$ is constant, $\dot{x}(t)=F(x(t),t)$
      2. If $n(t)$ jumps at time $t_1$, $\lim_{t\to t_1^+}x(t)=\lim_{t\to t_1^-}x(t)+G\left(\lim_{t\to t_1^-}x(t),t_1\right)$
    • Ito rule: Given a function $\psi(\mathcal{X}(t),t)$, taking a Taylor expansion we have $$\mathrm{d}\psi=\left\{\frac{\partial\psi}{\partial t}+(\nabla_\mathcal{X}\psi)F(\mathcal{X},t)\right\}\mathrm{d}t+\sum^r_{i=1}\left\{\psi\left(\mathcal{X}+G_i(\mathcal{X},t),t\right)-\psi(\mathcal{X},t)\right\}\mathrm{d}\mathcal{N}_i(t)$$
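
A small simulation sketch (not from the course) of a rate-$\lambda$ Poisson counter built from exponential inter-arrival times, checking $\mathbb{E}[\mathcal{N}(t)]=\lambda t$ and the solution $p_\mathcal{N}(t;n)=\frac{1}{n!}(\lambda t)^n e^{-\lambda t}$ of the transition rule:

```python
import numpy as np
from math import exp, factorial

# Simulate a rate-lambda Poisson counter via exponential inter-arrival times,
# then compare the empirical mean and pmf with the theoretical values.
rng = np.random.default_rng(0)
lam, t, n_runs = 2.0, 3.0, 50_000

counts = np.empty(n_runs, dtype=int)
for i in range(n_runs):
    elapsed, n_jumps = 0.0, 0
    while True:
        elapsed += rng.exponential(1.0 / lam)   # waiting time to the next jump
        if elapsed > t:
            break
        n_jumps += 1                            # the counter jumps by exactly 1
    counts[i] = n_jumps

print(counts.mean(), lam * t)                   # empirical vs. E[N(t)] = lambda*t
n = 4
print((counts == n).mean(),
      (lam * t) ** n * exp(-lam * t) / factorial(n))   # empirical vs. p_N(t; 4)
```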

Wiener-Process

  • Definition (Brownian Motion): It is a bidirectional Poisson counter with infinite rate, i.e. $\mathcal{W}=\lim_{\lambda\to\infty}\frac{1}{\sqrt{\lambda}}(\mathcal{N}_1-\mathcal{N}_2)$ where $\mathcal{N}_1,\;\mathcal{N}_2$ have rate $\lambda/2$
    • Characteristic: $\mathcal{W}(t)$ is Gaussian with zero mean and variance $t$
    • Note: Actual Wiener Process has stronger continuity property than Brownian motion.
  • Expectation: $\mathbb{E}[\mathcal{W}]=\mathbb{E}[d\mathcal{W}]=0,\;\mathbb{E}[\mathcal{W}(\tau)\mathcal{W}(t)]=\mathbb{E}[\mathcal{W}^2(\min\{t,\tau\})]=\min\{t,\tau\}$
  • Principle of independent increments: If the intervals $[\tau,t)$ and $[\sigma,s)$ do not overlap, then $\mathcal{W}(t)-\mathcal{W}(\tau)$ and $\mathcal{W}(s)-\mathcal{W}(\sigma)$ are uncorrelated.
  • Ito calculus for Wiener Process
    • Ito rule: Given a function $\psi(\mathcal{X}(t),t)$, taking a Taylor expansion we have $$\begin{split}\mathrm{d}\psi=\frac{\partial\psi}{\partial t}\mathrm{d}t+(\nabla_\mathcal{X}\psi)F(\mathcal{X},t)\mathrm{d}t+\sum^r_{i=1}\left(\nabla_\mathcal{X}\psi(\mathcal{X},t)\right)G_i(\mathcal{X},t)\mathrm{d}\mathcal{W}_ i \\ +\sum^r_{i=1}\frac{1}{2}G_i^\top(\mathcal{X},t)\left(\mathrm{H}_\mathcal{X}\psi(\mathcal{X},t)\right)G_i(\mathcal{X},t)\mathrm{d}t\end{split}$$

      Note $\nabla_{\mathcal{X}}\psi=\begin{bmatrix}\frac{\partial\psi}{\partial x_1}&\cdots&\frac{\partial\psi}{\partial x_n}\end{bmatrix}$ and $\mathrm{H}_{\mathcal{X}}\psi=\begin{bmatrix}\frac{\partial^2\psi}{\partial x_1^2}&\cdots&\frac{\partial^2\psi}{\partial x_1\partial x_n}\\ \vdots&\ddots&\vdots \\ \frac{\partial^2\psi}{\partial x_n\partial x_1}&\cdots&\frac{\partial^2\psi}{\partial x_n^2}\end{bmatrix}$ denote the gradient and Hessian operators respectively. By default the operators act with respect to $\mathcal{X}$ when the subscript is not noted.

  • White noise: It is a Gaussian, stationary stochastic process with $\mu_\mathcal{U}(t)=0,\;\mathrm{K}_ \mathcal{U}(t,\tau)=\Phi_ \mathcal{U}\delta(t-\tau)$, where $\Phi_\mathcal{U}$ is the spectral intensity.
    • We often consider white noise as derivative of Wiener process: $\mathcal{U}\sim\mathrm{d}\mathcal{W}/\mathrm{d}t$.
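
A minimal simulation sketch of Wiener-process sample paths built from independent Gaussian increments (step size and path count are arbitrary choices), checking $\mathrm{Var}[\mathcal{W}(t)]=t$ and $\mathbb{E}[\mathcal{W}(t)\mathcal{W}(\tau)]=\min\{t,\tau\}$:

```python
import numpy as np

# Simulate Wiener-process sample paths on [0, 1] by summing independent
# Gaussian increments dW ~ N(0, dt), then check the variance and correlation.
rng = np.random.default_rng(0)
dt, n_steps, n_paths = 0.01, 100, 20_000

dW = np.sqrt(dt) * rng.standard_normal((n_paths, n_steps))
W = np.cumsum(dW, axis=1)                 # W[:, k] is W((k+1)*dt)

t_idx, tau_idx = 49, 99                   # t = 0.5, tau = 1.0
print(W[:, t_idx].var(), 0.5)             # Var[W(0.5)] ≈ 0.5
print((W[:, t_idx] * W[:, tau_idx]).mean(), 0.5)   # E[W(0.5) W(1.0)] ≈ min(t, tau)
```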

Linear Stochastic Sequence

$$\begin{align}\mathrm{d}\mathcal{X}(t)&=\{A(t)\mathcal{X}(t)+u(t)\}\mathrm{d}t+B(t)\mathrm{d}\mathcal{W}(t) \\ \mathcal{Y}(t)&=C(t)\mathcal{X}(t)+D(t)\,\mathrm{d}\mathcal{W}(t)/\mathrm{d}t\end{align}$$

  • Differential equation of expectations
    • Mean: $\frac{\mathrm{d}}{\mathrm{d}t}\mu_\mathcal{X}(t)=A(t)\mu_\mathcal{X}(t)+u(t)$
    • Autocovariance: $\frac{\mathrm{d}}{\mathrm{d}t}S_\mathcal{X}(t)=A(t)S_\mathcal{X}(t)+S_\mathcal{X}(t)A^\top(t)+B(t)B^\top(t)$ (called the Lyapunov differential equation)
  • Stationary LTI case
    • Useful equation: $\mathrm{R}_ \mathcal{X}(t,\tau)=\begin{cases} S_ \mathcal{X}\exp\{A^\top(\tau-t)\}&,\tau>t \\ \exp\{A(t-\tau)\}S_ \mathcal{X}&,\tau< t\end{cases}$
    • (Continuous-time) algebraic Lyapunov equation: $A\bar{S}_ \mathcal{X}+\bar{S}_\mathcal{X}A^\top+BB^\top=0$
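
The stationary covariance of a stable LTI SDE solves the continuous-time algebraic Lyapunov equation above. A minimal sketch with made-up matrices, using scipy.linalg.solve_continuous_lyapunov (mind scipy's sign convention):

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

# Hypothetical stable LTI SDE  dX = A X dt + B dW; its stationary covariance
# solves A S + S A' + B B' = 0.
A = np.array([[-1.0,  0.5],
              [ 0.0, -2.0]])
B = np.array([[1.0],
              [0.3]])

# scipy solves A X + X A' = Q, so pass Q = -B B' to match the equation above.
S_bar = solve_continuous_lyapunov(A, -B @ B.T)
print(S_bar)
print(A @ S_bar + S_bar @ A.T + B @ B.T)        # ≈ 0
```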

Nonlinear Stochastic Sequence

  • The Fokker-Planck Equation (FPE): Consider the general SDE above driven by Wiener processes; then $$\frac{\partial f_ \mathcal{X}(x;t)}{\partial t}=-\nabla\left(F(x)f_ \mathcal{X}(x;t)\right)+\frac{1}{2}\mathrm{tr}\left[\mathrm{H}\left(\Gamma(x)f_ \mathcal{X}(x;t)\right)\right]$$ where $\Gamma(x)=G(x)G^\top(x)$

    • $\mathcal{X}$ is stationary means $\frac{\partial f_\mathcal{X}(x;t)}{\partial t}=0$
  • Gaussian closure: An approximate solution of the FPE is obtained by assuming $f_\mathcal{X}$ is multivariate Gaussian with some $S_\mathcal{X}$ and ${\mu_\mathcal{X}}$; that is, we only keep the 1st-order and 2nd-order moments. Denote the estimated distribution as $\hat{f}_\mathcal{X}(x;t)$

    • Estimated expectation: $\hat{\mathbb{E}}\phi(\mathcal{X})=\int\cdots\int\phi(x)\hat{f}_\mathcal{X}(x;t)\mathrm{d}x$

    • Solution:$$\begin{split}h(x,t)=-\nabla F+\tilde{x}^\top S^{-1}F-(\nabla\Gamma)S^{-1}\tilde{x}+\frac{1}{2}\mathrm{tr}\left[\mathrm{H}\Gamma \right] \\ +\frac{1}{2}\mathrm{tr}\left[\left(-S^{-1}+S^{-1}\tilde{x}\tilde{x}^\top S^{-1}\right)\Gamma\right]\end{split}$$ and $$\begin{align}\dot{\mu}_\mathcal{X}(t)&=S(t)\hat{\mathbb{E}}[\nabla^\top h(x)] \\ \dot{S} _\mathcal{X}(t)&=S(t)\hat{\mathbb{E}}[\mathrm{H} h(x)]S(t)\end{align}$$

    • Useful simplification: If $G$ is constant (independent of $\mathcal{X}$), then the ODE of $\dot{\mu}$ and $\dot{S}$ become $$\begin{align}\dot{\mu}_ \mathcal{X}(t)&=\hat{\mathbb{E}}[F(\mathcal{X})] \\ \dot{S} _\mathcal{X}(t)&=\hat{A}^\top S+S\hat{A}+\Gamma\end{align}$$ where $\hat{A}=\hat{\mathbb{E}}\left[\frac{\partial F}{\partial\mathcal{X}}\right]$

      • Equivalent linearization (Quasi-linearization): In the estimation, $\mathcal{X}$ evolves equivalently to $\mathrm{d}\mathcal{X}(t)=\hat{A}(t)\mathcal{X}(t)\mathrm{d}t+G\mathrm{d}\mathcal{W}(t)$
    • No guaranteed bounds generally exist for the estimation error $$e(\mathcal{X},t)=\left(\frac{\partial\hat{f}_ \mathcal{X}}{\partial t}\right)-\left(-\nabla\left(F\hat{f}_ \mathcal{X}\right)+\frac{1}{2}\mathrm{tr}\left[\mathrm{H}\left(\Gamma\hat{f}_ \mathcal{X}\right)\right]\right)$$

    • The estimates of $\mu(t)$ and $S(t)$ can also be in error, and the stationarity of the estimated system may not be consistent with that of the original system
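
As an illustration of Gaussian closure with constant $G$, consider the made-up scalar SDE $\mathrm{d}\mathcal{X}=-\mathcal{X}^3\,\mathrm{d}t+g\,\mathrm{d}\mathcal{W}$ (not an example from the course). With $\mathcal{X}\sim\mathcal{N}(\mu,S)$ we get $\hat{\mathbb{E}}[-\mathcal{X}^3]=-(\mu^3+3\mu S)$ and $\hat{A}=\hat{\mathbb{E}}[-3\mathcal{X}^2]=-3(S+\mu^2)$, so the closed moment ODEs can be integrated directly:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Gaussian-closure moment ODEs for dX = -X^3 dt + g dW (scalar, constant G):
#   mu_dot = -(mu^3 + 3 mu S),   S_dot = 2 A_hat S + g^2,  A_hat = -3 (S + mu^2)
g = 0.5

def closure_ode(t, y):
    mu, S = y
    A_hat = -3.0 * (S + mu**2)
    return [-(mu**3 + 3.0 * mu * S), 2.0 * A_hat * S + g**2]

sol = solve_ivp(closure_ode, (0.0, 20.0), [1.0, 0.1], rtol=1e-8)
print(sol.y[:, -1])   # approximate stationary (mu, S) under the closure
```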

Spectral Analysis

Content in this section comes from MECHENG 549.

Power Spectral Density

  • Definition: The power spectral density (PSD) of a stochastic process $\mathcal{Y}$ is $$\Phi_\mathcal{Y}(\omega)\equiv\mathbb{E}\left[ \lim_{T\to\infty}\frac{1}{2T}Y_T(\omega)Y_T^\top(\omega)\right]$$ where $Y_T(\omega)$ is the Fourier transform of the centered process $\mathcal{Y}(t)$ truncated to $[-T,T]$.
    • White noise has a constant PSD over all $\omega$ (equal to its spectral intensity $\Phi_\mathcal{U}$)
  • Wiener-Khinchin Theorem: $\Phi_\mathcal{Y}(\omega)=\int^\infty_{-\infty}e^{-i\omega\theta}\bar{\mathrm{K}}_\mathcal{Y}(\theta)\mathrm{d}\theta$
  • Signal propagation: If the system $P$ has input $\mathcal{Y}$ and output $\mathcal{Z}$, then $\Phi_\mathcal{Z}(\omega)=P(i\omega)\Phi_\mathcal{Y}(\omega)P^\top(-i\omega)$ (in the scalar case $|P(i\omega)|^2\Phi_\mathcal{Y}(\omega)$), where $P(i\omega)$ is the frequency response of the system.
  • Cross spectrum: If the system $G$ has input $\mathcal{Y}$ and output $\mathcal{Z}$, then the cross-spectrum is $\Phi_{\mathcal{ZY}}(\omega)=G(i\omega)\Phi_\mathcal{Y}(\omega)$

Periodograms

  • Definition: We want to estimate $\Phi_{\mathcal{Y}}(\omega)$ without the expectation, then we use $$Q_T(\omega,y)=\frac{1}{2T}\left|\int^T_{-T}e^{-i\omega t}y(t)\mathrm{d}t\right|^2$$ where $y$ is a sample realization of $\mathcal{Y}$.
    • In practice, window functions and averaging over finite segments (e.g. Bartlett's procedure) are used to estimate the spectrum from a finite record.
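
A minimal sketch of periodogram-based PSD estimation from one realization of a made-up first-order process; scipy.signal.welch averages windowed segments in the spirit of Bartlett's procedure:

```python
import numpy as np
from scipy.signal import periodogram, welch

# Estimate the PSD of one realization of a made-up stationary process
# (white noise through a first-order filter), comparing the raw periodogram
# with an averaged (Bartlett/Welch-style) estimate.
rng = np.random.default_rng(0)
fs, n = 100.0, 2**16
w = rng.standard_normal(n)

# First-order discrete filter y(k+1) = a y(k) + w(k) as a sample realization.
a, y = 0.9, np.zeros(n)
for k in range(n - 1):
    y[k + 1] = a * y[k] + w[k]

f1, Q_raw = periodogram(y, fs=fs)          # single-realization periodogram
f2, Q_avg = welch(y, fs=fs, nperseg=1024)  # averaged over windowed segments
print(Q_raw[:5])
print(Q_avg[:5])
```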

Stochastic realizations

We want to find a stochastic model that reproduces a given spectrum; this is called the stochastic realization problem. We focus on the scalar LTI case.

If we know $P(s)=C[sI-A]^{-1}B+D$ (the transfer function of the LTI system) and $P(s)=c\frac{\prod^m_{k=1}(s-z_k)}{\prod^n_{k=1}(s-p_k)}$ for some $m\leq n$ and real constant $c$, then we have $$\Phi_\mathcal{Y}(\omega)=P(i\omega)P(-i\omega)=c^2\frac{\prod^m_{k=1}(\omega^2+z_k^2)}{\prod^n_{k=1}(\omega^2+p_k^2)}$$

Lemma: For any valid PSD, there exists a spectral factorization $\Phi_\mathcal{Y}(\omega)=\frac{\sum^m_{k=0}a_k(\omega^2)^k}{\sum^n_{k=0}b_k(\omega^2)^k}=P(i\omega)P(-i\omega)$ where $P(s)$ is an asymptotically stable n-th order transfer function, iff

  1. $\Phi_\mathcal{Y}(\omega)$ is a ratio of polynomials of $\omega^2$
  2. All coefficients $a_k$ and $b_k$ are real
  3. The denominator has no positive real roots in $\omega^2$
  4. Any positive real roots in the numerator have even multiplicity
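
As a quick illustrative check of the lemma (an example made up for these notes): $\Phi_\mathcal{Y}(\omega)=\frac{1}{\omega^4+5\omega^2+4}=\frac{1}{(\omega^2+1)(\omega^2+4)}$ satisfies the four conditions and factors as $P(i\omega)P(-i\omega)$ with the asymptotically stable $P(s)=\frac{1}{(s+1)(s+2)}$, which can then be realized as a state-space model driven by white noise.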

Notes for math typing in Hexo

  1. Escape \ by \\. Especially escape { by \\{ instead of \{, and escape \\ by \\\\.
  2. Be careful about _, which markdown uses as an italic indicator. Adding a space after _ is a useful workaround.
  3. Some useful Mathjax tricks at StackExchange
  4. Several capital Greek characters should directly use their related Latin letters with the \mathrm command.

Although I have migrated to Hugo, some tricks might still be relevant.
