Table of contents
Prerequisites: Knowledge of Elementary Calculus, Linear Algebra and Probability
Discrete-Time Stochastic System
Stochastic Sequences
- Definition: Given $k\in\mathbb{K}\subseteq\mathbb{Z}$ a sequence of integers, $\mathcal{X}(k,\omega): (\Omega,\mathcal{F},\mathbb{P})\to(\mathbb{R}^n,\mathcal{F}_ \mathcal{X},\mathbb{P}_ \mathcal{X})$ is a random/stochastic sequence.
- Uncertainties: Consider a casual system $F$ relates some scalar inputs $u(k)$ to output $x(k)$
- Epistemic/Model uncertainty: $\mathcal{X}(k,\omega)=F(k,u(k),u(k-1),\ldots,\omega)$. (system is stochastic and input is deterministic).
- Aleatoric/Input uncertainty: $\mathcal{X}(k,\omega)=f(k,U(k,\omega),u(k-1,\omega),\ldots)$ (system is deterministic and input is stochastic).
- Realization: An outcome $\mathcal{X}(k,\omega)=x(k)$ given $\omega$ is called a realization of stochastic sequence $\mathcal{X}$
- Terminology and Convention
- $\mathcal{X}(k,\omega)$ is often written as $\mathcal{X}(k)$ when there’s no ambiguity.
- $\mathbb{K}=\mathbb{Z}$ if not specified.
- Sequence over a set $\mathcal{K}_1\subseteq\mathbb{K}$ are denoted $\mathcal{X}(\mathcal{K}_1)$.
- $\mathcal{X}$ denotes $\mathcal{X}(\mathbb{K})$ if not specified.
- Consecutive subsequence: $$\mathcal{X}(k:l)=\{\mathcal{X}(k),\mathcal{X}(k+1),\ldots,\mathcal{X}(l)\},\;x(k:l)=\{x(k),x(k+1),\ldots,x(l)\}$$
- Abbreviations:
- SS - stochastic sequence
- IID - independent indentically distributed
Probabilistic characterization
- Distribution and density: $$F_ \mathcal{X}\left(k:l;x(k:l)\right)\equiv\mathbb{P}((\mathcal{X}_ i(k)\leqslant x_i(k))\cap\cdots\cap(\mathcal{X}_ i(l)\leqslant x_ i(l)),\;i=1\ldots n)$$ $$f_ \mathcal{X}\left(k:l;x(k:l)\right)\equiv \frac{\partial^{n(l-k+1)}}{\partial x_ 1(k)\cdots\partial x_ n(l)}F_ \mathcal{X}(k:l;x(k:l))$$ Here $k:l$ actually denotes a set of consecutive integers, it can be also changed to ordinary sets $\{k,l\}$ or single scalar $k$.
- Ensemble Average: $\mathbb{E}[\psi(\mathcal{X}(k))]$, doing summation over different realization at same time $k$
- Mean: $\mu_ \mathcal{X}(k)\equiv\mathbb{E}[\mathcal{X}(k)]=\int^\infty_{-\infty}x(k)f_ \mathcal{X}(k;x(k))\mathrm{d}x(k)$
- Conditional Mean: $\mu_ \mathcal{X}(l|k)\equiv\mathbb{E}[\mathcal{X}(l)|\mathcal{X}(k)=x(k)]$
- Time Average: $\frac{1}{2N+1}\sum^N_{k=-N}\psi(\mathcal{X}(k))$, doing summation over different time k of same realization
- Autocorrelation:
- Scalar case: $r_ \mathcal{X}(k,l)\equiv\mathbb{E}[\mathcal{X}(k)\mathcal{X}(l)]=\int^\infty_{-\infty}\int^\infty_{-\infty}x(k)x(l)f_ \mathcal{X}(k,l;x(k,l))\mathrm{d}x(k)\mathrm{d}x(l)$
- Vector case: $R_ \mathcal{X}(k,l)\equiv\mathbb{E}[\mathcal{X}(k)\mathcal{X}^\top(l)]$
- Conditional autocorrelation: $R_ \mathcal{X}(k,l|q)\equiv\mathbb{E}[\mathcal{X}(k)\mathcal{X}^\top(l)|\mathcal{X}(q)=x(q)]$
- Often we denote $C_ \mathcal{X}(k)=R_ \mathcal{X}(k,k)$
- Autocovariance:
- Scalar case: $\kappa_ \mathcal{X}(k,l)\equiv\mathbb{E}[(\mathcal{X}(k)-\mu_ \mathcal{X}(k))(\mathcal{X}(l)-\mu_ \mathcal{X}(l))]$
- Vector case: $\mathrm{K}_ \mathcal{X}(k,l)\equiv\mathbb{E}[(\mathcal{X}(k)-\mu_ \mathcal{X}(k))(\mathcal{X}(l)-\mu_ \mathcal{X}(l))^\top]$
- Conditional autocovariance: $\mathrm{K}_ \mathcal{X}(k,l|q)\equiv\mathbb{E}[(\mathcal{X}(k)-\mu_ \mathcal{X}(k|q))(\mathcal{X}(l)-\mu_ \mathcal{X}(l|q))^\top]$
- Often we denote $S_ \mathcal{X}(k|q)=\mathrm{K}_ \mathcal{X}(k,k|q)$
- Useful conclusion: $\mathrm{K}(a,b)=\mathrm{K}(b,a)^T$
- Normalized (autocorrelation coefficient): $\rho_ \mathcal{X}(k,l)\equiv\mathrm{K}_ \mathcal{X}(k,l)/\sigma^2_{\mathcal{X}(k)}\sigma^2_{\mathcal{X}(l)}$
- Strong Stationarity(aka. strict sense): (necessarily identically distributed over time) $$\forall x(k:l)\in\mathbb{R}^{n(l-k+1)},\;\forall s\in\mathbb{Z},\;f_ \mathcal{X}(k:l;x(k:l))=f_ \mathcal{X}(k+s:l+s;x(k:l))$$
- Weak Stationarity(aka. wide sense): $\forall k,l$ if $$\mu_ \mathcal{X}(k)=\mu_ \mathcal{X}(l)\;\text{and}\;\mathrm{K}_ \mathcal{X}(k,l)=\mathrm{K}_ \mathcal{X}(k+s,l+s)\equiv\bar{\mathrm{K}}_ \mathcal{X}(s)$$ Weak stationarity is necessary condition for stationarity. (Equal when Gaussian distributed)
- Ergodicity: $\mathcal{X}$ is called ergodic in $\psi$ if
- $\mathbb{E}[\psi(\mathcal{X})]$ is stationary
- Ensemble average is equal to Time average, that is $$ \frac{1}{2N+1}\sum^N_{k=-N}\psi(\mathcal{X}(k))\to\mathbb{E}[\psi(\mathcal{X})];\text{as};l\to \infty$$
Markov Sequence
- Markov Sequence: A ss. $\mathcal{X}(k)$ is called a (discrete-time) Markov sequence if $$\begin{split}f_ \mathcal{X}\left(k;x(k)\middle|\mathcal{X}(k-1)=x(k-1),\mathcal{X}(k-2)=x(k-2),\ldots\right) \\=f_ \mathcal{X}(k;x(k)|\mathcal{X}(k-1)=x(k-1))\end{split}$$
- We often make some assumption on the initial condition $\mathcal{X}(0)$, such as known, deteministic or uniformly distributed within certain domain.
- Markov Chains: Markov sequence with discrete set of values(states) ${x_1\ldots x_m}$
- Hidden Markov model: Sequence $\mathcal{Y}$ is called a Hidden Markov Model if it’s modeled by a system of the form $$\begin{align}\mathcal{X}(k+1)&=g(k,\mathcal{X}(k),\mathcal{W}(k)) \\ \mathcal{Y}(k)&=h(k,\mathcal{X}(k),\mathcal{W}(k))\end{align}$$ We also say that $\mathcal{Y}$ has a (discrete-time) stochastic state space.
- Guassian-Markov Sequence (GMS): $\mathcal{X}(k+1)=g(k,\mathcal{X}(k),\mathcal{W}(k))$ where $\mathcal{W}(k)$ is iid. Guassian
Linear Stochastic Sequence
$$\begin{align}\mathcal{X}(k+1)&=A(k)\mathcal{X}(k)+B(k)\mathcal{W}(k) \\ \mathcal{Y}(k)&=C(k)\mathcal{X}(k)+D(k)\mathcal{W}(k)\end{align}$$
For linear Markov sequences, the deterministic mean sequence and centered uncertain sequence completely decouple. So we often assume that $\mathcal{X}(k)$ and $\mathcal{Y}(k)$ are centered in this case, with regard to deterministic inputs. The equation with deterministic inputs is often written as $$\mathcal{X}(k+1)=A(k)\mathcal{X}(k)+B_u(k)u(k)+B_\mathcal{W}(k)\mathcal{W}(k)$$
Recursive-form expectations:
- Mean: $\mu_ \mathcal{X}(k+1|q)=A(k)\mu_ \mathcal{X}(k|q)+B(k)\mu_\mathcal{W}(k)$
- Covariance (Discrete-time algebraic Lyapunov/Stein difference equation): $$S_ \mathcal{X}(k+1|q)=A(k)S_ \mathcal{X}(k|q)A^\top(k)+B(k)S_\mathcal{W}(k)B^\top(k)$$
Can be solved with
dlyap
in MATLAB
Convergence when $k\to\infty$:
- Mean convergence: $\mu_ \mathcal{X}(k|q)$ converges requires $\max_i|\lambda_i(A)|<1$ ($\lambda$ denotes eigenvalue)
- Covariance convergence: $S_ \mathcal{X}(k|q)$ converges requires $\max_i|\lambda_i(A)|<1$
- (Discrete-time) Lyapunov equation (Stein equation): $\bar{S}_ \mathcal{X}=A\bar{S}_ \mathcal{X}A^\top+B\bar{S}_\mathcal{W}(k)B^\top$. Solution for this equation exists iff. $A$ is asymptotically stable (characterizing sequence is stationary).
Explicit state transition: By recursive substitution, $$\mathcal{X}(k)=\Psi(k,q)\mathcal{X}(q)+\sum^{k-1}_ {i=q}\Gamma(k,i)\mathcal{W}(i)$$ where state transition matrix $\Psi(k,q)=\begin{cases}I, &k=q \\ \prod^{k-1}_ {i=q}A(i),& k>q\end{cases}$ and $\Gamma(k,i)=\Psi(k,i+1)B(i)$.
- Conditioned Mean sequence: $\mu_ \mathcal{X}(k|q)=\Psi(k,q)\mu_ \mathcal{X}(q)+\sum^{k-1}_ {i=q}\Gamma(k,i)\mu_\mathcal{W}(i)$
- Conditioned Autocovariance Matrix: $$\mathrm{K}_ \mathcal{X}(k,l|q)=\Psi(k,q)S_ \mathcal{X}(q)\Psi^\top(l,q)+\sum^{min\{k,l\}-1}_ {i=q}\Gamma(k,i)S_\mathcal{W}(i)\Gamma^\top(l,i)$$
- A special case: $S_ \mathcal{X}(k|q)=\mathrm{K}_ \mathcal{X}(k,k|q)=\sum^{k-1}_ {i=q}\Gamma(k,i)S_\mathcal{W}(i)\Gamma^\top(k,i)$, $S_ \mathcal{X}(k|k)=0$
- Useful equation (stationary centered case): $$\mathrm{K}_ \mathcal{X}(k,l)=\begin{cases} S_ \mathcal{X}(k)\cdot(A^\top)^{(l-k)}&,l>k\\ S_ \mathcal{X}(k)&,l=k\\ A^{(k-l)}S_ \mathcal{X}(k)&,l< k\end{cases}$$
Conditional Autocorrelation Matrix: $R_ \mathcal{X}(k,l|q)=\mathrm{K}_ \mathcal{X}(k,l|q)+\mu_ \mathcal{X}(k|q)\mu_ \mathcal{X}^\top(l|q)$
Observation $\mathcal{Y}$ property:
- Mean: $\mu_ \mathcal{Y}(k|q)=C(k)\mu_ \mathcal{X}(k|q)+D(k)\mu_\mathcal{W}(k)$
- Covariance: $$\mathrm{K}_ \mathcal{Y}(k,l|q)=\begin{cases}C(k)\mathrm{K}_ \mathcal{X}(k,l|q)C^\top(l)+C(k)\Gamma(k,l)S_W(l)D^\top(l)&:k>l \\C(k)S_ \mathcal{X}C^\top(k)+D(k)S_\mathcal{W}(k)D^\top(k)&:k=l\\ C(k)\mathrm{K}_ \mathcal{X}(k,l|q)C^\top(l)+D(k)S_\mathcal{W}(k)\Gamma^\top(l,k)C^\top(l)&:k< l\end{cases}$$
- Stationary time-invariant covariance: $$\mathrm{K}_ \mathcal{Y}(s)=\begin{cases}CA^{|s|}\bar{S}_ \mathcal{X}C^\top+CA^{|s|-1}B\bar{S}_ WD^\top&:s\neq 0 \\C\bar{S}_ \mathcal{X}C^\top+D\bar{S}_ WD^\top&:s=0\end{cases}$$
Gaussian Stochastic Sequence
- Jointly Gaussian $\Rightarrow, \nLeftarrow$ Marginally Gaussian
- $c^\top X$ is Gaussian $\Leftrightarrow X$ is Gaussian
- Conditional Gaussian: if $X$ and $Y$ are Gaussian, then $X|Y \sim \mathcal{N}(\mu_{X|Y},S_{X|Y})$ where $\mu_{X|Y}=\mu_X+S_{XY}S_Y^{-1}(Y-\mu_Y)$, $S_{X|Y}=S_X-S_{XY}S_Y^{-1}S_{YX}$
- A linear controllable GMS $\mathcal{X}(k)$ is stationary iff. $A(k)=A, B(k)=B$ (time-invariant) and $A$ is asymptotically stable.
- All LTI stationary GMS are also ergodic in all finite momoents
- Solve $\mathcal{X}(k+1)=A\mathcal{X}(k)+B\mathcal{W}(k)$ when $\mathcal{X}$ is stationary ($\max_i|\lambda_i(A)|<1$)
- solve $\bar{\mu}_ \mathcal{X}=A\bar{\mu}_ \mathcal{X}+B\bar{\mu}_\mathcal{W}$ for $\bar{\mu}$
- solve $\bar{S}_ \mathcal{X}=A\bar{S}_ \mathcal{X}A^\top+B\bar{S}_ \mathcal{W}B^\top$ for $\bar{S}_ \mathcal{X}$
- calculate $\Sigma_ \mathcal{X}(k:l|q)=\begin{bmatrix}\mathrm{K}_ \mathcal{X}(k,k|q)&\cdots&\mathrm{K}_ \mathcal{X}(k,l|q)\\ \vdots&\ddots&\vdots\\ \mathrm{K}_ \mathcal{X}(l,k|q)&\cdots&\mathrm{K}_ \mathcal{X}(l,l|q)\end{bmatrix}$ using $\bar{S}_ \mathcal{X}$
- Then $f_ \mathcal{X}(k:l;x(k:l))$ is determined with $\mu_ \mathcal{X}(k:l)=\{\bar{\mu}_ \mathcal{X},\bar{\mu}_ \mathcal{X}\ldots\bar{\mu}_ \mathcal{X}\}$ and $\Sigma_ \mathcal{X}(k:l)$ above.
Observation & Filtering
For deterministic version, please check my notes for control system.
- (LTI) Luenberger observer: $$\begin{align}\hat{\mathcal{X}}(k+1)&=A\hat{\mathcal{X}}(k)+L(\hat{\mathcal{Y}}(k)-\mathcal{Y}(k))+B\bar{\mu}_ \mathcal{W} \\ \hat{\mathcal{Y}}(k)&=C\hat{\mathcal{X}}(k)+D\bar{\mu}_\mathcal{W}\end{align}$$ where $L$ is the observer gain.
- Note: We often assume that the process & measurement noise are decoupled and independent
- Combined form: $\hat{\mathcal{X}}(k+1)=[A+LC]\hat{\mathcal{X}}(k)-L\mathcal{Y}(k)+B\mu_\mathcal{W}(k)$
- State estimation residual $r(k)=\mathcal{X}(k)-\hat{\mathcal{X}}(k)$
- Combined form: $r(k+1)=[A+LC]r(k)+[B+LD]\tilde{\mathcal{W}}(k)$
- Stationary covariance can be solved by a Lyapunov equation $$\bar{S}_ r=[A+LC]\bar{S}_ r[A+LC]^T+[LD+B]\bar{S}_ \mathcal{W}[LD+B]^T$$
- A common objective: minimize $\bar{S}_r=\mathbb{E}(rr^\top)$
- Solutions: $L=-A\bar{S}_ rC^\top[C\bar{S}_ rC^\top+D\bar{S}_ \mathcal{W}D^\top]^{-1}$ (Kalman observer gain)
- Or solve (Discrete-time) Algebraic Riccati equation $$\bar{S}_ r=A\bar{S}_ rA^\top+B\bar{S}_ \mathcal{W}B^\top-A\bar{S}_ rC^\top[C\bar{S}_ rC^\top+D\bar{S}_ \mathcal{W}D^\top]^{-1}C\bar{S}_ rA^\top$$
- Innovation sequence $e(k)=\hat{\mathcal{Y}}(k)-\mathcal{Y}(k)$ with $L$ is optimal
- We can find that $\mu_e=0$ and $\mathrm{K}_e(k+s,k)=\begin{cases}C\bar{S}_rC^\top+D\bar{S}_WD^T&:s=0 \\0&:s\neq0\end{cases}$. So the innovation sequence is iid. (only in Kalman observer)
- (Output) Probabilistically-equivalent model: $$\begin{align}\mathcal{X}(k+1)&=A\mathcal{X}(k)+Le(k) \\ \mathcal{Y}(k)&=C\mathcal{X}(k)-e(k)\end{align}$$
Markov Chains
Content in this section comes from EECS 501 In this specific field, we often use stand-alone analysis methods.
Basic definitions
- State distribution: We denote row vector $\pi_t$ as $\pi_t(x)=\mathbb{P}(\mathcal{X}_t=x),\; x\in S$ ($S$ is the set of states). Directly we have $\sum_x\pi(x)=1$
- Time homogeneous: $\mathbb{P}(\mathcal{X}_ {t+1}=y|\mathcal{X}_ t=x)=\mathbb{P}(\mathcal{X}_{s+1}=y|\mathcal{X}_s=x)\;\forall s,t$
- One-step transition probability matrix: $P_t=[P_{xy,t}]$ where $P_{xy,t}=\mathbb{P}(\mathcal{X}_ {t+1}=y|\mathcal{X}_ t=x)$.
- Time-homo case: $P=[P_{xy}]$ where $P_{xy}=p(y|x)$
- Rows of the matrix sum up to 1.
- This matrix is also called stochastic matrix.
- Extend this matrix to continuous states, then we use transition kernel $T(x,y)$ to describe the transition probability,
- m-step transition probability matrix: $P_{xy,t}^{(m)}=\mathbb{P}(\mathcal{X}_ {t+m}=y|\mathcal{X}_ t=x)$
- Chapman-Kolmogorov Equation: $P^{(n+m)}_t=P^{(n)}_t P^{(m)}_t$
State Variables
- Hitting time: $T_1(y)=\min\{n\geq 0:\mathcal{X}_ n=y\},\;T_k(y)=\min\{n>T_{k-1}(y):\mathcal{X}_ n=y\}$ where $n\in\mathbb{N}$
- Period: For state $i\in x$, its period is the greatest common divisor of $\{n>1|T_n(i)>0\}$. A Markov Chain is aperiodic If all states have period 1.
- Return probability: $f_{xy}=\mathbb{P}(T_1(y)<\infty|\mathcal{X}_0=x)$
- Occupation time: $V(y)=\sum^\infty_{n=1}\unicode{x1D7D9}_{\mathcal{X_n}}(y)$
- Some properties
- $f_{xy}=\mathbb{P}(V(y)\geqslant 1|\mathcal{X}_0=x)$
- $\mathbb{P}(V(y)=m|\mathcal{X}_ 0=x)=\begin{cases} 1-f_{xy}&:m=0\\f_{xy}f_{yy}^{m-1}(1-f_{yy})&:m\geqslant 1\end{cases}$
- $\mathbb{E}[ V(y)|\mathcal{X}_ 0=x]=\begin{cases} 0&:f_{xy}=0\\\infty&:f_{xy}>0,\ f_{yy}=1\\f_{xy}/(1-f_{yy})&:f_{xy}>0,\ f_{yy}<1\end{cases}$
- $\mathbb{E}[ V(y)|\mathcal{X}_ 0=x]=\sum^\infty_{n=1}P^{(n)}_{xy}$
State Classification
Here we usually consider only time-homogeneous Markov Chains
- Accessible ($x\to y$): $\exists n\;\text{s.t.}\ P_{xy}^{(n)}>0$
- $x\to y \Leftrightarrow f_{xy}>0$
- Communicate ($x\leftrightarrow y$): $x\to y\;\text{and}\;y\to x$. This is a equivalence relation.
- Equivalent class: set of states that communicate with each other
- Irreducible: a Markov chain with only one communicating class
- Absorbing/Closed state: $P_{xx}=1$
- Absorbing class: set $C$ is absorbing iff $\forall x \in C, \sum_{y\in C}P_{xy}=1$
- Transient state: $f_{xx}<1 \Leftrightarrow \mathbb{E}[V_i|\mathcal{X}_0=i]<\infty$
- Recurrent state: $f_{xx}=1 \Leftrightarrow \mathbb{E}[V_i|\mathcal{X}_0=i]=\infty$
- Positive recurrent: $\mathbb{E}[T_1(x)|\mathcal{X}_0=x]<0$
- Null recurrent: $\mathbb{E}[T_1(x)|\mathcal{X}_0=x]=0$
- These two kind of recurrent states also make up communicating classes
- Some properties
- If $x$ is (positive) recurrent and $x\to y$, then $y$ is also (positive) recurrent and $f_{xy}=f_{yx}=1$
- Every closed and finite subset of $X$ contains at least one (positive) recurrent state.
- All states of a communicating class are either positive recurrent, null recurrent or transient.
- Method to determine whether class $C$ is recurrent/transient
- If $C$ is non-closed, it’s transient
- If $C$ is closed and finite, then $C$ is positive recurrent
- If $C$ is closed and infinite, then $C$ can be either positive/null recurrent or transient
A typical example is the birth-death chain
- Stationary distribution: $\bar{\pi}\;\text{s.t.}\;\bar{\pi}=\bar{\pi} P$
- Stationary in limit form: $\bar{\pi}=\lim_{n\to \infty} \frac{1}{n}\sum^{n-1}_{t=0} \pi_t$
- This is a Cesaro limit, we use this to deal with the problem that $\lim_{n\to \infty} \pi_t$ might not exist if the chain is periodic.
- Reversibility criterion for stationary: $\forall i,j \in S, \pi_i P_{ij} = \pi_j P_{ji}$ (a.k.a detailed balance condition), then the process is reversible and therefore stationary.
- Existence for stationary distribution satisfying $\bar{\pi}=\bar{\pi} P$
- If the chain has single positive recurrent class, then exists unique solution: $\bar{\pi}(x)=0$ for all transient or null recurrent $x$
- If the chain has multiple positive recurrent class, then there are multiple solutions, for each positive recurrent $x$ we have a $\bar{\pi}^i$.
- If the chain has only transient and null recurrent states, there is no solution$
- Convergence of stationary distribution
- If the chain has positive recurrent class, then $\frac{1}{n}\sum^n_{t=1}\mathbb{P}(\mathcal{X}_t=j)\xrightarrow[n\to \infty]{}\bar{\pi}_j,\;\forall \mu_0$
- (Ergodic) If the chain is positive recurrent, then $\frac{1}{n}\sum^n_{t=1}\unicode{x1D7D9}_{\mathcal{X}_t}(j)\xrightarrow[n\to \infty]{\text{a.s.}}\bar{\pi}_j$
- If the chain is positive recurrent and aperiodic, then $\mathbb{P}(\mathcal{X}_n=j)\xrightarrow[n\to \infty]{}\bar{\pi}_j$
- Stationary in limit form: $\bar{\pi}=\lim_{n\to \infty} \frac{1}{n}\sum^{n-1}_{t=0} \pi_t$
Continuous-Time Stochastic System
Stochastic Sequences
- Definition: Given $t\in\mathbb{T}\subset\mathbb{R}$ a sequence of time, $\mathcal{X}(t,\omega): (\Omega,\mathcal{F},\mathbb{P})\to(\mathbb{R}^n,\mathcal{F}_ \mathcal{X},\mathbb{P}_ \mathcal{X})$ is a continuous stochastic sequence.
- Most definitions are similar to discrete sequence (including stationarity), while the sequence is often defined as $\mathcal{X}(\mathcal{G}),\;\mathcal{G}={t_1,t_2,\ldots,t_N}\subset\mathbb{T}, t_i< t_{i+1}$
- Stochastic Differential Equation (SDE): $$\mathrm{d}\mathcal{X}(t)=F(\mathcal{X}(t),t)\mathrm{d}t+\sum^r_{i=1}G_i(\mathcal{X}(t),t)\mathrm{d}\mathcal{W}_i(t)$$ Here $\mathcal{W}$ is often a certain kind of noise random process.
Markov Sequence
- Markov Sequence: A ss. $\mathcal{X}(k)$ is called a (continuous-time) Markov sequence if for the set of times $\mathcal{G}={t_1,t_2,\ldots,t_N}$ with $t_i < t_{i+1}$, we have $$\begin{split}f_ \mathcal{X}\left(t_N;x(t_N)\middle|\mathcal{X}(t_{N-1})=x(t_{N-1}),\mathcal{X}(t_{N-2})=x(t_{N-2}),\ldots\right)\qquad \\ =f_ \mathcal{X}(t_N;x(t_N)|\mathcal{X}(t_{N-1})=x(t_{N-1}))\end{split}$$
- Hidden Markov Model: Definition similar to discrete case. A continuous-time stochastic state space is of the form $$\begin{align}\dot{\mathcal{X}}(t)&=F(\mathcal{X},t)+G(\mathcal{X},t)\mathrm{d}\mathcal{W}(t)/\mathrm{d}t \\ \mathcal{Y}(t)&=H(\mathcal{X},t)+J(\mathcal{X},t)\mathrm{d}\mathcal{W}(t)/\mathrm{d}t\end{align}$$ where $\mathcal{W}$ is usually Wiener process and white noise $\mathrm{d}\mathcal{W}(t)/\mathrm{d}t$ is often written as $\mathcal{U}(t)$.
- The state space is affine if $F(\mathcal{X}(t),t)=A(t)\mathcal{X}(t)+u(t),\;N(\mathcal{X},t)=C(t)\mathcal{X}(t)+v(t)$
- The state space is bilinear if $G(\mathcal{X},t)\mathrm{d}\mathcal{W}(t)=\sum^r_{i=1}B_i(t)\mathcal{X}\mathrm{d}\mathcal{W}_i(t)$
Poisson Counters
- Definition: It’s a stochastic Markow process $\mathcal{N}(t)\in\{0,1,2,\ldots\}$ with the characteristic that it jumps up by only one integer at a time.
- Characteristics: transition times are random, transition step is always 1.
- Transition rule: $\frac{\partial}{\partial t}p_\mathcal{N}=\begin{cases}-\lambda p_\mathcal{N}(t;n)&:n=0\\-\lambda p_\mathcal{N}(t;n)+\lambda p_\mathcal{N}(t;n-1)&:n>0\end{cases}$
- From the rule we can conclude that $p_\mathcal{N}(t;n)=\frac{1}{n!}(\lambda t)^n e^{-\lambda t}$
- Bidirectional counter: $\mathcal{N}(t)$ is defined as one-directional. $\mathcal{N}_1(t)-\mathcal{N}_2(t)$ is called bidirectional poisson counter.
- Expectation: $\mathbb{E}[\mathcal{N}(t)]=\lambda t,\;\mathbb{E}[\mathrm{d}\mathcal{N}(t)]=\lambda\mathrm{d}t$
- Ito calculus for Poisson counter:
- Ito sense: $n(t)$ is a realization of poisson counter $\mathcal{N}$. $x(t)$ is a solution in Ito sense to $\mathrm{d}x(t)=F(x(t),t)\mathrm{d}t+G(x(t),t)\mathrm{d}n(t)$ if
- On all intervals where $n(t)$ is constant, $\dot{x}(t)=F(x(t),t)$
- If $n(t)$ jumps at time $t_1$, $\lim_{t\to t_1^+}x(t)=\lim_{t\to t_1^-}x(t)+G\left(\lim_{t\to t_1^+}x(t),t_1\right)$
- Ito rule: Given a fuction $\psi(\mathcal{X}(t),t)$, taking Taylor expansion we have $$\mathrm{d}\psi=\left\{\frac{\partial\psi}{\partial t}+(\nabla_\mathcal{X}\psi)F(\mathcal{X},t)\right\}\mathrm{d}t+\sum^r_{i=1}\left\{\psi\left(\mathcal{X}+G_i(\mathcal{X},t),t\right)-\psi(\mathcal{X},t)\right\}\mathrm{d}\mathcal{N}_i(t)$$
- Ito sense: $n(t)$ is a realization of poisson counter $\mathcal{N}$. $x(t)$ is a solution in Ito sense to $\mathrm{d}x(t)=F(x(t),t)\mathrm{d}t+G(x(t),t)\mathrm{d}n(t)$ if
Wiener-Process
- Definition (Brownian Motion): It’s a bidirectional poisson counter with infinite rate, i.e. $\mathcal{W}=\lim_{\lambda\to\infty}\frac{1}{\sqrt(\lambda)}(\mathcal{N}_1-\mathcal{N}_2)$ where $\mathcal{N}_1,\;\mathcal{N}_2$ have rate $\lambda/2$
- Characteristic: It’s a Gaussian with zero mean and variance $t$
- Note: Actual Wiener Process has stronger continuity property than Brownian motion.
- Expectation: $\mathbb{E}[\mathcal{W}]=\mathbb{E}[d\mathcal{W}]=0,\;\mathbb{E}[\mathcal{W}(\tau)\mathcal{W}(t)]=\mathbb{E}[\mathcal{W}^2(\min\{t,\tau\})]=\min\{t,\tau\}$
- Principle of independent increments: If the interval $[r,t), [\sigma,s)$ don’t overlap, then $\mathcal{W}(t)-\mathcal{W}(\tau)$ and $\mathcal{W}(s)-\mathcal{W}(\sigma)$ are uncorrelated.
- Ito calculus for Wiener Process
- Ito rule: Given a fuction $\psi(\mathcal{X}(t),t)$, taking Taylor expansion we have $$\begin{split}\mathrm{d}\psi=\frac{\partial\psi}{\partial t}\mathrm{d}t+(\nabla_\mathcal{X}\psi)F(\mathcal{X},t)\mathrm{d}t+\sum^r_{i=1}\left(\nabla_\mathcal{X}\psi(\mathcal{X},t)\right)G_i(\mathcal{X},t)\mathrm{d}\mathcal{W}_ i \\ -\sum^r_{i=1}\frac{1}{2}G_i^\top(\mathcal{X},t)\left(\mathrm{H}_\mathcal{X}\psi(\mathcal{X},t)\right)G_i(\mathcal{X},t)\mathrm{d}t\end{split}$$
Note $\nabla_{\mathcal{X}}=\begin{bmatrix}\frac{\partial\psi}{\partial x_1}&\cdots&\frac{\partial\psi}{\partial x_n}\end{bmatrix}$ and $\mathrm{H}_{\mathcal{X}}=\begin{bmatrix}\frac{\partial^2\psi}{\partial x_1^2}&\cdots&\frac{\partial^2\psi}{\partial x_1\partial x_n}\\ \vdots&\ddots&\vdots \\ \frac{\partial^2\psi}{\partial x_n\partial x_1}&\cdots&\frac{\partial^2\psi}{\partial x_n^2}\end{bmatrix}$ are the gradient and Hessian operator respectively. By default, the operator target is $\mathcal{X}$ is not noted.
- Ito rule: Given a fuction $\psi(\mathcal{X}(t),t)$, taking Taylor expansion we have $$\begin{split}\mathrm{d}\psi=\frac{\partial\psi}{\partial t}\mathrm{d}t+(\nabla_\mathcal{X}\psi)F(\mathcal{X},t)\mathrm{d}t+\sum^r_{i=1}\left(\nabla_\mathcal{X}\psi(\mathcal{X},t)\right)G_i(\mathcal{X},t)\mathrm{d}\mathcal{W}_ i \\ -\sum^r_{i=1}\frac{1}{2}G_i^\top(\mathcal{X},t)\left(\mathrm{H}_\mathcal{X}\psi(\mathcal{X},t)\right)G_i(\mathcal{X},t)\mathrm{d}t\end{split}$$
- White noise: It is Guassian distributed stationary stochastic process with $\mu_\mathcal{U}(t)=0,\;\mathrm{K}_ \mathcal{U}(t,\tau)=\Phi_ \mathcal{U}\delta(t-\tau)$, where $\Phi_\mathcal{U}$ is spectral intensity.
- We often consider white noise as derivative of Wiener process: $\mathcal{U}\sim\mathrm{d}\mathcal{W}/\mathrm{d}t$.
Linear Stochastic Sequence
$$\begin{align}\mathrm{d}\mathcal{X}(t)&=\{A(t)\mathcal{X}(t)+u(t)\}\mathrm{d}t+B(k)\mathrm{d}\mathcal{W}(t) \\ \mathcal{Y}(t)&=C(k)\mathcal{X}(k)+D(k)\mathcal{W}(k)\end{align}$$
- Differential equation of expectations
- Mean: $\frac{\mathrm{d}}{\mathrm{d}t}\mu_\mathcal{X}(t)=A(t)\mu_\mathcal{X}(t)$
- Autocovariance: $\frac{\mathrm{d}}{\mathrm{d}t}S_\mathcal{X}(t)=A(t)S_\mathcal{X}(t)+S_\mathcal{X}A^\top(t)+B(t)B^\top(t)$ (called Lyapunov differential equation)
- Stationary LTI case
- Useful equation: $\mathrm{R}_ \mathcal{X}(t,\tau)=\begin{cases} S_ \mathcal{X}\exp\{A^\top(\tau-t)\}&,\tau>t \\ \exp\{A(t-\tau)\}S_ \mathcal{X}&,\tau< t\end{cases}$
- (Continuous-time) algerbraic Lyapunov equation: $A\bar{S}_ \mathcal{X}+\bar{S}_\mathcal{X}A^\top+BB^\top=0$
Nonlinear Stochastic Sequence
The Fokker-Planck Equation (FPE): Consider the Wiener-process excited general SDE, we have $$\frac{\partial f_ \mathcal{X}(x;t)}{\partial t}=-\nabla\left(F(\mathcal{X})f_ \mathcal{X}(x;t)\right)+\frac{1}{2}\mathrm{tr}\left[\mathrm{H}\left(\Gamma(\mathcal{X})f_ \mathcal{X}(x;t)\right)\right]$$ here $\Gamma(\mathcal{X})=G(\mathcal{X})G^\top(\mathcal{X})$
- $\mathcal{X}$ is stationary means $\frac{\partial f_\mathcal{X}(x;t)}{\partial t}=0$
Gaussian closure: An approximate solution for FPE is supposing $f_\mathcal{X}$ as multivariate Gaussian with some $S_\mathcal{X}$ and ${\mu_\mathcal{X}}$. That is we focus on 1st-order and 2nd-order estimation. Denote the estimated distribution as $\hat{f}_\mathcal{X}(x;t)$
Estimated expectation: $\hat{\mathbb{E}}\phi(\mathcal{X})=\int\cdots\int\phi(x)\hat{f}_\mathcal{X}(x;t)\mathrm{d}x$
Solution:$$\begin{split}h(x,t)=-\nabla F+\tilde{x}^\top S^{-1}F-(\nabla\Gamma)S^{-1}\tilde{x}+\frac{1}{2}\mathrm{tr}\left[\mathrm{H}\Gamma \right] \\ +\frac{1}{2}\mathrm{tr}\left[\left(-S^{-1}+S^{-1}\tilde{x}\tilde{x}^\top S^{-1}\right)\Gamma\right]\end{split}$$ and $$\begin{align}\dot{\mu}_\mathcal{X}(t)&=S(t)\hat{\mathbb{E}}[\nabla^\top h(x)] \\ \dot{S} _\mathcal{X}(t)&=S(t)\hat{\mathbb{E}}[\mathrm{H} h(x)]S(t)\end{align}$$
Useful simplification: If $G$ is constant (independent of $\mathcal{X}$), then the ODE of $\dot{\mu}$ and $\dot{S}$ become $$\begin{align}\dot{\mu}_ \mathcal{X}(t)&=\hat{\mathbb{E}}[F(\mathcal{X})] \\ \dot{S} _\mathcal{X}(t)&=\hat{A}^\top S+S\hat{A}+\Gamma\end{align}$$ where $\hat{A}=\hat{\mathbb{E}}\left[\frac{\partial F}{\partial\mathcal{X}}\right]$
- Equivalent linearization (Quasi-linearization): In the estimation, $\mathcal{X}$ evolves equivalently to $\mathrm{d}\mathcal{X}(t)=\hat{A}(t)\mathcal{X}(t)\mathrm{d}t+G\mathrm{d}\mathcal{W}(t)$
No guaranteed bounds generally exist for the estimation error $$e(\mathcal{X},t)=\left(\frac{\partial\hat{f}_ \mathcal{X}}{\partial t}\right)-\left(-\nabla\left(F\hat{f}_ \mathcal{X}\right)+\frac{1}{2}\mathrm{tr}\left[\mathrm{H}\left(\Gamma\hat{f}_ \mathcal{X}\right)\right]\right)$$
Estimation of $\mu(t),\;S(t)$ could also have error. And stationarity may not be the consistent between original system and estimated system
Spectral Analysis
Content in this section comes from MECHENG 549
Power Spectral Density
- Definition: The power spectral density (PSD) of stochastic process $\mathcal{Y}$ is $$\Phi_\mathcal{Y}(\omega)\equiv\mathbb{E}\left[ \lim_{T\to\infty}\frac{1}{2T}Y_T(\omega)Y_T^\top(\omega)\right]$$ where $Y_T(\omega)$ is the Fourier transform of centered process $\mathcal{Y}(t)$.
- White noise has the same PSD for whatever $\omega$
- Wiener-Khinchin Theorem: $\Phi_\mathcal{Y}(\omega)=\int^\infty_{-\infty}e^{-i\omega\theta}\bar{\mathrm{K}}_\mathcal{Y}(\theta)\mathrm{d}\theta$
- Signal propagation: If the system $P$ has input $\mathcal{Y}$ and output $\mathcal{Z}$, then $\Phi_\mathcal{Z}=|P(i\omega)\Phi_\mathcal{Y}(\omega)P^\top(i\omega)|$ where $P(i\omega)$ is the Fourier transform of the system.
- Cross spectrum: If the system $G$ has input $\mathcal{Y}$ and output $\mathcal{Z}$, then the cross-spectrum is $\Phi_{\mathcal{ZY}}(\omega)=G(i\omega)\Phi_\mathcal{Y}(\omega)$
Periodograms
- Definition: We want to estimate $\Phi_{\mathcal{Y}}(\omega)$ without the expectation, then we use $$Q_T(\omega,y)=\frac{1}{2T}\left|\int^T_{-T}e^{-i\omega t}y(t)\mathrm{d}t\right|^2$$ where $y$ is a sample realization of $\mathcal{Y}$.
- We also use window functions to calculate the spectrum over a finite time interval (using Bartlett’s procedure)
Stochastic realizations
We want to find a stochastic model which gives the same spectrum given. This is called stochastic realization problem. We focus on scalar LTI case.
If we know $P(s)=C[sI-A]^{-1}B+D$ (the Laplace transform of LTI system) and $P(s)=c\frac{\prod^m_{k=1}(s-z_k)}{\prod^n_{k=1}(s-p_k)}$ for some $m\leq n$ and real constant $c$. Then we have $$\Phi_\mathcal{Y}(\omega)=P(i\omega)P(-i\omega)=c^2\frac{\prod^m_{k=1}(\omega^2+z_k^2)}{\prod^n_{k=1}(\omega^2+p_k^2)}$$
Lemma: For any valid PSD, there exists a spectral factorization $\Phi_\mathcal{Y}(\omega)=\frac{\sum^m_{k=0}a_k(\omega^2)^k}{\sum^n_{k=0}b_k(\omega^2)^k}=P(i\omega)P(-i\omega)$ where $P(s)$ is an asymptotically stable n-th order transfer function, iff
- $\Phi_\mathcal{Y}(\omega)$ is a ratio of polynomials of $\omega^2$
- All coefficients $a_k$ and $b_k$ are real
- The denominator has no positive real roots in $\omega^2$
- Any positive real roots in the numerator have even multiplicity
Notes for math typing in Hexo
- Escape
\
by\\
. Especially escape{
by\\{
instead of\{
, and escape\\
by\\\\
.- Be careful about
_
, it’s used in markdown as italic indicator. Add space after_
is a useful solution.- Some useful Mathjax tricks at StackExchange
- Several capital Greek characters should directly use its related Latin alphabet with
\mathrm
command.Although I have migrated to Hugo, some tricks might still be relevant.