Notes for Stochastic System

Table of contents

Prerequisites: Knowledge of Elementary Calculus, Linear Algebra and Probability

Discrete-Time Stochastic System

Stochastic Sequences

Definition: Given $k\in\mathbb{K}\subseteq\mathbb{Z}$ a sequence of integers, $\mathcal{X}(k,\omega): (\Omega,\mathcal{F},\mathbb{P})\to(\mathbb{R}^n,\mathcal{F}_ \mathcal{X},\mathbb{P}_ \mathcal{X})$ is a random/stochastic sequence.
Uncertainties: Consider a casual system $F$ relates some scalar inputs $u(k)$ to output $x(k)$
- Epistemic/Model uncertainty: $\mathcal{X}(k,\omega)=F(k,u(k),u(k-1),\ldots,\omega)$. (system is stochastic and input is deterministic).
- Aleatoric/Input uncertainty: $\mathcal{X}(k,\omega)=f(k,U(k,\omega),u(k-1,\omega),\ldots)$ (system is deterministic and input is stochastic).
Realization: An outcome $\mathcal{X}(k,\omega)=x(k)$ given $\omega$ is called a realization of stochastic sequence $\mathcal{X}$
Terminology and Convention
- $\mathcal{X}(k,\omega)$ is often written as $\mathcal{X}(k)$ when there’s no ambiguity.
- $\mathbb{K}=\mathbb{Z}$ if not specified.
- Sequence over a set $\mathcal{K}_1\subseteq\mathbb{K}$ are denoted $\mathcal{X}(\mathcal{K}_1)$.
- $\mathcal{X}$ denotes $\mathcal{X}(\mathbb{K})$ if not specified.
- Consecutive subsequence: $$\mathcal{X}(k:l)=\{\mathcal{X}(k),\mathcal{X}(k+1),\ldots,\mathcal{X}(l)\},\;x(k:l)=\{x(k),x(k+1),\ldots,x(l)\}$$
- Abbreviations:
  - SS - stochastic sequence
  - IID - independent indentically distributed

Probabilistic characterization

Distribution and density: $$F_ \mathcal{X}\left(k:l;x(k:l)\right)\equiv\mathbb{P}((\mathcal{X}_ i(k)\leqslant x_i(k))\cap\cdots\cap(\mathcal{X}_ i(l)\leqslant x_ i(l)),\;i=1\ldots n)$$ $$f_ \mathcal{X}\left(k:l;x(k:l)\right)\equiv \frac{\partial^{n(l-k+1)}}{\partial x_ 1(k)\cdots\partial x_ n(l)}F_ \mathcal{X}(k:l;x(k:l))$$ Here $k:l$ actually denotes a set of consecutive integers, it can be also changed to ordinary sets $\{k,l\}$ or single scalar $k$.
Ensemble Average: $\mathbb{E}[\psi(\mathcal{X}(k))]$, doing summation over different realization at same time $k$
- Mean: $\mu_ \mathcal{X}(k)\equiv\mathbb{E}[\mathcal{X}(k)]=\int^\infty_{-\infty}x(k)f_ \mathcal{X}(k;x(k))\mathrm{d}x(k)$
- Conditional Mean: $\mu_ \mathcal{X}(l|k)\equiv\mathbb{E}[\mathcal{X}(l)|\mathcal{X}(k)=x(k)]$
Time Average: $\frac{1}{2N+1}\sum^N_{k=-N}\psi(\mathcal{X}(k))$, doing summation over different time k of same realization
Autocorrelation:
- Scalar case: $r_ \mathcal{X}(k,l)\equiv\mathbb{E}[\mathcal{X}(k)\mathcal{X}(l)]=\int^\infty_{-\infty}\int^\infty_{-\infty}x(k)x(l)f_ \mathcal{X}(k,l;x(k,l))\mathrm{d}x(k)\mathrm{d}x(l)$
- Vector case: $R_ \mathcal{X}(k,l)\equiv\mathbb{E}[\mathcal{X}(k)\mathcal{X}^\top(l)]$
- Conditional autocorrelation: $R_ \mathcal{X}(k,l|q)\equiv\mathbb{E}[\mathcal{X}(k)\mathcal{X}^\top(l)|\mathcal{X}(q)=x(q)]$
- Often we denote $C_ \mathcal{X}(k)=R_ \mathcal{X}(k,k)$
Autocovariance:
- Scalar case: $\kappa_ \mathcal{X}(k,l)\equiv\mathbb{E}[(\mathcal{X}(k)-\mu_ \mathcal{X}(k))(\mathcal{X}(l)-\mu_ \mathcal{X}(l))]$
- Vector case: $\mathrm{K}_ \mathcal{X}(k,l)\equiv\mathbb{E}[(\mathcal{X}(k)-\mu_ \mathcal{X}(k))(\mathcal{X}(l)-\mu_ \mathcal{X}(l))^\top]$
- Conditional autocovariance: $\mathrm{K}_ \mathcal{X}(k,l|q)\equiv\mathbb{E}[(\mathcal{X}(k)-\mu_ \mathcal{X}(k|q))(\mathcal{X}(l)-\mu_ \mathcal{X}(l|q))^\top]$
- Often we denote $S_ \mathcal{X}(k|q)=\mathrm{K}_ \mathcal{X}(k,k|q)$
- Useful conclusion: $\mathrm{K}(a,b)=\mathrm{K}(b,a)^T$
- Normalized (autocorrelation coefficient): $\rho_ \mathcal{X}(k,l)\equiv\mathrm{K}_ \mathcal{X}(k,l)/\sigma^2_{\mathcal{X}(k)}\sigma^2_{\mathcal{X}(l)}$
Strong Stationarity(aka. strict sense): (necessarily identically distributed over time) $$\forall x(k:l)\in\mathbb{R}^{n(l-k+1)},\;\forall s\in\mathbb{Z},\;f_ \mathcal{X}(k:l;x(k:l))=f_ \mathcal{X}(k+s:l+s;x(k:l))$$
Weak Stationarity(aka. wide sense): $\forall k,l$ if $$\mu_ \mathcal{X}(k)=\mu_ \mathcal{X}(l)\;\text{and}\;\mathrm{K}_ \mathcal{X}(k,l)=\mathrm{K}_ \mathcal{X}(k+s,l+s)\equiv\bar{\mathrm{K}}_ \mathcal{X}(s)$$ Weak stationarity is necessary condition for stationarity. (Equal when Gaussian distributed)
Ergodicity: $\mathcal{X}$ is called ergodic in $\psi$ if
1. $\mathbb{E}[\psi(\mathcal{X})]$ is stationary
2. Ensemble average is equal to Time average, that is $$ \frac{1}{2N+1}\sum^N_{k=-N}\psi(\mathcal{X}(k))\to\mathbb{E}[\psi(\mathcal{X})];\text{as};l\to \infty$$

Markov Sequence

Markov Sequence: A ss. $\mathcal{X}(k)$ is called a (discrete-time) Markov sequence if $$\begin{split}f_ \mathcal{X}\left(k;x(k)\middle|\mathcal{X}(k-1)=x(k-1),\mathcal{X}(k-2)=x(k-2),\ldots\right) \\=f_ \mathcal{X}(k;x(k)|\mathcal{X}(k-1)=x(k-1))\end{split}$$
- We often make some assumption on the initial condition $\mathcal{X}(0)$, such as known, deteministic or uniformly distributed within certain domain.
Markov Chains: Markov sequence with discrete set of values(states) ${x_1\ldots x_m}$
Hidden Markov model: Sequence $\mathcal{Y}$ is called a Hidden Markov Model if it’s modeled by a system of the form $$\begin{align}\mathcal{X}(k+1)&=g(k,\mathcal{X}(k),\mathcal{W}(k)) \\ \mathcal{Y}(k)&=h(k,\mathcal{X}(k),\mathcal{W}(k))\end{align}$$ We also say that $\mathcal{Y}$ has a (discrete-time) stochastic state space.
Guassian-Markov Sequence (GMS): $\mathcal{X}(k+1)=g(k,\mathcal{X}(k),\mathcal{W}(k))$ where $\mathcal{W}(k)$ is iid. Guassian

Linear Stochastic Sequence

$$\begin{align}\mathcal{X}(k+1)&=A(k)\mathcal{X}(k)+B(k)\mathcal{W}(k) \\ \mathcal{Y}(k)&=C(k)\mathcal{X}(k)+D(k)\mathcal{W}(k)\end{align}$$

For linear Markov sequences, the deterministic mean sequence and centered uncertain sequence completely decouple. So we often assume that $\mathcal{X}(k)$ and $\mathcal{Y}(k)$ are centered in this case, with regard to deterministic inputs. The equation with deterministic inputs is often written as $$\mathcal{X}(k+1)=A(k)\mathcal{X}(k)+B_u(k)u(k)+B_\mathcal{W}(k)\mathcal{W}(k)$$

Recursive-form expectations:
- Mean: $\mu_ \mathcal{X}(k+1|q)=A(k)\mu_ \mathcal{X}(k|q)+B(k)\mu_\mathcal{W}(k)$
- Covariance (Discrete-time algebraic Lyapunov/Stein difference equation): $$S_ \mathcal{X}(k+1|q)=A(k)S_ \mathcal{X}(k|q)A^\top(k)+B(k)S_\mathcal{W}(k)B^\top(k)$$
  Can be solved with dlyap in MATLAB
Convergence when $k\to\infty$:
- Mean convergence: $\mu_ \mathcal{X}(k|q)$ converges requires $\max_i|\lambda_i(A)|<1$ ($\lambda$ denotes eigenvalue)
- Covariance convergence: $S_ \mathcal{X}(k|q)$ converges requires $\max_i|\lambda_i(A)|<1$
- (Discrete-time) Lyapunov equation (Stein equation): $\bar{S}_ \mathcal{X}=A\bar{S}_ \mathcal{X}A^\top+B\bar{S}_\mathcal{W}(k)B^\top$. Solution for this equation exists iff. $A$ is asymptotically stable (characterizing sequence is stationary).
Explicit state transition: By recursive substitution, $$\mathcal{X}(k)=\Psi(k,q)\mathcal{X}(q)+\sum^{k-1}_ {i=q}\Gamma(k,i)\mathcal{W}(i)$$ where state transition matrix $\Psi(k,q)=\begin{cases}I, &k=q \\ \prod^{k-1}_ {i=q}A(i),& k>q\end{cases}$ and $\Gamma(k,i)=\Psi(k,i+1)B(i)$.
- Conditioned Mean sequence: $\mu_ \mathcal{X}(k|q)=\Psi(k,q)\mu_ \mathcal{X}(q)+\sum^{k-1}_ {i=q}\Gamma(k,i)\mu_\mathcal{W}(i)$
- Conditioned Autocovariance Matrix: $$\mathrm{K}_ \mathcal{X}(k,l|q)=\Psi(k,q)S_ \mathcal{X}(q)\Psi^\top(l,q)+\sum^{min\{k,l\}-1}_ {i=q}\Gamma(k,i)S_\mathcal{W}(i)\Gamma^\top(l,i)$$
  - A special case: $S_ \mathcal{X}(k|q)=\mathrm{K}_ \mathcal{X}(k,k|q)=\sum^{k-1}_ {i=q}\Gamma(k,i)S_\mathcal{W}(i)\Gamma^\top(k,i)$, $S_ \mathcal{X}(k|k)=0$
  - Useful equation (stationary centered case): $$\mathrm{K}_ \mathcal{X}(k,l)=\begin{cases} S_ \mathcal{X}(k)\cdot(A^\top)^{(l-k)}&,l>k\\ S_ \mathcal{X}(k)&,l=k\\ A^{(k-l)}S_ \mathcal{X}(k)&,l< k\end{cases}$$
Conditional Autocorrelation Matrix: $R_ \mathcal{X}(k,l|q)=\mathrm{K}_ \mathcal{X}(k,l|q)+\mu_ \mathcal{X}(k|q)\mu_ \mathcal{X}^\top(l|q)$
Observation $\mathcal{Y}$ property:
- Mean: $\mu_ \mathcal{Y}(k|q)=C(k)\mu_ \mathcal{X}(k|q)+D(k)\mu_\mathcal{W}(k)$
- Covariance: $$\mathrm{K}_ \mathcal{Y}(k,l|q)=\begin{cases}C(k)\mathrm{K}_ \mathcal{X}(k,l|q)C^\top(l)+C(k)\Gamma(k,l)S_W(l)D^\top(l)&:k>l \\C(k)S_ \mathcal{X}C^\top(k)+D(k)S_\mathcal{W}(k)D^\top(k)&:k=l\\ C(k)\mathrm{K}_ \mathcal{X}(k,l|q)C^\top(l)+D(k)S_\mathcal{W}(k)\Gamma^\top(l,k)C^\top(l)&:k< l\end{cases}$$
- Stationary time-invariant covariance: $$\mathrm{K}_ \mathcal{Y}(s)=\begin{cases}CA^{|s|}\bar{S}_ \mathcal{X}C^\top+CA^{|s|-1}B\bar{S}_ WD^\top&:s\neq 0 \\C\bar{S}_ \mathcal{X}C^\top+D\bar{S}_ WD^\top&:s=0\end{cases}$$

Gaussian Stochastic Sequence

Jointly Gaussian $\Rightarrow, \nLeftarrow$ Marginally Gaussian
$c^\top X$ is Gaussian $\Leftrightarrow X$ is Gaussian
Conditional Gaussian: if $X$ and $Y$ are Gaussian, then $X|Y \sim \mathcal{N}(\mu_{X|Y},S_{X|Y})$ where $\mu_{X|Y}=\mu_X+S_{XY}S_Y^{-1}(Y-\mu_Y)$, $S_{X|Y}=S_X-S_{XY}S_Y^{-1}S_{YX}$
A linear controllable GMS $\mathcal{X}(k)$ is stationary iff. $A(k)=A, B(k)=B$ (time-invariant) and $A$ is asymptotically stable.
All LTI stationary GMS are also ergodic in all finite momoents
Solve $\mathcal{X}(k+1)=A\mathcal{X}(k)+B\mathcal{W}(k)$ when $\mathcal{X}$ is stationary ($\max_i|\lambda_i(A)|<1$)
1. solve $\bar{\mu}_ \mathcal{X}=A\bar{\mu}_ \mathcal{X}+B\bar{\mu}_\mathcal{W}$ for $\bar{\mu}$
2. solve $\bar{S}_ \mathcal{X}=A\bar{S}_ \mathcal{X}A^\top+B\bar{S}_ \mathcal{W}B^\top$ for $\bar{S}_ \mathcal{X}$
3. calculate $\Sigma_ \mathcal{X}(k:l|q)=\begin{bmatrix}\mathrm{K}_ \mathcal{X}(k,k|q)&\cdots&\mathrm{K}_ \mathcal{X}(k,l|q)\\ \vdots&\ddots&\vdots\\ \mathrm{K}_ \mathcal{X}(l,k|q)&\cdots&\mathrm{K}_ \mathcal{X}(l,l|q)\end{bmatrix}$ using $\bar{S}_ \mathcal{X}$
4. Then $f_ \mathcal{X}(k:l;x(k:l))$ is determined with $\mu_ \mathcal{X}(k:l)=\{\bar{\mu}_ \mathcal{X},\bar{\mu}_ \mathcal{X}\ldots\bar{\mu}_ \mathcal{X}\}$ and $\Sigma_ \mathcal{X}(k:l)$ above.

Observation & Filtering

For deterministic version, please check my notes for control system.

(LTI) Luenberger observer: $$\begin{align}\hat{\mathcal{X}}(k+1)&=A\hat{\mathcal{X}}(k)+L(\hat{\mathcal{Y}}(k)-\mathcal{Y}(k))+B\bar{\mu}_ \mathcal{W} \\ \hat{\mathcal{Y}}(k)&=C\hat{\mathcal{X}}(k)+D\bar{\mu}_\mathcal{W}\end{align}$$ where $L$ is the observer gain.
- Note: We often assume that the process & measurement noise are decoupled and independent
- Combined form: $\hat{\mathcal{X}}(k+1)=[A+LC]\hat{\mathcal{X}}(k)-L\mathcal{Y}(k)+B\mu_\mathcal{W}(k)$
State estimation residual $r(k)=\mathcal{X}(k)-\hat{\mathcal{X}}(k)$
- Combined form: $r(k+1)=[A+LC]r(k)+[B+LD]\tilde{\mathcal{W}}(k)$
- Stationary covariance can be solved by a Lyapunov equation $$\bar{S}_ r=[A+LC]\bar{S}_ r[A+LC]^T+[LD+B]\bar{S}_ \mathcal{W}[LD+B]^T$$
A common objective: minimize $\bar{S}_r=\mathbb{E}(rr^\top)$
- Solutions: $L=-A\bar{S}_ rC^\top[C\bar{S}_ rC^\top+D\bar{S}_ \mathcal{W}D^\top]^{-1}$ (Kalman observer gain)
- Or solve (Discrete-time) Algebraic Riccati equation $$\bar{S}_ r=A\bar{S}_ rA^\top+B\bar{S}_ \mathcal{W}B^\top-A\bar{S}_ rC^\top[C\bar{S}_ rC^\top+D\bar{S}_ \mathcal{W}D^\top]^{-1}C\bar{S}_ rA^\top$$
Innovation sequence $e(k)=\hat{\mathcal{Y}}(k)-\mathcal{Y}(k)$ with $L$ is optimal
- We can find that $\mu_e=0$ and $\mathrm{K}_e(k+s,k)=\begin{cases}C\bar{S}_rC^\top+D\bar{S}_WD^T&:s=0 \\0&:s\neq0\end{cases}$. So the innovation sequence is iid. (only in Kalman observer)
(Output) Probabilistically-equivalent model: $$\begin{align}\mathcal{X}(k+1)&=A\mathcal{X}(k)+Le(k) \\ \mathcal{Y}(k)&=C\mathcal{X}(k)-e(k)\end{align}$$

Markov Chains

Content in this section comes from EECS 501 In this specific field, we often use stand-alone analysis methods.

Basic definitions

State distribution: We denote row vector $\pi_t$ as $\pi_t(x)=\mathbb{P}(\mathcal{X}_t=x),\; x\in S$ ($S$ is the set of states). Directly we have $\sum_x\pi(x)=1$
Time homogeneous: $\mathbb{P}(\mathcal{X}_ {t+1}=y|\mathcal{X}_ t=x)=\mathbb{P}(\mathcal{X}_{s+1}=y|\mathcal{X}_s=x)\;\forall s,t$
One-step transition probability matrix: $P_t=[P_{xy,t}]$ where $P_{xy,t}=\mathbb{P}(\mathcal{X}_ {t+1}=y|\mathcal{X}_ t=x)$.
- Time-homo case: $P=[P_{xy}]$ where $P_{xy}=p(y|x)$
- Rows of the matrix sum up to 1.
- This matrix is also called stochastic matrix.
- Extend this matrix to continuous states, then we use transition kernel $T(x,y)$ to describe the transition probability,
m-step transition probability matrix: $P_{xy,t}^{(m)}=\mathbb{P}(\mathcal{X}_ {t+m}=y|\mathcal{X}_ t=x)$
- Chapman-Kolmogorov Equation: $P^{(n+m)}_t=P^{(n)}_t P^{(m)}_t$

State Variables

Hitting time: $T_1(y)=\min\{n\geq 0:\mathcal{X}_ n=y\},\;T_k(y)=\min\{n>T_{k-1}(y):\mathcal{X}_ n=y\}$ where $n\in\mathbb{N}$
- Period: For state $i\in x$, its period is the greatest common divisor of $\{n>1|T_n(i)>0\}$. A Markov Chain is aperiodic If all states have period 1.
Return probability: $f_{xy}=\mathbb{P}(T_1(y)<\infty|\mathcal{X}_0=x)$
Occupation time: $V(y)=\sum^\infty_{n=1}\unicode{x1D7D9}_{\mathcal{X_n}}(y)$
Some properties
- $f_{xy}=\mathbb{P}(V(y)\geqslant 1|\mathcal{X}_0=x)$
- $\mathbb{P}(V(y)=m|\mathcal{X}_ 0=x)=\begin{cases} 1-f_{xy}&:m=0\\f_{xy}f_{yy}^{m-1}(1-f_{yy})&:m\geqslant 1\end{cases}$
- $\mathbb{E}[ V(y)|\mathcal{X}_ 0=x]=\begin{cases} 0&:f_{xy}=0\\\infty&:f_{xy}>0,\ f_{yy}=1\\f_{xy}/(1-f_{yy})&:f_{xy}>0,\ f_{yy}<1\end{cases}$
- $\mathbb{E}[ V(y)|\mathcal{X}_ 0=x]=\sum^\infty_{n=1}P^{(n)}_{xy}$

State Classification

Here we usually consider only time-homogeneous Markov Chains

Accessible ($x\to y$): $\exists n\;\text{s.t.}\ P_{xy}^{(n)}>0$
- $x\to y \Leftrightarrow f_{xy}>0$
Communicate ($x\leftrightarrow y$): $x\to y\;\text{and}\;y\to x$. This is a equivalence relation.
Equivalent class: set of states that communicate with each other
Irreducible: a Markov chain with only one communicating class
Absorbing/Closed state: $P_{xx}=1$
- Absorbing class: set $C$ is absorbing iff $\forall x \in C, \sum_{y\in C}P_{xy}=1$
Transient state: $f_{xx}<1 \Leftrightarrow \mathbb{E}[V_i|\mathcal{X}_0=i]<\infty$
Recurrent state: $f_{xx}=1 \Leftrightarrow \mathbb{E}[V_i|\mathcal{X}_0=i]=\infty$
- Positive recurrent: $\mathbb{E}[T_1(x)|\mathcal{X}_0=x]<0$
- Null recurrent: $\mathbb{E}[T_1(x)|\mathcal{X}_0=x]=0$
- These two kind of recurrent states also make up communicating classes
Some properties
- If $x$ is (positive) recurrent and $x\to y$, then $y$ is also (positive) recurrent and $f_{xy}=f_{yx}=1$
- Every closed and finite subset of $X$ contains at least one (positive) recurrent state.
- All states of a communicating class are either positive recurrent, null recurrent or transient.
- Method to determine whether class $C$ is recurrent/transient
  1. If $C$ is non-closed, it’s transient
  2. If $C$ is closed and finite, then $C$ is positive recurrent
  3. If $C$ is closed and infinite, then $C$ can be either positive/null recurrent or transient
A typical example is the birth-death chain
Stationary distribution: $\bar{\pi}\;\text{s.t.}\;\bar{\pi}=\bar{\pi} P$
- Stationary in limit form: $\bar{\pi}=\lim_{n\to \infty} \frac{1}{n}\sum^{n-1}_{t=0} \pi_t$
  - This is a Cesaro limit, we use this to deal with the problem that $\lim_{n\to \infty} \pi_t$ might not exist if the chain is periodic.
- Reversibility criterion for stationary: $\forall i,j \in S, \pi_i P_{ij} = \pi_j P_{ji}$ (a.k.a detailed balance condition), then the process is reversible and therefore stationary.
- Existence for stationary distribution satisfying $\bar{\pi}=\bar{\pi} P$
  1. If the chain has single positive recurrent class, then exists unique solution: $\bar{\pi}(x)=0$ for all transient or null recurrent $x$
  2. If the chain has multiple positive recurrent class, then there are multiple solutions, for each positive recurrent $x$ we have a $\bar{\pi}^i$.
  3. If the chain has only transient and null recurrent states, there is no solution$
- Convergence of stationary distribution
  - If the chain has positive recurrent class, then $\frac{1}{n}\sum^n_{t=1}\mathbb{P}(\mathcal{X}_t=j)\xrightarrow[n\to \infty]{}\bar{\pi}_j,\;\forall \mu_0$
  - (Ergodic) If the chain is positive recurrent, then $\frac{1}{n}\sum^n_{t=1}\unicode{x1D7D9}_{\mathcal{X}_t}(j)\xrightarrow[n\to \infty]{\text{a.s.}}\bar{\pi}_j$
  - If the chain is positive recurrent and aperiodic, then $\mathbb{P}(\mathcal{X}_n=j)\xrightarrow[n\to \infty]{}\bar{\pi}_j$

Continuous-Time Stochastic System

Stochastic Sequences

Definition: Given $t\in\mathbb{T}\subset\mathbb{R}$ a sequence of time, $\mathcal{X}(t,\omega): (\Omega,\mathcal{F},\mathbb{P})\to(\mathbb{R}^n,\mathcal{F}_ \mathcal{X},\mathbb{P}_ \mathcal{X})$ is a continuous stochastic sequence.
Most definitions are similar to discrete sequence (including stationarity), while the sequence is often defined as $\mathcal{X}(\mathcal{G}),\;\mathcal{G}={t_1,t_2,\ldots,t_N}\subset\mathbb{T}, t_i< t_{i+1}$
Stochastic Differential Equation (SDE): $$\mathrm{d}\mathcal{X}(t)=F(\mathcal{X}(t),t)\mathrm{d}t+\sum^r_{i=1}G_i(\mathcal{X}(t),t)\mathrm{d}\mathcal{W}_i(t)$$ Here $\mathcal{W}$ is often a certain kind of noise random process.

Markov Sequence

Markov Sequence: A ss. $\mathcal{X}(k)$ is called a (continuous-time) Markov sequence if for the set of times $\mathcal{G}={t_1,t_2,\ldots,t_N}$ with $t_i < t_{i+1}$, we have $$\begin{split}f_ \mathcal{X}\left(t_N;x(t_N)\middle|\mathcal{X}(t_{N-1})=x(t_{N-1}),\mathcal{X}(t_{N-2})=x(t_{N-2}),\ldots\right)\qquad \\ =f_ \mathcal{X}(t_N;x(t_N)|\mathcal{X}(t_{N-1})=x(t_{N-1}))\end{split}$$
Hidden Markov Model: Definition similar to discrete case. A continuous-time stochastic state space is of the form $$\begin{align}\dot{\mathcal{X}}(t)&=F(\mathcal{X},t)+G(\mathcal{X},t)\mathrm{d}\mathcal{W}(t)/\mathrm{d}t \\ \mathcal{Y}(t)&=H(\mathcal{X},t)+J(\mathcal{X},t)\mathrm{d}\mathcal{W}(t)/\mathrm{d}t\end{align}$$ where $\mathcal{W}$ is usually Wiener process and white noise $\mathrm{d}\mathcal{W}(t)/\mathrm{d}t$ is often written as $\mathcal{U}(t)$.
- The state space is affine if $F(\mathcal{X}(t),t)=A(t)\mathcal{X}(t)+u(t),\;N(\mathcal{X},t)=C(t)\mathcal{X}(t)+v(t)$
- The state space is bilinear if $G(\mathcal{X},t)\mathrm{d}\mathcal{W}(t)=\sum^r_{i=1}B_i(t)\mathcal{X}\mathrm{d}\mathcal{W}_i(t)$

Poisson Counters

Definition: It’s a stochastic Markow process $\mathcal{N}(t)\in\{0,1,2,\ldots\}$ with the characteristic that it jumps up by only one integer at a time.
- Characteristics: transition times are random, transition step is always 1.
Transition rule: $\frac{\partial}{\partial t}p_\mathcal{N}=\begin{cases}-\lambda p_\mathcal{N}(t;n)&:n=0\\-\lambda p_\mathcal{N}(t;n)+\lambda p_\mathcal{N}(t;n-1)&:n>0\end{cases}$
- From the rule we can conclude that $p_\mathcal{N}(t;n)=\frac{1}{n!}(\lambda t)^n e^{-\lambda t}$
Bidirectional counter: $\mathcal{N}(t)$ is defined as one-directional. $\mathcal{N}_1(t)-\mathcal{N}_2(t)$ is called bidirectional poisson counter.
Expectation: $\mathbb{E}[\mathcal{N}(t)]=\lambda t,\;\mathbb{E}[\mathrm{d}\mathcal{N}(t)]=\lambda\mathrm{d}t$
Ito calculus for Poisson counter:
- Ito sense: $n(t)$ is a realization of poisson counter $\mathcal{N}$. $x(t)$ is a solution in Ito sense to $\mathrm{d}x(t)=F(x(t),t)\mathrm{d}t+G(x(t),t)\mathrm{d}n(t)$ if
  1. On all intervals where $n(t)$ is constant, $\dot{x}(t)=F(x(t),t)$
  2. If $n(t)$ jumps at time $t_1$, $\lim_{t\to t_1^+}x(t)=\lim_{t\to t_1^-}x(t)+G\left(\lim_{t\to t_1^+}x(t),t_1\right)$
- Ito rule: Given a fuction $\psi(\mathcal{X}(t),t)$, taking Taylor expansion we have $$\mathrm{d}\psi=\left\{\frac{\partial\psi}{\partial t}+(\nabla_\mathcal{X}\psi)F(\mathcal{X},t)\right\}\mathrm{d}t+\sum^r_{i=1}\left\{\psi\left(\mathcal{X}+G_i(\mathcal{X},t),t\right)-\psi(\mathcal{X},t)\right\}\mathrm{d}\mathcal{N}_i(t)$$

Wiener-Process

Definition (Brownian Motion): It’s a bidirectional poisson counter with infinite rate, i.e. $\mathcal{W}=\lim_{\lambda\to\infty}\frac{1}{\sqrt(\lambda)}(\mathcal{N}_1-\mathcal{N}_2)$ where $\mathcal{N}_1,\;\mathcal{N}_2$ have rate $\lambda/2$
- Characteristic: It’s a Gaussian with zero mean and variance $t$
- Note: Actual Wiener Process has stronger continuity property than Brownian motion.
Expectation: $\mathbb{E}[\mathcal{W}]=\mathbb{E}[d\mathcal{W}]=0,\;\mathbb{E}[\mathcal{W}(\tau)\mathcal{W}(t)]=\mathbb{E}[\mathcal{W}^2(\min\{t,\tau\})]=\min\{t,\tau\}$
Principle of independent increments: If the interval $[r,t), [\sigma,s)$ don’t overlap, then $\mathcal{W}(t)-\mathcal{W}(\tau)$ and $\mathcal{W}(s)-\mathcal{W}(\sigma)$ are uncorrelated.
Ito calculus for Wiener Process
- Ito rule: Given a fuction $\psi(\mathcal{X}(t),t)$, taking Taylor expansion we have $$\begin{split}\mathrm{d}\psi=\frac{\partial\psi}{\partial t}\mathrm{d}t+(\nabla_\mathcal{X}\psi)F(\mathcal{X},t)\mathrm{d}t+\sum^r_{i=1}\left(\nabla_\mathcal{X}\psi(\mathcal{X},t)\right)G_i(\mathcal{X},t)\mathrm{d}\mathcal{W}_ i \\ -\sum^r_{i=1}\frac{1}{2}G_i^\top(\mathcal{X},t)\left(\mathrm{H}_\mathcal{X}\psi(\mathcal{X},t)\right)G_i(\mathcal{X},t)\mathrm{d}t\end{split}$$
  Note $\nabla_{\mathcal{X}}=\begin{bmatrix}\frac{\partial\psi}{\partial x_1}&\cdots&\frac{\partial\psi}{\partial x_n}\end{bmatrix}$ and $\mathrm{H}_{\mathcal{X}}=\begin{bmatrix}\frac{\partial^2\psi}{\partial x_1^2}&\cdots&\frac{\partial^2\psi}{\partial x_1\partial x_n}\\ \vdots&\ddots&\vdots \\ \frac{\partial^2\psi}{\partial x_n\partial x_1}&\cdots&\frac{\partial^2\psi}{\partial x_n^2}\end{bmatrix}$ are the gradient and Hessian operator respectively. By default, the operator target is $\mathcal{X}$ is not noted.
White noise: It is Guassian distributed stationary stochastic process with $\mu_\mathcal{U}(t)=0,\;\mathrm{K}_ \mathcal{U}(t,\tau)=\Phi_ \mathcal{U}\delta(t-\tau)$, where $\Phi_\mathcal{U}$ is spectral intensity.
- We often consider white noise as derivative of Wiener process: $\mathcal{U}\sim\mathrm{d}\mathcal{W}/\mathrm{d}t$.

Linear Stochastic Sequence

$$\begin{align}\mathrm{d}\mathcal{X}(t)&=\{A(t)\mathcal{X}(t)+u(t)\}\mathrm{d}t+B(k)\mathrm{d}\mathcal{W}(t) \\ \mathcal{Y}(t)&=C(k)\mathcal{X}(k)+D(k)\mathcal{W}(k)\end{align}$$

Differential equation of expectations
- Mean: $\frac{\mathrm{d}}{\mathrm{d}t}\mu_\mathcal{X}(t)=A(t)\mu_\mathcal{X}(t)$
- Autocovariance: $\frac{\mathrm{d}}{\mathrm{d}t}S_\mathcal{X}(t)=A(t)S_\mathcal{X}(t)+S_\mathcal{X}A^\top(t)+B(t)B^\top(t)$ (called Lyapunov differential equation)
Stationary LTI case
- Useful equation: $\mathrm{R}_ \mathcal{X}(t,\tau)=\begin{cases} S_ \mathcal{X}\exp\{A^\top(\tau-t)\}&,\tau>t \\ \exp\{A(t-\tau)\}S_ \mathcal{X}&,\tau< t\end{cases}$
- (Continuous-time) algerbraic Lyapunov equation: $A\bar{S}_ \mathcal{X}+\bar{S}_\mathcal{X}A^\top+BB^\top=0$

Nonlinear Stochastic Sequence

The Fokker-Planck Equation (FPE): Consider the Wiener-process excited general SDE, we have $$\frac{\partial f_ \mathcal{X}(x;t)}{\partial t}=-\nabla\left(F(\mathcal{X})f_ \mathcal{X}(x;t)\right)+\frac{1}{2}\mathrm{tr}\left[\mathrm{H}\left(\Gamma(\mathcal{X})f_ \mathcal{X}(x;t)\right)\right]$$ here $\Gamma(\mathcal{X})=G(\mathcal{X})G^\top(\mathcal{X})$
- $\mathcal{X}$ is stationary means $\frac{\partial f_\mathcal{X}(x;t)}{\partial t}=0$
Gaussian closure: An approximate solution for FPE is supposing $f_\mathcal{X}$ as multivariate Gaussian with some $S_\mathcal{X}$ and ${\mu_\mathcal{X}}$. That is we focus on 1st-order and 2nd-order estimation. Denote the estimated distribution as $\hat{f}_\mathcal{X}(x;t)$
- Estimated expectation: $\hat{\mathbb{E}}\phi(\mathcal{X})=\int\cdots\int\phi(x)\hat{f}_\mathcal{X}(x;t)\mathrm{d}x$
- Solution:$$\begin{split}h(x,t)=-\nabla F+\tilde{x}^\top S^{-1}F-(\nabla\Gamma)S^{-1}\tilde{x}+\frac{1}{2}\mathrm{tr}\left[\mathrm{H}\Gamma \right] \\ +\frac{1}{2}\mathrm{tr}\left[\left(-S^{-1}+S^{-1}\tilde{x}\tilde{x}^\top S^{-1}\right)\Gamma\right]\end{split}$$ and $$\begin{align}\dot{\mu}_\mathcal{X}(t)&=S(t)\hat{\mathbb{E}}[\nabla^\top h(x)] \\ \dot{S} _\mathcal{X}(t)&=S(t)\hat{\mathbb{E}}[\mathrm{H} h(x)]S(t)\end{align}$$
- Useful simplification: If $G$ is constant (independent of $\mathcal{X}$), then the ODE of $\dot{\mu}$ and $\dot{S}$ become $$\begin{align}\dot{\mu}_ \mathcal{X}(t)&=\hat{\mathbb{E}}[F(\mathcal{X})] \\ \dot{S} _\mathcal{X}(t)&=\hat{A}^\top S+S\hat{A}+\Gamma\end{align}$$ where $\hat{A}=\hat{\mathbb{E}}\left[\frac{\partial F}{\partial\mathcal{X}}\right]$
  - Equivalent linearization (Quasi-linearization): In the estimation, $\mathcal{X}$ evolves equivalently to $\mathrm{d}\mathcal{X}(t)=\hat{A}(t)\mathcal{X}(t)\mathrm{d}t+G\mathrm{d}\mathcal{W}(t)$
- No guaranteed bounds generally exist for the estimation error $$e(\mathcal{X},t)=\left(\frac{\partial\hat{f}_ \mathcal{X}}{\partial t}\right)-\left(-\nabla\left(F\hat{f}_ \mathcal{X}\right)+\frac{1}{2}\mathrm{tr}\left[\mathrm{H}\left(\Gamma\hat{f}_ \mathcal{X}\right)\right]\right)$$
- Estimation of $\mu(t),\;S(t)$ could also have error. And stationarity may not be the consistent between original system and estimated system

Spectral Analysis

Content in this section comes from MECHENG 549

Power Spectral Density

Definition: The power spectral density (PSD) of stochastic process $\mathcal{Y}$ is $$\Phi_\mathcal{Y}(\omega)\equiv\mathbb{E}\left[ \lim_{T\to\infty}\frac{1}{2T}Y_T(\omega)Y_T^\top(\omega)\right]$$ where $Y_T(\omega)$ is the Fourier transform of centered process $\mathcal{Y}(t)$.
- White noise has the same PSD for whatever $\omega$
Wiener-Khinchin Theorem: $\Phi_\mathcal{Y}(\omega)=\int^\infty_{-\infty}e^{-i\omega\theta}\bar{\mathrm{K}}_\mathcal{Y}(\theta)\mathrm{d}\theta$
Signal propagation: If the system $P$ has input $\mathcal{Y}$ and output $\mathcal{Z}$, then $\Phi_\mathcal{Z}=|P(i\omega)\Phi_\mathcal{Y}(\omega)P^\top(i\omega)|$ where $P(i\omega)$ is the Fourier transform of the system.
Cross spectrum: If the system $G$ has input $\mathcal{Y}$ and output $\mathcal{Z}$, then the cross-spectrum is $\Phi_{\mathcal{ZY}}(\omega)=G(i\omega)\Phi_\mathcal{Y}(\omega)$

Periodograms

Definition: We want to estimate $\Phi_{\mathcal{Y}}(\omega)$ without the expectation, then we use $$Q_T(\omega,y)=\frac{1}{2T}\left|\int^T_{-T}e^{-i\omega t}y(t)\mathrm{d}t\right|^2$$ where $y$ is a sample realization of $\mathcal{Y}$.
- We also use window functions to calculate the spectrum over a finite time interval (using Bartlett’s procedure)

Stochastic realizations

We want to find a stochastic model which gives the same spectrum given. This is called stochastic realization problem. We focus on scalar LTI case.

If we know $P(s)=C[sI-A]^{-1}B+D$ (the Laplace transform of LTI system) and $P(s)=c\frac{\prod^m_{k=1}(s-z_k)}{\prod^n_{k=1}(s-p_k)}$ for some $m\leq n$ and real constant $c$. Then we have $$\Phi_\mathcal{Y}(\omega)=P(i\omega)P(-i\omega)=c^2\frac{\prod^m_{k=1}(\omega^2+z_k^2)}{\prod^n_{k=1}(\omega^2+p_k^2)}$$

Lemma: For any valid PSD, there exists a spectral factorization $\Phi_\mathcal{Y}(\omega)=\frac{\sum^m_{k=0}a_k(\omega^2)^k}{\sum^n_{k=0}b_k(\omega^2)^k}=P(i\omega)P(-i\omega)$ where $P(s)$ is an asymptotically stable n-th order transfer function, iff

$\Phi_\mathcal{Y}(\omega)$ is a ratio of polynomials of $\omega^2$
All coefficients $a_k$ and $b_k$ are real
The denominator has no positive real roots in $\omega^2$
Any positive real roots in the numerator have even multiplicity

Notes for math typing in Hexo
Escape \ by \\. Especially escape { by \\{ instead of \{, and escape \\ by \\\\.
Be careful about _, it’s used in markdown as italic indicator. Add space after _ is a useful solution.
Some useful Mathjax tricks at StackExchange
Several capital Greek characters should directly use its related Latin alphabet with \mathrm command.
Although I have migrated to Hugo, some tricks might still be relevant.