0%

Combined notes for part of EECS 501 & MECHENG 549.
Prerequisites: Knowledge of Elementary Calculus, Linear Algebra and Probability

# Discrete-Time Stochastic System

## Stochastic Sequences

• Definition: Given $k\in\mathbb{K}\subseteq\mathbb{Z}$ a sequence of integers, $\mathcal{X}(k,\omega): (\Omega,\mathcal{F},\mathbb{P})\rightarrow(\mathbb{R}^n,\mathcal{F}_ \mathcal{X},\mathbb{P}_ \mathcal{X})$ is a random/stochastic sequence.
• Uncertainties: Consider a casual system $F$ relates some scalar inputs $u(k)$ to output $x(k)$
• Epistemic/Model uncertainty: $\mathcal{X}(k,\omega)=F(k,u(k),u(k-1),\ldots,\omega)$. (system is stochastic and input is deterministic).
• Aleatoric/Input uncertainty: $\mathcal{X}(k,\omega)=f(k,U(k,\omega),u(k-1,\omega),\ldots)$ (system is deterministic and input is stochastic).
• Realization: An outcome $\mathcal{X}(k,\omega)=x(k)$ given $\omega$ is called a realization of stochastic sequence $\mathcal{X}$
• Terminology and Convention
• $\mathcal{X}(k,\omega)$ is often written as $\mathcal{X}(k)$ when there’s no ambiguity.
• $\mathbb{K}=\mathbb{Z}$ if not specified.
• Sequence over a set $\mathcal{K}_1\subseteq\mathbb{K}$ are denoted $\mathcal{X}(\mathcal{K}_1)$.
• $\mathcal{X}$ denotes $\mathcal{X}(\mathbb{K})$ if not specified.
• Consecutive subsequence: $$\mathcal{X}(k:l)=\{\mathcal{X}(k),\mathcal{X}(k+1),\ldots,\mathcal{X}(l)\},;x(k:l)=\{x(k),x(k+1),\ldots,x(l)\}$$
• Abbreviations:
• SS - stochastic sequence
• IID - independent indentically distributed

## Probabilistic characterization

• Distribution and density: $$F_ \mathcal{X}\left(k:l;x(k:l)\right)\equiv\mathbb{P}((\mathcal{X}i(k)\leqslant x_i(k))\cap\cdots\cap(\mathcal{X}i(l)\leqslant x_i(l)),;i=1\ldots n)$$ $$f \mathcal{X}\left(k:l;x(k:l)\right)\equiv \frac{\partial^{n(l-k+1)}}{\partial x_1(k)\cdots\partial x_n(l)}F \mathcal{X}(k:l;x(k:l))$$
Here $k:l$ actually denotes a set of consecutive integers, it can be also changed to ordinary sets $\{k,l\}$ or single scalar $k$.
• Ensemble Average: $\mathbb{E}[\psi(\mathcal{X}(k))]$, doing summation over different realization at same time $k$
• Mean: $\mu_ \mathcal{X}(k)\equiv\mathbb{E}[\mathcal{X}(k)]=\int^\infty_{-\infty}x(k)f_ \mathcal{X}(k;x(k))\mathrm{d}x(k)$
• Conditional Mean: $\mu_ \mathcal{X}(l|k)\equiv\mathbb{E}[\mathcal{X}(l)|\mathcal{X}(k)=x(k)]$
• Time Average: $\frac{1}{2N+1}\sum^N_{k=-N}\psi(\mathcal{X}(k))$, doing summation over different time k of same realization
• Autocorrelation:
• Scalar case: $r_ \mathcal{X}(k,l)\equiv\mathbb{E}[\mathcal{X}(k)\mathcal{X}(l)]=\int^\infty_{-\infty}\int^\infty_{-\infty}x(k)x(l)f_ \mathcal{X}(k,l;x(k,l))\mathrm{d}x(k)\mathrm{d}x(l)$
• Vector case: $R_ \mathcal{X}(k,l)\equiv\mathbb{E}[\mathcal{X}(k)\mathcal{X}^\top(l)]$
• Conditional autocorrelation: $R_ \mathcal{X}(k,l|q)\equiv\mathbb{E}[\mathcal{X}(k)\mathcal{X}^\top(l)|\mathcal{X}(q)=x(q)]$
• Often we denote $C_ \mathcal{X}(k)=R_ \mathcal{X}(k,k)$
• Autocovariance:
• Scalar case: $\kappa_ \mathcal{X}(k,l)\equiv\mathbb{E}[(\mathcal{X}(k)-\mu_ \mathcal{X}(k))(\mathcal{X}(l)-\mu_ \mathcal{X}(l))]$
• Vector case: $\mathrm{K}_ \mathcal{X}(k,l)\equiv\mathbb{E}[(\mathcal{X}(k)-\mu_ \mathcal{X}(k))(\mathcal{X}(l)-\mu_ \mathcal{X}(l))^\top]$
• Conditional autocovariance: $\mathrm{K}_ \mathcal{X}(k,l|q)\equiv\mathbb{E}[(\mathcal{X}(k)-\mu_ \mathcal{X}(k|q))(\mathcal{X}(l)-\mu_ \mathcal{X}(l|q))^\top]$
• Often we denote $S_ \mathcal{X}(k|q)=\mathrm{K}_ \mathcal{X}(k,k|q)$
• Useful conclusion: $\mathrm{K}(a,b)=\mathrm{K}(b,a)^T$
• Normalized (autocorrelation coefficient): $\rho_ \mathcal{X}(k,l)\equiv\mathrm{K}_ \mathcal{X}(k,l)/\sigma^2_{\mathcal{X}(k)}\sigma^2_{\mathcal{X}(l)}$
• Strong Stationarity(aka. strict sense): (necessarily identically distributed over time) $$\forall x(k:l)\in\mathbb{R}^{n(l-k+1)},;\forall s\in\mathbb{Z},;f_ \mathcal{X}(k:l;x(k:l))=f_ \mathcal{X}(k+s:l+s;x(k:l))$$
• Weak Stationarity(aka. wide sense): $\forall k,l$ if $$\mu_ \mathcal{X}(k)=\mu_ \mathcal{X}(l);\text{and};\mathrm{K}_ \mathcal{X}(k,l)=\mathrm{K}_ \mathcal{X}(k+s,l+s)\equiv\bar{\mathrm{K}}_ \mathcal{X}(s)$$ Weak stationarity is necessary condition for stationarity. (Equal when Gaussian distributed)
• Ergodicity: $\mathcal{X}$ is called ergodic in $\psi$ if
1. $\mathbb{E}[\psi(\mathcal{X})]$ is stationary
2. Ensemble average is equal to Time average, that is $$\frac{1}{2N+1}\sum^N_{k=-N}\psi(\mathcal{X}(k))\rightarrow\mathbb{E}[\psi(\mathcal{X})];\text{as};l\rightarrow \infty$$

## Markov Sequence

• Markov Sequence: A ss. $\mathcal{X}(k)$ is called a (discrete-time) Markov sequence if $$f_ \mathcal{X}\left(k;x(k)\middle|\mathcal{X}(k-1)=x(k-1),\mathcal{X}(k-2)=x(k-2),\ldots\right)=f_ \mathcal{X}(k;x(k)|\mathcal{X}(k-1)=x(k-1))$$
• We often make some assumption on the initial condition $\mathcal{X}(0)$, such as known, deteministic or uniformly distributed within certain domain.
• Markov Chains: Markov sequence with discrete set of values(states) ${x_1\ldots x_m}$
• Hidden Markov model: Sequence $\mathcal{Y}$ is called a Hidden Markov Model if it’s modeled by a system of the form \begin{align}\mathcal{X}(k+1)&=g(k,\mathcal{X}(k),\mathcal{W}(k)) \\ \mathcal{Y}(k)&=h(k,\mathcal{X}(k),\mathcal{W}(k))\end{align}
We also say that $\mathcal{Y}$ has a (discrete-time) stochastic state space.
• Guassian-Markov Sequence (GMS): $\mathcal{X}(k+1)=g(k,\mathcal{X}(k),\mathcal{W}(k))$ where $\mathcal{W}(k)$ is iid. Guassian

## Linear Stochastic Sequence

\begin{align}\mathcal{X}(k+1)&=A(k)\mathcal{X}(k)+B(k)\mathcal{W}(k) \\ \mathcal{Y}(k)&=C(k)\mathcal{X}(k)+D(k)\mathcal{W}(k)\end{align}

For linear Markov sequences, the deterministic mean sequence and centered uncertain sequence completely decouple. So we often assume that $\mathcal{X}(k)$ and $\mathcal{Y}(k)$ are centered in this case, with regard to deterministic inputs. The equation with deterministic inputs is often written as $$\mathcal{X}(k+1)=A(k)\mathcal{X}(k)+B_u(k)u(k)+B_\mathcal{W}(k)\mathcal{W}(k)$$

• Recursive-form expectations:

• Mean: $\mu_ \mathcal{X}(k+1|q)=A(k)\mu_ \mathcal{X}(k|q)+B(k)\mu_\mathcal{W}(k)$
• Covariance (Discrete-time algebraic Lyapunov/Stein difference equation): $$S_ \mathcal{X}(k+1|q)=A(k)S_ \mathcal{X}(k|q)A^\top(k)+B(k)S_\mathcal{W}(k)B^\top(k)$$

Can be solved with dlyap in MATLAB

• Convergence when $k\rightarrow\infty$:

• Mean convergence: $\mu_ \mathcal{X}(k|q)$ converges requires $\max_i|\lambda_i(A)|<1$ ($\lambda$ denotes eigenvalue)
• Covariance convergence: $S_ \mathcal{X}(k|q)$ converges requires $\max_i|\lambda_i(A)|<1$
• (Discrete-time) Lyapunov equation (Stein equation): $\bar{S}_ \mathcal{X}=A\bar{S}_ \mathcal{X}A^\top+B\bar{S}_\mathcal{W}(k)B^\top$. Solution for this equation exists iff. $A$ is asymptotically stable (characterizing sequence is stationary).
• Explicit state transition: By recursive substitution, $$\mathcal{X}(k)=\Psi(k,q)\mathcal{X}(q)+\sum^{k-1}_ {i=q}\Gamma(k,i)\mathcal{W}(i)$$ where state transition matrix $\Psi(k,q)=\begin{cases}I, &k=q \\ \prod^{k-1}_ {i=q}A(i),& k>q\end{cases}$ and $\Gamma(k,i)=\Psi(k,i+1)B(i)$.

• Conditioned Mean sequence: $\mu_ \mathcal{X}(k|q)=\Psi(k,q)\mu_ \mathcal{X}(q)+\sum^{k-1}_ {i=q}\Gamma(k,i)\mu_\mathcal{W}(i)$
• Conditioned Autocovariance Matrix: $$\mathrm{K}_ \mathcal{X}(k,l|q)=\Psi(k,q)S_ \mathcal{X}(q)\Psi^\top(l,q)+\sum^{min\{k,l\}-1}_ {i=q}\Gamma(k,i)S_\mathcal{W}(i)\Gamma^\top(l,i)$$
• A special case: $S_ \mathcal{X}(k|q)=\mathrm{K}_ \mathcal{X}(k,k|q)=\sum^{k-1}_ {i=q}\Gamma(k,i)S_\mathcal{W}(i)\Gamma^\top(k,i)$, $S_ \mathcal{X}(k|k)=0$
• Useful equation (stationary centered case): $$\mathrm{K}_ \mathcal{X}(k,l)=\begin{cases} S_ \mathcal{X}(k)\cdot(A^\top)^{(l-k)}&,l>k\\ S_ \mathcal{X}(k)&,l=k\\ A^{(k-l)}S_ \mathcal{X}(k)&,l< k\end{cases}$$
• Conditional Autocorrelation Matrix: $R_ \mathcal{X}(k,l|q)=\mathrm{K}_ \mathcal{X}(k,l|q)+\mu_ \mathcal{X}(k|q)\mu_ \mathcal{X}^\top(l|q)$

• Observation $\mathcal{Y}$ property:

• Mean: $\mu_ \mathcal{Y}(k|q)=C(k)\mu_ \mathcal{X}(k|q)+D(k)\mu_\mathcal{W}(k)$
• Covariance: $$\mathrm{K}_ \mathcal{Y}(k,l|q)=\begin{cases}C(k)\mathrm{K}_ \mathcal{X}(k,l|q)C^\top(l)+C(k)\Gamma(k,l)S_W(l)D^\top(l)&:k>l \\C(k)S_ \mathcal{X}C^\top(k)+D(k)S_\mathcal{W}(k)D^\top(k)&:k=l\\ C(k)\mathrm{K}_ \mathcal{X}(k,l|q)C^\top(l)+D(k)S_\mathcal{W}(k)\Gamma^\top(l,k)C^\top(l)&:k< l\end{cases}$$
• Stationary time-invariant covariance: $$\mathrm{K}_ \mathcal{Y}(s)=\begin{cases}CA^{|s|}\bar{S}_ \mathcal{X}C^\top+CA^{|s|-1}B\bar{S}_ WD^\top&:s\neq 0 \\C\bar{S}_ \mathcal{X}C^\top+D\bar{S}_ WD^\top&:s=0\end{cases}$$

## Gaussian Stochastic Sequence

• Jointly Gaussian $\Rightarrow, \nLeftarrow$ Marginally Gaussian
• $c^\top X$ is Gaussian $\Leftrightarrow X$ is Gaussian
• Conditional Gaussian: if $X$ and $Y$ are Gaussian, then $X|Y \sim \mathcal{N}(\mu_{X|Y},S_{X|Y})$ where $\mu_{X|Y}=\mu_X+S_{XY}S_Y^{-1}(Y-\mu_Y)$, $S_{X|Y}=S_X-S_{XY}S_Y^{-1}S_{YX}$
• A linear controllable GMS $\mathcal{X}(k)$ is stationary iff. $A(k)=A, B(k)=B$ (time-invariant) and $A$ is asymptotically stable.
• All LTI stationary GMS are also ergodic in all finite momoents
• Solve $\mathcal{X}(k+1)=A\mathcal{X}(k)+B\mathcal{W}(k)$ when $\mathcal{X}$ is stationary ($\max_i|\lambda_i(A)|<1$)
1. solve $\bar{\mu}_ \mathcal{X}=A\bar{\mu}_ \mathcal{X}+B\bar{\mu}_\mathcal{W}$ for $\bar{\mu}$
2. solve $\bar{S}_ \mathcal{X}=A\bar{S}_ \mathcal{X}A^\top+B\bar{S}_ \mathcal{W}B^\top$ for $\bar{S}_ \mathcal{X}$
3. calculate $\Sigma_ \mathcal{X}(k:l|q)=\begin{bmatrix}\mathrm{K}_ \mathcal{X}(k,k|q)&\cdots&\mathrm{K}_ \mathcal{X}(k,l|q)\\ \vdots&\ddots&\vdots\\ \mathrm{K}_ \mathcal{X}(l,k|q)&\cdots&\mathrm{K}_ \mathcal{X}(l,l|q)\end{bmatrix}$ using $\bar{S}_ \mathcal{X}$
4. Then $f_ \mathcal{X}(k:l;x(k:l))$ is determined with $\mu_ \mathcal{X}(k:l)=\{\bar{\mu}_ \mathcal{X},\bar{\mu}_ \mathcal{X}\ldots\bar{\mu}_ \mathcal{X}\}$ and $\Sigma_ \mathcal{X}(k:l)$ above.

## Observation & Filtering

• (LTI) Luenberger observer: \begin{align}\hat{\mathcal{X}}(k+1)&=A\hat{\mathcal{X}}(k)+L(\hat{\mathcal{Y}}(k)-\mathcal{Y}(k))+B\bar{\mu}_ \mathcal{W} \\ \hat{\mathcal{Y}}(k)&=C\hat{\mathcal{X}}(k)+D\bar{\mu}_\mathcal{W}\end{align} where $L$ is the observer gain.
• Note: We often assume that the process & measurement noise are decoupled and independent
• Combined form: $\hat{\mathcal{X}}(k+1)=[A+LC]\hat{\mathcal{X}}(k)-L\mathcal{Y}(k)+B\mu_\mathcal{W}(k)$
• State estimation residual $r(k)=\mathcal{X}(k)-\hat{\mathcal{X}}(k)$
• Combined form: $r(k+1)=[A+LC]r(k)+[B+LD]\tilde{\mathcal{W}}(k)$
• Stationary covariance can be solved by a Lyapunov equation $$\bar{S}_ r=[A+LC]\bar{S}_ r[A+LC]^T+[LD+B]\bar{S}_ \mathcal{W}[LD+B]^T$$
• A common objective: minimize $\bar{S}_r=\mathbb{E}(rr^\top)$
• Solutions: $L=-A\bar{S}_ rC^\top[C\bar{S}_ rC^\top+D\bar{S}_ \mathcal{W}D^\top]^{-1}$ (Kalman observer gain)
• Or solve (Discrete-time) Algebraic Riccati equation $$\bar{S}_ r=A\bar{S}_ rA^\top+B\bar{S}_ \mathcal{W}B^\top-A\bar{S}_ rC^\top[C\bar{S}_ rC^\top+D\bar{S}_ \mathcal{W}D^\top]^{-1}C\bar{S}_ rA^\top$$
• Innovation sequence $e(k)=\hat{\mathcal{Y}}(k)-\mathcal{Y}(k)$ with $L$ is optimal
• We can find that $\mu_e=0$ and $\mathrm{K}_e(k+s,k)=\begin{cases}C\bar{S}_rC^\top+D\bar{S}_WD^T&:s=0 \\0&:s\neq0\end{cases}$. So the innovation sequence is iid. (only in Kalman observer)
• (Output) Probabilistically-equivalent model: \begin{align}\mathcal{X}(k+1)&=A\mathcal{X}(k)+Le(k) \\ \mathcal{Y}(k)&=C\mathcal{X}(k)-e(k)\end{align}

# Markov Chains

Content in this section comes from EECS 501
In this specific field, we often use stand-alone analysis methods.

## Basic definitions

• State distribution: We denote row vector $\pi_t$ as $\pi_t(x)=\mathbb{P}(\mathcal{X}_t=x),; x\in S$ ($S$ is the set of states). Directly we have $\sum_x\pi(x)=1$
• Time homogeneous: $\mathbb{P}(\mathcal{X}_ {t+1}=y|\mathcal{X}_ t=x)=\mathbb{P}(\mathcal{X}_{s+1}=y|\mathcal{X}_s=x);\forall s,t$
• One-step transition probability matrix: $P_t=[P_{xy,t}]$ where $P_{xy,t}=\mathbb{P}(\mathcal{X}_ {t+1}=y|\mathcal{X}_ t=x)$.
• Time-homo case: $P=[P_{xy}]$ where $P_{xy}=p(y|x)$
• Rows of the matrix sum up to 1.
• This matrix is also called stochastic matrix.
• Extend this matrix to continuous states, then we use transition kernel $T(x,y)$ to describe the transition probability,
• m-step transition probability matrix: $P_{xy,t}^{(m)}=\mathbb{P}(\mathcal{X}_ {t+m}=y|\mathcal{X}_ t=x)$
• Chapman-Kolmogorov Equation: $P^{(n+m)}_t=P^{(n)}_t P^{(m)}_t$

## State Variables

• Hitting time: $T_1(y)=\min\{n\geq 0:\mathcal{X}n=y\},;T_k(y)=\min\{n>T{k-1}(y):\mathcal{X}_n=y\}$ where $n\in\mathbb{N}$
• Period: For state $i\in x$, its period is the greatest common divisor of $\{n>1|T_n(i)>0\}$. A Markov Chain is aperiodic If all states have period 1.
• Return probability: $f_{xy}=\mathbb{P}(T_1(y)<\infty|\mathcal{X}_0=x)$
• Occupation time: $V(y)=\sum^\infty_{n=1}\unicode{x1D7D9}_{\mathcal{X_n}}(y)$
• Some properties
• $f_{xy}=\mathbb{P}(V(y)\geqslant 1|\mathcal{X}_0=x)$
• $\mathbb{P}(V(y)=m|\mathcal{X}_ 0=x)=\begin{cases} 1-f_{xy}&:m=0\\f_{xy}f_{yy}^{m-1}(1-f_{yy})&:m\geqslant 1\end{cases}$
• $\mathbb{E}[ V(y)|\mathcal{X}_ 0=x]=\begin{cases} 0&:f_{xy}=0\\\infty&:f_{xy}>0,\ f_{yy}=1\\f_{xy}/(1-f_{yy})&:f_{xy}>0,\ f_{yy}<1\end{cases}$
• $\mathbb{E}[ V(y)|\mathcal{X}_ 0=x]=\sum^\infty_{n=1}P^{(n)}_{xy}$

## State Classification

Here we usually consider only time-homogeneous Markov Chains

• Accessible ($x\rightarrow y$): $\exists n;\text{s.t.}\ P_{xy}^{(n)}>0$
• $x\rightarrow y \Leftrightarrow f_{xy}>0$
• Communicate ($x\leftrightarrow y$): $x\rightarrow y;\text{and};y\rightarrow x$. This is a equivalence relation.
• Equivalent class: set of states that communicate with each other
• Irreducible: a Markov chain with only one communicating class
• Absorbing/Closed state: $P_{xx}=1$
• Absorbing class: set $C$ is absorbing iff $\forall x \in C, \sum_{y\in C}P_{xy}=1$
• Transient state: $f_{xx}<1 \Leftrightarrow \mathbb{E}[V_i|\mathcal{X}_0=i]<\infty$
• Recurrent state: $f_{xx}=1 \Leftrightarrow \mathbb{E}[V_i|\mathcal{X}_0=i]=\infty$
• Positive recurrent: $\mathbb{E}[T_1(x)|\mathcal{X}_0=x]<0$
• Null recurrent: $\mathbb{E}[T_1(x)|\mathcal{X}_0=x]=0$
• These two kind of recurrent states also make up communicating classes
• Some properties
• If $x$ is (positive) recurrent and $x\rightarrow y$, then $y$ is also (positive) recurrent and $f_{xy}=f_{yx}=1$
• Every closed and finite subset of $X$ contains at least one (positive) recurrent state.
• All states of a communicating class are either positive recurrent, null recurrent or transient.
• Method to determine whether class $C$ is recurrent/transient
1. If $C$ is non-closed, it’s transient
2. If $C$ is closed and finite, then $C$ is positive recurrent
3. If $C$ is closed and infinite, then $C$ can be either positive/null recurrent or transient

A typical example is the birth-death chain

• Stationary distribution: $\bar{\pi};\text{s.t.};\bar{\pi}=\bar{\pi} P$
• Stationary in limit form: $\bar{\pi}=\lim_{n\rightarrow \infty} \frac{1}{n}\sum^{n-1}_{t=0} \pi_t$
• This is a Cesaro limit, we use this to deal with the problem that $\lim_{n\rightarrow \infty} \pi_t$ might not exist if the chain is periodic.
• Reversibility criterion for stationary: $\forall i,j \in S, \pi_i P_ij = \pi_j P_ji$ (a.k.a detailed balance condition), then the process is reversible and therefore stationary.
• Existence for stationary distribution satisfying $\bar{\pi}=\bar{\pi} P$
1. If the chain has single positive recurrent class, then exists unique solution: $\bar{\pi}(x)=0$ for all transient or null recurrent $x$
2. If the chain has multiple positive recurrent class, then there are multiple solutions, for each positive recurrent $x$ we have a $\bar{\pi}^i$.
3. If the chain has only transient and null recurrent states, there is no solution$• Convergence of stationary distribution • If the chain has positive recurrent class, then$\frac{1}{n}\sum^n_{t=1}\mathbb{P}(\mathcal{X}_t=j)\xrightarrow[n\rightarrow \infty]{}\bar{\pi}_j,;\forall \mu_0$• (Ergodic) If the chain is positive recurrent, then$\frac{1}{n}\sum^n_{t=1}\unicode{x1D7D9}_{\mathcal{X}_t}(j)\xrightarrow[n\rightarrow \infty]{\text{a.s.}}\bar{\pi}_j$• If the chain is positive recurrent and aperiodic, then$\mathbb{P}(\mathcal{X}_n=j)\xrightarrow[n\rightarrow \infty]{}\bar{\pi}_j$# Continuous-Time Stochastic System ## Stochastic Sequences • Definition: Given$t\in\mathbb{T}\subset\mathbb{R}$a sequence of time,$\mathcal{X}(t,\omega): (\Omega,\mathcal{F},\mathbb{P})\rightarrow(\mathbb{R}^n,\mathcal{F}_ \mathcal{X},\mathbb{P}_ \mathcal{X})$is a continuous stochastic sequence. • Most definitions are similar to discrete sequence (including stationarity), while the sequence is often defined as$\mathcal{X}(\mathcal{G}),;\mathcal{G}={t_1,t_2,\ldots,t_N}\subset\mathbb{T}, t_i< t_{i+1}$• Stochastic Differential Equation (SDE): $$\mathrm{d}\mathcal{X}(t)=F(\mathcal{X}(t),t)\mathrm{d}t+\sum^r_{i=1}G_i(\mathcal{X}(t),t)\mathrm{d}\mathcal{W}_i(t)$$ Here$\mathcal{W}$is often a certain kind of noise random process. ## Markov Sequence • Markov Sequence: A ss.$\mathcal{X}(k)$is called a (continuous-time) Markov sequence if for the set of times$\mathcal{G}={t_1,t_2,\ldots,t_N}$with$t_i < t_{i+1}, we have $$f_ \mathcal{X}\left(t_N;x(t_N)\middle|\mathcal{X}(t_{N-1})=x(t_{N-1}),\mathcal{X}(t_{N-2})=x(t_{N-2}),\ldots\right)=f_ \mathcal{X}(t_N;x(t_N)|\mathcal{X}(t_{N-1})=x(t_{N-1}))$$ • Hidden Markov Model: Definition similar to discrete case. A continuous-time stochastic state space is of the form \begin{align}\dot{\mathcal{X}}(t)&=F(\mathcal{X},t)+G(\mathcal{X},t)\mathrm{d}\mathcal{W}(t)/\mathrm{d}t \\ \mathcal{Y}(t)&=H(\mathcal{X},t)+J(\mathcal{X},t)\mathrm{d}\mathcal{W}(t)/\mathrm{d}t\end{align} where\mathcal{W}$is usually Wiener process and white noise$\mathrm{d}\mathcal{W}(t)/\mathrm{d}t$is often written as$\mathcal{U}(t)$. • The state space is affine if$F(\mathcal{X}(t),t)=A(t)\mathcal{X}(t)+u(t),;N(\mathcal{X},t)=C(t)\mathcal{X}(t)+v(t)$• The state space is bilinear if$G(\mathcal{X},t)\mathrm{d}\mathcal{W}(t)=\sum^r_{i=1}B_i(t)\mathcal{X}\mathrm{d}\mathcal{W}_i(t)$## Poisson Counters • Definition: It’s a stochastic Markow process$\mathcal{N}(t)\in\{0,1,2,\ldots\}$with the characteristic that it jumps up by only one integer at a time. • Characteristics: transition times are random, transition step is always 1. • Transition rule:$\frac{\partial}{\partial t}p_\mathcal{N}=\begin{cases}-\lambda p_\mathcal{N}(t;n)&:n=0\\-\lambda p_\mathcal{N}(t;n)+\lambda p_\mathcal{N}(t;n-1)&:n>0\end{cases}$• From the rule we can conclude that$p_\mathcal{N}(t;n)=\frac{1}{n!}(\lambda t)^n e^{-\lambda t}$• Bidirectional counter:$\mathcal{N}(t)$is defined as one-directional.$\mathcal{N}_1(t)-\mathcal{N}_2(t)$is called bidirectional poisson counter. • Expectation:$\mathbb{E}[\mathcal{N}(t)]=\lambda t,;\mathbb{E}[\mathrm{d}\mathcal{N}(t)]=\lambda\mathrm{d}t$• Ito calculus for Poisson counter: • Ito sense:$n(t)$is a realization of poisson counter$\mathcal{N}$.$x(t)$is a solution in Ito sense to$\mathrm{d}x(t)=F(x(t),t)\mathrm{d}t+G(x(t),t)\mathrm{d}n(t)$if 1. On all intervals where$n(t)$is constant,$\dot{x}(t)=F(x(t),t)$2. If$n(t)$jumps at time$t_1$,$\lim_{t\rightarrow t_1^+}x(t)=\lim_{t\rightarrow t_1^-}x(t)+G\left(\lim_{t\rightarrow t_1^+}x(t),t_1\right)$• Ito rule: Given a fuction$\psi(\mathcal{X}(t),t)$, taking Taylor expansion we have $$\mathrm{d}\psi=\left\{\frac{\partial\psi}{\partial t}+(\nabla_\mathcal{X}\psi)F(\mathcal{X},t)\right\}\mathrm{d}t+\sum^r_{i=1}\left\{\psi\left(\mathcal{X}+G_i(\mathcal{X},t),t\right)-\psi(\mathcal{X},t)\right\}\mathrm{d}\mathcal{N}_i(t)$$ ## Wiener-Process • Definition (Brownian Motion): It’s a bidirectional poisson counter with infinite rate, i.e.$\mathcal{W}=\lim_{\lambda\rightarrow\infty}\frac{1}{\sqrt(\lambda)}(\mathcal{N}_1-\mathcal{N}_2)$where$\mathcal{N}_1,;\mathcal{N}_2$have rate$\lambda/2$• Characteristic: It’s a Gaussian with zero mean and variance$t$• Note: Actual Wiener Process has stronger continuity property than Brownian motion. • Expectation:$\mathbb{E}[\mathcal{W}]=\mathbb{E}[d\mathcal{W}]=0,;\mathbb{E}[\mathcal{W}(\tau)\mathcal{W}(t)]=\mathbb{E}[\mathcal{W}^2(\min\{t,\tau\})]=\min\{t,\tau\}$• Principle of independent increments: If the interval$[r,t), [\sigma,s)$don’t overlap, then$\mathcal{W}(t)-\mathcal{W}(\tau)$and$\mathcal{W}(s)-\mathcal{W}(\sigma)$are uncorrelated. • Ito calculus for Wiener Process • Ito rule: Given a fuction$\psi(\mathcal{X}(t),t)$, taking Taylor expansion we have $$\mathrm{d}\psi=\frac{\partial\psi}{\partial t}\mathrm{d}t+(\nabla_\mathcal{X}\psi)F(\mathcal{X},t)\mathrm{d}t+\sum^r_{i=1}\left(\nabla_\mathcal{X}\psi(\mathcal{X},t)\right)G_i(\mathcal{X},t)\mathrm{d}\mathcal{W}_ i-\sum^r_{i=1}\frac{1}{2}G_i^\top(\mathcal{X},t)\left(\mathrm{H}_\mathcal{X}\psi(\mathcal{X},t)\right)G_i(\mathcal{X},t)\mathrm{d}t$$ Note that gradient$\nabla_{\mathcal{X}}=\begin{bmatrix}\frac{\partial\psi}{\partial x_1}&\cdots&\frac{\partial\psi}{\partial x_n}\end{bmatrix}$, Hessian$\mathrm{H}_{\mathcal{X}}=\begin{bmatrix}\frac{\partial^2\psi}{\partial x_1^2}&\cdots&\frac{\partial^2\psi}{\partial x_1\partial x_n}\\ \vdots&\ddots&\vdots \\ \frac{\partial^2\psi}{\partial x_n\partial x_1}&\cdots&\frac{\partial^2\psi}{\partial x_n^2}\end{bmatrix}$. By default, the operator target is$\mathcal{X}$is not noted. • White noise: It is Guassian distributed stationary stochastic process with$\mu_\mathcal{U}(t)=0,;\mathrm{K}_ \mathcal{U}(t,\tau)=\Phi_ \mathcal{U}\delta(t-\tau)$, where$\Phi_\mathcal{U}$is spectral intensity. • We often consider white noise as derivative of Wiener process:$\mathcal{U}\sim\mathrm{d}\mathcal{W}/\mathrm{d}t. ## Linear Stochastic Sequence \begin{align}\mathrm{d}\mathcal{X}(t)&=\{A(t)\mathcal{X}(t)+u(t)\}\mathrm{d}t+B(k)\mathrm{d}\mathcal{W}(t) \\ \mathcal{Y}(t)&=C(k)\mathcal{X}(k)+D(k)\mathcal{W}(k)\end{align} • Differential equation of expectations • Mean:\frac{\mathrm{d}}{\mathrm{d}t}\mu_\mathcal{X}(t)=A(t)\mu_\mathcal{X}(t)$• Autocovariance:$\frac{\mathrm{d}}{\mathrm{d}t}S_\mathcal{X}(t)=A(t)S_\mathcal{X}(t)+S_\mathcal{X}A^\top(t)+B(t)B^\top(t)$(called Lyapunov differential equation) • Stationary LTI case • Useful equation:$\mathrm{R}_ \mathcal{X}(t,\tau)=\begin{cases} S_ \mathcal{X}\exp\{A^\top(\tau-t)\}&,\tau>t \\ \exp\{A(t-\tau)\}S_ \mathcal{X}&,\tau< t\end{cases}$• (Continuous-time) algerbraic Lyapunov equation:$A\bar{S}_ \mathcal{X}+\bar{S}_\mathcal{X}A^\top+BB^\top=0$## Nonlinear Stochastic Sequence • The Fokker-Planck Equation (FPE): Consider the Wiener-process excited general SDE, we have $$\frac{\partial f_ \mathcal{X}(x;t)}{\partial t}=-\nabla\left(F(\mathcal{X})f_ \mathcal{X}(x;t)\right)+\frac{1}{2}\text{tr}\left[\mathrm{H}\left(G(\mathcal{X})G^\top(\mathcal{X})f_ \mathcal{X}(x;t)\right)\right]$$ •$\mathcal{X}$is stationary means$\frac{\partial f_\mathcal{X}(x;t)}{\partial t}=0$• Gaussian closure: An approximate solution for FPE is supposing$f_\mathcal{X}$as multivariate Gaussian with some$S_\mathcal{X}$and${\mu_\mathcal{X}}$. That is we focus on 1st-order and 2nd-order estimation. Denote the estimated distribution as$\hat{f}_\mathcal{X}(x;t)$• Estimated expectation:$\hat{\mathbb{E}}\phi(\mathcal{X})=\int\cdots\int\phi(x)\hat{f}_\mathcal{X}(x;t)\mathrm{d}x$• Solution:$h(x,t)=-\nabla F+\tilde{x}^\top S^{-1}F-\nabla(GG^\top)S^{-1}\tilde{x}+\frac{1}{2}\text{tr}\left[\mathrm{H}(GG^\top) \right]+\frac{1}{2}\text{tr}\left[(-S^{-1}+S^{-1}\tilde{x}\tilde{x}^\top S^{-1})GG^\top\right], \begin{align}\dot{\mu}_\mathcal{X}(t)&=S(t)\hat{\mathbb{E}}[\nabla^\top h(x)] \\ \dot{S} _\mathcal{X}(t)&=S(t)\hat{\mathbb{E}}[\mathrm{H} h(x)]S(t)\end{align} • Useful simplification: IfG$is constant (independent of$\mathcal{X}$), then the ODE of$\dot{\mu}$and$\dot{S}become \begin{align}\dot{\mu}_ \mathcal{X}(t)&=\hat{\mathbb{E}}[F(\mathcal{X})] \\ \dot{S} _\mathcal{X}(t)&=\hat{A}^\top S+S\hat{A}+GG^\top\end{align} where\hat{A}=\hat{\mathbb{E}}\left[\frac{\partial F}{\partial\mathcal{X}}\right]$• Equivalent linearization (Quasi-linearization): In the estimation,$\mathcal{X}$evolves equivalently to$\mathrm{d}\mathcal{X}(t)=\hat{A}(t)\mathcal{X}(t)\mathrm{d}t+G\mathrm{d}\mathcal{W}(t)$• No guaranteed bounds generally exist for the estimation error $$e(\mathcal{X},t)=\left(\frac{\partial\hat{f}_ \mathcal{X}}{\partial t}\right)-\left(-\nabla\left(F\hat{f}_ \mathcal{X}\right)+\frac{1}{2}\text{tr}\left[\mathrm{H}\left(GG^\top\hat{f}_ \mathcal{X}\right)\right]\right)$$ • Estimation of$\mu(t),;S(t)$could also have error. And stationarity may not be the consistent between original system and estimated system # Spectral Analysis Content in this section comes from MECHENG 549 ## Power Spectral Density • Definition: The power spectral density (PSD) of stochastic process$\mathcal{Y}$is $$\Phi_\mathcal{Y}(\omega)\equiv\mathbb{E}\left[ \lim_{T\rightarrow\infty}\frac{1}{2T}Y_T(\omega)Y_T^\top(\omega)\right]$$ where$Y_T(\omega)$is the Fourier transform of centered process$\mathcal{Y}(t)$. • White noise has the same PSD for whatever$\omega$• Wiener-Khinchin Theorem:$\Phi_\mathcal{Y}(\omega)=\int^\infty_{-\infty}e^{-i\omega\theta}\bar{\mathrm{K}}_\mathcal{Y}(\theta)\mathrm{d}\theta$• Signal propagation: If the system$P$has input$\mathcal{Y}$and output$\mathcal{Z}$, then$\Phi_\mathcal{Z}=|P(i\omega)\Phi_\mathcal{Y}(\omega)P^\top(i\omega)|$where$P(i\omega)$is the Fourier transform of the system. • Cross spectrum: If the system$G$has input$\mathcal{Y}$and output$\mathcal{Z}$, then the cross-spectrum is$\Phi_{\mathcal{ZY}}(\omega)=G(i\omega)\Phi_\mathcal{Y}(\omega)$## Periodograms • Definition: We want to estimate$\Phi_{\mathcal{Y}}(\omega)$without the expectation, then we use $$Q_T(\omega,y)=\frac{1}{2T}\left|\int^T_{-T}e^{-i\omega t}y(t)\mathrm{d}t\right|^2$$ where$y$is a sample realization of$\mathcal{Y}$. • We also use window functions to calculate the spectrum over a finite time interval (using Bartlett’s procedure) ## Stochastic realizations We want to find a stochastic model which gives the same spectrum given. This is called stochastic realization problem. We focus on scalar LTI case. If we know$P(s)=C[sI-A]^{-1}B+D$(the Laplace transform of LTI system) and$P(s)=c\frac{\prod^m_{k=1}(s-z_k)}{\prod^n_{k=1}(s-p_k)}$for some$m\leq n$and real constant$c$. Then we have $$\Phi_\mathcal{Y}(\omega)=P(i\omega)P(-i\omega)=c^2\frac{\prod^m_{k=1}(\omega^2+z_k^2)}{\prod^n_{k=1}(\omega^2+p_k^2)}$$. Lemma: For any valid PSD, there exists a spectral factorization$\Phi_\mathcal{Y}(\omega)=\frac{\sum^m_{k=0}a_k(\omega^2)^k}{\sum^n_{k=0}b_k(\omega^2)^k}=P(i\omega)P(-i\omega)$where$P(s)$is an asymptotically stable n-th order transfer function, iff 1.$\Phi_\mathcal{Y}(\omega)$is a ratio of polynomials of$\omega^2$2. All coefficients$a_k$and$b_k$are real 3. The denominator has no positive real roots in$\omega^2\$
4. Any positive real roots in the numerator have even multiplicity

# Notes for math typing in Hexo

1. Escape \ by \\. Especially escape { by \\{ instead of \{, and escape \\ by \\\\.
2. Be careful about _, it’s used in markdown as italic indicator. Add space after _ is a useful solution.
3. Some useful Mathjax tricks at StackExchange
4. Several capital Greek characters should directly use its related Latin alphabet with \mathrm command.
Treat me some coffee XD