Interactive notebooks accompanying the *Probabilistic Machine Learning* book by Kevin Murphy.
| Section | Notebook | Application | Key Formula |
|---|---|---|---|
| 3.2 - Multivariate Gaussian | Marginals and Conditionals (2D) | Predicting EGFR protein levels from mRNA gene expression | $p(x_1 \mid x_2)$ |
| | Marginals and Conditionals (5D) | Comprehensive real estate valuation with 5 correlated features | $p(\mathbf{x}_1 \mid \mathbf{x}_2)$ |
| | Missing Value Imputation | Patient health records with missing lab results | $p(\mathbf{x}_h \mid \mathbf{x}_v)$ |
| 3.3 - Linear Gaussian System | Bayes Rule for Gaussians | Blood pressure estimation with multiple clinical devices | $p(\mathbf{z} \mid \mathbf{y}) \propto p(\mathbf{y} \mid \mathbf{z})\,p(\mathbf{z})$ |
| | Bayes Rule with Non-Trivial W | Inferring physiological state from derived health metrics | $\mathbf{y} = W\mathbf{z} + \boldsymbol{\epsilon}$ |
| | Inferring an Unknown Scalar | Bayesian HER2 gene expression estimation from qPCR | $p(\mu \mid \mathbf{y})$ |
| | Inferring an Unknown Vector | Cytokine concentration estimation from noisy ELISA replicates | $p(\mathbf{z} \mid \mathbf{y})$ |
| | Sensor Fusion | Cell state estimation combining RNA-seq, flow cytometry, and ATAC-seq | $p(\mathbf{z} \mid \mathbf{y}_1, \mathbf{y}_2, \mathbf{y}_3)$ |
| 3.5 - Mixture Models | Gaussian Mixture Models | Cell type discovery in scRNA-seq brain tissue using GMM | $p(\mathbf{x}) = \sum_k \pi_k \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_k, \Sigma_k)$ |
| 6.1 - Entropy | Entropy | Cell state uncertainty in whole cell modeling: discrete, binary, joint, conditional entropy and perplexity | $H(X) = -\sum_k p_k \log p_k$ |
| 6.2 - KL Divergence | KL Divergence | Comparing gene expression distributions across healthy and diseased tissue | $D_{\text{KL}}(p \,\|\, q) = \sum_k p_k \log \frac{p_k}{q_k}$ |
| 6.3 - Mutual Information | Mutual Information | Identifying informative biomarkers for cell state classification | $I(X;Y) = H(X) - H(X \mid Y)$ |
| 7.0 - Foundations | Scalar, Vector, and Matrix Products | Basic multiplication shapes and the dimension compatibility rule | $c = \mathbf{a}^T\mathbf{b},\ \mathbf{c} = A\mathbf{b},\ C = AB$ |
| | Inner Product (Dot Product) | Measuring alignment and similarity between vectors | $\mathbf{a}^T\mathbf{b} = \sum_i a_i b_i$ |
| | Outer Product | Creating matrices from vectors and building covariance | $\mathbf{a}\mathbf{b}^T$ |
| | Matrix-Vector Multiplication | Geometric transformations: rotation, scaling, shearing | $\mathbf{y} = A\mathbf{x}$ |
| | Matrix-Matrix Multiplication | Composing transformations and the associativity property | $(AB)C = A(BC)$ |
| | Quadratic Forms | Weighted distances and the Mahalanobis distance | $\mathbf{x}^T W \mathbf{x}$ |
| | The $A B A^\top$ Pattern | Transforming covariance through linear maps | $\Sigma_y = A\Sigma_x A^T$ |
| | Schur Complement | Block matrix operations and Gaussian conditioning | $M/D = A - BD^{-1}C$ |
| 7.3 - Matrix Inversion | Factor Model Covariance | Building gene expression covariance from transcription factor pathways | $\Sigma = WW^T + \Psi$ |
| | Low-Rank Covariance Update | Why adding $XX^T$ to a covariance matrix models new pathway exposures | $\Sigma' = \Sigma + XX^T$ |
| | Sherman-Morrison Formula | Rank-1 precision updates when discovering a single transcription factor | $(A + \mathbf{u}\mathbf{v}^T)^{-1}$ |
| | Matrix Inversion Lemma (Woodbury) | Efficient precision matrix updates for gene regulatory network inference | $(A + UCV)^{-1}$ |
| 7.4 - Eigenvalue Decomposition | Geometry of Quadratic Forms | Ellipsoidal level sets applied to protein binding affinity in drug discovery | $\mathbf{x}^T A\mathbf{x} = c$ |
| 7.5 - Singular Value Decomposition | SVD | Gene expression profiling: discovering latent biological programs with SVD | $A = USV^T$ |
| 7.6 - Matrix Decompositions | Cholesky Sampling from MVN | Clinical trial simulation with correlated patient biomarkers | $\Sigma = LL^T$ |
| 8.1 - The EM Algorithm | Expectation-Maximization (EM) | Medical diagnosis with latent disease types using Gaussian mixtures | $\mathcal{Q}(\boldsymbol{\theta}, \boldsymbol{\theta}^{\text{old}})$ |
| 8.2 - First-Order Methods | Gradient Descent | Drug dose-response curve fitting with gradient descent, line search, and momentum | $\boldsymbol{\theta}_{t+1} = \boldsymbol{\theta}_t - \eta_t \nabla \mathcal{L}(\boldsymbol{\theta}_t)$ |
| 8.3 - Second-Order Methods | Newton, BFGS, and Trust Regions | Enzyme kinetics parameter estimation with Hessian-based optimizers | $\boldsymbol{\theta}_{t+1} = \boldsymbol{\theta}_t - \eta_t \mathbf{H}_t^{-1} \nabla \mathcal{L}(\boldsymbol{\theta}_t)$ |
| 8.4 - Stochastic Gradient Descent | SGD, Scheduling, and Adam | Predicting drug sensitivity from gene expression with adaptive optimizers | $\boldsymbol{\theta}_{t+1} = \boldsymbol{\theta}_t - \eta_t \mathbf{M}_t^{-1} \nabla \mathcal{L}(\boldsymbol{\theta}_t, z_t)$ |
| 9.2 - Gaussian Discriminant Analysis | Gaussian Discriminant Analysis | NSCLC cancer subtype classification from blood protein biomarkers | $p(y=c \mid \mathbf{x}) \propto \pi_c \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_c, \boldsymbol{\Sigma}_c)$ |
| 9.3 - Naive Bayes Classifiers | Naive Bayes Classifiers | Antimicrobial compound screening from molecular fingerprints | $p(\mathbf{x} \mid y=c) = \prod_d p(x_d \mid y=c, \theta_{dc})$ |
| 10.2 - Binary Logistic Regression | Binary Logistic Regression | Predicting tumor drug response from gene expression biomarkers | $p(y \mid \mathbf{x}) = \sigma(\mathbf{w}^\top \mathbf{x} + b)$ |
| 10.3 - Multinomial Logistic Regression | Multinomial Logistic Regression | Classifying NSCLC tumor subtypes from gene expression biomarkers | $p(y=c \mid \mathbf{x}) = \text{softmax}(W\mathbf{x})_c$ |
| 11.2 - Least Squares | Least Squares Linear Regression | Predicting protein abundance from mRNA expression in whole cell modeling | $\hat{\mathbf{w}} = (\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{X}^\top\mathbf{y}$ |
| 11.3 - Ridge Regression | Ridge Regression | Predicting drug sensitivity from high-dimensional gene expression profiles | $\hat{\mathbf{w}} = (\mathbf{X}^\top\mathbf{X} + \lambda\mathbf{I})^{-1}\mathbf{X}^\top\mathbf{y}$ |
| 11.4 - Lasso Regression | Lasso Regression | Identifying antibiotic resistance biomarkers from bacterial gene expression | $\hat{\mathbf{w}} = \arg\min_{\mathbf{w}} \|\mathbf{X}\mathbf{w}-\mathbf{y}\|_2^2 + \lambda\|\mathbf{w}\|_1$ |
| 11.7 - Bayesian Linear Regression | Bayesian Linear Regression | Predicting protein abundance from transcriptomics with full posterior uncertainty | $p(\mathbf{w} \mid \mathcal{D}) = \mathcal{N}(\hat{\mathbf{w}}, \hat{\boldsymbol{\Sigma}})$ |
| 13.1 - Backpropagation | Backpropagation for an MLP | Classifying bacterial antibiotic resistance from genomic features | $\boldsymbol{\delta}_2 = (U^\top \boldsymbol{\delta}_1) \odot H(\mathbf{z})$ |
| 16.2 - Learning Distance Metrics | Learning Distance Metrics | Drug compound similarity from molecular descriptors using LMNN, NCA, and deep metric learning | $d_M(\mathbf{x}, \mathbf{x}') = \sqrt{(\mathbf{x} - \mathbf{x}')^\top M (\mathbf{x} - \mathbf{x}')}$ |
| 16.3 - Kernel Density Estimation | Kernel Density Estimation | Non-parametric profiling of single-cell flow cytometry, T-cell classification, and dose-response regression | $p(x \mid \mathcal{D}) = \frac{1}{N}\sum_n K_h(x - x_n)$ |
| 17.1 - Mercer Kernels | Mercer Kernels | Cell phenotype similarity from gene expression using RBF, Matern, ARD, and kernel combination | $\kappa(\mathbf{x}, \mathbf{x}') = \exp\!\left(-\frac{\|\mathbf{x} - \mathbf{x}'\|^2}{2\ell^2}\right)$ |
| 17.2 - Gaussian Processes | Gaussian Processes | Predicting enzyme activity across temperature conditions with GP prior, posterior, and marginal likelihood | $p(\mathbf{f}_\ast \mid \mathcal{D}) = \mathcal{N}(\boldsymbol{\mu}_\ast,\ \mathbf{K}_{\ast,\ast} - \mathbf{K}_{X,\ast}^\top \mathbf{K}_\sigma^{-1} \mathbf{K}_{X,\ast})$ |
| 17.3 - Support Vector Machines | Support Vector Machines | Classifying bacterial cells as stressed vs. normal with hard/soft margin, kernel trick, and SVR | $f(\mathbf{x}) = \sum_{n \in \mathcal{S}} \alpha_n \tilde{y}_n \kappa(\mathbf{x}_n, \mathbf{x}) + \hat{w}_0$ |
| | Kernel Ridge Regression | Predicting enzyme activity from substrate concentration using the kernel trick | $f(\mathbf{x}) = \mathbf{k}^\top(\mathbf{K} + \lambda \mathbf{I})^{-1}\mathbf{y}$ |
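Several of the formulas above can be sanity-checked in a few lines of NumPy. For the Gaussian conditioning rule $p(x_1 \mid x_2)$ from section 3.2, a minimal sketch (the mean, covariance, and observed value below are made up for illustration and come from no notebook):

```python
import numpy as np

# Conditioning a 2D Gaussian: p(x1 | x2 = v) is Gaussian with
#   mu_{1|2}  = mu1 + S12 / S22 * (v - mu2)
#   var_{1|2} = S11 - S12^2 / S22
mu = np.array([2.0, 1.0])            # illustrative means
Sigma = np.array([[1.0, 0.8],
                  [0.8, 2.0]])       # illustrative covariance
v = 3.0                              # observed value of x2

mu_cond = mu[0] + Sigma[0, 1] / Sigma[1, 1] * (v - mu[1])
var_cond = Sigma[0, 0] - Sigma[0, 1] ** 2 / Sigma[1, 1]
# mu_cond == 2.8, var_cond == 0.68: conditioning pulls the mean
# toward the observation and always shrinks the variance.
```

The same pair of formulas, written with block matrices, is what the 5D and imputation notebooks generalize.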
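The closed-form ridge estimator from section 11.3 is equally short to check on synthetic data (the design matrix, true weights, and noise level below are invented for the demo):

```python
import numpy as np

# Ridge regression closed form: w = (X^T X + lambda I)^{-1} X^T y,
# fit on a synthetic regression problem with known weights.
rng = np.random.default_rng(1)
X = rng.standard_normal((50, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.01 * rng.standard_normal(50)

lam = 0.1
# Solve the regularized normal equations; np.linalg.solve is
# numerically preferable to forming the inverse explicitly.
w_hat = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)
# w_hat recovers w_true, slightly shrunk toward zero by lambda
```

With $\lambda = 0$ this reduces to the least-squares formula listed under section 11.2.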
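Finally, the Sherman-Morrison identity listed under section 7.3 can be verified numerically; the matrix and vectors below are an arbitrary small example:

```python
import numpy as np

# Sherman-Morrison: (A + u v^T)^{-1}
#   = A^{-1} - (A^{-1} u v^T A^{-1}) / (1 + v^T A^{-1} u),
# valid whenever the denominator is nonzero.
A = np.diag([2.0, 3.0, 4.0])
u = np.array([1.0, 0.0, 1.0])
v = np.array([0.0, 1.0, 1.0])

Ainv = np.linalg.inv(A)
denom = 1.0 + v @ Ainv @ u           # 1.25 here, so the update is valid
sm_inv = Ainv - (Ainv @ np.outer(u, v) @ Ainv) / denom

direct = np.linalg.inv(A + np.outer(u, v))
# sm_inv matches the direct inverse, but given A^{-1} it costs
# only O(n^2) per rank-1 update versus O(n^3) for re-inversion.
```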