
Probabilistic Machine Learning

Interactive notebooks accompanying the Probabilistic Machine Learning book.


3 - Multivariate Models

3.2 - Multivariate Gaussian
    Marginals and Conditionals (2D): Predicting EGFR protein levels from mRNA gene expression. $p(x_1 \mid x_2)$
    Marginals and Conditionals (5D): Comprehensive real estate valuation with 5 correlated features. $p(\mathbf{x}_1 \mid \mathbf{x}_2)$
    Missing Value Imputation: Patient health records with missing lab results. $p(\mathbf{x}_h \mid \mathbf{x}_v)$
3.3 - Linear Gaussian Systems
    Bayes Rule for Gaussians: Blood pressure estimation with multiple clinical devices. $p(\mathbf{z} \mid \mathbf{y}) \propto p(\mathbf{y} \mid \mathbf{z})\,p(\mathbf{z})$
    Bayes Rule with Non-Trivial $W$: Inferring physiological state from derived health metrics. $\mathbf{y} = W\mathbf{z} + \boldsymbol{\epsilon}$
    Inferring an Unknown Scalar: Bayesian HER2 gene expression estimation from qPCR. $p(\mu \mid \mathbf{y})$
    Inferring an Unknown Vector: Cytokine concentration estimation from noisy ELISA replicates. $p(\mathbf{z} \mid \mathbf{y})$
    Sensor Fusion: Cell state estimation combining RNA-seq, flow cytometry, and ATAC-seq. $p(\mathbf{z} \mid \mathbf{y}_1, \mathbf{y}_2, \mathbf{y}_3)$
3.5 - Mixture Models
    Gaussian Mixture Models: Cell type discovery in scRNA-seq brain tissue using GMM. $p(\mathbf{x}) = \sum_k \pi_k \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_k, \Sigma_k)$
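As a minimal illustration of the 3.2 conditioning entries above, the sketch below computes the mean and variance of $p(x_1 \mid x_2)$ for a bivariate Gaussian. The mean vector and covariance are made-up illustrative values, not the notebooks' data:

```python
import numpy as np

# Illustrative bivariate Gaussian over (x1, x2); parameters are made up.
mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])

def condition_gaussian_2d(mu, Sigma, x2):
    """Return mean and variance of p(x1 | x2) for a bivariate Gaussian."""
    mu1, mu2 = mu
    s11, s12, s22 = Sigma[0, 0], Sigma[0, 1], Sigma[1, 1]
    cond_mean = mu1 + s12 / s22 * (x2 - mu2)   # mu_1 + Sigma_12 Sigma_22^{-1} (x2 - mu_2)
    cond_var = s11 - s12 / s22 * s12           # Schur complement: Sigma_11 - Sigma_12 Sigma_22^{-1} Sigma_21
    return cond_mean, cond_var

m, v = condition_gaussian_2d(mu, Sigma, x2=2.0)
```

Note that the conditional variance is exactly the Schur complement that reappears in the linear algebra entries of chapter 7.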

6 - Information Theory

6.1 - Entropy
    Entropy: Cell state uncertainty in whole cell modeling, covering discrete, binary, joint, and conditional entropy plus perplexity. $H(X) = -\sum_k p_k \log p_k$
6.2 - KL Divergence
    KL Divergence: Comparing gene expression distributions across healthy and diseased tissue. $D_{\text{KL}}(p \,\|\, q) = \sum_k p_k \log \frac{p_k}{q_k}$
6.3 - Mutual Information
    Mutual Information: Identifying informative biomarkers for cell state classification. $I(X;Y) = H(X) - H(X \mid Y)$
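The entropy and KL formulas above can be sketched in a few lines. This is a toy illustration with made-up distributions (base-2 logs, so units are bits), not the notebooks' gene expression data:

```python
import numpy as np

def entropy(p):
    """H(X) = -sum_k p_k log2 p_k, skipping zero-probability outcomes."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def kl_divergence(p, q):
    """D_KL(p || q) = sum_k p_k log2(p_k / q_k)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0
    return np.sum(p[mask] * np.log2(p[mask] / q[mask]))

p = [0.5, 0.5]   # e.g. a maximally uncertain binary cell state: H = 1 bit
q = [0.9, 0.1]   # a skewed reference distribution
```

KL divergence is zero only when the two distributions coincide, which is what makes it usable as a comparison score between tissue conditions.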

7 - Linear Algebra

7.0 - Foundations
    Scalar, Vector, and Matrix Products: Basic multiplication shapes and the dimension compatibility rule. $c = \mathbf{a}^T\mathbf{b},\ \mathbf{c} = A\mathbf{b},\ C = AB$
    Inner Product (Dot Product): Measuring alignment and similarity between vectors. $\mathbf{a}^T\mathbf{b} = \sum_i a_i b_i$
    Outer Product: Creating matrices from vectors and building covariance. $\mathbf{a}\mathbf{b}^T$
    Matrix-Vector Multiplication: Geometric transformations (rotation, scaling, shearing). $\mathbf{y} = A\mathbf{x}$
    Matrix-Matrix Multiplication: Composing transformations and the associativity property. $(AB)C = A(BC)$
    Quadratic Forms: Weighted distances and the Mahalanobis distance. $\mathbf{x}^T W \mathbf{x}$
    The $ABA^T$ Pattern: Transforming covariance through linear maps. $\Sigma_y = A\Sigma_x A^T$
    Schur Complement: Block matrix operations and Gaussian conditioning. $M/D = A - BD^{-1}C$
7.3 - Matrix Inversion
    Factor Model Covariance: Building gene expression covariance from transcription factor pathways. $\Sigma = WW^T + \Psi$
    Low-Rank Covariance Update: Why adding $XX^T$ to a covariance matrix models new pathway exposures. $\Sigma' = \Sigma + XX^T$
    Sherman-Morrison Formula: Rank-1 precision updates when discovering a single transcription factor. $(A + \mathbf{u}\mathbf{v}^T)^{-1}$
    Matrix Inversion Lemma (Woodbury): Efficient precision matrix updates for gene regulatory network inference. $(A + UCV)^{-1}$
7.4 - Eigenvalue Decomposition
    Geometry of Quadratic Forms: Ellipsoidal level sets applied to protein binding affinity in drug discovery. $\mathbf{x}^T A\mathbf{x} = c$
7.5 - Singular Value Decomposition
    SVD: Discovering latent biological programs in gene expression profiles. $A = USV^T$
7.6 - Matrix Decompositions
    Cholesky Sampling from MVN: Clinical trial simulation with correlated patient biomarkers. $\Sigma = LL^T$
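The Cholesky entry above is the standard recipe for drawing MVN samples: factor $\Sigma = LL^T$ and apply the affine map $\mathbf{x} = \boldsymbol{\mu} + L\mathbf{z}$ to standard normal draws, so that $\operatorname{Cov}(\mathbf{x}) = L L^T = \Sigma$ (an instance of the $ABA^T$ pattern). A minimal sketch with made-up parameters, not the notebook's clinical-trial values:

```python
import numpy as np

# Illustrative mean and covariance for two correlated biomarkers (made up).
mu = np.array([1.0, -0.5])
Sigma = np.array([[1.0, 0.6],
                  [0.6, 2.0]])

rng = np.random.default_rng(0)
L = np.linalg.cholesky(Sigma)          # lower-triangular L with Sigma = L L^T
z = rng.standard_normal((10_000, 2))   # rows z ~ N(0, I)
x = mu + z @ L.T                       # rows x ~ N(mu, Sigma)
```

With 10,000 draws the empirical mean and covariance of `x` closely match `mu` and `Sigma`.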

8 - Optimization

8.1 - The EM Algorithm
    Expectation-Maximization (EM): Medical diagnosis with latent disease types using Gaussian mixtures. $\mathcal{Q}(\boldsymbol{\theta}, \boldsymbol{\theta}^{\text{old}})$
8.2 - First-Order Methods
    Gradient Descent: Drug dose-response curve fitting with gradient descent, line search, and momentum. $\boldsymbol{\theta}_{t+1} = \boldsymbol{\theta}_t - \eta_t \nabla \mathcal{L}(\boldsymbol{\theta}_t)$
8.3 - Second-Order Methods
    Newton, BFGS, and Trust Regions: Enzyme kinetics parameter estimation with Hessian-based optimizers. $\boldsymbol{\theta}_{t+1} = \boldsymbol{\theta}_t - \eta_t \mathbf{H}_t^{-1} \nabla \mathcal{L}(\boldsymbol{\theta}_t)$
8.4 - Stochastic Gradient Descent
    SGD, Scheduling, and Adam: Predicting drug sensitivity from gene expression with adaptive optimizers. $\boldsymbol{\theta}_{t+1} = \boldsymbol{\theta}_t - \eta_t \mathbf{M}_t^{-1} \nabla \mathcal{L}(\boldsymbol{\theta}_t, z_t)$
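The 8.2 update rule above can be sketched on a toy quadratic loss $\mathcal{L}(\boldsymbol{\theta}) = \tfrac{1}{2}\|A\boldsymbol{\theta} - \mathbf{b}\|^2$, whose minimizer solves $A\boldsymbol{\theta} = \mathbf{b}$. The matrix, target, and step size below are illustrative, not the notebook's dose-response model:

```python
import numpy as np

# Toy quadratic loss L(theta) = 0.5 * ||A theta - b||^2 (illustrative values).
A = np.array([[2.0, 0.0],
              [0.0, 1.0]])
b = np.array([4.0, 1.0])

theta = np.zeros(2)
eta = 0.1                            # fixed step size eta_t
for _ in range(500):
    grad = A.T @ (A @ theta - b)     # gradient of the quadratic loss
    theta = theta - eta * grad       # theta_{t+1} = theta_t - eta_t * grad

# Minimizer: A theta = b, i.e. theta = [2.0, 1.0]
```

With this step size each coordinate contracts toward the optimum geometrically (factors 0.6 and 0.9 per iteration here), so 500 steps converge to machine precision.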

9 - Linear Discriminant Analysis

9.2 - Gaussian Discriminant Analysis
    Gaussian Discriminant Analysis: NSCLC cancer subtype classification from blood protein biomarkers. $p(y=c \mid \mathbf{x}) \propto \pi_c \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_c, \boldsymbol{\Sigma}_c)$
9.3 - Naive Bayes Classifiers
    Naive Bayes Classifiers: Antimicrobial compound screening from molecular fingerprints. $p(\mathbf{x} \mid y=c) = \prod_d p(x_d \mid y=c, \theta_{dc})$

10 - Logistic Regression

10.2 - Binary Logistic Regression
    Binary Logistic Regression: Predicting tumor drug response from gene expression biomarkers. $p(y \mid \mathbf{x}) = \sigma(\mathbf{w}^\top \mathbf{x} + b)$
10.3 - Multinomial Logistic Regression
    Multinomial Logistic Regression: Classifying NSCLC tumor subtypes from gene expression biomarkers. $p(y=c \mid \mathbf{x}) = \text{softmax}(W\mathbf{x})_c$

11 - Linear Regression

11.2 - Least Squares
    Least Squares Linear Regression: Predicting protein abundance from mRNA expression in whole cell modeling. $\hat{\mathbf{w}} = (\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{X}^\top\mathbf{y}$
11.3 - Ridge Regression
    Ridge Regression: Predicting drug sensitivity from high-dimensional gene expression profiles. $\hat{\mathbf{w}} = (\mathbf{X}^\top\mathbf{X} + \lambda\mathbf{I})^{-1}\mathbf{X}^\top\mathbf{y}$
11.4 - Lasso Regression
    Lasso Regression: Identifying antibiotic resistance biomarkers from bacterial gene expression. $\hat{\mathbf{w}} = \arg\min_{\mathbf{w}} \|\mathbf{X}\mathbf{w}-\mathbf{y}\|_2^2 + \lambda\|\mathbf{w}\|_1$
11.7 - Bayesian Linear Regression
    Bayesian Linear Regression: Predicting protein abundance from transcriptomics with full posterior uncertainty. $p(\mathbf{w} \mid \mathcal{D}) = \mathcal{N}(\hat{\mathbf{w}}, \hat{\boldsymbol{\Sigma}})$
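The ridge estimator in 11.3 above has the closed form shown, but in practice one solves the linear system rather than forming the inverse. A sketch on synthetic data; the "expression to sensitivity" framing is illustrative and every number below is made up:

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic regression data: y = X w_true + noise.
n, d = 50, 3
X = rng.standard_normal((n, d))
w_true = np.array([1.5, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.standard_normal(n)

lam = 1.0
# Solve (X^T X + lam I) w = X^T y instead of explicitly inverting the matrix.
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
```

With ample data relative to the penalty, the ridge solution lands close to `w_true`, shrunk slightly toward zero by the $\lambda\mathbf{I}$ term.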

13 - Neural Networks for Tabular Data

13.1 - Backpropagation
    Backpropagation for an MLP: Classifying bacterial antibiotic resistance from genomic features. $\boldsymbol{\delta}_2 = (U^\top \boldsymbol{\delta}_1) \odot H(\mathbf{z})$

16 - Exemplar-based Methods

16.2 - Learning Distance Metrics
    Learning Distance Metrics: Drug compound similarity from molecular descriptors using LMNN, NCA, and deep metric learning. $d_M(\mathbf{x}, \mathbf{x}') = \sqrt{(\mathbf{x} - \mathbf{x}')^\top M (\mathbf{x} - \mathbf{x}')}$
16.3 - Kernel Density Estimation
    Kernel Density Estimation: Non-parametric profiling of single-cell flow cytometry, T-cell classification, and dose-response regression. $p(x \mid \mathcal{D}) = \frac{1}{N}\sum_n K_h(x - x_n)$

17 - Kernel Methods

17.1 - Mercer Kernels
    Mercer Kernels: Cell phenotype similarity from gene expression using RBF, Matérn, ARD, and kernel combination. $\kappa(\mathbf{x}, \mathbf{x}') = \exp\!\left(-\frac{\|\mathbf{x} - \mathbf{x}'\|^2}{2\ell^2}\right)$
17.2 - Gaussian Processes
    Gaussian Processes: Predicting enzyme activity across temperature conditions with GP prior, posterior, and marginal likelihood. $p(\mathbf{f}_* \mid \mathcal{D}) = \mathcal{N}(\boldsymbol{\mu}_*, \mathbf{K}_{*,*} - \mathbf{K}_{X,*}^\top \mathbf{K}_\sigma^{-1} \mathbf{K}_{X,*})$
17.3 - Support Vector Machines
    Support Vector Machines: Classifying bacterial cells as stressed vs. normal with hard/soft margin, kernel trick, and SVR. $f(\mathbf{x}) = \sum_{n \in \mathcal{S}} \alpha_n \tilde{y}_n \kappa(\mathbf{x}_n, \mathbf{x}) + \hat{w}_0$
    Kernel Ridge Regression: Predicting enzyme activity from substrate concentration using the kernel trick. $f(\mathbf{x}) = \mathbf{k}^\top(\mathbf{K} + \lambda \mathbf{I})^{-1}\mathbf{y}$
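The kernel ridge formula above combines the RBF kernel from 17.1 with the ridge solve from 11.3, replacing $\mathbf{X}^\top\mathbf{X}$ with the Gram matrix $\mathbf{K}$. A minimal 1D sketch; the smooth sine target, length scale, and regularizer are all illustrative stand-ins, not the notebook's enzyme data:

```python
import numpy as np

def rbf_kernel(X1, X2, ell=0.2):
    """Squared-exponential kernel exp(-||x - x'||^2 / (2 ell^2))."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * ell**2))

# Toy 1D training set with a smooth response curve.
X = np.linspace(0.0, 1.0, 20)[:, None]
y = np.sin(2 * np.pi * X[:, 0])

lam = 1e-3
K = rbf_kernel(X, X)
alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)   # (K + lam I)^{-1} y

def predict(X_new):
    """f(x) = k(x, X)^T (K + lam I)^{-1} y."""
    return rbf_kernel(X_new, X) @ alpha
```

Because the target is smooth relative to the length scale, the regularized fit reproduces the training responses almost exactly while remaining stable to invert.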