Schur Complement: Conditioning and Block Matrices¶
The Schur complement is what remains when we "remove" the influence of one part of a system from another. For a block matrix:
$$\mathbf{M} = \begin{pmatrix} \mathbf{A} & \mathbf{B} \\ \mathbf{C} & \mathbf{D} \end{pmatrix}$$
The Schur complement of the block $\mathbf{D}$ in $\mathbf{M}$ (defined when $\mathbf{D}$ is invertible) is:
$$\mathbf{M}/\mathbf{D} = \mathbf{A} - \mathbf{B}\mathbf{D}^{-1}\mathbf{C}$$
Why It Matters¶
- Gaussian conditioning: Conditional covariance = Schur complement
- Block matrix inversion: Key component of the formula
- Information geometry: "Information about A after observing D"
- Sparse systems: Enables efficient computation
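As a quick numeric sanity check of the definition (an illustrative sketch, not part of the examples below), the Schur complement also satisfies the classical determinant identity $\det \mathbf{M} = \det \mathbf{D} \cdot \det(\mathbf{M}/\mathbf{D})$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Random 4x4 matrix, diagonally shifted so the D block is safely invertible
M = rng.normal(size=(4, 4)) + 4 * np.eye(4)
A, B = M[:2, :2], M[:2, 2:]
C, D = M[2:, :2], M[2:, 2:]

# Schur complement of D in M
schur_D = A - B @ np.linalg.inv(D) @ C

# Determinant identity: det(M) = det(D) * det(M/D)
print(np.allclose(np.linalg.det(M), np.linalg.det(D) * np.linalg.det(schur_D)))
```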
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Ellipse
np.random.seed(42)
plt.rcParams['figure.figsize'] = (12, 6)
plt.rcParams['font.size'] = 11
1. The Fundamental Application: Gaussian Conditionals¶
For a joint Gaussian:
$$\begin{pmatrix} \mathbf{x}_1 \\ \mathbf{x}_2 \end{pmatrix} \sim N\left( \begin{pmatrix} \boldsymbol{\mu}_1 \\ \boldsymbol{\mu}_2 \end{pmatrix}, \begin{pmatrix} \mathbf{\Sigma}_{11} & \mathbf{\Sigma}_{12} \\ \mathbf{\Sigma}_{21} & \mathbf{\Sigma}_{22} \end{pmatrix} \right)$$
The conditional distribution $\mathbf{x}_1 | \mathbf{x}_2$ has:
$$\text{Cov}(\mathbf{x}_1 | \mathbf{x}_2) = \mathbf{\Sigma}_{11} - \mathbf{\Sigma}_{12} \mathbf{\Sigma}_{22}^{-1} \mathbf{\Sigma}_{21}$$
This is exactly the Schur complement!
# Simple 2D example
# Joint distribution of (x1, x2) with correlation
rho = 0.8 # Correlation
Sigma = np.array([[1, rho],
                  [rho, 1]])
# Extract blocks (scalars in 2D case)
Sigma_11 = Sigma[0, 0]
Sigma_12 = Sigma[0, 1]
Sigma_21 = Sigma[1, 0]
Sigma_22 = Sigma[1, 1]
# Schur complement = conditional variance
Sigma_cond = Sigma_11 - Sigma_12 * (1/Sigma_22) * Sigma_21
print("Conditional Variance via Schur Complement")
print("="*50)
print(f"\nJoint covariance Σ:")
print(Sigma)
print(f"\nMarginal variance of x₁: Σ₁₁ = {Sigma_11}")
print(f"Correlation: ρ = {rho}")
print(f"\nSchur complement (conditional variance):")
print(f" Σ₁₁ - Σ₁₂ Σ₂₂⁻¹ Σ₂₁ = {Sigma_11} - {Sigma_12}² / {Sigma_22}")
print(f" = {Sigma_11} - {Sigma_12**2}")
print(f" = {Sigma_cond}")
print(f"\nVariance reduction: {(1 - Sigma_cond/Sigma_11)*100:.1f}%")
Conditional Variance via Schur Complement
==================================================
Joint covariance Σ:
[[1. 0.8]
[0.8 1. ]]
Marginal variance of x₁: Σ₁₁ = 1.0
Correlation: ρ = 0.8
Schur complement (conditional variance):
Σ₁₁ - Σ₁₂ Σ₂₂⁻¹ Σ₂₁ = 1.0 - 0.8² / 1.0
= 1.0 - 0.6400000000000001
= 0.3599999999999999
Variance reduction: 64.0%
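The closed-form result above can also be checked by brute force (an illustrative Monte Carlo sketch, not part of the original cells): sample the joint, keep only draws where $x_2$ falls in a narrow window around 1, and compare the empirical variance of $x_1$ to the Schur complement $1 - \rho^2 = 0.36$.

```python
import numpy as np

rng = np.random.default_rng(123)
rho = 0.8
Sigma = np.array([[1.0, rho], [rho, 1.0]])

# Draw many joint samples and keep those where x2 is near the conditioning value
samples = rng.multivariate_normal([0, 0], Sigma, size=2_000_000)
mask = np.abs(samples[:, 1] - 1.0) < 0.02
x1_given_x2 = samples[mask, 0]

# Empirical conditional variance should land near the Schur complement, 0.36
print(x1_given_x2.var())
```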
# Visualize: Marginal vs Conditional
np.random.seed(42)
n_samples = 1000
samples = np.random.multivariate_normal([0, 0], Sigma, n_samples)
# Condition on x2 = 1
x2_observed = 1.0
x1_conditional_mean = Sigma_12 / Sigma_22 * x2_observed
x1_conditional_std = np.sqrt(Sigma_cond)
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
# Left: Joint distribution with conditioning line
ax1 = axes[0]
ax1.scatter(samples[:, 0], samples[:, 1], alpha=0.3, s=15, c='blue')
ax1.axhline(y=x2_observed, color='red', linestyle='--', linewidth=2, label=f'$x_2 = {x2_observed}$')
ax1.scatter([x1_conditional_mean], [x2_observed], c='green', s=200, marker='*',
            zorder=5, label='Conditional mean')
# Show conditional uncertainty as horizontal bar
ax1.plot([x1_conditional_mean - 2*x1_conditional_std, x1_conditional_mean + 2*x1_conditional_std],
         [x2_observed, x2_observed], 'g-', linewidth=4, alpha=0.5)
ax1.set_xlabel('$x_1$')
ax1.set_ylabel('$x_2$')
ax1.set_title('Joint Distribution with Conditioning', fontsize=12)
ax1.legend(loc='upper left')
ax1.grid(True, alpha=0.3)
ax1.set_xlim(-4, 4)
ax1.set_ylim(-4, 4)
# Right: Marginal vs Conditional of x1
ax2 = axes[1]
x_range = np.linspace(-4, 4, 200)
# Marginal distribution of x1
marginal_pdf = 1/np.sqrt(2*np.pi*Sigma_11) * np.exp(-x_range**2 / (2*Sigma_11))
ax2.fill_between(x_range, marginal_pdf, alpha=0.3, color='blue', label=f'Marginal $p(x_1)$, σ={np.sqrt(Sigma_11):.2f}')
ax2.plot(x_range, marginal_pdf, 'b-', linewidth=2)
# Conditional distribution of x1 | x2
conditional_pdf = 1/np.sqrt(2*np.pi*Sigma_cond) * np.exp(-(x_range - x1_conditional_mean)**2 / (2*Sigma_cond))
ax2.fill_between(x_range, conditional_pdf, alpha=0.3, color='red',
                 label=f'Conditional $p(x_1 | x_2={x2_observed})$, σ={x1_conditional_std:.2f}')
ax2.plot(x_range, conditional_pdf, 'r-', linewidth=2)
ax2.axvline(x=0, color='blue', linestyle=':', alpha=0.7, label='Marginal mean')
ax2.axvline(x=x1_conditional_mean, color='red', linestyle=':', alpha=0.7, label=f'Conditional mean')
ax2.set_xlabel('$x_1$')
ax2.set_ylabel('Density')
ax2.set_title(f'Schur Complement = Conditional Variance\n(ρ = {rho})', fontsize=12)
ax2.legend()
ax2.grid(True, alpha=0.3)
plt.suptitle(f'Conditioning Reduces Uncertainty: Var = {Sigma_11} → {Sigma_cond:.2f}', fontsize=14)
plt.tight_layout()
plt.show()
Real-World Scenario: Medical Diagnosis¶
Scenario: A doctor is estimating a patient's blood pressure. They have:
- Prior uncertainty about blood pressure (from population statistics)
- A correlated measurement: heart rate
Knowing the patient's heart rate reduces uncertainty about blood pressure. The Schur complement tells us by how much!
# Medical example: Blood pressure and heart rate
# Joint distribution based on population statistics
# Means
mu_bp = 120 # Systolic BP in mmHg
mu_hr = 72 # Heart rate in BPM
# Standard deviations
sigma_bp = 15 # BP varies quite a bit
sigma_hr = 10 # Heart rate too
# Correlation (BP and HR are positively correlated)
rho = 0.6
# Build covariance matrix
Sigma = np.array([
    [sigma_bp**2, rho*sigma_bp*sigma_hr],
    [rho*sigma_bp*sigma_hr, sigma_hr**2]
])
print("Medical Diagnosis: Estimating Blood Pressure")
print("="*50)
print(f"\nPopulation statistics (prior):")
print(f" Blood pressure: {mu_bp} ± {sigma_bp} mmHg")
print(f" Heart rate: {mu_hr} ± {sigma_hr} BPM")
print(f" Correlation: ρ = {rho}")
print(f"\nCovariance matrix:")
print(Sigma)
Medical Diagnosis: Estimating Blood Pressure
==================================================

Population statistics (prior):
  Blood pressure: 120 ± 15 mmHg
  Heart rate: 72 ± 10 BPM
  Correlation: ρ = 0.6

Covariance matrix:
[[225.  90.]
 [ 90. 100.]]
# Patient comes in with heart rate = 85 BPM (elevated)
observed_hr = 85
# Conditional distribution using Schur complement
Sigma_11 = Sigma[0, 0] # Var(BP)
Sigma_12 = Sigma[0, 1] # Cov(BP, HR)
Sigma_22 = Sigma[1, 1] # Var(HR)
# Conditional mean: E[BP | HR]
conditional_mean_bp = mu_bp + Sigma_12 / Sigma_22 * (observed_hr - mu_hr)
# Conditional variance: Schur complement
conditional_var_bp = Sigma_11 - Sigma_12**2 / Sigma_22
conditional_std_bp = np.sqrt(conditional_var_bp)
print(f"\nPatient observation: Heart rate = {observed_hr} BPM")
print(f"\nUsing Schur complement for conditional distribution:")
print(f"\n E[BP | HR={observed_hr}] = μ_BP + (Σ_12/Σ_22)(HR - μ_HR)")
print(f" = {mu_bp} + ({Sigma_12}/{Sigma_22})({observed_hr} - {mu_hr})")
print(f" = {mu_bp} + {Sigma_12/Sigma_22:.2f} × {observed_hr - mu_hr}")
print(f" = {conditional_mean_bp:.1f} mmHg")
print(f"\n Var[BP | HR] = Σ_11 - Σ_12²/Σ_22 (Schur complement)")
print(f" = {Sigma_11} - {Sigma_12**2}/{Sigma_22}")
print(f" = {conditional_var_bp:.1f}")
print(f" Std[BP | HR] = {conditional_std_bp:.1f} mmHg")
print(f"\nSummary:")
print(f" Prior (no HR info): BP = {mu_bp} ± {sigma_bp} mmHg")
print(f" Posterior (HR known): BP = {conditional_mean_bp:.1f} ± {conditional_std_bp:.1f} mmHg")
print(f"\n Uncertainty reduced by {(1 - conditional_std_bp/sigma_bp)*100:.1f}%")
Patient observation: Heart rate = 85 BPM
Using Schur complement for conditional distribution:
E[BP | HR=85] = μ_BP + (Σ_12/Σ_22)(HR - μ_HR)
= 120 + (90.0/100.0)(85 - 72)
= 120 + 0.90 × 13
= 131.7 mmHg
Var[BP | HR] = Σ_11 - Σ_12²/Σ_22 (Schur complement)
= 225.0 - 8100.0/100.0
= 144.0
Std[BP | HR] = 12.0 mmHg
Summary:
Prior (no HR info): BP = 120 ± 15 mmHg
Posterior (HR known): BP = 131.7 ± 12.0 mmHg
Uncertainty reduced by 20.0%
# Visualize
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
# Generate samples from joint
np.random.seed(42)
samples = np.random.multivariate_normal([mu_bp, mu_hr], Sigma, 500)
# Left: Joint distribution
ax1 = axes[0]
ax1.scatter(samples[:, 0], samples[:, 1], alpha=0.4, s=20, c='blue')
ax1.axhline(y=observed_hr, color='red', linestyle='--', linewidth=2,
            label=f'Observed HR = {observed_hr}')
ax1.scatter([conditional_mean_bp], [observed_hr], c='green', s=200, marker='*',
            zorder=5, label=f'E[BP|HR] = {conditional_mean_bp:.1f}')
# Show 2σ interval
ax1.plot([conditional_mean_bp - 2*conditional_std_bp, conditional_mean_bp + 2*conditional_std_bp],
         [observed_hr, observed_hr], 'g-', linewidth=4, alpha=0.6, label='95% CI')
ax1.set_xlabel('Blood Pressure (mmHg)')
ax1.set_ylabel('Heart Rate (BPM)')
ax1.set_title('Joint Distribution: BP and HR', fontsize=12)
ax1.legend(loc='upper left')
ax1.grid(True, alpha=0.3)
# Right: Prior vs Posterior for BP
ax2 = axes[1]
bp_range = np.linspace(80, 160, 200)
# Prior
prior_pdf = 1/np.sqrt(2*np.pi*sigma_bp**2) * np.exp(-(bp_range - mu_bp)**2 / (2*sigma_bp**2))
ax2.fill_between(bp_range, prior_pdf, alpha=0.3, color='blue', label=f'Prior: {mu_bp} ± {sigma_bp}')
ax2.plot(bp_range, prior_pdf, 'b-', linewidth=2)
# Posterior
posterior_pdf = 1/np.sqrt(2*np.pi*conditional_var_bp) * np.exp(-(bp_range - conditional_mean_bp)**2 / (2*conditional_var_bp))
ax2.fill_between(bp_range, posterior_pdf, alpha=0.3, color='red',
                 label=f'Posterior: {conditional_mean_bp:.1f} ± {conditional_std_bp:.1f}')
ax2.plot(bp_range, posterior_pdf, 'r-', linewidth=2)
ax2.axvline(x=140, color='orange', linestyle=':', linewidth=2, alpha=0.7, label='Hypertension threshold')
ax2.set_xlabel('Blood Pressure (mmHg)')
ax2.set_ylabel('Probability Density')
ax2.set_title(f'Information Gain from Heart Rate\n(Schur complement reduces variance)', fontsize=12)
ax2.legend()
ax2.grid(True, alpha=0.3)
plt.suptitle('Medical Diagnosis: Schur Complement Quantifies Information Gain', fontsize=14)
plt.tight_layout()
plt.show()
2. The Formula: Block Matrix Inversion¶
The Schur complement is key to inverting block matrices:
$$\begin{pmatrix} \mathbf{A} & \mathbf{B} \\ \mathbf{C} & \mathbf{D} \end{pmatrix}^{-1} = \begin{pmatrix} (\mathbf{A} - \mathbf{B}\mathbf{D}^{-1}\mathbf{C})^{-1} & \cdots \\ \cdots & \cdots \end{pmatrix}$$
The top-left block of the inverse is the inverse of the Schur complement!
# Demonstrate block matrix inversion
A = np.array([[4, 1], [1, 3]])
B = np.array([[1, 0], [0, 1]])
C = np.array([[1, 0], [0, 1]]) # C = Bᵀ (equal to B here), so M is symmetric
D = np.array([[2, 0.5], [0.5, 2]])
# Build full matrix
M = np.block([[A, B], [C, D]])
# Direct inversion
M_inv = np.linalg.inv(M)
# Schur complement of D
D_inv = np.linalg.inv(D)
schur_D = A - B @ D_inv @ C
schur_D_inv = np.linalg.inv(schur_D)
print("Block Matrix Inversion via Schur Complement")
print("="*50)
print(f"\nMatrix M (4×4):")
print(M)
print(f"\nDirect inverse M⁻¹ (top-left 2×2 block):")
print(M_inv[:2, :2].round(4))
print(f"\nSchur complement of D: A - B D⁻¹ C =")
print(schur_D.round(4))
print(f"\nInverse of Schur complement:")
print(schur_D_inv.round(4))
print(f"\nThey match: {np.allclose(M_inv[:2, :2], schur_D_inv)}")
Block Matrix Inversion via Schur Complement
==================================================

Matrix M (4×4):
[[4.  1.  1.  0. ]
 [1.  3.  0.  1. ]
 [1.  0.  2.  0.5]
 [0.  1.  0.5 2. ]]

Direct inverse M⁻¹ (top-left 2×2 block):
[[ 0.3394 -0.156 ]
 [-0.156   0.4771]]

Schur complement of D: A - B D⁻¹ C =
[[3.4667 1.1333]
 [1.1333 2.4667]]

Inverse of Schur complement:
[[ 0.3394 -0.156 ]
 [-0.156   0.4771]]

They match: True
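The ellipses in the formula above can be filled in: all four blocks of $\mathbf{M}^{-1}$ are expressible through the Schur complement $\mathbf{S} = \mathbf{A} - \mathbf{B}\mathbf{D}^{-1}\mathbf{C}$ and $\mathbf{D}^{-1}$. A minimal check, reusing the A, B, C, D from this section:

```python
import numpy as np

A = np.array([[4.0, 1.0], [1.0, 3.0]])
B = np.eye(2)
C = np.eye(2)
D = np.array([[2.0, 0.5], [0.5, 2.0]])
M = np.block([[A, B], [C, D]])

D_inv = np.linalg.inv(D)
S = A - B @ D_inv @ C          # Schur complement of D
S_inv = np.linalg.inv(S)

# Full block-inverse formula, built from S and D_inv alone:
# M^{-1} = [[ S^{-1},           -S^{-1} B D^{-1}                 ],
#           [-D^{-1} C S^{-1},   D^{-1} + D^{-1} C S^{-1} B D^{-1}]]
M_inv_blocks = np.block([
    [S_inv,              -S_inv @ B @ D_inv],
    [-D_inv @ C @ S_inv,  D_inv + D_inv @ C @ S_inv @ B @ D_inv],
])
print(np.allclose(M_inv_blocks, np.linalg.inv(M)))  # True
```

Inverting M this way costs two small inversions (of D and of S) instead of one large one, which is the basis of the sparse-system applications mentioned earlier.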
3. Higher Dimensions: Multiple Conditioning Variables¶
The Schur complement works for any partitioning of variables.
# 4D example: Conditioning on 2 variables
# Variables: [weight, height, age, cholesterol]
# We'll condition cholesterol on [weight, height, age]
# Realistic covariance matrix (approximate)
Sigma_full = np.array([
    [400, 50, 5, 80],    # weight (lbs)
    [50, 16, 0.5, 10],   # height (inches)
    [5, 0.5, 100, 20],   # age (years)
    [80, 10, 20, 900]    # cholesterol (mg/dL)
])
# Partition: x1 = [cholesterol], x2 = [weight, height, age]
Sigma_11 = Sigma_full[3:4, 3:4] # (1,1) - cholesterol variance
Sigma_12 = Sigma_full[3:4, 0:3] # (1,3) - cholesterol-others covariance
Sigma_21 = Sigma_full[0:3, 3:4] # (3,1)
Sigma_22 = Sigma_full[0:3, 0:3] # (3,3) - others covariance
# Schur complement: conditional variance of cholesterol given all others
Sigma_22_inv = np.linalg.inv(Sigma_22)
Sigma_cond = Sigma_11 - Sigma_12 @ Sigma_22_inv @ Sigma_21
print("Multivariate Conditioning Example")
print("="*50)
print(f"\nVariables: weight, height, age, cholesterol")
print(f"\nMarginal std(cholesterol) = {np.sqrt(Sigma_11[0,0]):.1f} mg/dL")
print(f"\nConditional std(cholesterol | weight, height, age)")
print(f" = √(Schur complement)")
print(f" = {np.sqrt(Sigma_cond[0,0]):.1f} mg/dL")
print(f"\nKnowing weight, height, and age reduces cholesterol uncertainty by")
print(f" {(1 - np.sqrt(Sigma_cond[0,0])/np.sqrt(Sigma_11[0,0]))*100:.1f}%")
Multivariate Conditioning Example
==================================================

Variables: weight, height, age, cholesterol

Marginal std(cholesterol) = 30.0 mg/dL

Conditional std(cholesterol | weight, height, age)
  = √(Schur complement)
  = 29.7 mg/dL

Knowing weight, height, and age reduces cholesterol uncertainty by
  1.1%
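A natural follow-up question, in the experimental-design spirit of this notebook, is which single measurement is most informative about cholesterol. This illustrative sketch (re-declaring the same covariance matrix so it runs standalone) computes the scalar Schur complement for each predictor on its own:

```python
import numpy as np

Sigma_full = np.array([
    [400.0, 50.0, 5.0,   80.0],   # weight (lbs)
    [50.0,  16.0, 0.5,   10.0],   # height (inches)
    [5.0,   0.5,  100.0, 20.0],   # age (years)
    [80.0,  10.0, 20.0,  900.0],  # cholesterol (mg/dL)
])
names = ["weight", "height", "age"]

# Conditioning on a single predictor j gives the scalar Schur complement:
#   Var(chol | x_j) = Sigma_44 - Sigma_4j^2 / Sigma_jj
for j, name in enumerate(names):
    cond_var = Sigma_full[3, 3] - Sigma_full[3, j] ** 2 / Sigma_full[j, j]
    print(f"{name:>6}: conditional std = {np.sqrt(cond_var):.1f} mg/dL")
```

With this covariance matrix, weight is the single most informative predictor, though all three reduce the 30 mg/dL marginal uncertainty only slightly.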
4. Intuition: Information Decomposition¶
The Schur complement has a beautiful interpretation:
$$\text{Var}(x_1) = \underbrace{\text{Var}(x_1 | x_2)}_{\text{Schur complement}} + \underbrace{\text{"Variance explained by } x_2\text{"}}_{\Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21}}$$
The second term is the reduction in variance due to knowing $x_2$.
# Variance decomposition
rho_values = [0, 0.3, 0.6, 0.9]
fig, ax = plt.subplots(figsize=(10, 6))
# Original variance
original_var = 1.0
x = np.arange(len(rho_values))
width = 0.6
conditional_vars = [original_var - rho**2 for rho in rho_values]
explained_vars = [rho**2 for rho in rho_values]
# Stacked bar chart
bars1 = ax.bar(x, conditional_vars, width, label='Conditional variance (Schur complement)', color='steelblue')
bars2 = ax.bar(x, explained_vars, width, bottom=conditional_vars,
               label='Variance explained by $x_2$', color='coral')
ax.set_xticks(x)
ax.set_xticklabels([f'ρ = {rho}' for rho in rho_values])
ax.set_ylabel('Variance')
ax.set_title('Variance Decomposition: Marginal = Conditional + Explained', fontsize=12)
ax.axhline(y=original_var, color='black', linestyle='--', linewidth=1.5, label='Marginal variance')
ax.legend()
ax.set_ylim(0, 1.2)
# Annotate
for i, (cond, expl, rho) in enumerate(zip(conditional_vars, explained_vars, rho_values)):
    ax.text(i, cond/2, f'{cond:.2f}', ha='center', va='center', fontsize=10, color='white', fontweight='bold')
    if expl > 0.05:
        ax.text(i, cond + expl/2, f'{expl:.2f}', ha='center', va='center', fontsize=10, color='white', fontweight='bold')
plt.tight_layout()
plt.show()
print("\nAs correlation increases, more variance is 'explained' by conditioning,")
print("and the Schur complement (residual uncertainty) decreases.")
As correlation increases, more variance is 'explained' by conditioning,
and the Schur complement (residual uncertainty) decreases.
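The stacked bars above are an instance of the law of total variance, $\text{Var}(x_1) = \mathbb{E}[\text{Var}(x_1|x_2)] + \text{Var}(\mathbb{E}[x_1|x_2])$. A quick simulation sketch (illustrative, with ρ = 0.6 chosen here) makes the two terms visible by generating $x_1$ directly from its conditional structure:

```python
import numpy as np

rng = np.random.default_rng(0)
rho = 0.6
n = 1_000_000

# Generate x1 = rho*x2 + sqrt(1-rho^2)*eps, so the conditional structure is explicit:
# Var(x1|x2) = 1 - rho^2 (Schur complement), Var(E[x1|x2]) = rho^2 (explained)
x2 = rng.normal(size=n)
eps = rng.normal(size=n)
x1 = rho * x2 + np.sqrt(1 - rho**2) * eps

print(x1.var())           # close to 1.0 (marginal variance)
print((rho * x2).var())   # close to rho^2 = 0.36 (explained variance)
```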
5. Key Property: Conditional Covariance is Constant¶
A remarkable property of the Gaussian distribution: the conditional covariance does not depend on the observed value.
$$\text{Cov}(x_1 | x_2 = a) = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$$
Notice: the right side contains only covariance matrix blocks — no $a$ appears!
| Quantity | Formula | Depends on observed value? |
|---|---|---|
| Conditional mean | $\mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(a - \mu_2)$ | ✅ Yes |
| Conditional covariance | $\Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$ | ❌ No |
Implications:
- We can compute uncertainty reduction before observing anything
- Useful for experimental design: "How much will measuring X reduce uncertainty about Y?"
- Enables efficient algorithms (e.g., Kalman filter precomputes covariance updates)
- This is a unique property of Gaussians — non-Gaussian conditionals can have value-dependent variance
# Demonstrate: Conditional covariance is CONSTANT across different observed values
rho = 0.8
Sigma = np.array([[1, rho], [rho, 1]])
# Schur complement (conditional variance) - computed ONCE
Sigma_cond = Sigma[0,0] - Sigma[0,1]**2 / Sigma[1,1]
# Different observed values of x2
x2_values = [-2, -1, 0, 1, 2]
fig, ax = plt.subplots(figsize=(10, 8))
# Create grid for contour plot
x1_grid = np.linspace(-4, 4, 200)
x2_grid = np.linspace(-3.5, 3.5, 200)
X1, X2 = np.meshgrid(x1_grid, x2_grid)
pos = np.dstack((X1, X2))
# Compute joint PDF
from scipy.stats import multivariate_normal
rv = multivariate_normal([0, 0], Sigma)
Z = rv.pdf(pos)
# Plot contours of joint distribution
contour_levels = [0.01, 0.05, 0.1, 0.15]
contours = ax.contour(X1, X2, Z, levels=contour_levels, colors='steelblue', alpha=0.7, linewidths=1.5)
ax.contourf(X1, X2, Z, levels=20, cmap='Blues', alpha=0.3)
# Plot conditional distributions for each observed x2
colors = plt.cm.plasma(np.linspace(0.2, 0.85, len(x2_values)))
for x2_obs, color in zip(x2_values, colors):
    # Conditional mean CHANGES with observed value
    cond_mean = Sigma[0,1] / Sigma[1,1] * x2_obs
    # Conditional std is CONSTANT (Schur complement)
    cond_std = np.sqrt(Sigma_cond)
    # Draw horizontal line at x2_obs
    ax.axhline(y=x2_obs, color=color, linestyle='--', alpha=0.4, linewidth=1)
    # Mark conditional mean and ±2σ interval
    ax.scatter([cond_mean], [x2_obs], c=[color], s=120, marker='o', zorder=5,
               edgecolor='white', linewidth=1.5)
    ax.plot([cond_mean - 2*cond_std, cond_mean + 2*cond_std], [x2_obs, x2_obs],
            color=color, linewidth=5, alpha=0.8, solid_capstyle='round')
# Add annotation
ax.annotate('Same width!\n(constant Σ_cond)', xy=(2.3, 1.8), fontsize=11,
            ha='center', style='italic',
            bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.9))
# Draw regression line (line of conditional means)
x2_line = np.linspace(-3.5, 3.5, 100)
x1_line = Sigma[0,1] / Sigma[1,1] * x2_line
ax.plot(x1_line, x2_line, 'k--', linewidth=1.5, alpha=0.5, label='Regression line (conditional means)')
ax.set_xlabel('$x_1$', fontsize=12)
ax.set_ylabel('$x_2$ (observed)', fontsize=12)
ax.set_title(f'Conditional Covariance is Constant (ρ = {rho})\n'
             f'Slicing the ellipse at any $x_2$ gives the same conditional width: σ = {np.sqrt(Sigma_cond):.2f}',
             fontsize=12)
ax.set_xlim(-4, 4)
ax.set_ylim(-3.5, 3.5)
ax.set_aspect('equal')
ax.legend(loc='lower right')
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
print(f"Conditional variance Var(x₁|x₂) = {Sigma_cond:.4f} (same for ALL observed values)")
print(f"Conditional std σ(x₁|x₂) = {np.sqrt(Sigma_cond):.4f}")
print(f"\nThis can be computed BEFORE observing any data!")
Conditional variance Var(x₁|x₂) = 0.3600 (same for ALL observed values)
Conditional std σ(x₁|x₂) = 0.6000

This can be computed BEFORE observing any data!
6. Connection to Precision Matrix¶
There's a duality between the Schur complement of the covariance and the precision matrix:
- $(\Sigma^{-1})_{11}$ = Inverse of Schur complement of $\Sigma_{22}$
- Conditional precision has a simpler form in the precision parameterization
# Precision matrix and Schur complement relationship
Sigma = np.array([[1, 0.7], [0.7, 1]])
Lambda = np.linalg.inv(Sigma) # Precision matrix
# Schur complement of Sigma
schur_sigma = Sigma[0,0] - Sigma[0,1]**2 / Sigma[1,1]
print("Covariance-Precision Duality")
print("="*50)
print(f"\nCovariance Σ:")
print(Sigma)
print(f"\nPrecision Λ = Σ⁻¹:")
print(Lambda.round(4))
print(f"\nSchur complement of Σ (conditional variance):")
print(f" Σ₁₁ - Σ₁₂²/Σ₂₂ = {schur_sigma:.4f}")
print(f"\nλ₁₁ (precision matrix diagonal):")
print(f" Λ₁₁ = {Lambda[0,0]:.4f}")
print(f"\nRelationship: Λ₁₁ = 1/(Schur complement) = {1/schur_sigma:.4f}")
print(f"Match: {np.isclose(Lambda[0,0], 1/schur_sigma)}")
Covariance-Precision Duality
==================================================

Covariance Σ:
[[1.  0.7]
 [0.7 1. ]]

Precision Λ = Σ⁻¹:
[[ 1.9608 -1.3725]
 [-1.3725  1.9608]]

Schur complement of Σ (conditional variance):
  Σ₁₁ - Σ₁₂²/Σ₂₂ = 0.5100

λ₁₁ (precision matrix diagonal):
  Λ₁₁ = 1.9608

Relationship: Λ₁₁ = 1/(Schur complement) = 1.9608
Match: True
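The duality is not special to 2×2 matrices. The general block identity $(\mathbf{\Sigma}^{-1})_{11} = (\mathbf{\Sigma}_{11} - \mathbf{\Sigma}_{12}\mathbf{\Sigma}_{22}^{-1}\mathbf{\Sigma}_{21})^{-1}$ can be checked on a random higher-dimensional covariance (an illustrative sketch with an arbitrary 2 + 3 partition):

```python
import numpy as np

rng = np.random.default_rng(1)

# Random symmetric positive-definite 5x5 covariance, partitioned 2 + 3
X = rng.normal(size=(5, 5))
Sigma = X @ X.T + 5 * np.eye(5)
Lam = np.linalg.inv(Sigma)  # precision matrix

S11 = Sigma[:2, :2]
S12 = Sigma[:2, 2:]
S22 = Sigma[2:, 2:]

# Block identity: (Sigma^{-1})_{11} = (S11 - S12 S22^{-1} S12^T)^{-1}
schur = S11 - S12 @ np.linalg.inv(S22) @ S12.T
print(np.allclose(Lam[:2, :2], np.linalg.inv(schur)))  # True
```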
Key Takeaways¶
Schur complement: $\mathbf{A} - \mathbf{B}\mathbf{D}^{-1}\mathbf{C}$
- "What remains of A after removing D's influence"
Gaussian conditioning: Conditional variance = Schur complement $$\text{Var}(x_1 | x_2) = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$$
Conditional covariance is constant (unique to Gaussians!):
- Does NOT depend on the observed value, only on the covariance structure
- Can be computed before any observation is made
- The conditional mean shifts; the conditional covariance stays fixed
Variance decomposition: $$\text{Marginal var} = \text{Conditional var} + \text{Explained var}$$
Block matrix inversion: Schur complement appears in the inverse formula
Applications:
- Bayesian inference (updating beliefs)
- Medical diagnosis (combining evidence)
- Kalman filtering (measurement updates)
- Experimental design (predict uncertainty reduction before measuring)
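The Kalman filtering entry in the list above can be sketched concretely: a measurement update is exactly Gaussian conditioning, so the posterior covariance is a Schur complement of the joint state-measurement covariance. The numbers below (P, H, R) are illustrative assumptions, not taken from this notebook:

```python
import numpy as np

P = np.array([[2.0, 0.5], [0.5, 1.0]])   # prior state covariance
H = np.array([[1.0, 0.0]])               # observe the first state component
R = np.array([[0.25]])                   # measurement noise variance

# Joint covariance of (state, measurement) has blocks [[P, P H^T], [H P, H P H^T + R]],
# so conditioning on the measurement gives the Schur complement below.
S = H @ P @ H.T + R                      # innovation covariance (plays the role of Sigma_22)
K = P @ H.T @ np.linalg.inv(S)           # Kalman gain
P_post = P - K @ H @ P                   # Schur complement: P - P H^T S^{-1} H P

print(P_post.round(3))
```

Because the posterior covariance does not depend on the measured value (the key property from section 5), the filter can propagate P_post before any data arrives.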
This completes our journey through matrix-vector multiplication patterns!
From basic products to the sophisticated Schur complement, these operations form the foundation of probabilistic machine learning.