Why the average of a function isn’t the function of the average.
probability
statistics
Introduces convex functions and proves Jensen’s inequality. Shows how convexity underpins key results in information theory and probabilistic modeling.
Published
January 10, 2025
%matplotlib inlineimport numpy as npimport scipy as spimport matplotlib.pyplot as pltimport pandas as pdpd.set_option('display.width', 500)pd.set_option('display.max_columns', 100)pd.set_option('display.notebook_repr_html', True)import seaborn as snssns.set_style("whitegrid")sns.set_context("poster")
//anaconda/envs/py35/lib/python3.5/site-packages/matplotlib/__init__.py:872: UserWarning: axes.color_cycle is deprecated and replaced with axes.prop_cycle; please use the latter.
warnings.warn(self.msg_depr % (key, alt_key))
Convexity
Let \(f\) be a function with domain the set of real numbers. If the second derivative is greater than zero for all \(x\in R\) this function is convex.
Consider the case of two random variables \(x_1\) and \(x_2\), as seen in the diagram below:
Jensen’s inequality: for a convex function, the weighted average of function values always lies above the function of the weighted average.
Defnition Let f be a real valued function defined on an interval \(I = [a, b]\). \(f\) is said to be convex on I if \(\forall x_1, x_2 \in I, \lambda \in [0, 1]\),
\(f\) is said to be strictly convex if the inequality is strict. Intuitively, this definition states that the function falls below the straight line (the secant) from points \((x_1, f(x_1))\) to \((x_2, f(x_2))\). In other words, the equality is satisfied only for \(\lambda = 0\) and \(\lambda = 1\).
Jensen’s Inequality
Let \(f\) be a convex function defined on an interval \(I\). If \(x_1,x_2,\dots,x_n \in I {\rm and} \lambda_1, \lambda_2,\ldots,\lambda_n \ge 0\) with \(\sum^n_{i=1} \lambda_i = 1\),
Proof: For \(n = 1\) this is trivial. The case \(n = 2\) corresponds to the definition of convexity (see above). To show that this is true for all natural numbers, we proceed by induction. Assume the theorem is true for some \(n\) then,