Basic Intro to KL-Divergence

Sanjay Parajuli
Sep 17, 2023

--

A way to measure the difference between two probability distributions.

KL-Divergence (Kullback-Leibler divergence) is a measure for comparing two distributions. It quantifies how much one probability distribution differs from another over the same variable x.

It is asymmetric, i.e. if A and B are two distributions, then in general Dₖₗ(A‖B) ≠ Dₖₗ(B‖A). Because of this, it is not a true distance metric.
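To see the asymmetry concretely, here is a minimal sketch assuming SciPy is installed (the two distributions are made up for illustration). scipy.stats.entropy(p, q) returns Dₖₗ(p‖q) in nats:

```python
from scipy.stats import entropy

# Two made-up distributions over the same two outcomes.
P = [0.9, 0.1]
Q = [0.5, 0.5]

# entropy(p, q) computes the KL divergence D_KL(p || q) in nats.
print(entropy(P, Q))  # ~0.368
print(entropy(Q, P))  # ~0.511 -> a different value, so D_KL(P||Q) != D_KL(Q||P)
```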

Let us suppose gender is a feature column in an ML model, so gender is a discrete random variable with 3 possible values: Male, Female, Others. Suppose the proportions in the Train set (A) are Male = 0.6, Female = 0.3, Others = 0.1, while in the Test set (B) they are Male = 0.5, Female = 0.4, Others = 0.1. We can compare the gender distributions of the two sets using KL divergence as below:

Using the formula for discrete distributions,

Dₖₗ(P‖Q) = Σₓ P(x)*ln(P(x)/Q(x))

calculate each term of the KL divergence by plugging each gender value in for x, with A in place of P and B in place of Q.

Dₖₗ(male) = 0.6*ln(0.6/0.5) = 0.6*0.1823 ≈ 0.109
Dₖₗ(female) = 0.3*ln(0.3/0.4) = 0.3*(-0.2877) ≈ -0.086
Dₖₗ(others) = 0.1*ln(0.1/0.1) = 0.1*0 = 0
Finally,
Dₖₗ(A‖B) = 0.109 - 0.086 + 0 ≈ 0.023 nats
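As a sanity check, this short Python sketch reproduces the calculation term by term (the proportions are the ones assumed in the example above):

```python
import math

# Assumed gender proportions from the example: Train set (A) and Test set (B).
A = {"male": 0.6, "female": 0.3, "others": 0.1}
B = {"male": 0.5, "female": 0.4, "others": 0.1}

# D_KL(A || B) = sum over x of A(x) * ln(A(x) / B(x))
kl = 0.0
for gender in A:
    term = A[gender] * math.log(A[gender] / B[gender])
    print(f"{gender}: {term:+.4f}")
    kl += term

print(f"D_KL(A || B) = {kl:.4f} nats")  # ~0.0231
```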

