Yurii Shevchuk

Iterative homogenization by elimination


Let’s say we have \(S\) objects and each object is associated with one of the \(N\) classes. There are \(c_i\) objects associated with the \(i\)-th class. We want to eliminate only \(b\) objects in a way that forms distribution of classes as close as possible to the uniformal distribution.


Intuition suggests that we can eliminate one object at a time by selecting a class with the largest number of objects in it. It’s not quite obvious whether this is the best strategy to follow. For example, we can start with the following distribution of classes - \(c = [12, 6, 7, 13]\). Next, we can eliminate 2 objects from it (\(b = 2\)). After elimination we can end up with 10 different outcomes. We can follow our initial strategy and end up with the following distribution: \([12, 6, 7, 11]\) (or \([11, 6, 7, 12]\)). Using some other strategy we can get a different outcome, for example, \(c = [12, 6, 6, 12]\). How do we know which one is closer to the uniformal distribution.

The word “closer” impliest that we need to have a certain measure that allows us to measure homogenization of the distribution. In order to get the solution we need to better define our objective.


We can use Kullback–Leibler divergence (KL divergence) in order to measure homogenization of the distribution. We can normalize \(c\) in order to convert it to a probability distribution \(p\)

And we can define desirable uniformal distribution

KL divergence is not symmetric. Comparing \(u\) to \(p\) is not the same as comparing \(p\) to \(u\). We can show that for this problem both versions will lead to the same solution and order in which distributions are specified is not important for our purpose.

We can start with the following definition

For our problem we want to subtract \(b\) objects and after this we will end up with a distribution \(q\)

Optimization could be sumarized in the following way


First, we can notice that original function that we want to opimize can be simplified

We can use Karush–Kuhn–Tucker (KKT) conditions in order to solve this problem. Objective could be defined in the following way

and we need to solve the following equation in order to find optimal solution

we can rewrite this equation in order to have \(x_i\) specified explicitly

From the equation above we can notice that

From the Complementary slackness in the KKT conditions we know that \(\eta_i x_i = 0\). This condition implies that \(\eta_i\) and \(x_i\) cannot be non-zero values at the same time. We can consider two cases:

  1. There exists at least one more non-zero \(x_i\). From this assumption the following statement is true (since \(\eta_i = \eta_j = 0\))

    This equation says that \(c_i - x_i = c_j - x_j\). This means that after elimination we should end up with exactly the same values in each category.

  2. There exists at least on \(x_i=0\). From this assumption the following statement is true:

    and since \(\eta_i \ge 0\) we get that

    Since we assumed that \(x_i=0\) and \(x_j \ne 0\) the equation above implies that every zero is associated with the smallest \(c_i\) value and every non-zero is associated with the largest \(c_j\) value. This observation matches our initial intuition. Equation implies that we should remove objects from the most common classes first.


There are multiple different algorithms that could be derived from the previous equations. We can find one by using the following definitions

  1. \(A = \{ i \, | \, x_i=0 \} \)
  2. \(\overset{\_}{A} = \{ i \, | \, x_i\ne0 \}\)
  3. \(|\overset{\_}{A}| = m \) and \(|A| = n- m \)
  4. \(\sum_{i=1}^N{c_i} = C\)
  5. \( k = c_i - x_i, \forall i \in \overset{-}{A}\)
  6. \(c_i \le c_j, \forall i \le j \)

If we assume that there is at least one \(x_i = 0\) then the following should be true

and this equation will work if \(c_0 = 0\) (which implies that we have N+1 category, but one additional category doesn’t have objects in it).

In addition, after some rearrangement of terms we can create the following equation.

Reversing order of distributions in the KL divergence function

As before, main equation could be simplified

On the right side, we have a negative entropy, and minimum will mean that we just need to maximize entropy of the distribution \(q\)

from which follows that

where \(S = \sum_{j=1}^{N} c_j - x_j \)

and based on the same logic as before we can show that

And again, we can go through two cases:

  1. if we assume that \(x_i=0\) and \(x_j\ne0\) then \(c_j \gt c_j - x_j \ge c_i\)
  2. if we assume that \(x_i\ne0\) and \(x_j\ne0\) then \(c_j - x_j = c_i - x_i\)

The same conclusion could be drawn from these equations which shows that order of the distribution doesn’t make a difference for this problem

Handling discrete cases

Developed equations can produce non-integer solutions. These results won’t be suitable for our initial problem. Solution could be easily modified in order to work with integers. First, we can notice that non-integer values will be produced only for cases when \( \sum_{j \in \overset{\_}{A}}{c_j} - b \) is not divisible by \(m\) without the remainder. Instead, we can use the following equation



This is just one way of defining \(s_j\) function, but any strategy will work. It’s aesy to see why this solution is the best that we can get from the previous equations and given that \(m\) categories have exactly the same value it doesn’t really matter from which of the categories we subtract remaining ones.

Share this: