I've been trying to build an RNG whose output's entropy approaches a user-defined value as the output gets arbitrarily long. I've been stuck at one particular point, which I'll detail below, but the main motivation for this project is just to get practice on a problem like this. Here's my repo.
In this post, we'll be talking about Shannon entropy. Here's a good refresher for it; if you scroll down there are examples and the equation.
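For convenience, the equation in question is the Shannon entropy of a discrete distribution with probabilities p_1, ..., p_n, measured in bits:

$$
H = -\sum_{i=1}^{n} p_i \log_2 p_i
$$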
Looking at the equation, the algorithm has two inputs: the target entropy and the number of outcomes, n. From those we can create a probability vector of length n whose entropy equals the target entropy, map objects to elements of the probability vector, and sample indices from it using the alias method. That gives us a nice algorithm for creating sequences of any type that have the target entropy.
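The sampling half of that is straightforward. Here's a rough sketch using the rand_distr crate's WeightedAliasIndex (plus rand for the RNG); the function name and signature are just for illustration, not the actual code in the repo:

```rust
use rand::prelude::*;
use rand_distr::weighted_alias::WeightedAliasIndex;

// Given a probability vector with the target entropy, map its indices to
// arbitrary symbols and draw a sequence with the alias method
// (O(n) table build, O(1) per sample).
fn sample_sequence<T: Clone>(symbols: &[T], probs: Vec<f64>, len: usize) -> Vec<T> {
    assert_eq!(symbols.len(), probs.len());
    let alias = WeightedAliasIndex::new(probs).expect("weights must be non-negative and finite");
    let mut rng = rand::thread_rng();
    (0..len)
        .map(|_| symbols[alias.sample(&mut rng)].clone())
        .collect()
}
```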
So where's the problem? It's in actually generating the probability vector. In order to get to a V0, I implemented two functions: `helpers::compute_probabilities` and `helpers::find_lambda`. `compute_probabilities` creates the probability vector by giving the outcomes exponentially decaying weights. `find_lambda` repeatedly calls `compute_probabilities`, using binary search to converge on a lambda that produces a probability vector with the target entropy. The problem line is right here:
pub fn compute_probabilities(num_outcomes: usize, lambda: f64) -> Vec<f64> {
    // Weight for outcome i decays exponentially with the index: w_i = exp(-lambda * i)
    let unnormalized: Vec<f64> = (1..=num_outcomes)
        .map(|i| (-lambda * i as f64).exp())
        .collect();
The probability values get exponentially smaller as the probability vector gets longer. This works. Technically. But in the worst way. When sampling from vectors generated by this function, you realistically only ever see the first few elements. That's the problem. I want to spread the probability more evenly, so that the tenth value isn't arbitrarily small while the first value is huge by comparison. For reference, with this function and lambda=1, the tenth element of the vector is about 5x10^-5 while the first is about 0.37.
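To put the disparity in perspective, the ratio between those two weights is

$$
\frac{w_1}{w_{10}} = \frac{e^{-1}}{e^{-10}} = e^{9} \approx 8100,
$$

so the tenth outcome is already about 8000 times less likely than the first, and every later outcome only gets worse.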
I figured I'd post this before I start throwing a bunch of different distributions at this function to see if anyone sees any inherent flaws with my process.
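For context, the binary-search side of `find_lambda` boils down to something like this. This is a simplified sketch rather than the exact repo code; it assumes `compute_probabilities` returns a normalized probability vector, and the bracket and iteration count are made up:

```rust
// Shannon entropy of a probability vector, in bits.
fn shannon_entropy(probs: &[f64]) -> f64 {
    probs.iter().filter(|&&p| p > 0.0).map(|&p| -p * p.log2()).sum()
}

// Bisect on lambda: lambda = 0 gives the uniform (maximum-entropy) vector,
// and the entropy drops as lambda grows, so the search is monotone.
fn find_lambda(num_outcomes: usize, target_entropy: f64) -> f64 {
    let (mut lo, mut hi) = (0.0_f64, 50.0_f64); // illustrative bracket
    for _ in 0..100 {
        let mid = 0.5 * (lo + hi);
        let entropy = shannon_entropy(&compute_probabilities(num_outcomes, mid));
        if entropy > target_entropy {
            lo = mid; // still too much entropy -> decay faster
        } else {
            hi = mid;
        }
    }
    0.5 * (lo + hi)
}
```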
Let's see if I understand it correctly:
You want to reduce the entropy (below that of an equal distribution), which requires having an unequal distribution.
You currently use an exponential falloff but are not happy with the disparity in probabilities.
You want a distribution that *feels* somewhat like an equal distribution.
Possible (unnormalized) distributions:
1, ..., 1, f, 0, ..., 0 with 1 >= f >= 0
Pros: All used events (except the last) are equally likely. You can roughly set the entropy by changing the number of events used (about 2^H of them) and do fine adjustment with the fudge factor f (maybe a closed formula exists; see the sketch at the end of this reply, or just keep using your binary search).
Con: you aren't really using all of the events.
f, 1, ..., 1 with f > 1
Pro: all of the other events can actually happen (there is probably also a formula to find f here; the sketch at the end covers this case too).
Con: dominated by one event
I can't think of a distribution with a higher chance for the least likely event than the second option.
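On the "maybe a formula exists" bits: for either shape above (k unit weights plus one fudge weight f, everything else zero), the normalizing constant is k + f, so the probabilities are 1/(k+f) for the unit events and f/(k+f) for the fudge event, and the entropy collapses to one closed form:

$$
H(f) = \log_2(k+f) - \frac{f}{k+f}\,\log_2 f .
$$

Sanity checks: H(0) = log2(k), H(1) = log2(k+1), and H(f) -> 0 as f -> infinity. Unless I've slipped in the algebra, the derivative is dH/df = -k ln(f) / ((k+f)^2 ln 2), so H increases on 0 <= f <= 1 and decreases for f >= 1; it's monotone on either branch, which means your existing binary search on f will converge.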