I've been trying to build an RNG whose output's entropy approaches a user-defined value as the output gets arbitrarily long. I've been stuck at one particular point, which I'll detail below, but the main motivation for this project is just to get practice on a problem like this. Here's my repo.
In this post, we'll be talking about Shannon entropy. Here's a good refresher for it; if you scroll down there are examples and the equation.
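For convenience, the equation in question is the Shannon entropy of a discrete distribution with probabilities p_1, ..., p_n, measured in bits:

$$
H = -\sum_{i=1}^{n} p_i \log_2 p_i
$$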
Looking at the equation, the algorithm has two inputs: the target entropy and the number of outcomes, n. From those we can create a probability vector of length n whose entropy equals the target entropy, map objects to elements of the probability vector, and sample indices from it using the alias method. That gives us a nice algorithm for creating sequences of any type that have the target entropy.
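The sampling half of that is straightforward. Here's a rough sketch using the rand_distr crate's WeightedAliasIndex (plus rand for the RNG); the function name and signature are just for illustration, not the actual code in the repo:

```rust
use rand::prelude::*;
use rand_distr::weighted_alias::WeightedAliasIndex;

// Given a probability vector with the target entropy, map its indices to
// arbitrary symbols and draw a sequence with the alias method
// (O(n) table build, O(1) per sample).
fn sample_sequence<T: Clone>(symbols: &[T], probs: Vec<f64>, len: usize) -> Vec<T> {
    assert_eq!(symbols.len(), probs.len());
    let alias = WeightedAliasIndex::new(probs).expect("weights must be non-negative and finite");
    let mut rng = rand::thread_rng();
    (0..len)
        .map(|_| symbols[alias.sample(&mut rng)].clone())
        .collect()
}
```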
So where's the problem? It's in actually generating the probability vector. In order to get to a V0, I implemented two functions: `helpers::compute_probabilities` and `helpers::find_lambda`. `compute_probabilities` creates the probability vector by giving the outcomes exponentially decaying weights. `find_lambda` repeatedly calls `compute_probabilities`, using binary search to converge on a lambda that produces a probability vector with the target entropy. The problem line is right here:
pub fn compute_probabilities(num_outcomes: usize, lambda: f64) -> Vec<f64> {
    // Weight for outcome i decays exponentially with the index: w_i = exp(-lambda * i)
    let unnormalized: Vec<f64> = (1..=num_outcomes)
        .map(|i| (-lambda * i as f64).exp())
        .collect();
The probability values get exponentially smaller as the probability vector gets longer. This works. Technically. But in the worst way. When sampling from vectors generated by this function, you realistically only ever see the first few elements. That's the problem. I want to spread the probability more evenly, so that the tenth value isn't arbitrarily small while the first value is huge by comparison. For reference, with this function and lambda=1, the tenth element of the vector is about 5x10^-5 while the first is about 0.37.
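To put the disparity in perspective, the ratio between those two weights is

$$
\frac{w_1}{w_{10}} = \frac{e^{-1}}{e^{-10}} = e^{9} \approx 8100,
$$

so the tenth outcome is already about 8000 times less likely than the first, and every later outcome only gets worse.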
I figured I'd post this before I start throwing a bunch of different distributions at this function to see if anyone sees any inherent flaws with my process.
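For context, the binary-search side of `find_lambda` boils down to something like this. This is a simplified sketch rather than the exact repo code; it assumes `compute_probabilities` returns a normalized probability vector, and the bracket and iteration count are made up:

```rust
// Shannon entropy of a probability vector, in bits.
fn shannon_entropy(probs: &[f64]) -> f64 {
    probs.iter().filter(|&&p| p > 0.0).map(|&p| -p * p.log2()).sum()
}

// Bisect on lambda: lambda = 0 gives the uniform (maximum-entropy) vector,
// and the entropy drops as lambda grows, so the search is monotone.
fn find_lambda(num_outcomes: usize, target_entropy: f64) -> f64 {
    let (mut lo, mut hi) = (0.0_f64, 50.0_f64); // illustrative bracket
    for _ in 0..100 {
        let mid = 0.5 * (lo + hi);
        let entropy = shannon_entropy(&compute_probabilities(num_outcomes, mid));
        if entropy > target_entropy {
            lo = mid; // still too much entropy -> decay faster
        } else {
            hi = mid;
        }
    }
    0.5 * (lo + hi)
}
```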
Let's see if I understand it correctly:
You want to reduce the entropy (below that of an equal distribution), which requires having an unequal distribution.
You currently use an exponential falloff but are not happy with the disparity in probabilities.
You want a distribution that *feels* somewhat like an equal distribution.
Possible (unnormalized) distributions:
1, ..., 1, f, 0, ..., 0 with 1 >= f >= 0
Pros: All used events (except the last) are equally likely. You can roughly set the entropy by changing the number of events used (about 2^H of them) and do fine adjustment with the fudge factor f (maybe a closed formula exists; see the sketch at the end of this reply, or just keep using your binary search).
Con: you aren't really using all of the events.
f, 1, ..., 1 with f > 1
Pro: all of the other events can actually happen (there is probably also a formula to find f here; the sketch at the end covers this case too).
Con: dominated by one event
I can't think of a distribution with a higher chance for the least likely event than the second option.
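On the "maybe a formula exists" bits: for either shape above (k unit weights plus one fudge weight f, everything else zero), the normalizing constant is k + f, so the probabilities are 1/(k+f) for the unit events and f/(k+f) for the fudge event, and the entropy collapses to one closed form:

$$
H(f) = \log_2(k+f) - \frac{f}{k+f}\,\log_2 f .
$$

Sanity checks: H(0) = log2(k), H(1) = log2(k+1), and H(f) -> 0 as f -> infinity. Unless I've slipped in the algebra, the derivative is dH/df = -k ln(f) / ((k+f)^2 ln 2), so H increases on 0 <= f <= 1 and decreases for f >= 1; it's monotone on either branch, which means your existing binary search on f will converge.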