Does the Random Number Generation (RNG) process change across different OS versions?

Hi everyone!

I appreciate your help. I am a researcher and I use UMAP to cluster my data. Reproducibility is a key requirement in my field, so I set a random seed.

After coming back to my project after some time, I no longer get the same results as before, even though I am working in a virtual environment that I did not change.

When pondering the possible reasons, I remembered that I upgraded my OS from macOS Sonoma 14.1.1 to 14.5, so I was wondering whether the change in OS might be causing these issues.

I'm sorry if this question is obvious to developer folks, but before I downgrade my OS or create a virtual machine, any tip is much appreciated. Thank you!

What random number API are you using? Does its documentation specify its exact algorithm? If not, then there’s always a risk that you are accidentally relying on undefined behavior that could change over time.

That said, I’d be a little surprised if Apple is putting much effort into improving the older PRNG functions such as rand and random (C functions). There’s also arc4random but it’s not seedable, so I’m guessing that’s not the one you are using.

Thanks for your questions. I was indeed a bit unspecific. I am coding in Python and use this UMAP implementation: https://umap-learn.readthedocs.io/en/latest/api.html

UMAP instances have a parameter "random_state" and the documentation says the following: "If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random."

I provided an integer, so I am guessing UMAP uses the random number generator of the numpy version in my virtual env. This would be PCG64 (https://numpy.org/doc/2.1/reference/random/generator.html#numpy.random.default_rng).

The old random number generator of RandomState instances uses the Mersenne Twister algorithm MT19937 (https://numpy.org/doc/2.1/reference/random/legacy.html#numpy.random.RandomState).

From the documentation it is not entirely clear to me what UMAP means by "seed used by the random number generator". But in any case numpy says that "our RNGs are deterministic sequences and can be reproduced by specifying a seed integer to derive its initial state."
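One way to test numpy's claim directly is to fingerprint the seeded streams and compare the result across machines or OS versions. The following is only a minimal sketch: `stream_fingerprint` is a hypothetical helper name, and it assumes numpy is importable in the virtual env. It covers both the legacy `RandomState` and `default_rng`, because the UMAP docs quoted above mention `RandomState`, so the legacy MT19937 generator may well be the relevant one (I have not verified that against UMAP's source).

```python
import hashlib

import numpy as np


def stream_fingerprint(seed, n=1000):
    """Hash the first n integer draws of numpy's legacy and current generators.

    Hypothetical helper for illustration; not part of numpy or UMAP.
    """
    legacy = np.random.RandomState(seed)  # legacy API (Mersenne Twister)
    modern = np.random.default_rng(seed)  # current API

    # Integer draws avoid any floating-point formatting or rounding questions.
    legacy_draws = legacy.randint(0, 2**31 - 1, size=n, dtype=np.int64)
    modern_draws = modern.integers(0, 2**31 - 1, size=n, dtype=np.int64)

    def digest(values):
        # Force little-endian bytes so the hash does not depend on byte order.
        return hashlib.sha256(values.astype("<i8").tobytes()).hexdigest()

    return {
        "numpy_version": np.__version__,
        "legacy_bit_generator": legacy.get_state()[0],
        "modern_bit_generator": type(modern.bit_generator).__name__,
        "legacy_sha256": digest(legacy_draws),
        "modern_sha256": digest(modern_draws),
    }


if __name__ == "__main__":
    for key, value in stream_fingerprint(42).items():
        print(f"{key}: {value}")
```

If the digests are identical before and after an OS upgrade (with the same numpy version), the seeded stream itself did not change, and the difference would have to come from somewhere else in the pipeline.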

I looked up the history of changes in my virtual env (just to double check) and I don't see any changes in packages that should influence random number generation, like numpy or numba. I had issues with reproducibility in the past, but that was across operating systems (Linux vs. Mac with the same virtual env), not on the same computer with different versions of the same OS.
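To make this kind of check less manual in the future, the environment could be recorded next to each result. This is a rough sketch using only the standard library; `environment_snapshot` is a made-up helper name, and the package list is just my guess at what is relevant here.

```python
import json
import platform
import sys
from importlib import metadata


def environment_snapshot(packages):
    """Collect OS, Python, and package versions for later comparison.

    Hypothetical helper for illustration; packages that are not installed
    are recorded as None instead of raising an error.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None

    return {
        "platform": platform.platform(),
        "mac_ver": platform.mac_ver()[0],  # empty string when not on macOS
        "machine": platform.machine(),
        "python": sys.version.split()[0],
        "packages": versions,
    }


if __name__ == "__main__":
    # Assumed list of packages that could affect the embedding.
    snapshot = environment_snapshot(
        ["numpy", "numba", "umap-learn", "scikit-learn", "pynndescent"]
    )
    print(json.dumps(snapshot, indent=2))
```

Saving this output together with each embedding would show afterwards whether anything besides the OS version differed between two runs.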

Thanks a lot for your help!
