Per this article on C++'s random_device and the mt19937 random number engine, http://www.pcg-random.org/posts/cpps-random_device.html, where the author explains the pitfalls of seeding mt19937 with a single random_device value, EQEMU's code does exactly what he says not to do. If you look at https://github.com/EQMacEmu/Server/blob/54f2877ff4f42d57057f1b210f0318062399f311/common/random.h, which is random.h, Reseed() is called in its constructor. Reseed() looks like... Code: [ Only registered users can see the bbcode. Click Here To Register... ] and Random is initialized as a global variable inside Zone. That means for each zone the RNG is only seeded once, and with a single static unsigned int. This static uint changes per zone object, but that's irrelevant since each zone is its own entity. You can tell by the comments that the person who wrote this code knows better seeding can be done with seed_seq, which is what this follow-up article says should be done: http://www.pcg-random.org/posts/cpp-seeding-surprises.html. I wrote a program that uses EQEMU's Random.h class and ran tests. Here's a slightly modified Random.h: Code: [ Only registered users can see the bbcode. Click Here To Register... ] and here's my main class that implements it and does some logging: Code: [ Only registered users can see the bbcode. Click Here To Register... ] The program runs each test (4 modes), rolling 1 million times each and gathering info on the distribution of numbers, how many times a roll matched the previous roll ("streaks"), and how many times a roll landed within +/- 1 of the previous roll ("close streaks").
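For illustration, here's a minimal sketch of the difference the articles are talking about, using only the standard `<random>` API. This is not the actual EQEMU `Reseed()`; the function names are mine:

```cpp
#include <algorithm>
#include <array>
#include <functional>
#include <random>

// What the article warns against: one 32-bit value seeding a
// 19937-bit internal state.
std::mt19937 seed_poorly() {
    std::random_device rd;
    return std::mt19937(rd());
}

// What the article recommends: fill enough seed material for the
// whole state from random_device, then mix it in via seed_seq.
std::mt19937 seed_with_seq() {
    std::random_device rd;
    std::array<unsigned int, std::mt19937::state_size> data;
    std::generate(data.begin(), data.end(), std::ref(rd));
    std::seed_seq seq(data.begin(), data.end());
    return std::mt19937(seq);
}
```

The point is not that the poorly-seeded engine produces a bad distribution, but that only 2^32 of the possible 2^19937 starting states are ever reachable.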
I wish I had a way to visualize this data... Anyway, you can see that both engines tried (mt19937 and srand/rand) distribute their numbers fairly evenly. Reseeding before each roll doesn't really affect the distribution, but it adds roughly 8x the overhead on each engine. Seeding with the system time before each roll is meaningless in this test: the loop runs so quickly that the same seed gets set repeatedly, so the output pattern repeats. Now for analyzing the streaks... Code: [ Only registered users can see the bbcode. Click Here To Register... ] In conclusion, I was wrong to state in a previous thread that seeding with the system time would be better than random_device (in the end they're both just unsigned ints), and I was also wrong that seeding more than once could be beneficial. **Going forward** I would like to extend my program when I get more time and do the following: - Set the close-streak threshold to something greater than 1. For example, some tradeskill checks are 5%, so maybe set the threshold to match that. - Our Int roll function in EQEMU uses (0, 99), which makes sense because we just convert it to a percentage, but would it make a difference to use (0, 999) or (0, 9999)? - Implement seed_seq seeding; I think that's the next best approach. There's got to be some reason why our RNG is so streaky; it is beyond noticeable in game, and hopefully this data/test can spark some ideas.
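The counting logic described above can be sketched roughly like this (the function name `count_streaks` is mine, not the actual test program's):

```cpp
#include <cstdlib>
#include <random>
#include <utility>

// Roll uniform integers in [0, 99] a million times and count exact
// repeats ("streaks") and +/-1 repeats ("close streaks").
std::pair<int, int> count_streaks(int rolls) {
    std::mt19937 gen{std::random_device{}()};
    std::uniform_int_distribution<int> dist(0, 99);
    int streaks = 0, close = 0;
    int prev = dist(gen);
    for (int i = 1; i < rolls; ++i) {
        int cur = dist(gen);
        if (cur == prev)
            ++streaks;                        // same number back to back
        else if (std::abs(cur - prev) == 1)
            ++close;                          // within +/-1 of previous
        prev = cur;
    }
    return {streaks, close};
}
```

For a uniform 0-99 generator you'd expect exact repeats on roughly 1% of adjacent pairs (about 10,000 per million) and close repeats on roughly 2% (slightly less, because 0 and 99 each have only one neighbor).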
Spark what ideas? How to make it streakier, like EQ really was? I think it's streaky enough. I am against trying to make it have bigger streaks, regardless of how EQ was.
The opposite: less streaky. EQ's code wouldn't have had mt19937 or random_device; they were probably using C++98.
Changing how the pseudo-RNG algorithm is seeded won't change the results in-game whatsoever. You see streaks in-game because true randomness is streaky. Play Diablo and you'll get the same legendaries over and over.
I'd have to argue that EQ Live, EQ Mac, and even P99 weren't as streaky as TAKP. Haynar, is P99's random class similar to, or the same as, ours? I'm unfamiliar with Diablo 1, but if it's like Diablo 2 and the stats on legendaries are random, then I'd argue that getting the same legendary isn't that uncommon in a small world like Diablo's (just like the small world of one EQ zone); it would be getting similar stats on the same legendary that would indicate streakiness.
http://dilbert.com/strip/2001-10-25 https://xkcd.com/221/ Seriously though: we may all be prone to the gambler's fallacy.
Or the clustering illusion, yeah. Torven's right: Real random processes are streaky! Go flip a coin 100 times; you'll see streaks of 5+ heads pretty regularly.
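That coin-flip claim is easy to check with a quick simulation (a hypothetical helper, not anything from the codebase):

```cpp
#include <random>

// Do 100 fair coin flips contain a run of 5 or more identical
// results (all heads or all tails) somewhere?
bool has_run_of_5(std::mt19937& gen) {
    std::bernoulli_distribution coin(0.5);
    int run = 1;
    bool prev = coin(gen);
    for (int i = 1; i < 100; ++i) {
        bool flip = coin(gen);
        run = (flip == prev) ? run + 1 : 1;
        if (run >= 5) return true;
        prev = flip;
    }
    return false;
}
```

Run this over many trials and it comes back true the vast majority of the time, which is exactly the "real random processes are streaky" point.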
Aren't there only two outcomes? Skill up / no skill up. That's the streakiness you are talking about.
I'm talking about everything that uses the RNG. For example, kick has a 33% chance to land in our code: it goes to the RNG, rolls 0-99, and if the number rolled is 33 or under it succeeds. Everything is based on that concept, skill-ups too. I am specifically interested in the part where the RNG rolls 0-99.
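That pass/fail pattern can be sketched like this (illustrative only; the actual EQEMU helper may be named and shaped differently, and here the check succeeds strictly below N so that N maps to exactly N percent):

```cpp
#include <random>

// An N-percent check: succeed when a uniform roll in [0, 99]
// lands below N. Exactly N of the 100 outcomes pass.
bool percent_check(std::mt19937& gen, int chance_percent) {
    std::uniform_int_distribution<int> roll(0, 99);
    return roll(gen) < chance_percent; // e.g. kick: percent_check(gen, 33)
}
```

Note the off-by-one lurking in "33 or under": `roll <= 33` passes on 34 of 100 outcomes, i.e. 34%, which is why the strict comparison matters.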
I know math may be difficult sometimes, and I'm not asking you to change it; I'm just analyzing it. Does P99's random.h class look the same as TAKP's? I've asked that question a couple of times and don't know if you've seen it. There are other RNGs out there; here's a custom one built using a 64-bit int distribution. Code: [ Only registered users can see the bbcode. Click Here To Register... ] You'll notice right away that there are 0 streaks in 1 million rolls, but double the "close streaks" of our current RNG. Now, in 1 million rolls I would expect some streaks in a range of 100 numbers, but in this case it doesn't happen. To roll the same number twice with 1/100 odds is a 0.0001% chance (if we're not counting in a row). So out of a million rolls, that 0.0001% chance never happened. Our current RNG shows that 0.0001% chance happening ~1000 times per 1 million rolls, which = 0.0001%. While our RNG may be statistically accurate and match the 1/100 * 1/100 chance to roll the same number, it does not account for rolling that same number in a row within a given finite set. Also, my program did not account for rolling a number 3 or more times in a row. If our TAKP RNG rolls the same number 3 times in a row, that's a 0.000001% chance; out of a million rolls that could happen 1 time, statistically speaking. Edit: so I modified my program to remember up to 3 consecutive rolls. Rolling 1-100 a million times and getting the same number 3 times in a row is a 0.000001% chance, aka it should happen once. It happened 82 times in 1 million rolls (a chance of 1.2x10^-8, or 0.000000012%)! That screams streaky.
RL is streaky! Whether they're drawn from continuous or discrete distributions, uniform random numbers will tend to clump and cluster. That's the whole reason people use variance reduction techniques like stratification and Latin hypercube sampling in Monte Carlo simulation. The animation on the wiki clustering illusion page illustrates this nicely. Is Knuth's LCG outputting ints or floats? If floats, that'd explain why you see no 'streaks' but lots of 'close streaks'.
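A generic illustration of that int-vs-float point, using `std::mt19937` as the Uni(0,1) source rather than Knuth's actual LCG: consecutive doubles essentially never compare exactly equal, but the 0-99 integers derived from them still repeat about 1% of the time.

```cpp
#include <random>

struct RepeatCounts { int exact_double; int derived_int; };

// Count how often consecutive Uni(0,1) doubles repeat exactly, versus
// how often the integers floor(u * 100) derived from them repeat.
RepeatCounts count_repeats(int rolls) {
    std::mt19937 gen{12345};
    std::uniform_real_distribution<double> uni(0.0, 1.0);
    RepeatCounts c{0, 0};
    double prev = uni(gen);
    for (int i = 1; i < rolls; ++i) {
        double cur = uni(gen);
        if (cur == prev) ++c.exact_double;
        if (static_cast<int>(cur * 100) == static_cast<int>(prev * 100))
            ++c.derived_int;
        prev = cur;
    }
    return c;
}
```

So an RNG reporting floats would show "0 streaks" at full precision while the percentage checks built on it streak exactly as much as before.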
Math is tough for me. I can't understand all these silly statistics things; they are beyond my uneducated ability to understand. So I am going to just call you a nerd and point and laugh instead.
BTW, surron, I for one think your analysis is super cool! RNGs are fascinating, and the blog post in the OP makes a good point about the importance of the initial seed. The Mersenne twister wiki page also mentions that the output stream takes a long time to start looking random if it's seeded badly. If you really want to geek out here, I'd suggest working with uniform random reals between 0 and 1, rather than integers between 0 and 99. Uni(0,1) RNGs are fundamental; all other distributions are based on Uni(0,1). There are a bunch of standard tests for Uni(0,1) RNGs that check uniformity, independence, randomness, etc. The TestU01 C library implements a bunch of these tests.
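A minimal sketch of that suggestion, assuming plain `<random>`: draw Uni(0,1) variates and run a chi-square goodness-of-fit check against a flat histogram. TestU01's batteries are far more thorough; this only shows the basic idea.

```cpp
#include <random>
#include <vector>

// Bin Uni(0,1) samples and compute the chi-square statistic against
// a perfectly flat histogram. Compare the result to a chi-square
// critical value with (bins - 1) degrees of freedom.
double chi_square_uniform(std::mt19937& gen, int samples, int bins) {
    std::uniform_real_distribution<double> uni(0.0, 1.0);
    std::vector<int> counts(bins, 0);
    for (int i = 0; i < samples; ++i) {
        int bin = static_cast<int>(uni(gen) * bins);
        if (bin == bins) bin = bins - 1; // guard the (rare) upper edge
        ++counts[bin];
    }
    const double expected = static_cast<double>(samples) / bins;
    double chi2 = 0.0;
    for (int c : counts)
        chi2 += (c - expected) * (c - expected) / expected;
    return chi2;
}
```

A uniformity test like this only checks the marginal distribution; independence/runs tests (which are what streakiness is really about) are separate, and TestU01 covers both.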
If an EQEmu server had a better-working RNG than the real EQ, is that a success or a failure by the developers?
Rolling the same number 3 times in a row is a 0.000001% chance (1/100 * 1/100 * 1/100), so in 1 million rolls that is expected to happen once: (0.000001% * 1000000) = 1. Our RNG lets it happen 82 times. I don't think that proof is "in my head".
We can conclude that the RNG is not perfect, that no one is working on a replacement, and that surron thinks we should drop everything and fix this now. Three of the same value in a row is meh. We look at pass/fails for the most part; in tradeskills, for example, if the roll is < 30 it's a fail. Guess what? Three 5's in a row is equivalent to a 1, 6, 8 in the overall response. It averages out fairly well. So my suggestion is this: surron, quit pretending you are smarter than everyone else. Analyzing 3 points of a data set to draw conclusions is about as bad as you can get in the world of data analysis.
The chance of rolling the same number (0-99) 3 times in a row is actually 1 in 10000. The chance of rolling a GIVEN number 3 times in a row is 1 in 1 million. When looking for a streak of 3, whatever the first number in the streak is doesn't matter (100% chance of being satisfied), as long as the two following it are the same (1 in 100 chance for each number).
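Worked out, that correction means roughly 100 expected triples per million rolls (about 999,998 starting positions, each with a 1-in-10000 chance), so the 82 observed earlier is in the expected neighborhood rather than evidence of streakiness. A quick sanity-check sketch:

```cpp
#include <random>

// Count overlapping runs of three equal values in a stream of
// uniform rolls in [0, 99]. Expected count is about (rolls / 10000).
int count_triples(std::mt19937& gen, int rolls) {
    std::uniform_int_distribution<int> dist(0, 99);
    int triples = 0;
    int a = dist(gen), b = dist(gen);
    for (int i = 2; i < rolls; ++i) {
        int c = dist(gen);
        if (a == b && b == c) ++triples;
        a = b;
        b = c;
    }
    return triples;
}
```

With a million rolls this lands around 100 (give or take Poisson noise of about +/- 10), which is why 82 is unremarkable.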
I hope you goofballs realize that a rand over 0-99 doesn't actually pick the same number 3 times; many values are generated in between, but after converting to an int it looks like you got the same number. This discussion is beyond stupid, with idiot after idiot now being all-knowing about programming and statistics.