Jane Street Puzzle Solution - Mar 2023
For original problem, see https://www.janestreet.com/puzzles/current-puzzle/.
I know this is very long for a post but it can be interesting.
The Problem from Jane Street (March 2023)
These contests consist of rounds in which each robot has a single attempt to score. In an attempt, a robot speeds down the running track (modeled as the numberline) from 0, the starting line, to 1, the takeoff point. A robot moves along this track by drawing a real number uniformly from [0,1] and adding it to the robot’s current position. After each of these advances, the robot must decide whether to jump or wait. If a robot crosses the takeoff point (at 1) before jumping its attempt receives a score of 0. If the robot jumps before crossing 1, it draws one final real number from [0,1] and adds it to its current position, and this final sum is the score of the attempt.
In a head-to-head contest, the two robots each have a single attempt without knowing the other’s result. In the case that they tie (typically because they both scored 0), that round is discarded and a new round begins. As soon as one robot scores higher than the other on the same round, that robot is declared the winner!
Assume both robots are programmed to optimize their probability of winning and are aware of each other’s strategies. You are just sitting down to watch a match’s very first attempt (of the first round, which may or may not end up being discarded). What is the probability that this attempt scores 0? Give this probability as a decimal rounded to 9 digits past the decimal point.
To quickly solve this problem we notice the 9 digits decimal requirement for the answer, so direct simulation seems not accurate enough as Monte Carlo methods generally has its result’s precision scaling like O(sqrt(N)), where N is the sample size. We want to use as much math first to find patterns.
Here on we use a few abbreviations:
strat ← strategy; w.r.t. ← with respect to;
Structure and Symmetry of the Game
Symmetry of Strategy : As the two robots only see the other’s final score, their choices in the game are not impacted by the particular choices the other robot made within one particular match. Obviously they have the same optimal strategy, due to lack of ordering in their decision making.
Sufficiency of Pure Strategy: There is no point using mixed strategy and let’s explain. The reasonable guess for both robots’ (same) strat is to continue wait until its score is larger than x (0 <= x <= 1), then will switch to jump to lock in the high score.
The reasonable guess for both robots’ (same and optimal) strat is to continue waiting until its score is larger than x (0 <= x <= 1), then will switch to jump one final time to lock in the high score.
This is a pure strategy because there is no probability in the decision making. Once a robot sees its current score it’ll only take one deterministic action (wait or jump). Why can’t mixed strat do better? Suppose we have a mixed strat which is when we assign probability to a few pure strategies and choose them randomly at each step. Because we want to maximize the probability of winning, which is linear w.r.t the prob. of winning of each pure strat, we can’t really do better than always choosing the pure strat with the highest score. OK so we revert back to sticking to the best pure strat.
Solving for X: the Wait→Jump Threshold Score
At the threshold X, the choices of waiting or jumping breaks even, by breaking even we mean either choice gives the same probability of winning for this robot. Think about this:
also let’s denote the overall probability of one robot ends up with a zero-score as p0. Consider when I have score exactly at X and making choices for myself:
Prob of Winning - when I Choose Jump at X
When I chose jump, I’m not gonna end up with zero-score. However, my opponent may have a zero score with a probability of p0, in which case I win definitely. If my opponent have non-zero score, I need a higher score to beat it. So my winning prob. is
In the case that both of us have non-zero scores, my final score S will be:
where U(a, b) is a uniform random number between a and b. My opponent’s final score will be
The meaning is that our opponent will continue to wait until its score is more than X so its score must be more than X. However we also know the opponent is not zero-score, so it cannot overshoot to be larger than 1 at this stage. This gives the U(x,1) term. And after the score surpasses X, our opponent will jump as well giving the final U(0,1) term.
To calculate probability S > S’ can be tricky though if you want to directly calculate pdf (probability density function) of S and S’. A cool trick is to use graph, which is generally the best way to compute such problems with Uniformish pdf.
Finally we have (Thanks to our kind reader Giovanni’s correction!)
thus we have all the terms needed for this winning prob when I jump at score X.
Prob of Winning - when I Choose Wait at X
Still we can first examine all the cases where my opponent and I ended up waiting too much so we had a zero score.
Case B, C are obvious. When Case A happens we have to rerun the game and because our symmetry of strats, both of us will win with equal probability one half. And in Case D, when both of us are non-zero, both of us have exactly the score of S’ that was calculated above. So to sum up
where the four terms corresponds to Case A→D. My probability of getting a zero score is exactly X because when I wait at X, I have a probability of (1-X) from the Random Uniform U(0, 1) to get an additional score while the sum of score is still below 1, so my prob. of blowing up is 1 - (1-X) = X.
By solving
we have one equation, but we have two variables X and p0.
Final Equation: Relate p0 and X.
Now let’s forget about the whole game theory argument and simply calculate the p0, the probability of waiting too much and blow up with a score above 1. Define a function
By examining the state-transition properties,
To understand the formula, simply recognize that with a current score of s, I can either blow up in my next wait, with a probability of s, or not blow up and transit into the next score s’ and then blow up from that new state. (Note pdf of random uniform U(0,1) is just 1.) So really my probability of eventually blowing up is the sum of all these possibilities. We differentiate this equation to get
The boundary condition for this ODE is tricky tho because of the discontinuity at (s = x). To tackle this we need to look at where the ODE still holds, note when the score is slightly less than X (at the left limit), the probability of blowing up is really just at this round’s wait and if I don’t blow up right now I’ll simply jump next time (because my score is now between x and 1), so there is no complicated state transition:
Thus solving the ODE gives:
and our p0 at the beginning is simply
Numerical Solution and Conclusion
With the two equations of p0 and X, we can solve X to 9 decimals points and p0 ~ 0.11, and I won’t give the exact number because the puzzle challenge is not yet concluded.
If you make it this far, Consider subscribe! I’m going to put a lot of hardcore contents about the math / techs that I’m excited about and share it with you. I’m looking for quant dev jobs in the HFT space so if you are looking for a job too or hiring, let me know! Let’s all make this a new community and conquer the world together!
Thanks for the post, its very well written and I had no problem understanding it. Only thing, isn't P(S>S') = 1/6(x^2+x+1) rather than 1 - 1/6(x^2+x+1)?
An alternative way to relate p0 and X.
Prob(Eventually not blow up when the current score = 0) = \sum_{n = 0}^{\infty} Prob(Land in [X,1] after the n'th decision)
Prob(Land in [X,1] after the n'th decision) = Prob(Land in [0,X] after the (n-1)'th decision)\times Prob(U(0,1)\in [X,1])
According to the PDF of Irwin–Hall distribution,
Prob(Land in [0,X] after the (n-1)'th decision) = \int_{y=0}^X \frac{y^{n-1}}{(n-1)!} = \frac{X^n}{n!}
Therefore
Prob(Land in [0,X] after the (n-1)'th decision)\times Prob(U(0,1)\in [X,1]) = (1-X)\frac{X^n}{n!}
At last,
Prob(Eventually not blow up when the current score = 0) = \sum_{n = 0}^{\infty} Prob(Land in [X,1] after the n'th decision) = \sum_{n = 1}^{\infty}(1-X)\frac{X^n}{n!}=(1-X)e^X
Prob(Eventually blow up when the current score = 0) = 1-(1-X)e^X