Seed hacking · ↗ arxiv.org

Optimizing the randomness out of your results

Jun 24, 2026 · 2 min read

By now, p-hacking is a familiar research vice. It can be done with varying degrees of intent, from the naive beginner poking around in their data to the celebrity professor who makes a career of it. Basically, p-hacking occurs when researchers turn their datasets upside down and shake them until p < 0.05 falls out. This is generally accomplished by slicing the data in different ways, adding and removing variables, and running different kinds of tests until statistical significance is achieved. Usually it is accompanied by a healthy dose of selective reporting and HARKing (Hypothesizing After the Results are Known).

Less often discussed is seed hacking (or seed optimization/selection/scanning), as described in the humorously titled paper “torch.manual_seed(3407) is all you need: On the influence of random seeds in deep learning architectures for computer vision” by David Picard. Picard trained models on two of the classic datasets in computer vision (CIFAR and ImageNet), varying only the seed between training runs. He found that while overall variability between seeds was low, he could find “lucky” seeds that produced increases in validation accuracy large enough to count as important improvements in the research community, despite being entirely due to luck. He concludes:

I am definitely not saying that all recent publications in computer vision are the result of lucky seed optimization. This is clearly not the case, these methods work. However, in the light of this short study, I am inclined to believe that many results are overstated due to implicit seed selection - be it from common experimental practice of trial and error or of the “evolutionary pressure” that peer review exerts on them.

He further recommends that researchers rigorously investigate the impact of random seed on their results by varying seeds/dataset splits over a large number of training runs and reporting summary statistics on the distribution of validation metrics.
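As a rough illustration of that recommendation, here is a minimal Python sketch that repeats a run over many seeds and reports the distribution rather than a single number. The function `train_and_evaluate` is a hypothetical stand-in (it simulates a validation accuracy around 0.90 with a small seed-dependent wobble); in real use it would train a model with the given seed and return the measured metric. None of the numbers come from Picard's paper.

```python
import random
import statistics


def train_and_evaluate(seed: int) -> float:
    """Hypothetical stand-in for a full training run.

    Assumption: simulates a validation accuracy of about 0.90 with small
    seed-dependent noise. Replace with real training and evaluation.
    """
    rng = random.Random(seed)
    return 0.90 + rng.gauss(0.0, 0.005)


def summarize_over_seeds(seeds) -> dict:
    """Run one training per seed and summarize the resulting metric."""
    scores = [train_and_evaluate(s) for s in seeds]
    return {
        "n": len(scores),
        "mean": statistics.mean(scores),
        "std": statistics.stdev(scores),  # sample standard deviation
        "min": min(scores),
        "max": max(scores),
    }


if __name__ == "__main__":
    summary = summarize_over_seeds(range(50))
    for name, value in summary.items():
        print(f"{name}: {value:.4f}" if isinstance(value, float) else f"{name}: {value}")
```

Reporting the mean and spread alongside the best run lets a reader judge whether a claimed improvement is larger than the seed-to-seed noise.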

When reading code attached to papers, I will occasionally wonder where the authors got their random seeds from. Myself, I usually pick from a handful of memorable values (1, 12345, 42, 33) or just the date I created the script. But I suppose some of these seeds were probably selected by authors treating the random seed as just another hyperparameter to optimize through a grid search.
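To make the "seed as hyperparameter" idea concrete, here is a small simulated sketch of what a grid search over seeds amounts to. The function `validation_accuracy` is hypothetical: it fakes a model whose underlying accuracy is 0.90 regardless of seed, plus noise. Any gain from picking the best seed is therefore pure selection, not a better method.

```python
import random


def validation_accuracy(seed: int) -> float:
    """Hypothetical metric: identical model quality (0.90) plus seed noise."""
    rng = random.Random(seed)
    return 0.90 + rng.gauss(0.0, 0.005)


def best_seed(candidate_seeds) -> int:
    """'Grid search' over seeds: keep whichever one scores highest."""
    return max(candidate_seeds, key=validation_accuracy)


if __name__ == "__main__":
    seeds = list(range(1000))
    scores = [validation_accuracy(s) for s in seeds]
    lucky = best_seed(seeds)
    mean_acc = sum(scores) / len(scores)
    gain = validation_accuracy(lucky) - mean_acc
    print(f"mean accuracy over seeds: {mean_acc:.4f}")
    print(f"lucky seed {lucky}: {validation_accuracy(lucky):.4f} (+{gain:.4f})")
```

The more seeds are scanned, the larger the apparent gain tends to be, even though nothing about the model has changed.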

Hat tip to Mohammed Hamdy on LinkedIn.