Enter your baseline conversion rate and the smallest improvement worth catching, and this works out how many visitors each variation needs before the test can reliably call a winner. A common reason A/B results fail to hold up is simple: the test was stopped before it had enough visitors to detect the effect.
How many visitors per variation do I need?
The minimum detectable effect (MDE) is the smallest relative change you care about. Detecting a 2% lift takes far more traffic than detecting a 20% lift, because small differences are harder to tell apart from random noise. The required sample size grows roughly with the inverse square of the effect, so a lift ten times smaller needs on the order of a hundred times as many visitors. Set the MDE to the smallest improvement that would actually change your decision, then commit to that sample size before you start.
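To make the relationship between MDE and traffic concrete, here is a minimal Python sketch of one standard way to estimate visitors per variation: the normal-approximation formula for comparing two proportions. It is not necessarily the exact formula used here. The function name `visitors_per_variation`, the two-sided 5% significance level, and the 80% power default are assumptions chosen for illustration.

```python
from math import ceil
from statistics import NormalDist


def visitors_per_variation(baseline, relative_mde, alpha=0.05, power=0.80):
    """Approximate visitors needed in EACH variation.

    baseline:     current conversion rate as a fraction, e.g. 0.05 for 5%
    relative_mde: smallest relative lift worth detecting, e.g. 0.20 for +20%
    alpha, power: assumed defaults (two-sided 5% significance, 80% power)
    """
    if not 0 < baseline < 1:
        raise ValueError("baseline must be between 0 and 1")
    if relative_mde <= 0:
        raise ValueError("relative_mde must be positive")

    p1 = baseline
    p2 = baseline * (1 + relative_mde)  # conversion rate if the lift is real
    if p2 >= 1:
        raise ValueError("baseline * (1 + relative_mde) must stay below 1")

    # Standard normal quantiles for the significance level and the power.
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)

    # Sum of the two binomial variances, divided by the squared absolute gap.
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2
    return ceil(n)


if __name__ == "__main__":
    # Compare a small and a large MDE at the same 5% baseline.
    for mde in (0.02, 0.20):
        n = visitors_per_variation(0.05, mde)
        print(f"MDE {mde:.0%}: about {n:,} visitors per variation")
```

Under these assumptions, a 5% baseline with a 20% MDE comes out at roughly eight thousand visitors per variation, while a 2% MDE needs several hundred thousand. Other calculators may use a pooled-variance or continuity-corrected variant of the formula, so expect their results to differ somewhat.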