Iterative Product Delivery: A Hypothesis Testing Guide

Nov 24, 2021

A glance at any phone’s app store shows that most installed applications have received an update within the previous week. Today’s software is released in iterations to test assumptions and hypotheses about improving the user experience.

At any given moment, large organizations may be running hundreds of A/B tests on their websites.

For web-based apps, there is no need to decide on the design of a product 12-18 months in advance, construct it, and then deploy it. Instead, it is perfectly feasible to release incremental changes that add value to users as they are implemented. Because each assumption and hypothesis can be tested independently, the team does not have to guess at user preferences and ideal solutions.

Along with continuously delivering value through enhancements, this method lets a product team receive constant customer feedback and make the necessary corrections. Creating and testing hypotheses every couple of weeks is a more cost-effective and time-efficient way to build a self-correcting, iterative approach to product value creation.

How Is Hypothesis Testing Performed?

When delivering a product to consumers, verifying design and feature assumptions is critical to understanding their effect in the real world.

Traditionally, this validation has been accomplished with product hypothesis testing, in which the experimenter states a hypothesis for a change and defines what success looks like. For example, if an Amazon data product manager hypothesizes that increasing the size of product photos will increase conversion rates, success is defined as an increase in conversion rates.

One of the critical components of hypothesis testing is isolating variables within the product experience so that success (or failure) can be attributed to the changes made. Therefore, if the Amazon product manager had a second hypothesis, that displaying customer reviews alongside product photos would increase conversion, the two hypotheses could not be tested together as a single change. Doing so would make it impossible to tell which change caused the effect; hence, the two modifications must be separated and assessed independently.

Thus, product choices about features should be accompanied by hypothesis testing to verify the features’ performance.

Hypothesis Testing Techniques

A/B Testing

Randomized A/B testing, in which a modification or feature is delivered to one half of users (A) at random and withheld from the other half (B), can verify the most common use cases. Returning to the hypothesis that larger product photos improve Amazon conversion, half of the customers will see the change, while the other half will view the website as it was before. The conversions of each group (A and B) are then measured and compared. If the group viewing larger product photos shows a substantial increase in conversion, the conclusion is that the initial hypothesis was valid, and the change can be rolled out to all customers.
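
As a rough illustration of the split described above, the following Python sketch assigns users to groups and compares conversion. The function names, the experiment label, and the choice of hashing the user ID (rather than drawing a random number at request time) are assumptions made for this example, not part of any particular tool.

```python
import hashlib


def assign_group(user_id: str, experiment: str = "larger_photos") -> str:
    """Place a user in group A (sees the change) or B (control).

    Hashing the experiment name with the user ID gives a split that is
    effectively random across users but stable for any single user.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return "A" if int(digest, 16) % 2 == 0 else "B"


def conversion_rate(conversions: int, visitors: int) -> float:
    """Share of visitors who converted (0.0 when there were no visitors)."""
    return conversions / visitors if visitors else 0.0


def relative_lift(rate_a: float, rate_b: float) -> float:
    """Relative improvement of group A over control group B.

    Assumes the control rate is non-zero.
    """
    return (rate_a - rate_b) / rate_b
```

A user keeps the same group on every visit, so the experience stays consistent for the duration of the test. Whether an observed lift is large enough to act on is a separate statistical question.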

Multivariate Testing

Each variable should ideally be separated and examined independently to attribute changes convincingly. However, such a sequential testing method may be time-consuming, particularly when many versions are being tested. To continue with the example, in testing the hypothesis that larger product photos result in greater conversion rates on Amazon, the term “bigger” is subjective, and several variants of “bigger” (e.g., 1.1x, 1.3x, and 1.5x) may need to be tested.

Instead of evaluating such scenarios sequentially, a multivariate test may be used, in which users are divided among several variants rather than split in half. For example, with four groups (A, B, C, and D) that each represent 25% of users, group A will experience no change, while groups B, C, and D will see pictures 1.1x, 1.3x, and 1.5x larger, respectively. This test compares several variations against the current version of the product to determine the best variant.
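
The four-way split can be sketched in Python as follows. The names `VARIANTS`, `assign_variant`, and `lifts_vs_control`, as well as the experiment label, are hypothetical and chosen only for illustration; the scale factors are the ones from the example above.

```python
import hashlib

# Image scale factor per group; group A is the unchanged control.
VARIANTS = {"A": 1.0, "B": 1.1, "C": 1.3, "D": 1.5}


def assign_variant(user_id: str, experiment: str = "photo_size") -> str:
    """Spread users evenly (about 25% each) across the four groups."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    names = sorted(VARIANTS)
    return names[int(digest, 16) % len(names)]


def lifts_vs_control(results: dict, control: str = "A") -> dict:
    """Absolute lift in conversion rate of each variant over the control.

    `results` maps a group name to a (conversions, visitors) pair.
    """
    control_conversions, control_visitors = results[control]
    base_rate = control_conversions / control_visitors
    return {
        name: conversions / visitors - base_rate
        for name, (conversions, visitors) in results.items()
        if name != control
    }
```

Given made-up counts such as `{"A": (300, 10000), "B": (310, 10000), "C": (345, 10000), "D": (320, 10000)}`, `max(lifts, key=lifts.get)` applied to the returned dictionary picks out the variant with the largest lift. Note that comparing several variants against one control raises the chance of a false positive, so a stricter significance threshold is usually applied.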

Before/After Testing

Because of network effects, it is not always practical to divide users in half (or into several variants). For example, if the test evaluates whether one logic for calculating surge fares on Uber is superior to another, the drivers cannot be split into separate groups, since the logic considers the whole city’s demand and supply mismatch. In such instances, the test needs to compare outcomes before and after the change to reach a conclusion.

The limitation here is the inability to separate out the effects of seasonality and external factors, which might influence the test and control periods differently. Assume that at time t, a change is made to the logic that determines surge pricing on Uber, such that logic A is used before and logic B is used after. While the outcomes can be compared before and after time t, there is no certainty that the differences are purely attributable to the change in logic. A shift in demand or other factors might also have produced a difference between the two time periods.

Time-based On/Off Testing

While before/after testing has several drawbacks, time-based on/off testing can mitigate many of them. It involves introducing a change to all users for a specific amount of time, turning it off for an equal amount of time, and then repeating the process over an extended period.

In the Uber use case, the change may be shown to drivers on Monday, withdrawn on Tuesday, shown again on Wednesday, and so on.

While this strategy does not eliminate the effects of seasonality and external factors, it does lessen them considerably, making the results of such tests more robust.
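
The alternating schedule can be expressed in a few lines of Python. This is a minimal sketch under assumed names (`is_change_on`, `on_off_means`) and an assumed data shape (one metric value per calendar day); a real analysis would also account for day-of-week effects when choosing the block length.

```python
from datetime import date, timedelta
from statistics import mean


def is_change_on(day: date, start: date, block_days: int = 1) -> bool:
    """True when the change is active on `day`.

    The change is on for `block_days`, off for `block_days`, and so on,
    beginning with an "on" block at `start`.
    """
    elapsed = (day - start).days
    return (elapsed // block_days) % 2 == 0


def on_off_means(daily_metric: dict, start: date, block_days: int = 1) -> tuple:
    """Average the daily metric separately over "on" days and "off" days.

    `daily_metric` maps a date to that day's value of the success metric
    and must contain at least one day of each kind.
    """
    on_values = [v for d, v in daily_metric.items() if is_change_on(d, start, block_days)]
    off_values = [v for d, v in daily_metric.items() if not is_change_on(d, start, block_days)]
    return mean(on_values), mean(off_values)
```

With `start` set to a Monday and `block_days=1`, the change is active on Monday, inactive on Tuesday, active again on Wednesday, and so on, which matches the schedule described above.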

Test Design

Choosing the correct test for the use case at hand is a critical step in confirming a hypothesis as quickly and robustly as possible. Once the type of test has been chosen, the details of the test design can be specified.

The test design is essentially a well-organized overview of:

  • Hypothesis to be tested: Showing customers larger product photos will encourage them to buy more.
  • Success metric: Customer conversion
  • Decision criterion: Users in the variation have a greater conversion rate than those in the control group.
  • Metrics to be instrumented: Customer conversion, product image clicks

In the hypothesis that larger product photos would result in higher conversion rates on Amazon, the success measure is conversion, and the decision criterion is an increase in conversion.
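
One way to keep such a test design explicit is to record it as a small data structure before the test starts. The sketch below is only an illustration: the class name `ExperimentDesign`, its fields, and the `min_lift` threshold are assumptions introduced here, not a format prescribed by any testing tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentDesign:
    """A written-down test design, fixed before any results are seen."""

    hypothesis: str
    success_metric: str
    instrumented_metrics: tuple
    min_lift: float = 0.0  # assumed threshold: smallest lift worth acting on

    def should_roll_out(self, control_rate: float, variant_rate: float) -> bool:
        """Decision criterion: the variant beats control by more than min_lift."""
        return variant_rate - control_rate > self.min_lift


design = ExperimentDesign(
    hypothesis="Larger product photos encourage customers to buy more",
    success_metric="customer_conversion",
    instrumented_metrics=("customer_conversion", "product_image_clicks"),
)
```

Freezing the dataclass reflects the idea that the decision criterion should not be adjusted after the results come in. In practice the criterion would also require the difference to be statistically significant, not merely positive.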

Once the appropriate test has been selected and designed, and the success criteria and metrics have been identified, the results must be evaluated. Some statistical concepts are required to do this.


When running tests, it is critical to verify that the two chosen groups (A and B) are not biased with respect to the success metric. For example, if the group that sees the larger pictures already had a greater conversion rate than the group that does not see the change, the test is biased and may lead to incorrect conclusions.

To confirm that there is no sample bias, the mean and variance of the success metric can be compared across the groups before the change is implemented.
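
A pre-change comparison of this kind might look like the following Python sketch. The function names and the 5% tolerance are assumptions for illustration; the inputs are taken to be per-day conversion rates for each group, collected before the change goes live.

```python
from statistics import mean, variance


def pre_period_summary(values: list) -> dict:
    """Mean and sample variance of the success metric before the change."""
    return {"mean": mean(values), "variance": variance(values)}


def looks_balanced(group_a: list, group_b: list, tolerance: float = 0.05) -> bool:
    """True when the pre-change means differ by at most `tolerance`.

    The difference is measured relative to the larger of the two means.
    """
    mean_a, mean_b = mean(group_a), mean(group_b)
    largest = max(abs(mean_a), abs(mean_b))
    if largest == 0:
        return True  # both groups have a mean of zero
    return abs(mean_a - mean_b) / largest <= tolerance
```

If the check fails, the usual remedy is to re-randomize the groups rather than to correct for the gap after the fact.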

Power and Significance

Once a difference between the two variations is observed, it is critical to establish that it represents a genuine effect rather than random chance. This can be done by calculating the statistical significance of the change in the success metric.

In layman’s terms, significance quantifies how often a test would show that larger pictures increase conversion when in fact they do not. Power indicates how often the test would show that larger pictures increase conversion when in fact they do.

Thus, tests must have high power and a low significance level to provide more accurate findings.
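
To make these two ideas concrete, the sketch below uses a pooled two-proportion z-test, one common choice for comparing conversion rates, together with a standard normal-approximation formula for sample size. Both function names are hypothetical, and the defaults of a 5% significance level and 80% power are conventional assumptions rather than values taken from this article.

```python
from math import ceil, sqrt
from statistics import NormalDist


def two_proportion_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value for the difference between two conversion rates."""
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        return 1.0  # no variation at all, so no evidence of a difference
    z = (rate_a - rate_b) / std_err
    return 2 * (1 - NormalDist().cdf(abs(z)))


def sample_size_per_group(base_rate: float, min_lift: float,
                          alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate users needed per group to detect an absolute lift.

    `alpha` is the significance level (false-positive rate) and `power`
    is the chance of detecting a lift of `min_lift` when it is real.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    new_rate = base_rate + min_lift
    spread = base_rate * (1 - base_rate) + new_rate * (1 - new_rate)
    return ceil((z_alpha + z_power) ** 2 * spread / min_lift ** 2)
```

A p-value below the chosen significance level suggests the observed difference is unlikely to be due to chance alone. The sample-size function shows the trade-off directly: demanding higher power or a lower significance level increases the number of users each group needs.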

While a detailed examination of the statistical principles associated with product hypothesis testing is beyond the scope of this article, the following steps are recommended for learning more:

  • Involve data engineers and data analysts early in the process, since they are generally good at selecting the right test designs and can support product managers.
  • Udemy, Udacity, and Coursera, for example, offer many online courses on hypothesis testing, A/B testing, and the relevant statistical concepts.
  • Tools such as Google’s Firebase and Optimizely, which have many built-in features for running the right tests, can help the process go more smoothly.

Using Hypothesis Testing to Improve Product Management

To continually provide value to users, it is necessary to test numerous hypotheses, and many methods of product hypothesis testing are available for doing so. To confirm or invalidate a hypothesis, it must be accompanied by a test design, as described above.

This method aids in quantifying the value provided by new modifications and features, focusing on the essential elements, and delivering incremental iterations.

