Difference between A/B Testing, (Non-)Experiment, & Experimentation
(Image credit: https://www.behance.net/)

Key Summary

  • Oftentimes there is confusion among different roles in a project group when someone says, "Let's run an experiment for this project." This confusion comes, in part, from the jargon different roles are accustomed to.
  • As an umbrella term, "experimentation" can either mean "experiment" as a specific research methodology, or convey the generic connotation of "let's test it," akin to a pilot, exploratory study, initial investigation, etc.
  • An "experiment" is a specific research methodology and scientific process that validates a hypothesis and seeks causal relations between variables. An experiment can have multiple study groups (a.k.a. variants) depending on the experiment design. If needed, a control group can be included, which often serves as a baseline for comparison. "A/B testing" can be considered a simple type of controlled experiment.
  • Not every user research question requires an "experiment" or "A/B testing". Other non-experiment approaches, including qualitative methods, can provide critical insights, too. There is never the best methodology, only the most appropriate methodology.

Background

In most project working groups, people come from different backgrounds with various expertise (what a blessing!). Since people are trained differently, both professionally and academically, there is often a misalignment when someone suggests a particular methodology. A/B testing vs. experiment vs. experimentation is perhaps one of the most notable examples. This note serves as a brief introduction to, and clarification of, three terminologies that are often used interchangeably, leading to confusion during project planning.

The focus of this note is user experience research and related areas, a realm where many teams actively pursue solutions, programs, and tools to enhance users' experience, learning, or motivation to take an action the way a research team expects. I will also briefly mention alternative methodologies, both quantitative and qualitative, which can be used to answer project questions even though they are not considered the "gold standard" - a title that a randomized controlled trial (i.e. experimental design) often earns.

Two-Fold Meaning of Experimentation

We can think of the word experimentation in two ways: (1) a general idea of "let's test it" and (2) a scientific methodology of conducting an experiment by fulfilling a set of rigorous steps to derive a conclusion. Sometimes the word experimentation is used without clarifying which of the two is intended. The outcome can be a delay in determining a method to answer the business question in scope, or confusion about the requirements for running a study to make a business decision.

If we don't know which method to apply to answer a business question of interest, it is good practice to say, "for this question, we need to test it" instead of "let's experiment on it". In other words, it is safer to say, "we need to set up a study plan/research plan for this question", rather than "we need an experimentation to answer this question", especially before a consensus is reached that an experimental design is the best way to address the business question. Now we know the word experimentation can be either generic or methodological. So what is an experiment?

Difference between Experiment & A/B Testing

An experiment centers on hypothesis testing, which hinges on a rigorous and scientific procedure: deriving a research question from phenomena, setting up hypotheses, estimating sample size, randomizing group assignment, collecting and analyzing data, reporting and reviewing results, etc. A controlled experiment (a.k.a. controlled trial) is a common type of experiment. Its purpose is to test the effect(s) of an intended independent variable (e.g. a new tool, program, design, or intervention) by comparing results between treatment group(s) and a control group. The ultimate objective of an experiment is to identify causality (if any) by manipulating a specific variable or factor and examining its associated outcome.
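
To make the sample-size step concrete, below is a minimal sketch in Python of the standard two-proportion power calculation; the baseline rate, target rate, alpha, and power values are hypothetical placeholders.

from math import ceil
from statistics import NormalDist

def sample_size_per_group(p1, p2, alpha=0.05, power=0.80):
    # Approximate users needed per group to detect a change from rate p1
    # to rate p2 with a two-sided, two-proportion z-test
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # ~0.84 for power = 0.80
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)

# Hypothetical goal: detect a lift from a 10% to a 12% conversion rate
print(sample_size_per_group(0.10, 0.12))  # roughly 3,800+ users per group

Note how quickly the required sample grows as the detectable difference shrinks; this is one reason the sample-size step comes early in the procedure.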

An experiment can have multiple treatment groups with different variants of an intervention, or it can have a single treatment group and a single control group, which we call A/B testing. Essentially, A/B testing is the simplest type of controlled experiment (see image below). When conducting an A/B test, as with any experiment, the objective is to identify the effect(s) of an intervention by comparing results from a treatment group (i.e. those who receive the intervention) against those from a control group (i.e. those who don't).

(Image: a controlled experiment design)
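
To illustrate how the treatment-vs.-control comparison is typically quantified for a binary metric such as conversion, here is a minimal sketch of a two-proportion z-test; all counts are hypothetical.

from math import sqrt
from statistics import NormalDist

# Hypothetical A/B test results: conversions out of users exposed
control_conv, control_n = 420, 5000      # control group (no intervention)
treatment_conv, treatment_n = 480, 5000  # treatment group (receives intervention)

p_c, p_t = control_conv / control_n, treatment_conv / treatment_n
p_pool = (control_conv + treatment_conv) / (control_n + treatment_n)

# Two-proportion z-test under the null hypothesis of "no difference"
se = sqrt(p_pool * (1 - p_pool) * (1 / control_n + 1 / treatment_n))
z = (p_t - p_c) / se
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"lift = {p_t - p_c:.3%}, z = {z:.2f}, p = {p_value:.4f}")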

A key for an experiment is to control the known and randomize the unknown factors. The aim is to keep the known variables constant between the control and treatment group(s), so that any difference in observed outcome between the two sides can be correctly attributed to the manipulated variable (i.e. the intervention). In social sciences, where humans are often the subjects we study, there are very likely other unknown factors influencing the outcome beyond our intended intervention and the known factors we've controlled for. After all, we are not just testing different chemical products in a science lab when we run an experiment in a social setting. Therefore, correct sampling and randomized group assignment to counter potential biases or confounds are critical.
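
As a minimal sketch of "randomize the unknown", the snippet below assigns made-up user IDs to variants at random, so that unknown factors are balanced across groups in expectation.

import random

def assign_variants(user_ids, variants=("control", "treatment"), seed=42):
    # A fixed seed makes the random assignment reproducible for auditing
    rng = random.Random(seed)
    return {uid: rng.choice(variants) for uid in user_ids}

assignments = assign_variants([f"user_{i}" for i in range(10)])
print(assignments)  # e.g. {'user_0': 'treatment', 'user_1': 'control', ...}

In production experimentation systems, a deterministic hash of the user ID is often used instead of a random draw, so that a returning user always lands in the same variant.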

Experiment design fills entire textbooks and multi-semester university classes. It also requires much practical experience to know how to accommodate various (un)expected factors and/or confounds given limited resources and time. Although a controlled experiment is often labeled the "gold standard" for its rigorous planning and the robustness of its results, there are other alternatives a project team can consider. Not every question requires an experiment (or A/B testing); sometimes non-experimental methods can answer a question more directly, quickly, and economically, especially in a business setting.

What Are Non-Experiment Methods?

Aside from a controlled experiment or A/B testing, there are other alternatives with different tradeoffs between scientific rigor and logistical expense. At the risk of oversimplification, I list some alternatives below along with a brief explanation:

  • Pre- & post-test design: A single- or multiple-group design with a measure taken before and after an intervention is implemented. The goal is to identify a change (a.k.a. delta) in a targeted metric due to the intervention (a minimal sketch of this design appears after this list). It is critical to ensure the measures (e.g. questionnaire, test) before and after the intervention are comparable in terms of the nature, difficulty, format, etc. of the questions. It is also important to ensure that higher performance on the post-test is not due to familiarity, or because participants can guess the answers based on the pre-test.
  • Quasi-experiment: The prefix "quasi-" does not signal a fake experiment. It conveys that the design is close to an experiment; the main difference is that it is not a "controlled" experiment. A common reason to run a quasi-experiment is that it is unethical or infeasible to determine who receives an intervention and who does not. It is still viable to compare study groups and estimate the effects of an intervention with post-study statistical methods.
  • Observational data analysis: Observational data pertains to data gathered about participants when they are not arranged to react to a planned intervention (as they are in an experiment). For instance, log data or historical data with a variety of variables can serve as observational data. There are different statistical approaches one can use to examine the effect of a variable based on facts that have already occurred. If we have historical data logging people's reactions to a new/old product, we can code the use of the new/old product as a variable and statistically check its impact on the target variable we are interested in.
  • Mixed methods design: A mixed methods design combines quantitative and qualitative analysis. This approach lets researchers tap into two data perspectives for both the breadth and depth needed to answer a research question. Common approaches that actualize a mixed methods design include nesting an interview in an experiment, or running a focus group before implementing a survey.
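
As an illustration of the pre- & post-test design sketched in the list above, here is a minimal paired comparison; the scores are made up, and scipy's paired t-test is just one common choice of analysis.

from scipy.stats import ttest_rel

# Hypothetical scores for the same eight participants, before and after
pre_scores  = [62, 71, 58, 66, 73, 60, 69, 64]
post_scores = [68, 74, 63, 70, 75, 66, 72, 67]

deltas = [post - pre for pre, post in zip(pre_scores, post_scores)]
print(f"mean delta = {sum(deltas) / len(deltas):.1f} points")

result = ttest_rel(post_scores, pre_scores)  # paired test: same subjects twice
print(f"t = {result.statistic:.2f}, p = {result.pvalue:.4f}")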

There are many other alternatives to an experiment, even in the qualitative realm alone. Qualitative methods such as focus groups, interviews, user diaries, and field research that shadows user behavior can often offer good insights at a lower logistical cost. On the quantitative side, if good-quality historical data relevant to a research question is available, predictive analytics, statistical simulations, and machine learning methods can also be useful.
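
To make the observational approach from the list above concrete, here is a minimal regression sketch; the data frame and column names (used_new_product, tenure_months, engagement_score) are hypothetical.

import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical log data: one row per user
df = pd.DataFrame({
    "used_new_product": [1, 0, 1, 1, 0, 0, 1, 0, 1, 0],
    "tenure_months":    [3, 12, 5, 8, 24, 18, 2, 30, 6, 15],
    "engagement_score": [7.2, 5.1, 6.8, 7.5, 5.9, 5.4, 7.0, 5.0, 6.9, 5.6],
})

# OLS with a covariate; the coefficient on used_new_product estimates its
# association with engagement, controlling for tenure
model = smf.ols("engagement_score ~ used_new_product + tenure_months", data=df).fit()
print(model.summary())

Because group membership was not randomized here, the coefficient is an association, not proof of causality; that caveat is exactly what separates this method from a controlled experiment.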

There is never the best methodology, only the most appropriate methodology. In other words, the best methodology is the one that can answer your research question the best.

A Successful Experiment? Not Always about Statistical Significance

Sometimes we get excited about statistical significance or a large positive delta coming from a new design/product. That's great! But we should also be aware that statistical significance alone does not make an experiment successful.

Depending on the actual question about the metrics in a study, sometimes we may not want to see statistical significance. For instance, if we deploy a new product design that features a lower operational cost and is easier to maintain, and our hope is that it won't cause a change in user behavior, a statistically significant difference is not an ideal outcome. Even if statistical significance is what a project team is pursuing, we should be mindful not to equate it with the success of an experiment.
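
For that "we hope nothing changes" scenario, one common approach (equivalence testing, not named above) flips the question: instead of hunting for significance, check whether the confidence interval of the difference sits inside a pre-set practical tolerance. A minimal sketch with hypothetical numbers:

from math import sqrt
from statistics import NormalDist

# Hypothetical results for the old design vs. the cheaper new design
old_conv, old_n = 1010, 10000
new_conv, new_n = 995, 10000
margin = 0.01  # pre-registered tolerance: differences within 1 point count as "no change"

p_old, p_new = old_conv / old_n, new_conv / new_n
diff = p_new - p_old
se = sqrt(p_old * (1 - p_old) / old_n + p_new * (1 - p_new) / new_n)

z = NormalDist().inv_cdf(0.975)  # 95% confidence interval
lo, hi = diff - z * se, diff + z * se
print(f"diff = {diff:.3%}, 95% CI = [{lo:.3%}, {hi:.3%}]")
print("within tolerance" if (-margin < lo and hi < margin) else "inconclusive or different")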

A successful experiment is one that gives researchers and stakeholders robust and actionable insights for making decisions about a design/product, rather than merely one that shows statistical significance. The positive effects (if any) of a new design or product come from the design/product itself. An experimenter's main job is to measure the hypothesized positive effects; an experiment on its own will not change the nature of that design/product. Of course, an experimenter can participate in product development before running an experiment, but it is important that this engagement does not bias the experiment design and results.

If we only want to see what we want to see, or if we are 100% sure about what we will see, we do not need an experiment. A successful experiment does not always give people what they are expecting, but it will always inform them what they should do next!


(Opinions expressed in this article are solely my own and do not express the views or opinions of my employer) 
