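I developed tea-tasting, a Python package for the statistical analysis of A/B tests. It features statistical methods such as Student's t-test, Bootstrap, variance reduction with CUPED, and power analysis out of the box; support for a wide range of data backends through Ibis; an extensible API for custom metrics; and detailed documentation.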
In this blog post, I explore each of these advantages of using tea-tasting in the analysis of experiments.
If you are eager to try it, check the documentation.
tea-tasting includes statistical methods and techniques that cover most of what you might need in the analysis of experiments.
Analyze metric averages and proportions with Student's t-test and the Z-test. Or use Bootstrap to analyze any other statistic of your choice. There is also a predefined method for the analysis of quantiles using Bootstrap. tea-tasting also detects sample ratio mismatch, that is, a discrepancy between the observed and expected proportions of units in the variants of an A/B test.
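A sample ratio mismatch check can be sketched as a binomial test on the number of units per variant. This is a minimal sketch using SciPy's `binomtest` and is not tea-tasting's API. The function name `sample_ratio_pvalue` is hypothetical, and tea-tasting's own check may use a different test or different defaults.

```python
from scipy.stats import binomtest


def sample_ratio_pvalue(
    n_control: int,
    n_treatment: int,
    expected_ratio: float = 1.0,
) -> float:
    """Two-sided binomial test for a sample ratio mismatch (hypothetical helper).

    expected_ratio is the expected treatment-to-control size ratio,
    e.g. 1.0 for a 50/50 split.
    """
    # Under the null hypothesis, each unit lands in treatment with this probability.
    p_treatment = expected_ratio / (1 + expected_ratio)
    result = binomtest(n_treatment, n_control + n_treatment, p_treatment)
    return float(result.pvalue)


print(sample_ratio_pvalue(5000, 5000))  # 1.0
print(sample_ratio_pvalue(5000, 5400) < 0.001)  # True
```

A small p-value suggests that the assignment or logging is broken, so the metric results should not be trusted until the cause is found.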
tea-tasting applies the delta method to analyze ratios of averages, for example, the average number of orders per average number of sessions, assuming that a session is not a randomization unit.
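The following NumPy sketch shows the delta method for a ratio of means. It assumes that `x` and `y` hold per-user totals, for example orders and sessions per user. The function `ratio_of_means` is hypothetical and is not part of tea-tasting. It only illustrates the first-order Taylor approximation that the delta method relies on.

```python
import numpy as np


def ratio_of_means(x: np.ndarray, y: np.ndarray) -> tuple[float, float]:
    """Return mean(x) / mean(y) and its delta-method variance (hypothetical helper).

    x and y are per-randomization-unit totals, e.g. orders and sessions per user.
    """
    n = len(x)
    mean_x, mean_y = x.mean(), y.mean()
    var_x, var_y = x.var(ddof=1), y.var(ddof=1)
    cov_xy = np.cov(x, y, ddof=1)[0, 1]
    ratio = mean_x / mean_y
    # First-order Taylor expansion of f(mean_x, mean_y) = mean_x / mean_y.
    # The covariance term is why ratio metrics need more than per-column variances.
    var_ratio = (
        var_x / mean_y**2
        - 2 * mean_x * cov_xy / mean_y**3
        + mean_x**2 * var_y / mean_y**4
    ) / n
    return float(ratio), float(var_ratio)


orders = np.array([1.0, 0.0, 2.0, 1.0])
sessions = np.array([2.0, 1.0, 3.0, 2.0])
ratio, var_ratio = ratio_of_means(orders, sessions)
print(ratio)  # 0.5
```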
Use pre-experiment data, metric forecasts, or other covariates to reduce variance and increase the sensitivity of an experiment. This approach is also known as CUPED or CUPAC.
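The core idea of CUPED fits in a few lines. This sketch assumes that `x` is a pre-experiment covariate measured for the same units as the metric `y`. It is an illustration, not tea-tasting's implementation. In practice, `theta` is usually estimated on data pooled across variants so that every variant receives the same adjustment.

```python
import numpy as np


def cuped_adjust(y: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Return CUPED-adjusted metric values (hypothetical helper).

    y: metric measured during the experiment.
    x: pre-experiment covariate for the same units.
    """
    # theta is the OLS slope of y on x, which minimizes the adjusted variance.
    theta = np.cov(x, y, ddof=1)[0, 1] / x.var(ddof=1)
    # Subtracting theta * (x - mean(x)) leaves the mean of y unchanged
    # and removes the part of the variance explained by x.
    return y - theta * (x - x.mean())


rng = np.random.default_rng(0)
x = rng.normal(10.0, 2.0, size=1000)
y = x + rng.normal(0.0, 1.0, size=1000)
y_adj = cuped_adjust(y, x)
print(y.var(ddof=1) > y_adj.var(ddof=1))  # True
```

Lower variance with an unchanged mean means narrower confidence intervals for the same sample size.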
Calculating confidence intervals for the percentage change in Student's t-test and the Z-test can be tricky. Simply taking the confidence interval for the absolute change and dividing it by the control average produces a biased result, because it treats the control average as a fixed number rather than an estimate with its own uncertainty. tea-tasting applies the delta method to calculate the correct interval.
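The sketch below shows one common delta-method construction for independent variants. It builds the interval for the log of the ratio of means and transforms it back. It uses a normal approximation, the function name is hypothetical, and tea-tasting's exact formula may differ.

```python
import numpy as np
from scipy.stats import norm


def relative_change_ci(
    mean_c: float, var_c: float, n_c: int,
    mean_t: float, var_t: float, n_t: int,
    alpha: float = 0.05,
) -> tuple[float, float, float]:
    """Relative change mean_t / mean_c - 1 with a delta-method CI (hypothetical helper).

    var_c and var_t are sample variances of the observations, not of the means.
    """
    log_ratio = np.log(mean_t / mean_c)
    # Delta method: Var(log(mean)) is approximately Var(mean) / mean**2,
    # and the two variants are independent, so the variances add up.
    se = np.sqrt(var_t / (n_t * mean_t**2) + var_c / (n_c * mean_c**2))
    z = norm.ppf(1 - alpha / 2)
    lower = np.exp(log_ratio - z * se) - 1
    upper = np.exp(log_ratio + z * se) - 1
    return float(np.exp(log_ratio) - 1), float(lower), float(upper)


rel, lower, upper = relative_change_ci(10.0, 4.0, 1000, 11.0, 4.0, 1000)
print(round(rel, 3))  # 0.1
```

The resulting interval is asymmetric around the point estimate, unlike the interval obtained by dividing the absolute-change bounds by the control average.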
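Analyze statistical power for Student's t-test and the Z-test. There are three possible options: calculate the effect size, given statistical power and the total number of observations; calculate the total number of observations, given statistical power and the effect size; or calculate statistical power, given the effect size and the total number of observations.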
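To illustrate power analysis for the Z-test, here are the textbook normal-approximation formulas for a two-sided two-sample test. They assume equal group sizes and a known standard deviation. These helpers are hypothetical and are not tea-tasting's API. tea-tasting's power analysis may cover more cases, such as Student's t-test or unequal variants. Note that they work with observations per group, so the total is twice that number.

```python
import math

from scipy.stats import norm


def z_test_power(effect: float, std: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Power, given the effect size and the number of observations per group."""
    se = std * math.sqrt(2 / n_per_group)
    z_crit = norm.ppf(1 - alpha / 2)
    # Ignores the negligible chance of rejecting in the opposite direction.
    return float(norm.cdf(abs(effect) / se - z_crit))


def z_test_n_per_group(effect: float, std: float, power: float = 0.8, alpha: float = 0.05) -> int:
    """Observations per group needed to detect the effect with the given power."""
    z_sum = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return math.ceil(2 * (z_sum * std / effect) ** 2)


def z_test_mde(std: float, n_per_group: int, power: float = 0.8, alpha: float = 0.05) -> float:
    """Minimum detectable effect, given power and the number of observations per group."""
    z_sum = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return z_sum * std * math.sqrt(2 / n_per_group)


n = z_test_n_per_group(effect=0.1, std=1.0)
print(n)  # 1570
```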
Learn more in the detailed user guide.
The roadmap includes:
You can define a custom metric with a statistical test of your choice.
There are many different databases and engines for storing and processing experimental data, and in most cases it's not efficient to pull the detailed experimental data into a Python environment. Many statistical tests, such as Student's t-test or the Z-test, require only aggregated data for analysis.
For example, if the raw experimental data are stored in ClickHouse, it's faster and more efficient to calculate counts, averages, variances, and covariances directly in ClickHouse rather than fetching granular data and performing aggregations in a Python environment.
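The sketch below shows why aggregates are enough. It runs Welch's t-test twice with SciPy: once on raw observations, and once on the count, mean, and variance alone. A database could return those three values through aggregates such as COUNT, AVG, and VAR_SAMP, though exact function names vary by backend. The simulated data are purely illustrative.

```python
import numpy as np
from scipy.stats import ttest_ind, ttest_ind_from_stats

rng = np.random.default_rng(42)
control = rng.exponential(5.0, size=1000)
treatment = rng.exponential(5.5, size=1000)


def aggregate(values: np.ndarray) -> tuple[int, float, float]:
    """Stand-in for a database query returning COUNT, AVG, and sample variance."""
    return len(values), float(values.mean()), float(values.var(ddof=1))


n_c, mean_c, var_c = aggregate(control)
n_t, mean_t, var_t = aggregate(treatment)

# Only three numbers per variant are needed for the test.
from_stats = ttest_ind_from_stats(
    mean_t, np.sqrt(var_t), n_t,
    mean_c, np.sqrt(var_c), n_c,
    equal_var=False,
)
from_raw = ttest_ind(treatment, control, equal_var=False)
print(np.isclose(from_stats.pvalue, from_raw.pvalue))  # True
```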
Querying all the required statistics manually can be a daunting and error-prone task. For example, the analysis of ratio metrics and variance reduction with CUPED require not only the number of rows and variances but also covariances. But don't worry: tea-tasting does all this work for you.
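Covariances can also be derived from aggregates. A single query can return the count and the sums of values, squares, and cross-products, and the variances and covariance follow from them. The sketch below uses a hypothetical helper. This one-pass formula can lose precision when values are large relative to their spread, so real implementations may prefer the backend's built-in variance and covariance aggregates.

```python
import numpy as np


def moments_from_sums(n, sum_x, sum_y, sum_xx, sum_yy, sum_xy):
    """Means, sample variances, and sample covariance from one-pass sums (hypothetical helper)."""
    mean_x, mean_y = sum_x / n, sum_y / n
    var_x = (sum_xx - n * mean_x**2) / (n - 1)
    var_y = (sum_yy - n * mean_y**2) / (n - 1)
    cov_xy = (sum_xy - n * mean_x * mean_y) / (n - 1)
    return mean_x, mean_y, var_x, var_y, cov_xy


# In practice these sums would come from SQL, e.g. SUM(orders * sessions).
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 1.0, 4.0, 3.0])
moments = moments_from_sums(
    len(x), x.sum(), y.sum(), (x * x).sum(), (y * y).sum(), (x * y).sum()
)
print(np.allclose(moments[2:], [x.var(ddof=1), y.var(ddof=1), np.cov(x, y)[0, 1]]))  # True
```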
tea-tasting accepts data either as a pandas DataFrame or as an Ibis Table. Ibis is a Python package that serves as a DataFrame API to various data backends. It supports more than 20 backends, including BigQuery, ClickHouse, PostgreSQL/Greenplum, Snowflake, and Spark. You can write an SQL query, wrap it as an Ibis Table, and pass it to tea-tasting.
Keep in mind that tea-tasting assumes that:
Some statistical methods, like Bootstrap, require granular data for the analysis. In this case, tea-tasting fetches the detailed data as well.
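Bootstrap needs granular data because every resample draws individual observations with replacement. The following NumPy sketch computes a percentile-bootstrap interval for the difference in medians. The function name and defaults are assumptions, and tea-tasting's Bootstrap metric may use a different interval method.

```python
import numpy as np


def bootstrap_ci(control, treatment, statistic=np.median, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for statistic(treatment) - statistic(control) (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    diffs = np.empty(n_resamples)
    for i in range(n_resamples):
        # Resample each variant independently, keeping the original sample sizes.
        c = rng.choice(control, size=len(control), replace=True)
        t = rng.choice(treatment, size=len(treatment), replace=True)
        diffs[i] = statistic(t) - statistic(c)
    lower, upper = np.quantile(diffs, [alpha / 2, 1 - alpha / 2])
    return float(lower), float(upper)


data_rng = np.random.default_rng(42)
control = data_rng.normal(0.0, 1.0, size=500)
treatment = data_rng.normal(1.0, 1.0, size=500)
lower, upper = bootstrap_ci(control, treatment)
```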
Learn more in the guide on data backends.
You can perform all the tasks listed above using just NumPy, SciPy, and Ibis. In fact, tea-tasting uses these packages under the hood. What tea-tasting offers on top is a convenient higher-level API.
It's easier to show than to describe. Here is a basic example:
```python
import tea_tasting as tt

data = tt.make_users_data(seed=42)

experiment = tt.Experiment(
    sessions_per_user=tt.Mean("sessions"),
    orders_per_session=tt.RatioOfMeans("orders", "sessions"),
    orders_per_user=tt.Mean("orders"),
    revenue_per_user=tt.Mean("revenue"),
)

result = experiment.analyze(data)
print(result)
#>             metric control treatment rel_effect_size rel_effect_size_ci pvalue
#>  sessions_per_user    2.00      1.98          -0.66%      [-3.7%, 2.5%]  0.674
#> orders_per_session   0.266     0.289            8.8%      [-0.89%, 19%] 0.0762
#>    orders_per_user   0.530     0.573            8.0%       [-2.0%, 19%]  0.118
#>   revenue_per_user    5.24      5.73            9.3%       [-2.4%, 22%]  0.123
```
The two-stage approach is common in statistical modeling. It separates parametrization, where you define the metrics in an Experiment, from inference, where you call analyze on the data. This separation helps make the code more modular and easier to understand.
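tea-tasting performs calculations that can be tricky and error-prone, such as the analysis of ratio metrics with the delta method, variance reduction with CUPED/CUPAC, confidence intervals for percentage change, and power analysis.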
It also provides a framework for representing experimental data to avoid errors. Grouping the data by randomization units and including all units in the dataset is important for correct analysis.
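For example, if the raw data are stored at the session level, they need to be aggregated per user, and users without any sessions must still be included with zeros. This is a minimal pandas sketch; the tables `sessions` and `users` and their column names are hypothetical.

```python
import pandas as pd

# Hypothetical session-level log: one row per session.
sessions = pd.DataFrame({
    "user_id": [1, 1, 2, 3, 3, 3],
    "orders": [0, 1, 0, 1, 0, 2],
    "revenue": [0.0, 9.5, 0.0, 4.0, 0.0, 12.0],
})
# Hypothetical assignment table: every randomized user and their variant.
users = pd.DataFrame({"user_id": [1, 2, 3, 4], "variant": [0, 1, 0, 1]})

# Aggregate sessions to the randomization unit (the user).
per_user = (
    sessions.groupby("user_id")
    .agg(
        sessions=("orders", "size"),
        orders=("orders", "sum"),
        revenue=("revenue", "sum"),
    )
    .reset_index()
)

# A left join keeps users without sessions; their metrics are filled with zeros.
data = users.merge(per_user, on="user_id", how="left").fillna(
    {"sessions": 0, "orders": 0, "revenue": 0.0}
)
print(data)
```

Dropping user 4 would silently inflate the per-user averages, which is exactly the kind of error that grouping by randomization units and including all units is meant to prevent.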
In addition, tea-tasting provides some convenience methods and functions, such as pretty formatting of the result and a context manager for metric parameters.
Last but not least: documentation. I believe that good documentation is crucial for tool adoption. That's why I wrote several user guides and an API reference.
I recommend starting with the example of basic usage in the user guide. Then you can explore specific topics, such as variance reduction or power analysis, in the same guide.
See the guide on data backends to learn how to use a data backend of your choice with tea-tasting.
See the guide on custom metrics if you want to perform a statistical test that is not included in tea-tasting.
Use the API reference to explore all parameters and detailed information about the functions, classes, and methods available in tea-tasting.
There are a variety of statistical methods that can be applied in the analysis of an experiment. But only a handful of them are actually used in most cases.
On the other hand, there are methods specific to the analysis of A/B tests that are not included in general-purpose statistical packages like SciPy.
tea-tasting's functionality includes the most important statistical tests, as well as methods specific to the analysis of A/B tests.
tea-tasting provides a convenient API that helps to reduce the time spent on analysis and minimize the probability of error.
In addition, tea-tasting optimizes computational efficiency by calculating the statistics in the data backend of your choice, where the data are stored.
With the detailed documentation, you can quickly learn how to use tea-tasting for the analysis of your experiments.
The package name "tea-tasting" is a play on words that refers to two subjects: