Translator | Zhu Xianzhong
Reviewer | Sun Shujuan
In my previous blog post, we saw how to use causal trees to estimate the heterogeneous treatment effects of a policy. If you haven't read it, I suggest you do so before reading this article, because here I assume you are already familiar with that material.
Why do we care about heterogeneous treatment effects (HTE)? First, estimating heterogeneous treatment effects allows us to select which users (patients, users, customers, etc.) to provide a treatment (drugs, advertisements, products, etc.) to, depending on their expected outcome of interest (illness, firm revenue, customer satisfaction, etc.). In other words, estimating HTE helps us with targeting. In fact, as we will see later in the article, a treatment can be ineffective or even counterproductive on average while still bringing positive benefits to a subset of users. The opposite can also be true: a drug can be effective on average, but its effectiveness can be improved further if we identify the users on whom it has side effects.
In this article, we will explore an extension of causal trees: causal forests. Just as random forests extend regression trees by averaging multiple bootstrapped trees together, causal forests extend causal trees. The main difference comes from the inference perspective, which is less straightforward. We will also see how the outputs of different HTE estimation algorithms can be compared and how they can be used for policy targeting.
Online Discount Case
In the rest of this article, we continue with the toy example from my previous article on causal trees: we are an online store, and we want to know whether offering discounts to new customers increases their spending in the store.
To understand whether the discount is cost-effective, we ran the following randomized experiment, or A/B test: every time a new customer browses our online store, we randomly assign them to a treatment condition. We offer the discount to treated users and do not offer it to control users.
I import the data generating process dgp_online_discounts() from src.dgp. I also import some plotting functions and libraries from src.utils. To include not only code but also data and tables, I use Deepnote, a Jupyter-like web-based collaborative notebook environment.
For each customer, we observe whether they were offered the discount (our treatment), how much they spent (the outcome of interest), and a few other characteristics.
Since the treatment was randomly assigned, we can use a simple difference-in-means estimator to estimate the treatment effect. We expect the treatment and control groups to be similar except for the discount, so we can attribute any difference in spending to the discount.
The discount seems to be effective: on average, spending in the treatment group increased by $1.95. But are all customers affected equally?
To answer this question, we would like to estimate heterogeneous treatment effects, possibly at the individual level.
Causal Forest
There are many different options for computing heterogeneous treatment effects. The simplest approach is to interact the treatment with a dimension of heterogeneity in a regression of the outcome of interest. The problem with this approach is which variable to choose. Sometimes we have prior information that can guide us; for example, we may know that mobile users spend more on average than desktop users. Other times, we may be interested in a certain dimension for business reasons; for example, we may want to invest more in a certain region.
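With a binary dimension, the interaction approach amounts to computing the difference in means separately within each group. The following is a minimal sketch on simulated data, not the article's dataset: the `mobile` flag, the effect sizes (3 for mobile, 1 for desktop), and the helper `diff_in_means` are all hypothetical and chosen only for illustration.

```python
import random
from statistics import mean

random.seed(0)

# Hypothetical toy data: each row is (mobile, treated, spend).
# Assumed true effects: 3.0 for mobile users, 1.0 for desktop users.
rows = []
for _ in range(20_000):
    mobile = random.random() < 0.5
    treated = random.random() < 0.5           # randomized assignment
    effect = 3.0 if mobile else 1.0
    spend = 5.0 + effect * treated + random.gauss(0, 1)
    rows.append((mobile, treated, spend))

def diff_in_means(data):
    """Average spend of treated customers minus average spend of controls."""
    treated = [y for _, d, y in data if d]
    control = [y for _, d, y in data if not d]
    return mean(treated) - mean(control)

ate = diff_in_means(rows)                                   # overall effect
ate_mobile = diff_in_means([r for r in rows if r[0]])       # mobile only
ate_desktop = diff_in_means([r for r in rows if not r[0]])  # desktop only
print(f"overall: {ate:.2f}, mobile: {ate_mobile:.2f}, desktop: {ate_desktop:.2f}")
```

The overall estimate averages over both groups and hides the fact that the two groups respond very differently, which is exactly why the choice of dimension matters.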
However, when we have no additional information, we would like the process to be data-driven. In the previous article, we explored one data-driven approach to estimating heterogeneous treatment effects: causal trees.
Random forests, as the name suggests, are an extension of regression trees that adds two independent sources of randomness on top of them. In particular, a random forest takes the predictions of many different regression trees, each trained on a bootstrapped sample of the data, and averages them together. This procedure is generally known as bootstrap aggregation, or bagging; it can be applied to any prediction algorithm and is not specific to random forests. The additional source of randomness comes from feature selection, since at each split only a random subset of all features X is considered for the optimal split.
These two sources of randomness are very important and help improve the performance of random forests. First, bagging allows random forests to produce smoother predictions than regression trees by averaging multiple discrete predictions. Second, random feature selection allows random forests to explore the feature space more deeply, so they can discover more interactions than a single regression tree. In fact, there may be interactions between variables that are not very predictive on their own (and therefore would not generate splits) but are very powerful together.
Causal forests are the equivalent of random forests for estimating heterogeneous treatment effects, in exactly the same way that causal trees are the equivalent of regression trees. As with causal trees, we face a fundamental problem: we want to predict an object that we do not observe, the individual treatment effect τᵢ. The solution is to create an auxiliary outcome variable Y* whose expected value, for every observation, is exactly the treatment effect.
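To make the idea of an auxiliary outcome concrete, here is a minimal sketch that assumes the standard transformed-outcome construction, Y* = Y · (D − p) / (p · (1 − p)), where p is the known probability of treatment. The formula, the constant effect of 2.0, and the simulated data are assumptions made for illustration and are not taken from the article's code.

```python
import random

random.seed(1)

p = 0.5            # treatment probability, known in a randomized experiment (assumed)
true_effect = 2.0  # hypothetical constant treatment effect

def transformed_outcome(y, d, p):
    """Y* = Y * (D - p) / (p * (1 - p)); its expectation is the treatment effect."""
    return y * (d - p) / (p * (1 - p))

y_star = []
for _ in range(200_000):
    d = 1 if random.random() < p else 0              # randomized treatment
    y = 5.0 + true_effect * d + random.gauss(0, 1)   # observed spend
    y_star.append(transformed_outcome(y, d, p))

# A single Y* is extremely noisy, but its average recovers the effect.
avg_y_star = sum(y_star) / len(y_star)
print(f"average Y*: {avg_y_star:.2f} (true effect: {true_effect})")
```

Each individual Y* is far from the true effect (it is a large positive number for treated units and a large negative number for controls), which is why a tree or forest is needed to average it over groups of similar observations.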
[Equation: auxiliary outcome variable Y*]
If you want to understand why this variable is an unbiased estimator of the individual treatment effect, please take a look at my previous article, where I explain it in detail. In short, you can think of Yᵢ* as a difference-in-means estimator for a single observation.
Once we have an outcome variable, there are a few more things we need to do in order to use random forests to estimate heterogeneous treatment effects. First, we need to build trees that have the same number of treated and control units in each leaf. Second, we need to use different samples to build the tree and to evaluate it, i.e., to compute the average outcome in each leaf. This procedure is often called honest trees, and it is very useful for inference because we can treat the sample in each leaf as independent of the tree structure.
Before proceeding with the estimation, let us first generate dummy variables for the categorical variables device, browser and region.
df_dummies = pd.get_dummies(df[dgp.X[1:]], drop_first=True)
df = pd.concat([df, df_dummies], axis=1)
X = ['time'] + list(df_dummies.columns)
Fortunately, we don't have to do all of this manually: Microsoft's EconML package provides a good implementation of causal trees and forests. We will use the CausalForestDML class from its dml module.
from econml.dml import CausalForestDML

np.random.seed(0)
forest_model = CausalForestDML(max_depth=3)
forest_model = forest_model.fit(Y=df[dgp.Y], X=df[X], T=df[dgp.D])
from econml.cate_interpreter import SingleTreeCateInterpreter

# Summarize the forest with a single shallow tree for visualization
intrp = SingleTreeCateInterpreter(max_depth=2).interpret(forest_model, df[X])
intrp.plot(feature_names=X, fontsize=12)
def compute_discrete_effects(df, hte_model):
    temp_df = df.copy()
    temp_df.time = 0
    temp_df = dgp.add_treatment_effect(temp_df)
    temp_df = temp_df.rename(columns={'effect_on_spend': 'True'})
    temp_df['Predicted'] = hte_model.effect(temp_df[X])
    df_effects = pd.DataFrame()
    for var in X[1:]:
        for effect in ['True', 'Predicted']:
            v = temp_df.loc[temp_df[var]==1, effect].mean() - temp_df[effect][temp_df[var]==0].mean()
            effect_var = {'Variable': [var], 'Effect': [effect], 'Value': [v]}
            df_effects = pd.concat([df_effects, pd.DataFrame(effect_var)]).reset_index(drop=True)
    return df_effects, temp_df['Predicted'].mean()

df_effects, avg_effect_notime = compute_discrete_effects(df, forest_model)
fig, ax = plt.subplots()
sns.barplot(data=df_effects, x="Variable", y="Value", hue="Effect", ax=ax).set(
    xlabel='', ylabel='', title='Heterogeneous Treatment Effects')
ax.set_xticklabels(ax.get_xticklabels(), rotation=45, ha="right");
def compute_time_effect(df, hte_model, avg_effect_notime):
    df_time = df.copy()
    # Zero out the dummies and the original categorical columns to isolate the effect of time
    df_time[X[1:] + ['device', 'browser', 'region']] = 0
    df_time = dgp.add_treatment_effect(df_time)
    df_time['predicted'] = hte_model.effect(df_time[X]) + avg_effect_notime
    return df_time

df_time = compute_time_effect(df, forest_model, avg_effect_notime)
sns.scatterplot(x='time', y='effect_on_spend', data=df_time, label='True')
sns.scatterplot(x='time', y='predicted', data=df_time, label='Predicted').set(
    ylabel='', title='Heterogeneous Treatment Effects')
plt.legend(title='Effect');
cost = 4
from econml.dml import CausalForestDML

# A forest with a single tree and no inference acts as a causal tree
np.random.seed(0)
tree_model = CausalForestDML(n_estimators=1, subforest_size=1, inference=False, max_depth=3)
tree_model = tree_model.fit(Y=df[dgp.Y], X=df[X], T=df[dgp.D])
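# Train on the first 80,000 rows and test on the remaining rows, so the two samples do not overlap
df_train, df_test = df.iloc[:80_000, :], df.iloc[80_000:, :]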
np.random.seed(0)
tree_model = tree_model.fit(Y=df_train[dgp.Y], X=df_train[X], T=df_train[dgp.D])
forest_model = forest_model.fit(Y=df_train[dgp.Y], X=df_train[X], T=df_train[dgp.D])
def compute_toc(df, hte_model, cost, truth=False):
    # Use either the true effects from the data generating process or the model predictions
    if truth:
        effect = dgp.add_treatment_effect(df)['effect_on_spend']
    else:
        effect = hte_model.effect(df[X])
    df_toc = pd.DataFrame()
    for q in np.linspace(0, 1, 101):
        # Average effect among the share q of units with the highest effects
        ate = np.mean(effect[effect >= np.quantile(effect, 1-q)])
        temp = pd.DataFrame({'q': [q], 'ate': [ate]})
        df_toc = pd.concat([df_toc, temp]).reset_index(drop=True)
    return df_toc

df_toc_tree = compute_toc(df_train, tree_model, cost)
df_toc_forest = compute_toc(df_train, forest_model, cost)
def plot_toc(df_toc, cost, ax, color, title):
    ax.axhline(y=cost, lw=2, c='k')
    ax.fill_between(x=df_toc.q, y1=cost, y2=df_toc.ate, where=(df_toc.ate > cost), color=color, alpha=0.3)
    # Largest treated share for which the average effect still exceeds the cost
    if any(df_toc.ate > cost):
        q = df_toc.loc[df_toc.ate > cost, 'q'].values[-1]
    else:
        q = 0
    ax.axvline(x=q, ymin=0, ymax=0.36, lw=2, c='k', ls='--')
    sns.lineplot(data=df_toc, x='q', y='ate', ax=ax, color=color).set(
        title=title, ylabel='ATT', xlabel='Share of treated', ylim=[1.5, 8.5])
    ax.text(0.7, cost+0.1, f'Discount cost: {cost:.0f}$', fontsize=12)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 6))
plot_toc(df_toc_tree, cost, ax1, 'C0', 'TOC - Causal Tree')
plot_toc(df_toc_forest, cost, ax2, 'C1', 'TOC - Causal Forest')
import copy

def compute_effect_test(df_test, hte_model, cost, ax, title, truth=False):
    # Targeting rule: treat a customer only if the predicted effect exceeds the cost
    df_test['Treated'] = hte_model.effect(df_test[X]) > cost
    if truth:
        df_test = dgp.add_treatment_effect(df_test)
        df_test['Effect'] = df_test['effect_on_spend']
    else:
        # Re-estimate the effects on the test sample with a fresh copy of the model
        np.random.seed(0)
        hte_model_test = copy.deepcopy(hte_model).fit(Y=df_test[dgp.Y], X=df_test[X], T=df_test[dgp.D])
        df_test['Effect'] = hte_model_test.effect(df_test[X])
    df_test['Cost Effective'] = df_test['Effect'] > cost
    tot_effect = ((df_test['Effect'] - cost) * df_test['Treated']).sum()
    sns.barplot(data=df_test, x='Cost Effective', y='Treated', errorbar=None, width=0.5, ax=ax, palette=['C3', 'C2']).set(
        title=title + '\n', ylim=[0,1])
    ax.text(0.5, 1.08, f'Total effect: {tot_effect:.2f}', fontsize=14, ha='center')
    return

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))
compute_effect_test(df_test, tree_model, cost, ax1, 'Causal Tree')
compute_effect_test(df_test, forest_model, cost, ax2, 'Causal Forest')
from sklearn.metrics import mean_squared_error as mse

def compute_mse_test(df_test, hte_model):
    df_test = dgp.add_treatment_effect(df_test)
    print(f"MSE = {mse(df_test['effect_on_spend'], hte_model.effect(df_test[X])):.4f}")

compute_mse_test(df_test, tree_model)
compute_mse_test(df_test, forest_model)
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))
compute_effect_test(df_test, tree_model, cost, ax1, 'Causal Tree', True)
compute_effect_test(df_test, forest_model, cost, ax2, 'Causal Forest', True)
df_toc = compute_toc(df_test, tree_model, cost, True)
fig, ax = plt.subplots(1, 1, figsize=(7, 5))
plot_toc(df_toc, cost, ax, 'C2', 'TOC - Ground Truth')
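The targeting rule used in the code above, treating a customer only when the predicted effect exceeds the discount cost, can be illustrated on a handful of made-up numbers. In this minimal sketch the six predicted effects and the helper `net_gain` are hypothetical; only the cost of 4 comes from the article.

```python
cost = 4.0  # discount cost, as in the article

# Hypothetical predicted treatment effects for six customers (illustrative only)
predicted_effects = [0.5, 1.0, 1.5, 2.0, 5.0, 7.0]

def net_gain(effects, treat_flags, cost):
    """Sum of (effect - cost) over the customers who receive the discount."""
    return sum(e - cost for e, t in zip(effects, treat_flags) if t)

# Policy 1: give the discount to everyone
treat_all = [True] * len(predicted_effects)
# Policy 2: give the discount only where the predicted effect exceeds the cost
treat_targeted = [e > cost for e in predicted_effects]

gain_all = net_gain(predicted_effects, treat_all, cost)
gain_targeted = net_gain(predicted_effects, treat_targeted, cost)
print(f"treat everyone: {gain_all:+.1f}, targeted: {gain_targeted:+.1f}")
```

In this toy case the average effect is below the cost, so treating everyone loses money, while treating only the two customers in the upper tail is profitable. Only the customers near or above the cost threshold matter for the decision, not how accurately the low effects are estimated.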
In this article, we explored a very powerful algorithm for estimating heterogeneous treatment effects: causal forests. Causal forests are built on the same principles as causal trees, but benefit from a deeper exploration of the parameter space and from bagging.
We also saw how to use estimates of heterogeneous treatment effects for policy targeting. By identifying the users with the highest treatment effects, we can make a policy profitable. Finally, we saw that the objective of policy targeting differs from the objective of estimating heterogeneous treatment effects, since the tails of the distribution may be more relevant than the average.
Zhu Xianzhong is a 51CTO community editor, 51CTO expert blogger and lecturer, a computer science teacher at a university in Weifang, and a veteran freelance programmer.
Original title: From Causal Trees to Forests, by Matteo Courthoud