Is A/B testing that scary?

Reader AR pointed me to this Fast Company article that examines the ethics of A/B testing. The only way to comprehend this point of view is to think of A/B testing not as a scientific experiment but as a decision-making process that involves running an experiment. The researchers are unhappy that A/B tests could lend support to decisions that have an undesirable impact on society. Two such examples are described:

Two images are tested for a job ad. During the test, site visitors are shown one of the two images, selected at random. The winner of the test is an image that disproportionately attracts male applicants.

Separate pricing tests are run in different zip codes. The "winning" prices at the conclusion of these tests are different for different zip codes. Because racial profiles differ by zip code, prices are in effect different for different races. Therefore, the test result leads to race-based discrimination.

There are two important questions to discuss here. First, what is the alternative to A/B testing? Is that method of decision-making better? Second, is the harm produced by the experiment itself, or by the decision made as a result?

Alternatives to A/B Testing

Consider the image test described above. Presumably, the test is run because someone believes that one of those two images might perform better at driving applicants. At most companies, a test sees the light of day only after teams of people debate and prioritize testing ideas. If a test including a sexist image is run, then the team in charge of testing has approved it for some (possibly bad) reason. If they didn't have A/B testing, how would they have decided which image to run? And if the image is not explicitly sexist - in other words, if the analyst had to analyze the data to learn that one image drove more male applicants - how would that insight be surfaced without running the other image? The alternative decision process may be even worse.
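To make the point about surfacing the insight concrete, here is a minimal sketch of the kind of breakdown an analyst might run on the job ad test. The variant labels, the gender field, and all counts are invented for illustration; nothing here comes from the Fast Company article or from any real test.

```python
from collections import Counter

# Hypothetical application log: (image shown, applicant gender).
# These counts are invented purely for illustration.
applications = (
    [("A", "female")] * 48 + [("A", "male")] * 52
    + [("B", "female")] * 30 + [("B", "male")] * 90
)

def share_by_segment(records):
    """Return {variant: {segment: share of that variant's applications}}."""
    totals = Counter(variant for variant, _ in records)
    cells = Counter(records)
    shares = {}
    for (variant, segment), n in cells.items():
        shares.setdefault(variant, {})[segment] = n / totals[variant]
    return shares

totals = Counter(variant for variant, _ in applications)
shares = share_by_segment(applications)

for variant in sorted(shares):
    rounded = {s: round(p, 2) for s, p in sorted(shares[variant].items())}
    print(variant, "applications:", totals[variant], "shares:", rounded)
```

In this toy data, image B "wins" on total applications, but three quarters of its applicants are male. That skew only becomes visible because both images were run and the results were split by gender.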
It is certainly true that automated A/B testing is risky - because no human beings are involved in turning test results into actions. The absence of humans is usually touted as a benefit by vendors of such testing tools. In the job ad example, a human analyst reports on the test result and includes the analysis by gender, showing that while total applications increased, the winning image disproportionately attracted male applicants. The decision-makers can and should decide not to adopt the winning image based on that analysis. The A/B test revealed the bias but did not cause it.

Even without the gender issue, such analysis and discussion of results is necessary. For example, ad clicks can be generated by placing ads near scroll bars to stimulate accidental clicking. Human analysts can report that clicks increased but only through accidental clicking. The decision-makers can and should decide not to implement the winning design.

From where does the harm come?

The other example is more far-fetched. I am reverse-engineering the pricing test as described. Given that the test led to different prices for different zip codes, the testers must have been running separate A/B tests stratified by zip code. Given the law of supply and demand, it might be the case that the winning price would be lower in poorer zip codes and higher in richer zip codes. This definitely results in price discrimination by zip code. If the design team did not want price discrimination by zip code, then such a test design would not have been approved, so the test itself isn't creating harm.

Further, race-based price discrimination is alleged because zip codes are correlated with race. Almost all variables are correlated with race. Age is correlated with race, as are income, education, what websites one visits, etc. So this standard would lead to banning all segmentation and targeting policies. The only possible pricing policy would be one price for all.
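The reverse-engineered pricing design can be sketched in a few lines. This is only an illustration under stated assumptions: the zip code labels, prices, and visitor and purchase counts are all invented, the winner is chosen by revenue per visitor (my assumption, not something the article specifies), and statistical significance is ignored.

```python
# Hypothetical stratified test results: {zip label: {price: (visitors, purchases)}}.
# Labels and numbers are invented for illustration.
results = {
    "richer_zip": {9.99: (1000, 80), 12.99: (1000, 70)},
    "poorer_zip": {9.99: (1000, 60), 12.99: (1000, 35)},
}

def revenue_per_visitor(price, visitors, purchases):
    return price * purchases / visitors

def winning_prices(results):
    """Within each zip code, pick the price with the highest revenue per visitor."""
    return {
        zip_code: max(arms, key=lambda p: revenue_per_visitor(p, *arms[p]))
        for zip_code, arms in results.items()
    }

def pooled_winner(results):
    """Ignore zip codes entirely and pick a single price for everyone."""
    pooled = {}
    for arms in results.values():
        for price, (visitors, purchases) in arms.items():
            v, b = pooled.get(price, (0, 0))
            pooled[price] = (v + visitors, b + purchases)
    return max(pooled, key=lambda p: revenue_per_visitor(p, *pooled[p]))

winners = winning_prices(results)
print("per-zip winners:", winners)
print("one price for all:", pooled_winner(results))
```

With these made-up numbers the higher price wins in the richer zip code and the lower price wins in the poorer one, while pooling the data yields a single price. The differing prices follow from the decision to stratify by zip code, which is made when the test is designed, not from the mechanics of the test.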
***

In short, human supervision of A/B testing from design to interpretation is definitely needed. A/B tests provide a wealth of data to support decision-making. The biases highlighted by the Fast Company article are merely revealed by the testing - they are not caused by it.

from Big Data, Plainly Spoken (aka Numbers Rule Your World) https://ift.tt/2Vr9Yvz
via IFTTT
