DataScience4you2me

Posts

Showing posts with the label Causal Inference

The Pfizer-Biontech Vaccine May Be A Lot More Effective Than You Think?

Ian Fellows writes: I [Fellows] just wrote up a little Bayesian analysis that I thought you might be interested in. Specifically, everyone seems fixated on the 90% effectiveness lower bound reported for the Pfizer vaccine, but the true efficacy is likely closer to 97%. Please let me know if you see any errors. I’m basing it off of a press release, which is not ideal for scientific precision. Here’s Fellows’s analysis: Yesterday an announcement went out that the SARS-CoV-2 vaccine candidate developed by Pfizer and Biontech was determined to be effective during an interim analysis. This is fantastic news. Perhaps the best news of the year. It is however another example of science via press release. There is very limited information contained in the press release and one can only wonder why they couldn’t take the time to write up a two page report for the scientific community. That said, we can draw some inferences from the release that may help put this in context. From the pres...
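
To make the gap between a reported lower bound and a likely point estimate concrete, here is a minimal sketch of a beta-binomial calculation of this kind. It is not Fellows's actual analysis. The case counts are made-up illustrative numbers, the uniform Beta(1, 1) prior is an assumption, and the efficacy formula assumes 1:1 randomization with equal follow-up in both arms.

```python
import random

# Illustrative counts only: these are NOT figures from the press release.
cases_total = 100    # hypothetical number of confirmed cases across both arms
cases_vaccine = 5    # hypothetical number of those cases in the vaccine arm

# Assumption: uniform Beta(1, 1) prior on theta, the share of cases that
# fall in the vaccine arm.
prior_a, prior_b = 1.0, 1.0


def efficacy_draws(n_draws=20000, seed=1):
    """Sorted posterior draws of vaccine efficacy, VE = 1 - theta / (1 - theta).

    The formula assumes equal-sized arms with equal follow-up, so that
    theta / (1 - theta) is the ratio of attack rates (vaccine vs. placebo).
    """
    rng = random.Random(seed)
    a = prior_a + cases_vaccine
    b = prior_b + (cases_total - cases_vaccine)
    draws = []
    for _ in range(n_draws):
        theta = rng.betavariate(a, b)
        draws.append(1.0 - theta / (1.0 - theta))
    draws.sort()
    return draws


def quantile(sorted_draws, q):
    """Simple empirical quantile of an already sorted list."""
    idx = min(len(sorted_draws) - 1, int(q * len(sorted_draws)))
    return sorted_draws[idx]


draws = efficacy_draws()
lower, median, upper = (quantile(draws, q) for q in (0.025, 0.5, 0.975))
print(f"posterior median VE: {median:.3f}")
print(f"95% interval: ({lower:.3f}, {upper:.3f})")
```

With counts like these the posterior median sits well above the lower end of the interval, which is the general point: a stated lower bound is not the best guess.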

Lying with statistics

As Deb Nolan and I wrote in our book, Teaching Statistics: A Bag of Tricks, the most basic form of lying with statistics is simply to make up a number. We gave the example of Senator McCarthy’s proclaimed (but nonexistent) list of 205 Communists, but we have a more recent example : One of the supposed pieces of evidence [of votes being recorded for dead people] was a list that circulated on Twitter Thursday evening allegedly containing names, birth dates, and zip codes for registered voters in Michigan. The origin of the list and the identity of the person who first made it public are not known. CNN examined 50 of the more than 14,000 names on the list by taking the first 25 names on the list and then 25 more picked at random. We ran the names through Michigan’s Voter Information database to see if they requested or returned a ballot. We then checked the names against publicly available records to see if they were indeed dead. Of the 50, 37 were indeed dead and had not voted, accor...

Sh*ttin brix in the tail…

After my conversation with Andrew yesterday about The Economist election forecasting model I got curious about how G. Elliot, Merlin and Andrew want their prediction to be assessed given the menu of strange contingencies we have in front of us. I checked Betfair rules for some guidance: This market will be settled according to the candidate that has the most projected Electoral College votes won at the 2020 presidential election. Any subsequent events such as a ‘faithless elector’ will have no effect on the settlement of this market. In the event that no Presidential candidate receives a majority of the projected Electoral College votes, this market will be settled on the person chosen as President in accordance with the procedures set out by the Twelfth Amendment to the United States Constitution. This market will be void if an election does not take place in 2020. If more than one election takes place in 2020, then this market will apply to the first election that is h...

As a forecaster, how important is it to “have a few elections under your belt”?

Kevin Lewis pointed me to this comment from Nate Silver on a recent post : Having a few elections under your belt helps a *lot*. No matter how much you test things in the lab, there are some things you’re going to learn only by seeing how your forecast reacts to real data in real time. (I’m sure this applies to lots of other stuff too.) It’s an interesting thought. Nate and I both have experience with election forecasting: he’s been doing it since 2008 and I’ve been doing it since 1992, on and off. And, as Nate wrote, our forecasts are pretty similar, so I guess I can take his comment as being very positive!, in that he’s putting us (the Economist) and them (Fivethirtyeight) in the same category, as the product of experienced forecasters who have learned by seeing how our forecasts react to real data in real time and have ended up with similar results. I do agree that our forecasts are similar, especially at the national level. We have some differences in how we handle polls, whi...

Prediction markets and election forecasts

Zev Berger writes: The question sounds snarky, but it’s not meant in that vein. It’s instructive to hear how modelers understand the predictions of their models, which is something I am still trying to think through. Your model has the chance of Biden being elected at 0.95. Predictit has Biden at 0.60. Given the spread, do you have money on a Biden victory? My reply: I wrote about this here and in section 2.6 of this article. Relatedly, I received this email from Harry Crane: Writing to call your attention to joint work with Darrion Vinson, which may be of interest to your readers. We’re running a study that compares statistical forecasts against prediction markets for the 2020 election cycle. We’re pre-registering our analysis by posting our methods ahead of time. The first version is here. We also have an app that tracks the performance over time. Currently we’re only comparing forecasts from 538 to the market at PredictIt. I understand you’ve also designed a model for th...

“Valid t-ratio Inference for instrumental variables”

A couple people pointed me to this recent econometrics paper, which begins: In the single IV model, current practice relies on the first-stage F exceeding some threshold (e.g., 10) as a criterion for trusting t-ratio inferences, even though this yields an anti-conservative test. We show that a true 5 percent test instead requires an F greater than 104.7. Maintaining 10 as a threshold requires replacing the critical value 1.96 with 3.43. We re-examine 57 AER papers and find that corrected inference causes half of the initially presumed statistically significant results to be insignificant. We introduce a more powerful test, the tF procedure, which provides F-dependent adjusted t-ratio critical values. I don’t like this sort of thing as it seems to be focusing on binary decisions in a way that seems inappropriate to me. To me, this sort of paper is the rough equivalent of some sort of Talmudic argument about whether God can dig a ditch so wide he can’t jump across it. I just don’t...

Birthday data!

Someone asked us for the birthday data , and Aki replied: We used 1969-1989 also in BDA3 https://ift.tt/2w8eCYl And there we mention that the birthday data come from the National Vital Statistics System natality data and are at https://ift.tt/2jsRz1R, provided by Robert Kern using Google BigQuery. The code for the BDA3 example is at https://ift.tt/2ToWXDF (with link to data, too) We have also a more recent paper using the data https://ift.tt/3ktWUT9 with R and Stan code available at https://ift.tt/3kpl4ho and the copy of the data is in the same repo at https://ift.tt/34vkbhX from Statistical Modeling, Causal Inference, and Social Science https://ift.tt/3ouG5dc via IFTTT

Merlin did some analysis of possible electoral effects of rejections of vote-by-mail ballots . . .

Elliott writes : Postal voting could put America’s Democrats at a disadvantage: Rejection rates for absentee ballots have fallen since 2016, but are higher for non-whites than whites The final impact of a surge in postal voting will not be known until weeks after the election. Yet North Carolina, a closely contested state, releases detailed data on ballots as they arrive. So far, its figures suggest that a tarnished election is unlikely—but that Democrats could be hurt by their disproportionate embrace of voting by mail. . . . The Tar Heel state has received eight times as many postal votes as it had by this point in 2016. Despite fears about first-time absentee voters botching their ballots, the share that are rejected has in fact fallen to 1.3%, from 2.6% in 2016. This is probably due in part to campaigns educating supporters on voting by mail, and also to new efforts by the state to process such ballots. However, these gains have been concentrated among white and richer voters...

Some wrong lessons people will learn from the president’s illness, hospitalization, and expected recovery

Jonathan Falk writes about the president’s illness: I [Falk] would think it provides a focused opportunity to make a few salient statistical education points. First, a 6 percent mortality rate (among old people with comorbidities) is really bad, but any single selected person is really quite unlikely to die, or even be really sick. Same with all the reports about blood clots, six month recovery times, etc., etc. Even more unlikely. A prediction: when Trump feels fine in a couple of days this will be taken as one more piece of evidence that this is not a serious disease, which is statistically illiterate on a number of levels. Second, the reference group (old people with comorbidities) implicitly assumes a standard level of care. (Changes in the standard of care is one of the main reasons the death rate has fallen so much from the average.) Trump’s probabilities are way better than that because he gets care that very few other people in the world get. Third, it will be interesting ...

Quino y Mafalda

Obit by Harrison Smith, full of stories: She was a wise and idealistic young girl, a cartoon kid with a ball of black frizz for hair, a passionate hatred of soup and a name, Mafalda, inspired by a failed home appliance brand. Although her creator, a cartoonist known as Quino, drew her regularly for just nine years, the Argentine comic strip “Mafalda” became a cultural touchstone across Latin America and Europe, examining issues such as nationalism, war and environmental destruction just as Argentina’s democracy was giving way to dictatorship. When Mafalda spots workmen trying to locate a gas leak, she asks: “Are you searching for our national roots?” In another sequence, Mafalda’s pet turtle is revealed to have an unusual name, Bureaucracy. When a friend asks why she gave it that name, Mafalda replies that she needs to come back the next day for more information. She can’t say exactly when. . . .

Does this fallacy have a name?

Rafa Irizarry writes: What do we call it when someone thinks cor(X, Y) = 0 because lim_{h -> 0} cor(X, Y | X \in (x-h, x+h)) = 0? Example: Steph, Kobe, and Jordan are average (or below average) height in the NBA, so height does not predict being good at basketball. GRE math scores don’t predict success in a Math PhD program, so you don’t need to know GRE-level math to enter a Math PhD program: https://ift.tt/3kTMv2y I can’t find a name for it. My reply: I don’t know if there’s a name for it. It’s indeed a well-known point; I guess that Gauss, Laplace, Galton, etc., knew about it. We make the point in the attached figure from my two books with Jennifer Hill. Here it is in Regression and Other Stories: I’ll blog and see if anyone out there knows the name of the fallacy.
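
A quick simulation makes the point. This is a minimal sketch with made-up data: X and Y are simulated standard-normal quantities, not heights or GRE scores, and the window center and width are arbitrary choices. The correlation is strong in the full population and nearly vanishes once X is restricted to a narrow band.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation, written out to avoid any dependencies."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(0)
n = 20000

# Simulated population: Y = X + noise, so the population correlation is
# 1 / sqrt(2), roughly 0.71.
x = [rng.gauss(0, 1) for _ in range(n)]
y = [xi + rng.gauss(0, 1) for xi in x]


def windowed_cor(center, h):
    """Correlation of X and Y among units with X in (center - h, center + h)."""
    pairs = [(xi, yi) for xi, yi in zip(x, y) if center - h < xi < center + h]
    return pearson([p[0] for p in pairs], [p[1] for p in pairs])


full_cor = pearson(x, y)
narrow_cor = windowed_cor(center=1.0, h=0.1)

print(f"correlation in the full population: {full_cor:.2f}")
print(f"correlation with X restricted to (0.9, 1.1): {narrow_cor:.2f}")
```

Within the window almost all of the variation in X has been removed, so there is little left for Y to correlate with, even though the slope of Y on X is unchanged.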

Election Scenario Explorer using Economist Election Model

Ric Fernholz writes: I wanted to tell you about a new website I built together with my brother Dan. The 2020 Election Scenario Explorer allows you to explore how electoral outcomes in individual states influence the national election outlook using data from your election model. The map and tables on our site reveal some interesting observations about the election and your model. The site provides a measure of the influence of different states using the expected reduction in entropy or variance, following the data generated by your model. Several of the most influential states according to this measure differ from those states emphasized by more common “tipping point” analyses. I appreciate you sharing your code and simulation output with the public, as this made our project possible. Open data and code ftw! P.S. They should round those numbers to the nearest percentage point (see section 2.1 of this article).
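
One plausible reading of "expected reduction in entropy" is the mutual information between a state's result and the national result, estimated from simulation draws. The sketch below takes that reading as an assumption; it is not the site's code. The three states, their electoral votes, their mean margins, and the 17-vote threshold are all hypothetical toy values.

```python
import math
import random

# Hypothetical toy setup: three states, their electoral votes, and mean margins.
EV = {"safe": 3, "lean": 10, "swing": 10}
MEANS = {"safe": 20.0, "lean": 6.0, "swing": 1.0}
NEEDED = 17  # assumption: the candidate needs 17 of these 23 votes to win


def simulate(n_sims=5000, seed=0):
    """Draw toy election outcomes with a shared national shock plus state noise."""
    rng = random.Random(seed)
    sims = []
    for _ in range(n_sims):
        shock = rng.gauss(0, 3)
        won = {s: MEANS[s] + shock + rng.gauss(0, 2) > 0 for s in EV}
        won["national"] = sum(EV[s] for s in EV if won[s]) >= NEEDED
        sims.append(won)
    return sims


def entropy(p):
    """Entropy in bits of a yes/no event with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def expected_entropy_reduction(sims, state):
    """H(national) minus the expected H(national | result in `state`)."""
    n = len(sims)
    prior = entropy(sum(s["national"] for s in sims) / n)
    posterior = 0.0
    for outcome in (True, False):
        subset = [s for s in sims if s[state] == outcome]
        if subset:
            p_win = sum(s["national"] for s in subset) / len(subset)
            posterior += len(subset) / n * entropy(p_win)
    return prior - posterior


sims = simulate()
influence = {state: expected_entropy_reduction(sims, state) for state in EV}
for state, bits in sorted(influence.items(), key=lambda kv: -kv[1]):
    print(f"{state}: {bits:.3f} bits")
```

In this toy setup the safe state carries essentially no information, because its result is never in doubt, while the swing state resolves most of the uncertainty about the national outcome.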

Problem of the between-state correlations in the Fivethirtyeight election forecast

Elliott writes: I think we’re onto something with the low between-state correlations [see item 1 of our earlier post]. Someone sent me this collage of maps from Nate’s model that show: Biden winning every state except NJ; Biden winning LA and MS but not MI and WI; Biden losing OR but winning WI, PA. And someone says that in the 538 simulations where Trump wins CA, he only has a 60% chance of winning the election overall. Seems like the arrows are pointing to a very weird covariance structure. I agree that these maps look really implausible for 2020. How’s Biden gonna win Idaho, Wyoming, Alabama, etc. . . . but not New Jersey? But this does all seem consistent with correlations of uncertainties between states that are too low. Perhaps this is a byproduct of Fivethirtyeight relying too strongly on state polls and not fully making use of the information from national polls and from the relative positions of the states in previous elections. If you think of the goal as fore...
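
To see how a low between-state correlation produces this kind of result, here is a minimal simulation sketch. It is not either forecasting model. The two states, their mean margins, and the correlation values are hypothetical, and a close "swing" state stands in for the election overall.

```python
import math
import random


def conditional_win_prob(rho, n_sims=100000, seed=0):
    """P(win the swing state | win the opponent's safe state).

    Each state's margin is its mean plus a shared national error and a
    state-specific error, weighted so that the correlation between the two
    states' errors equals rho. Means are hypothetical, in standard-deviation
    units.
    """
    rng = random.Random(seed)
    safe_mean = -2.0   # opponent's safe state: an upset is a 2-sd event
    swing_mean = -0.3  # swing state: slightly unfavorable
    shared_w = math.sqrt(rho)
    own_w = math.sqrt(1.0 - rho)
    upsets = 0
    upsets_and_swing = 0
    for _ in range(n_sims):
        national = rng.gauss(0, 1)
        safe = safe_mean + shared_w * national + own_w * rng.gauss(0, 1)
        swing = swing_mean + shared_w * national + own_w * rng.gauss(0, 1)
        if safe > 0:
            upsets += 1
            if swing > 0:
                upsets_and_swing += 1
    return upsets_and_swing / upsets if upsets else float("nan")


p_low = conditional_win_prob(rho=0.2)
p_high = conditional_win_prob(rho=0.8)
print(f"rho = 0.2: P(swing win | safe-state upset) = {p_low:.2f}")
print(f"rho = 0.8: P(swing win | safe-state upset) = {p_high:.2f}")
```

With a high correlation, an upset in the opponent's safe state implies a large national swing, so the swing state almost always follows. With a low correlation the upset says little about anywhere else, which is how a simulation can have a candidate taking an opponent's stronghold and still facing something close to a coin flip overall.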

Statistics is hard, especially if you don’t know any statistics (FDA edition)

Paul Alper shares this story : From the NYT: Dr. Stephen M. Hahn, the commissioner of the Food and Drug Administration, said 35 out of 100 Covid-19 patients “would have been saved because of the administration of plasma.” He later walked this back because of confusion between Absolute Risk Reduction and Relative Risk Reduction, a common error usually promoted by drug manufacturers because relative improvement appears more dramatic to the beholder. He [Hahn] clarified that his earlier statements suggested an absolute reduction in risk, instead of the relative risk of a certain group of patients compared with another. The chart, analyzing the same tiny subset of Mayo Clinic study patients, did not include numerical figures, but it appeared to indicate a 30-day survival probability of about 63 percent in patients who received plasma with a low level of antibodies, compared with about 76 percent in those who received a high level of antibodies. From the FDA: “there appears to ...
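
The arithmetic behind the two kinds of risk reduction is short enough to write out. This sketch uses only the approximate 30-day survival figures quoted above (about 63 percent and about 76 percent, read off a chart); the variable names are illustrative.

```python
# Approximate 30-day survival figures quoted above, read off a chart.
survival_low_antibody = 0.63   # plasma with a low level of antibodies
survival_high_antibody = 0.76  # plasma with a high level of antibodies

mortality_low = 1 - survival_low_antibody    # about 0.37
mortality_high = 1 - survival_high_antibody  # about 0.24

# Absolute risk reduction: difference in death rates between the two groups.
absolute_risk_reduction = mortality_low - mortality_high

# Relative risk reduction: that difference as a share of the baseline death rate.
relative_risk_reduction = absolute_risk_reduction / mortality_low

print(f"absolute risk reduction: {absolute_risk_reduction * 100:.0f} per 100 patients")
print(f"relative risk reduction: {relative_risk_reduction * 100:.0f}%")
```

The roughly 35 percent figure is the relative reduction. Per 100 patients the difference is about 13 deaths, and even that compares high-antibody plasma with low-antibody plasma rather than plasma with no plasma.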

This is your chance to comment on the U.S. government’s review of evidence on the effectiveness of home visiting. Comments are due by 1 Sept.

Emily Sama-Miller writes: The federally sponsored Home Visiting Evidence of Effectiveness (HomVEE) systematic evidence review is seeking public comment on proposed updates to its standards and procedures. HomVEE reviews research literature on home visiting for families with pregnant women and children from birth to kindergarten entry, and its results are used to inform federal funding decisions. HomVEE is sponsored by the Office of Planning, Research, and Evaluation for the Administration for Children and Families (ACF) in the U.S. Department of Health and Human Services. HomVEE has released a draft Version 2 Handbook that describes updated procedures and standards for the systematic review. ACF is seeking public comment in response to two Federal Register notices that summarize key proposed updates, one about clarifying and updating procedures and standards for rating research quality and one about defining and reviewing different versions of home visiting models. The full draft hand...

Himmicanes again

Gary Smith gives a clear non-technical explanation of why not to take that himmicanes study seriously. Further background here.

The U.S. high school math olympiad champions of the 1970s and 1980s: Where were they then?

George Berzsenyi writes: Here is the last issue of the USAMO [math olympiad] Newsletter that was edited by Nura Turner and Tsz-Mei Ko along with a couple of additional summaries about the IMO participants. Concerning the Newsletter, I [Berzsenyi] just learned from Tsz-Mei Ko that it was the last issue. At the time, I really wanted to do well in these competitions. In retrospect, I think it worked out best that I did ok but not great.

StanCon 2020 is on Thursday!

To everyone who registered for the conference, THANK YOU! We, the organizers, are truly moved by how global and inclusive the community has become. We are currently at 230 registrants from 33 countries. And 25 scholarships were provided to people in 12 countries. Please join us. Registration is $50. We have scholarships still available (more info on the registration page). Updates: Videos for contributed talks and developer talks are online! Register now and you’ll be sent a password. Our plenary speakers have all been confirmed (these will happen live at StanCon): Seth Flaxman; Imperial College, London; “Hierarchical Models for Covid – identifying effects of lockdown and an R package” Moriba Jah; Oden Institute for Computational Engineering & Sciences; “Multi-Source Information Modeling, Curation, and Fusion Enabling Transdisciplinary Decision-Making: A Case for Space!” David Shor; “STAN and US Politics” Thank you to our sponsors, Metrum Research Group an...

Jobzzzzzz!

It’s a busy day for Bayesians. John Haman writes: The Institute for Defense Analyses – Operational Evaluation Division (OED) is looking for a Bayesian statistician to join its Test Science team. Test Science is a group of statisticians, data scientists, and psychologists that provides expertise on experimentation to the DoD. In particular, we are looking for a Bayesian statistician to help our naval warfare group use the results from past test events to inform the design and analysis of future test events. Candidates will also need to have a background in experimental design, strong public speaking and writing skills, and be able to work well on group projects. US citizenship required. The job ad is here: idalink.org/Statistician_RA More info about IDA and Test Science: Testscience.org Ida.org Contact Heather Wojton ( hwojton@ida.org ) with any questions. And Macartan Humphreys writes: Possibly of interest for statisticians / social scientists looking for a p...

StanCon 2020 program is now online!

This year’s Stan Conference is on August 13, 2020 (next Thursday)! The program has been finalized and is online. So far, we’re at 89 registrants spanning 17 countries! Registration is $50, which includes swag. There are scholarships available for those who need financial support. If you’re a Stan developer, there’s a discount (see the forums). Our vision for this year’s conference: All virtual. We’re trying our best to enable the interactions at StanCon that make the event special. We’re using a service, Remo, that has tables where people can gather around and chat. Global and inclusive. There are 3 sessions that last 2-3 hours that are spaced 8 hours apart. Each session has its own plenary speaker and six discussions (total: 3 plenaries, 18 contributed talks, and 4 developer talks). One of the contributed talks will be recorded in 6 languages (English, Catalan, Spanish, Hindi, French, Finnish)! The format. All contributed talks will be distributed and available pr...