A central portal to navigate through my Data Science writings

My Top Data Science Stories (Updated Monthly)
Leihua Ye, PhD
Photo by James Everitt on Unsplash


Welcome to my Data Science Blog. My name is Leihua Ye, and I sincerely hope 2021 has treated you gently.

I wear multiple hats: an academic researcher at the University of California, Santa Barbara by day and a Top Writer in Artificial Intelligence and Data Science by night.

Getting Started: Experimentation and Causal Inference

How not to fail your online controlled experimentation

8 Common Pitfalls of Running A/B Tests
Photo by Rolf Blicher Godfrey on Unsplash

Online experimentation has become the industry standard for product innovation and decision-making. With well-designed A/B tests, tech companies can iterate on their products more quickly and provide better user experiences. Among the FAANG companies, Netflix is the most open about its experimental approach. …

Experimentation and Causal Inference

Best practices that data scientists should follow before, during, and after experiments

Photo by niko photos on Unsplash


Randomized Controlled Trials (RCTs, aka A/B tests) are the gold standard for causal inference. RCTs strictly control the randomization process, which balances the distributions of covariates across groups (in expectation) before the treatment rolls out. Thus, we can attribute the mean difference between the treatment and control groups to the intervention.
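Because randomization balances covariates in expectation, the treatment effect can be estimated as a plain difference in group means. The sketch below is illustrative only: it assumes a continuous metric, a hypothetical 0.5-unit lift, and a normal-approximation p-value, which is reasonable only for large samples.

```python
import math
import random
from statistics import NormalDist, fmean, variance


def diff_in_means(control, treatment):
    """Difference in means with a Welch-style standard error and a
    two-sided p-value from the normal approximation (large samples only)."""
    diff = fmean(treatment) - fmean(control)
    se = math.sqrt(variance(treatment) / len(treatment)
                   + variance(control) / len(control))
    z = diff / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return diff, se, p_value


# Hypothetical experiment: the treatment lifts a metric from 10.0 to 10.5.
rng = random.Random(42)
control = [rng.gauss(10.0, 2.0) for _ in range(5000)]
treatment = [rng.gauss(10.5, 2.0) for _ in range(5000)]

diff, se, p = diff_in_means(control, treatment)
print(f"lift={diff:.3f}  se={se:.3f}  p={p:.2g}")
```

With 5,000 users per arm, the estimated lift should land close to the simulated 0.5, well outside the noise.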

A/B tests…

Experimentation and Causal Inference

A statistical approach to A/A tests

A Statistical Approach to A/A Tests: What Is It? Why Do You Need It? How Do You Do It?
Photo by Andy Salazar on Unsplash


A rigorous process of experimentation, aka A/B testing, has become popular and widely adopted in the tech sector. As early adopters, FAANG companies have incorporated experimentation into their decision-making processes.

For example, Microsoft Bing conducts A/B tests on 80% of its product changes. Google resorts to experimentation to…
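In an A/A test, both groups receive the identical experience, so any "significant" difference is a false positive. If the randomization and analysis pipeline are sound, roughly alpha (here 5%) of repeated A/A tests should come out significant. A minimal simulation, assuming normally distributed metrics and a normal-approximation test (group sizes and seed are arbitrary choices):

```python
import math
import random
from statistics import NormalDist


def mean_and_var(xs):
    """Sample mean and unbiased sample variance."""
    m = sum(xs) / len(xs)
    return m, sum((v - m) ** 2 for v in xs) / (len(xs) - 1)


def two_sample_p_value(a, b):
    """Two-sided p-value for a difference in means (normal approximation)."""
    mean_a, var_a = mean_and_var(a)
    mean_b, var_b = mean_and_var(b)
    z = (mean_a - mean_b) / math.sqrt(var_a / len(a) + var_b / len(b))
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate_aa_tests(n_tests=2000, n_per_group=200, alpha=0.05, seed=7):
    """Run many A/A tests where both groups come from the SAME distribution
    and return the share of tests that (falsely) come out significant."""
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(n_tests):
        group_a = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        group_b = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        if two_sample_p_value(group_a, group_b) < alpha:
            false_positives += 1
    return false_positives / n_tests


rate = simulate_aa_tests()
print(f"false-positive rate: {rate:.3f} (expected about 0.05)")
```

A rate far above 5% in a real A/A test would point to a problem in assignment, logging, or the variance estimate rather than a real effect.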


Three solutions from Lyft, LinkedIn, and DoorDash

How User Interference May Mess Up Your A/B Tests
Photo by Thom Holmes on Unsplash


A rigorous process of A/B testing generates valuable insights about consumer behaviors directly related to the success of a product. …

Career In Tech

What works for me may also work for you

How I Grow as a Data Scientist
Photo by Hu Chen on Unsplash

My Data Story

I was once told that I have an interesting career path to Data Science.

Yeah, my educational background is a mix of hard and soft sciences. Almost all of my undergraduate, Master’s, and Ph.D. degrees were in Social Sciences. …


It’s about my content strategy on Medium and LinkedIn

A Personal Update and Two Industry Trends
Photo by Adam Kool on Unsplash


Thank you all for reading, clapping, following, highlighting, and “private-noting” my work in Data Science. I truly appreciate the engagement. The past few months have been decisive for my career, and things have settled down a little. I will hold a press conference on the due date so you guys…

Technical Writing

Where and how to start

Photo by Vincent van Zalinge on Unsplash


Next month (September 2021) marks my two-year anniversary of being active on Medium. Writing, let alone technical writing, wasn’t my thing. It started as a distraction from working on a book-length dissertation but turned into a two-year commitment, with many more years to come.

I’ve put up so much…

Hi Junqi,

Thanks for these questions.

#1 Matching and PSM.

Exact matching lets you match on a few key covariates, but you can't match on many variables because you quickly run out of data. For example, if you want to match on 10 variables with 2 levels each, there are 2^10 (1,024) covariate combinations, and you need both treated and control observations in each one. The number grows exponentially once you have 20+ variables. In contrast, the propensity score collapses all the covariates into a single number, the estimated probability of receiving treatment, which ranges from 0 to 1. Matching on that one dimension is much easier to do successfully.
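The arithmetic behind the "not enough data" problem: with k binary covariates there are 2^k cells, and exact matching only works in cells that contain both treated and control units. A rough simulation, assuming 1,000 observations, independent binary covariates, and a coin-flip treatment (all hypothetical):

```python
import random
from collections import defaultdict


def n_cells(n_covariates, levels=2):
    """Number of distinct covariate combinations exact matching must fill."""
    return levels ** n_covariates


def matchable_share(n_obs, n_covariates, seed=0):
    """Share of cells that contain at least one treated AND one control unit,
    assuming independent binary covariates and a 50/50 treatment split."""
    rng = random.Random(seed)
    cells = defaultdict(set)  # covariate pattern -> treatment statuses seen
    for _ in range(n_obs):
        pattern = tuple(rng.randint(0, 1) for _ in range(n_covariates))
        cells[pattern].add(rng.randint(0, 1))
    both = sum(1 for statuses in cells.values() if len(statuses) == 2)
    return both / n_cells(n_covariates)


for k in (5, 10, 20):
    print(k, n_cells(k), round(matchable_share(1000, k), 4))
```

With 5 covariates nearly every cell is matchable; with 20 there are over a million cells and almost none contain both groups.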

#2 Unobserved variables.

If unobserved confounding variables affect both the IV (treatment) and the DV (outcome), PSM cannot remove the resulting bias, a limitation it shares with many other quasi-experimental and observational designs.
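A small simulation of why unobserved confounding defeats PSM, under hypothetical data-generating assumptions: x is an observed binary confounder, u is unobserved, both raise the chance of treatment and the outcome, and the true effect is 1.0.

```python
import random
from collections import defaultdict


def stratified_effect(rows, strata_key):
    """Size-weighted average of within-stratum differences in mean outcome.
    rows are (x, u, treated, y); strata_key picks the variables to stratify on."""
    groups = defaultdict(lambda: {0: [], 1: []})
    for row in rows:
        groups[strata_key(row)][row[2]].append(row[3])
    total, estimate = 0, 0.0
    for arms in groups.values():
        if arms[0] and arms[1]:
            size = len(arms[0]) + len(arms[1])
            diff = sum(arms[1]) / len(arms[1]) - sum(arms[0]) / len(arms[0])
            estimate += size * diff
            total += size
    return estimate / total


def simulate(n=200_000, true_effect=1.0, seed=3):
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.randint(0, 1)  # observed confounder
        u = rng.randint(0, 1)  # UNobserved confounder
        treated = int(rng.random() < 0.2 + 0.3 * x + 0.4 * u)
        y = true_effect * treated + 2 * x + 3 * u + rng.gauss(0, 1)
        rows.append((x, u, treated, y))
    return rows


rows = simulate()
# With a single binary x, the propensity score e(x) takes one distinct value
# per level of x, so stratifying on x is the same as stratifying on e(x).
biased = stratified_effect(rows, lambda r: r[0])
adjusted = stratified_effect(rows, lambda r: (r[0], r[1]))
print("adjust for x only:", round(biased, 2))
print("adjust for x and u:", round(adjusted, 2))
```

Only the second estimate, which requires observing u, recovers the true effect; in real data u is by definition not available.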

Great questions.


Experimentation and Causal Inference

A better alternative to Propensity Score Matching

Propensity Score Stratification in Observational Data: A Tutorial
Photo by Christophe Hautier on Unsplash


Do you know how much new data we generate each day? In 2021, the number is roughly 2.5 quintillion bytes. So we are not short of data; what we lack are the right tools to make it useful.

It’s easy to show how multiple factors move together with…
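A minimal sketch of propensity score stratification, assuming one hypothetical continuous confounder x, a logistic treatment model, and a true effect of 2.0: estimate each unit's propensity score, cut the sample into quintiles of that score, compare treated and control means within each quintile, and average the differences weighted by stratum size. The logistic regression is hand-rolled with Newton-Raphson only to keep the example dependency-free; in practice a library such as scikit-learn or statsmodels would fit the score model.

```python
import math
import random


def fit_logistic(x, t, iters=10):
    """Logistic regression of t on [1, x] via Newton-Raphson (one covariate)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, ti in zip(x, t):
            p = 1 / (1 + math.exp(-(b0 + b1 * xi)))
            w = p * (1 - p)
            g0, g1 = g0 + (ti - p), g1 + (ti - p) * xi  # score vector
            h00, h01, h11 = h00 + w, h01 + w * xi, h11 + w * xi * xi  # information
        det = h00 * h11 - h01 * h01
        # Newton step: beta += H^{-1} g, using the closed-form 2x2 inverse
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def stratified_ate(scores, t, y, n_strata=5):
    """Sort units by propensity score, cut into equal-sized strata, and average
    within-stratum treated-minus-control mean differences, weighted by size."""
    order = sorted(range(len(scores)), key=scores.__getitem__)
    size = math.ceil(len(order) / n_strata)
    estimate, used = 0.0, 0
    for start in range(0, len(order), size):
        idx = order[start:start + size]
        treated = [y[i] for i in idx if t[i] == 1]
        control = [y[i] for i in idx if t[i] == 0]
        if treated and control:
            diff = sum(treated) / len(treated) - sum(control) / len(control)
            estimate += len(idx) * diff
            used += len(idx)
    return estimate / used


rng = random.Random(11)
n = 20_000
x = [rng.gauss(0, 1) for _ in range(n)]
t = [int(rng.random() < 1 / (1 + math.exp(0.5 - xi))) for xi in x]
y = [2.0 * ti + 3.0 * xi + rng.gauss(0, 1) for ti, xi in zip(t, x)]  # true effect 2.0

b0, b1 = fit_logistic(x, t)
scores = [1 / (1 + math.exp(-(b0 + b1 * xi))) for xi in x]
treated_y = [yi for yi, ti in zip(y, t) if ti == 1]
control_y = [yi for yi, ti in zip(y, t) if ti == 0]
naive = sum(treated_y) / len(treated_y) - sum(control_y) / len(control_y)
pss = stratified_ate(scores, t, y)
print(f"naive: {naive:.2f}  stratified: {pss:.2f}  (true: 2.00)")
```

Five strata remove most, but not all, of the confounding bias; finer strata shrink the residual bias at the cost of fewer units per stratum.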

Leihua Ye, Ph.D. Researcher

PhD @ University of California. Top Writer | Machine Learning | Data Science | Experimentation & Causal Inference www.linkedin.com/in/leihuaye
