—Work in Progress! Come back later for more—

“…causal calculus differentiates between two types of conditional distributions one might want to estimate. …in ML we usually estimate only one of them, but in some applications we should actually…estimate the other one.” – Ferenc Huszar [1]

Intro

I’ve been wanting to learn Causal Inference (CI) / Causal Modeling (causal inference) “properly” ever since I asked for Judea Pearl’s bestselling The Book of Why [2] for Christmas in 2019. It’s a great read, but it hides any actual computation by merely mentioning the mystical “do calculus”. Pearl offers details on that in his textbook [3], but I want get up to speed faster, so I’ve been reading and watching other treatments [4] [5] [6] [7] [8] [9] [10], which most often are either “slow” or quickly throw up what I call the “Wall of Math.”¹

In this post, we’re going to catapult straight over that wall — starting immediately with code and calculation, introducing definitions and proofs only when we actually need them.

Case Study: Gear Acquisition Syndrome (GAS)

Causal modeling introductions invariably choose some medical intervention as an illustrative example. Since my specialty is teaching audio engineers [11], we’ll discuss the sad affliction known as Gear Acquisition Syndrome (GAS).

GAS is a debilitating condition affecting millions of musicians worldwide. It is the (false) cognitive bias in which the musician thinks that buying gear will make them more talented, and therefore result in better sound.²

Clarification

GAS is an in-joke among musicians. It is a fake condition that highlights our real tendency to delude ourselves into justifying “unnecessary” purchases. (And yes, I can tell you exactly what gear I “need” next.)

While better gear can lead to better sound, we will explore using causal modeling whether gear actually has any impact on talent.

Let’s Get Coding!

Typically, causal inference tutorials delay the introduction of code until after time scale for proton decay, but this one is different. So get ready, we are launching the catapult – with us inside.

Defining and Visualizing Graphs

First we need to be able to define and take a look at the graphs that will comprise our causal models. We’ll set up the graph with Python (nodes, edges, directions) then push it to the viz routine so we can see it and move it around. I can’t articulate my intended applications, but based on my interests, I’m use a few libraries for this:

NetworkX for general graph bookkeeping (PyG is overkill for today.)
D3.js for interactive visualizations with force-directed nodes.
Pandas for handling tables of data
PyTorch for other sundry numerical calculations, since ultimately I want to do “machine learning stuff”

We won’t use causal inference-specific libraries like CausalML or EconML, …, etc. because I suspect they would likely hide and/or automate the bits I hope to learn to implement.

Note

I will often hide the code via folds to preserve the narrative unless there’s some code that I really want you to see.

Installing Packages

!uv pip install networkx matplotlib torch torchvision --quiet

Let’s define a small graph or two with NetworkX:

import networkx as nx
import copy

# Gear Acquisition Syndrome
G = nx.DiGraph()
G.add_node("Gear",   role="treatment")
G.add_node("Sound",  role="outcome")
G.add_edges_from([("Gear", "Sound"),])

# "Reality"
G2 = copy.deepcopy(G)
G2.add_node("Talent", role="confounder")
G2.add_edges_from([
    ("Talent",   "Sound"),
    ("Talent",   "Gear",),
])

And then visualize with D3.js – try dragging the nodes around!

Visualization utility code — feel free to ignore

# Did I let Claude generate this code cell? ABSOLUTELY
import json, itertools
from IPython.display import HTML

ROLE_COLORS = {
    "treatment":  "#c0392b",  # dark red
    "outcome":    "#1e8449",  # dark green
    "confounder": "#6c3483",  # dark purple
    "default":    "#2471a3",  # dark blue
}
_dag_counter = itertools.count()

def show_dag(G, width=600, height=300, node_size=34, font_size=13, title=""):
    if not isinstance(G, list):
        G, title = [G], [title]
    elif not isinstance(title, list):
        title = [title] * len(G)

    specs = []
    divs  = []
    for g, t in zip(G, title):
        nodes = [{"id": str(n), "color": ROLE_COLORS.get(d.get("role"), ROLE_COLORS["default"])}
                 for n, d in g.nodes(data=True)]
        links = [{"source": str(u), "target": str(v), "label": d.get("label", "")}
                 for u, v, d in g.edges(data=True)]
        uid = f"dag{next(_dag_counter)}"
        specs.append({"uid": uid, "nodes": nodes, "links": links,
                      "title": t, "titleOffset": 28 if t else 0})
        divs.append(f'<div id="{uid}" style="width:{width}px;height:{height}px;'
                    f'border:1px solid #444;border-radius:6px;background:#1a1a2e;display:inline-block;"></div>')

    specs_json = json.dumps(specs)
    container  = '<div style="display:flex;gap:12px;flex-wrap:wrap;">' + "".join(divs) + '</div>'

    script = f"""
<script type="module">
import * as d3 from "https://cdn.jsdelivr.net/npm/d3@7/+esm";

function initGraphs() {{
  const specs = {specs_json};
  const W = {width}, H = {height}, R = {node_size}, FS = {font_size};
  console.log("initGraphs: found", specs.length, "specs");

  for (const spec of specs) {{
    const {{uid, nodes, links, title, titleOffset}} = spec;
    const el = document.getElementById(uid);
    console.log("Looking for #" + uid + ":", el);
    if (!el) continue;

    const w = W, h = H, r = R, fs = FS;
    const svg = d3.select(el).append("svg").attr("width", w).attr("height", h);

    if (title) {{
      svg.append("text").attr("x", w/2).attr("y", 30)  // title placement
        .attr("text-anchor", "middle").attr("fill", "#ccc")
        .attr("font-size", "18px").attr("font-family", "sans-serif").attr("font-weight", "bold")
        .text(title);
    }}

    svg.append("defs").append("marker")
      .attr("id", uid + "-arrow").attr("viewBox", "0 -5 10 10")
      .attr("refX", 10).attr("refY", 0).attr("markerWidth", 7).attr("markerHeight", 7).attr("orient", "auto")
      .append("path").attr("d", "M0,-5L10,0L0,5").attr("fill", "#aaa");

    const linkData = links.map(l => ({{...l}}));
    const nodeData = nodes.map(n => ({{...n}}));

    const sim = d3.forceSimulation(nodeData)
      .force("link", d3.forceLink(linkData).id(d => d.id).distance(140))
      .force("charge", d3.forceManyBody().strength(-500))
      .force("center", d3.forceCenter(w/2, titleOffset/2 + h/2));

    const link = svg.append("g").selectAll("line").data(linkData).join("line")
      .attr("stroke", "#aaa").attr("stroke-width", 2)
      .attr("marker-end", "url(#" + uid + "-arrow)");

    const edgeLabels = svg.append("g").selectAll("text").data(linkData.filter(d => d.label)).join("text")
      .attr("text-anchor", "middle").attr("fill", "#ffdd57")
      .attr("font-size", "16px").attr("font-family", "sans-serif").attr("font-weight", "bold")
      .text(d => d.label);

    const node = svg.append("g").selectAll("g").data(nodeData).join("g")
      .call(d3.drag()
        .on("start", (e,d) => {{ if (!e.active) sim.alphaTarget(0.3).restart(); d.fx=d.x; d.fy=d.y; }})
        .on("drag",  (e,d) => {{ d.fx=e.x; d.fy=e.y; }})
        .on("end",   (e,d) => {{ if (!e.active) sim.alphaTarget(0); d.fx=null; d.fy=null; }}));

    node.append("circle").attr("r", r).attr("fill", d => d.color).attr("stroke", "#fff").attr("stroke-width", 1.5);

    node.each(function(d) {{
      const lines = d.id.split("\\n");
      const txt = d3.select(this).append("text").attr("text-anchor", "middle")
        .attr("fill", "white").attr("font-size", fs+"px").attr("font-family", "sans-serif");
      const lh = fs * 1.2, yStart = -((lines.length - 1) * lh) / 2;
      lines.forEach((line, i) => {{
        txt.append("tspan").attr("x", 0).attr("y", yStart + i*lh).attr("dy", "0.35em").text(line);
      }});
    }});

    sim.on("tick", () => {{
      link.each(function(d) {{
        const dx = d.target.x-d.source.x, dy = d.target.y-d.source.y;
        const dist = Math.sqrt(dx*dx+dy*dy) || 1, ux = dx/dist, uy = dy/dist;
        d3.select(this)
          .attr("x1", d.source.x+ux*r).attr("y1", d.source.y+uy*r)
          .attr("x2", d.target.x-ux*r).attr("y2", d.target.y-uy*r);
      }});
      edgeLabels
        .attr("x", d => (d.source.x+d.target.x)/2)
        .attr("y", d => (d.source.y+d.target.y)/2 - 8);
      node.attr("transform", d => `translate(${{d.x}},${{d.y}})`);
    }});
  }}
}}

setTimeout(initGraphs, 100);
</script>
"""
    return HTML(container + script)

show_dag([G, G2], width=330, height=320, title=["Gear Acquisition Syndrome", "Reality"])

We’ll regard Sound as the “outcome”, the thing we measure via listening tests where listeners rate their preferences. We could imagine listeners submitting ranking scores and averaging them into a Mean Opinion Score (MOS), but for now it suffices to imagine (binary) A/B Testing for user preferences. Following causal inference conventions, we’ll denote the outcome as \(Y\) and use binary values:

\(Y=1\): Sounds Great
\(Y=0\): Sounds Bad (Technical term: “like ass”)

But Wait – How Do You Measure Talent?!

We’re not going to be measuring Talent. Rather, we’re going to explore how allowing a causal link between Gear and Talent changes the expected statistics of Sound, and this will help us distinguish whether the GAS picture or the Reality picture best fits the data.

Sample Data

I made a fictitious 16-person dataset that mixes famous artists with anonymous randos. The naive/GAS view looks only at the correlation between Gear and Sound:

Get dataset and display naive view

import pandas as pd

df = pd.read_csv('player_data.csv')

COLORS = {
    'Talent': {'High': '#4477AA', 'Low':  '#CC6600'},   # blue / orange
    'Gear':   {'Fancy': '#AA3377', 'Cheap': '#CC9900'}, # purple / yellow  
    'Sound':  {'Great': '#228833', 'Bad':  '#BB4455'},  # green / red-pink
}

def style_col(val, col):
    c = COLORS[col].get(val, '')
    return f'background-color: {c}; color: #EEEEEE' if c else ''

(df.drop(columns=['Talent']).style
   .applymap(style_col, subset=['Gear'],  col='Gear')
   .applymap(style_col, subset=['Sound'], col='Sound'))

	Name	Gear	Sound
0	Tom Scholz	Fancy	Great
1	Jeff Beck	Fancy	Bad
2	Jimi	Fancy	Great
3	Prince	Fancy	Great
4	Jenny	Fancy	Great
5	Jay	Fancy	Great
6	Bill	Fancy	Bad
7	Paris Hilton	Fancy	Bad
8	Milli Vanilli	Cheap	Great
9	Bob Dylan	Cheap	Bad
10	Dale	Cheap	Great
11	Todd	Cheap	Bad
12	Bubba	Cheap	Bad
13	Abe	Cheap	Bad
14	Earnest	Cheap	Bad
15	Mongo	Cheap	Bad

According this view, \(P(\rm{Sound=Great | Gear=Fancy})\) = 5/8 = 0.625. => Better gear probably makes you sound better!

But wait, if we include the confounding effect of Talent:

Code

(df.style
   .applymap(style_col, subset=['Talent'], col='Talent')
   .applymap(style_col, subset=['Gear'],   col='Gear')
   .applymap(style_col, subset=['Sound'],  col='Sound'))

	Name	Talent	Gear	Sound
0	Tom Scholz	High	Fancy	Great
1	Jeff Beck	High	Fancy	Bad
2	Jimi	High	Fancy	Great
3	Prince	High	Fancy	Great
4	Jenny	High	Fancy	Great
5	Jay	High	Fancy	Great
6	Bill	High	Fancy	Bad
7	Paris Hilton	Low	Fancy	Bad
8	Milli Vanilli	Low	Cheap	Great
9	Bob Dylan	High	Cheap	Bad
10	Dale	Low	Cheap	Great
11	Todd	Low	Cheap	Bad
12	Bubba	Low	Cheap	Bad
13	Abe	Low	Cheap	Bad
14	Earnest	Low	Cheap	Bad
15	Mongo	Low	Cheap	Bad

The Key Question as Math

In terms of causal inference., we want to compute the probability \(P({\rm Sound=Great} | \rm{do(Gear=Fancy}))\), i.e., will the sound will improve if we “intervene” by upgrading the gear?

To do that, we need to “marginalize” over Talent (i.e. do a weighted sum over all subsets):

\[\begin{align} P(\text{Sound=Great} \mid do(\text{Gear=Fancy})) &= P(\text{Sound=Great} \mid \text{Gear=Fancy, Talent=High}) \cdot P(\text{Talent=High}) \\ &+ P(\text{Sound=Great} \mid \text{Gear=Fancy, Talent=Low}) \cdot P(\text{Talent=Low}) \end{align}\] This is called the “adjustment formula”, specifically the “backdoor adjustment formula”.

That’s getting a little cumbersome, so let’s abbreviate our notation:

\[P(Y=1 \mid do(X=1)) = P(Y=1 \mid X=1, T=1) \cdot P(T=1) + P(Y=1 \mid X=1, T=0) \cdot P(T=0)\]

where \(X\) = Gear (1=Fancy), \(Y\) = Sound (1=Great), \(T\) = Talent (1=High). In code, that can look like:

# 1. Compute prior probabilities for Talent
p_talent_high = (df["Talent"] == "High").mean()
p_talent_low  = (df["Talent"] == "Low").mean()

# 2. Compute conditional probabilities for Sound given Gear=Fancy and Talent
subset_high = df.query("Gear == 'Fancy' and Talent == 'High'")
subset_low  = df.query("Gear == 'Fancy' and Talent == 'Low'")

p_sound_given_high = (subset_high["Sound"] == "Great").mean()
p_sound_given_low  = (subset_low["Sound"]  == "Great").mean()

# 3. Combine using your formula
p_do_gear_fancy = (p_sound_given_high * p_talent_high) + (p_sound_given_low * p_talent_low)

p_naive = (df[df["Gear"] == "Fancy"]["Sound"] == "Great").mean()
print("Naive estimate =", p_naive)

result = p_do_gear_fancy
print(f"Adjusted = {result:.3f}")

Naive estimate = 0.625
Adjusted = 0.357

But we can compute this much more efficiently via “advanced” pandas methods. Let’s make a fast, general routine that we can use elsewhere:

def backdoor_adjustment(data, outcome_col, treatment_col, confounder_col, outcome_val, treatment_val):
    """
    Computes P(Outcome = outcome_val | do(Treatment = treatment_val)) 
    by adjusting for a single confounding variable.
    """
    # 1. Calculate P(Confounder) for all categories
    p_confounder = data[confounder_col].value_counts(normalize=True)
    
    # 2. Filter data for the specific treatment strategy
    treatment_subset = data[data[treatment_col] == treatment_val]
    
    # 3. Calculate P(Outcome = outcome_val | Treatment, Confounder) for all categories
    p_outcome_given_confounder = (
        treatment_subset[outcome_col].eq(outcome_val)
        .groupby(treatment_subset[confounder_col])
        .mean() )
    
    # 4. Align indices, multiply, and sum (the adjustment formula)
    return (p_outcome_given_confounder * p_confounder).sum()
    
p_naive = (df[df["Gear"] == "Fancy"]["Sound"] == "Great").mean()
print("Naive estimate =", p_naive)

result = backdoor_adjustment(df, "Sound", "Gear", "Talent", "Great", "Fancy")
print(f"Adjusted = {result:.3f}")

Naive estimate = 0.625
Adjusted = 0.357

Work in Progress

Still writing! Come back for more later.

Advanced Topics

Code

G3 = nx.DiGraph()
G3.add_node("Talent",      role="default")
G3.add_node("Performance", role="confounder")
G3.add_node("Gear",        role="default")
G3.add_node("Recording",   role="confounder")
G3.add_node("Signal\nProc", role="treatment")
G3.add_node("Sound",       role="outcome")

G3.add_edges_from([
    ("Talent",      "Performance"),
    ("Performance", "Recording"),
    ("Gear",        "Recording"),
    ("Recording",   "Signal\nProc"),
    ("Signal\nProc", "Sound"),
])

show_dag(G3, width=800, height=600, title="Why does my recording sound like ass?")

References

[1]

F. Huszár, “ML beyond curve fitting: An intro to causal inference and do-calculus.” https://www.inference.vc/untitled/, May 2018.

[2]

J. Pearl and D. Mackenzie, The Book of Why: The new science of cause and effect. New York: Basic Books, 2018.

[3]

J. Pearl, Causality: Models, reasoning, and inference, 2nd ed. Cambridge, UK: Cambridge University Press, 2009.

[4]

R. R. Tucci, “Introduction to Judea Pearl’s do-calculus,” CoRR, vol. abs/1305.5506, 2013, Available: http://arxiv.org/abs/1305.5506

[5]

B. Neal, Introduction to Causal Inference (from a machine learning perspective). 2020. Available: https://www.bradyneal.com/causal-inference-course

[6]

M. Facure, Causal inference in Python. O’Reilly Media, 2023. Available: https://www.oreilly.com/library/view/causal-inference-in/9781098140243/

[7]

A. Ghosh, A. Roy, and D. Herremans, “KARMA-MV: A benchmark for causal question answering on music videos.” 2026. Available: https://arxiv.org/abs/2605.08175

[8]

A. (Ari) Bornstein, “3 PyTorch lightning community causal inference examples to inspire your next project.” https://devblog.pytorchlightning.ai/3-pytorch-lightning-community-causal-inference-examples-to-inspire-your-next-project-da6f45c07b70, Mar. 16, 2021.

[9]

J. Krohn, “SDS 909: Causal AI, with dr. Robert osazuwa ness.” https://www.superdatascience.com/podcast/sds-909-causal-ai-with-dr-robert-usazuwa-ness, Jul. 29, 2025.

[10]

B. J. Koch, T. Sainburg, P. G. Bastías, S. Jiang, Y. Sun, and J. G. Foster, “A primer on deep learning for causal inference,” Sociological Methods & Research, vol. 54, no. 2, pp. 397–447, May 2025, doi: 10.1177/00491241241234866.

[11]

“Bayesian analysis for audio engineers.” invited lecture to Meeting of the Belmont Student Section of the Audio Engineering Society, 2014. Available: https://hedges.belmont.edu/BayesianAnalysisPartI.pdf

2026 Scott H. Hawley

Footnotes

Specifically: a Slough of Despond that front-loads definitions and jargon before you can do anything useful. In physics education I call that the “Chapter 2 Math Dump”; in ML it’s the “Wall of Math”. For causal inference it’s not even equations, it’s seemingly-endless terminology that induces cognitive load while you’re anxiously wondering “But how can I calculate something?” These treatments also invariably claim to assume mere “basic familiarity with statistics” yet invoke laws I’ve never heard of. Seems “basic” is relative.↩︎
Other definitions of GAS – e.g., “buying more gear will help me make better music” – are isomorphic to the one we have adopted. Predatory companies like Sweetwater and Guitar Center capitalize on the pathetic delusions of desperate losers and our lust for shiny toys. /s↩︎