Basic tutorial
This tutorial demonstrates the usage of the sweights package.
We will first cook up a toy model with a discriminant variable (invariant mass) and a control variable (decay time) and use it to generate some toy data.
Then we will fit the invariant mass to obtain estimates of the component pdfs and yields, and use these to compute weights which project out the signal-only component in the decay time.
We will demonstrate both the classic sWeights and the Custom Orthogonal Weight functions (COWs) method. See arXiv:2112.04575 for more details.
Finally, we will fit the weighted decay time distribution and correct the covariance matrix according to the description in arXiv:1911.01303.
[1]:
from types import SimpleNamespace
# external requirements
import numpy as np
from scipy.stats import norm, expon
import matplotlib.pyplot as plt
from iminuit import Minuit
from iminuit.cost import ExtendedUnbinnedNLL
# from this package
from sweights import SWeight # for classic sweights
from sweights import Cow # for custom orthogonal weight functions
from sweights import cov_correct, approx_cov_correct # for covariance corrections
from sweights.testing import make_classic_toy # to generate a toy dataset
from sweights.util import plot_binned, make_weighted_negative_log_likelihood
Toy model and toy data
We generate an m-distribution (the discriminant variable) and an independent t-distribution (the control variable). In particle physics, m is typically the invariant mass distribution of decay candidates, and t the decay time distribution of these candidates. But any other two variables can be used, as long as they are independent in the pure signal and in the pure background.
[2]:
# make a toy model
true_yield = SimpleNamespace()
true_yield.s = 1000
true_yield.b = 1000
# mass
mrange = (0, 1)
m_truth = SimpleNamespace()
m_truth.mu = 0.5
m_truth.sigma = 0.1
m_truth.slope = 1
# time
trange = (0, 1.5)
t_truth = SimpleNamespace()
t_truth.slope = 0.2
t_truth.mu = 0.1
t_truth.sigma = 1.0
toy = make_classic_toy(
    1,
    s=true_yield.s,
    b=true_yield.b,
    mrange=mrange,
    trange=trange,
    ms_mu=m_truth.mu,
    ms_sigma=m_truth.sigma,
    mb_mu=m_truth.slope,
    ts_mu=t_truth.slope,
    tb_mu=t_truth.mu,
    tb_sigma=t_truth.sigma,
)
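The independence requirement can be probed on the data itself: in a region dominated by a single component, the rank correlation between m and t should be compatible with zero. A rough sketch with scipy.stats.kendalltau; the sideband cut m > 0.8 is an arbitrary choice for this toy, not part of the tutorial:

from scipy.stats import kendalltau

# crude factorization check: the sideband m > 0.8 is background-dominated,
# so m and t should be roughly uncorrelated there if the background factorizes
sideband = toy[0] > 0.8
tau, pvalue = kendalltau(toy[0][sideband], toy[1][sideband])
print(f"Kendall tau in sideband: {tau:.3f} (p-value {pvalue:.2f})")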
Make pdfs, plot ground truth and toy data
[3]:
# m-density for fitting and plotting (not normalized)
def m_density(x, s, b, mu, sigma, slope):
    ds = norm(mu, sigma)
    snorm = np.diff(ds.cdf(mrange))
    db = expon(mrange[0], slope)
    bnorm = np.diff(db.cdf(mrange))
    return s / snorm * ds.pdf(x) + b / bnorm * db.pdf(x)
# t-density for fitting and plotting (not normalized)
def t_density(x, s, b, slope, mu, sigma):
    ds = expon(trange[0], slope)
    snorm = np.diff(ds.cdf(trange))
    db = norm(mu, sigma)
    bnorm = np.diff(db.cdf(trange))
    return s / snorm * ds.pdf(x) + b / bnorm * db.pdf(x)
def plot(toy, bins=50, npoint=400, draw_pdf=True):
    fig, ax = plt.subplots(1, 2, figsize=(12, 4))
    plt.sca(ax[0])
    plot_binned(toy[0], bins=bins, range=mrange, color="k")
    plt.sca(ax[1])
    plot_binned(toy[1], bins=bins, range=trange, color="k")
    if draw_pdf:
        m = np.linspace(*mrange, npoint)
        mnorm = (mrange[1] - mrange[0]) / bins
        par = m_truth.mu, m_truth.sigma, m_truth.slope
        s = m_density(m, true_yield.s, 0, *par)
        b = m_density(m, 0, true_yield.b, *par)
        p = s + b
        ax[0].plot(m, mnorm * s, "C0--", label="signal")
        ax[0].plot(m, mnorm * b, "C1:", label="background")
        ax[0].plot(m, mnorm * p, "k-", label="total")
        t = np.linspace(*trange, npoint)
        tnorm = (trange[1] - trange[0]) / bins
        par = t_truth.slope, t_truth.mu, t_truth.sigma
        s = t_density(t, true_yield.s, 0, *par)
        b = t_density(t, 0, true_yield.b, *par)
        p = s + b
        ax[1].plot(t, tnorm * s, "C0--", label="signal")
        ax[1].plot(t, tnorm * b, "C1:", label="background")
        ax[1].plot(t, tnorm * p, "k-", label="total")
    ax[0].set_xlabel("m")
    ax[0].set_ylim(bottom=0)
    ax[0].legend()
    ax[1].set_xlabel("t")
    ax[1].set_ylim(bottom=0)
    ax[1].legend()
    fig.tight_layout()
plot(toy)
[Figure: toy m and t distributions with the true signal, background, and total pdfs overlaid]
Fit toy data in the m-variable
This provides us with estimates for the component shapes and the component yields, which we need to apply the sWeight method.
We use an extended unbinned maximum-likelihood fit here, but an extended binned maximum-likelihood fit would work as well. An ordinary (non-extended) fit with a normalized model pdf would also work, but then the yields would have to be derived from the fitted signal fraction and the sample size.
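For illustration, here is a minimal sketch of that ordinary-fit variant, assuming iminuit's UnbinnedNLL cost; the fraction parameter f and the name mi_frac are introduced here for the example only:

from iminuit.cost import UnbinnedNLL

# normalized mixture pdf with a signal fraction f instead of yields
def m_pdf(x, f, mu, sigma, slope):
    return m_density(x, f, 1 - f, mu, sigma, slope)

mi_frac = Minuit(UnbinnedNLL(toy[0], m_pdf), f=0.5, mu=0.5, sigma=0.1, slope=1)
mi_frac.limits["f"] = (0, 1)
mi_frac.limits["sigma"] = (0, 1)
mi_frac.limits["slope"] = (0, 50)
mi_frac.migrad()

# the yields follow from the fraction and the sample size; their
# uncertainties would need error propagation from the error on f
n = len(toy[0])
s_est, b_est = mi_frac.values["f"] * n, (1 - mi_frac.values["f"]) * n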
[4]:
# m-model for an extended maximum-likelihood fit, must return...
# - integral as first argument
# - density as second argument
# see iminuit documentation for more information
def m_model(x, s, b, mu, sigma, slope):
    return (s + b, m_density(x, s, b, mu, sigma, slope))
mi = Minuit(
    ExtendedUnbinnedNLL(toy[0], m_model),
    s=true_yield.s,
    b=true_yield.b,
    mu=m_truth.mu,
    sigma=m_truth.sigma,
    slope=m_truth.slope,
)
mi.limits["s", "b"] = (0, true_yield.s + true_yield.b)
mi.limits["mu"] = mrange
mi.limits["sigma"] = (0, mrange[1] - mrange[0])
mi.limits["slope"] = (0, 50)
mi.migrad()
mi.hesse()
[4]:
| Migrad | |
|---|---|
| FCN = -2.696e+04 | Nfcn = 142 |
| EDM = 1.99e-05 (Goal: 0.0002) | |
| Valid Minimum | Below EDM threshold (goal x 10) |
| No parameters at limit | Below call limit |
| Hesse ok | Covariance accurate |

| | Name | Value | Hesse Error | Minos Error- | Minos Error+ | Limit- | Limit+ | Fixed |
|---|---|---|---|---|---|---|---|---|
| 0 | s | 960 | 50 | | | 0 | 2E+03 | |
| 1 | b | 1.02e3 | 0.05e3 | | | 0 | 2E+03 | |
| 2 | mu | 0.494 | 0.005 | | | 0 | 1 | |
| 3 | sigma | 0.097 | 0.004 | | | 0 | 1 | |
| 4 | slope | 1.14 | 0.16 | | | 0 | 50 | |

| | s | b | mu | sigma | slope |
|---|---|---|---|---|---|
| s | 2.13e+03 | -1.2e3 (-0.540) | -12.228e-3 (-0.058) | 91.430e-3 (0.458) | -1.037 (-0.144) |
| b | -1.2e3 (-0.540) | 2.18e+03 | 12.225e-3 (0.058) | -91.418e-3 (-0.453) | 1.037 (0.142) |
| mu | -12.228e-3 (-0.058) | 12.225e-3 (0.058) | 2.06e-05 | -0.002e-3 (-0.086) | -0.155e-3 (-0.219) |
| sigma | 91.430e-3 (0.458) | -91.418e-3 (-0.453) | -0.002e-3 (-0.086) | 1.87e-05 | -0.070e-3 (-0.104) |
| slope | -1.037 (-0.144) | 1.037 (0.142) | -0.155e-3 (-0.219) | -0.070e-3 (-0.104) | 0.0244 |
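The strong negative correlation between the yields s and b visible above is typical for overlapping components. It can be read off programmatically; a small sketch using iminuit's label-based matrix indexing:

# correlation of the fitted yields, computed from the covariance matrix
cov_sb = mi.covariance["s", "b"]
corr_sb = cov_sb / (mi.errors["s"] * mi.errors["b"])
print(f"correlation(s, b) = {corr_sb:.2f}")  # about -0.54 for this toy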
Compute sWeights
We first create estimated pdfs for the signal and background components. Then we create the SWeight object from the m-values, these pdfs, and the fitted yields.
Note that this runs much faster with verbose=False and checks=False.
[5]:
def spdf(m):
    return m_density(m, 1, 0, *mi.values[2:])

def bpdf(m):
    return m_density(m, 0, 1, *mi.values[2:])

sweight = SWeight(
    toy[0],
    [spdf, bpdf],
    [mi.values["s"], mi.values["b"]],
    [mrange],
    method="summation",
    compnames=("sig", "bkg"),
    verbose=True,
    checks=True,
)
Initialising sweight with the summation method:
PDF normalisations:
0 1.0000000000000002
1 1.0000000000000002
W-matrix:
[[0.00072638 0.00029386]
[0.00029386 0.00070361]]
A-matrix:
[[1656.59790131 -691.86886983]
[-691.86886983 1710.19865477]]
Integral of w*pdf matrix (should be close to the identity):
[[9.99871623e-01 2.30160489e-05]
[9.35941868e-05 1.00003883e+00]]
Check of weight sums (should match yields):
Component | sWeightSum | Yield | Diff |
---------------------------------------------------
0 | 964.8295 | 964.8295 | 0.00% |
1 | 1018.1999 | 1018.1999 | 0.00% |
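A quick sanity check, not part of the output above: for classic sWeights the component weights of each event add up to one (this does not generally hold for COWs with an arbitrary variance function):

# classic sWeights: signal weight + background weight = 1 for every event
wsum = sweight.get_weight(0, toy[0]) + sweight.get_weight(1, toy[0])
print("component weights sum to one:", np.allclose(wsum, 1))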
Construct the COW
COWs are constructed with an arbitrary function in the numerator, which we call the variance function \(I(m)\). There is an optimal choice which minimizes the variance of the weights: when the signal and background pdfs factorize, it is \(I(m) \propto g(m)\), where \(g(m)\) is the estimated total density in the discriminant variable m. A choice close to optimal is to replace \(g(m)\) with a histogram of the data, which has the advantage that \(g(m)\) does not have to be constructed explicitly.
But simply using \(I(m) \propto 1\) gives almost equivalent results in this case. To use a constant density for \(I(m)\), we pass Im=None to the Cow object.
[6]:
# unity:
Im = None

# sweight equivalent:
# def Im(m):
#     return m_density(m, *mi.values) / (mi.values["s"] + mi.values["b"])

# histogram:
# Im = np.histogram(toy[0], range=mrange)

# make the cow
cow = Cow(mrange, spdf, bpdf, Im, verbose=True)
Initialising COW:
W-matrix:
[[2.91611955 0.97740504]
[0.97740504 1.06300333]]
A-matrix:
[[ 0.4956826 -0.45576779]
[-0.45576779 1.35979793]]
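One can cross-check how much the choice of \(I(m)\) matters by building a second Cow with the histogram-based variance function from the commented lines above and comparing the weights. A sketch, assuming Cow accepts the histogram tuple as those comments suggest:

# COW with a histogram-based variance function, for comparison
Im_hist = np.histogram(toy[0], range=mrange)
cow_hist = Cow(mrange, spdf, bpdf, Im_hist, verbose=False)
x = np.linspace(*mrange, 400)
diff = cow_hist.get_weight(0, x) - cow.get_weight(0, x)
print("largest difference in signal weights:", np.abs(diff).max())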
Comparison of the sWeights and COW methods
We first compare the weight distributions.
[7]:
def plot_weights(x, sw, bw):
    plt.figure()
    plt.plot(x, sw, "C0--", label="signal")
    plt.plot(x, bw, "C1:", label="background")
    plt.plot(x, sw + bw, "k-", label="sum")
    plt.xlabel("mass")
    plt.ylabel("weight")
    plt.legend()
    plt.tight_layout()

for method, weighter in zip(["sWeights", "COWs"], [sweight, cow]):
    x = np.linspace(*mrange, 400)
    swp = weighter.get_weight(0, x)
    bwp = weighter.get_weight(1, x)
    plot_weights(x, swp, bwp)
    plt.title(method)
[Figure: sWeights signal, background, and summed weights vs. mass]
[Figure: COWs signal, background, and summed weights vs. mass]
The weights are not identical, because in this case we used a simple (non-optimal) variance function for the COWs.
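The practical cost of a non-optimal variance function is a larger spread of the weights, which inflates the statistical uncertainty of weighted quantities. A small sketch that quantifies this via the Kish effective sample size (a standard measure, not part of the package):

# compare the spread of the signal weights for the two methods
for method, weighter in (("sWeights", sweight), ("COWs", cow)):
    wsig = weighter.get_weight(0, toy[0])
    neff = wsig.sum() ** 2 / (wsig**2).sum()  # Kish effective sample size
    print(f"{method:9}: sum w = {wsig.sum():7.1f}, effective size = {neff:7.1f}")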
Fit weighted t-distribution
[8]:
# signal pdf in t-domain
def t_signal_pdf(x, slope):
    return t_density(x, 1, 0, slope, 0, 1)

fitted_slopes = []
for method, weighter in (("sWeights", sweight), ("COWs", cow)):
    # get signal weights
    w = weighter(toy[0])
    # do the minimisation
    tmi = Minuit(
        make_weighted_negative_log_likelihood(toy[1], w, t_signal_pdf),
        slope=t_truth.slope,
    )
    tmi.limits["slope"] = (0, 10)
    tmi.migrad()
    tmi.hesse()
    # store the fitted slope for the plots below
    fitted_slopes.append(tmi.values["slope"])
    # first-order covariance correction
    ncov = approx_cov_correct(
        t_signal_pdf, toy[1], w, tmi.values, tmi.covariance, verbose=False
    )
    # second-order covariance correction
    hs = t_signal_pdf
    W = weighter.Wkl
    # the signal weight as a function of the W-matrix elements and the
    # component densities gs and gb; the derivatives below can be done
    # numerically, but for the sweights / COW case it is straightforward
    # to compute them analytically from this expression
    ws = lambda Wss, Wsb, Wbb, gs, gb: (Wbb * gs - Wsb * gb) / (
        (Wbb - Wsb) * gs + (Wss - Wsb) * gb
    )
    dws_Wss = (
        lambda Wss, Wsb, Wbb, gs, gb: gb
        * (Wsb * gb - Wbb * gs)
        / (-Wss * gb + Wsb * gs + Wsb * gb - Wbb * gs) ** 2
    )
    dws_Wsb = (
        lambda Wss, Wsb, Wbb, gs, gb: (Wbb * gs**2 - Wss * gb**2)
        / (Wss * gb - Wsb * gs - Wsb * gb + Wbb * gs) ** 2
    )
    dws_Wbb = (
        lambda Wss, Wsb, Wbb, gs, gb: gs
        * (Wss * gb - Wsb * gs)
        / (-Wss * gb + Wsb * gs + Wsb * gb - Wbb * gs) ** 2
    )
    tcov = cov_correct(
        hs,
        [spdf, bpdf],
        toy[1],
        toy[0],
        w,
        [mi.values["s"], mi.values["b"]],
        tmi.values,
        tmi.covariance,
        [dws_Wss, dws_Wsb, dws_Wbb],
        [W[0, 0], W[0, 1], W[1, 1]],
        verbose=False,
    )
    print(
        f"{method:9}: "
        f"naive {tmi.values[0]:.3f} +/- {tmi.errors[0]:.3f}, "
        f"corrected {tmi.values[0]:.3f} +/- {tcov[0, 0] ** 0.5:.3f}"
    )
sWeights : naive 0.202 +/- 0.007, corrected 0.202 +/- 0.017
COWs : naive 0.206 +/- 0.007, corrected 0.206 +/- 0.016
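The comment in the cell above notes that the derivatives can also be obtained numerically. A sketch of that route, using central finite differences on the same weight expression; the helpers ws_func and numeric_dws are hypothetical, not part of the package:

# signal weight as a function of the W-matrix elements (as above)
def ws_func(Wss, Wsb, Wbb, gs, gb):
    return (Wbb * gs - Wsb * gb) / ((Wbb - Wsb) * gs + (Wss - Wsb) * gb)

# central finite-difference partial derivative with respect to the i-th
# W-matrix element (0: Wss, 1: Wsb, 2: Wbb)
def numeric_dws(i, eps=1e-6):
    def deriv(Wss, Wsb, Wbb, gs, gb):
        args = [Wss, Wsb, Wbb]
        args[i] += eps
        up = ws_func(*args, gs, gb)
        args[i] -= 2 * eps
        down = ws_func(*args, gs, gb)
        return (up - down) / (2 * eps)
    return deriv

# drop-in replacement for the analytic list passed to cov_correct
dws_numeric = [numeric_dws(i) for i in range(3)]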
[9]:
### Plot the weighted decay distributions and the fit result
sws = sweight(toy[0])
scow = cow(toy[0])
bins = 50
t = np.linspace(*trange, 400)
for i, (w, method, slope) in enumerate(
    zip(
        (sws, scow),
        ("sWeights", "COWs"),
        fitted_slopes,
    )
):
    color = f"C{i}"
    plot_binned(
        toy[1],
        bins=bins,
        range=trange,
        weights=w,
        label=method,
        color=color,
    )
    tnorm = np.sum(w) * (trange[1] - trange[0]) / bins
    plt.plot(t, tnorm * t_signal_pdf(t, slope), color=color)
plt.legend()
plt.xlabel("Time")
plt.ylabel("Weighted Events");
[Figure: weighted decay-time distributions with the fitted signal pdfs for sWeights and COWs]
Write sWeights into a file
There are many ways to store the sweights for later use. In the Python world, you can just pickle the arrays. Or you can use Numpy’s npz format. A slightly more organized way is to use a Pandas data frame. We show that option.
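For reference, the npz route mentioned above is just as short; a sketch, with an arbitrary file name:

# store the weights as plain arrays with numpy
np.savez(
    "weights.npz",
    m=toy[0],
    t=toy[1],
    sweight_ws=sweight.get_weight(0, toy[0]),
    cow_ws=cow.get_weight(0, toy[0]),
)
loaded = np.load("weights.npz")  # access e.g. loaded["sweight_ws"]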
[10]:
import pandas as pd
df = pd.DataFrame()
df['m'] = toy[0]
df['t'] = toy[1]
df['sweight_ws'] = sweight.get_weight(0, toy[0])
df['sweight_wb'] = sweight.get_weight(1, toy[0])
df['cow_ws'] = cow.get_weight(0, toy[0])
df['cow_wb'] = cow.get_weight(1, toy[0])
df
[10]:
| | m | t | sweight_ws | sweight_wb | cow_ws | cow_wb |
|---|---|---|---|---|---|---|
| 0 | 0.927542 | 0.075108 | -0.678891 | 1.679019 | -0.303654 | 0.906145 |
| 1 | 0.468974 | 0.005580 | 1.217367 | -0.217423 | 1.523312 | -0.464189 |
| 2 | 0.449125 | 0.152979 | 1.180231 | -0.180283 | 1.374161 | -0.310642 |
| 3 | 0.558792 | 0.079876 | 1.171464 | -0.171515 | 1.213886 | -0.250459 |
| 4 | 0.400035 | 0.028505 | 0.992147 | 0.007819 | 0.793680 | 0.264917 |
| ... | ... | ... | ... | ... | ... | ... |
| 1978 | 0.930051 | 1.093816 | -0.678957 | 1.679085 | -0.302998 | 0.904166 |
| 1979 | 0.653689 | 0.081157 | 0.618199 | 0.381803 | 0.137224 | 0.670526 |
| 1980 | 0.382352 | 0.095900 | 0.881958 | 0.118018 | 0.560982 | 0.494386 |
| 1981 | 0.564565 | 1.170659 | 1.155841 | -0.155891 | 1.149193 | -0.195338 |
| 1982 | 0.718651 | 0.169675 | -0.085891 | 1.085961 | -0.226854 | 0.961263 |

1983 rows × 6 columns
Pandas data frames can be saved in multiple formats (see Pandas documentation). The feather format is particularly good.
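A sketch of the feather round trip (requires pyarrow; the file name is arbitrary):

# write and read back the table in the feather format
df.to_feather("weights.feather")
df_loaded = pd.read_feather("weights.feather")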
However, in the particle physics world, you probably want to save the table as a ROOT file. This is easy with uproot. It will convert the data frame into a ROOT TTree automatically.
[11]:
import uproot
with uproot.recreate('outf.root') as f:
    f['tree'] = df
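Reading the tree back works with uproot as well; a minimal sketch, assuming the file written above:

with uproot.open('outf.root') as f:
    arrays = f['tree'].arrays(library='np')  # one numpy array per branch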