API Reference

Quick Summary

Here are the main classes and methods:

`Cows`(sample, spdf, bpdf[, norm, range, ...])	Compute sWeights using COWs (new experimental API).
`SWeight`(data[, pdfs, yields, discvarranges, ...])	Compute sweights for a dataset given component pdfs (old API).
`Cow`(mrange, gs, gb[, Im, renorm, verbose])	Produce weights using COWs (old API).
`convert_rf_pdf`(pdf, obs, *[, npoints, ...])	Convert a RooFit RooAbsPdf into a vectorized Python callable.
`kendall_tau`(x, y)	Run Kendall's tau test of independence.
`cov_correct`(hs, gxs, hdata, gdata, weights, ...)	Perform a second order covariance correction for a fit to weighted data.
`approx_cov_correct`(pdf, data, wts, fvals, fcov)	Perform a first order covariance correction for a fit to weighted data.

COWs (experimental)

This is a new API for sWeights computation which will replace the current API when we switch to v2.0.0. The API in v1.x is experimental and may change from version to version. It will become stable from v2.x onward. If you need a stable API right now, please use the SWeight and Cow classes below or pin the version of the sweights package.

class sweights.Cows(sample: Buffer | _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes] | None, spdf: Density | Sequence[Density], bpdf: Density | Sequence[Density], norm: Density | Tuple[ndarray[tuple[int, ...], dtype[floating[Any]]], ndarray[tuple[int, ...], dtype[floating[Any]]]] | None = None, *, range: Tuple[float, float] | None = None, summation: bool | None = None, yields: Sequence[float] | None = None, bounds: Dict[Density, Dict[str, Tuple[float, float]]] = {}, starts: Dict[Density, Dict[str, float]] = {}, validation: FitValidation = FitValidation.GOF)

Compute sWeights using COWs (new experimental API).

This class replaces the old SWeight and Cow classes. It can compute classic sWeights and the generalized COWs. It automatically follows best practice based on the input that you give. When you use it wrong, it will complain.

After initialization, the class instance is the weight function for the signal. To compute sWeights, you need to pass the sample in the discriminant variable. This sample must not be cut in any way, and the sWeights cannot be multiplied with other weights, unless these weights are independent of both the discriminant and the control variables.

You can also get weight functions for each component, with the __getitem__ operator. You can iterate over the class instance to iterate over all weight functions. To get the number of weight functions, use len() on the instance.

Furthermore, the class has some useful attributes, pdfs, norm, and yields.

Initialize.

Parameters:

sample (array-like or None) – Sample over the discriminant variable. You should pass this to compute COWs via summation. If sample is None, COWs are computed with via integration. COWs may be computed via integration also if we cannot guarantee that summation works correctly, as summation requires that norm is an unbiased estimate of the the total density. You can overwrite the automated choice via summation.
spdf (callable or sequence of callable) – Signal PDF in the discriminant variable. Must accept an array-like argument and return an array. The PDF must be normalized over the sample range. This can also be a sequence of PDFs that comprise the signal.
bpdf (callable or sequence of callables) – Background PDF in the discriminant variable. Each must accept an array-like argument and return an array. The PDF must be normalized over the sample range. This can also be a sequence of PDFs that comprise the background.
norm (callable or tuple(array, array) or None, optional (default is None)) – The normalization function in the COWs formula. If a callable is provided, it must accept an array-like argument and return an array. Passing a callable deactivates the summation method even if sample is provided. You can override this with summation=True. If a tuple is provided, it must consist of two array, entries and bin edges of a histogram over the distriminant variable, in other words, the output of numpy.histogram. If this argument is None, the function is computed from the provided component PDFs and sample or yields.
range (tuple(float, float) or None, optional (default is None)) – Integration range to use if COWs are computed via integration. If range is None, and sample is not None, the range is computed from the sample.
summation (bool or None, optional (default is None)) – If this is None, use summation (only possible if sample is set) unless the user provides an external normalization function with norm, then we cannot guarantee that summation is possible. Setting this to True enforces summation and setting it to False enforces integration.
yields (sequence of float or None, optional (default is None)) – If this is not None and norm is None, compute the normalization function from the component PDFs and these yields. This can be used to override the otherwise internally computed yields. If the PDFs are parametric, this argument is instead used as starting values for the fit.
bounds (Dict[Density, Dict[str, Range]], optional (default is {})) – Allows to pass parameter bounds to the fitter for each density.
starts (Dict[Density, Dict[str, float]], optional (default is {})) – Allows to pass parameter starting values to the fitter for each density.
validation (FitValidation, optional (default: FitValidation.GOF)) – How to validate the internal fits, also see FitValidation. For getting optimal and unbiased weights, a linear combination of the component PDFs need to match the observed distribution. Using the summation method further requires that the norm function is an unbiased estimate of the observed distribution. If set to DISPLAY, the full internal fit result is shown (displayed in a Jupyter notebook or printed on the terminal). If set to PLOT, only the fitted curve is plotted. If set to GOF, a goodness-of-fit test is performed to detect bad fits and a warning is emitted if the test fails. If you want nothing of that, you can slightly speed up the computation by skipping all of that with NONE.

Examples

See Tutorials.

__call__(x: ndarray[tuple[int, ...], dtype[floating[Any]]]) → ndarray[tuple[int, ...], dtype[floating[Any]]]

Compute weights for the signal component.

Parameters:: x (array of float or None, optional (default is None)) – Where in the domain of the discriminant variable to compute the weights. If the user already provided the sample of the discriminant variable during initialization, you can leave this to None and weights are computed for that sample.

__getitem__(idx: int | str) → Callable[[ndarray[tuple[int, ...], dtype[floating[Any]]]], ndarray[tuple[int, ...], dtype[floating[Any]]]]

Return the weight function for the i-th component.

You can also get the weight function for the signal and background by passing the strings ‘s’ and ‘b’, respectively.

__len__() → int: Return number of components.

Classic sWeights

class sweights.SWeight(data, pdfs=[], yields=[], discvarranges=None, method='summation', compnames=None, alphas=None, rfobs=[], verbose=False, checks=True)

Compute sweights for a dataset given component pdfs (old API).

Initialize SWeight object.

This will compute the W and A (or alpha) matrices which are used to produce the weight functions. Evaluation of these functions on a dataset is done in a different function get_weight().

Parameters:

data (array) – The dataset in the discriminating variable, should have shape (nevents,ndiscvars) so for one discriminating variable will just be (N,) or (N,1).
pdfs (list of callable) – A list of the component pdfs. These should be simple python callable functions which are passed the same number of parameters as there are discriminating variables. If running with the ‘roofit’ method (see method) then this can be a list of ROOT.RooAbsPdf objects. If you only have your pdfs defined as RooFit objects then you can use the wrapper function convert_rf_pdf() to convert them into the appropriate object type for this call and then use other methods.
yields (list of float) – A list of the component yields.
discvarranges (list of tuple or tuple of tuple, optional) – The ranges for each discriminating variable, as a tuple or list of two element tuples specifying the lower and upper bound for the range (the default is None which will use minus to plus infinity)
method (str, optional) – The sweights method to use. Must be one of ‘summation’, ‘integration’, ‘subhess’, ‘tsplot’, ‘rootfit’. The recommended default is summation corresponding to Variant B of arXiv:2112.04574
compnames (list of str, optional) – A list of the component names. Only used for the legend entries when making a plot of the weight functions with make_weight_plot()
alphas (array, optional) – If using the ‘subhess’ method then the covariance matrix of a fit to the disciminanting variable(s) in which only the yields float must also be passed. This matrix is inverted to produce the W-matrix.
rfobs (list, optional) – If using the ‘roofit’ method then the discriminating variables (of type RooRealVar) on which the pdfs depend must also be passed
verbose (bool, optional (default is False)) – Print intermediate results.
checks (bool, optional) – Perform some checks that the derived weights are self consistent, in particular check that the sum of all component weights for a given value of the discriminating variable(s) is unity, and check that the sum of all weights for a given component reproduces the yields. This will print additional output to the screen and issue warnings if checks are not passed.

Notes

Many of the arguments passed and methods implemented are not strictly necessary. In particular the ‘tsplot’ and ‘roofit’ methods are just wrappers for implementations elsewhere that were originally used as cross-checks. In future versions these two methods should be removed to simplify this call.

COWs

class sweights.Cow(mrange: Tuple[float, float], gs: Density | Sequence[Density], gb: Density | Sequence[Density], Im: int | Density | Tuple[ndarray[tuple[int, ...], dtype[floating[Any]]], ndarray[tuple[int, ...], dtype[floating[Any]]]] | None = None, renorm: bool = True, verbose: bool = False, **deprecated: Any)

Produce weights using COWs (old API).

Initialize Cow object.

This will compute the W and A (or alpha) matrices which are used to produce the weight functions. Evaluation of these functions on a dataset is done in a different function get_weight().

Parameters:

mrange (tuple(float, float)) – Integration range in the discriminant variable.
gs (callable or sequence of callables) – Signal PDF in the discriminant variable. Must accept an array-like argument and return an array. This can also be a sequence of PDFs that comprise the signal.
gb (callable or sequence of callables) – Background PDF in the discriminant variable. Each must accept an array-like argument and return an array. This can also be a sequence of PDFs that comprise the background.
Im (callable or tuple(array-like, array-like) or None, optional) – The “variance function” in the COWs formula. An arbitrary density normalized over the integration range. If a callable is provided, it must accept an array-like argument and return an array. If a tuple is provide, it must consist of two array, entries and bin edges of a histogram over the distriminant variable, in other words, the output of numpy.histogram. If this argument is None, a constant density is used (the default).
renorm (bool, optional) – Renormalise passed functions to unity (you can override this if you already know it’s true).
verbose (bool, optional) – If True, produce extra output for debugging.
**deprecated – Deprecated arguments, which will raise a warning when used.

Notes

For more details see arXiv:2112.04574

Covariance Correction

sweights.cov_correct(hs, gxs, hdata, gdata, weights, Nxs, fvals, fcov, dw_dW_fs, Wvals, verbose=False)

Perform a second order covariance correction for a fit to weighted data.

Parameters:

hs (callable) – The control variable pdf which must take arguments (x, p0,…,pn) which has been fitted to a weighted dataset, where x is the observable and p0 … pn are the shape parameters
gxs (list of callable) – A list of the disciminant variable pdfs which must take a single argument (x), where x is the observable (shape parameters of gxs must be known in this case)
hdata (array) – The data values of the control variable observable at which the hs pdf is evaluated
gdata (array) – The data values of the disciminant variable observable at which the gxs pdfs are evaluated
weights (array) – The values of the weights
Nxs (list of float or tuple of float) – A list of the fitted yield values for the components gxs
fvals (array_like) – A list of the fitted values of the shape parameters p0,….,pn
fcov (array) – A covariance matrix of the weighted likelihood fit before the correction (this is normally available from the minmiser e.g. iminuit)
dw_dW_fs (list of callable) – A list of the functions describing the partial derivate of the weight function with respect to the W matrix elements (see the tutorial to see this passed for sweights and COWs)
Wvals (list of float or tuple of float) – A list of the W matrix elements
verbose (bool, optional) – Print some output

Returns:

The corrected covariance matrix

Return type:

array

Notes

This function corresponds to the weighted score function with sweights (both terms of Eq.51 in arXiv:2112.04574). If the shape parameters of the gxs are not known then the full sandwich estimate must be used which is not yet implemented in this package.

sweights.approx_cov_correct(pdf, data, wts, fvals, fcov, verbose=False)

Perform a first order covariance correction for a fit to weighted data.

Parameters:

pdf (callable) – The control variable pdf which must take arguments (x, p0,…,pn) which has been fitted to a weighted dataset, where x is the observable and p0 … pn are the shape parameters
data (array) – The data values of the observable at which the pdf is evaluated
wts (array) – The values of the weights
fvals (array_like) – A list of the fitted values of the shape parameters p0,….,pn
fcov (array) – A covariance matrix of the weighted likelihood fit before the correction (this is normally available from the minmiser e.g. iminuit)
verbose (bool, optional) – Print some output

Returns:

The corrected covariance matrix

Return type:

array

Notes

This function corresponds to the weighted score function with independent weights (first term of Eq.51 in arXiv:2112.04574). This is often a good approximation for sweights although will be an overestimate. For a better correction for sweights use cov_correct (which is only in general accurate when the shape parameters in the discriminating variable are known)

Utilities

Various utilities used by the package or in the tutorials.

class sweights.util.BernsteinBasisPdf(k: int, n: int, a: float, b: float)

Bernstein basis PDF.

Initialize Bernstein basis.

Parameters:

k (int) – Index of the basis.
n (int) – Order of the polynom.
a (float) – Starting value where the polynomial is defined and normalized.
b (float) – Ending value where the polynomial is defined and normalized.

sweights.util.convert_rf_pdf(pdf: Any, obs: Any, *, npoints: int = 0, method: str = 'makima', forcenorm: bool = True) → Density

Convert a RooFit RooAbsPdf into a vectorized Python callable.

This converts a RooFit::RooAbsPdf object into a Python callable that can be used by either the SWeight or Cow classes.

Parameters:

pdf (RooAbsPdf) – The pdf, must inherit from RooAbsPdf (e.g. RooGaussian, RooExponential, RooAddPdf etc.)
obs (RooRealVar) – The observable.
npoints (int, optional (default is 0)) – If npoints is zero, a wrapper around the RooFit PDF is returned. This wrapper internally calls the RooFit PDF and therefore produces exact results. If npoints is larger than zero, a spline interpolator is constructed instead to approximate the RooFit PDF. It is constructed by evaluating the exact PDF at the given number of points, which are equally spaced over the range of obs, which must be a bounded variable. The spline is an approximation, but faster to compute.
method (str, optional (default is "makima")) – Interpolation method to use. Accepted values are “makima” and “pchip”.
forcenorm (bool, optional (default is True)) – This only has an effect, if an interpolator is returned. Since the interpolator is not normalized in general, we need to normalize it. Setting this to False skips the computation of the integral, which is done numerically. Deactivate this only if the numerical interaction fails for some reason.

Returns:

A callable function representing a normalised pdf which can then be passed to the SWeight or Cow classes

Return type:

callable

sweights.util.make_bernstein_pdf(degree: int, a: float, b: float) → List[Density]

Construct a normalized Bernstein basis of given degree.

The Bernstein basis polynomials returned by this function are normalized over the given range.

Parameters:

degree (int) – Order of the Bernstein basis. Must be at least 1.
a (float) – Starting value where the polynomial is defined and normalized.
b (float) – Ending value where the polynomial is defined and normalized.

Returns:

Normalized density.

Return type:

Callable

sweights.util.make_norm_pdf(a: float, b: float, mu: Sequence[float], sigma: float) → List[Density]

Construct a sequence of truncated normal distributions.

Parameters:

a (float) – Lower end of the data window.
b (float) – Upper end of the data window.
mu (array-like) – Locations of the truncated normal distributions.
sigma (float) – Width of the truncated normal distributions.

Return type:

list of truncated normal distributions

sweights.util.make_weighted_negative_log_likelihood(x: ndarray[tuple[int, ...], dtype[floating[Any]]], weights: ndarray[tuple[int, ...], dtype[floating[Any]]], model: Callable[[...], ndarray[tuple[int, ...], dtype[floating[Any]]]]) → Callable[[...], float]: Construct weighted log-likelihood function compatible with iminuit.

sweights.util.normalized(fn: Density, range: Tuple[float, float]) → Density: Return a function normalized over the given range.

sweights.util.pdf_from_histogram(w: ndarray[tuple[int, ...], dtype[floating[Any]]], xe: ndarray[tuple[int, ...], dtype[floating[Any]]]) → Density

Return a pdf (piecewise constant) constructe from a histogram.

Parameters:

w (array of float or int) – Counts of the histogram.
xe (array of float) – Edges of the histogram.

Returns:

A normalized density.

Return type:

Callable

Plot histogram from data.

Parameters:

data (array-like) – Data to sort into a histogram.
bins (int or array-like or None, optional) – Number of bins of the histogram.
range ((float, float) or None, optional) – Range of the histogram.
weights (array-like or None, optional) – Weights.
density (bool, optional (default is False)) – If True, normalize the histogram.
axes (Axes or None, optional) – Axes to plot on. If None, then use matplotlib.pyplot.gca().
**kwargs – Further arguments are forwarded to matplotlib.pyplot.errorbar.

sweights.kendall_tau(x: ndarray[tuple[int, ...], dtype[floating[Any]]], y: ndarray[tuple[int, ...], dtype[floating[Any]]]) → Tuple[float, float, float]

Run Kendall’s tau test of independence.

Useful for ascertainting the extent to which two variables are independent and thus the PDF for both variables factorizes into two independent PDFs, one for each variable. This is a requirement to apply the classic sWeights method.

WARNING: Using this function only makes sense if you have pure samples for all components considered in the sWeights method. You cannot apply this to a mixed sample. In general, you won’t have these isolated samples for each component, because then you would not need sWeights. Yet, you can often get them from Monte-Carlo simulation of the experiment. If you trust the simulation, you can use this coefficient to test for factorization.

Parameters:

x (array) – Values in the first dimension - must have the same shape as y
y (array) – Values in the second dimension - must have the same shape as x

Returns:

Two element tuple with the coefficient and its uncertainty

Return type:

tuple

Notes

x and y must have the same dimension. This function now uses scipy.stats.kendalltau for the coefficent calculation (the uncertainty calculation is trivial) which makes a few optimisations. See the scipy documentation for more information.

sweights.convert_rf_pdf(pdf: Any, obs: Any, *, npoints: int = 0, method: str = 'makima', forcenorm: bool = True) → Density

Convert a RooFit RooAbsPdf into a vectorized Python callable.

This converts a RooFit::RooAbsPdf object into a Python callable that can be used by either the SWeight or Cow classes.

Parameters:

pdf (RooAbsPdf) – The pdf, must inherit from RooAbsPdf (e.g. RooGaussian, RooExponential, RooAddPdf etc.)
obs (RooRealVar) – The observable.
npoints (int, optional (default is 0)) – If npoints is zero, a wrapper around the RooFit PDF is returned. This wrapper internally calls the RooFit PDF and therefore produces exact results. If npoints is larger than zero, a spline interpolator is constructed instead to approximate the RooFit PDF. It is constructed by evaluating the exact PDF at the given number of points, which are equally spaced over the range of obs, which must be a bounded variable. The spline is an approximation, but faster to compute.
method (str, optional (default is "makima")) – Interpolation method to use. Accepted values are “makima” and “pchip”.
forcenorm (bool, optional (default is True)) – This only has an effect, if an interpolator is returned. Since the interpolator is not normalized in general, we need to normalize it. Setting this to False skips the computation of the integral, which is done numerically. Deactivate this only if the numerical interaction fails for some reason.

Returns:

A callable function representing a normalised pdf which can then be passed to the SWeight or Cow classes

Return type:

callable