
class jackknife.Jackknife(input_sample_generator_function, n_samples, transformation_function, correlation_axis=-1, output_file_prefix='', output_samples_are_stored=False)

A class implementing a jackknife estimation.

The jackknife is a resampling method for estimating the variance and bias of a parameter estimate. These estimates can in turn be used to obtain the bias-corrected jackknife estimator.

Given \(n\) input samples \(x_i\) and a transformation function \(f()\), the goal is to estimate \(y=f(x)\), with \(x\) and \(y\) being the true values. A simple estimate would be the transformed input mean \(f(\bar{x})\), with \(\bar{x}\) being the input sample mean. In general, this estimate carries a bias of order \(1/n\). With jackknife resampling, this leading \(1/n\) term can be estimated and removed, yielding the bias-corrected jackknife estimator.

First, the resampling of the input is done by

\[x_i \rightarrow x'_i = (n\bar{x} - x_i) / (n-1),\]

where \(x'_i\) is the new sample. This is then transformed in the following way:

\[y_i = n f(\bar{x}) - (n-1) f(x'_i),\]

where \(y_i\) is the transformed, bias-corrected output sample. The jackknife estimator \(\hat{y}_{\textrm{jackknife}}\) is then simply the sample mean of the \(y_i\). Other statistical quantities, such as the variance and covariance, can also be calculated from the output samples; see do_estimation() for details.
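
The two formulas above can be sketched in plain NumPy. This is only an illustration of the resampling scheme, not the class's actual implementation; the helper name jackknife_output_samples and the choice \(f(x)=x^2\) are assumptions made for the example.

```python
import numpy as np


def jackknife_output_samples(x, f):
    """Return the bias-corrected output samples y_i (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    x_mean = x.mean(axis=0)
    # Resampling: x'_i = (n * x_mean - x_i) / (n - 1), the mean with x_i left out.
    x_resampled = (n * x_mean - x) / (n - 1)
    # Transformation: y_i = n * f(x_mean) - (n - 1) * f(x'_i).
    return np.array([n * f(x_mean) - (n - 1) * f(xr) for xr in x_resampled])


samples = np.array([1.0, 2.0, 3.0, 4.0])
y = jackknife_output_samples(samples, lambda v: v**2)

naive = samples.mean() ** 2  # f(x_mean), biased at order 1/n
estimate = y.mean()          # bias-corrected jackknife estimator
```

For \(f(x)=x^2\) the correction is known in closed form: the jackknife estimator equals \(\bar{x}^2 - s_x^2/n\), where \(s_x^2\) is the sample variance of the inputs, so the result is easy to check by hand.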

Both \(x_i\) and \(y_i\) can be scalar or high-dimensional quantities. To allow jackknifing multiple observables at once, the output samples must be a dictionary of numbers or numpy arrays, where each entry represents a different observable. The input samples are either numbers or numpy arrays. In the latter case, naturally, all calculations are done element-wise.

  • input_sample_generator_function – Function that returns a generator yielding the input samples \(x_i\). An input sample must either be a number or a numpy array.

  • n_samples (int) – \(n\)

  • transformation_function – \(f: x_i\rightarrow y_i\). Callable that takes a single input sample \(x_i\) as an argument and returns an output sample \(y_i\). An output sample must be a dictionary of numbers or numpy arrays. The keys in this dictionary are used as names for the observables in the output file (see write_results_to_file()) and the number of observables is deduced from the dictionary’s length.

  • correlation_axis – Axis along which the outer product for the covariance and correlation matrix is calculated. Must be an integer or an iterable with a length equal to the number of observables.

  • output_file_prefix

  • output_samples_are_stored – If True, the output samples \(y_i\) are also stored in the HDF5 output file.


ValueError – if \(n<2\)
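
A minimal sketch of two callables that satisfy the interface described in the parameter list. The names N_SAMPLES, "square" and "sum", the sample shape and the random input data are all hypothetical. The constructor call is left as a comment because it needs the jackknife package, and the argument lists of the two methods are not shown in this documentation.

```python
import numpy as np

N_SAMPLES = 100  # hypothetical number of input samples


def input_sample_generator_function():
    """Return a generator yielding the input samples x_i (arrays of shape (3,))."""
    # Seeding inside the function makes every call yield the same sequence.
    rng = np.random.default_rng(seed=0)
    return (rng.normal(loc=1.0, scale=0.1, size=3) for _ in range(N_SAMPLES))


def transformation_function(x):
    """Map one input sample to a dictionary of named observables."""
    return {
        "square": x**2,           # array observable, same shape as x
        "sum": float(np.sum(x)),  # scalar observable
    }


# Construction following the signature above (assumes the package is installed):
# jk = jackknife.Jackknife(input_sample_generator_function, N_SAMPLES,
#                          transformation_function)
# jk.do_estimation()
# jk.write_results_to_file()
```

Since the first parameter is a function returning a generator rather than a generator itself, the sketch returns a fresh, identically seeded generator on each call; whether the class actually requests the samples more than once is an assumption here.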

property output_file_name



do_estimation()

Estimate \(y=f(x)\) and some common statistics.

The jackknife estimator \(\hat{y}_{\textrm{jackknife}}\) is simply given by the sample mean \(\bar{y}\),

\[\hat{y}_{\textrm{jackknife}} = \bar{y} = \frac{1}{n} \sum_i y_i,\]

with output samples \(y_i\) and sample size \(n\). Other common statistical quantities are estimated with the sample variance \(s^2\), the sample standard deviation \(s\), the standard error of the mean \(\textrm{SEM}\), the sample covariance matrix \(q\) and the matrix of sample Pearson correlation coefficients \(r\). They are calculated using the following formulas:

\[\begin{split}s^2 &= \frac{1}{n-1}\left(\sum_i y_i^2 - \frac{1}{n} \left(\sum_i y_i\right)^2\right),\\ s &= \sqrt{s^2},\\ \textrm{SEM} &= \frac{s}{\sqrt{n}},\\ q &= \frac{1}{n-1} \left(\sum_i y_i\otimes y_i - \frac{1}{n}\left(\sum_i y_i\right) \otimes \left(\sum_i y_i\right)\right),\\ r_{ij} &= \frac{q_{ij}}{s_i s_j}.\end{split}\]

Here, \(\otimes\) denotes the outer product along the correlation_axis. Since \(y\) is implemented as a dictionary that may hold several independent observables, the above statistics are calculated separately for each entry.
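
The formulas above can be checked against NumPy's built-in estimators. The sketch below assumes a single observable stored as an array of shape (n, d) with the outer product taken along the last axis; the function name sample_statistics and the example data are hypothetical, and the code does not reproduce the class's internals.

```python
import numpy as np


def sample_statistics(y):
    """Statistics of output samples y with shape (n, d), following the formulas above."""
    y = np.asarray(y, dtype=float)
    n = y.shape[0]
    total = y.sum(axis=0)
    mean = total / n
    variance = ((y**2).sum(axis=0) - total**2 / n) / (n - 1)
    std = np.sqrt(variance)
    sem = std / np.sqrt(n)
    # The sum of outer products y_i (x) y_i equals y.T @ y for an (n, d) array.
    covariance = (y.T @ y - np.outer(total, total) / n) / (n - 1)
    correlation = covariance / np.outer(std, std)
    return mean, variance, std, sem, covariance, correlation


y = np.array([[1.0, 2.0], [2.0, 1.0], [4.0, 3.0], [5.0, 6.0]])
mean, variance, std, sem, covariance, correlation = sample_statistics(y)
```

The diagonal of the covariance matrix reproduces the variance, and the results agree with np.var(..., ddof=1), np.cov(..., rowvar=False) and np.corrcoef(..., rowvar=False).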


While the sample variance \(s^2\) is an unbiased estimator, the sample standard deviation \(s\) is not: it generally underestimates the standard deviation. Consequently, the standard error of the mean is underestimated as well.
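
A small exact enumeration illustrates this. The fair-coin distribution and the sample size of 2 are choices made for the example only; they are not taken from the library.

```python
import itertools

import numpy as np

values = [0.0, 1.0]  # fair coin: true variance 0.25, true standard deviation 0.5
# All four equally likely samples of size 2.
draws = list(itertools.product(values, repeat=2))

# Expectation values obtained by averaging over every possible sample.
mean_s2 = np.mean([np.var(d, ddof=1) for d in draws])  # equals 0.25
mean_s = np.mean([np.std(d, ddof=1) for d in draws])   # about 0.354, below 0.5
```

Averaged over all possible samples, \(s^2\) reproduces the true variance exactly, while \(s\) falls short of the true standard deviation.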


write_results_to_file()

Write the results of the last jackknife estimation to a file.

Create an HDF5 file with the following structure:

        ├── .config
        │       ├── jk.n_samples
        │       └── jk.store_output_samples
        ├── <name_of_observable_1>
        │       ├── mean
        │       ├── variance
        │       ├── standard_deviation
        │       ├── standard_error_of_mean
        │       ├── transformed_input_mean
        │       ├── covariance
        │       └── correlation
        ├── <name_of_observable_2>
        │       ├── mean
        :       :

The time stamp is from the start of the last jackknife estimation and the keys of the dictionary returned by transformation_function are used for the names of the observables.


User code can and should expand the .config group with problem-specific metadata, e.g., with information about the input samples or the transformation function.
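
One possible way to add such metadata with h5py is sketched below. It assumes that the entries of .config are HDF5 attributes, which this documentation does not state; the file path, attribute names and attribute values are hypothetical. A stand-in file is created first so that the example is self-contained.

```python
import os
import tempfile

import h5py

# Stand-in for a results file; a real one would come from write_results_to_file().
path = os.path.join(tempfile.mkdtemp(), "example_jackknife.hdf5")
with h5py.File(path, "w") as f:
    f.create_group(".config")

# Reopen in append mode and extend the existing .config group.
with h5py.File(path, "a") as f:
    config = f.require_group(".config")
    # Hypothetical metadata keys, stored as attributes (an assumption).
    config.attrs["input.description"] = "energies from a hypothetical simulation run"
    config.attrs["transformation.name"] = "square"
```

In real use, the path would presumably be taken from the output_file_name property rather than built by hand.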