Linear Model

class sklego.linear_model.DeadZoneRegressor(threshold=0.3, relative=False, effect='linear', n_iter=2000, stepsize=0.01, check_grad=False)[source]

Bases: sklearn.base.BaseEstimator, sklearn.base.RegressorMixin

fit(X, y)[source]
predict(X)[source]
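
A minimal usage sketch follows; the synthetic data and parameter choices are illustrative assumptions, not part of the reference above.

import numpy as np
from sklego.linear_model import DeadZoneRegressor

# Illustrative linear data with small noise.
rng = np.random.RandomState(42)
X = rng.normal(size=(100, 2))
y = X @ np.array([2.0, -1.0]) + rng.normal(scale=0.1, size=100)

# `threshold` sets the width of the dead zone around zero error and
# `effect` how errors outside it are penalized (values per the signature above).
model = DeadZoneRegressor(threshold=0.3, effect="linear")
model.fit(X, y)
predictions = model.predict(X)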
class sklego.linear_model.DemographicParityClassifier[source]

Bases: sklearn.base.BaseEstimator, sklearn.linear_model._base.LinearClassifierMixin

A logistic regression classifier which can be constrained on demographic parity (p% score).

Minimizes the log loss while constraining the covariance between the specified sensitive_cols and the distance to the decision boundary of the classifier.

Only works for binary classification problems.

\[\begin{split}\begin{array}{cl}{\operatorname{minimize}} & -\sum_{i=1}^{N} \log p\left(y_{i} | \mathbf{x}_{i}, \boldsymbol{\theta}\right) \\ {\text { subject to }} & {\frac{1}{N} \sum_{i=1}^{N}\left(\mathbf{z}_{i}-\overline{\mathbf{z}}\right) d_{\boldsymbol{\theta}}\left(\mathbf{x}_{i}\right) \leq \mathbf{c}} \\ {} & {\frac{1}{N} \sum_{i=1}^{N}\left(\mathbf{z}_{i}-\overline{\mathbf{z}}\right) d_{\boldsymbol{\theta}}\left(\mathbf{x}_{i}\right) \geq-\mathbf{c}}\end{array}\end{split}\]

Source: M. Zafar et al. (2017), Fairness Constraints: Mechanisms for Fair Classification

Parameters:
  • covariance_threshold – The maximum allowed covariance between the sensitive attributes and the distance to the decision boundary. If set to None, no fairness constraint is enforced.
  • sensitive_cols – List of sensitive column names (when X is a dataframe) or a list of column indices (when X is a numpy array).
  • C – Inverse of regularization strength; must be a positive float. Like in support vector machines, smaller values specify stronger regularization.
  • penalty – Used to specify the norm used in the penalization. Expects ‘none’ or ‘l1’
  • fit_intercept – Specifies if a constant (a.k.a. bias or intercept) should be added to the decision function.
  • max_iter – Maximum number of iterations taken for the solvers to converge.
  • train_sensitive_cols – Indicates whether the model should use the sensitive columns in the fit step.
  • multi_class – The method to use for multiclass predictions.
  • n_jobs – The number of parallel jobs that should be used to fit multiclass models.
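
A minimal usage sketch (the data, column names and threshold value are illustrative assumptions):

import numpy as np
import pandas as pd
from sklego.linear_model import DemographicParityClassifier

# Illustrative data; "sensitive" plays the role of the protected attribute.
rng = np.random.RandomState(0)
df = pd.DataFrame({
    "x1": rng.normal(size=200),
    "x2": rng.normal(size=200),
    "sensitive": rng.binomial(1, 0.5, size=200),
})
y = (df["x1"] + df["x2"] > 0).astype(int)

# Bound the covariance between the sensitive column and the distance
# to the decision boundary.
clf = DemographicParityClassifier(
    covariance_threshold=0.1,
    sensitive_cols=["sensitive"],
)
clf.fit(df, y)
proba = clf.predict_proba(df)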
class sklego.linear_model.EqualOpportunityClassifier[source]

Bases: sklearn.base.BaseEstimator, sklearn.linear_model._base.LinearClassifierMixin

A logistic regression classifier which can be constrained on equal opportunity score.

Minimizes the log loss while constraining the covariance between the specified sensitive_cols and the distance to the decision boundary of the classifier, for those examples that have y_true = 1.

Only works for binary classification problems.

\[\begin{split}\begin{array}{cl}{\operatorname{minimize}} & -\sum_{i=1}^{N} \log p\left(y_{i} | \mathbf{x}_{i}, \boldsymbol{\theta}\right) \\ {\text { subject to }} & {\frac{1}{|POS|} \sum_{i \in POS}\left(\mathbf{z}_{i}-\overline{\mathbf{z}}\right) d_{\boldsymbol{\theta}}\left(\mathbf{x}_{i}\right) \leq \mathbf{c}} \\ {} & {\frac{1}{|POS|} \sum_{i \in POS}\left(\mathbf{z}_{i}-\overline{\mathbf{z}}\right) d_{\boldsymbol{\theta}}\left(\mathbf{x}_{i}\right) \geq-\mathbf{c}}\end{array}\end{split}\]

where POS is the subset of the population for which y_true = 1 and |POS| is its size

Parameters:
  • covariance_threshold – The maximum allowed covariance between the sensitive attributes and the distance to the decision boundary. If set to None, no fairness constraint is enforced.
  • positive_target – The name of the class which is associated with a positive outcome.
  • sensitive_cols – List of sensitive column names (when X is a dataframe) or a list of column indices (when X is a numpy array).
  • C – Inverse of regularization strength; must be a positive float. Like in support vector machines, smaller values specify stronger regularization.
  • penalty – Used to specify the norm used in the penalization. Expects ‘none’ or ‘l1’
  • fit_intercept – Specifies if a constant (a.k.a. bias or intercept) should be added to the decision function.
  • max_iter – Maximum number of iterations taken for the solvers to converge.
  • train_sensitive_cols – Indicates whether the model should use the sensitive columns in the fit step.
  • multi_class – The method to use for multiclass predictions.
  • n_jobs – The number of parallel jobs that should be used to fit multiclass models.
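
A minimal usage sketch (the data, column names and parameter values are illustrative assumptions):

import numpy as np
import pandas as pd
from sklego.linear_model import EqualOpportunityClassifier

# Illustrative data; "sensitive" plays the role of the protected attribute.
rng = np.random.RandomState(1)
df = pd.DataFrame({
    "x1": rng.normal(size=200),
    "x2": rng.normal(size=200),
    "sensitive": rng.binomial(1, 0.5, size=200),
})
y = (df["x1"] - df["x2"] > 0).astype(int)

# The covariance constraint is only applied to the rows whose label
# equals `positive_target` (here 1, the positive class of `y`).
clf = EqualOpportunityClassifier(
    covariance_threshold=0.1,
    positive_target=1,
    sensitive_cols=["sensitive"],
)
clf.fit(df, y)
preds = clf.predict(df)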
class sklego.linear_model.FairClassifier(*args, **kwargs)[source]

Bases: sklego.linear_model.DemographicParityClassifier

Deprecated since version 0.4.0.

Please use sklego.linear_model.DemographicParityClassifier instead.

class sklego.linear_model.LowessRegression(sigma=1, span=None)[source]

Bases: sklearn.base.BaseEstimator, sklearn.base.RegressorMixin

Performs LOWESS (locally weighted) regression. Note that prediction can get computationally expensive.

Parameters:
  • sigma – float, the width of the kernel used to smooth the data.
  • span – float, the proportion of the data to be used. Defaults to using all data.
fit(X, y)[source]

Fit the model using X, y as training data.

Parameters:
  • X – array-like, shape=(n_samples, n_columns), training data.
  • y – array-like, shape=(n_samples, ), training targets.
Returns:

Returns an instance of self.

predict(X)[source]

Predict using the fitted model.

Parameters: X – array-like, shape=(n_samples, n_columns), the data to predict on.
Returns: an array of predictions, shape=(n_samples,).
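
A minimal usage sketch (the synthetic data and the sigma value are illustrative assumptions):

import numpy as np
from sklego.linear_model import LowessRegression

# Noisy sine wave, illustrative only.
rng = np.random.RandomState(2)
X = np.linspace(0, 10, 200).reshape(-1, 1)
y = np.sin(X.ravel()) + rng.normal(scale=0.2, size=200)

# Each prediction re-weights the training samples, which is what makes
# predict expensive on large datasets.
model = LowessRegression(sigma=0.5)
model.fit(X, y)
smoothed = model.predict(X)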
class sklego.linear_model.ProbWeightRegression(non_negative=True)[source]

Bases: sklearn.base.BaseEstimator, sklearn.base.RegressorMixin

This regressor assumes that all input signals in X need to be reweighted with weights that sum up to one in order to predict y. This can be very useful in combination with sklego.meta.EstimatorTransformer because it allows you to construct an ensemble.

Parameters: non_negative – boolean, default=True, setting that forces all weights to be >= 0.
fit(X, y)[source]

Fit the model using X, y as training data.

Parameters:
  • X – array-like, shape=(n_samples, n_columns), training data.
  • y – array-like, shape=(n_samples, ), training targets.
Returns:

Returns an instance of self.

predict(X)[source]

Predict using the fitted model.

Parameters: X – array-like, shape=(n_samples, n_columns), the data to predict on.
Returns: an array of predictions, shape=(n_samples,).
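
A minimal usage sketch (the synthetic "base model" predictions are illustrative assumptions; in practice the columns of X would typically come from sklego.meta.EstimatorTransformer as described above):

import numpy as np
from sklego.linear_model import ProbWeightRegression

# The columns of X act as predictions from three base models; the
# regressor learns weights that sum to one (and are >= 0 here) to blend them.
rng = np.random.RandomState(3)
y = rng.normal(size=100)
X = np.column_stack([
    y + rng.normal(scale=0.5, size=100),
    y + rng.normal(scale=1.0, size=100),
    y + rng.normal(scale=2.0, size=100),
])

blender = ProbWeightRegression(non_negative=True)
blender.fit(X, y)
blend = blender.predict(X)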