Mixture

class sklego.mixture.GMMClassifier(n_components=1, covariance_type='full', tol=0.001, reg_covar=1e-06, max_iter=100, n_init=1, init_params='kmeans', weights_init=None, means_init=None, precisions_init=None, random_state=None, warm_start=False, verbose=0, verbose_interval=10)

Bases: sklearn.base.BaseEstimator, sklearn.base.ClassifierMixin

fit(X: numpy.array, y: numpy.array) → sklego.mixture.gmm_classifier.GMMClassifier

Fit the model using X, y as training data.

Parameters:
  • X – array-like, shape=(n_samples, n_columns), training data.
  • y – array-like, shape=(n_samples,), class labels for the rows of X.
Returns:

Returns an instance of self.

predict(X)
predict_proba(X)
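The entries above do not describe how predictions are formed. A common way to build a classifier from mixture models is to fit one density per class and assign each point to the class under which it is most likely. The sketch below assumes that design. `PerClassGMM` is a hypothetical name, class priors are assumed equal, and it calls scikit-learn's `GaussianMixture` directly rather than reproducing sklego's code.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


class PerClassGMM:
    """Hypothetical sketch: one GaussianMixture per class, equal class priors assumed.

    Not sklego's implementation; it only illustrates the per-class density idea.
    """

    def __init__(self, n_components=1, random_state=None):
        self.n_components = n_components
        self.random_state = random_state

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        self.classes_ = np.unique(y)
        # Fit a separate density model to the rows belonging to each class.
        self.gmms_ = [
            GaussianMixture(n_components=self.n_components,
                            random_state=self.random_state).fit(X[y == c])
            for c in self.classes_
        ]
        return self

    def predict_proba(self, X):
        X = np.asarray(X, dtype=float)
        # score_samples gives each row's log-likelihood under one class model.
        log_lik = np.column_stack([g.score_samples(X) for g in self.gmms_])
        # Shift by the row maximum before exponentiating to avoid underflow.
        log_lik -= log_lik.max(axis=1, keepdims=True)
        lik = np.exp(log_lik)
        return lik / lik.sum(axis=1, keepdims=True)

    def predict(self, X):
        # Pick the class whose density explains the point best.
        return self.classes_[self.predict_proba(X).argmax(axis=1)]


rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

clf = PerClassGMM(n_components=1, random_state=0).fit(X, y)
print(clf.predict([[0.0, 0.0], [5.0, 5.0]]))  # expected: [0 1]
```

Working in log space matters here: for points far from every class, the raw likelihoods can all round to zero, while the shifted log-likelihoods still rank the classes correctly.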
class sklego.mixture.BayesianGMMClassifier(n_components=1, covariance_type='full', tol=0.001, reg_covar=1e-06, max_iter=100, n_init=1, init_params='kmeans', weight_concentration_prior_type='dirichlet_process', weight_concentration_prior=None, mean_precision_prior=None, mean_prior=None, degrees_of_freedom_prior=None, covariance_prior=None, random_state=None, warm_start=False, verbose=0, verbose_interval=10)

Bases: sklearn.base.BaseEstimator, sklearn.base.ClassifierMixin

fit(X: numpy.array, y: numpy.array) → sklego.mixture.bayesian_gmm_classifier.BayesianGMMClassifier

Fit the model using X, y as training data.

Parameters:
  • X – array-like, shape=(n_samples, n_columns), training data.
  • y – array-like, shape=(n_samples,), class labels for the rows of X.
Returns:

Returns an instance of self.

predict(X)
predict_proba(X)
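BayesianGMMClassifier exposes the prior parameters of scikit-learn's `BayesianGaussianMixture`. The sketch below assumes a one-model-per-class design (fit a mixture to each label, then compare likelihoods), uses `BayesianGaussianMixture` with the Dirichlet-process prior, and normalises the per-class log-likelihoods with a log-sum-exp so that tiny likelihoods do not underflow. `fit_per_class` and `class_probabilities` are hypothetical helpers, not sklego functions.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture


def fit_per_class(X, y, n_components=3, random_state=0):
    """Hypothetical helper: fit one BayesianGaussianMixture per class label."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    classes = np.unique(y)
    models = [
        BayesianGaussianMixture(
            n_components=n_components,  # an upper bound; the prior can shrink unused components
            weight_concentration_prior_type="dirichlet_process",
            max_iter=500,
            random_state=random_state,
        ).fit(X[y == c])
        for c in classes
    ]
    return classes, models


def class_probabilities(models, X):
    """Normalise per-class log-likelihoods into probabilities (equal class priors assumed)."""
    log_lik = np.column_stack([m.score_samples(np.asarray(X, dtype=float)) for m in models])
    # log-sum-exp across classes keeps the normalisation stable for very negative values.
    log_norm = np.logaddexp.reduce(log_lik, axis=1, keepdims=True)
    return np.exp(log_lik - log_norm)


rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

classes, models = fit_per_class(X, y)
proba = class_probabilities(models, [[0.0, 0.0], [5.0, 5.0]])
print(classes[proba.argmax(axis=1)])  # expected: [0 1]
```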
class sklego.mixture.GMMOutlierDetector(threshold=0.99, method='quantile', n_components=1, covariance_type='full', tol=0.001, reg_covar=1e-06, max_iter=100, n_init=1, init_params='kmeans', weights_init=None, means_init=None, precisions_init=None, random_state=None, warm_start=False, verbose=0, verbose_interval=10)

Bases: sklearn.base.OutlierMixin, sklearn.base.BaseEstimator

The GMMOutlierDetector trains a Gaussian Mixture Model on a dataset X. Once the density is fitted, the model can evaluate the likelihood of new points; given a threshold, it labels a point as an outlier when its likelihood score is too low.
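The likelihood scoring described above can be seen directly with scikit-learn's `GaussianMixture`: `score_samples` returns the log-likelihood of each row under the fitted density, and points far from the training data score much lower. This illustrates the idea only; it is not sklego's internal code.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(42)
X_train = rng.normal(loc=0.0, scale=1.0, size=(500, 2))

gmm = GaussianMixture(n_components=1, random_state=42).fit(X_train)

# score_samples returns the log-likelihood of each row under the fitted density.
candidates = np.array([[0.0, 0.0],   # near the bulk of the training data
                       [6.0, 6.0]])  # far away from it
log_lik = gmm.score_samples(candidates)
print(log_lik)  # the far point receives a much lower log-likelihood
```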

Parameters:
  • threshold – the cut-off at which a point is labelled an outlier. With method="quantile" it must lie in (0, 1); with method="stddev" it is a positive number of standard deviations.
  • method – how the threshold is applied: "quantile" (default) or "stddev".

If you select method="quantile", the threshold is the quantile of the likelihood scores at which points start being called outliers.

If you select method="stddev", the threshold is the number of standard deviations a likelihood score may deviate before the point is called an outlier.
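The two threshold methods can be sketched as a function that turns training log-likelihoods into a cut-off. The exact rules here are assumptions for illustration: for "quantile" the lowest (1 - threshold) fraction of training scores is flagged, and for "stddev" the cut-off sits `threshold` standard deviations below the mean score. sklego's own computation may differ, and `likelihood_threshold` is a hypothetical name.

```python
import numpy as np


def likelihood_threshold(train_scores, threshold, method="quantile"):
    """Hypothetical helper: turn training log-likelihoods into an outlier cut-off.

    Assumed rules (not necessarily sklego's):
    - "quantile": flag the lowest (1 - threshold) fraction of training scores.
    - "stddev": flag scores more than `threshold` standard deviations below the mean.
    """
    train_scores = np.asarray(train_scores, dtype=float)
    if method == "quantile":
        if not 0 < threshold < 1:
            raise ValueError("quantile threshold must lie in (0, 1)")
        return np.quantile(train_scores, 1 - threshold)
    if method == "stddev":
        if threshold <= 0:
            raise ValueError("stddev threshold must be positive")
        return train_scores.mean() - threshold * train_scores.std()
    raise ValueError(f"unknown method: {method!r}")


scores = np.arange(100.0)  # stand-in log-likelihoods 0, 1, ..., 99
print(likelihood_threshold(scores, 0.99, "quantile"))  # about 0.99
print(likelihood_threshold(scores, 2.0, "stddev"))
```

Any score below the returned cut-off would then be treated as an outlier.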

decision_function(X)
fit(X: numpy.array, y=None) → sklego.mixture.gmm_outlier_detector.GMMOutlierDetector

Fit the model using X as training data.

Parameters:
  • X – array-like, shape=(n_samples, n_columns), training data.
  • y – ignored; accepted only for compatibility with scikit-learn pipelines.
Returns:

Returns an instance of self.

predict(X)

Predict whether each point is an outlier.

Parameters:
  • X – array-like, shape=(n_samples, n_columns), the data to check.
Returns:

array, shape=(n_samples,), the predictions: 1 for inliers, -1 for outliers.
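Once a cut-off exists, prediction follows scikit-learn's outlier convention of 1 for inliers and -1 for outliers. A minimal sketch, assuming the quantile rule with threshold=0.99 (cut-off at the 1% quantile of training log-likelihoods) and a plain `GaussianMixture`; `predict_outliers` is hypothetical.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 2))

gmm = GaussianMixture(n_components=2, random_state=0).fit(X_train)
train_scores = gmm.score_samples(X_train)
# Assumed quantile rule: the cut-off is the (1 - 0.99) quantile of training log-likelihoods.
cutoff = np.quantile(train_scores, 1 - 0.99)


def predict_outliers(X):
    """Hypothetical helper: 1 for inliers, -1 for outliers (scikit-learn convention)."""
    return np.where(gmm.score_samples(X) >= cutoff, 1, -1)


labels = predict_outliers(np.array([[0.0, 0.0], [8.0, -8.0]]))
print(labels)  # expected: [ 1 -1]
print((predict_outliers(X_train) == -1).mean())  # roughly 0.01 on the training data
```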
score_samples(X)
class sklego.mixture.BayesianGMMOutlierDetector(threshold=0.99, method='quantile', n_components=1, covariance_type='full', tol=0.001, reg_covar=1e-06, max_iter=100, n_init=1, init_params='kmeans', weight_concentration_prior_type='dirichlet_process', weight_concentration_prior=None, mean_precision_prior=None, mean_prior=None, degrees_of_freedom_prior=None, covariance_prior=None, random_state=None, warm_start=False, verbose=0, verbose_interval=10)

Bases: sklearn.base.OutlierMixin, sklearn.base.BaseEstimator

The BayesianGMMOutlierDetector trains a Bayesian Gaussian Mixture Model on a dataset X. Once the density is fitted, the model can evaluate the likelihood of new points; given a threshold, it labels a point as an outlier when its likelihood score is too low.

Parameters:
  • threshold – the cut-off at which a point is labelled an outlier. With method="quantile" it must lie in (0, 1); with method="stddev" it is a positive number of standard deviations.
  • method – how the threshold is applied: "quantile" (default) or "stddev".

If you select method="quantile", the threshold is the quantile of the likelihood scores at which points start being called outliers.

If you select method="stddev", the threshold is the number of standard deviations a likelihood score may deviate before the point is called an outlier.

The remaining parameters share their names with those of scikit-learn's BayesianGaussianMixture and are best described in its documentation:

https://scikit-learn.org/stable/modules/generated/sklearn.mixture.BayesianGaussianMixture.html
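A sketch of the Bayesian variant, using scikit-learn's `BayesianGaussianMixture` with the Dirichlet-process prior (the default weight_concentration_prior_type in the signature above). The model is given more components than the data needs; the prior tends to assign small weights to the surplus ones. The cut-off uses an assumed quantile rule (lowest 1% of training scores) and is not taken from sklego's source.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(1)
# Two well-separated blobs; the model is deliberately given more components than needed.
X_train = np.vstack([rng.normal(-4, 1, (300, 2)), rng.normal(4, 1, (300, 2))])

bgmm = BayesianGaussianMixture(
    n_components=6,
    weight_concentration_prior_type="dirichlet_process",
    max_iter=500,
    random_state=1,
).fit(X_train)

# With a Dirichlet-process prior, surplus components tend to receive small weights.
print(np.round(bgmm.weights_, 3))

train_scores = bgmm.score_samples(X_train)
cutoff = np.quantile(train_scores, 1 - 0.99)  # assumed rule: lowest 1% of training scores

candidates = np.array([[4.0, 4.0],    # centre of one blob
                       [0.0, 12.0]])  # far from both blobs
labels = np.where(bgmm.score_samples(candidates) >= cutoff, 1, -1)
print(labels)  # expected: [ 1 -1]
```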

decision_function(X)
fit(X: numpy.array, y=None) → sklego.mixture.bayesian_gmm_detector.BayesianGMMOutlierDetector

Fit the model using X as training data.

Parameters:
  • X – array-like, shape=(n_samples, n_columns), training data.
  • y – ignored; accepted only for compatibility with scikit-learn pipelines.
Returns:

Returns an instance of self.

predict(X)

Predict whether each point is an outlier.

Parameters:
  • X – array-like, shape=(n_samples, n_columns), the data to check.
Returns:

array, shape=(n_samples,), the predictions: 1 for inliers, -1 for outliers.

score_samples(X)