abaco.batch_correction module

class abaco.batch_correction.ConQur(batch_cols, covariate_cols, reference_batch, quantiles=(0.05, 0.5, 0.95), logistic_kwargs=None, quantile_kwargs=None)

Bases: TransformerMixin, BaseEstimator

Conditional Quantile Regression (ConQuR) batch correction transformer.

Parameters:
  • batch_cols (list of str) – List of batch column names.

  • covariate_cols (list of str) – List of covariate column names.

  • reference_batch (dict) – Dictionary specifying reference batch values for each batch column.

  • quantiles (tuple of float, optional) – Quantiles to use for quantile regression, by default (0.05, 0.5, 0.95).

  • logistic_kwargs (dict, optional) – Keyword arguments for LogisticRegression.

  • quantile_kwargs (dict, optional) – Keyword arguments for QuantileRegressor.

_logit_models

Fitted logistic regression models for zero-mass.

Type:

dict

_quantile_models

Fitted quantile regression models for nonzero values.

Type:

dict

_col_order

Order of columns used in the model.

Type:

list

_feature_cols

List of feature columns.

Type:

list

fit(df, y=None)

set_fit_request(*, df: bool | None | str = '$UNCHANGED$') -> ConQur

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

df (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for df parameter in fit.

Returns:

self – The updated object.

Return type:

object

set_transform_request(*, df: bool | None | str = '$UNCHANGED$') -> ConQur

Configure whether metadata should be requested to be passed to the transform method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to transform.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

df (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for df parameter in transform.

Returns:

self – The updated object.

Return type:

object

transform(df)

class abaco.batch_correction.PLSDA(ncomp=1, keepX=None, tol=1e-06, max_iter=500)

Bases: object

Partial Least Squares Discriminant Analysis (PLSDA) implementation.

Parameters:
  • ncomp (int, optional) – Number of components to extract, by default 1.

  • keepX (list of int, optional) – Number of variables to keep for each component (for sparsity).

  • tol (float, optional) – Convergence tolerance, by default 1e-6.

  • max_iter (int, optional) – Maximum number of iterations, by default 500.

t_

X scores.

Type:

numpy.ndarray

u_

Y scores.

Type:

numpy.ndarray

a_

X loadings.

Type:

numpy.ndarray

b_

Y loadings.

Type:

numpy.ndarray

iters_

Number of iterations per component.

Type:

list

exp_var_

Explained variance per component.

Type:

list

fit(X, Y)

abaco.batch_correction.correctBMC(data, sample_label, batch_label, exp_label)

Perform Batch Mean Centering (BMC) correction: for each feature, subtract the mean of the sample's batch (group) from the sample's value.

Parameters:
  • data (pandas.DataFrame) – Input data containing OTU counts and metadata.

  • sample_label (str) – Column name for sample identifiers.

  • batch_label (str) – Column name for batch identifiers.

  • exp_label (str) – Column name for experiment/tissue identifiers.

Returns:

DataFrame with sample, experiment, batch, and batch mean centered features.

Return type:

pandas.DataFrame
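Batch mean centering can be sketched in a few lines of pandas. This is an illustration of the technique, not abaco's implementation: the helper name `batch_mean_center`, the toy column names, and the assumption that every non-metadata column is a numeric feature are all made up for the example.

```python
import pandas as pd


def batch_mean_center(data, sample_label, batch_label, exp_label):
    """Subtract each batch's per-feature mean (illustrative helper)."""
    meta_cols = [sample_label, exp_label, batch_label]
    # Assumption: every column that is not metadata is a numeric feature.
    feature_cols = [c for c in data.columns if c not in meta_cols]
    # Per-row batch mean of every feature, aligned to the original index.
    batch_means = data.groupby(batch_label)[feature_cols].transform("mean")
    centered = data[feature_cols] - batch_means
    return pd.concat([data[meta_cols], centered], axis=1)


toy = pd.DataFrame(
    {
        "sample": ["s1", "s2", "s3", "s4"],
        "tissue": ["gut", "gut", "skin", "skin"],
        "batch": ["A", "A", "B", "B"],
        "otu1": [1.0, 3.0, 10.0, 14.0],
        "otu2": [0.0, 2.0, 5.0, 5.0],
    }
)
corrected = batch_mean_center(toy, "sample", "batch", "tissue")
# otu1 becomes [-1.0, 1.0, -2.0, 2.0]; otu2 becomes [-1.0, 1.0, 0.0, 0.0]
```

After centering, every feature has mean zero within each batch, so differences in batch-level location are removed while within-batch variation is kept.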

abaco.batch_correction.correctCombat(data, sample_label='sample', batch_label='batch', experiment_label='tissue')

Perform ComBat batch correction.

Parameters:
  • data (pandas.DataFrame) – Input data containing OTU counts and metadata.

  • sample_label (str, optional) – Column name for sample identifiers, by default ‘sample’.

  • batch_label (str, optional) – Column name for batch identifiers, by default ‘batch’.

  • experiment_label (str, optional) – Column name for experiment/tissue identifiers, by default ‘tissue’.

Returns:

DataFrame with sample, batch, experiment, and ComBat-corrected features.

Return type:

pandas.DataFrame

abaco.batch_correction.correctCombatSeq(data, sample_label, batch_label, condition_label, ref_batch=None)

Perform ComBat-seq batch correction for count data.

Parameters:
  • data (pandas.DataFrame) – Input data containing count data and metadata.

  • sample_label (str) – Column name for sample identifiers.

  • batch_label (str) – Column name for batch identifiers.

  • condition_label (str) – Column name for condition/experiment identifiers.

  • ref_batch (str or None, optional) – Reference batch to use, by default None.

Returns:

DataFrame with sample, batch, condition, and ComBat-seq corrected counts.

Return type:

pandas.DataFrame

abaco.batch_correction.correctConQuR(df, batch_cols, covariate_cols, reference_batch=None, quantiles=(0.05, 0.25, 0.5, 0.75, 0.95), logistic_kwargs={'max_iter': 200, 'penalty': 'l2', 'solver': 'lbfgs'}, quantile_kwargs={'alpha': 0.0})

Conditional logistic quantile regression (ConQuR) for batch correction.

Parameters:
  • df (pandas.DataFrame) – Input data containing OTU counts and metadata.

  • batch_cols (list of str) – List of batch column names.

  • covariate_cols (list of str) – List of covariate column names.

  • reference_batch (dict, optional) – Dictionary specifying reference batch values for each batch column. If None, uses zeros for all batch columns.

  • quantiles (tuple of float, optional) – Quantiles to use for quantile regression, by default (0.05, 0.25, 0.5, 0.75, 0.95).

  • logistic_kwargs (dict, optional) – Keyword arguments for LogisticRegression.

  • quantile_kwargs (dict, optional) – Keyword arguments for QuantileRegressor.

Returns:

Batch-corrected DataFrame.

Return type:

pandas.DataFrame

abaco.batch_correction.correctLimma_rBE(data, sample_label='sample', batch_label='batch', covariates_labels=None)

Perform batch correction using Limma’s removeBatchEffect approach.

Parameters:
  • data (pandas.DataFrame) – Input data containing OTU counts and metadata.

  • sample_label (str, optional) – Column name for sample identifiers, by default ‘sample’.

  • batch_label (str, optional) – Column name for batch identifiers, by default ‘batch’.

  • covariates_labels (str or list of str, optional) – Additional covariate column(s) to include in the model.

Returns:

DataFrame with original labels and batch-corrected numeric data.

Return type:

pandas.DataFrame
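The general idea behind a removeBatchEffect-style correction is to fit a linear model with batch and covariate terms and then subtract only the fitted batch term. The NumPy sketch below illustrates that idea under stated assumptions: the helper name `remove_batch_effect`, the sum-to-zero batch coding, and the toy data are chosen for the example and are not taken from abaco's source.

```python
import numpy as np


def remove_batch_effect(Y, batch, covariates=None):
    """Subtract the fitted batch term of a linear model (illustrative helper).

    Y is (n_samples, n_features); batch is a length-n_samples label array.
    """
    Y = np.asarray(Y, dtype=float)
    batch = np.asarray(batch)
    levels = np.unique(batch)
    n = len(batch)

    # Sum-to-zero coding: one column per level except the last, which is -1.
    B = np.zeros((n, len(levels) - 1))
    for j, level in enumerate(levels[:-1]):
        B[batch == level, j] = 1.0
    B[batch == levels[-1], :] = -1.0

    parts = [np.ones((n, 1)), B]
    if covariates is not None:
        parts.append(np.asarray(covariates, dtype=float).reshape(n, -1))
    design = np.hstack(parts)

    # Least-squares fit, then remove only the batch contribution.
    coef, *_ = np.linalg.lstsq(design, Y, rcond=None)
    batch_coef = coef[1 : 1 + B.shape[1]]
    return Y - B @ batch_coef


Y = np.array([[1.0], [3.0], [11.0], [13.0]])
batch = np.array(["a", "a", "b", "b"])
corrected = remove_batch_effect(Y, batch)
# Both batches now share the grand mean of 7: [[6.], [8.], [6.], [8.]]
```

Covariate effects stay in the data because only the batch columns of the design are subtracted.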

abaco.batch_correction.correctPLSDAbatch(df: DataFrame, sample_label: str, exp_label: str, batch_label: str, ncomp_trt: int = 1, ncomp_batch: int = 1)

Perform PLSDA-batch correction.

Parameters:
  • df (pandas.DataFrame) – Input data containing OTU counts and metadata.

  • sample_label (str) – Column name for sample identifiers.

  • exp_label (str) – Column name for experiment/tissue identifiers.

  • batch_label (str) – Column name for batch identifiers.

  • ncomp_trt (int, optional) – Number of treatment components, by default 1.

  • ncomp_batch (int, optional) – Number of batch components, by default 1.

Returns:

DataFrame with sample, experiment, batch, and PLSDA-batch corrected features.

Return type:

pandas.DataFrame

abaco.batch_correction.correctPLSDAbatch_R(df, sample_label, exp_label, batch_label, ncomp_trt=1, ncomp_bat=1, keepX_trt=None, keepX_bat=None, tol=1e-06, max_iter=500, near_zero_var=True, balance=True)

Python adaptation of PLSDA_batch from R. Returns corrected DataFrame.

Parameters:
  • df (pandas.DataFrame) – Input data containing OTU counts and metadata.

  • sample_label (str) – Column name for sample identifiers.

  • exp_label (str) – Column name for experiment/tissue identifiers.

  • batch_label (str) – Column name for batch identifiers.

  • ncomp_trt (int, optional) – Number of treatment components, by default 1.

  • ncomp_bat (int, optional) – Number of batch components, by default 1.

  • keepX_trt (list of int, optional) – Number of variables to keep for each treatment component.

  • keepX_bat (list of int, optional) – Number of variables to keep for each batch component.

  • tol (float, optional) – Convergence tolerance, by default 1e-6.

  • max_iter (int, optional) – Maximum number of iterations, by default 500.

  • near_zero_var (bool, optional) – Whether to filter near-zero variance features, by default True.

  • balance (bool, optional) – Whether to balance design, by default True.

Returns:

DataFrame with sample, experiment, batch, and corrected features.

Return type:

pandas.DataFrame

abaco.batch_correction.deflate_mtx(X, t)

Deflate matrix X by component t: X - t (t^T t)^{-1} t^T X

Parameters:
  • X (numpy.ndarray) – Data matrix to be deflated.

  • t (numpy.ndarray) – Component vector.

Returns:

Deflated matrix.

Return type:

numpy.ndarray
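The deflation formula X - t (t^T t)^{-1} t^T X removes from every column of X its projection onto t. A minimal NumPy sketch of that formula follows; the helper name `deflate` and the toy inputs are illustrative, and abaco's own `deflate_mtx` may differ in how it handles shapes and edge cases.

```python
import numpy as np


def deflate(X, t):
    """Return X - t (t^T t)^{-1} t^T X for a single component vector t."""
    X = np.asarray(X, dtype=float)
    t = np.asarray(t, dtype=float).reshape(-1, 1)  # column vector, shape (n, 1)
    # t.T @ t is a 1x1 array, so dividing by it applies the scalar inverse.
    return X - t @ (t.T @ X) / (t.T @ t)


X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
t = np.array([1.0, 0.0, 1.0])
X_deflated = deflate(X, t)
# X_deflated is [[-2., -2.], [3., 4.], [2., 2.]]
```

The deflated matrix is orthogonal to t, that is, t^T X_deflated is zero, which is the property later components rely on.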