abaco.batch_correction module


class abaco.batch_correction.ConQur(batch_cols, covariate_cols, reference_batch, quantiles=(0.05, 0.5, 0.95), logistic_kwargs=None, quantile_kwargs=None)

Bases: TransformerMixin, BaseEstimator

Conditional Quantile Regression (ConQuR) batch correction transformer.

Parameters:

batch_cols (list of str) – List of batch column names.
covariate_cols (list of str) – List of covariate column names.
reference_batch (dict) – Dictionary specifying reference batch values for each batch column.
quantiles (tuple of float, optional) – Quantiles to use for quantile regression, by default (0.05, 0.5, 0.95).
logistic_kwargs (dict, optional) – Keyword arguments for LogisticRegression.
quantile_kwargs (dict, optional) – Keyword arguments for QuantileRegressor.

_logit_models

Fitted logistic regression models for zero-mass.

Type: dict

_quantile_models

Fitted quantile regression models for nonzero values.

Type: dict

_col_order

Order of columns used in the model.

Type: list

_feature_cols

List of feature columns.

Type: list

fit(df, y=None)

set_fit_request(*, df: bool | None | str = '$UNCHANGED$') → ConQur

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters: df (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for df parameter in fit.
Returns: self – The updated object.
Return type: object

set_transform_request(*, df: bool | None | str = '$UNCHANGED$') → ConQur

Configure whether metadata should be requested to be passed to the transform method.

The options for each parameter are:

True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to transform.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters: df (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for df parameter in transform.
Returns: self – The updated object.
Return type: object

transform(df)

class abaco.batch_correction.PLSDA(ncomp=1, keepX=None, tol=1e-06, max_iter=500)

Bases: object

Partial Least Squares Discriminant Analysis (PLSDA) implementation.

Parameters:

ncomp (int, optional) – Number of components to extract, by default 1.
keepX (list of int, optional) – Number of variables to keep for each component (for sparsity).
tol (float, optional) – Convergence tolerance, by default 1e-6.
max_iter (int, optional) – Maximum number of iterations, by default 500.

t_

X scores.

Type: numpy.ndarray

u_

Y scores.

Type: numpy.ndarray

a_

X loadings.

Type: numpy.ndarray

b_

Y loadings.

Type: numpy.ndarray

iters_

Number of iterations per component.

Type: list

exp_var_

Explained variance per component.

Type: list

fit(X, Y)
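
The tol and max_iter parameters suggest an iterative fit. As a rough illustration only, one common NIPALS-style iteration for a single PLS component is sketched below. The helper name pls_component, the choice of start vector, the unit-length normalization of both loading vectors, and the toy data are all assumptions; the class's actual algorithm (including the keepX sparsity step) may differ.

```python
import numpy as np

def pls_component(X, Y, tol=1e-6, max_iter=500):
    """One NIPALS-style PLS component (hypothetical helper, not abaco's code)."""
    X = X - X.mean(axis=0)              # center features
    Y = Y - Y.mean(axis=0)              # center class indicators
    u = Y[:, [0]]                       # initial Y scores, shape (n, 1)
    a_old = np.zeros((X.shape[1], 1))
    for it in range(1, max_iter + 1):
        a = X.T @ u
        a /= np.linalg.norm(a)          # X loadings, unit length
        t = X @ a                       # X scores
        b = Y.T @ t
        b /= np.linalg.norm(b)          # Y loadings, unit length
        u = Y @ b                       # Y scores
        if np.linalg.norm(a - a_old) < tol:
            break                       # loadings stopped changing
        a_old = a
    return t, u, a, b, it

# Toy data: three classes, with class 0 strongly shifting feature 0.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1, 2], 20)
Y = np.eye(3)[labels]                   # one-hot class indicator matrix
X = rng.normal(size=(60, 5))
X[:, 0] += 3.0 * (labels == 0)
X[:, 1] += 1.0 * (labels == 1)

t, u, a, b, n_iter = pls_component(X, Y)
```

In this formulation the X loading vector converges to the leading left singular vector of the centered cross-product matrix X^T Y, which is a convenient way to sanity-check the iteration.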

abaco.batch_correction.correctBMC(data, sample_label, batch_label, exp_label)

Perform Batch Mean Centering (BMC) correction: for each feature, subtract the mean of that feature within each batch (group) from the values of the samples in that batch.

Parameters:

data (pandas.DataFrame) – Input data containing OTU counts and metadata.
sample_label (str) – Column name for sample identifiers.
batch_label (str) – Column name for batch identifiers.
exp_label (str) – Column name for experiment/tissue identifiers.

Returns:

DataFrame with sample, experiment, batch, and batch mean centered features.

Return type:

pandas.DataFrame
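
A minimal sketch of the centering step, written with plain pandas rather than correctBMC itself. The toy DataFrame and its column names (sample, batch, tissue, otu1, otu2) are illustrative assumptions, and the real function's column order and dtype handling may differ.

```python
import pandas as pd

# Toy data: two batches, two feature columns (all names are hypothetical).
data = pd.DataFrame({
    "sample": ["s1", "s2", "s3", "s4"],
    "batch": ["b1", "b1", "b2", "b2"],
    "tissue": ["gut", "skin", "gut", "skin"],
    "otu1": [1.0, 3.0, 10.0, 14.0],
    "otu2": [2.0, 2.0, 5.0, 9.0],
})

meta_cols = ["sample", "tissue", "batch"]
feature_cols = [c for c in data.columns if c not in meta_cols]

# Subtract each batch's per-feature mean from the samples of that batch.
batch_means = data.groupby("batch")[feature_cols].transform("mean")
centered = data[feature_cols] - batch_means

# Reattach the metadata columns in front of the centered features.
bmc = pd.concat([data[meta_cols], centered], axis=1)
```

After centering, every feature has mean zero within each batch, so additive batch offsets are removed while within-batch differences between samples are kept.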

abaco.batch_correction.correctCombat(data, sample_label='sample', batch_label='batch', experiment_label='tissue')

Perform ComBat batch correction.

Parameters:

data (pandas.DataFrame) – Input data containing OTU counts and metadata.
sample_label (str, optional) – Column name for sample identifiers, by default ‘sample’.
batch_label (str, optional) – Column name for batch identifiers, by default ‘batch’.
experiment_label (str, optional) – Column name for experiment/tissue identifiers, by default ‘tissue’.

Returns:

DataFrame with sample, batch, experiment, and ComBat-corrected features.

Return type:

pandas.DataFrame

abaco.batch_correction.correctCombatSeq(data, sample_label, batch_label, condition_label, ref_batch=None)

Perform ComBat-seq batch correction for count data.

Parameters:

data (pandas.DataFrame) – Input data containing count data and metadata.
sample_label (str) – Column name for sample identifiers.
batch_label (str) – Column name for batch identifiers.
condition_label (str) – Column name for condition/experiment identifiers.
ref_batch (str or None, optional) – Reference batch to use, by default None.

Returns:

DataFrame with sample, batch, condition, and ComBat-seq corrected counts.

Return type:

pandas.DataFrame

abaco.batch_correction.correctConQuR(df, batch_cols, covariate_cols, reference_batch=None, quantiles=(0.05, 0.25, 0.5, 0.75, 0.95), logistic_kwargs={'max_iter': 200, 'penalty': 'l2', 'solver': 'lbfgs'}, quantile_kwargs={'alpha': 0.0})

Conditional logistic quantile regression (ConQuR) for batch correction.

Parameters:

df (pandas.DataFrame) – Input data containing OTU counts and metadata.
batch_cols (list of str) – List of batch column names.
covariate_cols (list of str) – List of covariate column names.
reference_batch (dict, optional) – Dictionary specifying reference batch values for each batch column. If None, uses zeros for all batch columns.
quantiles (tuple of float, optional) – Quantiles to use for quantile regression, by default (0.05, 0.25, 0.5, 0.75, 0.95).
logistic_kwargs (dict, optional) – Keyword arguments for LogisticRegression.
quantile_kwargs (dict, optional) – Keyword arguments for QuantileRegressor.

Returns:

Batch-corrected DataFrame.

Return type:

pandas.DataFrame

abaco.batch_correction.correctLimma_rBE(data, sample_label='sample', batch_label='batch', covariates_labels=None)

Perform batch correction using Limma’s removeBatchEffect approach.

Parameters:

data (pandas.DataFrame) – Input data containing OTU counts and metadata.
sample_label (str, optional) – Column name for sample identifiers, by default ‘sample’.
batch_label (str, optional) – Column name for batch identifiers, by default ‘batch’.
covariates_labels (str or list of str, optional) – Additional covariate column(s) to include in the model.

Returns:

DataFrame with original labels and batch-corrected numeric data.

Return type:

pandas.DataFrame
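
The regression idea behind this kind of correction can be sketched with NumPy. This is an illustrative sketch, not abaco's implementation: the helper name remove_batch_effect and the toy data are hypothetical, and it assumes sum-to-zero coding of the batch factor, as limma's removeBatchEffect does in R.

```python
import numpy as np

def remove_batch_effect(Y, batch, covariates=None):
    """Regress the batch term out of Y (samples x features); hypothetical helper."""
    Y = np.asarray(Y, dtype=float)
    n = Y.shape[0]
    levels, codes = np.unique(np.asarray(batch), return_inverse=True)
    k = len(levels)

    # Sum-to-zero coding: k-1 columns, last level coded as -1 in every column.
    B = np.zeros((n, k - 1))
    for j in range(k - 1):
        B[codes == j, j] = 1.0
    B[codes == k - 1, :] = -1.0

    # Terms to preserve: intercept plus any covariates.
    keep = np.ones((n, 1))
    if covariates is not None:
        cov = np.asarray(covariates, dtype=float).reshape(n, -1)
        keep = np.column_stack([keep, cov])

    # Fit the full linear model, then subtract only the batch component.
    X = np.column_stack([keep, B])
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return Y - B @ beta[keep.shape[1]:]

# Toy data: two balanced batches, with an additive shift in batch "b".
rng = np.random.default_rng(0)
batch = np.array(["a"] * 5 + ["b"] * 5)
Y = rng.normal(size=(10, 3))
Y[batch == "b"] += 4.0
corrected = remove_batch_effect(Y, batch)
```

Because covariates are fitted jointly with the batch term, variation they explain is not attributed to batch and therefore stays in the corrected data.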

abaco.batch_correction.correctPLSDAbatch(df: DataFrame, sample_label: str, exp_label: str, batch_label: str, ncomp_trt: int = 1, ncomp_batch: int = 1)

Perform PLSDA-batch correction.

Parameters:

df (pandas.DataFrame) – Input data containing OTU counts and metadata.
sample_label (str) – Column name for sample identifiers.
exp_label (str) – Column name for experiment/tissue identifiers.
batch_label (str) – Column name for batch identifiers.
ncomp_trt (int, optional) – Number of treatment components, by default 1.
ncomp_batch (int, optional) – Number of batch components, by default 1.

Returns:

DataFrame with sample, experiment, batch, and PLSDA-batch corrected features.

Return type:

pandas.DataFrame

abaco.batch_correction.correctPLSDAbatch_R(df, sample_label, exp_label, batch_label, ncomp_trt=1, ncomp_bat=1, keepX_trt=None, keepX_bat=None, tol=1e-06, max_iter=500, near_zero_var=True, balance=True)

Python adaptation of PLSDA_batch from R. Returns corrected DataFrame.

Parameters:

df (pandas.DataFrame) – Input data containing OTU counts and metadata.
sample_label (str) – Column name for sample identifiers.
exp_label (str) – Column name for experiment/tissue identifiers.
batch_label (str) – Column name for batch identifiers.
ncomp_trt (int, optional) – Number of treatment components, by default 1.
ncomp_bat (int, optional) – Number of batch components, by default 1.
keepX_trt (list of int, optional) – Number of variables to keep for each treatment component.
keepX_bat (list of int, optional) – Number of variables to keep for each batch component.
tol (float, optional) – Convergence tolerance, by default 1e-6.
max_iter (int, optional) – Maximum number of iterations, by default 500.
near_zero_var (bool, optional) – Whether to filter near-zero variance features, by default True.
balance (bool, optional) – Whether to balance design, by default True.

Returns:

DataFrame with sample, experiment, batch, and corrected features.

Return type:

pandas.DataFrame

abaco.batch_correction.deflate_mtx(X, t)

Deflate matrix X by component t: X - t (t^T t)^{-1} t^T X

Parameters:

X (numpy.ndarray) – Data matrix to be deflated.
t (numpy.ndarray) – Component vector.

Returns:

Deflated matrix.

Return type:

numpy.ndarray
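
The deflation formula can be checked on a small example. The sketch below reimplements the formula directly in NumPy under the assumption that t is a single component vector; the helper name deflate is hypothetical and is not the module's deflate_mtx.

```python
import numpy as np

def deflate(X, t):
    """Return X - t (t^T t)^{-1} t^T X for a single component vector t."""
    X = np.asarray(X, dtype=float)
    t = np.asarray(t, dtype=float).reshape(-1, 1)   # column vector, shape (n, 1)
    return X - t @ (t.T @ X) / (t.T @ t).item()

X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
t = np.array([1.0, 0.0, 1.0])

X_def = deflate(X, t)
# X_def is [[-2, -2], [3, 4], [2, 2]]; its columns are orthogonal to t.
```

Deflation removes the part of X that lies along t, so t^T X_def is zero and the next component can be extracted from what remains.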
