abaco.batch_correction module
- class abaco.batch_correction.ConQur(batch_cols, covariate_cols, reference_batch, quantiles=(0.05, 0.5, 0.95), logistic_kwargs=None, quantile_kwargs=None)
Bases: TransformerMixin, BaseEstimator
Conditional Quantile Regression (ConQuR) batch correction transformer.
- Parameters:
batch_cols (list of str) – List of batch column names.
covariate_cols (list of str) – List of covariate column names.
reference_batch (dict) – Dictionary specifying reference batch values for each batch column.
quantiles (tuple of float, optional) – Quantiles to use for quantile regression, by default (0.05, 0.5, 0.95).
logistic_kwargs (dict, optional) – Keyword arguments for LogisticRegression.
quantile_kwargs (dict, optional) – Keyword arguments for QuantileRegressor.
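The role of the `quantiles` grid and of a reference batch can be illustrated without any regression. Judging from the `logistic_kwargs` and `quantile_kwargs` parameters, the class combines a logistic model with quantile regressions conditioned on batch and covariates; the sketch below drops all of that. It is a simplified, unconditional quantile mapping (not ConQuR itself): each non-reference batch's empirical quantiles are mapped piecewise-linearly onto the reference batch's quantiles for one feature. The helper `quantile_map_to_reference` is hypothetical and not part of abaco.

```python
import numpy as np
import pandas as pd


def quantile_map_to_reference(values, batches, reference, quantiles=(0.05, 0.5, 0.95)):
    """Hypothetical helper: map one feature onto the reference batch's distribution.

    Simplified illustration only: no covariates and no zero/non-zero (logistic)
    model, unlike the ConQuR method documented above.
    """
    values = pd.Series(values, dtype=float)
    batches = pd.Series(np.asarray(batches), index=values.index)
    grid = np.asarray(quantiles, dtype=float)
    # Quantiles of the reference batch define the target distribution.
    ref_q = np.quantile(values[batches == reference], grid)
    corrected = values.copy()
    for batch in batches.unique():
        if batch == reference:
            continue  # reference samples are left unchanged
        mask = batches == batch
        src_q = np.quantile(values[mask], grid)
        # Piecewise-linear map from this batch's quantiles to the reference's;
        # values outside the grid are clamped to the outermost reference quantiles.
        corrected[mask] = np.interp(values[mask], src_q, ref_q)
    return corrected


out = quantile_map_to_reference(
    [1, 2, 3, 4, 5, 11, 12, 13, 14, 15], ["A"] * 5 + ["B"] * 5, reference="A"
)
# Batch "B" becomes [1.2, 2.0, 3.0, 4.0, 4.8]; batch "A" is unchanged.
```

With only three grid points, the tails are clamped; a denser `quantiles` grid follows the reference distribution more closely.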
- set_fit_request(*, df: bool | None | str = '$UNCHANGED$') → ConQur
Configure whether metadata should be requested to be passed to the fit method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in scikit-learn version 1.3.
- set_transform_request(*, df: bool | None | str = '$UNCHANGED$') → ConQur
Configure whether metadata should be requested to be passed to the transform method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to transform.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in scikit-learn version 1.3.
- class abaco.batch_correction.PLSDA(ncomp=1, keepX=None, tol=1e-06, max_iter=500)
Bases: object
Partial Least Squares Discriminant Analysis (PLSDA) implementation.
- Parameters:
ncomp (int, optional) – Number of components to extract, by default 1.
keepX (list of int, optional) – Number of variables to keep for each component (for sparsity).
tol (float, optional) – Convergence tolerance, by default 1e-6.
max_iter (int, optional) – Maximum number of iterations, by default 500.
- t_
X scores.
- Type:
numpy.ndarray
- u_
Y scores.
- Type:
numpy.ndarray
- a_
X loadings.
- Type:
numpy.ndarray
- b_
Y loadings.
- Type:
numpy.ndarray
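To make the four attributes concrete, here is a minimal single-component sketch using a NIPALS-style iteration on centred data with one-hot encoded class labels. The function `plsda_one_component` is hypothetical, not the class's actual code: it extracts only the first component (no deflation) and applies no `keepX` sparsity.

```python
import numpy as np


def plsda_one_component(X, labels, tol=1e-6, max_iter=500):
    """Hypothetical sketch: first PLS-DA component via a NIPALS-style iteration.

    Returns X scores t, Y scores u, X loadings a and Y loadings b, mirroring
    the t_, u_, a_, b_ attributes described above.
    """
    X = np.asarray(X, dtype=float)
    X = X - X.mean(axis=0)  # centre features
    # One-hot encode the class labels into a dummy matrix Y, then centre it.
    classes, codes = np.unique(labels, return_inverse=True)
    Y = np.eye(len(classes))[codes]
    Y = Y - Y.mean(axis=0)

    u = Y[:, 0].copy()  # initial Y score
    a = np.zeros(X.shape[1])
    for _ in range(max_iter):
        a_new = X.T @ u
        a_new /= np.linalg.norm(a_new)  # unit-norm X loading
        t = X @ a_new                   # X score
        b = Y.T @ t
        b /= np.linalg.norm(b)          # unit-norm Y loading
        u = Y @ b                       # Y score
        converged = np.linalg.norm(a_new - a) < tol
        a = a_new
        if converged:
            break
    return t, u, a, b
```

Because each pass is a power-iteration step on `X.T @ Y @ Y.T @ X`, the X loading converges to the leading left singular vector of `X.T @ Y`.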
- abaco.batch_correction.correctBMC(data, sample_label, batch_label, exp_label)
Perform Batch Mean Centering (BMC) correction: for each feature, the mean of each batch (group) is subtracted from that batch's values.
- Parameters:
data (pandas.DataFrame) – Input data containing OTU counts and metadata.
sample_label (str) – Column name for sample identifiers.
batch_label (str) – Column name for batch identifiers.
exp_label (str) – Column name for experiment/tissue identifiers.
- Returns:
DataFrame with sample, experiment, batch, and batch mean centered features.
- Return type:
pandas.DataFrame
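Batch mean centering amounts to a grouped subtraction in pandas. The sketch below assumes a wide DataFrame with one column per feature; `batch_mean_center` and the toy columns are hypothetical, and the function is not a copy of `correctBMC`.

```python
import pandas as pd


def batch_mean_center(df, batch_label, feature_cols):
    """Hypothetical sketch: subtract each batch's per-feature mean (BMC)."""
    out = df.copy()
    # Broadcast each batch's mean back onto its own rows, feature by feature.
    batch_means = df.groupby(batch_label)[feature_cols].transform("mean")
    out[feature_cols] = df[feature_cols] - batch_means
    return out


data = pd.DataFrame({
    "sample": ["s1", "s2", "s3", "s4"],
    "batch": ["b1", "b1", "b2", "b2"],
    "tissue": ["gut", "gut", "gut", "gut"],
    "otu1": [1.0, 3.0, 10.0, 14.0],
    "otu2": [0.0, 2.0, 5.0, 5.0],
})
centered = batch_mean_center(data, "batch", ["otu1", "otu2"])
# otu1 becomes [-1.0, 1.0, -2.0, 2.0]; otu2 becomes [-1.0, 1.0, 0.0, 0.0]
```

After centering every batch has mean zero for every feature, so any biological signal that is confounded with batch is removed as well.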
- abaco.batch_correction.correctCombat(data, sample_label='sample', batch_label='batch', experiment_label='tissue')
Perform ComBat batch correction.
- Parameters:
data (pandas.DataFrame) – Input data containing OTU counts and metadata.
sample_label (str, optional) – Column name for sample identifiers, by default ‘sample’.
batch_label (str, optional) – Column name for batch identifiers, by default ‘batch’.
experiment_label (str, optional) – Column name for experiment/tissue identifiers, by default ‘tissue’.
- Returns:
DataFrame with sample, batch, experiment, and ComBat-corrected features.
- Return type:
pandas.DataFrame
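ComBat adjusts each feature's per-batch location (mean) and scale (variance) and shrinks those batch estimates with empirical Bayes. The sketch below shows only the plain location/scale adjustment: no empirical Bayes shrinkage and no biological covariate such as `experiment_label`. It is a simplified stand-in, not what `correctCombat` runs, and `location_scale_adjust` is a hypothetical name.

```python
import numpy as np
import pandas as pd


def location_scale_adjust(df, batch_label, feature_cols):
    """Hypothetical sketch: ComBat-style location/scale adjustment WITHOUT
    empirical Bayes shrinkage and without biological covariates.

    Each batch needs at least two samples and non-zero variance per feature.
    """
    X = df[feature_cols].astype(float)
    groups = X.groupby(df[batch_label])
    batch_mean = groups.transform("mean")
    batch_sd = groups.transform("std")  # sample SD (ddof=1) within each batch
    # Pooled within-batch SD and overall mean define the common target scale.
    resid = X - batch_mean
    n, k = len(X), df[batch_label].nunique()
    pooled_sd = np.sqrt((resid ** 2).sum() / (n - k))
    grand_mean = X.mean()
    out = df.copy()
    # Standardise within batch, then rescale to the pooled SD and overall mean.
    out[feature_cols] = (X - batch_mean) / batch_sd * pooled_sd + grand_mean
    return out
```

After this step every batch shares the same mean and standard deviation for each feature; ComBat's shrinkage matters mainly when batches are small and per-batch estimates are noisy.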
- abaco.batch_correction.correctCombatSeq(data, sample_label, batch_label, condition_label, ref_batch=None)
Perform ComBat-seq batch correction for count data.
- Parameters:
data (pandas.DataFrame) – Input data containing count data and metadata.
sample_label (str) – Column name for sample identifiers.
batch_label (str) – Column name for batch identifiers.
condition_label (str) – Column name for condition/experiment identifiers.
ref_batch (str or None, optional) – Reference batch to use, by default None.
- Returns:
DataFrame with sample, batch, condition, and ComBat-seq corrected counts.
- Return type:
pandas.DataFrame
- abaco.batch_correction.correctConQuR(df, batch_cols, covariate_cols, reference_batch=None, quantiles=(0.05, 0.25, 0.5, 0.75, 0.95), logistic_kwargs={'max_iter': 200, 'penalty': 'l2', 'solver': 'lbfgs'}, quantile_kwargs={'alpha': 0.0})
Conditional logistic quantile regression (ConQuR) for batch correction.
- Parameters:
df (pandas.DataFrame) – Input data containing OTU counts and metadata.
batch_cols (list of str) – List of batch column names.
covariate_cols (list of str) – List of covariate column names.
reference_batch (dict, optional) – Dictionary specifying reference batch values for each batch column. If None, uses zeros for all batch columns.
quantiles (tuple of float, optional) – Quantiles to use for quantile regression, by default (0.05, 0.25, 0.5, 0.75, 0.95).
logistic_kwargs (dict, optional) – Keyword arguments for LogisticRegression.
quantile_kwargs (dict, optional) – Keyword arguments for QuantileRegressor.
- Returns:
Batch-corrected DataFrame.
- Return type:
pandas.DataFrame
- abaco.batch_correction.correctLimma_rBE(data, sample_label='sample', batch_label='batch', covariates_labels=None)
Perform batch correction using Limma’s removeBatchEffect approach.
- Parameters:
data (pandas.DataFrame) – Input data containing OTU counts and metadata.
sample_label (str, optional) – Column name for sample identifiers, by default ‘sample’.
batch_label (str, optional) – Column name for batch identifiers, by default ‘batch’.
covariates_labels (str or list of str, optional) – Additional covariate column(s) to include in the model.
- Returns:
DataFrame with original labels and batch-corrected numeric data.
- Return type:
pandas.DataFrame
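limma's removeBatchEffect fits a linear model containing the design (here an intercept plus any covariates) together with the batch factor in sum-to-zero coding, then subtracts only the fitted batch term. Assuming `correctLimma_rBE` follows that approach, as its name suggests, the core computation can be sketched with NumPy least squares. `remove_batch_effect` is hypothetical and works on a plain samples × features array rather than on the labelled DataFrame.

```python
import numpy as np
import pandas as pd


def remove_batch_effect(X, batch, covariates=None):
    """Hypothetical sketch of a limma-style removeBatchEffect.

    X: samples x features array; batch: one label per sample;
    covariates: optional samples x q numeric array kept in the model.
    """
    X = np.asarray(X, dtype=float)
    codes = pd.Categorical(batch).codes
    n, k = len(codes), int(codes.max()) + 1
    # Sum-to-zero coding: level j -> indicator column j, last level -> all -1.
    B = np.zeros((n, k - 1))
    for j in range(k - 1):
        B[codes == j, j] = 1.0
    B[codes == k - 1, :] = -1.0
    parts = [np.ones((n, 1))]
    if covariates is not None:
        parts.append(np.asarray(covariates, dtype=float).reshape(n, -1))
    parts.append(B)
    design = np.hstack(parts)
    # Least-squares fit of every feature at once; keep only the batch coefficients.
    coef, *_ = np.linalg.lstsq(design, X, rcond=None)
    batch_coef = coef[-(k - 1):] if k > 1 else np.zeros((0, X.shape[1]))
    return X - B @ batch_coef
```

Covariates stay in the model so that variation they explain is not attributed to batch, but their fitted effects are not subtracted.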
- abaco.batch_correction.correctPLSDAbatch(df: DataFrame, sample_label: str, exp_label: str, batch_label: str, ncomp_trt: int = 1, ncomp_batch: int = 1)
Perform PLSDA-batch correction.
- Parameters:
df (pandas.DataFrame) – Input data containing OTU counts and metadata.
sample_label (str) – Column name for sample identifiers.
exp_label (str) – Column name for experiment/tissue identifiers.
batch_label (str) – Column name for batch identifiers.
ncomp_trt (int, optional) – Number of treatment components, by default 1.
ncomp_batch (int, optional) – Number of batch components, by default 1.
- Returns:
DataFrame with sample, experiment, batch, and PLSDA-batch corrected features.
- Return type:
pandas.DataFrame
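As we understand the published PLSDA-batch method, it first estimates treatment components, then fits PLS-DA against batch on the treatment-deflated data and removes the batch-associated components from the original data. The sketch below illustrates only the last idea for a single batch component (`ncomp_batch=1`) and skips the treatment step entirely. `remove_batch_component` is hypothetical and computes the PLS direction by SVD rather than by iteration.

```python
import numpy as np


def remove_batch_component(X, batch):
    """Hypothetical sketch of the batch-removal step only.

    Finds the first PLS-DA direction for ``batch`` and deflates X along it.
    The real PLSDA-batch first removes treatment-related components, which
    this sketch skips.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    levels, codes = np.unique(batch, return_inverse=True)
    Y = np.eye(len(levels))[codes]
    Y = Y - Y.mean(axis=0)
    # The first PLS X loading is the leading left singular vector of X^T Y.
    U, _, _ = np.linalg.svd(Xc.T @ Y, full_matrices=False)
    a = U[:, 0]
    t = Xc @ a                   # batch scores
    p = Xc.T @ t / (t @ t)       # regression loading of X on t
    # Subtract the batch-associated rank-one part; column means are preserved.
    return X - np.outer(t, p), t
```

The corrected matrix is uncorrelated with the removed batch score `t`; further batch components would repeat the same step on the deflated data.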
- abaco.batch_correction.correctPLSDAbatch_R(df, sample_label, exp_label, batch_label, ncomp_trt=1, ncomp_bat=1, keepX_trt=None, keepX_bat=None, tol=1e-06, max_iter=500, near_zero_var=True, balance=True)
Python adaptation of the PLSDA_batch function from the R package PLSDAbatch. Returns the corrected DataFrame.
- Parameters:
df (pandas.DataFrame) – Input data containing OTU counts and metadata.
sample_label (str) – Column name for sample identifiers.
exp_label (str) – Column name for experiment/tissue identifiers.
batch_label (str) – Column name for batch identifiers.
ncomp_trt (int, optional) – Number of treatment components, by default 1.
ncomp_bat (int, optional) – Number of batch components, by default 1.
keepX_trt (list of int, optional) – Number of variables to keep for each treatment component.
keepX_bat (list of int, optional) – Number of variables to keep for each batch component.
tol (float, optional) – Convergence tolerance, by default 1e-6.
max_iter (int, optional) – Maximum number of iterations, by default 500.
near_zero_var (bool, optional) – Whether to filter near-zero variance features, by default True.
balance (bool, optional) – Whether to balance the design, by default True.
- Returns:
DataFrame with sample, experiment, batch, and corrected features.
- Return type:
pandas.DataFrame
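The `keepX_trt` and `keepX_bat` parameters (like `keepX` in PLSDA) limit how many variables contribute to each component. How this module enforces that limit is not stated here; a common approach in sparse PLS is to soft-threshold each loading vector so that only the keepX largest-magnitude entries stay non-zero. The sketch below, with the hypothetical helper `sparsify_loading`, shows that approach for one loading vector.

```python
import numpy as np


def sparsify_loading(a, keep):
    """Hypothetical sketch: keep only the ``keep`` largest-magnitude loadings.

    Soft-thresholds at the (keep+1)-th largest |a|, then rescales to unit norm.
    Ties at the threshold can leave fewer than ``keep`` non-zero entries.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    a = np.asarray(a, dtype=float)
    if keep >= a.size:
        return a / np.linalg.norm(a)
    threshold = np.sort(np.abs(a))[::-1][keep]  # (keep+1)-th largest magnitude
    shrunk = np.sign(a) * np.maximum(np.abs(a) - threshold, 0.0)
    return shrunk / np.linalg.norm(shrunk)


sparse_a = sparsify_loading([0.1, -0.8, 0.3, 0.5], keep=2)
# Only the entries for -0.8 and 0.5 remain non-zero, with their signs kept.
```

In an iterative fit such a step would typically be applied to the X loading inside each iteration, before computing the scores.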