abaco.dataloader module
- abaco.dataloader.ABaCoDataLoader(data, device=torch.device, batch_label='batch', exp_label='tissue', batch_size=32, total_size=1024, total_batch=10)
- abaco.dataloader.DataPreprocess(path: str, factors: list = ['sample', 'batch', 'tissue'], delimiter: str = ',')
Reads a CSV file and preprocesses the data by converting the specified columns to categorical type.
- Parameters:
path (str) – The path to the CSV file containing the data.
factors (list, optional) – List of factor columns to convert to categorical type. Default is ["sample", "batch", "tissue"].
delimiter (str, optional) – The field delimiter used in the file. Default is ",".
- Returns:
The preprocessed DataFrame with specified factor columns converted to categorical type.
- Return type:
pd.DataFrame
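The behaviour described above can be sketched with plain pandas. This is a minimal illustration, not the abaco implementation: the helper name `preprocess_sketch` and the in-memory CSV are hypothetical, and it only assumes that the factor columns are cast with `astype("category")`.

```python
import io

import pandas as pd


def preprocess_sketch(source, factors=("sample", "batch", "tissue"), delimiter=","):
    """Hypothetical stand-in for DataPreprocess; not the library's code."""
    # Read the delimited file (or file-like object) into a DataFrame.
    df = pd.read_csv(source, delimiter=delimiter)
    # Cast each factor column to pandas' categorical dtype.
    for col in factors:
        df[col] = df[col].astype("category")
    return df


# Tiny in-memory CSV used only for illustration.
csv_text = "sample,batch,tissue,OTU1,OTU2\ns1,b1,gut,10,0\ns2,b2,skin,3,7\n"
df = preprocess_sketch(io.StringIO(csv_text))
print(df.dtypes)
```

With the real function, the equivalent call would presumably pass a file path, for example `DataPreprocess("counts.csv")`, where the file name is a placeholder.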
- abaco.dataloader.DataReverseTransform(data, original_data, factors=['sample', 'batch', 'tissue'], transformation='CLR', count=False)
- abaco.dataloader.DataTransform(data, factors=['sample', 'batch', 'tissue'], transformation='CLR', count=False)
Transforms the data based on the specified transformation method.
- Parameters:
data (pd.DataFrame) – The input data containing OTU counts and factors.
factors (list, optional) – List of factor columns to retain in the transformed data.
transformation (str, optional) – The transformation method to apply. Options are “CLR”, “Sqrt”, “ILR”, “ALR”.
count (bool, optional) – If True, the data is treated as count data; otherwise, a small offset is added to avoid log(0) issues.
- Returns:
The transformed data with the specified factors and transformed OTU counts.
- Return type:
pd.DataFrame
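As an illustration of the "CLR" option, the centered log-ratio transform subtracts each sample's mean log-abundance from its log-abundances, so every transformed row sums to zero. The sketch below is a standalone NumPy/pandas version, not the abaco implementation: the helper name `clr_sketch` and the `offset` value are assumptions, and it does not reproduce the `count` flag.

```python
import numpy as np
import pandas as pd


def clr_sketch(data, factors=("sample", "batch", "tissue"), offset=1e-6):
    """Hypothetical CLR transform; the offset value is an assumption."""
    factor_cols = [c for c in factors if c in data.columns]
    otu_cols = [c for c in data.columns if c not in factor_cols]
    counts = data[otu_cols].to_numpy(dtype=float)
    # A small offset avoids log(0) for zero counts.
    log_counts = np.log(counts + offset)
    # Subtract the per-sample (row) mean of the logs.
    clr = log_counts - log_counts.mean(axis=1, keepdims=True)
    clr_df = pd.DataFrame(clr, columns=otu_cols, index=data.index)
    # Keep the factor columns alongside the transformed OTU columns.
    return pd.concat([data[factor_cols], clr_df], axis=1)


# Small made-up table used only for illustration.
data = pd.DataFrame(
    {
        "sample": ["s1", "s2"],
        "batch": ["b1", "b2"],
        "tissue": ["gut", "skin"],
        "OTU1": [10, 3],
        "OTU2": [0, 7],
        "OTU3": [5, 5],
    }
)
result = clr_sketch(data)
print(result)
```

The zero-sum property of each row is a quick sanity check that the centering step was applied per sample rather than per OTU.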
- abaco.dataloader.one_hot_encoding(labels: pandas.Series, dtype: torch.dtype = torch.float32) -> tuple
Converts a series of labels into a one-hot encoded matrix.
- Parameters:
labels (pd.Series) – The input labels to be one-hot encoded.
dtype (torch.dtype, optional) – The data type of the output tensor. Default is torch.float32.
- Returns:
A tuple containing:
- torch.Tensor: A one-hot encoded matrix where each row corresponds to a label.
- list: The categories (unique labels) encoded in the matrix.
- Return type:
tuple
Example
>>> import pandas as pd
>>> import torch
>>> from abaco.dataloader import one_hot_encoding
>>> labels = pd.Series(['A', 'B', 'A', 'C'])
>>> one_hot_matrix, categories = one_hot_encoding(labels, dtype=torch.int32)
>>> print(one_hot_matrix)
tensor([[1, 0, 0],
        [0, 1, 0],
        [1, 0, 0],
        [0, 0, 1]], dtype=torch.int32)
>>> print(categories)
['A', 'B', 'C']