Utils
data_utils
This file contains data utility functions.
- scalr.utils.data_utils.generate_dummy_anndata(n_samples, n_features, target_name='celltype')[source]
This function returns an AnnData object of shape (n_samples, n_features).
It fills the target, batch and env columns of adata.obs with random values drawn from choices defined in the function. If you need more columns, add them to adata.obs without editing the existing ones.
- Parameters:
n_samples – Number of samples in anndata.
n_features – Number of features in anndata.
target_name – Any preferred target name. Default is celltype.
- Returns:
Anndata object.
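The idea of a random matrix plus randomly labelled metadata columns can be sketched in plain Python. This is only an illustration, not the library's implementation: the label choices, the "batch" and "env" column names, and the helper name make_dummy_table are assumptions, and plain lists and dicts stand in for the AnnData object.

```python
import random

# Hypothetical label choices; the real function defines its own.
TARGET_CHOICES = ["B_cell", "T_cell", "DC"]
BATCH_CHOICES = ["batch1", "batch2"]
ENV_CHOICES = ["env1", "env2"]


def make_dummy_table(n_samples, n_features, target_name="celltype", seed=0):
    """Return (X, obs): a random matrix and per-sample metadata columns."""
    rng = random.Random(seed)
    # Expression-like matrix of shape (n_samples, n_features).
    X = [[rng.random() for _ in range(n_features)] for _ in range(n_samples)]
    # One random label per sample for each metadata column.
    obs = {
        target_name: [rng.choice(TARGET_CHOICES) for _ in range(n_samples)],
        "batch": [rng.choice(BATCH_CHOICES) for _ in range(n_samples)],
        "env": [rng.choice(ENV_CHOICES) for _ in range(n_samples)],
    }
    return X, obs


X, obs = make_dummy_table(n_samples=6, n_features=4)
# Extra columns are added alongside the existing ones, which stay untouched.
obs["donor"] = [f"donor_{i % 2}" for i in range(6)]
```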
- scalr.utils.data_utils.generate_dummy_dge_anndata(n_donors: int = 5, cell_type_list: list[str] = ['B_cell', 'T_cell', 'DC'], cell_replicate: int = 2, n_vars: int = 10) → AnnData [source]
This function returns an AnnData object for differential gene expression (DGE) analysis with shape (n_donors * len(cell_type_list) * cell_replicate, n_vars).
obs contains randomly generated donors, each with a fixed clinical condition (disease_x or normal), and includes every cell type in cell_type_list with cell_replicate cells per cell type for each donor. X is a CSR (Compressed Sparse Row) matrix of random gene expression values. var.index holds n_vars random gene names.
- Parameters:
n_donors – Number of donors or subjects in anndata.obs.
cell_type_list – List of different cell types to include.
cell_replicate – Number of cell replicates per cell type for each donor.
n_vars – Number of genes to include in anndata.var.
- Returns:
Anndata object.
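With the default arguments the number of observations works out to 5 * 3 * 2 = 30. The sketch below reproduces only that row layout in plain Python; the donor naming scheme, the column names and the helper make_dge_obs are assumptions for illustration, not the library's code.

```python
import itertools
import random


def make_dge_obs(n_donors=5, cell_type_list=("B_cell", "T_cell", "DC"),
                 cell_replicate=2, seed=0):
    """Build one obs row per (donor, cell type, replicate) combination."""
    rng = random.Random(seed)
    donors = [f"donor_{i}" for i in range(n_donors)]  # hypothetical names
    # Each donor is assigned a single clinical condition, shared by all its cells.
    condition = {d: rng.choice(["disease_x", "normal"]) for d in donors}
    rows = []
    for donor, cell_type, _ in itertools.product(
            donors, cell_type_list, range(cell_replicate)):
        rows.append({
            "donor": donor,
            "cell_type": cell_type,
            "condition": condition[donor],
        })
    return rows


rows = make_dge_obs()
n_obs = len(rows)  # n_donors * len(cell_type_list) * cell_replicate
```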
- scalr.utils.data_utils.get_one_hot_matrix(data: array)[source]
This function returns a one-hot matrix of given labels.
- Parameters:
data – Categorical data as a 1D or 2D array.
- Returns:
One-hot matrix.
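For the 1D case, one-hot encoding maps each label to a row with a single 1 in the column of its class. The following is a minimal pure-Python sketch of that technique, assuming columns are ordered by sorted label; the ordering used by get_one_hot_matrix and its handling of 2D input are not shown here.

```python
def one_hot(labels):
    """Return (matrix, classes) for a 1D sequence of categorical labels."""
    classes = sorted(set(labels))  # assumed column order
    index = {label: i for i, label in enumerate(classes)}
    matrix = [
        [1 if index[label] == col else 0 for col in range(len(classes))]
        for label in labels
    ]
    return matrix, classes


matrix, classes = one_hot(["T_cell", "B_cell", "T_cell", "DC"])
# classes == ['B_cell', 'DC', 'T_cell']
# matrix  == [[0, 0, 1], [1, 0, 0], [0, 0, 1], [0, 1, 0]]
```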
- scalr.utils.data_utils.get_random_samples(data: AnnData | AnnCollection, n_random_samples: int) → tensor [source]
This function returns n_random_samples randomly chosen samples from the given data.
- Parameters:
data – AnnData or AnnCollection object.
n_random_samples – Number of random samples to extract from the data.
- Returns:
Chosen random samples tensor.
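Random sample selection usually comes down to drawing row indices and then indexing the data with them. The sketch below assumes sampling without replacement, which the description above does not confirm, and uses a plain list in place of AnnData; pick_random_rows is a hypothetical helper name.

```python
import random


def pick_random_rows(rows, n_random_samples, seed=None):
    """Draw n_random_samples distinct rows and return them with their indices."""
    rng = random.Random(seed)
    # Sample indices without replacement, then sort to keep the original order.
    indices = sorted(rng.sample(range(len(rows)), n_random_samples))
    return [rows[i] for i in indices], indices


cells = [f"cell_{i}" for i in range(10)]
picked, picked_inds = pick_random_rows(cells, n_random_samples=3, seed=42)
```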
file_utils
This file contains functions related to file read-write.
- scalr.utils.file_utils._get_datapath_from_config(data_config)[source]
This function returns the data path to read from, as given in the data config.
- Parameters:
data_config – Data config.
- scalr.utils.file_utils.dump_anndata(adata: AnnData, filepath: str)[source]
This function writes the AnnData to filepath.
- scalr.utils.file_utils.dump_csv(df: DataFrame, filepath: str)[source]
This function writes the DataFrame to filepath as a CSV file.
- scalr.utils.file_utils.dump_json(config: dict, filepath: str)[source]
This function writes the dict to filepath as a JSON file.
- scalr.utils.file_utils.dump_yaml(config: dict, filepath: str)[source]
This function writes the config to filepath as a YAML file.
- scalr.utils.file_utils.load_full_data_from_config(data_config)[source]
This function returns full data from the data config.
- Parameters:
data_config – Data config.
- scalr.utils.file_utils.load_test_data_from_config(data_config)[source]
This function returns test data from the data config.
- Parameters:
data_config – Data config.
- scalr.utils.file_utils.load_train_val_data_from_config(data_config)[source]
This function returns train & validation data from the data config.
- Parameters:
data_config – Data config.
- scalr.utils.file_utils.read_anndata(filepath: str, backed: str = 'r') → AnnData [source]
This function returns the AnnData object read from filepath.
- scalr.utils.file_utils.read_chunked_anndatas(dirpath: str, backed: str = 'r', return_anncollection: bool = True) → AnnCollection [source]
This function returns an AnnCollection object built from the multiple AnnData files in the dirpath directory.
- scalr.utils.file_utils.read_csv(filepath: str, index_col: int = 0) → DataFrame [source]
This function returns the DataFrame read from the CSV file at filepath.
- scalr.utils.file_utils.read_data(filepath: str, backed: str = 'r', index_col: int = 0, return_anncollection: bool = True) → dict | AnnData | AnnCollection [source]
This function reads a json, yaml, csv or AnnData (h5ad) file, choosing the reader from the file type named in the file path.
Returns an AnnCollection in case of a directory with chunked anndatas.
- Parameters:
filepath (str) – path to a json, yaml, csv or h5ad file, or to a directory containing multiple h5ad files.
backed (str, optional) – To load AnnData / AnnCollection in backed mode. Defaults to 'r'.
- Raises:
ValueError – If an unsupported or wrong file path is provided.
- Returns:
Union[dict, AnnData, AnnCollection].
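The dispatch described above can be pictured as a lookup on the path. The sketch below only decides which kind of reader would apply and raises ValueError otherwise; matching on the file extension, accepting ".yml", and the helper name detect_format are assumptions, and the actual check in read_data may differ.

```python
import os

# Assumed mapping from file extension to the kind of reader to use.
READER_KINDS = {
    ".json": "json",
    ".yaml": "yaml",
    ".yml": "yaml",
    ".csv": "csv",
    ".h5ad": "h5ad",
}


def detect_format(filepath):
    """Return the reader kind for filepath, or raise ValueError."""
    # A directory is treated as a collection of chunked h5ad files.
    if os.path.isdir(filepath):
        return "chunked_h5ad_dir"
    ext = os.path.splitext(filepath)[1].lower()
    if ext not in READER_KINDS:
        raise ValueError(f"Unsupported file path: {filepath!r}")
    return READER_KINDS[ext]


kind = detect_format("config.yaml")  # the file does not need to exist here
```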
- scalr.utils.file_utils.read_json(filepath: str) → dict [source]
This function returns the dict loaded from the JSON file at filepath.
- scalr.utils.file_utils.read_yaml(filepath: str) → dict [source]
This function returns the config file loaded from yaml.
- scalr.utils.file_utils.write_chunkwise_data(full_data: AnnData | AnnCollection, sample_chunksize: int, dirpath: str, sample_inds: list[int] = None, feature_inds: list[int] = None, transform: callable = None, num_workers: int = 1)[source]
This function writes data subsets iteratively, chunk by chunk, so that at most sample_chunksize samples are loaded at a time.
It can also apply a transformation to each chunk.
- Parameters:
full_data (Union[AnnData, AnnCollection]) – data to be written in chunks.
sample_chunksize (int) – number of samples to be loaded at a time.
dirpath (str) – path/to/directory to write the chunks of data.
sample_inds (list[int], optional) – To be used in case of chunking only a subset of samples. Defaults to all samples.
feature_inds (list[int], optional) – To be used in case of writing only a subset of features. Defaults to all features.
transform (callable, optional) – a function to apply a transformation on a chunked numpy array.
num_workers (int) – Number of jobs to run in parallel for data writing. Additional workers will not use additional memory, but will be CPU-intensive.
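The chunking logic can be sketched without any I/O: split the selected sample indices into runs of at most sample_chunksize, subset the features, then apply the transform. In this illustration the chunks are collected in a list instead of being written to dirpath, nested lists stand in for the data, and applying the transform after feature selection is an assumption; iter_chunks and collect_chunks are hypothetical helper names.

```python
def iter_chunks(n_samples, sample_chunksize, sample_inds=None):
    """Yield lists of sample indices, each at most sample_chunksize long."""
    inds = list(range(n_samples)) if sample_inds is None else list(sample_inds)
    for start in range(0, len(inds), sample_chunksize):
        yield inds[start:start + sample_chunksize]


def collect_chunks(data, sample_chunksize, sample_inds=None,
                   feature_inds=None, transform=None):
    """Process data chunk by chunk; the real function writes each chunk to disk."""
    chunks = []
    for chunk_inds in iter_chunks(len(data), sample_chunksize, sample_inds):
        # Only the rows of the current chunk are materialized here.
        chunk = [data[i] for i in chunk_inds]
        if feature_inds is not None:
            chunk = [[row[j] for j in feature_inds] for row in chunk]
        if transform is not None:
            chunk = transform(chunk)
        chunks.append(chunk)
    return chunks


data = [[i, i * 10] for i in range(5)]
add_one = lambda chunk: [[v + 1 for v in row] for row in chunk]
chunks = collect_chunks(data, sample_chunksize=2, transform=add_one)
# Five samples with a chunk size of 2 give chunks of 2, 2 and 1 samples.
```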
logger
This file contains an implementation of the logger in the pipeline.
misc_utils
This file contains functions related to miscellaneous utilities.
- scalr.utils.misc_utils.build_object(module, config: dict)[source]
A builder function to build an object from its config.
- Parameters:
module – Module containing the class.
config – Contains the name of the class and params to initialize the object.
- Returns:
Object, updated config.
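A config-driven builder of this kind typically looks up a class by name on the module and calls it with the configured parameters. The sketch below shows that pattern using the standard library's fractions module as a stand-in; the "name" and "params" config keys, the way the config is updated, and the helper name build_object_sketch are assumptions, not the confirmed behaviour of build_object.

```python
import fractions


def build_object_sketch(module, config):
    """Instantiate config['name'] from module with config['params']."""
    name = config["name"]                # assumed key holding the class name
    params = config.get("params", {})    # assumed key holding constructor kwargs
    obj = getattr(module, name)(**params)
    # Return a copy of the config with the params key always present.
    updated_config = {**config, "params": params}
    return obj, updated_config


obj, updated_config = build_object_sketch(
    fractions,
    {"name": "Fraction", "params": {"numerator": 3, "denominator": 4}},
)
```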
test_file_utils
This is a test file for file_utils.py
test_misc_utils
This is a test file for misc_utils.py