Utils

data_utils

This file contains functions related to data utility.

scalr.utils.data_utils.generate_dummy_anndata(n_samples, n_features, target_name='celltype')[source]

This function returns an AnnData object of shape (n_samples, n_features).

It generates random values for target, batch & env from the choices defined in the function body. If you require more columns, add them to adata.obs there without editing the existing columns.

Parameters:
  • n_samples – Number of samples in anndata.

  • n_features – Number of features in anndata.

  • target_name – Any preferred target name. Default is celltype.

Returns:

Anndata object.

scalr.utils.data_utils.generate_dummy_dge_anndata(n_donors: int = 5, cell_type_list: list[str] = ['B_cell', 'T_cell', 'DC'], cell_replicate: int = 2, n_vars: int = 10) -> AnnData[source]

This function returns an AnnData object for DGE analysis with shape (n_donors*len(cell_type_list)*cell_replicate, n_vars).

It generates obs with random donors, each with a fixed clinical condition (disease_x or normal). Every donor includes all the cell types in cell_type_list, with cell_replicate cells per cell type. It generates a CSR (Compressed Sparse Row) matrix with random gene expression values. It generates var with random gene names as var.index, of length n_vars.

Parameters:
  • n_donors – Number of donors or subjects in anndata.obs.

  • cell_type_list – List of different cell types to include.

  • cell_replicate – Number of cell replicates per cell type.

  • n_vars – Number of genes to include in anndata.var.

Returns:

Anndata object.
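The row count in the shape above follows from enumerating every (donor, cell type, replicate) combination. The sketch below reproduces only that bookkeeping with the standard library. It is not the scalr implementation: the helper name dummy_dge_obs, the donor labels and the column names donor_id, cell_type and disease are assumptions made for illustration.

```python
import itertools
import random


def dummy_dge_obs(n_donors=5, cell_type_list=("B_cell", "T_cell", "DC"),
                  cell_replicate=2, seed=0):
    """Build obs-like rows: one row per (donor, cell type, replicate)."""
    rng = random.Random(seed)
    donors = [f"D_{i}" for i in range(n_donors)]  # hypothetical donor labels
    # One clinical condition per donor, shared by all of that donor's cells.
    condition = {d: rng.choice(["disease_x", "normal"]) for d in donors}
    rows = []
    for donor, cell_type, _ in itertools.product(
            donors, cell_type_list, range(cell_replicate)):
        rows.append({
            "donor_id": donor,             # hypothetical column name
            "cell_type": cell_type,        # hypothetical column name
            "disease": condition[donor],   # hypothetical column name
        })
    return rows


obs = dummy_dge_obs()
# 5 donors * 3 cell types * 2 replicates = 30 rows
n_rows = len(obs)
```

With the default arguments this gives 30 rows, matching n_donors*len(cell_type_list)*cell_replicate.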

scalr.utils.data_utils.get_one_hot_matrix(data: array)[source]

This function returns a one-hot matrix of given labels.

Parameters:

data – Categorical data as a 1D or 2D array.

Returns:

one-hot matrix.
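As an illustration of what a one-hot matrix looks like, here is a minimal standard-library sketch for 1D labels. It is not the scalr implementation: the helper name one_hot, the sorted column order and the list-of-lists output are assumptions, and 2D input is not handled.

```python
def one_hot(labels):
    """Return (matrix, classes) for a 1D sequence of categorical labels."""
    classes = sorted(set(labels))                    # fixed column order
    column = {c: i for i, c in enumerate(classes)}   # label -> column index
    matrix = []
    for label in labels:
        row = [0] * len(classes)
        row[column[label]] = 1                       # exactly one 1 per row
        matrix.append(row)
    return matrix, classes


matrix, classes = one_hot(["B_cell", "T_cell", "DC", "T_cell"])
# classes -> ['B_cell', 'DC', 'T_cell']
# matrix  -> [[1, 0, 0], [0, 0, 1], [0, 1, 0], [0, 0, 1]]
```

Each row has a single 1 in the column of its label, so the matrix has one row per sample and one column per distinct class.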

scalr.utils.data_utils.get_random_samples(data: AnnData | AnnCollection, n_random_samples: int) -> tensor[source]

This function returns n_random_samples randomly chosen samples from the given data.

Parameters:
  • data – AnnData or AnnCollection object.

  • n_random_samples – number of random samples to extract from the data.

Returns:

Chosen random samples tensor.
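The idea of drawing N random samples can be sketched with plain Python lists. This is not the scalr implementation: the helper name pick_random_rows, the seed argument and sampling without replacement are assumptions, and the real function works on AnnData or AnnCollection objects and returns a tensor.

```python
import random


def pick_random_rows(rows, n_random_samples, seed=None):
    """Pick n_random_samples distinct rows (assumed: without replacement)."""
    rng = random.Random(seed)
    indices = rng.sample(range(len(rows)), n_random_samples)
    return [rows[i] for i in indices]


rows = [[i, i * 10] for i in range(100)]   # 100 samples, 2 features each
subset = pick_random_rows(rows, 5, seed=0)
```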

file_utils

This file contains functions related to file read-write.

scalr.utils.file_utils._get_datapath_from_config(data_config)[source]

This function returns the datapath, taken from the data config, from which data should be read.

Parameters:

data_config – Data config.

scalr.utils.file_utils.dump_anndata(adata: AnnData, filepath: str)[source]

This function writes the AnnData to filepath.

scalr.utils.file_utils.dump_csv(df: DataFrame, filepath: str)[source]

This function writes the DataFrame to filepath as a csv file.

scalr.utils.file_utils.dump_json(config: dict, filepath: str)[source]

This function writes the config dict to filepath as a json file.

scalr.utils.file_utils.dump_yaml(config: dict, filepath: str)[source]

This function stores the config file to filepath.

scalr.utils.file_utils.load_full_data_from_config(data_config)[source]

This function returns full data from the data config.

Parameters:

data_config – Data config.

scalr.utils.file_utils.load_test_data_from_config(data_config)[source]

This function returns test data from the data config.

Parameters:

data_config – Data config.

scalr.utils.file_utils.load_train_val_data_from_config(data_config)[source]

This function returns train & validation data from the data config.

Parameters:

data_config – Data config.

scalr.utils.file_utils.read_anndata(filepath: str, backed: str = 'r') -> AnnData[source]

This function returns the AnnData object read from filepath.

scalr.utils.file_utils.read_chunked_anndatas(dirpath: str, backed: str = 'r', return_anncollection: bool = True) -> AnnCollection[source]

This function returns an AnnCollection object built from the multiple anndatas in the dirpath directory.

scalr.utils.file_utils.read_csv(filepath: str, index_col: int = 0) -> DataFrame[source]

This function returns the DataFrame read from the csv file at filepath.

scalr.utils.file_utils.read_data(filepath: str, backed: str = 'r', index_col: int = 0, return_anncollection: bool = True) -> dict | AnnData | AnnCollection[source]

This function reads a json, yaml, csv or h5ad (AnnData) file, choosing the reader according to the file path.

Returns an AnnCollection in case of a directory with chunked anndatas.

Parameters:
  • filepath (str) – path to a json, yaml, csv or h5ad file, or to a directory containing multiple h5ad files.

  • backed (str, optional) – To load AnnData / AnnCollection in backed mode. Defaults to ‘r’.

Raises:

ValueError – If a wrong file path is provided.

Returns:

Union[dict, AnnData, AnnCollection].
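Choosing a reader from the file path can be sketched as a dispatch on the file extension. The example below is not the scalr implementation: the helper name read_by_extension is hypothetical, only json and csv are covered because they need nothing beyond the standard library, and the csv branch returns a list of dicts rather than a DataFrame.

```python
import csv
import json
import os
import tempfile


def read_by_extension(filepath):
    """Pick a reader from the file extension; raise ValueError otherwise."""
    ext = os.path.splitext(filepath)[1].lower()
    if ext == ".json":
        with open(filepath) as fh:
            return json.load(fh)
    if ext == ".csv":
        with open(filepath, newline="") as fh:
            return list(csv.DictReader(fh))
    raise ValueError(f"Unsupported file path: {filepath}")


# Round trip through a temporary json file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "config.json")
    with open(path, "w") as fh:
        json.dump({"batch_size": 32}, fh)
    loaded = read_by_extension(path)
```

A path with an unrecognised extension falls through to the ValueError, which mirrors the Raises entry above.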

scalr.utils.file_utils.read_json(filepath: str) -> dict[source]

This function returns the contents of the json file as a dict.

scalr.utils.file_utils.read_yaml(filepath: str) -> dict[source]

This function returns the config file loaded from yaml.

scalr.utils.file_utils.write_chunkwise_data(full_data: AnnData | AnnCollection, sample_chunksize: int, dirpath: str, sample_inds: list[int] = None, feature_inds: list[int] = None, transform: callable = None, num_workers: int = 1)[source]

This function writes data subsets iteratively, chunk by chunk, so that at most sample_chunksize samples are loaded at a time.

This function can also apply transformation on each chunk.

Parameters:
  • full_data (Union[AnnData, AnnCollection]) – data to be written in chunks.

  • sample_chunksize (int) – number of samples to be loaded at a time.

  • dirpath (str) – path/to/directory to write the chunks of data.

  • sample_inds (list[int], optional) – To be used in case of chunking only a subset of samples. Defaults to all samples.

  • feature_inds (list[int], optional) – To be used in case of writing only a subset of features. Defaults to all features.

  • transform (function) – a function to apply a transformation on a chunked numpy array.

  • num_workers (int) – Number of jobs to run in parallel for data writing. Additional workers will not use additional memory, but will be CPU-intensive.
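The chunking idea described above can be sketched with plain lists: iterate over the sample indices in slices of at most sample_chunksize, optionally transform each chunk, and write it out. This is not the scalr implementation: the helper names iter_chunks and write_chunks and the chunk_<n> labels are hypothetical, and the "write" step only collects results in memory instead of producing files.

```python
def iter_chunks(sample_inds, sample_chunksize):
    """Yield consecutive slices holding at most sample_chunksize indices."""
    for start in range(0, len(sample_inds), sample_chunksize):
        yield sample_inds[start:start + sample_chunksize]


def write_chunks(data, sample_chunksize, sample_inds=None, transform=None):
    """Process data chunk by chunk; returns (label, chunk) pairs."""
    if sample_inds is None:
        sample_inds = list(range(len(data)))   # default: all samples
    written = []
    for chunk_no, inds in enumerate(iter_chunks(sample_inds, sample_chunksize)):
        chunk = [data[i] for i in inds]        # only this chunk is materialised
        if transform is not None:
            chunk = transform(chunk)
        written.append((f"chunk_{chunk_no}", chunk))  # stand-in for a file write
    return written


data = list(range(10))
chunks = write_chunks(data, 4, transform=lambda c: [2 * x for x in c])
# 10 samples with a chunk size of 4 give chunks of 4, 4 and 2 samples.
```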

scalr.utils.file_utils.write_data(data: dict | AnnData | DataFrame, filepath: str)[source]

This function writes data to json, yaml, csv or h5ad file.

logger

This file contains an implementation of the logger in the pipeline.

class scalr.utils.logger.EventLogger(name, level=None, filepath=None, stdout=False)[source]

Bases: Logger

Class for event logger. It logs detailed file-level logs during pipeline execution.

filepath = None
heading(msg, prefix, suffix, count)[source]

A function to configure setting for heading.

heading1(msg)[source]

A function to configure setting for heading 1.

heading2(msg)[source]

A function to configure setting for heading 2.

level = 0
class scalr.utils.logger.FlowLogger(name, level=None)[source]

Bases: Logger

Class for flow logger.

It logs a high-level overview of pipeline execution in the terminal.

level = 0
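Both loggers derive from Logger, so heading-style helpers such as those on EventLogger can be pictured as thin wrappers around a normal log call. The sketch below is not the scalr implementation: the class name HeadingLogger is hypothetical, and the reading of prefix, suffix and count (the prefix and suffix strings repeated count times around the message) is an assumption, since the documentation does not describe them.

```python
import io
import logging


class HeadingLogger(logging.Logger):
    """Logger with a heading helper (assumed semantics, see lead-in)."""

    def heading(self, msg, prefix="=", suffix="=", count=10):
        # Assumption: decorate the message with repeated prefix/suffix strings.
        self.info(f"{prefix * count} {msg} {suffix * count}")


stream = io.StringIO()                          # capture output for the demo
log = HeadingLogger("demo", level=logging.INFO)
log.addHandler(logging.StreamHandler(stream))
log.heading("Training", prefix="<", suffix=">", count=3)
output = stream.getvalue()
```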

misc_utils

This file contains functions related to miscellaneous utilities.

scalr.utils.misc_utils.build_object(module, config: dict)[source]

A builder function to build an object from its config.

Parameters:
  • module – Module containing the class.

  • config – Contains the name of the class and params to initialize the object.

Returns: Object, updated config.
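A builder of this kind typically looks up a class by name on a module and instantiates it with the configured parameters. The sketch below illustrates that pattern only. It is not the scalr implementation: the helper name build_from_config and the config keys name and params are assumptions, and the standard datetime module stands in for a scalr module.

```python
import datetime


def build_from_config(module, config):
    """Instantiate module.<name>(**params); return (object, updated config)."""
    name = config["name"]                     # assumed key: class name
    params = config.get("params", {})         # assumed key: init parameters
    obj = getattr(module, name)(**params)     # look up the class and build it
    updated = {"name": name, "params": params}
    return obj, updated


obj, cfg = build_from_config(
    datetime, {"name": "timedelta", "params": {"hours": 2}})
```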

scalr.utils.misc_utils.overwrite_default(user_config: dict, default_config: dict) -> dict[source]

The function recursively overwrites information from user_config onto the default_config.
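A recursive overwrite of this kind can be sketched as a nested dictionary merge in which user values win and unspecified defaults survive. This is not the scalr implementation: the helper name merge_configs is hypothetical, and returning a new dict instead of modifying default_config in place is an assumption.

```python
import copy


def merge_configs(user_config, default_config):
    """Return default_config with user_config values applied recursively."""
    merged = copy.deepcopy(default_config)
    for key, value in user_config.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dicts: descend instead of replacing wholesale.
            merged[key] = merge_configs(value, merged[key])
        else:
            merged[key] = copy.deepcopy(value)
    return merged


default = {"model": {"layers": 2, "dropout": 0.5}, "epochs": 10}
user = {"model": {"layers": 4}, "epochs": 20}
merged = merge_configs(user, default)
# merged -> {'model': {'layers': 4, 'dropout': 0.5}, 'epochs': 20}
```

The nested dropout default is kept because the user config overrides only model.layers.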

scalr.utils.misc_utils.set_seed(seed: int)[source]

A function to set seed for reproducibility.
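The effect of seeding can be shown with the standard library alone: re-seeding with the same value reproduces the same random sequence. This is not the scalr implementation: the helper name seed_everything is hypothetical, and it seeds only Python's random module, whereas a pipeline would usually also need to seed whichever numerical libraries it uses.

```python
import random


def seed_everything(seed):
    """Seed Python's built-in random number generator."""
    random.seed(seed)


seed_everything(42)
first = [random.random() for _ in range(3)]
seed_everything(42)
second = [random.random() for _ in range(3)]
# first and second are identical because the seed was the same.
```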

test_file_utils

This is a test file for file_utils.py

scalr.utils.test_file_utils.test_write_chunkwise_data()[source]

This function tests the write_chunkwise_data(), write_data() & read_data() functions of file_utils.

test_misc_utils

This is a test file for misc_utils.py

scalr.utils.test_misc_utils.test_overwrite_default()[source]

This function tests the overwrite_default() function of misc_utils.