scoring

_scoring

This file defines the base class for feature scorers.

class scalr.feature.scoring._scoring.ScoringBase

Bases: object

Base class for the scorer.

generate_scores(model: Module, train_data: AnnData | AnnCollection, val_data: AnnData | AnnCollection, target: str, mappings: dict) → ndarray

A function to return the score of each feature for each class.

Parameters:

model (nn.Module) – Trained model to generate scores from.
train_data (Union[AnnData, AnnCollection]) – Training data of model.
val_data (Union[AnnData, AnnCollection]) – Validation data of model.
target (str) – Column in the data that the model was trained to predict.
mappings (dict) – Mapping of model output dimension to its corresponding labels in the metadata columns.

Returns:

Score matrix of shape [num_classes x num_features].

Return type:

np.ndarray

classmethod get_default_params() → dict: Class method to get default params.

scalr.feature.scoring._scoring.build_scorer(scorer_config: dict) → tuple[ScoringBase, dict]: Builder function that returns the scorer and the updated scorer_config.

linear_scorer

This file is an implementation of a linear scorer.

class scalr.feature.scoring.linear_scorer.LinearScorer

Bases: ScoringBase

Class for the linear scorer.

This Scorer is only applicable for linear (single-layer) models. It directly uses the weights as the score for each feature.

generate_scores(model: Module, *args, **kwargs) → ndarray: A function to generate and return the weights of the model as a score.
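
As a rough illustration of why the weights of a single-layer model can serve directly as scores, the sketch below uses a plain NumPy matrix as a stand-in for the trained layer (the library itself takes an nn.Module). The names weights, bias, and linear_scores are hypothetical and not part of scalr, and the row-per-class layout is an assumption chosen to match the [num_classes X num_features] score matrix described for ScoringBase.

```python
import numpy as np

# Stand-in for a trained single-layer classifier: logits = x @ W.T + b.
# Assumed layout: one row per class, one column per feature.
weights = np.array([
    [0.8, -0.1, 0.0, 0.3],   # class 0
    [-0.5, 0.9, 0.2, 0.0],   # class 1
    [0.1, 0.0, -0.7, 0.4],   # class 2
])
bias = np.zeros(3)


def linear_scores(weight_matrix: np.ndarray) -> np.ndarray:
    """Return a copy of the weight matrix as the per-class feature scores."""
    return np.array(weight_matrix, copy=True)


scores = linear_scores(weights)

# Because the logits are linear in the input, increasing feature f by 1
# changes the logit of class c by exactly scores[c, f].
x = np.array([1.0, 2.0, 3.0, 4.0])
bumped = x.copy()
bumped[1] += 1.0
logit_change = (bumped @ weights.T + bias) - (x @ weights.T + bias)
```

This direct reading of the weights no longer holds once hidden layers or nonlinearities are involved, which is consistent with the restriction to single-layer models stated above.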

shap_scorer

This file is an implementation of the SHAP scorer.

class scalr.feature.scoring.shap_scorer.ShapScorer(early_stop: dict, dataloader: dict, device: str = 'cpu', top_n_genes: int = 100, background_tensor: int = 200, samples_abs_mean: bool = True, logger: str = 'EventLogger', *args, **kwargs)

Bases: ScoringBase

Class for SHAP scorer. It can be used for any model.

_is_shap_early_stop(batch_id: int, genes_class_shap_df: DataFrame, prev_top_genes_batch_wise: dict, top_n_genes: int, threshold: int) → Tuple[bool, dict]

A function to check whether the number of top genes common to the previous and current batches is greater than or equal to the threshold, and to return the top genes batch-wise.

Parameters:

batch_id – Current batch number.
genes_class_shap_df – Class-wise SHAP values of genes (mean across samples).
prev_top_genes_batch_wise – Dictionary storing the top genes per label from previous batches.
top_n_genes – Number of top genes to check.
threshold – Early stop if the number of common genes is greater than or equal to this value.

Returns:

Early stop value, top genes batch wise.
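
The overlap check described above can be sketched in plain Python. This is a simplified illustration, not the library's implementation: it takes nested dictionaries instead of a DataFrame, omits batch_id (an empty previous-batch dictionary marks the first batch), and assumes that stopping requires every class to meet the threshold. The names top_genes and is_early_stop, and the example gene scores, are hypothetical.

```python
from typing import Dict, List, Tuple


def top_genes(scores: Dict[str, float], top_n: int) -> List[str]:
    """Return the top_n gene names ranked by descending score."""
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


def is_early_stop(
    class_scores: Dict[str, Dict[str, float]],
    prev_top: Dict[str, List[str]],
    top_n: int,
    threshold: int,
) -> Tuple[bool, Dict[str, List[str]]]:
    """Compare each class's current top genes with the previous batch's.

    Assumption: stop only when every class shares at least `threshold`
    genes with the previous batch. The first batch never stops.
    """
    current_top = {
        label: top_genes(scores, top_n) for label, scores in class_scores.items()
    }
    if not prev_top:
        return False, current_top
    stop = all(
        len(set(current_top[label]) & set(prev_top.get(label, []))) >= threshold
        for label in current_top
    )
    return stop, current_top


# Two consecutive batches of made-up class-wise gene scores.
batch_1 = {"B cell": {"CD19": 0.9, "MS4A1": 0.8, "CD3E": 0.1},
           "T cell": {"CD19": 0.1, "MS4A1": 0.2, "CD3E": 0.9}}
batch_2 = {"B cell": {"CD19": 0.7, "MS4A1": 0.9, "CD3E": 0.2},
           "T cell": {"CD19": 0.1, "MS4A1": 0.3, "CD3E": 0.8}}

stop_1, prev = is_early_stop(batch_1, {}, top_n=2, threshold=2)
stop_2, prev = is_early_stop(batch_2, prev, top_n=2, threshold=2)
```

Here the first call only records the top genes, and the second call signals a stop because both classes keep the same two top genes, even though their order changes.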

generate_scores(model: Module, train_data: AnnData | AnnCollection, val_data: AnnData | AnnCollection, target: str, mappings: dict, *args, **kwargs) → ndarray

A function to generate and return SHAP-based scores for each gene of each class.

Parameters:

model – Trained model that is used for SHAP.
train_data – Data that is used as reference data for SHAP.
val_data – Data on which SHAP generates the scores.
target – Target name.
mappings – Contains target-related mappings.

Returns:

class * genes abs weights matrix.
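
One way to picture how per-sample SHAP values collapse into a class-by-gene matrix is the NumPy sketch below. It is only an illustration under stated assumptions: the input layout [num_samples, num_genes, num_classes], the helper name aggregate, and the reading of samples_abs_mean as "mean of absolute values across samples" are all guesses, and the abs_mean=False branch in particular is not confirmed by the documentation. Random numbers stand in for real attributions.

```python
import numpy as np

# Hypothetical per-sample attributions with shape
# [num_samples, num_genes, num_classes]; real values would come from a
# SHAP explainer run on the validation data.
rng = np.random.default_rng(0)
shap_values = rng.normal(size=(8, 5, 3))


def aggregate(values: np.ndarray, abs_mean: bool = True) -> np.ndarray:
    """Collapse the sample axis and return a [num_classes, num_genes] matrix."""
    if abs_mean:
        per_gene = np.abs(values).mean(axis=0)   # mean of |value| over samples
    else:
        per_gene = np.abs(values.mean(axis=0))   # |mean of value| over samples
    return per_gene.T                            # genes x classes -> classes x genes


score_matrix = aggregate(shap_values)
alt_matrix = aggregate(shap_values, abs_mean=False)
```

Taking the absolute value before averaging keeps positive and negative contributions from cancelling, so the first variant is never smaller than the second.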

classmethod get_default_params() → dict: Class method to get default params.

get_top_n_genes_weights(model: Module, train_data: AnnData | AnnCollection, test_data: AnnData | AnnCollection, target: str, mappings: dict) → Tuple[ndarray, ndarray]

A function to get the top n genes of each class and their weights.

Parameters:

model – Trained model to extract weights from.
train_data – Train data.
test_data – Test data that is used for SHAP values.
target – Target name.
mappings – Contains target-related mappings.

Returns:

(class * genes abs weights matrix, class * genes weights matrix).
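
Given a pair of matrices like the one returned above, selecting the top n genes per class might look like the following sketch. The gene names, the example values, and the helper top_n_per_class are hypothetical; the library may rank and select genes differently.

```python
import numpy as np

gene_names = np.array(["g0", "g1", "g2", "g3", "g4"])

# Hypothetical signed [num_classes, num_genes] matrix and its absolute values.
signed = np.array([
    [0.2, -0.9, 0.1, 0.5, 0.0],
    [-0.3, 0.1, 0.8, -0.6, 0.2],
])
abs_scores = np.abs(signed)


def top_n_per_class(scores: np.ndarray, n: int) -> np.ndarray:
    """Return column indices of the n largest scores in each row, best first."""
    order = np.argsort(-scores, axis=1)   # descending sort within each class
    return order[:, :n]


top_idx = top_n_per_class(abs_scores, n=2)
top_genes = gene_names[top_idx]           # shape [num_classes, n]
```

Ranking on the absolute matrix treats strongly negative and strongly positive genes alike, while the signed matrix remains available to recover the direction of each gene's effect.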