scoring
_scoring
This file defines the base class for feature scorers.
- class scalr.feature.scoring._scoring.ScoringBase[source]
Bases:
object
Base class for the scorer.
- generate_scores(model: Module, train_data: AnnData | AnnCollection, val_data: AnnData | AnnCollection, target: str, mappings: dict) ndarray [source]
A function to return the score of each feature for each class.
- Parameters:
model (nn.Module) – Trained model to generate scores from.
train_data (Union[AnnData, AnnCollection]) – Training data of model.
val_data (Union[AnnData, AnnCollection]) – Validation data of model.
target (str) – Column in the data that the model was trained on.
mappings (dict) – Mapping of model output dimension to its corresponding labels in the metadata columns.
- Returns:
score_matrix [num_classes X num_features]
- Return type:
np.ndarray
- scalr.feature.scoring._scoring.build_scorer(scorer_config: dict) tuple[ScoringBase, dict] [source]
Builder function that returns the scorer and the updated scorer_config.
linear_scorer
This file implements a linear scorer.
- class scalr.feature.scoring.linear_scorer.LinearScorer[source]
Bases:
ScoringBase
Class for the linear scorer.
This scorer is only applicable to linear (single-layer) models. It directly uses the model weights as the score for each feature.
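As a minimal sketch of the idea, assume a single linear layer whose weight matrix has shape [num_classes X num_features] (the layout `torch.nn.Linear` uses for its `weight`). The score matrix is then just that weight matrix. The function name `linear_scores` and the example values are hypothetical and are not part of scalr.

```python
import numpy as np


def linear_scores(weight: np.ndarray) -> np.ndarray:
    """Return a copy of a linear layer's weights as the score matrix.

    Hypothetical helper: `weight` is assumed to have shape
    [num_classes X num_features].
    """
    if weight.ndim != 2:
        raise ValueError("expected a 2-D weight matrix")
    return weight.copy()


# Made-up weights for a model with 2 classes and 3 features.
weight = np.array([[0.5, -1.2, 0.0],
                   [-0.3, 0.8, 2.1]])
scores = linear_scores(weight)
print(scores.shape)
```

Each row holds the scores of all features for one class, which matches the [num_classes X num_features] layout described for `generate_scores`.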
shap_scorer
This file implements the SHAP scorer.
- class scalr.feature.scoring.shap_scorer.ShapScorer(early_stop: dict, dataloader: dict, device: str = 'cpu', top_n_genes: int = 100, background_tensor: int = 200, samples_abs_mean: bool = True, logger: str = 'EventLogger', *args, **kwargs)[source]
Bases:
ScoringBase
Class for SHAP scorer. It can be used for any model.
- _is_shap_early_stop(batch_id: int, genes_class_shap_df: DataFrame, prev_top_genes_batch_wise: dict, top_n_genes: int, threshold: int) Tuple[bool, dict] [source]
A function to check whether the number of top genes common to the previous and current batches is greater than or equal to the threshold, and to return the top genes batch-wise.
- Parameters:
batch_id – Current batch number.
genes_class_shap_df – Label/class-wise gene SHAP values (mean across samples).
prev_top_genes_batch_wise – Dictionary storing the top genes per label from previous batches.
top_n_genes – Number of top genes to check.
threshold – Stop early if the number of common genes is greater than or equal to this.
- Returns:
Early-stop flag, top genes batch-wise.
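The check described above can be sketched in plain Python. This is an illustration under stated assumptions, not the scalr implementation: scores are held in nested dicts instead of a DataFrame, genes are ranked by descending score, and the stop condition must hold for every class. The name `is_early_stop` and the example genes are hypothetical.

```python
def is_early_stop(class_scores, prev_top_genes, top_n_genes, threshold):
    """Compare the current batch's top genes with the previous batch's.

    class_scores: {class_label: {gene: score}} for the current batch.
    prev_top_genes: {class_label: [genes]} from the previous batch
        (empty for the first batch).
    Returns (stop, current_top_genes).
    """
    current_top = {}
    # Assumption: with no previous batch there is nothing to compare against.
    stop = bool(prev_top_genes)
    for label, gene_scores in class_scores.items():
        ranked = sorted(gene_scores, key=gene_scores.get, reverse=True)
        current_top[label] = ranked[:top_n_genes]
        common = set(current_top[label]) & set(prev_top_genes.get(label, []))
        # Assumption: every class must reach the threshold to stop.
        if len(common) < threshold:
            stop = False
    return stop, current_top


batch1 = {"T cell": {"CD3E": 0.9, "CD19": 0.1, "LYZ": 0.3}}
stop1, top1 = is_early_stop(batch1, {}, top_n_genes=2, threshold=2)

batch2 = {"T cell": {"CD3E": 0.8, "CD19": 0.2, "LYZ": 0.4}}
stop2, top2 = is_early_stop(batch2, top1, top_n_genes=2, threshold=2)
print(stop1, stop2)
```

In this example the first batch cannot trigger a stop, and the second batch does because its top two genes are the same as in the first.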
- generate_scores(model: Module, train_data: AnnData | AnnCollection, val_data: AnnData | AnnCollection, target: str, mappings: dict, *args, **kwargs) ndarray [source]
This function returns the SHAP-derived weights of the model as the score for each feature.
- Parameters:
model – Trained model that is used for SHAP.
train_data – Data that is used as reference data for SHAP.
val_data – Data on which SHAP will generate the scores.
target – Target name.
mappings – Contains target-related mappings.
- Returns:
Matrix of absolute weights [num_classes X num_genes].
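How per-sample SHAP values are reduced to one score per class and gene is not spelled out here. One plausible reduction, suggested by the `samples_abs_mean` constructor parameter, is to take the absolute value of each per-sample attribution and then average over samples. The sketch below assumes the attributions are already computed and stored as an array of shape [num_classes, num_samples, num_genes]. That layout, the function name `aggregate_attributions`, and the values are assumptions made for illustration.

```python
import numpy as np


def aggregate_attributions(attributions: np.ndarray,
                           abs_before_mean: bool = True) -> np.ndarray:
    """Reduce per-sample attributions to a [num_classes X num_genes] matrix.

    Hypothetical helper: `attributions` is assumed to have shape
    [num_classes, num_samples, num_genes].
    """
    values = np.abs(attributions) if abs_before_mean else attributions
    # Average over the sample axis.
    return values.mean(axis=1)


# Made-up attributions: 2 classes, 2 samples, 2 genes.
attributions = np.array([
    [[1.0, -2.0], [-3.0, 4.0]],   # class 0
    [[0.5, 0.5], [-0.5, 1.5]],    # class 1
])
abs_scores = aggregate_attributions(attributions)
signed_scores = aggregate_attributions(attributions, abs_before_mean=False)
print(abs_scores)
print(signed_scores)
```

Taking the absolute value first keeps positive and negative contributions from cancelling out, so a gene that matters in both directions still gets a high score.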
- get_top_n_genes_weights(model: Module, train_data: AnnData | AnnCollection, test_data: AnnData | AnnCollection, target: str, mappings: dict) Tuple[ndarray, ndarray] [source]
A function to get the top n genes of each class and their weights.
- Parameters:
model – Trained model to extract weights from.
train_data – Train data.
test_data – Test data that is used for SHAP values.
target – Target name.
mappings – Contains target-related mappings.
- Returns:
(absolute-weights matrix [num_classes X num_genes], weights matrix [num_classes X num_genes]).
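To illustrate how the top n genes per class can be read off such a matrix, the sketch below ranks genes by absolute weight within each class. It assumes a signed weights matrix of shape [num_classes X num_genes] is already available. The function name `top_n_per_class`, the gene names, and the values are hypothetical.

```python
import numpy as np


def top_n_per_class(weights: np.ndarray, gene_names: list, n: int):
    """Return (top gene names per class, absolute-weights matrix).

    Hypothetical helper: `weights` is assumed to have shape
    [num_classes X num_genes].
    """
    abs_weights = np.abs(weights)
    # Sort each row by descending absolute weight and keep the first n columns.
    order = np.argsort(-abs_weights, axis=1)[:, :n]
    top = [[gene_names[j] for j in row] for row in order]
    return top, abs_weights


weights = np.array([[0.5, -1.2, 0.0],
                    [-0.3, 0.8, 2.1]])
gene_names = ["g0", "g1", "g2"]
top_genes, abs_weights = top_n_per_class(weights, gene_names, n=2)
print(top_genes)
```

Ranking by absolute value means a strongly negative weight, such as `g1` for the first class here, counts as much as a strongly positive one.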