Evaluation

Note

Run and test this code by using this notebook!

Utilizing the generated representational dissimilarity matrices (RDMs), one can conduct a comprehensive evaluation against neural data. Net2Brain provides access to three distinct datasets for exploration purposes. For those possessing custom datasets, ensure the neural activation data adheres to the following formats: [subject, stimuli, stimuli] or [subject, time, stimuli, stimuli].

Representational Similarity Analysis (RSA)

The RSA module within Net2Brain simplifies the performance of RSA, offering a DataFrame output with correlation values and statistical significance markers. For RSA execution, supply:

model_rdms_path: Directory containing the model RDMs.
brain_rdms_path: Directory containing the brain RDMs.
model_name: Identifier for the model, significant for plotting.

# Example using AlexNet
from net2brain.evaluations.rsa import RSA

# Initialize RSA
rsa_evaluation = RSA(model_rdms_path, brain_rdms_path, model_name="AlexNet")

# Conduct Evaluation - Yields a DataFrame
results_df = rsa_evaluation.evaluate()

# Display the Results
print(results_df)

Significant Model Comparisons

When the objective is to ascertain whether one model outperforms another significantly and not by mere chance, the compare_model feature comes into play.

test_statistic, significant_pairs = rsa_evaluation.compare_model(another_evaluation)

Visualizing RSA Results

The built-in plotting capabilities of Net2Brain facilitate the visualization of evaluation metrics. Initialize the Plotting class with the evaluation DataFrames, ensuring each one:

Shares identical regions of interest (ROIs) indicating tests on the same brain RDMs.
Features a unique model identifier, set manually or via the “model_name” during the evaluation process.

Required for plotting:

dataframes: A single DataFrame from the evaluation or a list of them.
pairs: (Optional) A list of significant pairs derived from the evaluation function.
metric: (Optional) The evaluation metric, either R² or R.

from net2brain.evaluations.plotting import Plotting

# Single DataFrame plotting
single_plotter = Plotting([results_df])
single_plotter.plot()

# Plotting all layers
single_plotter.plot_all_layers()

# Multiple DataFrames plotting
multiple_plotter = Plotting([results_df, another_results_df])
multiple_plotter.plot(pairs=significant_pairs)

Weighted RSA

Net2Brain extends support for Weighted RSA, incorporating weights into the analysis to refine model performance assessment and the study of neural representation correlations with computational models.

WRSA necessitates:

model_rdms_path: Directory containing the model RDMs.
brain_rdms_path: Directory containing the brain RDMs.
model_name: Identifier for the model, significant for plotting.

# Example with WRSA
from net2brain.evaluations.wrsa import WRSA

# Initialize WRSA
wrsa_evaluation = WRSA(model_rdms_path, brain_rdms_path, model_name="ResNet50")

# Perform Evaluation - Produces a DataFrame
wrsa_results_df = wrsa_evaluation.evaluate()

# Output the Results
print(wrsa_results_df)

Searchlight RSA

For a focused analysis, the toolbox offers a searchlight RSA functionality, which requires Searchlight RDMs formatted as [ROI, subject, stimuli, stimuli]. Note that this toolbox does not provide RDMs for demonstration purposes, but users can apply the feature if they have suitable RDMs.

from net2brain.evaluations.searchlight import Searchlight

# Initialize Searchlight
searchlight_evaluation = Searchlight(model_rdms_path, searchlight_rdms_path)

# Conduct Evaluation - Returns a DataFrame
searchlight_results_df = searchlight_evaluation.evaluate()

# Present the Results
print(searchlight_results_df)

Linear Encoding Models:

Another integrated analysis pipeline is a linear encoder. Given a npy file with voxel values, and extracted features, the encoder performs an X-fold regression where the training data is used to train a PCA embedding and a linear regression to predict voxel values. The output is the testing split X-fold average pearson correlation.

Note

Run and test this code by using this notebook!

Prerequisites for the linear encoding function include:

feat_path: The file path directing to the model’s features.
roi_path: The file path for brain data formatted as RDM.
model_name: The identifier for the model, crucial for subsequent visualization.
trn_tst_split: (Optional) specifies the train-test data ratio per fold, with a default of 0.8.
n_folds: (Optional) The cross-validation folds count, preset to 3.
n_components: (Optional) The PCA components quantity, defaulting to 100.
batch_size: (Optional) The incremental PCA’s batch size, with a standard value of 100.
pooling (Optional): Pooling method for variable-length features. Options: 'mean' (average across sequence), 'max' (maximum values across sequence), 'first' (use first token/position, e.g., [CLS] token), 'last' (use final token/position). Required when processing transformer/LLM features with different sequence lengths. Defaults to None for fixed-length features.

from net2brain.evaluations.encoding import Linear_Encoding

results_dataframe = Linear_Encoding(
    feat_path,
    roi_path,
    model_name,
    n_folds=3,
    trn_tst_split=0.8,
    n_components=100,
    batch_size=100
    return_correlations=True,
    save_path="path/to/csv"
)

Variance Partitioning Analysis (VPA)

Note

Run and test this code by using this notebook!

Net2Brain enhances model and cerebral data assessment through Variance Partitioning Analysis. This technique supports the evaluation of up to four independent variables in relation to a singular dependent variable, typically the neural data. It works both for time-resolved data (e.g. EEG/MEG) and for data without a time axis (e.g. fMRI).

The requirements for VPA are:

dependent_variable: The brain data. For time-resolved data (e.g. EEG) this is a single path to a pre-stacked array shaped [subjects, time, pairs]. For data without a time axis (e.g. fMRI) it can be either a single [subjects, pairs] array or a list of per-subject RDM paths; in that case the results are returned per subject, with no time dimension.
independent_variable: A list with one entry per independent variable. Each entry is itself a list of RDM paths belonging to that group, which are averaged together when average_models=True. A bare path string is also accepted for a single-model group.
variable_names: The labels for the independent variables, integral for visualization.

Returns:

dataframe: Contains all unique and shared variances, plus the upper and lower noise ceiling as UNC and LNC rows. The dataframe can be filtered to only contain relevant information.

from net2brain.evaluations.variance_partitioning_analysis import VPA

independent_variables = [paths_to_RDM_folder_1, paths_to_RDM_folder_2, paths_to_RDM_folder_3, paths_to_RDM_folder_4]
variable_names = ["Ind_Var1", "Ind_Var2", "Ind_Var3", "Ind_Var4"]

VPA_eval = VPA(dependent_variable, independent_variables, variable_names)
dataframe = VPA_eval.evaluate(average_models=True)

# Filter the dataframe to include only the unique variances and the shared variance by all variables
dataframe = dataframe.query("Variable in ['y1234', 'y1', 'y2', 'y3', 'y4']").reset_index(drop=True)

Plotting VPA

The plotting utilities of Net2Brain cover both kinds of VPA results, and both overlay the upper and lower noise ceiling.

For time-resolved data, plotting_over_time draws each component as a line over time and supports an optional standard-deviation band.

add_std: Enable to display the standard deviation on the graph. Defaults to False.

from net2brain.evaluations.plotting import Plotting

# Time-resolved results (e.g. EEG)
plotter = Plotting(dataframe)
plotter.plotting_over_time(add_std=True)

For data without a time axis, plotting_components draws each component as a bar (mean across subjects with SEM error bars) and marks significance.

threshold: Significance level for the markers. Defaults to 0.05.

# Non-time-resolved results (e.g. fMRI)
plotter = Plotting(dataframe)
plotter.plotting_components()

Centered Kernel Alignment (CKA)

Centered Kernel Alignment (CKA) is a similarity metric used to compare two datasets, such as DNN-derived features and brain activity, based on the relationships within each dataset. Unlike traditional correlations, CKA is scale-invariant and focuses on the structure of pairwise similarities within each dataset.

Note

Run and test this code by using this notebook!

Prerequisites for the CKA function include:

feat_path: The file path directing to the model’s feature .npz files, where each file contains multiple layer activations.
brain_path: The file path for .npy files containing brain data, with each file representing a specific ROI.
model_name: The identifier for the model, crucial for labeling the output.

Returns: - dataframe: Contains CKA scores for each ROI and each layer of the DNN.

from net2brain.evaluations.cka import CKA

results_dataframe = CKA.run(
    feat_path="path/to/features",
    brain_path="path/to/brain_data",
    model_name="model_name"
)

Distributional Comparison (DC)

Distributional Comparison evaluates the similarity between datasets by comparing the feature-wise distributions. Two metrics are available: - Jensen-Shannon Divergence (JSD): Measures the divergence between two probability distributions. It is symmetric and bounded between 0 and 1. - Wasserstein Distance (WD): Also known as Earth Mover’s Distance, measures the cost of transforming one distribution into the other.

Note

Run and test this code by using this notebook!

Prerequisites for the Distributional Comparison function include:

feat_path: The file path directing to the model’s feature .npz files, where each file contains multiple layer activations.
brain_path: The file path for .npy files containing brain data, with each file representing a specific ROI.
metric: The distance metric used to compare distributions. Options are: - “jsd”: Jensen-Shannon Divergence. - “wasserstein”: Wasserstein Distance.
“bins”: Number of bins for histogramming.
model_name: The identifier for the model, crucial for labeling the output.

Warning

If the feature lengths differ between the feature data and brain data, PCA is applied to reduce dimensionality to the same size. While this ensures compatibility, it alters the feature space.

Returns: - dataframe: Contains distributional comparison scores for each ROI and each layer of the DNN.

from net2brain.evaluations.distributional_comparisons import DistributionalComparison

# Using Jensen-Shannon Divergence (JSD)
results_jsd = DistributionalComparison.run(
    feat_path="path/to/features",
    brain_path="path/to/brain_data",
    metric="jsd",
    bins=50,
    model_name="model_name"
)

# Using Wasserstein Distance (WD)
results_wasserstein = DistributionalComparison.run(
    feat_path="path/to/features",
    brain_path="path/to/brain_data",
    metric="wasserstein",
    bins=50,
    model_name="model_name"
)

Stacked Encoding

Stacked Encoding combines predictions from multiple feature spaces (such as different neural network layers) to improve brain activity prediction. Instead of using a single layer or concatenating features, stacked encoding:

Trains separate ridge regression models for each feature space.
Learns optimal weights to combine these predictions.
Produces weights that indicate each feature space’s importance.

Note

Run and test this code by using this notebook!

Prerequisites for the Stacked Encoding function include:

feat_path: Path to model activation files (.npz), with one file per layer.
roi_path: Path to fMRI data files (.npy), or a list of paths for multiple ROIs/subjects.
model_name: Identifier for the model (e.g., ‘AlexNet’, ‘ResNet50’) for labeling results.
n_folds: Number of cross-validation folds for stacking (higher = more robust, but slower).
n_components: Number of principal components for dimensionality reduction (None to skip).
vpa: Whether to perform variance partitioning analysis automatically.
save_path: Directory to save encoding results and variance partitioning data.

Returns: - dataframe: Contains model performance metrics per layer and ROI.

from brain_encoder import Stacked_Encoding

# Basic usage
results_df = Stacked_Encoding(
    feat_path="path/to/model/features/",
    roi_path="path/to/brain/data/",
    model_name="AlexNet",
    n_folds=5,
    n_components=100,
    vpa=True,
    save_path="results/alexnet_encoding"
)

# Inspect results
print(results_df.head())

—

Stacked Variance Partitioning

Stacked Variance Partitioning analyzes the importance of different neural network layers in predicting brain activity. It operates in two directions:

Forward direction: Starts with simple layers and adds complexity. - Identifies the minimum complexity required for prediction.
Backward direction: Starts with complex layers and removes them. - Determines whether brain regions also process lower-level information.

This method defines an interval of relevant layers for each brain region.

Note

Run and test this code by using this notebook!

Prerequisites for the Stacked Variance Partitioning function include:

r2s: Array of R² values for individual layer models (n_layers × n_voxels).
stacked_r2s: Array of R² values for the full stacked model (n_voxels,).
save_path: Path to save variance partitioning results as an .npz file.

Returns: - dictionary: Contains variance partitioning results, including layer assignments.

from brain_encoder import Stacked_Variance_Partitioning
import numpy as np

# Load R² values from stacked encoding
r2s = np.load("path/to/r2s.npy")                # R² values for individual layers
stacked_r2s = np.load("path/to/stacked_r2s.npy")  # R² values for stacked model

# Perform variance partitioning
vp_results = Stacked_Variance_Partitioning(
    r2s=r2s,
    stacked_r2s=stacked_r2s,
    save_path="results/variance_partitioning"
)

# Access the results
forward_layers = vp_results["vp_sel_layer_forward"]   # Layers selected in forward analysis
backward_layers = vp_results["vpr_sel_layer_backward"]  # Layers selected in backward analysis