
Automated explanations

A generative framework to bridge data-driven models and scientific theories in language neuroscience (arXiv 2024)

Explaining black box text modules in natural language with language models (arXiv 2023)

This repo contains code to reproduce the experiments in the GEM-V paper and the SASC paper. SASC takes in a text module and produces a natural-language explanation describing what types of inputs elicit the largest response from the module (see the figure below). GEM-V tests this in detail in an fMRI setting.

SASC is similar to the nice concurrent paper by OpenAI, but simplifies explanations to describe the module's overall function rather than predicting token-level activations. This makes it simpler and faster, and more effective at describing semantic functions from limited data (e.g., fMRI voxels), but worse at finding patterns that depend on token sequences / ordering.

For a simple scikit-learn-style interface to SASC, use the imodelsX library. Install with pip install imodelsx; the example below shows a quickstart.

import numpy as np
from imodelsx import explain_module_sasc

# a toy module that responds to the length of a string
mod = lambda str_list: np.array([len(s) for s in str_list])

# a toy dataset where the longest strings are animals
text_str_list = ["red", "blue", "x", "1", "2", "hippopotamus", "elephant", "rhinoceros"]
explanation_dict = explain_module_sasc(
    text_str_list,
    mod,
    ngrams=1,
)
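
The returned dictionary can then be inspected; a minimal sketch, assuming explanation_dict is a plain Python dict (its exact keys may vary across imodelsX versions):

# print every field returned by SASC (keys vary by imodelsx version)
for key, value in explanation_dict.items():
    print(key, value)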

Reference

@misc{antonello2024generativeframeworkbridgedatadriven,
      title={A generative framework to bridge data-driven models and scientific theories in language neuroscience}, 
      author={Richard Antonello and Chandan Singh and Shailee Jain and Aliyah Hsu and Jianfeng Gao and Bin Yu and Alexander Huth},
      year={2024},
      eprint={2410.00812},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2410.00812}, 
}

@misc{singh2023explaining,
      title={Explaining black box text modules in natural language with language models}, 
      author={Chandan Singh and Aliyah R. Hsu and Richard Antonello and Shailee Jain and Alexander G. Huth and Bin Yu and Jianfeng Gao},
      year={2023},
      eprint={2305.09863},
      archivePrefix={arXiv},
      primaryClass={cs.AI}
}