---
layout: single
title: AI Safety
permalink: /resources/responsible-ai/ai-safety/
classes: wide
toc: true
# toc_label: "My Table of Contents"
# toc_icon: "cog"
---
More about:
- [Adversarial Robustness](adversarial-robustness.md)
- [LLMs](LLM.md)

The sections below collect resources on a range of AI safety topics.
# Theory
- [https://aisafetyfundamentals.com/](https://aisafetyfundamentals.com/) (2023)
# Organizations
- [https://aisafety.world/](https://aisafety.world/) (2023)
# Funding
# Open-source toolkits
## Interpretability and Explainability
### Explainability, counterfactuals and probing
- [Explabox](https://github.com/MarcelRobeer/explabox) (2022)
- [IBM: AIX360](https://github.com/Trusted-AI/AIX360) (2019)
- [Microsoft: Responsible AI Toolbox](https://responsibleaitoolbox.ai/) (2021)
  - A dashboard that integrates error analysis, Fairlearn, InterpretML, DiCE, EconML, and data balance analysis
- [InterpretML](https://github.com/interpretml/interpret-community)
  - SHAP, Mimic, and LIME explainers, plus permutation feature importance
- [MI2.ai](https://www.mi2.ai/)
- [DrWhy](https://github.com/ModelOriented/DrWhy/tree/master) (2019)
  - Includes DALEX, survex, Arena, and fairmodels
- Currently working on: ARES, xSurvival, Large Model Analysis
- [XAI](https://github.com/EthicalML/xai) (2018)
- [ELI5](https://eli5.readthedocs.io/en/latest/overview.html)
- [NN-SVG](https://alexlenail.me/NN-SVG/)
- [Neptune-AI blog](https://neptune.ai/blog/ml-model-interpretation-tools)
- [Neptune-AI blog](https://neptune.ai/blog/explainability-auditability-ml-definitions-techniques-tools)
- [AI Ethics tool landscape](https://edwinwenink.github.io/ai-ethics-tool-landscape/)
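To give a flavor of what these toolkits compute: permutation feature importance (mentioned under InterpretML) measures how much a model's accuracy drops when one feature's values are scrambled. A minimal pure-Python sketch, with a hypothetical toy model and data (real toolkits shuffle the column randomly and average over repeats; here we reverse it for determinism):

```python
def accuracy(model, X, y):
    """Fraction of correct predictions."""
    return sum(model(x) == t for x, t in zip(X, y)) / len(y)

def permutation_importance(model, X, y, feature_idx):
    """Drop in accuracy when one feature column is permuted.
    (Toolkits shuffle randomly; we reverse the column for determinism.)"""
    baseline = accuracy(model, X, y)
    X_perm = [list(x) for x in X]
    column = [x[feature_idx] for x in X][::-1]
    for row, v in zip(X_perm, column):
        row[feature_idx] = v
    return baseline - accuracy(model, X_perm, y)

# Toy model that only looks at feature 0.
model = lambda x: int(x[0] > 0.5)
X = [[0.9, 0.1], [0.8, 0.7], [0.2, 0.9], [0.1, 0.3]]
y = [1, 1, 0, 0]
print(permutation_importance(model, X, y, 0))  # 1.0: feature 0 is essential
print(permutation_importance(model, X, y, 1))  # 0.0: feature 1 is ignored
```

A high score means the model genuinely relies on that feature; a score near zero flags a feature the model ignores.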
### Mechanistic interpretability
- [TransformerLens](https://pypi.org/project/transformer-lens/)
- [Pyvene (intervention focused)](https://github.com/stanfordnlp/pyvene?tab=readme-ov-file)
- [Transformer Debugger (OpenAI)](https://github.com/openai/transformer-debugger)
- [BauKit](https://github.com/davidbau/baukit)
- [nnsight](https://github.com/ndif-team/nnsight)
- [Graphpatch](https://github.com/evan-lloyd/graphpatch)
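The common move behind libraries like TransformerLens, pyvene, and nnsight is intervening on internal activations: run the model, overwrite some hidden state, and observe how the output changes (activation patching). A minimal sketch on a hypothetical two-layer numpy-free network (all weights and inputs are made up for illustration):

```python
# Tiny 2-layer network: hidden = relu(W1 @ x), out = W2 . hidden.
W1 = [[1.0, 0.0], [0.0, 1.0]]
W2 = [1.0, -1.0]

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def forward(x, patch=None):
    """Run the network; if `patch` is given, overwrite the hidden
    activations with it (the core move in activation patching)."""
    hidden = [max(0.0, h) for h in matvec(W1, x)]
    if patch is not None:
        hidden = patch
    return sum(w * h for w, h in zip(W2, hidden))

clean, corrupted = [1.0, 0.0], [0.0, 1.0]
clean_hidden = [max(0.0, h) for h in matvec(W1, clean)]

print(forward(clean))                          # 1.0
print(forward(corrupted))                      # -1.0
print(forward(corrupted, patch=clean_hidden))  # 1.0: patch restores the clean output
```

If patching a component's activation restores the clean behavior, that component is causally implicated in the computation; the libraries above do this at scale inside transformers.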
## Fairness
- [IBM: Fairness 360](https://www.ibm.com/opensource/open/projects/ai-fairness-360/)
- [Fairlearn](https://fairlearn.org/)
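As an illustration of the kind of group-fairness metric these libraries provide, here is the demographic parity difference (the largest gap in positive-prediction rate between groups) hand-rolled in pure Python. This is a sketch of the quantity itself, not Fairlearn's API, and the data is hypothetical:

```python
def selection_rate(y_pred, sensitive, value):
    """Positive-prediction rate within one group."""
    preds = [p for p, g in zip(y_pred, sensitive) if g == value]
    return sum(preds) / len(preds)

def demographic_parity_difference(y_pred, sensitive):
    """Largest gap in selection rate between any two groups."""
    rates = [selection_rate(y_pred, sensitive, v) for v in set(sensitive)]
    return max(rates) - min(rates)

y_pred    = [1, 1, 0, 1, 0, 0]
sensitive = ["a", "a", "a", "b", "b", "b"]
# Group "a" selected at 2/3, group "b" at 1/3.
print(demographic_parity_difference(y_pred, sensitive))  # ≈ 0.333
```

A value of 0 means all groups receive positive predictions at the same rate; Fairlearn additionally offers mitigation algorithms that reduce such gaps during training.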
## Adversarial robustness
- [IBM: ART](https://github.com/Trusted-AI/adversarial-robustness-toolbox) (2018)
- [IBM: URET](https://github.com/IBM/URET) (2023)
- [Paper](https://arxiv.org/pdf/2308.01840.pdf)
- [TextAttack](https://github.com/QData/TextAttack) (2020)
- [Advertorch](https://github.com/BorealisAI/advertorch) (2019)
- [Ares](https://github.com/thu-ml/ares)
- [MART](https://github.com/IntelLabs/MART)
- [FSR](https://github.com/wkim97/FSR) (2023)
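A canonical attack implemented by toolkits such as ART and Advertorch is the Fast Gradient Sign Method (FGSM): perturb each input feature by a small step in the sign of the loss gradient. A minimal sketch against a hypothetical fixed logistic model (weights, input, and epsilon are all made up; real toolkits wrap trained models and many attack variants):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Fixed logistic "model": p(y=1|x) = sigmoid(w . x + b)
w, b = [2.0, -1.0], 0.0

def predict(x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def fgsm(x, y, eps):
    """FGSM: step each feature by eps in the direction that increases
    the loss. For logistic loss the input gradient is (p - y) * w."""
    p = predict(x)
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * ((g > 0) - (g < 0)) for xi, g in zip(x, grad)]

x, y = [1.0, 0.0], 1            # correctly classified as 1 (p ≈ 0.88)
x_adv = fgsm(x, y, eps=0.8)
print(predict(x))               # ≈ 0.88
print(predict(x_adv))           # < 0.5: the perturbed input is misclassified
```

The attack succeeds with a bounded per-feature change of eps; robustness toolkits measure how large eps must be before a model breaks, and offer defenses such as adversarial training.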
## Causal analysis
- [EconML](https://github.com/py-why/EconML)
- [DiCE](https://github.com/interpretml/DiCE)
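DiCE generates counterfactual explanations: minimal feature changes that flip a model's decision (e.g., "approval would have been granted with credit 1.0 higher"). As a naive illustration of the idea only, here is a brute-force grid search over perturbations of a hypothetical toy classifier; DiCE itself uses far smarter optimization and supports diversity constraints:

```python
import itertools

def predict(x):
    """Toy classifier: approve (1) if income + 2*credit > 3."""
    income, credit = x
    return int(income + 2 * credit > 3)

def counterfactual(x, target, step=0.5, max_delta=2.0):
    """Smallest-L1 perturbation on a coarse grid that flips the
    prediction to `target`. Returns (cost, counterfactual_input)."""
    signed = [0.0]
    for i in range(1, int(max_delta / step) + 1):
        signed += [i * step, -i * step]
    best = None
    for d0, d1 in itertools.product(signed, repeat=2):
        cand = [x[0] + d0, x[1] + d1]
        if predict(cand) == target:
            cost = abs(d0) + abs(d1)
            if best is None or cost < best[0]:
                best = (cost, cand)
    return best

x = [1.0, 0.5]                 # rejected: 1 + 2*0.5 = 2, not > 3
cf = counterfactual(x, target=1)
print(predict(x))              # 0
print(cf)                      # (1.0, [1.0, 1.5]): raising credit by 1.0 flips it
```

The returned counterfactual is itself an explanation: it tells the affected person the smallest change (under the chosen cost) that would alter the outcome.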
# MLOps
## Model repositories
- [IBM: Factsheets](https://aifs360.res.ibm.com/)
- [Model card toolkit](https://github.com/tensorflow/model-card-toolkit)
## Platforms
- [wandb.ai](https://wandb.ai/site)
- [run.ai](https://www.run.ai/)
- [mlflow.org](https://mlflow.org/)
- [kubeflow.org](https://www.kubeflow.org/)