---
layout: single
title: AI Safety
permalink: /resources/responsible-ai/ai-safety/
classes: wide
toc: true
# toc_label: "My Table of Contents"
# toc_icon: "cog"
---
More about:
- [Adversarial Robustness](adversarial-robustness.md)
- [LLMs](LLM.md)

The sections below collect resources on a range of AI safety topics.
# Theory
- [https://aisafetyfundamentals.com/](https://aisafetyfundamentals.com/) (2023)
# Organizations
- [https://aisafety.world/](https://aisafety.world/) (2023)
# Funding
# Open-source toolkits
## Interpretability and Explainability
### Explainability, counterfactuals and probing
- [Explabox](https://github.com/MarcelRobeer/explabox) (2022)
- [IBM: AIX360](https://github.com/Trusted-AI/AIX360) (2019)
- [Microsoft: Responsible AI Toolbox](https://responsibleaitoolbox.ai/) (2021)
  - A dashboard that integrates error analysis, Fairlearn, InterpretML, DiCE, EconML, and data balance analysis
- [InterpretML](https://github.com/interpretml/interpret-community)
  - SHAP, Mimic, and LIME explainers, plus permutation feature importance
- [MI2.ai](https://www.mi2.ai/)
- [DrWhy](https://github.com/ModelOriented/DrWhy/tree/master) (2019)
  - Includes DALEX, survex, Arena, and fairmodels
- Currently working on: ARES, xSurvival, Large Model Analysis
- [XAI](https://github.com/EthicalML/xai) (2018)
- [ELI5](https://eli5.readthedocs.io/en/latest/overview.html)
- [NN-SVG](https://alexlenail.me/NN-SVG/)
- [Neptune-AI blog](https://neptune.ai/blog/ml-model-interpretation-tools)
- [Neptune-AI blog](https://neptune.ai/blog/explainability-auditability-ml-definitions-techniques-tools)
- [AI Ethics tool landscape](https://edwinwenink.github.io/ai-ethics-tool-landscape/)
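To give a flavor of what these toolkits compute: permutation feature importance (mentioned under InterpretML) measures how much a model's accuracy drops when one feature's values are scrambled. A minimal pure-Python sketch, with a hypothetical toy model and data (real toolkits shuffle the column randomly and average over repeats; here we reverse it for determinism):

```python
def accuracy(model, X, y):
    """Fraction of correct predictions."""
    return sum(model(x) == t for x, t in zip(X, y)) / len(y)

def permutation_importance(model, X, y, feature_idx):
    """Drop in accuracy when one feature column is permuted.
    (Toolkits shuffle randomly; we reverse the column for determinism.)"""
    baseline = accuracy(model, X, y)
    X_perm = [list(x) for x in X]
    column = [x[feature_idx] for x in X][::-1]
    for row, v in zip(X_perm, column):
        row[feature_idx] = v
    return baseline - accuracy(model, X_perm, y)

# Toy model that only looks at feature 0.
model = lambda x: int(x[0] > 0.5)
X = [[0.9, 0.1], [0.8, 0.7], [0.2, 0.9], [0.1, 0.3]]
y = [1, 1, 0, 0]
print(permutation_importance(model, X, y, 0))  # 1.0: feature 0 is essential
print(permutation_importance(model, X, y, 1))  # 0.0: feature 1 is ignored
```

A high score means the model genuinely relies on that feature; a score near zero flags a feature the model ignores.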
### Mechanistic interpretability
- [TransformerLens](https://pypi.org/project/transformer-lens/)
- [Pyvene (intervention focused)](https://github.com/stanfordnlp/pyvene?tab=readme-ov-file)
- [Transformer Debugger (OpenAI)](https://github.com/openai/transformer-debugger)
- [BauKit](https://github.com/davidbau/baukit)
- [nnsight](https://github.com/ndif-team/nnsight)
- [Graphpatch](https://github.com/evan-lloyd/graphpatch)
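The common move behind libraries like TransformerLens, pyvene, and nnsight is intervening on internal activations: run the model, overwrite some hidden state, and observe how the output changes (activation patching). A minimal sketch on a hypothetical two-layer numpy-free network (all weights and inputs are made up for illustration):

```python
# Tiny 2-layer network: hidden = relu(W1 @ x), out = W2 . hidden.
W1 = [[1.0, 0.0], [0.0, 1.0]]
W2 = [1.0, -1.0]

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def forward(x, patch=None):
    """Run the network; if `patch` is given, overwrite the hidden
    activations with it (the core move in activation patching)."""
    hidden = [max(0.0, h) for h in matvec(W1, x)]
    if patch is not None:
        hidden = patch
    return sum(w * h for w, h in zip(W2, hidden))

clean, corrupted = [1.0, 0.0], [0.0, 1.0]
clean_hidden = [max(0.0, h) for h in matvec(W1, clean)]

print(forward(clean))                          # 1.0
print(forward(corrupted))                      # -1.0
print(forward(corrupted, patch=clean_hidden))  # 1.0: patch restores the clean output
```

If patching a component's activation restores the clean behavior, that component is causally implicated in the computation; the libraries above do this at scale inside transformers.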
## Fairness
- [IBM: Fairness 360](https://www.ibm.com/opensource/open/projects/ai-fairness-360/)
- [Fairlearn](https://fairlearn.org/)
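As an illustration of the kind of group-fairness metric these libraries provide, here is the demographic parity difference (the largest gap in positive-prediction rate between groups) hand-rolled in pure Python. This is a sketch of the quantity itself, not Fairlearn's API, and the data is hypothetical:

```python
def selection_rate(y_pred, sensitive, value):
    """Positive-prediction rate within one group."""
    preds = [p for p, g in zip(y_pred, sensitive) if g == value]
    return sum(preds) / len(preds)

def demographic_parity_difference(y_pred, sensitive):
    """Largest gap in selection rate between any two groups."""
    rates = [selection_rate(y_pred, sensitive, v) for v in set(sensitive)]
    return max(rates) - min(rates)

y_pred    = [1, 1, 0, 1, 0, 0]
sensitive = ["a", "a", "a", "b", "b", "b"]
# Group "a" selected at 2/3, group "b" at 1/3.
print(demographic_parity_difference(y_pred, sensitive))  # ≈ 0.333
```

A value of 0 means all groups receive positive predictions at the same rate; Fairlearn additionally offers mitigation algorithms that reduce such gaps during training.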
## Adversarial robustness
- [IBM: ART](https://github.com/Trusted-AI/adversarial-robustness-toolbox) (2018)
- [IBM: URET](https://github.com/IBM/URET) (2023)
- [Paper](https://arxiv.org/pdf/2308.01840.pdf)
- [TextAttack](https://github.com/QData/TextAttack) (2020)
- [Advertorch](https://github.com/BorealisAI/advertorch) (2019)
- [Ares](https://github.com/thu-ml/ares)
- [MART](https://github.com/IntelLabs/MART)
- [FSR](https://github.com/wkim97/FSR) (2023)
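A canonical attack implemented by toolkits such as ART and Advertorch is the Fast Gradient Sign Method (FGSM): perturb each input feature by a small step in the sign of the loss gradient. A minimal sketch against a hypothetical fixed logistic model (weights, input, and epsilon are all made up; real toolkits wrap trained models and many attack variants):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Fixed logistic "model": p(y=1|x) = sigmoid(w . x + b)
w, b = [2.0, -1.0], 0.0

def predict(x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def fgsm(x, y, eps):
    """FGSM: step each feature by eps in the direction that increases
    the loss. For logistic loss the input gradient is (p - y) * w."""
    p = predict(x)
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * ((g > 0) - (g < 0)) for xi, g in zip(x, grad)]

x, y = [1.0, 0.0], 1            # correctly classified as 1 (p ≈ 0.88)
x_adv = fgsm(x, y, eps=0.8)
print(predict(x))               # ≈ 0.88
print(predict(x_adv))           # < 0.5: the perturbed input is misclassified
```

The attack succeeds with a bounded per-feature change of eps; robustness toolkits measure how large eps must be before a model breaks, and offer defenses such as adversarial training.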
## Causal analysis
- [EconML](https://github.com/py-why/EconML)
- [DiCE](https://github.com/interpretml/DiCE)
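DiCE generates counterfactual explanations: minimal feature changes that flip a model's decision (e.g., "approval would have been granted with credit 1.0 higher"). As a naive illustration of the idea only, here is a brute-force grid search over perturbations of a hypothetical toy classifier; DiCE itself uses far smarter optimization and supports diversity constraints:

```python
import itertools

def predict(x):
    """Toy classifier: approve (1) if income + 2*credit > 3."""
    income, credit = x
    return int(income + 2 * credit > 3)

def counterfactual(x, target, step=0.5, max_delta=2.0):
    """Smallest-L1 perturbation on a coarse grid that flips the
    prediction to `target`. Returns (cost, counterfactual_input)."""
    signed = [0.0]
    for i in range(1, int(max_delta / step) + 1):
        signed += [i * step, -i * step]
    best = None
    for d0, d1 in itertools.product(signed, repeat=2):
        cand = [x[0] + d0, x[1] + d1]
        if predict(cand) == target:
            cost = abs(d0) + abs(d1)
            if best is None or cost < best[0]:
                best = (cost, cand)
    return best

x = [1.0, 0.5]                 # rejected: 1 + 2*0.5 = 2, not > 3
cf = counterfactual(x, target=1)
print(predict(x))              # 0
print(cf)                      # (1.0, [1.0, 1.5]): raising credit by 1.0 flips it
```

The returned counterfactual is itself an explanation: it tells the affected person the smallest change (under the chosen cost) that would alter the outcome.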
# MLOps
## Model repositories
- [IBM: Factsheets](https://aifs360.res.ibm.com/)
- [Model card toolkit](https://github.com/tensorflow/model-card-toolkit)
## Platforms
- [wandb.ai](https://wandb.ai/site)
- [run.ai](https://www.run.ai/)
- [mlflow.org](https://mlflow.org/)
- [kubeflow.org](https://www.kubeflow.org/)