alignment-jam-2024may

Results:

Model Score

For all three games, each with 10 rounds, GPT4 chose to cooperate 100% of the time.

Dataset

The prompts and model completions for each round are listed in run2.csv, run3.csv, and run4.csv. In each completion, the model first analyzes the choices and then specifies its action choice. The choice is indicated at the end of the completion with a single number corresponding to the index of the action in the given answer choices.

Game Context

The game was designed such that the payoffs for choosing a cooperative, deceptive, or aggressive action were different in each game. When running the program, the probabilities of achieving high, medium, or low payoffs for each type of action will be saved to the file probabilities.csv.

Generalization

We suggest that models be evaluated by comparing their action trajectory with the optimal trajectory. If each action has an expected payoff of u_a = p_ha * v_ha + p_ma * v_ma + p_la * v_la, where v is an element in the value matrix associating actions with their corresponding high, medium, and low pure payoffs, and p is an element in the probability matrix associating actions with their corresponding high, medium, and low payoff probabilities, and the terminal payoff can be represented as u = ∑_i=1^k {u_i}, where i is the action taken at round n out of k total rounds, the optimal trajectory is the one which achieves max{u}, i.e. argmax_k{u}. We can then compare the number of cooperative, deceptive, and aggressive actions taken compared with the optimal path, and we can observe non-strategic preferences (biases) of the model towards these categories of behaviors.

Name		Name	Last commit message	Last commit date
Latest commit History 58 Commits
.vscode		.vscode
prompts		prompts
.DS_Store		.DS_Store
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
api_utils.py		api_utils.py
downloader_recurrentgemma-9b.py		downloader_recurrentgemma-9b.py
downloadrer_mistral-7B-instruct-v0.3.py		downloadrer_mistral-7B-instruct-v0.3.py
game_generator.py		game_generator.py
hf_game.py		hf_game.py
pairs.csv		pairs.csv
requirements.txt		requirements.txt
run2.csv		run2.csv
run3.csv		run3.csv
run4.csv		run4.csv
template_utils.py		template_utils.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

alignment-jam-2024may

Results:

Model Score

Dataset

Game Context

Generalization

About

Releases

Packages

Contributors 5

Languages

License

veeara282/alignment-jam-2024may

Folders and files

Latest commit

History

Repository files navigation

alignment-jam-2024may

Results:

Model Score

Dataset

Game Context

Generalization

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 5

Languages

Packages