PKU-Alignment

All

14 repositories

ProgressGym
Public
Alignment with a millennium of moral progress.
Python
•
MIT License
•2•10•0•0•Updated Oct 22, 2024Oct 22, 2024
omnisafe
Public
JMLR: OmniSafe is an infrastructural framework for accelerating SafeRL research.
benchmark machine-learning reinforcement-learning deep-learning deep-reinforcement-learning constraint-satisfaction-problem pytorch safety-critical saferl safe-reinforcement-learning
Python
•
Apache License 2.0
•132•926•8•3•Updated Oct 15, 2024Oct 15, 2024
align-anything
Public
Align Anything: Training All-modality Model with Feedback
chameleon multimodal dpo large-language-models rlhf vision-language-model
Python
•
Apache License 2.0
•31•189•4•1•Updated Oct 11, 2024Oct 11, 2024
safe-sora
Public
SafeSora is a human preference dataset designed to support safety alignment research in the text-to-video generation field, aiming to enhance the helpfulness and harmlessness of Large Vision Models (LVMs).
alignment human-preferences text-to-video-generation large-vision-models
Python
•5•24•1•0•Updated Aug 20, 2024Aug 20, 2024
.github
Public
0•0•0•0•Updated Jul 14, 2024Jul 14, 2024
safe-rlhf
Public
Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback
reinforcement-learning transformers transformer safety llama gpt datasets beaver alpaca ai-safety
Python
•
Apache License 2.0
•118•1.3k•12•0•Updated Jun 13, 2024Jun 13, 2024
llms-resist-alignment
Public
Repo for paper "Language Models Resist Alignment"
alignment llama safe alpaca ai-safety vicuna llm llms rlhf safe-rlhf
Python
•0•3•0•0•Updated Jun 9, 2024Jun 9, 2024
safety-gymnasium
Public
NeurIPS 2023: Safety-Gymnasium: A Unified Safe Reinforcement Learning Benchmark
reinforcement-learning constraint-satisfaction-problem safety-critical safety-critical-systems safe-reinforcement-learning safe-reinforcement-learning-environments constraint-rl safe-policy-optimization
Python
•
Apache License 2.0
•53•386•3•0•Updated May 14, 2024May 14, 2024
ProAgent
Public
ProAgent: Building Proactive Cooperative Agents with Large Language Models
language-model cooperative human-ai overcooked human-ai-interaction cooperative-ai llm-agent
JavaScript
•
MIT License
•5•56•1•0•Updated Apr 8, 2024Apr 8, 2024
SafeDreamer
Public
ICLR 2024: SafeDreamer: Safe Reinforcement Learning with World Models
reinforcement-learning constraint-satisfaction-problem safety-critical-systems safe-reinforcement-learning constraint-rl safe-policy-optimization
Python
•
Apache License 2.0
•6•43•1•0•Updated Apr 8, 2024Apr 8, 2024
Safe-Policy-Optimization
Public
NeurIPS 2023: Safe Policy Optimization: A benchmark repository for safe reinforcement learning algorithms
benchmarks reinforcement-learning-algorithms safe safe-reinforcement-learning constrained-reinforcement-learning
Python
•
Apache License 2.0
•46•322•0•0•Updated Mar 20, 2024Mar 20, 2024
AlignmentSurvey
Public
AI Alignment: A Comprehensive Survey
awesome reinforcement-learning ai deep-learning survey alignment papers interpretability red-teaming large-language-models
0•126•0•0•Updated Nov 2, 2023Nov 2, 2023
beavertails
Public
BeaverTails is a collection of datasets designed to facilitate research on safety alignment in large language models (LLMs).
safety llama gpt datasets language-model beaver ai-safety human-feedback-data llm llms
Makefile
•
Apache License 2.0
•5•107•2•0•Updated Oct 27, 2023Oct 27, 2023
ReDMan
Public
ReDMan is an open-source simulation platform that provides a standardized implementation of safe RL algorithms for Reliable Dexterous Manipulation.
Python
•
Apache License 2.0
•2•15•0•0•Updated May 2, 2023May 2, 2023