RuLES: A Benchmark for Evaluating Rule-Following in Language Models
RuLES is a benchmark project for evaluating how well language models follow rules. It provides a variety of test scenarios, such as authentication and question answering, and includes evaluation scripts, red-teaming tools, and test-case visualization tools. Researchers can measure how well different language models follow simple rules and compute an aggregate RuLES score. The project also contains code and guides for GCG attacks and model fine-tuning.
As of March 7, 2024, we have updated the repo with a revised v2.0 benchmark with new test cases. Please see our updated paper for more details.
This repo contains the code for RuLES: Rule-following Language Evaluation Scenarios, a benchmark for evaluating rule-following in language models.
Recent updates added the SimonSays and Questions scenarios and support for Google Vertex AI API models. Please re-evaluate existing results with:

python -m llm_rules.scripts.reevaluate
The code has also been restructured into the llm_rules library, and the --conv_template option has been renamed to --fastchat_template.

Install the package with:

pip install -e .
To evaluate models with our API wrappers (llm_rules/models/*), install the optional dependencies:
pip install -e .[models]
Set the relevant API credentials as environment variables:

OPENAI_API_KEY=<key>
ANTHROPIC_API_KEY=<key>
GOOGLE_API_KEY=<key>
GCP_PROJECT_ID=<project_id>
To evaluate models locally, first download the weights, e.g.:

>>> from huggingface_hub import snapshot_download
>>> snapshot_download(repo_id="meta-llama/Llama-2-7b-chat-hf", local_dir="/my_models/Llama-2-7b-chat-hf", local_dir_use_symlinks=False)
Logs will be stored in logs/.

Launch an interactive red-teaming session with:
python -m llm_rules.scripts.manual_redteam --provider openai --model gpt-3.5-turbo-0613 --scenario Authentication --stream
Visualize test cases with:
python -m llm_rules.scripts.show_testcases --test_suite redteam
Our main evaluation script is llm_rules/scripts/evaluate.py, but since we support many evaluation options the code may be hard to follow. Please see llm_rules/scripts/evaluate_simple.py for a simplified version of the evaluation script.
We wrap API calls with unlimited retries for ease of evaluation. You may want to change the retry functionality to suit your needs.
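If unlimited retries are undesirable (for instance, a persistent outage would hang an evaluation run indefinitely), a bounded alternative is easy to drop in. The following is a stand-alone sketch using only the standard library, not the repo's actual wrapper code:

```python
import random
import time


def with_retries(fn, max_attempts=5, base_delay=1.0, max_delay=30.0):
    """Call fn(), retrying on exceptions with capped exponential backoff.

    Unlike unlimited retries, this gives up after max_attempts and
    re-raises the last error so failures surface instead of hanging.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            # Exponential backoff with jitter, capped at max_delay
            delay = min(max_delay, base_delay * 2 ** attempt)
            time.sleep(delay * random.uniform(0.5, 1.0))
```

Usage would look like `with_retries(lambda: client.complete(prompt))` around whichever API call needs protecting.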
Evaluate on the redteam test suite with:

python -m llm_rules.scripts.evaluate --provider openai --model gpt-3.5-turbo-0613 --test_suite redteam --output_dir logs/redteam
When evaluating models using vLLM, evaluate.py launches an API server in-process. Concurrency should be set much higher for vLLM models. Run evaluation with:
python -m llm_rules.scripts.evaluate --provider vllm --model /path/to/model --fastchat_template llama-2 --concurrency 100
View detailed results on a single test suite with:
python -m llm_rules.scripts.read_results --output_dir logs/redteam/gpt-3.5-turbo-0613
After evaluating on all three test suites (Benign, Basic, and Redteam), compute the aggregate RuLES score with:
python -m llm_rules.scripts.read_scores --model_name gpt-3.5-turbo-0613
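The actual score computation lives in llm_rules.scripts.read_scores. Purely as a toy illustration of aggregating per-suite pass rates, the sketch below uses an unweighted mean; this weighting is an assumption for illustration, not necessarily the published formula:

```python
def aggregate_score(suite_pass_rates):
    """Toy aggregate: unweighted mean of per-suite pass rates (0-1).

    Illustrative only -- the real RuLES score is computed by
    llm_rules.scripts.read_scores and may weight suites differently.
    """
    return sum(suite_pass_rates.values()) / len(suite_pass_rates)


# Hypothetical per-suite pass rates for one model
rates = {"benign": 0.95, "basic": 0.80, "redteam": 0.40}
score = aggregate_score(rates)
```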
Finally, you can view responses to individual test cases with:
python -m llm_rules.scripts.show_responses --output_dir logs/redteam/gpt-3.5-turbo-0613 --failed_only
Run the GCG attack with randomized scenario parameters in each iteration:
cd gcg_attack
python main_gcg.py --model /path/to/model --fastchat_template <template_name> --scenario Authentication --behavior withholdsecret
Output logs will be stored in logs/gcg_attack.
To then evaluate models on the direct_request test cases with the resulting GCG suffixes:
python -m llm_rules.scripts.evaluate --provider vllm --model /path/to/model --suffix_dir logs/gcg_attack/<model_name> --test_dir data/direct_request --output_dir logs/direct_request_gcg
To reproduce our fine-tuning experiments with Llama-2 7B Chat on the basic_like test cases:
cd finetune
./finetune_llama.sh
We used 4x A100-80G GPUs for fine-tuning Llama-2 7B Chat and Mistral 7B Instruct; you may be able to adjust the DeepSpeed settings to run on smaller or fewer GPUs.
When evaluating community models, we mostly rely on FastChat conversation templates (documented in model_templates.yaml), with the exception of a few custom templates added to llm_rules/templates.py.
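A conversation template's job is to render a list of messages into the exact prompt string a given model was trained on. As a minimal hand-rolled illustration (not FastChat's API), the widely documented single-turn Llama-2 chat layout looks like:

```python
def llama2_prompt(system, user):
    """Format a single-turn prompt in the Llama-2 chat layout.

    Illustration of what a conversation template produces; real runs
    should use the FastChat templates named in model_templates.yaml.
    """
    return (
        f"<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n"
        f"{user} [/INST]"
    )
```

Using the wrong template silently degrades results, which is why evaluate.py takes an explicit --fastchat_template argument.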
If you find this work useful, please cite:

@article{mu2023rules,
title={Can LLMs Follow Simple Rules?},
author={Norman Mu and Sarah Chen and
Zifan Wang and Sizhe Chen and David Karamardian and
Lulwa Aljeraisy and Basel Alomair and
Dan Hendrycks and David Wagner},
journal={arXiv},
year={2023}
}