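Crosslingual Generalization through Multitask Finetuning

This repository provides an overview of all components used to create BLOOMZ, mT0 and xP3, as introduced in the paper Crosslingual Generalization through Multitask Finetuning.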

Data
Models
Create xP3(x)
Train models
- BLOOMZ
- mT0
Evaluate models
- Rank Evaluation
- Generation Evaluation
Plots and Tables
- Figures
- Tables
Citation

Data

<table> <tr> <th>Name</th> <th>Explanation</th> <th>Example models</th> </tr> <tr> <td><a href=https://huggingface.co/datasets/Muennighoff/xP3x>xP3x</a></td> <td>Mixture of 17 tasks in 277 languages with English prompts</td> <td>In progress: join our Aya project @<a href=https://cohere.for.ai/>C4AI</a> to help!</td> </tr> <tr> <td><a href=https://huggingface.co/datasets/bigscience/xP3>xP3</a></td> <td>Mixture of 13 training tasks in 46 languages with English prompts</td> <td><a href=https://huggingface.co/bigscience/bloomz>BLOOMZ</a> and <a href=https://huggingface.co/bigscience/mt0-xxl>mT0-13B</a></td> </tr> <tr> <td><a href=https://huggingface.co/datasets/bigscience/xP3mt>xP3mt</a></td> <td>Mixture of 13 training tasks in 46 languages with prompts in 20 languages (machine-translated from English)</td> <td><a href=https://huggingface.co/bigscience/bloomz-mt>BLOOMZ-MT</a> and <a href=https://huggingface.co/bigscience/mt0-xxl-mt>mT0-13B-MT</a></td> </tr> <tr> <td><a href=https://huggingface.co/datasets/bigscience/xP3all>xP3all</a></td> <td>xP3 plus our evaluation datasets, adding 3 tasks for a total of 16 tasks in 46 languages with English prompts</td> <td></td> </tr> <tr> <td><a href=https://huggingface.co/datasets/bigscience/xP3megds>xP3megds</a></td> <td>Version of xP3 processed for <a href=https://github.com/bigscience-workshop/Megatron-DeepSpeed>Megatron-DeepSpeed</a></td> <td><a href=https://huggingface.co/bigscience/bloomz>BLOOMZ</a></td> </tr> <tr> <td><a href=https://huggingface.co/datasets/Muennighoff/P3>P3</a></td> <td>Reprocessed version of the English-only <a href=https://huggingface.co/datasets/bigscience/P3>P3</a> with 8 training tasks</td> <td><a href=https://huggingface.co/bigscience/bloomz-p3>BLOOMZ-P3</a> and <a href=https://huggingface.co/bigscience/mt0-xxl-p3>mT0-13B-P3</a></td> </tr> </table>

Models

<table> <tr> <th colspan="12">Multitask finetuned on <a style="font-weight:bold" href=https://huggingface.co/datasets/bigscience/xP3>xP3</a>. Recommended for prompting in English.</th> </tr> <tr> <td>Parameters</td> <td>300M</td> <td>580M</td> <td>1.2B</td> <td>3.7B</td> <td>13B</td> <td>560M</td> <td>1.1B</td> <td>1.7B</td> <td>3B</td> <td>7.1B</td> <td>176B</td> </tr> <tr> <td>Finetuned Model</td> <td><a href=https://huggingface.co/bigscience/mt0-small>mt0-small</a></td> <td><a href=https://huggingface.co/bigscience/mt0-base>mt0-base</a></td> <td><a href=https://huggingface.co/bigscience/mt0-large>mt0-large</a></td> <td><a href=https://huggingface.co/bigscience/mt0-xl>mt0-xl</a></td> <td><a href=https://huggingface.co/bigscience/mt0-xxl>mt0-xxl</a></td> <td><a href=https://huggingface.co/bigscience/bloomz-560m>bloomz-560m</a></td> <td><a href=https://huggingface.co/bigscience/bloomz-1b1>bloomz-1b1</a></td> <td><a href=https://huggingface.co/bigscience/bloomz-1b7>bloomz-1b7</a></td> <td><a href=https://huggingface.co/bigscience/bloomz-3b>bloomz-3b</a></td> <td><a href=https://huggingface.co/bigscience/bloomz-7b1>bloomz-7b1</a></td> <td><a href=https://huggingface.co/bigscience/bloomz>bloomz</a></td> </tr> <tr> <th colspan="12">Multitask finetuned on <a style="font-weight:bold" href=https://huggingface.co/datasets/bigscience/xP3mt>xP3mt</a>. Recommended for prompting in non-English languages.</th> </tr> <tr> <td>Finetuned Model</td> <td></td> <td></td> <td></td> <td></td> <td><a href=https://huggingface.co/bigscience/mt0-xxl-mt>mt0-xxl-mt</a></td> <td></td> <td></td> <td></td> <td></td> <td><a href=https://huggingface.co/bigscience/bloomz-7b1-mt>bloomz-7b1-mt</a></td> <td><a href=https://huggingface.co/bigscience/bloomz-mt>bloomz-mt</a></td> </tr> <tr> <th colspan="12">Multitask finetuned on <a style="font-weight:bold" href=https://huggingface.co/datasets/Muennighoff/P3>P3</a>. Released for research purposes only. Strictly inferior to the models above!</th> </tr> <tr> <td>Finetuned Model</td> <td></td> <td></td> <td></td> <td></td> <td><a href=https://huggingface.co/bigscience/mt0-xxl-p3>mt0-xxl-p3</a></td> <td></td> <td></td> <td></td> <td></td> <td><a href=https://huggingface.co/bigscience/bloomz-7b1-p3>bloomz-7b1-p3</a></td> <td><a href=https://huggingface.co/bigscience/bloomz-p3>bloomz-p3</a></td> </tr> <tr> <th colspan="12">Original pretrained checkpoints. Not recommended.</th> </tr> <tr> <td>Pretrained Model</td> <td><a href=https://huggingface.co/google/mt5-small>mt5-small</a></td> <td><a href=https://huggingface.co/google/mt5-base>mt5-base</a></td> <td><a href=https://huggingface.co/google/mt5-large>mt5-large</a></td> <td><a href=https://huggingface.co/google/mt5-xl>mt5-xl</a></td> <td><a href=https://huggingface.co/google/mt5-xxl>mt5-xxl</a></td> <td><a href=https://huggingface.co/bigscience/bloom-560m>bloom-560m</a></td> <td><a href=https://huggingface.co/bigscience/bloom-1b1>bloom-1b1</a></td> <td><a href=https://huggingface.co/bigscience/bloom-1b7>bloom-1b7</a></td> <td><a href=https://huggingface.co/bigscience/bloom-3b>bloom-3b</a></td> <td><a href=https://huggingface.co/bigscience/bloom-7b1>bloom-7b1</a></td> <td><a href=https://huggingface.co/bigscience/bloom>bloom</a></td> </tr> </table>
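The model table can be read as a lookup: pick a size, then take the xP3mt variant only if you plan to prompt in a language other than English and such a variant exists. The sketch below encodes that reading. The MODELS dictionary and the helpers recommend_model and auto_class_for are hypothetical illustrations, not part of this repository, and falling back to the xP3 model when no xP3mt variant was released is our own assumption.

```python
# Hypothetical lookup built from the model table above (not part of this repository).
# Keys are parameter counts; values are (xP3 model, xP3mt model or None).
MODELS = {
    "300M": ("bigscience/mt0-small", None),
    "580M": ("bigscience/mt0-base", None),
    "1.2B": ("bigscience/mt0-large", None),
    "3.7B": ("bigscience/mt0-xl", None),
    "13B": ("bigscience/mt0-xxl", "bigscience/mt0-xxl-mt"),
    "560M": ("bigscience/bloomz-560m", None),
    "1.1B": ("bigscience/bloomz-1b1", None),
    "1.7B": ("bigscience/bloomz-1b7", None),
    "3B": ("bigscience/bloomz-3b", None),
    "7.1B": ("bigscience/bloomz-7b1", "bigscience/bloomz-7b1-mt"),
    "176B": ("bigscience/bloomz", "bigscience/bloomz-mt"),
}


def recommend_model(size: str, english_prompts: bool = True) -> str:
    """Return a Hugging Face model ID for the given size and prompt language."""
    xp3_model, xp3mt_model = MODELS[size]
    # Assumption: fall back to the xP3 model when no xP3mt variant exists.
    if english_prompts or xp3mt_model is None:
        return xp3_model
    return xp3mt_model


def auto_class_for(model_id: str) -> str:
    """mT0 models are encoder-decoder (mT5-based); BLOOMZ models are decoder-only."""
    return "AutoModelForSeq2SeqLM" if "/mt0" in model_id else "AutoModelForCausalLM"


if __name__ == "__main__":
    model_id = recommend_model("7.1B", english_prompts=False)
    print(model_id, auto_class_for(model_id))
```

The returned class name matches the transformers auto class you would load the checkpoint with, e.g. getattr(transformers, auto_class_for(model_id)).from_pretrained(model_id), together with AutoTokenizer.from_pretrained(model_id).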

Create xP3(x)

We have already processed and uploaded xP3. If you want to recreate it, follow these steps:

Get promptsource: for xP3mt, git clone -b xp3mt https://github.com/Muennighoff/promptsource.git; for xP3, git clone -b tr13 https://github.com/Muennighoff/promptsource.git. Then install it: cd promptsource; pip install -e .
Install packages: pip install -q datasets iso-639
Get the creation script and edit it if necessary:
- For xP3mt, set USE_ENGLISH_PROMPTS = False at the beginning
- For xP3, set USE_ENGLISH_PROMPTS = True at the beginning
Run the script, e.g. via python prepare_xp3.py or a SLURM script
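If you switch between the two variants often, the flag edit can be scripted. This is a minimal sketch, assuming the creation script assigns the flag on its own line in the form USE_ENGLISH_PROMPTS = True or USE_ENGLISH_PROMPTS = False; set_english_prompts and configure_script are hypothetical helpers, not part of this repository.

```python
import re
from pathlib import Path

# Matches a line such as "USE_ENGLISH_PROMPTS = True" (assumed format of the script).
FLAG_RE = re.compile(r"^(USE_ENGLISH_PROMPTS[ \t]*=[ \t]*)(True|False)\b", re.MULTILINE)


def set_english_prompts(source: str, use_english: bool) -> str:
    """Return source with the first USE_ENGLISH_PROMPTS assignment set to use_english."""
    new_source, count = FLAG_RE.subn(lambda m: m.group(1) + str(use_english), source, count=1)
    if count == 0:
        raise ValueError("no USE_ENGLISH_PROMPTS assignment found")
    return new_source


def configure_script(path: str, variant: str) -> None:
    """Edit the creation script in place: English prompts for xP3, translated prompts for xP3mt."""
    use_english = {"xP3": True, "xP3mt": False}[variant]
    script = Path(path)
    script.write_text(
        set_english_prompts(script.read_text(encoding="utf-8"), use_english),
        encoding="utf-8",
    )
```

For example, configure_script("prepare_xp3.py", "xP3mt") before running python prepare_xp3.py. Raising when the flag is missing avoids silently running with the wrong prompt language.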

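For xP3x, the new extension of xP3, the process is largely the same except:

Install the xp3x branch instead: pip install git+https://github.com/Muennighoff/promptsource.git@xp3x
The creation script is in this repository and is named create_xp3x.py.

xP3x is a superset of xP3, so unless you want to reproduce the paper, we recommend always using xP3x (or xP3mt if you want machine-translated prompts).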
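The recommendation above amounts to a small decision rule. The sketch below spells it out using the dataset IDs from the Data table; choose_dataset is a hypothetical helper, not part of this repository.

```python
def choose_dataset(reproduce_paper: bool = False, machine_translated_prompts: bool = False) -> str:
    """Return a Hugging Face dataset ID following the recommendation above."""
    if machine_translated_prompts:
        # Prompts machine-translated from English into 20 languages.
        return "bigscience/xP3mt"
    if reproduce_paper:
        # The English-prompt mixture used for BLOOMZ and mT0 in the paper.
        return "bigscience/xP3"
    # Default: xP3x, a superset of xP3.
    return "Muennighoff/xP3x"
```

The returned ID can be passed to datasets.load_dataset together with a configuration (language) name; check each dataset card for the configurations it actually offers.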

Train models

BLOOMZ

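Download the pretrained model checkpoint, which has the shape PP=12, TP=4, DP=4. If you want to reshape the model, you also need to download the universal checkpoint. If you want to continue finetuning, use our finetuned checkpoint instead, which has the shape PP=72, TP=1, DP=4.
Set up the training code: git clone -b t0loading https://github.com/bigscience-workshop/Megatron-DeepSpeed, then follow its setup guide to create an environment with the necessary packages.
Download xP3megds, which is already processed for Megatron-DeepSpeed, or download xP3 yourself, remove the merged_{lang}.jsonl files and re-preprocess it for Megatron-DeepSpeed using the script here.
Set up and run the training script: we use the SLURM scripts in bigscience-workshop/bigscience/train/tr13-mtf, referred to as xp3capmixnewcodelonglossseq. For example, this is the script used to train bloomz. The important parts of the script to modify are:

#SBATCH variables such as nodes, GPUs and time (our SLURM guide is here)
source $six_ALL_CCFRWORK/start-tr13f-6B3-ml-t0: point this to the conda environment you set up via Megatron-DeepSpeed
PATH environment variables, notably
- TRAIN_DATA_PATH and VALID_DATA_PATH, which point to files listing your processed training and validation data. We provide our files in this repository (xp3capmixnewcodelong_train.txt and xp3capmixnewcodelong_validation.txt), but you will likely need to change the paths inside them. The percentage for each language is based on its share of xP3, with code slightly upsampled.
PP_SIZE=72, TP_SIZE=1, BATCH SIZE and similar variables specify the layout, which depends on the hardware available to you. If you change the layout, you may need to reshape the model: use the universal checkpoint and pass the --universal flag in the script. We recommend saving a new checkpoint right after reshaping and then continuing training without --universal, which is faster.
If you restart from a saved checkpoint (e.g. after training a few steps), make sure to remove the --no-load-optim and --reset-progress flags
After training, you can convert the checkpoint to transformers format using the script here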
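The per-language percentages in the data files follow each language's share of xP3, with code slightly upsampled. A minimal sketch of that computation, assuming you have some size measure per subset (e.g. token counts); mixture_weights is a hypothetical helper, the sizes and upsampling factor below are made-up placeholders rather than the values used for BLOOMZ, and the output is not in the format of xp3capmixnewcodelong_train.txt.

```python
from __future__ import annotations


def mixture_weights(sizes: dict[str, float], upsample: dict[str, float] | None = None) -> dict[str, float]:
    """Sampling weights proportional to each subset's size, with optional per-subset upsampling."""
    upsample = upsample or {}
    scaled = {name: size * upsample.get(name, 1.0) for name, size in sizes.items()}
    total = sum(scaled.values())
    # Normalize so the weights sum to 1.
    return {name: value / total for name, value in scaled.items()}


if __name__ == "__main__":
    # Placeholder sizes; "code" is upsampled by a hypothetical factor of 1.5.
    weights = mixture_weights({"en": 60.0, "fr": 20.0, "code": 20.0}, upsample={"code": 1.5})
    for name, weight in weights.items():
        print(f"{name}: {weight:.2%}")
```

Upsampling one subset necessarily shrinks every other subset's share, because the weights are renormalized to sum to 1.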

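In Megatron-DeepSpeed's 3D parallelism the number of GPUs is the product PP × TP × DP, so the two checkpoint shapes above correspond to 12 × 4 × 4 = 192 and 72 × 1 × 4 = 288 GPUs. The sketch below does that arithmetic and flags when a target layout differs from a checkpoint's. Layout, needs_reshape and nodes_needed are hypothetical names, and treating any change in PP, TP or DP as requiring the universal checkpoint is a conservative assumption on our part.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    pp: int  # pipeline-parallel size (PP_SIZE)
    tp: int  # tensor-parallel size (TP_SIZE)
    dp: int  # data-parallel size

    @property
    def world_size(self) -> int:
        """Total number of GPUs (ranks) the layout occupies."""
        return self.pp * self.tp * self.dp


PRETRAINED = Layout(pp=12, tp=4, dp=4)  # shape of the pretrained checkpoint
FINETUNED = Layout(pp=72, tp=1, dp=4)   # shape of the finetuned checkpoint


def needs_reshape(checkpoint: Layout, target: Layout) -> bool:
    """Conservatively assume any layout change requires the universal checkpoint (--universal)."""
    return checkpoint != target


def nodes_needed(layout: Layout, gpus_per_node: int) -> int:
    """Nodes to request via #SBATCH for the given number of GPUs per node."""
    return math.ceil(layout.world_size / gpus_per_node)
```

For example, nodes_needed(FINETUNED, 8) gives 36 nodes on 8-GPU nodes, while finetuning from PRETRAINED into the FINETUNED layout reports that reshaping is needed.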
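Helpful resources:

Blog post
BLOOM community tab, such as here

mT0

Follow the finetuning instructions here, making sure to use the pretrained mT5 models and the xP3 dataset.

Helpful resources:

T5X paper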

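Evaluate models

All evaluation results are available in this repository, under the respective models: https://huggingface.co/datasets/bigscience/evaluation-results. Below we explain how to run the evaluations.

Rank Evaluation

We run rank evaluation on XCOPA, XNLI, XStoryCloze and XWinograd:

Get the promptsource fork: git clone -b xp3mt https://github.com/Muennighoff/promptsource.git, then cd promptsource; pip install -e .
Get the t-zero fork: git clone -b muennighoff/upgrdps https://github.com/Muennighoff/t-zero.git, then cd t-zero; pip install -e .
Download the model and run the evaluation script, e.g. for bloomz.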
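Rank evaluation treats each task as multiple choice: the model scores every candidate answer given the prompt, and the highest-scoring candidate is the prediction. The sketch below shows only the ranking and accuracy step on precomputed per-token log-probabilities; obtaining those from a model is omitted. Whether the t-zero fork length-normalizes scores is not stated here, so both options are exposed, and the function names are hypothetical.

```python
from typing import List, Sequence


def choice_score(token_logprobs: Sequence[float], length_normalize: bool = False) -> float:
    """Sum of an answer's token log-probabilities, optionally averaged over its tokens."""
    total = sum(token_logprobs)
    return total / len(token_logprobs) if length_normalize else total


def rank_classify(choices: Sequence[Sequence[float]], length_normalize: bool = False) -> int:
    """Index of the highest-scoring answer choice."""
    scores = [choice_score(c, length_normalize) for c in choices]
    return max(range(len(scores)), key=scores.__getitem__)


def accuracy(
    examples: Sequence[Sequence[Sequence[float]]],
    labels: Sequence[int],
    length_normalize: bool = False,
) -> float:
    """Fraction of examples whose top-ranked choice matches the gold label."""
    predictions: List[int] = [rank_classify(ex, length_normalize) for ex in examples]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)
```

Normalization can flip a prediction: a two-token answer at -1.0 per token (sum -2.0) loses to a one-token answer at -1.5 on the summed score, but wins once scores are averaged per token.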

Generation Evaluation

We run generative evaluation on translation and summarization during training for validation:

Get the promptsource fork: git clone -b xp3mt https://github.com/Muennighoff/promptsource, then cd promptsource; pip install -e .
Get bigscience-workshop/lm-evaluation-harness: git clone https://github.com/bigscience-workshop/lm-evaluation-harness. The script for the 7.1B model, for example, is here.
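Generative evaluation compares the text a model generates against reference texts with an overlap metric. The harness ships its own metric implementations; purely to illustrate the idea, the sketch below computes a unigram-overlap F1 in the style of ROUGE-1. rouge1_f1 is a hypothetical helper, and its naive whitespace tokenization is not suitable for languages written without spaces, such as Chinese.

```python
from collections import Counter


def rouge1_f1(prediction: str, reference: str) -> float:
    """Unigram-overlap F1 between a generation and a reference (whitespace tokens, lowercased)."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    # Clipped overlap: each token counts at most as often as it appears in both texts.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```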

We also evaluate code generation on HumanEval:

Get the code evaluation code: git clone https://github.com/loubnabnl/bloom-code-evaluation and go through its setup.
In code_eval.py, set prepend_eos to False in complete_code(model, tokenizer, prompt, num_completions=1, prepend_eos=True, **gen_kwargs), i.e. change it to complete_code(model, tokenizer, prompt, num_completions=1, prepend_eos=False, **gen_kwargs).
Download the model and run the evaluation script, replacing MODEL_CKPT with your path (e.g. for bloomz, use this).
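HumanEval results are commonly reported as pass@k. Given n sampled completions for a problem, of which c pass the unit tests, the unbiased estimator from the Codex paper is pass@k = 1 - C(n-c, k) / C(n, k), averaged over problems. This is a minimal sketch; whether the bloom-code-evaluation scripts compute it exactly this way is an assumption, and the helper names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples, c of which are correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every subset of k samples contains at least one correct sample.
        return 1.0
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)


def mean_pass_at_k(results: Sequence[Tuple[int, int]], k: int) -> float:
    """Average pass@k over problems given (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

For k = 1 the estimator reduces to c / n, the fraction of correct samples; larger k rewards models that solve a problem in at least one of several attempts.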

Plots and Tables

Figures

Figure 1: plotstables/xp3_taxonomy.drawio and plotstables/xp3_taxonomy.pdf
Figure 2: plotstables/xp3_languages.ipynb and colab
Figure 3: plotstables/xp3_variants.pdf and drawings
Figure 4: plotstables/xp3_generalization_bar.pdf and colab
Figure 5: plotstables/lang_generalization and colab
Figure 6: plotstables/scale.pdf and colab
Figure 7: plotstables/validation.pdf and colab
Figure 8: plotstables/pretraining_sizes.pdf and colab
Figure 9: plotstables/english_task_generalization.pdf and colab
Figure 10: plotstables/task_generalization.pdf and colab
Figure 11: plotstables/roots_xp3_languages.pdf and colab, which requires some of the files in plotstables/contamination
Figure 12: plotstables/examples/bloom_code_example.py, plotstables/examples/bloom_code_light.pdf and plotstables/examples/bloomz_code_light.pdf; the raw code files can be found here and here
Figures 13 to 16: plotstables/examples/*.pdf and plotstables/examples/generations.drawio

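Tables

Table 1: Colab and a more complex version of the Colab
Table 2: Adapted from the Codex paper
Table 3: Manual
Table 4: plotstables/compute_codegen_len.ipynb for generations and plotstables/countcode.py for xP3
Table 5: Manual
Table 6: Manual
Table 7: plotstables/levenshtein.py
Table 8: Same as Table 1, but with the languages swapped from L1 to L2
Table 9: Colab
Table 10: Colab
Prompts appendix: https://github.com/albanie/prompt_formatting_in_latex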
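Table 7 is produced with plotstables/levenshtein.py, which is not reproduced here. For reference, the textbook Levenshtein distance (the minimum number of insertions, deletions and substitutions turning one sequence into another) can be computed with a two-row dynamic program. The sketch accepts strings or token lists; whether the script compares characters or words is not stated here, and levenshtein is a hypothetical helper name.

```python
from typing import Sequence


def levenshtein(a: Sequence, b: Sequence) -> int:
    """Edit distance between two sequences using O(len(b)) memory."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,            # delete x
                curr[j - 1] + 1,        # insert y
                prev[j - 1] + (x != y)  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]
```

Passing token lists instead of strings, e.g. levenshtein(text_a.split(), text_b.split()), gives a word-level distance with the same code.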

Citation

@article{muennighoff2022crosslingual,
  title={Crosslingual generalization through multitask finetuning},
  author={Muennighoff, Niklas and Wang, Thomas and Sutawika, Lintang and Roberts, Adam and Biderman, Stella and Scao, Teven Le and Bari, M Saiful and Shen, Sheng and Yong, Zheng-Xin and Schoelkopf, Hailey and others},
  journal={arXiv preprint arXiv:2211.01786},
  year={2022}
}