switch-base-128

Introduction to the switch-base-128 project

Project overview

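Switch-base-128 is a Mixture of Experts (MoE) language model that uses sparse multilayer perceptron (Sparse MLP) layers to improve training speed and performance. It is based on the classic T5 architecture, with one key difference: the original feed-forward layers are replaced by sparse layers made up of expert MLPs. According to the Switch Transformers paper, the model outperforms the standard T5 model on fine-tuned tasks, and the paper reports a training speedup of up to 4x over T5-XXL.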
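To make the idea of a sparse expert layer more concrete, here is a minimal sketch of top-1 ("switch") routing in plain Python. It is an illustration under stated assumptions, not the transformers implementation: the helper names (make_expert, switch_layer), the toy sizes, and the random weights are all hypothetical. Each token is scored by a router, sent to the single highest-scoring expert MLP, and the expert output is scaled by the router probability.

```python
import math
import random


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


def make_expert(rng, d_model, d_ff):
    # Hypothetical expert: a two-layer MLP with ReLU and random toy weights.
    w_in = [[rng.gauss(0, 0.1) for _ in range(d_model)] for _ in range(d_ff)]
    w_out = [[rng.gauss(0, 0.1) for _ in range(d_ff)] for _ in range(d_model)]

    def expert(x):
        hidden = [max(0.0, h) for h in matvec(w_in, x)]
        return matvec(w_out, hidden)

    return expert


def switch_layer(tokens, router_weights, experts):
    # Route every token to exactly one expert (top-1 routing).
    outputs, chosen = [], []
    for x in tokens:
        probs = softmax(matvec(router_weights, x))
        idx = max(range(len(experts)), key=lambda i: probs[i])
        y = experts[idx](x)
        # Scale the expert output by the router probability of that expert.
        outputs.append([probs[idx] * v for v in y])
        chosen.append(idx)
    return outputs, chosen


rng = random.Random(0)
d_model, d_ff, num_experts = 4, 8, 3  # toy sizes, far smaller than the real model
router_weights = [[rng.gauss(0, 1) for _ in range(d_model)] for _ in range(num_experts)]
experts = [make_expert(rng, d_model, d_ff) for _ in range(num_experts)]
tokens = [[rng.gauss(0, 1) for _ in range(d_model)] for _ in range(5)]

outputs, chosen = switch_layer(tokens, router_weights, experts)
print(chosen)  # index of the expert that handled each token
```

Because only one expert runs per token, the compute per token stays roughly constant as more experts are added, which is the property the sparse layers rely on.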

Model details

Model type: Language model
Language(s) (NLP): English
License: Apache 2.0
Related models: All checkpoints in the Switch Transformers series
Source code and resources:

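Usage guide

Note that these checkpoints were trained on the masked language modeling (MLM) task and are therefore not ready to be used directly for downstream tasks. If you need a model for downstream tasks, you can use a FLAN-T5 model, or fine-tune your own MoE model by following the example tutorial.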
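As a rough illustration of what the MLM objective looks like for T5-style models, the sketch below builds an input/target pair by replacing word spans with sentinel tokens such as <extra_id_0>. The function name corrupt_spans and the hand-picked spans are assumptions made for this example; real pretraining samples the spans randomly and operates on token ids rather than words.

```python
def corrupt_spans(words, spans):
    """Replace word spans with sentinel tokens, T5 style.

    spans: sorted, non-overlapping (start, end) word-index pairs, end exclusive.
    Returns (input_text, target_text).
    """
    inputs, targets = [], []
    cursor = 0
    for sentinel_id, (start, end) in enumerate(spans):
        sentinel = f"<extra_id_{sentinel_id}>"
        inputs.extend(words[cursor:start])  # keep the text before the span
        inputs.append(sentinel)             # the span is hidden in the input
        targets.append(sentinel)            # and revealed in the target
        targets.extend(words[start:end])
        cursor = end
    inputs.extend(words[cursor:])
    # The target is closed by one extra sentinel.
    targets.append(f"<extra_id_{len(spans)}>")
    return " ".join(inputs), " ".join(targets)


words = "Thank you for inviting me to your party last week".split()
input_text, target_text = corrupt_spans(words, [(2, 4), (8, 9)])
print(input_text)   # Thank you <extra_id_0> me to your party <extra_id_1> week
print(target_text)  # <extra_id_0> for inviting <extra_id_1> last <extra_id_2>
```

A checkpoint trained only on this objective learns to fill in the sentinel positions, which is why it needs further fine-tuning before it can follow task instructions.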

Usage examples

The following examples show how to use the model with the transformers library:

Running the model on a CPU

from transformers import AutoTokenizer, SwitchTransformersForConditionalGeneration

# Download the tokenizer and the pretrained checkpoint from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("google/switch-base-128")
model = SwitchTransformersForConditionalGeneration.from_pretrained("google/switch-base-128")

# Each <extra_id_N> sentinel marks a masked span for the model to fill in.
input_text = "A <extra_id_0> walks into a bar and orders a <extra_id_1> with <extra_id_2> pinch of <extra_id_3>."
input_ids = tokenizer(input_text, return_tensors="pt").input_ids

# Generate the predicted spans and decode them (special tokens are kept).
outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))

Running the model on a GPU

# device_map="auto" requires the accelerate package: pip install accelerate
from transformers import AutoTokenizer, SwitchTransformersForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/switch-base-128")
# Let accelerate place the model weights on the available device(s).
model = SwitchTransformersForConditionalGeneration.from_pretrained("google/switch-base-128", device_map="auto")

input_text = "A <extra_id_0> walks into a bar and orders a <extra_id_1> with <extra_id_2> pinch of <extra_id_3>."
# Move the input ids to GPU 0 so they sit on the same device as the model.
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(0)

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
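
The decoded string printed by the examples above interleaves sentinel tokens with the text predicted for each masked span. The helper below is a small sketch of how such a string could be split into one fill per sentinel. The function name parse_sentinel_output is hypothetical, and the sample string is only an assumed example of the output format, not a guaranteed result from this checkpoint.

```python
import re

SENTINEL = re.compile(r"(<extra_id_\d+>)")


def parse_sentinel_output(decoded):
    """Map each sentinel token to the text that follows it."""
    # Drop the padding and end-of-sequence markers kept by tokenizer.decode.
    cleaned = decoded.replace("<pad>", "").replace("</s>", "")
    fills = {}
    current = None
    # re.split with a capturing group keeps the sentinels in the result.
    for part in SENTINEL.split(cleaned):
        if SENTINEL.fullmatch(part):
            current = part
            fills[current] = ""
        elif current is not None:
            fills[current] = part.strip()
    return fills


# Assumed example of a decoded generation, used here only to show the format.
sample = "<pad> <extra_id_0> man<extra_id_1> beer<extra_id_2> a<extra_id_3> salt<extra_id_4>.</s>"
fills = parse_sentinel_output(sample)
for sentinel, text in fills.items():
    print(sentinel, repr(text))
```

Passing skip_special_tokens=True to tokenizer.decode is an alternative way to remove <pad> and </s>, though whether the sentinel tokens survive that option depends on how the tokenizer registers them.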