Calculating FLOPs for PyTorch Models: A Guide to calflops

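As deep learning models grow more complex, accurately estimating their computational cost becomes more important. FLOPs (the number of floating-point operations) are one of the standard measures of that cost. This article introduces calflops, a tool that computes the FLOPs, MACs, and parameter count of a wide range of PyTorch neural network models.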

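Introduction to calflops

calflops is a Python library for computing the theoretical FLOPs, MACs (multiply-accumulate operations), and parameter count of PyTorch models. Its main features are:

Supports a variety of neural network architectures, including Linear, CNN, RNN, GCN, and Transformer.
Computes FLOPs for large language models such as BERT and LLaMA.
Supports custom models, as long as they are implemented in PyTorch.
Prints per-submodule details such as FLOPs and parameter counts.
Computes FLOPs for Hugging Face models online, without downloading the full weights.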

Installation and basic usage

calflops can be installed with pip:

pip install --upgrade calflops

A basic usage example:

from calflops import calculate_flops
from torchvision import models

model = models.alexnet()
batch_size = 1
input_shape = (batch_size, 3, 224, 224)

flops, macs, params = calculate_flops(model=model,
                                      input_shape=input_shape,
                                      output_as_string=True)

print(f"AlexNet FLOPs:{flops} MACs:{macs} Params:{params}")

This prints the FLOPs, MACs, and parameter count of the AlexNet model.

Computing FLOPs for Transformer models

For Transformer-style models, calflops offers a more convenient interface:

from calflops import calculate_flops
from transformers import AutoModel, AutoTokenizer

model_name = "bert-base-uncased"
model = AutoModel.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

flops, macs, params = calculate_flops(model=model,
                                      input_shape=(1, 128),
                                      transformer_tokenizer=tokenizer)

print(f"BERT FLOPs:{flops} MACs:{macs} Params:{params}")

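When transformer_tokenizer is passed, calflops automatically generates suitable input data for the model; here input_shape is (batch size, sequence length).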

Computing Hugging Face models online

calflops can also compute FLOPs directly for models hosted on Hugging Face, without downloading the full weights:

from calflops import calculate_flops_hf

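model_name = "meta-llama/Llama-2-7b"
# meta-llama repositories are gated on Hugging Face, so a token with access is needed
access_token = "..."  # your own Hugging Face access token
flops, macs, params = calculate_flops_hf(model_name=model_name,
                                         access_token=access_token,
                                         input_shape=(1, 128))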

print(f"{model_name} FLOPs:{flops} MACs:{macs} Params:{params}")

This is particularly useful for computing the FLOPs of large language models.
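
As a rough cross-check on results for large language models, a widely used rule of thumb puts the forward pass of a dense Transformer at about 2 FLOPs per parameter per token. The sketch below applies that approximation; it is not how calflops computes its result, it ignores attention-score and embedding costs, and both the helper name `approx_forward_flops` and the nominal 7e9 parameter count are assumptions made for illustration.

```python
def approx_forward_flops(num_params: float, batch_size: int, seq_len: int) -> float:
    """Back-of-the-envelope forward-pass cost: ~2 FLOPs per parameter per token."""
    tokens = batch_size * seq_len
    return 2.0 * num_params * tokens


# Nominal 7-billion-parameter model (assumed figure), input_shape=(1, 128)
estimate = approx_forward_flops(num_params=7e9, batch_size=1, seq_len=128)
print(f"~{estimate / 1e12:.2f} TFLOPs")
```

If the value reported by calflops differs from this estimate by an order of magnitude, the input shape or the model name is worth double-checking.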

Detailed output

calflops can print detailed results for each submodule:

flops, macs, params = calculate_flops(model, 
                                      input_shape=(1,3,224,224),
                                      print_results=True,
                                      print_detailed=True)

This prints detailed information similar to the following:

-------------------------------- Detailed Calculated FLOPs Results --------------------------------
Each module caculated is listed after its name in the following order:
params, percentage of total params, MACs, percentage of total MACs, FLOPS, percentage of total FLOPs

AlexNet(
  61.1 M = 100% Params, 715.51 MMACs = 100% MACs, 1.43 GFLOPS = 50% FLOPs
  (features): Sequential(
    2.47 M = 4.04% Params, 666.52 MMACs = 93.15% MACs, 1.33 GFLOPS = 46.58% FLOPs
    (0): Conv2d(23.3 K = 0.04% Params, 105.71 MMACs = 14.77% MACs, 211.41 MFLOPS = 7.39% FLOPs, 3, 64, kernel_size=(11, 11), stride=(4, 4), padding=(2, 2))
    (1): ReLU(0 = 0% Params, 0.29 MMACs = 0.04% MACs, 0.58 MFLOPS = 0.02% FLOPs, inplace=True)
    (2): MaxPool2d(0 = 0% Params, 0.29 MMACs = 0.04% MACs, 0.58 MFLOPS = 0.02% FLOPs, kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    ...
  )
  (avgpool): AdaptiveAvgPool2d(0 = 0% Params, 0.01 MMACs = 0% MACs, 0.03 MFLOPS = 0% FLOPs, output_size=(6, 6))
  (classifier): Sequential(
    58.63 M = 95.96% Params, 48.98 MMACs = 6.85% MACs, 97.96 MFLOPS = 3.42% FLOPs
    (0): Dropout(0 = 0% Params, 0 MACs = 0% MACs, 0 FLOPS = 0% FLOPs, p=0.5, inplace=False)
    (1): Linear(37.75 M = 61.79% Params, 37.75 MMACs = 5.28% MACs, 75.5 MFLOPS = 2.64% FLOPs, in_features=9216, out_features=4096, bias=True)
    (2): ReLU(0 = 0% Params, 0 MACs = 0% MACs, 0.01 MFLOPS = 0% FLOPs, inplace=True)
    ...
  )
)

This breakdown shows how the computational cost is distributed across the parts of the model.
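
The entry for the first classifier Linear layer (in_features=9216, out_features=4096) can be checked by hand. The sketch below assumes the common convention that a linear layer performs one multiply-accumulate per weight per sample and that one MAC counts as 2 FLOPs, with the bias additions ignored; the helper names are hypothetical and not part of calflops.

```python
def linear_macs(in_features: int, out_features: int, batch_size: int = 1) -> int:
    """One multiply-accumulate per weight, per input sample."""
    return batch_size * in_features * out_features


def linear_params(in_features: int, out_features: int, bias: bool = True) -> int:
    """Weight matrix entries plus one bias term per output feature."""
    return in_features * out_features + (out_features if bias else 0)


macs = linear_macs(9216, 4096)
flops = 2 * macs  # assumption: 1 MAC = 2 FLOPs (one multiply, one add)
params = linear_params(9216, 4096)

print(f"{params / 1e6:.2f} M params, {macs / 1e6:.2f} MMACs, {flops / 1e6:.2f} MFLOPs")
```

The rounded results, 37.75 M parameters, 37.75 MMACs, and 75.50 MFLOPs, agree with the Linear line in the listing.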

Other features

Beyond basic FLOPs calculation, calflops offers some advanced options:

Include backpropagation in the FLOPs count:

flops, macs, params = calculate_flops(model, 
                                      input_shape=(1,3,224,224),
                                      include_backPropagation=True)
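
A note on what this number likely represents: the sketch below assumes that the backward pass is not traced but estimated as a fixed multiple of the forward pass, which is how the calflops README describes its `compute_bp_factor` argument (default 2.0). Treat that parameter name, its default, and the formula as assumptions to verify against your installed version; `total_flops_with_backprop` is a hypothetical helper.

```python
def total_flops_with_backprop(forward_flops: float, bp_factor: float = 2.0) -> float:
    """Forward cost plus a backward pass estimated as bp_factor x forward."""
    return forward_flops * (1.0 + bp_factor)


forward = 1.43e9  # AlexNet forward FLOPs taken from the detailed output above
total = total_flops_with_backprop(forward)
print(f"forward + backward = {total / 1e9:.2f} GFLOPs")
```

Under this assumption a full training step costs about three times the forward pass.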

Ignore specific modules:

import torch.nn as nn  # needed for the module classes listed below

flops, macs, params = calculate_flops(model,
                                      input_shape=(1,3,224,224),
                                      ignore_modules=[nn.ReLU, nn.Dropout])

Compute the FLOPs of the model's generation process (model.generate() instead of a plain forward pass):

flops, macs, params = calculate_flops(model, 
                                      input_shape=(1,128),
                                      forward_mode="generate")