Virchow

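Introduction to the Virchow Project

Project Overview

Virchow is a self-supervised vision transformer developed jointly by Paige and Microsoft Research for extracting features from large collections of pathology images. It was pretrained on 1.5 million whole-slide pathology images and achieves state-of-the-art results across a range of downstream computational pathology applications. Virchow can be used as a tile-level feature extractor, either frozen or fine-tuned.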

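Model Details

Developed by: Paige (New York City, USA) and Microsoft Research (Cambridge, USA)
Model type: Image feature backbone
Model stats:
- Parameters: 632 million
- Image size: 224 x 224 pixels
Model architecture:
- Architecture: ViT-H/14
- Patch size: 14
- Layers: 32
- Embedding dimension: 1280
- Activation function: SwiGLU
- Attention heads: 16
- LayerScale: enabled
Training details:
- Precision: mixed precision (fp16)
- Objective: DINOv2
Pretraining dataset: 1.5 million high-resolution whole-slide pathology images from an internal Memorial Sloan Kettering Cancer Center dataset (0.5 microns per pixel, 20x magnification).
License: Apache 2.0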
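
The figures above determine the token layout and the physical size of one input tile. A small sketch of that arithmetic, assuming non-overlapping patches and a single class token (the variable names are illustrative and not part of any API):

```python
# Values taken from the model details above.
image_size = 224         # pixels per side
patch_size = 14          # pixels per side of one patch
microns_per_pixel = 0.5  # pretraining resolution (20x magnification)

patches_per_side = image_size // patch_size      # 16
num_patch_tokens = patches_per_side ** 2         # 256
num_tokens = num_patch_tokens + 1                # plus one class token -> 257
tile_width_um = image_size * microns_per_pixel   # 112.0 microns of tissue per side

print(patches_per_side, num_patch_tokens, num_tokens, tile_width_um)
# prints: 16 256 257 112.0
```

The 256 patch tokens match the 256 x 1280 patch tensor the model returns for one tile.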

Using the Model

Requirements

The following software is required:
- PyTorch (2.0 or later recommended)
- timm (>= 0.9.11)
- huggingface_hub
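
One way to confirm that an environment meets these requirements is to query the installed package metadata. The sketch below uses only the standard library; `parse_version` and `check_requirements` are hypothetical helpers written for this example, and the version parsing is deliberately simplistic (it ignores pre-release ordering).

```python
from importlib.metadata import PackageNotFoundError, version

# Minimum versions from the list above; None means "any version".
REQUIREMENTS = {"torch": (2, 0, 0), "timm": (0, 9, 11), "huggingface_hub": None}


def parse_version(text):
    """Turn a string such as '2.1.0+cu121' into the tuple (2, 1, 0)."""
    parts = []
    for piece in text.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at suffixes such as 'dev0'
        parts.append(int(digits))
    while len(parts) < 3:
        parts.append(0)  # pad '2.0' to (2, 0, 0)
    return tuple(parts)


def check_requirements(requirements=REQUIREMENTS):
    """Return a {package: status} report for each requirement."""
    report = {}
    for name, minimum in requirements.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        if minimum is not None and parse_version(installed) < minimum:
            report[name] = f"too old ({installed})"
        else:
            report[name] = f"ok ({installed})"
    return report


if __name__ == "__main__":
    for name, status in check_requirements().items():
        print(f"{name}: {status}")
```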

Login

Before using the model, log in to the Hugging Face platform in one of the following ways.

From the command line:

huggingface-cli login

From Python code:

from huggingface_hub import login
login()

For more details, see the official Hugging Face documentation.
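
For non-interactive settings such as batch jobs, the token can be passed to `login` explicitly instead of being typed at a prompt. A minimal sketch, assuming the token has been stored in an environment variable beforehand; `login_from_env` is a hypothetical helper written for this example, while `login(token=...)` is the `huggingface_hub` call already shown above.

```python
import os


def login_from_env(var_name="HF_TOKEN"):
    """Log in with a token read from the environment; return False if it is unset."""
    token = os.environ.get(var_name)
    if not token:
        return False
    # Imported lazily so the helper can be defined without huggingface_hub installed.
    from huggingface_hub import login

    login(token=token)
    return True
```

Calling `login_from_env()` at the start of a script returns `False` when no token is configured, so the caller can fall back to the interactive login.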

Image Embeddings

The following is a basic outline for extracting image feature embeddings with the Virchow model:

import timm
import torch
from timm.data import resolve_data_config
from timm.data.transforms_factory import create_transform
from timm.layers import SwiGLUPacked
from PIL import Image

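# The MLP layer and activation function are given explicitly so the architecture matches the checkpoint
model = timm.create_model("hf-hub:paige-ai/Virchow", pretrained=True, mlp_layer=SwiGLUPacked, act_layer=torch.nn.SiLU)
model = model.eval()

# Build the preprocessing pipeline from the model's pretrained configuration
transforms = create_transform(**resolve_data_config(model.pretrained_cfg, model=model))

image = Image.open("/path/to/your/image.png")
image = transforms(image).unsqueeze(0)  # add a batch dimension: 1 x 3 x 224 x 224

output = model(image)  # 1 x 257 x 1280

class_token = output[:, 0]  # 1 x 1280
patch_tokens = output[:, 1:]  # 1 x 256 x 1280

# Concatenate the class token with the mean patch token: 1 x 2560
embedding = torch.cat([class_token, patch_tokens.mean(1)], dim=-1)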

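In resource-constrained environments, you can try using only the class token or only the mean of the patch tokens. For downstream tasks that need dense outputs, such as segmentation, you can use the 256 x 1280 tensor of patch tokens.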
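
These alternatives can be sketched on a stand-in tensor. The example uses random values with the same shape as the model output so that it runs without downloading the weights. The 16 x 16 grid reshape assumes the patch tokens are stored in row-major order, as is usual for a ViT patch embedding; treat that as an assumption to verify for your setup.

```python
import torch

# Stand-in for `output = model(image)`: one tile, 257 tokens, 1280 channels.
output = torch.randn(1, 257, 1280)

class_token = output[:, 0]    # 1 x 1280
patch_tokens = output[:, 1:]  # 1 x 256 x 1280

# Lighter alternatives to the concatenated 2560-dimensional embedding.
cls_only = class_token                 # 1 x 1280
patch_mean = patch_tokens.mean(dim=1)  # 1 x 1280

# Dense layout for segmentation-style heads: 256 tokens -> 16 x 16 grid,
# moved to channels-first as convolutional decoders usually expect.
grid = patch_tokens.reshape(1, 16, 16, 1280).permute(0, 3, 1, 2)  # 1 x 1280 x 16 x 16

print(cls_only.shape, patch_mean.shape, grid.shape)
```

Either 1280-dimensional vector halves the storage per tile compared with the concatenated embedding, at the cost of discarding the other summary.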

Running the model on a GPU with mixed precision (fp16) is strongly recommended:

model = model.to("cuda")
image = image.to("cuda")

# Disable autograd tracking and run the forward pass under fp16 autocast
with torch.inference_mode(), torch.autocast(device_type="cuda", dtype=torch.float16):
    output = model(image)

class_token = output[:, 0]
patch_tokens = output[:, 1:]

embedding = torch.cat([class_token, patch_tokens.mean(1)], dim=-1)

# Optional: cast the embedding to fp16 to reduce memory in downstream use
embedding = embedding.to(torch.float16)
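
When embedding many tiles, the steps above can be wrapped in a single function that also falls back to the CPU when no GPU is present. This is a sketch under stated assumptions, not part of the Virchow or timm API: `embed_tiles` is a hypothetical helper, it expects tiles that have already been preprocessed with `transforms`, and on the CPU it skips autocast and runs in full precision.

```python
from contextlib import nullcontext

import torch


def embed_tiles(model, tiles, device=None):
    """Embed a batch of preprocessed tiles (B x 3 x 224 x 224) into B x 2560."""
    device = torch.device(device or ("cuda" if torch.cuda.is_available() else "cpu"))
    model = model.to(device).eval()
    tiles = tiles.to(device)

    # fp16 autocast on GPU only; a no-op context manager on CPU.
    if device.type == "cuda":
        amp = torch.autocast(device_type="cuda", dtype=torch.float16)
    else:
        amp = nullcontext()

    with torch.inference_mode(), amp:
        output = model(tiles)  # B x 257 x 1280

    class_token = output[:, 0]    # B x 1280
    patch_tokens = output[:, 1:]  # B x 256 x 1280
    embedding = torch.cat([class_token, patch_tokens.mean(1)], dim=-1)
    return embedding.float().cpu()  # B x 2560 in fp32 on the CPU
```

With the `model` and `image` from the earlier example, `embedding = embed_tiles(model, image)` would return a 1 x 2560 tensor. Stacking several preprocessed tiles with `torch.stack` before the call processes them in one forward pass.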