mae_st

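Masked Autoencoders As Spatiotemporal Learners: A PyTorch Implementation

This is a PyTorch/GPU re-implementation of the paper Masked Autoencoders As Spatiotemporal Learners: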

@Article{MaskedAutoencodersSpatiotemporal2022,
  author  = {Christoph Feichtenhofer and Haoqi Fan and Yanghao Li and Kaiming He},
  journal = {arXiv:2205.09113},
  title   = {Masked Autoencoders As Spatiotemporal Learners},
  year    = {2022},
}

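Another implementation, which supports downstream evaluation on AVA and SSv2, is available in PySlowFast.

This repo is a modification of the MAE repo. For installation and preparation, see INSTALL.md.
This repo is based on timm==0.3.2, which needs a fix to work with PyTorch 1.8.1+.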
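As an illustration of the kind of fix involved: a commonly reported incompatibility is that timm 0.3.2 imports `container_abcs` from `torch._six`, which newer PyTorch releases no longer provide. The sketch below shows a fallback import together with a small tuple helper modeled on timm's. Whether this is exactly the fix this repo refers to is an assumption; follow INSTALL.md for the authoritative steps.

```python
from itertools import repeat

# Fallback import: older PyTorch exposes container_abcs via torch._six,
# newer releases do not, so fall back to the standard library.
# (Assumption: this is the incompatibility the required fix addresses.)
try:
    from torch._six import container_abcs
except ImportError:
    import collections.abc as container_abcs


def _ntuple(n):
    """Return a function that expands a scalar into an n-tuple."""
    def parse(x):
        # Iterables (e.g. an existing tuple) are passed through unchanged.
        if isinstance(x, container_abcs.Iterable):
            return x
        return tuple(repeat(x, n))
    return parse


to_2tuple = _ntuple(2)
patch_size = to_2tuple(16)  # a scalar patch size becomes (16, 16)
```

With the fallback in place, the helper behaves the same on old and new PyTorch versions, and also runs when PyTorch is not installed.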

Visualization demo

Visualization of MAE output on the same video with a 95% (left) and a 98% (right) mask ratio.
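To give a sense of how few tokens remain visible at these mask ratios, here is a minimal sketch of random spacetime masking using only the standard library. The clip size (16 frames at 224x224), the 2x16x16 patch size, and the helper name `random_spacetime_mask` are illustrative assumptions, not values taken from this repo's code.

```python
import random


def random_spacetime_mask(num_tokens, mask_ratio, seed=None):
    """Randomly split token indices into (kept, masked) index lists."""
    rng = random.Random(seed)
    # Number of visible tokens, truncated toward zero.
    len_keep = int(num_tokens * (1 - mask_ratio))
    order = list(range(num_tokens))
    rng.shuffle(order)
    keep = sorted(order[:len_keep])
    masked = sorted(order[len_keep:])
    return keep, masked


# Assumed token grid: 16 frames of 224x224 pixels, patches of 2x16x16.
frames, height, width = 16, 224, 224
t_patch, s_patch = 2, 16
num_tokens = (frames // t_patch) * (height // s_patch) * (width // s_patch)

for ratio in (0.90, 0.95, 0.98):
    keep, masked = random_spacetime_mask(num_tokens, ratio, seed=0)
    print(f"mask ratio {ratio:.2f}: {len(keep)} visible of {num_tokens} tokens")
```

Under these assumptions the encoder only sees a few dozen to a couple of hundred tokens per clip, which is why such high mask ratios reduce encoder compute so strongly.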

Run our interactive visualization demo using the Colab notebook (no GPU needed).

Fine-tuning with pre-trained checkpoints

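The following tables provide the pre-trained checkpoints used in the paper. They were pre-trained with a 90% mask ratio for 1600 effective epochs and converted from the PySlowFast codebase: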

<table><tbody>
<th valign="bottom"></th> <th valign="bottom">ViT-Large</th> <th valign="bottom">ViT-Huge</th>
<tr><td align="left">Kinetics-400 pre-trained checkpoint</td> <td align="center"><a href="https://dl.fbaipublicfiles.com/video-mae/pretrain/mae_pretrain_vit_large_k400.pth">download</a></td> <td align="center"><a href="https://dl.fbaipublicfiles.com/video-mae/pretrain/mae_pretrain_vit_huge_k400.pth">download</a></td> </tr>
<tr><td align="left">md5</td> <td align="center"><tt>edf3a5</tt></td> <td align="center"><tt>3d7f64</tt></td> </tr>
</tbody></table>
<table><tbody>
<th valign="bottom"></th> <th valign="bottom">ViT-Large</th> <th valign="bottom">ViT-Huge</th>
<tr><td align="left">Kinetics-600 pre-trained checkpoint</td> <td align="center"><a href="https://dl.fbaipublicfiles.com/video-mae/pretrain/mae_pretrain_vit_large_k600.pth">download</a></td> <td align="center"><a href="https://dl.fbaipublicfiles.com/video-mae/pretrain/mae_pretrain_vit_huge_k600.pth">download</a></td> </tr>
<tr><td align="left">md5</td> <td align="center"><tt>9a9645</tt></td> <td align="center"><tt>27495e</tt></td> </tr>
</tbody></table>
<table><tbody>
<th valign="bottom"></th> <th valign="bottom">ViT-Large</th> <th valign="bottom">ViT-Huge</th>
<tr><td align="left">Kinetics-700 pre-trained checkpoint</td> <td align="center"><a href="https://dl.fbaipublicfiles.com/video-mae/pretrain/mae_pretrain_vit_large_k700.pth">download</a></td> <td align="center"><a href="https://dl.fbaipublicfiles.com/video-mae/pretrain/mae_pretrain_vit_huge_k700.pth">download</a></td> </tr>
<tr><td align="left">md5</td> <td align="center"><tt>cdbada</tt></td> <td align="center"><tt>4c4e3c</tt></td> </tr>
</tbody></table>
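The md5 rows above list short six-character values, so a downloaded checkpoint can be sanity-checked by comparing them with the start of the file's MD5 digest. The sketch below assumes the listed values are digest prefixes; the helper name `md5_prefix_matches` is hypothetical and not part of this repo.

```python
import hashlib


def md5_prefix_matches(path, expected_prefix, chunk_size=1 << 20):
    """Return True if the file's MD5 hex digest starts with expected_prefix."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read in chunks so multi-gigabyte checkpoints do not fill memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest().startswith(expected_prefix.lower())
```

For example, after downloading the Kinetics-400 ViT-Large checkpoint into the current directory, `md5_prefix_matches("mae_pretrain_vit_large_k400.pth", "edf3a5")` should return `True` if the download is intact and the prefix assumption holds.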

See FINETUNE.md for fine-tuning instructions.