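🐼 Panda-70M

This is the official GitHub repository for Panda-70M.

Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers </br> Tsai-Shien Chen, Aliaksandr Siarohin, Willi Menapace, Ekaterina Deyneka, Hsiang-wei Chao, Byung Eun Jeon, Yuwei Fang, Hsin-Ying Lee, Jian Ren, Ming-Hsuan Yang, Sergey Tulyakov </br> Conference on Computer Vision and Pattern Recognition (CVPR) 2024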

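Introduction

Panda-70M is a large-scale dataset of 70M high-quality video-caption pairs. This repository has three parts:

Dataset Dataloading includes the csv files listing the data in Panda-70M and the code to download the dataset.
Splitting includes the code to split a long video into multiple semantically consistent short clips.
Captioning includes the video captioning model trained on Panda-70M.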
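The csv files are documented in the Dataset Dataloading section, not here. As a hedged illustration only, the sketch below assumes each row describes one source video and stores its clips as stringified Python lists: a `timestamp` column of `[start, end]` pairs in `H:MM:SS.fff` form and a parallel `caption` column. These column names, the timestamp format, the sample row, and the helpers `to_seconds` and `iter_clips` are all assumptions made for illustration; check them against the real files before relying on this.

```python
import ast
import csv
import io

# Hypothetical sample row; the real column names and layout are assumptions.
SAMPLE_CSV = '''videoID,url,timestamp,caption
abc123,https://example.com/watch?v=abc123,"[['0:00:00.000', '0:00:04.500'], ['0:00:04.500', '0:01:02.250']]","['A dog runs on a beach.', 'A man surfs a wave.']"
'''


def to_seconds(stamp: str) -> float:
    """Convert an assumed 'H:MM:SS.fff' timestamp string to seconds."""
    hours, minutes, seconds = stamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def iter_clips(csv_text: str):
    """Yield (video_id, start_s, end_s, caption) for every clip in the CSV text."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        # literal_eval parses the stringified lists without executing code.
        spans = ast.literal_eval(row["timestamp"])
        captions = ast.literal_eval(row["caption"])
        # Pair each (start, end) span with its caption, one clip at a time.
        for (start, end), caption in zip(spans, captions):
            yield row["videoID"], to_seconds(start), to_seconds(end), caption


clips = list(iter_clips(SAMPLE_CSV))
for video_id, start, end, caption in clips:
    print(f"{video_id} [{start:.3f}s, {end:.3f}s] {caption}")
```

Flattening each source video into per-clip tuples keeps the timestamps and captions aligned, which is usually the shape a downstream clip cutter or training loader wants.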

Dataset

Collection Pipeline

Download

Split	Download	# Source Videos	# Samples	Video Duration	Storage
Training (full)	link (2.01 GB)	3,779,763	70,723,513	167,000 hours	~36 TB
Training (10M)	link (381 MB)	3,755,240	10,473,922	37,000 hours	~8.0 TB
Training (2M)	link (86.5 MB)	800,000	2,400,000	7,560 hours	~1.6 TB
Validation	link (803 KB)	2,000	6,000	18.5 hours	~4.0 GB
Testing	link (803 KB)	2,000	6,000	18.5 hours	~4.0 GB
More details can be found in the Dataset Dataloading section.
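As a quick sanity check on the download table, dividing total video duration by the number of samples gives the average clip length, and dividing samples by source videos gives the average number of clips per source video. The sketch below only re-derives these averages from the table's rounded figures; the names `SPLITS` and `split_stats` are illustrative and not part of the repository.

```python
# Figures copied from the download table (durations are rounded there).
# Each entry: (source videos, samples, total video hours).
SPLITS = {
    "Training (full)": (3_779_763, 70_723_513, 167_000),
    "Training (10M)": (3_755_240, 10_473_922, 37_000),
    "Training (2M)": (800_000, 2_400_000, 7_560),
    "Validation": (2_000, 6_000, 18.5),
    "Testing": (2_000, 6_000, 18.5),
}


def split_stats(source_videos: int, samples: int, hours: float) -> tuple[float, float]:
    """Return (mean clip length in seconds, mean clips per source video)."""
    mean_clip_seconds = hours * 3600 / samples
    clips_per_video = samples / source_videos
    return mean_clip_seconds, clips_per_video


for name, (videos, samples, hours) in SPLITS.items():
    secs, per_video = split_stats(videos, samples, hours)
    print(f"{name:16s} ~{secs:5.2f} s/clip, ~{per_video:5.2f} clips/source video")
```

With the table's rounded figures, this works out to roughly 8.5 s per clip for the full training set versus about 11 to 13 s for the other splits, and about 18.7 clips per source video in the full set versus about 3 elsewhere.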

Demonstration

Video-Caption Pairs in Panda-70M

<table class="center"> <tr> <td width=33.3% style="border: none"><img src="https://yellow-cdn.veclightyear.com/835a84d5/65b9bf85-2788-4b9d-878e-cf6b5af4ac84.gif"></td> <td width=33.3% style="border: none"><img src="https://yellow-cdn.veclightyear.com/835a84d5/3332f3c8-6be4-4a93-9de9-a12cf1a4079d.gif"></td> <td width=33.3% style="border: none"><img src="https://yellow-cdn.veclightyear.com/835a84d5/6a666ffa-7ef1-4c00-be09-bec0100a7066.gif"></td> </tr> <tr style="text-align: center;"> <td width=33.3% style="border: none">A rhino and a lion are fighting in the dirt.</td> <td width=33.3% style="border: none">A person is holding a long-haired dachshund.</td> <td width=33.3% style="border: none">A rocket is launching from a launch pad.</td> </tr> </table>
<table class="center"> <tr> <td width=33.3% style="border: none"><img src="https://yellow-cdn.veclightyear.com/835a84d5/914112e6-3f71-4c71-b651-b847e932f39f.gif"></td> <td width=33.3% style="border: none"><img src="https://yellow-cdn.veclightyear.com/835a84d5/2e3153ec-85b6-459a-8ee2-f6824eadc9f0.gif"></td> <td width=33.3% style="border: none"><img src="https://yellow-cdn.veclightyear.com/835a84d5/81a9f358-bdbe-40e6-b886-79acf4d14686.gif"></td> </tr> <tr style="text-align: center;"> <td width=33.3% style="border: none">A person is kneading dough and putting jam on it.</td> <td width=33.3% style="border: none">A little boy is playing basketball in the city.</td> <td width=33.3% style="border: none">A 3D rendering of a zoo with animals and a train.</td> </tr> </table>
<table class="center"> <tr> <td width=33.3% style="border: none"><img src="https://yellow-cdn.veclightyear.com/835a84d5/5a94fc5b-d9b1-4e24-8f23-fe8f31288893.gif"></td> <td width=33.3% style="border: none"><img src="https://yellow-cdn.veclightyear.com/835a84d5/4c4bf1eb-cf7b-44cc-847c-e1381ad03c11.gif"></td> <td width=33.3% style="border: none"><img src="https://yellow-cdn.veclightyear.com/835a84d5/f576b7a6-c56c-4b67-963d-e0488c952c92.gif"></td> </tr> <tr style="text-align: center;"> <td width=33.3% style="border: none">A person in blue gloves is connecting a power supply to an injector.</td> <td width=33.3% style="border: none">A beach with waves and rocks in the foreground and a city skyline in the background.</td> <td width=33.3% style="border: none">A rally car is driving on a dirt road in the countryside while people watch from the roadside.</td> </tr> </table>

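<sup>**We will remove video samples from our dataset, GitHub, project webpage, or technical demo upon request. Please contact tsaishienchen@gmail.com to make a request.</sup>

For more samples, see here.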

Long Video Splitting and Captioning

https://github.com/snap-research/Panda-70M/assets/3857997/8144cf3d-c20c-4c18-a4bd-011451da9f9b

https://github.com/snap-research/Panda-70M/assets/3857997/b262128e-2152-41e8-873e-db2dc275c40f

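License of Panda-70M

See the license. The video samples are collected from a publicly available dataset. Users must comply with the relevant licenses to use these video samples.

Citation

If you find this project useful for your research, please cite our paper. :blush: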

@article{chen2024panda70m,
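    title   = {Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers},
    author  = {Chen, Tsai-Shien and Siarohin, Aliaksandr and Menapace, Willi and Deyneka, Ekaterina and Chao, Hsiang-wei and Jeon, Byung Eun and Fang, Yuwei and Lee, Hsin-Ying and Ren, Jian and Yang, Ming-Hsuan and Tulyakov, Sergey},
    journal = {arXiv preprint arXiv:2402.19479},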
    year    = {2024}
}