TF-ID-large-no-caption

TF-ID-large-no-caption Project Introduction

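Project Overview

TF-ID (Table/Figure IDentifier) is a family of object detection models created by Yifei Hu for extracting tables and figures from academic papers. The project focuses on locating and extracting the tables and figures in academic papers, which helps readers analyze and understand the data visualizations in scholarly literature. TF-ID comes in four model versions (TF-ID-base, TF-ID-large, TF-ID-base-no-caption and TF-ID-large-no-caption), and TF-ID-large-no-caption is one of the recommended ones.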

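Model Details

TF-ID-large-no-caption is fine-tuned from Microsoft's Florence-2-large model. This version extracts tables and figures from academic papers, but its bounding boxes exclude the captions that describe them. It suits applications that need only the graphical content and not its accompanying text.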

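Model Specifications

Model size: 0.77B parameters
Function: extracts tables and figures, excluding any caption text
Recommendation: the large versions have consistently been the recommended choice because of their accuracy and effectiveness.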

Input and Output

Input: an image of a single paper page
Output: the bounding boxes of all tables and figures on that page
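The returned boxes are typically used to crop each table or figure out of the page image. Below is a minimal sketch, assuming the result has the Florence-2 "<OD>" shape {"<OD>": {"bboxes": [[x1, y1, x2, y2], ...], "labels": [...]}} with pixel coordinates, and that labels are strings such as "table" and "figure". The helper name to_crop_boxes and the example values are hypothetical. The helper rounds each box outward to whole pixels, clamps it to the page, and groups the boxes by label.

```python
import math
from collections import defaultdict


def to_crop_boxes(parsed_answer, width, height):
    """Group detected boxes by label as integer (left, top, right, bottom) tuples.

    Assumes the Florence-2 "<OD>" result layout (an assumption, not confirmed
    by the TF-ID docs): {"<OD>": {"bboxes": [[x1, y1, x2, y2], ...], "labels": [...]}}
    """
    result = parsed_answer["<OD>"]
    groups = defaultdict(list)
    for (x1, y1, x2, y2), label in zip(result["bboxes"], result["labels"]):
        # Round outward so the crop never cuts into the detected region
        left = max(0, math.floor(x1))
        top = max(0, math.floor(y1))
        right = min(width, math.ceil(x2))
        bottom = min(height, math.ceil(y2))
        # Skip boxes that are empty or lie entirely outside the page
        if right > left and bottom > top:
            groups[label].append((left, top, right, bottom))
    return dict(groups)


# Hypothetical result for a 1000 x 1400 page, for illustration only
example = {"<OD>": {"bboxes": [[100.4, 200.6, 900.2, 600.9], [50.0, 800.0, 1005.0, 1300.5]],
                    "labels": ["table", "figure"]}}
print(to_crop_boxes(example, 1000, 1400))
```

Each tuple has the (left, top, right, bottom) order that Pillow's Image.crop expects, so a region can be saved with something like image.crop(box).save("table_0.png").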

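Training Data and Code

The TF-ID models are trained on papers from Hugging Face's Daily Papers. All bounding boxes were manually annotated and checked by humans to ensure the data is accurate and reliable.

Dataset: yifeihu/TF-ID-arxiv-papers
Code: all related code is available on GitHub at https://github.com/ai8hyf/TF-ID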

Performance

On pages from papers outside the training data, TF-ID-large-no-caption achieved the following success rate:

Total test images: 261
Correct outputs: 254
Success rate: 97.32%
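The success rate is the number of correct outputs divided by the total number of test pages. A small sketch of that arithmetic (the function name success_rate is only illustrative):

```python
def success_rate(correct, total):
    """Percentage of test pages whose output was judged correct."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * correct / total


rate = success_rate(254, 261)
print(f"{rate:.2f}%")  # → 97.32%
```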

Getting Started

To use the model, refer to the following Python example:

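import requests
from PIL import Image
from transformers import AutoProcessor, AutoModelForCausalLM

# trust_remote_code=True lets transformers load the Florence-2 model and processor code stored in the Hub repository
model = AutoModelForCausalLM.from_pretrained("yifeihu/TF-ID-large-no-caption", trust_remote_code=True)
processor = AutoProcessor.from_pretrained("yifeihu/TF-ID-large-no-caption", trust_remote_code=True)

# "<OD>" is the Florence-2 object-detection task prompt
prompt = "<OD>"
# Sample paper page hosted in the TF-ID-base repository
url = "https://huggingface.co/yifeihu/TF-ID-base/resolve/main/arxiv_2305_10853_5.png?download=true"
response = requests.get(url, stream=True)
response.raise_for_status()
# Convert to RGB in case the PNG carries an alpha channel
image = Image.open(response.raw).convert("RGB")

inputs = processor(text=prompt, images=image, return_tensors="pt")
# Deterministic beam-search decoding
generated_ids = model.generate(
    input_ids=inputs["input_ids"],
    pixel_values=inputs["pixel_values"],
    max_new_tokens=1024,
    do_sample=False,
    num_beams=3
)

# Keep special tokens so the <loc_*> coordinate tokens survive decoding
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
# Turn the location tokens into pixel-space boxes for this image size
parsed_answer = processor.post_process_generation(generated_text, task="<OD>", image_size=(image.width, image.height))
print(parsed_answer)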

With this, users can get started quickly and apply the model to extracting graphical content from academic papers.
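In the example above, processor.post_process_generation converts the raw generated_text into pixel-space boxes. In Florence-2, which TF-ID is fine-tuned from, a detection is understood to be emitted as a label followed by four <loc_N> tokens (x1, y1, x2, y2), where N is a bin index from 0 to 999 along the image width or height. The sketch below is a simplified re-implementation of that conversion under those assumptions, meant only to show what the output means; it is not the processor's actual code, and parse_od_text and the sample string are hypothetical. In practice, call post_process_generation as shown above.

```python
import re

BINS = 1000  # assumed: Florence-2 quantizes each coordinate into 1000 bins

# A label (text without angle brackets) followed by one or more <loc_N> tokens
PHRASE_RE = re.compile(r"([^<>]+)((?:<loc_\d+>)+)")
LOC_RE = re.compile(r"<loc_(\d+)>")


def parse_od_text(text, width, height):
    """Turn 'label<loc_x1><loc_y1><loc_x2><loc_y2>...' into pixel boxes.

    Simplified illustration only; not the Florence-2 processor implementation.
    """
    # Drop sequence markers that remain when skip_special_tokens=False
    for token in ("</s>", "<s>", "<pad>"):
        text = text.replace(token, "")
    bin_w, bin_h = width / BINS, height / BINS
    bboxes, labels = [], []
    for label, locs in PHRASE_RE.findall(text):
        values = [int(v) for v in LOC_RE.findall(locs)]
        # Every 4 consecutive location tokens form one box; leftovers are ignored
        for i in range(0, len(values) - len(values) % 4, 4):
            x1, y1, x2, y2 = values[i:i + 4]
            # Dequantize: map each bin index to the centre of its bin
            bboxes.append([(x1 + 0.5) * bin_w, (y1 + 0.5) * bin_h,
                           (x2 + 0.5) * bin_w, (y2 + 0.5) * bin_h])
            labels.append(label.strip())
    return {"<OD>": {"bboxes": bboxes, "labels": labels}}


# Hypothetical decoded output for a 1000 x 2000 page
raw = "</s><s>table<loc_100><loc_200><loc_899><loc_499>figure<loc_0><loc_600><loc_999><loc_999></s>"
print(parse_od_text(raw, 1000, 2000))
```

Mapping each bin to its centre (the + 0.5) keeps the reconstructed coordinates within half a bin of the original ones.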

Citation

To cite this project, please use the following entry:

@misc{TF-ID,
  author = {Yifei Hu},
  title = {TF-ID: Table/Figure IDentifier for academic papers},
  year = {2024},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/ai8hyf/TF-ID}},
}

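With TF-ID-large-no-caption, users can efficiently automate the extraction of tables and figures from academic papers, making scholarly content easier to understand and analyze.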