.. image:: readthedocs/_static/matchms_header.png
   :target: readthedocs/_static/matchms.png
   :align: left
   :alt: matchms

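Matchms is a versatile open-source Python package for importing, processing, cleaning, and comparing mass spectrometry data (MS/MS). It supports simple, reproducible workflows that turn raw data from common mass spectra file formats into pre- and post-processed spectral data, and it enables large-scale spectral similarity comparisons.

The software supports a range of popular spectral data formats, including mzML, mzXML, msp, metabolomics-USI, MGF, and JSON. Matchms offers a set of tools for metadata cleaning and validation, alongside basic peak filtering, to ensure data accuracy and integrity. A key feature of matchms is its ability to apply various pairwise similarity measures to compare large numbers of spectra. This includes not only the common cosine-related scores but also molecular fingerprint-based comparisons and other metadata-related assessments.

One of the strengths of matchms is its extensibility, which allows users to integrate custom similarity measures. Notable examples of spectrum similarity measures tailored for matchms are Spec2Vec and MS2DeepScore. In addition, matchms improves efficiency by using faster similarity measures for an initial pre-selection and by supporting storage of results in sparse data formats, which makes it possible to compare several hundred thousand spectra. This combination of features makes matchms a comprehensive tool for mass spectrometry data analysis.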

If you use matchms in your research, please cite the following software papers:

F. Huber, S. Verhoeven, C. Meijer, H. Spreeuw, E. M. Villanueva Castilla, C. Geng, J.J.J. van der Hooft, S. Rogers, A. Belloum, F. Diblen, J.H. Spaaks (2020). matchms - processing and similarity evaluation of mass spectrometry data. Journal of Open Source Software, 5(52), 2411, https://doi.org/10.21105/joss.02411

de Jonge NF, Hecht H, van der Hooft JJJ, Huber F. (2023). Reproducible MS/MS library cleaning pipeline in matchms. ChemRxiv. Cambridge: Cambridge Open Engage; 2023, https://doi.org/10.26434/chemrxiv-2023-l44cm

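Latest changes (matchms >= 0.18.0)

Pipeline class

To make typical matchms workflows (data import, processing, score computation) more accessible to users, matchms now offers a ``Pipeline`` class to handle complex workflows. This also allows workflows to be created, imported, exported, or modified using yaml files. See the code example below (an updated tutorial will follow soon).

Sparse scores array

We realized that many matchms-based workflows aim to compare many-to-many spectra, where not all pairs and scores are equally important. Often, for instance, the goal is to search for similar or related spectra/compounds. This also means that it is often not necessary to store (or compute) all scores. We therefore moved to a sparse handling of scores in matchms (that is: only actually computed, non-null values are stored).

.. image:: readthedocs/_static/matchms_sketch.png
   :target: readthedocs/_static/matchms_sketch.png
   :align: left
   :alt: matchms code design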
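The idea of sparse score handling can be illustrated with a small, self-contained sketch. It is not how matchms implements its scores internally; the function ``sparse_scores``, the toy precursor values, and the toy score table are all hypothetical and only serve to show the principle: a cheap pre-selection decides which pairs get the expensive score at all, and only scores above a threshold are stored.

```python
def sparse_scores(precursors_ref, precursors_query, score_fn,
                  tolerance=1.0, min_score=0.2):
    """Return {(ref_idx, query_idx): score} for pairs that pass all steps.

    Hypothetical helper for illustration only, not part of matchms.
    """
    scores = {}
    for i, mz_ref in enumerate(precursors_ref):
        for j, mz_query in enumerate(precursors_query):
            # Step 1: cheap pre-selection on the precursor m/z difference.
            if abs(mz_ref - mz_query) > tolerance:
                continue
            # Step 2: the (pretend) expensive score, computed for candidates only.
            score = score_fn(i, j)
            # Step 3: store only scores that are worth keeping.
            if score >= min_score:
                scores[(i, j)] = score
    return scores


# Toy data (assumed values): 4 reference spectra and 2 query spectra.
precursors_ref = [100.0, 100.5, 250.0, 400.0]
precursors_query = [100.2, 399.5]
toy_table = [[0.9, 0.0], [0.1, 0.0], [0.5, 0.3], [0.0, 0.7]]

computed = []  # records which pairs the expensive score was computed for


def toy_score(i, j):
    computed.append((i, j))
    return toy_table[i][j]


stored = sparse_scores(precursors_ref, precursors_query, toy_score)
print(f"pairs scored: {len(computed)}, pairs stored: {len(stored)}")
print(stored)
```

In this toy case the full score matrix would hold 8 entries, while the sparse result holds only the pairs that survived both the pre-selection and the score threshold. A dictionary keyed by index pairs is used here for readability; any sparse array format would serve the same purpose.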

Documentation for users

For more extensive documentation see our `readthedocs <https://matchms.readthedocs.io/en/latest/>`_, our `matchms introduction tutorial <https://blog.esciencecenter.nl/build-your-own-mass-spectrometry-analysis-pipeline-in-python-using-matchms-part-i-d96c718c68ee>`_ or the `user documentation <https://matchms.github.io/matchms-docs/intro.html>`_.

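Installation

Prerequisites:

- Python 3.9 - 3.12 (higher versions should work as well, but are not yet tested systematically)
- Anaconda (recommended)

We recommend installing matchms in a new virtual environment to avoid dependency clashes.

.. code-block:: console

conda create --name matchms python=3.11
conda activate matchms
conda install --channel bioconda --channel conda-forge matchms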

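matchms ecosystem -> additional functionalities

The following additional packages can complement the functionality of matchms:

- `Spec2Vec <https://github.com/iomega/spec2vec>`_ is an alternative machine-learning spectral similarity score. It can be installed with ``pip install spec2vec`` and imported with ``from spec2vec import Spec2Vec``, and it follows the same API as the scores in ``matchms.similarity``.
- `MS2DeepScore <https://github.com/matchms/ms2deepscore>`_ is a supervised, deep-learning based spectral similarity score. It can be installed with ``pip install ms2deepscore`` and imported with ``from ms2deepscore import MS2DeepScore``, and it follows the same API as the scores in ``matchms.similarity``.
- `matchmsextras <https://github.com/matchms/matchmsextras>`_ contains additional functions to create networks based on spectral similarities, to run spectrum searches against PubChem, and additional plotting methods.
- `MS2Query <https://github.com/iomega/ms2query>`_ is a reliable and fast MS/MS spectral-based analogue search that runs on top of matchms.
- `memo <https://github.com/mandelbrot-project/memo>`_ is a method for retention time (RT) agnostic alignment of metabolomics samples using the fragmentation spectra (MS2) of their constituents.
- `RIAssigner <https://github.com/RECETOX/RIAssigner>`_ is a tool for retention index calculation for gas chromatography - mass spectrometry (GC-MS) data.
- `MSMetaEnhancer <https://github.com/RECETOX/MSMetaEnhancer>`_ is a Python package that collects mass spectral library metadata using various web services and computational chemistry packages.
- `SimMS <https://github.com/PangeAI/SimMS>`_ is a Python package with fast GPU-based implementations of common similarity classes such as ``CudaCosineGreedy`` and ``CudaModifiedCosine``.

(If you know of other packages that are fully compatible with matchms, let us know!)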

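Introduction

To get started with matchms, we recommend following our `matchms introduction tutorial <https://blog.esciencecenter.nl/build-your-own-mass-spectrometry-analysis-pipeline-in-python-using-matchms-part-i-d96c718c68ee>`_.

Below is an example that uses the default filter steps to clean spectra and then calculates the cosine scores between the mass spectra in the `tests/testdata/pesticides.mgf <https://github.com/matchms/matchms/blob/master/tests/testdata/pesticides.mgf>`_ file.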

.. code-block:: python

from matchms.Pipeline import Pipeline, create_workflow

workflow = create_workflow(
    yaml_file_name="my_config_file.yaml",  # The workflow will be stored in a yaml file, which can be used to rerun the workflow or to share it with others.
    score_computations=[["cosinegreedy", {"tolerance": 1.0}]],
    )
pipeline = Pipeline(workflow)
pipeline.logging_file = "my_pipeline.log"  # for pipeline and logging messages
pipeline.run("tests/testdata/pesticides.mgf")

Below is a more advanced code example showing how you can build a specific pipeline for your needs.

.. code-block:: python

import os
from matchms.Pipeline import Pipeline, create_workflow
from matchms.filtering.default_pipelines import DEFAULT_FILTERS, LIBRARY_CLEANING

results_folder = "./results"
os.makedirs(results_folder, exist_ok=True)

workflow = create_workflow(
    yaml_file_name=os.path.join(results_folder, "my_config_file.yaml"),  # The workflow will be stored in a yaml file.
    query_filters=DEFAULT_FILTERS,
    reference_filters=LIBRARY_CLEANING + ["add_fingerprint"],
    score_computations=[["precursormzmatch", {"tolerance": 100.0}],
                        ["cosinegreedy", {"tolerance": 1.0}],
                        ["filter_by_range", {"name": "CosineGreedy_score", "low": 0.2}]],
)
pipeline = Pipeline(workflow)
pipeline.logging_file = os.path.join(results_folder, "my_pipeline.log")  # for pipeline and logging messages
pipeline.logging_level = "WARNING"  # defines the verbosity of the logging
pipeline.run("tests/testdata/pesticides.mgf", "my_reference_library.mgf",
             cleaned_query_file=os.path.join(results_folder, "cleaned_query_spectra.mgf"),
             cleaned_reference_file=os.path.join(results_folder,
                                                 "cleaned_library_spectra.mgf"))  # choose your own files

Alternatively, in particular if you need more room to add custom functions and steps, the individual steps can be run without using the matchms ``Pipeline``:

.. code-block:: python

from matchms.importing import load_from_mgf
from matchms.filtering import default_filters, normalize_intensities
from matchms import calculate_scores
from matchms.similarity import CosineGreedy

# Read spectra from an MGF formatted file, for other formats see https://matchms.readthedocs.io/en/latest/api/matchms.importing.html
file = load_from_mgf("tests/testdata/pesticides.mgf")

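# Apply filters to clean and enhance each spectrum
spectra = []
for spectrum in file:
    # Apply the default filters to standardize ion mode, correct charge and more.
    # The default filters are fully explained at https://matchms.readthedocs.io/en/latest/api/matchms.filtering.html .
    spectrum = default_filters(spectrum)
    # Scale peak intensities to a maximum of 1
    spectrum = normalize_intensities(spectrum)
    spectra.append(spectrum)

# Calculate cosine similarity scores between all spectra
# For other similarity score methods see https://matchms.readthedocs.io/en/latest/api/matchms.similarity.html .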
scores = calculate_scores(references=spectra,
                          queries=spectra,
                          similarity_function=CosineGreedy())

# Matchms can return the best matches for any query using scores_by_query
query = spectra[15]  # just an example
best_matches = scores.scores_by_query(query, 'CosineGreedy_score', sort=True)

# Print the calculated scores for each spectrum pair
for (reference, score) in best_matches[:10]:
    # Ignore scores between identical spectra
    if reference is not query:
        print(f"Reference scan id: {reference.metadata['scans']}")
        print(f"Query scan id: {query.metadata['scans']}")
        print(f"Score: {score[0]:.4f}")
        print(f"Number of matching peaks: {score[1]}")
        print("----------------------------")
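To make the two numbers printed above (score and number of matching peaks) easier to interpret, here is a minimal sketch of a greedy cosine score in plain Python. It is a simplified illustration and not the matchms implementation: the function ``cosine_greedy_sketch`` and the toy peak lists are hypothetical, peaks are plain ``(mz, intensity)`` tuples, and options such as m/z or intensity weighting are left out.

```python
import math


def cosine_greedy_sketch(peaks1, peaks2, tolerance=0.1):
    """Return (score, n_matches) for two lists of (mz, intensity) tuples.

    Hypothetical helper for illustration only, not part of matchms.
    """
    # Collect every pair of peaks whose m/z values differ by at most `tolerance`.
    candidates = []
    for i, (mz1, int1) in enumerate(peaks1):
        for j, (mz2, int2) in enumerate(peaks2):
            if abs(mz1 - mz2) <= tolerance:
                candidates.append((int1 * int2, i, j))

    # Greedy assignment: largest intensity products first, each peak used once.
    candidates.sort(reverse=True)
    used1, used2 = set(), set()
    total, n_matches = 0.0, 0
    for product, i, j in candidates:
        if i not in used1 and j not in used2:
            used1.add(i)
            used2.add(j)
            total += product
            n_matches += 1

    # Normalize by the lengths of both intensity vectors.
    norm1 = math.sqrt(sum(intensity ** 2 for _, intensity in peaks1))
    norm2 = math.sqrt(sum(intensity ** 2 for _, intensity in peaks2))
    if norm1 == 0 or norm2 == 0:
        return 0.0, 0
    return total / (norm1 * norm2), n_matches


# Toy spectra (assumed values): two peaks agree within the tolerance, one does not.
spectrum_a = [(100.0, 1.0), (150.0, 0.5), (200.0, 0.2)]
spectrum_b = [(100.05, 1.0), (150.02, 0.5), (300.0, 0.8)]

score, n_matches = cosine_greedy_sketch(spectrum_a, spectrum_b)
print(f"Score: {score:.4f}, matching peaks: {n_matches}")
```

The greedy strategy is fast but does not guarantee the best possible peak assignment when several peaks compete for the same partner; it is shown here only because it is easy to follow.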

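Different spectrum similarity scores

Matchms comes with a number of different scoring methods in ``matchms.similarity``, which can be complemented by scores from external packages such as Spec2Vec or MS2DeepScore.

Code example: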

.. code-block:: python

from matchms.importing import load_from_usi
import matchms.filtering as msfilters
import matchms.similarity as mssim


usi1 = "mzspec:GNPS:GNPS-LIBRARY:accession:CCMSLIB00000424840"
usi2 = "mzspec:MSV000086109:BD5_dil2x_BD5_01_57213:scan:760"

mz_tolerance = 0.1

spectrum1 = load_from_usi(usi1)
spectrum1 = msfilters.select_by_mz(spectrum1, 0, spectrum1.get("precursor_mz"))
spectrum1 = msfilters.remove_peaks_around_precursor_mz(spectrum1,
                                                       mz_tolerance=0.1)

spectrum2 = load_from_usi(usi2)
spectrum2 = msfilters.select_by_mz(spectrum2, 0, spectrum2.get("precursor_mz"))
spectrum2 = msfilters.remove_peaks_around_precursor_mz(spectrum2,
                                                       mz_tolerance=0.1)
# Compute scores:
similarity_cosine = mssim.CosineGreedy(tolerance=mz_tolerance).pair(spectrum1, spectrum2)
similarity_modified_cosine = mssim.ModifiedCosine(tolerance=mz_tolerance).pair(spectrum1, spectrum2)
similarity_neutral_losses = mssim.NeutralLossesCosine(tolerance=mz_tolerance).pair(spectrum1, spectrum2)

print(f"Cosine similarity: {similarity_cosine}")
print(f"Modified cosine similarity: {similarity_modified_cosine}")
print(f"Neutral losses cosine similarity: {similarity_neutral_losses}")

spectrum1.plot_against(spectrum2)
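The difference between a plain cosine score and a modified cosine score can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not the matchms implementation: the function ``modified_cosine_sketch``, the toy peak lists, and the toy precursor values are hypothetical. The sketch assumes that two peaks may match either directly or after shifting by the precursor m/z difference, which lets fragments that carry a modification still contribute to the score.

```python
import math


def modified_cosine_sketch(peaks1, precursor1, peaks2, precursor2, tolerance=0.1):
    """Return (score, n_matches); peaks are lists of (mz, intensity) tuples.

    Hypothetical helper for illustration only, not part of matchms.
    """
    shift = precursor1 - precursor2
    candidates = []
    for i, (mz1, int1) in enumerate(peaks1):
        for j, (mz2, int2) in enumerate(peaks2):
            direct = abs(mz1 - mz2) <= tolerance
            shifted = abs(mz1 - mz2 - shift) <= tolerance
            # A pair qualifies if it matches directly or after the mass shift.
            if direct or shifted:
                candidates.append((int1 * int2, i, j))

    # Greedy assignment: largest intensity products first, each peak used once.
    candidates.sort(reverse=True)
    used1, used2 = set(), set()
    total, n_matches = 0.0, 0
    for product, i, j in candidates:
        if i not in used1 and j not in used2:
            used1.add(i)
            used2.add(j)
            total += product
            n_matches += 1

    norm1 = math.sqrt(sum(intensity ** 2 for _, intensity in peaks1))
    norm2 = math.sqrt(sum(intensity ** 2 for _, intensity in peaks2))
    if norm1 == 0 or norm2 == 0:
        return 0.0, 0
    return total / (norm1 * norm2), n_matches


# Toy example (assumed values): the second compound is 14 m/z units heavier,
# and one of its fragments carries that extra mass.
compound = [(100.0, 1.0), (250.0, 0.8)]
analogue = [(100.0, 1.0), (264.0, 0.8)]

# With equal precursors no shift is applied, so this behaves like a plain cosine.
plain_score, plain_matches = modified_cosine_sketch(compound, 300.0, analogue, 300.0)
# With the true precursors the shifted fragment pair also matches.
mod_score, mod_matches = modified_cosine_sketch(compound, 300.0, analogue, 314.0)

print(f"Without shift: {plain_score:.4f} ({plain_matches} matching peak(s))")
print(f"With shift:    {mod_score:.4f} ({mod_matches} matching peak(s))")
```

In the toy data only the shared fragment at m/z 100 matches without the shift, while the shifted comparison also pairs the fragments at m/z 250 and 264.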

Documentation for developers

Installation

To install matchms, do:

.. code-block:: console

git clone https://github.com/matchms/matchms.git
cd matchms
conda create --name matchms-dev python=3.11
conda activate matchms-dev
# Install rdkit using conda, the rest of the dependencies can be installed with pip
conda install -c conda-forge rdkit
python -m pip install --upgrade pip
pip install --editable .[dev]
# If that does not work, try "poetry install --with dev"

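Run the linter with:

.. code-block:: console

prospector

Automatically fix incorrectly sorted imports:

.. code-block:: console

isort .

Files will be changed in place and need to be committed manually. If you only want to see what isort would suggest, simply run:

.. code-block:: console

isort --check-only --diff .

Run the tests (including coverage) with:

.. code-block:: console

pytest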

Conda package

The conda packaging is handled by a `recipe at Bioconda <https://github.com/bioconda/bioconda-recipes/blob/master/recipes/matchms/meta.yaml>`_.

Publishing to PyPI will trigger the creation of a `pull request on the bioconda recipes repository <https://github.com/bioconda/bioconda-recipes/pulls?q=is%3Apr+is%3Aopen+matchms>`_. Once the PR is merged, the new version of matchms will appear on `https://anaconda.org/bioconda/matchms <https://anaconda.org/bioconda/matchms>`_.