Non-targeted metabolomics has emerged as a powerful tool for precision medicine and biomarker discovery, yet it is often hampered by the difficulty of identifying compounds from tandem mass spectra (MS/MS). A significant portion of mass spectra remains unexplained because spectral reference libraries are incomplete. To address this gap, researchers have introduced FIORA, an open-source graph neural network (GNN) that simulates tandem mass spectra more accurately than earlier fragmentation algorithms.
High-quality spectral libraries are fundamental to metabolomics, which analyzes metabolites within biological systems, typically through liquid chromatography-mass spectrometry (LC-MS). This method separates and analyzes compounds based on their physical properties and mass. Subsequent fragmentation of these compounds generates product ions, which appear as distinct peaks in the mass spectrum and serve as fingerprints for identification. Despite the advances made since the introduction of MS/MS techniques, significant challenges persist. For example, da Silva et al. (2015) reported that only about 34% of MS/MS spectra from non-targeted studies could be annotated, owing to the limitations of spectral libraries. The unannotated remainder is often referred to as ‘dark matter’ within the field, as vast numbers of signals from unidentified species remain unexplored.
Efforts to improve compound identification have produced a range of algorithms, but traditional methods achieve limited identification rates, and community challenges such as CASMI (Critical Assessment of Small Molecule Identification) have highlighted persistent gaps. Between 2016 and 2022, these challenges showed identification rates consistently below 30%, underscoring the need for better identification techniques.
Computational approaches have gained traction: theoretical product ion spectra can now be constructed directly from chemical structures. Here, FIORA marks a substantial advance. Unlike prior models such as ICEBERG and CFM-ID, FIORA uses the local molecular neighborhood of each bond to learn bond-breaking patterns, which improves its predictions of fragment ion probabilities. This allows FIORA to surpass state-of-the-art fragmentation algorithms and deliver more accurate simulations of MS/MS spectra.
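To make the idea of a bond's "molecular neighborhood" concrete, the following minimal sketch collects the atoms within a given number of bonds of either end of a chosen bond. It is an illustration only and not FIORA's actual featurization: a GNN gathers this context implicitly through message passing, and the toy molecule, the `radius` parameter, and all function names here are assumptions introduced for the example.

```python
from collections import deque

# Hypothetical toy molecule: heavy atoms of 1-propanol, C0-C1-C2-O3.
ATOMS = {0: "C", 1: "C", 2: "C", 3: "O"}
BONDS = [(0, 1), (1, 2), (2, 3)]


def build_adjacency(bonds):
    """Map each atom index to the set of atoms it is bonded to."""
    adjacency = {}
    for a, b in bonds:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def bond_neighborhood(adjacency, bond, radius):
    """Return sorted atom indices within `radius` bonds of either end of `bond`."""
    distance = {bond[0]: 0, bond[1]: 0}
    queue = deque(bond)  # breadth-first search started from both bond atoms
    while queue:
        atom = queue.popleft()
        if distance[atom] == radius:
            continue  # do not expand beyond the requested radius
        for neighbor in adjacency[atom]:
            if neighbor not in distance:
                distance[neighbor] = distance[atom] + 1
                queue.append(neighbor)
    return sorted(distance)


def neighborhood_elements(atoms, adjacency, bond, radius):
    """Element symbols of the atoms surrounding a bond (illustrative context)."""
    return [atoms[i] for i in bond_neighborhood(adjacency, bond, radius)]
```

For the C2-O3 bond with `radius=1`, `neighborhood_elements(ATOMS, build_adjacency(BONDS), (2, 3), 1)` returns the elements of atoms 1, 2 and 3. A learned model would turn this kind of local context into a probability that the bond breaks.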
FIORA is built on GNN techniques, which are well suited to modeling complex molecular structures and predicting spectral features. To assess its effectiveness, the research team benchmarked FIORA against its predecessors. The results showed substantial improvement: FIORA not only produced higher-quality spectral predictions than prior models but also estimated retention time and collision cross-section, two additional properties that are key to compound characterization.
In the benchmark, FIORA's predicted MS/MS spectra achieved the highest median cosine similarity to reference spectra across various datasets, with gains of 10% to 49% over ICEBERG, and it performed particularly well in negative ionization mode. This makes FIORA better suited to annotating spectra in complex metabolomic data.
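Cosine similarity between a predicted and a reference spectrum can be sketched as follows. This is a minimal illustration under stated assumptions rather than the scoring code used in the benchmark: peaks are given as (m/z, intensity) pairs, peaks are matched by fixed-width binning, and the `bin_width` default is a hypothetical choice. Real evaluations may use different peak matching or intensity weighting.

```python
import math


def bin_spectrum(peaks, bin_width=0.01):
    """Sum intensities of (m/z, intensity) peaks that fall into the same m/z bin."""
    binned = {}
    for mz, intensity in peaks:
        key = round(mz / bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_similarity(peaks_a, peaks_b, bin_width=0.01):
    """Cosine of the angle between two binned intensity vectors (0.0 if either is empty)."""
    a = bin_spectrum(peaks_a, bin_width)
    b = bin_spectrum(peaks_b, bin_width)
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

A score of 1 means the two spectra have the same peaks in the same intensity proportions, and a score of 0 means they share no peaks. A benchmark of this kind would then take the median score over all spectra in a dataset, for instance with `statistics.median`.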
FIORA's prediction speed is also noteworthy: with GPU acceleration, it runs approximately 30 times faster than CPU-based prediction, generating around 10,000 spectra within five minutes. This makes it feasible to process large collections of chemical structures efficiently.
FIORA's single-step fragmentation approach simplifies prediction by focusing on individual bond cleavages, which reduces computational complexity. The trade-off is that fragments arising from multi-step fragmentation are not modeled. Future iterations of FIORA could explore these longer fragmentation pathways, broadening its applications to areas of metabolomics that remain unexplored.
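Single-step fragmentation can be illustrated by breaking each bond of a molecular graph once and recording the two pieces that result. The sketch below is a simplified illustration, not FIORA's implementation: it reports neutral fragment masses only, ignores charge and hydrogen rearrangements, and the toy molecule, data layout, and function names are assumptions made for the example.

```python
# Monoisotopic masses in daltons for the elements used below.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "O": 15.99491462}

# Hypothetical toy input: 1-propanol, each heavy atom with its attached hydrogen count.
ATOMS = {0: ("C", 3), 1: ("C", 2), 2: ("C", 2), 3: ("O", 1)}
BONDS = [(0, 1), (1, 2), (2, 3)]


def atom_mass(element, hydrogens):
    """Mass of a heavy atom together with its attached hydrogens."""
    return MONOISOTOPIC[element] + hydrogens * MONOISOTOPIC["H"]


def component(start, bonds):
    """Atoms reachable from `start` using only the given bonds."""
    reached = {start}
    stack = [start]
    while stack:
        atom = stack.pop()
        for a, b in bonds:
            if atom in (a, b):
                other = b if atom == a else a
                if other not in reached:
                    reached.add(other)
                    stack.append(other)
    return reached


def single_step_fragments(atoms, bonds):
    """Break each bond once; return (bond, [(atom_indices, neutral_mass), ...])."""
    results = []
    for broken in bonds:
        remaining = [bond for bond in bonds if bond != broken]
        left = component(broken[0], remaining)
        if broken[1] in left:
            continue  # ring bond: a single cleavage does not split the molecule
        right = component(broken[1], remaining)
        pieces = [
            (sorted(part), sum(atom_mass(*atoms[i]) for i in part))
            for part in (left, right)
        ]
        results.append((broken, pieces))
    return results
```

For 1-propanol this yields three cleavages, one per bond. A multi-step approach would apply the same procedure again to each resulting fragment, which is what makes it combinatorially more expensive.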
Overall, FIORA helps address the long-standing problem of incomplete spectral libraries and opens avenues for improved spectral resolution, giving future research a basis for expanding our biochemical knowledge and improving the precision of metabolomics.