Oral Presentation 25th Annual Lorne Proteomics Symposium 2020

OmixLitMiner - Tool for fast evaluation of knowledge and importance of regulated individual proteins derived from differential proteomics (#33)

Hartmut Schlüter 1, Pascal Steffen 2, Jemma Wu 3, Vijay Raghunath 4, Hannah Voß 1, Mark P Molloy 5
  1. Section Mass Spectrometry and Proteomics, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
  2. Bowel Cancer & Biomarker Lab, Kolling Institute, The University of Sydney, St. Leonards, Sydney, New South Wales, Australia
  3. Department of Molecular Sciences, Macquarie University, Sydney, New South Wales, Australia
  4. Sydney Informatics Hub, The University of Sydney, Sydney, New South Wales, Australia
  5. Bowel Cancer & Biomarker Lab, Kolling Institute, St. Leonards, The University of Sydney, Sydney, New South Wales, Australia

Differential proteomics studies today often yield hundreds of down- or up-regulated proteins that are associated with a defined perturbation. Because of the large number of identified regulated proteins, it is time-consuming to determine which proteins are the most important with respect to the scientific question and to estimate which of them are most promising for gaining new knowledge. To speed up the evaluation of the importance of the proteins and to highlight proteins not yet known in the context of the scientific question, a text mining tool termed OmixLitMiner was developed. Lists of accession numbers of the identified regulated proteins are uploaded into the tool (based on a script written in the programming language R), which automatically retrieves the synonyms of each protein from UniProt, transfers them into PubMed and combines them with a keyword characterizing the scientific question. The PubMed search is repeated, adding several filters step by step (filter “Title”: the synonyms must occur in the title; filter “Review”: only review papers are counted as hits), and the hits are listed after each search step. After the searches are finished, the tool summarizes the results in a word cloud, statistical plots and MeSH-based clusters, and assigns each protein to one of three categories: Category-1: “well studied”; Category-2: “not well studied”; Category-3: “not known”. Category-1 proteins can help to validate the experiment, since these proteins have been mentioned in the titles of reviews that are also related to the scientific question of the study. Proteins assigned to Category-2 or Category-3 have not been reported in relation to the topic of the study. Thus, these proteins may give access to new hypotheses and thereby to new knowledge.
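
The stepwise search and the category assignment described above can be illustrated with a minimal sketch. It is written in Python for illustration only (the tool itself is an R script) and it is not the OmixLitMiner implementation. The function names build_queries and categorize, the order in which the filters are combined, and the thresholds used for the three categories are assumptions that the abstract does not specify. The sketch only builds PubMed query strings using the standard field tags [Title] and [Publication Type]; it does not contact UniProt or PubMed, so the synonyms and hit counts are supplied by hand.

```python
from typing import Dict, List


def build_queries(synonyms: List[str], keyword: str) -> Dict[str, str]:
    """Build one PubMed query string per search step (hypothetical helper).

    Assumed step order: no filter, then the "Title" filter, then the
    "Title" and "Review" filters together.
    """
    any_field = " OR ".join(f'"{s}"' for s in synonyms)
    in_title = " OR ".join(f'"{s}"[Title]' for s in synonyms)
    return {
        "unfiltered": f'({any_field}) AND "{keyword}"',
        "title": f'({in_title}) AND "{keyword}"',
        "title_review": (
            f'({in_title}) AND "{keyword}" AND Review[Publication Type]'
        ),
    }


def categorize(hits: Dict[str, int]) -> int:
    """Map hit counts per search step to a category (assumed rule).

    1: at least one review mentions a synonym in its title ("well studied")
    2: hits exist, but none under the strictest filter ("not well studied")
    3: no hits at all ("not known")
    """
    if hits.get("title_review", 0) > 0:
        return 1
    if any(count > 0 for count in hits.values()):
        return 2
    return 3


# Illustrative input: synonyms and hit counts are made up for the example.
queries = build_queries(["TP53", "p53"], "colorectal cancer")
example_hits = {"unfiltered": 12, "title": 3, "title_review": 0}
example_category = categorize(example_hits)
```

In this sketch a protein whose synonyms produce hits only in the less strict searches falls into Category-2, while a protein with no hits in any search falls into Category-3. In practice the hit counts would come from submitting each query string to PubMed.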