Goslin

Normalization of lipid names

Description

Goslin is the Grammar of succinct lipid nomenclature project. It defines multiple grammars, one for each lipid name dialect, e.g., LIPID MAPS, SwissLipids, HMDB, and the Liebisch shorthand nomenclature. This makes it possible to report immediately whether a lipid name string complies with a particular grammar. Goslin provides libraries for C++, Java, Python, and R that read lipid names and generate normalized lipid names, streamlining subsequent data analysis and integration. Each library can process over 1000 lipid names per second, returning the normalized lipid name, chemical sum formula, lipid mass, and all structural details. The current version fully supports the revised Liebisch shorthand nomenclature from 2020. This is possible only because context-free parsing can handle the nested or recursive patterns that may occur in lipid names at higher structural resolution. Another advantage of Goslin is that its parsers are robust against syntactically incorrect lipid names, which avoids misinterpretation. Given a lipid name at a certain structural level, Goslin can provide normalized lipid names at all lower levels of the structural hierarchy, e.g., the lipid class or category. This makes statistical analyses (e.g., lipid category distribution) easy to carry out.
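
The idea of deriving lower hierarchy levels from a more detailed lipid name can be illustrated with a minimal sketch. It does not use Goslin and is not Goslin's implementation or API: it is a toy regular-expression reader for simple names such as "PC 16:0/18:1", written only to show how species, class, and category follow from a more detailed name. The function name lower_levels and the small CATEGORY table are hypothetical and cover just four classes, whereas Goslin relies on full context-free grammars.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny class-to-category table (assumption for this
# sketch). GP = glycerophospholipids, GL = glycerolipids.
CATEGORY = {"PC": "GP", "PE": "GP", "TG": "GL", "DG": "GL"}

# A fatty acyl chain written as carbons:double bonds, e.g. "18:1".
CHAIN = re.compile(r"^(\d+):(\d+)$")


def lower_levels(name):
    """Return the names below a toy shorthand name such as 'PC 16:0/18:1'."""
    head, _, chains = name.strip().partition(" ")
    if head not in CATEGORY or not chains:
        raise ValueError(f"unsupported lipid name: {name!r}")

    # "/" separates chains with known sn-positions, "_" chains without.
    parts = re.split(r"[/_]", chains)
    carbons = double_bonds = 0
    for part in parts:
        match = CHAIN.match(part)
        if match is None:
            raise ValueError(f"malformed chain {part!r} in {name!r}")
        carbons += int(match.group(1))
        double_bonds += int(match.group(2))

    levels = {}
    if "/" in chains:
        # Dropping the sn-position information gives the molecular species level.
        # Chain order is kept as given; no reordering is attempted here.
        levels["molecular_species"] = f"{head} " + "_".join(parts)
    # Summing all chains gives the species level.
    levels["species"] = f"{head} {carbons}:{double_bonds}"
    levels["class"] = head
    levels["category"] = CATEGORY[head]
    return levels


# Lipid category distribution over a small, made-up list of names.
names = ["PC 16:0/18:1", "PE 18:0_20:4", "TG 16:0_18:1_18:2"]
distribution = Counter(lower_levels(n)["category"] for n in names)
print(lower_levels("PC 16:0/18:1"))
print(distribution)
```

Names that the toy reader does not recognize raise a ValueError instead of being guessed at, which loosely mirrors the strict rejection of syntactically incorrect names described above.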

Technical Information

Publications:
PMID:32589019
Training datasets:
NA
Download / Web-service link:
Programming languages:
C++,
Java,
Python,
R
Platforms:
Windows,
Linux,
MacOS
Output formats:
CSV
Input formats:
CSV
Web platform:
Yes
Desktop client:
No
CLI:
Yes
GUI:
Yes
License:
GPL & MIT (Academic)

Tasks

7.1) Lipid annotations and ID converters
Other features:
Mass calculation,
Chemical formula
Link to the external databases:
Output annotations or ids:
LIPID MAPS LMSD,
SwissLipids,
HMDB,
Shorthand notation (according to PMID: 33037133)
Input annotations or ids:
LIPID MAPS LMSD,
SwissLipids,
HMDB,
Shorthand notation (according to PMID: 33037133)
Supported levels of structural annotations:
Species,
Molecular species,
Structure defined levels
Nomenclature:
LIPID MAPS classification, nomenclature, and shorthand notation (PMID: 33037133)
Supported lipid classes:
7 out of 8 LIPID MAPS categories, 289 subclasses