Goslin
Normalization of lipid names
Description
Goslin is the Grammar of succinct lipid nomenclature project. It defines multiple grammars, one for each lipid name dialect, e.g., LipidMaps, SwissLipids, HMDB, and the Liebisch shorthand nomenclature. This makes it possible to give immediate feedback on whether a processed lipid name string complies with a particular grammar.

Goslin provides libraries for C++, Java, Python and R that read in lipid names and generate normalized lipid names for streamlined downstream data analysis and integration. Each library can process more than 1,000 lipid names per second, returning the normalized lipid name, chemical sum formula, lipid mass and all structural details. The current version fully supports the revised Liebisch shorthand nomenclature from 2020. This is only possible because context-free parsing can handle the nested or recursive patterns that may occur in lipid names at higher structural resolution. Another advantage of Goslin is that the parsers are robust against syntactically incorrect lipid names, which avoids misinterpretation.

Given a lipid name at a certain level of structural resolution, Goslin can provide normalized lipid names on all lower levels of the structural hierarchy, e.g., the lipid class or category. This makes statistical analyses (e.g., of the lipid category distribution) very easy to perform.
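To illustrate what deriving lower hierarchy levels means, here is a minimal toy sketch in Python. It is not Goslin's parser or API: the function `lower_levels` and the `CLASS_TO_CATEGORY` table are hypothetical, and it only handles molecular-species names with unmodified acyl chains (e.g., `PC 16:0_18:1`). Collapsing to species level amounts to summing carbon atoms and double bonds over all chains, while class and category come from a lookup.

```python
import re

# Hypothetical, heavily simplified lookup: lipid class -> LIPID MAPS category.
# Goslin uses its own, far larger class tables; this dict is only for illustration.
CLASS_TO_CATEGORY = {
    "PC": "GP",  # glycerophosphocholines -> glycerophospholipids
    "PE": "GP",  # glycerophosphoethanolamines -> glycerophospholipids
    "TG": "GL",  # triacylglycerols -> glycerolipids
    "DG": "GL",  # diacylglycerols -> glycerolipids
}

# Matches a plain chain "carbons:double_bonds", e.g. "16:0" or "18:1".
CHAIN = re.compile(r"(\d+):(\d+)")


def lower_levels(name: str) -> dict:
    """Derive species level, class and category from a simple lipid name.

    Toy example (hypothetical helper, not part of Goslin): accepts names such
    as 'PC 16:0_18:1' or 'PC 16:0/18:1' with unmodified chains only.
    """
    head, _, chains = name.partition(" ")
    if head not in CLASS_TO_CATEGORY or not chains:
        raise ValueError(f"unsupported lipid name: {name!r}")
    carbons = 0
    double_bonds = 0
    # '_' separates chains of unknown sn-position, '/' chains of known position.
    for chain in re.split(r"[_/]", chains):
        m = CHAIN.fullmatch(chain)
        if m is None:
            # Reject anything outside the toy syntax instead of guessing.
            raise ValueError(f"unsupported chain: {chain!r}")
        carbons += int(m.group(1))
        double_bonds += int(m.group(2))
    return {
        "species": f"{head} {carbons}:{double_bonds}",
        "class": head,
        "category": CLASS_TO_CATEGORY[head],
    }


print(lower_levels("PC 16:0_18:1"))
# {'species': 'PC 34:1', 'class': 'PC', 'category': 'GP'}
```

Raising an error on any unrecognized chain mirrors, in a very crude way, the idea of rejecting non-compliant names rather than misinterpreting them. Handling real shorthand (ether bonds, hydroxylations, nested modifications) requires a proper grammar, which is what Goslin provides.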