Bioinformatics has long been an unusually collaborative and transparent field, with genomes, protein structures, and other complex biological data habitually deposited into open databases during the course of research. The situation was no different at the outset of the COVID-19 pandemic, when a small group of scientists developed the Pango nomenclature for classifying variants of the SARS-CoV-2 virus. Outside of a handful of Greek-letter “variants of concern” names assigned by the World Health Organization, the Pango nomenclature is the standard for tracking the evolution of the SARS-CoV-2 virus. You may recall names such as B.1.1.7 (Alpha or the UK variant), B.1.351 (Beta or the South African variant), and P.1 (Gamma or the Brazilian variant). You can see a complete list of active SARS-CoV-2 lineages using the Pango nomenclature here.
By August 2020, the work of defining new lineages of SARS-CoV-2 had moved to GitHub, where the scientific process could happen in a transparent and collaborative way. New lineages are defined through proposals submitted as GitHub issues. In May 2023, a second GitHub repository was opened to move discussions of smaller or less clear lineages out of the main repository. These discussions can be promoted to the main repository, as this issue tracking LP.8.1 sub-lineages was in May 2025.
The work of defining new lineages of SARS-CoV-2 continues on GitHub to this day as the virus keeps mutating and evolving. And bioinformatics remains a shining beacon of open science for the rest of us to learn from.
