Cristin result ID: 2050693
Last modified: 24 January 2023, 16:39
NVI reporting year: 2022
Result
Academic article
2022

Mandrake: visualizing microbial population structure by embedding millions of genomes into a low-dimensional representation

Contributors:
  • John A. Lees
  • Gerry Tonkin-Hill
  • Zhirong Yang and
  • Jukka Corander

Journal

Philosophical Transactions of the Royal Society of London. Biological Sciences
ISSN 0962-8436
e-ISSN 1471-2970
NVI level 2

About the result

Academic article
Year of publication: 2022
Volume: 377
Issue: 1861
Article number: 20210237
Open Access

Import sources

Scopus-ID: 2-s2.0-85136908395

Description

Title

Mandrake: visualizing microbial population structure by embedding millions of genomes into a low-dimensional representation

Abstract

In less than a decade, population genomics of microbes has progressed from the effort of sequencing dozens of strains to thousands, or even tens of thousands of strains in a single study. There are now hundreds of thousands of genomes available even for a single bacterial species, and the number of genomes is expected to continue to increase at an accelerated pace given the advances in sequencing technology and widespread genomic surveillance initiatives. This explosion of data calls for innovative methods to enable rapid exploration of the structure of a population based on different data modalities, such as multiple sequence alignments, assemblies and estimates of gene content across different genomes. Here, we present Mandrake, an efficient implementation of a dimensional reduction method tailored for the needs of large-scale population genomics. Mandrake is capable of visualizing population structure from millions of whole genomes, and we illustrate its usefulness with several datasets representing major pathogens. Our method is freely available both as an analysis pipeline (https://github.com/johnlees/mandrake) and as a browser-based interactive application (https://gtonkinhill.github.io/mandrake-web/).

This article is part of a discussion meeting issue ‘Genomic population structures of microbial pathogens’.

1. Introduction

Advances in DNA sequencing technology have recently made whole-genome sequencing both affordable and scalable enough for routine use in pathogen surveillance by research organizations and public health agencies around the world [1,2]. A striking example of this is genomic surveillance of the SARS-CoV-2 virus for which over one million genome sequences became available in just 15 months after its initial discovery [3]. To shed light on population genomic data at this scale calls for new tools that can be used for rapid exploration of the structure among the samples, with particular emphasis on detecting clusters of similar sequences [4,5]. In this paper, we explore and extend a class of methods that aims to reduce the dimensionality of such data to only two dimensions, in a manner that supports ready visualization and identification of clusters.

© 2022 The Authors. Published by the Royal Society under the terms of the Creative Commons Attribution License http://creativecommons.org/licenses/by/4.0/, which permits unrestricted use, provided the original author and source are credited.

Contributors

John A. Lees

  • Affiliated:
    Author
    at European Bioinformatics Institute
  • Affiliated:
    Author
    at Imperial College London

Gerry Tonkin-Hill

  • Affiliated:
    Author
    at Probabilistic Inference Laboratory at the University of Oslo
Active Cristin person

Zhirong Yang

  • Affiliated:
    Author
    at Aalto-yliopisto / Aalto University
  • Affiliated:
    Author
    at Department of Computer Science at the Norwegian University of Science and Technology

Jukka Corander

  • Affiliated:
    Author
    at University of Cambridge
  • Affiliated:
    Author
    at Helsingin yliopisto / University of Helsinki
  • Affiliated:
    Author
    at Probabilistic Inference Laboratory at the University of Oslo