
Overview of the filtering process within the amr.watch workflow applied to public genomes of priority bacterial pathogens available in the International Nucleotide Sequence Database Collaboration (INSDC) databases. The genome data are filtered in a series of steps, depicted from left to right in the table, with the numbers in each column representing a subset of those from the previous column.
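Because each column counts a subset of the previous one, the table can be read as a funnel. The sketch below uses only the All Pathogens counts from the table and computes, for every step, the share kept relative to the previous step and relative to the first column. The helper name `retention` is hypothetical and not part of the amr.watch workflow.

```python
# "All Pathogens" counts copied from the table, one per filtering step (left to right).
ALL_PATHOGENS = [
    ("ENA entries (run accessions)", 1_735_084),
    ("Illumina paired-end entries", 1_532_749),
    ("Filtered entries", 1_505_140),
    ("Entries with geotemporal data", 883_048),
    ("Entries available for download in SRA", 873_491),
    ("Entries associated with unique samples", 840_277),
    ("Assembled genomes", 812_359),
    ("Genomes with correct species", 792_304),
    ("Genomes that passed QC", 771_139),
    ("Genomes with collection date post-2010", 706_413),
]


def retention(steps):
    """Return (label, count, share of previous step, share of first step) per step."""
    first = prev = steps[0][1]
    rows = []
    for label, count in steps:
        # Each column is a subset of the previous one, so counts can never increase.
        if count > prev:
            raise ValueError(f"{label!r} has more entries than the previous step")
        rows.append((label, count, count / prev, count / first))
        prev = count
    return rows


if __name__ == "__main__":
    for label, count, step_share, total_share in retention(ALL_PATHOGENS):
        print(f"{label:<40} {count:>10,} {step_share:>7.1%} {total_share:>7.1%}")
```

For this row, the geotemporal filter is the largest single cut (about 58.7% of filtered entries are kept), and about 40.7% of the original run accessions reach the final column.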

For pathogens that are grouped together, we initially accept genomes annotated in the European Nucleotide Archive (ENA) with any of the taxonomy IDs belonging to the group, and then use the species assigned by Speciator in subsequent processing.
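The group-level acceptance followed by per-species assignment can be sketched as below. This is a minimal illustration under assumptions: each run is a dict with hypothetical `tax_id` (ENA taxonomy ID) and `speciator` (species reported by Speciator) keys, the example group is E. coli with S. flexneri and S. sonnei using NCBI taxonomy IDs 562, 623 and 624, and a genome whose Speciator species falls outside the group is treated as failing the "correct species" step. The workflow's real ID lists and Speciator output format may differ.

```python
# Illustrative group: NCBI taxonomy ID -> species name (assumed Speciator spelling).
GROUP_E_COLI_SHIGELLA = {
    562: "Escherichia coli",
    623: "Shigella flexneri",
    624: "Shigella sonnei",
}


def assign_species(runs, group):
    """Accept runs with any group taxonomy ID, then bucket them by Speciator species."""
    members = set(group.values())
    by_species = {name: [] for name in group.values()}
    for run in runs:
        if run["tax_id"] not in group:
            continue  # ENA annotation outside the group: not accepted at all
        species = run["speciator"]
        if species in members:
            by_species[species].append(run)  # counted under the Speciator call
        # Otherwise the assigned species is outside the group, so the run
        # fails the "correct species" step.
    return by_species
```

Under this sketch, a run annotated as E. coli in the ENA but identified as S. sonnei by Speciator is counted under S. sonnei, which is why the per-species columns can differ from the ENA annotation.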

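| Pathogen | ENA entries (run accessions) | Illumina paired-end entries | Filtered entries¹ | Entries with geotemporal data² | Entries available for download in SRA | Entries associated with unique samples³ | Assembled genomes | Genomes with correct species | Genomes that passed QC | Genomes with collection date post-2010 | Last updated |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---|
| All Pathogens | 1,735,084 | 1,532,749 | 1,505,140 | 883,048 | 873,491 | 840,277 | 812,359 | 792,304 | 771,139 | 706,413 | |
| A. baumannii | 50,534 | 41,142 | 39,003 | 26,609 | 26,152 | 25,336 | 25,025 | 24,660 | 24,232 | 22,696 | |
| C. coli | 174,733 | 169,844 | 166,796 | 113,227 | 111,992 | 111,391 | 109,140 | 33,662 | 32,398 | 31,267 | |
| C. jejuni | | | | | | | | 73,696 | 70,398 | 67,886 | |
| E. cloacae complex | 22,024 | 19,437 | 19,029 | 14,507 | 14,401 | 13,168 | 12,807 | 12,386 | 11,755 | 11,295 | |
| E. faecium | 44,505 | 42,551 | 42,228 | 26,611 | 26,598 | 25,524 | 25,339 | 25,086 | 24,744 | 23,495 | |
| E. coli | 629,048 | 523,467 | 517,130 | 294,161 | 292,931 | 281,835 | 273,338 | 235,587 | 231,500 | 211,915 | |
| S. flexneri | | | | | | | | 13,705 | 13,286 | 12,505 | |
| S. sonnei | | | | | | | | 15,724 | 15,270 | 14,290 | |
| H. influenzae | 18,616 | 17,309 | 17,242 | 8,385 | 8,360 | 8,075 | 7,443 | 7,426 | 7,262 | 6,646 | |
| K. pneumoniae | 129,805 | 113,149 | 108,459 | 70,698 | 69,308 | 63,831 | 61,992 | 59,744 | 58,012 | 55,951 | |
| N. gonorrhoeae | 81,499 | 77,482 | 77,362 | 51,928 | 51,928 | 48,683 | 45,562 | 45,155 | 43,078 | 42,294 | |
| P. aeruginosa | 85,171 | 62,352 | 59,881 | 31,724 | 31,559 | 30,428 | 28,644 | 28,280 | 27,360 | 25,644 | |
| Salmonella Typhi | 130,384 | 121,575 | 120,507 | 93,850 | 93,627 | 91,604 | 90,285 | 9,127 | 8,955 | 6,917 | |
| Salmonella Typhimurium | | | | | | | | 28,285 | 27,555 | 25,379 | |
| Salmonella Enteritidis | | | | | | | | 48,780 | 47,902 | 46,220 | |
| S. aureus | 184,201 | 167,602 | 161,408 | 93,964 | 89,532 | 85,662 | 81,727 | 80,297 | 77,866 | 66,111 | |
| S. pneumoniae | 184,564 | 176,839 | 176,095 | 57,384 | 57,103 | 54,740 | 51,057 | 50,704 | 49,566 | 35,902 | |

For grouped pathogens (C. coli and C. jejuni; E. coli, S. flexneri and S. sonnei; Salmonella Typhi, Typhimurium and Enteritidis), the counts from ENA entries through Assembled genomes cover the whole group and are shown once, on the group's first row; the last three count columns are per species.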

  1. Entries (runs) are filtered to include only those with two FASTQ files, a mean coverage of ≥20x (estimated from the "base_count" field) and a single associated sample accession.
  2. Entries (runs) are filtered to include those with a collection date that is decodable to at least the year and a sampling location that is decodable to at least the country level.
  3. Entries (runs) are filtered so that only one run per sample accession is included (the run with the highest number of bases, as given by the "base_count" field, is selected).
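The three filters in the notes above can be sketched in Python. Everything beyond the notes themselves is an assumption: run records are dicts whose keys mirror ENA read-run metadata fields (`fastq_ftp`, `base_count`, `sample_accession`, `collection_date`, `country`), mean coverage is estimated as `base_count` divided by an assumed expected genome size, and `KNOWN_COUNTRIES` is a tiny hypothetical stand-in for whatever location lookup the workflow actually uses. The SRA-availability step, which sits between notes 2 and 3 in the table, is not modelled.

```python
import re
from typing import Optional

# Hypothetical configuration; values and names are illustrative assumptions.
EXPECTED_GENOME_SIZE = {"Escherichia coli": 5_000_000}  # bases
KNOWN_COUNTRIES = {"United Kingdom", "India", "Brazil"}  # stand-in for a full list
MIN_COVERAGE = 20

# Accepts "YYYY", "YYYY-MM" or "YYYY-MM-DD".
DATE_RE = re.compile(r"(\d{4})(?:-\d{2}(?:-\d{2})?)?")


def passes_basic_filter(run, genome_size):
    """Note 1: two FASTQ files, >= 20x estimated coverage, one sample accession."""
    fastqs = [f for f in run["fastq_ftp"].split(";") if f]
    samples = [s for s in run["sample_accession"].split(";") if s]
    coverage = int(run["base_count"]) / genome_size  # total bases / genome length
    return len(fastqs) == 2 and coverage >= MIN_COVERAGE and len(samples) == 1


def decode_year(date: str) -> Optional[int]:
    match = DATE_RE.fullmatch(date.strip())
    return int(match.group(1)) if match else None


def decode_country(location: str) -> Optional[str]:
    country = location.split(":", 1)[0].strip()  # "Country: region" -> "Country"
    return country if country in KNOWN_COUNTRIES else None


def has_geotemporal_data(run):
    """Note 2: both the year and the country must be decodable."""
    return (decode_year(run["collection_date"]) is not None
            and decode_country(run["country"]) is not None)


def one_run_per_sample(runs):
    """Note 3: keep the run with the largest base_count for each sample."""
    best = {}
    for run in runs:
        key = run["sample_accession"]
        if key not in best or int(run["base_count"]) > int(best[key]["base_count"]):
            best[key] = run
    return list(best.values())


def filter_runs(runs, species):
    size = EXPECTED_GENOME_SIZE[species]
    kept = [r for r in runs if passes_basic_filter(r, size)]
    kept = [r for r in kept if has_geotemporal_data(r)]
    return one_run_per_sample(kept)
```

In this sketch, ties in `base_count` keep the first run encountered; the notes do not say how the workflow breaks ties.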