Overview of the filtering process within the amr.watch workflow, applied to public genomes of priority bacterial pathogens available in the International Nucleotide Sequence Database Collaboration (INSDC) databases. The genome data are filtered in a series of steps, shown from left to right in the table; the entries counted in each column are a subset of those counted in the previous column.

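For pathogens that are grouped together, we initially accept genomes annotated in the ENA with any of the taxonomy IDs belonging to the group, and use the Speciator assignments in subsequent processing. As a result, the counts for the steps up to and including "Assembled genomes" apply to the group as a whole (they are either repeated on each row of the group or shown only on its first row), and per-pathogen counts begin at "Genomes with correct species".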

Pathogen	ENA entries (run accessions)	Illumina paired-end entries	Filtered entries¹	Entries with geotemporal data²	Entries available for download in SRA	Entries associated with unique samples³	Assembled genomes	Genomes with correct species	Genomes that passed QC	Genomes with collection date post-2010	Last updated
All Pathogens	1,819,286	1,605,699	1,547,862	929,769	926,887	892,220	858,842	836,019	814,212	748,900
A. baumannii	53,268	43,515	40,173	27,690	27,644	26,792	26,296	25,910	25,478	23,942	5/18/2026
C. coli	179,857	174,805	168,759	115,036	115,015	114,388	111,800	34,459	33,131	31,941	5/18/2026
C. jejuni	179,857	174,805	168,759	115,036	115,015	114,388	111,800	75,465	72,067	69,459	5/18/2026
E. cloacae complex	23,192	20,425	19,590	15,045	14,958	13,685	13,284	12,858	12,216	11,756	5/14/2026
E. faecium	46,420	44,340	42,839	27,034	27,019	25,916	25,707	25,446	25,091	23,842	5/14/2026
E. coli	662,946	553,325	538,925	314,604	313,060	301,356	290,991	251,266	247,123	227,492	5/18/2026
S. flexneri								14,404	13,974	13,193	5/18/2026
S. sonnei								16,595	16,131	15,151	5/18/2026
H. influenzae	19,253	17,689	17,558	8,679	8,679	8,380	7,645	7,627	7,460	6,844	5/14/2026
K. pneumoniae	143,039	125,543	115,275	86,655	86,211	80,209	76,619	72,269	70,341	68,234	5/19/2026
N. gonorrhoeae	84,562	80,433	79,876	53,647	53,646	50,342	47,166	46,759	44,596	43,542	5/14/2026
P. aeruginosa	91,162	66,249	60,970	32,786	32,775	31,610	29,581	29,167	28,217	26,498	5/18/2026
Salmonella Typhi	132,935	123,639	121,310	94,612	94,610	92,566	91,173	9,333	9,138	7,080	5/18/2026
Salmonella Typhimurium								28,466	27,736	25,551	5/18/2026
Salmonella Enteritidis								49,261	48,380	46,698	5/18/2026
S. aureus	191,258	172,949	163,809	95,537	95,073	91,177	86,609	85,139	82,695	70,925	5/18/2026
S. pneumoniae	191,394	182,787	178,778	58,444	58,197	55,799	51,971	51,595	50,438	36,752	5/14/2026

¹ Entries (runs) are filtered to include only those with two FASTQ files, ≥20x mean coverage (assessed via the "base_count" field), and a single associated sample accession.
² Entries (runs) are filtered to include only those with a collection date that is decodable to at least the year and a sampling location that is decodable to at least the country level.
³ Entries (runs) are filtered so that only one run per sample accession is included (the run with the highest number of bases, as given by the "base_count" field).
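
The three run-level filters described in the notes above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the amr.watch implementation: the record layout (`run_accession`, `sample_accession`, `fastq_files`, `base_count`, `collection_date`, `country`), the `genome_size` parameter, the coverage estimate (bases divided by genome length), and the simple date and country checks are all hypothetical stand-ins. The check that an entry is available for download in SRA, which sits between the geotemporal and unique-sample steps in the table, is omitted.

```python
import re

MIN_COVERAGE = 20  # ">=20x mean coverage", as stated in the notes


def mean_coverage(run, genome_size):
    # Assumption: coverage is approximated as sequenced bases / genome length.
    return run["base_count"] / genome_size


def passes_run_filter(run, genome_size):
    # Assumption: multiple sample accessions would be ';'-separated.
    samples = [s for s in run["sample_accession"].split(";") if s]
    return (
        len(run["fastq_files"]) == 2  # paired reads: exactly two FASTQ files
        and mean_coverage(run, genome_size) >= MIN_COVERAGE
        and len(samples) == 1  # exactly one sample accession
    )


def has_geotemporal_data(run):
    # Assumption: a date is "decodable to the year" if it starts with four
    # digits, and a location is usable if the country field is non-empty.
    year = re.match(r"\d{4}", run.get("collection_date") or "")
    country = (run.get("country") or "").strip()
    return year is not None and country != ""


def one_run_per_sample(runs):
    # Keep, for each sample accession, the run with the highest base_count.
    best = {}
    for run in runs:
        key = run["sample_accession"]
        if key not in best or run["base_count"] > best[key]["base_count"]:
            best[key] = run
    return list(best.values())


def filter_runs(runs, genome_size):
    filtered = [r for r in runs if passes_run_filter(r, genome_size)]
    with_metadata = [r for r in filtered if has_geotemporal_data(r)]
    return one_run_per_sample(with_metadata)


# Made-up records for illustration only (not real accessions).
example_runs = [
    {"run_accession": "RUN1", "sample_accession": "S1",
     "fastq_files": ["r1_1.fastq.gz", "r1_2.fastq.gz"],
     "base_count": 300_000_000, "collection_date": "2015-03", "country": "Kenya"},
    {"run_accession": "RUN2", "sample_accession": "S1",
     "fastq_files": ["r2_1.fastq.gz", "r2_2.fastq.gz"],
     "base_count": 500_000_000, "collection_date": "2015-03", "country": "Kenya"},
    {"run_accession": "RUN3", "sample_accession": "S2",
     "fastq_files": ["r3.fastq.gz"],
     "base_count": 400_000_000, "collection_date": "2018", "country": "Peru"},
    {"run_accession": "RUN4", "sample_accession": "S3",
     "fastq_files": ["r4_1.fastq.gz", "r4_2.fastq.gz"],
     "base_count": 50_000_000, "collection_date": "2019-07-01", "country": "Japan"},
    {"run_accession": "RUN5", "sample_accession": "S4",
     "fastq_files": ["r5_1.fastq.gz", "r5_2.fastq.gz"],
     "base_count": 400_000_000, "collection_date": "", "country": "India"},
]

kept = filter_runs(example_runs, genome_size=5_000_000)
print([r["run_accession"] for r in kept])  # ['RUN2']
```

In this toy input, RUN3 is dropped for having a single FASTQ file, RUN4 for falling below 20x at an assumed 5 Mb genome, and RUN5 for lacking a collection date; RUN1 and RUN2 share a sample accession, so only RUN2, the run with more bases, is kept.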