http://socictopen.socict.org/files/original/a73176cf6613ae226f10820665f4e85d.pdf bbc6cd57e0b3d8e2ad94fb2147ce58a4 Dublin Core The Dublin Core metadata element set is common to all Omeka records, including items, files, and collections. For more information see, http://dublincore.org/documents/dces/. Title A name given to the resource Coronavirus Description An account of the resource Dominio científico: Coronavirus Text A resource consisting primarily of words for reading. Examples include books, letters, dissertations, poems, newspapers, articles, archives of mailing lists. Note that facsimiles or images of texts are still of the genre Text. Dublin Core The Dublin Core metadata element set is common to all Omeka records, including items, files, and collections. For more information see, http://dublincore.org/documents/dces/. Title A name given to the resource Comparing fixed sampling with minimizer sampling when using k-mer indexes to find maximal exact matches. Creator An entity primarily responsible for making the resource Meznah Almutairy, Eric Torng Description An account of the resource Bioinformatics applications and pipelines increasingly use k-mer indexes to search for similar sequences. The major problem with k-mer indexes is that they require lots of memory. Sampling is often used to reduce index size and query time. Most applications use one of two major types of sampling: fixed sampling and minimizer sampling. It is well known that fixed sampling will produce a smaller index, typically by roughly a factor of two, whereas it is generally assumed that minimizer sampling will produce faster query times since query k-mers can also be sampled. However, no direct comparison of fixed and minimizer sampling has been performed to verify these assumptions. We systematically compare fixed and minimizer sampling using the human genome as our database. We use the resulting k-mer indexes for fixed sampling and minimizer sampling to find all maximal exact matches between our database, the human genome, and three separate query sets, the mouse genome, the chimp genome, and an NGS data set. We reach the following conclusions. First, using larger k-mers reduces query time for both fixed sampling and minimizer sampling at a cost of requiring more space. If we use the same k-mer size for both methods, fixed sampling requires typically half as much space whereas minimizer sampling processes queries only slightly faster. If we are allowed to use any k-mer size for each method, then we can choose a k-mer size such that fixed sampling both uses less space and processes queries faster than minimizer sampling. The reason is that although minimizer sampling is able to sample query k-mers, the number of shared k-mer occurrences that must be processed is much larger for minimizer sampling than fixed sampling. In conclusion, we argue that for any application where each shared k-mer occurrence must be processed, fixed sampling is the right sampling method. Date A point or period of time associated with an event in the lifecycle of the resource 2018 Identifier An unambiguous reference to the resource within a given context DOI: 10.1371/journal.pone.0189960 Source A related resource from which the described resource is derived PLoS ONE Publisher An entity responsible for making the resource available Public Library of Science (PLoS) Coverage The spatial or temporal topic of the resource, the spatial applicability of the resource, or the jurisdiction under which the resource is relevant Science, Medicine Language A language of the resource EN