0.2.0	Jun 24, 2024
0.1.0	Nov 10, 2022

MIT license

seqdupes

Removes duplicates from FASTA files. Supports filtering based on sequence content or header information.

Installation

Download the source code and, from the project root, run:

cargo install --path .

Usage

Run seqdupes on a FASTA file. Duplicates can be detected either by sequence or by header.

seqdupes -f path/to/sequence.fasta -j path/to/output.json > no_dupes.fas

To filter duplicates by header rather than by sequence, add the --by-header flag.

seqdupes -f path/to/sequence.fasta -j path/to/output.json --by-header > no_dupes.fas

Parameter	Default	Description
-f, --fasta	-	The path to the input FASTA file.
-j, --json	-	The output path for the JSON file listing duplicates.
-b, --by-header	-	Enables filtering based on headers (optional).

The tool outputs a FASTA file with duplicates removed to stdout and a JSON file containing details of the duplicates to the specified path.
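
The following is a minimal sketch of the deduplication idea described above, not the crate's actual implementation. The names Record, parse_fasta, and dedupe are hypothetical, and the policy of keeping the first occurrence is an assumption. So is the shape of the duplicate report, which maps each kept header to the headers of the records dropped in its favor; the real JSON layout may differ.

```rust
use std::collections::HashMap;

/// One FASTA record: the header line (without the leading '>') and its sequence.
#[derive(Debug, Clone, PartialEq)]
struct Record {
    header: String,
    sequence: String,
}

/// Parse FASTA text into records, joining wrapped sequence lines.
/// Lines that appear before the first header are ignored.
fn parse_fasta(text: &str) -> Vec<Record> {
    let mut records: Vec<Record> = Vec::new();
    for line in text.lines() {
        let line = line.trim_end();
        if let Some(header) = line.strip_prefix('>') {
            records.push(Record {
                header: header.to_string(),
                sequence: String::new(),
            });
        } else if let Some(last) = records.last_mut() {
            last.sequence.push_str(line);
        }
    }
    records
}

/// Keep the first record seen for each key (assumed policy).
/// The key is the header when `by_header` is true, otherwise the sequence.
/// Returns the unique records plus a map from each kept header to the
/// headers of the later records that duplicated it.
fn dedupe(
    records: Vec<Record>,
    by_header: bool,
) -> (Vec<Record>, HashMap<String, Vec<String>>) {
    // Maps a key to the header of the record that was kept for it.
    let mut first_seen: HashMap<String, String> = HashMap::new();
    let mut unique: Vec<Record> = Vec::new();
    let mut duplicates: HashMap<String, Vec<String>> = HashMap::new();

    for record in records {
        let key = if by_header {
            record.header.clone()
        } else {
            record.sequence.clone()
        };
        match first_seen.get(&key) {
            Some(kept) => duplicates
                .entry(kept.clone())
                .or_default()
                .push(record.header),
            None => {
                first_seen.insert(key, record.header.clone());
                unique.push(record);
            }
        }
    }
    (unique, duplicates)
}
```

In this sketch, calling dedupe(parse_fasta(text), false) mirrors the default mode, and passing true mirrors --by-header. Printing each unique record as a '>' header line followed by its sequence reproduces the FASTA written to stdout, while the duplicates map stands in for the JSON report.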
