NAME

fu-grep - Extract sequences using patterns from FASTA/FASTQ files

VERSION

version 1.7.0

SYNOPSIS

fu-grep [options] Pattern InputFile.fa [...]

DESCRIPTION

fu-grep is a versatile tool for searching and extracting sequences from FASTA/FASTQ files based on various criteria. It can search for patterns in:

  • DNA sequences (including reverse complement)

  • Sequence names

  • Sequence comments

The tool supports both stranded and unstranded searches, and can provide detailed annotations about the matches found.

OPTIONS

-a, --annotate

Add comments to the sequence when a match is found. The annotation includes:

  • Total number of matches

  • Number of forward matches

  • Number of reverse complement matches (unless --stranded is used)

  • Source filename (when processing multiple files)

-n, --name

Search for the pattern in the sequence name instead of the sequence itself

-c, --comments

Search for the pattern in the sequence comments instead of the sequence itself

-s, --stranded

Do not search for the reverse complement of the pattern

-f, --fasta

Force output in FASTA format, even for FASTQ input

--cs, --comment-separator STR

Specify custom comment separator (default: tab)

-v, --verbose

Print verbose output

-d, --debug

Print debug information

--version

Print version information and exit

EXAMPLES

Search for a specific DNA pattern:

fu-grep AAGCTT input.fa > matched.fa

Search in multiple files with annotation:

fu-grep -a AAGCTT sample1.fa sample2.fa > matches.fa

Search in sequence names:

fu-grep -n "gene" sequences.fa > named.fa

Process FASTQ file but output in FASTA format:

fu-grep -f AAGCTT input.fastq > output.fa
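
The FASTQ-to-FASTA conversion performed by -f can be pictured with the minimal Python sketch below. It is an illustration only, not fu-grep's implementation: the function name fastq_to_fasta is hypothetical, and the sketch assumes strict four-line FASTQ records (header, sequence, "+", quality) with no line wrapping.

```python
from typing import Iterable, Iterator


def fastq_to_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA lines from four-line FASTQ records (hypothetical helper)."""
    record = []
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            header, sequence, _plus, _quality = record
            if not header.startswith("@"):
                raise ValueError(f"Malformed FASTQ header: {header!r}")
            # Swap the '@' marker for '>' and drop the '+' and quality lines.
            yield ">" + header[1:]
            yield sequence
            record = []
```

The quality line is discarded, so the conversion cannot be reversed.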

NOTES

The tool will automatically search for both forward and reverse complement sequences unless the --stranded option is used.
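
The difference between the default and --stranded searches can be sketched in Python as follows. This is a minimal illustration, not fu-grep's actual code: the function names are hypothetical, the pattern is treated as a literal uppercase A/C/G/T string, and overlapping matches are counted, all of which are assumptions.

```python
# Illustrative sketch only; not fu-grep's implementation.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]


def count_overlapping(seq: str, pattern: str) -> int:
    """Count occurrences of pattern in seq, allowing overlaps."""
    count, start = 0, seq.find(pattern)
    while start != -1:
        count += 1
        start = seq.find(pattern, start + 1)
    return count


def count_matches(seq: str, pattern: str, stranded: bool = False) -> dict:
    """Count forward and, unless stranded, reverse-complement matches."""
    forward = count_overlapping(seq, pattern)
    reverse = 0 if stranded else count_overlapping(seq, reverse_complement(pattern))
    return {"total": forward + reverse, "forward": forward, "reverse": reverse}
```

For example, count_matches("ACGTTTGCAAAC", "GTTT") returns one forward and one reverse match, while passing stranded=True reports the forward match only. Note that AAGCTT, the pattern used in the examples above, is its own reverse complement, so in this sketch each occurrence would be counted on both strands.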

MODERN ALTERNATIVE

This suite of tools has been superseded by SeqFu, a compiled program providing faster and safer tools for sequence analysis. This suite is still maintained because Perl scripts can be more portable in some environments.

SeqFu is available at https://github.com/telatin/seqfu2 and can be installed with BioConda:

conda install -c bioconda seqfu

CITING

Telatin A, Fariselli P, Birolo G. SeqFu: A Suite of Utilities for the Robust and Reproducible Manipulation of Sequence Files. Bioengineering 2021, 8, 59. https://doi.org/10.3390/bioengineering8050059

AUTHOR

Andrea Telatin <andrea@telatin.com>

COPYRIGHT AND LICENSE

This software is Copyright (c) 2018-2027 by Quadram Institute Bioscience.

This is free software, licensed under:

The MIT (X11) License