NAME

fu-len - Filter FASTA/FASTQ files by sequence length

VERSION

version 1.7.0

SYNOPSIS

fu-len [options] FILE1 [FILE2 ...]

DESCRIPTION

fu-len filters the sequences in FASTA/FASTQ files by length. It can also reformat sequences (for example, converting FASTQ to FASTA or wrapping long sequence lines) and rename or annotate sequence headers. Both FASTA and FASTQ input are supported, including gzipped files, and '-' can be given as a filename to read from standard input.
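
The input handling described above can be illustrated with a short Python sketch. It is not fu-len's Perl implementation. The helper names `open_input` and `read_records` are hypothetical, gzip input is assumed to be recognised by a `.gz` suffix, and FASTQ records are assumed to occupy exactly four non-empty lines.

```python
import gzip
import io
import sys
from typing import Iterator, Optional, TextIO, Tuple

# (header without the leading '>' or '@', sequence, quality string or None for FASTA)
Record = Tuple[str, str, Optional[str]]


def open_input(path: str) -> TextIO:
    """Open an input file as text; '-' means standard input (caller must not close it)."""
    if path == "-":
        return sys.stdin
    if path.endswith(".gz"):  # assumption: gzip is detected by the file suffix
        return gzip.open(path, "rt")
    return open(path, "r")


def read_records(handle: TextIO) -> Iterator[Record]:
    """Yield records from a FASTA (multi-line allowed) or FASTQ (4-line) stream."""
    stripped = (line.rstrip("\r\n") for line in handle)
    lines = (line for line in stripped if line)  # ignore blank lines
    first = next(lines, None)
    if first is None:
        return  # empty input
    if first.startswith(">"):
        header, parts = first[1:], []
        for line in lines:
            if line.startswith(">"):
                yield header, "".join(parts), None
                header, parts = line[1:], []
            else:
                parts.append(line)
        yield header, "".join(parts), None
    elif first.startswith("@"):
        header = first
        while header is not None:
            try:
                seq = next(lines)
                next(lines)  # the '+' separator line
                qual = next(lines)
            except StopIteration:
                raise ValueError(f"truncated FASTQ record: {header}") from None
            yield header[1:], seq, qual
            header = next(lines, None)
    else:
        raise ValueError("input does not look like FASTA or FASTQ")


if __name__ == "__main__":
    demo = io.StringIO(">ctg1 sample=A\nACGT\nAC\n>ctg2\nGG\n")
    for name, seq, qual in read_records(demo):
        print(name, len(seq))
```

Detecting the format from the first character of the stream, rather than trusting the file extension, keeps the sketch usable on standard input, where no filename is available.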

OPTIONS

Input/Output Control

-m, --min INT: Minimum length to keep a sequence. Sequences shorter than this will be filtered out.
-x, --max INT: Maximum length to keep a sequence. Sequences longer than this will be filtered out.
-f, --fasta: Force output in FASTA format, regardless of input format.
-w, --fasta-width INT: Wrap FASTA sequence lines to the specified width. If not specified, sequences will be written as single lines.
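
A rough Python sketch of how the length bounds, --fasta and --fasta-width interact. The function names `keep` and `format_record` are hypothetical. Both bounds are inclusive, matching the wording above (only sequences strictly shorter than --min or strictly longer than --max are dropped), and wrapping is applied only to FASTA output.

```python
from typing import Optional


def keep(seq: str, min_len: Optional[int] = None, max_len: Optional[int] = None) -> bool:
    """True if min_len <= len(seq) <= max_len; a bound of None is not applied."""
    if min_len is not None and len(seq) < min_len:
        return False  # shorter than --min: filtered out
    if max_len is not None and len(seq) > max_len:
        return False  # longer than --max: filtered out
    return True


def format_record(header: str, seq: str, qual: Optional[str],
                  force_fasta: bool = False, width: Optional[int] = None) -> str:
    """Render one record; FASTQ stays FASTQ unless force_fasta (like --fasta) is set."""
    if qual is not None and not force_fasta:
        return f"@{header}\n{seq}\n+\n{qual}\n"
    if width:  # like --fasta-width: split the sequence into lines of `width` characters
        body = "\n".join(seq[i:i + width] for i in range(0, len(seq), width))
    else:  # no width given: the whole sequence on one line
        body = seq
    return f">{header}\n{body}\n"


if __name__ == "__main__":
    # Roughly what `fu-len -m 4 -f -w 4` would do to a single FASTQ read.
    header, seq, qual = "read1", "ACGTACGTAC", "IIIIIIIIII"
    if keep(seq, min_len=4):
        print(format_record(header, seq, qual, force_fasta=True, width=4), end="")
```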

Sequence Naming

-n, --namescheme STR

Choose how sequence names should be generated. Available schemes:

raw - Use original sequence names (default)
num - Number sequences sequentially (see --prefix)
file - Use input filename as prefix followed by sequence number

-p, --prefix STR

Prefix to use for sequence names when using the 'num' name scheme.

-s, --separator STR

Separator to use between prefix and number (default: '.').
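
The three naming schemes can be sketched as a single Python function. This is an illustration under assumptions, not fu-len's code: the name `new_name` is hypothetical, the caller supplies the running number (so whether numbering restarts for each input file is left open), the 'file' scheme here strips the directory and all extensions from the filename, and an empty prefix yields a bare number.

```python
import os


def new_name(scheme: str, original: str, index: int, prefix: str = "",
             separator: str = ".", filename: str = "-") -> str:
    """Return the output name for one sequence under a given --namescheme."""
    if scheme == "raw":
        return original  # keep the name as read
    if scheme == "num":
        base = prefix  # value of --prefix
    elif scheme == "file":
        # Assumption: drop the directory and every extension ("reads/s1.fq.gz" -> "s1").
        base = os.path.basename(filename).split(".")[0]
    else:
        raise ValueError(f"unknown name scheme: {scheme}")
    # Assumption: with an empty base, emit only the number.
    return f"{base}{separator}{index}" if base else str(index)


if __name__ == "__main__":
    for i, old in enumerate(["contig_A", "contig_B"], start=1):
        print(new_name("num", old, i, prefix="seq"))  # seq.1, then seq.2
```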

Sequence Annotation

-l, --len: Add sequence length as a comment to each sequence header.
-c, --strip-comment: Remove existing sequence comments.
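
The effect of --len and --strip-comment on a header can be sketched as follows, assuming the usual FASTA/FASTQ convention that the name ends at the first whitespace and the rest of the header is the comment. The function name `build_header` and the `len=` tag are hypothetical choices for illustration; check fu-len's actual output before relying on the format.

```python
def build_header(header: str, seq_len: int, add_len: bool = False,
                 strip_comment: bool = False) -> str:
    """Rebuild a header as 'name [comment]' after applying -c and -l."""
    # The name is everything before the first whitespace; the rest is the comment.
    parts = header.split(maxsplit=1)
    name = parts[0] if parts else ""
    comment = parts[1] if len(parts) > 1 else ""
    if strip_comment:  # like --strip-comment
        comment = ""
    if add_len:  # like --len; the "len=" format is hypothetical
        tag = f"len={seq_len}"
        comment = f"{comment} {tag}" if comment else tag
    return f"{name} {comment}" if comment else name


if __name__ == "__main__":
    print(build_header("read1 sample=A", 150, add_len=True, strip_comment=True))
```

Stripping before appending means that combining -c and -l leaves only the length annotation in the comment, which seems the more useful order; the real tool may behave differently.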

Other Options

-v, --verbose: Print verbose information to STDERR.
--version: Print version information and exit.

EXAMPLES

Filter sequences by length:

# Keep sequences between 100 and 1000 bp
fu-len -m 100 -x 1000 input.fa > filtered.fa

Convert FASTQ to wrapped FASTA:

# Convert to FASTA and wrap to 60 characters per line
fu-len -f -w 60 input.fastq > output.fa

Number sequences with custom prefix:

# Add sequential numbers and length information
fu-len -n num -p 'seq' -l input.fa > numbered.fa

Process multiple files:

# Keep sequences of at least 500 bp from both files and write them as FASTA
fu-len -m 500 -f file1.fq file2.fa > combined.fa

NOTES

When processing multiple files, be aware that:

Duplicate sequence names can cause errors
Mixing FASTA and FASTQ files without --fasta may cause formatting issues
Memory usage increases when checking for duplicate names
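
The memory note follows from how duplicate names are typically detected: every name already written must be remembered. A minimal Python sketch, assuming the check raises an error on the first repeated name (fu-len's exact behaviour is not documented here; `unique_names` is a hypothetical helper):

```python
from typing import Iterable, Iterator


def unique_names(names: Iterable[str]) -> Iterator[str]:
    """Pass names through unchanged, raising ValueError on the first duplicate."""
    seen = set()  # grows by one entry per sequence written: the memory cost noted above
    for name in names:
        if name in seen:
            raise ValueError(f"duplicate sequence name: {name}")
        seen.add(name)
        yield name


if __name__ == "__main__":
    file1 = ["read1", "read2"]
    file2 = ["read3", "read1"]  # "read1" also occurs in the first file
    try:
        for name in unique_names(file1 + file2):
            print(name)
    except ValueError as err:
        print("error:", err)
```

Because the set holds every name until the run ends, memory use scales with the number of sequences and the length of their names, not with sequence length.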

MODERN ALTERNATIVE

This suite of tools has been superseded by SeqFu, a compiled program that provides faster and safer tools for sequence analysis. This suite is still maintained because Perl scripts can be more portable in certain environments.

SeqFu is available at https://github.com/telatin/seqfu2 and can be installed from BioConda:

conda install -c bioconda seqfu

CITING

Telatin A, Fariselli P, Birolo G. SeqFu: A Suite of Utilities for the Robust and Reproducible Manipulation of Sequence Files. Bioengineering 2021, 8, 59. https://doi.org/10.3390/bioengineering8050059

AUTHOR

Andrea Telatin <andrea@telatin.com>

COPYRIGHT AND LICENSE

This is free software, licensed under:

The MIT (X11) License

To install Proch::N50, copy and paste the appropriate command into your terminal.

cpanm

cpanm Proch::N50

CPAN shell

perl -MCPAN -e shell
install Proch::N50

For more information on module installation, please visit the detailed CPAN module installation guide.
