NAME

Seq::Parse - The Bioperl ReadSeq interface

SYNOPSIS

Simple Perl interface/wrapper to D.G. Gilbert's ReadSeq program. Originally used by Seq.pm when its internal parsing/formatting code failed.

**NOTE** Not currently used by any of the core bioperl modules. It can be used as a standalone interface to the readseq package, but manual editing of Parse.pm is required. See the first few lines of the .pm file for details.

DESCRIPTION

This package was called upon by Seq.pm when internal attempts to format or parse a sequence failed. It is currently not used by any bioperl core module, because we decided to deal with sequence formatting in a different way.

Parse.pm is a simple interface to D.G. Gilbert's ReadSeq program; it is not meant to be particularly elegant or efficient. The interface should be abstract enough to allow future versions to seamlessly access other sequence conversion programs besides ReadSeq.

At this time the interface methods have not been fully thought out or implemented. Suggestions are welcome.

If ReadSeq is not on the local system, or this package is not properly configured, Seq.pm will (hopefully) realize this and not attempt to use this code.

USAGE

The ReadSeq executable needs to be installed on your system.

Readseq is freely distributed and is available in shell archive (.shar) form via FTP from ftp.bio.indiana.edu (129.79.224.25) in the molbio/readseq directory. (URL) ftp://ftp.bio.indiana.edu/molbio/readseq/

Standalone

use Parse;

With Seq.pm

If properly configured, Seq.pm will automatically use this module when its internal attempts at parsing or formatting fail.

The correct path to the readseq executable is configured into this module during the 'perl Makefile.PL' phase of installation.

Manual edits needed in Parse.pm if auto-configuration does not happen:

- Change the value of $READSEQ_PATH so that it defines a path to the ReadSeq executable on your system.

- Uncomment the line(s) containing $OK = "Y"
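
As a rough illustration, the two manual edits above might end up looking like the sketch below. The path shown is only a placeholder, and the executable check with its "N" fallback is an assumption added for illustration; it is not known to be part of Parse.pm.

```perl
use strict;
use warnings;

# Placeholder path (assumption): point this at your local readseq binary.
our $READSEQ_PATH = '/usr/local/bin/readseq';
our $OK           = "Y";

# Illustrative sanity check, not taken from Parse.pm: disable the
# interface again if the configured path is not an executable file.
$OK = "N" unless -x $READSEQ_PATH;

print "ReadSeq support enabled: $OK\n";
```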

As a standalone module

Parse.pm should be usable as a standalone module. See the usage instructions.

Sequence Conversion/Formatting

ReadSeq has trouble with raw sequences, so an explicit convert_from_raw() method has been written. The following code will return the sequence "GAATTCGTT" as a GCG-formatted string.

$reply  = &Parse::convert_from_raw(-sequence=>'GAATTCGTT',
                                   -fmt=>'gcg'); 

The "fmt" named-parameter field can be set for the following formats:

IG        (or 'Stanford')
GenBank   (or 'GB')
NBRF
EMBL
GCG
Strider
Fitch
Fasta
Zuker
Phylip3.2 (use 'Phylip3')
Phylip
Plain     (or 'Raw')
PIR       (or 'CODATA')
MSF
ASN.1     (use 'ASN1')
PAUP
Pretty

The "options" named-parameter field can be used to pass switches directly to the ReadSeq executable. This option should only be used by people familiar with operating ReadSeq on the command-line. Use at your own risk as this has not been fully tested.

As an example, the ReadSeq switch '-c' will cause all of the characters in the formatted sequence to be returned in lowercase.

$reply  = &Parse::convert_from_raw(-sequence=>"$seq_string",
                                   -options=>'-c', 
                                   -fmt=>'gcg'); 

Appendix

The following documentation describes the various functions contained in this package. Some functions are for internal use and are not meant to be called by the user; they are preceded by an underscore ("_").

## Internal methods ##

_rearrange()

Title     : _rearrange
Usage     : n/a (internal function)
Function  : Rearranges named parameters to requested order.
Example   : &_rearrange([SEQUENCE,ID,DESC],@p);
Returns   : @params - an array of parameters in the requested order.
Argument  : $order : a reference to an array which describes the desired
                     order of the named parameters.
            @param : an array of parameters, either as a list (in
                     which case the function simply returns the list),
                     or as an associative array (in which case the
                     function sorts the values according to @{$order}
                     and returns that new array).
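
The behaviour described above can be sketched roughly as follows. This is not the module's own code: rearrange_sketch() is a hypothetical stand-in, and the rule used to tell the two calling styles apart (a leading "-" on the first argument) is an assumption.

```perl
use strict;
use warnings;

# Hypothetical stand-in for _rearrange(); not the module's actual code.
sub rearrange_sketch {
    my ($order, @param) = @_;

    # Positional call: the first argument does not look like "-name",
    # so the list is handed back unchanged.
    return @param unless @param && defined $param[0] && $param[0] =~ /^-/;

    # Named call: normalise each key ("-fmt" becomes "FMT") and index
    # the values by that key.
    my %named;
    while (@param) {
        my $key   = shift @param;
        my $value = shift @param;
        $key =~ s/^-//;
        $named{ uc $key } = $value;
    }

    # Return the values in the requested order; missing names give undef.
    return map { $named{ uc $_ } } @$order;
}

my @args = rearrange_sketch(
    [qw(SEQUENCE FMT OPTIONS)],
    -fmt      => 'gcg',
    -sequence => 'GAATTCGTT',
);
print join(',', map { defined $_ ? $_ : 'undef' } @args), "\n";
```

With this approach callers may pass named parameters in any order, while the function body can still unpack them positionally.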

_write_tmp_file()

Title     : _write_tmp_file
Usage     : n/a (internal function)
Function  : Writes a temporary file to disk. Uses
          : the POSIX tmpnam() call to get path &
          : filename. Should be more portable than
          : just writing to /tmp. 
          :
Example   : &_write_tmp_file("$formatted_sequence");
Returns   : string containing the temp file path 
Argument  : string that is to be written to disk
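
For readers on newer Perls, note that POSIX::tmpnam() was removed in Perl 5.26. The sketch below shows the same idea using the core File::Temp module instead, which is a swapped-in technique rather than what this package does. The function name write_tmp_file_sketch() and the 'parseXXXXXX' template are assumptions made for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical stand-in for _write_tmp_file(); uses File::Temp, not tmpnam().
sub write_tmp_file_sketch {
    my ($content) = @_;

    # tempfile() creates and opens the file in one step inside the
    # system temporary directory. UNLINK => 0 leaves cleanup to the caller.
    my ($fh, $path) = tempfile('parseXXXXXX', TMPDIR => 1, UNLINK => 0);
    print {$fh} $content;
    close $fh or die "Cannot close $path: $!";

    return $path;
}

my $path = write_tmp_file_sketch(">seq1\nGAATTCGTT\n");
print "Wrote temporary file: $path\n";
unlink $path;    # the caller removes the file when done
```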

## Public methods ##

version()

Title     : version
Usage     : &Parse::version;
Function  : Prints current package version 
Example   : &Parse::version;
Returns   : none
Argument  : none
          :

convert_from_raw()

Title     : convert_from_raw()
Usage     : print &Parse::convert_from_raw(-sequence=>$raw_seq,
          :                                -fmt=>'asn1');
          :
          : $reply  = &Parse::convert_from_raw(-sequence=>'GAATTCGTT',
          :                                    -options=>'-c',
          :                                    -fmt=>'gcg'); 
          :
Function  : ReadSeq does not function well when called upon 
          : to read or convert "raw" or unformatted sequence 
          : strings or files. This code will take a given 
          : raw sequence and manipulate it into FASTA
          : format before invoking ReadSeq.
          :
          : The following named parameters may be used as
          : arguments:
          :
          :  -sequence=>     Sequence string.
          :  -fmt=>          Format sequence will be converted to. 
          :  -options=>      String containing command-line
          :                  switches for ReadSeq. Passed
          :                  directly.
          :
Example   : see usage
Returns   : Formatted sequence string 
Argument  : named parameters, see function
          :
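
The raw-to-FASTA step described above might look something like the following sketch. The helper name raw_to_fasta_sketch(), the default identifier, the stripping of whitespace and digits, and the 60-column line width are all assumptions; the module's actual handling may differ.

```perl
use strict;
use warnings;

# Hypothetical helper; header text and line width are assumptions.
sub raw_to_fasta_sketch {
    my ($raw, $id) = @_;
    $id = 'unknown' unless defined $id;

    # Remove whitespace and digits that often accompany pasted sequences.
    (my $seq = $raw) =~ s/[\s\d]+//g;

    # Break the residues into lines of at most 60 characters.
    my @lines = $seq =~ /(.{1,60})/g;

    # A FASTA record is a ">" header line followed by the sequence lines.
    return join("\n", ">$id", @lines) . "\n";
}

print raw_to_fasta_sketch('GAATTCGTT', 'seq1');
```

The resulting string could then be written to a temporary file and handed to ReadSeq as ordinary FASTA input.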

convert()

Title     : convert
          :
Usage     : print &Parse::convert(-sequence=>$raw_seq,
          :                       -fmt=>'asn1');
          :
          : $reply  = &Parse::convert(-sequence=>'GAATTCGTT',
          :                           -options=>'-c',
          :                           -fmt=>'gcg'); 
          :
          : $reply  = &Parse::convert(-location=>'/tmp/a.seq',
          :                           -fmt=>'raw'); 
          :
Note      : ReadSeq does not function well when called upon 
          : to read or convert "raw" or unformatted sequence 
          : strings or files. User beware.
          : 
Function  : Will read/parse a given sequence string *OR* a given
          : sequence file.
          :
          : If a sequence string AND a sequence file path are
          : both passed in, the file path will be used with no
          : complaint.
          :
          : The following named parameters may be used as
          : arguments:
          : 
          :  -sequence=>     Sequence string.
          :  -location=>     Sequence file path.
          :  -fmt=>          Format sequence will be converted to. 
          :  -options=>      String containing command-line
          :                  switches for ReadSeq. Passed
          :                  directly.
          :
Example   : see usage
Returns   : Formatted sequence string 
Argument  : named parameters, see function
          :
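
The precedence rule described above (a file path wins over a sequence string) can be illustrated with a small sketch. pick_input_sketch() is a hypothetical name, and dying when neither parameter is supplied is an assumption, since the documentation does not say what convert() does in that case.

```perl
use strict;
use warnings;

# Hypothetical illustration of the input-selection rule for convert().
sub pick_input_sketch {
    my (%arg) = @_;

    # A file path takes priority over a sequence string, silently.
    return (file   => $arg{-location}) if defined $arg{-location};
    return (string => $arg{-sequence}) if defined $arg{-sequence};

    # Assumption: treat a call with neither parameter as an error.
    die "Need -sequence or -location\n";
}

my ($kind, $value) = pick_input_sketch(
    -sequence => 'GAATTCGTT',
    -location => '/tmp/a.seq',
);
print "Using $kind input: $value\n";
```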

ACKNOWLEDGEMENTS

SEE ALSO

Core bioperl modules

REFERENCES

Bioperl Project http://bio.perl.org

COPYRIGHT

Copyright (c) 1997-1998 Chris Dagdigian, Georg Fuellen, Steven E. Brenner and others. All Rights Reserved. This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.