<!DOCTYPE html>
<html>
    <head>
        <meta charset="utf-8">
        <meta name="viewport" content="width=device-width, initial-scale=1">

        <title>HIV Sequence Locator API</title>
        <link rel="stylesheet" href="//netdna.bootstrapcdn.com/bootstrap/3.1.1/css/bootstrap.min.css">
        <link rel="stylesheet" href="//netdna.bootstrapcdn.com/bootstrap/3.1.1/css/bootstrap-theme.min.css">
        <style type="text/css">
            code {
                color: inherit;
            }

            .form-inline .form-group {
                vertical-align: top;
            }

            .form-inline .btn {
                margin-left: 1em;
            }
        </style>
    </head>
    <body>
    <div class="container-fluid">
        <div class="row">
            <div class="col-md-6">
                <h1>HIV Sequence Locator API</h1>
                <p>
                    A dead-simple web API for
                    <a href="http://www.hiv.lanl.gov/content/sequence/LOCATE/locate.html">LANL's HIV sequence locator</a>
                    providing results in JSON.  Positioning, region, and
                    protein information is all available.  Most of the data
                    presented in the human-readable HTML page is extracted via
                    this API.  Get in touch if you need something that's
                    missing!
                </p>

                <h2>Endpoint</h2>
                <h3><code>POST .../within/hiv</code></h3>
                <p>
                    Requires one or more values for the POST parameter
                    <code>sequence</code> <em>or</em> a single-valued
                    <code>fasta</code> parameter as a URL-encoded string or
                    file upload of <a href="https://en.wikipedia.org/wiki/FASTA">FASTA</a>-formatted
                    sequences.
                </p>
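                <p>
                    As a rough sketch of the first form, repeated
                    <code>sequence</code> values can be encoded with the
                    Python 3 standard library.  The helper name
                    <code>encode_sequences</code> is ours for illustration
                    and is not part of the API.
                </p>
<pre>
from urllib.parse import urlencode

def encode_sequences(sequences):
    """Build a form body with one sequence=... pair per input sequence."""
    pairs = [("sequence", seq) for seq in sequences]
    return urlencode(pairs).encode("ascii")

body = encode_sequences([
    "SLYNTVAVLYYVHQR",
    "TCATTATATAATACAGTAGCAACCCTCTATTGTGTGCATCAAAGG",
])
# Pass body as the data argument of urllib.request.Request to send a POST.
print(body.decode("ascii"))
</pre>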
                <p>
                    Both protein and nucleotide
                    sequences are accepted, although the data returned varies
                    by type because LANL reports different fields for each.
                    See the
                    <a href="#curl">curl example</a> which queries a protein
                    sequence and the same sequence as nucleotides.  If you use
                    LANL's tool directly, it also tries the reverse complement
                    of each sequence and picks the best match.  In the
                    interests of reliability and consistency, this API tells
                    LANL <b>not</b> to reverse complement sequences.  If a
                    sequence may be in the reverse orientation, reverse
                    complement it yourself before submitting.
                </p>
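                <p>
                    One possible way to reverse complement a nucleotide
                    sequence on the client side is sketched below in Python 3.
                    The helper <code>reverse_complement</code> is hypothetical
                    and not part of this API; it maps the IUPAC ambiguity
                    codes as well as the four plain bases.
                </p>
<pre>
# IUPAC nucleotide codes and their complements, in matching order.
COMPLEMENT = str.maketrans(
    "ACGTURYSWKMBDHVNacgturyswkmbdhvn",
    "TGCAAYRSWMKVHDBNtgcaayrswmkvhdbn",
)

def reverse_complement(seq):
    """Return the reverse complement of a nucleotide sequence."""
    return seq.translate(COMPLEMENT)[::-1]

print(reverse_complement("ATGC"))  # prints GCAT
</pre>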
                <p>
                    Optionally accepts a (<em>highly recommended</em>) <code>base</code>
                    parameter set to <code>nucleotide</code> or <code>amino
                    acid</code> which forces all sequences to be interpreted as
                    the given base type.  This is necessary when submitting
                    sequences with an ambiguous base type due to the overlap in
                    IUPAC alphabets.  In such cases, LANL seems to assume
                    nucleotides, potentially producing incorrect results.  For
                    example, the amino acid sequence <code>MGGDMKDNW</code> is
                    also a valid nucleotide sequence, albeit one with many ambiguous
                    bases.  Interpreting it as nucleotides, however, is
                    incorrect.  It is not uncommon for short amino acid
                    peptides to exhibit this property.
                </p>
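                <p>
                    A quick client-side check can flag sequences like this
                    before you submit them.  The sketch below (Python 3) is
                    only an illustration, and
                    <code>could_be_nucleotides</code> is a name we made up: it
                    reports whether every letter is also an IUPAC nucleotide
                    code, in which case <code>base</code> should be set
                    explicitly.
                </p>
<pre>
# The IUPAC nucleotide alphabet, including ambiguity codes.
IUPAC_NUCLEOTIDES = set("ACGTURYSWKMBDHVN")

def could_be_nucleotides(seq):
    """True if seq is non-empty and made up only of IUPAC nucleotide codes."""
    return bool(seq) and set(seq.upper()).issubset(IUPAC_NUCLEOTIDES)

print(could_be_nucleotides("MGGDMKDNW"))        # prints True
print(could_be_nucleotides("SLYNTVAVLYYVHQR"))  # prints False
</pre>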
                <p>
                    On success (HTTP 200) the response body is a JSON array
                    of objects, one per sequence.  Both HTTP 4xx and 5xx
                    status codes are used on failure with plain text bodies
                    containing an error message.
                </p>
                <p>
                    The <code>format</code> parameter may be set to
                    <code>csv</code> to return comma-separated values partially
                    representing the full results.  <code>format</code> may
                    also be explicitly set to <code>json</code>, though there is
                    no need to, as JSON is the default and will remain so.
                </p>
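                <p>
                    For illustration, a CSV request body could be built and
                    the response parsed as below (Python 3).  Both helper
                    names are ours, and the sketch assumes that the first row
                    of the CSV is a header; it does not assume any particular
                    column names.
                </p>
<pre>
import csv
import io
from urllib.parse import urlencode

def encode_csv_request(sequences, base):
    """Form body asking for CSV output instead of the default JSON."""
    pairs = [("sequence", seq) for seq in sequences]
    pairs += [("base", base), ("format", "csv")]
    return urlencode(pairs).encode("ascii")

def parse_csv(text):
    """Parse a CSV body into dicts keyed by the (assumed) header row."""
    return list(csv.DictReader(io.StringIO(text)))
</pre>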
                <table class="table table-condensed">
                    <thead>
                        <tr>
                            <th>HTTP Status</th>
                            <th>Reason</th>
                        </tr>
                    </thead>
                    <tbody>
                        <tr>
                            <td>405 Method Not Allowed</td>
                            <td>The request did not use the HTTP POST method</td>
                        </tr>
                        <tr>
                            <td>415 Unsupported Media Type</td>
                            <td>The provided <code>fasta</code> parameter appears to be in the wrong format</td>
                        </tr>
                        <tr>
                            <td>422 Unprocessable Entity</td>
                            <td>No <code>sequence</code> or <code>fasta</code> parameter was provided, or the parameter did not contain any sequences</td>
                        </tr>
                        <tr>
                            <td>503 Service Unavailable</td>
                            <td>An unexpected condition occurred while parsing results from LANL</td>
                        </tr>
                        <tr>
                            <td>500 Internal Server Error</td>
                            <td>An unexpected error occurred while processing your request</td>
                        </tr>
                    </tbody>
                </table>
                <p>
                    The API tries not to return incorrect data from misparses
                    of LANL's output.  If it detects an anomaly in any of its
                    parsing stages, it will abort the request and return an
                    HTTP 503 Service Unavailable.  If this happens to your
                    request, or if you are receiving results you don't expect,
                    please <a href="mailto:mullspt+cfar@uw.edu">let us know</a>!
                </p>
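                <p>
                    Because failures carry a plain text message, a client can
                    surface that message alongside the status code.  The
                    following is a minimal Python 3 sketch, not a reference
                    client: <code>LocatorError</code>, <code>post_form</code>,
                    and the <code>opener</code> hook (there so the function
                    can be exercised without a network) are our own
                    inventions.
                </p>
<pre>
import json
from urllib.error import HTTPError
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://indra.mullins.microbiol.washington.edu/locate-sequence/within/hiv"

class LocatorError(Exception):
    """Raised when the API answers with a 4xx or 5xx status."""
    def __init__(self, status, message):
        super().__init__("HTTP {}: {}".format(status, message))
        self.status = status
        self.message = message

def post_form(fields, opener=urlopen):
    """POST (name, value) pairs and return the decoded JSON array."""
    request = Request(API_URL, data=urlencode(fields).encode("ascii"))
    try:
        with opener(request) as response:
            return json.loads(response.read().decode("utf-8"))
    except HTTPError as err:
        # Failure bodies are plain text containing an error message.
        text = err.read().decode("utf-8", "replace").strip()
        raise LocatorError(err.code, text) from None

if __name__ == "__main__":
    try:
        print(post_form([("sequence", "MGGDMKDNW"), ("base", "amino acid")]))
    except LocatorError as err:
        print("Request rejected:", err)
</pre>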

                <h2>Quick lookup</h2>
                <p>
                    Submit sequences as a FASTA file and download the location
                    results as a CSV file.  Note that the CSV does not contain
                    all of the information the API can provide since CSV does
                    not have standard support for nested or multi-valued data
                    structures.  This form uses the API described above.
                </p>
                <form action="within/hiv" method="POST" enctype="multipart/form-data" role="form" class="form-inline">
                    <div class="form-group">
                        <label for="fasta-upload">FASTA file</label>
                        <input type="file" id="fasta-upload" name="fasta" style="width: 15em">
                    </div>
                    <div class="form-group">
                        <label><input type="radio" name="base" value="nuc"> Nucleotides</label><br>
                        <label><input type="radio" name="base" value="aa"> Amino acids</label>
                    </div>
                    <input type="hidden" name="format" value="csv">
                    <button type="submit" class="btn btn-default">Submit</button>
                </form>

                <hr>
                <p>
                    Created by Thomas Sibley of the
                    <a href="http://mullinslab.microbiol.washington.edu">Mullins Lab</a>
                    at the University of Washington, Department of Microbiology.
                </p>
                <p>
                    Questions? <a href="mailto:mullspt+cfar@uw.edu">Drop us a line</a>.
                </p>
                <p>
                    <a href="https://github.com/MullinsLab/Bio-WebService-LANL-SequenceLocator">Source code</a>
                </p>
            </div>
            <div class="col-md-6">
                <h2>Examples</h2>
<a name="curl"></a>
<h3>curl</h3>
<pre>
curl -X POST https://indra.mullins.microbiol.washington.edu/locate-sequence/within/hiv \
     --data sequence=SLYNTVAVLYYVHQR \
     --data sequence=TCATTATATAATACAGTAGCAACCCTCTATTGTGTGCATCAAAGG
</pre>
<pre class="pre-scrollable">
[
   {
      "query" : "sequence_1",
      "query_sequence" : "SLYNTVAVLYYVHQR",
      "base_type" : "amino acid",
      "reverse_complement" : "0",
      "alignment" : "\n Query SLYNTVAVLY YVHQR  15\n       :::::::.::  ::::    \n  HXB2 SLYNTVATLY CVHQR\n\n  ",
      "hxb2_sequence" : "SLYNTVATLYCVHQR",
      "similarity_to_hxb2" : "86.7",
      "start" : "77",
      "end" : "91",
      "genome_start" : "1018",
      "genome_end" : "1062",
      "polyprotein" : "Gag",
      "region_names" : [
         "Gag",
         "p17"
      ],
      "regions" : [
         {
            "cds" : "Gag",
            "aa_from_cds_start" : [
               "229",
               "273"
            ],
            "aa_from_polyprotein_start" : null,
            "aa_from_protein_start" : [
               "77",
               "91"
            ],
            "aa_from_query_start" : [
               "1",
               "15"
            ],
            "na_from_hxb2_start" : [
               "1018",
               "1062"
            ]
         },
         {
            "cds" : "p17",
            "aa_from_cds_start" : [
               "229",
               "273"
            ],
            "aa_from_polyprotein_start" : null,
            "aa_from_protein_start" : [
               "77",
               "91"
            ],
            "aa_from_query_start" : [
               "1",
               "15"
            ],
            "na_from_hxb2_start" : [
               "1018",
               "1062"
            ]
         }
      ]
   },
   {
      "query" : "sequence_2",
      "query_sequence" : "TCATTATATAATACAGTAGCAACCCTCTATTGTGTGCATCAAAGG",
      "base_type" : "nucleotide",
      "reverse_complement" : "0",
      "alignment" : "\n Query TCATTATATA ATACAGTAGC AACCCTCTAT TGTGTGCATC AAAGG  45\n       :::::::::: :::::::::: :::::::::: :::::::::: ::::: \n  HXB2 TCATTATATA ATACAGTAGC AACCCTCTAT TGTGTGCATC AAAGG  1062\n\n  ",
      "hxb2_sequence" : "TCATTATATAATACAGTAGCAACCCTCTATTGTGTGCATCAAAGG",
      "similarity_to_hxb2" : "100.0",
      "start" : "229",
      "end" : "273",
      "genome_start" : "1018",
      "genome_end" : "1062",
      "polyprotein" : "Gag",
      "region_names" : [
         "Gag",
         "p17"
      ],
      "regions" : [
         {
            "cds" : "Gag",
            "aa_from_protein_start" : [
               "77",
               "91"
            ],
            "na_from_cds_start" : [
               "229",
               "273"
            ],
            "na_from_hxb2_start" : [
               "1018",
               "1062"
            ],
            "na_from_query_start" : [
               "1",
               "45"
            ],
            "protein_translation" : "SLYNTVATLYCVHQR"
         },
         {
            "cds" : "p17",
            "aa_from_protein_start" : [
               "77",
               "91"
            ],
            "na_from_cds_start" : [
               "229",
               "273"
            ],
            "na_from_hxb2_start" : [
               "1018",
               "1062"
            ],
            "na_from_query_start" : [
               "1",
               "45"
            ],
            "protein_translation" : "SLYNTVATLYCVHQR"
         }
      ]
   }
]
</pre>
<pre>
curl -X POST https://indra.mullins.microbiol.washington.edu/locate-sequence/within/hiv \
     --data base='amino acid' \
     --data sequence=MGGDMKDNW
</pre>
<pre>
curl -X POST https://indra.mullins.microbiol.washington.edu/locate-sequence/within/hiv \
     --form base=nucleotide \
     --form fasta=@<i>/path/to/your/input.fa</i>
</pre>

<a name="perl"></a>
<h3>Perl</h3>
<h4>Directly using <a href="https://metacpan.org/pod/Bio::WebService::LANL::SequenceLocator"><code>Bio::WebService::LANL::SequenceLocator</code></a></h4>
<pre>
#!/usr/bin/env perl
#
# First install the library:
#   cpan -i Bio::WebService::LANL::SequenceLocator
# 
use strict;
use warnings;
use Bio::WebService::LANL::SequenceLocator;

my $locator = Bio::WebService::LANL::SequenceLocator->new(
    agent_string => 'Your Organization - you@example.com',
);

my @sequences = $locator->find([
    "agcaatcagatggtcagccaaaattgccctatagtgcagaacatcc"
   ."aggggcaagtggtacatcaggccatatcacctagaactttaaatgca",
]);
</pre>

<h4>Through our web API</h4>
<pre>
#!/usr/bin/env perl
use strict;
use warnings;

use JSON qw< decode_json >;
use LWP::UserAgent;

my $agent    = LWP::UserAgent->new( agent => 'you@example.com' );
my $response = $agent->post(
    "https://indra.mullins.microbiol.washington.edu/locate-sequence/within/hiv" => [
        sequence => "TCATTATATAATACAGTAGCAACCCTCTATTGTGTGCATCAAAGG",
    ],
);
unless ($response->is_success) {
    die "Request failed: ", $response->status_line, "\n",
        $response->decoded_content;
}
my $results = decode_json( $response->decoded_content );

# $results is now an array ref, like the JSON above
print $results->[0]{polyprotein}, "\n";
</pre>

<a name="python"></a>
<h3>Python</h3>
<pre>
#!/usr/bin/env python3
from urllib.request import Request, urlopen
from urllib.error import URLError
from urllib.parse import urlencode
import json
import sys

request = Request('https://indra.mullins.microbiol.washington.edu/locate-sequence/within/hiv')
# The second argument (doseq) encodes each list item as its own sequence=... pair
data = urlencode({
    'sequence': [
        'SLYNTVAVLYYVHQR',
        'TCATTATATAATACAGTAGCAACCCTCTATTGTGTGCATCAAAGG'
    ]
}, True).encode('ascii')

# Define results up front so the check below works even if the request fails
results = None
try:
    response = urlopen(request, data)
    text     = response.read().decode('utf-8')
    results  = json.loads(text)
except URLError as e:
    print('Request failed:', e)
except ValueError as e:
    print('Decoding JSON failed:', e)

if results is None:
    sys.exit(1)

print(results)
</pre>
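<p>
    The decoded results are ordinary lists and dictionaries.  Note that in
    the JSON shown in the curl example, coordinates such as
    <code>genome_start</code> are strings, so convert them before doing
    arithmetic.  A small sketch follows; <code>summarize</code> is a
    hypothetical helper, and the sample is trimmed from the first result in
    that example.
</p>
<pre>
import json

def summarize(results):
    """Yield (query, genome_start, genome_end, region_names) per result."""
    for result in results:
        yield (
            result["query"],
            int(result["genome_start"]),  # coordinates arrive as JSON strings
            int(result["genome_end"]),
            result["region_names"],
        )

# Trimmed copy of the first result from the curl example.
sample = json.loads("""
[{"query": "sequence_1", "genome_start": "1018", "genome_end": "1062",
  "region_names": ["Gag", "p17"]}]
""")
for row in summarize(sample):
    print(row)  # prints ('sequence_1', 1018, 1062, ['Gag', 'p17'])
</pre>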

<a name="R"></a>
<h3>R</h3>
<pre>
library("RCurl")
library("rjson")

results = tryCatch(
  fromJSON(
    postForm(
      "https://indra.mullins.microbiol.washington.edu/locate-sequence/within/hiv",
      sequence="SLYNTVAVLYYVHQR",
      sequence="TCATTATATAATACAGTAGCAACCCTCTATTGTGTGCATCAAAGG")),
  HTTPError = function(e) cat("Error making request: ", e$message),
  error = function(e) cat("Error decoding JSON"))

print(lapply(results, function(s) s$genome_start))
</pre>
                </div>
            </div>
        </div>
    </body>
</html>