AmiGO Help: BLAST Search
What is BLAST?
BLAST is a search algorithm designed to find sequence similarities (Altschul et al., 1990). An online guide to BLAST searching can be found at the NCBI BLAST Help Manual [external website]. The Gene Ontology BLAST server uses WU-BLAST (Gish, W. (1996-2004), http://blast.wustl.edu [external website]); technical information may be found at Washington University BLAST Archives [external website].
What does the BLAST search do?
The AmiGO BLAST server searches the sequences from the GO protein sequence database, which comprises protein sequences of genes and gene products that have been annotated to a GO term and submitted to the GO Consortium. Protein queries are searched using BLASTP, while nucleotide sequences are searched using BLASTX. There is no need to specify which program to use, but if more than one sequence is submitted in a single query, all sequences must be of the same type.
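The help text does not say how the server tells protein from nucleotide input. As a rough illustration only, the choice of program and the same-type rule could be sketched as follows; the alphabet test, the 0.9 cutoff, and the function names are assumptions, not AmiGO's actual logic.

```python
# Heuristic sketch: classify sequences and pick a BLAST program.
# The alphabet test and the 0.9 cutoff are assumptions for illustration.
NUCLEOTIDE_LETTERS = set("ACGTUN")


def guess_type(sequence, cutoff=0.9):
    """Return 'nucleotide' if most letters are A/C/G/T/U/N, else 'protein'."""
    letters = [c for c in sequence.upper() if c.isalpha()]
    if not letters:
        raise ValueError("sequence contains no letters")
    fraction = sum(c in NUCLEOTIDE_LETTERS for c in letters) / len(letters)
    return "nucleotide" if fraction >= cutoff else "protein"


def choose_program(sequences):
    """Pick BLASTP or BLASTX, rejecting queries that mix sequence types."""
    types = {guess_type(s) for s in sequences}
    if len(types) != 1:
        raise ValueError("all sequences in one query must be of the same type")
    return "BLASTP" if types == {"protein"} else "BLASTX"
```

A short peptide such as MKTAYIAKQRQISFVKSHFSRQ would be routed to BLASTP under this heuristic, while a query mixing that peptide with ATGCGTACGTTAGC would be rejected.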
Entering sequences
The BLAST query form accepts three methods for submitting a query sequence:
- Enter a UniProt accession ID, for example P55269.
- Upload a file containing sequences in FASTA format.
- Paste FASTA sequence(s) into the textbox.
When entering more than one sequence, whether in a file or in the textbox, separate queries with a line break.
GOst allows BLAST queries of up to 100 sequences; the total number of residues cannot exceed 3 million. More information on FASTA format can be found at Wikipedia [external website].
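Before submitting a large batch, it can help to check it against these limits locally. The sketch below is one possible way to do so, assuming standard FASTA input in which every record starts with a ">" header line; the function names are hypothetical and the server may count residues differently.

```python
MAX_SEQUENCES = 100        # limit stated in the help text
MAX_RESIDUES = 3_000_000   # limit stated in the help text


def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines between records
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            # Assumption: this sketch requires a header before any residues.
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def check_limits(records):
    """Return (sequence count, total residues) or raise if a limit is exceeded."""
    total = sum(len(seq) for _, seq in records)
    if len(records) > MAX_SEQUENCES:
        raise ValueError(f"{len(records)} sequences; the limit is {MAX_SEQUENCES}")
    if total > MAX_RESIDUES:
        raise ValueError(f"{total} residues; the limit is {MAX_RESIDUES}")
    return len(records), total
```

Calling check_limits(parse_fasta(text)) on the contents of a query file raises an error when either limit would be exceeded.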
BLAST parameters
- Expect threshold
- The expect threshold is the largest expect value (E value) a hit may have and still be returned; it acts as the statistical significance threshold for reporting matches against database sequences. A match whose E value is greater than the expect threshold is not reported. Lower expect threshold values are more stringent, leading to fewer chance matches being reported (source: http://www.ncbi.nlm.nih.gov/blast/blast_help.shtml [external website]).
- Maximum number of alignments
- Select the number of target sequences to display in the results. Choosing fewer sequences produces results faster.
- BLAST filter
- Filtering is on by default and masks low complexity regions in the query sequence. In a protein search, masked regions appear as 'X's in the alignment, while in a nucleotide search they appear as 'N's. The score and E value of a match may be affected by filtering, since it effectively shortens the query length.
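To make the interaction between the expect threshold and the maximum number of alignments concrete, here is a minimal sketch of how the two could act on a list of hits. The hit representation, the default values, and the function name are hypothetical; this is not the server's implementation.

```python
def report_hits(hits, expect_threshold=0.1, max_alignments=50):
    """Filter and truncate hits given as (subject_id, e_value) tuples.

    Hits whose E value exceeds the threshold are dropped; the rest are
    sorted from most to least significant and cut to max_alignments.
    The default values here are illustrative assumptions.
    """
    kept = [hit for hit in hits if hit[1] <= expect_threshold]
    kept.sort(key=lambda hit: hit[1])  # smaller E value = more significant
    return kept[:max_alignments]


hits = [("P1", 2e-30), ("P2", 0.5), ("P3", 1e-5), ("P4", 0.05)]
strict = report_hits(hits, expect_threshold=1e-3)
lenient = report_hits(hits, expect_threshold=1.0, max_alignments=3)
```

With the stricter threshold only P1 and P3 remain, which shows why lowering the expect threshold reduces the number of chance matches reported.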
Submit
Click Submit to send the query to the BLAST server. An intermediate page containing the BLAST parameters is displayed while the BLAST job is running. This page refreshes automatically until the BLAST job is finished.
