python - Using Biopython to retrieve details on an unknown sequence by BLAST


I'm using Biopython for the first time. I have sequence data from unknown organisms, and I'm trying to use BLAST to tell which organism they most likely came from. I wrote the following function to do that:

    from Bio import Entrez, SeqIO
    from Bio.Blast import NCBIWWW, NCBIXML

    def find_organism(file):
        """
        Receives a FASTA file with a single sequence, and uses BLAST to find
        the organism from which it was taken.
        """
        # get the sequence from the FASTA file
        seqrecord = SeqIO.read(file, "fasta")
        # run BLAST
        blastresult = NCBIWWW.qblast("blastn", "nt", seqrecord.seq)
        # get the first hit
        blastrecord = NCBIXML.read(blastresult)
        firsthit = blastrecord.alignments[0]
        # get the hit's GI number
        title = firsthit.title
        gi = title.split("|")[1]
        # search NCBI for the GI number
        ncbiresult = Entrez.efetch(db="nucleotide", id=gi, rettype="gb", retmode="text")
        ncbiresultseqrec = SeqIO.read(ncbiresult, "gb")
        # get the organism
        annotatdict = ncbiresultseqrec.annotations
        return annotatdict['organism']

It works fine, but it takes about 2 minutes to retrieve the organism for each sequence, which seems slow to me. I'm wondering if I could do it better. I know I could create a local copy of the NCBI database to improve performance, and I might do that. However, I suspect that querying BLAST first, then taking the ID and using it to query Entrez, is not the best way to go. Do you have any other suggestions for improvements?
Thanks!

You can get the organism with:

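    [...]
    blastresult = NCBIWWW.qblast("blastn", "nt", seqrecord.seq)
    blastrecord = NCBIXML.read(blastresult)

    # the first description's title holds the hit's definition line,
    # which normally names the organism
    first_organism = blastrecord.descriptions[0]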
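If you only need the identifier or the definition line of a hit, you can probably pull them out of the title string yourself. Here is a minimal sketch, assuming the title looks like `db|id|db|id| description`. The helper name `parse_hit_title` and the sample title are made up for illustration; real titles vary, and newer NCBI output may not include a `gi` field at all, so check what your own results look like first.

```python
def parse_hit_title(title):
    """Split a BLAST hit title into (identifiers, description).

    Assumed layout (not guaranteed): "db|id|db|id| free-text description".
    """
    # Everything before the first space is the pipe-delimited identifier part.
    head, _, description = title.partition(" ")
    fields = [field for field in head.split("|") if field]
    # Pair each database tag with the identifier that follows it.
    ids = dict(zip(fields[0::2], fields[1::2]))
    return ids, description.strip()


# Hypothetical title, only to show the shape of the result.
ids, description = parse_hit_title(
    "gi|12345|gb|AB000001.1| Example organism strain X 16S ribosomal RNA gene"
)
print(ids)
print(description)
```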

Reading the organism straight from the BLAST record will save at least the efetch query. Anyway, "blastn" can take a long time, and if you are planning to query NCBI massively, you are going to get banned.
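To stay on the safe side with many sequences, you could space the requests out and avoid re-submitting a sequence you have already looked up. This is only a rough sketch: `throttle` and `cached` are hypothetical helper names, not part of Biopython, and the right interval is an assumption you should take from NCBI's current usage guidelines.

```python
import time


def throttle(func, min_interval):
    """Wrap func so consecutive calls start at least min_interval seconds apart."""
    last_start = None

    def wrapper(*args, **kwargs):
        nonlocal last_start
        if last_start is not None:
            remaining = min_interval - (time.monotonic() - last_start)
            if remaining > 0:
                time.sleep(remaining)
        last_start = time.monotonic()
        return func(*args, **kwargs)

    return wrapper


def cached(func):
    """Remember results by argument so a repeated input is not looked up twice."""
    results = {}

    def wrapper(key):
        if key not in results:
            results[key] = func(key)
        return results[key]

    return wrapper
```

You would then wrap the lookup once, for example `lookup = cached(throttle(find_organism, interval))`, and call `lookup(path)` for each FASTA file. Note that this cache is keyed on the argument you pass in, so it only helps when the same file path (or the same sequence string, if you key on that instead) comes up again.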

