Python - Using Biopython to retrieve details on an unknown sequence by BLAST
I'm using Biopython for the first time. I have sequence data from unknown organisms, and I'm trying to use BLAST to tell which organism they most likely came from. I wrote the following function to do that:
    from Bio import Entrez, SeqIO
    from Bio.Blast import NCBIWWW, NCBIXML

    def find_organism(file):
        """
        Receives a FASTA file with a single seq, and uses BLAST to find
        the organism it was taken from.
        """
        # get the seq from the FASTA file
        seqRecord = SeqIO.read(file, "fasta")
        # run BLAST
        blastResult = NCBIWWW.qblast("blastn", "nt", seqRecord.seq)
        # get the first hit
        blastRecord = NCBIXML.read(blastResult)
        firstHit = blastRecord.alignments[0]
        # get the hit's GI number
        title = firstHit.title
        gi = title.split("|")[1]
        # search NCBI for the GI number
        ncbiResult = Entrez.efetch(db="nucleotide", id=gi, rettype="gb", retmode="text")
        ncbiResultSeqRec = SeqIO.read(ncbiResult, "gb")
        # get the organism
        annotatDict = ncbiResultSeqRec.annotations
        return annotatDict['organism']
It works fine, but it takes about 2 minutes to retrieve the organism for each species, which seems slow to me. I'm wondering if I could do it better. I know that I could create a local copy of NCBI to improve performance, and I might do that. However, I suspect that querying BLAST first and then taking the ID and using it to query Entrez is not the best way to go. Do you have any other suggestions for improvements?
Thanks!
You can get the organism with:
    [...]
    blastResult = NCBIWWW.qblast("blastn", "nt", seqRecord.seq)
    blastRecord = NCBIXML.read(blastResult)
    # first entry of the hit list; its .title holds the hit's definition line
    first_organism = blastRecord.descriptions[0]
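A note on that last line: in Biopython's NCBIXML records, each entry in `descriptions` is an object whose `title` attribute holds the hit's definition line, so some string handling is still needed to get a name out of it. The sketch below is one possible way to do that with plain string operations. The helper names `split_hit_title` and `guess_organism` and the sample title are hypothetical, and taking the first two words as the organism is only a heuristic that assumes the definition starts with a binomial name.

```python
def split_hit_title(title):
    """Split a BLAST hit title into (identifier fields, free-text definition).

    Assumes the common layout 'db|id|db|id| definition text'. Merged hits
    separated by ' >' are reduced to the first one.
    """
    first = title.split(" >")[0]
    # Everything before the last pipe is identifiers; the rest is the definition.
    head, sep, definition = first.rpartition("|")
    if not sep:
        # No pipe-delimited identifiers at all: the whole title is the definition.
        return [], first.strip()
    ids = [field for field in head.split("|") if field]
    return ids, definition.strip()


def guess_organism(title, words=2):
    """Heuristic: treat the first `words` words of the definition as the organism."""
    _, definition = split_hit_title(title)
    return " ".join(definition.split()[:words])


# Hypothetical title, made up for illustration only.
sample = "gi|123456|gb|AB000001.1| Homo sapiens example mRNA, complete cds"
ids, definition = split_hit_title(sample)
organism = guess_organism(sample)
```

With a real record this would be called as `guess_organism(blastRecord.descriptions[0].title)`. Definitions that start with something other than the species name (a "PREDICTED:" prefix, for example) will defeat the heuristic, so for anything beyond a quick look the efetch lookup from the question is the more reliable source.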
Reading the organism from the BLAST record saves at least the efetch query. Even so, "blastn" can take a long time, and if you are planning to query NCBI massively you're going to get banned.
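To lower the risk of a ban when looping over many sequences, one option is to space the requests out. The class below is a minimal sketch using only the standard library. The name `Throttle` and the 10-second interval are assumptions for illustration; NCBI publishes its own usage guidelines, and the current limits should be checked there rather than taken from this example.

```python
import time


class Throttle:
    """Block until at least `min_interval` seconds have passed since the last call.

    `clock` and `sleep` are injectable so the behaviour can be checked
    without actually waiting.
    """

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, None before the first one

    def wait(self):
        if self._last is not None:
            remaining = self.min_interval - (self._clock() - self._last)
            if remaining > 0:
                self._sleep(remaining)
        self._last = self._clock()


# Assumed interval; adjust to whatever NCBI currently asks for.
throttle = Throttle(min_interval=10)
```

Calling `throttle.wait()` immediately before each `NCBIWWW.qblast(...)` call keeps successive submissions at least that far apart. It does nothing to speed up an individual search, so for large batches a local BLAST database, as mentioned in the question, remains the more scalable route.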