
NCBI Entrez eSearch RuntimeError: Invalid db name specified: nuccore #915

Closed
dmenning opened this issue Aug 22, 2016 · 24 comments


@dmenning

Hello everyone,

Two questions:

  1. When NCBI phases out GI numbers, what will search_results["IdList"] return? Does the code need to be changed to get Accession.Version?

  2. I am running a Python 2.7 script that has worked in the past but now is throwing an error. I am wondering if there has been a change recently that may be causing the problem.
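On question 1: NCBI's E-utilities documentation describes an `idtype` parameter for ESearch. For sequence databases such as nuccore, `idtype=acc` returns accession.version identifiers in the IdList instead of GI numbers. Biopython's `Entrez.esearch` passes extra keyword arguments through to the service, so a minimal sketch only needs to add that argument. The helper `esearch_kwargs` is hypothetical, and the network call appears only in comments.

```python
def esearch_kwargs(term, use_accessions=True, retmax=9999):
    """Build keyword arguments for Bio.Entrez.esearch (hypothetical helper)."""
    kwargs = {"db": "nuccore", "term": term, "retmax": retmax}
    if use_accessions:
        # Ask ESearch for accession.version identifiers instead of GI numbers.
        kwargs["idtype"] = "acc"
    return kwargs


# Real use (needs network access and Entrez.email set; not run here):
#   from Bio import Entrez
#   handle = Entrez.esearch(**esearch_kwargs('"Salmo salar" AND 16S'))
#   ids = Entrez.read(handle)["IdList"]  # accession.version strings
#   handle.close()
```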

My script searches NCBI using each Genus species name from a list (species), a general category like bony fish (cat), and a search term like 16S (add_term) and returns a list of ids. The script is below:

def ncbi_search(species_list):
    global full_gi_list
    print "\nSearching NCBI for"+str(species)+' and '+str(add_term)+' and '+str(cat)+'\n'
    while True:
        try:
            search_handle=Entrez.esearch(db="nucleotide", term=cat + ' AND ' + add_term + ' AND ' + species, usehistory="y", RetMax=9999)
            search_results=Entrez.read(search_handle)
            search_handle.close()
            gi_list=search_results["IdList"]
            full_gi_list.extend(gi_list)
            return full_gi_list
        except:
            failed_search_hand.write(str(species)+'\n')
            print "\n\nERROR: "+str(species)+' and '+str(add_term)+' and '+str(cat)+" did not finish searching\n"
            print "\nERROR: Waiting 10 seconds then restarting parse. Check the fail log "+str(failed_search_file)+"\n"
            time.sleep(10)
            continue
        else:
            break  

for species in species_list:
    ncbi_search(species_list)

The error I am getting is:

Traceback (most recent call last):
  File "D:\Python27\0 eDNA\02_GenBank_get_fasta_or_gb_no_repeats_1_search_criteria.py", line 98, in <module>
    ncbi_search(species_list)
  File "D:\Python27\0 eDNA\02_GenBank_get_fasta_or_gb_no_repeats_1_search_criteria.py", line 49, in ncbi_search
    search_results = Entrez.read(search_handle)
  File "C:\Python27\lib\site-packages\Bio\Entrez\__init__.py", line 376, in read
    record = handler.read(handle)
  File "C:\Python27\lib\site-packages\Bio\Entrez\Parser.py", line 205, in read
    self.parser.ParseFile(handle)
  File "C:\Python27\lib\site-packages\Bio\Entrez\Parser.py", line 343, in endElementHandler
    raise RuntimeError(value)
RuntimeError: Invalid db name specified: nuccore

I have tried changing the db to "nuccore" and I get the same error. I put in the while True: to keep the script running after the error. Any ideas on what is going wrong?

Thanks for the help

Damian

@peterjc
Member

peterjc commented Aug 23, 2016

You've already tried changing nucleotide to nuccore, which was something I might have suggested.

This could be a temporary problem which might be fixed in a day or so, but if not I think you will need to email the NCBI Entrez help address - and then please update this GitHub issue. Thanks.

@idoerg
Contributor

idoerg commented Aug 23, 2016

Just a question: why are you looping with species through the list, and passing the same list each time as an argument?

@peterjc
Member

peterjc commented Aug 23, 2016

Good point Iddo - it looks like the script works due to a second error in the function definition, and Python's overly helpful scope rules. It should be:

def ncbi_search(species):
    # Do stuff
    ...

for species in species_list:
    ncbi_search(species)
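A minimal reproduction of the scoping slip described above, with hypothetical function names: the buggy version ignores its argument and reads the module-level loop variable `species` instead, so it appears to work.

```python
buggy_calls = []
fixed_calls = []


def search_buggy(species_list):
    # The argument is ignored; 'species' is neither a parameter nor a local,
    # so Python resolves it in the module's global namespace at call time.
    buggy_calls.append(species)


def search_fixed(species):
    # Here the name refers to the argument actually passed in.
    fixed_calls.append(species)


species_list = ["Esox lucius", "Lota lota"]
for species in species_list:
    search_buggy(species_list)  # only works because the loop variable is global

for name in species_list:
    search_fixed(name)

print(buggy_calls)
print(fixed_calls)
```

If that loop were moved inside another function, `species` would become local to that function and `search_buggy` would raise NameError, which is why passing the species explicitly is the safer form.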

@idoerg
Contributor

idoerg commented Aug 23, 2016

This line:
print "\nSearching NCBI for"+str(species)+' and '+str(add_term)+' and '+str(cat)+'\n'

will have problems: add_term and cat are undefined. It would be nice to have a working script with sample input. I'm not sure what values these variables are supposed to hold.
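One way to avoid undefined globals such as `add_term` and `cat` is to pass every input explicitly. A sketch under assumptions: the function names are hypothetical, and the NCBI call is replaced by an injected `search` callable so the example runs offline. The example values for `cat` and `add_term` are the ones given later in this thread.

```python
def search_species(species, cat, add_term, search):
    """Return the IDs found for one species, with every input passed in.

    'search' is any callable that takes a query string and returns a list
    of IDs; in real use it would wrap Entrez.esearch and Entrez.read.
    """
    term = cat + " AND " + add_term + " AND " + species
    print("Searching NCBI for " + term)
    return search(term)


def fake_search(term):
    # Offline stand-in for the NCBI call, so the sketch runs anywhere.
    return ["hit-for " + term]


cat = '"bony fishes"[porgn:__txid7898]'
add_term = "16S"
ids = search_species('"Esox lucius"', cat, add_term, fake_search)
print(ids)
```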

@skbrimer

I wonder why nuccore didn't work; according to the NCBI docs, it is the correct database name. https://www.ncbi.nlm.nih.gov/books/NBK25497/table/chapter2.T._entrez_unique_identifiers_ui/?report=objectonly

@dmenning
Author

dmenning commented Aug 23, 2016

Good point Iddo. I will make that change, species_list to species.
add_term is a user-defined search term like 16S; cat is another user-defined search term, but for a broad category like "bony fishes"[porgn:__txid7898]. This script has worked in the past. I last ran it successfully about two weeks ago, then tried again yesterday, when it stopped working.

I just tried the script again and am getting the same error with both nucleotide and nuccore as the db. I am going to email NCBI and see what they have to say.


@idoerg
Contributor

idoerg commented Aug 23, 2016

Well, it won't run "as is" right now, simply because the variables have no values.
If you would like help debugging it, please write a script that assigns values to all the variables.

@dmenning
Author

Here is the code. Right now it searches a file I titled "Fish list.txt" that contains a list of common names and fish Genus species names. It takes each Genus species and searches NCBI for 16S and '"bony fishes"[porgn:__txid7898]', making a list of all the IDs found. It then goes through that list, building a new list; if an ID isn't already in the new list, it downloads the .fasta. I couldn't figure out how to attach the Fish list.txt file, so the list is at the bottom. It hasn't failed on the download portion using efetch yet; it only fails on the esearch portion. There is a while True: loop in there to keep the script running when it fails, but an ERROR message will pop up.
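The "only download IDs not already seen" step described above can use a set for membership checks, which stays fast for thousands of IDs, whereas `x not in no_repeats` on a list scans the whole list each time. A sketch with a hypothetical helper name that keeps the first-seen order:

```python
def unique_in_order(ids):
    """Drop repeated IDs while keeping first-seen order.

    A set gives constant-time membership checks; a list would be
    scanned from the start for every ID.
    """
    seen = set()
    unique = []
    for x in ids:
        if x not in seen:
            seen.add(x)
            unique.append(x)
    return unique
```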

Let me know if you have any ideas.

Thanks

Damian

import os
import re
import time
import datetime
from Bio import Entrez

print "\nThis script will communicate with the National Center for Biotechnology"
print "Information GenBank database which requests users e-mail address."
while True:
    email=raw_input("\nPlease type your e-mail address: ")
    if email=="":
        quit()
    if "@" in email:
        Entrez.email=email
        break
    else:
        print "\nPlease enter a valid email address"

name="Fish list.txt"
fhand=open(name, 'r')

pos=1
type="fasta"
num=1
add_term='16S'
cat='"bony fishes"[porgn:__txid7898]'

def make_species_list(fhand):
    global species_list
    print "\n Making species list from "+name+'\n'
    global pos
    for line in fhand:
        line=line.rstrip()
        line=line.split(',')
        species_list.append(line[pos])
    return species_list

def ncbi_search(species):
    global full_gi_list
    print "\nSearching NCBI for"+str(species)+' and '+str(add_term)+' and '+str(cat)+'\n'
    while True:
        try:
            search_handle=Entrez.esearch(db="nucleotide", term=cat + ' AND ' + add_term + ' AND ' + species, usehistory="y", RetMax=9999)
            search_results=Entrez.read(search_handle)
            search_handle.close()
            gi_list=search_results["IdList"]
            full_gi_list.extend(gi_list)
            return full_gi_list
        except:
            failed_search_hand.write(str(species)+'\n')
            print "\n\nERROR: "+str(species)+' and '+str(add_term)+' and '+str(cat)+" did not finish searching\n"
            print "\nERROR: Waiting 10 seconds then restarting parse. Check the fail log "+str(failed_search_file)+"\n"
            time.sleep(10)
            continue
        else:
            break          

def download_species(full_gi_list):
    print "Starting download of non-repeat GI numbers to "+no_repeats_file+'\n'
    no_repeats=[]
    count=0
    start_time=datetime.datetime.now()
    for x in full_gi_list:
        if x not in no_repeats:
            no_repeats_hand.write(x + '\n')
            no_repeats.append(x)
            print "Downloading "+x
            while True:
                try:
                    fetch_handle=Entrez.efetch(db="nucleotide", rettype=type, retmode="text", id=x)
                    data=fetch_handle.read()
                    fetch_handle.close()
                    totalhand.write(data)
                    count+=1
                    if count%100==0:
                        current_time=datetime.datetime.now()
                        running_total_time=current_time-start_time
                        print "Downloaded so far: "+str(count)+".\nTime taken so far: "+str(running_total_time)
                        print "Current time: "+str(current_time)
                except:
                    failed_download_hand.write(str(x)+'\n')
                    print "\n\nERROR: "+str(x)+" did not finish downloading\n"
                    print "\nERROR: Waiting 10 seconds then restarting download. Check the fail log "+str(failed_download_file)+"\n"
                    time.sleep(10)
                    continue
                else:
                    break

    final_time=datetime.datetime.now()
    total_time=final_time-start_time
    current_time=datetime.datetime.now()
    print "\n"+str(count)+" file(s) downloaded from NCBI to "+str(outfile)
    print "\nThe Accession number list was written to "+str(no_repeats_file)
    print "\nTotal time taken: "+str(total_time)
    print "Current time: "+str(current_time)

t = datetime.datetime.now()
s = t.strftime('%Y%m%d %H:%M:%S.%f')
date_stamp = s[:8]

if num == 1:
    species_list = []
    full_gi_list = []

    outfile = os.path.splitext(name)[0]+" "+add_term+" "+str(date_stamp)+"."+type
    totalhand = open(outfile, 'w')

    no_repeats_file = os.path.splitext(name)[0]+" "+add_term+" GI list "+str(date_stamp)+".txt"
    no_repeats_hand = open(no_repeats_file, 'w')

    failed_search_file = os.path.splitext(name)[0]+" "+add_term+" GI list "+str(date_stamp)+" failed search.txt"
    failed_search_hand = open(failed_search_file, 'w')

    failed_download_file = os.path.splitext(name)[0]+" "+add_term+" GI list "+str(date_stamp)+" failed download.txt"
    failed_download_hand = open(failed_download_file, 'w')

    make_species_list(fhand)
    for species in species_list:
        ncbi_search(species)
    download_species(full_gi_list)

else:
    for line in fhand:
        full_gi_list = []
        line = line.rstrip()
        line = line.split(',')
        species = (line[pos])

        outfile = species+" "+add_term+" "+str(date_stamp)+"."+type
        totalhand = open(outfile, 'w')

        no_repeats_file = species+" "+add_term+" GI list "+str(date_stamp)+".txt"
        no_repeats_hand = open(no_repeats_file, 'w')
        ncbi_search(species)

        download_species(full_gi_list)

fhand.close()
totalhand.close()
no_repeats_hand.close()

Green sturgeon_, Acipenser medirostris
Green sturgeon, Acipenser acutirostris
White sturgeon_, Acipenser transmontanus
White sturgeon, Acipenser aleutensis
Longnose sucker_, Catostomus catostomus
Longnose sucker, Cyprinus catostomus
Longnose sucker, Catostomus catostomus catostomus
Longnose sucker, Cyprinus castostomus
Longnose sucker, Cyprinus rostratus
Longnose sucker, Catostomus catostomus rostratus
Longnose sucker, Catostomus longirostrum
Longnose sucker, Catostomus longirostris
Longnose sucker, Catostomus hudsonius
Longnose sucker, Cyprinus hudsonius
Longnose sucker, Catostomus forsterianus
Longnose sucker, Catostomus aurora
Longnose sucker, Catostomus griseus
Longnose sucker, Catostomus nanomyzon
Longnose sucker, Catostomus catostomus lacustris
Northern pike_, Esox lucius
Northern pike, Trematina foveolata
Northern pike, Lucius lucius
Northern pike, Luccius vorax
Northern pike, Esox estor
Northern pike, Esox lucioides
Northern pike, Esox boreus
Northern pike, Esox reichertii baicalensis
Northern pike, Esox lucius atrox
Northern pike, Esox lucius bergi
Alaska blackfish_, Dallia pectoralis
Alaska blackfish, Dalia pectoralis
Burbot_, Lota lota
Burbot, Gadus lota
Burbot, Enchelyopus lota
Burbot, Lota lota lota
Burbot, Molva lota
Burbot, Gadus lacustris
Burbot, Lota lota lacustris
Burbot, Gadus maculosus
Burbot, Lota lota maculosa
Burbot, Lota maculosa
Burbot, Molva maculosa
Burbot, Gadus maculosa
Burbot, Gadus compressus
Burbot, Lota compressa
Burbot, Lota vulgaris
Burbot, Lota fluviatilis
Burbot, Lota marmorata
Burbot, Lota inornata
Burbot, Lota brosmiana
Burbot, Lota brosmina
Burbot, Lota communis
Burbot, Lota linnei
Burbot, Lota vulgaris obensis
Burbot, Lota lota kamensis
Burbot, Lota lota leptura
Burbot, Lota lota onegensis
Burbot, Lota lota asiatica
Three-spined stickleback_, Gasterosteus aculeatus
Three-spined stickleback, Gasterosteus aculeatus aculeatus
Three-spined stickleback, Leiurus aculeatus
Three-spined stickleback, Gasterosteus bispinosus
Three-spined stickleback, Gasterosteus teraculeatus
Three-spined stickleback, Gasteracanthus cataphractus
Three-spined stickleback, Gasterosteus cataphractus
Three-spined stickleback, Gasterosteus biaculeatus
Three-spined stickleback, Gasterosteus semiarmatus
Three-spined stickleback, Gasterosteus niger
Three-spined stickleback, Gasterosteus trachurus
Three-spined stickleback, Gasterosteus leiurus
Three-spined stickleback, Gasterosteus semiloricatus
Three-spined stickleback, Gasterosteus argyropomus
Three-spined stickleback, Gasterosteus tetracanthus
Three-spined stickleback, Gasterosteus brachycentrus
Three-spined stickleback, Gasterosteus noveboracensis
Three-spined stickleback, Gasterosteus obolarius
Three-spined stickleback, Gasterosteus spinulosus
Three-spined stickleback, Gasterosteus dimidiatus
Three-spined stickleback, Gasterosteus loricatus
Three-spined stickleback, Gasterosteus biarmatus
Three-spined stickleback, Gasterosteus ponticus
Three-spined stickleback, Gasterosteus neoboracensis
Three-spined stickleback, Gasterosteus nemausensis
Three-spined stickleback, Gasterosteus quadrispinosa
Three-spined stickleback, Gasterosteus quadrispinosus
Three-spined stickleback, Gasterosteus cuvieri
Three-spined stickleback, Gasterosteus williamsoni
Three-spined stickleback, Gasterosteus aculeatus williamsoni
Three-spined stickleback, Gasterosteus inopinatus
Three-spined stickleback, Gasterosteus plebeius
Three-spined stickleback, Gasterosteus Dekayi
Three-spined stickleback, Gasterosteus dekayi
Three-spined stickleback, Gasterosteus serratus
Three-spined stickleback, Gasterosteus insculptus
Three-spined stickleback, Gasterosteus intermedius
Three-spined stickleback, Gasterosteus pugetti
Three-spined stickleback, Gasterosteus neustrianus
Three-spined stickleback, Gasterosteus argentatissimus
Three-spined stickleback, Gasterosteus elegans
Three-spined stickleback, Gasterosteus bailloni
Three-spined stickleback, Gasterosteus texanus
Three-spined stickleback, Gasterosteus algeriensis
Three-spined stickleback, Gasterosteus aculeatus algeriensis
Three-spined stickleback, Gasterosteus suppositus
Three-spined stickleback, Gasterosteus atkinsii
Three-spined stickleback, Gasterosteus atkinsi
Three-spined stickleback, Gastrosteus hologymnus
Three-spined stickleback, Gasterosteus hologymnus
Three-spined stickleback, Gasterosteus santaeannae
Three-spined stickleback, Gasterosteus aculeatus santaeannae
Ninespine stickleback_, Pungitius pungitius
Ninespine stickleback, Gasterosteus pungitius
Ninespine stickleback, Gasteracanthus pungitius
Ninespine stickleback, Pungitius pungitius pungitius
Ninespine stickleback, Pygosteus pungitius
Ninespine stickleback, Gasterosteus occidentalis
Ninespine stickleback, Gasterosteus concinnus
Ninespine stickleback, Gasterosteus mainensis
Ninespine stickleback, Gasterosteus dekayi
Ninespine stickleback, Gasterosteus nebulosus
Ninespine stickleback, Gasterosteus globiceps
Ninespine stickleback, Gasterosteus blanchardi
Ninespine stickleback, Gasterosteus pungitius brachypoda
Ninespine stickleback, Pygosteus pungitius brachypoda
Pond smelt_, Hypomesus olidus
Pond smelt, Salmo olidus
Pond smelt, Mesopus olidus
Pond smelt, Hypomesus olidus bergi
Pond smelt, Hypomesus olidus drjagini
Pond smelt, Hypomesus sakhalinus
Rainbow smelt_, Osmerus mordax
Rainbow smelt, Atherina mordax
Rainbow smelt, Osmerus mordax mordax
Longfin smelt_, Spirinchus thaleichthys
Longfin smelt, Osmerus thaleichthys
Longfin smelt, Spirinchus dilatus
Eulachon_, Thaleichthys pacificus
Eulachon, Salmo pacificus
Eulachon, Osmerus pacificus
Eulachon, Thaleichthys stevensi
Eulachon, Osmerus albatrossis
Eulachon, Lestidium parri
Trout-perch_, Percopsis omiscomaycus
Trout-perch, Salmo omiscomaycus
Trout-perch, Percopsis omisco-maycus
Trout-perch, Percopsis guttatus
Trout-perch, Percopsis pellucida
Trout-perch, Salmoperca pellucida
Trout-perch, Percopsis hammondii
Arctic cisco_, Coregonus autumnalis
Arctic cisco, Salmo autumnalis
Arctic cisco, Coregonus autumnalis autumnalis
Arctic cisco, Goregonus autumnalis
Lake whitefish_, Coregonus clupeaformis
Lake whitefish, Salmo clupeaformis
Lake whitefish, Coregonus clupeiformis
Lake whitefish, Coregonus albus
Lake whitefish, Salmo labradoricus
Lake whitefish, Coregonus labradoricus
Lake whitefish, Coregonus sapidissimus
Lake whitefish, Coregonus latior
Lake whitefish, Coregonus atikameg
Bering cisco_, Coregonus laurettae
Bering cisco, Argyrosomus laurettae
Bering cisco, Argyrosomus alascanus
Bering cisco, Coregonus autumnalis laurettae
Broad whitefish_, Coregonus nasus
Broad whitefish, Salmo nasus
Broad whitefish, Salmo nasutus
Broad whitefish, Salmo schokur
Broad whitefish, Coregonus kennicotti
Broad whitefish, Coregonus nasus kennicotti
Humpback whitefish_, Coregonus pidschian
Humpback whitefish, Salmo pidschian
Humpback whitefish, Coregonus lavaretus pidschian
Humpback whitefish, Salmo polcur
Humpback whitefish, Coregonus polcur
Humpback whitefish, Salmo sikus
Humpback whitefish, Coregonus sikus
Humpback whitefish, Coregonus fera inarensis
Sardine/Least cisco_, Coregonus sardinella
Sardine/Least cisco, Leucichthys sardinella
Sardine/Least cisco, Coregonus merkii
Sardine/Least cisco, Coregonus pusillus
Sardine/Least cisco, Argyrosomus pusillus
Sardine/Least cisco, Leucichthys pusillus
Lake chub_, Couesius plumbeus
Lake chub, Gobio plumbeus
Lake chub, Hybopsis plumbea
Lake chub, Leucosomus dissimilis
Lake chub, Ceratichthys prosthemius
Lake chub, Couesius prosthemius
Lake chub, Nocomis milneri
Lake chub, Couesius greeni
Lake chub, Couesius adustus
Lake chub, Couesius plumbeus rubrilateralis
Pygmy whitefish_, Prosopium coulterii
Pygmy whitefish, Prosopium coulteri
Pygmy whitefish, Coregonus coulterii
Pygmy whitefish, Coregonus coulteri
Pygmy whitefish, Prosopium snyderi
Round whitefish_, Prosopium cylindraceum
Round whitefish, Salmo cylindraceus
Round whitefish, Coregonus cylindraceus
Round whitefish, Prosopium cylindraceus
Round whitefish, Salmo microstomus
Round whitefish, Coregonus quadrilateralis
Round whitefish, Prosopium quadrilaterale
Round whitefish, Salmo quadrilateralis
Round whitefish, Coregonus mongolicus
Round whitefish, Prosopium preblei
Inconnu/Sheefish_, Stenodus leucichthys
Inconnu/Sheefish, Salmo leucichthys
Inconnu/Sheefish, Stenodus leucichthys leucichthys
Inconnu/Sheefish, Salmo mackenzii
Inconnu/Sheefish, Stenodus leucichthys mackenzii
Inconnu/Sheefish, Stenodus mackenzii
Cutthroat trout_, Oncorhynchus clarkii
Cutthroat trout, Salmo clarkii
Cutthroat trout, Fario clarkii
Cutthroat trout, Oncorhynchus clarkii clarkii
Cutthroat trout, Parasalmo clarkii
Cutthroat trout, Salmo clarkii clarkii
Cutthroat trout, Oncorhynchus clarki
Cutthroat trout, Oncorhynchus clarki clarki
Cutthroat trout, Salmo clarki
Cutthroat trout, Salmo clarki clarki
Cutthroat trout, Fario stellatus
Cutthroat trout, Salmo stellatus
Cutthroat trout, Salar lewisi
Cutthroat trout, Oncorhynchus clarkii lewisi
Cutthroat trout, Salmo lewisi
Cutthroat trout, Oncorhynchus clarki lewisi
Cutthroat trout, Salmo clarki lewisi
Cutthroat trout, Salmo clarkii lewisi
Cutthroat trout, Salmo brevicauda
Cutthroat trout, Salmo pleuriticus
Cutthroat trout, Oncorhynchus clarkii pleuriticus
Cutthroat trout, Oncorhynchus clarki pleuriticus
Cutthroat trout, Salmo purpuratus bouvieri
Cutthroat trout, Oncorhynchus clarkii bouvieri
Cutthroat trout, Oncorhynchus clarki bouvieri
Cutthroat trout, Salmo clarkii alpestris
Cutthroat trout, Oncorhynchus clarkii humboldtensis
Pink salmon_, Oncorhynchus gorbuscha
Pink salmon, Salmo gorbuscha
Pink salmon, Salmo scouleri
Pink salmon, Oncorhynchus scouleri
Chum salmon_, Oncorhynchus keta
Chum salmon, Salmo keta
Chum salmon, Salmo lagocephalus
Chum salmon, Oncorhynchus lagocephalus
Chum salmon, Salmo dermatinus
Chum salmon, Oncorhynchus dermatinus
Chum salmon, Salmo consuetus
Chum salmon, Oncorhynchus consuetus
Chum salmon, Salmo canis
Chum salmon, Oncorhynchus canis
Coho salmon_, Oncorhynchus kisutch
Coho salmon, Salmo kisatch
Coho salmon, Salmo hisutch
Coho salmon, Salmo kisutch
Coho salmon, Oncorhynchus kisatch
Coho salmon, Oncorhynchus kisutsh
Coho salmon, Salmo hisatch
Coho salmon, Salmo milktschutsch
Coho salmon, Oncorhynchus milktschutsch
Coho salmon, Salmo tsuppitch
Rainbow trout_, Oncorhynchus mykiss
Rainbow trout, Salmo mykiss
Rainbow trout, Parasalmo mykiss
Rainbow trout, Onchorhynchus mykiss
Rainbow trout, Onchorrhychus mykiss
Rainbow trout, Onchorynchus mykiss
Rainbow trout, Oncorhynchus myskis
Rainbow trout, Salmo purpuratus
Rainbow trout, Salmo penshinensis
Rainbow trout, Parasalmo penshinensis
Rainbow trout, Salmo gairdnerii
Rainbow trout, Fario gairdneri
Rainbow trout, Oncorhynchus gairdnerii
Rainbow trout, Salmo gairdnerii gairdnerii
Rainbow trout, Salmo gairdneri
Rainbow trout, Salmo iridea
Rainbow trout, Salmo gairdnerii irideus
Rainbow trout, Salmo irideus
Rainbow trout, Trutta iridea
Rainbow trout, Salmo gairdneri irideus
Rainbow trout, Salmo irideux
Rainbow trout, Salmo truncatus
Rainbow trout, Salmo masoni
Rainbow trout, Oncorhynchus kamloops
Rainbow trout, Salmo kamloops
Rainbow trout, Salmo gairdneri shasta
Rainbow trout, Salmo gilberti
Rainbow trout, Salmo nelsoni
Rainbow trout, Oncorhynchus mykiss nelsoni
Rainbow trout, Salmo irideus argentatus
Rainbow trout, Salmo kamloops whitehousei
Sockeye salmon_, Oncorhynchus nerka
Sockeye salmon, Salmo nerka
Sockeye salmon, Salmo paucidens
Sockeye salmon, Salmo kennerlyi
Sockeye salmon, Hypsifario kennerlyi
Sockeye salmon, Oncorhynchus nerka kennerlyi
Chinook salmon_, Oncorhynchus tshawytscha
Chinook salmon, Salmo tshawytscha
Chinook salmon, Oncorhynchus tschawytscha
Chinook salmon, Oncorhynchus tshawytsha
Chinook salmon, Salmo tschawytscha
Chinook salmon, Salmo tschawytscha
Chinook salmon, Salmo orientalis
Chinook salmon, Salmo quinnat
Chinook salmon, Salmo cooperi
Chinook salmon, Oncorhynchus cooperi
Chinook salmon, Salmo warreni
Chinook salmon, Salmo richardii
Chinook salmon, Salmo richardi
Chinook salmon, Oncorhynchus chouicha
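The script reads this list with `line.rstrip()` followed by `line.split(',')`, which keeps the leading space before each Genus species name and does not skip the repeated `Salmo tschawytscha` entry under Chinook salmon. A minimal sketch of a stricter reader, assuming the same "Common name, Genus species" layout; `parse_species` is a hypothetical name.

```python
def parse_species(lines, pos=1):
    """Extract column 'pos' from 'Common name, Genus species' lines.

    Strips whitespace around every field, skips blank lines, and keeps
    only the first occurrence of each name.
    """
    names = []
    seen = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        fields = [f.strip() for f in line.split(",")]
        name = fields[pos]
        if name not in seen:
            seen.add(name)
            names.append(name)
    return names


sample = [
    "Chinook salmon_, Oncorhynchus tshawytscha",
    "Chinook salmon, Salmo tschawytscha",
    "Chinook salmon, Salmo tschawytscha",
    "",
]
print(parse_species(sample))
```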

@dmenning
Author

dmenning commented Aug 23, 2016

P.S. I've e-mailed NCBI and am waiting on a reply.

@idoerg
Contributor

idoerg commented Aug 23, 2016

Are you sure you have enough fishies there? ;)

The following worked. I used only the Linnean genus and species epithets, and I wrapped them in quotes (otherwise they become separate terms).

>>> from Bio import Entrez
>>> Entrez.email="whatever@youremail.com"
>>> cat = ' "bony fishes"[porgn:__txid7898]'
>>> add_term = "16S"
>>> species = ' "Salmo salar" '
>>> sh = Entrez.esearch(db="nuccore", term=cat + ' AND ' + add_term + ' AND ' + species)
>>> sr = Entrez.read(sh)
>>> sr
{u'Count': '131', u'RetMax': '20', u'IdList': ['194398791', '194398789', '194398785', '194398783', '194398781', '7381061', '5442063', '5442061', '5442059', '5442057', '5442055', '5442053', '5442051', '5442049', '5442047', '3775976', '291190481', '213511323', '213511201', '868611242'], u'TranslationStack': [{u'Count': '12832875', u'Field': 'porgn', u'Term': '"bony fishes"[porgn]', u'Explode': 'Y'}, {u'Count': '12529000', u'Field': 'All Fields', u'Term': '16S[All Fields]', u'Explode': 'N'}, 'AND', {u'Count': '581160', u'Field': 'Organism', u'Term': '"Salmo salar"[Organism]', u'Explode': 'Y'}, {u'Count': '612287', u'Field': 'All Fields', u'Term': '"Salmo salar"[All Fields]', u'Explode': 'N'}, 'OR', 'GROUP', 'AND'], u'TranslationSet': [{u'To': '"Salmo salar"[Organism] OR "Salmo salar"[All Fields]', u'From': '"Salmo salar"'}], u'RetStart': '0', u'QueryTranslation': '"bony fishes"[porgn] AND 16S[All Fields] AND ("Salmo salar"[Organism] OR "Salmo salar"[All Fields])'}
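The same quoting can be done programmatically when looping over a species list. A sketch with a hypothetical `build_term` helper, using the category and gene terms from this thread; it tolerates names that already carry quotes or stray spaces.

```python
def build_term(cat, add_term, species):
    """Combine the search pieces, quoting the species so Entrez treats
    'Genus species' as one phrase rather than two separate terms."""
    quoted = '"' + species.strip().strip('"') + '"'
    return " AND ".join([cat.strip(), add_term.strip(), quoted])


term = build_term('"bony fishes"[porgn:__txid7898]', "16S", "Salmo salar")
print(term)
```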

@dmenning
Author

Thanks for the info on putting the Genus species names in quotes. That seemed to help. Unfortunately, it didn't completely fix it: my list is around 500 fish long, and it started throwing the same errors at fish number 323. I may just have to split up my fish list and do multiple runs.

Thanks Iddo

@idoerg
Contributor

idoerg commented Aug 23, 2016

Some fish simply return nothing, like Salmo pacificus, which appears to have no 16S records. Try querying manually as a control.

Iddo Friedberg
http://iddo-friedberg.net
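Zero hits, as with Salmo pacificus, are a valid answer rather than an error, so it can help to record them separately from genuine failures and re-check those species by hand. A sketch assuming parsed records shaped like the `sr` dictionary shown earlier (`Count` as a string, `IdList` as a list); the helper name is hypothetical.

```python
def split_by_hits(results):
    """Separate species with hits from species with none.

    'results' maps each species name to a parsed ESearch record, assumed
    to hold 'Count' as a string and 'IdList' as a list of IDs.
    """
    hits, no_hits = {}, []
    for species, record in results.items():
        if int(record["Count"]) == 0:
            no_hits.append(species)  # candidate for a manual control query
        else:
            hits[species] = list(record["IdList"])
    return hits, no_hits
```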

@deto

deto commented Aug 28, 2016

Not sure if you ever heard back from NCBI on this, but I'm guessing it may be a general issue with Entrez.

I'm scripting access to the 'gds' database (GEO DataSets). While it works most of the time, about 1 out of 20 attempts I get an "Invalid db name specified: gds" error. I'm debating modifying my script to just check for that and retry, but I'd like to find a better solution.
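If retrying on that specific message, one option is to retry only when the database named in the error is one known to exist, so that a genuine typo still fails fast. A sketch under assumptions: the `KNOWN_DBS` set and the helper name are hypothetical, and the regex matches the message text reported in this thread.

```python
import re

# Database names assumed valid for this script; extend as needed.
KNOWN_DBS = {"nuccore", "nucleotide", "protein", "gds"}

_INVALID_DB = re.compile(r"Invalid db name specified: (\S+)")


def is_transient_db_error(exc):
    """Treat 'Invalid db name specified: X' as retryable only when X is a
    database we know exists; a real typo (e.g. 'nucore') is not retried."""
    if not isinstance(exc, RuntimeError):
        return False
    match = _INVALID_DB.search(str(exc))
    return bool(match) and match.group(1) in KNOWN_DBS
```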

@peterjc
Member

peterjc commented Aug 28, 2016

Even on the sequence database side of Entrez, for non-trivial usage you will need retries - this is life with a networked resource.
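A bounded retry with exponential backoff is one common alternative to the `while True:` loops in the script above, since it eventually gives up instead of looping forever. A sketch with a hypothetical `with_retries` helper; the delays, attempt count, and exception types are assumptions to tune, and `sleep` is injectable so the logic can be tested without waiting.

```python
import time


def with_retries(call, attempts=5, base_delay=2.0,
                 retry_on=(RuntimeError, IOError), sleep=time.sleep):
    """Call 'call()' up to 'attempts' times, doubling the wait after each
    failure, and re-raise the last error instead of looping forever."""
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))


# Real use with Biopython (not run here):
#   def one_search():
#       handle = Entrez.esearch(db="nuccore", term=term)
#       try:
#           return Entrez.read(handle)
#       finally:
#           handle.close()
#   record = with_retries(one_search)
```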

@dmenning
Author

dmenning commented Aug 29, 2016

I have not heard back from NCBI yet. The address I e-mailed was eutilities@ncbi.nlm.nih.gov

I tried my script last Thursday 8/26 and it failed on every run. I tried again this morning and am getting the same error. It appears to be a problem with Entrez.esearch. I have another script that pulls .fasta using accession numbers and Entrez.efetch and that works fine.


@deto

deto commented Aug 29, 2016

I understand that servers might fail to respond, necessitating retries, but here the message Biopython is displaying seems to be an erroneous response from NCBI, since it sometimes says that "gds" is not a database. Or perhaps the error code from NCBI is more generic, and Biopython is just assuming that it's caused by a badly specified database?

@peterjc
Member

peterjc commented Aug 29, 2016

@deto do you have the full stack traceback, or at least the exact exception text? I'm pretty sure the "not a database" message can only be coming from the NCBI.

See also #515 about adding some validation on our side to make up for some unhelpful errors the NCBI sometimes exposes to the user.

@deto

deto commented Aug 29, 2016

@peterjc I don't have a trace saved, but I remember the message being exactly the same as in this issue title:

RuntimeError: Invalid db name specified: gds

That doesn't make sense coming from NCBI, since gds is definitely one of the databases and my queries work 95% of the time. Usually repeating exactly the same esummary command is enough to get a valid response.

@peterjc
Member

peterjc commented Aug 29, 2016

This is probably coming from https://github.com/biopython/biopython/blob/master/Bio/Entrez/Parser.py#L390 where Bio.Entrez.Parser just reports whatever error message the NCBI gave in their XML file.

It does sound like an intermittent problem with the NCBI GDS database not always responding via Entrez.
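To illustrate where the message originates: when E-utilities rejects a request, the XML it returns carries the message in an `ERROR` element, and the traceback earlier in this thread shows Bio.Entrez's parser re-raising that text as a `RuntimeError`. The sketch below imitates that behaviour with the standard library only. The sample XML is an assumed, simplified shape of such a reply rather than a captured NCBI response, and this is not Biopython's actual parser code.

```python
import xml.etree.ElementTree as ET

# Simplified, assumed shape of an ESearch error reply (not captured from NCBI).
SAMPLE = "<eSearchResult><ERROR>Invalid db name specified: nuccore</ERROR></eSearchResult>"


def read_esearch(xml_text):
    """Parse an ESearch reply; raise RuntimeError carrying the server's own
    message if an ERROR element is present, otherwise return the IDs."""
    root = ET.fromstring(xml_text)
    error = root.find("ERROR")
    if error is not None:
        raise RuntimeError(error.text)
    return [e.text for e in root.iter("Id")]
```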

@dmenning
Author

dmenning commented Aug 30, 2016

Just tried again and am now getting a new error:

Searching NCBI for Strongylocentrotus franciscanus and 12S and "sea urchins"[porgn:__txid7625]

Traceback (most recent call last):
  File "D:\Python27\0 eDNA\02_get_fasta_test.py", line 204, in <module>
    ncbi_search(species)
  File "D:\Python27\0 eDNA\02_get_fasta_test.py", line 119, in ncbi_search
    search_results=Entrez.read(search_handle)
  File "C:\Python27\lib\site-packages\Bio\Entrez\__init__.py", line 376, in read
    record = handler.read(handle)
  File "C:\Python27\lib\site-packages\Bio\Entrez\Parser.py", line 205, in read
    self.parser.ParseFile(handle)
  File "C:\Python27\lib\site-packages\Bio\Entrez\Parser.py", line 513, in externalEntityRefHandler
    self.dtd_urls.append(url)
UnboundLocalError: local variable 'url' referenced before assignment

@peterjc
Member

peterjc commented Aug 31, 2016

@dmenning From the line numbering in the traceback for the exception you must have a very old release, probably Biopython 1.65?

https://github.com/biopython/biopython/blob/biopython-165/Bio/Entrez/Parser.py#L513

The UnboundLocalError looks like an old Biopython bug, specifically (update, corrected) #527, which was fixed in Biopython 1.66.

Our current release is Biopython 1.68, released last month. Could you update your copy of Biopython and re-test please?
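Since the UnboundLocalError was fixed in Biopython 1.66, a script can warn when it is running on something older. A sketch: `Bio.__version__` holds the installed version string; the helper names are hypothetical, and the comparison is a simple numeric one rather than full PEP 440 handling.

```python
def version_tuple(version):
    """Turn '1.65' into (1, 65); stops at the first non-numeric part,
    so '1.69.dev0' becomes (1, 69)."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if ch.isdigit():
                digits += ch
            else:
                break
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def needs_upgrade(installed, minimum="1.66"):
    """True if the installed version is older than 'minimum'."""
    return version_tuple(installed) < version_tuple(minimum)


# Real use (requires Biopython):
#   import Bio
#   if needs_upgrade(Bio.__version__):
#       print("Please upgrade Biopython to 1.66 or later")
```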

@dmenning
Author

Done. It appears to be working now. I also got a reply from NCBI. It seems like they are having numerous issues.

"Thank for reporting this issue. We have recently been experiencing issues with our E-Utilities services regarding db name; it is not unique to nuccore. We are currently investigating the issue to determine the root cause. In the meantime, we can only suggest that you continue to attempt your query several times, as the issue appears to be intermittent and random. As of now, we do not have a specific timeline for resolution of this issue."

@peterjc
Copy link
Member

peterjc commented Sep 1, 2016

Thank you for passing on those details from the NCBI. Hopefully they can fix things on their side soon.

Is there anything else you think we need to do on the Biopython side, or can we close this issue? Thanks!

@dmenning
Author

dmenning commented Sep 1, 2016

Everything appears to be functioning normally, as before. I think this issue can be closed.

Thanks everyone for all of the help.

Damian

@peterjc peterjc closed this as completed Sep 1, 2016