ranks[tax_id] in per_rank_map: Exception AttributeError #87

Closed
FreddieLPF opened this issue Sep 9, 2020 · 35 comments

@FreddieLPF

Hello

I was trying to run metagenome_from_profile.py, but it keeps failing at the line containing ranks[tax_id] in per_rank_map, followed by an Exception AttributeError message:

Error - Copy

I tried to debug with Python's pdb module and got to this point, where I could not proceed.

To me the error looks like a database problem or something wrong with the mini.biom file. Any ideas?

error_cami-sim

Thank you!

@AlphaSquad
Collaborator

AlphaSquad commented Sep 10, 2020

Hi,
that is something I have not encountered before. The actual error is not the AttributeError; that is a strange bug which sometimes occurs when the log object is deleted after CAMISIM has already finished. The real error is the "KeyError: 1". I now see some problematic code in that function, but it should not cause this error (and does not for me).

Did you change something in either the biom file or the default config (defaults/default_config.ini) or somewhere else in the code? If not, then maybe you can send me your python (and maybe the ete2) version so I can try to reproduce this.
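
The shutdown-time AttributeError mentioned above can be illustrated with a minimal sketch. In Python 2, module globals may already have been set to None by the time a destructor runs at interpreter exit, so a __del__ method that reaches for a class-level registry can fail even though the program itself finished. The class below is a hypothetical stand-in, not CAMISIM's LoggingWrapper; the structure of its registry is an assumption, and only the attribute name is taken from the error message.

```python
import logging


class SafeLoggingWrapper(object):
    """Hypothetical stand-in for illustration; not CAMISIM's LoggingWrapper."""

    # Class-level registry. The name mirrors the attribute in the error
    # message; what the real attribute stores is an assumption.
    _map_logfile_handler = {}

    def __init__(self, label):
        self._label = label
        self._logger = logging.getLogger(label)
        handler = logging.NullHandler()
        self._logger.addHandler(handler)
        SafeLoggingWrapper._map_logfile_handler[label] = handler

    def close(self):
        """Detach this wrapper's handler; return True if one was removed."""
        # Reach the registry through the instance's type instead of a module
        # global, which may already be None during interpreter shutdown.
        registry = getattr(type(self), "_map_logfile_handler", None)
        if not registry:
            return False
        handler = registry.pop(self._label, None)
        if handler is None:
            return False
        self._logger.removeHandler(handler)
        handler.close()
        return True

    def __del__(self):
        # A destructor should never raise: ignore anything that goes wrong
        # while the interpreter is tearing objects down.
        try:
            self.close()
        except Exception:
            pass
```

Calling close() explicitly at the end of a run, instead of relying on __del__, avoids depending on teardown order at all.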

@AlphaSquad
Collaborator

I fixed the suspicious method, but since it was working for me even without that, I am not sure whether it also resolves this issue for you. Could you pull the latest version and check please?

@punnettsun

I have re-downloaded CAMISIM and tried it again. However, I am still getting the same error, shown below:

$ python metagenome_from_profile.py -p defaults/mini.biom --debug
2020-09-11 08:06:50 INFO: [root] Using commands:
2020-09-11 08:06:50 INFO: [root] -profile: defaults/mini.biom
2020-09-11 08:06:50 INFO: [root] -tmp: None
2020-09-11 08:06:50 INFO: [root] -community_only: False
2020-09-11 08:06:50 INFO: [root] -ncbi: tools/ncbi-taxonomy_20170222.tar.gz
2020-09-11 08:06:50 INFO: [root] -reference_genomes: tools/assembly_summary_complete_genomes.txt
2020-09-11 08:06:50 INFO: [root] -fill_up: False
2020-09-11 08:06:50 INFO: [root] -o: out/
2020-09-11 08:06:50 INFO: [root] -no_replace: True
2020-09-11 08:06:50 INFO: [root] -seed: None
2020-09-11 08:06:50 INFO: [root] -additional_references: None
2020-09-11 08:06:50 INFO: [root] -samples: None
2020-09-11 08:06:50 INFO: [root] -debug: True
2020-09-11 08:06:50 INFO: [root] -config: defaults/default_config.ini
2020-09-11 08:06:50 WARNING: [root] Max strains per OTU not set, using default (3)
2020-09-11 08:06:50 WARNING: [root] Mu and sigma have not been set, using defaults (1,2)
Traceback (most recent call last):
  File "metagenome_from_profile.py", line 98, in <module>
    config = GG.generate_input(args) # total number of genomes and path to updated config
  File "/home/ec2-user/CAMISIM-master/scripts/get_genomes.py", line 349, in generate_input
    per_rank_map = get_genomes_per_rank(genomes_map, RANKS, MAX_RANK)
  File "/home/ec2-user/CAMISIM-master/scripts/get_genomes.py", line 95, in get_genomes_per_rank
    if ranks_lin[tax_id] in per_rank_map: # if we are a legal rank
KeyError: 1
Exception AttributeError: "'NoneType' object has no attribute '_map_logfile_handler'" in <bound method LoggingWrapper.__del__ of <scripts.loggingwrapper.LoggingWrapper object at 0x7fe1b1db0c90>> ignored

AlphaSquad added a commit that referenced this issue Sep 11, 2020
@AlphaSquad
Collaborator

I have added a check which should prevent this crash, but I am not sure whether it resolves everything (since it keeps working on my test machines). Could you please try again? Thank you for your patience.

@punnettsun

Thank you for your quick responses. This is the error I get now:

$ python metagenome_from_profile.py -p defaults/mini.biom --debug
2020-09-11 08:43:13 INFO: [root] Using commands:
2020-09-11 08:43:13 INFO: [root] -profile: defaults/mini.biom
2020-09-11 08:43:13 INFO: [root] -tmp: None
2020-09-11 08:43:13 INFO: [root] -community_only: False
2020-09-11 08:43:13 INFO: [root] -ncbi: tools/ncbi-taxonomy_20170222.tar.gz
2020-09-11 08:43:13 INFO: [root] -reference_genomes: tools/assembly_summary_complete_genomes.txt
2020-09-11 08:43:13 INFO: [root] -fill_up: False
2020-09-11 08:43:13 INFO: [root] -o: out/
2020-09-11 08:43:13 INFO: [root] -no_replace: True
2020-09-11 08:43:13 INFO: [root] -seed: None
2020-09-11 08:43:13 INFO: [root] -additional_references: None
2020-09-11 08:43:13 INFO: [root] -samples: None
2020-09-11 08:43:13 INFO: [root] -debug: True
2020-09-11 08:43:13 INFO: [root] -config: defaults/default_config.ini
2020-09-11 08:43:13 WARNING: [root] Max strains per OTU not set, using default (3)
2020-09-11 08:43:13 WARNING: [root] Mu and sigma have not been set, using defaults (1,2)
2020-09-11 08:43:13 WARNING: [root] Some OTUs could not be mapped
2020-09-11 08:43:13 WARNING: [root] No matching NCBI ID for otu Genome3, scientific name Enterobacterales
2020-09-11 08:43:13 WARNING: [root] No matching NCBI ID for otu Genome1, scientific name Escherichia coli
2020-09-11 08:43:13 WARNING: [root] No matching NCBI ID for otu Genome2, scientific name Escherichia mulleri
2020-09-11 08:43:13 INFO: [root] Downloading 0 genomes
2020-09-11 08:43:13 INFO: [root] Community design finished
2020-09-11 08:43:13 ERROR: [Community] Invalid digit, must be bigger than 1, but was 0
2020-09-11 08:43:13 ERROR: [MetagenomeSimulationPipeline] [community0] Has an invalid value!
2020-09-11 08:43:13 INFO: [MetagenomeSimulationPipeline] Metagenome simulation aborted

@AlphaSquad
Collaborator

AlphaSquad commented Sep 11, 2020

Okay, this is as I feared. Since I cannot reproduce the error, could you open the file scripts/get_genomes.py, go to line 98 and add the following print lines before the continue, such that the code looks like this:

except KeyError:
    print lineage
    print ranks_lin
    continue

and post the resulting log? Either ete2 was not able to retrieve the lineage at all, or the format might have changed.
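
The guard discussed here can be sketched in isolation. The function below is a hypothetical simplification, not the actual get_genomes_per_rank from scripts/get_genomes.py: it groups the taxon IDs of one lineage by rank and records, rather than crashes on, any ID that is missing from the rank lookup. The function name and the returned list of skipped IDs are assumptions made for illustration.

```python
def group_lineage_by_rank(lineage, ranks_lin, legal_ranks):
    """Group taxon IDs by rank, skipping IDs that have no rank entry.

    lineage     -- list of taxon IDs, root first
    ranks_lin   -- dict mapping taxon ID -> rank name (may be incomplete)
    legal_ranks -- iterable of rank names to keep
    """
    per_rank_map = {rank: [] for rank in legal_ranks}
    skipped = []
    for tax_id in lineage:
        try:
            rank = ranks_lin[tax_id]
        except KeyError:
            # The rank lookup knows nothing about this ID (for example
            # because the taxonomy database is empty or stale).
            skipped.append(tax_id)
            continue
        if rank in per_rank_map:  # ignore 'no rank' and other unlisted ranks
            per_rank_map[rank].append(tax_id)
    return per_rank_map, skipped
```

If every ID of a lineage ends up in the skipped list, the rank lookup itself returned nothing, which points at the taxonomy database rather than at the profile.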

@punnettsun

Here is the attached error:
error.txt

@FreddieLPF
Author

@AlphaSquad, @punnettsun, Thank you both for posting, I wonder if there's any update since the last discussion?

@AlphaSquad
Collaborator

AlphaSquad commented Sep 15, 2020

Oh, I wrote a comment but it got lost. Unfortunately it is hard for me to figure this out just yet.
If any of you has a python interpreter available, it would be great if you could test ete2:

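# Python 2 syntax, matching the ete2 setup used here
from ete2 import NCBITaxa

ncbi = NCBITaxa()            # connects to the local taxonomy database
lin = ncbi.get_lineage(562)  # lineage of NCBI taxon 562, root first
print lin
print ncbi.get_rank(lin)     # maps each taxon ID in the lineage to its rank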

This should print the lineage of E. coli (NCBI ID 562) and the rank associated with each taxon in the lineage:

[1, 131567, 2, 1224, 1236, 91347, 543, 561, 562]
{1: u'no rank', 2: u'superkingdom', 1224: u'phylum', 131567: u'no rank', 561: u'genus', 562: u'species', 91347: u'order', 1236: u'class', 543: u'family'}

If this works, then the problem is the conversion of the BIOM profile and I will have to dig deeper yet.
Thanks for your help

@FreddieLPF
Author

Hi @AlphaSquad, I ran the test but got an empty result.
image

Is that a BIOM issue or a database issue?

@AlphaSquad
Collaborator

Hm, seems to be an issue with the ete2 database. Could you try, within the interpreter, to run ncbi.update_taxonomy_database() before the lin = ... command and see whether that does anything?
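
The check-then-update sequence suggested here could be wrapped as below. This is a sketch under assumptions: it takes any object exposing get_lineage, get_rank and update_taxonomy_database (the method names used in this thread), and the helper name lineage_with_refresh is hypothetical. It does not address the case where the update itself fails.

```python
def lineage_with_refresh(ncbi, taxid):
    """Return (lineage, ranks) for taxid, refreshing the database once if needed.

    ncbi is expected to behave like an NCBITaxa object; it is passed in so
    the helper can be exercised without a real taxonomy database.
    """
    for attempt in range(2):
        try:
            lineage = ncbi.get_lineage(taxid)
        except ValueError:
            # Treat a lookup error for an unknown taxid like an empty result.
            lineage = None
        if lineage:
            ranks = ncbi.get_rank(lineage)
            # Only accept the result if every taxon in the lineage has a rank.
            if all(tax_id in ranks for tax_id in lineage):
                return lineage, ranks
        if attempt == 0:
            # First failure: rebuild the local database, then retry once.
            ncbi.update_taxonomy_database()
    raise LookupError(
        "taxid %s could not be resolved even after a database update" % taxid
    )


# Possible usage with ete3 (not executed here; needs ete3 and network access):
#   from ete3 import NCBITaxa
#   lineage, ranks = lineage_with_refresh(NCBITaxa(), 562)
```

Passing the taxonomy object in, rather than constructing it inside the helper, keeps the retry logic separate from the choice between ete2 and ete3.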

@FreddieLPF
Author

Hi @AlphaSquad, I tried that, and the ncbi.update_taxonomy_database() command raised an error while "inserting synonyms", which looks like a database issue. Please see the screenshot below:

image

@AlphaSquad
Collaborator

AlphaSquad commented Sep 22, 2020

Okay, thank you, it indeed seems like an issue with the database. I will try to re-write this part using the newer ete3 database to see if it resolves the issue

@FreddieLPF
Author

@AlphaSquad Good day! I wonder if there's any update regarding the bug?

@AlphaSquad
Collaborator

Hey, unfortunately I didn't find the time yet to look into the code, I am very sorry. By the end of the next week I hope to have written something.

@AlphaSquad
Collaborator

It seems like this was a known problem in ete, and they described a patch in their repository, but it prompted me to try porting CAMISIM to python3. My test cases worked, but it might be unstable. If you could test the new python3 branch, that would be greatly appreciated!

@FreddieLPF
Author

Hi @AlphaSquad, thank you. Did you mean that I should use Python 3 and keep the other dependencies at the versions listed?

@AlphaSquad
Collaborator

If you check out the python3 branch you will see an updated requirements.txt file. If possible you could set up a new conda environment from that file by using conda create -n camisim --file requirements.txt. If you don't have conda or cannot use it, then you need to install the new requirements manually - but since they could be conflicting with the old requirements it is possible that the master branch doesn't work anymore then.

@FreddieLPF
Author

I created a conda python2 environment before with every dependency installed in that conda environment, is there anything I can do to just update the python version there instead of creating a new conda environment?

@AlphaSquad
Collaborator

Of course you could try to install the new requirements.txt in that environment. But since some of the old dependencies explicitly required python2, it is possible that there are conflicts which cannot be resolved.

@manli-zou

Hi, I have encountered the same issue when running CAMISIM these days.
Has it been resolved yet?
Thanks very much.

@AlphaSquad
Collaborator

Unfortunately there are multiple different issues in this thread, could you post the exact error from the logfile?

@manli-zou

A. When running metagenomesimulation.py, the debug info is:
"Invalid taxid: '{}'".format(taxid)
1, Line 146 in get_updated_taxid
2, Line 239 in get_lineage_of_legal_ranks
3, Line 230 in _stream_tp_rows
4, Line 130 in _stream_taxonomic_profile
5, Line 89 in write_taxonomic_profile
6, Line 58 in write_taxonomic_profile_from_abundance_files
7, Line 203 in write_profile_gold_standard
8, Line 263 in _design_community
9, Line 83, in run_pipeline

B. When running metagenome_from_profile.py, the issue was the ete2 database problem described above.

C. Could these issues come from randomly assigned OTU numbers and novelty categories?

Thanks!!

@AlphaSquad
Collaborator

Did you try switching to the python3 branch of CAMISIM (as described in my message from 6th October)? It seems like there is an issue with the ete2 database (which I cannot resolve on CAMISIM's side).

@manli-zou

manli-zou commented Mar 5, 2021

Switching to the python3 branch, I got the following :
2021-03-05 09:56:31 WARNING: [root] 2681466 taxid not found
2021-03-05 09:56:31 WARNING: [root] 1192013 taxid not found
2021-03-05 09:56:31 WARNING: [root] 2704140 taxid not found
2021-03-05 09:56:31 WARNING: [root] 1844999 taxid not found
2021-03-05 09:56:31 WARNING: [root] 2697503 taxid not found
2021-03-05 09:56:31 WARNING: [root] 2060052 taxid not found
2021-03-05 09:56:31 WARNING: [root] 2060053 taxid not found
2021-03-05 09:56:31 WARNING: [root] 29159 taxid not found
2021-03-05 09:56:31 WARNING: [root] 111886 taxid not found
2021-03-05 09:56:31 WARNING: [root] 2707345 taxid not found
2021-03-05 09:56:31 WARNING: [root] 2614830 taxid not found
2021-03-05 09:56:31 WARNING: [root] 2691041 taxid not found
2021-03-05 09:56:31 WARNING: [root] 2709132 taxid not found
2021-03-05 09:56:31 WARNING: [root] 2709133 taxid not found
...
2021-03-05 09:56:32 WARNING: [root] No matching NCBI ID for otu Genome38.0, scientific name Trametes versicolor
2021-03-05 09:56:32 WARNING: [root] No matching NCBI ID for otu Genome12.0, scientific name Torulaspora delbrueckii
2021-03-05 09:56:32 WARNING: [root] No matching NCBI ID for otu Genome72.0, scientific name Yamadazyma tenuis
2021-03-05 09:56:32 WARNING: [root] No matching NCBI ID for otu Genome60.0, scientific name Parastagonospora nodorum
2021-03-05 09:56:32 WARNING: [root] No matching NCBI ID for otu Genome42.0, scientific name Coprinopsis cinerea
2021-03-05 09:56:32 WARNING: [root] No matching NCBI ID for otu Genome13.0, scientific name Yarrowia lipolytica
2021-03-05 09:56:32 WARNING: [root] No matching NCBI ID for otu Genome24.0, scientific name Aspergillus niger
2021-03-05 09:56:32 WARNING: [root] No matching NCBI ID for otu Genome71.0, scientific name Pseudocercospora fijiensis
2021-03-05 09:56:32 WARNING: [root] No matching NCBI ID for otu Genome55.0, scientific name Saitoella complicata
2021-03-05 09:56:32 WARNING: [root] No matching NCBI ID for otu Genome62.0, scientific name Naumovozyma dairenensis
2021-03-05 09:56:32 WARNING: [root] No matching NCBI ID for otu Genome30.0, scientific name Neurospora crassa
2021-03-05 09:56:32 WARNING: [root] No matching NCBI ID for otu Genome29.0, scientific name Fusarium fujikuroi
2021-03-05 09:56:32 WARNING: [root] No matching NCBI ID for otu Genome53.0, scientific name Aureobasidium pullulans
2021-03-05 09:56:32 WARNING: [root] No matching NCBI ID for otu Genome78.0, scientific name Venustampulla echinocandica
2021-03-05 09:56:32 WARNING: [root] No matching NCBI ID for otu Genome66.0, scientific name Baudoinia panamericana
2021-03-05 09:56:32 WARNING: [root] No matching NCBI ID for otu Genome31.0, scientific name Sordaria macrospora
2021-03-05 09:56:32 WARNING: [root] No matching NCBI ID for otu Genome44.0, scientific name [Candida] glabrata
2021-03-05 09:56:32 WARNING: [root] No matching NCBI ID for otu Genome67.0, scientific name Kwoniella shandongensis
...
2021-03-05 09:56:32 INFO: [root] Downloading 0 genomes
2021-03-05 09:56:32 INFO: [root] Community design finished
Aborted
#######
One question remains in both the python2 and python3 versions: why can the taxids not be found, given that they are the correct numbers?
The NCBI taxonomy file is the most recent one, new_taxdump.tar.gz.

Thanks for your patience !

@AlphaSquad
Collaborator

AlphaSquad commented Mar 5, 2021

That is indeed strange as, just like you said, I could find the genomes by name and by taxonomy ID. Also, the taxonomy IDs were found on my end. Your local ete3 database seems to be out of date, so in the latest version I added an automatic update. ete3 should now find your taxonomy IDs. It is possible that this already resolves the rest of the issues as well. Could you please pull the latest version from the python3 branch and see whether that helped?

@manli-zou

Hi, after updating ete3, the run proceeded until it printed Aborted. The log and the output files are listed below:

2021-03-09 09:15:23 INFO: [root] Downloading 84 genomes
2021-03-09 09:24:39 INFO: [root] Community design finished
Aborted

Output file:
1429 Mar 9 09:24 abundance0.tsv
1439 Mar 9 09:24 abundance1.tsv
1147 Mar 9 09:24 config.ini
4096 Mar 9 09:24 genomes
4732 Mar 9 09:24 genome_to_id.tsv
3189 Mar 9 09:24 metadata.tsv

What could be the problem? I still cannot get the final results.

Thanks!

@AlphaSquad
Collaborator

Strange. Since all the inputs for CAMISIM are there, could you check whether the config file looks good and then try running metagenomesimulation.py config.ini?

@manli-zou

./metagenomesimulation.py ./config.ini --debug

2021-03-11 19:15:18 ERROR: [MetadataReader 78944841691] Key column is not unique! Key: 'Genome11.0.0'
2021-03-11 19:15:18 DEBUG: [MetagenomeSimulationPipeline]
Traceback (most recent call last):
  File "./metagenomesimulation.py", line 74, in run_pipeline
    genome_id_to_path_map = self.get_dict_gid_to_genome_file_path()
  File "./metagenomesimulation.py", line 223, in get_dict_gid_to_genome_file_path
    return meta_data_table.get_map(0, 1)
  File "/SBU/Subject/MD_S/zoumanli/mock_community/CAMISIM-python3/scripts/MetaDataTable/metadatatable.py", line 630, in get_map
    raise KeyError(msg)
KeyError: "Key column is not unique! Key: 'Genome11.0.0'"

###################
yet 'Genome11.0.0' only appears once
genome_ID OTU NCBI_ID novelty_category
Genome11.0.0 4932 4932 known_strain
Genome38.0.0 5325 5325 known_strain
Genome12.0.0 4950 4950 known_strain
...

Thanks

@AlphaSquad
Collaborator

Hm, that is strange indeed. Did you make sure that your output directory is empty? If you already did, would it be possible to send me the files so I can try to reproduce the problem locally?
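
One way to check the "Key column is not unique" report independently of CAMISIM is to count the values in the first column of the table in question. The helper below is a minimal sketch; its name, the tab separator and the choice of column 0 are assumptions made for illustration, and the path in the usage comment is a placeholder.

```python
from collections import Counter


def find_duplicate_keys(lines, key_column=0, separator="\t"):
    """Return {key: count} for every key that occurs more than once.

    lines can be any iterable of text lines, for example an open file.
    The header line, if present, is counted like any other row.
    """
    keys = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # ignore blank lines
        keys.append(line.split(separator)[key_column])
    counts = Counter(keys)
    return {key: n for key, n in counts.items() if n > 1}


# Possible usage (placeholder path):
#   with open("path/to/table.tsv") as handle:
#       print(find_duplicate_keys(handle))
```

An empty result for every table in the output directory would suggest the duplicate is created while the pipeline runs, for example by appending to files left over from an earlier run.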

@manli-zou

attached is .biom file -
filenew.zip

@manli-zou

Hi, may I ask whether you encountered the same problem when you ran the test?
(Please let me know if there are any problems with the biom file.)

Thanks :)

@AlphaSquad
Collaborator

Hi, sorry, I didn't have the time to test this so far, I will do this as soon as possible and come back to you with results.

@manli-zou

manli-zou commented Mar 20, 2021

hi, that's ok !

I tried it again, and there was a different error:

2021-03-22 09:32:32 ERROR: [NcbiTaxonomy] Invalid taxid: ''
2021-03-22 09:32:32 DEBUG: [MetagenomeSimulationPipeline]
Traceback (most recent call last):
  File "./metagenomesimulation.py", line 83, in run_pipeline
    genome_id_to_path_map, list_of_file_paths_distributions = self._design_community()
  File "./metagenomesimulation.py", line 263, in _design_community
    self.write_profile_gold_standard(meta_data_table, list_of_file_paths_distribution)
  File "./metagenomesimulation.py", line 203, in write_profile_gold_standard
    sample_id=""

    raise ValueError("Invalid taxid")

ValueError: Invalid taxid

This issue arises from:
'Genome55.0.0': ['2759', '4890', '', '', None, '5605', '5606', '5606.1']

Must each genome have rank information at every taxonomic level?

Thanks !

@AlphaSquad
Collaborator

Hi, you are correct that CAMISIM expected rank information to be present at every rank. I added a check that skips a rank if no ID is present for it, so you could try whether it works for you now. If it does not work, could you please comment in #103?
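
The skip described here can be sketched as follows, using the lineage reported for Genome55.0.0 earlier in the thread. This is not the code added to CAMISIM; the function name is hypothetical, and the mapping of the eight list positions to rank names is an assumption.

```python
# Assumed order of ranks for the eight positions of a lineage list.
RANKS = ("superkingdom", "phylum", "class", "order",
         "family", "genus", "species", "strain")


def present_ranks(lineage, rank_names=RANKS):
    """Pair each rank name with its taxon ID, dropping ranks without an ID."""
    return [
        (rank, taxid)
        for rank, taxid in zip(rank_names, lineage)
        if taxid is not None and taxid != ""  # empty string or None: skip
    ]


# Lineage reported for Genome55.0.0 in this thread.
genome55 = ['2759', '4890', '', '', None, '5605', '5606', '5606.1']
for rank, taxid in present_ranks(genome55):
    print(rank, taxid)
```

With this input the class, order and family positions are dropped, and the remaining five ranks are kept in their original order.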
