Error during detector training #73

Open
esraagithub opened this issue May 15, 2022 · 5 comments

Comments

@esraagithub

Hello,
I ran into a problem while training a detector on my sample.

Here is the error message. It says I didn't provide a negative dataset, but I did use one, called GeneSwap_Negatives.pfam.tsv.
I think DeepBGC can't see the negative dataset because of an error in my command; --help didn't say where or how to specify it.

  'optimizer': 'adam',
                  'shuffle': True,
                  'timesteps': 256,
                  'validation_size': 0,
                  'verbose': 1,
                  'weighted': True},
'input_params': {   'features': [   {'type': 'ProteinBorderTransformer'},
                                    {   'type': 'Pfam2VecTransformer',
                                        'vector_path': 'pfam2vec.csv'}]},
'type': 'KerasRNN'}

INFO 15/05 09:28:22 Loaded 41102 samples and 80777 domains from sample1_deepbgc_prepare_result.tsv
INFO 15/05 09:28:28 Loaded 10128 samples and 706950 domains from GeneSwap_Negatives.pfam.tsv
ERROR 15/05 09:28:33 Got target variable with only one value {0} in: ['sample1_deepbgc_prepare_result.tsv', 'GeneSwap_Negatives.pfam.tsv']
Traceback (most recent call last):
File "/root/esraa/miniconda3/envs/deepbgcv0.1.29/lib/python3.7/site-packages/deepbgc/main.py", line 113, in main
run(argv)
File "/root/esraa/miniconda3/envs/deepbgcv0.1.29/lib/python3.7/site-packages/deepbgc/main.py", line 102, in run
args.func.run(**args_dict)
File "/root/esraa/miniconda3/envs/deepbgcv0.1.29/lib/python3.7/site-packages/deepbgc/command/train.py", line 60, in run
train_samples, train_y = util.read_samples(inputs, target)
File "/root/esraa/miniconda3/envs/deepbgcv0.1.29/lib/python3.7/site-packages/deepbgc/util.py", line 574, in read_samples
'Did you provide positive and negative samples?')
ValueError: ("Got target variable with only one value {0} in: ['sample1_deepbgc_prepare_result.tsv', 'GeneSwap_Negatives.pfam.tsv']", 'At least two values are required to train a model. ', 'Did you provide positive and negative samples?')
ERROR 15/05 09:28:33 ================================================================================
ERROR 15/05 09:28:33 DeepBGC failed with ValueError: Got target variable with only one value {0} in: ['sample1_deepbgc_prepare_result.tsv', 'GeneSwap_Negatives.pfam.tsv']
ERROR 15/05 09:28:33 ================================================================================
ERROR 15/05 09:28:33 At least two values are required to train a model.
ERROR 15/05 09:28:33 Did you provide positive and negative samples?
ERROR 15/05 09:28:33 ================================================================================

My command:

deepbgc train --model templates/deepbgc.json --output MyDeepBGCDetector.pkl sample1_deepbgc_prepare_result.tsv GeneSwap_Negatives.pfam.tsv --config PFAM2VEC pfam2vec.csv -v ClusterFinder_Annotated_Contigs.full.gbk

@prihoda
Collaborator

prihoda commented May 16, 2022

Hi @esraagithub, your file sample1_deepbgc_prepare_result.tsv contains the BGC samples, is that correct? This file will need to contain an in_cluster column, which will have a value of 1 in all rows (in case the file only contains "positive" BGC samples). Your file should also contain a sequence_id column which should contain an identifier of each BGC.
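One way to check both columns before training is sketched below. It is only an illustration: it assumes the file is tab-separated with a header row containing the in_cluster and sequence_id columns mentioned above, and summarize_pfam_tsv is a hypothetical helper, not part of DeepBGC.

```python
import csv
import sys
from collections import Counter


def summarize_pfam_tsv(path):
    """Count in_cluster values and collect distinct sequence_id values.

    Assumes a tab-separated file with a header row that has
    'in_cluster' and 'sequence_id' columns (an assumption based on
    the column names discussed in this thread).
    """
    in_cluster_counts = Counter()
    sequence_ids = set()
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        for row in reader:
            # Values are kept as the raw strings found in the file.
            in_cluster_counts[row["in_cluster"]] += 1
            sequence_ids.add(row["sequence_id"])
    return in_cluster_counts, sequence_ids


if __name__ == "__main__" and len(sys.argv) > 1:
    counts, ids = summarize_pfam_tsv(sys.argv[1])
    print("in_cluster value counts:", dict(counts))
    print("distinct sequence_id values:", len(ids))
```

If the positive file reports only the value 0 for in_cluster, that would match the "only one value {0}" message in the log above.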

@esraagithub
Author

@prihoda
Thank you for your response.
Yes, the file sample1_deepbgc_prepare_result.tsv is the output of the deepbgc prepare command. It does contain a sequence_id column and an in_cluster column, but the in_cluster column has 0 in all rows, not 1.
I don't know why it contains only zeros.

@prihoda
Collaborator

prihoda commented May 21, 2022

Hi @esraagithub, if that file contains just BGC samples, you can manually change the in_cluster value to 1 in all rows.
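For a large file, the change can be scripted rather than done by hand. The following is a minimal sketch, assuming a tab-separated file with a header row; set_in_cluster is a hypothetical helper, not a DeepBGC function, and it writes to a new file so the original stays untouched.

```python
import csv
import sys


def set_in_cluster(src_path, dst_path, value="1"):
    """Copy a tab-separated file, overwriting in_cluster in every row.

    Returns the number of data rows written. Assumes the file has a
    header row with an 'in_cluster' column.
    """
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.DictReader(src, delimiter="\t")
        fieldnames = reader.fieldnames or []
        if "in_cluster" not in fieldnames:
            raise KeyError("no in_cluster column in " + src_path)
        writer = csv.DictWriter(
            dst, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
        )
        writer.writeheader()
        rows = 0
        for row in reader:
            row["in_cluster"] = value  # mark every row as a positive (BGC) sample
            writer.writerow(row)
            rows += 1
    return rows


if __name__ == "__main__" and len(sys.argv) > 2:
    print(set_in_cluster(sys.argv[1], sys.argv[2]), "rows written")
```

This is only appropriate when every row in the file really belongs to a BGC, as stated above.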

@esraagithub
Author

esraagithub commented May 21, 2022 via email

@esraagithub
Author

Thank you, I tried it and it worked well, but I get another error in the next step, "training the classifier":

raise ValueError('No overlap found between classes and samples. Classes should be indexed by sequence_id.')
ValueError: No overlap found between classes and samples. Classes should be indexed by sequence_id.
ERROR 23/05 23:53:18 ================================================================================
ERROR 23/05 23:53:18 DeepBGC failed with ValueError: No overlap found between classes and samples. Classes should be indexed by sequence_id.
ERROR 23/05 23:53:18 ================================================================================

I have a sequence_id column in the sample file (which I got from deepbgc prepare), but there is no overlap between its values and the classes file. What should I do in this case? Does this mean I can't proceed with training?
@prihoda
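
To see which identifiers fail to match, the two files can be compared directly. The sketch below is a diagnostic only and rests on assumptions not confirmed in this thread: that the sample file is tab-separated with a sequence_id column, and that the classes file is delimited text whose first column holds the identifiers (the delimiter is a parameter because the classes file format is not shown here). compare_ids is a hypothetical helper.

```python
import csv
import sys


def read_sample_ids(path):
    """Distinct sequence_id values from a tab-separated sample file."""
    with open(path, newline="") as handle:
        return {row["sequence_id"] for row in csv.DictReader(handle, delimiter="\t")}


def read_class_ids(path, delimiter=","):
    """Values of the first column of the classes file, skipping the header.

    Treating the first column as the index is an assumption based on
    the error message 'Classes should be indexed by sequence_id'.
    """
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter=delimiter)
        next(reader, None)  # skip header row
        return {row[0] for row in reader if row}


def compare_ids(samples_path, classes_path, classes_delimiter=","):
    """Report identifiers shared by both files and unique to each."""
    sample_ids = read_sample_ids(samples_path)
    class_ids = read_class_ids(classes_path, classes_delimiter)
    return {
        "shared": sample_ids & class_ids,
        "only_in_samples": sample_ids - class_ids,
        "only_in_classes": class_ids - sample_ids,
    }


if __name__ == "__main__" and len(sys.argv) > 2:
    for name, ids in compare_ids(sys.argv[1], sys.argv[2]).items():
        print(name, len(ids), sorted(ids)[:5])
```

An empty "shared" set would correspond to the "No overlap found" message; printing a few identifiers from each side can show whether they differ only in formatting, such as a version suffix or file extension.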
