Error running example notebook #4

Closed
00dylan00 opened this issue Jul 4, 2024 · 2 comments · Fixed by #5

Comments

@00dylan00

Hi! I ran downstream_task_example.ipynb but hit an error on the following call:

parameters, forward_fn, tokenizer, config, mlm_config = get_pretrained_downstream_model(
    model_name="tcga_5_cohorts",
    checkpoint_directory="../checkpoints/",
)

This returned the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[2], line 1
----> 1 parameters, forward_fn, tokenizer, config, mlm_config = get_pretrained_downstream_model(
      2     model_name="tcga_5_cohorts",
      3     checkpoint_directory="../checkpoints/",
      4 )
      5 forward_fn = hk.transform(forward_fn)

File /aloy/home/ddalton/projects/multiomics-open-research/multiomics_open_research/bulk_rna_bert/downstream/pretrained.py:93, in get_pretrained_downstream_model(model_name, compute_dtype, param_dtype, output_dtype, checkpoint_directory)
     90     embeddings_layer_to_use = mlm_config.num_layers
     91 mlm_config.embeddings_layers_to_save = (embeddings_layer_to_use,)
---> 93 tokenizer = BinnedExpressionTokenizer(
     94     gene_expression_bins=np.array(mlm_config.rnaseq_tokenizer_bins),
     95     prepend_cls_token=False,
     96 )
     98 if model_name not in MODEL_NAME_TO_HEAD_NAME:
     99     raise ValueError(f"Model {model_name} not supported.")

TypeError: BinnedExpressionTokenizer.__init__() got an unexpected keyword argument 'gene_expression_bins'

I guessed that the gene_expression_bins argument might correspond to n_expressions_bins, but renaming it only produced further errors.
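Rather than guessing at the argument name, one option is to inspect the constructor signature and compare it with the keywords being passed. The sketch below is only an illustration: StandInTokenizer is a hypothetical stand-in, not the real BinnedExpressionTokenizer, and its parameter names are assumptions chosen to reproduce the shape of the error.

```python
import inspect


# Hypothetical stand-in for the real tokenizer class. Its parameter names
# are assumptions for illustration, not the library's actual signature.
class StandInTokenizer:
    def __init__(self, n_expressions_bins: int, prepend_cls_token: bool = False):
        self.n_expressions_bins = n_expressions_bins
        self.prepend_cls_token = prepend_cls_token


def accepted_keywords(cls) -> list:
    """Return the parameter names of cls.__init__, excluding self."""
    params = inspect.signature(cls.__init__).parameters
    return [name for name in params if name != "self"]


def unexpected_keywords(cls, kwargs: dict) -> list:
    """Return the keys of kwargs that cls.__init__ does not declare."""
    accepted = set(accepted_keywords(cls))
    return [name for name in kwargs if name not in accepted]


# Keywords mirroring the call that fails in the traceback.
call_kwargs = {"gene_expression_bins": [0.0, 1.0, 2.0], "prepend_cls_token": False}

print("accepted:", accepted_keywords(StandInTokenizer))
print("unexpected:", unexpected_keywords(StandInTokenizer, call_kwargs))
```

Running the same two helpers against the real class (imported from the installed package) would show which names the constructor declares. Note that this simple check does not account for constructors that accept **kwargs.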

As a side note, installation went smoothly with pip install -e . as suggested in the README, although I had to add sys.path.append(os.path.dirname(os.path.abspath(os.getcwd()))) when running the example notebook so that Python could find the package.

Thanks!

@Maxenceglrd
Collaborator

Maxenceglrd commented Jul 11, 2024

Hi! Thanks a lot for your interest in this repository and for having raised this issue.

It is indeed a mistake in the tokenizer loading, and it will be corrected shortly.

Concerning your installation side note, can you make sure that the Python kernel you are using to run the notebook belongs to the same environment in which you installed the repository as a package (using pip install -e .)?
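A quick way to check this from inside the notebook is to print the interpreter the kernel is running and where the package resolves from. This is a minimal sketch using only the standard library; describe_environment is a hypothetical helper name, and the package name is taken from the file path shown in the traceback.

```python
import importlib.util
import sys


def describe_environment(package_name: str) -> dict:
    """Report the running interpreter and where a top-level package resolves from."""
    spec = importlib.util.find_spec(package_name)
    return {
        "interpreter": sys.executable,
        "package_found": spec is not None,
        "package_location": getattr(spec, "origin", None),
    }


# Package name as it appears in the traceback's file path.
info = describe_environment("multiomics_open_research")
for key, value in info.items():
    print(f"{key}: {value}")
```

If the interpreter path differs from the environment where pip install -e . was run, or package_found is False, the kernel is most likely attached to a different environment, which would explain why the sys.path workaround was needed.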

Maxence

@00dylan00
Author

Fantastic, it works perfectly!

How would one go about fine-tuning this model?
