Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Broken Tutorial: single_gpu_tutorial.ipynb #105

Closed
axelmagn opened this issue Jun 12, 2024 · 3 comments
Closed

Broken Tutorial: single_gpu_tutorial.ipynb #105

axelmagn opened this issue Jun 12, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@axelmagn
Copy link

tutorials/single_node_tutorial/single_gpu_tutorial.ipynb section 5.3 contains a line:

# Read .jsonl input data
ddf_text = get_text_ddf_from_json_path_with_blocksize(
    input_data_paths=input_data_paths,
    num_files=num_files,
    blocksize=text_ddf_blocksize,
    id_column=input_id_field,
    text_column=input_text_field,
)

which produces the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[48], line 5
      2 num_workers = get_num_workers(client)
      4 # Read .jsonl input data
----> 5 ddf_text = get_text_ddf_from_json_path_with_blocksize(
      6     input_data_paths=input_data_paths,
      7     num_files=num_files,
      8     blocksize=text_ddf_blocksize,
      9     id_column=input_id_field,
     10     text_column=input_text_field,
     11 )
     12 # Read "_buckets.parquet"
     13 ddf_bk = get_bucket_ddf_from_parquet_path(input_bucket_path=input_bucket_path, num_workers=num_workers)

TypeError: get_text_ddf_from_json_path_with_blocksize() missing 1 required positional argument: 'input_meta'

I've checked the corresponding function, and it does indeed require a input_meta argument. However it's missing a docstring and I'm not sure what form this argument should take.

I suspect that the signature of this function has drifted since the tutorial was written. Maybe consider giving input_meta a default value (if possible), or if not, writing some docstrings for these functions?

@axelmagn axelmagn added the bug Something isn't working label Jun 12, 2024
@ayushdg
Copy link
Collaborator

ayushdg commented Jun 12, 2024

Thanks for opening @axelmagn. You're indeed right that the function has drifted since the tutorial was initially written. In this case I'm in favor of having a default for input_meta set to None for this method.

cc: @miguelusque if you have any opinions.

@miguelusque
Copy link
Contributor

Hi @ayushdg , I think you are right. Setting it to None should make it work.

I will change that method to have a default value of None to avoid the happening in the future. Apologies for the inconvenience.

nicoleeeluo added a commit to nicoleeeluo/NeMo-Curator that referenced this issue Jun 13, 2024
nicoleeeluo added a commit to nicoleeeluo/NeMo-Curator that referenced this issue Jun 13, 2024
Signed-off-by: Nicole Luo <nluo@nvidia.com>
@nicoleeeluo nicoleeeluo mentioned this issue Jun 13, 2024
3 tasks
@miguelusque
Copy link
Contributor

Fixed via #102.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants