Issues: NVIDIA/NeMo-Curator

Issues list

[BUG] Semdedup Embedding Restart not working cleanly (label: bug)
#211 opened Aug 19, 2024 by VibhuJawa
removing fuzzy duplicates bug in single node tutorial (label: bug)
#210 opened Aug 16, 2024 by yyu22
[FEA] Add license detector for code repositories (label: enhancement)
#208 opened Aug 15, 2024 by miguelusque
Update SemDeDup README (label: documentation)
#203 opened Aug 13, 2024 by sarahyurick
DocumentDataset bug for reading relative file paths (label: bug)
#201 opened Aug 12, 2024 by sarahyurick
Improved Semantic Deduplication Docs (label: enhancement)
#198 opened Aug 9, 2024 by ryantwolf
Semantic Deduplication Docs not in Index (label: bug)
#197 opened Aug 9, 2024 by ryantwolf
Pandas and cuDF DataFrames in DocumentDataset (label: bug)
#195 opened Aug 8, 2024 by sarahyurick
[META] Update python version to include python 3.11 (label: meta)
#188 opened Aug 6, 2024 by VibhuJawa
Update single_gpu_tutorial.ipynb to use a recent snapshot (label: bug)
#185 opened Aug 5, 2024 by ronjer30
single_gpu_tutorial.ipynb fails to run on GPU (label: bug)
#183 opened Aug 5, 2024 by ronjer30
Remove text field requirement from Download and Extract (label: enhancement)
#158 opened Jul 22, 2024 by ryantwolf
Default int_to_str_id error in jaccard shuffle (label: bug)
#150 opened Jul 9, 2024 by yyu22
Running into OOM with add id (label: bug)
#142 opened Jul 8, 2024 by yyu22
Add pytests for semantic dedup (label: enhancement)
#141 opened Jul 5, 2024 by ayushdg
Incorrect Shuffle results with dask-cuda 24.06 & above (label: bug)
#134 opened Jul 1, 2024 by ayushdg