Actions: huggingface/datatrove

Lint

462 workflow runs

started work on formatters and ccnet perplexity
Lint #462: Commit 09c5245 pushed by guipenedo
December 18, 2023 18:44 26s pipeline_blocks_misc
Set minimal Python version to 3.10
Lint #461: Commit b9735fe pushed by mariosasko
December 18, 2023 18:40 24s fix-minimal-version
recursive was not taken into account in fsspec
Lint #460: Pull request #38 synchronize by thomwolf
December 17, 2023 09:16 21s fix-recursive
batched tokenization
Lint #459: Commit 4f5cd00 pushed by thomwolf
December 17, 2023 09:16 24s fix-recursive
Optimize ParquetReader (#40)
Lint #458: Commit 46750dd pushed by guipenedo
December 15, 2023 19:44 25s main
Optimize ParquetReader
Lint #457: Pull request #40 synchronize by guipenedo
December 13, 2023 14:44 26s optimize-parquet-reader
correctly track time
Lint #456: Commit 0ebc066 pushed by guipenedo
December 13, 2023 14:44 22s optimize-parquet-reader
Optimize ParquetReader
Lint #455: Pull request #40 synchronize by mariosasko
December 13, 2023 12:57 26s optimize-parquet-reader
Address comments
Lint #454: Commit cb81215 pushed by mariosasko
December 13, 2023 12:57 35s optimize-parquet-reader
Optimize ParquetReader
Lint #453: Pull request #40 opened by mariosasko
December 12, 2023 18:07 24s optimize-parquet-reader
Optimize ParquetReader
Lint #452: Commit ba648c9 pushed by mariosasko
December 12, 2023 18:03 26s optimize-parquet-reader
Support Python 3.8
Lint #451: Pull request #39 opened by mariosasko
December 12, 2023 17:18 41s python3.8-support
Nit
Lint #450: Commit 7b6e3ba pushed by mariosasko
December 12, 2023 17:07 32s python3.8-support
bugfix to also read WET files with WarcReader
Lint #449: Commit 6014a6f pushed by guipenedo
December 12, 2023 15:43 24s main
bugfix outputfilename with gz
Lint #448: Commit 476de37 pushed by guipenedo
December 12, 2023 03:43 22s main
recursive was not taken into account in fsspec
Lint #447: Pull request #38 synchronize by thomwolf
December 10, 2023 01:44 19s fix-recursive
updates
Lint #446: Commit 126a0b9 pushed by thomwolf
December 10, 2023 01:44 22s fix-recursive
recursive was not taken into account in fsspec
Lint #445: Pull request #38 opened by thomwolf
December 7, 2023 08:26 24s fix-recursive
recursive was not taken into account in fsspec
Lint #444: Commit baced94 pushed by thomwolf
December 6, 2023 23:44 22s fix-recursive
started work on the readme
Lint #443: Commit 35fc53e pushed by guipenedo
December 6, 2023 15:52 23s main
added check on rank 0 for input files on readers
Lint #442: Commit c834995 pushed by guipenedo
December 6, 2023 12:48 20s main
fix tokenization error on empty data
Lint #441: Commit 3e3f0c8 pushed by guipenedo
December 6, 2023 12:38 25s main
Merge pull request #37 from huggingface/labeling
Lint #440: Commit 7e007ff pushed by thomwolf
December 6, 2023 12:24 24s main
Should have been merged as well in the labelling tool
Lint #439: Pull request #37 opened by thomwolf
December 6, 2023 12:24 22s labeling
Merge pull request #36 from huggingface/doc-length
Lint #438: Commit e38f52e pushed by thomwolf
December 6, 2023 12:22 19s main