Commit

readme changes for the release
guipenedo committed Feb 7, 2024
1 parent 2741e01 commit bd3c89a
Showing 1 changed file with 8 additions and 8 deletions.
16 changes: 8 additions & 8 deletions README.md
@@ -38,15 +38,14 @@ Local, remote and other file systems are supported through [fsspec](https://file
## Installation

```bash
-git clone git@github.com:huggingface/datatrove.git && cd datatrove
-pip install -e ".[FLAVOUR]"
+pip install datatrove[FLAVOUR]
```
-Available flavours (combine them with `,` i.e. `[processing,s3]`:
-- `all` installs everything
-- `io` dependencies to read `warc/arc/wet` files and arrow/parquet formats
-- `processing` dependencies for text extraction, filtering and tokenization
-- `s3` s3 support
-- `cli` for command line tools
+Available flavours (combine them with `,` i.e. `[processing,s3]`):
+- `all` installs everything: `pip install datatrove[all]`
+- `io` dependencies to read `warc/arc/wet` files and arrow/parquet formats: `pip install datatrove[io]`
+- `processing` dependencies for text extraction, filtering and tokenization: `pip install datatrove[processing]`
+- `s3` s3 support: `pip install datatrove[s3]`
+- `cli` for command line tools: `pip install datatrove[cli]`
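
The flavour list above says that extras can be combined with `,`. What follows is a minimal sketch of that combined form, assuming a POSIX-like shell. The `flavours` and `requirement` variable names are illustrative and not part of datatrove. The requirement is quoted because some shells (zsh, for example) may otherwise treat the square brackets as a glob pattern. The command is only echoed here, as a dry run.

```bash
# Illustrative variable names; the flavour names come from the list above.
flavours="processing,s3"

# Build the requirement specifier with the extras in square brackets.
requirement="datatrove[${flavours}]"

# Dry run: print the command instead of running it.
# Quoting "$requirement" keeps the shell from glob-expanding the brackets.
echo pip install "$requirement"
```

Dropping the leading `echo` would run the actual install.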

## Quickstart examples
You can check the following [examples](examples):
@@ -376,6 +375,7 @@ You could also inherit from [`BaseExtractor`](src/datatrove/pipeline/extractors/
## Contributing

```bash
+git clone git@github.com:huggingface/datatrove.git && cd datatrove
pip install -e ".[dev]"
```
