NCTU-SYNC/sync-crawler

SYNC Crawler

Description

This is a crawler for the SYNC project. It crawls news articles from different news websites and stores them in a database.

Installation

This project uses Poetry to manage dependencies. Please install Poetry first.
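As a minimal sketch, the following checks whether Poetry is already on your `PATH` before you run any install command. The `have_cmd` helper is a hypothetical name used only for illustration and is not part of this repository. The `pipx install poetry` hint is one of the installation methods described in the Poetry documentation, and it assumes `pipx` is available on your system.

```shell
# Hypothetical helper (not part of this repository):
# succeeds if the given command can be found on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd poetry; then
    # Poetry is installed; print its version.
    poetry --version
else
    # Assumption: pipx is available; other install methods exist as well.
    echo "Poetry not found; install it first, for example: pipx install poetry"
fi
```

If the check prints a version, you can proceed with one of the `poetry install` variants in this section.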

Execution Only

# Crawler and MongoDB client
poetry install

# Or

# Crawler, MongoDB client, and ChromaDB client
# To store data in ChromaDB, you might need GPUs to run the embedding models
poetry install --with chroma

Development

# Crawler and MongoDB client
poetry install --with dev
poetry run pre-commit install

# Or

# Crawler, MongoDB client, and ChromaDB client
# To store data in ChromaDB, you might need GPUs to run the embedding models
poetry install --with dev,chroma
poetry run pre-commit install
