A multithreaded 🕸️ web crawler that recursively crawls a website and creates a 🔽 markdown file for each page, designed for LLM RAG
Updated Aug 13, 2024 - Python
This Python-based repository hosts a service that scrapes web articles and converts them to Markdown. It extracts an article's main content, such as headlines, key paragraphs, and associated images, and then transforms this content into well-structured…
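As a rough illustration of the extract-then-convert idea described above, here is a minimal sketch using only Python's standard-library `html.parser`. It is not taken from either repository: the class `MarkdownSketch` and the function `html_to_markdown` are hypothetical names, and the sketch assumes only `h1` to `h3`, `p`, and `img` elements matter.

```python
from html.parser import HTMLParser


class MarkdownSketch(HTMLParser):
    """Hypothetical converter: handles h1-h3, p, and img only."""

    HEADINGS = {"h1": "#", "h2": "##", "h3": "###"}

    def __init__(self):
        super().__init__()
        self.blocks = []    # finished Markdown blocks, in document order
        self.prefix = None  # "# " style prefix, "" for a paragraph, None when idle
        self.text = []      # text fragments collected for the open block

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self.prefix, self.text = self.HEADINGS[tag] + " ", []
        elif tag == "p":
            self.prefix, self.text = "", []
        elif tag == "img":
            a = dict(attrs)
            # Attribute values may be None, so fall back to empty strings.
            self.blocks.append(f"![{a.get('alt') or ''}]({a.get('src') or ''})")

    def handle_data(self, data):
        # Keep text only while a heading or paragraph is open.
        if self.prefix is not None:
            self.text.append(data)

    def handle_endtag(self, tag):
        if self.prefix is not None and (tag in self.HEADINGS or tag == "p"):
            # Collapse runs of whitespace into single spaces.
            content = " ".join("".join(self.text).split())
            if content:
                self.blocks.append(self.prefix + content)
            self.prefix = None


def html_to_markdown(html):
    """Return Markdown text for the subset of HTML handled above."""
    parser = MarkdownSketch()
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.blocks) + "\n"
```

Calling `html_to_markdown('<h1>Title</h1><p>Some text.</p>')` yields a heading block followed by a paragraph block. Inline formatting, links, lists, and main-content detection are left out of this sketch; a real service would likely rely on a dedicated extraction or conversion library.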