
How do I use this scrapper? #168

Closed
JoseGeorges8 opened this issue Jun 16, 2020 · 2 comments

Comments

@JoseGeorges8

Hi! I'm new to web scraping and Python. I want to create a simple project where people can look up recipes based on ingredients, and I think this scraper could help, but I'm not sure how to approach it.

I cloned the code locally and I've been trying to go through it and this is what I sort of understand:

  • AbstractScraper is the heart of the library. Each scraper implements this abstract class, overrides its methods, and uses the utils to retrieve the info from the recipe.

I noticed some scrapers, like allrecipes, only have a host method implemented. I guess my question is: are these scrapers incomplete, or is there something I'm not seeing?

And about the ingredients: my idea is to run this scraper over all the available keys (websites), scrape most of the recipes, and store the JSON in a database. Then I would create queries looking for the recipes that match the ingredients.

Is this a good use case?

@PatrickPierce
Contributor

PatrickPierce commented Jun 24, 2020

I'm still new to Python, but this is my understanding.

Information is pulled using a schema. The supported ones are "recipe" and "webpage" (line 5 of _schemaorg.py).

AbstractScraper will use decorators (_decorators.py) and attempt to find the information based on the schema template. If there is no template, it will use the scraper found in the scraper folder. This is why "allrecipes" only has the host defined: the website is designed with the schema in mind.
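A rough stdlib sketch of the schema idea described above (not recipe_scrapers' actual implementation; the sample HTML and function name are made up for illustration): many recipe sites embed a schema.org Recipe object as JSON-LD, so "scraping" such a site mostly means pulling that blob out of the page and reading its fields.

```python
# Toy illustration only: sites that publish schema.org metadata embed a
# Recipe object as JSON-LD, so scraping reduces to extracting that blob.
import json
import re

SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{"@type": "Recipe",
 "name": "Spinach and Feta Turkey Burgers",
 "recipeIngredient": ["spinach", "feta", "ground turkey"]}
</script>
</head><body>...</body></html>
"""

def extract_recipe_schema(html):
    """Pull the first JSON-LD block out of the page and parse it."""
    match = re.search(
        r'<script type="application/ld\+json">(.*?)</script>',
        html, re.DOTALL)
    if match is None:
        return None
    return json.loads(match.group(1))

recipe = extract_recipe_schema(SAMPLE_HTML)
print(recipe["name"])              # Spinach and Feta Turkey Burgers
print(recipe["recipeIngredient"])  # ['spinach', 'feta', 'ground turkey']
```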

The first two code blocks in README.rst describe how to use the scraper.

For your case, you would import the scraper, assign the recipe to a variable, and call the methods on it:

from recipe_scrapers import scrape_me

ar_spinach = scrape_me('https://www.allrecipes.com/recipe/158968/spinach-and-feta-turkey-burgers/')
bb_ginger = scrape_me('https://www.budgetbytes.com/sticky-ginger-soy-glazed-chicken/')

ar_spinach.title()
bb_ginger.title()

sticky_ginger_chicken_ingredients = bb_ginger.ingredients()

# save your ingredients in json for database
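Expanding on that last comment: the scraper methods return plain strings and lists, so a recipe serializes straight to JSON. A minimal sketch of that step, using hard-coded stand-in values (since the scrape_me calls above need network access) and an example filename, recipe.json:

```python
import json

# Stand-ins for what e.g. bb_ginger.title() / bb_ginger.ingredients()
# would return -- plain strings and lists, so json.dump works directly.
recipe = {
    "title": "Sticky Ginger Soy Glazed Chicken",
    "url": "https://www.budgetbytes.com/sticky-ginger-soy-glazed-chicken/",
    "ingredients": ["chicken thighs", "soy sauce", "fresh ginger"],
}

with open("recipe.json", "w") as f:
    json.dump(recipe, f, indent=2)

# Later, load it back (or bulk-insert into your database of choice):
with open("recipe.json") as f:
    restored = json.load(f)
```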

@hhursev
Owner

hhursev commented Jun 28, 2020

For allrecipes: it's hidden in plain sight, but it works like marmiton.py.

To save time on the initial scraping, I'd suggest going to the archive in this early-days issue, #9.

For the search based on ingredients, I suggest looking at tools such as Solr and Elasticsearch. And if you are using Django, you can further simplify your work with django-haystack.
Reading a bit about Solr or Elasticsearch will help you understand how most searches on the internet work.
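As a toy illustration of what Solr or Elasticsearch would do for you (greatly simplified; the recipe names and ingredients are invented for the example): build an inverted index from ingredient to recipes, then intersect the sets for the ingredients on hand.

```python
# Toy inverted index: map each ingredient to the set of recipes using it,
# then intersect those sets to answer "what can I cook with X and Y?".
recipes = {
    "turkey-burgers": {"spinach", "feta", "ground turkey"},
    "ginger-chicken": {"chicken thighs", "soy sauce", "fresh ginger"},
    "spinach-pie": {"spinach", "feta", "phyllo"},
}

index = {}
for name, ingredients in recipes.items():
    for ingredient in ingredients:
        index.setdefault(ingredient, set()).add(name)

def find_recipes(*have):
    """Recipes that use every listed ingredient."""
    sets = [index.get(i, set()) for i in have]
    return set.intersection(*sets) if sets else set()

print(sorted(find_recipes("spinach", "feta")))  # ['spinach-pie', 'turkey-burgers']
```

Search engines add ranking, fuzzy matching, and scale on top of this, which is why the suggestion above to read about them is worth following.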

I'm closing the issue but feel free to reopen if anything arises again.

@hhursev hhursev closed this as completed Jun 28, 2020