
How do I use this scrapper? #168

Closed
JoseGeorges8 opened this issue Jun 16, 2020 · 2 comments

Comments

@JoseGeorges8

Hi! I'm new to web scraping and Python. I want to create a simple project where people can look up recipes based on ingredients, and I think this scraper could help, but I'm not sure how to approach it.

I cloned the code locally and I've been trying to go through it and this is what I sort of understand:

  • AbstractScraper is the heart of the library. Each scraper implements this abstract class, overrides its methods, and uses the utils to retrieve the info from the recipe.

I noticed some scrapers, like allrecipes, only have a host method implemented. I guess my question is: are these scrapers incomplete, or is there something I'm not seeing?

And about the ingredients: my idea is to run this scraper over all the available keys (websites), scrape most of the recipes, and store the JSON in a database. Then I would create queries looking for the recipes that match the ingredients.

Is this a good use case?

@PatrickPierce
Contributor

PatrickPierce commented Jun 24, 2020

I'm still new to Python, but this is my understanding.

Information is pulled using a schema. The supported ones are "recipe" and "webpage" (line 5 of _schemaorg.py).

AbstractScraper will use decorators (_decorators.py) and attempt to find the information based on the schema template. If there is no template, it will use the scraper found in the scraper folder. This is why "allrecipes" only has the host defined: the website is designed with the schema in mind.
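A rough stdlib sketch of the schema idea described above (not recipe_scrapers' actual implementation; the sample HTML and function name are made up for illustration): many recipe sites embed a schema.org Recipe object as JSON-LD, so "scraping" such a site mostly means pulling that blob out of the page and reading its fields.

```python
# Toy illustration only: sites that publish schema.org metadata embed a
# Recipe object as JSON-LD, so scraping reduces to extracting that blob.
import json
import re

SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{"@type": "Recipe",
 "name": "Spinach and Feta Turkey Burgers",
 "recipeIngredient": ["spinach", "feta", "ground turkey"]}
</script>
</head><body>...</body></html>
"""

def extract_recipe_schema(html):
    """Pull the first JSON-LD block out of the page and parse it."""
    match = re.search(
        r'<script type="application/ld\+json">(.*?)</script>',
        html, re.DOTALL)
    if match is None:
        return None
    return json.loads(match.group(1))

recipe = extract_recipe_schema(SAMPLE_HTML)
print(recipe["name"])              # Spinach and Feta Turkey Burgers
print(recipe["recipeIngredient"])  # ['spinach', 'feta', 'ground turkey']
```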

The first two code blocks in README.rst describe how to use the scraper.

For your case, you would import the scraper, assign the recipe to a variable, and call the methods on it:

from recipe_scrapers import scrape_me

ar_spinach = scrape_me('https://www.allrecipes.com/recipe/158968/spinach-and-feta-turkey-burgers/')
bb_ginger = scrape_me('https://www.budgetbytes.com/sticky-ginger-soy-glazed-chicken/')

ar_spinach.title()
bb_ginger.title()

sticky_ginger_chicken_ingredients = bb_ginger.ingredients()

# save your ingredients in json for database
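Expanding on that last comment: the scraper methods return plain strings and lists, so a recipe serializes straight to JSON. A minimal sketch of that step, using hard-coded stand-in values (since the scrape_me calls above need network access) and an example filename, recipe.json:

```python
import json

# Stand-ins for what e.g. bb_ginger.title() / bb_ginger.ingredients()
# would return -- plain strings and lists, so json.dump works directly.
recipe = {
    "title": "Sticky Ginger Soy Glazed Chicken",
    "url": "https://www.budgetbytes.com/sticky-ginger-soy-glazed-chicken/",
    "ingredients": ["chicken thighs", "soy sauce", "fresh ginger"],
}

with open("recipe.json", "w") as f:
    json.dump(recipe, f, indent=2)

# Later, load it back (or bulk-insert into your database of choice):
with open("recipe.json") as f:
    restored = json.load(f)
```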

@hhursev
Owner

hhursev commented Jun 28, 2020

For allrecipes: it's hidden in plain sight, but it works like marmiton.py.

To save time on the initial scraping, I'd suggest going to the archive in this early-days issue, #9.

For the search based on ingredients, I suggest looking at tools such as Solr and Elasticsearch. And if you are using Django, you can further simplify your work with django-haystack.
Reading a bit about Solr or Elasticsearch will help you understand how most searches on the internet work.
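As a toy illustration of what Solr or Elasticsearch would do for you (greatly simplified; the recipe names and ingredients are invented for the example): build an inverted index from ingredient to recipes, then intersect the sets for the ingredients on hand.

```python
# Toy inverted index: map each ingredient to the set of recipes using it,
# then intersect those sets to answer "what can I cook with X and Y?".
recipes = {
    "turkey-burgers": {"spinach", "feta", "ground turkey"},
    "ginger-chicken": {"chicken thighs", "soy sauce", "fresh ginger"},
    "spinach-pie": {"spinach", "feta", "phyllo"},
}

index = {}
for name, ingredients in recipes.items():
    for ingredient in ingredients:
        index.setdefault(ingredient, set()).add(name)

def find_recipes(*have):
    """Recipes that use every listed ingredient."""
    sets = [index.get(i, set()) for i in have]
    return set.intersection(*sets) if sets else set()

print(sorted(find_recipes("spinach", "feta")))  # ['spinach-pie', 'turkey-burgers']
```

Search engines add ranking, fuzzy matching, and scale on top of this, which is why the suggestion above to read about them is worth following.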

I'm closing the issue but feel free to reopen if anything arises again.

@hhursev hhursev closed this as completed Jun 28, 2020