Exception with 6.0.7: "This should be implemented". #148

synergiator · 2020-05-04T19:09:33Z

Installed the packege 6.0.7 and tried with Python 3.6 following the README file:


from recipe_scrapers import scrape_me
scraper = scrape_me('http://allrecipes.com/Recipe/Apple-Cake-Iv/Detail.aspx')

scraper.title()
scraper.total_time()
scraper.yields()
scraper.ingredients()
scraper.instructions()
scraper.image()
scraper.links()

Error:


Traceback (most recent call last):
  File "sandbox.py", line 10, in <module>
    scraper.total_time()
  File "/home/user/.local/lib/python3.6/site-packages/recipe_scrapers/_abstract.py", line 44, in schema_org_priority_wrapper
    return value or decorated(self, *args, **kwargs)
  File "/home/user/.local/lib/python3.6/site-packages/recipe_scrapers/_abstract.py", line 85, in total_time
    raise NotImplementedError("This should be implemented.")
NotImplementedError: This should be implemented.

The text was updated successfully, but these errors were encountered:

jayaddison · 2020-05-06T22:32:57Z

Thanks for the report @synergiator; the allrecipe scraper should be picking up the necessary data from schema.org metadata in the page and returning it to us, but it looks like that isn't happening here.

Do you have any time to dig into why the code falls into this path for the example recipe?

synergiator · 2020-05-09T14:54:42Z

Hi there @jayaddison, I can now share some facts what I could find so far.

All scrapers base on the abstract class AbstractScraper. So if one of its methods is not overriden, like total_time, it raises an exception.
The error I've observed with the allrecipes scraper seems therefore so to say to be "correct" as it indeed does not implement an overriding method - in fact, it implements just host() method.
So I thought maybe with another scraper which does implement this method things should work better, for example the Epicurious scraper. But, it produces same error which I can't understand.

The above test was in a fresh Python virtual environment with just the package and its requirements installed. Then, I've tried same test from the path where all other tests are, in a corresponding recipe-scrapers development virtual environment. No error. This behavior gives me an impression there is some wrongly coded dynamic loader / dependency injection or similar (just a guess), but for time being I can't tell more and better pass the word to the authors of this code. :-)

jayaddison · 2020-05-10T12:37:07Z

Thanks for the analysis @synergiator - this leads me to believe we have a regression somewhere between 6.0.7 and the 5.x.x branch (latest release 5.18.0).

The largest change between the two is the introduction of schema.org handlers taking priority.

Essentially the code will look for a schema.org (or JSON-LD) metadata property on the target page, and only if it does not find it will it revert to using the def total_time(...) (or other extraction field) method.

It seems that the example Apple Cake recipe doesn't contain such metadata, and so the code tries to invoke the AllRecipes extraction methods -- but then finds that they don't exist either, so it finally raises the NotImplementedError.

I've no sense for how many AllRecipes pages contain vs. do not contain schema.org properties, unfortunately, so I don't know how many scrapes this would affect.

There are a few options here:

If you wish, you could try the 5.18.0 release and you should find that it works as advertised
We may fix the readme link and provide an example URL that works, 'hiding' the problem
We may restore the ability of the AllRecipes scraper (and others?) to handle pages-without-metadata -- this is the most 'complete' fix

It could be a while before we get to item 3. Let me know if this helps you out short-term :)

synergiator · 2020-05-10T13:20:00Z

Hi, there, @jayaddison, thanks for the explanation! In the meantime I've found out there is a quite large dataset from MIT with about 1M recipes so I don't need currently to scrape by myself.

Regarding the issue, it's strange enough same site uses different templates to render recipes.

hhursev · 2020-05-10T16:13:23Z

@synergiator version 6.1.0 is updated and will handle "gracefully" the exception (defaulting to 0 if there is no information for preparation time found). Thanks for pointing out the issue!

The dataset you've linked is awesome! You can take a look at this issue for another set of recipes w/o needing to scrape a thing ;)

synergiator · 2020-05-14T11:17:57Z

hi there @hhursev! thank you for the fix, and the link to the other data set!

hhursev mentioned this issue May 10, 2020

Option for default_exception_handling from the package #151

Merged

hhursev closed this as completed May 10, 2020

TomBursch mentioned this issue Jun 3, 2022

NotImplementedException scraper.description() #554

Closed

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Exception with 6.0.7: "This should be implemented". #148

Exception with 6.0.7: "This should be implemented". #148

synergiator commented May 4, 2020

jayaddison commented May 6, 2020

synergiator commented May 9, 2020

jayaddison commented May 10, 2020

synergiator commented May 10, 2020

hhursev commented May 10, 2020

synergiator commented May 14, 2020

Exception with 6.0.7: "This should be implemented". #148

Exception with 6.0.7: "This should be implemented". #148

Comments

synergiator commented May 4, 2020

jayaddison commented May 6, 2020

synergiator commented May 9, 2020

jayaddison commented May 10, 2020

synergiator commented May 10, 2020

hhursev commented May 10, 2020

synergiator commented May 14, 2020