
Gramhagen/wikidata #902

Merged: 3 commits merged into staging from gramhagen/wikidata on Sep 6, 2019
Conversation

@gramhagen (Collaborator)

Description

Cleanup of the wikidata notebook and utils. This speeds up notebook execution a bit (the first data pull went from 8 s to 5 s), and the longer MovieLens data pull should be faster too (I also clipped it to 50 results by default).

@almudenasanz it would be great to get your feedback here. I hid some of the functionality to make the code easier to reuse, but if you think it's important to surface the functions that get the entities, links, and descriptions, we can add that back into the notebook.

Related Issues

#880 — it's possible that this might help (mainly due to session caching?). I did limit some of the results in the normal case, but that shouldn't affect the integration test.
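
For reference, a minimal sketch of the session-reuse idea that likely accounts for the speedup (not the repo's exact utils; the helper name is illustrative):

```python
import requests

SPARQL_URL = "https://query.wikidata.org/sparql"

# One Session shared by every query: the underlying TCP/TLS connection is
# kept alive and reused instead of being renegotiated per request.
session = requests.Session()

def query_wikidata(sparql):
    """Run a SPARQL query against Wikidata and return the JSON bindings."""
    response = session.get(SPARQL_URL, params={"query": sparql, "format": "json"})
    response.raise_for_status()
    return response.json()["results"]["bindings"]
```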

Checklist:

  • I have followed the contribution guidelines and code style for this project.
  • I have added tests covering my contributions.
  • I have updated the documentation accordingly.

@review-notebook-app

Check out this pull request on ReviewNB: https://app.reviewnb.com/microsoft/recommenders/pull/902

You'll be able to see notebook diffs and discuss changes. Powered by ReviewNB.

@gramhagen gramhagen changed the base branch from master to staging August 22, 2019 03:59
@almudenasanz (Collaborator)


Thanks a lot @gramhagen! Very nice to see how you handled the sessions.

Maybe we could surface the steps for getting the Wikidata ID and the links as separate examples, since they query different APIs and some people may want to use only one of them. E.g., get the Wikidata ID from a text query, or, from a Wikidata ID, get the related entities or their descriptions.

@gramhagen (Collaborator, Author)

Makes sense. We can show the steps in the first example and then use the helper function later. I'll update the notebook.
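
For concreteness, a hedged sketch of what the two surfaced steps could look like (hypothetical helper names; the actual notebook utils may differ):

```python
import requests

API_URL = "https://www.wikidata.org/w/api.php"
session = requests.Session()

def find_wikidata_id(name):
    """Step 1: free-text query -> Wikidata entity ID (e.g. 'Q47703')."""
    params = {
        "action": "wbsearchentities",
        "search": name,
        "language": "en",
        "format": "json",
        "limit": 1,
    }
    results = session.get(API_URL, params=params).json()["search"]
    return results[0]["id"] if results else None

def get_description(entity_id):
    """Step 2: Wikidata ID -> English description of the entity."""
    params = {
        "action": "wbgetentities",
        "ids": entity_id,
        "props": "descriptions",
        "languages": "en",
        "format": "json",
    }
    entity = session.get(API_URL, params=params).json()["entities"][entity_id]
    return entity["descriptions"].get("en", {}).get("value")

print(get_description(find_wikidata_id("The Godfather")))
```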

@miguelgfierro (Collaborator) left a review:

LGTM

@miguelgfierro (Collaborator)

weird error in pyspark:


tests/unit/test_notebooks_pyspark.py ....F.                              [100%]

=================================== FAILURES ===================================
______________________________ test_spark_tuning _______________________________

notebooks = {'als_deep_dive': '/data/home/recocat/cicd/4/s/notebooks/02_model/als_deep_dive.ipynb', 'als_pyspark': '/data/home/rec...baseline_deep_dive.ipynb', 'data_split': '/data/home/recocat/cicd/4/s/notebooks/01_prepare_data/data_split.ipynb', ...}

    @pytest.mark.notebooks
    @pytest.mark.spark
    def test_spark_tuning(notebooks):
        notebook_path = notebooks["spark_tuning"]
        pm.execute_notebook(
            notebook_path,
            OUTPUT_NOTEBOOK,
            kernel_name=KERNEL_NAME,
            parameters=dict(
                NUMBER_CORES="*",
                NUMBER_ITERATIONS=3,
                SUBSET_RATIO=0.5,
                RANK=[5, 5],
>               REG=[0.1, 0.01]
            )
        )

tests/unit/test_notebooks_pyspark.py:51: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/anaconda/envs/reco_pyspark/lib/python3.6/site-packages/papermill/execute.py:94: in execute_notebook
    raise_for_execution_errors(nb, output_path)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

nb = {'cells': [{'cell_type': 'code', 'metadata': {'inputHidden': True, 'hide_input': True}, 'execution_count': None, 'sour...nd_time': '2019-09-05T13:15:25.886974', 'duration': 21.609476, 'exception': True}}, 'nbformat': 4, 'nbformat_minor': 2}
output_path = 'output.ipynb'

    def raise_for_execution_errors(nb, output_path):
        """Assigned parameters into the appropriate place in the input notebook
    
        Parameters
        ----------
        nb : NotebookNode
           Executable notebook object
        output_path : str
           Path to write executed notebook
        """
        error = None
        for cell in nb.cells:
            if cell.get("outputs") is None:
                continue
    
            for output in cell.outputs:
                if output.output_type == "error":
                    error = PapermillExecutionError(
                        exec_count=cell.execution_count,
                        source=cell.source,
                        ename=output.ename,
                        evalue=output.evalue,
                        traceback=output.traceback,
                    )
                    break
    
        if error:
            # Write notebook back out with the Error Message at the top of the Notebook.
            error_msg = ERROR_MESSAGE_TEMPLATE % str(error.exec_count)
            error_msg_cell = nbformat.v4.new_code_cell(
                source="%%html\n" + error_msg,
                outputs=[
                    nbformat.v4.new_output(output_type="display_data", data={"text/html": error_msg})
                ],
                metadata={"inputHidden": True, "hide_input": True},
            )
            nb.cells = [error_msg_cell] + nb.cells
            write_ipynb(nb, output_path)
>           raise error
E           papermill.exceptions.PapermillExecutionError: 
E           ---------------------------------------------------------------------------
E           Exception encountered at "In [11]":
E           ---------------------------------------------------------------------------
E           Py4JJavaError                             Traceback (most recent call last)
E           /anaconda/envs/reco_pyspark/lib/python3.6/site-packages/pyspark/sql/utils.py in deco(*a, **kw)
E                62         try:
E           ---> 63             return f(*a, **kw)
E                64         except py4j.protocol.Py4JJavaError as e:
E           
E           /anaconda/envs/reco_pyspark/lib/python3.6/site-packages/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
E               327                     "An error occurred while calling {0}{1}{2}.\n".
E           --> 328                     format(target_id, ".", name), value)
E               329             else:
E           
E           <class 'str'>: (<class 'py4j.protocol.Py4JNetworkError'>, Py4JNetworkError('An error occurred while trying to connect to the Java server (127.0.0.1:35421)',))
E           

Maybe a problem with the Spark instantiation?
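
One way to test the instantiation hypothesis (a sketch, not the repo's test code): a Py4JNetworkError typically means the JVM behind the Spark session died or never started, so building a bare session with the same core setting can isolate the problem.

```python
from pyspark.sql import SparkSession

# Build a bare local session with the same core setting the test passes
# (NUMBER_CORES="*"); if this also fails, the problem is the environment,
# not the tuning notebook.
spark = (
    SparkSession.builder
    .master("local[*]")
    .appName("instantiation-check")
    .config("spark.driver.memory", "4g")  # assumed value; an OOM-killed driver surfaces as a Py4J connection error
    .getOrCreate()
)
print(spark.range(10).count())  # trivial job to confirm the JVM gateway is alive
spark.stop()
```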

@miguelgfierro miguelgfierro merged commit 8556a7f into staging Sep 6, 2019
@miguelgfierro miguelgfierro deleted the gramhagen/wikidata branch September 6, 2019 11:33
yueguoguo pushed a commit that referenced this pull request Sep 9, 2019