Make demo notebook runnable in Colab #630

Merged Nov 10, 2023 (3 commits)

@leemthompo (Contributor) commented Nov 8, 2023

Summary:

  • Upload dataset directly in notebook
  • Create index with mappings
  • Ignore docs that fail to index
  • Group imports and installs

Related work:

This is a simpler approach to #604

Preview in Colab

Visual diff

(Just the opening section — no changes to the actual examples)

@leemthompo added the documentation label Nov 8, 2023
@leemthompo marked this pull request as ready for review November 9, 2023 08:43
@pquentin (Member) left a comment

Thanks! LGTM.

Great work here on understanding what was needed to run on Colab and not doing more than necessary while still producing a great result.

I'm seeing ids like lswztIsBt5-t_OatRvrU instead of monotonically increasing ids, and I'd like to know why, but it's not blocking IMO.

@leemthompo (Contributor, Author) commented Nov 9, 2023

Should be good now, using:

def generate_actions(data):
    for idx, entry in enumerate(data, start=0):  # Use monotonically increasing IDs starting from 0
        yield {
            "_index": FLIGHTS_INDEX_NAME,
            "_id": idx,  # Use the current index as the document ID
            "_source": entry
        }
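
As a minimal, self-contained sketch of what this generator yields: the value of FLIGHTS_INDEX_NAME and the sample records below are assumptions for illustration only and are not taken from the notebook.

```python
from typing import Any, Dict, Iterable, Iterator

# Assumed value for illustration; the notebook defines its own constant.
FLIGHTS_INDEX_NAME = "flights"


def generate_actions(data: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Yield one bulk action per record, using the record's position as its _id."""
    for idx, entry in enumerate(data):
        yield {
            "_index": FLIGHTS_INDEX_NAME,
            "_id": idx,  # position in the input, starting at 0
            "_source": entry,
        }


# Hypothetical records standing in for the flights dataset.
sample = [
    {"Carrier": "Carrier A", "Cancelled": False},
    {"Carrier": "Carrier B", "Cancelled": True},
    {"Carrier": "Carrier C", "Cancelled": False},
]

actions = list(generate_actions(sample))
ids = [action["_id"] for action in actions]
print(ids)  # [0, 1, 2]
```

Each yielded dictionary carries an explicit `_id`, so documents indexed from these actions keep the input order as their IDs instead of receiving auto-generated ones.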

@pquentin (Member) left a comment

Thanks! Much better. It's indeed how the tests do it (well, they use pandas' iterrows, but that's the same).

@pquentin merged commit 508de98 into elastic:main Nov 10, 2023
4 checks passed
@leemthompo deleted the demo-notebook branch November 13, 2023 09:12