Feature proposal: Slurp multiple pages in website and locally link them #59

Open
eri24816 opened this issue Sep 16, 2024 · 0 comments

I tried to use Obsidian to help me learn from this website, which has hundreds of densely interlinked pages. Given Obsidian's ability to visually represent interlinked documents, it helped me easily grasp the big picture and navigate between topics.

What I did was write a Python script (rough sketch below) to

  1. scrape some pages of the website with BFS from a random starting point
  2. convert HTML to Markdown
  3. convert all internal links (https://ccrma.stanford.edu/~jos/<path>.html in this case) into local md links [<title>](./<path>.md)
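
A rough sketch of such a script (not the actual code I used; the starting page, page limit, and output directory are placeholders, and it assumes the requests, beautifulsoup4, and markdownify packages; it also keeps the original anchor text rather than the page title):

```python
import re
from collections import deque
from pathlib import Path
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup
from markdownify import markdownify

SITE_ROOT = "https://ccrma.stanford.edu/~jos/"
START_PAGE = SITE_ROOT + "filters/filters.html"  # hypothetical starting point
MAX_PAGES = 200                                  # arbitrary page limit
OUT_DIR = Path("jos-notes")                      # placeholder output folder


def crawl(start_url: str) -> None:
    queue = deque([start_url])
    seen = {start_url}
    downloaded = 0
    while queue and downloaded < MAX_PAGES:
        url = queue.popleft()
        soup = BeautifulSoup(requests.get(url).text, "html.parser")

        # 1. BFS: absolutize every link and enqueue pages under SITE_ROOT
        for a in soup.find_all("a", href=True):
            target = urljoin(url, a["href"])
            a["href"] = target
            if target.startswith(SITE_ROOT) and target.endswith(".html") and target not in seen:
                seen.add(target)
                queue.append(target)

        # 2. convert the HTML (with absolutized links) to Markdown
        md_text = markdownify(str(soup))

        # 3. rewrite internal .html links into local .md links
        md_text = re.sub(
            re.escape(SITE_ROOT) + r"(?P<path>[^)\s]+)\.html",
            lambda m: "./" + m.group("path") + ".md",
            md_text,
        )

        out_file = OUT_DIR / (url[len(SITE_ROOT):].removesuffix(".html") + ".md")
        out_file.parent.mkdir(parents=True, exist_ok=True)
        out_file.write_text(md_text, encoding="utf-8")
        downloaded += 1


crawl(START_PAGE)
```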

The result:

*(screenshot of the resulting notes)*

One problem is that the website has too many pages to download all at once, so sometimes when I navigate to another page by clicking a link (or a node), I land on a blank file. It would be great to have an automation that downloads the missing page as soon as it detects the user opening it.

Then I found this plugin, and I'm wondering whether we could implement this mechanism in it as an optional feature. I'm new to Obsidian plugin development and don't have a clear idea about the implementation, but maybe we could do this:

  • command Slurp: Assign directory to website (local_dir, website_root):
    assign a local directory (local_dir) that serves as a local copy of a target website URL (website_root)
  • command Slurp: Create notes from url and its related pages (bfs_source_url, max_pages, max_distance):
    download a specific page bfs_source_url of the target website and its surrounding pages into local_dir; bfs_source_url must be under website_root
  • when the user clicks a not-yet-created file in local_dir, the plugin automatically downloads the corresponding page to fill in the local file
  • all links of the form website_root/<path>.html are translated into local_dir/<path>.md (see the sketch after this list)
  • all of the above only takes effect inside local_dir; outside of it, everything works as usual
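
For the link-translation rule, here is a minimal sketch of the mapping (translate_link is a hypothetical helper for illustration only, not an existing API of this plugin; the real implementation would presumably live in the plugin's own code):

```python
def translate_link(url: str, website_root: str, local_dir: str) -> str | None:
    """Map website_root/<path>.html to local_dir/<path>.md; return None for external links."""
    if not url.startswith(website_root) or not url.endswith(".html"):
        return None  # outside website_root: leave the link untouched
    path = url[len(website_root):-len(".html")]
    return f"{local_dir}/{path}.md"


# e.g. translate_link("https://ccrma.stanford.edu/~jos/filters/foo.html",
#                     "https://ccrma.stanford.edu/~jos/", "jos")
# -> "jos/filters/foo.md"
```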

I'm not sure whether this proposal fits the purpose of this project. Looking forward to hearing your thoughts!

@eri24816 eri24816 changed the title Feature proposal: Awareness of internal links in website Feature proposal: Slurp multiple pages in website and locally link them Sep 16, 2024