[Feature Request] Ability to slurp multiple pages using [] range format #51

Open
zanodor opened this issue Jul 27, 2024 · 6 comments
Labels
enhancement New feature or request


@zanodor

zanodor commented Jul 27, 2024

I've used this in the past with the browser extension DownThemAll.
There, the syntax for extracting PDF pages was:
https://adt.arcanum.com/check-access-save/MNYTESZ_Hun_1/?pg=[0:1149]

Now, you understand I don't want PDFs with this plugin. I'm just showing the syntax.

So syntax would be something like [0:9] or [0-9].

The plugin creator wouldn't need to know how many digits or items there are in the range (some sites use 1, 01, or even 001 as the first item, and sometimes the range you want is only 22 to 45); that's for the user to figure out.
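A minimal sketch of how the requested `[start:end]` / `[start-end]` expansion could work, written as a bash function. `expand_range` is a hypothetical name, not part of Slurp. It assumes a single range per URL and infers zero-padding from the start value: a leading zero (as in `01` or `001`) pads every number to that width, while `0` or `1` means no padding.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Slurp): expand one [START:END] or
# [START-END] range in a URL template into concrete URLs, one per line.
expand_range() {
  local template="$1"
  # prefix, start, end, suffix; "[-:]" accepts either separator
  local re='^(.*)\[([0-9]+)[-:]([0-9]+)](.*)$'
  if ! [[ "$template" =~ $re ]]; then
    # No range found: print the template unchanged
    printf '%s\n' "$template"
    return 0
  fi
  local prefix="${BASH_REMATCH[1]}" start="${BASH_REMATCH[2]}"
  local end="${BASH_REMATCH[3]}" suffix="${BASH_REMATCH[4]}"
  local width=1
  # A leading zero followed by more digits means fixed-width numbers
  if [[ "$start" == 0?* ]]; then
    width=${#start}
  fi
  local i
  # 10# forces base 10 so values like "08" aren't read as octal
  for (( i = 10#$start; i <= 10#$end; i++ )); do
    printf "%s%0${width}d%s\n" "$prefix" "$i" "$suffix"
  done
}
```

For example, `expand_range 'https://example.com/item.php?id=[01:03]'` should print the `id=01`, `id=02`, and `id=03` URLs (example.com is only a placeholder). Leaving the range width to the user, as suggested above, keeps the parser this simple.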

Great plugin,

Cheers

Z.

@inhumantsar inhumantsar added the enhancement New feature or request label Jul 28, 2024
@inhumantsar
Owner

could you provide a few examples of sites you would use this on?

@zanodor
Author

zanodor commented Jul 28, 2024

@inhumantsar
Owner

Hmm, Slurp seems to have grabbed it, but loading the markdown crashes Obsidian for me on Android.

Anyway, I wanted to try it because Slurp/Readability is built to parse news articles, blog posts, and things like that, not tabular data.

If you're mostly thinking of using this feature for that kind of page, I suspect it will disappoint you.

@zanodor
Author

zanodor commented Jul 28, 2024

Worked perfectly for me, buddy, but of course the volume was too large for Obsidian.

But going thru the numbers one by one is tedious.

Any way to hook your plugin to a Templater script, maybe?

@inhumantsar
Owner

inhumantsar commented Jul 29, 2024

> Worked perfectly for me, buddy, but of course the volume was too large for Obsidian.

yeah when i was back at my PC, i noticed that the mobile client was able to parse, save, and even sync the file. kind of surprised that it crashed the app though, it's only ~2MB. big for a plaintext file sure, but not excessive or complex. 🤷

> Any way to hook your plugin to a Templater script, maybe?

i'm not sure about templater scripting. haven't really used it.

i do see the use case for this. there are just going to be some landmines to avoid. e.g. if you assume it takes an average of 3s to download, parse, and save a page like that, then 200 of those pages will take ~10 minutes (200 x 3s = 600s). i'm not sure how obsidian or the various OSs it's running on will react. also, hitting a server with that many requests in a row could get slurp's user agent or the user's IP (or both) banned for bot scraping.
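A sketch of two hypothetical bash helpers (neither is part of Slurp): `estimate_minutes` does the pages x seconds-per-page arithmetic from the comment above, and `jittered_delay` picks a slightly randomized pause between requests. Spacing requests irregularly may look less like a bot, but that is an assumption; there's no guarantee a given site won't still block it.

```bash
#!/usr/bin/env bash
# Hypothetical pacing helpers for a batch of slurps (not part of Slurp).

# Rough total run time in minutes for COUNT pages at SECONDS each,
# rounded up to the next whole minute.
estimate_minutes() {
  local count="$1" seconds="$2"
  echo $(( (count * seconds + 59) / 60 ))
}

# A delay of BASE seconds plus 0..JITTER extra seconds, so requests
# don't arrive at a perfectly regular interval.
jittered_delay() {
  local base="$1" jitter="$2"
  echo $(( base + RANDOM % (jitter + 1) ))
}
```

For instance, `sleep "$(jittered_delay 3 4)"` waits between 3 and 7 seconds. At the 7-second worst case, `estimate_minutes 200 7` gives 24 minutes for 200 pages, versus 10 minutes at a flat 3 seconds.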

i'll have a look into it at some point and see what's possible. in the short term, i'd probably recommend running a simple script. you should be able to do something like this:

```bash
#!/bin/bash
URL="https://...your url here.../"
START=1
END=200

for i in $(seq "$START" "$END"); do
    # use slurp's obsidian:// URL integration
    # on macOS *i think* you can use "open" instead of "xdg-open"
    # quoted so the shell doesn't treat the "?" as a glob; if ${URL} has
    # its own ?/& query string, percent-encode it first
    xdg-open "obsidian://slurp?url=${URL}${i}"
    # give slurp time to do the thing
    sleep 3
done
```

save that somewhere with whatever name you like, e.g. multislurp.sh, then open up a terminal window and run:

```bash
cd /path/to/the/dir
chmod +x multislurp.sh
./multislurp.sh
```
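Since the script hardcodes `xdg-open`, here's a small sketch of choosing the opener from `uname -s` instead: `open` on macOS (as guessed in the script's comment) and `xdg-open` on Linux. `pick_opener` is a hypothetical helper, and other platforms are deliberately left unhandled.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the command that hands a URL to the OS's
# default handler ("open" on macOS, "xdg-open" on Linux desktops).
pick_opener() {
  case "$(uname -s)" in
    Darwin) echo "open" ;;
    Linux)  echo "xdg-open" ;;
    *)
      # Anything else isn't covered by this sketch
      echo "unsupported platform: $(uname -s)" >&2
      return 1
      ;;
  esac
}
```

In the loop, `OPENER="$(pick_opener)" || exit 1` before the `for`, then `"$OPENER"` in place of `xdg-open`, should let the same script run on both systems.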

the same could probably be done with PowerShell on Windows as well, but i'm not really sure how. i do know that running the bash script in WSL on Windows will not work, though.

@zanodor
Author

zanodor commented Jul 29, 2024

Good (and I daresay "un-inhuman") of you to take the time to provide this information.
After a few tries, I got it to work with user input (on Linux). I put this up for posterity:

```bash
#!/bin/bash

# Prompt user for start and end values using zenity
START=$(zenity --entry --title="Input Start Value" --text="Enter the start value:")
END=$(zenity --entry --title="Input End Value" --text="Enter the end value:")

# Check that input is valid (non-empty and numeric)
if ! [[ "$START" =~ ^[0-9]+$ ]] || ! [[ "$END" =~ ^[0-9]+$ ]]; then
    zenity --error --text="Invalid input! Please enter numeric values."
    exit 1
fi
# seq prints nothing if START > END, so fail loudly instead
if (( 10#$START > 10#$END )); then
    zenity --error --text="Start value must not be greater than end value."
    exit 1
fi

URL="https://lpan.eva.mpg.de/austronesian/word.php?v="
SORT="&sort=language"
SLEEP_TIME=5

for i in $(seq "$START" "$END"); do
    FULL_URL="${URL}${i}${SORT}"
    # Percent-encode the whole target URL so its ? and & aren't read
    # as parameters of the obsidian:// URI itself
    ENCODED_URL=$(printf '%s' "$FULL_URL" | jq -s -R -r @uri)
    echo "Processing URL: $FULL_URL"
    echo "Encoded URL: $ENCODED_URL"
    xdg-open "obsidian://slurp?url=${ENCODED_URL}"
    sleep "$SLEEP_TIME"
done
```
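If `jq` isn't installed, the `jq -s -R -r @uri` step could be swapped for a pure-bash percent-encoder. This `urlencode` function is a hypothetical sketch, not something Slurp or jq ships. It encodes every byte outside the RFC 3986 unreserved set (`A-Z a-z 0-9 - _ . ~`), which should match jq's output for URLs like the one in this script.

```bash
#!/usr/bin/env bash
# Hypothetical jq-free percent-encoder: keep RFC 3986 unreserved
# characters, encode every other byte as %XX (uppercase hex).
urlencode() {
  local LC_ALL=C   # iterate over bytes, not multibyte characters
  local s="$1" out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      [A-Za-z0-9._~-]) out+="$c" ;;
      # "'$c" makes printf use the character's numeric code
      *) printf -v c '%%%02X' "'$c"; out+="$c" ;;
    esac
  done
  printf '%s\n' "$out"
}
```

Inside the loop, `ENCODED_URL="$(urlencode "$FULL_URL")"` would replace the jq line. The trade-off is speed: a per-character bash loop is slower than jq, but that hardly matters next to a multi-second sleep per page.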

This method will surely suffice for me, so I'd say only implement something if the feature request racks up a dozen likes or so.

Cheers mate

All the best,
Z.
