SPIX

An automatic broken URL finder.

Main goal

I wrote this to help me replace all the broken URLs in a database I am currently working on. These URLs are, for example, YouTube URLs or image URLs used in my project.
SPIX helps me replace them and keep the database coherent with its data.

Feel free to use it, or edit it however you want, for your own projects.

Usage

To install it, clone this repository or download it.

Input file

The input file is a txt file containing all the URLs to test. You can also add extra fields, such as a database name, a table name, or a table ID; SPIX's configuration refers to these extra fields as rows.

url table id
https://google.fr t_suggetions_sug 12
https://theoncebook.wordpress.com t_favorites_fav 100
https://not-an-url.com/hello.gif t_covers_cov 5
https://inc.api/martin.png t_covers_cov 10

Let's suppose our txt file is organized this way and is named test.txt. The first column must be the URL; the other fields follow it. The txt file should look like the one below:

https://google.fr t_suggetions_sug 2
https://theoncebook.wordpress.com t_favorites_fav 100
https://not-an-url.com/hello.gif t_covers_cov 5
https://inc.api/martin.png t_covers_cov 10
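A line of this file can be parsed by splitting on whitespace, with the first token as the URL and the remaining tokens paired with the configured extra fields. A minimal sketch (the `parse_line` helper and its field names are illustrative, not SPIX's actual internals):

```python
def parse_line(line, extra_cols=("table", "id")):
    """Split a whitespace-separated input line into a url plus extra fields."""
    parts = line.split()
    record = {"url": parts[0]}
    # Pair remaining tokens with the configured extra field names.
    record.update(zip(extra_cols, parts[1:]))
    return record

row = parse_line("https://not-an-url.com/hello.gif t_covers_cov 5")
# row == {"url": "https://not-an-url.com/hello.gif", "table": "t_covers_cov", "id": "5"}
```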

Getting the broken urls

To get the broken URLs, run the main.py file with its arguments. If no arguments are provided, the default configuration is used, and the example above won't work, because by default SPIX expects each line to contain only a URL. To make this example work, run the command below:

python main.py --rows "table id" --input-address test.txt

Here we add two extra rows to the SPIX configuration, and the result should be:

SPIX Broken url finder
STEP 1/3: Getting the file content
STEP 2/3: Filtering broken urls
         waiting for threads to end...
STEP 3/3: Dumping results into the output file: result
2 brokens url found
Done. Check your output file to see the broken urls and eventually replace them
:) :)
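Under the hood, "filtering broken urls" presumably means attempting an HTTP request for each URL and flagging the ones that fail. A minimal sketch of such a check using only Python's standard library (an illustration of the idea, not SPIX's actual code):

```python
from urllib.request import Request, urlopen
from urllib.error import HTTPError, URLError

def is_broken(url, timeout=5):
    """Return True if the URL cannot be fetched successfully."""
    try:
        # A HEAD request avoids downloading the response body.
        req = Request(url, method="HEAD")
        with urlopen(req, timeout=timeout) as resp:
            return resp.status >= 400
    except (HTTPError, URLError, ValueError):
        # Unreachable host, HTTP error, or malformed URL: count it as broken.
        return True
```

Running such a check in several threads, as SPIX's "waiting for threads to end..." message suggests, keeps slow or unreachable hosts from blocking the whole run.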

I hope this helped you understand it better. 😍😍
