Python Script to extract and sanitize vegan ingredients from the OpenFoodFacts database, providing a clean and useful dataset for vegan product analysis and application development.

frontendnetwork/Vegan-Ingredients-Extractor

Vegan Ingredients Extractor

Vegan Ingredients Extractor is a Python script that parses the OpenFoodFacts database dump in JSONL format (Direct Download, ~6 GB gzipped, ~40 GB uncompressed) to identify vegan products and extract their ingredients.

It is mainly designed to improve the Vegan Ingredients Checker on the Veganify API. It filters large datasets efficiently, using label and ingredient data to curate a list of vegan-friendly ingredients. The script can also be adapted to filter for other criteria.

How to Use

Prerequisites:

Setup:

Clone this repository to your local machine and navigate to the repository's directory in your terminal.

$ cd Vegan-Ingredients-Extractor

Running the Script:

The script can be executed with or without a command-line argument for the file path. To run it with a file path argument:

$ python3 extract_vegan_ingredients.py --path=/path/to/your/openfoodfacts-products.jsonl
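For readers adapting the script, an optional `--path` argument like the one above can be handled with `argparse`. This is a minimal sketch, not the script's actual code: the `parse_args` helper and the `DEFAULT_PATH` fallback are assumptions made for illustration.

```python
import argparse

# Hypothetical fallback used when --path is omitted; the real script's
# default location is not documented in this README.
DEFAULT_PATH = "openfoodfacts-products.jsonl"


def parse_args(argv=None):
    """Parse the command line; argv=None means 'use sys.argv'."""
    parser = argparse.ArgumentParser(
        description="Extract vegan ingredients from an OpenFoodFacts JSONL dump."
    )
    parser.add_argument(
        "--path",
        default=DEFAULT_PATH,
        help="Path to the uncompressed .jsonl dump.",
    )
    return parser.parse_args(argv)


# Example: the same argument form as the command shown above.
args = parse_args(["--path=/path/to/your/openfoodfacts-products.jsonl"])
print(f"Reading products from {args.path}")
```

Passing an explicit list to `parse_args` keeps the helper easy to test. Calling it with no arguments reads the real command line.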

Output:

The script will process the specified .jsonl file to identify vegan products and extract their ingredients. It removes duplicates and cleans up the data to ensure a refined list.
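The processing described above could look roughly like the following sketch. It is not the script's actual implementation. It assumes each product carries a `labels_tags` list in which `en:vegan` marks vegan products, and an `ingredients` list whose entries have a `text` field. The helper names (`iter_products`, `is_vegan`, `clean`, `collect_vegan_ingredients`) and the specific cleanup rules are hypothetical.

```python
import json


def iter_products(path):
    """Yield one product dict per line, so the ~40 GB file is never fully loaded."""
    with open(path, "r", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            try:
                yield json.loads(line)
            except json.JSONDecodeError:
                # Skip malformed lines rather than aborting the whole run.
                continue


def is_vegan(product):
    """Assumption: a product counts as vegan if it carries the 'en:vegan' label tag."""
    return "en:vegan" in (product.get("labels_tags") or [])


def clean(text):
    """Illustrative cleanup: trim whitespace and stray markup characters, lowercase."""
    return text.strip().strip("_*.,;:").strip().lower()


def collect_vegan_ingredients(products):
    """Return a sorted list of unique, cleaned ingredient names from vegan products."""
    unique = set()
    for product in products:
        if not is_vegan(product):
            continue
        for ingredient in product.get("ingredients") or []:
            text = ingredient.get("text") if isinstance(ingredient, dict) else None
            if isinstance(text, str):
                name = clean(text)
                if name:
                    unique.add(name)
    return sorted(unique)
```

A typical call would be `collect_vegan_ingredients(iter_products(path))`. Reading line by line and collecting names in a set keeps memory use proportional to the number of distinct ingredients rather than to the size of the dump.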

Upon completion, the script outputs a file named vegan_ingredients.json containing all unique vegan ingredients extracted from the dataset. To change the output file's name or path, edit line 81:

save_ingredients(ingredients, 'vegan_ingredients.json')
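For orientation, a function with this call signature might be written as in the sketch below. The body is an assumption, not the repository's code: it supposes the output is a sorted JSON array of unique strings.

```python
import json


def save_ingredients(ingredients, output_path):
    """Write the ingredient names to output_path as a JSON array.

    Assumed behavior: duplicates are dropped and names are sorted so that
    repeated runs produce a stable, diff-friendly file.
    """
    unique_sorted = sorted(set(ingredients))
    with open(output_path, "w", encoding="utf-8") as handle:
        # ensure_ascii=False keeps non-ASCII ingredient names readable.
        json.dump(unique_sorted, handle, ensure_ascii=False, indent=2)
```

With a helper like this, redirecting the output only requires changing the second argument, for example `save_ingredients(ingredients, 'data/vegan.json')`, where the path is purely illustrative and the target directory must already exist.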

License

This project is open-sourced under the WTFPL License. Do what the fuck you want.
