Skip to content

2.1. Annotating datasets

Jim Schwoebel edited this page Aug 7, 2020 · 9 revisions

You can simply annotate using the command-line interface here:

cd /Users/jim/desktop/allie
cd annotation
python3 annotate.py -d /Users/jim/desktop/allie/train_dir/males/ -s audio -c male -p classification

What results is annotated folders in .JSON format following the standard dictionary.

{"sampletype": "audio", "transcripts": {"audio": {}, "text": {}, "image": {}, "video": {}, "csv": {}}, "features": {"audio": {}, "text": {}, "image": {}, "video": {}, "csv": {}}, "models": {"audio": {}, "text": {}, "image": {}, "video": {}, "csv": {}}, "labels": [{"male": {"value": 1.0, "datetime": "2020-08-03 14:06:53.101763", "filetype": "audio", "file": "1.wav", "problemtype": "classification", "annotate_dir": "/Users/jim/desktop/allie/train_dir/males"}}], "errors": [], "settings": {"version": "1.0.0", "augment_data": false, "balance_data": true, "clean_data": false, "create_csv": true, "default_audio_augmenters": ["augment_tsaug"], "default_audio_cleaners": ["clean_mono16hz"], "default_audio_features": ["librosa_features"], "default_audio_transcriber": ["deepspeech_dict"], "default_csv_augmenters": ["augment_ctgan_regression"], "default_csv_cleaners": ["clean_csv"], "default_csv_features": ["csv_features"], "default_csv_transcriber": ["raw text"], "default_dimensionality_reducer": ["pca"], "default_feature_selector": ["rfe"], "default_image_augmenters": ["augment_imaug"], "default_image_cleaners": ["clean_greyscale"], "default_image_features": ["image_features"], "default_image_transcriber": ["tesseract"], "default_outlier_detector": ["isolationforest"], "default_scaler": ["standard_scaler"], "default_text_augmenters": ["augment_textacy"], "default_text_cleaners": ["remove_duplicates"], "default_text_features": ["nltk_features"], "default_text_transcriber": ["raw text"], "default_training_script": ["tpot"], "default_video_augmenters": ["augment_vidaug"], "default_video_cleaners": ["remove_duplicates"], "default_video_features": ["video_features"], "default_video_transcriber": ["tesseract (averaged over frames)"], "dimension_number": 2, "feature_number": 20, "model_compress": false, "reduce_dimensions": false, "remove_outliers": true, "scale_features": true, "select_features": true, "test_size": 0.1, "transcribe_audio": true, "transcribe_csv": true, "transcribe_image": true, "transcribe_text": true, "transcribe_video": true, "visualize_data": false, "transcribe_videos": true}}

After you annotate, you can create a nicely formatted .CSV for machine learning:

cd /Users/jim/desktop/allie
cd annotation
python3 create_csv.py -d /Users/jim/desktop/allie/train_dir/males/ -s audio -c male -p classification

Click the .GIF below for a quick tutorial and example.

What each CLI argument represents

CLI argument description possible options example
-d the specified directory full of files that you want to annotate. any directory path "/Users/jim/desktop/allie/train_dir/males/"
-s the file type / type of problem that you are annotating ["audio", "text", "image", "video"] "audio"
-c the class name that you are annotating any string / class name "male"
-p the type of problem you are annotating - classification or regression label ["classification", "regression"] "classification"