Variation between current and earlier versions #180

bitsofbits opened this issue Nov 6, 2017 · 4 comments

@bitsofbits

bitsofbits commented Nov 6, 2017

  • There is significant variation in the classification of vessels since the earlier release.

    • Shows up globally, but not on the test set. Probably because...

    • This is primarily in Chinese vessels, which tend to be in areas of poor coverage
      and for which we have limited ground truth.

  • Two possible explanations:

    • Change in training data. We have added more data since then, and the training/test sets
      were recomputed as a result.

    • Change in features.

      • Started using cos/sin trick to deal with cyclic parameters (unlikely to matter)

      • Added sin(lat) so that we can look at seasonal data (could lead to overfitting).
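The sin/cos trick for cyclic parameters can be sketched as follows. This is a minimal Python illustration, not the feature pipeline's actual code; `encode_cyclic` and `encode_latitude` are hypothetical names. A cyclic value is mapped onto the unit circle so that, for example, a course of 359° and one of 1° end up close together instead of 358 units apart. sin(lat) is a signed hemisphere indicator, which is presumably what lets month-of-year features carry seasonal meaning on both sides of the equator.

```python
import math


def encode_cyclic(value: float, period: float) -> tuple[float, float]:
    """Map a cyclic quantity (e.g. course in degrees, hour of day) onto the unit circle.

    Values that are close on the cycle (359 deg and 1 deg) end up close in
    (sin, cos) space, which a raw scalar feature would not capture.
    """
    angle = 2.0 * math.pi * (value / period)
    return math.sin(angle), math.cos(angle)


def encode_latitude(lat_deg: float) -> float:
    """sin(lat): positive in the northern hemisphere, negative in the southern."""
    return math.sin(math.radians(lat_deg))


# Usage: a course of 359 deg and 1 deg map to nearby points.
print(encode_cyclic(359.0, 360.0), encode_cyclic(1.0, 360.0))
```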

@bitsofbits

David's table of differences:

[image not preserved]

@bitsofbits

bitsofbits commented Nov 6, 2017

Experiment 1:

  • Remove sin(lat), month of year and time of day in favor of a so-called day angle:
    - 0 at noon, pi/2 at sunset, ±pi at midnight and -pi/2 at dawn.
    - Linear between sunrise and sunset, and between sunset and sunrise.
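The day-angle definition above can be sketched as a small function. This is a minimal Python illustration under stated assumptions: local sunrise and sunset times (in hours) are already known, both fall on the same day, and there is no polar day or night. The name `day_angle` is hypothetical, and the pipeline's actual implementation is not shown in this thread. Because the angle is linear within daylight, 0 lands at the midpoint between sunrise and sunset (roughly solar noon), and ±pi at the midpoint of the night.

```python
import math


def day_angle(t: float, sunrise: float, sunset: float) -> float:
    """Day angle in (-pi, pi] from local time t and same-day sunrise/sunset (hours, 0-24).

    -pi/2 at dawn, 0 midway through daylight (~noon), pi/2 at sunset and pi
    midway through the night (~midnight); linear within day and within night.
    Assumes 0 <= sunrise < sunset < 24 (no polar day or night).
    """
    if sunrise <= t <= sunset:
        # Daylight: sweep linearly from -pi/2 (sunrise) to +pi/2 (sunset).
        return -math.pi / 2 + math.pi * (t - sunrise) / (sunset - sunrise)
    # Night: sweep linearly from +pi/2 (sunset) to +3pi/2 (next sunrise) ...
    night_length = 24.0 - (sunset - sunrise)
    since_sunset = (t - sunset) % 24.0
    angle = math.pi / 2 + math.pi * since_sunset / night_length
    # ... then wrap the second half of the night into (-pi, -pi/2).
    if angle > math.pi:
        angle -= 2 * math.pi
    return angle


# Usage with sunrise at 06:00 and sunset at 18:00.
for hour in (0.0, 6.0, 12.0, 18.0, 21.0):
    print(hour, round(day_angle(hour, 6.0, 18.0), 3))
```

Being cyclic itself, the day angle would presumably be fed to the model through the same sin/cos encoding as the other cyclic parameters.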

Counts (using grep trawler dayangle_fine_labels/ALL_YEARS.csv | wc, so not quite exact):

Gear Type             Count
Trawlers              38143
Fixed Gear            13520
Purse Seines           7698
Drifting Longlines     4134
Cargo                161255

The trawler count is pretty close to its old value; the others are not so close.
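Since `grep <gear> | wc` matches substrings anywhere on a line, the counts above are approximate. An exact per-class count could be taken from the label column instead. This is a minimal sketch; the column name `label` is a hypothetical placeholder, so check the actual header of ALL_YEARS.csv before using it.

```python
import csv
from collections import Counter
from collections.abc import Iterable


def count_labels(lines: Iterable[str], label_column: str = "label") -> Counter:
    """Count rows per value of one column in a CSV with a header row.

    Unlike `grep trawler | wc`, this only looks at `label_column` and matches
    the value exactly, so 'trawler' appearing in another column is ignored.
    `label_column` is a hypothetical name: check the real CSV header.
    """
    return Counter(row[label_column] for row in csv.DictReader(lines))


# Usage (path from the run above):
# with open("dayangle_fine_labels/ALL_YEARS.csv", newline="") as f:
#     print(count_labels(f).most_common())
```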

@davidkroodsma:

Results of the run are here:

gs://machine-learning-dev-ttl-30d/dayangle_*


Details of the run (on branch no-lat-feature):

sbt features/'run --env=dev --zone=us-central1-f --maxNumWorkers=200 --job-name=features-day-angle-shuffle-2 --generate-model-features=True --generate-encounters=False --job-config=feature-pipeline/config/standard_config.yaml --experiments=shuffle_mode=service'

./deploy_cloudml.py --env dev \
    --model prod.vessel_characterization \
    --job_name day_angle_10_31_17 \
    --config_file deploy_characterization.yaml

Copy the training file down to the instance, then:

 python -m classification.run_inference prod.vessel_characterization \
         --root_feature_path gs://machine-learning-dev-ttl-30d/classification/timothyhochberg/features-day-angle-shuffle-2/pipeline/output/features  \
         --inference_parallelism 32 \
         --feature_dimensions 11 \
         --model_checkpoint_path   ./dayAngleModel.ckpt-601668  \
         --metadata_file training_classes.csv \
         --fishing_ranges_file combined_fishing_ranges.csv \
         --inference_results_path=./day_angle3_all.json.gz

python compute_metrics.py --inference-path day_angle3_all.json.gz \
    --label-path classification/data/training_classes.csv \
    --dest-path dayangle3_vessel_char.html \
    --fishing-ranges classification/data/combined_fishing_ranges.csv \
    --skip-localisation-metrics \
    --dump-labels-to dayangle_labels \
    --dump-fine-labels-to dayangle_fine_labels \
    --dump-attributes-to dayangle_attribs

@bitsofbits

bitsofbits commented Nov 6, 2017

Old school features.

These features are the same as those used for the paper, except for bug fixes and the use of
the sin/cos trick for cyclic parameters.

Run details (on branch old-school-features):

The first couple of runs had (a) stability issues and then (b) a learning rate that was too low. This run is with branch commit:
698b711

sbt features/'run  --env=dev --zone=us-central1-f --maxNumWorkers=200 --job-name=features-old-school --generate-model-features=True --generate-encounters=False --job-config=feature-pipeline/config/standard_config.yaml --experiments=shuffle_mode=service'

./deploy_cloudml.py --env dev --model prod.vessel_characterization --job_name old_school_7 --config_file deploy_characterization.yaml

  gsutil cp gs://world-fishing-827-dev-ttl30d/data-production/classification/timothyhochberg/old_school_7/models/prod.vessel_characterization/train/model.ckpt-600114.meta oldschool_7.ckpt

 python -m classification.run_inference prod.vessel_characterization \
     --root_feature_path gs://machine-learning-dev-ttl-30d/classification/timothyhochberg/features-old-school/pipeline/output/features  \
     --inference_parallelism 32 \
     --feature_dimensions 13 \
     --model_checkpoint_path   ./oldschool_7.ckpt \
     --metadata_file training_classes.csv \
     --fishing_ranges_file combined_fishing_ranges.csv \
     --inference_results_path=./old_school.json.gz \
     --dataset_split Test

python compute_metrics.py --inference-path old_school.json.gz \
    --label-path classification/data/training_classes.csv \
    --dest-path oldschool_vessel_char.html \
    --fishing-ranges classification/data/combined_fishing_ranges.csv \
    --skip-localisation-metrics \
    --dump-labels-to oldschool_labels \
    --dump-fine-labels-to oldschool_fine_labels \
    --dump-attributes-to oldschool_attribs

./deploy_cloudml.py --env dev --model prod.vessel_characterization --job_name old_ald_school --config_file deploy_characterization.yaml
gsutil cp gs://world-fishing-827-dev-ttl30d/data-production/classification/timothyhochberg/old_ald_school/models/prod.vessel_characterization/train/model.ckpt-600096 oldoldschool.ckpt

Something was off in the MMSI lists here, and I had to regenerate them manually to avoid crashes.

python -m classification.run_inference prod.vessel_characterization \
     --root_feature_path gs://machine-learning-dev-ttl-30d/classification/timothyhochberg/features-old-old-school/pipeline/output/features  \
     --inference_parallelism 32 \
     --feature_dimensions 14 \
     --model_checkpoint_path   ./oldoldschool.ckpt \
     --metadata_file training_classes.csv \
     --fishing_ranges_file combined_fishing_ranges.csv \
     --inference_results_path=./old_old_school.json.gz


python compute_metrics.py --inference-path old_old_school.json.gz \
    --label-path classification/data/training_classes.csv \
    --dest-path oldoldschool_vessel_char.html \
    --fishing-ranges classification/data/combined_fishing_ranges.csv \
    --skip-localisation-metrics \
    --dump-labels-to oldoldschool_labels \
    --dump-fine-labels-to oldoldschool_fine_labels \
    --dump-attributes-to oldoldschool_attribs

====

Also for fishing detection:

./deploy_cloudml.py --env dev --model prod.fishing_detection --job_name old_old_fishing --config_file deploy_fishing.yaml

python -m classification.run_inference prod.fishing_detection \
     --root_feature_path gs://machine-learning-dev-ttl-30d/classification/timothyhochberg/features-old-old-school/pipeline/output/features  \
     --inference_parallelism 64 \
     --feature_dimensions 14 \
     --model_checkpoint_path   ./old_old_fishing.ckpt \
     --metadata_file training_classes.csv \
     --fishing_ranges_file combined_fishing_ranges.csv \
     --inference_results_path=./old_old_fishing.json.gz 

  • Compute metrics:

       python compute_metrics.py \
           --inference-path old_old_fishing.json.gz \
           --label-path classification/data/training_classes.csv \
           --dest-path old_old_fishing.html \
           --fishing-ranges classification/data/combined_fishing_ranges.csv \
           --skip-class-metrics \
           --skip-attribute-metrics \
           --test-only
    
Gear Type (mmsi:true/total)              Precision  Recall  Accuracy  F1-Score
drifting_longlines (21:332205/558062)         0.96    0.95      0.95      0.96
purse_seines (25:88846/632808)                0.87    0.83      0.96      0.85
stationary_gear (12:57438/139288)             0.98    0.89      0.95      0.93
trawlers (24:645889/1425957)                  0.95    0.90      0.93      0.92
other (13:104304/233514)                      0.96    0.96      0.96      0.96

Overall                                       0.95    0.91      0.94      0.93
  • Run annotation:

      sbt features/'run --job-config=ais-annotator/config/full_annotation_old_old.yaml --env=dev --job-name=annotate_all --maxNumWorkers=100 --diskSizeGb=100 --workerMachineType=custom-1-6656 --output-path=gs://machine-learning-dev-ttl-30d/classification/old_old_fishing'
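As a sanity check on the fishing-detection metrics table above, F1 should equal the harmonic mean of precision and recall, 2PR/(P+R), up to rounding. The sketch below recomputes it from the rounded table values. It also shows why accuracy can exceed both precision and recall when negatives dominate, as for purse_seines (assuming the true/total column counts fishing points out of all points). The function names are hypothetical; this is not how compute_metrics.py is implemented.

```python
def f1_from_pr(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Precision, recall, accuracy and F1 for one class treated as positive."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {
        "precision": precision,
        "recall": recall,
        "accuracy": accuracy,
        "f1": f1_from_pr(precision, recall),
    }


# (precision, recall, reported F1), copied from the table above.
REPORTED = {
    "drifting_longlines": (0.96, 0.95, 0.96),
    "purse_seines": (0.87, 0.83, 0.85),
    "stationary_gear": (0.98, 0.89, 0.93),
    "trawlers": (0.95, 0.90, 0.92),
    "other": (0.96, 0.96, 0.96),
    "Overall": (0.95, 0.91, 0.93),
}

for gear, (p, r, f1) in REPORTED.items():
    print(f"{gear:20s} reported F1={f1:.2f} recomputed={f1_from_pr(p, r):.3f}")

# With 90% negatives, accuracy (0.96) exceeds precision and recall (both 0.8).
print(binary_metrics(tp=8, fp=2, fn=2, tn=88))
```

With the table's rounded inputs, every recomputed F1 lands within 0.01 of the reported value.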
    

@bitsofbits

bitsofbits commented Nov 9, 2017

Simpler features (branch simpler-features). Reduce the feature set further to make things simpler, then check performance.

(The first try, using reported course, fared poorly; the retry with implied course also fared poorly. That run was using a raw rather than a logged feature, which I suspected was the problem.) [Update] Using raw features was the primary problem: removing them allowed the model to train, though it seems not quite as good.
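The update above attributes the training failure to raw, unlogged features. One common reason is that heavy-tailed inputs such as distances or durations span several orders of magnitude, which can destabilize optimization; a log(1 + x) transform compresses that range. A minimal sketch, assuming a non-negative feature; `log_feature` and its `scale` parameter are hypothetical and not taken from the pipeline.

```python
import math


def log_feature(raw: float, scale: float = 1.0) -> float:
    """Compress a non-negative, heavy-tailed feature with log(1 + raw / scale).

    log1p keeps 0 at 0 and stays accurate for small values.
    """
    if raw < 0:
        raise ValueError("log_feature expects a non-negative value")
    return math.log1p(raw / scale)


# Raw values spanning five orders of magnitude map to a range of about 0 to 11.5.
for raw in (0.0, 10.0, 1000.0, 100000.0):
    print(raw, round(log_feature(raw), 2))
```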

 sbt features/'run  --env=dev --zone=us-central1-f --maxNumWorkers=200 --job-name=features-simpler-2 --generate-model-features=True --generate-encounters=False --job-config=feature-pipeline/config/standard_config.yaml --experiments=shuffle_mode=service'

./deploy_cloudml.py --env dev --model prod.vessel_characterization --job_name simple_2 --config_file deploy_characterization.yaml

[The above versions weren't committed, as none worked well except the last, which was only OK.]

The main issue with features is how to generate them rapidly, so try with only features that are simple to generate:

 sbt features/'run  --env=dev --zone=us-central1-f --maxNumWorkers=200 --job-name=features-simpler-gen --generate-model-features=True --generate-encounters=False --job-config=feature-pipeline/config/standard_config.yaml --experiments=shuffle_mode=service'

./deploy_cloudml.py --env dev --model prod.vessel_characterization --job_name simple_gen --config_file deploy_characterization.yaml
