Variation between current and earlier versions #180

bitsofbits · 2017-11-06T15:56:57Z

There is significant variation in the classification of vessels since the earlier release.
- It shows up globally, but not on the test set. Probably because...
- It is primarily in Chinese vessels, which tend to be in an area of poor coverage
  and for which we have limited ground truth.

Two possible explanations:
- Change in training data. We added more data since then, and the training / test sets were
  recomputed as a result.
- Change in features.
  - Started using the cos/sin trick to deal with cyclic parameters (unlikely to matter).
  - Added sin(lat) so that we can look at seasonal data (could lead to overfitting).
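
For reference, the cos/sin trick maps a cyclic quantity onto the unit circle, so that values on either side of the wrap-around point stay close together. A minimal sketch in Python; the function names and the hour-of-day example are illustrative and not taken from the feature pipeline:

```python
import math

def encode_cyclic(value, period):
    """Map a cyclic value (e.g. hour of day, period=24) to (cos, sin)."""
    angle = 2.0 * math.pi * (value % period) / period
    return math.cos(angle), math.sin(angle)

def distance(a, b):
    """Euclidean distance between two encoded points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

# 23:30 and 00:30 are one hour apart, but 23 units apart as raw numbers.
late, early = encode_cyclic(23.5, 24), encode_cyclic(0.5, 24)
noon = encode_cyclic(12.0, 24)
print(distance(late, early) < distance(late, noon))
```

After encoding, the two times around midnight are closer to each other than either is to noon, which is the property a raw hour value lacks.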

bitsofbits · 2017-11-06T15:57:27Z

David's table of differences:

bitsofbits · 2017-11-06T15:59:47Z

Experiment 1:

Remove sin(lat), month of year, and time of day in favor of a so-called day-angle:
- 0 at noon, pi/2 at sunset, pi (equivalently -pi) at midnight, and -pi/2 at dawn.
- Linear between sunrise and sunset, and between sunset and sunrise.
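
A minimal sketch of this piecewise-linear definition, assuming sunrise and sunset are given as local hours with sunrise before sunset, and taking "noon" and "midnight" to be the midpoints of day and night. The function name `day_angle` and its signature are hypothetical; the actual implementation on the branch may differ:

```python
import math

def day_angle(hour, sunrise, sunset):
    """Piecewise-linear day angle in (-pi, pi].

    Assumes 0 <= sunrise < sunset < 24 (local hours) and ignores polar
    day/night. Noon and midnight are the midpoints of day and night.
    """
    hour = hour % 24.0
    if sunrise <= hour <= sunset:
        # Daytime: -pi/2 at sunrise, 0 at noon, +pi/2 at sunset.
        frac = (hour - sunrise) / (sunset - sunrise)
        return -math.pi / 2 + math.pi * frac
    # Night: +pi/2 at sunset, pi at midnight, wrapping to -pi/2 at sunrise.
    night_len = 24.0 - (sunset - sunrise)
    frac = ((hour - sunset) % 24.0) / night_len
    angle = math.pi / 2 + math.pi * frac
    return angle - 2 * math.pi if angle > math.pi else angle

for h in (6, 12, 18, 0, 3):
    print(h, round(day_angle(h, sunrise=6.0, sunset=18.0), 3))
```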

Counts (using grep trawler dayangle_fine_labels/ALL_YEARS.csv | wc, so not quite exact):

Gear Type	Count
Trawlers	38143
Fixed Gear	13520
Purse Seines	7698
Drifting Longlines	4134
Cargo	161255

The trawler count is pretty close to its old value; the others are not so close.
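
Since grep counts every line that contains the substring anywhere, an exact tally needs a match on the label column itself. A minimal sketch; the layout of ALL_YEARS.csv is not shown in this thread, so the column name `label` and the toy rows are hypothetical:

```python
import csv
import io
from collections import Counter

def count_labels(lines, column="label"):
    """Count rows per exact value of `column` (hypothetical column name)."""
    return Counter(row[column] for row in csv.DictReader(lines))

# Toy data standing in for ALL_YEARS.csv; the real file layout may differ.
sample = io.StringIO(
    "mmsi,label\n"
    "1,trawlers\n"
    "2,trawlers\n"
    "3,purse_seines\n"
)
counts = count_labels(sample)
print(counts["trawlers"], counts["purse_seines"])
```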

@davidkroodsma:

Results of the run are here:

gs://machine-learning-dev-ttl-30d/dayangle_*

Details of run (on branch no-lat-feature):

sbt features/'run --env=dev \
    --zone=us-central1-f \
    --maxNumWorkers=200 \
    --job-name=features-day-angle-shuffle-2 \
    --generate-model-features=True \
    --generate-encounters=False \
    --job-config=feature-pipeline/config/standard_config.yaml \
    --experiments=shuffle_mode=service'

./deploy_cloudml.py --env dev \
    --model prod.vessel_characterization \
    --job_name day_angle_10_31_17 \
    --config_file deploy_characterization.yaml

Copy the training file down to the instance, then:

 python -m classification.run_inference prod.vessel_characterization \
         --root_feature_path gs://machine-learning-dev-ttl-30d/classification/timothyhochberg/features-day-angle-shuffle-2/pipeline/output/features  \
         --inference_parallelism 32 \
         --feature_dimensions 11 \
         --model_checkpoint_path   ./dayAngleModel.ckpt-601668  \
         --metadata_file training_classes.csv \
         --fishing_ranges_file combined_fishing_ranges.csv \
         --inference_results_path=./day_angle3_all.json.gz

python compute_metrics.py --inference-path day_angle3_all.json.gz \
    --label-path classification/data/training_classes.csv \
    --dest-path dayangle3_vessel_char.html \
    --fishing-ranges classification/data/combined_fishing_ranges.csv \
    --skip-localisation-metrics \
    --dump-labels-to dayangle_labels \
    --dump-fine-labels-to dayangle_fine_labels \
    --dump-attributes-to dayangle_attribs

bitsofbits · 2017-11-06T16:55:10Z

Old-school features.

These features are the same as those used for the paper, except for bug fixes and the use of the
sin/cos trick for cyclic parameters.

Run details (on branch old-school-features):

The first couple of runs had (a) stability issues and then (b) a learning rate that was too low.
This run is at commit 698b711.

sbt features/'run  --env=dev --zone=us-central1-f --maxNumWorkers=200 --job-name=features-old-school --generate-model-features=True --generate-encounters=False --job-config=feature-pipeline/config/standard_config.yaml --experiments=shuffle_mode=service'

./deploy_cloudml.py --env dev --model prod.vessel_characterization --job_name old_school_7 --config_file deploy_characterization.yaml

  gsutil cp gs://world-fishing-827-dev-ttl30d/data-production/classification/timothyhochberg/old_school_7/models/prod.vessel_characterization/train/model.ckpt-600114.meta oldschool_7.ckpt

 python -m classification.run_inference prod.vessel_characterization \
     --root_feature_path gs://machine-learning-dev-ttl-30d/classification/timothyhochberg/features-old-school/pipeline/output/features  \
     --inference_parallelism 32 \
     --feature_dimensions 13 \
     --model_checkpoint_path   ./oldschool_7.ckpt \
     --metadata_file training_classes.csv \
     --fishing_ranges_file combined_fishing_ranges.csv \
     --inference_results_path=./old_school.json.gz \
     --dataset_split Test

python compute_metrics.py --inference-path old_school.json.gz \
    --label-path classification/data/training_classes.csv \
    --dest-path oldschool_vessel_char.html \
    --fishing-ranges classification/data/combined_fishing_ranges.csv \
    --skip-localisation-metrics \
    --dump-labels-to oldschool_labels \
    --dump-fine-labels-to oldschool_fine_labels \
    --dump-attributes-to oldschool_attribs

./deploy_cloudml.py --env dev --model prod.vessel_characterization --job_name old_ald_school --config_file deploy_characterization.yaml
gsutil cp gs://world-fishing-827-dev-ttl30d/data-production/classification/timothyhochberg/old_ald_school/models/prod.vessel_characterization/train/model.ckpt-600096 oldoldschool.ckpt

Something was off in the MMSI lists here, and I had to regenerate them manually to avoid crashes.

python -m classification.run_inference prod.vessel_characterization \
     --root_feature_path gs://machine-learning-dev-ttl-30d/classification/timothyhochberg/features-old-old-school/pipeline/output/features \
     --inference_parallelism 32 \
     --feature_dimensions 14 \
     --model_checkpoint_path ./oldoldschool.ckpt \
     --metadata_file training_classes.csv \
     --fishing_ranges_file combined_fishing_ranges.csv \
     --inference_results_path=./old_old_school.json.gz

python compute_metrics.py --inference-path old_old_school.json.gz \
    --label-path classification/data/training_classes.csv \
    --dest-path oldoldschool_vessel_char.html \
    --fishing-ranges classification/data/combined_fishing_ranges.csv \
    --skip-localisation-metrics \
    --dump-labels-to oldoldschool_labels \
    --dump-fine-labels-to oldoldschool_fine_labels \
    --dump-attributes-to oldoldschool_attribs

====

Also for fishing

./deploy_cloudml.py --env dev --model prod.fishing_detection --job_name old_old_fishing --config_file deploy_fishing.yaml

python -m classification.run_inference prod.fishing_detection \
     --root_feature_path gs://machine-learning-dev-ttl-30d/classification/timothyhochberg/features-old-old-school/pipeline/output/features  \
     --inference_parallelism 64 \
     --feature_dimensions 14 \
     --model_checkpoint_path   ./old_old_fishing.ckpt \
     --metadata_file training_classes.csv \
     --fishing_ranges_file combined_fishing_ranges.csv \
     --inference_results_path=./old_old_fishing.json.gz

Compute Metrics

python compute_metrics.py \
    --inference-path old_old_fishing.json.gz \
    --label-path classification/data/training_classes.csv \
    --dest-path old_old_fishing.html \
    --fishing-ranges classification/data/combined_fishing_ranges.csv \
    --skip-class-metrics \
    --skip-attribute-metrics \
    --test-only

Gear Type (mmsi:true/total)	Precision	Recall	Accuracy	F1-Score
drifting_longlines (21:332205/558062)	0.96	0.95	0.95	0.96
purse_seines (25:88846/632808)	0.87	0.83	0.96	0.85
stationary_gear (12:57438/139288)	0.98	0.89	0.95	0.93
trawlers (24:645889/1425957)	0.95	0.90	0.93	0.92
other (13:104304/233514)	0.96	0.96	0.96	0.96

Overall	0.95	0.91	0.94	0.93
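
For reference, the four metric columns follow from per-class confusion counts in the standard way. A small sketch with made-up counts; these are not the counts behind the table, and how compute_metrics.py aggregates per-class results is not shown in this thread:

```python
def binary_metrics(tp, fp, fn, tn):
    """Precision, recall, accuracy and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, accuracy, f1

# Illustrative counts only.
p, r, a, f1 = binary_metrics(tp=90, fp=10, fn=30, tn=870)
print(f"precision={p:.2f} recall={r:.2f} accuracy={a:.2f} f1={f1:.2f}")
```

Because F1 is the harmonic mean of precision and recall, it is pulled toward the lower of the two, which is why a class with high accuracy can still show a noticeably lower F1.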

Run annotation

sbt features/'run --job-config=ais-annotator/config/full_annotation_old_old.yaml \
    --env=dev \
    --job-name=annotate_all \
    --maxNumWorkers=100 \
    --diskSizeGb=100 \
    --workerMachineType=custom-1-6656 \
    --output-path=gs://machine-learning-dev-ttl-30d/classification/old_old_fishing'

bitsofbits · 2017-11-09T20:21:35Z

Simpler features (branch simpler-features). Reduce the features further to make things simpler, then check performance.

(The first try, using reported course, fared poorly, so I retried with implied course. That also fared poorly; it was using the raw, rather than the logged, feature, which I suspect is the problem.) [Update] Using raw features was the primary problem; removing them allowed the model to train. It seems to be not quite as good, though.
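
As an illustration of the raw versus logged distinction: heavy-tailed quantities span several orders of magnitude, and a log transform compresses them into a much narrower range. A minimal sketch; the use of `log1p` and the sample values are assumptions, not the transform used in the feature pipeline:

```python
import math

def log_feature(x):
    """Compress a non-negative raw feature with log(1 + x)."""
    return math.log1p(x)

raw = [0.0, 1.0, 10.0, 100.0, 10000.0]  # made-up values
logged = [log_feature(x) for x in raw]
print(max(raw) - min(raw), max(logged) - min(logged))
```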

 sbt features/'run  --env=dev --zone=us-central1-f --maxNumWorkers=200 --job-name=features-simpler-2 --generate-model-features=True --generate-encounters=False --job-config=feature-pipeline/config/standard_config.yaml --experiments=shuffle_mode=service'

./deploy_cloudml.py --env dev --model prod.vessel_characterization --job_name simple_2 --config_file deploy_characterization.yaml

[The versions above weren't committed, as none worked well except the last, which was only OK.]

The main issue with the features is how to generate them rapidly, so try with only simple-to-generate features:

 sbt features/'run  --env=dev --zone=us-central1-f --maxNumWorkers=200 --job-name=features-simpler-gen --generate-model-features=True --generate-encounters=False --job-config=feature-pipeline/config/standard_config.yaml --experiments=shuffle_mode=service'

./deploy_cloudml.py --env dev --model prod.vessel_characterization --job_name simple_gen --config_file deploy_characterization.yaml
