Check population density around stops #870

Merged
merged 4 commits into from
Sep 18, 2023

@abyrd abyrd commented Mar 16, 2023

While working on duplicate ID checks I noticed we were running tests on this validator but not applying it to the stops.

Results look good after testing on several feeds. Null Island ferry results in warnings; most other feeds produce no warnings. Some places like Montreal get warnings on particular routes that have stops in mountainous parkland (Parc des Sept Chutes), but this makes sense.

I'm creating this as a draft. This is not a critical validation rule, and the population density thresholds might benefit from some tweaking.

I also noticed that we don't return the affected entity ID in com.conveyal.analysis.models.Bundle.GtfsErrorSummary, so we end up generating a lot of custom messages that contain the entity ID, rather than providing structured data that would allow the UI to display the ID together with the filename, line number, and field name.
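To make the structured-data idea concrete, here is a minimal sketch of what an error summary carrying the entity ID as its own field might look like. The class name `StructuredErrorSummary`, its fields, and the `display()` format are all hypothetical and do not reflect the actual `GtfsErrorSummary` class.

```java
/** Hypothetical shape for a structured error summary; names are illustrative only. */
class StructuredErrorSummary {
    final String file;      // GTFS table file, e.g. "stops.txt"
    final int line;         // line number within that file
    final String field;     // affected field name
    final String entityId;  // affected entity ID, kept separate from the message
    final String message;   // generic message, no ID baked in

    StructuredErrorSummary(String file, int line, String field, String entityId, String message) {
        this.file = file;
        this.line = line;
        this.field = field;
        this.entityId = entityId;
        this.message = message;
    }

    /** One way a UI could combine the structured fields for display. */
    String display() {
        return String.format("%s line %d, field %s, entity %s: %s",
                file, line, field, entityId, message);
    }
}
```

With the ID held in its own field, the message text can stay generic and the UI decides how to present the ID alongside the filename, line number, and field name.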

@abyrd abyrd marked this pull request as ready for review August 17, 2023 10:39
@abyrd abyrd enabled auto-merge August 17, 2023 10:41
     * before they are transformed to WGS84: the origin of a common French coordinate system is in the Sahara.
     */
    private void validateStopPopulationDensity () {
        BooleanAsciiGrid popGrid = BooleanAsciiGrid.forEarthPopulation();
Member


Is this population grid loaded and used anywhere else? I'm wondering if we should be concerned about resource management or additional time delays here.

Member Author


No, it was only used by a GTFS validator that was not included in r5 when we moved the gtfs-lib code in, so it is not currently used elsewhere. This is a 4.3 kB text file containing ones and zeros which gets loaded into a BitSet, so it should amount to a few hundred bytes in memory. There would be some cost of reading that file and creating that object, but this is in the context of loading and validating GTFS, which involves loading large volumes of numbers from text files into a database. So I'd say it's a one-off cost per feed, similar in kind to what GTFS ingestion already requires but several orders of magnitude smaller. In principle a single instance of the grid could be cached, but at a few hundred bytes it's so tiny that I don't think it will have any impact.
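For readers unfamiliar with the approach, here is a rough sketch of packing a grid of ones and zeros into a `BitSet` and querying it by coordinate. The class `TinyBooleanGrid`, its row-of-characters input format, and the assumption that the grid spans the whole globe with the first row at the northern edge are all illustrative; this is not the actual `BooleanAsciiGrid` code or file format.

```java
import java.util.BitSet;
import java.util.List;

/** Hypothetical stand-in for a boolean world grid; not the actual BooleanAsciiGrid. */
class TinyBooleanGrid {
    final int nCols;
    final int nRows;
    final BitSet bits;

    /** Each string is one row of '0'/'1' characters; all rows are assumed to have equal length. */
    TinyBooleanGrid(List<String> rows) {
        nRows = rows.size();
        nCols = rows.get(0).length();
        bits = new BitSet(nRows * nCols);
        for (int r = 0; r < nRows; r++) {
            String row = rows.get(r);
            for (int c = 0; c < nCols; c++) {
                // One bit per cell, stored in row-major order.
                if (row.charAt(c) == '1') bits.set(r * nCols + c);
            }
        }
    }

    /** True if the cell containing the WGS84 point is set (grid assumed to cover the globe). */
    boolean getValue(double lon, double lat) {
        int col = (int) ((lon + 180.0) / 360.0 * nCols);
        int row = (int) ((90.0 - lat) / 180.0 * nRows);
        // Clamp so that lon = 180 or lat = -90 falls into the last column or row.
        col = Math.min(Math.max(col, 0), nCols - 1);
        row = Math.min(Math.max(row, 0), nRows - 1);
        return bits.get(row * nCols + col);
    }
}
```

Because each cell costs one bit, a grid of a few thousand cells occupies only a few hundred bytes, which matches the memory estimate above.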

@ansoncfit
Member

As we discussed in a recent call, I have two minor concerns about possible support questions that may arise from this newly enabled validation. I tried to address both in my commit here.

  1. Boundary cases, where a stop may be just over the edge of a cell with population. This may also be a concern in areas that have been newly developed since 2015. To address this, I "blurred" the population layer, so a cell will have value 1 if any of its neighboring cells has a population exceeding the previously used threshold of 5 inhabitants per square km.
  2. "1/4 degree cell" may confuse some users. I revised the message to note sparse population first.
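The "blurring" in point 1 amounts to a dilation of the boolean grid. A minimal sketch follows, assuming a row-major `BitSet`, an eight-cell neighborhood, and no wrap-around at the grid edges; the class `GridBlur` and method `dilate` are hypothetical names, and the actual commit may differ in any of these choices.

```java
import java.util.BitSet;

/** Hypothetical helper illustrating dilation of a boolean grid stored row-major in a BitSet. */
class GridBlur {
    /** Returns a new grid in which a cell is set if it, or any of its eight neighbors, is set in the input. */
    static BitSet dilate(BitSet in, int nCols, int nRows) {
        BitSet out = new BitSet(nCols * nRows);
        for (int r = 0; r < nRows; r++) {
            for (int c = 0; c < nCols; c++) {
                if (!in.get(r * nCols + c)) continue;
                // Spread each populated cell into its 3x3 neighborhood.
                for (int dr = -1; dr <= 1; dr++) {
                    int rr = r + dr;
                    if (rr < 0 || rr >= nRows) continue;
                    for (int dc = -1; dc <= 1; dc++) {
                        int cc = c + dc;
                        // Assumption: cells beyond the grid edge are ignored (no antimeridian wrap).
                        if (cc < 0 || cc >= nCols) continue;
                        out.set(rr * nCols + cc);
                    }
                }
            }
        }
        return out;
    }
}
```

Writing into a separate output grid matters here: dilating in place would let newly set cells spread further on the same pass.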

Member

@ansoncfit ansoncfit left a comment


@abyrd I added some changes we discussed and disabled auto-merge so you can take a look before merging.

@abyrd
Member Author

abyrd commented Sep 18, 2023

Thanks for making the changes we discussed. A couple of notes for future reference: This PR does not introduce any tests, but the boolean ASCII grid was already present before the PR and has an automated test. It just wasn't wired up as a validator for uploaded GTFS feeds. That test uses some hard-coded coordinates rather than reading and validating a full GTFS feed. I had already tested that the SuspectStopLocationError introduced by this PR was produced by feeds with strange stop placement and that the error was visible in the UI. The final commit on this PR just changes the message, not anything else about the error generation process, so it should still work. We probably still want to test this out manually when preparing the next release.

@abyrd abyrd merged commit c6d92a9 into dev Sep 18, 2023
3 checks passed
@abyrd abyrd deleted the check-stop-pop-density branch September 18, 2023 12:53

3 participants