Check population density around stops #870

Merged 4 commits on Sep 18, 2023
@@ -0,0 +1,31 @@
+package com.conveyal.gtfs.error;
+
+import com.conveyal.gtfs.validator.model.Priority;
+
+import java.io.Serializable;
+
+/**
+ * Indicates that a stop is in a suspect location, for example in a place like a desert where there are not enough
+ * people to support public transit. This can be the result of incorrect coordinate transforms into WGS84.
+ * Stops are often located on "null island" at (0,0). This can also happen in other coordinate systems before they
+ * are transformed to WGS84: the origin of a common French coordinate system is in the Sahara.
+ */
+public class SuspectStopLocationError extends GTFSError implements Serializable {
+    public static final long serialVersionUID = 1L;
+
+    public SuspectStopLocationError(String stopId, long line) {
+        super("stops", line, "stop_id", stopId);
+    }
+
+    @Override public String getMessage() {
+        return String.format(
+            "Stop with ID %s is in a sparsely populated area (fewer than 5 inhabitants per square km in any " +
+            "neighboring 1/4 degree cell)",
+            affectedEntityId
+        );
+    }
+
+    @Override public Priority getPriority() {
+        return Priority.MEDIUM;
+    }
+}
13 changes: 8 additions & 5 deletions src/main/java/com/conveyal/gtfs/storage/BooleanAsciiGrid.java
@@ -9,7 +9,7 @@

 /**
  * Loads an ESRI ASCII grid containing integers and allows looking up values as booleans (where > 0).
- * This is used for
+ * This can be used for a basic stop location test using a simple population raster.
  */
 public class BooleanAsciiGrid {

@@ -51,19 +51,22 @@ public BooleanAsciiGrid (InputStream inputStream, boolean gzipped) {
     }

     /**
-     * Get a grid for places where population density is over 5 people per square kilometer.
+     * Get a grid for places where population density is over 5 people per square kilometer in any neighboring cell.
      * We use the Gridded Population of the World v3 data set for 2015 UN-adjusted population density.
      * This data set was downloaded at 1/4 degree resolution in ESRI ASCII grid format. The grid file was edited
      * manually to eliminate the no-data value header, since I could not find a way to operate on no-value cells in the
      * QGIS raster calculator. Then the raster calculator in QGIS was used with the formula ("glds00ag15@1" > 5),
      * which makes all cells with population density above the threshold have a value of one,
-     * and all others a value of zero (since the no data value in the grid is -9999). This was then exported as another
-     * ASCII grid file, which zips well. The license for this data set is Creative Commons Attribution.
+     * and all others a value of zero (since the no data value in the grid is -9999). Next, the GRASS r.neighbors tool
+     * in QGIS was used to blur this layer, adding to each cell the sum of the neighboring eight cells ("neighborhood
+     * size" of 3, the dimension in each direction). A final logical operation ("blurred@1" > 0) was applied, with the
+     * result exported as another ASCII grid file, which zips well. The license for this data set is Creative Commons
+     * Attribution.
      * See http://sedac.ciesin.columbia.edu/data/collection/gpw-v3
      */
     public static BooleanAsciiGrid forEarthPopulation() {
         try {
-            InputStream gridStream = BooleanAsciiGrid.class.getResourceAsStream("gpwv3-quarter-boolean.asc");
+            InputStream gridStream = BooleanAsciiGrid.class.getResourceAsStream("gpwv3-quarter-buffer-boolean.asc");
             return new BooleanAsciiGrid(gridStream, false);
         } catch (Exception ex) {
             throw new RuntimeException(ex);
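
Editorial illustration, not part of this PR: the Javadoc above describes thresholding the raster, summing each cell with its eight neighbors, and then testing the sum for > 0. The net effect is that a cell ends up true if it or any adjacent cell exceeded the density threshold. The sketch below reproduces that effect on an in-memory grid. The class `NeighborhoodBuffer`, the method `dilate`, and the row-major `BitSet` layout are assumptions made for this example; the actual grid file was produced in QGIS, not by code like this.

```java
import java.util.BitSet;

/** Hypothetical illustration only; this class does not exist in gtfs-lib. */
public class NeighborhoodBuffer {

    /**
     * Returns a grid in which a cell is set if the cell itself or any of its
     * eight neighbors is set in the input. Cells are stored row-major:
     * index = row * nCols + col.
     */
    public static BitSet dilate(BitSet input, int nCols, int nRows) {
        BitSet output = new BitSet(nCols * nRows);
        for (int row = 0; row < nRows; row++) {
            for (int col = 0; col < nCols; col++) {
                if (!input.get(row * nCols + col)) continue;
                // Spread this populated cell into its 3x3 neighborhood.
                for (int dr = -1; dr <= 1; dr++) {
                    for (int dc = -1; dc <= 1; dc++) {
                        int r = row + dr;
                        int c = col + dc;
                        // Cells beyond the grid edge are skipped; this sketch
                        // does not wrap around at the antimeridian.
                        if (r >= 0 && r < nRows && c >= 0 && c < nCols) {
                            output.set(r * nCols + c);
                        }
                    }
                }
            }
        }
        return output;
    }

    public static void main(String[] args) {
        // 5x5 grid with a single populated cell in the center.
        BitSet grid = new BitSet(25);
        grid.set(2 * 5 + 2);
        BitSet buffered = dilate(grid, 5, 5);
        System.out.println(buffered.cardinality()); // prints 9
    }
}
```

With the buffered grid, a stop is only flagged when its own cell and all eight surrounding cells are below the threshold, which makes the check more tolerant of stops just outside a populated cell.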
17 changes: 17 additions & 0 deletions src/main/java/com/conveyal/gtfs/validator/PostLoadValidator.java
@@ -6,7 +6,9 @@
 import com.conveyal.gtfs.error.GeneralError;
 import com.conveyal.gtfs.error.RangeError;
 import com.conveyal.gtfs.error.ReferentialIntegrityError;
+import com.conveyal.gtfs.error.SuspectStopLocationError;
 import com.conveyal.gtfs.model.Stop;
+import com.conveyal.gtfs.storage.BooleanAsciiGrid;

 import java.util.List;

@@ -33,6 +35,7 @@ public PostLoadValidator (GTFSFeed feed) {
     public void validate () {
         validateCalendarServices();
         validateParentStations();
+        validateStopPopulationDensity();
     }

     /**
@@ -46,6 +49,20 @@ private void validateCalendarServices () {
         }
     }

+    /**
+     * Validate that stops are not in locations with no people. This can happen from incorrect coordinate transforms
+     * into WGS84. Stops are often located on "null island" at (0,0). This can also happen in other coordinate systems
+     * before they are transformed to WGS84: the origin of a common French coordinate system is in the Sahara.
+     */
+    private void validateStopPopulationDensity () {
+        BooleanAsciiGrid popGrid = BooleanAsciiGrid.forEarthPopulation();
Member

Is this population grid loaded and used anywhere else? I'm wondering if we should be concerned about resource management or additional time delays here.

Member Author

No, it was only used by a GTFS validator that was not included in r5 when we moved the gtfs-lib code in, so is not currently used elsewhere. This is a 4.3 kb text file containing ones and zeros which gets loaded into a BitSet, so should amount to a few hundred bytes in memory. There would be some cost of reading that file and creating that object, but this is in the context of loading and validating GTFS which involves loading large volumes of numbers from text files into a database. So I'd say it's a one-off cost per feed that's a similar action to what GTFS ingestion already requires, but several orders of magnitude smaller. In principle a single instance of the grid could be cached, but at a few hundred bytes it's so tiny that I don't think it will have any impact.
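
Editorial illustration, not part of this PR: the reply mentions that a single instance of the grid could in principle be cached. One common way to do that in Java is the initialization-on-demand holder idiom, sketched below under stated assumptions. `CachedGridExample`, `loadGrid`, `sharedGrid`, and `loadCount` are hypothetical names, and a small `BitSet` stands in for the real `BooleanAsciiGrid`; nothing here reflects what gtfs-lib actually does.

```java
import java.util.BitSet;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical illustration only; this class does not exist in gtfs-lib. */
public class CachedGridExample {

    /** Counts how many times the stand-in loader runs, for demonstration. */
    public static final AtomicInteger loadCount = new AtomicInteger();

    /** Stand-in for reading the grid resource and parsing it into bits. */
    static BitSet loadGrid() {
        loadCount.incrementAndGet();
        BitSet bits = new BitSet(8);
        bits.set(3);
        return bits;
    }

    /**
     * The JVM initializes this nested class lazily and thread-safely on first
     * access, so loadGrid() runs at most once per class loader.
     */
    private static class Holder {
        static final BitSet INSTANCE = loadGrid();
    }

    /** Returns the shared instance; callers should treat it as read-only. */
    public static BitSet sharedGrid() {
        return Holder.INSTANCE;
    }

    public static void main(String[] args) {
        BitSet a = sharedGrid();
        BitSet b = sharedGrid();
        System.out.println(a == b);          // prints true
        System.out.println(loadCount.get()); // prints 1
    }
}
```

Whether this is worth doing depends on the trade-off described in the reply: the load is a one-off cost per feed, so caching mainly saves repeated file reads when many feeds are validated in one process.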

+        for (Stop stop : feed.stops.values()) {
+            if (!(popGrid.getValueForCoords(stop.stop_lon, stop.stop_lat))) {
+                feed.errors.add(new SuspectStopLocationError(stop.stop_id, stop.sourceFileLine));
+            }
+        }
+    }
+
     /**
      * Validate location_type and parent_station constraints as well as referential integrity.
      * Individual validation actions like this could be factored out into separate classes (PostLoadValidators)