Editorial and formatting changes for clarity #120

Merged
merged 5 commits into from
Feb 6, 2019

antrim
Contributor

@antrim antrim commented Nov 20, 2018

Motivation: Inconsistencies in the GTFS reference make it more difficult for newcomers to understand the specification and build software. Our motivation is to make GTFS clearer and more consistent.

Summary: This PR makes the following editorial and formatting changes to the GTFS specification. With one possible exception (discussed below), this pull request changes only the language of the GTFS reference and does not change any of its meanings.

List of changes:

  • Standardized the choice of certain words
    • passenger/customer --> rider (Except in the case of “customer service”)
    • Blank --> empty
    • Feed vs dataset
    • Field vs field value
    • Transit Organization --> transit agency
    • Note: organization is still used for the group that publishes the feed
    • Itinerary --> "Rider Journey" [note: modified Dec 13]
  • Made editorial changes to enhance clarity and legibility, including removing unnecessary and repetitive language. Major rewrites of:
    • parent_station
    • stop_timezone
    • arrival_time and departure_time
    • Monday,...,Sunday
    • timepoint
    • exact_times
  • Formatting changes
    • Added code ticks to denote fields and other code values
    • Standardized Enum option formatting
    • Added hyperlinks to all .txt files and websites
    • Put field types in alphabetical order.
    • Different type setting for examples
  • Defined or redefined the following terms:
    • Dataset
    • Field Value
    • Record
    • ID
    • Added referencing ID

Background discussion is in the original (@MobilityData) draft pull request: MobilityData#8

* Editorial and formatting changes

* Delete .gitignore

* Added suggested changes

* Minor updates suggested by Leo
@googlebot
Collaborator

We found a Contributor License Agreement for you (the sender of this pull request), but were unable to find agreements for all the commit author(s) or Co-authors. If you authored these, maybe you used a different email address in the git commits than was used to sign the CLA (login here to double check)? If these were authored by someone else, then they will need to sign a CLA as well, and confirm that they're okay with these being contributed to Google.
In order to pass this check, please resolve this problem and have the pull request author add another comment and the bot will run again. If the bot doesn't comment, it means it doesn't think anything has changed.

@antrim changed the title from "Squashing Line-Edits (#13)" to "Editorial and formatting changes for clarity" Nov 20, 2018
@antrim
Contributor Author

antrim commented Nov 20, 2018

Earlier, we experimented with defining the term feed in relation to dataset:

  • Dataset - A complete set of files defined by this specification reference. Datasets should be published at a public, permanent URL, including the zip file name. (e.g., www.agency.org/gtfs/gtfs.zip). For more information see GTFS best practices.
  • Feed - Successive datasets comprise a feed.

The above definition of feed was not included in this pull request. The changes in this pull request remove ambiguous uses of the word feed.

Questions:

  • What do you or your organization believe is the correct definition of feed? How do you use the term?
  • Should we include a definition of feed in the GTFS reference? Or leave that out of the reference?

@barbeau barbeau added the GTFS Schedule Issues and Pull Requests that focus on GTFS Schedule label Nov 27, 2018
@skinkie
Contributor

skinkie commented Nov 29, 2018

While I would personally rather see the documentation written in
Transmodel terms, I can see that this does not work because of the header
choices in GTFS. Still, nothing is wrong with making the documentation
better than the headers. So from your list I would make, for example:

Feed vs dataset

Publication

Transit Organization --> transit agency

Operator

For some changes I think this goes in the wrong direction, specifically:

Itinerary --> Journey (this matches Fares proposal)

These mean two totally different things in my opinion, and would only
confuse more people. I would like to see a better proposal for this that
would not confuse Itinerary with ServiceJourney =/= GTFS Trip.

@antrim
Contributor Author

antrim commented Nov 30, 2018

Thanks for your review. Here are responses.

Operator

I'm hesitant to use the word operator because "transit agency" has historically referred to service brand which is sometimes distinct from the operator. For example, RATP operates Capital MetroBus in Austin TX but we don't want to display "RATP" to riders in Austin because it doesn't mean anything to them.

Publication

I worry that "publication" has a very general meaning in English. As an alternative to dataset, what do you think of "feed iteration"?

Itinerary --> Journey

I am guessing that "trip" in GTFS is equivalent to "ServiceJourney" in Transmodel? Would "journey" confuse those who are familiar with Transmodel terms?

@googlebot
Collaborator

So there's good news and bad news.

👍 The good news is that everyone that needs to sign a CLA (the pull request submitter and all commit authors) have done so. Everything is all good there.

😕 The bad news is that it appears that one or more commits were authored or co-authored by someone other than the pull request submitter. We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that here in the pull request.

Note to project maintainer: This is a terminal state, meaning the cla/google commit status will not change from this state. It's up to you to confirm consent of all the commit author(s), set the cla label to yes (if enabled on your project), and then merge this pull request when appropriate.

@giocorti
Contributor

I'm the commit author and I'm okay with contributing.

@skinkie
Contributor

skinkie commented Nov 30, 2018

I'm hesitant to use the word operator because "transit agency" has historically referred to service brand which is sometimes distinct from the operator. For example, RATP operates Capital MetroBus in Austin TX but we don't want to display "RATP" to riders in Austin because it doesn't mean anything to them.

This discussion is really great, because it shows that we are not interested in an operator or agency, but instead in the brand. Would you agree that we should use brand instead?

Publication

I worry that "publication" has a very general meaning in English. As an alternative to dataset, what do you think of "feed iteration"?

For NeTEx a delivery starts with:

<PublicationDelivery xmlns:mstns="http://www.netex.org.uk/netex" xmlns="http://www.netex.org.uk/netex" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:gml="http://www.opengis.net/gml/3.2" version="1.0">

I would say let's not try to find synonyms or in-betweens.

Itinerary --> Journey

I am guessing that "trip" in GTFS is equivalent to "ServiceJourney" in Transmodel?

Exactly.

Would "journey" confuse those who are familiar with Transmodel terms?

It again depends on what you are trying to communicate here. Do you mean all the legs that make up the entire journey the rider takes to go from A to Z via B, C, D etc. or do you mean the leg that could be walking or a specific ServiceJourney/Trip that goes from A to B, B to C, etc.

This is why we need an unambiguous shared vocabulary.

@antrim
Contributor Author

antrim commented Dec 6, 2018

…we are not interested in an operator or agency, but instead: the brand. Would you agree that we should use brand instead?

This is an area where the spec is ambiguous, but in practice agency_name generally means "transit brand". We might go with either "brand" or "transit brand". This would be more true to practice, but might necessitate changing how other data elements are presented in relation to agency.txt. For example, "Timezone where the transit brand is located" reads oddly compared to "Timezone where the transit agency is located." Even though "transit agency" isn't precisely accurate, it does follow the terminology embedded in the spec's file and field names.

How about we alter the text to reference "brand" where that will make sense? Addressing branding completely will require changes to the spec.

Do you mean all the legs that make up the entire journey the rider takes to go from A to Z via B, C, D etc. or do you mean the leg that could be walking or a specific ServiceJourney/Trip that goes from A to B, B to C, etc.

By "journey" we mean all the legs that make up the entire journey a rider takes to go from A to Z via B, C, D, etc. As you point out this is potentially confusing. I propose we use the term "rider journey" to denote the entire trip from A to Z.

Publication

I'm opposed to the term "publication" because I don't think it means anything to most existing GTFS users. Of course I may be wrong about that and we should involve some more people in this discussion.

@giocorti
Contributor

giocorti commented Dec 13, 2018

How about we alter the text to reference "brand" where that will make sense? Addressing branding completely will require changes to the spec.

IMO "transit brand" is more technically correct than "transit agency" but, because agency is written into many field names, I think we lose more than we gain by shoehorning "transit brand" in. I should also note that MobilityData plans to add transit branding as a fully featured GTFS extension. So, for now, I propose we leave "transit agency" as it is and fix this issue with the transit branding extension.

I've also replaced "journey" with "rider journey"

changed "journey" to "rider journey"
@antrim
Contributor Author

antrim commented Dec 13, 2018

Note @giocorti's earlier comment was edited. We propose to leave "transit agency" in for now, and later address questions of agency branding holistically. This follows the purpose of this spec modification -- to provide greater clarity without making substantive changes in meaning.

If there are no other comments, I'd like to call the vote tomorrow.

@antrim
Contributor Author

antrim commented Dec 19, 2018

I'd like to call the vote on this change. This PR makes editorial and formatting changes to the GTFS specification. See the opening comment for a summary and motivation.

The vote will close on Dec 26 at 23:59:59 UTC.

@skinkie
Contributor

skinkie commented Dec 19, 2018

-1

As mentioned before, there is still too much that is unclear, and we are again introducing new terms for subjects that have been heavily standardised. Obviously this isn't all black and white, but some changes are more controversial than others.

@antrim
Contributor Author

antrim commented Dec 19, 2018

@skinkie: Which changes can we remove or alter to get your support for the proposal?

@skinkie
Contributor

skinkie commented Dec 20, 2018

@antrim As we have discussed before, "transit agency" is, at this moment, what it is described as. But changing itinerary to "Rider Journey" just becomes newspeak.

@antrim
Contributor Author

antrim commented Dec 20, 2018

@skinkie If we reverted back to "itinerary" instead of "rider journey" would that solve your concern?

@skinkie
Contributor

skinkie commented Dec 20, 2018

And add that brands are at this moment encoded as different agencies: sure.

@giocorti
Contributor

I've made the requested changes. "Itinerary" has been kept and "brand" vs "agency" is now discussed.

I should also note that we've defined the term "service day" in this PR, and this was accidentally omitted from the initial change list.

@skinkie
Contributor

skinkie commented Dec 21, 2018

@giocorti thanks for this effort, I really appreciate it.

With respect to service day, we currently define a service_id that is never used in the context of the words "calendar" or "calendar_dates". It suggests that it is some abstract grouping of both.

My preference would be that it wouldn't be called service day but operational day.

But I do want to ask another question, please don't take this as an offence. Where do you get your inspiration to get to these terms? For example compare the search queries:

  • "operational day" public transport
  • "service day" public transport

@giocorti
Contributor

@skinkie thanks for the feedback, and no offense taken at all!

I'm using the term "service day" because it's already in the spec. I've specifically added it as a defined term because it appears in a number of places and could be confusing, as it doesn't correspond to an actual day (service days can be greater than 24 hours). So really the definition is just to make it explicit and obvious that a service day is not the same as a day.
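As an illustration of why a service day is not a calendar day, here is a minimal Python sketch. It assumes times are written as HH:MM:SS and may run past 24:00:00 for trips that continue after midnight on the same service day. The function name `gtfs_time_to_seconds` is hypothetical and not part of any GTFS tooling.

```python
def gtfs_time_to_seconds(value: str) -> int:
    """Convert an HH:MM:SS (or H:MM:SS) time to seconds since the service day began.

    Hours are deliberately NOT capped at 23: a value such as 25:35:00
    means 1:35 a.m. on the calendar day after the service day started.
    """
    # Unpacking raises ValueError if the value does not have three parts.
    hours, minutes, seconds = (int(part) for part in value.strip().split(":"))
    if hours < 0 or not (0 <= minutes < 60) or not (0 <= seconds < 60):
        raise ValueError(f"not a valid service-day time: {value!r}")
    return hours * 3600 + minutes * 60 + seconds


# A trip that departs after midnight but still belongs to the previous service day.
late_trip = gtfs_time_to_seconds("25:35:00")
print(late_trip > 24 * 3600)  # the time lies beyond a 24-hour calendar day
```

Keeping the value as an offset from the start of the service day, rather than converting it to a clock time, avoids silently wrapping 25:35:00 back to 01:35:00.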

But you bring up a good point about "service day" vs "operational day". Unless someone else is opposed, I think that this is a change that should be incorporated.

@skinkie
Contributor

skinkie commented Dec 21, 2018

Please allow some time for feedback on the service day subject. If it is already used somewhere in the spec, I am not opposed to using the term here. What I am eager to learn is where these terms originate from and where they started to deviate (etymology).

@giocorti
Contributor

What I am eager to learn is where these terms originate from and where they started to deviate (etymology).

My inspiration for these terms just comes from the general (American) English lexicon. In general I've just tried to use the most accurate and understandable word in order to minimize confusion for someone reading the reference. In some cases (such as "record") I've also consulted what is, IMO, the appropriate technical literature. I should explicitly state that I am not trying to mirror the terminology defined by some outside party or spec. Rather, I'm trying to make the reference as understandable and consistent as possible to a wide variety of readers. I admit that, as a native American English speaker and someone who is not well versed in other specs such as Transmodel, I do exhibit a linguistic bias towards vernacular American English.

In some cases, such as empty vs blank, there is no real reason to choose one word over the other. I just wanted a single word to be consistently used.

I'm more than happy to discuss my reasons for choosing specific words if there are any you're wondering about.

@antrim
Contributor Author

antrim commented Dec 23, 2018

How about we call the vote after the new year, since many people will be away for the holidays? This will let everyone weigh in and allow a discussion of the terminology. We have 30 days from the last vote to continue working on the proposal.

@antrim
Contributor Author

antrim commented Jan 3, 2019

I am calling a second vote on this change, which @giocorti updated (a48aa36 Dec 20) after it failed to pass. This PR makes editorial and formatting changes to the GTFS specification. See the opening comment for a summary and motivation.

The vote will close on Jan 10 [updated] at 23:59:59 UTC.

@skinkie
Contributor

skinkie commented Jan 3, 2019

+1

@LeoFrachet
Contributor

Good catch @prhod. Thanks.

We definitely do not intend to change this behavior. It has been discussed in the past and the conclusion was to keep it as it is. I don't want to re-open this discussion in this thread.

@giocorti Could you have a look at that? I see you define ID type as "A sequence of any UTF-8 characters which uniquely identifies an entity, but does not necessarily identify a specific record in a table." But is there a place where you define uniqueness?

@skinkie
Contributor

skinkie commented Jan 9, 2019

@LeoFrachet while not intended, I don't mind giving my support for this great idea of non-continuous service dates ;-)

@giocorti
Contributor

giocorti commented Jan 9, 2019

Uniqueness is not explicitly defined anywhere in the spec. I've actually removed language that specified that IDs were "dataset unique" where "dataset unique" was defined as

Dataset Unique - The field contains a value that maps to a single distinct entity within the column.

I removed that language because it was factually incorrect in some cases. For example, shape_id in shapes.txt does not identify a distinct entity within a column.
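To make the shape_id point concrete, here is a small Python sketch. The sample rows and the helper `count_id_occurrences` are invented for illustration; only the file name shapes.txt and the field name shape_id come from the spec.

```python
import csv
import io
from collections import Counter

# Hypothetical sample data: one shape is described by several rows (points).
SHAPES_TXT = """shape_id,shape_pt_lat,shape_pt_lon,shape_pt_sequence
A_shp,37.61956,-122.48161,1
A_shp,37.64430,-122.41070,2
B_shp,37.65863,-122.30839,1
"""


def count_id_occurrences(csv_text: str, id_field: str) -> Counter:
    """Count how many records carry each value of id_field."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[id_field] for row in reader)


counts = count_id_occurrences(SHAPES_TXT, "shape_id")
# IDs that appear on more than one record, so they cannot identify a single row.
repeated = sorted(value for value, n in counts.items() if n > 1)
print(repeated)
```

In this sample, the ID still identifies one entity (a shape), but that entity spans several records, which is the distinction the revised ID definition tries to capture.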

Of course, uniqueness is an important concept in GTFS, so we may want to define it. But we'd also need to be careful about the exact language we use to do so, as it's easy to write something that isn't entirely accurate.

@skinkie
Contributor

skinkie commented Jan 9, 2019

@giocorti the point with service_id is that in calendar_dates.txt it is defined multiple times.

Added clarity and specificity by explicitly stating that a service_id may appear only once in calendar.txt.
@LeoFrachet
Contributor

@giocorti updated the proposal to specify that service_id can appear only once in calendar.txt. Since the proposal changed, I assume (@barbeau?) that we have to reboot the vote.
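The rule described here could be checked along the following lines. This is a sketch under assumptions: the sample rows and the helper `duplicated_values` are hypothetical, and it only illustrates that a service_id may repeat in calendar_dates.txt but not in calendar.txt.

```python
import csv
import io
from collections import Counter

# Hypothetical sample data for illustration only.
CALENDAR_TXT = """service_id,monday,tuesday,wednesday,thursday,friday,saturday,sunday,start_date,end_date
WD,1,1,1,1,1,0,0,20190101,20191231
WE,0,0,0,0,0,1,1,20190101,20191231
"""

CALENDAR_DATES_TXT = """service_id,date,exception_type
WD,20190704,2
WD,20191225,2
WE,20190704,1
"""


def duplicated_values(csv_text: str, field: str) -> list:
    """Return the values of `field` that occur on more than one record."""
    rows = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(row[field] for row in rows)
    return sorted(value for value, n in counts.items() if n > 1)


# In calendar.txt each service_id may appear only once, so this should be empty.
calendar_duplicates = duplicated_values(CALENDAR_TXT, "service_id")
# In calendar_dates.txt the same service_id may appear on many dates.
calendar_dates_repeats = duplicated_values(CALENDAR_DATES_TXT, "service_id")

print(calendar_duplicates, calendar_dates_repeats)
```

A validator built this way would report an error only for a non-empty `calendar_duplicates`; repeats in calendar_dates.txt are expected.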

@barbeau
Collaborator

barbeau commented Jan 10, 2019

@LeoFrachet I agree - I think that change is significant enough it requires a re-vote.

@LeoFrachet
Contributor

LeoFrachet commented Jan 10, 2019

I am calling a third vote (!) on this change, which @giocorti updated (c7f18ba Jan 20) during the second vote. This PR makes editorial and formatting changes to the GTFS specification. See the opening comment for a summary and motivation.

The vote will close on Jan 17 at 23:59:59 UTC.

For the four who had already cast their ballots, please revote: @skinkie, @abyrd, @barbeau, @prhod.

@barbeau
Collaborator

barbeau commented Jan 10, 2019

+1

@skinkie
Contributor

skinkie commented Jan 10, 2019

+1

@prhod

prhod commented Jan 12, 2019

+1 (Thanks for the clarification). Let's keep the non-continuous service dates for later ;-)

@abyrd

abyrd commented Jan 16, 2019

Yes, good catch @prhod. +1

@dbabramov
Contributor

+1

@antrim
Contributor Author

antrim commented Jan 19, 2019

The vote closed on Jan 17 at 23:59:59 UTC. We have 5 votes in favor of the change, from both GTFS producers and consumers, and no votes against. So this change passes. We'll get this merged.

@LeoFrachet
Contributor

Nice!

Aaron (@antrim), since all the commits are from Giovanni (@giocorti), may I let you handle the CLA issues?

Thanks

@antrim
Contributor Author

antrim commented Jan 21, 2019

@LeoFrachet Yes.

@antrim
Contributor Author

antrim commented Jan 21, 2019

@googlebot. The pull request author (@giocorti) confirmed they are ok with contributing in this comment: #120 (comment)

It looks as though that is the last step we need to pass the checks.

@googlebot
Collaborator

A Googler has manually verified that the CLAs look good.

(Googler, please make sure the reason for overriding the CLA status is clearly documented in these comments.)

@googlebot
Collaborator

So there's good news and bad news.

👍 The good news is that everyone that needs to sign a CLA (the pull request submitter and all commit authors) have done so. Everything is all good there.

😕 The bad news is that it appears that one or more commits were authored or co-authored by someone other than the pull request submitter. We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that here in the pull request.

Note to project maintainer: This is a terminal state, meaning the cla/google commit status will not change from this state. It's up to you to confirm consent of all the commit author(s), set the cla label to yes (if enabled on your project), and then merge this pull request when appropriate.

@googlebot googlebot added cla: no and removed cla: yes labels Feb 5, 2019
@barbeau
Collaborator

barbeau commented Feb 5, 2019

I just updated the revision history in 61c15c5 and it looks like that confused the CLA bot again. I'm ok with my commits being contributed to this project (obviously).

@googlebot
Collaborator

A Googler has manually verified that the CLAs look good.

(Googler, please make sure the reason for overriding the CLA status is clearly documented in these comments.)

@barbeau barbeau merged commit 6a1e439 into google:master Feb 6, 2019
@barbeau barbeau deleted the Line-Edits-Squashed branch February 6, 2019 14:18
Labels
GTFS Schedule Issues and Pull Requests that focus on GTFS Schedule

10 participants