Filter analytics data by API Backend? #252

Closed
brylie opened this issue Jun 16, 2016 · 12 comments

Contributor

brylie commented Jun 16, 2016

We are building an analytics dashboard for API Umbrella and would like users to be able to filter the dashboard so that it shows analytics for a single API Backend only, for example by filtering on the API Umbrella backend ID.

How can we query Elasticsearch or the Admin API to retrieve the analytics for a single API Umbrella API Backend?

Member

GUI commented Jun 16, 2016

Unfortunately, we don't currently store the API Backend ID in the analytics database. But I can see some potential use-cases for that, so we'd welcome any pull requests, or we can see about adding it ourselves some day.

Although, if it helps, we approach filtering for specific APIs slightly differently in the default admin. We filter everything based on the URL host and path of the requests (which are already stored in the analytics). This is also how the default admin permissions with API Scopes work--you're granted permissions to a URL host and path prefix, and then the analytics are automatically filtered based on that root. There may be any number of logical APIs under a prefix, so to make the filtering more granular, you can add more specific API scopes.
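As a rough illustration of the URL-based filtering described above, the following sketch builds an Elasticsearch query body that restricts results to one host and path prefix. The field names `request_host` and `request_path`, the helper name, and the `bool`/`filter` query shape are assumptions made for illustration; the actual field names and query syntax depend on the API Umbrella and Elasticsearch versions in use.

```python
def url_prefix_filter(host, path_prefix):
    """Build a query body matching requests for one host and path prefix.

    Hypothetical helper: the field names below are assumed, not taken
    from the API Umbrella source.
    """
    return {
        "query": {
            "bool": {
                "filter": [
                    # Exact match on the request host.
                    {"term": {"request_host": host}},
                    # Any path that starts with the given prefix.
                    {"prefix": {"request_path": path_prefix}},
                ]
            }
        }
    }


query = url_prefix_filter("api.example.com", "/foo/")
```

A narrower prefix (for example `/foo/bar/`) would give a more granular filter, which mirrors the idea of adding more specific API scopes.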

We prefer filtering based on the URL, since we view API Backend IDs as an implementation detail that might change over time for the same API URL (API backends might be replaced or more specific API backends could be added that route sub-URLs differently). By basing the filtering on the URL we don't need to worry about potential API backend ID changes (or trying to track those over time).

Does that approach make sense? If so, does it seem like that could fit your use-case, or would you still prefer to filter on specific backend IDs for your purposes?

If you're looking to add the backend ID to the analytics storage, I think the main areas involved would be storing the matched API backend on an ngx.ctx variable, then adding that to the log message data, and then ensuring that gets pushed onto the data sent to elasticsearch.

bajiat commented Jun 22, 2016

Thanks for the reply @GUI! I guess your approach entails that a certain API URL can only be stored once in the database, or does it?

Member

GUI commented Jun 23, 2016

@bajiat: Are you referring to the analytics database or the API Backends database?

If you're referring to the analytics database, then each individual API request is logged as a separate entry to the elasticsearch database. So there can be lots of duplicate log entries for a single URL. We then perform aggregation queries to determine the total number of API hits. Filters can also be applied to those queries to find the totals for a more specific subset of the API hits (for example, you could filter to just view APIs where the URL path began with /foo/*, or you could filter to just view a specific API where the URL path equaled /some/specific/api/endpoint.json).
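To make the "one log entry per request, then aggregate" idea concrete, here is a small in-memory sketch. It does not query Elasticsearch; the sample log entries, the `request_path` field name, and both helper functions are hypothetical and only mimic the prefix and exact-match filters described above.

```python
from collections import Counter

# Each request is logged as its own entry, so the same URL appears repeatedly.
logs = [
    {"request_path": "/foo/a.json"},
    {"request_path": "/foo/a.json"},
    {"request_path": "/foo/b.json"},
    {"request_path": "/some/specific/api/endpoint.json"},
]


def count_hits(entries, prefix=None, exact=None):
    """Count log entries, optionally filtered by path prefix or exact path."""
    total = 0
    for entry in entries:
        path = entry["request_path"]
        if prefix is not None and not path.startswith(prefix):
            continue
        if exact is not None and path != exact:
            continue
        total += 1
    return total


def hits_by_path(entries):
    """Aggregate the duplicate log entries into per-URL totals."""
    return Counter(entry["request_path"] for entry in entries)
```

With this sample data, `count_hits(logs, prefix="/foo/")` counts the three requests under `/foo/`, while `count_hits(logs, exact="/some/specific/api/endpoint.json")` counts only the single request to that endpoint.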

If you're referring to the API Backends database, then generally speaking, yes, there would only be one backend per URL. Only a single API backend can be matched for a specific API URL (although which specific API backend is matched may change over time, if you add, edit, delete, or change the match order of API backends).

I'm not totally sure I understood the question, but does that help answer things? Let me know if not.

Contributor Author

brylie commented Jun 27, 2016

@GUI I am able to add two API backends with the same frontend prefix, directly via the API Umbrella UI. How would we distinguish between the API Backends that share the same frontend prefix?

Contributor Author

brylie commented Jun 27, 2016

Here are two screenshots showing separate API Backends with duplicate configuration:

Amazing API: [screenshot from 2016-06-27 10-27-56]

Amazing API Duplicate: [screenshot from 2016-06-27 10-30-30]

Contributor Author

brylie commented Jun 27, 2016

In both of those API Backends, the following attributes are the same:

  • Backend protocol
  • Server
  • Frontend host
  • Backend host
  • Frontend prefix
  • Backend prefix

In our analytics, how would we distinguish Amazing API from Amazing API Duplicate?

Member

GUI commented Jun 27, 2016

In the case of duplicate frontend prefixes, only one of those API backends would be matched and used. Which one is used would depend on the "Matching Order" of the API backend (by default, the first one added would be matched, unless you explicitly altered the matching order to give one of them higher matching precedence).
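The first-match behavior described above can be sketched as follows. This is a simplified model and not API Umbrella's actual routing code: the `Backend` class, its field names, and the `sort_order` attribute standing in for the "Matching Order" are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    frontend_host: str
    frontend_prefix: str
    sort_order: int  # hypothetical stand-in for the "Matching Order"


def match_backend(backends, host, path):
    """Return the first backend, in matching order, whose host and prefix match."""
    for backend in sorted(backends, key=lambda b: b.sort_order):
        if backend.frontend_host == host and path.startswith(backend.frontend_prefix):
            return backend
    return None


backends = [
    Backend("Amazing API", "example.com", "/amazing/", sort_order=1),
    Backend("Amazing API Duplicate", "example.com", "/amazing/", sort_order=2),
]
matched = match_backend(backends, "example.com", "/amazing/items")
```

In this model the second backend can never be matched while the first one has higher precedence, which is why requests cannot be attributed to it.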

So in the case of colliding routes, we don't have a way to distinguish between the API backends in the analytics data. But this specific situation might be more related to the need for better validation or warnings when there are duplicate routes: #239 & 18F/api.data.gov#186. Or do you have situations where those duplicate prefixes are expected?

bajiat commented Jun 27, 2016

@GUI Are you expecting that APIs are only added by particular organizations or persons so that you don't have to restrict duplicate backends? At the moment anyone can add an API backend to our database, so there is a high possibility for duplicates.

Contributor Author

brylie commented Jun 27, 2016

We need to prevent route collisions. For example, we are considering adding a uniqueness validation to ensure routes are unique in our database. Are there any plans for stricter validation in the API Umbrella schema?
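One possible shape for such a uniqueness check is sketched below. It is only an illustration: the dictionary keys `name`, `frontend_host`, and `frontend_prefix` and the function name are hypothetical, and the check only catches exact duplicates, not overlapping prefixes such as `/foo/` and `/foo/bar/`.

```python
from collections import defaultdict


def find_route_collisions(backends):
    """Return routes (host, prefix) that are claimed by more than one backend."""
    seen = defaultdict(list)
    for backend in backends:
        route = (backend["frontend_host"], backend["frontend_prefix"])
        seen[route].append(backend["name"])
    # Keep only the routes shared by two or more backends.
    return {route: names for route, names in seen.items() if len(names) > 1}


backends = [
    {"name": "Amazing API", "frontend_host": "example.com", "frontend_prefix": "/amazing/"},
    {"name": "Amazing API Duplicate", "frontend_host": "example.com", "frontend_prefix": "/amazing/"},
    {"name": "Other API", "frontend_host": "example.com", "frontend_prefix": "/other/"},
]
collisions = find_route_collisions(backends)
```

A validation step could reject a new backend whenever adding it would make this result non-empty.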

@KrishnaPG

If validation of backend routes is going to be supported, another case that could add value to the scenario @bajiat mentioned is:

  • verifying that the new route being added passes a generic validation regex (specified by the admin)

For example,

  • allow only routes from a particular domain or set of domains to be added
  • or reject domains or routes that match specific regex patterns (inappropriate sites or content)

The reasoning is that there is no point in taxing our gateway servers with routing traffic to, say, a Google API or some other public API. A regex that rejects any Google API domain route from being added to our gateway should ensure our servers only cater to the owners' needs.
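The allow/deny idea above could look roughly like the following sketch. The function name, its parameters, and the example patterns are all hypothetical; nothing here corresponds to an existing API Umbrella setting.

```python
import re


def validate_backend_host(host, allow_patterns=(), deny_patterns=()):
    """Return True if the backend host passes the admin-specified regexes.

    A host is rejected if it matches any deny pattern, or if allow patterns
    are given and it matches none of them.
    """
    if any(re.search(pattern, host) for pattern in deny_patterns):
        return False
    if allow_patterns and not any(re.search(pattern, host) for pattern in allow_patterns):
        return False
    return True


# Example patterns (assumptions): block one public API domain and its
# subdomains, or allow only hosts under a single organization's domain.
deny = [r"(^|\.)googleapis\.com$"]
allow = [r"(^|\.)example\.org$"]
```

For instance, `validate_backend_host("maps.googleapis.com", deny_patterns=deny)` rejects the host, while `validate_backend_host("api.example.org", allow_patterns=allow)` accepts it.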

GUI added a commit that referenced this issue Feb 8, 2017
See #252

This logs the API backend ID that was matched while serving the API
request, along with the ID of the more specific "url_matches" record
within that API.

These details might serve as a more efficient way to lookup logs that a
user has permissions to view based on IDs (versus the host and path
prefix based current approach). Although, switching to use this might be
tricky given our current approach to admin permissions (since IDs are a
little less flexible and we have to deal with historical IDs). But in
any case, let's start logging this so we can explore this and allow for
other use cases that have requested this functionality.

GUI added this to the v0.14.0 milestone Feb 8, 2017

Member

GUI commented Feb 8, 2017

Logging the API backend details (in the api_backend_id and api_backend_url_match_id fields) has been added in dfea879. This will be part of the forthcoming v0.14 release, which I'm hoping to finally get wrapped up in the next week.
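With those two fields logged, a dashboard could filter on the backend ID directly. The sketch below builds a query body for that purpose; the `api_backend_id` and `api_backend_url_match_id` field names come from the comment above, while the helper name and the `bool`/`filter` query shape are assumptions that may need adjusting for the Elasticsearch version in use.

```python
def backend_id_filter(api_backend_id, url_match_id=None):
    """Build a query body restricted to one API backend.

    Hypothetical helper: optionally narrows the filter further to a single
    "url_matches" record within that backend.
    """
    filters = [{"term": {"api_backend_id": api_backend_id}}]
    if url_match_id is not None:
        filters.append({"term": {"api_backend_url_match_id": url_match_id}})
    return {"query": {"bool": {"filter": filters}}}


query = backend_id_filter("backend-123")
```

Note that only requests logged after this change carry these fields, so older log entries would not match an ID-based filter.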

Member

GUI commented Feb 23, 2017

v0.14.0 has been released, which adds these additional backend details to the analytics database.
