
[CLI] Measure AsyncAPI adoption #841

Closed
1 of 2 tasks
Tracked by #879

smoya opened this issue Oct 10, 2023 · 23 comments

Comments

smoya (Member) commented Oct 10, 2023

Related to asyncapi/community#879

Scope

  • Define metrics to collect
  • Collect metrics in code
smoya (Member, Author) commented Oct 10, 2023

Bearing in mind that CLI command executions are stateless, short-lived processes, how are we going to keep the concept of a session so we can track consecutive executions of commands (let's call them funnels)?
Funnels can be tracked by using some correlation ID, but that ID has to be stored somewhere, on disk in particular.

The following questions should be answered sooner or later in order to keep track of those sequences of executions:

  1. What's the TTL (Time To Live) of each session? Is it fixed? Is it renewed on each interaction? Or do we not care about this and instead always use the same UUID unless it's not found (in which case we create a new one)?
  2. What do we use for generating the session ID? A UUID? Something based on a particular seed?
  3. Where should we store that session ID? The /tmp dir? Shall we also mention that to the users?

derberg (Member) commented Oct 11, 2023

I'm more interested in the vision on:

  • is it on by default?
  • how do we inform the user?
  • how easy is it to opt out?
  • and yes, we definitely need detailed documentation on what we do when collecting data and what data we collect, with code pointers

Regarding sessions: a UUID would definitely be useful to identify a "single" user. But why do we care about sessions? This is about AsyncAPI adoption, right? What do funnels have to do with this? For me this is a different topic.

btw, we also need a way to say "this is a CI/CD run"
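
A minimal sketch of how such a CI/CD flag could be derived, assuming the usual provider conventions (GitHub Actions and GitLab CI set CI=true; Jenkins exposes JENKINS_URL; isCiRun is a hypothetical helper name):

```typescript
// Hypothetical helper: flag CI/CD runs from the env vars most providers set.
export function isCiRun(): boolean {
  return process.env.CI === 'true'
    || process.env.GITHUB_ACTIONS === 'true'
    || Boolean(process.env.GITLAB_CI)
    || Boolean(process.env.JENKINS_URL);
}
```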

smoya (Member, Author) commented Oct 11, 2023

But why do we care about sessions? This is about AsyncAPI adoption, right? What do funnels have to do with this? For me this is a different topic.

Different topic from what, exactly?
A session is needed in order to keep track of the following (which can be found in the description of asyncapi/community#879):

"From all the users doing asyncapi validate successfully, 40% run asyncapi generate next, 20% run asyncapi validate again, and the rest simply stop there."

You need to correlate events somehow, right? You can use a correlation ID when sending those events to wherever they end up. That's the concept of a session as well.
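
As an illustration (a hypothetical event shape, not a final schema), two executions sharing the same correlation ID is all the backend needs to build that funnel:

```typescript
// Hypothetical event shape: the shared sessionId lets the metrics backend
// correlate "validate then generate" executions into a funnel.
interface CliEvent {
  action: string;     // e.g. 'validate', 'generate'
  success: boolean;
  sessionId: string;  // the correlation ID discussed here
  timestamp: number;  // Unix epoch seconds
}

const funnel: CliEvent[] = [
  { action: 'validate', success: true, sessionId: 'aaaa-1111', timestamp: 1696929000 },
  { action: 'generate', success: true, sessionId: 'aaaa-1111', timestamp: 1696929042 },
];
```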

smoya (Member, Author) commented Oct 11, 2023

I'm gonna share what I think about the following:

  • What's the TTL (Time To Live) of each session? Is it fixed? Is it renewed on each interaction? Or do we not care and instead always use the same UUID unless it's not found (in which case we create a new one)?

It is a requirement to send something that acts as a correlation ID when sending those events; otherwise we won't be able to link them. New Relic, for example, has the Funnels feature. See https://docs.newrelic.com/docs/query-your-data/nrql-new-relic-query-language/nrql-query-tutorials/funnels-evaluate-data-series-related-events/

I think the correlation ID should be replaced after a certain period of time, like a session that times out. Otherwise, results won't match reality: events could still be linked to each other even with a really big time gap between them, in fact until the correlation ID stored somewhere (/tmp?) is removed and we recreate it.

  • What do we use for generating the session ID? A UUID? Something based on a particular seed?

A UUID is OK. Either v1 or v4.

  • Where should we store that session ID? The /tmp dir? Shall we also mention that to the users?

I guess by creating a file in /tmp. The name of the file could always be the same (so it's reproducible) or rather change every TTL period. We should evaluate, but please keep it simple.
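
A minimal sketch of that approach, assuming a fixed file name in the OS temp dir and a 24h TTL (both values are placeholders, not decisions):

```typescript
import { randomUUID } from 'crypto';
import { existsSync, readFileSync, statSync, writeFileSync } from 'fs';
import { tmpdir } from 'os';
import { join } from 'path';

const SESSION_FILE = join(tmpdir(), 'asyncapi-cli-session'); // placeholder name
const SESSION_TTL_MS = 24 * 60 * 60 * 1000;                  // placeholder TTL

// Reuse the stored UUID while the file is younger than the TTL;
// otherwise mint a new one, which starts a new "session".
export function getSessionId(): string {
  if (existsSync(SESSION_FILE)) {
    const ageMs = Date.now() - statSync(SESSION_FILE).mtimeMs;
    if (ageMs < SESSION_TTL_MS) return readFileSync(SESSION_FILE, 'utf8').trim();
  }
  const id = randomUUID(); // UUID v4, as suggested above
  writeFileSync(SESSION_FILE, id);
  return id;
}
```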

derberg (Member) commented Oct 12, 2023

Different topic from what, exactly?

different topic from Measuring AsyncAPI adoption.

From all the users doing asyncapi validate successfully, 40% run asyncapi generate next, 20% run asyncapi validate again, and the rest simply stop there

this is cool info but:

  • what does it have to do with AsyncAPI adoption?
  • even though it is cool to know that people run generate after validation, it has nothing to do with adoption. The only advantage of such data that I can imagine is possible DX improvements (like if the majority of people run generation after validation, then maybe we should make it clear in the docs and explain that generation also does validation, so they do not have to do it 😃)

smoya (Member, Author) commented Oct 25, 2023

/progress 50 POC with the first metric is available in #859

smoya (Member, Author) commented Oct 25, 2023

Regarding the metrics to collect, I suggest we do a first iteration with just one metric that will track the execution of a few commands, carrying metadata based on the AsyncAPI document in place.
For the metadata, I would say the following fields could help us with our mission:

  • document_metadata:
    • _asyncapi_version
    • _asyncapi_servers
    • _asyncapi_channels
    • _asyncapi_messages
    • _asyncapi_operations_send
    • _asyncapi_operations_receive
    • _asyncapi_schemas

I'm gonna write a list of commands and their metrics that I consider a good starting point. Please note that wherever you see ...document_metadata, it means the metadata specified above is included.

| Command | Metric | Type | Metadata |
|---------|--------|------|----------|
| validate | asyncapi_adoption.action.executed | COUNT | ...document_metadata, action: validate, success: bool, validation_result: `valid |
| convert | asyncapi_adoption.action.executed | COUNT | ...document_metadata (after convert), action: convert, success: bool, from_version: string, to_version: string |
| optimize | asyncapi_adoption.action.executed | COUNT | ...document_metadata, action: optimize, success: bool, optimizations: string[] |
| generate fromTemplate | asyncapi_adoption.action.executed | COUNT | ...document_metadata, action: generate_fromTemplate, success: bool, template: string |
| bundle | asyncapi_adoption.action.executed | COUNT | ...document_metadata (from bundled), action: bundle, success: bool, files: number (number of bundled files) |
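
For the document_metadata column, a sketch of how those fields could be extracted, assuming the parser-js v2 intent API (allServers(), allChannels(), operation.action(), and so on):

```typescript
import { Parser } from '@asyncapi/parser';

// Sketch: derive the proposed document_metadata fields from a parsed document.
async function collectDocumentMetadata(raw: string) {
  const parser = new Parser();
  const { document } = await parser.parse(raw);
  if (!document) return undefined; // parsing failed, nothing to report

  const operations = document.allOperations();
  return {
    _asyncapi_version: document.version(),
    _asyncapi_servers: document.allServers().length,
    _asyncapi_channels: document.allChannels().length,
    _asyncapi_messages: document.allMessages().length,
    _asyncapi_operations_send: operations.filter(op => op.action() === 'send').length,
    _asyncapi_operations_receive: operations.filter(op => op.action() === 'receive').length,
    _asyncapi_schemas: document.allSchemas().length,
  };
}
```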

Feel free to suggest any change. The idea is that we can create a working POC, not yet released, reporting metrics somewhere (OK with New Relic) so we can create a dashboard and validate, after running a few commands locally, whether the metrics make sense.

cc @peter-rr @Amzani @derberg @fmvilas

fmvilas (Member) commented Oct 26, 2023

I like it. This set of initial metrics is a good way to test it 👍 I would also be interested in having metrics on which command is used (for every command). So as soon as it's invoked, we send the metric saying, e.g., "config has been used" (not literally these words, just the concept).

Once this is done, we can experiment with "session" metrics to understand things like a single user performing "validate" first and then "generate fromTemplate", for instance. In other words, anonymously identifying the user (machine) performing the action so we can group actions together and have a funnel/timeline.

smoya (Member, Author) commented Oct 30, 2023

I would also be interested in having metrics on which command is used (for every command). So as soon as it's invoked, we send the metric saying, e.g., "config has been used" (not literally these words, just the concept).

You mean when it is invoked but not yet finished, right?

fmvilas (Member) commented Oct 30, 2023

Yeah, I just want to answer the question "What are the most popular commands?". I don't care if it failed or succeeded. Just want to know it was called. So yeah, we can send the metric as soon as the command is invoked. We can probably put it here so it applies to every command: https://github.com/asyncapi/cli/blob/master/src/base.ts.
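
A rough sketch of what that hook could look like in base.ts, assuming the CLI's oclif base class (recordMetric is a hypothetical helper wrapping the metrics sink; this.id is oclif's command id):

```typescript
import { Command } from '@oclif/core';
import { recordMetric } from './metrics'; // hypothetical helper

// Every CLI command extends this base class, so the "invoked" metric
// fires in init(), before any command-specific logic runs.
export default abstract class BaseCommand extends Command {
  async init(): Promise<void> {
    await super.init();
    recordMetric('asyncapi_adoption.action.invoked', { action: this.id });
  }
}
```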

peter-rr (Member) commented:

I don't care if it failed or succeeded. Just want to know it was called. So yeah, we can send the metric as soon as the command is invoked.

Taking that into account, should we then omit (at this POC stage) sending the metadata regarding the result of the command and the info related to the command itself (like version converted, number of bundled files, number of optimizations, template used, etc.)?

smoya (Member, Author) commented Oct 30, 2023

Yeah, I just want to answer the question "What are the most popular commands?". I don't care if it failed or succeeded. Just want to know it was called. So yeah, we can send the metric as soon as the command is invoked. We can probably put it here so it applies to every command: https://github.com/asyncapi/cli/blob/master/src/base.ts.

Yes, this was mentioned during a 1:1 chat Peter and I had: add that code into base.ts and get the name of the command either from the class name, or add a property to all of the commands that specifies it and is accessible somehow.

I just wanted to confirm we are aligned. My concern is that, if we want to keep both the metric for starting the command and another for when the command finishes which includes results (success true/false, converted from/to, etc.), we would need to add a discriminator or rely on the property success (missing when just started, present if it finished, no matter whether it errored or not).

I think tracking finished executions is important because of the metadata extracted from the AsyncAPI documents once they are parsed (number of servers, channels, etc.).

fmvilas (Member) commented Oct 30, 2023

I definitely want to track the result as well. How we do it doesn't really matter much to me. For invocation we can use one metric, and for those containing results we can use another one. They don't have to be the same, unless you really want to keep it clean and neat and somehow relate one metric to another.
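
Continuing the base-class sketch above with that two-metric split; oclif's finally() runs whether the command succeeded or threw, so the executed metric can carry the result (again, recordMetric and metricsMetadata are hypothetical names):

```typescript
import { Command } from '@oclif/core';
import { recordMetric } from './metrics'; // hypothetical helper

export default abstract class BaseCommand extends Command {
  // Commands can stash result details (template, from/to versions, ...) here.
  protected metricsMetadata: Record<string, unknown> = {};

  async init(): Promise<void> {
    await super.init();
    recordMetric('asyncapi_adoption.action.invoked', { action: this.id });
  }

  // finally() runs on success and on error alike; `success` only exists on
  // the executed metric, so the two metric names act as the discriminator.
  async finally(error: Error | undefined): Promise<void> {
    recordMetric('asyncapi_adoption.action.executed', {
      action: this.id,
      success: error === undefined,
      ...this.metricsMetadata,
    });
    return super.finally(error);
  }
}
```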

smoya (Member, Author) commented Nov 3, 2023

/progress 45 POC is getting more metrics on the CLI. The New Relic sink is ready to be used to test that the metrics collection works and makes sense.

smoya (Member, Author) commented Nov 13, 2023

/progress 60 Tested the whole integration with the New Relic sink by executing the CLI locally. After finding several bugs in the implementation, we finally ended up sending metrics to a new temporary New Relic account, confirming everything now works as expected.

peter-rr (Member) commented Nov 14, 2023

/progress 65 Working now on the implementation of a new metric in order to collect the "most popular commands", as mentioned above, regardless of whether the command failed or succeeded.

peter-rr (Member) commented:

/progress 80 New metric action.invoked already implemented. Now working on unifying the logic so that the metric is collected for every CLI command when invoked.

peter-rr (Member) commented Dec 20, 2023

/progress 90 The message warning users about metrics collection is already implemented. Now working on unifying the logic so that the metric action.executed is collected for every CLI command after it has been executed.

peter-rr (Member) commented Jan 17, 2024

Metrics action.invoked and action.executed are already collected in code. PR #859 is ready for review.
cc @smoya

Next actions to be completed:

  • Test all commands on local env.
  • Test all commands on production (New Relic Metrics API).
  • Documentation: include metrics collection info in the CLI repo's README file
  • Set a timeout for the New Relic API connection (see the sketch after this list): feat: set timeout to newrelic's request smoya/asyncapi-adoption-metrics#19
  • Extra: performance tests to check the difference between runtimes when metrics collection is enabled and disabled.
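
For the timeout item above, a sketch using AbortController against New Relic's Metric API endpoint (the 3-second value is an assumption, not necessarily what the linked PR uses):

```typescript
// Abort the Metric API request if it takes longer than the timeout, so a
// slow or unreachable sink never blocks the CLI command itself.
async function sendToNewRelic(payload: unknown, apiKey: string): Promise<void> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), 3000); // assumed timeout
  try {
    await fetch('https://metric-api.newrelic.com/metric/v1', {
      method: 'POST',
      headers: { 'Api-Key': apiKey, 'Content-Type': 'application/json' },
      body: JSON.stringify(payload),
      signal: controller.signal,
    });
  } finally {
    clearTimeout(timer); // always clear the timer, even if the request failed
  }
}
```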

peter-rr (Member) commented Feb 13, 2024

/progress 95

Metrics are already working correctly in both local and production (New Relic) environments, and a markdown document about metrics collection has been created. Now working on a DX-friendly solution like asyncapi config analytics off|on to disable tracking.
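
A minimal sketch of that toggle, assuming a JSON flag file in the user's home directory (the file name and location are assumptions, not confirmed details of #859):

```typescript
import { readFileSync, writeFileSync } from 'fs';
import { homedir } from 'os';
import { join } from 'path';

const CONFIG_FILE = join(homedir(), '.asyncapi-analytics'); // assumed path

// Backing logic for "asyncapi config analytics off|on".
export function setAnalytics(enabled: boolean): void {
  writeFileSync(CONFIG_FILE, JSON.stringify({ analyticsEnabled: enabled }));
}

// Metrics code checks this before sending anything; default is enabled
// until the user explicitly opts out.
export function analyticsEnabled(): boolean {
  try {
    return JSON.parse(readFileSync(CONFIG_FILE, 'utf8')).analyticsEnabled !== false;
  } catch {
    return true;
  }
}
```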

peter-rr (Member) commented Mar 7, 2024

/progress 99 New command asyncapi config analytics already implemented. Also, metadata related to the user ID and AsyncAPI file identification (SHA-256) has been included. Just missing the final review for #859.
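
The SHA-256 file identification presumably works along these lines: hash the document content so the same file always yields the same anonymous ID, without the document itself ever leaving the machine (a sketch, not the merged code):

```typescript
import { createHash } from 'crypto';
import { readFileSync } from 'fs';

// A stable, anonymous identifier for an AsyncAPI file: identical content
// always hashes to the same hex digest.
export function fileId(path: string): string {
  return createHash('sha256').update(readFileSync(path)).digest('hex');
}
```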

peter-rr (Member) commented:

/progress 100 POC for measuring adoption already merged.

peter-rr (Member) commented:

I think we can close this issue since #859 has already been merged.
cc @smoya @Amzani
