Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[TAM] Granular counter subscription #1670

Merged
merged 3 commits into from
Dec 18, 2023
Merged

Conversation

j-bos
Copy link
Contributor

@j-bos j-bos commented Dec 1, 2022

Enhancements to TAM API to enable a high-speed polling feature to repeatedly poll one or a few counters at short time intervals.

  • A new telemetry type for granular counter subscription to request specific counters at runtime.

  • Introduce SAI_TAM_COUNTER_SUBSCRIPTION object to represent subscribed counters and allow the user to associate a label for the counter in telemetry reports.

  • Add a time units configuration to specify units for the report interval. Default is backwards-compatible to the current units (microseconds).

@j-bos
Copy link
Contributor Author

j-bos commented Dec 9, 2022

Capturing here comments from review on SAI weekly call:

  • PERIODIC may not be needed. Use existing attribute for the counter polling interval.
  • Move document to .md format

inc/saitam.h Outdated Show resolved Hide resolved

| Node ID | Time-0 timestamp | Time-1 timestamp | Time-2 timestamp | Time-3 timestamp | Time-4 timestamp | Time-5 timestamp |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| Counter-ID 0 | Counter-0 at Time-0 | Counter-0 at Time-1 | Counter-0 at Time-2 | Counter-0 at Time-3 | Counter-0 at Time-4 | Counter-0 at Time-5 |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the mapping between the counter-id in the report to the actual SAI Object id?

How is the node-id mapped to a particular SAI switch instance?

Copy link
Contributor

@rck-innovium rck-innovium left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need capability APIs to expose:

  1. the timeslice interval that can be supported by the device
  2. the max number and types of counters that can be polled in one polling interval atomically

@j-bos j-bos changed the title [TAM] Periodic and Timeslice telemetry [TAM] Granular counter subscription Mar 30, 2023
@j-bos j-bos force-pushed the telemetry branch 3 times, most recently from 6680bf9 to ac2c323 Compare April 6, 2023 15:13
@j-bos
Copy link
Contributor Author

j-bos commented Apr 6, 2023

Significant Updates:

  • Decouple the telemetry report format from this subscription proposal. Existing report types like proto or ipfix may be used.
  • A counter label is required for the collector to identify a counter. Since this needs to be bound to a specific object (with object id), introduce a new object to represent the relationship.

Copy link
Contributor

@JaiOCP JaiOCP left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks Good

inc/saitam.h Outdated Show resolved Hide resolved
inc/saitam.h Show resolved Hide resolved
@kcudnik
Copy link
Collaborator

kcudnik commented Jun 29, 2023

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@rlhui
Copy link
Collaborator

rlhui commented Aug 8, 2023

@rck-innovium would you like to approve this or any concern/feedback?

sai_attr.value.oid = sai_tam_obj;

sai_set_queue_attribute_fn(
sai_queue,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo? Instead of sai_queue, it should be sai_queue_obj used as SAI_TAM_COUNTER_SUBSCRIPTION_ATTR_OBJECT_ID earlier. 

If yes, this is a redundancy in this new API model.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is a typo (more like a missed edit when adding _obj to follow the style of other examples).

Agree it is a redundancy. I also considered the alternative of just using the TAM attachment point at the switch as an on/off for a TAM defined for this kind of counter collection. What are your thoughts on that?

In particular, it might increase extensibility to other objects which do not currently have a defined TAM bind-point (and be a bit easier to use since the telemetry can be enabled/disabled all-at-once if desired (rather than object-by-object)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JaiOCP , I would like to get your thoughts on this as well. Whether we can/should just use an attachment to switch for this kind of telemetry.

@rck-innovium
Copy link
Contributor

@rck-innovium would you like to approve this or any concern/feedback?

@rlhui  @j-bos 

The current spec has changed quiet a bit from what was last discussed in the community. The comment about mapping the counter id in the report to the SAI OID has been addressed using the labels. 

However, the other changes including the new API needs more discussion.  I believe that the example workflow may have some typos and/or missing attributes. I want to ensure that there is no redundancy in the proposed API model. 

Typos and/or missing attributes in the given workflow:

  • SAI_TAM_TEL_TYPE_ATTR_COUNTER_SUBSCRIPTION_LIST is missing in the example workflow. Do we construct a TAM object that references the TAM counter subscription object?
  • SAI_TAM_COUNTER_SUBSCRIPTION_ATTR_OBJECT_ID   points to the target queue_obj_id. And later, we bind the TAM object to the queue_obj_id using SAI_QUEUE_ATTR_TAM.  The example uses different variable names for the queue OID. Is this a typo? If yes, we should avoid this bidirectional reference?

inc/saitam.h Show resolved Hide resolved
@kcudnik
Copy link
Collaborator

kcudnik commented Aug 17, 2023

please fix build errors

@j-bos
Copy link
Contributor Author

j-bos commented Aug 17, 2023

please fix build errors

Thanks, looks related to the git history scanning, will squash the commits.

inc/saitam.h Outdated Show resolved Hide resolved
inc/saitam.h Show resolved Hide resolved
inc/saitam.h Show resolved Hide resolved
Signed-off-by: Jason Bos <jbos@cisco.com>
Signed-off-by: Jason Bos <jbos@cisco.com>
Signed-off-by: Jason Bos <jbos@cisco.com>
@j-bos
Copy link
Contributor Author

j-bos commented Dec 6, 2023

  • SAI_TAM_COUNTER_SUBSCRIPTION_ATTR_OBJECT_ID   points to the target queue_obj_id. And later, we bind the TAM object to the queue_obj_id using SAI_QUEUE_ATTR_TAM.  The example uses different variable names for the queue OID. Is this a typo? If yes, we should avoid this bidirectional reference?

Removed the bidirectional reference as discussed over email. Documentation now indicates binding to the switch is used to enable/disable the feature in a simpler way after the subscriptions are created.

@rlhui rlhui merged commit 6f0550e into opencomputeproject:master Dec 18, 2023
3 checks passed
saiarcot895 added a commit to saiarcot895/sonic-sairedis that referenced this pull request Jan 26, 2024
The primary purpose of this is to bring in support for compiling on
Debian Bookworm.

This brings in the following changes:
* Update the Doxyfile for doxygen in Debian Bookworm (opencomputeproject/SAI#1946)
* Enable sai_uint16_t in ProcessStructValueType Struct Member (opencomputeproject/SAI#1949)
* [meta] Add support for port stat extensions (opencomputeproject/SAI#1947)
* [meta] Add custom range start end values check (opencomputeproject/SAI#1945)
* Cable diagnostics attribute added (opencomputeproject/SAI#1894)
* Add attributes to disable L3 rewrites (opencomputeproject/SAI#1924)
* Add MAC remote loopback to the port loopback enums. (opencomputeproject/SAI#1934)
* [TAM] Granular counter subscription (opencomputeproject/SAI#1670)

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
@j-bos j-bos deleted the telemetry branch March 7, 2024 08:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants