Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Barefoot dtel design #182

Merged
merged 3 commits into from
Aug 4, 2018
Merged

Conversation

shruthi9
Copy link
Contributor

@shruthi9 shruthi9 commented May 1, 2018

Design document for Barefoot Dataplane Telemetry support in SONiC.

@msftclas
Copy link

msftclas commented May 1, 2018

CLA assistant check
All CLA requirements met.



## Testing

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we also need virtual switch test to test pure control plan functionality.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comments! I will add a VS test as well. Also, will create a separate test plan document and upload it as part of this PR


## euclid

euclid (Euclid Universal Configuration LIbrary for Dataplane telemetry)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this a open source project, does it have a website?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. But, we intend to upstream SONiC specific changes as part of this effort

* Value:
* Report session OID
* Reference count

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to follow up with the reference count discussion we had during the meeting:

In meeting, the reference count usage was described as:

  1. When reference count is not zero, the object is protected from deletion.
  2. When reference count drops to zero, a delete request will free the object.

I have concern for this approach. The danger is that a delete call at the right time is required to free the object. So the caller of delete will have to check if the object has been freed or denied of deletion. This makes code logic complicated and encourages opportunities for either memory leak or double free.

I think a safe approach to use reference count is following:

  • An object get reference count of 1 at creation time.
  • A delete request reduces the reference count by 1.
  • An association with another entity increases the reference count by 1.
  • Detaching from another entity decrease the reference count by 1.
  • Object is freed when the reference count drops to 0.

The advantage of this approach is that the delete can be called whenever the code decide to free. And the object will be freed when all the associations are gone. This makes code logical simpler.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comment! This makes sense, I will change the design accordingly


Components of SONiC that will be modified or newly added are discussed in the following sub-sections.

## Config DB

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think you should explain why you decided to have the A,B,C, etc in the table name as this is strange.
as discussed over the design review we should have better way in SONiC to force an order.

as for naming convention i beleive table has been removed once we moved to config_db. please double check

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback! Regarding using tables in config_db, if we remove them, we run into the problem where events are re-ordered upon config replay. Since, DTel has specific dependency requirements between events, can we keep the tables until SONiC has better infrastructure to control event ordering (within an orchagent)? Another option is to create two separate orch agents, one to handle pre-requisite config and another for dependent config, and order the init of these orch agents.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback! Regarding using tables in config_db, if we remove them, we run into the problem where events are re-ordered upon config replay. Since, DTel has specific dependency requirements between events, can we keep the tables until SONiC has better infrastructure to control event ordering (within an orchagent)?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@liatgrozovik - i think the question is whether this feature should bear the burden with resolving an issue that exists in sonic for all features? In spirit of making progress, we can have a short term solution as we did right now and adhere to long term solution when its designed properly and reviewed by community. Clearly issue of dependency / order of operations is something to be resolved for all features.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@liatgrozovik - @lguohan was making a few other suggestions to us on this topic, @shruthi9 can give summary on what she will implement


Most events are handled as shown in the sequence diagram below. Ones which are different are depicted in the following figures.

![Control flow](cflow-1.jpg)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to add handling of port stat events and handle cases where ports are not yet configured.
IMO the request should be queued and replayed when the port configuration is ready (port or lag).

1. At least one drop watchlist with drop report enabled

**Queue report specific:**
1. Reporting enabled on at least one queue with some threshold (latency or depth) set or tail drop enabled.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please note that we have both VS testing as well as ACL testbed which covers ACL and everlfow.
you will need to make sure those are not broken plus add more test cases to cover this.

i don't see any reference to what will happen if a user will try to define an ACL which is not supported.
As there is an ACL HLD already with flows, you should review it and probably update it with all the ACL changes/ additionals. I IMO this HLD should covers details on AC and flowsL. We should have one place and a reference from here.

```


## Testing

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

VirtualSwitch (a.k.a) VS testing should covers all swss parts including ACL changes as well as the new Dtel.
On top of that we have for each feature a testbed test which should cover system level tests and full functional validation with a real hw. this one should have dedicated test plan document and should be reviewed by community.
you have few examples of those in the github: crm, acl, etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will create a separate test plan document and upload it as part of this PR.

@xjtufjj
Copy link

xjtufjj commented Jan 15, 2019

Does anyone can tell me where is te euclid tool ? it is mentioned in dtel design spec, also where is the test files?

@dafzheng
Copy link

dafzheng commented Jan 18, 2022

Does anyone knows how to use euclid tool ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants