Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: config-linux: add an initial spec of RDMA related field #930

Closed
wants to merge 1 commit into from

Conversation

mitake
Copy link

@mitake mitake commented Oct 17, 2017

This commit adds an initial spec of RDMA related field for Linux
configuration. They will be used for controlling rdmacg of the Linux
kernel.

Signed-off-by: Hitoshi Mitake mitake.hitoshi@lab.ntt.co.jp

@mitake mitake changed the title config-linux: add an initial spec of RDMA related field RFC: config-linux: add an initial spec of RDMA related field Oct 17, 2017
@mitake
Copy link
Author

mitake commented Oct 17, 2017

/cc @cyphar this is the spec side PR of opencontainers/runc#1612. The main motivation is multiplexing RDMA network interface safely by protecting RDMA resources from buggy applications. Our current target is deep learning framework e.g. TensorFlow and MXNet.

Should I update other files e.g. https://github.com/opencontainers/runtime-spec/blob/master/schema/config-linux.json ?

config-linux.md Outdated
@@ -637,6 +637,30 @@ The following parameters can be specified to set up seccomp:
"mountLabel": "system_u:object_r:svirt_sandbox_file_t:s0:c715,c811"
```

### <a name="configLinuxRdmaLimits" />RDMA resource limits

**`rdma_limits`** (array of objects, OPTIONAL) represents the `rdma` controller which allows to limit the

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggest: "which allows to limit the" to "which allows the limiting of the"

config-linux.md Outdated

**`rdma_limits`** (array of objects, OPTIONAL) represents the `rdma` controller which allows to limit the
resources related to network interfaces which support RDMA (e.g. HCA handles).
For more information, see the RDMA section of [the kernel cgroups documentation][cgroup-v2].
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To show that RDMA is not v2-specific, we may want to link both the v1 and v2 docs.

@mitake
Copy link
Author

mitake commented Oct 19, 2017

Updated based on the comment, could you take a look?

Copy link
Contributor

@wking wking left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks reasonable to me. I've left a few copy-edit suggestions inline, but I'm fine with this PR even if none of those happen.

@crosbymichael may suggest using camelCase for consistency with the rest of the Linux spec (more discussion on this here). But I'll let a maintainer champion that f they're interested, and am fine with this landing either way.

config-linux.md Outdated
### <a name="configLinuxRdmaLimits" />RDMA resource limits

**`rdma_limits`** (array of objects, OPTIONAL) represents the `rdma` controller which allows the limiting of the
resources related to network interfaces which support RDMA (e.g. HCA handles).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't follow the one sentence per line policy.

Also maybe rephrase to the more compact:

rdma_limits (array of objects, OPTIONAL) limits resources for network interfaces which support RDMA (e.g. HCA handles).

config-linux.md Outdated

**`rdma_limits`** (array of objects, OPTIONAL) represents the `rdma` controller which allows the limiting of the
resources related to network interfaces which support RDMA (e.g. HCA handles).
For more information, see the RDMA section of [the kernel cgroups documentation][cgroup-v1-rdma].
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're just linking the v1 docs, you don't need “the RDMA section of”, because the whole page is about RDMA. How about:

For more information, see the [the kernel RMDA controller documentation][cgroup-v1-rdma].

or:

For more information, see the the kernel RMDA controller documentation for [v1][cgroup-v1-rdma] and [v2][cgroup-v2] cgroups.

config-linux.md Outdated

Each entry has the following structure:

* **`interface_name`** *(string, REQUIRED)* - a name of an interface
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

config_linux.md isn't consistent on this, but config.md is pretty consistent in not wrapping the type/requirement parenthetical in *. That would make this line:

* **`interface_name`** (string, REQUIRED) - a name of an interface

Because of the current config-linux.md inconsistency, I'd be fine with you using wrapping * or not in this commit, although you should probably be internally consistent. Can you drop them here and below, or keep them here and in the rdma_limits entry above?

config-linux.md Outdated
Each entry has the following structure:

* **`interface_name`** *(string, REQUIRED)* - a name of an interface
* **`hca_handle_limit`** *(int64, REQUIRED)* - limit a number of HCA handles which can be used by a container (-1 means max)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe follow the kernel wording and use:

* **`hca_handle_limit`** (int64, REQUIRED) - Maximum number of HCA Handles (use `-1` to request the maximum available).

I think talking about “maximum number of” is less ambiguous than “limit a number of”, although I can't put my finger on why, so I'm also fine leaving this as it stands.

@wking
Copy link
Contributor

wking commented Oct 19, 2017

Also, this should be tagged 1.Y.0 or 1.1.0 or some such, because it's adding functionality and therefore is not a patch-level change.

@mitake
Copy link
Author

mitake commented Oct 26, 2017

@wking updated for your comments, could you take a look?

Copy link
Contributor

@wking wking left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still looks good to me. I've left one very minor copy-edit suggestion inline, but I'm fine with this PR even if that suggestion isn't addressed.

With the Markdown looking good, this PR could grow JSON Schema and Go entries, but I'm also fine leaving that to follow-up work (although it would be good to have them landed before we cut the minor bump containing these Markdown changes).

The Travis failure looks like a network hiccup.

config-linux.md Outdated

Each entry has the following structure:

* **`interface_name`** (string, REQUIRED) - a name of an interface
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: The spec isn't currently consistent on whether single-phrase list entries should end in a period, or whether the post-hyphen word should be capitalized, so I'm fine with whatever you want there. However, your following two lines are capitalized post-hyphen and end with a period, while this one is neither. Whichever way you prefer, it would be good to have this particular list be internally consistent.

This commit adds an initial spec of RDMA related field for Linux
configuration. They will be used for controlling rdmacg of the Linux
kernel.

Signed-off-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp>
@mitake
Copy link
Author

mitake commented Nov 6, 2017

@wking updated based on your latest comment, could you take a look?

@paravmellanox
Copy link
Contributor

I noticed this RFC now during meeting.
One reference implementation is available at
paravmellanox/cgroups@469ac1c

There are few minor differences between two PR specifically with respect the size as int64 and int32

@wking
Copy link
Contributor

wking commented Dec 7, 2017

Ah, I'd forgotten about this one. Cross-linking #942, which is moving in the same direction.


* **`interface_name`** (string, REQUIRED) - A name of an interface.
* **`hca_handle_limit`** (int64, REQUIRED) - Maximum number of HCA Handles (use `-1` to request the maximum available).
* **`hca_object_limit`** (int64, REQUIRED) - Maximum number of HCA Objects (use `-1` to request the maximum available).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the kernel docs, I think hca_handle_limit and hca_object_limit should both be OPTIONAL, possibly with a requirement that at least one of them be set (this is the same suggestion I made for #942 here).

#### Example

```json
"rdma_limits": [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'll want to drop the leading indent from this block to catch up with #937.

@mitake
Copy link
Author

mitake commented Dec 7, 2017

I'm closing this because @paravmellanox has the good alternative and reference implementation.

@mitake mitake closed this Dec 7, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants