
Error when passing network id for connecting a network to a container #9451

Closed
linggao opened this issue Feb 21, 2021 · 12 comments · Fixed by #9455
Labels
In Progress This issue is actively being worked by the assignee, please do not work on this at this time. kind/bug Categorizes issue or PR as related to a bug. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.

linggao commented Feb 21, 2021

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

Description
podman REST API is not compatible with docker REST API for /networks/{id or name}/connect and /networks/{id or name}/disconnect.
Our code connects a network to a container with the docker REST API https://docs.docker.com/engine/api/v1.41/#operation/NetworkConnect through godockerclient. This API allows us to use either the network ID or the network name. However, when calling the same API on /var/run/podman/podman.sock with a network ID, not only does an error occur, but the network and the container can also no longer be queried.

Steps to reproduce the issue:

  1. podman network create foo-a
    podman run --name test --network foo-a -d alpine sleep 1000

  2. podman network create foo-b

  3. get network id for foo-b
    curl -sSLw "%{http_code}" --unix-socket /var/run/podman/podman.sock http://localhost/networks | jq

  4. podman network connect {network-id-for-foo-b} test
    or
    read -d '' sdef <<EOF
    {
    "Container":"{id-for-container-test}",
    "EndpointConfig": {
    }
    }
    EOF
    echo "$sdef" | curl -sLX POST --data @- -H "Content-Type: application/json" -H "Accept: application/json" --unix-socket /var/run/podman/podman.sock http://localhost/networks/{network-id-for-foo-b}/connect

  5. podman inspect test

Describe the results you received:
# echo "$sdef" | curl -sLX POST --data @- -H "Content-Type: application/json" -H "Accept: application/json" --unix-socket /var/run/podman/podman.sock http://localhost/networks/e7953c3a62f54803cbfbbe9db231d8895b12b06f4d493d8d56c333bebb3b6e09/connect
{"cause":"CNI network \"e7953c3a62f54803cbfbbe9db231d8895b12b06f4d493d8d56c333bebb3b6e09\" not found","message":"CNI network \"e7953c3a62f54803cbfbbe9db231d8895b12b06f4d493d8d56c333bebb3b6e09\" not found","response":500}

# podman inspect test
Error: network inspection mismatch: asked to join 2 CNI network(s) [e7953c3a62f54803cbfbbe9db231d8895b12b06f4d493d8d56c333bebb3b6e09 foo-a], but have information on 1 network(s): internal libpod error

# curl -sSLw "%{http_code}" --unix-socket /var/run/podman/podman.sock http://localhost/networks | jq
{
"cause": "internal libpod error",
"message": "network inspection mismatch: asked to join 2 CNI network(s) [e7953c3a62f54803cbfbbe9db231d8895b12b06f4d493d8d56c333bebb3b6e09 foo-a], but have information on 1 network(s): internal libpod error",
"response": 500
}

Describe the results you expected:
podman REST API and CLI should accept either a network ID or a name when connecting a network to, or disconnecting it from, a container.
This is docker's behavior, and the podman REST API claims to be compatible with the docker REST API.

Additional information you deem important (e.g. issue happens only occasionally):

Output of podman version:

Version:      3.0.1-dev
API Version:  3.0.0
Go Version:   go1.15.7
Built:        Tue Feb 16 06:47:41 2021
OS/Arch:      linux/amd64

Output of podman info --debug:

host:
  arch: amd64
  buildahVersion: 1.19.2
  cgroupManager: systemd
  cgroupVersion: v1
  conmon:
    package: Unknown
    path: /usr/local/libexec/podman/conmon
    version: 'conmon version 2.0.27-dev, commit: 7310bf13319ee8ed50799b202509bedc27b36cf8'
  cpus: 2
  distribution:
    distribution: '"rhel"'
    version: "8.3"
  eventLogger: file
  hostname: lingvs1.dev.edge-fabric.com
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 4.18.0-240.15.1.el8_3.x86_64
  linkmode: dynamic
  memFree: 5340495872
  memTotal: 8342462464
  ociRuntime:
    name: runc
    package: runc-1.0.0-70.rc92.module+el8.4.0+9980+44630550.x86_64
    path: /usr/bin/runc
    version: 'runc version spec: 1.0.2-dev'
  os: linux
  remoteSocket:
    exists: true
    path: /run/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_NET_RAW,CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    selinuxEnabled: true
  slirp4netns:
    executable: ""
    package: ""
    version: ""
  swapFree: 2146758656
  swapTotal: 2146758656
  uptime: 60h 57m 7.72s (Approximately 2.50 days)
registries:
  search:
  - registry.access.redhat.com
  - registry.redhat.io
  - docker.io
store:
  configFile: /etc/containers/storage.conf
  containerStore:
    number: 1
    paused: 0
    running: 0
    stopped: 1
  graphDriverName: overlay
  graphOptions:
    overlay.mountopt: nodev,metacopy=on
  graphRoot: /var/lib/containers/storage
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "true"
  imageStore:
    number: 7
  runRoot: /var/run/containers/storage
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 3.0.0
  Built: 1613479661
  BuiltTime: Tue Feb 16 06:47:41 2021
  GitCommit: ""
  GoVersion: go1.15.7
  OsArch: linux/amd64
  Version: 3.0.1-dev

Package info (e.g. output of rpm -q podman or apt list podman):

podman-3.0.0-2.module+el8.4.0+9980+44630550.x86_64

Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide?

Yes

Additional environment details (AWS, VirtualBox, physical, etc.):
RHEL 8.3

openshift-ci-robot added the kind/bug label Feb 21, 2021
mheon (Member) commented Feb 21, 2021

This doesn't look REST API related. It looks like the network connect call is corrupting the state (specifically network results blocks). Please provide more details on the test container.

Luap99 (Member) commented Feb 21, 2021

OK, I see the problem. My network ID magic works only on the libpod side. Network connect checks if the network exists, and this passes because it is a valid ID, but when we pass the ID as the network name to OCICNI it fails. At this point we have already added the network to the state, which causes problems with podman inspect and network ls because the state and the CNI information no longer match.

I think the same problem exists for a plain podman run --network ID ...
We have to translate the network ID to the name before we use it internally.
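Until a fixed podman is available, the same translation can be approximated on the client side. A minimal bash sketch, assuming a table of `ID NAME` pairs on stdin; `network_name_for` is a hypothetical helper, not part of podman. One way to produce that table from the compat API is `curl -s --unix-socket /var/run/podman/podman.sock http://localhost/networks | jq -r '.[] | "\(.Id) \(.Name)"'`, assuming the list response carries Docker's `Id` and `Name` fields.

```bash
#!/bin/bash
# Hypothetical helper: translate a network reference (full ID or name)
# into the network name, reading "ID NAME" pairs from stdin.
# Prints the name and returns 0, or returns 1 if nothing matches.
network_name_for() {
  local ref=$1 id name
  while read -r id name; do
    # Accept either form, like Docker does for /networks/{id or name}.
    if [ "$ref" = "$id" ] || [ "$ref" = "$name" ]; then
      printf '%s\n' "$name"
      return 0
    fi
  done
  return 1
}
```

For example, given the `/networks` table from this report and the ID `e7953c3a62f54803cbfbbe9db231d8895b12b06f4d493d8d56c333bebb3b6e09`, it should print `foo-b`, which can then go into the connect URL in place of the ID.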

Luap99 self-assigned this Feb 21, 2021
Luap99 added the In Progress label Feb 21, 2021
Luap99 added a commit to Luap99/libpod that referenced this issue Feb 21, 2021
The libpod network logic knows about network IDs but OCICNI
does not. We cannot pass the network ID to OCICNI. Instead we
need to make sure we only use network names internally. This
is also important for libpod since we also only store the
network names in the state. If we added an ID there, the
same network could accidentally be added twice.

Fixes containers#9451

Signed-off-by: Paul Holzinger <paul.holzinger@web.de>
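To illustrate why the commit keeps only names in the state: if a container's network list mixed IDs and names, the ID of foo-b and the name foo-b would look like two different entries. A minimal bash 4 sketch with a hypothetical helper, `normalize_networks`, that reads `ID NAME` pairs on stdin, maps each requested reference to a name, and drops duplicates. It only models the idea; it is not libpod's Go implementation.

```bash
#!/bin/bash
# Hypothetical sketch (bash 4+, needs associative arrays): normalize the
# requested networks to names and drop duplicates.
# stdin: "ID NAME" pairs; arguments: network IDs or names.
normalize_networks() {
  local -A id_to_name=()
  local -A seen=()
  local id name ref
  while read -r id name; do
    id_to_name[$id]=$name
  done
  for ref in "$@"; do
    name=${id_to_name[$ref]:-$ref}   # refs not found as IDs are assumed to be names
    if [ -z "${seen[$name]:-}" ]; then
      seen[$name]=1
      printf '%s\n' "$name"
    fi
  done
}
```

For example, requesting `foo-a`, the ID of `foo-b`, and `foo-b` yields only `foo-a` and `foo-b`.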
linggao (Author) commented Feb 22, 2021

@Luap99 thanks for the quick fix. I copied your latest files for testing, and it works in most cases. However, it fails in our program that uses godockerclient. I wrote a shell script to demonstrate what we are doing; this script fails with your current fix, but it works with docker if you replace podman with docker. Please let me know what might be missing on our side.

#!/bin/bash

#clean up
echo -e "cleaning up..."
podman rm -f test
podman network prune -f

# create container
echo -e "\ncreating container test..."
podman network create foo-a
rc=$(podman run --name test --network foo-a -d alpine sleep 1000)
containerID=$rc
echo -e "\ncontainerID=$containerID"

# create network foo-b
echo -e "\ncreating network foo-b..."
read -d '' sdef <<EOF
{
  "Name": "foo-b",
  "CheckDuplicate": true,
  "Driver": "bridge",
  "EnableIPv6": false,
  "IPAM": {
    "Driver": "default",
    "Config": []
  },
  "Internal": false,
  "Options": {
    "com.docker.network.bridge.default_bridge": "true",
    "com.docker.network.bridge.enable_icc": "true",
    "com.docker.network.bridge.enable_ip_masquerade": "true"
  },
  "Labels": {
    "com.example.some-label": "some-value",
    "com.example.some-other-label": "some-other-value"
  }
}
EOF

rc=$(echo "$sdef" | curl -sLX POST --data @- -H "Content-Type: application/json" -H "Accept: application/json" --unix-socket /var/run/podman/podman.sock http://localhost/networks/create)
echo $rc
# jq -r prints the raw string, so no quote trimming is needed
networkID=$(echo "$rc" | jq -r '.Id')

echo "networkID=$networkID"

echo -e "\nconnecting..."
curl -sLX POST -H "Content-Type: application/json" -H "Accept: application/json" --unix-socket /var/run/podman/podman.sock -d "{\"Container\": \"$containerID\"}" http://localhost/networks/$networkID/connect

# this gives strange output
echo -e "\nchecking..."
podman inspect test

The following is the output:

cleaning up...
d27519bb27696dfeef45afe0e81a72b1a779c2e772b9bbc3d03eb352d34c69d3
foo-a
foo-b

creating container test...
/etc/cni/net.d/foo-a.conflist

containerID=11e19f6b4be1df8ee88e5dec621362aff92a56fa25735a789e8f529da5ce85a9

creating network foo-b...
{"Id":"e7953c3a62f54803cbfbbe9db231d8895b12b06f4d493d8d56c333bebb3b6e09","Warning":null}
networkID=e7953c3a62f54803cbfbbe9db231d8895b12b06f4d493d8d56c333bebb3b6e09

connecting...
ERRO[1262] CNI network "foo-b" not found                
{"cause":"CNI network \"foo-b\" not found","message":"CNI network \"foo-b\" not found","response":500}

checking...
Error: network inspection mismatch: asked to join 2 CNI network(s) [foo-a foo-b], but have information on 1 network(s): internal libpod error

linggao (Author) commented Feb 22, 2021

@Luap99 Btw, if you run the above script one command at a time, it works.

Luap99 (Member) commented Feb 22, 2021

@linggao Thanks for the script. I cannot reproduce it, but that's somewhat expected: this is a race condition. I see this a lot in our CI. I already opened a PR, cri-o/ocicni#85, which should fix this problem.

Luap99 added a commit to Luap99/libpod that referenced this issue Feb 22, 2021
linggao (Author) commented Feb 22, 2021

@Luap99 thanks for the info. Just to let you know, the error happens every time I run the script on my VM. I have RHEL 8.3 with the latest podman code plus this PR. Should I upgrade cri-o? How do I do that?

Luap99 (Member) commented Feb 22, 2021

I see in your podman info output that you only have two cores. The problem is that the cri-o library (OCICNI) updates the network list asynchronously with fsnotify. On a slow system this might happen after we have already tried to use the new network. You cannot update the OCICNI library separately, as it is compiled directly into podman.
As a workaround, adding a sleep after the network create should work.
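A minimal sketch of that workaround applied to the script above: pause between creating the network and connecting to it so OCICNI's fsnotify-driven reload can catch up. The helper name `connect_after_settle` and the 2-second default are assumptions, not a tested threshold. Since a failed connect already left the container state inconsistent in the `podman inspect` output earlier in this thread, waiting before the first attempt seems safer than retrying after a failure.

```bash
#!/bin/bash
# Socket path from the report; override with PODMAN_SOCK if needed.
PODMAN_SOCK=${PODMAN_SOCK:-/var/run/podman/podman.sock}

# Hypothetical helper: wait for podman's CNI cache to notice a new network,
# then connect the container to it through the compat API.
connect_after_settle() {
  local network=$1 container=$2 delay=${3:-2}
  # Fixed delay only; nothing here checks that OCICNI has actually reloaded.
  sleep "$delay"
  curl -sLX POST \
    -H "Content-Type: application/json" \
    -H "Accept: application/json" \
    --unix-socket "$PODMAN_SOCK" \
    -d "$(printf '{"Container": "%s"}' "$container")" \
    "http://localhost/networks/${network}/connect"
}
```

In the script above, this would replace the curl call after `echo -e "\nconnecting..."`, e.g. `connect_after_settle "$networkID" "$containerID"`.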

linggao (Author) commented Feb 23, 2021

@Luap99 I saw this PR is merged.
Question 1: Is there a way I can build podman with the cri-o/ocicni#85 PR?
Question 2: When is the next release that will include both this fix and the cri-o/ocicni#85 fix?

Luap99 (Member) commented Feb 23, 2021

$ go mod edit --replace 'github.com/cri-o/ocicni=github.com/Luap99/ocicni@cd5167ffe95976c7b50ecc99ace7cae1713eb8ae'
$ make vendor
GO111MODULE=on go mod tidy
warning: ignoring symlink /home/paul/go/src/github.com/containers/podman/contrib/systemd/user
go: downloading github.com/Luap99/ocicni v0.2.1-0.20210220201229-cd5167ffe959
GO111MODULE=on go mod vendor
warning: ignoring symlink /home/paul/go/src/github.com/containers/podman/contrib/systemd/user
GO111MODULE=on go mod verify
all modules verified
$ make binaries
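The same steps as a reusable function, a sketch assuming it runs from the root of a podman source checkout with Go and make installed. `build_with_ocicni_fork` is a hypothetical name; the default commit is the Luap99/ocicni commit from the command above.

```bash
#!/bin/bash
# Hypothetical wrapper around the steps above; run from a podman checkout.
build_with_ocicni_fork() {
  local commit=${1:-cd5167ffe95976c7b50ecc99ace7cae1713eb8ae}
  # Point the ocicni dependency at the fork commit for cri-o/ocicni#85.
  go mod edit -replace "github.com/cri-o/ocicni=github.com/Luap99/ocicni@${commit}" &&
    make vendor &&   # runs go mod tidy / vendor / verify, as in the output above
    make binaries
}
```

Chaining with `&&` stops the build if the replace or vendoring step fails.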

also see #9449

  1. I assume we will backport this to 3.0, so it should become available with the next 3.0.2 release. The same goes for the ocicni changes once they merge.
    However, if you want this in RHEL 8.4, then you have to open a Bugzilla to request the fix as a backport to v3.0, because podman v3 is already frozen for RHEL 8.4.

linggao (Author) commented Feb 23, 2021

@Luap99 thanks a lot for all of your help. I built podman with the ocicni fix you made, and the initial tests with our code look very good.
Now how/where do I open a Bugzilla to request that both fixes get backported to v3.0 and get into RHEL 8.4?

Luap99 (Member) commented Feb 23, 2021

Now how/where do I open a bugzilla to request that both fixes get backported to v3.0 and get into rhel8.4?

@mheon @baude Can one of you answer that? I do not know.

mheon (Member) commented Feb 23, 2021

@linggao Open a BZ on the Red Hat Bugzilla, targeted against Podman on RHEL 8.4, for this issue. We can use that to justify the backport.

For reference, upstream 3.0.2 is probably landing sometime next week (Wednesday/Thursday seems likely?) but RHEL 8.4 will be staying on 3.0.1 with selective backports.
