Windows 2016 Swarm service error - "HNS failed with error : Element not found" #32935
Comments
I can create a service with a published port.
The problem with services is that you cannot use the published port right now, as the routing mesh is not available on Windows yet. But accessing the service from another container or service should work.
Please could you try the image I pushed to the Hub? I'll run that checking script too. The final use case is for this service to be accessed by my API gateway (Linux host) through the swarm.
Your service also starts, so it seems to be a host problem.
@alexellis, what cloud host is this? I think you'll also need to use host publish mode:
I don't need to publish the port for the final use case, but I wanted to publish it so I could access the service and check that it was working. I'll run with mode=host and report back. It seems odd that the same scenario works for Stefan without any additional changes.
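For reference, host publish mode uses the long `--publish` syntax, which binds the port directly on the node running the task instead of going through the ingress routing mesh. The minimal Go sketch below only assembles the `docker service create` arguments for the reproduction command from this issue; `hostPublishArgs` is a hypothetical helper, and the name, image, port and environment variable are copied from that command.

```go
package main

import (
	"fmt"
	"strings"
)

// hostPublishArgs is a hypothetical helper that returns the arguments for
// `docker service create`, publishing the container's target port on the
// node's published port with mode=host (no ingress routing mesh).
func hostPublishArgs(name, image string, published, target int) []string {
	return []string{
		"service", "create",
		"--name", name,
		// Long --publish syntax; mode=host binds the port on the task's node.
		"--publish", fmt.Sprintf("mode=host,published=%d,target=%d", published, target),
		// Environment variable copied from the issue's reproduction command.
		"--env", "suppress_lock=true",
		image,
	}
}

func main() {
	args := hostPublishArgs("watch", "alexellis2/golang-function-windows", 8080, 8080)
	// Print the command line rather than executing it.
	fmt.Println("docker " + strings.Join(args, " "))
}
```

To actually create the service, the same slice could be passed to `exec.Command("docker", args...)` on a swarm manager.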
@alexellis, does the scenario work on a different cloud host or on hardware?
I tried to provision an Azure host, but had to delete it because it couldn't finish downloading the required system updates, after 3 days of being billed for it. The image is on the Hub if you can give it a shot.
@kallie-b @JMesser81 known issue or fix available?
@PatrickLang, could you be more specific? The latest update is that the Azure host (presumably a VM) could not download the required system updates. I'm checking now to see whether the updated image was published to Azure.
@alexellis, Azure was your second attempt, which failed on the update. The first host with the failure had the update applied, right?
I am now working on a different cloud provider after giving up on and deleting my Azure VM. Every time, the Windows updates got to around 10-25% downloaded before failing. It would have been good to get that working (although it's a separate issue). The issue here is that I cannot seem to publish a port on a Swarm service without the HNS error. The KB is correctly applied and I rebooted. I'll try the
@friism, with your suggested overrides and my image, the service gives the same error.
With the IIS container and port 80, it gets into a Running state but appears to be inaccessible via the public or loopback IP.
Here's the output of the checking script via @StefanScherer:
For the Azure VM, @PatrickLang @JMesser81, I used the "Windows Server 2016 Datacenter - with Containers" template from the Azure Marketplace. I cannot find the paper trail for the deleted resource group to find out which VM model it was, but it had 3.5GB of RAM and an SSD attached. I think it may have been a D2 (around 66 GBP/month).
@alexellis, so you are now on a different cloud provider (non-Azure), the KB is correctly installed (use Get-HotFix to confirm), and you are still seeing this HNS error? Is that correct? You can also run this script to collect traces: https://github.com/Microsoft/Virtualization-Documentation/blob/live/windows-server-container-tools/CleanupContainerHostNetworking/CleanupContainerHostNetworking.ps1
From an elevated command prompt:
Attach the zipped results.
@alexellis, yes, I just confirmed that the Azure VM template has not been updated with the April KB, which included the overlay network driver. This explains why you had to download it manually (although not why it failed to download completely).
@JMesser81, the hotfix is shown as installed correctly in the screenshot in this issue.
I am seeing the same thing on Windows 10 Desktop Creators Update. I managed to get around the error by following this: https://docs.microsoft.com/en-us/virtualization/windowscontainers/manage-containers/swarm-mode#limitations
This is the command I used to run my service and avoid the error:
Hi, I did some research on this issue and it looks like I found the root cause :) The "HNS failed with error : Element not found" message on service create appears because the overlay network is missing from the nodes. When I run dockerd in debug mode, I can see these messages:
Using the PowerShell command Get-WinEvent -ProviderName Microsoft-Windows-Hyper-V-VmSwitch, I was able to find these more detailed log events:
And when we look closely at the policy in the original request:

```
"Policies": [{
    "Type": "VSID",
    "VSID": 256
}]
```

we can see that the isolation mode is VSID and its value is 256. Then, when we read the deep-dive documentation about Hyper-V network virtualization (http://techgenix.com/deep-dive-hyper-v-network-virtualization-part4/), it tells us that:

So the problem is that the VSID value is out of range and Windows declines it. You can see the overlay network details using the command docker network inspect ingress:

```
"Options": {
    "com.docker.network.driver.overlay.vxlanid_list": "256"
},
```

I also created a simple test application which can be used to send an HNSNetworkRequest directly to hcsshim:

```go
package main

import (
	"encoding/json"

	"github.com/Microsoft/hcsshim"
	// Canonical lowercase import path (the repository was renamed from Sirupsen).
	"github.com/sirupsen/logrus"
)

func main() {
	// Same shape as the ingress request Docker sends, but with VSID 5256
	// instead of the 256 that Windows declines.
	network := `{
		"Name": "bvjur6gl8ieq05mtgcjkchufk",
		"Type": "overlay",
		"Subnets": [{
			"AddressPrefix": "10.255.0.0/16",
			"GatewayAddress": "10.255.0.1",
			"Policies": [{
				"Type": "VSID",
				"VSID": 5256
			}]
		}]
	}`

	// Round-trip through encoding/json to validate and compact the payload.
	var raw map[string]interface{}
	if err := json.Unmarshal([]byte(network), &raw); err != nil {
		logrus.Fatalf("invalid request JSON: %v", err)
	}
	configurationb, err := json.Marshal(raw)
	if err != nil {
		logrus.Fatalf("marshal failed: %v", err)
	}
	configuration := string(configurationb)

	logrus.Infof("HNSNetwork Request = %v", configuration)
	hnsresponse, err := hcsshim.HNSNetworkRequest("POST", "", configuration)
	if err != nil {
		logrus.Fatalf("Error: %v", err)
	}
	logrus.Infof("Response: %+v", hnsresponse)
}
```

With that, I was able to create a container network (visible with the command Get-ContainerNetwork), but the problem is that Docker refuses to use pre-created networks, so this cannot be used as a workaround. Anyway, I assume that fixing the issue should be quite simple now?
And now I have found a workaround for the issue. I re-created my swarm so that the Swarm manager runs Debian Jessie with the latest docker-ce version (installed following this guide: https://docs.docker.com/engine/installation/linux/docker-ce/debian/).
With that version, I can see that it creates the ingress network with a valid VSID/VxlanID:
and the network is then created successfully on the Windows workers too.
Still getting
@artisticcheese, if you read my last two messages, you can see that the key thing here is the VxlanID/VSID, which is set by the Swarm manager when you create a new swarm. You can use the command docker network inspect ingress to see it. If it is still wrong on a fully patched Windows Server, then the Docker version on Windows is still too old, so you need to either use Linux as the swarm manager like me or install an edge/test/etc. version of Docker on your Windows Server before creating a new swarm. There is no way to change the VxlanId afterwards, so you need to make sure it is created correctly in the first place.
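As a quick pre-flight check before joining Windows workers, a sketch like the following could parse the JSON printed by `docker network inspect ingress` and flag any VXLAN IDs below 4096. The 4096 threshold is an assumption based on the VSID discussion in this thread, and `lowVXLANIDs` is a hypothetical helper, not a Docker API.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
)

// minVSID is the assumed lowest VXLAN ID that Windows accepts as a VSID.
const minVSID = 4096

// inspectedNetwork keeps only the fields needed from `docker network inspect`.
type inspectedNetwork struct {
	Name    string            `json:"Name"`
	Options map[string]string `json:"Options"`
}

// lowVXLANIDs parses `docker network inspect` output (a JSON array of
// networks) and returns, per network name, the VXLAN IDs below minVSID.
func lowVXLANIDs(inspectJSON []byte) (map[string][]uint64, error) {
	var nets []inspectedNetwork
	if err := json.Unmarshal(inspectJSON, &nets); err != nil {
		return nil, err
	}
	bad := make(map[string][]uint64)
	for _, n := range nets {
		list := n.Options["com.docker.network.driver.overlay.vxlanid_list"]
		if list == "" {
			continue // not an overlay network, or no IDs assigned
		}
		for _, s := range strings.Split(list, ",") {
			id, err := strconv.ParseUint(strings.TrimSpace(s), 10, 32)
			if err != nil {
				return nil, fmt.Errorf("network %s: bad VXLAN ID %q: %w", n.Name, s, err)
			}
			if id < minVSID {
				bad[n.Name] = append(bad[n.Name], id)
			}
		}
	}
	return bad, nil
}

func main() {
	// Shape of the ingress output quoted in this thread.
	sample := []byte(`[{"Name":"ingress","Options":{"com.docker.network.driver.overlay.vxlanid_list":"256"}}]`)
	bad, err := lowVXLANIDs(sample)
	if err != nil {
		panic(err)
	}
	for name, ids := range bad {
		fmt.Printf("%s uses VXLAN IDs %v below %d; recreate the swarm\n", name, ids, minVSID)
	}
}
```

In practice the input would be the real inspect output, for example read from os.Stdin instead of the hard-coded sample.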
Yes, I read your findings, but that's not the issue in my case. My networks are created, as you can see.
I don't believe this is the primary issue here, but reading through this thread I'm noticing that in most cases the
You need to specify
I deploy through
@artisticcheese, aha, true. Try: @kallie-b, yes.
@artisticcheese, will you please run our logging script with no arguments (we're just gathering info for now) on the two machines that are not working as expected? After running the script on each host, please zip the resulting folders of logs and send them to us at sdn_feedback@microsoft.com.
Logs sent from server2 only. I cannot run the tool on server1 since it does not have Docker installed.
@kallie-b, by the way, it would be nice if someone had time to look at this one, because it is currently quite hard to find the real error message in these cases: microsoft/hcsshim#54
Is this a duplicate of #34696, or just a similar error message?
Description
When I create a swarm service on Windows 2016 with a published port, I get the following error:
HNS failed with error : Element not found
The error isn't present with docker run, and if I don't publish a port the container is created (albeit without being on the Swarm). I have the required KB update installed.
Steps to reproduce the issue:
docker service create --name watch --publish 8080:8080 -e suppress_lock=true alexellis2/golang-function-windows
Describe the results you received:
0/1 replicas - "HNS failed with error : Element not found"
I've also tried docker network create with an explicitly named overlay network. It gave the same error.
Describe the results you expected:
1/1 replicas etc
Additional information you deem important (e.g. issue happens only occasionally):
Output of docker version:
Output of docker info:
Additional environment details (AWS, VirtualBox, physical, etc.):
Provisioned on a cloud host.
Thought it might be worth CC'ing @StefanScherer @friism