Nightly vSphere 6.0 HA test fails with Invalid configuration for device '1' #4666

Closed

rajanashok opened this issue Apr 10, 2017 · 53 comments

Labels: component/portlayer · kind/defect (Behavior that is inconsistent with what's intended) · priority/p0 · source/scenario (Found via a scenario failure) · team/foundation

@rajanashok
Contributor

Documentation:	
Runs the given command in the system and returns the RC and output.
Start / End / Elapsed:	20170410 07:20:30.013 / 20170410 07:20:31.173 / 00:00:01.160
07:20:30.022	INFO	Running command 'docker -H 10.192.33.129:2376 --tls create busybox /bin/top 2>&1'.	
07:20:31.172	INFO	${rc} = 1	
07:20:31.172	INFO	${container} = Error response from daemon: Server error from portlayer: Invalid configuration for device '1'.
@rajanashok
Contributor Author

@mhagen-vmware
Contributor

portlayer.log:

Apr 10 2017 12:20:30.676Z DEBUG [BEGIN] [github.com/vmware/vic/lib/apiservers/portlayer/restapi/handlers.(*ContainersHandlersImpl).CommitHandler:213] handle(bc1da1f46bd17b5345d28bef7cec3f10)
Apr 10 2017 12:20:30.677Z DEBUG [BEGIN] [github.com/vmware/vic/lib/portlayer/exec.Commit:35] 735110e21d746859f0b0edd84e8ed20d3fb4a7d1ab819a308caaad6bdab17d45
Apr 10 2017 12:20:31.215Z DEBUG unexpected fault on task retry : &types.InvalidDeviceSpec{InvalidVmConfig:types.InvalidVmConfig{VmConfigFault:types.VmConfigFault{VimFault:types.VimFault{MethodFault:types.MethodFault{FaultCause:(*types.LocalizedMethodFault)(nil), FaultMessage:[]types.LocalizableMessage(nil)}}}, Property:"virtualDeviceSpec.device.backing.parent.fileName"}, DeviceIndex:1}
Apr 10 2017 12:20:31.215Z ERROR An error occurred while waiting for a creation operation to complete. Spec was {DynamicData:{} ChangeVersion: Name:hungry_ardinghelli-735110e21d74 Version: Uuid:a5836b79-6722-5566-8d37-f6ce821b09cb InstanceUuid: NpivNodeWorldWideName:[] NpivPortWorldWideName:[] NpivWorldWideNameType: NpivDesiredNodeWwns:0 NpivDesiredPortWwns:0 NpivTemporaryDisabled:<nil> NpivOnNonRdmDisks:<nil> NpivWorldWideNameOp: LocationId: GuestId:otherGuest64 AlternateGuestName:Photon - Container v1.1.0-rc4, 9634, b89ebff Annotation: Files:0xc42013d0e0 Tools:<nil> Flags:<nil> ConsolePreferences:<nil> PowerOpInfo:<nil> NumCPUs:2 NumCoresPerSocket:0 MemoryMB:2048 MemoryHotAddEnabled:0xc421175a80 CpuHotAddEnabled:0xc421175a80 CpuHotRemoveEnabled:<nil> VirtualICH7MPresent:<nil> VirtualSMCPresent:<nil> DeviceChange:[0xc42013d6d0 0xc420762320 0xc420762690 0xc420762a50 0xc420763d60 0xc4206b40f0 0xc4206b4730 0xc4206b47d0] CpuAllocation:<nil> MemoryAllocation:<nil> LatencySensitivity:<nil> CpuAffinity:<nil> MemoryAffinity:<nil> NetworkShaper:<nil> CpuFeatureMask:[] ExtraConfig:[0xc420ff0720 0xc420ff0740 0xc420ff0760 0xc420ff0780 0xc420ff07a0 0xc420ff07c0 0xc420ff07e0 0xc420ff0800 0xc420ff0820 0xc420ff0840 0xc420ff1260 0xc420ff1280 0xc420ff12a0 0xc420ff12e0 0xc420ff1300 0xc420ff1320 0xc420ff1340 0xc420ff1360 0xc420ff1380 0xc420ff13a0 0xc420ff13c0 0xc420ff13e0 0xc420ff1400 0xc420ff1420 0xc420ff1440 0xc420ff1460 0xc420ff1480 0xc421191240 0xc421191260 0xc421191280 0xc4211912a0 0xc4211912c0 0xc4211912e0 0xc421191300 0xc421191320 0xc421191340 0xc421191360 0xc421191380 0xc4211913a0 0xc4211913c0 0xc4211913e0 0xc421191400 0xc421191420 0xc421191440 0xc421191460 0xc421191480 0xc4211914a0 0xc4211914c0 0xc4211914e0 0xc421191500 0xc421191520 0xc421191540 0xc421191560 0xc421191580 0xc4211915a0 0xc4211915c0 0xc4211915e0 0xc421191600 0xc421191620 0xc421191640 0xc421191660 0xc421191680 0xc4211916a0 0xc4211916c0 0xc4211916e0 0xc421191700 0xc421191720 0xc421191740 0xc421191760 0xc421191780 0xc4211917a0 0xc4211917c0 0xc4211917e0 0xc421191800 0xc421191820 0xc421191840 0xc421191860 0xc421191880 0xc4211918a0 0xc4211918c0 0xc4211918e0 0xc421191900 0xc421191920 0xc421191940 0xc421191960 0xc421191980] SwapPlacement: BootOptions:<nil> VAppConfig:<nil> FtInfo:<nil> RepConfig:<nil> VAppConfigRemoved:<nil> VAssertsEnabled:<nil> ChangeTrackingEnabled:<nil> Firmware: MaxMksConnections:0 GuestAutoLockEnabled:<nil> ManagedBy:<nil> MemoryReservationLockedToMax:<nil> NestedHVEnabled:<nil> VPMCEnabled:<nil> ScheduledHardwareUpgradeInfo:<nil> VmProfile:[] MessageBusTunnelEnabled:<nil> Crypto:<nil> MigrateEncryption:}
Apr 10 2017 12:20:31.215Z DEBUG [ END ] [github.com/vmware/vic/lib/portlayer/exec.Commit:35] [538.605634ms] 735110e21d746859f0b0edd84e8ed20d3fb4a7d1ab819a308caaad6bdab17d45
Apr 10 2017 12:20:31.215Z ERROR CommitHandler error on handle(bc1da1f46bd17b5345d28bef7cec3f10) for 735110e21d746859f0b0edd84e8ed20d3fb4a7d1ab819a308caaad6bdab17d45: task.Error{LocalizedMethodFault:(*types.LocalizedMethodFault)(0xc420176d60)}
Apr 10 2017 12:20:31.215Z DEBUG [ END ] [github.com/vmware/vic/lib/apiservers/portlayer/restapi/handlers.(*ContainersHandlersImpl).CommitHandler:213] [538.955675ms] handle(bc1da1f46bd17b5345d28bef7cec3f10)

@mhagen-vmware mhagen-vmware removed this from the Sprint 6 milestone Apr 12, 2017
@jzt
Contributor

jzt commented Apr 13, 2017

Seen in longevity run on VC 6.5 + vSAN on iteration 14: container-logs.zip

@jzt
Contributor

jzt commented Apr 13, 2017

vpxa.log:

2017-04-12T20:50:29.856Z info vpxa[76BBB70] [Originator@6876 sub=vpxLro opID=53f712a3-01-e8] [VpxLRO] -- BEGIN task-18831 -- vpxa -- vpxapi.VpxaService.createVm -- 52634330-6b1b-ecde-ab69-a7cfdb4d387b
2017-04-12T20:50:29.856Z warning vpxa[76BBB70] [Originator@6876 sub=vpxaVmprovUtil opID=53f712a3-01-e8] [VpxaVmprov] UndoNameReservation: Failed to open file /vmfs/volumes/vsan:52b432a675d2bf59-7168e072a499aad9/ee92ee58-ac10-1339-8331-000af7992758/vibrant_brahmagupta-440e33fd7da9.vmx
2017-04-12T20:50:29.856Z warning vpxa[76BBB70] [Originator@6876 sub=vpxaVmprovUtil opID=53f712a3-01-e8] [VpxaVmprov] UndoNameReservation: Failed to open file /vmfs/volumes/vsan:52b432a675d2bf59-7168e072a499aad9/ee92ee58-ac10-1339-8331-000af7992758/440e33fd7da9085b428489b961d7356364c85f615dd2ad5ddcafb8a95726e765.vmdk
2017-04-12T20:50:29.856Z verbose vpxa[76BBB70] [Originator@6876 sub=vpxaVmprov opID=53f712a3-01-e8] [CreateVmOnHost] Invoking createVm command on hostagent
2017-04-12T20:50:29.856Z verbose vpxa[76BBB70] [Originator@6876 sub=Default opID=53f712a3-01-e8] [PR_629085 FindTransientMoRef] _sourceMap = [2] {'resgroup-13': '_id=resgroup-13:_name=Resources:_remoteMoRef=vim.ResourcePool:ha-root-pool', 'resgroup-v7011': '_id=resgroup-v7011:_name=VCH-0-3298:_remoteMoRef=vim.ResourcePool:pool675'}
2017-04-12T20:50:29.856Z verbose vpxa[76BBB70] [Originator@6876 sub=VpxaHalResourcePool opID=53f712a3-01-e8] VpxaHalResourcePoolImpl::GetHostMoRef(vim.ResourcePool:resgroup-13) = vim.ResourcePool:ha-root-pool
2017-04-12T20:50:29.859Z verbose vpxa[76BBB70] [Originator@6876 sub=vpxLro] [VpxLRO] Task task-18831 has been requeued

later:

 2017-04-12T20:50:30.736Z verbose vpxa[7679B70] [Originator@6876 sub=vpxLro opID=GetTaskInfoHandleSuccess-5f961f61] [VpxLRO] Dispatching Error Handler Functor for haTask-ha-folder-vm-vim.Folder.createVm-200914585 which completed with an error from task-18831
 2017-04-12T20:50:30.736Z verbose vpxa[7679B70] [Originator@6876 sub=vpxLro opID=GetTaskInfoHandleSuccess-5f961f61] [VpxLRO] Task task-18831 has been rescheduled
 2017-04-12T20:50:30.736Z verbose vpxa[7760B70] [Originator@6876 sub=vpxLro opID=53f712a3-01-e8] [VpxLRO] LRO task-18831 has been resumed
 2017-04-12T20:50:30.736Z error vpxa[7760B70] [Originator@6876 sub=vpxaVmprov opID=53f712a3-01-e8] [CreateVmFailed] Received unexpected exception from host agent while creating VM: 0775fd54
 2017-04-12T20:50:30.737Z error vpxa[7760B70] [Originator@6876 sub=vpxLro opID=53f712a3-01-e8] [LroMainContinuation] Unexpected Exception: N3Vim5Fault17InvalidDeviceSpec9ExceptionE(vim.fault.InvalidDeviceSpec)
 --> [context]zKq7AUoCAgAAAIqkRQAVdnB4YQAAz/UtbGlidm1hY29yZS5zbwAAonwSAM/fDQFmTt5saWJ2aW0tdHlwZXMuc28AAeG73gIc3RR2cHhhAAJNyhQC5bInAgqzJwJC9ScCtZEnAq2YJwLiSygC7XsTAgFZFgATaCAAPmwgAHWzIADEci4Dam0AbGlicHRocmVhZC5zby4wAATeSw1saWJjLnNvLjYA[/context]
 2017-04-12T20:50:30.739Z verbose vpxa[7760B70] [Originator@6876 sub=PropertyProvider opID=53f712a3-01-e8] RecordOp ASSIGN: info.state, task-18831. Applied change to temp map.
 2017-04-12T20:50:30.739Z verbose vpxa[7760B70] [Originator@6876 sub=PropertyProvider opID=53f712a3-01-e8] RecordOp ASSIGN: info.cancelable, task-18831. Applied change to temp map.
 2017-04-12T20:50:30.739Z verbose vpxa[7760B70] [Originator@6876 sub=PropertyProvider opID=53f712a3-01-e8] RecordOp ASSIGN: info.error, task-18831. Applied change to temp map.
 2017-04-12T20:50:30.739Z verbose vpxa[7760B70] [Originator@6876 sub=PropertyProvider opID=53f712a3-01-e8] [CommitChangesAndNotify] Updating cached values
 2017-04-12T20:50:30.739Z info vpxa[7760B70] [Originator@6876 sub=vpxLro opID=53f712a3-01-e8] [VpxLRO] -- FINISH task-18831
 2017-04-12T20:50:30.739Z info vpxa[7760B70] [Originator@6876 sub=Default opID=53f712a3-01-e8] [VpxLRO] -- ERROR task-18831 -- vpxa -- vpxapi.VpxaService.createVm: vim.fault.InvalidDeviceSpec:
 --> Result:
 --> (vim.fault.InvalidDeviceSpec) {
 -->    faultCause = (vmodl.MethodFault) null,
 -->    faultMessage = <unset>,
 -->    property = "virtualDeviceSpec.device.backing.parent.fileName",
 -->    deviceIndex = 1
 -->    msg = "Invalid configuration for device '1'."
 --> }

Here's the vmdk device portion of the spec:

-->       (vim.vm.device.VirtualDeviceSpec) {
-->          operation = "add",
-->          fileOperation = "create",
-->          device = (vim.vm.device.VirtualDisk) {
-->             key = -10,
-->             deviceInfo = (vim.Description) null,
-->             backing = (vim.vm.device.VirtualDisk.FlatVer2BackingInfo) {
-->                fileName = "[]/vmfs/volumes/vsan:52b432a675d2bf59-7168e072a499aad9/ee92ee58-ac10-1339-8331-000af7992758/440e33fd7da9085b428489b961d7356364c85f615dd2ad5ddcafb8a95726e765.vmdk",
-->                datastore = 'vim.Datastore:datastore-93',
-->                backingObjectId = <unset>,
-->                diskMode = "persistent",
-->                split = <unset>,
-->                writeThrough = <unset>,
-->                thinProvisioned = true,
-->                eagerlyScrub = <unset>,
-->                uuid = <unset>,
-->                contentId = <unset>,
-->                changeId = <unset>,
-->                parent = (vim.vm.device.VirtualDisk.FlatVer2BackingInfo) {
-->                   fileName = "[]/vmfs/volumes/vsan:52b432a675d2bf59-7168e072a499aad9/VCH-0-3298/VIC/42076c04-92d3-3439-4084-aab1b55f590c/images/d7f9d170aa714bf287eb994aada10a57f4fb1bf8fd748cbfa36a7ab9549190c1/d7f9d170aa714bf287eb994aada10a57f4fb1bf8fd748cbfa36a7ab9549190c1.vmdk",
-->                   datastore = <unset>,
-->                   backingObjectId = <unset>,
-->                   diskMode = "",
-->                   split = <unset>,
-->                   writeThrough = <unset>,
-->                   thinProvisioned = <unset>,
-->                   eagerlyScrub = <unset>,
-->                   uuid = <unset>,
-->                   contentId = <unset>,
-->                   changeId = <unset>,
-->                   parent = (vim.vm.device.VirtualDisk.FlatVer2BackingInfo) null,
-->                   deltaDiskFormat = <unset>,
-->                   digestEnabled = <unset>,
-->                   deltaGrainSize = <unset>,
-->                   deltaDiskFormatVariant = <unset>,
-->                   sharing = <unset>,
-->                   keyId = (vim.encryption.CryptoKeyId) null
-->                },
-->                deltaDiskFormat = <unset>,
-->                digestEnabled = <unset>,
-->                deltaGrainSize = <unset>,
-->                deltaDiskFormatVariant = <unset>,
-->                sharing = <unset>,
-->                keyId = (vim.encryption.CryptoKeyId) null
-->             },
-->             connectable = (vim.vm.device.VirtualDevice.ConnectInfo) null,
-->             slotInfo = (vim.vm.device.VirtualDevice.BusSlotInfo) null,
-->             controllerKey = 100,
-->             unitNumber = 0,
-->             capacityInKB = 8388608,
-->             capacityInBytes = <unset>,
-->             shares = (vim.SharesInfo) null,
-->             storageIOAllocation = (vim.StorageResourceManager.IOAllocationInfo) null,
-->             diskObjectId = <unset>,
-->             vFlashCacheConfigInfo = (vim.vm.device.VirtualDisk.VFlashCacheConfigInfo) null,
-->             iofilter = <unset>,
-->             vDiskId = (vim.vslm.ID) null
-->          },
-->          profile = (vim.vm.ProfileSpec) [
-->             (vim.vm.DefinedProfileSpec) {
-->                profileId = "aa6d5a82-1c88-45da-85d3-3d74b91a5bad",
-->                replicationSpec = (vim.vm.replication.ReplicationSpec) null,
-->                profileData = (vim.vm.ProfileRawData) {
-->                   extensionKey = "com.vmware.vim.sps",
-->                   objectData = "<ns1:storageProfile xmlns:ns1="http://profile.policy.data.vasa.vim.vmware.com/xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="ns1:StorageProfile"><ns1:constraints><ns1:subProfiles><ns1:capability><capabilityId xmlns="http://capability.policy.data.vasa.vim.vmware.com/xsd"><id>hostFailuresToTolerat
e</id><namespace>VSAN</namespace></capabilityId><constraint xmlns="http://capability.policy.data.vasa.vim.vmware.com/xsd"><propertyInstance><id>hostFailuresToTolerate</id><value xmlns:s845="http://www.w3.org/2001/XMLSchema" xsi:type="s845:int">1</value></propertyInstance></constraint></ns1:capability><ns1:capability><capabilityId xmlns="http://capabili
ty.policy.data.vasa.vim.vmware.com/xsd"><id>cacheReservation</id><namespace>VSAN</namespace></capabilityId><constraint xmlns="http://capability.policy.data.vasa.v
im.vmware.com/xsd"><propertyInstance><id>cacheReservation</id><value xsi:type="s845:int">0</value></propertyInstance></constraint></ns1:capability><ns1:capability><capabilityId xmlns="http://capability.policy.data.vasa.vim.vmware.com/xsd"><id>forceProvisioning</id><namespace>VSAN</namespace></capabilityId><constraint xmlns="http://capability.policy.dat
a.vasa.vim.vmware.com/xsd"><propertyInstance><id>forceProvisioning</id><value xsi:type="s845:boolean">false</value></propertyInstance></constraint></ns1:capability><ns1:capability><capabilityId xmlns="http://capability.policy.data.vasa.vim.vmware.com/xsd"><id>proportionalCapacity</id><namespace>VSAN</namespace></capabilityId><constraint xmlns="http://c
apability.policy.data.vasa.vim.vmware.com/xsd"><propertyInstance><id>proportionalCapacity</id><value xsi:type="s845:int">0</value></propertyInstance></constraint></ns1:capability><ns1:capab
ility><capabilityId xmlns="http://capability.policy.data.vasa.vim.vmware.com/xsd"><id>stripeWidth</id><namespace>VSAN</namespace></capabilityId><constraint xmlns="http://capability.policy.data.vasa.vim.vmware.com/xsd"><propertyInstance><id>stripeWidth</id><value xsi:type="s845:int">1</value></propertyInstance></constraint></ns1:capability><ns1:name>VSA
N sub-profile</ns1:name></ns1:subProfiles></ns1:constraints><ns1:createdBy>Temporary user handle</ns1:createdBy><ns1:creationTime>2017-04-03T16:01:33.115+00:00</ns1:creationTime><ns1:description>Storage policy used as default for Virtual SAN datastores</ns1:description><ns1:generationId>0</ns1:generationId><ns1:lastUpdatedBy>Temporary user handle</ns1:
lastUpdatedBy><ns1:lastUpdatedTime>2017-04-03T16:01:33.115+00:00</ns1:lastUpdatedTime><ns1:name>Virtual SAN Default Storage Policy</ns1:name><ns1:profileId>aa6d5a82-1c88-45da-85d3-3d74b91a5
bad</ns1:profileId></ns1:storageProfile>"
-->                }
-->             }
-->          ],
-->          backing = (vim.vm.device.VirtualDeviceSpec.BackingSpec) null
-->       },

I don't know if this means anything or would cause an InvalidDeviceSpec fault, but I notice that it's trying to create/open the vmdk here:
fileName = "[]/vmfs/volumes/vsan:52b432a675d2bf59-7168e072a499aad9/ee92ee58-ac10-1339-8331-000af7992758/440e33fd7da9085b428489b961d7356364c85f615dd2ad5ddcafb8a95726e765.vmdk".

However, the directory /vmfs/volumes/vsan:52b432a675d2bf59-7168e072a499aad9/ee92ee58-ac10-1339-8331-000af7992758 does not exist on any of the 4 ESX hosts in this cluster (specifically the ee92ee58-ac10-1339-8331-000af7992758 directory in the path).

@hickeng any thoughts?

@jzt jzt added the impact/doc/note Requires creation of or changes to an official release note label Apr 13, 2017
@jzt
Contributor

jzt commented Apr 13, 2017

@stuclem The workaround for this for now is to retry creating the container.
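
For reference, a minimal sketch of that retry workaround in Go. The docker host address and arguments are copied from the failing test command in the original report; the retry count and delay are arbitrary assumptions, not part of VIC or the test suite.

package main

import (
	"fmt"
	"os/exec"
	"time"
)

func main() {
	// Same invocation as the failing scenario step, wrapped in a simple retry loop.
	args := []string{"-H", "10.192.33.129:2376", "--tls", "create", "busybox", "/bin/top"}

	for attempt := 1; attempt <= 3; attempt++ {
		out, err := exec.Command("docker", args...).CombinedOutput()
		if err == nil {
			fmt.Printf("created container: %s", out)
			return
		}
		fmt.Printf("attempt %d failed: %v: %s", attempt, err, out)
		time.Sleep(5 * time.Second)
	}
}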

@hickeng
Member

hickeng commented Apr 13, 2017

@jzt /vmfs/volumes/vsan:52b432a675d2bf59-7168e072a499aad9/ee92ee58-ac10-1339-8331-000af7992758 is the target VM folder - if VM creation failed then it should not have been created (or been cleaned up).

The failing element is the backing.parent.fileName, which refers to the image disk:

[]/vmfs/volumes/vsan:52b432a675d2bf59-7168e072a499aad9/VCH-0-3298/VIC/42076c04-92d3-3439-4084-aab1b55f590c/images/d7f9d170aa714bf287eb994aada10a57f4fb1bf8fd748cbfa36a7ab9549190c1/d7f9d170aa714bf287eb994aada10a57f4fb1bf8fd748cbfa36a7ab9549190c1.vmdk

My concern, just looking at this output, is that it's using the symlink VCH-0-3298 instead of the namespace UUID, and we've had point-in-time issues with the symlinks not being available (way back in Bonneville days).
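
To make the shape of the failing spec easier to see, here is a minimal govmomi sketch (not VIC's actual code) of the kind of DeviceChange entry involved, showing where the rejected property (virtualDeviceSpec.device.backing.parent.fileName) sits and what a UUID-based parent path would look like instead of the symlink-based one. All paths are illustrative placeholders.

package main

import (
	"fmt"

	"github.com/vmware/govmomi/vim25/types"
)

func main() {
	thin := true
	unit := int32(0)

	// Read-only image layer referenced as the parent backing. Using the VM folder
	// UUID here rather than the "VCH-0-3298" display-name symlink is the change
	// speculated about in the next comment.
	parent := &types.VirtualDiskFlatVer2BackingInfo{
		VirtualDeviceFileBackingInfo: types.VirtualDeviceFileBackingInfo{
			FileName: "[vsanDatastore] <vm-folder-uuid>/VIC/<vch-uuid>/images/<layer>/<layer>.vmdk",
		},
	}

	disk := &types.VirtualDisk{
		VirtualDevice: types.VirtualDevice{
			Key:           -10,
			ControllerKey: 100,
			UnitNumber:    &unit,
			Backing: &types.VirtualDiskFlatVer2BackingInfo{
				VirtualDeviceFileBackingInfo: types.VirtualDeviceFileBackingInfo{
					FileName: "[vsanDatastore] <containerVM-folder>/<containerID>.vmdk",
				},
				DiskMode:        string(types.VirtualDiskModePersistent),
				ThinProvisioned: &thin,
				Parent:          parent, // <- backing.parent.fileName is the property the fault points at
			},
		},
		CapacityInKB: 8388608,
	}

	spec := types.VirtualMachineConfigSpec{
		DeviceChange: []types.BaseVirtualDeviceConfigSpec{
			&types.VirtualDeviceConfigSpec{
				Operation:     types.VirtualDeviceConfigSpecOperationAdd,
				FileOperation: types.VirtualDeviceConfigSpecFileOperationCreate,
				Device:        disk,
			},
		},
	}

	fmt.Printf("%+v\n", spec)
}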

@hickeng
Member

hickeng commented Apr 14, 2017

Speculative change: this uses the UUID instead of the symlink for the endpointVM when placing the image store within its namespace. If this works, we may still see similar problems when explicitly using a dedicated namespace, where we don't have an opportunity to perform this translation:

diff --git a/lib/install/management/appliance.go b/lib/install/management/appliance.go
index e50fa0b..7b38eaf 100644
--- a/lib/install/management/appliance.go
+++ b/lib/install/management/appliance.go
@@ -596,8 +596,14 @@ func (d *Dispatcher) createAppliance(conf *config.VirtualContainerHostConfigSpec
        },
        )

+       // fix up those parts of the config that depend on the final applianceVM folder name
        conf.BootstrapImagePath = fmt.Sprintf("[%s] %s/%s", conf.ImageStores[0].Host, d.vmPathName, settings.BootstrapISO)

+       if len(conf.ImageStores[0].Path) == 0 {
+               conf.ImageStores[0].Path = d.vmPathName
+       }
+
+       // apply the fixed-up configuration
        spec, err = d.reconfigureApplianceSpec(vm2, conf, settings)
        if err != nil {
                log.Errorf("Error while getting appliance reconfig spec: %s", err)
@@ -646,12 +652,7 @@ func (d *Dispatcher) reconfigureApplianceSpec(vm *vm.VirtualMachine, conf *confi
        var devices object.VirtualDeviceList
        var err error

-       spec := &types.VirtualMachineConfigSpec{
-               Name:               conf.Name,
-               GuestId:            string(types.VirtualMachineGuestOsIdentifierOtherGuest64),
-               AlternateGuestName: constants.DefaultAltVCHGuestName(),
-               Files:              &types.VirtualMachineFileInfo{VmPathName: fmt.Sprintf("[%s]", conf.ImageStores[0].Host)},
-       }
+       spec := &types.VirtualMachineConfigSpec{}

        // create new devices
        if devices, err = d.configIso(conf, vm, settings); err != nil {
diff --git a/lib/install/validate/storage.go b/lib/install/validate/storage.go
index 9731d88..abf99cc 100644
--- a/lib/install/validate/storage.go
+++ b/lib/install/validate/storage.go
@@ -44,7 +44,7 @@ func (v *Validator) storage(ctx context.Context, input *data.Data, conf *config.

        // provide a default path if only a DS name is provided
        if imageDSpath.Path == "" {
-               imageDSpath.Path = input.DisplayName
+               log.Debug("No path specified on datastore for image store - will use the endpointVM folder")
        }

        if ds != nil {

Can confirm on @caglar10ur's nested vSAN env that we get the imagestore referenced as expected:

Apr 14 2017 00:46:39.377Z INFO  Creating directory [vsanDatastore] 471bf058-fc51-4ddc-7dbc-000c292f6695/VIC/4211fa85-e4fb-c5bf-d176-afda08a8203e/images
Apr 14 2017 00:46:39.531Z INFO  Creating directory [vsanDatastore] 471bf058-fc51-4ddc-7dbc-000c292f6695/VIC/4211fa85-e4fb-c5bf-d176-afda08a8203e/images/scratch
Apr 14 2017 00:46:39.606Z INFO  Creating directory [vsanDatastore] 471bf058-fc51-4ddc-7dbc-000c292f6695/VIC/4211fa85-e4fb-c5bf-d176-afda08a8203e/images/scratch/imageMetadata

and that the resulting VMDK descriptor for a containerVM is using the UUID path:

# Disk DescriptorFile
version=4
encoding="UTF-8"
CID=6d3177d6
parentCID=fd538d44
isNativeSnapshot="no"
createType="vsanSparse"
parentFileNameHint="/vmfs/volumes/vsan:52d98be77a5c1366-73a9f79cb0275983/471bf058-fc51-4ddc-7dbc-000c292f6695/VIC/4211fa85-e4fb-c5bf-d176-afda08a8203e/images/d7f9d170aa714bf287eb994aada10a57f4fb1bf8fd748cbfa36a7a

@hickeng
Member

hickeng commented Apr 14, 2017

From flanders hostd.1.gz, the most likely key message from the host is "Parent file doesn't exist, or the user has no permission to access.":

2017-04-12T20:50:29.948Z error hostd[F340B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/vsan:52b432a675d2bf59-7168e072a499aad9/ee92ee58-ac10-1339-8331-000af7992758/vibrant_brahmagupta-440e33fd7da9.vmx opID=53f712a3-01-e8-1753 user=vpxuser:VSPHERE.LOCAL\Administrator] Parent file doesn't exist, or the user has no permission to access.

I have been able to find the following for what I assume is the creation of the parent disk, also in hostd.1:

2017-04-12T20:48:51.303Z info hostd[FBC3B70] [Originator@6876 sub=Libs opID=c686385-66-1682 user=vpxuser:VSPHERE.LOCAL\Administrator] OBJLIB-FILEBE : FileBEOpen: can't open '/vmfs/volumes/vsan:52b432a675d2bf59-7168e072a499aad9/f091ee58-4636-1023-3906-000af7994a5c/VIC/42076c04-92d3-3439-4084-aab1b55f590c/images/d7f9d170aa714bf287eb994aada10a57f4fb1bf8fd748cbfa36a7ab9549190c1/d7f9d170aa714bf287eb994aada10a57f4fb1bf8fd748cbfa36a7ab9549190c1.vmdk' : Could not find the file (393218).
2017-04-12T20:48:51.303Z info hostd[FBC3B70] [Originator@6876 sub=DiskLib opID=c686385-66-1682 user=vpxuser:VSPHERE.LOCAL\Administrator] DISKLIB-DSCPTR: DescriptorOpenInt: failed to open '/vmfs/volumes/vsan:52b432a675d2bf59-7168e072a499aad9/f091ee58-4636-1023-3906-000af7994a5c/VIC/42076c04-92d3-3439-4084-aab1b55f590c/images/d7f9d170aa714bf287eb994aada10a57f4fb1bf8fd748cbfa36a7ab9549190c1/d7f9d170aa714bf287eb994aada10a57f4fb1bf8fd748cbfa36a7ab9549190c1.vmdk': Could not find the file (60002)
2017-04-12T20:48:51.303Z info hostd[FBC3B70] [Originator@6876 sub=DiskLib opID=c686385-66-1682 user=vpxuser:VSPHERE.LOCAL\Administrator] DISKLIB-LINK  : "/vmfs/volumes/vsan:52b432a675d2bf59-7168e072a499aad9/f091ee58-4636-1023-3906-000af7994a5c/VIC/42076c04-92d3-3439-4084-aab1b55f590c/images/d7f9d170aa714bf287eb994aada10a57f4fb1bf8fd748cbfa36a7ab9549190c1/d7f9d170aa714bf287eb994aada10a57f4fb1bf8fd748cbfa36a7ab9549190c1.vmdk" : failed to open (The system cannot find the file specified).
2017-04-12T20:48:51.303Z info hostd[FBC3B70] [Originator@6876 sub=DiskLib opID=c686385-66-1682 user=vpxuser:VSPHERE.LOCAL\Administrator] DISKLIB-CHAIN : "/vmfs/volumes/vsan:52b432a675d2bf59-7168e072a499aad9/f091ee58-4636-1023-3906-000af7994a5c/VIC/42076c04-92d3-3439-4084-aab1b55f590c/images/d7f9d170aa714bf287eb994aada10a57f4fb1bf8fd748cbfa36a7ab9549190c1/d7f9d170aa714bf287eb994aada10a57f4fb1bf8fd748cbfa36a7ab9549190c1.vmdk" : failed to open (The system cannot find the file specified).
2017-04-12T20:48:51.304Z info hostd[FBC3B70] [Originator@6876 sub=DiskLib opID=c686385-66-1682 user=vpxuser:VSPHERE.LOCAL\Administrator] DISKLIB-LIB   : Failed to open '/vmfs/volumes/vsan:52b432a675d2bf59-7168e072a499aad9/f091ee58-4636-1023-3906-000af7994a5c/VIC/42076c04-92d3-3439-4084-aab1b55f590c/images/d7f9d170aa714bf287eb994aada10a57f4fb1bf8fd748cbfa36a7ab9549190c1/d7f9d170aa714bf287eb994aada10a57f4fb1bf8fd748cbfa36a7ab9549190c1.vmdk' with flags 0x180017 The system cannot find the file specified (25).
2017-04-12T20:48:51.308Z info hostd[FBC3B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/vsan:52b432a675d2bf59-7168e072a499aad9/f091ee58-4636-1023-3906-000af7994a5c/VCH-0-3298.vmx opID=c686385-66-1682 user=vpxuser:VSPHERE.LOCAL\Administrator] Storage policy for disk '/vmfs/volumes/vsan:52b432a675d2bf59-7168e072a499aad9/VCH-0-3298/VIC/42076c04-92d3-3439-4084-aab1b55f590c/images/d7f9d170aa714bf287eb994aada10a57f4fb1bf8fd748cbfa36a7ab9549190c1/d7f9d170aa714bf287eb994aada10a57f4fb1bf8fd748cbfa36a7ab9549190c1.vmdk' specified: '<ns1:storageProfile xmlns:ns1="http://profile.policy.data.vasa.vim.vmware.com/xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="ns1:StorageProfile"><ns1:constraints><ns1:subProfiles><ns1:capability><capabilityId xmlns="http://capability.policy.data.vasa.vim.vmware.com/xsd"><id>stripeWidth</id><namespace>VSAN</namespace>
</capabilityId><constraint xmlns="http://capability.policy.data.vasa.vim.vmware.com/xsd"><propertyInstance><id>stripeWidth</id><value xmlns:s843="http://www.w3.org/2001/XMLSchema" xsi:type="s843:int">1</value></propertyInstance></constraint></ns1:capability><ns1:capability><capabilityId xmlns="http://capability.policy.data.vasa.vim.vmware.com/xsd"><id>hostFailuresToTolerate</id><namespace>VSAN</namespace></capabilityId><constraint xmlns="http://capability.policy.data.vasa.vim.vmware.com/xsd"><propertyInstance><id>hostFailuresToTolerate</id><value xsi:type="s843:int">1</value></propertyInstance></constraint></ns1:capability><ns1:capability><capabilityId xmlns="http://capability.policy.data.vasa.vim.vmware.com/xsd"><id>cacheReservation</id><namespace>VSAN</namespace></capabilityId><constraint xmlns="http://capability.policy.data.vasa.vim.vmware.com/xsd"><propertyInstance><id>cacheReser
vation</id><value xsi:type="s843:int">0</value></propertyInstance></constraint></ns1:capability><ns1:capability><capabilityId xmlns="http://capability.policy.data.vasa.vim.vmware.com/xsd"><id>proportionalCapacity</id><namespace>VSAN</namespace></capabilityId><constraint xmlns="http://capability.policy.data.vasa.vim.vmware.com/xsd"><propertyInstance><id>proportionalCapacity</id><value xsi:type="s843:int">0</value></propertyInstance></constraint></ns1:capability><ns1:capability><capabilityId xmlns="http://capability.policy.data.vasa.vim.vmware.com/xsd"><id>forceProvisioning</id><namespace>VSAN</namespace></capabilityId><constraint xmlns="http://capability.policy.data.vasa.vim.vmware.com/xsd"><propertyInstance><id>forceProvisioning</id><value xsi:type="s843:boolean">false</value></propertyInstance></constraint></ns1:capability><ns1:name>VSAN sub-profile</ns1:name></ns1:subProfiles></ns1
:constraints><ns1:createdBy>Temporary user handle</ns1:createdBy><ns1:creationTime>2017-04-03T16:01:33.115+00:00</ns1:creationTime><ns1:description>Storage policy used as default for Virtual SAN datastores</ns1:description><ns1:generationId>0</ns1:generationId><ns1:lastUpdatedBy>Temporary user handle</ns1:lastUpdatedBy><ns1:lastUpdatedTime>2017-04-03T16:01:33.115+00:00</ns1:lastUpdatedTime><ns1:name>Virtual SAN Default Storage Policy</ns1:name><ns1:profileId>aa6d5a82-1c88-45da-85d3-3d74b91a5bad</ns1:profileId></ns1:storageProfile>'
2017-04-12T20:48:51.317Z error hostd[FBC3B70] [Originator@6876 sub=Default opID=c686385-66-1682 user=vpxuser:VSPHERE.LOCAL\Administrator] Create Child using storage policy '<ns1:storageProfile xmlns:ns1="http://profile.policy.data.vasa.vim.vmware.com/xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="ns1:StorageProfile"><ns1:constraints><ns1:subProfiles><ns1:capability><capabilityId xmlns="http://capability.policy.data.vasa.vim.vmware.com/xsd"><id>stripeWidth</id><namespace>VSAN</namespace></capabilityId><constraint xmlns="http://capability.policy.data.vasa.vim.vmware.com/xsd"><propertyInstance><id>stripeWidth</id><value xmlns:s843="http://www.w3.org/2001/XMLSchema" xsi:type="s843:int">1</value></propertyInstance></constraint></ns1:capability><ns1:capability><capabilityId xmlns="http://capability.policy.data.vasa.vim.vmware.com/xsd"><id>h
ostFailuresToTolerate</id><namespace>VSAN</namespace></capabilityId><constraint xmlns="http://capability.policy.data.vasa.vim.vmware.com/xsd"><propertyInstance><id>hostFailuresToTolerate</id><value xsi:type="s843:int">1</value></propertyInstance></constraint></ns1:capability><ns1:capability><capabilityId xmlns="http://capability.policy.data.vasa.vim.vmware.com/xsd"><id>cacheReservation</id><namespace>VSAN</namespace></capabilityId><constraint xmlns="http://capability.policy.data.vasa.vim.vmware.com/xsd"><propertyInstance><id>cacheReservation</id><value xsi:type="s843:int">0</value></propertyInstance></constraint></ns1:capability><ns1:capability><capabilityId xmlns="http://capability.policy.data.vasa.vim.vmware.com/xsd"><id>proportionalCapacity</id><namespace>VSAN</namespace></capabilityId><constraint xmlns="http://capability.policy.data.vasa.vim.vmware.com/xsd"><propertyInstance><id>
proportionalCapacity</id><value xsi:type="s843:int">0</value></propertyInstance></constraint></ns1:capability><ns1:capability><capabilityId xmlns="http://capability.policy.data.vasa.vim.vmware.com/xsd"><id>forceProvisioning</id><namespace>VSAN</namespace></capabilityId><constraint xmlns="http://capability.policy.data.vasa.vim.vmware.com/xsd"><propertyInstance><id>forceProvisioning</id><value xsi:type="s843:boolean">false</value></propertyInstance></constraint></ns1:capability><ns1:name>VSAN sub-profile</ns1:name></ns1:subProfiles></ns1:constraints><ns1:createdBy>Temporary user handle</ns1:createdBy><ns1:creationTime>2017-04-03T16:01:33.115+00:00</ns1:creationTime><ns1:description>Storage policy used as default for Virtual SAN datastores</ns1:description><ns1:generationId>0</ns1:generationId><ns1:lastUpdatedBy>Temporary user handle</ns1:lastUpdatedBy><ns1:lastUpdatedTime>2017-04-03T16
:01:33.115+00:00</ns1:lastUpdatedTime><ns1:name>Virtual SAN Default Storage Policy</ns1:name><ns1:profileId>aa6d5a82-1c88-45da-85d3-3d74b91a5bad</ns1:profileId></ns1:storageProfile>' for '/vmfs/volumes/vsan:52b432a675d2bf59-7168e072a499aad9/VCH-0-3298/VIC/42076c04-92d3-3439-4084-aab1b55f590c/images/d7f9d170aa714bf287eb994aada10a57f4fb1bf8fd748cbfa36a7ab9549190c1/d7f9d170aa714bf287eb994aada10a57f4fb1bf8fd748cbfa36a7ab9549190c1.vmdk'
2017-04-12T20:48:51.319Z info hostd[FBC3B70] [Originator@6876 sub=DiskLib opID=c686385-66-1682 user=vpxuser:VSPHERE.LOCAL\Administrator] DISKLIB-LIB_CREATE   : CREATE CHILD: "/vmfs/volumes/vsan:52b432a675d2bf59-7168e072a499aad9/VCH-0-3298/VIC/42076c04-92d3-3439-4084-aab1b55f590c/images/d7f9d170aa714bf287eb994aada10a57f4fb1bf8fd748cbfa36a7ab9549190c1/d7f9d170aa714bf287eb994aada10a57f4fb1bf8fd748cbfa36a7ab9549190c1.vmdk" -- vmfsSparse cowGran=1 allocType=0 policy='<ns1:storageProfile xmlns:ns1="http://profile.policy.data.vasa.vim.vmware.com/xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="ns1:StorageProfile"><ns1:constraints><ns1:subProfiles><ns1:capability><capabilityId xmlns="http://capability.policy.data.vasa.vim.vmware.com/xsd"><id>stripeWidth</id><namespace>VSAN</namespace></capabilityId><constraint xmlns="http://capability.policy.da
ta.vasa.vim.vmware.com/xsd"><propertyInstance><id>stripeWidth</id><value xmlns:s843="http://www.w3.org/2001/XMLSchema" xsi:type="s843:int">1</value></propertyInstance></constraint></ns1:capability><ns1:capability><capabilityId xmlns="http://capability.policy.data.vasa.vim.vmware.com/xsd"><id>hostFailuresToTolerate</id><namespace>VSAN</namespace></capabilityId><constraint xmlns="http://capability.policy.data.vasa.vim.vmware.com/xsd"><propertyInstance><id>hostFailuresToTolerate</id><value xsi:type="s843:int">1</value></propertyInstance></constraint></ns1:capability><ns1:capability><capabilityId xmlns="http://capability.policy.data.vasa.vim.vmware.com/xsd"><id>cacheReservation</id><namespace>VSAN</namespace></capabilityId><constraint xmlns="http://capability.policy.data.vasa.vim.vmware.com/xsd"><propertyInstance><id>cacheReservation</id><value xsi:type="s843:int">0</value></propertyInst
ance></constraint></ns1:capability><ns1:capability><capabilityId xmlns="http://capability.policy.data.vasa.vim.vmware.com/xsd"><id>proportionalCapacity</id><namespace>VSAN</namespace></capabilityId><constraint xmlns="http://capability.policy.data.vasa.vim.vmware.com/xsd"><propertyInstance><id>proportionalCapacity</id><value xsi:type="s843:int">0</value></propertyInstance></constraint></ns1:capability><ns1:capability><capabilityId xmlns="http://capability.policy.data.vasa.vim.vmware.com/xsd"><id>forceProvisioning</id><namespace>VSAN</namespace></capabilityId><constraint xmlns="http://capability.policy.data.vasa.vim.vmware.com/xsd"><propertyInstance><id>forceProvisioning</id><value xsi:type="s843:boolean">false</value></propertyInstance></constraint></ns1:capability><ns1:name>VSAN sub-profile</ns1:name></ns1:subProfiles></ns1:constraints><ns1:createdBy>Temporary user handle</ns1:create
dBy><ns1:creationTime>2017-04-03T16:01:33.115+00:00</ns1:creationTime><ns1:description>Storage policy used as default for Virtual SAN datastores</ns1:description><ns1:generationId>0</ns1:generationId><ns1:lastUpdatedBy>Temporary user handle</ns1:lastUpdatedBy><ns1:lastUpdatedTime>2017-04-03T16:01:33.115+00:00</ns1:lastUpdatedTime><ns1:name>Virtual SAN Default Storage Policy</ns1:name><ns1:profileId>aa6d5a82-1c88-45da-85d3-3d74b91a5bad</ns1:profileId></ns1:storageProfile>'
2017-04-12T20:48:51.326Z info hostd[FBC3B70] [Originator@6876 sub=Libs opID=c686385-66-1682 user=vpxuser:VSPHERE.LOCAL\Administrator] OBJLIB-LIB:  ObjLib_SupportsNativeSparse: Parent disk supports native sparse. Parent object BE is 'vsan'. Child path is '/vmfs/volumes/vsan:52b432a675d2bf59-7168e072a499aad9/VCH-0-3298/VIC/42076c04-92d3-3439-4084-aab1b55f590c/images/d7f9d170aa714bf287eb994aada10a57f4fb1bf8fd748cbfa36a7ab9549190c1/d7f9d170aa714bf287eb994aada10a57f4fb1bf8fd748cbfa36a7ab9549190c1.vmdk'
2017-04-12T20:48:51.331Z vsanSoapServer: VSAN_SPBMToVsanExpressionString:628:Emitted Vsan Expr: (("spbmProfileId" 'aa6d5a82-1c88-45da-85d3-3d74b91a5bad' )("spbmProfileGenerationNumber" l+0 )("spbmProfileName" 'Virtual SAN Default Storage Policy' )("stripeWidth" i1  ) ("hostFailuresToTolerate" i1  ) ("cacheReservation" i0  ) ("proportionalCapacity" i0  ) ("forceProvisioning" i0  )  )
2017-04-12T20:48:51.364Z info hostd[FBC3B70] [Originator@6876 sub=Libs opID=c686385-66-1682 user=vpxuser:VSPHERE.LOCAL\Administrator] OBJLIB-VSANOBJ: VsanObjCreateLockFile: Created lock file for object 'b392ee58-dba7-c528-bcdb-000af7992758'
2017-04-12T20:48:51.365Z info hostd[FBC3B70] [Originator@6876 sub=Libs opID=c686385-66-1682 user=vpxuser:VSPHERE.LOCAL\Administrator] OBJLIB-VSANOBJ: VsanObjCreate: Vsan object created with UUID: b392ee58-dba7-c528-bcdb-000af7992758
2017-04-12T20:48:51.519Z verbose hostd[C4D09D0] [Originator@6876 sub=PropertyProvider opID=c686385-66-1682 user=vpxuser:VSPHERE.LOCAL\Administrator] RecordOp ASSIGN: childConfiguration, pool675. Sent notification immediately.
2017-04-12T20:48:51.519Z verbose hostd[C4D09D0] [Originator@6876 sub=PropertyProvider opID=c686385-66-1682 user=vpxuser:VSPHERE.LOCAL\Administrator] RecordOp ASSIGN: vm["2244"], pool675. Sent notification immediately.
2017-04-12T20:48:51.582Z info hostd[FBC3B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/vsan:52b432a675d2bf59-7168e072a499aad9/f091ee58-4636-1023-3906-000af7994a5c/VCH-0-3298.vmx opID=c686385-66-1682 user=vpxuser:VSPHERE.LOCAL\Administrator] State Transition (VM_STATE_RECONFIGURING -> VM_STATE_ON)
2017-04-12T20:48:51.582Z verbose hostd[FBC3B70] [Originator@6876 sub=PropertyProvider opID=c686385-66-1682 user=vpxuser:VSPHERE.LOCAL\Administrator] RecordOp ASSIGN: disabledMethod, 2244. Sent notification immediately.
2017-04-12T20:48:51.582Z info hostd[11180B70] [Originator@6876 sub=Guestsvc.GuestFileTransfer] Entered VmPowerStateListener
2017-04-12T20:48:51.582Z info hostd[11180B70] [Originator@6876 sub=Guestsvc.GuestFileTransfer] VmPowerStateListener succeeded
2017-04-12T20:48:51.582Z info hostd[11180B70] [Originator@6876 sub=Hbrsvc] Replicator: powerstate change VM: 2244 Old: 1 New: 1
2017-04-12T20:48:51.582Z verbose hostd[FBC3B70] [Originator@6876 sub=PropertyProvider opID=c686385-66-1682 user=vpxuser:VSPHERE.LOCAL\Administrator] RecordOp ASSIGN: runtime.powerState, 2244. Sent notification immediately.
2017-04-12T20:48:51.582Z verbose hostd[FBC3B70] [Originator@6876 sub=PropertyProvider opID=c686385-66-1682 user=vpxuser:VSPHERE.LOCAL\Administrator] RecordOp ASSIGN: runtime.device, 2244. Sent notification immediately.
2017-04-12T20:48:51.582Z verbose hostd[FBC3B70] [Originator@6876 sub=PropertyProvider opID=c686385-66-1682 user=vpxuser:VSPHERE.LOCAL\Administrator] RecordOp ASSIGN: summary.runtime.powerState, 2244. Sent notification immediately.
2017-04-12T20:48:51.582Z verbose hostd[FBC3B70] [Originator@6876 sub=PropertyProvider opID=c686385-66-1682 user=vpxuser:VSPHERE.LOCAL\Administrator] RecordOp ASSIGN: summary.runtime.vFlashCacheAllocation, 2244. Sent notification immediately.
2017-04-12T20:48:51.582Z verbose hostd[FBC3B70] [Originator@6876 sub=PropertyProvider opID=c686385-66-1682 user=vpxuser:VSPHERE.LOCAL\Administrator] RecordOp ASSIGN: runtime.vFlashCacheAllocation, 2244. Sent notification immediately.
2017-04-12T20:48:51.583Z verbose hostd[FBC3B70] [Originator@6876 sub=PropertyProvider opID=c686385-66-1682 user=vpxuser:VSPHERE.LOCAL\Administrator] RecordOp REMOVE: latestPage[41369], session[52999898-0f15-be2f-0a54-9403da99bb2a]52354ea4-09a5-dae3-6d47-dc48b4ca5968. Applied change to temp map.
2017-04-12T20:48:51.583Z verbose hostd[FBC3B70] [Originator@6876 sub=PropertyProvider opID=c686385-66-1682 user=vpxuser:VSPHERE.LOCAL\Administrator] RecordOp ADD: latestPage[41379], session[52999898-0f15-be2f-0a54-9403da99bb2a]52354ea4-09a5-dae3-6d47-dc48b4ca5968. Applied change to temp map.
2017-04-12T20:48:51.583Z verbose hostd[FBC3B70] [Originator@6876 sub=PropertyProvider opID=c686385-66-1682 user=vpxuser:VSPHERE.LOCAL\Administrator] RecordOp ASSIGN: latestEvent, ha-eventmgr. Applied change to temp map.
2017-04-12T20:48:51.583Z info hostd[FBC3B70] [Originator@6876 sub=Vimsvc.ha-eventmgr opID=c686385-66-1682 user=vpxuser:VSPHERE.LOCAL\Administrator] Event 41379 : Reconfigured VCH-0-3298 on flanders.eng.vmware.com in ha-datacenter.

It seems we are also unregistering a VM that held a reference to it a little while before the failure:

2017-04-12T20:48:59.216Z info hostd[CEE0B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/vsan:52b432a675d2bf59-7168e072a499aad9/b692ee58-e953-eebb-8ff3-000af7992758/serene_goldwasser-96ae8c69b88f.vmx] Checking for all objects accessibility (VM's current state: VM_STATE_OFF, stable? true)
2017-04-12T20:48:59.216Z info hostd[CEE0B70] [Originator@6876 sub=VmObjectStorageMonitor] Object UUID 5192ee58-d7ee-1a94-fbf6-000af7992758 APD state: Healthy
2017-04-12T20:48:59.216Z info hostd[CEE0B70] [Originator@6876 sub=VmObjectStorageMonitor] Object UUID af92ee58-2f99-eb67-0871-000af7992758 APD state: Healthy
2017-04-12T20:48:59.216Z info hostd[CEE0B70] [Originator@6876 sub=VmObjectStorageMonitor] Object UUID b392ee58-dba7-c528-bcdb-000af7992758 APD state: Healthy
2017-04-12T20:48:59.216Z info hostd[CEE0B70] [Originator@6876 sub=VmObjectStorageMonitor] Object UUID b692ee58-e953-eebb-8ff3-000af7992758 APD state: Healthy
2017-04-12T20:48:59.216Z info hostd[CEE0B70] [Originator@6876 sub=VmObjectStorageMonitor] Object UUID b892ee58-e37d-09d6-a8da-000af7992758 APD state: Healthy

@hickeng
Member

hickeng commented Apr 14, 2017

Uploading hostd logs from all hosts for the time period in question - only flanders and krusty seem involved with b392ee58-dba7-c528-bcdb-000af7992758

hostd.logs..zip

@hickeng
Member

hickeng commented Apr 14, 2017

In case it's useful, I've been using the following to map times:

vpxd.log:
2017-04-12T20:48:56.172Z verbose vpxa[7826B70] [Originator@6876 sub=vpxaMoVm opID=WFU-6382cc78] [VpxaMoVm] Created VM: serene_goldwasser-96ae8c69b88f (/vmfs/volumes/vsan:52b432a675d2bf59-7168e072a499aad9/b692ee58-e953-eebb-8ff3-000af7992758/serene_goldwasser-96ae8c69b88f.vmx). State: 0, FTState: 0

krusty:
	goldwasser created:
		2017-04-12T20:52:01.157Z info hostd[2A9A8B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/vsan:52b432a675d2bf59-7168e072a499aad9/b692ee58-e953-eebb-8ff3-000af7992758/serene_goldwasser-96ae8c69b88f.vmx opID=1545f442-01-2d-fdfa user=vpxuser:VSPHERE.LOCAL\Administrator] Initialized virtual machine.
		2017-04-12T20:52:01.253Z info hostd[2A9A8B70] [Originator@6876 sub=Vimsvc.TaskManager opID=1545f442-01-2d-fdfa user=vpxuser:VSPHERE.LOCAL\Administrator] Task Completed : haTask-ha-folder-vm-vim.Folder.registerVm-174008676 Status success

	goldwasser power on:
		2017-04-12T20:52:05.618Z info hostd[2ACD8B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/vsan:52b432a675d2bf59-7168e072a499aad9/b692ee58-e953-eebb-8ff3-000af7992758/serene_goldwasser-96ae8c69b88f.vmx opID=1545f442-02-bb-fe37 user=vpxuser:VSPHERE.LOCAL\Administrator] State Transition (VM_STATE_OFF -> VM_STATE_POWERING_ON)
	
	delete goldwasser:
		2017-04-12T20:52:36.731Z info hostd[2AE81B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/vsan:52b432a675d2bf59-7168e072a499aad9/b692ee58-e953-eebb-8ff3-000af7992758/serene_goldwasser-96ae8c69b88f.vmx opID=4543efc3-20-fef7 user=vpxuser:VSPHERE.LOCAL\Administrator] State Transition (VM_STATE_DELETING -> VM_STATE_GONE)

vpxd to krusty time offset:
20:48:56 -> 20:52:01 = 3m5s

vpxd.log:
	create vibrant_brahmagupta directory:
		2017-04-12T20:49:50.606Z info vpxa[77C3B70] [Originator@6876 sub=vpxLro opID=53f712a3-01-c1] [VpxLRO] -- BEGIN lro-221968 -- vpxa -- vpxapi.VpxaService.reserveName -- 52634330-6b1b-ecde-ab69-a7cfdb4d387b
		2017-04-12T20:49:50.607Z verbose vpxa[77C3B70] [Originator@6876 sub=vpxaNameReserve opID=53f712a3-01-c1] [ReserveFileWithGivenRootDir] Directory /vmfs/volumes/vsan:52b432a675d2bf59-7168e072a499aad9 is available for use
		2017-04-12T20:49:50.607Z info vpxa[77C3B70] [Originator@6876 sub=hostdds opID=53f712a3-01-c1] [VpxaHalDSHostagent::CreateDirectory] root directory: '/vmfs/volumes/vsan:52b432a675d2bf59-7168e072a499aad9/', display name: 'vibrant_brahmagupta-440e33fd7da9'

flanders
	create directory:
		2017-04-12T20:49:50.609Z info hostd[FB01B70] [Originator@6876 sub=Vimsvc.TaskManager opID=53f712a3-01-c1-1730 user=vpxuser:vpxuser] Task Created : haTask--vim.DatastoreNamespaceManager.CreateDirectory-200914565
		2017-04-12T20:49:50.609Z verbose hostd[FB01B70] [Originator@6876 sub=PropertyProvider opID=53f712a3-01-c1-1730 user=vpxuser:vpxuser] RecordOp ADD: recentTask["haTask--vim.DatastoreNamespaceManager.CreateDirectory-200914565"], ha-taskmgr. Applied change to temp map.
		2017-04-12T20:49:50.609Z verbose hostd[EC40B70] [Originator@6876 sub=PropertyProvider opID=53f712a3-01-c1-1730 user=vpxuser:vpxuser] RecordOp ASSIGN: info, haTask--vim.DatastoreNamespaceManager.CreateDirectory-200914565. Applied change to temp map.
		2017-04-12T20:49:50.609Z info hostd[EC40B70] [Originator@6876 sub=Hostsvc.DatastoreNamespaceManager opID=53f712a3-01-c1-1730 user=vpxuser:vpxuser] DatastoreNamespaceManagerImpl::CreateDirectory (displayName: 'vibrant_brahmagupta-440e33fd7da9') (friendlyPath: '/vmfs/volumes/vsan:52b432a675d2bf59-7168e072a499aad9/vibrant_brahmagupta-440e33fd7da9')

vpxd to flanders time offset:
20:49:50 -> 20:49:50 = 0s

Time of interest -> enoexist (flanders):
2017-04-12T20:50:29.948Z

Time of interest on krusty:
20:50:29 + 3m5s = 20:53:34
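
A small helper for applying that mapping, assuming the per-host clock offsets stay constant over the window of interest and all timestamps are same-day UTC:

package main

import (
	"fmt"
	"time"
)

func mustParse(s string) time.Time {
	t, err := time.Parse("15:04:05", s)
	if err != nil {
		panic(err)
	}
	return t
}

func main() {
	// vpxd -> krusty offset derived from the serene_goldwasser creation: 3m5s
	offset := mustParse("20:52:01").Sub(mustParse("20:48:56"))

	// Time of interest (the enoexist on flanders); flanders is at 0s offset to vpxd.
	interest := mustParse("20:50:29")

	fmt.Println("look on krusty around", interest.Add(offset).Format("15:04:05"))
	// prints: look on krusty around 20:53:34
}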

@caglar10ur
Contributor

My nested vSAN env is using 6.5, if it matters.

@stuclem
Contributor

stuclem commented Apr 17, 2017

@jzt thanks for providing the workaround. If you could also provide me a summary of the problem I'd be most grateful, as I have no clue what this issue is about!

@jzt
Contributor

jzt commented Apr 17, 2017

@stuclem The main gist of this issue is that, when attempting to create a VMDK for the RW layer of a container during container create, there exists a scenario in which the parent VMDK cannot be accessed or located, resulting in an InvalidDeviceSpec fault. It appears to be related to vSAN; perhaps the child-most layer VMDK in the image is not presented in time immediately after the pull, when the create is attempted?

@stuclem
Contributor

stuclem commented Apr 20, 2017

Thanks @jzt. Here's a belated attempt at a write up:


  • Running docker create results in InvalidDeviceSpec. #4666
    When attempting to create a VMDK for the read-write layer of a container during docker create, the parent VMDK sometimes cannot be accessed or located, resulting in an InvalidDeviceSpec fault. This is possibly related to vSAN storage, if the child-most layer VMDK in the image is not presented in time immediately after pull when you attempt container create.

    Workaround: Attempt to create the container again.


Is this OK? Thanks!

@jzt
Contributor

jzt commented Apr 20, 2017

@hickeng what do you think?

@hickeng
Member

hickeng commented Apr 20, 2017

@stuclem I don't think we should add the speculation. How about:

When attempting to create a VMDK for the read-write layer of a container during docker create, the parent VMDK sometimes cannot be accessed or located, resulting in an InvalidDeviceSpec fault. This is specific to vSAN datastores.

@stuclem
Contributor

stuclem commented Apr 25, 2017

@hickeng done!

@stuclem stuclem removed the impact/doc/note Requires creation of or changes to an official release note label Apr 25, 2017
@jzt jzt self-assigned this Apr 26, 2017
@jzt jzt added this to the Sprint 8 milestone Apr 26, 2017
@rogeliosanchez
Contributor

Seen in build 9992:
5-4-High-Availability.zip

@jzt
Contributor

jzt commented Oct 2, 2017

@mhagen-vmware
Contributor

Longevity failure:
container-logs.zip

Oct 11 2017 20:03:13.139Z ERROR op=293.17: unexpected fault on task retry: &types.InvalidDeviceSpec{InvalidVmConfig:types.InvalidVmConfig{VmConfigFault:types.VmConfigFault{VimFault:types.VimFault{MethodFault:types.MethodFault{FaultCause:(*types.LocalizedMethodFault)(nil), FaultMessage:[]types.LocalizableMessage(nil)}}}, Property:"virtualDeviceSpec.device.backing.parent.fileName"}, DeviceIndex:1}
Oct 11 2017 20:03:13.139Z ERROR An error occurred while waiting for a creation operation to complete. Spec was {DynamicData:{} ChangeVersion: Name:peaceful_curran-853ba10d81e1 Version: Uuid:e93e15ed-68ee-5c33-9b82-0f533bdfbcb1 InstanceUuid: NpivNodeWorldWideName:[] NpivPortWorldWideName:[] NpivWorldWideNameType: NpivDesiredNodeWwns:0 NpivDesiredPortWwns:0 NpivTemporaryDisabled:<nil> NpivOnNonRdmDisks:<nil> NpivWorldWideNameOp: LocationId: GuestId:otherGuest64 AlternateGuestName:Photon - Container v1.2.0-rc1, 14003, fbd0344 Annotation: Files:0xc4207725f0 Tools:<nil> Flags:<nil> ConsolePreferences:<nil> PowerOpInfo:<nil> NumCPUs:2 NumCoresPerSocket:0 MemoryMB:2048 MemoryHotAddEnabled:0xc420c21660 CpuHotAddEnabled:0xc420c21660 CpuHotRemoveEnabled:<nil> VirtualICH7MPresent:<nil> VirtualSMCPresent:<nil> DeviceChange:[0xc420772640 0xc420772690 0xc4207726e0 0xc420772820 0xc420772be0 0xc420772cd0 0xc420772eb0 0xc420772ff0] CpuAllocation:<nil> MemoryAllocation:<nil> LatencySensitivity:<nil> CpuAffinity:<nil> MemoryAffinity:<nil> NetworkShaper:<nil> CpuFeatureMask:[] ExtraConfig:[0xc4207d1da0 0xc4207d1dc0 0xc4207d1de0 0xc4207d1e20 0xc4207d1e40 0xc4207d1e80 0xc4207d1ea0 0xc4207d1ec0 0xc4207d1ee0 0xc4207d1f00 0xc4207daca0 0xc4207dacc0 0xc4207dace0 0xc4207dad00 0xc4207dad40 0xc4207dad80 0xc4207dada0 0xc4207dadc0 0xc4207dae00 0xc4207dae20 0xc4207dae40 0xc4207dae60 0xc4207dae80 0xc4207daec0 0xc4207daee0 0xc4207daf00 0xc4207daf20 0xc4207daf60 0xc4207dafa0 0xc4207dafc0 0xc4206e0580 0xc4206e05a0 0xc4206e05e0 0xc4206e0600 0xc4206e0620 0xc4206e0640 0xc4206e0660 0xc4206e0680 0xc4206e06a0 0xc4206e06c0 0xc4206e06e0 0xc4206e0700 0xc4206e0720 0xc4206e0740 0xc4206e0760 0xc4206e0780 0xc4206e07a0 0xc4206e07c0 0xc4206e07e0 0xc4206e0820 0xc4206e0840 0xc4206e0880 0xc4206e08a0 0xc4206e08c0 0xc4206e0900 0xc4206e0920 0xc4206e0940 0xc4206e0960 0xc4206e0980 0xc4206e09c0 0xc4206e09e0 0xc4206e0a00 0xc4206e0a20 0xc4206e0a40 0xc4206e0a80 0xc4206e0aa0 0xc4206e0b00 0xc4206e0b20 0xc4206e0b40 0xc4206e0b80 0xc4206e0ba0 0xc4206e0bc0 0xc4206e0be0 0xc4206e0c00 0xc4206e0c20 0xc4206e0c60 0xc4206e0c80 0xc4206e0ca0 0xc4206e0cc0 0xc4206e0d00 0xc4206e0d20 0xc4206e0d40 0xc4206e0da0 0xc4206e0dc0 0xc4206e0de0 0xc4206e0e00 0xc4206e0e20 0xc4206e0e60 0xc4206e0e80 0xc4206e0ea0 0xc4206e0ec0 0xc4206e0f00 0xc4206e0f20 0xc4206e0f40] SwapPlacement: BootOptions:<nil> VAppConfig:<nil> FtInfo:<nil> RepConfig:<nil> VAppConfigRemoved:<nil> VAssertsEnabled:<nil> ChangeTrackingEnabled:<nil> Firmware: MaxMksConnections:0 GuestAutoLockEnabled:<nil> ManagedBy:<nil> MemoryReservationLockedToMax:<nil> NestedHVEnabled:<nil> VPMCEnabled:<nil> ScheduledHardwareUpgradeInfo:<nil> VmProfile:[] MessageBusTunnelEnabled:<nil> Crypto:<nil> MigrateEncryption:}
Oct 11 2017 20:03:13.139Z ERROR CommitHandler error on handle(ec914e7b81371a1b507c01aa3d8863d2) for 853ba10d81e16a56b76261f5dcf2b738ba723cda72ebdd2ce2c198c03878a12f: Invalid configuration for device '1'.

This test is running VC build: Version 6.0.0 Build 5318172

@gigawhitlocks
Contributor

Seen again during a docker create in the nightly vC 6.0 5-4 HA test on 10/23/17. Full logs here: https://storage.cloud.google.com/vic-ci-logs/functional_logs_10_22_06_09_vic_14150.tar.gz.zip?authuser=1

@hickeng hickeng self-assigned this Oct 25, 2017
@cgtexmex cgtexmex added this to the Sprint 20 Foundation milestone Oct 25, 2017
@matthewavery
Contributor

matthewavery commented Oct 30, 2017

Seen again during a docker run in HA regression tests. Provided are the pertinent logs from the nightly run of 10-29-2017 for VC 6.0:
5-4-High-Availability.zip

@matthewavery
Contributor

Seen again during a VC 6.0 HA test on 11/1/2017:
5-4-High-Availability.zip

@matthewavery
Contributor

Seen again during a VC 6.0 HA test on 11/2/2017 (second day in a row):
This is VIC Engine build 14352

5-4-High-Availability.zip

@matthewavery
Contributor

Seen again during a VC 6.0 nightly HA test on 11/4/2017:
This is VIC Engine build 14394

5-4-High-Availability.zip

@matthewavery
Contributor

Seen again during a VC 6.0 nightly HA test failure on 11/6/2017:
This is VIC Engine build 14342

5-4-High-Availability.zip

@mlh78750 mlh78750 added the source/scenario Found via a scenario failure label Nov 8, 2017
@hickeng
Member

hickeng commented Nov 15, 2017

Set up a Nimbus deployment to use for a recreate grind, using the nightly scripts for precise duplication.

I've seen #6263 so have put in a workaround for that.

I've also seen that the NFS datastore becomes disconnected/inactive from all hosts during the HA failover, preventing continuation of the test. The nfsDatastore appears to be dropped from all the hosts (this could be related to having rerun the test multiple times against the same cluster - it may be that reconnecting the host is not reconnecting the datastore).

The only remediation I've found is to manually remount the NFS datastore to an ESX:

esxcli storage nfs add -H 10.161.2.65 -s /store -v nfsDatastore

then use the Mount Datastore to Additional Hosts menu option to reconnect to the others.
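
If it helps, a hedged govmomi sketch of doing that "mount on the remaining hosts" step programmatically rather than through the UI. The NFS server address, export, and datastore name match the esxcli command above; the vCenter URL and host selection are assumptions, and hosts that already have the datastore mounted will simply return an error that is ignored here.

package main

import (
	"context"
	"net/url"

	"github.com/vmware/govmomi"
	"github.com/vmware/govmomi/find"
	"github.com/vmware/govmomi/vim25/types"
)

func main() {
	ctx := context.Background()

	u, _ := url.Parse("https://user:pass@vc.example.com/sdk") // assumed vCenter endpoint
	c, err := govmomi.NewClient(ctx, u, true)
	if err != nil {
		panic(err)
	}

	finder := find.NewFinder(c.Client, true)
	dc, err := finder.DefaultDatacenter(ctx)
	if err != nil {
		panic(err)
	}
	finder.SetDatacenter(dc)

	hosts, err := finder.HostSystemList(ctx, "*")
	if err != nil {
		panic(err)
	}

	for _, host := range hosts {
		dss, err := host.ConfigManager().DatastoreSystem(ctx)
		if err != nil {
			panic(err)
		}
		// Equivalent of: esxcli storage nfs add -H 10.161.2.65 -s /store -v nfsDatastore
		_, err = dss.CreateNasDatastore(ctx, types.HostNasVolumeSpec{
			RemoteHost: "10.161.2.65",
			RemotePath: "/store",
			LocalPath:  "nfsDatastore",
			AccessMode: string(types.HostMountModeReadWrite),
			Type:       string(types.HostFileSystemVolumeFileSystemTypeNFS),
		})
		if err != nil {
			continue // already mounted (or genuinely failed) on this host
		}
	}
}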
I'd recommend:
a. moving off NFS datastores (iSCSI or vSAN)
b. deploying within a single nimbus pod via a testbed - there are connectivity issues in our current deployments

Also seen the following:

<msg timestamp="20171115 23:42:44.956" level="INFO">Running command 'docker -H 10.192.39.86:2376 --tls rm d7c0cc7a52659a4f4dcbca4b98dee71d1a90fe1047301e65f6690e074ce0c574 2&gt;&amp;1'.</msg>
<msg timestamp="20171115 23:42:46.003" level="INFO">${rc} = 1</msg>
<msg timestamp="20171115 23:42:46.003" level="INFO">${output} = Error response from daemon: Server error from portlayer: Couldn't remove container. The ESX host is temporarily disconnected. Please try again later.</msg>


@hickeng
Member

hickeng commented Nov 16, 2017

Having addressed various setup-related issues (#6772 and #6263), I've been unable to recreate this failure (3 clean runs in sequence at this point).

Either the failure is related to an issue in a specific instance of the deployed test infrastructure and my current infrastructure doesn't present the problem, or it's a lot more intermittent than currently presented.

I believe that to diagnose further we need a mechanism to leave the nimbus infrastructure in place after a failed test for inspection. I think we'll at the very least need the NFS datastore server logs to determine whether there was actually a file access attempted.

@cgtexmex
Contributor

cgtexmex commented Dec 5, 2017

@hickeng do you think this could be related to #4858? I've not looked at these logs, but the issue with #4858 is the rehydration of the image cache -- if it's out of whack then I imagine device specs could be invalid, since we'd be creating them with bad data...

@hickeng
Member

hickeng commented Dec 9, 2017

@cgtexmex I think this may be related to #4858, but possibly via a different error path. I have found the following:

  1. we restart the VCH
  2. reload existing containers
  3. restore DNS
  4. update logging/vspc config
  5. reload volume cache
  6. reload image cache

In step 6 we discover that the image store is inconsistent and we try to delete it so it can be recreated; however, we have containerVMs using the store, which prevents in-use files from being erased.
I believe that the inconsistent image store error is because of the "An error occurred while communicating with the remote host" error seen in the portlayer log fragment below.

It looks like we cannot delete the actual data files for the VMDKs so we have xxx-delta.vmdk still existing, but no xxx.vmdk or manifest files.

I suspect that the invalid config is because of the missing vmdk descriptor file - however, I've yet to confirm that this isn't a business-as-usual quirk of an NFS datastore.
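
A hypothetical helper for spotting that state, run directly on the NFS server as in the find output below: it flags layers that still have a -delta.vmdk extent but no .vmdk descriptor next to it. The root path is taken from the listing; nothing about this is VIC code.

package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

func main() {
	root := "/store/VCH-0-4113/VIC/4225a056-a7d9-1430-6026-f6aa270e809a/images"

	err := filepath.Walk(root, func(path string, info os.FileInfo, walkErr error) error {
		if walkErr != nil {
			return walkErr
		}
		if info.IsDir() || !strings.HasSuffix(path, "-delta.vmdk") {
			return nil
		}
		// The descriptor should sit alongside the extent: <layer>.vmdk next to <layer>-delta.vmdk.
		descriptor := strings.TrimSuffix(path, "-delta.vmdk") + ".vmdk"
		if _, statErr := os.Stat(descriptor); os.IsNotExist(statErr) {
			fmt.Printf("dangling delta (no descriptor): %s\n", path)
		}
		return nil
	})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}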
On the NFS datastore server:

root@nimbus-nfsserver:/store/VCH-0-4113/VIC/4225a056-a7d9-1430-6026-f6aa270e809a/images# find . -type f -ls
525567 133260 -rw-------   1 root     root     8192000000 Dec  7 13:37 ./scratch/scratch-flat.vmdk
525580   40 -rw-------   1 root     root     16797696 Dec  7 13:38 ./789355058656ec2ca0a856283e2597852ccd444548c9ef4fcbf8c9c24a73e169/789355058656ec2ca0a856283e2597852ccd444548c9ef4fcbf8c9c24a73e169-delta.vmdk
525572 1484 -rw-------   1 root     root     16797696 Dec  7 13:38 ./97d69bba9a9d8406055d0fdc38a7f00162823ee5ca3751fc71531cb210204ff9/97d69bba9a9d8406055d0fdc38a7f00162823ee5ca3751fc71531cb210204ff9-delta.vmdk
root@nimbus-nfsserver:/store/VCH-0-4113/VIC/4225a056-a7d9-1430-6026-f6aa270e809a/images#

portlayer:

Dec  7 2017 21:44:59.533Z INFO  op=293.15: Refreshing image cache from datastore.
Dec  7 2017 21:44:59.882Z INFO  op=293.15: Checking for inconsistent images on http:///storage/images/4225a056-a7d9-1430-6026-f6aa270e809a
Dec  7 2017 21:45:02.240Z DEBUG [BEGIN]  [vic/lib/portlayer/storage/vsphere.(*ImageStore).GetImage:505] http:///storage/images/4225a056-a7d9-1430-6026-f6aa270e809a/scratch
Dec  7 2017 21:45:02.720Z INFO  op=293.15: No meta found for 4225a056-a7d9-1430-6026-f6aa270e809a/images/scratch/imageMetadata
Dec  7 2017 21:45:02.724Z DEBUG [BEGIN]  [vic/pkg/vsphere/disk.(*Manager).Get:255] [nfsDatastore] VCH-0-4113/VIC/4225a056-a7d9-1430-6026-f6aa270e809a/images/scratch/scratch.vmdk
Dec  7 2017 21:45:02.724Z DEBUG op=293.15: Didn't find the disk [nfsDatastore] VCH-0-4113/VIC/4225a056-a7d9-1430-6026-f6aa270e809a/images/scratch/scratch.vmdk in the DiskManager cache, creating it
Dec  7 2017 21:45:02.724Z DEBUG [BEGIN]  [vic/pkg/vsphere/disk.(*Manager).DiskParent:273] [nfsDatastore] VCH-0-4113/VIC/4225a056-a7d9-1430-6026-f6aa270e809a/images/scratch/scratch.vmdk
Dec  7 2017 21:45:02.849Z DEBUG [ END ]  [vic/pkg/vsphere/disk.(*Manager).DiskParent:273] [125.076293ms] [nfsDatastore] VCH-0-4113/VIC/4225a056-a7d9-1430-6026-f6aa270e809a/images/scratch/scratch.vmdk
Dec  7 2017 21:45:02.849Z DEBUG [ END ]  [vic/pkg/vsphere/disk.(*Manager).Get:255] [125.1294ms] [nfsDatastore] VCH-0-4113/VIC/4225a056-a7d9-1430-6026-f6aa270e809a/images/scratch/scratch.vmdk
Dec  7 2017 21:45:02.849Z DEBUG op=293.15: GetImage(http:///storage/images/4225a056-a7d9-1430-6026-f6aa270e809a/scratch) has parent http:///storage/images/4225a056-a7d9-1430-6026-f6aa270e809a/scratch
Dec  7 2017 21:45:02.849Z DEBUG [ END ]  [vic/lib/portlayer/storage/vsphere.(*ImageStore).GetImage:505] [609.823358ms] http:///storage/images/4225a056-a7d9-1430-6026-f6aa270e809a/scratch
Dec  7 2017 21:45:02.849Z DEBUG Index: inserting http:///storage/images/4225a056-a7d9-1430-6026-f6aa270e809a/scratch (parent: http:///storage/images/4225a056-a7d9-1430-6026-f6aa270e809a/scratch) in index
Dec  7 2017 21:45:02.916Z DEBUG [BEGIN]  [vic/lib/portlayer/storage/vsphere.(*ImageStore).GetImage:505] http:///storage/images/4225a056-a7d9-1430-6026-f6aa270e809a/789355058656ec2ca0a856283e2597852ccd444548c9ef4fcbf8c9c24a73e169
Dec  7 2017 21:45:03.192Z INFO  op=293.15: Getting metadata 4225a056-a7d9-1430-6026-f6aa270e809a/images/789355058656ec2ca0a856283e2597852ccd444548c9ef4fcbf8c9c24a73e169/imageMetadata/metaData
Dec  7 2017 21:45:03.490Z DEBUG [BEGIN]  [vic/pkg/vsphere/disk.(*Manager).Get:255] [nfsDatastore] VCH-0-4113/VIC/4225a056-a7d9-1430-6026-f6aa270e809a/images/789355058656ec2ca0a856283e2597852ccd444548c9ef4fcbf8c9c24a73e169/789355058656ec2ca0a856283e2597852ccd444548c9ef4fcbf8c9c24a73e169.vmdk
Dec  7 2017 21:45:03.490Z DEBUG op=293.15: Didn't find the disk [nfsDatastore] VCH-0-4113/VIC/4225a056-a7d9-1430-6026-f6aa270e809a/images/789355058656ec2ca0a856283e2597852ccd444548c9ef4fcbf8c9c24a73e169/789355058656ec2ca0a856283e2597852ccd444548c9ef4fcbf8c9c24a73e169.vmdk in the DiskManager cache, creating it
Dec  7 2017 21:45:03.490Z DEBUG [BEGIN]  [vic/pkg/vsphere/disk.(*Manager).DiskParent:273] [nfsDatastore] VCH-0-4113/VIC/4225a056-a7d9-1430-6026-f6aa270e809a/images/789355058656ec2ca0a856283e2597852ccd444548c9ef4fcbf8c9c24a73e169/789355058656ec2ca0a856283e2597852ccd444548c9ef4fcbf8c9c24a73e169.vmdk
Dec  7 2017 21:45:03.835Z DEBUG [ END ]  [vic/pkg/vsphere/disk.(*Manager).DiskParent:273] [344.899713ms] [nfsDatastore] VCH-0-4113/VIC/4225a056-a7d9-1430-6026-f6aa270e809a/images/789355058656ec2ca0a856283e2597852ccd444548c9ef4fcbf8c9c24a73e169/789355058656ec2ca0a856283e2597852ccd444548c9ef4fcbf8c9c24a73e169.vmdk
Dec  7 2017 21:45:03.835Z DEBUG [ END ]  [vic/pkg/vsphere/disk.(*Manager).Get:255] [344.959144ms] [nfsDatastore] VCH-0-4113/VIC/4225a056-a7d9-1430-6026-f6aa270e809a/images/789355058656ec2ca0a856283e2597852ccd444548c9ef4fcbf8c9c24a73e169/789355058656ec2ca0a856283e2597852ccd444548c9ef4fcbf8c9c24a73e169.vmdk
Dec  7 2017 21:45:03.835Z DEBUG op=293.15: GetImage(http:///storage/images/4225a056-a7d9-1430-6026-f6aa270e809a/789355058656ec2ca0a856283e2597852ccd444548c9ef4fcbf8c9c24a73e169) has parent http:///storage/images/4225a056-a7d9-1430-6026-f6aa270e809a/97d69bba9a9d8406055d0fdc38a7f00162823ee5ca3751fc71531cb210204ff9
Dec  7 2017 21:45:03.835Z DEBUG [ END ]  [vic/lib/portlayer/storage/vsphere.(*ImageStore).GetImage:505] [918.251887ms] http:///storage/images/4225a056-a7d9-1430-6026-f6aa270e809a/789355058656ec2ca0a856283e2597852ccd444548c9ef4fcbf8c9c24a73e169
Dec  7 2017 21:45:03.835Z DEBUG [BEGIN]  [vic/lib/portlayer/storage/vsphere.(*ImageStore).GetImage:505] http:///storage/images/4225a056-a7d9-1430-6026-f6aa270e809a/97d69bba9a9d8406055d0fdc38a7f00162823ee5ca3751fc71531cb210204ff9
Dec  7 2017 21:45:04.153Z INFO  op=293.15: Getting metadata 4225a056-a7d9-1430-6026-f6aa270e809a/images/97d69bba9a9d8406055d0fdc38a7f00162823ee5ca3751fc71531cb210204ff9/imageMetadata/metaData
Dec  7 2017 21:45:04.440Z DEBUG [BEGIN]  [vic/pkg/vsphere/disk.(*Manager).Get:255] [nfsDatastore] VCH-0-4113/VIC/4225a056-a7d9-1430-6026-f6aa270e809a/images/97d69bba9a9d8406055d0fdc38a7f00162823ee5ca3751fc71531cb210204ff9/97d69bba9a9d8406055d0fdc38a7f00162823ee5ca3751fc71531cb210204ff9.vmdk
Dec  7 2017 21:45:04.440Z DEBUG op=293.15: Didn't find the disk [nfsDatastore] VCH-0-4113/VIC/4225a056-a7d9-1430-6026-f6aa270e809a/images/97d69bba9a9d8406055d0fdc38a7f00162823ee5ca3751fc71531cb210204ff9/97d69bba9a9d8406055d0fdc38a7f00162823ee5ca3751fc71531cb210204ff9.vmdk in the DiskManager cache, creating it
Dec  7 2017 21:45:04.440Z DEBUG [BEGIN]  [vic/pkg/vsphere/disk.(*Manager).DiskParent:273] [nfsDatastore] VCH-0-4113/VIC/4225a056-a7d9-1430-6026-f6aa270e809a/images/97d69bba9a9d8406055d0fdc38a7f00162823ee5ca3751fc71531cb210204ff9/97d69bba9a9d8406055d0fdc38a7f00162823ee5ca3751fc71531cb210204ff9.vmdk
Dec  7 2017 21:45:35.642Z DEBUG vSphere Event Alarm 'Virtual machine CPU usage' on VCH-0-4113 changed from Gray to Green for eventID(934) ignored by the event collector
Dec  7 2017 21:45:35.671Z DEBUG vSphere Event Alarm 'Virtual machine memory usage' on VCH-0-4113 changed from Gray to Green for eventID(935) ignored by the event collector
Dec  7 2017 21:45:35.671Z DEBUG vSphere Event Alarm 'Virtual machine CPU usage' on infallible_yalow-69e3498e2125 changed from Gray to Green for eventID(936) ignored by the event collector
Dec  7 2017 21:45:35.671Z DEBUG vSphere Event Alarm 'Virtual machine memory usage' on infallible_yalow-69e3498e2125 changed from Gray to Green for eventID(937) ignored by the event collector
Dec  7 2017 21:46:23.841Z ERROR op=293.15: Error querying parents ([nfsDatastore] VCH-0-4113/VIC/4225a056-a7d9-1430-6026-f6aa270e809a/images/97d69bba9a9d8406055d0fdc38a7f00162823ee5ca3751fc71531cb210204ff9/97d69bba9a9d8406055d0fdc38a7f00162823ee5ca3751fc71531cb210204ff9.vmdk): An error occurred while communicating with the remote host.
Dec  7 2017 21:46:23.841Z DEBUG [ END ]  [vic/pkg/vsphere/disk.(*Manager).DiskParent:273] [1m19.400222592s] [nfsDatastore] VCH-0-4113/VIC/4225a056-a7d9-1430-6026-f6aa270e809a/images/97d69bba9a9d8406055d0fdc38a7f00162823ee5ca3751fc71531cb210204ff9/97d69bba9a9d8406055d0fdc38a7f00162823ee5ca3751fc71531cb210204ff9.vmdk
Dec  7 2017 21:46:23.841Z DEBUG [ END ]  [vic/pkg/vsphere/disk.(*Manager).Get:255] [1m19.400282033s] [nfsDatastore] VCH-0-4113/VIC/4225a056-a7d9-1430-6026-f6aa270e809a/images/97d69bba9a9d8406055d0fdc38a7f00162823ee5ca3751fc71531cb210204ff9/97d69bba9a9d8406055d0fdc38a7f00162823ee5ca3751fc71531cb210204ff9.vmdk
Dec  7 2017 21:46:23.841Z DEBUG [ END ]  [vic/lib/portlayer/storage/vsphere.(*ImageStore).GetImage:505] [1m20.006020115s] http:///storage/images/4225a056-a7d9-1430-6026-f6aa270e809a/97d69bba9a9d8406055d0fdc38a7f00162823ee5ca3751fc71531cb210204ff9
Dec  7 2017 21:46:23.841Z WARN  op=293.15: Error getting image store 4225a056-a7d9-1430-6026-f6aa270e809a: An error occurred while communicating with the remote host.
Dec  7 2017 21:46:23.841Z INFO  op=293.15: Creating directory [nfsDatastore] VCH-0-4113/VIC/4225a056-a7d9-1430-6026-f6aa270e809a/images
Dec  7 2017 21:46:23.893Z INFO  op=293.15: Creating directory [nfsDatastore] VCH-0-4113/VIC/4225a056-a7d9-1430-6026-f6aa270e809a/images/scratch
Dec  7 2017 21:46:23.952Z DEBUG op=293.15: File already exists: [nfsDatastore] VCH-0-4113/VIC/4225a056-a7d9-1430-6026-f6aa270e809a/images/scratch
Dec  7 2017 21:46:23.952Z INFO  op=293.15: Removing failed image store 4225a056-a7d9-1430-6026-f6aa270e809a
Dec  7 2017 21:46:23.952Z INFO  op=293.15: Cleaning up image store 4225a056-a7d9-1430-6026-f6aa270e809a
Dec  7 2017 21:46:23.952Z INFO  op=293.15: Removing 4225a056-a7d9-1430-6026-f6aa270e809a/images

There is also evidence in a hostd.log that the image file was deleted previously. My working assumption is that this is the rmi busybox call from the initial set of regression tests, before the host is killed, and that it is not a contributing factor; it did cause me to investigate, however, so it's noted here:

2017-12-07T21:38:32.629Z info hostd[7A680B70] [Originator@6876 sub=Vimsvc.TaskManager opID=310.59-7-d1-b735 user=vpxuser:VSPHERE.LOCAL\Administrator] Task Created : haTask--vim.VirtualDiskManager.deleteVirtualDisk-311
2017-12-07T21:38:32.738Z info hostd[7A880B70] [Originator@6876 sub=Vimsvc.TaskManager opID=310.59-7-d1-b735 user=vpxuser:VSPHERE.LOCAL\Administrator] Task Completed : haTask--vim.VirtualDiskManager.deleteVirtualDisk-311 Status success
2017-12-07T21:38:32.738Z info hostd[7A880B70] [Originator@6876 sub=Nfcsvc opID=310.59-7-d1-b735 user=vpxuser:VSPHERE.LOCAL\Administrator] file delete force (/vmfs/volumes/5584ce93-e9a9a607/VCH-0-4113/VIC/4225a056-a7d9-1430-6026-f6aa270e809a/images/789355058656ec2ca0a856283e2597852ccd444548c9ef4fcbf8c9c24a73e169/789355058656ec2ca0a856283e2597852ccd444548c9ef4fcbf8c9c24a73e169.vmdk -> /vmfs/volumes/5584ce93-e9a9a607/VCH-0-4113/VIC/4225a056-a7d9-1430-6026-f6aa270e809a/images/789355058656ec2ca0a856283e2597852ccd444548c9ef4fcbf8c9c24a73e169/789355058656ec2ca0a856283e2597852ccd444548c9ef4fcbf8c9c24a73e169.vmdk) by 'vpxuser'
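For reference, the delete in that hostd fragment corresponds to a VirtualDiskManager.DeleteVirtualDisk call. Below is a minimal govmomi sketch of issuing such a delete; the URL, datastore path, and use of the default datacenter are placeholders and this is not the vic code path. The issue's working theory above is that files still held open by running containerVMs survive such cleanup, which would match the orphaned -delta.vmdk files.

package main

import (
	"context"
	"log"

	"github.com/vmware/govmomi"
	"github.com/vmware/govmomi/find"
	"github.com/vmware/govmomi/object"
	"github.com/vmware/govmomi/vim25/soap"
)

// deleteImageDisk issues a VirtualDiskManager.DeleteVirtualDisk call for the
// given datastore path, similar to the haTask seen in the hostd log.
func deleteImageDisk(ctx context.Context, vcURL, diskPath string) error {
	u, err := soap.ParseURL(vcURL)
	if err != nil {
		return err
	}
	c, err := govmomi.NewClient(ctx, u, true)
	if err != nil {
		return err
	}
	defer c.Logout(ctx)

	dc, err := find.NewFinder(c.Client).DefaultDatacenter(ctx)
	if err != nil {
		return err
	}

	m := object.NewVirtualDiskManager(c.Client)
	task, err := m.DeleteVirtualDisk(ctx, diskPath, dc)
	if err != nil {
		return err
	}
	return task.Wait(ctx)
}

func main() {
	err := deleteImageDisk(context.Background(),
		"https://user:pass@vcenter/sdk",
		"[nfsDatastore] VCH/VIC/<store>/images/<layer>/<layer>.vmdk")
	if err != nil {
		log.Fatal(err)
	}
}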

container-logs-issue4666.zip
vpxd-5.log
hostd.log
hostd.log

@cgtexmex
Contributor

cgtexmex commented Dec 9, 2017

@hickeng I just glanced at the logs you posted (log entries for Dec. 7th); that build doesn't appear to have the fix for #4858, as there's no retry and the updated logging (specifically the new operation logging) isn't present.

I'd really be interested to see whether we still hit this once the nightlies are run with that code...
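(For context, the #4858 fix adds retries around vSphere tasks that fail with transient faults. A minimal sketch of that pattern follows; the fault classification and backoff here are illustrative only, not the actual vic retry code.)

package retryutil

import (
	"context"
	"time"

	"github.com/vmware/govmomi/task"
	"github.com/vmware/govmomi/vim25/types"
)

// isTransient reports whether a task fault looks like a transient condition
// worth retrying. The set of faults checked here is illustrative only.
func isTransient(err error) bool {
	if terr, ok := err.(task.Error); ok {
		switch terr.Fault().(type) {
		case *types.TaskInProgress, *types.HostCommunication:
			return true
		}
	}
	return false
}

// retryTask runs op up to attempts times with a simple linear backoff,
// giving up immediately on non-transient errors or context cancellation.
func retryTask(ctx context.Context, attempts int, op func(context.Context) error) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = op(ctx); err == nil || !isTransient(err) {
			return err
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(time.Duration(i+1) * time.Second):
		}
	}
	return err
}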

@mhagen-vmware
Contributor

I can no longer reproduce this issue on latest master
