I had an issue where vMotion would no longer work. When selecting the target host in the vCenter UI, the compatibility check would fail with the error:
A Google search for the issue turns up quite a few possible causes, most of them fairly standard, and all of those checked out fine in my environment. Digging deeper into the logs, I found the same message in /storage/log/vmware/vmware-vpxd/vpxd.log.
Continuing the search, I found someone mentioning that the error could be caused by services that are not running. This is easy to check by logging into the VCSA through SSH and running service-control --status --all from the command line. That someone was right:
# service-control --status --all
Stopped:
 vmcam vmware-imagebuilder vmware-mbcs vmware-netdumper vmware-rbd-watchdog vmware-sps vmware-statsmonitor vmware-updatemgr vmware-vcha vsan-dps
Running:
 ...
Note: output truncated for readability
I compared the output to that of a "healthy" vCenter and noticed that two services (vmware-sps and vmware-updatemgr) were stopped that should have been running. Attempting to start them manually did not help: the start-up kept hanging indefinitely. Looking into the log of the service (/storage/logs/vmware-sps/sps.log) revealed another error (again, I truncated the output):
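The comparison against a healthy vCenter can be sketched as a small shell helper. This is only an illustration: it assumes the output of service-control --status --all from each appliance was saved to a text file first, and the function names (stopped_services, extra_stopped) and file names are made up for this example.

```shell
# Print every service name listed under "Stopped:" in saved
# `service-control --status --all` output, one name per line, sorted.
stopped_services() {
    awk '
        # A section header such as "Running:" or "Stopped:" switches state.
        /^[A-Za-z]+:/ {
            in_stopped = ($1 == "Stopped:")
            sub(/^[A-Za-z]+:[ \t]*/, "")
        }
        # Inside the Stopped section, emit each whitespace-separated name.
        in_stopped { for (i = 1; i <= NF; i++) print $i }
    ' "$1" | sort
}

# Print services that are stopped on the broken vCenter ($2)
# but not stopped on the healthy one ($1).
extra_stopped() {
    healthy_list=$(stopped_services "$1")
    stopped_services "$2" | grep -vxF -e "$healthy_list"
}

# Hypothetical usage, with both outputs saved beforehand:
#   extra_stopped healthy-status.txt broken-status.txt
```

Services that are stopped on both appliances (such as vmcam or vsan-dps in the output above) drop out of the result, so only the candidates worth investigating remain.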
After some more searching, I came across this article which turned out to be the solution:
The article stated that there's an issue in the database: if there are multiple entries for the SSO admin account in a particular table (vpx_access), it would cause the vmware-sps service not to start.
And sure enough, although nothing had been updated (no patches, no certificates), there were indeed multiple entries for the SSO admin in the database. After removing the surplus entries and restarting, all services started properly and the issue was resolved.
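A quick way to check for this condition before touching anything is to count how often the SSO admin appears in vpx_access. The sketch below is an assumption-laden illustration: count_principal is a made-up helper name, and the principal VSPHERE.LOCAL\Administrator applies only if the SSO domain is the default vsphere.local. Note that ordinary accounts may well appear in this table more than once, so the check is aimed at the SSO admin only.

```shell
# Read one principal per line on stdin and print how many lines
# match the given principal exactly (fixed string, whole line).
count_principal() {
    grep -cxF -e "$1"
}

# Path to psql on the VCSA, as used in this article.
PSQL=/opt/vmware/vpostgres/current/bin/psql

# Only query the database when psql is actually present (i.e. on the VCSA).
if [ -x "$PSQL" ]; then
    # -A: unaligned output, -t: rows only, so each line is one principal.
    "$PSQL" -d VCDB -U postgres -At -c "SELECT principal FROM vpx_access" |
        count_principal 'VSPHERE.LOCAL\Administrator'
fi
```

A result greater than 1 would match the situation described in the article.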
The steps I took to resolve the issue:
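1. Connect to the vCenter database from the VCSA shell:
    /opt/vmware/vpostgres/current/bin/psql -d VCDB -U postgres
2. List the entries in the vpx_access table:
    SELECT * FROM vpx_access;
3. Remove the surplus entries for the SSO admin, keeping only the entry with id 1:
    DELETE FROM vpx_access WHERE principal = 'VSPHERE.LOCAL\Administrator' AND id <> 1;
4. Verify that only a single entry for the SSO admin remains:
    VCDB=# SELECT * FROM vpx_access;
     id |          principal          | role_id | entity_id | flag | surr_key
    ----+-----------------------------+---------+-----------+------+----------
      1 | VSPHERE.LOCAL\Administrator |      -1 |         1 |    1 |        1
    (1 row)
5. Exit psql and restart all services:
    service-control --stop --all
    service-control --start --all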
After correcting the database, all services started without problems and functionality was restored.
If you get the particular error message mentioned at the start of this article, the solution presented here may not resolve it, but it's worth checking before you move on to researching other possible causes.