Fixing JSON typos during initial SDDC bring-up process

During a VCF 4.0 SDDC bring-up, I encountered something weird with the VTEP IP settings of one of the Edge VMs. This is what my deployment JSON looked like:

   "edgeNodeSpecs": [
        {
          "edgeNodeName": "edge-tn04",
          "edgeNodeHostname": "edge-tn04.vcf-s1.vlabs.local",
          "managementCidr": "192.168.110.95/24",
          "edgeVtep1Cidr": "192.168.151.20/24",
          "edgeVtep2Cidr": "192.168.151.21/24",
          "interfaces": [
            {
              "name": "uplink-edge1-tor1",
              "interfaceCidr": "172.27.9.1/24"
            },
            {
              "name": "uplink-edge1-tor2",
              "interfaceCidr": "172.27.10.1/24"
            }
          ]
        },
        {
          "edgeNodeName": "edge-tn05",
          "edgeNodeHostname": "edge-tn05.vcf-s1.vlabs.local",
          "managementCidr": "192.168.110.96/24",
          "edgeVtep1Cidr": "192.168.151.22/24",
          "edgeVtep2Cidr": "192.168.151.23/24",
          "interfaces": [
            {
              "name": "uplink-edge2-tor1",
              "interfaceCidr": "172.27.9.2/24"
            },
            {
              "name": "uplink-edge2-tor2",
              "interfaceCidr": "172.27.10.2/24"
            }
          ]
        }
      ],

As you can see, both edge VMs had VTEP IPs in the 192.168.151.0/24 subnet.
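If you want to double-check this before (re-)submitting, a few lines of Python with the standard `ipaddress` module can confirm that every edge VTEP CIDR sits in the same subnet. This is a minimal sketch of my own, not anything shipped with Cloud Builder; the function names are hypothetical, and since the snippet in this post is only a fragment, the script expects the full deployment JSON file and simply searches it recursively for the `edgeNodeSpecs` key rather than assuming where it is nested.

```python
import ipaddress
import json
import sys


def find_key(obj, key):
    """Recursively return the first value stored under `key` in nested dicts/lists."""
    if isinstance(obj, dict):
        if key in obj:
            return obj[key]
        children = obj.values()
    elif isinstance(obj, list):
        children = obj
    else:
        return None
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None


def vtep_networks(edge_node_specs):
    """Map each edge node name to the set of networks of its two VTEP CIDRs."""
    result = {}
    for node in edge_node_specs:
        cidrs = [node["edgeVtep1Cidr"], node["edgeVtep2Cidr"]]
        # ip_interface accepts host addresses such as 192.168.151.20/24,
        # unlike ip_network in its default strict mode.
        result[node["edgeNodeName"]] = {ipaddress.ip_interface(c).network for c in cidrs}
    return result


def check_single_vtep_subnet(edge_node_specs):
    """Return the distinct VTEP networks; exactly one element means consistent."""
    networks = set()
    for nets in vtep_networks(edge_node_specs).values():
        networks |= nets
    return networks


def main(path):
    with open(path, encoding="utf-8") as f:
        spec = json.load(f)
    specs = find_key(spec, "edgeNodeSpecs")
    if not specs:
        sys.exit("no edgeNodeSpecs found")
    for name, nets in vtep_networks(specs).items():
        print(name, sorted(str(n) for n in nets))
    distinct = check_single_vtep_subnet(specs)
    if len(distinct) != 1:
        sys.exit(f"VTEP CIDRs span {len(distinct)} networks: {sorted(map(str, distinct))}")
    print("All edge VTEPs are in", next(iter(distinct)))


# Only run when a file path is supplied, e.g. python3 check_vteps.py deployment.json
if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Run against the JSON above, it should report a single network, 192.168.151.0/24, for both edge-tn04 and edge-tn05.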

However, Cloud Builder threw the following error, summarised by this message:

VTEP gateway not on same network as VTEP IP

2021-01-03T16:29:10.478+0000 [bringup-app,[c4fbd35356fc05a9,5098]] ERROR [c.v.e.s.o.model.error.ErrorFactory,pool-3-thread-11] [L7PF9A] INPUT_PARAM_ERROR Invalid parameter: {0}com.vmware.evo.sddc.orchestrator.exceptions.OrchTaskException: Invalid parameter: {0}
at com.vmware.vcf.common.fsm.plugins.nsxt.action.CreateNsxtEdgeNodeVmAction.preValidate(CreateNsxtEdgeNodeVmAction.java:120)
at com.vmware.vcf.common.fsm.plugins.nsxt.action.CreateNsxtEdgeNodeVmAction.preValidate(CreateNsxtEdgeNodeVmAction.java:52)
at com.vmware.evo.sddc.orchestrator.platform.action.FsmActionState.lambda$static$1(FsmActionState.java:17)
at com.vmware.evo.sddc.orchestrator.platform.action.FsmActionState.invoke(FsmActionState.java:62)
at com.vmware.evo.sddc.orchestrator.platform.action.FsmActionPlugin.invoke(FsmActionPlugin.java:168)
at com.vmware.evo.sddc.orchestrator.platform.action.FsmActionPlugin.invoke(FsmActionPlugin.java:153)
at com.vmware.evo.sddc.orchestrator.core.ProcessingTaskSubscriber.invokeMethod(ProcessingTaskSubscriber.java:393)
at com.vmware.evo.sddc.orchestrator.core.ProcessingTaskSubscriber.processTask(ProcessingTaskSubscriber.java:538)
at com.vmware.evo.sddc.orchestrator.core.ProcessingTaskSubscriber.accept(ProcessingTaskSubscriber.java:122)
at sun.reflect.GeneratedMethodAccessor480.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.google.common.eventbus.Subscriber.invokeSubscriberMethod(Subscriber.java:87)
at com.google.common.eventbus.Subscriber$1.run(Subscriber.java:72)
at org.springframework.cloud.sleuth.instrument.async.TraceRunnable.run(TraceRunnable.java:67)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalArgumentException: VTEP gateway not on same network as VTEP IP
at org.apache.commons.lang3.Validate.isTrue(Validate.java:158)
at com.vmware.vcf.common.fsm.plugins.nsxt.action.CreateNsxtEdgeNodeVmAction.preValidateNvdsCreateParams(CreateNsxtEdgeNodeVmAction.java:220)
at com.vmware.vcf.common.fsm.plugins.nsxt.action.CreateNsxtEdgeNodeVmAction.preValidate(CreateNsxtEdgeNodeVmAction.java:117)
… 17 common frames omitted
2021-01-03T16:29:10.481+0000 [bringup-app,[c4fbd35356fc05a9,5098]] DEBUG [c.v.e.s.o.c.ProcessingTaskSubscriber,pool-3-thread-11] Collected the following errors for task with name CreateNsxtEdgeNodeVmAction and ID 7f000001-7659-15aa-8176-c8ab8d12038d: [ExecutionError [errorCode=null, errorResponse=LocalizableErrorResponse(messageBundle=com.vmware.evo.sddc.common.core.error.messages)]]
2021-01-03T16:29:10.527+0000 [bringup-app,[c4fbd35356fc05a9,581e]] INFO [c.v.e.s.o.c.ProcessingOrchestratorImpl,pool-3-thread-4] Prevalidation completed with failure, 2
2021-01-03T16:29:10.563+0000 [bringup-app,[c4fbd35356fc05a9,20c7]] DEBUG [c.v.e.s.o.c.ProcessingTaskSubscriber,pool-3-thread-17] Invoking task CreateNsxtEdgeNodeVmAction.PREVALIDATE Description: Deploy NSX-T Data Center Edge Node Virtual Machine 1, Plugin: NsxtPlugin, ParamBuilder null, Input map: {nvdsParams=DeployAndConfigureAvns____19__PrepareEdgeCluster____0__DeployAndConfigureBringUpEdgeNodes____5__edge1NvdsParams, nsxtManager=DeployAndConfigureAvns____19__nsxtManagerRemoteEndpoint, nodeParams=DeployAndConfigureAvns____19__PrepareEdgeCluster____0__DeployAndConfigureBringUpEdgeNodes____5__edge1NodeParams}, Id: 7f000001-7659-15aa-8176-c8ab8d12038c …
2021-01-03T16:29:10.609+0000 [bringup-app,[c4fbd35356fc05a9,20c7]] DEBUG [c.v.e.s.o.c.c.ContractParamBuilder,pool-3-thread-17] Contract task Deploy NSX-T Data Center Edge Node Virtual Machine 1 input: 
{"nsxtManager":{"address":"nsxt2.vcf-s1.vlabs.local","username":"admin","password":"*****"},"nodeParams":{"dataNetworkIds":["dvportgroup-36","dvportgroup-37"],"nodeDescription":"","nodeDisplayName":"edge-tn05","nodeFqdn":"edge-tn05.vcf-s1.vlabs.local","nodeMgmtIpCidr":"192.168.110.96/24","nodeMgmtGateway":"192.168.110.254","mgmtDvPortgroupId":"dvportgroup-12","nodeVmFormFactor":"SMALL","vcenterStorageMobId":"datastore-17","vcenterClusterMobId":"resgroup-29","nsxtVcenterUuid":"0bb7ed3b-f5cc-44d8-80af-c1f0deb5faa1","dnsServerIps":["192.168.110.1"],"dnsSearchDomains":["vcf-s1.vlabs.local"],"ntpServers":["192.168.110.1"],"cliUsername":"admin","cliPassword":"*****","auditUsername":"audit","auditPassword":"*****","rootPassword":"*****","enableRootSsh":true},"nvdsParams":{"transportZoneNames":["sfo01-m01-edge-uplink-tz","vcf-s1.vlabs.local-tz-overlay01"],"pnicToUplink":{"fp-eth0":"uplink1","fp-eth1":"uplink2"},"nvdsProfileId":"93198c24-687e-458c-aca9-be18a228a3c8","hostSwitchName":"sddc-s1-nvds01-pg-edge","vtepIpCidrs":["192.168.150.22/24","192.168.150.23/24"],"vtepGateway":"192.168.151.254"}}
2021-01-03T16:29:10.611+0000 [bringup-app,[c4fbd35356fc05a9,20c7]] DEBUG [c.v.v.c.n.s.c.c.ApiConnection,pool-3-thread-17] Created ApiClient connection to: nsxt2.vcf-s1.vlabs.local

In the log above you can see Cloud Builder (CB) trying to deploy one of the NSX-T Edge transport nodes (edge-tn05) with VTEP IPs 192.168.150.22/24 and 192.168.150.23/24 instead of 192.168.151.22/24 and 192.168.151.23/24, while the VTEP gateway was still 192.168.151.254. That mismatch is exactly what the pre-validation complained about.

That made no sense to me, because it was not what I had in my deployment JSON; somehow, the addresses got mangled on their way into CB. At this stage every retry results in the same error over and over, unless we update the JSON stored inside Cloud Builder.
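To make the failing check concrete, here is a small Python sketch that tests whether the VTEP gateway falls inside each VTEP CIDR's subnet, using the `vtepIpCidrs` and `vtepGateway` values from the logged contract input. It only mirrors the idea behind the "VTEP gateway not on same network as VTEP IP" message; it is not Cloud Builder's actual validation code, and `vtep_gateway_problems` is a hypothetical name.

```python
import ipaddress


def vtep_gateway_problems(vtep_ip_cidrs, vtep_gateway):
    """Return the VTEP CIDRs whose subnet does not contain the gateway."""
    gateway = ipaddress.ip_address(vtep_gateway)
    return [
        cidr
        for cidr in vtep_ip_cidrs
        if gateway not in ipaddress.ip_interface(cidr).network
    ]


# Values copied from the nvdsParams in the logged contract input.
logged_cidrs = ["192.168.150.22/24", "192.168.150.23/24"]
logged_gateway = "192.168.151.254"
# What the deployment JSON actually asked for (edge-tn05).
intended_cidrs = ["192.168.151.22/24", "192.168.151.23/24"]

print(vtep_gateway_problems(logged_cidrs, logged_gateway))    # ['192.168.150.22/24', '192.168.150.23/24']
print(vtep_gateway_problems(intended_cidrs, logged_gateway))  # []
```

Both mangled addresses fail the check, while the values from my deployment JSON pass, which is why retrying without changing the stored JSON gets nowhere.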

Fixing the Cloud Builder JSON

Fear not: we can restart the CB deployment by re-submitting an updated JSON specification file against the failed SDDC deployment UUID. Please note that the following process is valid if you started your deployment with a JSON file; if you began from the XLS deployment spreadsheet, you will first need to convert it to JSON, which I am not covering here.

Step 1) Get the SDDC deployment UUID from the Cloud Builder logs by searching for "Execution ID"

grep "Execution ID" /var/log/vmware/vcf/bringup/vcf-bringup-debug.2021-01-03.0.log

NOTE: You should normally check vcf-bringup-debug.log, but in my case the logs had already rotated, so I had to check an older, dated file.
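If several log files are involved (current plus rotated ones), a short Python sketch can scan them all and pull out just the IDs. It assumes the ID appears as a UUID-shaped token (8-4-4-4-12 hex digits) on the same line as "Execution ID", which I am inferring from the other IDs in the log rather than from any documented format; the function names are hypothetical, and you may need to copy the logs off the appliance if Python 3 is not available there.

```python
import re
import sys

# Assumption: the deployment ID is a UUID-shaped token on the "Execution ID" line.
UUID_RE = re.compile(
    r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b",
    re.IGNORECASE,
)


def execution_ids(lines):
    """Yield UUID-shaped tokens from lines mentioning 'Execution ID', first occurrence only."""
    seen = set()
    for line in lines:
        if "Execution ID" not in line:
            continue
        for match in UUID_RE.findall(line):
            if match not in seen:
                seen.add(match)
                yield match


def main(paths):
    for path in paths:
        # errors="replace" so a stray non-UTF-8 byte does not abort the scan
        with open(path, encoding="utf-8", errors="replace") as f:
            for uuid in execution_ids(f):
                print(f"{path}: {uuid}")


# Usage: python3 find_exec_id.py /var/log/vmware/vcf/bringup/vcf-bringup-debug*.log
if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1:])
```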

Step 2) Build the API command to submit on CB as follows:

For VCF < 4.0.1:

curl -X POST http://localhost:9080/bringup-app/bringup/sddcs/<uuid> -H "Content-Type: application/json" -d "@file-path-to-json"

For VCF > 4.0.1:

curl -k -u admin:'password' -X PATCH https://localhost/v1/sddcs/<uuid> -H "Content-Type: application/json" -d "@file-path-to-json"
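For anyone who prefers scripting the newer (VCF > 4.0.1) call, here is a Python sketch built only on the standard library that mirrors the curl command: a PATCH to https://localhost/v1/sddcs/<uuid> with basic authentication as admin and certificate checks disabled, like `curl -k`. The function names are hypothetical, and I have not verified that Python 3 is present on the Cloud Builder appliance; if you run it elsewhere, set `host` to your Cloud Builder FQDN, assuming the API is reachable from there.

```python
import base64
import json
import ssl
import urllib.request


def load_spec(json_path):
    """Read the corrected deployment JSON and fail early if it does not parse."""
    with open(json_path, "rb") as f:
        body = f.read()
    json.loads(body)  # raises json.JSONDecodeError on malformed input
    return body


def build_retry_request(uuid, body, password, host="localhost", user="admin"):
    """Equivalent of: curl -u admin:'password' -X PATCH https://localhost/v1/sddcs/<uuid>"""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        f"https://{host}/v1/sddcs/{uuid}",
        data=body,
        method="PATCH",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


def send(request):
    """Send the request without certificate checks (like curl -k).

    urlopen raises urllib.error.HTTPError for non-2xx replies, e.g. a 500.
    """
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(request, context=context) as response:
        return response.status, response.read().decode()
```

Usage would look like `send(build_retry_request("<uuid>", load_spec("deployment.json"), "password"))`. For the older VCF releases you would instead POST to http://localhost:9080/bringup-app/bringup/sddcs/<uuid> without the Authorization header, as in the first curl command.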

Step 3) Re-submit the deployment JSON and restart the bring-up process

I did so and received an HTTP 500 error. I really could not understand what was wrong: I triple-checked the JSON and couldn’t see any problem.

The good old “have you tried rebooting?” did the job in my case, and here we are!

That’s it: at this point the bring-up process is up and running again, hopefully without any further typos 🙂
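A 500 can have plenty of causes (in my case the JSON was fine and a reboot sorted it), but ruling out syntax slips before re-submitting is cheap. This is a minimal Python sketch of my own with hypothetical names: it reports the line and column of the first syntax error and, because Python's `json` module otherwise silently keeps the last of any repeated keys, it also flags duplicate keys through `object_pairs_hook`.

```python
import json
import sys


def _reject_duplicates(pairs):
    """object_pairs_hook that fails on repeated keys instead of keeping the last one."""
    obj = {}
    for key, value in pairs:
        if key in obj:
            raise ValueError(f"duplicate key {key!r}")
        obj[key] = value
    return obj


def lint_json(text):
    """Return None if `text` parses cleanly, otherwise a short problem description."""
    try:
        json.loads(text, object_pairs_hook=_reject_duplicates)
    except json.JSONDecodeError as err:  # subclass of ValueError, so catch it first
        return f"line {err.lineno}, column {err.colno}: {err.msg}"
    except ValueError as err:
        return str(err)
    return None


# Usage: python3 lint_spec.py deployment.json
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as f:
        problem = lint_json(f.read())
    print(problem or "JSON looks syntactically fine")
```

Note that this only catches syntax problems; a well-formed JSON with a wrong IP address will pass.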
