Failures while assigning public IP via enableStaticNat #10512
Replies: 15 comments
-
Just a small update.
Another possibly interesting thing: when I tried to add a public IP to a VM that already had a public IP, I immediately got the corresponding error.
-
Releasing IPs was also very slow, though I did it from the UI and it did not fail. The IPs disappeared from the allocated list quickly enough, but notifications about the releases only started appearing in the UI after maybe 40 seconds. Releasing all 16 IPs I had took about 2 minutes.
then failed with
-
Can anybody suggest Java parameters to improve this?
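For context, a sketch of where such parameters typically live, assuming a package-based install where the management server reads JAVA_OPTS from /etc/default/cloudstack-management (the path and current flags should be verified on your system first):

```bash
# Assumption: package-based install; verify the path before changing anything.
grep JAVA_OPTS /etc/default/cloudstack-management
# If the heap looks small, append something like "-Xms2G -Xmx8G" to JAVA_OPTS,
# then restart the management server:
sudo systemctl restart cloudstack-management
```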
-
I found this warning in the management log; it seems relevant.
The part of the log corresponding to
-
@akrasnov-drv, do you have webhooks enabled? It seems like the issue is not in enableStaticNat itself.
-
@akrasnov-drv
-
@DaanHoogland I don't think webhooks are the cause here. The logs shared are for an API call that finished in around 10s. I see multiple agent-server Command-Answer exchanges taking a few seconds each. Also, multiple errors like,
Maybe the API needs optimization, or there is an underlying network issue.
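One rough way to eyeball those Command-Answer round trips is to follow the Seq lines in the management log around the slow call (a sketch; the log path is the packaged default and may differ):

```bash
# Follow agent Command/Answer sequence numbers around the slow API call;
# the path assumes a default package install.
grep -E 'Seq [0-9]+-[0-9]+' /var/log/cloudstack/management/management-server.log | tail -n 100
```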
-
It looks like @akrasnov-drv uses redundant VRs; the error happens on the BACKUP VR, which does not have a default route because its public NIC is DOWN.
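A quick way to confirm that from inside the BACKUP VR (a sketch; the public interface is typically eth2 on an isolated network, but the name may differ):

```bash
# Run inside the backup virtual router (via its link-local console/SSH).
ip route show default   # expected to print nothing on the BACKUP VR
ip link show eth2       # public NIC, expected to be DOWN
```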
-
First of all, thanks for the attention and care. @DaanHoogland I tried using webhooks in the past, but when I started getting various issues I recreated the cluster without webhooks. Here is my agent config
Nevertheless (I believe I reported it before) there is
and I had some doubts about it. Though I do not see how it can be related to the current NAT issue. I created 100 VMs via the API without a problem; only this call fails with a timeout.
@weizhouapache I have workers in the global config set to 50, but as you see above, the agent has it set to 5, and it's not something I set. I can increase that, no problem (a quick way to check and raise it is sketched below), but I really doubt the number of workers is relevant to a failure in a single API call.
@shwstppr I'll clean the environment and start it again to provide a wider log, covering both the successful executions and then the failure. In the meantime, has anybody tried my flow? Did it work (meaning the failure is then just in my environment)?
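For reference, a sketch of checking and raising the agent-side setting, assuming the default agent.properties location on a KVM host:

```bash
# Check the current agent-side worker thread count (default path assumed).
grep -E '^workers' /etc/cloudstack/agent/agent.properties
# After editing the value, restart the agent for it to take effect:
sudo systemctl restart cloudstack-agent
```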
-
@akrasnov-drv enabling hundred(s) of static NAT addresses is not a usual case, but it should work. We'll have to investigate what might go wrong. Do you have a clean environment to experiment in? I doubt either is the culprit, but we'll have to start simple.
-
I removed all VMs and the network, and recreated it with a standard isolated network offering with static NAT and a single VR.
It managed to assign about 40 IPs before failing.
Attaching logs from the management server and from the VR for that time frame.
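For anyone trying to reproduce, the recreate step can be scripted roughly like this (a sketch assuming the cmk client, jq, and the stock offering name, which should be verified with cmk list networkofferings):

```bash
# Recreate an isolated network with a stock source-NAT offering
# (offering name assumed; verify it in your install).
OFFERING_ID=$(cmk list networkofferings name=DefaultIsolatedNetworkOfferingWithSourceNatService \
    | jq -r '.networkoffering[0].id')
cmk create network zoneid="<zone-uuid>" networkofferingid="$OFFERING_ID" \
    name=nat-test displaytext=nat-test
```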
-
@DaanHoogland it was not our intention. We are just trying to use a CloudStack fleet in Jenkins. The only plugin supporting CloudStack is the jclouds plugin, and it requires a public IP and static NAT.
-
Noted @akrasnov-drv, this has been added to the backlog and has to be investigated. I am afraid I don't have a workaround off the top of my head.
-
@akrasnov-drv, do you have a resolution to this query yet?
-
Closing this for lack of activity. Please re-open or open a new one if it becomes relevant again.
-
problem
enableStaticNat starts failing after just several uses. Actually I see that the IP is assigned, but the call still fails with a timeout after just 3-4 uses.
For my test I created a number of VMs, and then tried to assign a public IP to all of them sequentially.
I tried different configurations of network and VR, and it always failed, in the best case after 6-7 successful assignments.
A large VR with 4 CPUs and several GB of memory did not help either.
time for all the failing calls shows real 0m10.350s or slightly more.
Started from #10184
versions
CloudStack 4.20.0.0 with https://github.com/apache/cloudstack/pull/10254/files applied (PR does not help with this)
Ubuntu 22.04.5 LTS
libvirt 8.0.0-1ubuntu7.10
isolated network over VLAN
about 1000 public IPs in /20
The steps to reproduce the bug
1. Create a number of VMs.
2. associateIpAddress to get a public IP ID.
3. enableStaticNat with the VM ID and the IP ID.
4. Repeat steps 2-3 for the VMs from step 1 until it starts failing (after about 3-4 cycles in my case).
Initially enableStaticNat takes 9 seconds, then it increases to 10, and then it just starts failing with a timeout. Here is a cycle doing the above:
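(The sketch below assumes the cmk/CloudMonkey CLI configured with JSON output plus jq; the zone/network UUIDs are placeholders and the response field paths may need adjusting.)

```bash
#!/usr/bin/env bash
# Reproduction cycle: acquire a public IP per VM and enable static NAT on it,
# timing each enableStaticNat call.
set -euo pipefail

NETWORK_ID="<network-uuid>"
ZONE_ID="<zone-uuid>"

for VM_ID in $(cmk list virtualmachines networkid="$NETWORK_ID" | jq -r '.virtualmachine[].id'); do
    # associateIpAddress: acquire a new public IP in the network and keep its ID
    IP_ID=$(cmk associate ipaddress zoneid="$ZONE_ID" networkid="$NETWORK_ID" | jq -r '.ipaddress.id')

    # enableStaticNat: map the public IP to the VM; this is the call that
    # starts timing out after a few iterations
    time cmk enable staticnat ipaddressid="$IP_ID" virtualmachineid="$VM_ID"
done
```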
What to do about it?
It looks like the call is taking too much time to return and should be optimized.