VMware CPU and Memory Reservations: Fixing Insufficient resources to satisfy configured failover level for HA

This post comes from a few days of poring over manuals, plus some time with VMware technical support. This is a good one. The errors appeared when we tried to power on a VM in our VMware cluster:

“Insufficient resources to satisfy configured failover level for HA”

And this alert on our cluster

“Insufficient resources to satisfy HA failover level on cluster vmCluster in vmTST”

Our first thought was that we had to power one VM off to power another one on.

But that didn’t work.

Here is the actual solution. (P.S. This one was a great VMware HA education for me!)

PROBLEM SOURCE: VMware HA is turned on and you are violating constraints



VMware HA is turned on, and it is configured to keep a certain amount of resources in reserve for failover. Powering on this VM would dip into that reserve, so VMware is telling you: “Nope, not turning it on….”

There is a quick fix to get the VM turned on (one bad way, one good way), and then there are two long-term fixes to consider. In my case, the first long-term fix was faster, while the second was better for my environment.

My VMware environment

Datacenter: vmTST
Cluster: vmCluster
OS: ESXi 4.1.0
Five (5) servers in a cluster.

[Screenshot: VMware environment]

My VMware Cluster Errors

As mentioned above:

“Insufficient resources to satisfy configured failover level for HA”

and

“Insufficient resources to satisfy HA failover level on cluster vmCluster in vmTST”

TWO WAYS TO DO A QUICK FIX

  1. Turning off HA (popular, and I would say WRONG)
  2. Disable Admission Control (much better!!)

#1: Turning off HA (though I recommend against it)

This is the solution I saw on some forums (including the VMware forums). After looking into it more, I recommend against it, and I’ll explain why, but here it is:

vSphere Client: Browse Inventory -> Hosts and Clusters

[Screenshot: Browse VMware inventory, Hosts and Clusters]

Edit VMware cluster settings

Right-click the cluster name -> Edit Settings

[Screenshot: VMware cluster: Edit Settings]

Turning off HA

This works, but whenever you turn HA back on, it has to recalculate the HA failover. That is bad, especially for testing or for temporary power-ons.

[Screenshot: WRONG WAY: do not turn off VMware HA]

#2: Disable “Admission Control” (better IMO)

It is better to disable “Admission Control” so that VMs will power on despite violating availability constraints. This way HA stays on. In the long run, though, it is better to fix the underlying issue.

Same window, but the next item in the list on the left:

[Screenshot: VMware: better to disable Admission Control]

LONG TERM FIX: TWO WAYS

There are two things I ended up having to look at. The first is a pretty good long-term fix that I had seen suggested on forums, including the VMware forums.

The second is the actual fix to my problem, and the best one in the long term.



FIX #1: Change from “Host Failures Cluster Tolerates” to “Percentage of cluster resources reserved as failover spare capacity”

In other words, instead of telling VMware you want to have enough resource reserve so that you can lose one host, you are telling VMware you want to have a certain percentage of resources unused for failover.

We had it configured to tolerate losing one host, so switching to a percentage was a quick and easy fix for my environment.

VMware HA: Host failures cluster tolerates (?)

[Screenshot: VMware HA: Host Failures Cluster Tolerates]

If we look at the “VMware HA” window, you’ll see that my “Host failures cluster tolerates” was set to 1. With 5 servers you would think that means “20%”, but that’s not so. What if one of your VMs (or more), for whatever reason, took up 75% of a host’s resources? Then, by the worst-case calculation, every VM is treated as being that big, each host has room for only one of them, and your five-node cluster can hold only a handful of VMs.

A worst-case calculation based on your largest VM determines what’s called a “slot” size. VMware HA then calculates how many total “slots” the cluster has, and that number determines how many VMs you can have powered on.

When this option is chosen, from what I’ve read on VMware forums, the calculations are VERY conservative.
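To make the slot idea concrete, here is a rough Python sketch of the “Host failures cluster tolerates” arithmetic. It is a simplification under several assumptions:

- It takes for granted what this post works out later: the slot size is driven by the largest CPU reservation and the largest memory reservation among powered-on VMs.
- It uses a hypothetical CPU floor (`MIN_CPU_SLOT_MHZ`) and a flat, hypothetical memory overhead (`MEM_OVERHEAD_MB`). Real HA computes overhead per VM, and the floor depends on the vSphere version.
- It holds back the slots of the largest host(s) for failover.

All host and VM figures are made up.

```python
from dataclasses import dataclass

# Hypothetical defaults for illustration only; real values depend on the
# vSphere version and on each VM's configuration.
MIN_CPU_SLOT_MHZ = 256
MEM_OVERHEAD_MB = 100


@dataclass
class VM:
    name: str
    cpu_reservation_mhz: int = 0
    mem_reservation_mb: int = 0


@dataclass
class Host:
    name: str
    cpu_mhz: int
    mem_mb: int


def slot_size(vms):
    """Worst case: the largest CPU and the largest memory reservation among
    powered-on VMs (the two maxima may come from different VMs)."""
    cpu = max([MIN_CPU_SLOT_MHZ] + [vm.cpu_reservation_mhz for vm in vms])
    mem = max([0] + [vm.mem_reservation_mb for vm in vms]) + MEM_OVERHEAD_MB
    return cpu, mem


def slot_summary(hosts, vms, host_failures=1):
    cpu_slot, mem_slot = slot_size(vms)
    # A host holds as many slots as its scarcer resource allows.
    per_host = sorted(
        (min(h.cpu_mhz // cpu_slot, h.mem_mb // mem_slot) for h in hosts),
        reverse=True,
    )
    total = sum(per_host)
    # Hold back the slots of the largest host(s) so they can fail.
    failover = sum(per_host[:host_failures])
    return {
        "slot": (cpu_slot, mem_slot),
        "total": total,
        "failover": failover,
        "used": len(vms),
        "available": total - failover - len(vms),
    }


if __name__ == "__main__":
    hosts = [Host(f"esx{i}", cpu_mhz=24000, mem_mb=65536) for i in range(1, 6)]
    vms = [VM(f"vm{i}") for i in range(19)]
    vms.append(VM("big-reservation", cpu_reservation_mhz=2500, mem_reservation_mb=4096))
    print(slot_summary(hosts, vms))
    # Zero out that single reservation and recompute.
    vms[-1] = VM("big-reservation")
    print(slot_summary(hosts, vms))
```

In this made-up cluster, one VM with a 2500 MHz / 4096 MB reservation sets the slot size for everyone and limits the cluster to 45 slots. Zeroing that one reservation raises it to 465. Usage never appears anywhere in the calculation.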

Find Your Slot Size: VMware Cluster Summary -> Advanced Runtime Info

[Screenshot: VMware cluster summary: Advanced Runtime Info]

[Screenshot: VMware HA Advanced Runtime Info: slot sizes]

As you can see above, in the worst-case scenario one slot is 2507 MHz and 4256 MB. With that slot size there are 55 slots available on my five-node cluster, yet there are 156 VMs competing for those 55 slots.

This means I would have to power off 102 VMs to get down to 54 powered-on VMs, leaving one slot open to power the new one on… (YIKES!)
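Here is the same arithmetic as a tiny Python sketch. Like the numbers above, it treats the total slot count as the hard cap and ignores the failover slots HA also holds back. The function names are just illustrative.

```python
def vms_to_power_off(total_slots: int, used_slots: int, new_vms: int = 1) -> int:
    """VMs that must be powered off before `new_vms` more can be admitted,
    treating the total slot count as the cap (failover slots ignored)."""
    return max(0, used_slots + new_vms - total_slots)


def free_slots(total_slots: int, used_slots: int) -> int:
    """Slots left over, never negative."""
    return max(0, total_slots - used_slots)


print(vms_to_power_off(55, 156))  # 156 + 1 - 55 = 102
print(free_slots(550, 156))       # 394, the after-fix figure at the end of this post
```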

Changing To Percentage: First Check Resource Usage

Out of curiosity, I checked the actual resource usage in my cluster:

[Screenshot: VMware cluster: Hosts]

If you tally up all the green bars in CPU, I could fit all the CPU usage of every VM on one host.

If you tally up all the green bars in Memory, I could fit all the memory usage in about three hosts.

So why can’t I power on a VM? Because the calculation for the “Host failures cluster tolerates” option is *THAT CONSERVATIVE*.
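“Tallying up the green bars” is just summing actual usage and dividing by one host’s capacity. Here is a minimal Python sketch of that back-of-the-envelope check. The per-VM usage numbers and the 24,000 MHz / 64 GB hosts are hypothetical, and the check ignores fragmentation and headroom, exactly as eyeballing the bars does.

```python
import math


def hosts_needed(per_vm_usage, per_host_capacity):
    """Smallest number of identical hosts whose combined capacity covers the
    summed usage. Rough: ignores fragmentation, overhead and headroom."""
    return math.ceil(sum(per_vm_usage) / per_host_capacity)


# Hypothetical figures for a five-host cluster: 24,000 MHz and 65,536 MB per host.
cpu_usage_mhz = [300, 1200, 450, 2500, 800] * 4      # 21,000 MHz in total
mem_usage_mb = [4096, 8192, 2048, 16384, 6144] * 4   # 147,456 MB in total

print(hosts_needed(cpu_usage_mhz, 24000))  # 1
print(hosts_needed(mem_usage_mb, 65536))   # 3
```

Usage says one or three hosts would be enough, yet the slot math still refuses a power-on. That gap is the point: slot math never looks at usage.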

VMware HA: Switch to percentage

[Screenshot: VMware HA: Percentage of cluster resources reserved as failover spare capacity]

The first time I did this, I chose “20%”, which works out to one of the five servers’ worth of capacity held free.

And I was able to power on a VM.

On a whim, I kept upping the percentage and I got as high as 75% before I decided to stop, thinking I was doing something wrong.

Part of it was that the VM I was powering on was very, very small in resource usage (and, as I found out later, it had no reservation configured at all), which is probably why it powered on even at 75% failover spare capacity.

Anyhow, in a pinch this is one way to keep some amount of reserve AND still be able to power on your VMs, at least if your resource usage somewhat mirrors mine (see the previous picture).
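Why did a zero-reservation VM power on even at 75%? Here is a simplified Python sketch of a percentage-style admission check. It assumes that after adding the new VM’s reservations, both CPU and memory must still have at least the configured percentage of cluster capacity unreserved. It ignores the per-VM CPU minimum and memory overhead that real HA adds, and all cluster figures are hypothetical. Note that actual usage never enters the check.

```python
def failover_capacity_pct(total, reserved):
    """Share of cluster capacity not claimed by reservations, in percent."""
    return (total - reserved) / total * 100


def can_power_on(cluster, reserved, new_vm, reserve_pct):
    """Simplified percentage-policy check: with the new VM's reservations
    added, CPU and memory must each keep >= reserve_pct percent unreserved."""
    for res in ("cpu_mhz", "mem_mb"):
        after = failover_capacity_pct(cluster[res], reserved[res] + new_vm[res])
        if after < reserve_pct:
            return False
    return True


# For five identical hosts, one host's worth of capacity is 1/5 of the cluster.
print(100 * 1 / 5)  # 20.0

cluster = {"cpu_mhz": 5 * 24000, "mem_mb": 5 * 65536}  # hypothetical
reserved = {"cpu_mhz": 20000, "mem_mb": 60000}         # hypothetical
tiny_vm = {"cpu_mhz": 0, "mem_mb": 0}                  # no reservation
big_vm = {"cpu_mhz": 0, "mem_mb": 25000}               # large memory reservation

print(can_power_on(cluster, reserved, tiny_vm, 75))  # True
print(can_power_on(cluster, reserved, big_vm, 75))   # False
```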

FIX #2: Best Long Term Fix: Determine WHY the cluster resource reserve is so high and see if it is actually needed, or if it is just poorly configured

In the end this was the actual fix for us, because it got at the real source of the problem, which was to find out:

WHY the heck was our VM slot size so BIG?

Because obviously all five hosts combined were using VERY LITTLE CPU and RAM. Less than 20% on CPU (it could fit all on one server), and less than 50% on RAM (it could fit on two to three servers).

It turns out the slot size is not based on usage; it is based on VM resource reservations (driven by the largest CPU reservation and the largest memory reservation among the powered-on VMs).

So here is how to check the resource reservations on your VMs.

VMware Cluster: Resource Allocation for CPU and Memory

CPU

[Screenshot: VMware cluster: Resource Allocation, CPU]

(The dashed lines are my VM names which I blanked out)

Click the “CPU” button, find the “Reservation” column, and sort it from largest to smallest.

Memory

[Screenshot: VMware cluster: Resource Allocation, Memory]

(The dashed lines are my VM names which I blanked out)

Click the “Memory” button, find the “Reservation” column, and sort it from largest to smallest.
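If you copy those two Reservation columns into a script, the “sort from largest to smallest” step looks like the minimal Python sketch below. The VM names and numbers are hypothetical. The top row in each column is the one driving that dimension of the slot size.

```python
# Hypothetical rows copied from the cluster's Resource Allocation tab.
vms = [
    {"name": "web01", "cpu_res_mhz": 0, "mem_res_mb": 0},
    {"name": "db01", "cpu_res_mhz": 2500, "mem_res_mb": 4096},
    {"name": "app02", "cpu_res_mhz": 1000, "mem_res_mb": 2048},
    {"name": "va-appliance", "cpu_res_mhz": 0, "mem_res_mb": 8192},
]


def largest_reservations(rows, column, n=3):
    """Rows with a non-zero reservation in `column`, largest first."""
    reserved = [r for r in rows if r[column] > 0]
    return sorted(reserved, key=lambda r: r[column], reverse=True)[:n]


for column in ("cpu_res_mhz", "mem_res_mb"):
    top = largest_reservations(vms, column)
    print(column, [(r["name"], r[column]) for r in top])
```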

As you can see, many VMs have resource reservations. This means that as soon as a VM is powered on, it reserves that much resource REGARDLESS OF WHETHER IT IS NEEDED OR NOT!

But as the actual usage shows, we are nowhere near capacity, so there is no real reason for us to reserve that much.

One of the culprits: it turns out that many of the templates we used to clone and deploy VMs already had resource reservations set, so every new VM we made came with a reservation.

VMware Cluster: Virtual Machines Actual Usage

Now go to the “Virtual Machines” tab to see actual usage. The “Host CPU – MHz” and “Guest Mem – %” columns show what each VM is actually using.

[Screenshot: VMware cluster: Virtual Machines list]

I sorted this list alphabetically, cross-referenced it with the previous two screenshots (the VMs with the highest reservations), and checked actual usage. Sure enough, many of our VMs were not using anywhere near the resources reserved for them (as you can tell from the earlier graphs).

Next step: contact the VM owners to find out whether the current usage is typical for the VM. If so, get permission to turn the reservation down or even off.

VMware: Right Click -> Edit Settings

To change a VM’s resource reservation, right-click the VM and choose Edit Settings.

[Screenshot: VMware cluster: Right Click and Edit Settings]

VMware: CPU reservation and Memory reservation

Here I set the CPU and memory reservations low, or to zero.

[Screenshot: VMware Edit Settings: CPU resource reservation]

[Screenshot: VMware Edit Settings: memory resource reservation]

REMEMBER TO CONSULT YOUR USER FIRST TO SEE IF VM IS IN TYPICAL USE

VMware HA: Advanced Runtime Info Results

Now go back to Advanced Runtime Info. (You might have to switch VMware HA back to “Host failures cluster tolerates” if you changed it to the percentage option as an intermediate fix.)

[Screenshot: VMware HA: Advanced Runtime Info]

When all was said and done, I went from 55 slots to 550 slots.

And I went from being 101 VMs in the “red” (more VMs than slots) to being in the “green” with 394 slots available.

CPU slot size went down a factor of 10
Memory slot size went down a factor of 20

NICE!!!
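Why did the slot count grow by the CPU factor and not the memory factor? Assume each host holds the smaller of its CPU slots and its memory slots. Here C and M are a host’s CPU and memory capacity, and c and m are the CPU and memory slot sizes. If the slot sizes shrank by exactly 10 and 20, a quick bound shows every host gains at least 10 times its slots:

```latex
n = \min\left(\left\lfloor \tfrac{C}{c} \right\rfloor, \left\lfloor \tfrac{M}{m} \right\rfloor\right),
\qquad
n' = \min\left(\left\lfloor \tfrac{10C}{c} \right\rfloor, \left\lfloor \tfrac{20M}{m} \right\rfloor\right)
\ge \min\left(10\left\lfloor \tfrac{C}{c} \right\rfloor, 10\left\lfloor \tfrac{M}{m} \right\rfloor\right) = 10\,n
```

This uses the fact that the floor of k·x is at least k times the floor of x for a positive integer k and x ≥ 0. Summed over the hosts, the cluster gains at least 10 times its slots, which is consistent with going from 55 to 550. The factors above are approximate, so treat this as a sanity check rather than an exact derivation.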

Hope this has been helpful!


70 thoughts on “VMware CPU and Memory Reservations: Fixing Insufficient resources to satisfy configured failover level for HA”

  1. It is 10 pm and I just got a web server for my company back up. Your info was very helpful and I wanted to say thank you. My fix actually came when I checked Expandable Reservation under the resource pool. The error kept stating that the target resource pool was the parent pool, but it was not. Sorry if that does not make sense.

    Enjoyed the video…keep having fun :)

    • Ah good to know. Glad my post could help a bit too. Thanks for the video compliment :) Over the years I’ve found Lindy Hop to be a fantastic left-brain/right-brain experience :D

  2. Another reason you might see this issue is if someone has set a memory reservation of, say, 8 GB and you lower the memory limit to 4 GB in the VM settings. vCenter freaks out since it does not know which is true :D
    So do as the author says, look at the cluster resource allocations. Personally, I use NO (0) reservations without issue in our cluster, which runs a 24/7 casino, several restaurants, etc. 4 hosts and 40 virtual machines.

    • Bill – that is really helpful. Thanks for the comment about vcenter “freaking out” (accurate wording I’m sure, can picture the freakout very easily). And I’ve come around to the idea of no reservations also. I was thinking in certain production environments I could understand the need for the reservations, but really only if all resources are taxed and so you need to make sure to reserve some space for certain VMs, in which case you don’t have a very friendly failover environment anyway. Thanks for the comment!

  3. Thank you! Your article helped me to resolve my problem. I put a host into Maintenance Mode and received the exact same “Insufficient Resources” error. New cluster, tons of resources – frustrating! After going through the troubleshooting steps you have listed, I found a virtual appliance that was recently installed “out of the box” had an 8192MB RAM reservation. I removed that reservation and now I have slots galore.

    Beware pre-packaged VAs and VMs!
    Justin

    • Wow that must have been a bit of work for you. I know what you mean. For us it was some templates that were created. We were using those templates to deploy VM after VM only to find the template had the reservation! Great job Justin! Good detective work and glad my past experience helped :)

    • Hans – your comment timestamp: 3am on the weekend? You’re a busy guy. This one drove me crazy for a few days and in fact the initial call to VMware did not prove fruitful and it took my own research and prodding the VMware engineer to get him to escalate and get more answers for us. So I’m glad my pain is helping others :)

  4. Ben, we just ran into this issue with one of our clusters. Thank you for taking the time to post this. This fix did the trick for us!

  5. That was great information. I am not responsible for making the change but loved the information now lets see if the engineers know what to do. :)

    • Hello, I’m glad you like it. I really REALLY appreciate that you are asking. I prefer you not copy the article directly. Please feel free to “curate it” ..meaning summarize the points that you want and then link back here letting your readers know they can get the full details here. Thanks again asking! I really admire that!

    • Hi, sorry if it wasn’t clear. The answer is yes: adjust the reservations, or if they are not needed, zero them out, which I guess you could also call removing them. Technically they are not removed, they are set to zero (sorry, just semantics!). In the post I go through the examples of finding out which VMs have reservations and what their actual usage is, then asking the user whether that usage is an outlier or normal. Then you can adjust the reservation or just set it to zero altogether.

      Does that help?

  6. Ben, thanks for such a quick reply. That certainly helped. If you don’t mind, I would like to ask you some additional questions on this topic since you are well versed in it.

    Would you recommend setting a reservation for every VM? If HA is enabled and admission control is set to ‘Disable: Allow VM power on operations that violate availability constraints’, are reservations still taken into account for HA purposes?

    • Sure thing: Okay first question: For setting a reservation on every VM: If you want to guarantee a VM resources even if it won’t be using it, then you would set the reservation. This really affects your failover capabilities. The only reason I can see you possibly using it is if you have a VM that otherwise does very little but say periodically has to do some big processing job that requires a lot of CPU and memory, then you’d want to set a reservation to make sure that VM will always have it. Otherwise, I think it’d be better to set an amount of resource with no reservation, monitor it, then increase or decrease the actual amount as needed. Just my opinion.

    • Okay second question: good one, and this is my thought: Even if you choose the “Disable”, your reservations and HA are still affected. And it makes it a bit worse because if you leave it as Disabled, then you can freely power on VMs and violate your failover policy without knowing it. So if you are going to use HA, it is better in the long run not to Disable, and fix your resource reservations so you have a viable failover environment.

  7. Thank you so much for your help. Sorry for not replying back sooner. I needed more time for all this information to sink in. I’ve been trying to convince some people why not to use reservations on every single VM so this is definitely very helpful.

  8. Thanks Ben your guide helped out. I did the temp fix for now. I look forward to doing the long term fix #2 hopefully some time this year. Just have to find a window when people won’t be screaming over their files not being available.

    • Hey H2, glad my article helped! I’m guessing my comment might not be relevant anymore, but I think you can change the reservations while the VM is on. Anyhow, cheers glad I could be of help!

    • Ha, Enrique, thanks! VMware definitely provides a lot of resources online; it’s just sometimes a bit difficult to weed through the info to get to what specifically applies to you. I totally agree with you that reading the whitepapers can be challenging. The good side is they usually have more than the info you need, which is better than the alternative :) Glad my post could help!

  9. Ben, we were receiving the HA insufficient resources alarm. After muscling through some VMware articles and blogs, I found yours. This was straight to the point and offered great reasoning for using the different options. Thanks, Dan

  10. Amazing article! It was very thorough and not only solved my issue (using the Fix #2 Best Long Term Fix) but also explained the logic behind it and the other possible solutions. Thanks!

  11. This was a great article…I really appreciated the thought process throughout each step. Having the reasoning for a solution was almost AS important as the final solution/answer itself.

    I had only one VM with a reservation of 8GB. Once I got rid of that memory reservation on the single VM, my memory slot size went from 8GB down to 130MB.

    Well done and many thanks again!

  12. Thanks Ben, this post is still helpful nearly 2 years later. Appreciate you putting time into this and your explanation is really good.

  13. Found out that you have to get rid of reservations on ALL VMs. As Ytsejamer1 said, if you leave just one VM with a reservation, your slot sizes will remain unchanged. I only had a few VMs that had reservations, so once I got rid of ALL reservations on ALL VMs, it recalculated the new slot size.

    Worked great with VMware 5.5.0 3029944 (current as of 2015-09-22). No more insufficient resources errors for HA.

    Great article, well thought out, and actually understanding what the problem is just as good as the fix!

    Thank you!

  14. Been awhile since I commented, I stopped replying to each comment individually mostly for readability sake (not to clog the comments area up even more). However, I am always grateful for everyone and anyone who takes the time to comment. Thanks!!

  15. Very good!! Thank you for the tutorial. I just have one question: after installing the VSA and starting up the appliance, this same error appears and does not go away. Is it probably caused by the HA resources and the memory reservation on the VSA VM -1?

  16. Hey guys! Just want to say thanks. I stopped replying to every comment just to make things much more readable to anyone who comes across this blog post (unless the comment was a question of course). But wanted to write one in here to say thanks for the thanks everyone! It inspires me to continue to want to write. I actually have tons more techie blog material sitting on my laptop, I hope I can get more out soon :) Till then, thanks for the thanks guys!

  17. I know I’m bumping an old thread, but this article was sent my way by a coworker, after they found it and fixed the issue.

    I just added a working Host to a new Cluster, and turned on HA, probably for the first time. (Testing Environment) I saw a warning to this effect, but thought it was just because we had only one Host in the Cluster. Now I see it was much deeper.

    Thanks!

    Oh! As a fellow Lindy Hopper, I will be checking out the rest of your blog, too. Swing On!
