Always On VPN IKEv2 Load Balancing Issue with Kemp LoadMaster

A recent update to the Kemp LoadMaster load balancer may cause Always On VPN connections using IKEv2 to fail. SSTP VPN connections are unaffected.

Load Balancing IKEv2

When using the Kemp LoadMaster load balancer to load balance IKEv2, custom configuration is required to ensure proper operation. Specifically, the virtual service must be configured to use “port following” to ensure both the initial request on UDP port 500 and the subsequent request on UDP port 4500 are sent to the same real server. This requires the virtual service to be configured to operate at layer 7. Detailed configuration guidance for load balancing IKEv2 on the Kemp LoadMaster load balancer can be found here.
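To illustrate why port following matters, here is a minimal conceptual sketch in Python. It is not how the LoadMaster is implemented; the class name, server names, and round-robin scheduling are all assumptions made for illustration. It only shows the idea that, with port following, UDP 500 and UDP 4500 share one persistence entry per client, while without it each port is scheduled independently.

```python
class ToyIkev2Balancer:
    """Toy model of a load balancer handling IKEv2 (UDP 500 and 4500).

    Hypothetical illustration only; not the Kemp LoadMaster implementation.
    """

    IKE_PORTS = (500, 4500)

    def __init__(self, servers, port_following=True):
        self.servers = list(servers)
        self.port_following = port_following
        self.persistence = {}  # persistence key -> real server
        self._next = 0         # simple round-robin counter

    def route(self, src_ip, dst_port):
        if dst_port not in self.IKE_PORTS:
            raise ValueError("only UDP 500 and 4500 are modeled")
        # With port following, both ports share one entry per client IP.
        # Without it, each (client, port) pair is scheduled on its own.
        key = src_ip if self.port_following else (src_ip, dst_port)
        if key not in self.persistence:
            self.persistence[key] = self.servers[self._next % len(self.servers)]
            self._next += 1
        return self.persistence[key]


servers = ["vpn1", "vpn2", "vpn3"]  # hypothetical real servers

with_following = ToyIkev2Balancer(servers, port_following=True)
same_500 = with_following.route("203.0.113.10", 500)
same_4500 = with_following.route("203.0.113.10", 4500)

without_following = ToyIkev2Balancer(servers, port_following=False)
split_500 = without_following.route("203.0.113.10", 500)
split_4500 = without_following.route("203.0.113.10", 4500)
```

In this toy model, the client lands on the same real server for both ports when port following is on, and on two different servers when it is off, which is the situation that breaks an IKEv2 negotiation.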

Issues with LMOS 7.2.48.0

A recent release of the LoadMaster Operating System (LMOS), v7.2.48.0, introduced a bug that affects UDP services configured to operate at layer 7, which includes IKEv2. This bug causes Always On VPN connections using IKEv2 to fail. When this occurs, the administrator may encounter error 809 for the device tunnel or user tunnel.

Update Available

Administrators who use the Kemp LoadMaster load balancer to load balance Always On VPN IKEv2 connections and have updated to LMOS 7.2.48.0 are encouraged to update to LMOS 7.2.48.1 immediately. This latest update includes a fix that resolves broken IKEv2 load balancing for Always On VPN. Once the LoadMaster has been updated to 7.2.48.1, Always On VPN connections using IKEv2 should complete successfully.
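As a rough aid for auditing several appliances, the version guidance above can be sketched as a small Python helper. The function names and return labels are hypothetical, and the code does not query a LoadMaster; it only classifies a version string you supply. Treating every release after 7.2.48.1 as containing the fix is an assumption, not something stated by Kemp here.

```python
AFFECTED = (7, 2, 48, 0)  # release that introduced the layer 7 UDP bug
FIXED = (7, 2, 48, 1)     # release that includes the fix


def parse_lmos_version(text):
    """Turn a string such as 'v7.2.48.0' into a 4-part tuple of ints.

    Extra trailing parts (for example a build number) are ignored, and
    missing parts are padded with zeros.
    """
    parts = text.strip().lstrip("vV").split(".")
    numbers = tuple(int(p) for p in parts[:4])
    return numbers + (0,) * (4 - len(numbers))


def ikev2_bug_status(text):
    """Classify an LMOS version relative to the IKEv2 load balancing bug."""
    version = parse_lmos_version(text)
    if version == AFFECTED:
        return "affected"        # update to 7.2.48.1
    if version >= FIXED:
        return "fixed-or-later"  # assumption: later releases keep the fix
    return "earlier"             # predates the release that introduced the bug
```

For example, `ikev2_bug_status("7.2.48.0")` reports the affected release, while `ikev2_bug_status("v7.2.48.1")` falls into the fixed-or-later group.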

Additional Information

Windows 10 Always On VPN IKEv2 Load Balancing and NAT

Windows 10 Always On VPN IKEv2 Load Balancing with Kemp LoadMaster Load Balancer

Windows 10 Always On VPN SSTP Load Balancing with Kemp LoadMaster Load Balancer

Windows 10 Always On VPN Load Balancing with Kemp LoadMaster in Azure

Windows 10 Always On VPN Load Balancing Deployment Guide for Kemp Load Balancers

22 Comments

  1. Hi Richard, first off a huge thanks – your blog has been really helpful to me!
    I’m hoping you may have an insight into a problem I am having with clients not reconnecting.
    We have a Kemp vlm-max with the latest updates (as of Mar 22nd), configured using the Kemp template with port following configured. 3 aovpn servers, 2019 with the reg key set for ike fragmentation. Ext firewalls are Palo Altos passing udp 500/4500 to the kemp.
    The issue I see is if a client disconnects (either reboot or manually clicking disconnect) the machine then cannot reconnect. Some machines will, some won’t. Clients are windows 10, mix of 1709 and 1809. Both device tunnels and user tunnels deployed (although only user to 1709)
    If I restart the Kemp, these devices can reconnect again. Pulling my hair out trying to figure out what I’m missing!

    • I’m hearing these same reports. Not sure if it is a Kemp-specific issue or not, as others have reported similar issues with F5. If you haven’t done so already, I would suggest configuring the Kemp so that it doesn’t perform NAT (turn off “subnet originating requests” and enable transparency on the virtual service). This might introduce some routing issues, so there may be other changes in your environment required to make this work. Ultimately it will make for more reliable IKEv2 load balancing. 🙂

  2. Thanks, Richard. We have turned that off now and I am seeing more connections come through, although far more device tunnels than user tunnels; the user tunnel seems to be where we are having the most issues. We have cases open with MS Premier support, Palo Alto, and Kemp; I’ll let you know what we find (if anything). From Wireshark, it looks like something is hanging on to a dead connection: I can see packets coming back from the Kemp to the client even when no tunnel is connected. If we clear the session on the FW and reboot the Kemp, those users are then able to connect again.

  3. Hi Richard. Taking the Kemp out of the equation fixes the issue, MS Support has identified the error we get when load balancing as being related to:
    – VPN server does not allow client connection with error: Max number of established MM SAs to peer exceeded

    – The VPN server has invalidated the main mode SA before the error

    – Not visible in the logs what caused this.

    – Client dropped the informational packet with the error notification from the server sent in clear text to avoid DoS

    So it appears the Kemp/RAS is hanging on to disconnected connections and not allowing the client to reconnect. Waiting on a follow-up call with MS.

    • Is your Kemp configured to perform NAT? In Kemp terms that would be “subnet originating requests”? Best practice is to implement the virtual service with transparency enabled so the VPN server sees the client’s original source IP address. This will avoid the maximum number of MM SAs per source IP address issue.

      • Ah, that is not configured, Kemp had us turn it off originally. I’ll do some testing tomorrow with it flipped back on – thanks for the info!

      • The critical point here is that the VPN server should see the client’s original IP address. If that is being translated at any point (load balancer or edge firewall) then all of those client sessions appear to come from the same “client” and RRAS objects.

  4. Hi Richard,
    Thanks for the info – I now have the Kemps working with transparency turned on and the RAS servers’ default gateway pointed at the VIP address on the Kemp. I have got HA working on the Kemps, with failover between them working, although I don’t seem to be able to take a RAS server down and have the connection re-establish to one of the other servers – the client fails to connect until the downed server is brought back online. I suspect there is a session timeout setting I have missed, but it’s getting late, so it will have to wait until the morning – I just wanted to thank you again and add the above info in case it helps anyone else. I plan to write it up fully once working!

    • Here are a few Kemp settings that might help with this situation.

      System Configuration/Miscellaneous Options/L7 Configuration :: Select Yes Accept Changes from the Always Check Persist drop-down list
      System Configuration/Miscellaneous Options/L7 Configuration :: Select Drop Connections on RS Failure and Drop at Drain End Time

      The following settings might not work with IKEv2, which uses UDP, but might be worth setting anyway.

      System Configuration/Miscellaneous Options/L7 Configuration :: Enter 60 in the L7 Connection Drain Time (secs) field and click Set Time
      System Configuration/Miscellaneous Options/Network Options :: Enable reset on close

      Let me know how it goes. And definitely share the link to your write up when you’ve published it!

      • “System Configuration/Miscellaneous Options/Network Options :: Enable reset on close” seems to have done the trick! All the other options were already set as suggested. Huge thanks for your help – I’ll try and write this up today 🙂

      • Great to hear! 🙂

  5. Hi Richard, finally got around to writing this up – hope this is of some use to you and others:
    https://www.deviousweb.com/2020/04/03/kemp-loadmaster-config-for-windows-always-on-vpn-with-ikev2/

  6. Hi Richard,
    We have a new(ish) issue on the Kemps running 7.2.51.0.18987. We configured a new virtual service for IKE and SSTP (services for 500/4500/443) that mirrors our live service. In testing, we note that the Kemp sometimes doesn’t pass the traffic through to the “live” server and frequently disconnects clients. Rebooting the RAS server or client doesn’t help; we have to go into the service on the Kemp and disable/re-enable the “live” server. Have you seen this behaviour before?

    Thanks!

    • I have not myself. I’m running 7.2.51.0 in my lab and haven’t seen any issues. However, the load is quite light and connections made infrequently. I’m curious…is this happening for both SSTP and IKEv2? Or just one protocol in particular?

      • After some further testing today, it’s just the virtual service running on 500 that requires us to disable and re-enable it. Thankfully this is a test instance, as we are trying to hone our profiles with PFS and SSTP fallback.

  7. andrew / April 29, 2021

    Hi DigitalFix

    I am having a similar issue with the Kemp load balancer. Everything appears correct and works fine. If I switch off a server, everything fails over correctly, but when the server is reinstated it only accepts SSTP, not IKE, unless I reboot the load balancer. I cannot see any server issues, as it appears IKE does not reach the server.

    I have applied all fixes from both Richard and yourself and am just wondering if there are any other known issues. I have a call logged with Kemp as well.

    many thanks

    • That’s unusual. Have you tried disabling IKE mobility on your VPN client to see if that changes the behavior?

  8. Mits / November 5, 2021

    Richard,

    Firstly, thank you for all of the advice and guidance you’ve provided… very much appreciated and very much followed!! 🙂

    I don’t know if you’ve seen this behaviour previously; however, we’re running AOVPN through a Kemp LoadMaster. It’s a relatively new deployment on 2019 (with the October 2021 patches) and we’ve got both device tunnels and SSTP user tunnels. Currently, we’re suffering from an issue whereby, for device tunnels that have been connected for several hours, a restart of the device causes the device tunnel not to reconnect (for a good 2-4 hours).

    From a packet trace where the connection fails with an 809, it looks like the client’s initial IKE INIT packets go to one server, but then, when the port switches from 500 to 4500, the following packets are sent to another one of the servers. This is despite port following being enabled (where the port 4500 virtual service follows the port 500 virtual service).

    I know you do some work with Kemp, hence the question…

    • This is likely related to a known issue on the Kemp load balancer. If you have port following configured correctly, try setting the persistence timeout to 1 day. Let me know if that helps at all.

      • Mits / November 19, 2021

        Richard,

        Unfortunately, despite changing various variables / configurations within the LoadMaster on the advice of their support department, we never got to the bottom of the issue and in the end, replaced this with a load-balancer from another vendor (which worked flawlessly).

        Has anyone else had any luck with the LoadMaster? I don’t want to throw in the towel just yet!

      • I’ve deployed them for numerous customers with little issue. Recently I’ve encountered some issues with IKEv2, but implementing the changes suggested by Kemp seems to resolve the issue.
