Always On VPN IKEv2 Load Balancing with F5 BIG-IP

Internet Key Exchange version 2 (IKEv2) is the protocol of choice for Always On VPN deployments where the highest level of security is required. Implementing Always On VPN at scale often requires multiple VPN servers to provide sufficient capacity and redundancy. Commonly, an Application Delivery Controller (ADC) or load balancer is placed in front of the VPN servers to provide scalability and high availability for Always On VPN.

Load Balancing IKEv2

In a recent post I described some of the unique challenges load balancing IKEv2 poses, and I demonstrated how to configure the Kemp LoadMaster load balancer to properly load balance IKEv2 VPN connections. In this post I’ll outline how to configure IKEv2 VPN load balancing on the F5 BIG-IP load balancer.

Note: This article assumes the administrator is familiar with basic F5 BIG-IP load balancer configuration, such as creating nodes, pools, virtual servers, etc.

Initial Configuration

Follow the steps below to create a virtual server on the F5 BIG-IP to load balance IKEv2 VPN connections.

Pool Configuration

To begin, create two pools on the load balancer. The first pool will be configured to use UDP port 500, and the second pool will be configured to use UDP port 4500. Each pool is configured with the VPN servers defined as the individual nodes.

Virtual Server Configuration

Next, create two virtual servers: the first configured to use UDP port 500 and the second to use UDP port 4500.

To ensure reliable connectivity for IKEv2 connections, the VPN server must see the client’s original source IP address. When configuring each virtual server, select None from the Source Address Translation drop-down list.
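
The pool and virtual server layout described so far can be summarized as a small data model. The following is only a sketch in Python, not F5 configuration syntax: the class names, the `aovpn` prefix, and the 10.0.0.x addresses are all hypothetical, and the checks simply restate the settings from this article.

```python
from dataclasses import dataclass

IKE_PORTS = (500, 4500)  # the two UDP ports this article load balances


@dataclass(frozen=True)
class Pool:
    name: str
    port: int
    members: tuple  # VPN server IP addresses (hypothetical values below)


@dataclass(frozen=True)
class VirtualServer:
    name: str
    port: int
    protocol: str
    pool: Pool
    source_address_translation: str  # value of the drop-down list, e.g. "None"


def build_ikev2_config(prefix, vpn_servers):
    """Return one pool and one virtual server per IKEv2 port."""
    virtual_servers = []
    for port in IKE_PORTS:
        pool = Pool(f"{prefix}_pool_udp{port}", port, tuple(vpn_servers))
        virtual_servers.append(
            VirtualServer(f"{prefix}_vs_udp{port}", port, "udp", pool, "None")
        )
    return virtual_servers


def check_config(virtual_servers):
    """Return a list of problems; an empty list means the layout matches the article."""
    problems = []
    if sorted(vs.port for vs in virtual_servers) != sorted(IKE_PORTS):
        problems.append("expected one virtual server each for UDP 500 and UDP 4500")
    if len({vs.pool.members for vs in virtual_servers}) > 1:
        problems.append("both pools should contain the same VPN servers")
    for vs in virtual_servers:
        if vs.protocol != "udp":
            problems.append(f"{vs.name}: protocol should be UDP")
        if vs.pool.port != vs.port:
            problems.append(f"{vs.name}: pool port does not match virtual server port")
        if vs.source_address_translation != "None":
            problems.append(f"{vs.name}: Source Address Translation should be None")
    return problems


config = build_ikev2_config("aovpn", ["10.0.0.11", "10.0.0.12"])
print(check_config(config))  # prints []
```

A checklist like this can be handy when reviewing an existing deployment by hand, since a mismatch between the two pools or a leftover translation setting is easy to miss in the GUI.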

Persistence Profile

To ensure that both IKEv2 UDP 500 and 4500 packets are delivered to the same node, follow the steps below to create and assign a Persistence Profile.

1. Expand Local Traffic > Profiles and click Persistence.
2. Click Create.
3. Enter a descriptive name for the profile in the Name field.
4. Select Source Address Affinity from the Persistence Type drop-down list.
5. Click the Custom check box.
6. Select the option to Match Across Services.
7. Click Finished.

Assign the new persistence profile to both UDP 500 and 4500 virtual servers. Navigate to the Resources tab on each virtual server and select the new persistence profile from the Default Persistence Profile drop-down list. Be sure to do this for both virtual servers.
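
To see why Match Across Services matters, it may help to model the persistence table. The sketch below is a toy model in Python, not how the BIG-IP is implemented: the `LoadBalancer` class, the server names, and the client address are hypothetical, and the two pools are deliberately staggered so that the mismatch shows up on the first client.

```python
import itertools


class LoadBalancer:
    """Toy model of source address affinity on two virtual servers."""

    def __init__(self, servers, match_across_services):
        self.match_across_services = match_across_services
        self.persistence = {}  # persistence key -> chosen server
        # Each virtual server (UDP 500 and UDP 4500) rotates through its own pool.
        # The 4500 rotation starts one step ahead to make the mismatch visible.
        self.rotation = {
            500: itertools.cycle(servers),
            4500: itertools.cycle(servers[1:] + servers[:1]),
        }

    def route(self, client_ip, port):
        # With Match Across Services the record is keyed on the client alone,
        # so both virtual servers share it. Without it, each port keeps its own.
        key = client_ip if self.match_across_services else (client_ip, port)
        if key not in self.persistence:
            self.persistence[key] = next(self.rotation[port])
        return self.persistence[key]


servers = ["vpn1", "vpn2"]
for enabled in (False, True):
    lb = LoadBalancer(servers, match_across_services=enabled)
    print(enabled, lb.route("198.51.100.7", 500), lb.route("198.51.100.7", 4500))
# prints:
# False vpn1 vpn2
# True vpn1 vpn1
```

In the first case the client's UDP 4500 packets reach a server that never saw its UDP 500 exchange. In the second, both ports share one persistence record and land on the same node.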

Additional Resources

Windows 10 Always On VPN IKEv2 Load Balancing and NAT

Windows 10 Always On VPN IKEv2 Load Balancing with Kemp LoadMaster Load Balancer 

Windows 10 Always On VPN IKEv2 Security Configuration

Windows 10 Always On VPN and IKEv2 Fragmentation

Windows 10 Always On VPN Certificate Requirements for IKEv2

Video: Windows 10 Always On VPN Load Balancing with the Kemp LoadMaster Load Balancer

57 Comments

  1. Adam

     /  March 21, 2019

    Hi Richard
    How did you configure your health monitors for IKEv2?

    • On the F5 you can use the default UDP monitor, which seems to work. With other platforms you may be limited to using ICMP/ping.

      • Adam

         /  March 21, 2019

        cool. Thanks Richard
        Love the F5 articles. Keep them coming

      • You bet! 🙂

      • Adam

         /  March 10, 2021

        Hi Richard, we’ve moved on from using UDP monitors for IKEv2 on the F5. Wanted to share some useful bits with you.

        The UDP monitor marks the server as active as soon as the server is up, which isn’t great.

        We now use an HTTPS monitor configured as per your guidance, but with the “Alias Service Port” set to 443 (HTTPS).

        This HTTPS monitor is then applied to the UDP 500 and 4500 pools. Because the alias service port is set to 443, every pool member using the monitor is marked up whenever port 443 on that server is up.

        So when the RRAS services are up and the monitor therefore receives an “up” response, the UDP nodes are marked as up too.

      • I’ve done something similar in the past. It isn’t perfect, but better than nothing! 🙂
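
The alias-port approach described in this thread can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: a plain TCP connect stands in for the HTTPS monitor (a real one would also complete a TLS handshake and a request), and the function names are hypothetical.

```python
import socket


def tcp_port_is_open(host, port=443, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def pool_member_status(vpn_servers, alias_port=443):
    """Derive the state of each server's UDP 500 and 4500 members from one alias check."""
    status = {}
    for host in vpn_servers:
        state = "up" if tcp_port_is_open(host, alias_port) else "down"
        for udp_port in (500, 4500):
            status[(host, udp_port)] = state
    return status
```

The trade-off is the one noted above: the check proves that something is listening on port 443 of the VPN server, not that IKEv2 itself is answering.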

  2. Jimmy

     /  April 2, 2019

    Hi Richard,

    Regarding load balancing, is there any way to load balance Always On VPN without an appliance (F5, Kemp, etc.), for example with just two servers running Always On VPN?

    • Certainly. You could configure two public IP addresses (one for each VPN server) and then use DNS round robin to load balance client requests. You could also use Windows Network Load Balancing (NLB).

      • Matt

         /  March 18, 2020

        Question: we’re trying this now with DNS round robin. I can ping the entry and get a different one of the 2 external IPs in the response each time after a flushdns. However, only one server is getting all the VPN connections.

        Both are valid, as I can change my hosts file to point directly to each one and each VPN server will get the connection, but when using DNS round robin it’s still only going to one. Any thoughts?

      • So ALL of your client connections are on one server? None on the other? It isn’t uncommon to see uneven load balancing using DNS round robin because many users are behind NAT (not just their own, but their ISP’s NAT).

      • Adam

         /  March 18, 2020

        We’ve seen this issue with DNS round robin before. The problem we saw was that the first connection on UDP 500 was going to the first server and the NAT traversal switchover on UDP 4500 was connecting to the second IP. The issue might be that persistence on the firewall/device being used is not matching across services.

        Depending on your network, if the connection always uses NAT traversal then you’ll always see the connection on the second IP.

      • No question, the firewall/load balancer has to be configured to ensure that UDP 4500 connections go to the same backend server as the UDP 500 request. On the F5 that is the “match across services” setting in the persistence profile. If that’s done correctly everything should work.

  3. Zack

     /  April 17, 2019

    Thanks for the article. When we set it up we had trouble getting clients to move from one VPN server to the other. We found a couple of settings on the F5 that made a difference. By default, when creating a virtual server the type is Standard (which still allows the UDP configuration, but clients didn’t connect even with the port settings specified). We had to change it to Performance (Layer 4) and set the Source Address Translation to Auto Map. After that we can move back and forth between two VPN servers easily. Should be great for a DR situation.

    • My pleasure! I didn’t go into the low-level detailed configuration of the F5 in this article, mostly because I expect the administrator will have intimate knowledge of F5 configuration. However, I typically use Performance L4 anyway and wasn’t aware there were issues with failover. Thanks for sharing!

  4. Vladislavs Dmuhovskis

     /  October 14, 2019

    Which load balancing method do you advise using on the F5 to balance IKEv2 across 2 RRAS servers?

    • I typically use Least Connections (Member) to ensure equal distribution between servers and to speed up convergence after a server is restarted or a new server is added to the pool.
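
As a rough illustration of why a least-connections method converges quickly after a restart, here is a minimal Python sketch. It is not the F5 algorithm itself: the function name, server names, and connection counts are hypothetical.

```python
def pick_least_connections(active_connections):
    """Return the pool member with the fewest active connections.

    Ties go to the member listed first.
    """
    return min(active_connections, key=active_connections.get)


# vpn2 has just restarted and holds no connections yet.
active = {"vpn1": 180, "vpn2": 0}
for _ in range(100):
    member = pick_least_connections(active)
    active[member] += 1

print(active)  # prints {'vpn1': 180, 'vpn2': 100}
```

Every new connection goes to the empty server until the counts even out, whereas a simple round robin would keep sending half of them to the already busy one.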

  5. Matthew Rawles

     /  November 1, 2019

    Hi Richard,

    We have been struggling with load balancers and Always On VPN. We currently use a Jet Nexus appliance to load balance IKE and SSTP, and we have 2000+ configured users.

    After a few days the load balancer stops allowing UDP sessions and random source IPs just cannot connect (SSTP is always OK). Rebooting the real servers fixes this in most cases, and sometimes if you change your IP (say you move from ADSL to a mobile hotspot) you can get back in again.

    I’ve been trying a Kemp and a BIG-IP load balancer to replace the Jet Nexus, as I don’t have a lot of confidence in their product.

    With the Kemp I’ve got a similar issue to the Jet Nexus: after a while clients randomly receive error 809 messages. I have been testing with 40 virtual Windows 10 clients sitting in a VLAN that NATs into the same subnet as the load balancer, so everything is at gigabit speed. I’ve grouped the test clients so that when they are translated each group has its own public IP (so 5 or 6 clients will come from the same source public IP).

    The load balancer is configured single-arm.

    The Windows servers are 2019 and have the IKE fragmentation registry fix.

    If I distribute the clients over multiple IP addresses I will always get a couple that refuse to connect to the Kemp (but can connect fine to the real server). I’ve opened a case with Kemp on this but they say things like “we don’t have many customers load balancing UDP 500/4500”.

    So finally I’ve been trying a BIG-IP from F5. I managed to get this working so that all 40 test systems connect, which is great, but I’m seeing awful performance (500 ms ping times, when the response should be 3 ms over this test network).

    With Kemp there are detailed templates and guides to the correct settings to make Always On VPN behave. I cannot find any guides or examples of the correct settings for the services on F5. Do you know of any?

    Have you seen issues with device VPN users getting 809 errors when you have hundreds of clients connecting via a load balancer? (We typically have about 300-400 at any one time.)

    Thanks

    Matthew Rawles
    NHS (UK)

    • Can you try setting the following registry value on your RRAS servers and restarting the IKEEXT service please? Here are the PowerShell commands to do this.

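      # Set the IkeNumEstablishedForInitialQuery value for the IKEEXT service
      New-ItemProperty -Path 'HKLM:\SYSTEM\CurrentControlSet\Services\IKEEXT\Parameters\' -Name IkeNumEstablishedForInitialQuery -PropertyType DWORD -Value 50000 -Force
      # Restart IKEEXT so the new value takes effect
      Restart-Service IKEEXT -Force -PassThru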

      Let me know if this solves your problem or not. 🙂

      • Matthew Rawles

         /  November 2, 2019

        Hi Richard,

        I’m not sure that’s made a difference (this is testing the Kemp LB). My 40 test clients all initially connect OK.

        Then if I power them down (making sure the RRAS server has no connections showing), change the source IP the clients come from, and power them back up, not all of the 40 connect OK (maybe 2 or 3 fail).

        On one failed client, if I move the client to a different network (changing its source IP) the client then connects. Moving it back to the original source IP, it then refuses to connect (error 809).

        If it is RRAS rejecting the connections, is there a way to increase the logging to see this?

        I’ll put that registry fix on our production (Jet Nexus LB) setup and see if that helps. Is there a list of RRAS tweaks like this?

        I’d also like to get the F5 test I have working (but unlike Kemp there is nothing I can find online on the best way to configure the F5 LB, no detail, just your helpful summary). Over F5 we seem to get connections OK but the VPN is unusably slow (very high latency). There must be a setting I have wrong on that LB.

        Thanks

        Matthew Rawles

      • It certainly sounds like it is an IPsec issue. Hoping that registry entry makes a difference. You can enable debug logging in the RRAS management console, which should provide more detail for you. Network traces might be useful too.

      • Matthew Rawles

         /  November 2, 2019

        Hi Richard,

        I just noticed that one of my test RRAS servers only had 2 IKE ports enabled on it. Not sure how I missed that, so that may have been the root cause of some of the odd 809 errors.

        I’ll apply your registry fix to our production servers, and I think I’ll increase the number of IKEv2 ports from the 1024 I’d set per server to a much bigger value. If these are being held by the load-balanced UDP connections and Windows isn’t freeing them up, that might be why we seem to run out.

        I’ll feed back later this week on how we get on with that.

        Thanks again

        Matthew Rawles

      • Indeed, not having enough VPN ports provisioned could be problematic. Also, it’s a good idea to overprovision those ports just to be on the safe side. 🙂

  6. Elliot Sandell

     /  November 25, 2019

    Hi Richard, is it possible to use a Citrix Netscaler to Load Balance? Have you any configuration details you could share?

    Thanks

    Elliot (NHS)

    • Yes, absolutely. I haven’t documented it yet though. It’s on my list of things to do for sure. Look for that article to be published sometime in the next month or so, hopefully. 🙂

  7. Matt Klein

     /  April 9, 2020

    In order to have no SNAT on the load balancer, does the default gateway of the VPN server need to be set to the F5?

    We’ve been told by networking that in order to set the Source Address Translation to ‘None’ the default gateway on the VPN server needs to point to the F5?

  8. Chris

     /  June 15, 2020

    Hi Richard, your article was a lifesaver for us. Thank you. Do you have any experience configuring an F5 UDP Health Monitor for Always On VPN?

    • I usually just use the default UDP monitor and it seems to work. 🙂

      • Hi Richard, I hope you are well.

        I am also interested in the F5 UDP Health Monitor configuration as we see on three different AOVPN environments using F5 load balancing that the UDP health monitor is not working. It generates errors on the Eventlog on the RRAS VPN servers and we see that the UDP health monitor seems to stay up even though the service was down.

        I have been asked by our Comms team who manage the F5 if we are able to allow “ICMP port unreachable” messages to be sent out from the RRAS VPN servers, but I don’t find much information about this on the Internet.

        Regards,

        Dave

      • By default, the Windows firewall blocks all ICMP port unreachable and TCP resets for ports that don’t have an application listening on them. You can enable these messages by disabling “stealth mode” on the Windows Firewall. I’d suggest doing this only on the Public and Private profiles, not the domain profile. Have a look at the following reference articles for more information.

        https://docs.microsoft.com/en-us/previous-versions/windows/it-pro/windows-server-2008-R2-and-2008/dd448557(v=ws.10)
        https://support.microsoft.com/en-au/help/2586744/disable-stealth-mode-in-windows/

        Let me know if you have any success with this!

        Hi Richard, thanks very much for the info. I actually found this article later on yesterday and was curious whether this worked, as the OS versions say Windows Server 2008 and I wasn’t sure if the registry settings were applicable to 2019.

        I will give this a try and see if it works.

        Regards

        Dave

      • Should work. Let me know how it goes!

      • Hi Richard, I hope you are well.

        Just an update on the issue with F5 UDP health monitors not working. I have tested disabling stealth mode on the public and private firewall profiles, but if I stop the RRAS service the health monitors stay green. If I disable the external network card, the ICMP health monitor goes red but the UDP health monitors still stay green. So we are still no further forward, and our Comms team are looking at possibly creating custom monitors.

        Regards

        Dave

      • Quite unusual. Keep me posted if you learn anything more or find a working solution for this. 🙂

      • I will do, thanks Richard. Have a great weekend.
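
For readers following this thread, the behavior discussed above can be sketched in Python. The assumption here, taken from the discussion rather than from F5 documentation, is that a simple UDP check treats silence as healthy and only marks a member down when an ICMP port unreachable message comes back. The `udp_probe` function is hypothetical.

```python
import socket


def udp_probe(host, port, timeout=1.0):
    """Classify a UDP port the way a 'no news is good news' monitor might.

    Returns "down" only if the host answers with ICMP port unreachable, which
    the OS reports on a connected UDP socket as a connection refused or reset
    error. Silence and any reply both count as "up".
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.connect((host, port))  # lets the OS deliver ICMP errors to this socket
        try:
            sock.send(b"probe")
            sock.recv(1024)
        except (ConnectionRefusedError, ConnectionResetError):
            return "down"  # ICMP port unreachable came back
        except socket.timeout:
            return "up"  # no reply and no ICMP error: looks the same as healthy
        return "up"
```

If the server's firewall silently drops packets to closed ports, a probe like this times out and reports "up" even when nothing is listening, which would be consistent with monitors staying green after the RRAS service is stopped.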

  9. Enfield303

     /  January 19, 2021

    Hello Richard, is there a particular reason to have separate services for 500 and 4500 traffic?

  10. DGoossens

     /  January 28, 2021

    Hi Richard,
    We are using device + user tunnel. (device ikev2, user SSTP)

    For the device tunnel, we’ve configured a monitor on port 443, since UDP monitoring isn’t working on the F5.
    What I see is that if I shut down the RRAS service on the server I’m connected to while I have an active device tunnel, the tunnel fails and I’m unable to reconnect.
    The monitor marks the server as inactive, but apparently they still see a connection on port 4500.
    On the server itself, I don’t see any connection.

    I really need to restart the RRAS server to make sure I can connect again.
    Do you know what might be the reason?

    • For IKEv2, try disabling IKE Mobility. You can do this by going to Security > Advanced Settings on the VPN profile and unchecking the box next to Mobility. For SSTP, you’ll have to configure the load balancer to issue a TCP reset when a real server fails.

  11. wasnlos11Stefan

     /  February 18, 2021

    Dear Richard,
    first of all, thank you for your wonderful articles on Always On VPN!
    We currently have an exciting phenomenon when using F5 load balancers in conjunction with ALON (Device & User Tunnel).
    It can be observed that the Device Tunnel “flaps” briefly every 6-7 seconds. This causes the client to be assigned a new tunnel IP, which in turn leads to problems with various applications (SIP).
    Do you know this phenomenon and can you give us a hint?

    • I’m not sure, to be honest. That’s not something I’ve come across myself.

      • Thanks first of all for the quick reply. I actually had a typo in my first comment: the device tunnel flaps every 6-7 MINUTES (not seconds). I guess that doesn’t change the fact that this behavior seems to occur only with our own infrastructure …
        Thanks anyway and I hope we will find the problem soon.
        Best regards

      • Let me know if you find a root cause!

  12. EL KOURI

     /  March 8, 2021

    Hello Richard,

    First of all, I would like to thank you for your interesting website and for sharing your experience.

    We have AOVPN and we are facing random disconnections affecting about 80% of connected users. We checked all firewall and F5 configuration and everything seems good.

    Do you have any idea where I can find the root cause?

    We have three VIPs, one for each port (udp500, udp4500, and http443), and clients are configured to use the Automatic protocol.

    Thank you very much for your help.

    • There are numerous things that could cause this. Most commonly it is intermediary equipment (routers, firewalls, etc.) and even on-premises equipment. To find the root cause you’ll likely have to open a support case with Microsoft to have them take a closer look at things.

  13. Hi there,
    we are facing a similar issue (see above) …
    Is the F5 doing a full NAT (SNAT and DNAT) on the incoming UDP traffic or just a DNAT?
    We have advice from different parties recommending NOT to use SNAT on the F5.
    Best regards

    • By default the F5 “proxies” the connection, which results in client connections appearing to come from the F5, not the original client’s source IP address. This is, in effect, full NAT. You’ll need to configure the F5 to pass the client’s original source IP address to avoid some of the issues I’ve outlined in this post.

  14. Andrey I Zasypkin

     /  October 6, 2021

    Hi Richard, for Always On VPN, can the device tunnel and user tunnel be terminated on an F5 VPN server, and is that supported? Or is the MS RRAS server the only option when it comes to the device tunnel (pre-logon connection)? It does not have to be load balanced in my scenario.
    thank you,
    you have been such a great resource for my previous Direct Access deployments..

  15. Maxim TAZZI

     /  April 1, 2022

    Hello Richard,
    Does it work with Cisco ASA VPN / AnyConnect?
    Thank you

  16. Nate G

     /  December 28, 2022

    Hello Richard, first off I just want to thank you for the invaluable info you make available for free. We wouldn’t have been able to set up AOVPN without the info on your site.

    Question: Does it matter that the device tunnel and user tunnel connections land on separate AOVPN servers? I am troubleshooting a user (we only have 4 beta testers on AOVPN at the moment) who connects to the user and device tunnels successfully on startup, but shortly afterwards is suddenly unable to access internal resources. I’ve not been able to reproduce the issue, so I am looking for any misconfigurations on the AOVPN servers or the F5.

  17. steff94

     /  June 2, 2023

    Hi all,
    what timeout values do you recommend to be set in the configuration of the F5 VIP? 300 seconds? Less? more? 🙂

