Always On VPN IKEv2 Load Balancing with F5 BIG-IP

Internet Key Exchange version 2 (IKEv2) is the protocol of choice for Always On VPN deployments where the highest level of security is required. Implementing Always On VPN at scale often requires multiple VPN servers to provide sufficient capacity and redundancy. Commonly, an Application Delivery Controller (ADC) or load balancer is placed in front of the VPN servers to provide scalability and high availability for Always On VPN.

Load Balancing IKEv2

In a recent post I described some of the unique challenges load balancing IKEv2 poses, and I demonstrated how to configure the Kemp LoadMaster load balancer to properly load balance IKEv2 VPN connections. In this post I’ll outline how to configure IKEv2 VPN load balancing on the F5 BIG-IP load balancer.

Note: This article assumes the administrator is familiar with basic F5 BIG-IP load balancer configuration, such as creating nodes, pools, virtual servers, etc.

Initial Configuration

Follow the steps below to create a virtual server on the F5 BIG-IP to load balance IKEv2 VPN connections.

Pool Configuration

To begin, create two pools on the load balancer. The first pool will be configured to use UDP port 500, and the second pool will be configured to use UDP port 4500. Each pool is configured with the VPN servers defined as the individual nodes.

Virtual Server Configuration

Next create two virtual servers, the first configured to use UDP port 500 and the second to use UDP port 4500.

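To ensure reliable connectivity for IKEv2 connections, the VPN server must see the client’s original source IP address. When configuring each virtual server, select None from the Source Address Translation drop-down list. With source address translation disabled, the default gateway on each VPN server must point to the F5 so that return traffic flows back through the load balancer.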

Persistence Profile

To ensure that both IKEv2 UDP 500 and 4500 packets are delivered to the same node, follow the steps below to create and assign a Persistence Profile.

1. Expand Local Traffic > Profiles and click Persistence.
2. Click Create.
3. Enter a descriptive name for the profile in the Name field.
4. Select Source Address Affinity from the Persistence Type drop-down list.
5. Click the Custom check box.
6. Select the option to Match Across Services.
7. Click Finished.

Assign the new persistence profile to both the UDP 500 and UDP 4500 virtual servers. On each virtual server, navigate to the Resources tab and select the new persistence profile from the Default Persistence Profile drop-down list.
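
To illustrate why Match Across Services matters, here is a minimal Python sketch of a source address persistence table. It is a toy model, not how the BIG-IP is implemented: the class name, the node names (vpn1, vpn2), the documentation-range client addresses, and the per-port round-robin selection are all assumptions made for the example. It shows only that keying the persistence record on the client IP alone keeps UDP 500 and UDP 4500 on the same node, while keying on client IP plus port lets them diverge.

```python
from itertools import cycle


class PersistenceTable:
    """Toy source-address persistence table shared by several virtual servers."""

    def __init__(self, nodes, match_across_services):
        self.nodes = list(nodes)
        self.match_across_services = match_across_services
        self.records = {}      # persistence key -> chosen node
        self.round_robin = {}  # one independent round-robin iterator per port

    def select_node(self, client_ip, port):
        # With match-across-services the record is keyed on the client IP only,
        # so every service (port) reuses the node chosen for the first packet.
        key = client_ip if self.match_across_services else (client_ip, port)
        if key not in self.records:
            rr = self.round_robin.setdefault(port, cycle(self.nodes))
            self.records[key] = next(rr)
        return self.records[key]


def simulate(match_across_services):
    """Return the nodes chosen for one client's UDP 500 and UDP 4500 traffic."""
    table = PersistenceTable(["vpn1", "vpn2"], match_across_services)
    table.select_node("198.51.100.10", 500)              # another client connects first
    node_500 = table.select_node("198.51.100.20", 500)   # IKE_SA_INIT on UDP 500
    node_4500 = table.select_node("198.51.100.20", 4500) # NAT traversal on UDP 4500
    return node_500, node_4500


if __name__ == "__main__":
    print("match across services:", simulate(True))
    print("per-service persistence:", simulate(False))
```

In this model the second client lands on vpn2 for both ports when the record is shared, but is split across vpn2 and vpn1 when each virtual server keeps its own record, which is the failure the persistence profile is meant to prevent.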

Additional Resources

Windows 10 Always On VPN IKEv2 Load Balancing and NAT

Windows 10 Always On VPN IKEv2 Load Balancing with Kemp LoadMaster Load Balancer

Windows 10 Always On VPN IKEv2 Security Configuration

Windows 10 Always On VPN and IKEv2 Fragmentation

Windows 10 Always On VPN Certificate Requirements for IKEv2

Video: Windows 10 Always On VPN Load Balancing with the Kemp LoadMaster Load Balancer

57 Comments

Posted by Richard M. Hicks on March 11, 2019

https://directaccess.richardhicks.com/2019/03/11/always-on-vpn-ikev2-load-balancing-with-f5-big-ip/

Adam
/ March 21, 2019

Hi Richard
How did you configure your health monitors for IKEv2?

- Richard M. Hicks
  / March 21, 2019
  
  On the F5 you can use the default UDP monitor, which seems to work. With other platforms you may be limited to using ICMP/ping.
  
  - Adam
    / March 21, 2019
    
    cool. Thanks Richard
    Love the F5 articles. Keep them coming
    
  - Richard M. Hicks
    / March 24, 2019
    
    You bet! 🙂
    
  - Adam
    / March 10, 2021
    
    Hi Richard, we’ve moved on from using UDP monitors for IKEv2 on the F5. Wanted to share some useful bits with you.
    
    The UDP monitor marks the server as active as soon as it is up, which isn’t great.
    
    We now use an HTTPS monitor configured as per your guidance, but with the “Alias Service Port” set to 443 (HTTPS).
    
    This HTTPS monitor is then applied to the UDP 500 and UDP 4500 pools. Because the alias service port is set to 443, any pool member using the monitor is marked up when port 443 is up, regardless of its own service port.
    
    So when the RRAS services are up and the monitor receives an “up” response, the UDP nodes are marked as up too.
    
  - Richard M. Hicks
    / March 10, 2021
    
    I’ve done something similar in the past. It isn’t perfect, but better than nothing! 🙂
    
Jimmy
/ April 2, 2019

Hi Richard,

Regarding load balancing, is there any way to load balance Always On VPN without an appliance (e.g., F5, Kemp), for example by using two servers running Always On VPN?

- Richard M. Hicks
  / April 2, 2019
  
  Certainly. You could configure two public IP addresses (one for each VPN server) and then use DNS round robin to load balance client requests. You could also use Windows Network Load Balancing (NLB).
  
  - Matt
    / March 18, 2020
    
    Question – we’re trying this now with DNS round robin. I can ping the entry and get a different one of the two external IP addresses in the response each time after a flushdns. However, only one server is getting all the VPN connections.
    
    Both are valid, as I can change my hosts file to point directly to each one and each VPN server will get the connection, but when using DNS round robin it’s still only going to one. Any thoughts?
    
  - Richard M. Hicks
    / March 18, 2020
    
    So ALL of your client connections are on one server? None on the other? It isn’t uncommon to see uneven load balancing using DNS round robin because many users are behind NAT (not just their own, but their ISP’s NAT).
    
  - Adam
    / March 18, 2020
    
    We’ve seen this issue with DNS round robin before. The problem we saw was that the first connection on UDP 500 went to the first server, and the NAT traversal switchover on UDP 4500 connected to the second IP. The issue might be that persistence on the firewall/device being used is not matching across services.
    
    Depending on your network, if the connection always uses NAT traversal then you’ll always see the connection on the second IP.
    
  - Richard M. Hicks
    / March 18, 2020
    
    No question the firewall/load balancer has to be configured to ensure that UDP 4500 connections go to the same backend server as the UDP 500 request went. On the F5 that is the “match across services” setting in the persistence profile. If that’s done correctly everything should work.
    
Zack
/ April 17, 2019

Thanks for the article. When we set it up we had trouble getting clients to move from one VPN server to the other. We found a couple settings on the F5 that made a difference. By default when creating a server the type is standard (which still allows for the UDP config but didn’t connect even with port settings specified). We had to update it to Performance (Layer 4) and set the Source Address Translation to Auto Map. After that we can move back and forth between two VPN servers easily. Should be great for a DR situation.

- Richard M. Hicks
  / April 18, 2019
  
  My pleasure! I didn’t go into the low-level detailed configuration of the F5 in this article, mostly because I expect the administrator will have intimate knowledge of F5 configuration. However, I typically use Performance L4 anyway and wasn’t aware there were issues with failover. Thanks for sharing!
  
Vladislavs Dmuhovskis
/ October 14, 2019

Which load balancing method do you advise using on the F5 to balance IKEv2 across two RRAS servers?

- Richard M. Hicks
  / October 21, 2019
  
  I typically use Least Connections (Member) to ensure equal distribution between servers and to speed up convergence after a server is restarted or a new server is added to the pool.
  
Matthew Rawles
/ November 1, 2019

Hi Richard,

We have been struggling with load balancers and Always On VPN. We currently use a Jet Nexus appliance to load balance IKE and SSTP, and we have 2000+ configured users.

After a few days the load balancer stops allowing UDP sessions, and random source IPs just cannot connect (SSTP is always OK). Rebooting the real servers fixes this in most cases. Sometimes, if you change your IP (say you move from ADSL to a mobile hotspot), you can get back in again.

I’ve been trying out a Kemp and a BIG-IP load balancer to replace the Jet Nexus, as I don’t have a lot of confidence in their product.

With the Kemp I’ve got a similar issue to the Jet Nexus: after a while you randomly receive error 809 messages on clients. I have been testing with 40 virtual Windows 10 clients sitting in a VLAN that NATs into the same subnet as the load balancer, so everything is at gigabit speed. I’ve grouped the test clients so that when they are translated they have a unique public IP for their group (so 5 or 6 will come from the same source public IP).

The load balancer is configured single-arm.

The Windows servers are 2019 and have the IKE fragmentation registry fix.

If I distribute the clients over multiple IP addresses I will always get a couple that refuse to connect to the Kemp (but can connect fine to the real server). I’ve opened a case with Kemp on this but they say things like “we don’t have many customers load balancing UDP 500/4500”.

So finally I’ve been trying a BIG-IP from F5. I managed to get this working so all 40 test systems connect, which is great, but I’m seeing awful performance (500 ms ping times, when the response should be 3 ms over this test network).

With Kemp there are detailed templates and guides to the correct settings to make Always On VPN behave. I cannot find any guides or examples of the correct settings for the services on F5. Do you know of any?

Have you seen issues with device VPN users getting 809 errors when you have hundreds of clients connecting via a load balancer? (We typically have about 300-400 at any one time.)

Thanks

Matthew Rawles
NHS (UK)

- Richard M. Hicks
  / November 1, 2019
  
  Can you try setting the following registry value on your RRAS servers and restarting the IKEEXT service please? Here are the PowerShell commands to do this.
  
  # Set the IKEEXT registry value, then restart the service so the change takes effect.
  New-ItemProperty -Path 'HKLM:\SYSTEM\CurrentControlSet\Services\IKEEXT\Parameters\' -Name IkeNumEstablishedForInitialQuery -PropertyType DWORD -Value 50000 -Force
  Restart-Service IKEEXT -Force -PassThru
  
  Let me know if this solves your problem or not. 🙂
  
  - Matthew Rawles
    / November 2, 2019
    
    Hi Richard,
    
    I’m not sure that’s made a difference (this is testing the Kemp LB). My 40 test clients all initially connect OK.
    
    Then if I power them down (making sure the RRAS server has no connections showing), change the source IP the clients come from, and power them back up, not all of the 40 connect OK (maybe 2 or 3 fail).
    
    On one failed client, if I move the client to a different network (changing its source IP) the client then connects. Moving it back to the original source IP, it then refuses to connect (error 809).
    
    If it is RRAS rejecting the connections, is there a way to increase the logging to see this?
    
    I’ll put that registry fix on our production (Jet Nexus LB) setup and see if that helps. Is there any list of RRAS tweaks like this?
    
    I’d also like to get the F5 test I have working (but unlike Kemp there is nothing I can find online on the best way to configure the F5 LB, no detail, just your helpful summary). Over F5 we seem to get connections OK but the VPN is unusably slow (very high latency). There must be a setting I have wrong on that LB.
    
    Thanks
    
    Matthew Rawles
    
  - Richard M. Hicks
    / November 4, 2019
    
    It certainly sounds like it is an IPsec issue. Hoping that registry entry makes a difference. You can enable debug logging in the RRAS management console, which should provide more detail for you. Network traces might be useful too.
    
  - Matthew Rawles
    / November 2, 2019
    
    Hi Richard,
    
    I just noticed that one of my test RRAS servers only had 2 IKE ports enabled on it, not sure how I missed that, so that may have been the root cause of some of the odd 809 errors.
    
    I’ll apply your registry fix to our production servers. I think I’ll also increase the number of IKEv2 ports from the 1024 I’d set per server to a much bigger value. If these are being held by the LB-based UDP connections and Windows isn’t freeing them up, that might be why we seem to run out.
    
    I’ll feed back later this week on how we get on with that.
    
    Thanks again
    
    Matthew Rawles
    
  - Richard M. Hicks
    / November 4, 2019
    
    Indeed, not having enough VPN ports provisioned could be problematic. Also, it’s a good idea to overprovision those ports just to be on the safe side. 🙂
    
Elliot Sandell
/ November 25, 2019

Hi Richard, is it possible to use a Citrix Netscaler to Load Balance? Have you any configuration details you could share?

Thanks

Elliot (NHS)

- Richard M. Hicks
  / November 25, 2019
  
  Yes, absolutely. I haven’t documented it yet though. It’s on my list of things to do for sure. Look for that article to be published sometime in the next month or so, hopefully. 🙂
  
Matt Klein
/ April 9, 2020

In order to have no SNAT on the load balancer, does the default gateway of the VPN server need to be set to the F5?

We’ve been told by networking that in order to set the Source Address Translation to ‘None’ the default gateway on the VPN server needs to point to the F5?

- Richard M. Hicks
  / April 10, 2020
  
  Yes, that’s correct.
  
Chris
/ June 15, 2020

Hi Richard, your article was a lifesaver for us. Thank you. Do you have any experience configuring an F5 UDP Health Monitor for Always On VPN?

- Richard M. Hicks
  / June 18, 2020
  
  I usually just use the default UDP monitor and it seems to work. 🙂
  
  - trotterd
    / July 1, 2020
    
    Hi Richard, I hope you are well.
    
    I am also interested in the F5 UDP Health Monitor configuration as we see on three different AOVPN environments using F5 load balancing that the UDP health monitor is not working. It generates errors on the Eventlog on the RRAS VPN servers and we see that the UDP health monitor seems to stay up even though the service was down.
    
    I have been asked by our Comms team, who manage the F5, if we are able to allow “ICMP port unreachable” messages to be sent out from the RRAS VPN servers, but I can’t find much information about this on the Internet.
    
    Regards,
    
    Dave
    
  - Richard M. Hicks
    / July 1, 2020
    
    By default, the Windows firewall blocks all ICMP port unreachable and TCP resets for ports that don’t have an application listening on them. You can enable these messages by disabling “stealth mode” on the Windows Firewall. I’d suggest doing this only on the Public and Private profiles, not the domain profile. Have a look at the following reference articles for more information.
    
    https://docs.microsoft.com/en-us/previous-versions/windows/it-pro/windows-server-2008-R2-and-2008/dd448557(v=ws.10)
    https://support.microsoft.com/en-au/help/2586744/disable-stealth-mode-in-windows/
    
    Let me know if you have any success with this!
    
  - trotterd
    / July 2, 2020
    
    Hi Richard, thanks very much for the info. I actually found this article later on yesterday and was curious whether it would work, as the OS versions listed are Windows Server 2008 and I wasn’t sure if the registry settings were applicable to 2019.
    
    I will give this a try and see if it works.
    
    Regards
    
    Dave
    
  - Richard M. Hicks
    / July 7, 2020
    
    Should work. Let me know how it goes!
    
  - trotterd
    / July 10, 2020
    
    Hi Richard, I hope you are well.
    
    Just an update on the issue with F5 UDP health monitors not working. I have tested disabling stealth mode on the public and private firewall profiles, but if I stop the RRAS service the health monitors stay green. If I disable the external network card, the ICMP health monitor goes red but the UDP health monitors still stay green. So we are still no further forward, and our Comms team are looking at possibly creating custom monitors.
    
    Regards
    
    Dave
    
  - Richard M. Hicks
    / July 10, 2020
    
    Quite unusual. Keep me posted if you learn anything more or find a working solution for this. 🙂
    
  - trotterd
    / July 10, 2020
    
    I will do, thanks Richard. Have a great weekend.
    
Enfield303
/ January 19, 2021

Hello Richard, is there a particular reason to have separate services for 500 and 4500 traffic?

- Richard M. Hicks
  / January 21, 2021
  
  Not really. You can combine services if you like. 🙂
  
DGoossens
/ January 28, 2021

Hi Richard,
We are using device + user tunnel. (device ikev2, user SSTP)

For the device tunnel, we’ve configured a monitor on port 443, since UDP monitoring isn’t working on the F5.
What I see is that if I shut down the RRAS service on a server to which I’m connected with an active device tunnel, the tunnel fails and I’m unable to reconnect.
The monitor marks the server as inactive, but apparently they still see a connection on port 4500.
On the server itself, I don’t see any connection.

I really need to restart the RRAS server to make sure I can connect again.
Do you know what might be the reason?

- Richard M. Hicks
  / January 29, 2021
  
  For IKEv2, try disabling IKE Mobility. You can do this by going to Security > Advanced Settings on the VPN profile. Uncheck the box next to Mobility. For SSTP, you’ll have to configure the load balancer to issue a TCP reset when a real server fails.
  
wasnlos11Stefan
/ February 18, 2021

Dear Richard,
first of all, thank you for your wonderful articles on Always On VPN!
We currently have an exciting phenomenon when using F5 load balancers in conjunction with ALON (Device & User Tunnel).
It can be observed that the Device Tunnel “flaps” briefly every 6-7 seconds. This causes the client to be assigned a new tunnel IP, which in turn leads to problems with various applications (SIP).
Do you know this phenomenon and can you give us a hint?

- Richard M. Hicks
  / February 24, 2021
  
  I’m not sure, to be honest. That’s not something I’ve come across myself.
  
  - wasnlos11
    / February 25, 2021
    
    Thanks first of all for the quick reply. I actually had a typo in my first comment: the device tunnel flaps every 6-7 MINUTES (not seconds). I guess that doesn’t change the fact that this behavior seems to occur only with our own infrastructure …
    Thanks anyway and I hope we will find the problem soon.
    Best regards
    
  - Richard M. Hicks
    / February 27, 2021
    
    Let me know if you find a root cause!
    
EL KOURI
/ March 8, 2021

Hello Richard,

I would first of all like to thank you for your interesting website and for sharing your experiences.

We have AOVPN and we are facing random disconnections affecting about 80% of connected users. We checked all firewall and F5 configuration but all seems good.

Do you have any idea where I can find the root cause?

We have three VIPs, one for each port (udp500, udp4500, and http443), and clients are configured to use the Automatic protocol.

Thank you very much for your help.

- Richard M. Hicks
  / March 9, 2021
  
  There are numerous things that could cause this. Most commonly it is intermediary equipment (routers, firewalls, etc.) and even on-premises equipment. To find the root cause you’ll likely have to open a support case with Microsoft to have them take a closer look at things.
  
wasnlos11
/ March 10, 2021

Hi there,
we are facing a similar issue (see above) …
Is the F5 doing a full NAT (SNAT and DNAT) on the incoming UDP traffic or just a DNAT?
We have notices from different players that recommend NOT using SNAT on the F5.
Best regards

- Richard M. Hicks
  / March 10, 2021
  
  By default the F5 “proxies” the connection, which results in client connections appearing to come from the F5, not the original client’s source IP address. This is, in effect, full NAT. You’ll need to configure the F5 to pass the client’s original source IP address to avoid some of the issues I’ve outlined in this post.
  
Andrey I Zasypkin
/ October 6, 2021

Hi Richard, for Always On VPN, can the device tunnel and user tunnel be terminated on an F5 VPN server? Is that supported, or is Microsoft RRAS the only option when it comes to the device tunnel (pre-logon connection)? It does not have to be load balanced in my scenario.
Thank you,
you have been such a great resource for my previous DirectAccess deployments.

- Richard M. Hicks
  / October 6, 2021
  
  You are not limited to using RRAS, for sure. In fact, you can use any VPN device you like, including F5, as long as it supports IKEv2 for client-based VPN connections, or the vendor has a plug-in VPN client (Windows Store). More details here: https://directaccess.richardhicks.com/2019/01/17/always-on-vpn-and-third-party-vpn-devices/.
  
Maxim TAZZI
/ April 1, 2022

Hello Richard,
does it work with Cisco ASA VPN / AnyConnect?
Thank you

- Richard M. Hicks
  / April 5, 2022
  
  It should, yes.
  
Nate G
/ December 28, 2022

Hello Richard, first off I just want to thank you for the invaluable info you make available for free. We wouldn’t have been able to setup AOVPN without the info on your site.

Question: Does it matter if the device tunnel and user tunnel connections land on separate AOVPN servers? I am troubleshooting a user (we only have 4 beta testers on AOVPN at the moment) who connects to the user and device tunnels successfully on startup, but shortly afterwards is suddenly unable to access internal resources. I’ve not been able to reproduce the issue, so I am looking for any misconfigurations on the AOVPN servers or the F5.

- Richard M. Hicks
  / December 28, 2022
  
  Shouldn’t matter at all. I see this quite commonly, in fact. 🙂
  
steff94
/ June 2, 2023

Hi all,
what timeout values do you recommend to be set in the configuration of the F5 VIP? 300 seconds? Less? more? 🙂

- Richard M. Hicks
  / June 2, 2023
  
  300 seconds (5 minutes) is most common.
  