The Internet Key Exchange version 2 (IKEv2) VPN protocol is the protocol of choice when the highest level of security is required for Always On VPN connections. It uses IPsec and features configurable security parameters that allow administrators to adjust policies to meet their specific security requirements. IKEv2 is not without some important limitations, but organizations may insist on the use of IKEv2 to provide the greatest protection possible for remote connected clients. Due to complexities of the IKEv2 transport, special configuration on the Citrix ADC (formerly NetScaler) is required when load balancing this workload.
Special Note: In December 2019 a serious security vulnerability (CVE-2019-19781) was disclosed in the Citrix ADC that allows an unauthenticated attacker to execute arbitrary code on the appliance. As of this writing a fix is not available (due end of January 2020), but Citrix has published a temporary workaround.
Load Balancing IKEv2
When an Always On VPN client establishes a connection using IKEv2, communication begins on UDP port 500, but switches to UDP port 4500 if Network Address Translation (NAT) is detected in the communication path between the client and the server. Because UDP is connectionless, custom configuration is required to ensure that VPN clients maintain connectivity to the same backend VPN server during this transition.
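To make the NAT detection step more concrete, the sketch below follows the mechanism described in RFC 7296 (section 2.23), where a peer sends a SHA-1 digest of the SPIs, its IP address, and its port, and the receiver compares that digest against the address and port it actually observed. This is a simplified illustration only: it checks the source side alone, and the SPI value, addresses, and function names are assumptions chosen for the example.

```python
import hashlib
import ipaddress
import struct


def nat_detection_hash(spi_i, spi_r, ip, port):
    """SHA-1 over the two SPIs, the IP address, and the port (RFC 7296, 2.23)."""
    packed_ip = ipaddress.ip_address(ip).packed
    return hashlib.sha1(spi_i + spi_r + packed_ip + struct.pack("!H", port)).digest()


def nat_detected(spi_i, spi_r, sender_hash, observed_ip, observed_port):
    """True when the sender's own view of its address differs from what arrived."""
    return sender_hash != nat_detection_hash(spi_i, spi_r, observed_ip, observed_port)


def next_port(detected):
    """IKEv2 stays on UDP 500 without NAT and moves to UDP 4500 with NAT."""
    return 4500 if detected else 500


SPI_I = bytes.fromhex("0123456789abcdef")  # hypothetical initiator SPI
SPI_R = bytes(8)                           # responder SPI is zero in the first message

# The client hashes its private address; a NAT device then rewrites the source.
sent = nat_detection_hash(SPI_I, SPI_R, "192.168.1.20", 500)
behind_nat = nat_detected(SPI_I, SPI_R, sent, "203.0.113.10", 1024)
direct = nat_detected(SPI_I, SPI_R, sent, "192.168.1.20", 500)
```

In this sketch a client behind NAT continues on UDP 4500, which is why the load balancer has to handle both ports for the same client.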
Initial Configuration
Load balancing IKEv2 using the Citrix ADC is similar to other workloads. Below are specific settings and parameters required to load balance IKEv2 using the Citrix ADC.
Note: This article is not a comprehensive configuration guide for the Citrix ADC. It assumes the administrator is familiar with basic load balancing concepts and has experience configuring the Citrix ADC.
Service Settings
The load balancing services for IKEv2 VPN will use UDP ports 500 and 4500. Create a service group using the UDP protocol and add the VPN servers as group members on port 500.
Repeat the same steps to create a second service group for UDP port 4500.
Virtual Server Settings
Two virtual servers are required, one for UDP port 500 and one for UDP port 4500. Ensure that the service group using UDP port 500 is bound to the virtual server using the same port.
Repeat the steps above to create the virtual server for UDP port 4500.
Service Monitoring
Since IKEv2 uses the UDP protocol, the only option for service monitoring is to use PING, which is configured by default. Ensure that the firewall on the VPN server allows inbound ICMPv4 and ICMPv6 Echo Request. The default PING monitor on the Citrix ADC will ping the resource every 5 seconds. If a different interval is required, the administrator can edit the PING monitor and bind that to the service or service group as necessary.
Persistency Group
A Persistency Group on the Citrix ADC will be configured to ensure that IKEv2 VPN client requests from the same client are always routed to the same backend server. Follow the steps below to create a Persistency Group and assign it to both IKEv2 virtual servers created previously.
- In the Citrix ADC management console expand Traffic Management > Load Balancing > Persistency Groups.
- Click Add.
- Enter a descriptive name for the Persistency Group.
- Select SOURCEIP from the Persistence drop-down list.
- Next to the Virtual Server Name section click the Add button.
- Add both previously configured IKEv2 virtual servers for UDP 500 and 4500.
- Click Create.
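The effect of the Persistency Group can be sketched as a single source IP table shared by both virtual servers. The Python below is a toy model, not how the Citrix ADC is implemented: the class names, server names, and the round-robin picker are assumptions made for illustration, and the two virtual servers are deliberately started at different positions to stand in for independent load balancing state.

```python
from itertools import cycle

BACKENDS = ["vpn1", "vpn2"]  # hypothetical VPN server names


class VirtualServer:
    """A UDP virtual server that picks backends round-robin."""

    def __init__(self, port, backends, start=0):
        self.port = port
        self._picker = cycle(backends[start:] + backends[:start])

    def pick(self):
        return next(self._picker)


class PersistencyGroup:
    """One SOURCEIP persistence table shared by several virtual servers."""

    def __init__(self):
        self._table = {}

    def route(self, vserver, client_ip):
        # The first packet from a client chooses a backend; later packets
        # from that source IP reuse it, whichever virtual server they hit.
        if client_ip not in self._table:
            self._table[client_ip] = vserver.pick()
        return self._table[client_ip]


def demo():
    client = "203.0.113.10"  # documentation address standing in for a client

    # Without a shared table, each virtual server decides on its own, so the
    # UDP 500 and UDP 4500 flows can reach different VPN servers.
    ike = VirtualServer(500, BACKENDS)
    natt = VirtualServer(4500, BACKENDS, start=1)
    independent = (ike.pick(), natt.pick())

    # With the shared table, both flows follow the first decision.
    group = PersistencyGroup()
    ike = VirtualServer(500, BACKENDS)
    natt = VirtualServer(4500, BACKENDS, start=1)
    shared = (group.route(ike, client), group.route(natt, client))
    return independent, shared


if __name__ == "__main__":
    print(demo())
```

Because the table is keyed on source IP alone, every client that arrives from the same public address (for example, behind a shared NAT device) follows the same entry and lands on the same VPN server.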
Use Client IP
To ensure reliable connectivity for IKEv2 VPN connections it is necessary for the VPN server to see the client’s original source IP address. Follow the steps below to configure the Service Group to forward the client’s IP address to the VPN server.
- In the Citrix ADC management console expand System, click Settings, and then click Configure Modes.
- Select Use Subnet IP.
- Click OK.
- Expand Traffic Management, click Load Balancing, and then click Service Groups.
- Select the IKEv2 UDP 500 Service Group.
- Click Edit in the Settings section.
- Select Use Client IP.
- Repeat these steps on the IKEv2 UDP 4500 Service Group.
Note: Making the above changes will require configuring the VPN server to use the Citrix ADC as its default gateway.
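The reason for the default gateway requirement can be illustrated with a simplified routing model. The sketch below assumes a single-homed VPN server that shares a subnet with the Citrix ADC's subnet IP (SNIP); all addresses and function names are hypothetical examples, and real routing tables are more involved than this.

```python
import ipaddress

# All addresses below are hypothetical examples, not values from the article.
SERVER_SUBNET = ipaddress.ip_network("10.0.0.0/24")
ADC_SNIP = ipaddress.ip_address("10.0.0.5")
FIREWALL = ipaddress.ip_address("10.0.0.1")
CLIENT = ipaddress.ip_address("203.0.113.10")


def source_seen_by_server(use_client_ip):
    """Source address of the packet the ADC forwards to the VPN server."""
    return CLIENT if use_client_ip else ADC_SNIP


def reply_next_hop(reply_destination, default_gateway):
    """Simplified routing decision on the VPN server for its reply."""
    if reply_destination in SERVER_SUBNET:
        return reply_destination  # on-link, delivered directly
    return default_gateway        # off-link, handed to the default gateway


def reply_returns_via_adc(use_client_ip, default_gateway):
    """True when the VPN server's reply is sent back through the ADC."""
    destination = source_seen_by_server(use_client_ip)
    return reply_next_hop(destination, default_gateway) == ADC_SNIP
```

In this model, `reply_returns_via_adc(True, FIREWALL)` is false because the reply to the client's public address leaves through the firewall and bypasses the load balancer, while `reply_returns_via_adc(True, ADC_SNIP)` is true.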
Additional Information
Windows 10 Always On VPN IKEv2 Load Balancing and NAT
Windows 10 Always On VPN SSTP Load Balancing with Citrix NetScaler ADC
Windows 10 Always On VPN IKEv2 Features and Limitations
Windows 10 Always On VPN and IKEv2 Fragmentation
Alan
/ March 13, 2020
Thanks Richard for writing up the instructions for Citrix / AOVPN / IKEv2. I had originally set this up the way you describe, but other issues led me to reconfigure with protocol ANY and port ANY, and that seems to work OK too (I have a listen policy for UDP 500 / 4500 to lock it down a bit).
One issue we came across is multiple clients sitting behind a single IP (internet breakout) which probably causes havoc for any type of persistent session. Still haven’t fully addressed that but I think that’s more our environment rather than a fault in the product 🙂
Do you find it worth tweaking any of the Client Timeout settings within the vServer to help AOVPN when disconnects / reconnects happen? I am going to experiment, as the way you have approached it allows finer tweaking of the UDP 500 connection, which would become obsolete once NAT is detected (or so I am led to believe). It might be worth timing that connection out after 30 seconds instead of the default 120.
Anyway Thanks.
Richard M. Hicks
/ March 18, 2020
I haven’t spent a lot of time fine tuning those settings on the NetScaler. The defaults seem to work ok anyway, but if you have different experience feel free to share! FYI, you might want to run the following PowerShell script to help with some of those issues you are experiencing: https://github.com/richardhicks/aovpn/blob/master/Set-IKEv2LoadBalancingParameters.ps1.
Alan
/ March 20, 2020
Many thanks Richard as always. Will check out the ps/parameters and adjust if not already set to test. Will post back any findings relating to timeouts also.
Aaron
/ March 31, 2020
Hi Richard,
Thanks for all the articles on AlwaysOn. We are currently load balancing our RAS boxes behind a netscaler, but seeing the same issue as was mentioned in the comments of your article about load balancing with a Kemp (RAS servers not liking everything coming from a single IP).
Without enabling USIP on the netscaler, since we would need to then change the default gateway of the RAS servers, do you know of any way to insert the client ip from the netscaler to the RAS server?
Thanks.
Richard M. Hicks
/ March 31, 2020
No. The RRAS server has to see the client’s original source IP address in the UDP packet it receives from the NetScaler.
Greg
/ June 12, 2020
Hi Richard,
First of all thanks for all the great articles. When you mention the vpn server will need to have the default gateway set to the ADC does this mean to the ship address or the virtual server?
Richard M. Hicks
/ June 13, 2020
I typically use the load balancer interface and not the VIP. 🙂
Greg
/ June 13, 2020
Thanks, and sorry, I made a typo. I meant SNIP address, not ship.
Paddy berger
/ June 19, 2020
Hi Richard, not quite sure what is happening but let me try to explain. We had configured a service rather than a service group and all was working; however, on the odd occasion clients would get the 809 error. I decided to try the above method and created the service group. When the tick box “Use Client IP” is selected I get error 809 from the client; when it is unticked, clients can connect. Not sure why though?
Richard M. Hicks
/ June 22, 2020
When you select the option to ‘Use Client IP’ on the Citrix NetScaler ADC you also have to change the default gateway on your VPN server to point to the NetScaler’s SNIP. 🙂
Paddy Berger
/ June 30, 2020
Is there any other way of getting that option to work without using the NetScaler SNIP as the gateway? We have a NetScaler, but all it does is push traffic to the firewall. It is the firewall which knows about all the routes, DMZ, etc. I have already added the regkey but that didn’t really help much.
Richard M. Hicks
/ July 1, 2020
Not sure. You can experiment with other configurations but using the NetScaler SNIP always works.
Colin
/ July 9, 2020
Hi Richard,
Thanks for providing this information.
On the RRAS server-side is any special configuration needed for two (or more) load-balanced servers to allow them to be part of a NetScaler LB group? Or do we simply install and configure them as if they were both standalone AOVPN servers and NetScaler can just balance them?
Thanks,
Colin.
Richard M. Hicks
/ July 10, 2020
No special configuration on the RRAS servers is required, other than to ensure they both have unique IP address pools configured to assign to clients. The redundant VPN servers have no knowledge of any other VPN servers in the pool at all.
swedesolutions
/ August 12, 2020
We have been using this combo for some time, but somehow the user sessions are messed up. When a user connects to the VPN it uses VPN01 at first, but after it has received an NPS session it is redirected to VPN02. This happens without an NPS session, so no policies are applied. We have open sessions for more than 100 hours, even though we configured a maximum of 720 minutes.
Any ideas what this could be?
Richard M. Hicks
/ August 13, 2020
Not sure to be honest. Sounds like it might be related to VPN server load balancing and not NPS though.
@MaheshSA78 (@MaheshSA78)
/ September 30, 2020
Hi Richard,
When using the NetScaler LB for IKEv2 load balancing and setting the LB VIP as the gateway on the VPN server, how does the reverse traffic flow happen, and what is expected to be seen in the firewall while traffic is going out?
We observed that, out of 2 VPN servers in the LB, one server’s response is going out to the client with its own IP while the other server is responding through the LB’s VIP.
VPN Servers are Windows 2016 RRAS Servers.
Richard M. Hicks
/ September 30, 2020
That’s odd. I’m not sure to be honest, and it probably depends on how the NetScaler is configured. However, I’d expect them both to be the same. Also, as the NetScaler is just forwarding packets, I’d expect the source to be the VPN server, not the load balancer SNIP or VIP.
Matt Riddler
/ February 16, 2021
Hello, did you ever get this working? I have the same issue. The connection through an ADC LB gets pushed to one of the VPN servers, which then attempts to reply to the original request, but the reply is blocked because it is not coming from the VIP address. I have tried setting the default gateway to the proper gateway, the VIP, and the SNIP. The same happens with all of them. I put in a Kemp load balancer and 30 seconds later had a working load balancer. But we are big NetScaler users, so can only use this.
Dany Demers
/ June 10, 2021
Hi, you should put two monitors on your UDP service group (ping and UDP), both with a weight of 1, and change the monitor threshold to 2 under “threshold and timeout”. This way your device has to answer correctly to both the UDP request and the ping to be considered up. UDP is good for when your device is up, to make sure the backend device is working properly, but if your device goes down you need the ping monitor to fail, as UDP will return “true” if there is no reply. This makes for a more reliable setup.
Richard M. Hicks
/ June 10, 2021
Thanks for the tip! Anything that makes monitoring IKEv2 more reliable is welcome. 🙂
marc woollard
/ December 3, 2021
Hi, do you have to set persistency on the VIP itself for IKEv2? At the moment we have it set to source IP, 30 mins.
Richard M. Hicks
/ December 3, 2021
That should work. The only exception is the Kemp load balancer, which has a bug that requires the persistence to be set much higher.
Matt Vest
/ December 21, 2021
Hi Richard,
Excellent blog, your posts are incredibly helpful for getting Remote Access working.
I’ve set up an Always On VPN using NetScaler as the load balancer, but have found that failover doesn’t work as expected. Connections are not interrupted when a server is taken down unless the client takes some type of action to re-initiate the VPN connection. Is this normal, or do I have an error in the load balancer configuration somewhere?
Thanks in advance for any insight you can offer!
Richard M. Hicks
/ December 21, 2021
Which VPN protocol are you having trouble with? IKEv2 or SSTP? Both?
Mike Mackin
/ March 8, 2022
Has anyone resolved this issue? We are having the same problem here when we down a server.
Mike Mackin
/ March 8, 2022
Did you manage to resolve this issue? We are experiencing the same in our environment.
Matt Vest
/ March 10, 2022
Hi Mike,
We actually opened a ticket with Citrix, but they didn’t identify any issues with our configuration, and said it should work. After we did some troubleshooting, the configuration started working (properly failing over when one VPN server is taken offline.) The only explanation that we have is that the appliances were rebooted between our initial configuration and testing. I realize that is not incredibly helpful, but it’s all I can offer at this point, outside of opening a ticket with Citrix to troubleshoot.
Mike Mackin
/ March 15, 2022
Thanks for the reply Matt. Can I ask if you are also using MFA on your User Tunnel, and if so does this work?
When I take down a server, the Device Tunnel comes back up and hits the other server. However, the User Tunnel goes into what I can only describe as an MFA loop, where it asks me to approve the connection three times and then just fails to connect.
I do see logs on the RADIUS box showing that the connection has been approved, yet it refuses to connect until I restart the laptop.
Does the laptop store some kind of session key the very first time it connects? And is it expecting the same one back when it tries to reconnect?
I have this logged with MS but they have not come back to me as yet, so any help is greatly appreciated.
Thanks in advance,
Mike.
Chris
/ May 18, 2023
Hi Richard,
First thanks for all the articles, they have helped me out so much.
I have a quirk very similar to Paddy Berger’s further up the comments. We have a NetScaler LB routing traffic to VPN servers in our DMZ. The VPN servers have two NICs, one externally facing with the NS SNIP as the gateway, and one internally facing without a gateway. VPN connections work fine and we have a good user experience.
The issue I have is because we have the NS SNIP set as the gateway on the externally facing NIC we are not able to get internet access on the VPN server itself. We have a number of cloud based services we used to monitor our servers and none of these are connecting as a result of this. Not sure if it is more of a NetScaler routing issue but we’re kind of stuck. When you configured the NS SNIP as the default gateway did you then have to configure any routes on the NetScaler itself?
Thanking you in advance
Richard M. Hicks
/ May 18, 2023
Hi Chris. That’s unusual. When I set this up for my customers, I never do anything specific to allow Internet access. It always seems to “just work”. If you reach out to me directly, I’d be happy to take a look at things together and troubleshoot. 🙂
Nowayout
/ February 19, 2024
I’d check your firewall. With mine, it was showing the DMZ IP of the server coming in on the inside interface of the firewall, because it was routing via the load balancer instead of directly to the firewall.
Chris
/ September 5, 2024
Thanks for the reply. Yes it was FW rule related, we had to allow traffic to go to and from the SNIP. Thanks for the comment.
Chris
/ September 5, 2024
Revisiting this article now that our uptake of AOVPN is in full swing. Another strange quirk I found: we have four VPN servers load balanced behind the NetScaler, configured as per the article (thanks for that!).
All servers are set with equal weighting of 1, however in our experience 75% of connections seem to just choose to hit the one same VPN server, and the remaining 25% is distributed between the other three VPN servers.
If we take that server out of the pool, the same behaviour will occur, but on one of the other VPN servers. Returning the first server to the pool does not then revert the load back to it.
I’m not expecting exact numbers to be evenly distributed on each server, but there is a huge rise on a single server, with the result being we run out of IP addresses to assign from that server.
Richard M. Hicks
/ September 6, 2024
That’s interesting. Something to consider is that many ISPs are using Carrier-Grade NAT (CGNAT) or Large Scale NAT (LSN). If you have many clients connecting from the same ISP it can result in uneven traffic distribution. Not sure if that’s the issue but it might explain the results you are seeing.
Chris
/ September 16, 2024
Thank you for taking the time to respond. The NAT theory you proposed lines up with the user experience we are seeing at the moment, whereby a number of users are connecting via the same external IP address used by our ISP.