Always On VPN IKEv2 Load Balancing and NAT

Always On VPN IKEv2 Load Balancing and NATOver the last few weeks, I’ve worked with numerous organizations and individuals troubleshooting connectivity and performance issues associated with Windows 10 Always On VPN, and specifically connections using the Internet Key Exchange version 2 (IKEv2) VPN protocol. An issue that appears with some regularity is when Windows 10 clients fail to connect with error 809. In this scenario, the server will accept connections without issue for a period of time and then suddenly stop accepting requests. When this happens, existing connections continue to work without issue in most cases. Frequently this occurs with Windows Server Routing and Remote Access Service (RRAS) servers configured in a clustered array behind an External Load Balancer (ELB).

Network Address Translation

It is not uncommon to use Network Address Translation (NAT) when configuring Always On VPN. In fact, for most deployments the public IP address for the VPN server resides not on the VPN server, but on an edge firewall or load balancer connected directly to the Internet. The firewall/load balancer is then configured to translate the destination address to the private IP address assigned to the VPN server in the perimeter/DMZ or the internal network. This is known a Destination NAT (DNAT). Using this configuration, the client’s original source IP address is left intact. This configuration presents no issues for Always On VPN.

Source Address Translation

When troubleshooting these issues, the common denominator seems to be the use of Full NAT, which includes translating the source address in addition to the destination. This results in VPN client requests arriving at the VPN server as appearing not to come from the client’s original IP address, but the IP address of the network device (firewall or load balancer) that is translating the request. Full NAT may be explicitly configured by an administrator, or in the case of many load balancers, configured implicitly because the load balancer is effectively proxying the connection.

Known Issues

IKEv2 VPN connections use IPsec for encryption, and by default, Windows limits the number of IPsec Security Associations (SAs) coming from a single IP address. When a NAT device is performing destination/full NAT, the VPN server sees all inbound IKEv2 VPN requests as coming from the same IP address. When this happens, clients connecting using IKEv2 may fail to connect, most commonly when the server is under moderate to heavy load.

Resolution

The way to resolve this issue is to ensure that any load balancers or NAT devices are not translating the source address but are performing destination NAT only. The following is configuration guidance for F5, Citrix ADC (formerly NetScaler), and Kemp load balancers.

F5

On the F5 BIG-IP load balancer, navigate to the Properties > Configuration page of the IKEv2 UDP 500 virtual server and choose None from the Source Address Translation drop-down list. Repeat this step for the IKEv2 UDP 4500 virtual server.

Always On VPN IKEv2 Load Balancing and NAT

Citrix ADC

On the Citrix ADC load balancer, navigate to System > Settings > Configure Modes and check the option to Use Subnet IP.

Always On VPN IKEv2 Load Balancing and NAT

Next, navigate to Traffic Management > Load Balancing > Service Groups and select the IKEv2 UDP 500 service group. In the Settings section click edit and select Use Client IP. Repeat these steps for the IKEv2 UDP 4500 service group.

Always On VPN IKEv2 Load Balancing and NAT

Kemp

On the Kemp LoadMaster load balancer, navigate to Virtual Services > View/Modify Services and click Modify on the IKEv2 UDP 500 virtual service. Expand Standard Options and select Transparency. Repeat this step for the IKEv2 UDP 4500 virtual service.

Always On VPN IKEv2 Load Balancing and NAT

Caveat

Making the changes above may introduce routing issues in your environment. When configuring these settings, it may be necessary to configure the VPN server’s default gateway to use the load balancer to ensure proper routing. If this is not possible, consider implementing the workaround below.

Workaround

To fully resolve this issue the above changes should be made to ensure the VPN server can see the client’s original source IP address. If that’s not possible for any reason, the following registry key can be configured to increase the number of established SAs from a single IP address. Be advised this is only a partial workaround and may not fully eliminate failed IKEv2 connections. There are other settings in Windows that can prevent multiple connections from a single IP address which are not adjustable at this time.

To implement this registry change, open an elevated PowerShell command window on the RRAS server and run the following commands. Repeat these commands on all RRAS servers in the organization.

New-ItemProperty -Path ‘HKLM:SYSTEM\CurrentControlSet\Services\IKEEXT\Parameters\’ -Name IkeNumEstablishedForInitialQuery -PropertyType DWORD -Value 50000 -Force

Restart-Service IKEEXT -Force -PassThru

Additional Information

IPsec Traffic May Be Blocked When A Computer is Behind a Load Balancer

Windows 10 Always On VPN IKEv2 Load Balancing with Citrix NetScaler ADC

Windows 10 Always On VPN IKEv2 Load Balancing with F5 BIG-IP

Windows 10 Always On VPN IKEv2 Load Balancing with Kemp LoadMaster

Troubleshooting Always On VPN Error Code 809

When testing an Always On VPN connection, the administrator may encounter a scenario where the VPN client fails to connect to the VPN server. On the Windows 10 client the error message states the following.

“Can’t connect to [connection name]. The network connection between your computer and the VPN server could not be established because the remote server is not responding. This could be because one of the network devices (e.g. firewalls, NAT, routers, etc.) between your computer and the remote server is not configured to allow VPN connections. Please contact your Administrator or your service provider to determine which device may be causing the problem.”

Always On VPN and IKEv2 Fragmentation

In addition, the Application event log records an error message with Event ID 20227 from the RasClient source. The error message states the following.

“The User [username] dialed a connection named [connection name] with has failed. The error code returned on failure is 809.”

Troubleshooting Always On VPN Error Code 809

Connection Timeout

The error code 809 indicates a VPN timeout, meaning the VPN server failed to respond. Often this is related directly to network connectivity, but sometimes other factors can come in to play.

Troubleshooting VPN Error Code 809

When troubleshooting VPN error code 809 the following items should be carefully checked.

  • Name Resolution – Ensure the VPN server’s public hostname resolves to the correct IP address.
  • Firewall Configuration – Confirm the edge firewall is configured properly. Inbound TCP port 443 is required for the Secure Socket Tunneling Protocol (SSTP) and inbound UDP ports 500 and 4500 are required for the Internet Key Exchange version 2 (IKEv2) protocol. Make sure that any NAT rules are forwarding traffic to the correct server.
  • Load Balancer Configuration – If VPN servers are located behind a load balancer, make certain that virtual IP address and ports are configured correctly and that health checks are passing. For IKEv2 specifically, it is crucial that UDP ports 500 and 4500 be delivered to the same backend server. This commonly requires custom configuration. For example, on the KEMP LoadMaster the administrator will configure “port following”. On the F5 BIG-IP a  custom “persistence profile” must be configured. On the Citrix NetScaler a “persistency group” must be defined.

IKEv2 Fragmentation

VPN error code 809 can also be caused by IKE fragmentation when using the IKEv2 VPN protocol. During IKEv2 connection establishment, payload sizes may exceed the IP Maximum Transmission Unit (MTU) for the network path between the client and server. This causes the IP packets to be fragmented. However, it is not uncommon for intermediary devices (routers, NAT devices, or firewalls) to block IP fragments. When this occurs, a VPN connection cannot be established. However, looking at a network trace of the connection attempt, the administrator will see that the connection begins but subsequently fails.

Troubleshooting Always On VPN Error Code 809

Enable IKEv2 Fragmentation Support

The IKEv2 protocol includes support for fragmenting packets at the IKE layer. This eliminates the need for fragmenting packets at the IP layer. IKEv2 fragmentation must be configured on both the client and server.

Client

IKEv2 fragmentation was introduced in Windows 10 1803 and is enabled by default. No client-side configuration is required.

Server

IKEv2 is commonly supported on many firewall and VPN devices. Consult the vendor’s documentation for configuration guidance. For Windows Server Routing and Remote Access (RRAS) servers, IKEv2 fragmentation was introduced in Windows Server 1803 and is also supported in Windows Server 2019. It is enabled via a registry key. The following PowerShell command can be used to enable IKEv2 fragmentation on supported servers.

New-ItemProperty -Path “HKLM:\SYSTEM\CurrentControlSet\Services\RemoteAccess\Parameters\Ikev2\” -Name EnableServerFragmentation -PropertyType DWORD -Value 1 -Force

Validation

Once IKEv2 fragmentation is configured on the VPN server, a network capture will reveal the IKE_SA_INIT packet now includes the IKEV2_FRAGMENTATION_SUPPORTED notification message.

Always On VPN and IKEv2 Fragmentation

Additional Information

Windows 10 Always On VPN and IKEv2 Fragmentation

Windows 10 Always On VPN IKEv2 Security Configuration

Windows 10 Always On VPN Hands-On Training Classes

Always On VPN and IKEv2 Fragmentation

The IKEv2 protocol is a popular choice when designing an Always On VPN solution. When configured correctly it provides the best security compared to other protocols. The protocol is not without some unique challenges, however. IKEv2 is often blocked by firewalls, which can prevent connectivity. Another lesser know issue with IKEv2 is that of fragmentation. This can result in failed connectivity that can be difficult to troubleshoot.

IP Fragmentation

IKEv2 uses UDP for transport, and typically most packets are relatively small. The exception to this is when authentication takes place, especially when using client certificate authentication. The problem is further complicated by long certificate chains and by RSA keys, especially those that are greater than 2048 bit. If the payload exceeds 1500 bytes, the IP packet will have to be broken in to smaller fragments to be sent over the network. If an intermediary device in the path is configured to use a smaller Maximum Transmission Unit (MTU), that device may fragment the IP packets.

IP Fragmentation and Firewalls

Many routers and firewalls are configured to drop IP fragments by default. When this happens, IKEv2 communication may begin initially, but subsequently fail. This typically results in an error code 809 with a message stating the following.

“Can’t connect to [connection name]. The network connection between your computer and the VPN server could not be established because the remote server is not responding. This could be because one of the network devices (e.g. firewalls, NAT, routers, etc.) between your computer and the remote server is not configured to allow VPN connections. Please contact your Administrator or your service provider to determine which device may be causing the problem.”

Always On VPN and IKEv2 Fragmentation

Troubleshooting

When troubleshooting potential IKEv2 fragmentation-related connection failures, a network trace should be taken of the connection attempt on the client. Observe the packet sizes during the conversation, especially IKE_AUTH packets. Packet sizes exceeding the path MTU will have to be fragmented, as shown here.

Always On VPN and IKEv2 Fragmentation

Measuring Path MTU

Measuring the path MTU between the client and server can be helpful when troubleshooting fragmentation related issues. The mtupath.exe utility is an excellent and easy to use tool for this task. The tool can be downloaded here.

Always On VPN and IKEv2 Fragmentation

IKEv2 Fragmentation

To address the challenges with IP fragmentation and potential connectivity issues associated with network devices dropping fragmented packets, the IKEv2 protocol itself can be configured to perform fragmentation at the IKE layer. This eliminates the need for IP layer fragmentation, resulting in better reliability for IKEv2 VPN connections.

Both the server and the client must support IKEv2 fragmentation for this to occur. Many firewall and VPN vendors include support for IKEv2 fragmentation. Consult the vendor’s documentation for configuration guidance. For Windows Server Routing and Remote Access (RRAS) servers, the feature was first introduced in Windows Server 1803 and is supported in Windows Server 2019. Windows 10 clients support IKEv2 fragmentation beginning with Windows 10 1803.

Enabling IKEv2 Fragmentation

Windows 10 clients support IKEv2 fragmentation by default. However, it must be enabled on the server via the registry. The following PowerShell command will enable IKEv2 fragmentation support on Windows Server 1803 and later.

New-ItemProperty -Path “HKLM:\SYSTEM\CurrentControlSet\Services\RemoteAccess\Parameters\Ikev2\” -Name EnableServerFragmentation -PropertyType DWORD -Value 1 -Force

A PowerShell script to implement IKEv2 fragmentation can be found on my GitHub here.

Validation Testing

Once IKEv2 fragmentation is configured on the VPN server, a network capture will reveal the IKE_SA_INIT packet now includes the IKEV2_FRAGMENTATION_SUPPORTED notification message.

Always On VPN and IKEv2 Fragmentation

Additional Information

Windows 10 Always On VPN IKEv2 Security Configuration

RFC 7383 – IKEv2 Message Fragmentation

IEA Software MTU Path Scan Utility

Windows 10 Always On VPN Hands-On Training Classes