Eliminating single points of failure is crucial to ensuring the highest levels of availability for any remote access solution. For Windows 10 Always On VPN deployments, the Windows Server 2016 Routing and Remote Access Service (RRAS) and Network Policy Server (NPS) servers can be load balanced to provide redundancy and high availability within a single datacenter. Additional RRAS and NPS servers can be deployed in another datacenter or in Azure to provide geographic redundancy if one datacenter is unavailable, or to provide access to VPN servers based on the location of the client.
Multisite Always On VPN
Unlike DirectAccess, Windows 10 Always On VPN does not natively support multisite. However, multisite geographic redundancy can be implemented using Azure Traffic Manager.
Azure Traffic Manager
Traffic Manager is part of Microsoft’s Azure public cloud solution. It provides Global Server Load Balancing (GSLB) functionality by resolving DNS queries for the VPN public hostname to the IP address of the optimal VPN server.
Advantages and Disadvantages
Using Azure Traffic Manager has some benefits, but it is not without some drawbacks.
Advantages – Azure Traffic Manager is easy to configure and use. It requires no proprietary hardware to procure, manage, and support.
Disadvantages – Azure Traffic Manager offers only limited health check options. Today, only the HTTP, HTTPS, and TCP protocols can be used to perform endpoint health checks. There is no option to use UDP or ICMP (ping), which makes monitoring IKEv2 a challenge.
Note: This scenario assumes that RRAS with Secure Socket Tunneling Protocol (SSTP) or another third-party TLS-based VPN server is in use. If IKEv2 is to be supported exclusively, it will still be necessary to publish an HTTP or HTTPS-based service for Azure Traffic Manager to monitor site availability.
Traffic Routing Methods
Azure Traffic Manager provides six different methods for routing traffic.
Priority – Select this option to provide active/passive failover. A primary VPN server is defined to which all traffic is routed. If the primary server is unavailable, traffic will be routed to another backup server.
Weighted – Select this option to provide active/active failover. Traffic is routed to all VPN servers equally, or unequally if desired. The administrator defines the percentage of traffic routed to each server.
Performance – Select this option to route traffic to the VPN server with the lowest latency. This ensures VPN clients connect to the server that responds the quickest.
Geographic – Select this option to route traffic to a VPN server based on the VPN client’s physical location.
Multivalue – Select this option for profiles whose endpoints can only be IPv4 or IPv6 addresses. All healthy endpoints are returned in the DNS response.
Subnet – Select this option to map ranges of client source IP addresses to specific VPN servers.
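The following toy Python model makes the difference between the Priority and Weighted methods concrete. It shows how each one picks a VPN server from the endpoints that are currently passing health checks. It is a simplified sketch, not Traffic Manager’s implementation. The endpoint names are hypothetical, and the model ignores edge cases such as every endpoint being unhealthy (in that case this model simply returns None).

```python
"""Toy model of Priority vs. Weighted endpoint selection (illustration only)."""
import random
from dataclasses import dataclass


@dataclass
class Endpoint:
    name: str         # hypothetical VPN server public FQDN
    priority: int     # Priority routing: lower number = preferred
    weight: int       # Weighted routing: relative share of DNS answers
    healthy: bool = True


def route_priority(endpoints):
    """Return the healthy endpoint with the lowest priority value (active/passive)."""
    healthy = [e for e in endpoints if e.healthy]
    if not healthy:
        return None
    return min(healthy, key=lambda e: e.priority)


def route_weighted(endpoints, rng=random):
    """Pick a healthy endpoint at random, in proportion to its weight (active/active)."""
    healthy = [e for e in endpoints if e.healthy]
    if not healthy:
        return None
    return rng.choices(healthy, weights=[e.weight for e in healthy], k=1)[0]


if __name__ == "__main__":
    eps = [
        Endpoint("vpn-dc1.example.com", priority=1, weight=75),
        Endpoint("vpn-dc2.example.com", priority=2, weight=25),
    ]
    print(route_priority(eps).name)  # vpn-dc1.example.com
    eps[0].healthy = False           # primary datacenter fails its health probe
    print(route_priority(eps).name)  # vpn-dc2.example.com
    print(route_weighted(eps).name)  # vpn-dc2.example.com (only healthy endpoint)
```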
Configure Azure Traffic Manager
Open the Azure management portal and follow the steps below to configure Azure Traffic Manager for multisite Windows 10 Always On VPN.
Create a Traffic Manager Resource
- Click Create a resource.
- Click Networking.
- Click Traffic Manager profile.
Create a Traffic Manager Profile
- Enter a unique name for the Traffic Manager profile.
- Select an appropriate routing method (described above).
- Select a subscription.
- Create or select a resource group.
- Select a resource group location.
- Click Create.
Important Note: The name of the Traffic Manager profile cannot be used by VPN clients to connect to the VPN server, since a TLS certificate cannot be obtained for the trafficmanager.net domain. Instead, create a CNAME DNS record that points to the Traffic Manager FQDN and ensure that name matches the subject or a Subject Alternative Name (SAN) entry on the VPN server’s TLS and/or IKEv2 certificates.
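The certificate requirement above is where many Traffic Manager deployments go wrong. The name clients connect to (the CNAME) must match the certificate, and the trafficmanager.net name never will. The following Python sketch checks a hostname against a certificate’s subject and SAN entries using simplified matching rules: matching is case-insensitive, and a wildcard is allowed only as the whole left-most label. All hostnames are hypothetical, and real TLS stacks apply additional rules.

```python
def name_matches(hostname, cert_name):
    """Case-insensitive match; "*." is allowed only as the whole left-most label."""
    host = hostname.lower().rstrip(".")
    pattern = cert_name.lower().rstrip(".")
    if pattern.startswith("*."):
        suffix = pattern[1:]  # e.g. ".example.com"
        if not host.endswith(suffix):
            return False
        first_label = host[: -len(suffix)]
        # The wildcard covers exactly one non-empty label.
        return first_label != "" and "." not in first_label
    return host == pattern


def certificate_covers(hostname, subject_cn, san_dns_names):
    """True if the subject CN or any SAN DNS entry matches the hostname."""
    return any(name_matches(hostname, n) for n in [subject_cn, *san_dns_names])


if __name__ == "__main__":
    subject = "vpn.example.com"  # hypothetical public name, a CNAME to the profile
    sans = ["vpn.example.com", "vpn-dc1.example.com"]
    print(certificate_covers("vpn.example.com", subject, sans))                 # True
    print(certificate_covers("contoso-vpn.trafficmanager.net", subject, sans))  # False
```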
Endpoint Monitoring
Open the newly created Traffic Manager profile and perform the following tasks to enable endpoint monitoring.
- Click Configuration.
- Select HTTPS from the Protocol drop-down list.
- Enter 443 in the Port field.
- Enter /sra_%7BBA195980-CD49-458b-9E23-C84EE0ADCD75%7D/ in the Path field.
- Enter 401-401 in the Expected Status Code Ranges field.
- Update any additional settings, such as DNS TTL, probing interval, tolerated number of failures, and probe timeout, as required.
- Click Save.
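In essence, the probe configured above sends an HTTPS GET to the SSTP path on port 443 and treats a 401 response as healthy. The Python sketch below models that behavior, along with a simplified version of the tolerated-number-of-failures setting. It is an illustration under assumptions. The range string mirrors what is entered in the portal, and the consecutive-failure rule is a simplification, not Traffic Manager’s exact algorithm.

```python
import http.client
import ssl

# Path from the steps above; per this post, SSTP is expected to answer with 401.
SSTP_PATH = "/sra_%7BBA195980-CD49-458b-9E23-C84EE0ADCD75%7D/"


def parse_ranges(spec):
    """Parse "401-401" or "200-299, 401-401" into [(low, high), ...]."""
    ranges = []
    for part in spec.split(","):
        low, _, high = part.strip().partition("-")
        ranges.append((int(low), int(high or low)))
    return ranges


def status_ok(status, ranges):
    """True if an HTTP status code falls inside any expected range."""
    return any(low <= status <= high for low, high in ranges)


def probe(host, port=443, path=SSTP_PATH, timeout=10.0):
    """Send one HTTPS GET and return the status code (raises on network errors)."""
    ctx = ssl.create_default_context()
    conn = http.client.HTTPSConnection(host, port, timeout=timeout, context=ctx)
    try:
        conn.request("GET", path)
        return conn.getresponse().status
    finally:
        conn.close()


class EndpointMonitor:
    """Simplified model: degraded once consecutive failures exceed the tolerance."""

    def __init__(self, expected="401-401", tolerated_failures=3):
        self.ranges = parse_ranges(expected)
        self.tolerated = tolerated_failures
        self.failures = 0

    def record(self, status):
        """Record one probe result (None = timeout/error); return True if healthy."""
        if status is not None and status_ok(status, self.ranges):
            self.failures = 0
        else:
            self.failures += 1
        return self.failures <= self.tolerated
```

In a lab, you might call monitor.record(probe("vpn-dc1.example.com")) on a timer. The hostname is hypothetical, and probe() needs network access to a real VPN server.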
Endpoint Configuration
Follow the steps below to add VPN endpoints to the Traffic Manager profile.
- Click Endpoints.
- Click Add.
- Select External Endpoint from the Type drop-down list.
- Enter a descriptive name for the endpoint.
- Enter the Fully Qualified Domain Name (FQDN) or the IP address of the first VPN server.
- Select a geography from the Location drop-down list.
- Click OK.
- Repeat the steps above for any additional datacenters where VPN servers are deployed.
Summary
Implementing multisite by placing VPN servers in multiple physical locations ensures that VPN connections can be established successfully even when an entire datacenter is offline. In addition, active/active scenarios can be implemented, where VPN client connections are routed to the optimal datacenter based on a variety of parameters, including current server load or the client’s current location.
Nori
/ July 30, 2018
Since RRAS is not supported on VMs in Azure, would you recommend using Azure P2S VPN as an endpoint for Always On VPN?
Richard M. Hicks
/ July 31, 2018
You’re right, RRAS isn’t formally supported in Azure, but it does work. However, you certainly can use the Azure VPN gateway as the VPN server to support Always On VPN if you choose. 🙂
Eric Yew
/ July 31, 2018
Does RRAS with IKEv2 for Always On VPN still work in Azure? I know it used to work for me in the past, but now I can’t get it to work anymore. After logging a support call with Microsoft, we concluded that IP protocol 50 is being blocked.
Richard M. Hicks
/ August 2, 2018
It should. You’ll need to allow inbound UDP ports 500 and 4500. IP protocol 50 is not required for it to work.
Eric Yew
/ August 7, 2018
Isn’t ESP (IP protocol 50) required for an IKEv2 IPsec VPN?
https://feedback.azure.com/forums/217313-networking/suggestions/32209003-support-for-ikev2-vpn-clients-to-connect-to-an-azu
Richard M. Hicks
/ August 7, 2018
If the VPN server is behind a NAT, then UDP ports 500 and 4500 are all that’s required to be open. If your VPN server is not behind a NAT, then IP protocol 50 is required.
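The port guidance in this exchange can be summarized in a small Python function. It is a sketch of the rule of thumb stated above, not an exhaustive firewall policy. The ("udp", 500)-style tuples are a hypothetical representation, not any firewall’s syntax.

```python
def ikev2_inbound_rules(server_behind_nat):
    """Return (protocol, value) pairs to allow inbound to an IKEv2 VPN server."""
    rules = [
        ("udp", 500),   # IKE
        ("udp", 4500),  # NAT traversal: IKE and UDP-encapsulated ESP
    ]
    if not server_behind_nat:
        # Native ESP is IP protocol 50 and has no port number.
        rules.append(("ip-proto", 50))
    return rules


if __name__ == "__main__":
    for proto, value in ikev2_inbound_rules(server_behind_nat=True):
        print(proto, value)
```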
Eric Yew
/ July 31, 2018
I find that using both the device tunnel and the user tunnel with Traffic Manager (or any DNS-based load balancer) does not work well. If a client connects to VPN server 1 with the device tunnel and then connects with the user tunnel to VPN server 2, one of the VPN servers ends up with corrupted routing, and the only way to fix it is to reboot. This has happened to two of our customers so far. Have you experienced this yourself?
Richard M. Hicks
/ August 2, 2018
Not encountered that myself, but good to know. I’m not particularly enamored with the device tunnel at this point, so I try to avoid its use as much as possible. 🙂
Nori
/ August 2, 2018
Interesting. Could you elaborate on why you don’t like the device tunnel? I haven’t deployed it, but I was “Microsoft super-excited” when I heard about it.
Richard M. Hicks
/ August 7, 2018
The device tunnel is not authenticated as well as I would like it to be. It requires only a machine certificate, unlike DirectAccess where a certificate and a computer account in Active Directory are required. In addition, there are still some reliability issues with the device tunnel that are frustrating. :/
Eric Yew
/ August 7, 2018
Same here! Unfortunately, one of our customers needs it, as they have a password reset tool on their login screen that was accessible via DirectAccess, and they wanted like-for-like functionality. So they have decided to go with only the device tunnel for now.
bargi
/ August 8, 2018
Interested to hear your experience with the device tunnel and AOVPN in general.
We started with the device tunnel at the start of the year with 1709, and then noticed the tunnel dropping after login and svchost.exe_RasMan crashing (no user tunnel configured, just device).
At the time, MS said it “should” be fixed with 1803 but couldn’t confirm it actually would. We haven’t started rolling out 1803 yet, with all the issues there have been with the build.
As a workaround, we swapped to the user tunnel and user certificates, which we thought was working fine. But then we noticed a worrying number of users who couldn’t connect, with Windows saying there was no valid certificate. Everyone with the problem actually had a valid certificate; for some reason Windows or the RAS dialer couldn’t see it, but certmgr could. Deleting the cert and running gpupdate to pull down a new one fixes the problem immediately. I compared the old and new certs and found no differences other than the obvious (thumbprint, etc.).
So, as an even hackier workaround, we’re rolling out a user VPN but authenticating with the computer certificate, as we’ve yet to see any issues with the computer certificate.
Agreed, using computer certificate authentication is not the best, as it completely bypasses the NPS/RADIUS servers and any policies.
Richard M. Hicks
/ August 8, 2018
1803 is much improved in that regard. I’ve not experienced the issue with the device tunnel dropping when the user tunnel is established. However, I’m still hearing reports (and experiencing myself) of overall tunnel instability. Most commonly the device tunnel/user tunnel aren’t re-establishing automatically after a network status change (e.g. moving from one network to another or coming out of sleep mode). Hoping that Microsoft is resolving some of these in vNext for sure. 🙂
bargi
/ August 8, 2018
BTW, here’s another bug that MS support took five months to reproduce and acknowledge, but then said I’d need to pay for it to be looked at any further:
You can only have one user AOVPN connection with auto-connect enabled at a time per computer.
For example: log on as User 1 and create a user AOVPN connection; it works as expected and “Automatically Connect” is ticked. Log off.
Log on as User 2 and create a user AOVPN connection; it works as expected and “Automatically Connect” is ticked. Log off.
Log on as User 1: the user AOVPN does not connect and “Automatically Connect” is unticked. Tick it, the VPN connects as expected, log off.
Log on as User 2: the user AOVPN does not connect and “Automatically Connect” is unticked, and so on.
Granted, this generally isn’t an issue, as laptops are usually used by a single person.
bargi
/ August 8, 2018
As a final rant (I promise!)
Does anyone else find their RRAS servers full of ghost/orphaned connections?
I tried adjusting timeouts and other settings to get them to clear, but the only fix I found is to restart the RRAS service nightly.
Richard M. Hicks
/ August 8, 2018
Are you using IKEv2 or SSTP? Both? That’s not uncommon for IKEv2, really. If a client gets disconnected (loses network connectivity for any reason) the server will keep the SA alive for a period of time in case the client wants to reconnect. However, if the client establishes a completely new connection, then it doesn’t reuse the old ones and they’ll appear as orphaned. They should die out over time though. How long do they seem to hang around for?
bargi
/ August 1, 2018
Another workaround for IKEv2-exclusive setups is to enable PPTP, open TCP 1723 to the RRAS server, and use this for the health check.
So that PPTP can’t actually be used, lock it down by unticking “Remote Access connections (Inbound only)” and “Demand-dial routing connections (inbound and outbound)” in the PPTP properties under Ports.
To further secure it, configure your firewall to only allow connections from the Microsoft health check servers listed below.
https://azuretrafficmanagerdata.blob.core.windows.net/probes/azure/probe-ip-ranges.json
If your firewall doesn’t support updating automatically from that page, just sign up for a free webpage monitoring site and have it email you when the page changes.
The benefit is that the check directly reflects whether the RRAS service is up or down.
bargi
/ August 1, 2018
Correction to the above: you need at least “Remote Access connections (Inbound only)” enabled for RRAS to open the port.
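The allow-listing idea above can be sketched in Python with the standard ipaddress module. The sketch checks whether a probe’s source address falls inside the published ranges, and it diffs two copies of the list to show what changed. The JSON file’s exact schema is not assumed here, so the functions take plain CIDR strings. The sample prefixes are RFC 5737 documentation ranges, not real Traffic Manager addresses.

```python
import ipaddress


def load_prefixes(cidrs):
    """Turn CIDR strings into a set of ip_network objects (IPv4 and IPv6)."""
    return {ipaddress.ip_network(c.strip(), strict=False) for c in cidrs}


def is_allowed(source_ip, prefixes):
    """True if source_ip falls inside any allowed probe prefix."""
    ip = ipaddress.ip_address(source_ip)
    return any(ip in net for net in prefixes if net.version == ip.version)


def diff_prefixes(old, new):
    """Return (added, removed) prefixes so firewall rules can be updated."""
    added = sorted(new - old, key=str)
    removed = sorted(old - new, key=str)
    return added, removed


if __name__ == "__main__":
    # RFC 5737 documentation ranges, not real Traffic Manager probe addresses.
    old = load_prefixes(["192.0.2.0/24", "198.51.100.0/25"])
    new = load_prefixes(["192.0.2.0/24", "203.0.113.0/24"])
    print(is_allowed("192.0.2.10", old))  # True
    added, removed = diff_prefixes(old, new)
    print([str(n) for n in added], [str(n) for n in removed])
```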
Richard M. Hicks
/ August 2, 2018
Thanks for the tips, Raymond. Much appreciated! Definitely a good idea. 🙂
ND
/ November 12, 2018
As an alternative to Azure Traffic Manager, if we have, say, two AO VPN servers using IKEv2 in separate geographical locations, should it be possible to use a Citrix NetScaler for GSLB? I’ve looked at the Citrix documentation and the AO VPN documentation but can’t see anything definitive. I’ve seen that KEMP devices can perform this function, but nothing confirming whether Citrix can.
Thanks,
ND
Richard M. Hicks
/ November 17, 2018
Absolutely! I’ve used Azure Traffic Manager and Amazon Route 53 for DirectAccess and Always On VPN global load balancing, but I find the GSLB functionality included in most popular load balancers/ADCs (F5, NetScaler, KEMP, etc.) offers more features and better granularity/control. 🙂
Jeroen
/ November 27, 2018
First of all, thanks for your great blogs!
I’ve set up an Always On user VPN PoC for multiple countries on a single URL with Azure Traffic Manager, which works great. For now, each country has a local VPN and NPS server.
I’ve created nested geographic/priority profiles in Traffic Manager so that when a country goes down you are routed to another country. In the VPN profile you need to specify the NPS server. How can I make that highly available, or is there some way to remove the NPS server from the profile and add more RADIUS servers to the VPN servers’ configuration?
Richard M. Hicks
/ November 29, 2018
You have a couple of options. On the client-side you could add all of the NPS servers in the organization to the VPN profile. Alternatively you could simply disable NPS server validation which would allow the client to connect to any NPS server.
Jeroen
/ December 4, 2018
Great, that pointed me in the right direction. I now have failover between multiple countries with a single VPN profile combined with Azure Traffic Manager.
Merwin
/ May 14, 2019
Hi, do you know how you fixed the “ghost/orphaned connections” on the RAS server? I am also seeing them and am not sure how to fix them.
Thanks
Richard M. Hicks
/ May 14, 2019
They will eventually die off on their own. If you want to see them removed immediately you could just restart the RemoteAccess service. Keep in mind that will terminate all connections, but the Always On connections will reconnect automatically of course. If you want to be more selective you can use PowerShell and the Get-RemoteAccessConnectionStatistics command to identify and remove stale connections as well.
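The selective approach above can be sketched in Python. Given session records (for example, exported from Get-RemoteAccessConnectionStatistics), the sketch flags every session that is not the newest one for its user. The Session fields and the newest-wins rule are assumptions for illustration, not the cmdlet’s actual output format. Review the candidates before disconnecting anything.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Session:
    user: str               # hypothetical field: account or device name
    server: str             # hypothetical field: RRAS server holding the session
    connected_at: datetime  # hypothetical field: connection start time


def stale_sessions(sessions):
    """Return every session that is not the newest one for its user."""
    newest = {}
    for s in sessions:
        if s.user not in newest or s.connected_at > newest[s.user].connected_at:
            newest[s.user] = s
    return [s for s in sessions if newest[s.user] is not s]


if __name__ == "__main__":
    sessions = [
        Session("alice", "vpn1", datetime(2019, 5, 14, 8, 0)),
        Session("alice", "vpn1", datetime(2019, 5, 14, 9, 15)),  # reconnected
        Session("bob", "vpn1", datetime(2019, 5, 14, 8, 30)),
    ]
    for s in stale_sessions(sessions):
        print(s.user, s.server, s.connected_at.isoformat())
```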
David
/ June 24, 2019
Hi Richard,
Correct me if I am wrong, but the information regarding Azure Traffic Manager is out of date.
I believe it now supports “custom headers” and custom “Expected status codes”. You can have up to 8 pairs of custom headers and 8 sets of ranges for expected status codes.
Richard M. Hicks
/ June 24, 2019
You may be right. 🙂 Things change rapidly in the cloud which makes writing about them challenging! I typically try to preface things with something like “at the time of this writing” just to hedge. 😉 I’ll have a closer look though and update the post as required. Thanks for bringing this to my attention!
Richard M. Hicks
/ June 24, 2019
Update: I took another look at this and the guidance in this post still stands (as of June 24, 2019). Although Azure Traffic Manager does allow you to provide an “expected status code”, it only accepts 200-299 and 301. For Always On VPN and SSTP, the expected code is 401. So, until Microsoft adds support for 401 response codes when using HTTPS, we’re stuck with having to use TCP instead.
oderbang
/ November 5, 2020
Hi Richard, I have revisited Azure Traffic Manager and it does now support 401 (woohoo)! However, it seems that to get the correct 401 response from an SSTP VPN server, you need to set the path to “/sra_%7BBA195980-CD49-458b-9E23-C84EE0ADCD75%7D/”.
I pieced this information together from a load balancing setup guide.
I have tested this and it works great.
Richard M. Hicks
/ November 6, 2020
That’s great to hear, thanks for the tip! The last time I checked (a few months ago) it still didn’t support 401. I will do some testing and update my guidance for sure. Thanks again!
oderbang
/ February 5, 2021
Hi Richard,
Just a follow up to let you know we have been running this now for the last 3 months and it has worked flawlessly (using 401 error to detect service health).
It is worth noting that this only “truly” works if you are using SSTP exclusively.
We introduced IKEv2 with SSTP backup and, of course, users with less than reliable internet/Wi-Fi experienced session affinity issues as Traffic Manager load balanced the requests. We have moved to a priority-based Traffic Manager profile so that we have fault tolerance but no “load balancing”… unless you know of a way for multiple RRAS IKEv2 servers to “share” session data? I think that is our best solution without buying an actual Layer 7 load balancer!
One other thing to note for others wanting to do this: if you DO use multiple protocols, Traffic Manager is only able to check the health of SSTP, as it’s the only protocol that operates on a port Traffic Manager can check (443). Theoretically, this means that if IKEv2 fails but SSTP is still up, the service endpoint will still be marked as healthy.
To get around this, we configured the clients to use IKEv2 with SSTP as backup (aka VpnStrategy=14), so even if the IKEv2 connection fails, the VPN is still usable.
Hope this information helps someone!
Richard M. Hicks
/ February 5, 2021
Thanks for the insight! 🙂
James Hawksworth
/ June 10, 2021
I don’t appear to have an option to choose 401 either. I’m intrigued how oderbang has this option, but WordPress doesn’t let me reply to messages three levels down… D’oh! Anything at your end, Rich?
Richard M. Hicks
/ June 10, 2021
Unfortunately, 401 HTTP response codes aren’t accepted as valid when using Azure Traffic Manager. I’d heard rumors that Microsoft was going to add it, but it’s not there today. :/
James Hawksworth
/ June 11, 2021
I got this working with 401 afterwards – The ATM config expects a range, so entering “401-401” works using oderbang’s path to check SSTP status 🙂
Richard M. Hicks
/ June 11, 2021
That’s interesting! Can you send me a screenshot of your configuration? I’ll definitely do some testing with this soon.
oderbang
/ June 15, 2021
I can send a screenshot. Let me know where to send it and I’ll get it to you.
Richard M. Hicks
/ June 15, 2021
No worries, James already sent me one. Thanks!
Vojin
/ November 12, 2019
Isn’t RRAS unsupported in Azure IaaS?
Richard M. Hicks
/ November 13, 2019
Correct. Windows Server Routing and Remote Access Service (RRAS) is not a formally supported workload running on Windows Server in Azure. However, RRAS does work well in Azure and I’ve deployed it numerous times. 🙂
victor bassey
/ January 4, 2020
Hello Richard, have you ever tried load balancing IKEv2 UDP ports 500 & 4500 using Azure Traffic Manager? It seems ATM can only be used to load balance TCP-based protocols, e.g. SSTP, and not UDP ones?
Richard M. Hicks
/ January 4, 2020
Yes, I do it all the time. 🙂 To be clear though, you don’t actually load balance UDP traffic with Azure Traffic Manager. Azure Traffic Manager is nothing more than intelligent DNS. The client resolves the VPN FQDN to Azure Traffic Manager, and Traffic Manager returns an IP address based on your configuration. What you might be referring to is the workload monitoring. Azure Traffic Manager does not support UDP health checks, only HTTP, HTTPS, and TCP. So, when using Azure Traffic Manager you have to use one of those (for example, TCP 443) to monitor the endpoint.
Hope that helps!
victor bassey
/ January 4, 2020
Thanks Richard. So, by just configuring an ATM profile, say vpn1.trafficmanager.net, with ICMP or TCP 443 for probing,
once I add the endpoints, ATM will pass traffic for any port and just use ICMP or TCP 443 for monitoring?
Richard M. Hicks
/ January 6, 2020
Correct. And again, technically speaking, Azure Traffic Manager doesn’t pass any traffic. It’s just DNS. vpn1.trafficmanager.net will return an IP address, but that address might be different depending on how you configure it and which resources are available at the time. 🙂
victor bassey
/ January 6, 2020
Got it Richard. Thanks
victor bassey
/ January 5, 2020
Hey Richard, Spot on! thanks for the clarification. I was looking at ATM as a proper Load-Balancing appliance doing some sort of port forwarding rather than just an intelligent DNS.
victor bassey
/ January 5, 2020
Spot on Richard! Thanks for the clarification. I was looking at ATM as a proper port-forwarding load balancer rather than an intelligent DNS like you correctly stated.
Jess
/ February 17, 2020
Hi Richard – once we do this do we simply put in the DNS name we get from traffic manager in the VPN connection name/address?
Richard M. Hicks
/ February 17, 2020
No. You will have to create a CNAME DNS record that maps to the Azure Traffic Manager FQDN. You’ll do this because the FQDN used by your clients must match the subject name on either the TLS or IKEv2 IPsec certificates installed on the VPN server.
Geoff
/ March 30, 2020
Hey Richard, I think your reply here is the root of the issue I am having getting Traffic Manager to work with an Azure VPN Gateway based Always On VPN configuration. With the Azure VPN Gateway point-to-site configuration, it automatically generates a hostname/certificate for the connection for the azuregateway-blah-blah-blah.vpn.azure.com DNS name.
If I use that DNS name, the connection works fine. If I use my Traffic Manager CNAME record, IKE auth fails because the cert doesn’t match. Do you know how I can fix this with Azure VPN Gateways? I can’t seem to find a way to add a SAN for my CNAME record to the auto-generated Azure Gateway server certificate.
Richard M. Hicks
/ March 31, 2020
Right. This is a limitation of the Azure VPN gateway. You do not have the option to install your own certificate. With that, you have to use the subject name they use, which means if you have more than one gateway you can’t use geographic load balancing. Perhaps one day it will be possible to upload our own certificate, but right now it is not an option.
Paddy Berger
/ March 12, 2020
Hi Richard,
How would this work if your servers are located on your internal domain and not in Azure? We have two sites with VPN and NPS servers. Currently, we have only tested with one site, working as a single-site VPN solution. However, we want to add redundancy and have added a secondary VPN server in site B.
All using IKEv2 for device and user tunnels.
Thanks
Richard M. Hicks
/ March 12, 2020
No different, really. You would configure Azure Traffic Manager to use external endpoints and provide the IP address of each location in the configuration.
Paddy Berger
/ March 12, 2020
Hi Richard, also to add to the above: I have tried this for the one site to see if I can connect, and I get a failure saying IKE authentication credentials are unacceptable. What I did was point Traffic Manager to the current VPN server via its external hostname (the one currently in the VPN client address), create a new hostname pointing to Traffic Manager, add this hostname to the VPN server certificate SAN, and add this new hostname to the VPN client connection. I get that error. Any ideas? Have I followed this correctly?
Richard M. Hicks
/ March 12, 2020
As long as whatever name you use to resolve to the Azure Traffic Manager profile is also the subject name of the VPN server certificate (or one of the entries on the SAN list) it should work.
Paddy Berger
/ March 13, 2020
Perfect, worked like a charm. I had to delete the existing cert and as soon as I did that, the connection started working.
Paddy Berger
/ April 6, 2020
Hi Richard, I have been working on this and have hit two problems.
Problem 1: Initially I could connect using the DNS name pointing to Traffic Manager, but now when I try to connect I get “element not found” or “name of the remote access server did not resolve”. I then added the Traffic Manager DNS name to the SAN, and that seems to connect. Once connected, the original VPN connection, which uses the DNS name pointing to Traffic Manager, also connects. It seems like a DNS issue, but I cannot see where to resolve it. Also, I recall you said the Traffic Manager DNS name should not be used as the connection name due to TLS.
Problem 2: (connecting with the Traffic Manager name)
If Traffic Manager goes into a degraded state, the connection still connects and doesn’t automatically disconnect. I followed your guide by adding a simple website on the VPN server, and Traffic Manager sees the HTTPS server. I then ran iisreset /stop and can see Traffic Manager become degraded, yet the connection is still live and I am able to disconnect the VPN and reconnect without any problems.
Currently only have one VPN server in traffic manager as still working on the other site, but wanted one side working.
Paddy Berger
/ April 6, 2020
Just to update problem 2: I know AOVPN will work even if IIS is stopped. What I wanted to know was, if the service is degraded, should it not stop current or future connections whilst in this state?
Richard M. Hicks
/ April 6, 2020
In my experience Azure Traffic Manager would not return DNS records for entry points marked as degraded.
Richard M. Hicks
/ April 6, 2020
Not sure what’s up with DNS there, but the only way this works is if your VPN deployment public FQDN is an alias (CNAME) for the Azure Traffic Manager FQDN. If you try to use the Azure Traffic Manager FQDN you will get subject name mismatch errors and SSTP and IKEv2 connections will fail. You should definitely not need to add the Azure Traffic Manager FQDN to the SAN list on your certificate. As for Azure Traffic Manager not failing over, make sure the TTL on your DNS CNAME record is set low (60 seconds).
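Keeping the TTL low matters because failover time builds up from several settings. Here is a rough Python estimate that rests on several assumptions. It adds the time needed to accumulate enough failed probes, one probe timeout, and the DNS TTL during which clients may still use a cached answer. Real timing depends on where in the probe cycle the failure happens and on resolver caching. A connected VPN client also does not re-resolve the name until it reconnects.

```python
def worst_case_failover_seconds(probe_interval, tolerated_failures, probe_timeout, dns_ttl):
    """Rough upper bound on time before new lookups stop returning a failed endpoint."""
    # Enough consecutive failed probes to exceed the tolerated count...
    detection = probe_interval * (tolerated_failures + 1) + probe_timeout
    # ...plus the time a client or resolver may keep using a cached answer.
    return detection + dns_ttl


if __name__ == "__main__":
    # Example values only: 30 s interval, 3 tolerated failures, 10 s timeout, 60 s TTL.
    print(worst_case_failover_seconds(30, 3, 10, 60))  # 190
```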
Paddy Berger
/ April 6, 2020
The weird thing is, I added the Azure Traffic Manager FQDN to the SAN and tried to connect with that name, and it worked. Hence I wanted to ask you whether this should actually work or not.
Richard M. Hicks
/ April 6, 2020
I guess if it was for your IKEv2 IPsec certificate issued by your internal CA it would work, or if you are using SSTP with a certificate issued by a private PKI (not recommended). In those cases, yes, I guess it would work. However, it shouldn’t be necessary. If it is working for you then I’d just accept it and move on. 🙂
Paddy Berger
/ April 11, 2020
Hi Richard, I got to the bottom of the DNS issue. I am now at a stage where Traffic Manager points to the two sites, i.e. site1.co.uk and site2.co.uk. However, when testing by degrading site1, I noticed the VPN does not connect, because the VPN connection itself points to the site1 NPS server, whereas it is looking for the site2 NPS server. What is the best method to get past this?
Would you load balance the NPS servers, or add the site1 and site2 VPN servers as RADIUS clients on each other’s NPS servers too?
Thanks
Richard M. Hicks
/ April 11, 2020
The easiest way is to configure the VPN server in each site to use its local NPS server as primary and the remote site’s NPS server as secondary.
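Here is a minimal Python sketch of the local-primary, remote-secondary ordering described above. It assumes that, as with the RRAS RADIUS server score mentioned later in this thread, the server with the higher score is tried first. The server names and scores are hypothetical, and reachability is passed in rather than tested over the network. For this to work, each NPS server must also have the other site’s VPN server configured as a RADIUS client.

```python
def radius_order(servers):
    """servers: list of (name, score) pairs; return names, highest score first."""
    return [name for name, _ in sorted(servers, key=lambda s: s[1], reverse=True)]


def pick_radius(servers, reachable):
    """Return the first server in score order that is reachable, else None."""
    for name in radius_order(servers):
        if name in reachable:
            return name
    return None


if __name__ == "__main__":
    # Hypothetical names; site 1's VPN server prefers its local NPS server.
    site1_vpn = [("nps-site1", 30), ("nps-site2", 20)]
    print(pick_radius(site1_vpn, {"nps-site1", "nps-site2"}))  # nps-site1
    print(pick_radius(site1_vpn, {"nps-site2"}))               # nps-site2
```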
Paddy Berger
/ April 14, 2020
Hi, I added the secondary NPS server on the VPN server under RADIUS Authentication with a score of 20. I then added this as a RADIUS client on the NPS server and re-ran the “Configure VPN or Dial-Up” wizard so that both servers are listed. When testing, the connection still goes through the local NPS server, and through the remote NPS server it will not connect, saying “The connection was prevented because of a policy configured on your RAS/VPN”.
As a test, I removed the local NPS server so that only the remote NPS server is listed under RADIUS Authentication, and that works. However, when both are listed, it does not work.
Richard M. Hicks
/ April 14, 2020
When it fails, what does the event log on the NPS server say? It should give you a reason why the connection was rejected.
Matthew
/ April 14, 2020
Hi Richard,
Thank you for the incredibly useful documentation. One query that has been raised by our organisation is what happens in the following scenario.
On an IKEv2 VPN (device tunnel), the client’s DNS query goes to Azure Traffic Manager, which finds the appropriate server to connect to. I’m aware from the Azure documentation that TCP and UDP protocols are treated slightly differently. In this scenario, how is the connection maintained so the client doesn’t end up looking up the address again and connecting to another server if the connection times out? In that case, the original server never closes the connection and users may experience other issues.
I believed once the connection is established it wouldn’t poll DNS again (although that’s not to say the response wouldn’t change), because the VPN is connected directly to the VPN server. The counterargument to this is that although it’s connected to the server the Traffic Manager CNAME record is still the ‘access’ point so therefore Traffic Manager could effectively give a different response and potentially what effect that would have on the VPN connection I’m not sure.
Many thanks,
Matthew
Richard M. Hicks
/ April 14, 2020
Azure Traffic Manager is just a DNS server, so when it returns an IP address to the client it is out of the picture. What happens after that is between the client and the VPN server. Once the connection is established the client will not look up the VPN FQDN until it has to re-establish the connection. You’re right, if the connection is interrupted there’s a possibility that Azure Traffic Manager will return a different IP address and the orphaned connection on the original server will remain for a period of time. However, as long as you have enough ports and available IP address space this shouldn’t be an issue. Those sessions will eventually expire anyway.
oderbang
/ February 5, 2021
Hi Richard,
I had this exact scenario. It’s important to understand how IKEv2 works to fully understand the issue. When you initially establish a connection, a pair of cryptographic keys is generated and shared between the client and server. This is what is used to secure the IP tunnel.
When a network interruption severs the connection, your client will automatically re-establish the tunnel using the same key pairs generated above.
This partially goes wrong when you have multiple VPN servers.
Let’s say a client attempts to re-establish a connection, only to be given the IP address of a second server. This is not the end of the world, as the server sees it has no active connection for that client and simply asks you to re-authenticate and negotiates a new key pair. This just leaves server1 (as Richard points out) with an orphaned connection that will eventually expire.
The REAL problem occurs when the client connection is interrupted a second time. The client again attempts to re-establish the tunnel, only this time it is directed back to server1. Server1 still has an active connection for that client, but THIS time the key pairs no longer match (as the client now has key pairs from server2).
This causes an error and the client cannot re-establish the connection. Orphaned connections are left on both server1 and server2, effectively blocking the client from connecting until either an admin forces the connection to disconnect or the session expires.
To counter this you can either:
1) Increase the number of VPN servers you have, which reduces the chance of the client “bumping” into an orphaned connection. This is expensive and probably not great, as it also complicates diagnosing issues, and unless you have thousands of VPN clients it’s not cost-effective.
or 2) Set your Traffic Manager profile to route requests using “priority”. This means ALL connections go to a single server, but you still have failover if your primary server fails.
Richard, do you know of a way to share IKE key pairs between servers using RRAS to maintain session affinity?
Richard M. Hicks
/ February 5, 2021
RRAS servers definitely don’t share session state. There’s no option available to do this, unfortunately.
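The reconnect failure described by oderbang can be reproduced with a deliberately simplified toy model in Python. It does not reflect real IKEv2 behavior. Each hypothetical server simply remembers one "key id" per client, and a reconnect succeeds only if the server has no entry for that client or the ids match. Under this model, alternating DNS answers between two servers fail on the third connection, while always returning the same server (as with Priority routing) does not.

```python
class Server:
    def __init__(self, name):
        self.name = name
        self.sessions = {}  # client name -> key id; stays behind if orphaned


class Client:
    def __init__(self, name):
        self.name = name
        self.key_id = None
        self._negotiations = 0

    def connect(self, server):
        """Return True if the (re)connection to this server succeeds."""
        known = server.sessions.get(self.name)
        if known is None:
            # No state for this client: full negotiation with new keys.
            self._negotiations += 1
            self.key_id = f"{server.name}-{self._negotiations}"
            server.sessions[self.name] = self.key_id
            return True
        # The server still holds a session: resuming works only if keys match.
        return known == self.key_id


if __name__ == "__main__":
    s1, s2 = Server("server1"), Server("server2")
    laptop = Client("laptop")
    # DNS answers alternate between sites (e.g. Weighted routing).
    print([laptop.connect(s) for s in (s1, s2, s1)])  # [True, True, False]
    primary = Server("primary")
    laptop2 = Client("laptop2")
    # Priority routing keeps returning the same healthy server.
    print([laptop2.connect(primary) for _ in range(3)])  # [True, True, True]
```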
Paddy Berger
/ April 15, 2020
Thanks Richard for all your input. I managed to work out why it didn’t work. I had to reboot the NPS server first and then the VPN server for both sites, in that order, and all seemed to kick in. Tested via Traffic Manager and all load balancing and failing over correctly.
Matt Heath
/ April 22, 2020
Hi Richard,
We use Azure Traffic Manager and have two sites about 30 miles apart from each other, using the same Internet provider. The issue we are having is that device and user tunnels are not terminating at the same site, which is causing some routing issues.
We are trying to load balance between both sites, but because of this we are limited in the Azure balancing settings that can be used. In particular, the performance balancing setting cannot be used due to the sites being so close together and using the same Internet provider.
My question is: is there a way for a user’s device tunnel and user tunnel to terminate on the same site, or is there a way to get the device tunnel to disconnect when the user tunnel connects?
We are using Windows 10 version 1809 and seem to find that the device tunnel stays up even when the user tunnel is connected. When we previously had Windows 10 version 1709, we would find that the device tunnel disconnected when the user tunnel came up, and therefore this didn’t cause an issue.
Richard M. Hicks
/ April 22, 2020
To begin, you should not have any routing issues if the device tunnel and user tunnel terminate in different locations. That said, there’s no way to absolutely guarantee that both tunnels will connect to the same entry point in this scenario. If Azure Traffic Manager returns different IP addresses they will most definitely connect to different sites.
Also, the device tunnel staying up while the user tunnel is connected is expected and by design. If the device tunnel was disconnecting after the user tunnel came up, that was a bug. 🙂
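To put a number on this: if both tunnels resolve the name independently under Weighted routing, the chance that they land on the same site is the sum of each site’s squared traffic share. The short Python sketch below assumes independent DNS answers and no shared resolver caching. Caching within the TTL would make matching sites more likely.

```python
from fractions import Fraction


def same_site_probability(weights):
    """P(two independent Weighted answers pick the same site) = sum of squared shares."""
    total = sum(weights)
    return sum(Fraction(w, total) ** 2 for w in weights)


if __name__ == "__main__":
    print(same_site_probability([50, 50]))  # 1/2
    print(same_site_probability([75, 25]))  # 5/8
```

With Priority routing, the probability is 1 while the primary is healthy.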
stefan2904
/ December 8, 2021
We are currently in the design phase of Always On VPN to replace our DirectAccess environment. We want to make use of the Azure Traffic Manager to route users to the AlwaysOn servers in their own region (3 regions total). As we will have multiple AlwaysOn servers per region, we also want to load balance the traffic amongst them. Per region we have a load balancer (A10) in place.
As I see it we have 2 options:
– Configure each individual AlwaysOn server in Azure Traffic Manager (each node will have its own external IP). This way we don’t have to use our own load balancers and the setup will be easier.
– Configure the 3 load balancers in Azure Traffic Manager (each load balancer will have an external IP). This setup is more complex.
What is in your opinion the best option?
Richard M. Hicks
/ December 10, 2021
I’m not sure there’s a ‘best’ option here, really. Both are equally viable options, so it’s just your preference at this point. Either one should work well. Eliminating the load balancer would make things simpler, which is always easier to support in the long run. 🙂