Always On VPN Multisite with Azure Traffic Manager

Eliminating single points of failure is crucial to ensuring the highest levels of availability for any remote access solution. For Windows 10 Always On VPN deployments, the Windows Server 2016 Routing and Remote Access Service (RRAS) and Network Policy Server (NPS) servers can be load balanced to provide redundancy and high availability within a single datacenter. Additional RRAS and NPS servers can be deployed in another datacenter or in Azure to provide geographic redundancy if one datacenter is unavailable, or to provide access to VPN servers based on the location of the client.

Multisite Always On VPN

Unlike DirectAccess, Windows 10 Always On VPN does not natively support multisite. However, multisite geographic redundancy can be implemented using Azure Traffic Manager.

Azure Traffic Manager

Traffic Manager is part of Microsoft’s Azure public cloud solution. It provides Global Server Load Balancing (GSLB) functionality by resolving DNS queries for the VPN public hostname to an IP address of the most optimal VPN server.

Advantages and Disadvantages

Using Azure Traffic Manager has some benefits, but it is not without some drawbacks.

Advantages – Azure Traffic Manager is easy to configure and use. It requires no proprietary hardware to procure, manage, and support.

Disadvantages – Azure Traffic Manager offers only limited health check options. Azure Traffic Manager’s HTTPS health check only accepts HTTP 200 OK responses as valid. Most TLS-based VPNs will respond with an HTTP 401 Unauthorized, which Azure Traffic Manager considers “degraded”. The only option for endpoint monitoring is a simple TCP connection to port 443, which is a less accurate indicator of endpoint availability.

Note: This scenario assumes that RRAS with Secure Socket Tunneling Protocol (SSTP) or another third-party TLS-based VPN server is in use. If IKEv2 is to be supported exclusively, it will still be necessary to publish an HTTP or HTTPS-based service for Azure Traffic Manager to monitor site availability.
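The HTTPS health check behavior described above can be illustrated with a minimal sketch. The function name `probe_status` and the `expected_ranges` parameter are hypothetical names chosen for illustration and are not part of any Azure API; the sketch only models the rule that a response counts as healthy when its status code is in the accepted set, which by default here is HTTP 200 only.

```python
def probe_status(status_code, expected_ranges=((200, 200),)):
    """Classify a health probe response.

    expected_ranges is a hypothetical tuple of (low, high) status code
    ranges that count as healthy. The default models a probe that
    accepts only HTTP 200 OK.
    """
    for low, high in expected_ranges:
        if low <= status_code <= high:
            return "Online"
    return "Degraded"


# A web server answering 200 is healthy; an SSTP listener answering 401 is not.
print(probe_status(200))
print(probe_status(401))
```

Because an SSTP endpoint answers with 401, it would always be classified as degraded under this rule, which is why the TCP probe is used instead.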

Traffic Routing Methods

Azure Traffic Manager provides four different methods for routing traffic.

Priority – Select this option to provide active/passive failover. A primary VPN server is defined to which all traffic is routed. If the primary server is unavailable, traffic will be routed to another backup server.

Weighted – Select this option to provide active/active failover. Traffic is routed to all VPN servers equally, or unequally if desired. The administrator defines the percentage of traffic routed to each server.

Performance – Select this option to route traffic to the VPN server with the lowest latency. This ensures VPN clients connect to the server that responds the quickest.

Geographic – Select this option to route traffic to a VPN server based on the VPN client’s physical location.
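To make the differences between the four routing methods concrete, here is a simplified simulation. It is only a sketch: the `Endpoint` class, its fields, and the `route_*` functions are hypothetical names invented for this example, and the real service makes these decisions from DNS query data rather than from values like `latency_ms` supplied by hand.

```python
import random
from dataclasses import dataclass


@dataclass
class Endpoint:
    name: str
    healthy: bool = True
    priority: int = 1         # lower value is preferred (Priority)
    weight: int = 1           # relative share of traffic (Weighted)
    latency_ms: float = 0.0   # assumed latency to the client (Performance)
    region: str = ""          # geography served (Geographic)


def available(endpoints):
    """Only endpoints that pass health checks are ever returned."""
    return [e for e in endpoints if e.healthy]


def route_priority(endpoints):
    """Active/passive: pick the healthy endpoint with the lowest priority value."""
    candidates = available(endpoints)
    return min(candidates, key=lambda e: e.priority) if candidates else None


def route_weighted(endpoints, rng=random):
    """Active/active: pick a healthy endpoint at random, in proportion to weight."""
    candidates = available(endpoints)
    if not candidates:
        return None
    return rng.choices(candidates, weights=[e.weight for e in candidates], k=1)[0]


def route_performance(endpoints):
    """Pick the healthy endpoint with the lowest latency."""
    candidates = available(endpoints)
    return min(candidates, key=lambda e: e.latency_ms) if candidates else None


def route_geographic(endpoints, client_region):
    """Pick the healthy endpoint mapped to the client's region, if any."""
    for e in available(endpoints):
        if e.region == client_region:
            return e
    return None
```

In every case an unhealthy endpoint is excluded before the selection is made, which is what provides failover regardless of the routing method chosen.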

Configure Azure Traffic Manager

Open the Azure management portal and follow the steps below to configure Azure Traffic Manager for multisite Windows 10 Always On VPN.

Create a Traffic Manager Resource

  1. Click Create a resource.
  2. Click Networking.
  3. Click Traffic Manager profile.

Create a Traffic Manager Profile

  1. Enter a unique name for the Traffic Manager profile.
  2. Select an appropriate routing method (described above).
  3. Select a subscription.
  4. Create or select a resource group.
  5. Select a resource group location.
  6. Click Create.


Important Note: The name of the Traffic Manager profile cannot be used by VPN clients to connect to the VPN server, since a TLS certificate cannot be obtained for the trafficmanager.net domain. Instead, create a CNAME DNS record that points to the Traffic Manager FQDN and ensure that name matches the subject or a Subject Alternative Name (SAN) entry on the VPN server’s TLS and/or IKEv2 certificates.
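The name-matching requirement in the note above can be sketched as a small check. This is an illustration only, not how Windows validates certificates: `hostname_matches` is a hypothetical helper, and the hostnames `vpn.example.net` and `aovpn.trafficmanager.net` are made-up examples. It assumes the usual rule that a wildcard entry covers exactly one leftmost label.

```python
def hostname_matches(hostname, cert_names):
    """Return True if hostname matches the subject or any SAN entry.

    cert_names is a list of DNS names taken from the certificate
    (subject common name plus SAN entries). A leading "*." wildcard
    is assumed to match exactly one leftmost label.
    """
    host = hostname.lower().rstrip(".")
    for name in cert_names:
        pattern = name.lower().rstrip(".")
        if pattern == host:
            return True
        if pattern.startswith("*."):
            suffix = pattern[1:]  # e.g. ".example.net"
            if host.endswith(suffix):
                label = host[: -len(suffix)]
                if label and "." not in label:
                    return True
    return False


# The CNAME that clients connect to is on the certificate, so it matches.
print(hostname_matches("vpn.example.net", ["vpn.example.net"]))
# The Traffic Manager FQDN is not on the certificate, so it does not.
print(hostname_matches("aovpn.trafficmanager.net", ["vpn.example.net"]))
```

The client compares the name it was configured with, not the name the CNAME ultimately resolves to, which is why the alias works and the Traffic Manager FQDN does not.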

Endpoint Monitoring

Open the newly created Traffic Manager profile and perform the following tasks to enable endpoint monitoring.

  1. Click Configuration.
  2. Select TCP from the Protocol drop-down list.
  3. Enter 443 in the Port field.
  4. Update any additional settings, such as DNS TTL, probing interval, tolerated number of failures, and probe timeout, as required.
  5. Click Save.
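The monitoring settings above can be approximated with a short sketch of a TCP probe plus a failure counter. The names `tcp_probe` and `EndpointMonitor` are hypothetical, and the rule used here (an endpoint is marked degraded once consecutive failures exceed the tolerated number) is an assumption made for illustration rather than a statement of how the service is implemented.

```python
import socket


def tcp_probe(host, port=443, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


class EndpointMonitor:
    """Track consecutive probe failures for a single endpoint."""

    def __init__(self, tolerated_failures=3):
        self.tolerated_failures = tolerated_failures
        self.consecutive_failures = 0

    @property
    def status(self):
        if self.consecutive_failures > self.tolerated_failures:
            return "Degraded"
        return "Online"

    def record(self, probe_ok):
        """Record one probe result and return the resulting status."""
        if probe_ok:
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
        return self.status
```

A caller would run `monitor.record(tcp_probe("vpn.example.net"))` once per probing interval, where `vpn.example.net` is a placeholder hostname. Note that a successful TCP handshake only shows that something is listening on the port, which is why this check is a weaker indicator than an application-level probe.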


Endpoint Configuration

Follow the steps below to add VPN endpoints to the Traffic Manager profile.

  1. Click Endpoints.
  2. Click Add.
  3. Select External Endpoint from the Type drop-down list.
  4. Enter a descriptive name for the endpoint.
  5. Enter the Fully Qualified Domain Name (FQDN) or the IP address of the first VPN server.
  6. Select a geography from the Location drop-down list.
  7. Click OK.
  8. Repeat the steps above for any additional datacenters where VPN servers are deployed.


Summary

Implementing multisite by placing VPN servers in multiple physical locations will ensure that VPN connections can be established successfully even when an entire datacenter is offline. In addition, active/active scenarios can be implemented, where VPN client connections can be routed to the most optimal datacenter based on a variety of parameters, including current server load or the client’s current location.

Additional Information

Windows 10 Always On VPN Hands-On Training Classes

 


65 Comments

  1. Nori

     /  July 30, 2018

    Since RRAS is not supported on VMs in Azure, would you recommend using Azure P2S VPN as an endpoint for Always On VPN?

  2. Eric Yew

     /  July 31, 2018

    I find that using both device tunnel and user tunnel with Traffic Manager (or any DNS-based load balancer) does not work well. If a client connects to VPN server 1 with the device tunnel and then connects with the user tunnel to VPN server 2, one of the VPN servers will have corrupted routing, and the only way to fix it is to reboot. This has happened to 2 of our customers so far. Have you experienced this yourself?

    • Not encountered that myself, but good to know. I’m not particularly enamored with the device tunnel at this point, so I try to avoid its use as much as possible. 🙂

      • Nori

         /  August 2, 2018

        Interesting. Could you elaborate further why you don’t like the device tunnel? I haven’t deployed it, but I was “Microsoft super-excited” when I heard about it.

      • The device tunnel is not authenticated as well as I would like it to be. It requires only a machine certificate, unlike DirectAccess where a certificate and a computer account in Active Directory are required. In addition, there are still some reliability issues with the device tunnel that are frustrating. :/

      • Eric Yew

         /  August 7, 2018

        Same here! unfortunately, one of our customer needs it as they have a password reset tool on their login screen which was accessible via direct access and they wanted like for like functionality. So they have decided to go with only device tunnel for now.

      • bargi

         /  August 8, 2018

        Interested to hear your experience with device tunnel and AOVPN in general
        .
        We started with the Device tunnel at the start of the year with 1709 and then noticed the tunnel dropping after login and svchost.exe_RasMan crashing.(no User tunnel configured, just device)
        At the time MS said it “should” be fixed with 1803 couldn’t confirm it actually would. Not started rolling out 1803 yet with all the issues there’s been with the build.

        As a work around we swapped to User tunnel and User certs which we thought was working fine. But then noticed a worrying number of users who couldn’t connect with Windows saying there was no valid certificate. Everyone with the problem actually had a valid certificate, for some reason Window or rasdialer couldn’t see it but certman can. Deleting the cert and running GPUpdate to pull down a new one fixes the problem immediately. I compared both the old and new cert and no differences other than the obvious.(thumbprint etc)

        So as even a hackier work around we’re rolling out a User VPN but authenticating with Computer Certificate as we’ve yet to see any issues with the Computer certificate.

        Agree using Computer certificate authentication is not the best as it completely bypasses NPS/RADIUS servers and any policies.

      • 1803 is much improved in that regard. I’ve not experienced the issue with the device tunnel dropping when the user tunnel is established. However, I’m still hearing reports (and experiencing myself) of overall tunnel instability. Most commonly the device tunnel/user tunnel aren’t re-establishing automatically after a network status change (e.g. moving from one network to another or coming out of sleep mode). Hoping that Microsoft is resolving some of these in vNext for sure. 🙂

      • bargi

         /  August 8, 2018

        btw here’s another bug that MS support took 5 months to reproduce and acknowledge but then said I’d need to pay for it to be looked at any further
        You can only have 1 User AOVPN with Auto connect enabled at one time per Computer.
        eg: Log on as User 1, create a User AOVPN connection, it works as expected and “Automatically Connect” is ticked, log off.
        Log on as User 2, create a User AOVPN connection, it works as expected and “Automatically Connect” is ticked, log off.
        Log on as User 1 and User AOVPN does not connect and “Automatically Connect” is unticked. Tick it, VPN connects as expected, log off.
        Log on as User 2 and User AOVPN does not connect and “Automatically Connect” is unticked. etc, etc

        Granted, this generally isn’t an issue, as the majority of the time laptops are used by a single person.

      • bargi

         /  August 8, 2018

        As a final rant (I promise!)
        Does anyone else find their RRAS servers full of ghost/orphaned connections?
        Tried adjusting timeouts and other settings to get them to clear, but the only thing I found that works is to restart the RRAS service nightly.

      • Are you using IKEv2 or SSTP? Both? That’s not uncommon for IKEv2, really. If a client gets disconnected (loses network connectivity for any reason) the server will keep the SA alive for a period of time in case the client wants to reconnect. However, if the client establishes a completely new connection, then it doesn’t reuse the old ones and they’ll appear as orphaned. They should die out over time though. How long do they seem to hang around for?

  3. bargi

     /  August 1, 2018

    Another work around for IKE exclusive setups is to enable PPTP and open TCP 1723 to the RRAS server and use this for the Healthcheck.

    So that PPTP can’t actually be used, lock it down by unticking “Remote Access connections (Inbound only)” and “Demand-dial routing connections (inbound and outbound)” in the PPTP properties under Ports.

    To further secure it, configure your firewall to only allow connections from the MS health check servers defined below.
    https://azuretrafficmanagerdata.blob.core.windows.net/probes/azure/probe-ip-ranges.json
    If your firewall doesn’t support updating automatically from that page, just sign up to a free webpage monitoring site and have it email you when it changes.

    Benefit is the check is directly related to the RRAS service being up/down.

    • bargi

       /  August 1, 2018

      correction to above, you need at least “Remote Access connections (Inbound only)” enabled for RRAS to open the port.

    • Thanks for the tips, Raymond. Much appreciated! Definitely a good idea. 🙂

  4. As an alternative to Azure Traffic Manager, should it be possible if we have say 2 AO VPN servers using IKEv2 in separate geographical locations to use a Citrix Netscaler for GSLB? I’ve had a look at the Citrix documentation, AO VPN documentation but can’t see anything definitive. I’ve seen that Kemp devices can perform this function, but nothing confirming if Citrix can?

    Thanks,
    ND

    • Absolutely! I’ve used Azure Traffic Manager and Amazon Route 53 for DirectAccess and Always On VPN global load balancing, but I find the GSLB functionality included in most popular load balancers/ADCs (F5, NetScaler, KEMP, etc.) offer more features and better granularity/control. 🙂

  5. First of all thanks for your great blogs!
    I’ve setup an Always On User VPN PoC for multiple countries on a single URL with Azure traffic manager which works great. For now each country has a local VPN and NPS server.
    I’ve created nested geo/priority profiles in traffic manager so when a country goes down you will be routed to another country. In the VPN profile you need to specify the NPS server. how can i make that high available or is there some way to remove the NPS server from the profile and add more radius servers to the VPN servers config?

    • You have a couple of options. On the client-side you could add all of the NPS servers in the organization to the VPN profile. Alternatively you could simply disable NPS server validation which would allow the client to connect to any NPS server.

      • Jeroen

         /  December 4, 2018

        Great that helped me in the right direction. Now have failover between multiple countries with a single VPN profile combined with Azure Traffic Manager.

  6. Merwin

     /  May 14, 2019

    Hi, do you know how to fix the “ghost/orphaned connections” on the RAS server? I am also seeing them and am not sure how to fix them.

    thanks

    • They will eventually die off on their own. If you want to see them removed immediately you could just restart the RemoteAccess service. Keep in mind that will terminate all connections, but the Always On connections will reconnect automatically of course. If you want to be more selective you can use PowerShell and the Get-RemoteAccessConnectionStatistics command to identify and remove stale connections as well.

  7. David

     /  June 24, 2019

    Hi Richard,
    Correct me if i am wrong, but the information regarding The Azure Traffic Manager is out of date,
    I believe it now supports “custom headers” and custom “Expected status codes” you can have up to 8 pairs of custom headers and 8 sets of ranges for expected status codes.

    • You may be right. 🙂 Things change rapidly in the cloud which makes writing about them challenging! I typically try to preface things with something like “at the time of this writing” just to hedge. 😉 I’ll have a closer look though and update the post as required. Thanks for bringing this to my attention!

    • Update: I took another look at this and the guidance in this post still stands (as of June 24, 2019). Although Azure Traffic Manager does allow you to provide an “expected status code”, it only accepts 200-299 and 301. For Always On VPN and SSTP, the expected code is 401. So, until Microsoft adds support for 401 response codes when using HTTPS, we’re stuck with having to use TCP instead.

  8. Vojin

     /  November 12, 2019

    Isn’t RRAS unsupported in Azure IaaS?

    • Correct. Windows Server Routing and Remote Access Service (RRAS) is not a formally supported workload running on Windows Server in Azure. However, RRAS does work well in Azure and I’ve deployed it numerous times. 🙂

  9. victor bassey

     /  January 4, 2020

    Hello Richard, have you ever tried load balancing IKEv2 UDP ports 500 &4500 using Azure Traffic Manager? It seems ATM can only be used to load balance TCP based protocol e.g. SSTP and not UDP ones?

    • Yes, I do it all the time. 🙂 To be clear though, you don’t actually load balance UDP traffic with Azure Traffic Manager. Azure Traffic Manager is nothing more than intelligent DNS. The client resolves the VPN FQDN to Azure Traffic Manager, and Traffic Manager returns an IP address based on your configuration. What you might be referring to is the workload monitoring. Azure Traffic Manager does not support UDP health checks, only HTTP, HTTPS, and TCP. So, for IKEv2 you have to monitor a TCP-based service on the endpoint (for example, TCP port 443) instead.

      Hope that helps!

      • victor bassey

         /  January 4, 2020

        Thanks Richard. So by just configuring ATM profile say vpn1.trafficmanager.net with icmp or tcp 443 for probing

        Once I add the endpoints, ATM will pass traffic for any port and just use icmp or tcp443 for monitoring?

      • Correct. And again, technically speaking, Azure Traffic Manager doesn’t pass any traffic. It’s just DNS. vpn1.trafficmanager.net will return an IP address, but that address might be different depending on how you configure it and which resources are available at the time. 🙂

      • victor bassey

         /  January 6, 2020

        Got it Richard. Thanks

      • Hey Richard, Spot on! thanks for the clarification. I was looking at ATM as a proper Load-Balancing appliance doing some sort of port forwarding rather than just an intelligent DNS.

      • victor bassey

         /  January 5, 2020

        Spot on Richard! Thanks for the clarification. I was looking at ATM as proper port-forwarding Load-Balancer rather than an intelligent DNS like you correctly stated.

  10. Jess

     /  February 17, 2020

    Hi Richard – once we do this do we simply put in the DNS name we get from traffic manager in the VPN connection name/address?

    • No. You will have to create a CNAME DNS record that maps to the Azure Traffic Manager FQDN. You’ll do this because the FQDN used by your clients must match the subject name on either the TLS or IKEv2 IPsec certificates installed on the VPN server.

      • Geoff

         /  March 30, 2020

        Hey Richard, I think your reply here is root of the issue I am having getting Traffic Manager to work with an Azure VPN Gateway based Always On VPN configuration. With the Azure VPN Gateway point-to-site configuration, it automatically generates a hostname/certificate for the connection for azuregateway-blah-blah-blah.vpn.azure.com DNS name.

        If I used that DNS name, the connection works fine. If I use my Traffic Manager CNAME record, IKE auth fails because the cert doesn’t match. Do you know how I can fix this with Azure VPN Gateways? I can’t seem to find a way to add a SAN for my CNAME record to the auto-generated Azure Gateway server certificate.

      • Right. This is a limitation of the Azure VPN gateway. You do not have the option to install your own certificate. With that, you have to use the subject name they use, which means if you have more than one gateway you can’t use geographic load balancing. Perhaps one day it will be possible to upload our own certificate, but right now it is not an option.

  11. Paddy Berger

     /  March 12, 2020

    Hi Richard,

    How would this work if your servers are located on your internal domain and not in Azure? So we have two sites with VPN servers located and NPS servers located, currently we have only tested with one side of the site and working as a single site VPN solution. However we wanted to add redundancy and have added the secondary vpn server in site B.

    All using IKEv2 for device and user tunnels.

    Thanks

    • No different, really. You would configure Azure Traffic Manager to use external endpoints and provide the IP address of each location in the configuration.

  12. Paddy Berger

     /  March 12, 2020

    Hi Richard, also to add to the above. I have attempted to try this for the one site to see if I can connect and I get a failure saying IKE authentication credentials are unacceptable. What I did was add the traffic manager to point to the current vpn server via its external hostname (the one which is currently in the vpn client address), created a new hostname to point to the traffic manager. Added this hostname into VPN server certificate SAN. Added this new hostname into vpn client connection and get that error. Any ideas? Have I followed this correctly?

    • As long as whatever name you use to resolve to the Azure Traffic Manager profile is also the subject name of the VPN server certificate (or one of the entries on the SAN list) it should work.

      • Paddy Berger

         /  March 13, 2020

        Perfect, worked like a charm. I had to delete the existing cert and as soon as I did that, the connection started working.

  13. Paddy Berger

     /  April 6, 2020

    Hi Richard, have been working on this and have hit two problems.

    Problem 1: Initially I could connect to the dns name pointing to the traffic manager, however when I now try to connect I get “element not found” or “name of the remote access server did not resolve”. I then added the traffic manager dns into the san and that seems to connect. Once connected the original vpn which has the dns name pointing to the traffic manager also connects, seems like a DNS issue but cannot see where to resolve. Also having the traffic manager dns as the connection name, I recall you said should not be used due to TLS.

    Problem 2: (connecting with the traffic manager name)
    If the traffic manager goes into degraded the connection still connects and doesn’t automatically disconnect. I followed your guide by adding a simple website on the vpn server, traffic manager see’s the https server, I then did a iisreset /stop and can see traffic manager becomes degraded, yet connection is still live and I am able to disconnect the vpn and reconnect without any problems?

    Currently only have one VPN server in traffic manager as still working on the other site, but wanted one side working.

    • Paddy Berger

       /  April 6, 2020

      Just to update problem 2, I know AOVPN will work even if IIS is stopped, what I wanted to know was the if the service is degraded, should it not stop connections or future connections whilst in this state

    • Not sure what’s up with DNS there, but the only way this works is if your VPN deployment public FQDN is an alias (CNAME) for the Azure Traffic Manager FQDN. If you try to use the Azure Traffic Manager FQDN you will get subject name mismatch errors and SSTP and IKEv2 connections will fail. You should definitely not need to add the Azure Traffic Manager to the SAN list on your certificate. As for Azure Traffic Manager not failing over, make sure the TTL on your DNS CNAME record is set low (60 seconds).

      • Paddy Berger

         /  April 6, 2020

        Weird things is I added the Azure traffic manager FQDN in the SAN and tried to connect with that name, worked. hence wanted to ask you if this should actually work or not

      • I guess if it was for your IKEv2 IPsec certificate issued by your internal CA it would work, or if you are using SSTP with a certificate issued by a private PKI (not recommended). In those cases yes, I guess it would work. However, it shouldn’t be necessary. If it is working for you then I’d just accept it and move on. 🙂

  14. Paddy Berger

     /  April 11, 2020

    Hi Richard, got to the bottom of the DNS issue. Ok, so I am now at a stage where I have got traffic manager pointing to the two sites i.e. site1.co.uk and site2.co.uk, however when testing by degrading site1, I noticed the vpn does not connect as on the vpn connection itself it points to site1 nps server whereas it is looking for site2 nps server, what is the best method to get past this?

    Would you load balance the nps servers or add the site1 and site2 vpn servers into the nps servers of each other too?

    Thanks

    • The easiest way is to configure the VPN server in each site to use its local NPS as primary and the remote site’s NPS as secondary.

      • Paddy Berger

         /  April 14, 2020

        Hi, I added the secondary NPS server on the VPN Server under Radius Authentication with a score of 20. I then added this as a radius client on the NPS server and reconfigured the “configure vpn or dial up” so that this time both servers are listed. When testing, the connection will still connect to local nps and remote nps will not connect as it will say “The connection was prevented because of a policy configured on your RAS/VPN”.

        As a test I removed the local nps server so that only the remote nps is listed in radius authentication and that works, however when both are listed and testing between the two, it would not work.

      • When it fails, what does the event log on the NPS server say? It should give you a reason why the connection was rejected.

  15. Matthew

     /  April 14, 2020

    Hi Richard,

    Thank you for the incredibly useful documentation. One query that has been raised by our organisation is what happens in the following scenario.

    On a IKEv2 VPN (device tunnel) the user’s DNS queries Azure Traffic Manager and finds the appropriate server to connect. I’m aware through Azure documentation that TCP/UDP protocols are treated slightly differently. In this scenario, how is this connection maintained so the user doesn’t end up polling the address and connecting to another server if the connection reaches a time-out? Therefore, the original server never closes the connection and they may experience other issues.

    I believed once the connection is established it wouldn’t poll DNS again (although that’s not to say the response wouldn’t change), because the VPN is connected directly to the VPN server. The counterargument to this is that although it’s connected to the server the Traffic Manager CNAME record is still the ‘access’ point so therefore Traffic Manager could effectively give a different response and potentially what effect that would have on the VPN connection I’m not sure.

    Many thanks,
    Matthew

    • Azure Traffic Manager is just a DNS server, so when it returns an IP address to the client it is out of the picture. What happens after that is between the client and the VPN server. Once the connection is established the client will not look up the VPN FQDN until it has to re-establish the connection. You’re right, if the connection is interrupted there’s a possibility that Azure Traffic Manager will return a different IP address and the orphaned connection on the original server will remain for a period of time. However, as long as you have enough ports and available IP address space this shouldn’t be an issue. Those sessions will eventually expire anyway.

  16. Paddy Berger

     /  April 15, 2020

    Thanks Richard for all your input, managed to work out why it didn’t work. I had to reboot the NPS server first and then the VPN server for both sites in that order and all seemed to kick in. Tested via traffic manager and all load balancing and failing over correctly.

  17. Matt Heath

     /  April 22, 2020

    Hi Richard,

    We use Azure Traffic Manager and have two sites about 30 miles apart from each other, both using the same Internet provider. The issue we are having is that Device and User tunnels are not terminating at the same site, which is causing some routing issues.

    We are trying to load balance between both sites, but because of this we are limited to the Azure balancing settings that can be used. In particular the performance balancing setting cannot be used due to the sites being so close together and using the same Internet Provider.

    My question is, Is there a way that a users device and user tunnel can terminate on the same site (both device and user tunnel on same site), or is there a way to get the device tunnel to disconnect when the user tunnel connects.

    We are using Windows 10 version 1809 and seem to find that the device tunnel stays up even when the user tunnel is connected. When we previously had Windows 10 version 1709, we would find that the device tunnel disconnected when the user tunnel came up, and therefore this didn’t cause an issue.

    • To begin, you should not have any routing issues if the device tunnel and user tunnel terminate in different locations. That said, there’s no way to absolutely guarantee that both tunnels will connect to the same entry point in this scenario. If Azure Traffic Manager returns different IP addresses they will most definitely connect to different sites.

      Also, the device tunnel staying up while the user tunnel is connected is expected and by design. If the device tunnel was disconnecting after the user tunnel came up, that was a bug. 🙂

