Always On VPN Updates to Improve Connection Reliability

A longstanding issue with Windows 10 Always On VPN is the reliability of VPN tunnel connectivity and the interoperability of the device tunnel and user tunnel. Many administrators have reported that Always On VPN connections sometimes fail to establish automatically, that only one tunnel comes up at a time (user tunnel or device tunnel, but not both), or that VPN tunnels fail to establish when the device resumes from sleep or hibernation. Have a look at the comments on this post and you’ll get a good understanding of the issues with Always On VPN.

Recent Updates

The good news is that most of these issues have been resolved with recent updates to Windows 10 1803 and 1809. Specifically, the February 19, 2019 update for Windows 10 1803 (KB4487029) and the March 1, 2019 update for Windows 10 1809 (KB4482887) include fixes to address these known issues. Administrators implementing Always On VPN are encouraged to deploy at least Windows 10 1803 with the latest updates applied. Windows 10 1809 with the latest updates applied is preferred.
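
One way to check whether a client already carries these fixes is to compare its OS build against the builds these updates delivered. The sketch below assumes KB4487029 corresponds to OS build 17134.619 and KB4482887 to OS build 17763.348 (confirm against the KB articles before relying on it). The function name Test-AovpnFixInstalled is made up for illustration.

```powershell
# Assumed minimum update build revision (UBR) per Windows 10 build:
# 17134 = version 1803 (KB4487029), 17763 = version 1809 (KB4482887).
$MinimumUbr = @{ 17134 = 619; 17763 = 348 }

# Hypothetical helper: returns $true/$false for 1803 and 1809,
# and $null for any other build (this sketch makes no claim about those).
function Test-AovpnFixInstalled {
    param(
        [Parameter(Mandatory)][int]$Build,
        [Parameter(Mandatory)][int]$Ubr
    )
    if (-not $MinimumUbr.ContainsKey($Build)) {
        return $null
    }
    return $Ubr -ge $MinimumUbr[$Build]
}

# Read the local build number and revision from the registry and test them.
$cv = Get-ItemProperty -Path 'HKLM:\SOFTWARE\Microsoft\Windows NT\CurrentVersion'
Test-AovpnFixInstalled -Build ([int]$cv.CurrentBuild) -Ubr ([int]$cv.UBR)
```

Comparing the build revision rather than searching for the KB number keeps the check valid after later cumulative updates replace these specific packages.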

Persistent Issues

Initial reports on these updates are favorable, and in my experience the effectiveness and reliability of Windows 10 Always On VPN are greatly improved. However, there have still been some reports of intermittent VPN tunnel establishment failures.

Possible Causes

During my testing, after applying the updates referenced earlier, both device tunnel and user tunnel connections were established much more consistently than before. I did encounter some issues, however. Specifically, VPN connections would fail to establish when coming out of sleep or hibernate, and occasionally after a complete restart.

NCSI

After further investigation, I determined that the connectivity failure was caused by the Network Connectivity Status Indicator (NCSI) probe failing, which led Windows to report “No Internet access”.
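
The NCSI state that Windows is reporting can be inspected from PowerShell. The following is a rough sketch, assuming the default active probe targets (www.msftconnecttest.com for the HTTP probe and dns.msftncsi.com for the DNS probe). Environments that override the probes via group policy will use different values.

```powershell
# Connectivity level NCSI has assigned to each connected network.
Get-NetConnectionProfile |
    Select-Object InterfaceAlias, Name, IPv4Connectivity, IPv6Connectivity

# Repeat the default HTTP probe by hand. The expected response body is
# the text "Microsoft Connect Test".
(Invoke-WebRequest -Uri 'http://www.msftconnecttest.com/connecttest.txt' -UseBasicParsing).Content

# Repeat the default DNS probe by hand. The expected answer is 131.107.255.255.
Resolve-DnsName -Name 'dns.msftncsi.com' -Type A
```

If the manual probes succeed while the connection profile still shows something other than Internet, the probe is likely failing on the specific interface being tested rather than on the network as a whole.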


Cisco Umbrella Roaming Client

In this instance, the NCSI probe failure was caused by the Cisco Umbrella Roaming Client installed and running on the device. The Umbrella Roaming Client is security software that provides client protection by monitoring and filtering DNS queries. It operates by configuring a DNS listener on the loopback address. NCSI probes are known to fail when the DNS server is running on a different interface than the one being tested.
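
A quick way to see whether something on the client has pointed DNS at a local listener is to look for loopback addresses among the configured DNS servers. This is only a sketch. It assumes the listener uses 127.0.0.1 or ::1, which may not hold for every product or version.

```powershell
# List interfaces whose DNS server list contains a loopback address.
Get-DnsClientServerAddress |
    Where-Object {
        $_.ServerAddresses -contains '127.0.0.1' -or
        $_.ServerAddresses -contains '::1'
    } |
    Select-Object InterfaceAlias, AddressFamily, ServerAddresses
```

No output means no interface is configured with a loopback DNS server under these assumptions.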

Resolution

Microsoft released a fix for this issue in Windows 10 1709. The fix involves changing a group policy setting to disable interface binding when NCSI performs DNS lookups. You can enable this setting via Active Directory group policy by navigating to Computer Configuration > Administrative Templates > Network > Network Connectivity Status Indicator > Specify global DNS. Select Enabled and check the option to Use global DNS.


For testing purposes, this setting can be enabled on an individual machine using the following PowerShell command.

New-ItemProperty -Path "HKLM:\SOFTWARE\Policies\Microsoft\Windows\NetworkConnectivityStatusIndicator\" -Name UseGlobalDNS -PropertyType DWORD -Value 1 -Force
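
Note that New-ItemProperty does not create a missing registry key, so the command above fails on a machine where the NetworkConnectivityStatusIndicator policy key does not exist yet. A slightly more defensive sketch, using the same key and value and run from an elevated PowerShell session, could look like this:

```powershell
$key = 'HKLM:\SOFTWARE\Policies\Microsoft\Windows\NetworkConnectivityStatusIndicator'

# Create the policy key if it is not there yet.
if (-not (Test-Path -Path $key)) {
    New-Item -Path $key -Force | Out-Null
}

# Set UseGlobalDNS = 1, the same value as the original command.
New-ItemProperty -Path $key -Name UseGlobalDNS -PropertyType DWord -Value 1 -Force | Out-Null

# Read the value back to confirm it was written.
(Get-ItemProperty -Path $key -Name UseGlobalDNS).UseGlobalDNS

# To undo the test change later:
# Remove-ItemProperty -Path $key -Name UseGlobalDNS
```

Because this writes to the policy area of the registry, a group policy that configures the same setting will take precedence on domain-joined machines.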

Third-Party Software

As Always On VPN connectivity can be affected by NCSI, any third-party firewall or antivirus/antimalware solution could potentially introduce VPN connection instability. Observe NCSI operation closely when troubleshooting unreliable connections with Always On VPN.
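
When watching NCSI during troubleshooting, its event log can be a useful starting point. The sketch below assumes the Microsoft-Windows-NCSI/Operational log is present and enabled on the client. The first command checks for it before any events are read.

```powershell
$logName = 'Microsoft-Windows-NCSI/Operational'

# Confirm the log exists and is enabled before querying it.
Get-WinEvent -ListLog $logName | Select-Object LogName, IsEnabled, RecordCount

# Show the 20 most recent NCSI events, newest first.
Get-WinEvent -LogName $logName -MaxEvents 20 |
    Select-Object TimeCreated, Id, Message |
    Format-List
```

Comparing event timestamps against a resume from sleep or the start of third-party security software can help tie a probe failure to its cause.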

Additional Information

Windows 10 1803 Update KB4487029

Windows 10 1809 Update KB4482887

Cisco Umbrella Roaming Client Limited Network Connectivity Warning

Network Connectivity Status Indicator (NCSI) Operation Explained


17 Comments

  1. Nate

     /  May 15, 2019

    Hi. I’m having a sleep/hibernate issue. Resuming from sleep it tries to connect for about 25 seconds, seems to time out, then connects within two seconds in the second attempt. I tried the NCSI workaround with no luck. For me it seems like the NRPT is in a bad state. Our VPN server is in a DNS domain where we do split-DNS, so there is an NRPT rule for the domain. When I resume from sleep, JUST as it finally connects I get a “Name resolution policy table has been corrupted.” event 1023.
    If I disconnect the VPN before I put the machine to sleep, it auto-connects within 2 seconds after resuming and I don’t get event 1023.
    So might it not be properly disconnecting and clearing the NRPT when going to sleep?
    This is my LAST real annoyance to work out with this thing.

    Thanks,
    Nate

  2. Nate

     /  May 15, 2019

    Follow-up to my previous post. I created a test profile without the DNS domain containing the VPN server in the NRPT list. It fixed the connection delay, but I can’t do this in production. Unless someone has a better idea, I’ll have to rework this to use a different domain for the VPN server, which also means re-issuing VPN server certificates :/

  3. Nate

     /  May 15, 2019

    Follow up to my follow up, because of course I worked around it right after I bothered to write a post. An NRPT exclusion for the VPN server appears to work. I’m surprised this works if NRPT itself is the problem, but I’m now connecting within 2 seconds when resuming as soon as the Internet becomes available. Hope this helps someone else

    • Glad you were able to get it sorted! 🙂

    • rance

       /  May 22, 2019

      Nate, could you please explain what you did in more detail?
      I am also having a sleep/hibernate connection issue. The laptop wakes up and the VPN says connected, but it isn’t, and I have to reboot to sort it out. Thank you for your time.

      • Nate

         /  May 23, 2019

        Hi rance,

        Sure, but off the bat your symptom is different than mine, which was a status of Connecting for about 25 seconds, timing out, then connecting immediately on the 2nd attempt.

        In my XML, I had a rule that looked like this:

        .mydomain.com
        192.168.0.1,192.168.0.2

        Let’s say my VPN server name is vpn.mydomain.com. I realized that when resuming from sleep, the NRPT was still active even though the connection was not, and it was trying to look up vpn.mydomain.com on the internal DNS servers specified in the XML. So, I created this exception:

        vpn.mydomain.com
        8.8.8.8,8.8.4.4

        This tells it to always look up vpn.mydomain.com on public DNS.
        I should NOT have needed to specify the DnsServers element, but I noticed it would sometimes want to grab the DNS servers from the parent domain’s NRPT rule, which caused the issue to persist.

        While I was researching this I ran across a lot of posts that sound more like your issue where it takes a full reboot to resolve. This one, for example: https://social.technet.microsoft.com/Forums/windows/en-US/c4722609-2992-40c7-a88b-c897d4abf364/no-vpn-after-sleep?forum=w8itpronetworking

        The suggestions there might help, specifically the adapter power setting.

        Running fully patched 1809 is also pretty key. I deployed this to a few 17xx and they were pretty unpredictable until I upgraded them.

        Good luck!

  4. Nate

     /  May 23, 2019

    Sorry, WordPress stripped the tags from my rules. Hopefully you know what I meant.
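
Since the tags were stripped from Nate’s rules, here is a guess at what they may have looked like, assuming they used the DomainNameInformation, DomainName, and DnsServers elements of the Always On VPN ProfileXML. The Rules wrapper element is not part of ProfileXML and is there only so the fragment parses on its own. The snippet simply parses the fragment and lists each rule.

```powershell
# Hypothetical reconstruction of the two NRPT rules described above.
# <Rules> is a made-up wrapper so this fragment is well-formed by itself;
# in ProfileXML the DomainNameInformation elements sit inside the profile.
$nrptFragment = @'
<Rules>
  <DomainNameInformation>
    <DomainName>.mydomain.com</DomainName>
    <DnsServers>192.168.0.1,192.168.0.2</DnsServers>
  </DomainNameInformation>
  <DomainNameInformation>
    <DomainName>vpn.mydomain.com</DomainName>
    <DnsServers>8.8.8.8,8.8.4.4</DnsServers>
  </DomainNameInformation>
</Rules>
'@

# Parse the fragment and print each namespace with its DNS servers.
$rules = ([xml]$nrptFragment).Rules.DomainNameInformation
$rules | ForEach-Object { '{0} -> {1}' -f $_.DomainName, $_.DnsServers }
```

On a client, Get-DnsClientNrptPolicy -Effective shows the rules Windows is actually applying, which can help confirm whether the NRPT is still active after resuming from sleep.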

  5. Daniel

     /  November 20, 2019

    Regarding connection reliability: have you ever seen the VPN tunnel break during the day, so that it’s not possible to, e.g., transfer any data over the tunnel, even though it stays in the “Connected” state? Restarting the client fixes it, and often reconnecting the tunnel is enough (but if this happens with the device tunnel, the user can’t reconnect it, so a restart is needed).
    We’re using an F5 load balancer for IKEv2.
    Any ideas to debug this?

    • Haven’t seen anything like that myself. There is a known issue with Windows Server RRAS where client connections can fail, but it would affect all connections and it typically only happens after a service restart. It certainly wouldn’t happen in the middle of a connection.

    • rance

       /  November 22, 2019

      We had so many problems with IKEv2, too many to list on this blog.
      We stopped using it. The top ones: ISPs and corporations blocking port 500, UDP fragmentation, and user and device tunnel conflicts causing random tunnel drops.
      One thing that seemed to lower the calls was rebooting the VPN server each evening. (Not a fix)
      We are now running AOVPN in SSTP/IKEv2 mode (fallback), very solid to date.
      Look at the timeouts you have set on the F5. Does F5 have an AOVPN template? We use Kemp and have a template which has all the settings and timings set.
      Is your VPN server 2019?
      Think about server reboots and moving to SSTP/IKEv2.
      Or… You can disconnect the user from the VPN server, may save them a reboot, but this doesn’t lower your calls.
      Lots of information on Richard M. Hicks’ site, he has saved my bacon several times.
      Hope this helps in a small way, sorry no easy fix or answer.

      • Agreed. IKEv2 offers better security options than SSTP, but it suffers from some serious operational challenges. Rebooting the server isn’t ideal, of course. Restarting the RemoteAccess service would be equally disruptive, but might not solve your problems. You could write a script that programmatically terminates VPN connections that exceed a specific duration. Not sure if that would help or not, but might be interesting to test. Clearing stale connections might help, who knows. 😉

        As for the F5, I don’t have a template. I’d be happy to send you my configuration if you like. I did post my SSTP monitor configuration here though, if that helps.
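
The idea of terminating connections that exceed a specific duration could be sketched as follows. This is an untested starting point, not a recommendation. It assumes it runs on the RRAS server with the RemoteAccess PowerShell module available, that ConnectionDuration is reported in seconds, and that the 24-hour threshold suits the environment. Check the property names with Get-Member before using it.

```powershell
# Assumed threshold: connections older than 24 hours count as stale.
$maxSeconds = 24 * 60 * 60

# Find active VPN connections that have been up longer than the threshold.
$stale = Get-RemoteAccessConnectionStatistics |
    Where-Object { $_.ConnectionDuration -ge $maxSeconds }

foreach ($connection in $stale) {
    # -WhatIf only reports what would be disconnected.
    # Remove it once the selection has been reviewed.
    Disconnect-VpnUser -UserName $connection.UserName -WhatIf
}
```

Run on a schedule, something like this would be less disruptive than restarting the server or the RemoteAccess service, though whether clearing long-lived connections actually helps is still an open question.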

      • Daniel

         /  November 25, 2019

        @Rance: Thank you very much for your advice. We will think about implementing the reverse/standard mode with SSTP first, then IKEv2. Over the weekend I disabled some advanced features of the VMXNET3 network card in Device Manager on both RRAS servers, e.g. everything related to offloading and coalescing, and now the pings are more stable (nearly no timeouts and little variation in ping times) and no disconnects have been reported yet.

        @Richard: Would be great if you could send me your F5 configuration. Our SSTP monitor is working fine. IKEv2 would be the interesting thing. Thank you very much.

      • Daniel

         /  November 26, 2019

        @rance: One other question. Do you still use a device tunnel with your SSTP solution? Heard that the device tunnel is IKEv2 only.

        Today we again had a lot of connection drops… Frustrating

      • The Windows 10 Always On VPN device tunnel is indeed IKEv2 only…

      • Daniel

         /  November 28, 2019

        Thank you both for the information.
        As of today we are testing the “Automatic” user tunnel, mainly SSTP and then IKE, as rance recommended.

