Always On VPN Updates to Improve Connection Reliability

A longstanding issue with Windows 10 Always On VPN is the reliability of VPN tunnel connectivity and the interoperability of the device tunnel and user tunnel. Many administrators have reported that Always On VPN connections sometimes fail to establish automatically, that only one tunnel comes up at a time (user tunnel or device tunnel, but not both), or that VPN tunnels fail to establish when coming out of sleep or hibernate. Have a look at the comments on this post and you’ll get a good understanding of the issues with Always On VPN.

Recent Updates

The good news is that most of these issues have been resolved with recent updates to Windows 10 1803 and 1809. Specifically, the February 19, 2019 update for Windows 10 1803 (KB4487029) and the March 1, 2019 update for Windows 10 1809 (KB4482887) include fixes to address these known issues. Administrators are encouraged to deploy at least Windows 10 1803 with the latest updates applied when implementing Always On VPN, although Windows 10 1809 with the latest updates applied is preferred.
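One way to confirm that a client is at or above these update levels is to compare its OS build and update build revision (UBR). The following is a minimal sketch, not part of the original post. The function name Test-AovpnPatchLevel is hypothetical, and the minimum revisions (17134.619 for KB4487029, 17763.348 for KB4482887) are my reading of Microsoft's release notes and should be verified before relying on them.

```powershell
# Hypothetical helper: returns $true when the build/revision is at or above
# the assumed minimum revision for that Windows 10 release.
function Test-AovpnPatchLevel {
    param(
        [Parameter(Mandatory)] [int] $Build,     # 17134 = 1803, 17763 = 1809
        [Parameter(Mandatory)] [int] $Revision   # update build revision (UBR)
    )
    # Assumed minimums: KB4487029 -> 17134.619, KB4482887 -> 17763.348
    $minimum = @{ 17134 = 619; 17763 = 348 }
    if ($Build -gt 17763) { return $true }                    # assumed: later releases include the fixes
    if (-not $minimum.ContainsKey($Build)) { return $false }  # older than 1803
    return ($Revision -ge $minimum[$Build])
}

# On a Windows client, read the running system's values from the registry.
if ($env:OS -eq 'Windows_NT') {
    $cv = Get-ItemProperty 'HKLM:\SOFTWARE\Microsoft\Windows NT\CurrentVersion'
    Test-AovpnPatchLevel -Build ([int]$cv.CurrentBuildNumber) -Revision ([int]$cv.UBR)
}
```

Checking the build revision rather than a specific KB number avoids false negatives once a later cumulative update has superseded the one named above.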

Persistent Issues

Although initial reports on these updates are favorable and, in my experience, the reliability of Windows 10 Always On VPN is greatly improved, there are still some reports of intermittent VPN tunnel establishment failures.

Possible Causes

During my testing, after applying the updates referenced earlier, both device tunnel and user tunnel connections were established much more consistently than before. I did encounter some issues, however. Specifically, VPN connections would fail to establish when coming out of sleep or hibernate, and occasionally they would fail after a complete restart.

NCSI

After further investigation it was determined that the connectivity failure was caused by the Network Connectivity Status Indicator (NCSI) probe failing, causing Windows to report “No Internet access”.
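To see what NCSI is reporting on a client, and to repeat its active probes by hand, something like the following can help. This is a sketch rather than the procedure used in the original investigation. It reads the probe targets from the NlaSvc registry key instead of hard-coding them, and it must be run on the Windows client being tested.

```powershell
# What Windows currently reports for each connected network.
Get-NetConnectionProfile |
    Select-Object InterfaceAlias, IPv4Connectivity, IPv6Connectivity

# NCSI active probe settings as configured on this machine.
$ncsi = Get-ItemProperty 'HKLM:\SYSTEM\CurrentControlSet\Services\NlaSvc\Parameters\Internet'

# DNS probe: the answer should match $ncsi.ActiveDnsProbeContent.
Resolve-DnsName -Name $ncsi.ActiveDnsProbeHost -Type A

# HTTP probe: the body should match $ncsi.ActiveWebProbeContent.
$uri = 'http://{0}/{1}' -f $ncsi.ActiveWebProbeHost, $ncsi.ActiveWebProbePath
(Invoke-WebRequest -Uri $uri -UseBasicParsing).Content
```

If either probe fails while ordinary browsing works, NCSI is the likely reason Windows reports “No Internet access”.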


Cisco Umbrella Roaming Client

In this instance the NCSI probe failure was caused by the Cisco Umbrella Roaming Client installed and running on the device. The Umbrella Roaming Client is security software that provides client protection by monitoring and filtering DNS queries. It operates by configuring a DNS listener on the loopback address. NCSI probes are known to fail when the DNS server is running on a different interface than is being tested.
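A quick check for this condition is to list the interfaces whose DNS server has been pointed at the loopback address. This is a minimal sketch; it assumes the local DNS listener uses 127.0.0.1, which may not hold for every product or configuration.

```powershell
# Interfaces whose IPv4 DNS server list includes the loopback address.
Get-DnsClientServerAddress -AddressFamily IPv4 |
    Where-Object { $_.ServerAddresses -contains '127.0.0.1' } |
    Select-Object InterfaceAlias, ServerAddresses
```

Any interface returned here is resolving through a local DNS proxy, which is the situation described above.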

Resolution

Microsoft released a fix for this issue in Windows 10 1709. The fix involves changing a group policy setting to disable interface binding when NCSI performs DNS lookups. You can enable this setting via Active Directory group policy by navigating to Computer Configuration > Administrative Templates > Network > Network Connectivity Status Indicator > Specify global DNS. Select Enabled and check the option to Use global DNS.


For testing purposes, this setting can be enabled on an individual device using the following PowerShell command.

New-ItemProperty -Path "HKLM:\SOFTWARE\Policies\Microsoft\Windows\NetworkConnectivityStatusIndicator" -Name UseGlobalDNS -PropertyType DWORD -Value 1 -Force
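On a machine where the policy key does not exist yet, New-ItemProperty fails because it does not create missing keys. A slightly more defensive variant, offered here as a sketch for test machines only, creates the key first and reads the value back. It must be run from an elevated PowerShell session.

```powershell
$path = 'HKLM:\SOFTWARE\Policies\Microsoft\Windows\NetworkConnectivityStatusIndicator'

# Create the policy key if it is missing.
if (-not (Test-Path -Path $path)) {
    New-Item -Path $path -Force | Out-Null
}

# Set UseGlobalDNS = 1 (DWORD), overwriting any existing value.
New-ItemProperty -Path $path -Name 'UseGlobalDNS' -PropertyType DWord -Value 1 -Force | Out-Null

# Read the value back to confirm the change.
(Get-ItemProperty -Path $path -Name 'UseGlobalDNS').UseGlobalDNS

# To undo the test change later:
# Remove-ItemProperty -Path $path -Name 'UseGlobalDNS'
```

For production, the group policy setting remains the better option because it is applied and reverted centrally.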

Third-Party Software

As Always On VPN connectivity can be affected by NCSI, any third-party firewall or antivirus/antimalware solution could potentially introduce VPN connection instability. Observe NCSI operation closely when troubleshooting unreliable connections with Always On VPN.
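One way to observe NCSI while reproducing a failure is to review its operational event log around the time the device resumes or restarts. This is a sketch only; it assumes the Microsoft-Windows-NCSI/Operational log is present and enabled on the client, which should be confirmed in Event Viewer.

```powershell
# Most recent NCSI events, newest first.
Get-WinEvent -LogName 'Microsoft-Windows-NCSI/Operational' -MaxEvents 25 -ErrorAction SilentlyContinue |
    Select-Object TimeCreated, Id, Message |
    Format-List
```

Comparing these timestamps with the time of the failed VPN connection attempt shows whether NCSI had already declared the network to be without Internet access.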

Additional Information

Windows 10 1803 Update KB4487029

Windows 10 1809 Update KB4482887

Cisco Umbrella Roaming Client Limited Network Connectivity Warning

Network Connectivity Status Indicator (NCSI) Operation Explained


45 Comments

  1. Nate

     /  May 15, 2019

    Hi. I’m having a sleep/hibernate issue. Resuming from sleep it tries to connect for about 25 seconds, seems to time out, then connects within two seconds in the second attempt. I tried the NCSI workaround with no luck. For me it seems like the NRPT is in a bad state. Our VPN server is in a DNS domain where we do split-DNS, so there is an NRPT rule for the domain. When I resume from sleep, JUST as it finally connects I get a “Name resolution policy table has been corrupted.” event 1023.
    If I disconnect the VPN before I put the machine to sleep, it auto-connects within 2 seconds after resuming and I don’t get event 1023.
    So might it not be properly disconnecting and clearing the NRPT when going to sleep?
    This is my LAST real annoyance to work out with this thing.

    Thanks,
    Nate

  2. Nate

     /  May 15, 2019

    Follow-up to my previous post. I created a test profile without the DNS domain containing the VPN server in the NRPT list. It fixed the connection delay, but I can’t do this in production. Unless someone has a better idea, I’ll have to rework this to use a different domain for the VPN server, which also means re-issuing VPN server certificates :/

  3. Nate

     /  May 15, 2019

    Follow up to my follow up, because of course I worked around it right after I bothered to write a post. An NRPT exclusion for the VPN server appears to work. I’m surprised this works if NRPT itself is the problem, but I’m now connecting within 2 seconds when resuming as soon as the Internet becomes available. Hope this helps someone else

    • Glad you were able to get it sorted! 🙂

    • rance

       /  May 22, 2019

      Nate, could you please explain what you did in more detail?
      I am also having a sleep/hibernate connection issue. The laptop wakes up and the VPN says connected, but it isn’t, and I have to reboot to sort it out. Thank you for your time.

      • Nate

         /  May 23, 2019

        Hi rance,

        Sure, but off the bat your symptom is different than mine, which was a status of Connecting for about 25 seconds, timing out, then connecting immediately on the 2nd attempt.

        In my XML, I had a rule that looked like this:

        .mydomain.com
        192.168.0.1,192.168.0.2

        Let’s say my VPN server name is vpn.mydomain.com. I realized that when resuming from sleep, the NRPT was still active even though the connection was not, and it was trying to look up vpn.mydomain.com on the internal DNS servers specified in the XML. So, I created this exception:

        vpn.mydomain.com
        8.8.8.8,8.8.4.4

        This tells it to always look up vpn.mydomain.com on public DNS.
        I should NOT have needed to specify the DnsServers element, but I noticed it would sometimes grab the DNS servers from the parent domain’s NRPT rule, which caused the issue to persist.

        While I was researching this I ran across a lot of posts that sound more like your issue where it takes a full reboot to resolve. This one, for example: https://social.technet.microsoft.com/Forums/windows/en-US/c4722609-2992-40c7-a88b-c897d4abf364/no-vpn-after-sleep?forum=w8itpronetworking

        The suggestions there might help, specifically the adapter power setting.

        Running fully patched 1809 is also pretty key. I deployed this to a few 17xx and they were pretty unpredictable until I upgraded them.

        Good luck!
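The tags in the two rules above were stripped by the comment system. A plausible reconstruction is sketched below, assuming the rules used the standard DomainNameInformation, DomainName, and DnsServers elements of the Always On VPN ProfileXML. The wrapper element and the parsing code are only for illustration, and the exact XML Nate used is not known.

```powershell
# Assumed reconstruction of the two NRPT rules described in the comment.
$fragment = @'
<VPNProfile>
  <DomainNameInformation>
    <DomainName>.mydomain.com</DomainName>
    <DnsServers>192.168.0.1,192.168.0.2</DnsServers>
  </DomainNameInformation>
  <DomainNameInformation>
    <DomainName>vpn.mydomain.com</DomainName>
    <DnsServers>8.8.8.8,8.8.4.4</DnsServers>
  </DomainNameInformation>
</VPNProfile>
'@

# Parse the fragment and list each namespace with its DNS servers.
$rules = ([xml]$fragment).VPNProfile.DomainNameInformation |
    ForEach-Object {
        [pscustomobject]@{
            Namespace  = $_.DomainName
            DnsServers = $_.DnsServers -split ','
        }
    }
$rules

# On a live client, the effective table can be compared with:
# Get-DnsClientNrptPolicy | Select-Object Namespace, NameServers
```

The more specific vpn.mydomain.com rule is what sends lookups for the VPN server name to public DNS instead of the internal servers.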

  4. Nate

     /  May 23, 2019

    Sorry, WordPress stripped the tags from my rules. Hopefully you know what I meant.

  5. Daniel

     /  November 20, 2019

    Regarding connection reliability: have you ever seen the VPN tunnel break during the day, so that no data can be transferred over it, while it stays in the “Connected” state? Restarting the client, or often just reconnecting the tunnel, fixes it (but if this happens with the device tunnel, the user cannot reconnect it, so a restart is needed).
    We’re using an F5 load balancer for IKEv2.
    Any ideas on how to debug this?

    • Haven’t seen anything like that myself. There is a known issue with Windows Server RRAS where client connections can fail, but it would affect all connections and it typically only happens after a service restart. It certainly wouldn’t happen in the middle of a connection.

    • rance

       /  November 22, 2019

      We had so many problems with IKEv2, too many to list on this blog.
      We stopped using it. The top ones were ISPs and corporations blocking port 500, UDP fragmentation, and user and device tunnel conflicts causing random tunnel drops.
      One thing that seemed to lower the calls was rebooting the VPN server each evening (not a fix).
      We are now running AOVPN in SSTP/IKEv2 mode (fallback), and it has been very solid to date.
      Look at the timeouts you have set on the F5. Does F5 have an AOVPN template? We use Kemp and have a template which has all the settings and timings set.
      Is your VPN server 2019?
      Think about server reboots and moving to SSTP/IKEv2.
      Or… you can disconnect the user from the VPN server, which may save them a reboot, but this doesn’t lower your calls.
      There is lots of information on Richard M. Hicks’ site; he has saved my bacon several times.
      Hope this helps in a small way. Sorry, no easy fix or answer.

      • Agreed. IKEv2 offers better security options than SSTP, but it suffers from some serious operational challenges. Rebooting the server isn’t ideal, of course. Restarting the RemoteAccess service would be equally disruptive, but might not solve your problems. You could write a script that programmatically terminates VPN connections that exceed a specific duration. Not sure if that would help or not, but might be interesting to test. Clearing stale connections might help, who knows. 😉

        As for the F5, I don’t have a template. I’d be happy to send you my configuration if you like. I did post my SSTP monitor configuration here though, if that helps.
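The idea of terminating connections that exceed a specific duration could be sketched roughly as follows. This is untested against a production RRAS server. The function name Select-StaleVpnConnection is hypothetical, and the property names ConnectionDuration (in seconds) and ClientIPv4Address on the output of Get-RemoteAccessConnectionStatistics are assumptions that should be confirmed with Get-Member first. The -WhatIf switch keeps the sketch from disconnecting anyone.

```powershell
# Hypothetical helper: return only the connections older than $MaxSeconds.
function Select-StaleVpnConnection {
    param(
        [Parameter(Mandatory)] [AllowEmptyCollection()] [object[]] $Connection,
        [int] $MaxSeconds = 86400   # 24 hours
    )
    $Connection | Where-Object { $_.ConnectionDuration -gt $MaxSeconds }
}

# On the RRAS server only: preview which sessions would be disconnected.
if (Get-Command Get-RemoteAccessConnectionStatistics -ErrorAction SilentlyContinue) {
    $current = @(Get-RemoteAccessConnectionStatistics)
    Select-StaleVpnConnection -Connection $current |
        ForEach-Object {
            # Assumed property name; remove -WhatIf to actually disconnect.
            Disconnect-VpnUser -HostIPAddress $_.ClientIPv4Address -WhatIf
        }
}
```

Run as a scheduled task, this would be less disruptive than a nightly reboot, although whether it helps with the drops described here is an open question.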

      • Daniel

         /  November 25, 2019

        @Rance: Thank you very much for your advice. We will think about implementing the reverse/standard mode, with SSTP first and then IKEv2. Over the weekend I disabled some advanced features of the VMXNET3 network adapter in Device Manager on both RRAS servers (everything related to offloading and coalescing). Now the pings are more stable (nearly no timeouts and little variation between ping times) and no disconnects have been reported yet.

        @Richard: Would be great if you could send me your F5 configuration. Our SSTP monitor is working fine. IKEv2 would be the interesting thing. Thank you very much.
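The adapter changes Daniel describes were made in Device Manager. A rough PowerShell equivalent is sketched below. The adapter name Ethernet0 is hypothetical, the exact set of properties he disabled is not stated, and -WhatIf is used so that nothing is changed until the list has been reviewed.

```powershell
$adapter = 'Ethernet0'   # hypothetical adapter name; check with Get-NetAdapter

# Review the current offload and coalescing settings.
Get-NetAdapterAdvancedProperty -Name $adapter |
    Where-Object { $_.DisplayName -match 'Offload|Coalesc' } |
    Select-Object DisplayName, DisplayValue

# Preview disabling the common offload features (remove -WhatIf to apply).
Disable-NetAdapterLso -Name $adapter -WhatIf
Disable-NetAdapterRsc -Name $adapter -WhatIf
Disable-NetAdapterChecksumOffload -Name $adapter -WhatIf
```

Changing these settings briefly resets the adapter, so it is best done in a maintenance window.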

      • Daniel

         /  November 26, 2019

        @rance: One other question. Do you still use a device tunnel with your SSTP solution? Heard that the device tunnel is IKEv2 only.

        Today we again had a lot of connection drops… frustrating.

      • The Windows 10 Always On VPN device tunnel is indeed IKEv2 only…

      • Daniel

         /  November 28, 2019

        Thank you both for the information.
        As of today we are testing the “Automatic” user tunnel, primarily SSTP and then IKEv2, as rance recommended.

    • rico101

       /  May 12, 2020

      Hi, do you still have this issue? We are seeing this too. This has been a nightmare to roll out due to reliability issues.

      • Daniel

         /  May 15, 2020

        Hi rico101, we only got this issue solved by switching the tunnel type to SSTP. We still have a device tunnel with IKEv2, which still has problems, but when the user tunnel is connected the device tunnel can be ignored, because the user tunnel has a lower metric and we have configured the same routes for the user tunnel as for the device tunnel.
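To confirm which tunnel Windows prefers when both are connected, the interface metrics and per-tunnel routes can be compared on the client. This is a minimal sketch; the connection name 'Always On VPN' is hypothetical and should be replaced with the actual user tunnel or device tunnel name.

```powershell
# Interfaces ordered by metric; lower values are preferred.
Get-NetIPInterface -AddressFamily IPv4 |
    Sort-Object InterfaceMetric |
    Select-Object InterfaceAlias, InterfaceMetric, ConnectionState

# Routes bound to one tunnel ('Always On VPN' is a hypothetical name).
Get-NetRoute -InterfaceAlias 'Always On VPN' -AddressFamily IPv4 -ErrorAction SilentlyContinue |
    Select-Object DestinationPrefix, NextHop, RouteMetric
```

If both tunnels carry the same destination prefixes, traffic follows the tunnel with the lower combined interface and route metric.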

  6. rance

     /  December 20, 2019

    @Daniel: Sorry I haven’t got back to you. Great news!
    I should have said, we have it set to “Automatic” and do not use the GPO to reset it. I didn’t see any point, as SSTP is just solid and works everywhere. @Richard is correct: there is no other option for the device tunnel, only IKEv2 with a computer certificate. It only means that the device can’t be controlled unattended. Windows updates and remote control can use the user tunnel when out of the office.

  7. Nick

     /  January 9, 2020

    Hi, Have you heard anything about another patch coming maybe to stabilise AOVPN, mentioned by a user in this thread:

    https://social.technet.microsoft.com/Forums/en-US/93dc1467-7082-4416-8bcf-d76a5dac2071/always-on-vpn-not-always-reconnecting-after-standby?forum=win10itpronetworking

  8. raga

     /  April 8, 2020

    Hello,

    We are using AOVPN for about 600 devices (device tunnel, full mode).
    The RRAS servers run Windows Server 2019, and the clients are Windows 10 1803. We have a lot of stability issues. Sometimes clients just lose their connection and cannot connect anymore. Sometimes the only known way for the client to connect is to reboot the RRAS server, so we have a scheduled task to reboot the servers each night. I don’t know if, as of April 2020, there is a solution for this.

    Thanks for your website 🙂

    Regards

    • Are your RRAS VPN servers behind a load balancer?

      • Hello, yes.

      • Then this is likely caused by the way the load balancer is configured. Commonly load balancers “proxy” these connections so the RRAS server sees the connection coming from the load balancer, not the client. This causes problems for RRAS and specifically IKEv2 because Windows/RRAS limits the number of IPsec Security Associations (SAs) from a single IP address. You’ll have to configure your load balancer to pass the client’s original IP address to the VPN server to avoid this problem.

        I have a blog post coming out on this soon. However, I’ve already updated the guidance for load balancer configuration on this site for F5, Citrix ADC (NetScaler), and Kemp.

  9. Hi Richard,
    our users have begun (in the last few days) to report an issue where they receive this message when attempting to manually connect a native VPN connection: “There are no more files”. It appears to be a bug that is triggered when a user certificate is renewed.

    This bug report is our exact issue – https://github.com/MicrosoftDocs/windowsserverdocs/issues/4258

    In his report, endreigesund suggests changing the profile option “RememberCredentials” to FALSE as a work around.

    Do you know what this option actually does in a VPN connection using PEAP-TLS? I am trying to determine whether changing this option could have adverse impact on the users.

    Many thanks for your tireless work in helping inform us about the dark arts of Always On – very much appreciated.

    • If you are using EAP authentication with client certificates there should be no reason to enable the option to remember credentials. You should be able to safely change that option without impacting users. 🙂
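Applying the workaround to an existing ProfileXML before redeploying it could look like the following sketch. The function name Disable-RememberCredentials and the sample profile are hypothetical, and the code only changes a RememberCredentials element that is already present.

```powershell
# Hypothetical helper: set an existing <RememberCredentials> element to false.
function Disable-RememberCredentials {
    param([Parameter(Mandatory)] [string] $ProfileXml)
    $doc  = [xml]$ProfileXml
    $node = $doc.SelectSingleNode('/VPNProfile/RememberCredentials')
    if ($null -ne $node) { $node.InnerText = 'false' }
    return $doc.OuterXml
}

# Trimmed-down sample profile for illustration only.
$sample = '<VPNProfile><RememberCredentials>true</RememberCredentials><AlwaysOn>true</AlwaysOn></VPNProfile>'
Disable-RememberCredentials -ProfileXml $sample
```

The updated XML still has to be pushed to clients through whatever mechanism deployed the original profile.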

    • Ian M

       /  February 4, 2021

      We’ve had this too. For anyone searching we found the phrase ‘The error code returned on failure is 18’ on event ID 20227 on the client. It’s not a helpful error message!
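To find affected clients, the Application log can be searched for that event and phrase. This is a sketch that assumes the event is logged by the RasClient provider, which is where VPN client events are normally recorded.

```powershell
# Recent RasClient 20227 events that mention error code 18.
Get-WinEvent -FilterHashtable @{
        LogName      = 'Application'
        ProviderName = 'RasClient'
        Id           = 20227
    } -MaxEvents 50 -ErrorAction SilentlyContinue |
    Where-Object { $_.Message -match 'error code returned on failure is 18' } |
    Select-Object TimeCreated, Message
```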

  10. Ankur

     /  January 26, 2021

    Hi Richard,

    We have been testing the AOVPN user tunnel for the last 4 months with no connection issues; connections stayed up for more than a day and completed all quick mode and main mode negotiations. We tested with a pilot group of 30 users and they loved it. As it was stable, we rolled it out to about 200 users a few weeks ago. Lately we have started to experience an issue where the AOVPN connection drops intermittently. We use Windows Server 2019 for RRAS and a Sophos firewall. We haven’t made any changes and are not sure why we suddenly started to experience these issues.

    Appreciate your suggestions and help.

    Thanks
    Ankur

  11. Hi Richard,
    Have you ever come across slow Windows logons since implementing Always On VPN? We are currently testing it, and Windows gets stuck on the Welcome screen longer than usual on the devices that have been enrolled.

    • Occasionally, yes. Often it has to do with the client trying to access resources that aren’t reachable over the VPN connection. This is quite common when the device tunnel is in use and you’ve restricted access to specific hosts using routes or traffic filters.

  12. Robert Naylor

     /  May 21, 2021

    We had Always On VPN working a treat on 1909. However, we are having issues after upgrading to 20H2, and NCSI appears to be getting in the way.

    Occasionally after resuming from sleep either the local connection or the VPN connection will show no internet access.

    Tried all combinations of usual fixes but still happening.

    • Sorry to hear that. Does it happen for newly provisioned devices on 20H2? Or just those that have been upgraded from previous versions?

  13. Todi Skordileva

     /  January 22, 2022

    Hi Richard,

    I hope you are doing well, and thank you for all these wonderful articles that have made my life easier.

    We have just deployed AoVPN within our Organisation and we do have a few issues that I haven’t seen anywhere else in any articles.

    We are using roaming profiles for a few users, and when the device tunnel for any reason does not automatically connect at boot, the VPN connections disappear from the users’ profiles and no longer show under Network Connections, so they cannot be clicked to connect manually. This also happens when users for any reason get disconnected from the VPN, and they have to reboot the laptops to get it back.

    Do you have any recommendations for clients that are using roaming profiles, and how can we prevent the connections from disappearing when they fail to connect?

    This is not the case for users on local profiles, where the connections remain visible.

    Thanks,

    Todi

    • That’s unusual, and not something I’ve encountered myself. I’m curious though, how are you deploying Always On VPN profiles? Intune? SCCM? Something else?

      • Todi Skordileva

         /  April 10, 2022

        Hi Richard,

        We are using SCCM to deploy the settings, and both are installed per device instead, as it looks like that is the only way you can deploy both tunnels using SCCM.

        I can work around this by removing the AppData folder from redirection, but I haven’t yet found the exact location of the settings that I could exclude.

        Another issue we have come across is that the user certificate keeps dropping, and I haven’t found any article or report related to the same issue.

        I have two calls logged with Microsoft in relation to the above issues, and I will comment back if a resolution is found, in case someone else is experiencing the same issue.

        Regards,

        Todi

      • When you say ‘user certificates keep dropping’, do you mean they are disappearing from the user’s certificate store? Or does the VPN complain that no certificate is available, but you can see it in the certificate store?

      • Todi Skordileva

         /  April 13, 2022

        Hi Richard,

        It is disappearing from the user’s certificate store, and the VPN then complains about the certificate. Rebooting or manually requesting the certificate resolves the problem.

        We deploy the user certs via GPO with autoenrollment.

        Regards,

        Todi

      • Interesting. Resolving the issue with the VPN profile disappearing for your users is easy enough. Just deploy the Always On VPN user tunnel profile in the ‘all users’ context. My PowerShell script can do this using the -AllUserConnection parameter. As for user certificates disappearing, that certainly could be related to roaming profiles, but I’m not sure.
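To check which context a profile was deployed in, the VPN connections in the all-users phonebook can be listed alongside those in the current user's phonebook. This is a minimal sketch using the built-in Get-VpnConnection cmdlet; it is not the deployment script mentioned in the reply above.

```powershell
# Connections in the all-users (global) phonebook.
Get-VpnConnection -AllUserConnection |
    Select-Object Name, ServerAddress, TunnelType, ConnectionStatus

# Connections in the current user's phonebook.
Get-VpnConnection |
    Select-Object Name, ServerAddress, TunnelType, ConnectionStatus
```

A user tunnel that appears only in the second list lives in the user's profile, which is the case that roaming profiles can affect.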

  1. Always On VPN DNS Registration Update Available | Richard M. Hicks Consulting, Inc.
  2. Always On VPN Connection Issues After Sleep or Hibernate | Richard M. Hicks Consulting, Inc.
