Windows 365 Powering your Business Continuity and Disaster Recovery Plans

monk

A major topic since COVID has been BCP (Business Continuity Planning) and DR (Disaster Recovery). Windows 365 recently released some amazing capabilities in this area, and that’s what we’re going to discuss today.

Existing Resiliency with Windows 365 Cloud PCs

Today, Windows 365 Cloud PCs resiliency is based on a few different metrics:

  • 99.9% highly available Cloud PC user sessions (as referenced in the MSFT SLA)
    • MSFT measures downtime in minutes: the period in which all connection attempts by a user to a Cloud PC were unsuccessful, excluding any of the following types of failures:
      • Failures resulting from the Cloud PC being in an inoperable state unrelated to the underlying Azure infrastructure (e.g., damaged or corrupt operating system, operating system configuration, or misconfiguration); and
      • Failures resulting from an application or other software installed on the Cloud PC.
  • Data object resiliency for disk storage of 99.999999999% (Microsoft’s main recommendation for data resiliency is leveraging OneDrive along with OneDrive’s Best Practices that I discussed recently.)
  • Automated availability zone failover for the compute instance.
  • Recovery Point Objective (RPO) of ~0 (the time-based measurement of the maximum data that can be lost for a user as a result of the DR event)

Some of the failures that could result in an AZ failover are the failure of the virtual NIC, compute instance, storage plane instance, or compute power instance. When one of these failures is detected, the failover happens automatically. The user will need to log back into their Cloud PC and may see some minimal disruption.
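To put the 99.9% figure in perspective, here is a minimal sketch that converts an availability percentage into a downtime budget. The 30-day measurement window is my assumption for illustration; check the SLA itself for the exact measurement period.

```python
def downtime_budget_minutes(availability_pct: float, days: int = 30) -> float:
    """Minutes of downtime allowed over `days` at the given availability percentage.

    The 30-day default is an assumption for illustration, not a value from the SLA.
    """
    total_minutes = days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)


budget = downtime_budget_minutes(99.9)
print(f"99.9% over 30 days allows about {budget:.1f} minutes of downtime")
```

Under that assumption, 99.9% works out to roughly 43 minutes of user-session downtime per month.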

Resiliency of the Cloud PC Management Service

The Cloud PC Management Service comes into play with the resiliency story as well. For clarity, the Cloud PC Management Service includes the Intune admin center and the Cloud PC end user portal (windows365.microsoft.com).

The Cloud PC Management Service has redundant architecture within the region with a target uptime of 99.99%. The service has these target objectives:

  • RTO of < 6 hours.
  • RPO of <30 minutes for changes made in the management service.

The good news is that if there is an outage, users can still reach their session through the Windows App, https://rdweb.wvd.microsoft.com/webclient/index.html, or a bookmark for their session.

Now, let’s talk about a feature that people tend to overlook called Enterprise State Roaming.

Enterprise State Roaming

One of the interesting items they recommend as part of your strategy is enterprise state roaming, which you can enable here:

The settings page for enabling enterprise state roaming in Entra

Enterprise State Roaming (ESR) is an often forgotten feature, which the graphic below covers nicely:

A diagram of enterprise state roaming

ESR provides a unified experience across a user’s devices by synchronizing data across multiple devices. The data is hosted in Azure according to the table below, and it is retained until it becomes stale or is deleted manually:

Country/region value | Has their data hosted in
An EMEA country/region such as France or Zambia | One or more of the Azure regions within Europe
A North American country/region such as United States or Canada | One or more of the Azure regions within the US
An APAC country/region such as Australia or New Zealand | One or more of the Azure regions within Asia
South American and Antarctica regions | One or more Azure regions within the U

There’s no retention policy around that data, and it can be removed in a few different ways:

  • User deleted from Entra (removed within 90-180 days)
  • Directory deleted from Entra (removed within 90-180 days)
  • Admin opens an Azure support ticket to delete the data

One last note: make sure neither of these settings is disabled in Intune, or ESR won’t work properly:

  • Allow Microsoft Account Connection
  • Allow Sync My Settings

The New Cross Region Disaster Recovery in Windows 365

Windows 365 now has a new optional offer called “Cross Region Disaster Recovery”.

Windows 365 cross region disaster recovery licensing page

As you can see, this new license, which is $4.50 per month per user, provides a really interesting capability. Instead of the standard availability zone failover, this feature provides cross region DR, which is a best practice for all cloud services.

Windows 365 cross region DR creates geographically distant temporary copies of Cloud PCs that can be accessed in the fallback region (the region where you will fail over in the event of a disaster in your primary site).
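For budgeting, the per-user price is easy to annualize. The sketch below uses only the $4.50 per user per month figure quoted above; the user counts are hypothetical, and actual pricing may vary by agreement or region.

```python
from decimal import Decimal

# Price per user per month, as quoted on the licensing page above.
MONTHLY_PRICE_PER_USER = Decimal("4.50")


def annual_cost(users: int, monthly_price: Decimal = MONTHLY_PRICE_PER_USER) -> Decimal:
    """Yearly cost of the add-on for `users` licensed users."""
    return monthly_price * 12 * users


print(annual_cost(1))    # 54.00
print(annual_cost(250))  # 13500.00 (250 is a hypothetical fleet size)
```

Using `Decimal` rather than floats keeps the currency math exact.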

Azure cross-region replication diagram

Once you license the feature, it’s pretty easy to onboard your environment. You will modify your user settings policy to set the fallback region:

Enabling cross region disaster recovery in the Windows 365 user settings page

Once the user license has synchronized and the user has the proper user settings, you will manually activate cross region disaster recovery for them via bulk device actions:

Using Intune bulk device actions to fail a Cloud PC to their DR region

Now, check out the video below where we will cover setting up the user settings policy and triggering a move of the Cloud PC to the fallback region.

After the manual activation, a temporary copy of the Cloud PC (as mentioned earlier) is created using the latest restore point in the fallback region. That means all installed apps, settings, and data move with you.

If an admin deactivates the cross region disaster recovery after the outage event, the temporary Cloud PC is deleted. No applications, settings, data, or other information is preserved from the temporary Cloud PC.

In the event of an outage your RPO and RTO are:

  • RTO of < 4 hours for tenants with fewer than 50,000 Cloud PCs in a region.
  • RPO of < 4 hours
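If you run a DR test, you can compare what you measured against these targets. This is a minimal sketch: the timestamps are hypothetical, and the two constants simply restate the objectives listed above.

```python
from datetime import datetime, timedelta

# Targets restated from the list above.
RPO_TARGET = timedelta(hours=4)
RTO_TARGET = timedelta(hours=4)


def within_target(measured: timedelta, target: timedelta) -> bool:
    """True when the measured window is strictly under the target."""
    return measured < target


# Hypothetical timestamps from a DR exercise.
last_restore_point = datetime(2024, 7, 15, 6, 30)
outage_start = datetime(2024, 7, 15, 9, 0)
recovered_at = datetime(2024, 7, 15, 12, 15)

data_loss_window = outage_start - last_restore_point  # work done since the last restore point
downtime = recovered_at - outage_start                # time until users were working again

print(data_loss_window, within_target(data_loss_window, RPO_TARGET))
print(downtime, within_target(downtime, RTO_TARGET))
```

In this made-up exercise, 2.5 hours of work would be lost and users would be down for 3 hours 15 minutes, so both windows fall inside the targets.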

Devices are restored as quickly as possible, but the amazing thing is that you can target which devices you want restored first. The speed and scale of the restoration process is per region and per tenant. This strategy prioritizes certain devices but doesn’t change the overall RTO for the full environment.

One thing to note: if the fallback region doesn’t have capacity or is unhealthy, the backup Cloud PC won’t be provisioned. The data is still preserved and accessible in the fallback region regardless. You will also want to wait 12-24 hours after assigning the license to give replication time to complete and become ready.
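To see why prioritizing changes who comes back first but not the overall RTO, here is a toy model. Everything in it is hypothetical: the device names, the `priority` field, the batch size, and the minutes per batch are made up for illustration and do not describe how the service schedules restores.

```python
from dataclasses import dataclass


@dataclass
class CloudPC:
    name: str
    priority: int  # hypothetical field: lower number means restore earlier


def completion_minutes(devices: list[CloudPC], per_batch: int, minutes_per_batch: int) -> dict[str, int]:
    """Minute at which each device's batch finishes, restoring in priority order."""
    ordered = sorted(devices, key=lambda d: d.priority)
    return {
        d.name: (i // per_batch + 1) * minutes_per_batch
        for i, d in enumerate(ordered)
    }


fleet = [
    CloudPC("cpc-sales-01", 3),
    CloudPC("cpc-ceo", 1),
    CloudPC("cpc-dev-07", 2),
    CloudPC("cpc-sales-02", 3),
]

# Assumed throughput: 2 devices per batch, 20 minutes per batch.
schedule = completion_minutes(fleet, per_batch=2, minutes_per_batch=20)
for name, minute in schedule.items():
    print(f"{name}: ready at minute {minute}")
print("Whole fleet ready at minute", max(schedule.values()))
```

With a fixed throughput, the last device finishes at the same time whatever the order; the priority only decides which devices land in the early batches.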

Cross Region Disaster Recovery User Experience

When cross region disaster recovery is activated, users will see this on their Cloud PC at their next login:

After the cross region disaster recovery activation is complete, when a user signs in to their Cloud PC they receive a temporary Cloud PC. With this device, they get full user context, including:

  • Configuration
  • Data stored on the local disk
  • User-installed applications up to the RPO for the device.

As I mentioned earlier, once you deactivate the failover, the fallback device is removed. The user returns to their primary device and none of the data saved to the fallback device is kept. Data you stored in OneDrive, cloud apps, etc. will not be impacted.

For context, I did some timing exercises, and it takes about 16-18 minutes from shutdown for the Cloud PC to move to the fallback region.

Another interesting note: it appears that you can currently only access the temporary Cloud PC from the Windows 365 web portal.

If you do connect via the Windows App, you might see some weirdness like this, but feel free to ignore it:

Windows App showing an AVD instance that is part of the cross region DR

Managing Cloud PC Cross Region Disaster Recovery

When you have cross region DR implemented, it’s important to be able to get a full view into your fleet. In the video below, you can see how we leverage Windows 365 reports to get insight into the status of our devices, their failover status, and much more:

Interesting note: when you trigger a failover, the red exclamation point makes you think you broke something:

Final Thoughts

In closing, we already have strong resiliency present in Windows 365 and its SLA. This new capability in Windows 365 provides peace of mind by allowing users to quickly fall back to a secondary region during a DR event.

We also see some real thoughtfulness in being able to prioritize the devices you want to move or move back, which helps account for VIP users. The cost isn’t too terrible at about $54 a year per user ($4.50 x 12) to deliver your BCP strategy in Windows 365. It should be interesting to see how the capability improves and matures within enterprises.


