Failover Clustering articles https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/bg-p/FailoverClustering

New features of Windows Server 2022 Failover Clustering https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/new-features-of-windows-server-2022-failover-clustering/ba-p/2677427 <P><SPAN>Greetings again, Windows Server and Failover Cluster fans! <A href="#" target="_self">John Marlin</A> here, and I own the Failover Clustering feature within the Microsoft product team. In this blog, I will give an overview of the new features in Windows Server 2022 Failover Clustering. Some of these will be talked about at the upcoming <A href="#" target="_self">Windows Server Summit</A>. One note: this particular blog post will not cover the new features of Azure Stack HCI version 21H2. That is another blog for another time.</SPAN></P> <P><SPAN>So let's get this started.</SPAN></P> <CENTER><IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/305303i32CC917AA61E8ACD/image-dimensions/406x260?v=v2" width="406" height="260" alt="WhatsNew.png" /></CENTER> <P><FONT size="5"><STRONG>Clustering Affinity and AntiAffinity</STRONG></FONT></P> <P><SPAN>Affinity is a rule you set up that establishes a relationship between two or more roles (i.e., virtual machines, resource groups, and so on) to keep them together. AntiAffinity is the same, but is used to keep the specified roles apart from each other.</SPAN> We added this in Azure Stack HCI version 20H2 and have now brought it over to Windows Server as well. In previous versions of Windows Server, we only had AntiAffinity capabilities.
This was with the use of <A href="#" target="_self">AntiAffinityClassNames</A> and <A href="#" target="_self">ClusterEnforcedAntiAffinity</A>. We took a look at what we were doing and made it better. Now, not only do we have AntiAffinity, but also Affinity. You can configure the new Affinity and AntiAffinity rules with PowerShell commands, and you have four options:</P> <OL> <LI>Same Fault Domain</LI> <LI>Same Node</LI> <LI>Different Fault Domain</LI> <LI>Different Node</LI> </OL> <CENTER><IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/305304i35FE73E5A603902E/image-size/medium?v=v2&amp;px=400" alt="Affinity.png" /></CENTER> <P>The below doc discusses the feature in more detail, including how to configure it.</P> <P><EM>Cluster Affinity</EM></P> <P><EM><A href="#" target="_blank" rel="noopener">https://docs.microsoft.com/en-us/azure-stack/hci/manage/vm-affinity</A></EM></P> <P>For those that still use AntiAffinityClassNames, we will still honor it, which means upgrading to Windows Server 2022 will not break your existing AntiAffinity configurations.</P>
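<P>As a quick illustration, below is a minimal sketch of the new PowerShell experience, assuming the cmdlets described in the doc linked above (the rule and virtual machine names are hypothetical):</P> <P><FONT color="#0000FF"># Create a rule that keeps two roles apart on different nodes (hypothetical names)<BR />New-ClusterAffinityRule -Name KeepApart -RuleType DifferentNode<BR /><BR /># Add the clustered roles (groups) to the rule<BR />Add-ClusterGroupToAffinityRule -Groups VM1,VM2 -Name KeepApart<BR /><BR /># Verify the rule<BR />Get-ClusterAffinityRule -Name KeepApart</FONT></P>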
<P><FONT size="5"><STRONG>AutoSites</STRONG></FONT></P> <P>AutoSites is another feature brought over from Azure Stack HCI. AutoSites is basically what it says. When you create a Failover Cluster, it will first look into Active Directory to see if Sites are configured. For example:</P> <CENTER><IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/305272i9AF3D9657F8365CC/image-size/large?v=v2&amp;px=999" alt="Sites.png" /></CENTER> <P>If they are, and the nodes are included in a site, we will automatically create site fault domains and put the nodes in the fault domain they are a member of. For example, if you had two nodes in a Redmond site and two nodes in a Seattle site, it would look like this once the cluster is created.</P> <P><IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/305273iDBCA053801380CFF/image-size/medium?v=v2&amp;px=400" alt="Sites1.png" /></P> <P>As you can see, we create each site fault domain with the same name it has in Active Directory.</P> <P>If sites are not configured within Active Directory, we will then look at the networks to see if there are differences, as well as networks common to each other. For example, say you had nodes with this network configuration:</P> <P>Node1 = 1.0.0.11 with subnet 255.0.0.0<BR />Node2 = 1.0.0.12 with subnet 255.0.0.0<BR />Node3 = 172.0.0.11 with subnet 255.255.0.0<BR />Node4 = 172.0.0.12 with subnet 255.255.0.0</P> <P>We will see this as multiple nodes in one subnet and multiple nodes in another. Therefore, these nodes are in separate sites, and we will configure sites for you automatically. With this configuration, the site fault domains are created with the names of the networks. For example:</P> <P><IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/305274i03EA2FB0F0E96DD9/image-size/medium?v=v2&amp;px=400" alt="sites2.png" /></P> <P>This will make things easier when you want to create a stretched Failover Cluster. Please note that Storage Spaces Direct cannot be stretched in Windows Server 2022 as it can be in Azure Stack HCI.</P> <P><STRONG><FONT size="5">Granular Repair</FONT></STRONG></P> <P>Since we just mentioned Storage Spaces Direct, one of its most talked about features is repair. As a refresher, as data is written to drives, it is spread throughout all drives on all the nodes.</P> <CENTER><IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/305306iE2CF2D06D43855F1/image-size/medium?v=v2&amp;px=400" alt="extents.png" /></CENTER> <P>When a node goes down for maintenance, crashes, or whatever the case may be, once it comes back up, a "repair" job runs where data is moved around and, if necessary, onto the drives of the node that came back. A repair is basically a resync of the data between all the nodes, and the longer the node was down, the longer the repair can take to complete. A repair in previous versions would take the extent (block of data), which is normally 1 gigabyte or 256 megabytes in size, and resync it in its entirety. It did not matter how little of the extent had changed (for example, 1 kilobyte); the entire extent was copied.</P> <P>In Windows Server 2022, we have changed this thinking and now work off of "sub-extents". A sub-extent is only a portion of the entire extent. This is normally set at the interleave setting, which is 256 kilobytes. Now, when 1 kilobyte of a 1-gigabyte extent is changed, we will only move around the 256-kilobyte sub-extent. This makes repair times much faster.</P>
<P>One other thing we considered is that when a repair/resync occurs, it can affect production due to the CPU resources it must use. To combat that, we also added the capability to throttle the resources up or down, depending on when the repair is run. For example, if you need a repair/resync to run during production hours, you need the performance of your production workloads to remain up, so you may want to set the speed on low so the repair runs more in the background. However, if you run it overnight on a weekend, you can afford to crank it up to a higher setting so it completes faster.</P> <P>The storage repair speed settings are:</P> <TABLE border="1"> <THEAD> <TR> <TH><STRONG>Setting</STRONG></TH> <TH><STRONG>Queue depth</STRONG></TH> <TH><STRONG>Resource allocation</STRONG></TH> </TR> </THEAD> <TBODY> <TR> <TD>Very low</TD> <TD>1</TD> <TD>Most resources to active workloads</TD> </TR> <TR> <TD>Low</TD> <TD>2</TD> <TD>More resources to active workloads</TD> </TR> <TR> <TD>Medium (default)</TD> <TD>4</TD> <TD>Balances workloads and repairs</TD> </TR> <TR> <TD>High</TD> <TD>8</TD> <TD>More resources to resyncs and repairs</TD> </TR> <TR> <TD>Very high</TD> <TD>16</TD> <TD>Most resources to resyncs and repairs</TD> </TR> </TBODY> </TABLE> <P>For more information regarding resync speeds, please refer to the below article:</P> <P><EM>Adjustable storage repair speed in Azure Stack HCI and Windows Server</EM></P> <P><EM><A href="#" target="_blank" rel="noopener">https://docs.microsoft.com/en-us/azure-stack/hci/manage/storage-repair-speed</A></EM></P>
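<P>If you prefer PowerShell over Windows Admin Center for this, below is a minimal sketch based on my reading of the linked article. The <FONT color="#0000FF">-VirtualDiskRepairQueueDepth</FONT> parameter and the subsystem friendly name are assumptions here, so verify them against the article before relying on this:</P> <P><FONT color="#0000FF"># Check the current repair speed (queue depth) on the clustered storage subsystem<BR />Get-StorageSubSystem -FriendlyName "Clustered Windows Storage*" | FL FriendlyName, VirtualDiskRepairQueueDepth<BR /><BR /># Bump repairs to High (queue depth 8) for an overnight maintenance window<BR />Get-StorageSubSystem -FriendlyName "Clustered Windows Storage*" | Set-StorageSubSystem -VirtualDiskRepairQueueDepth 8</FONT></P>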
<P><FONT size="5"><STRONG>Cluster Shared Volumes and BitLocker</STRONG></FONT></P> <P>Cluster Shared Volumes (CSV) enable multiple nodes in a Windows Server Failover Cluster or Azure Stack HCI to simultaneously have read-write access to the same LUN (disk) that is provisioned as an NTFS volume. BitLocker Drive Encryption is a data protection feature that integrates with the operating system and addresses the threats of data theft or exposure from lost, stolen, or inappropriately decommissioned computers.</P> <P>BitLocker on volumes within a cluster is managed based on how the cluster service "views" the volume to be protected. BitLocker will unlock protected volumes without user intervention by attempting protectors in the following order:</P> <OL> <LI>Clear Key</LI> <LI>Driver-based auto-unlock key</LI> <LI>ADAccountOrGroup protector <UL> <LI>Service context protector</LI> <LI>User protector</LI> </UL> </LI> <LI>Registry-based auto-unlock key</LI> </OL> <P>Failover Clustering requires the Active Directory-based protector option (#3 above) for a cluster disk resource or CSV resources. The encryption protector is a SID-based protector where the account being used is the Cluster Name Object (CNO) that is created in Active Directory. Because it is Active Directory-based, a domain controller must be available in order to obtain the key protector to mount the drive. If a domain controller is not available, or is slow in responding, the clustered drive is not going to mount.</P> <P>With this in mind, we needed a "backup" plan. With Windows Server 2022, when a drive is enabled for BitLocker encryption while it is part of a Failover Cluster, we will now create an additional key protector just for the cluster itself. With this in place, it will still go out to a domain controller first to get the key. If the domain controller is not available, it will then use the locally kept additional key to mount the drive. The default will always be to go to the domain controller first. We have also built in the ability to manually mount a cluster drive using new PowerShell cmdlets and passing the locally kept recovery key.</P> <P>Another benefit is that this opens up the ability to BitLocker drives that are part of a workgroup or cross-domain cluster, where a Cluster Name Object does not exist.</P> <P><STRONG><FONT size="5">SMB Encryption</FONT></STRONG></P> <P>Windows Server 2022 SMB Direct now supports encryption. Previously, enabling SMB encryption disabled direct data placement, making RDMA performance as slow as TCP. Now data is encrypted before placement, leading to relatively minor performance degradation while adding AES-128 and AES-256 protected packet privacy. You can enable encryption using <A href="#" target="_self">Windows Admin Center</A>, <A href="#" target="_self">Set-SmbServerConfiguration</A>, or a Universal Naming Convention (UNC) Hardening group policy. Furthermore, Windows Server Failover Clusters now support granular control of encrypting intra-node storage communications for Cluster Shared Volumes (CSV) and the storage bus layer (SBL). This means that when using Storage Spaces Direct and SMB Direct, you can decide to encrypt the east-west communications within the cluster itself for higher security.</P>
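<P>As a minimal sketch, turning on SMB encryption server-wide with the <FONT color="#0000FF">Set-SmbServerConfiguration</FONT> cmdlet mentioned above looks like this (run on each node; <FONT color="#0000FF">-Force</FONT> simply suppresses the confirmation prompt):</P> <P><FONT color="#0000FF"># Require encryption for all SMB sessions served by this node<BR />Set-SmbServerConfiguration -EncryptData $true -Force</FONT></P>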
<P><STRONG><FONT size="5">New Cluster Resource Types</FONT></STRONG></P> <P><SPAN>Cluster <A href="#" target="_blank" rel="noopener">resources</A> are categorized by type. Failover Clustering defines several types of resources and provides <A href="#" target="_blank" rel="noopener">resource DLLs</A> to manage these types. In Windows Server 2022, we have added three new resource types.</SPAN></P> <P><STRONG>HCS Virtual Machine</STRONG></P> <P><SPAN>Building a great management API for Docker was important for Windows Server Containers. There's a ton of really cool low-level technical work that went into enabling containers on Windows, and we needed to make sure they were easy to use. This seems very simple, but figuring out the right approach was surprisingly tricky. Our first thought was to extend our existing management technologies (e.g. WMI, PowerShell) to containers. After investigating, we concluded that they weren't optimal for Docker, and started looking at other options.</SPAN></P> <P><SPAN>After a bit of thinking, we decided to go with a third option. We created a new management service called the Host Compute Service (HCS), which acts as a layer of abstraction above the low-level functionality. The HCS was a stable API Docker could build upon, and it was also easier to use. Making a Windows Server Container with the HCS is just a single API call. Making a Hyper-V Container instead just means adding a flag when calling into the API.</SPAN></P> <P><SPAN>Looking at the architecture in Linux:</SPAN></P> <CENTER><IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/305435i1847A698628CA94C/image-size/medium?v=v2&amp;px=400" alt="HCS-Linux-Arch.png" /></CENTER> <P>Looking at the architecture in Windows:</P> <CENTER><IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/305429iDE36DE68F80BC472/image-size/medium?v=v2&amp;px=400" alt="HCS-Windows-Arch.png" /></CENTER> <P><SPAN>HCS Virtual Machine lets you create a virtual machine using the HCS APIs rather than the Virtual Machine Management Service (VMMS).</SPAN></P> <P><STRONG>NFS Multi Server Namespace</STRONG></P> <P>If you are not familiar with an NFS Multi Server Namespace, think of a tree with several branches. An NFS Multi Server Namespace allows a single namespace to extend out to multiple servers through the use of a referral. With this referral, you can integrate data from multiple NFS Servers into a single namespace. NFS clients connect to this namespace and are referred to a selected NFS Server from one of its branches.</P> <P><STRONG><FONT size="5">Storage Bus Cache</FONT></STRONG></P> <P>In the interest of full transparency, this one is a little bit of a reach, but Failover Clustering is needed (sort of).</P> <P>The storage bus cache for Storage Spaces on standalone servers can significantly improve read and write performance, while maintaining storage efficiency and keeping the operational costs low. Similar to its implementation for Storage Spaces Direct, this feature binds together faster media (for example, SSD) with slower media (for example, HDD) to create tiers. By default, only a portion of the faster media tier is reserved for the cache.</P> <P>What makes this a bit of a reach is that in order to use storage bus cache, the Failover Clustering feature must be installed, but the machine cannot be a member of a cluster. I.e., add the feature and move on.</P>
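<P>In other words, the setup is roughly the following sketch. The <FONT color="#0000FF">Enable-StorageBusCache</FONT> cmdlet is from the tutorial linked below; treat this as an outline rather than the full procedure:</P> <P><FONT color="#0000FF"># Install the Failover Clustering feature, but do NOT join the server to a cluster<BR />Install-WindowsFeature -Name Failover-Clustering -IncludeManagementTools<BR /><BR /># Enable the storage bus cache on the standalone server<BR />Enable-StorageBusCache</FONT></P>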
<P>More information on Storage Bus Cache can be found here:</P> <P>Tutorial: Enable storage bus cache with Storage Spaces on standalone servers</P> <P><A href="#" target="_blank" rel="noopener">https://docs.microsoft.com/en-us/windows-server/storage/storage-spaces/storage-spaces-storage-bus-cache</A></P> <P>Thanks</P> <P>John Marlin</P> <P>Senior Program Manager</P> <P>Twitter: @Johnmarlin_MSFT</P>

Failover Clustering in Azure https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/failover-clustering-in-azure/ba-p/2554341 <P>Azure is a cloud computing platform with an ever-expanding set of services to help you build solutions to meet your business goals. Azure services range from simple web services for hosting your business presence in the cloud to running fully virtualized computers for your custom software solutions. With over 60 regions globally, 200+ products, and over 17,000 services and applications, Azure has everything you need in a cloud.</P> <P>One of the products that can serve as the compute infrastructure for your service or application is Failover Clustering. A Failover Cluster can be a traditional cluster, or it can be running Storage Spaces Direct. No matter the choice, there are a few configuration changes that must be made post cluster creation to ensure connectivity. Starting in Windows Server 2019 and moving forward, we have added detection into the cluster creation process that will automatically do some of this configuration for you.</P> <P>Let's first talk about the Cluster Network Name. The Cluster Network Name is used to provide an alternate computer name for an entity that exists on a network.
When it is created, it will also create a Cluster IP Address resource that provides an identity to the group, allowing the group to be accessed by network clients. When in Azure, an additional Azure Load Balancer must be created with a separate IP Address so that the name can be reached.</P> <P>In Windows Server 2019 and moving forward, we have added detection during the cluster creation process to see whether the cluster is being created in Azure. A new property has been added to Clustering to help you determine what we have detected. To view it and see the output, the command to run would be:</P> <P><FONT color="#0000FF">Get-Cluster | fl DetectedCloudPlatform</FONT></P> <P><FONT color="#0000FF">DetectedCloudPlatform : Azure</FONT></P> <P>As a side note, if it detects it is on-premises or on any other cloud provider, the response will be <FONT color="#0000FF">None</FONT>.</P> <P>If Azure is detected, there are several configurations it will add, and the first is with the Cluster Name. Instead of the traditional Cluster Name and Cluster IP Address, it will now create the Cluster Name as a distributed network name (DNN) automatically. If you have worked with Scale-Out File Servers (SOFS), it is the same type of distributed name. A Distributed Network Name is a name in the cluster that does not use a clustered IP Address. It is a name that is published in DNS using the IP Addresses of all the nodes in the cluster. Since it uses the IP Addresses of the nodes, a load balancer is not needed. It would look like this from Failover Cluster Manager.</P> <P><IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/296255iA18E168D8FB38CBF/image-size/large?v=v2&amp;px=999" alt="CNO-DNN.png" /></P> <P>As a side note, the automatic creation of the name as a DNN happens only when the machines are in Azure. However, we have added the ability to create it as a DNN on-premises if so desired. When creating the cluster using Failover Cluster Manager or Windows Admin Center on-premises, it will create it with the name and IP Address. However, using PowerShell, you have a new switch, <FONT color="#0000FF"><STRONG>-ManagementPointNetworkType</STRONG></FONT>, that can be used with <FONT color="#0000FF"><A href="#" target="_self"><STRONG>New-Cluster</STRONG></A></FONT> to create it as a DNN. <FONT color="#0000FF"><STRONG>-ManagementPointNetworkType</STRONG></FONT> accepts several values to define the type of name it will be:</P> <P><FONT color="#0000FF">New-Cluster -ManagementPointNetworkType:<EM>x</EM></FONT></P> <P><FONT color="#0000FF">Singleton : <EM>Traditional Cluster Name and Cluster IP Address</EM><BR />Distributed : <EM>Create as DNN and use node IP Addresses</EM><BR />Automatic : <EM>Detect if on-premises or Azure (default)</EM></FONT></P>
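<P>For example, to create a two-node cluster on-premises with a DNN instead of the traditional name and IP Address (the node and cluster names here are just examples):</P> <P><FONT color="#0000FF">New-Cluster -Name Cluster1 -Node Node1,Node2 -ManagementPointNetworkType Distributed</FONT></P>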
<P>Moving on, one of the next things we will change is the network communication thresholds. Communication between nodes is crucial in keeping them up and talking to ensure high availability. As a refresher, you have several settings that control the length of wait times and the number of missed heartbeats before we determine a node to be down and remove it from cluster membership. These are those settings:</P> <TABLE border="1"> <TBODY> <TR> <TD><STRONG>Parameter</STRONG></TD> <TD><STRONG>Windows 2019 / Azure Stack HCI Default</STRONG></TD> <TD><STRONG>Maximum</STRONG></TD> </TR> <TR> <TD>SameSubnetDelay</TD> <TD>1 second</TD> <TD>2 seconds</TD> </TR> <TR> <TD>SameSubnetThreshold</TD> <TD>10 heartbeats</TD> <TD>120 heartbeats</TD> </TR> <TR> <TD>CrossSubnetDelay</TD> <TD>1 second</TD> <TD>4 seconds</TD> </TR> <TR> <TD>CrossSubnetThreshold</TD> <TD>20 heartbeats</TD> <TD>120 heartbeats</TD> </TR> <TR> <TD>CrossSiteDelay</TD> <TD>1 second</TD> <TD>4 seconds</TD> </TR> <TR> <TD>CrossSiteThreshold</TD> <TD>20 heartbeats</TD> <TD>120 heartbeats</TD> </TR> </TBODY> </TABLE> <P><SPAN>It is important to understand that the delay and the threshold have a cumulative effect on total health detection. For example, setting <STRONG>SameSubnetDelay</STRONG> to send a heartbeat every 1 second and setting the <STRONG>SameSubnetThreshold</STRONG> to 10 missed heartbeats before taking recovery action means the cluster has a total network tolerance of 10 seconds before recovery action is taken. The higher the numbers, the longer it will take to detect that a node is not responding. In general, continuing to send frequent heartbeats but allowing greater thresholds is the preferred method. The primary scenario for increasing the delay is if there are ingress/egress charges for data sent between nodes. When we detect that the cluster is in Azure, we will automatically increase the thresholds to their maximum values.</SPAN></P>
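<P>Since these are all cluster common properties, you can check what was configured and adjust the values yourself. A quick sketch:</P> <P><FONT color="#0000FF"># View the current heartbeat delays and thresholds<BR />Get-Cluster | fl *Delay, *Threshold<BR /><BR /># Example: raise the same-subnet threshold to its maximum<BR />(Get-Cluster).SameSubnetThreshold = 120</FONT></P>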
<P><SPAN>Please refer to the <A href="https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/tuning-failover-cluster-network-thresholds/ba-p/371834" target="_self">Tuning Failover Cluster Network Thresholds</A> blog for more detail on changing these values.</SPAN></P> <P>The last thing I want to talk about is Azure host maintenance. Maintenance on a compute host is something you cannot get around, as patches, driver/firmware updates, and so on need to be done periodically. The same goes for hosts in Azure or any other cloud provider. So what to do with the virtual machines running on those hosts is something that needs to be considered by the Azure administrators. There are basically only two things they can do: leave the VMs where they are, or move them off. The decision to move or stay can come down to how long the maintenance will take and whether it needs a reboot. No matter how quick it may be to apply, if a reboot is needed, the VMs are going to be moved off. However, if the maintenance being done doesn't need a reboot and is quick, the virtual machine is simply frozen.</P> <P>As a client, you may very well never know anything happened, and that is the goal. But there could be times when you notice it: you cannot connect, you are hung, a cluster node drops out of membership, etc. From a client perspective, there is no way of knowing what happened. You must trust that the administrators have no issues and make the right decisions.</P> <P>But what if you, as an administrator, received a heads-up of impending host maintenance and could make the decision yourself? Well, that leads to the other new feature we added. With Windows Server 2019, we added integration and awareness of Azure host maintenance and improved the experience by monitoring for Azure Scheduled Events. For this to fully work, all clustered VMs must be in the same Azure Availability Zone. When a host has maintenance scheduled, we will now detect it and write an event into the virtual machine's FailoverClustering/Operational channel. We have also included actions that you can configure based on the event.</P> <P>First, let's talk about the events you could see. This is an example of one of those events.</P> <P>Log: FailoverClustering/Operational<BR />Level: Warning<BR />Event ID: 1139<BR />symbol="NODE_MAINTENANCE_DETECTED"<BR />Description: The cluster service has detected an Azure host maintenance event has been scheduled. This maintenance event may cause the node hosting the virtual machine to become unavailable during this time.</P> <P>Node: VMNode1<BR />Approximate Time: 2021/07/16-17:30:00.000<BR />Details: ' EventId = 4FE57A76-7754-48FD-9B45-48387A36CD19<BR />EventStatus = Scheduled Event<BR />Type = Freeze Resource<BR />Type = VirtualMachine</P> <P>As you can see, this event triggered because a host maintenance event was scheduled. It provides several other things of interest:</P> <P>1. The time the event is to occur<BR />2. The event ID, which someone from the Azure team could look up if a support ticket were raised<BR />3. What it will do with the virtual machine</P>
<P>There are actually three events you could see:</P> <P>Event ID 1136: Host maintenance is imminent and about to occur<BR />Event ID 1139: Host maintenance has been detected<BR />Event ID 1140: Host maintenance has been rescheduled</P> <P>Now that you have the events, the next thing is to decide if you want to define an action. We have created two new cluster properties: <STRONG>DetectManagedEvents</STRONG> and <STRONG>DetectManagedEventsThreshold</STRONG>. <STRONG>DetectManagedEvents</STRONG> is the action you wish to have occur when a scheduled event is detected. <STRONG>DetectManagedEventsThreshold</STRONG> is the amount of time before that action is taken. The options for each of these are as follows:</P> <P>DetectManagedEvents</P> <P>0 = Do not log Azure Scheduled Events <FONT color="#FF0000"><EM>&lt;-- default for on-premises</EM></FONT><BR />1 = Log Azure Scheduled Events <FONT color="#FF0000"><EM>&lt;-- default in Azure</EM></FONT><BR />2 = Avoid placement (don't move roles to this node)<BR />3 = Pause and drain when a Scheduled Event is detected<BR />4 = Pause, drain, and fail back when a Scheduled Event is detected</P> <P>DetectManagedEventsThreshold</P> <P>60 seconds <EM><FONT color="#FF0000">&lt;-- default</FONT></EM><BR />Amount of time before taking action</P>
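<P>Since these are standard cluster properties, defining an action is a one-line change. For example, to have a node pause, drain, and fail back around scheduled maintenance, and to act two minutes ahead of the event:</P> <P><FONT color="#0000FF">(Get-Cluster).DetectManagedEvents = 4<BR />(Get-Cluster).DetectManagedEventsThreshold = 120</FONT></P>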
<P><STRONG>Note:</STRONG> <EM>These settings only apply when the virtual machine is in Azure. They do not take effect on any other platform (i.e. a third-party cloud provider, Hyper-V, Azure Stack Hub/HCI/Edge, etc.).</EM></P> <P>In closing, we recognized that there are some configurations needed when a Failover Cluster is in Azure. By adding these new features, we have taken some of that burden away from you as an administrator by automatically making these changes for you.</P> <P>Thanks</P> <P>John Marlin</P> <P>Senior Program Manager</P> <P>Twitter: @Johnmarlin_MSFT</P>

Security Settings for Failover Clustering https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/security-settings-for-failover-clustering/ba-p/2544690 <P>Security is at the forefront of many administrators' minds, and with Failover Clustering, we made some security improvements in Windows Server 2019 and Azure Stack HCI.</P> <P>Since the beginning of time, Failover Clustering has always had a dependency on NTLM authentication. As the versions came and went, a little more of this dependency was removed. Now, with Windows Server 2019 Failover Clustering, we have finally removed all of these dependencies. <SPAN>Instead, Kerberos and certificate-based authentication are used exclusively. There are no changes required by the user, or deployment tools, to take advantage of this security enhancement. It also allows failover clusters to be deployed in environments where NTLM has been disabled.</SPAN></P> <P>This goes for everything from the bootstrapping of the cluster to the starting of the resources and drives. With the bootstrapping process, an Active Directory domain controller is also no longer needed. As explained in this <A href="https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/so-what-exactly-is-the-cliusr-account/ba-p/388832" target="_self">blog</A>, we have a local user account (CLIUSR) that is now used for various things. Using this account in conjunction with certificates:</P> <OL> <LI>The Cluster Service starts and forms the cluster</LI> <LI>Other nodes join the cluster</LI> <LI>Drives (including Cluster Shared Volumes) come online</LI> <LI>Groups and resources start coming online</LI> </OL> <P>This is especially beneficial if you have a virtualized domain controller running on the cluster, preventing the "chicken or the egg" scenario.</P> <P>Another security concern that administrators have is what is out on the wire. There are a couple of security settings to consider with regards to communications between the nodes and storage. From a storage perspective, there is Cluster Shared Volume (CSV) traffic for any redirected data, and Storage Bus Layer (SBL) traffic if using Storage Spaces Direct.</P> <P>Let's first talk about cluster communications. Cluster communications could contain any number of things, and what an admin would like is to prevent anything from picking them up on the network. As a default, all communication between the nodes is sent signed, making use of certificates. This may be fine when all the cluster nodes reside in the same rack. However, when nodes are separated in different racks or locations, an admin may wish to have a little more security and make use of encryption.</P> <P><IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/295612iBA79992D0D58CF4F/image-size/medium?v=v2&amp;px=400" alt="SecurityLevel.png" /></P> <P>This setting is controlled by the cluster property <STRONG>SecurityLevel</STRONG> and has three different levels:</P> <P>0 = Clear Text</P> <P>1 = Signed <EM>(default)</EM></P> <P>2 = Encrypted</P> <P>If the desire is to change this to encrypted communications, the command to run would be:</P> <P><FONT color="#0000FF">(Get-Cluster).SecurityLevel = 2</FONT></P> <P>The other bit of communication between the nodes is with the storage. Cluster Shared Volumes (CSV) have traffic on the wire, and if you are using Storage Spaces Direct, you also have the Storage Bus Layer (SBL) traffic. For this traffic, the default is to send everything in clear text. Admins may decide they wish to secure this type of data traffic to lock it down and prevent sniffer traces from picking anything up.</P> <P><IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/295613i9E6EC118B3979478/image-size/medium?v=v2&amp;px=400" alt="SecurityLevelToStorage.png" /></P>
src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/295613i9E6EC118B3979478/image-size/medium?v=v2&amp;px=400" role="button" title="SecurityLevelToStorage.png" alt="SecurityLevelToStorage.png" /></span></P> <P>&nbsp;</P> <P>This setting is controlled by the Cluster property <STRONG>SecurityLevelToStorage</STRONG> and has three different levels.</P> <P>&nbsp;</P> <P class="lia-indent-padding-left-30px">1 = Clear Text <EM>(default)</EM></P> <P class="lia-indent-padding-left-30px">2 = Both CSV and SBL traffic are signed</P> <P class="lia-indent-padding-left-30px">3 = Both CSV and SBL traffic are encrypted</P> <P>&nbsp;</P> <P>If the desire is to change this to encrypted communications, the command to run would be:</P> <P>&nbsp;</P> <P class="lia-indent-padding-left-30px"><FONT color="#0000FF">(Get-Cluster).SecurityLevelToStorage = 3</FONT></P> <P class="lia-indent-padding-left-30px">&nbsp;</P> <P><FONT color="#000000">One caveat to the <STRONG>SecurityLevel</STRONG> and <STRONG>SecurityLevelToStorage</STRONG> that must be taken into consideration.&nbsp; These forms of communication are using SMB.&nbsp; When using a form of encryption on the network with SMB, RDMA is not used.&nbsp; Therefore, if you are using this on RDMA network cards, RDMA is not used and can cause a performance impact.&nbsp; Microsoft is aware of this impact and working on correcting this for a later version.&nbsp; For more information on this, please refer to the following document.</FONT></P> <P>&nbsp;</P> <P><FONT color="#000000">Reduced networking performance after you enable SMB Encryption or SMB Signing in Windows Server 2016</FONT></P> <P><FONT color="#000000"><A href="#" target="_blank">Reduced performance after SMB Encryption or SMB Signing is enabled - Windows Server | Microsoft Docs</A></FONT></P> <P>&nbsp;</P> <P><FONT color="#000000">Thanks</FONT></P> <P><FONT color="#000000">John Marlin</FONT></P> <P><FONT color="#000000">Senior Program Manager</FONT></P> <P><FONT color="#000000">Twitter:&nbsp;@Johnmarlin_MSFT</FONT></P> Wed, 14 Jul 2021 21:47:46 GMT https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/security-settings-for-failover-clustering/ba-p/2544690 John Marlin 2021-07-14T21:47:46Z Failover Clustering Networking Basics and Fundamentals https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/failover-clustering-networking-basics-and-fundamentals/ba-p/1706005 <P>My name is <A href="#" target="_blank" rel="noopener">John Marlin</A> and I am with the High Availability and Storage Team.&nbsp; With newer versions of Windows Server and Azure Stack HCI on the horizon, it’s time to head to the archives and dust off some old information as they are in need of updating.</P> <P>&nbsp;</P> <P><span class="lia-inline-image-display-wrapper lia-image-align-center" image-alt="man-blowing-off-dust.jpg" style="width: 400px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/221677i2293B86CC39CDBA6/image-size/medium?v=v2&amp;px=400" role="button" title="man-blowing-off-dust.jpg" alt="man-blowing-off-dust.jpg" /></span></P> <P>&nbsp;</P> <P>In this blog, I want to talk about Failover Clustering and Networking. Networking is a fundamental key with Failover Clustering that sometimes is overlooked but can be the difference in success or failure. 
In this blog, I will be hitting on all facets, from the basics and tweaks to multi-site/stretch and Storage Spaces Direct. By no means should this be taken as a "this is a networking requirement" blog. Treat this as more of general guidance with some recommendations and things to consider. Specific requirements for any of our operating systems (new or old) are part of the documentation (<A href="#" target="_blank" rel="noopener">https://docs.microsoft.com</A>) of the particular OS.</P> <P>In Failover Clustering, all networking aspects are provided by our Network Fault Tolerant (NetFT) adapter. The NetFT adapter is a virtual adapter that is created when the cluster is created. There is no configuration necessary, as it is self-configuring. When it is created, it will create its MAC address based on a hash of the MAC address of the first physical network card. It does have conflict detection and resolution built in. For the IP address scheme, it will create itself an APIPA IPv4 (169.254.*) and IPv6 (fe80::*) address for communication.</P> <P><FONT size="2">Connection-specific DNS Suffix . :<BR />Description . . . . . . . . . . . : Microsoft Failover Cluster Virtual Adapter<BR />Physical Address. . . . . . . . . : 02-B8-FA-7F-A5-F3<BR />DHCP Enabled. . . . . . . . . . . : No<BR />Autoconfiguration Enabled . . . . : Yes<BR />Link-local IPv6 Address . . . . . : fe80::80ac:e638:2e8d:9c09%4(Preferred)<BR />IPv4 Address. . . . . . . . . . . : 169.254.1.143(Preferred)<BR />Subnet Mask . . . . . . . . . . . : 255.255.0.0<BR />Default Gateway . . . . . . . . . :<BR />DHCPv6 IAID . . . . . . . . . . . : 67287290<BR />DHCPv6 Client DUID. . . . . . . . : 00-01-00-01-26-6B-52-A5-00-15-5D-31-8E-86<BR />NetBIOS over Tcpip. . . . . . . . : Enabled</FONT></P> <P>The NetFT adapter provides the communications between all nodes in the cluster for the Cluster Service. To do this, it discovers multiple communication paths between nodes and determines whether the routes are on the same subnet or cross subnet. It does this through "heartbeats" sent over all cluster-enabled network adapters to all other nodes.
Heartbeats basically serve multiple purposes:</P> <OL> <LI>Is this a viable route between the nodes?</LI> <LI>Is this route currently up?</LI> <LI>Is the node being connected to up?</LI> </OL> <P>There is more to heartbeats, but I will defer to my other blog, <A href="https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/no-such-thing-as-a-heartbeat-network/ba-p/388121" target="_blank" rel="noopener">No Such Thing as a Heartbeat Network</A>, for more details.</P> <P>For cluster communication and heartbeats, there are several considerations that must be taken into account:</P> <OL> <LI>Traffic uses port 3343. Ensure any firewall rules have this port open for both TCP and UDP.</LI> <LI>Most cluster traffic is lightweight.</LI> <LI>Communication is sensitive to latency and packet loss. Latency delays could mean performance issues, including removal of nodes from membership.</LI> <LI>Bandwidth is not as important as quality of service.</LI> </OL> <P>Cluster communication between nodes is crucial so that all nodes stay in sync. Cluster communication is constantly going on as things progress. The NetFT adapter will dynamically switch intra-cluster traffic to another available cluster network if one goes down or isn't responding.</P> <P>The communication from the Cluster Service to other nodes through the NetFT adapter looks like this.</P> <P><IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/221678i429BA52B9DF52678/image-size/medium?v=v2&amp;px=400" alt="netft-arch.png" /></P> <UL> <LI>The Cluster Service plumbs network routes over NIC1, NIC2 on NetFT</LI> <LI>The Cluster Service establishes a TCP connection over the NetFT adapter using the private NetFT IP address (source port 3343)</LI> <LI>NetFT wraps the TCP connection inside of a UDP packet (source port 3343)</LI> <LI>NetFT sends this UDP packet over one of the cluster-enabled physical NIC adapters to the destination node, targeting the destination node's NetFT adapter</LI> <LI>The destination node's NetFT adapter receives the UDP packet and then hands the TCP connection to the destination node's Cluster Service</LI> </UL> <P>Heartbeats always traverse all cluster-enabled adapters and networks. However, cluster communication will only go through one network at a time. The network it will use is determined by the role of the network and the priority (metric).</P> <P>There are three roles a cluster has for networks:</P> <P><STRONG>Disabled for Cluster Communications</STRONG> – Role 0 – This is a network that the cluster will not use for anything.</P> <P><STRONG>Enabled for Cluster Communication only</STRONG> – Role 1 – Internal cluster communication and Cluster Shared Volume traffic (more later) use this type of network as a priority.</P> <P><STRONG>Enabled for client and cluster communication</STRONG> – Role 3 – This network is used for all client access and cluster communications, for items like talking to a domain controller, DNS, or DHCP (if enabled) when Network Names and IP Addresses come online. Cluster communication and Cluster Shared Volume traffic could use this network if all Role 1 networks are down.</P>
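<P>The role is exposed as the <STRONG>Role</STRONG> property on each cluster network, so you can review and change it with PowerShell. A quick sketch (the network name here is just an example):</P> <P><FONT color="#3366FF">PS &gt; Get-ClusterNetwork | ft Name, Role, Metric<BR /># Set a network to cluster communication only (Role 1)<BR />PS &gt; (Get-ClusterNetwork "Cluster Network 2").Role = 1</FONT></P>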
<P>Based on the roles, the NetFT adapter will create metrics for priority. The metric Failover Clustering uses is not the same as the network card metrics that TCP/IP assigns. Networks are given a "cost" (metric) to define priority. A lower metric value means a higher priority, while a higher metric value means a lower priority.</P> <P>These metrics are automatically configured based on the cluster network role setting:</P> <P>Cluster network Role of 1 = 40,000 starting value<BR />Cluster network Role of 3 = 80,000 starting value</P> <P>Things such as link speed, RDMA, and RSS capabilities will reduce the metric value. For example, let's say I have two networks in my cluster, with one selected for cluster communications only and one for both cluster/client. I can run the following to see the metrics.</P> <P><FONT color="#3366FF">PS &gt; Get-ClusterNetwork | ft Name, Metric</FONT></P> <P><FONT color="#3366FF">Name                 Metric<BR />----                 ------<BR />Cluster Network 1    70240<BR />Cluster Network 2    30240</FONT></P> <P>The NetFT adapter is also capable of taking advantage of SMB Multichannel and load balancing across the networks. For NetFT to take advantage of it, the metrics need to be &lt; 16 metric values apart. In the example above, SMB Multichannel would not be used. But if there were additional cards in the machines and it looked like this:</P> <P><FONT color="#3366FF">PS &gt; Get-ClusterNetwork | ft Name, Metric</FONT></P> <P><FONT color="#3366FF">Name                 Metric<BR />----                 ------<BR />Cluster Network 1    70240<BR />Cluster Network 2    30240<BR />Cluster Network 3    30241<BR />Cluster Network 4    30245<BR />Cluster Network 5    30265</FONT></P> <P>In a configuration such as this, SMB Multichannel would be used over Cluster Networks 2, 3, and 4.
From a cluster communication and heartbeat standpoint, multichannel really isn't a big deal. However, when a cluster is using Cluster Shared Volumes or is a Storage Spaces Direct cluster, storage traffic is going to need higher bandwidth. SMB Multichannel would fit nicely here, so an additional network card or higher-speed network cards are certainly a consideration.</P> <P>In the beginning of the blog, I mentioned latency and packet loss. If heartbeats cannot get through in a timely fashion, node removals can happen. Heartbeats can be tuned in the case of higher-latency networks. The following are the default settings for tuning the cluster networks.</P> <TABLE border="1"> <TBODY> <TR> <TD><STRONG>Parameter</STRONG></TD> <TD><STRONG>Windows 2012 R2</STRONG></TD> <TD><STRONG>Windows 2016</STRONG></TD> <TD><STRONG>Windows 2019</STRONG></TD> </TR> <TR> <TD>SameSubnetDelay</TD> <TD>1 second</TD> <TD>1 second</TD> <TD>1 second</TD> </TR> <TR> <TD>SameSubnetThreshold</TD> <TD>5 heartbeats</TD> <TD>10 heartbeats</TD> <TD>20 heartbeats</TD> </TR> <TR> <TD>CrossSubnetDelay</TD> <TD>1 second</TD> <TD>1 second</TD> <TD>1 second</TD> </TR> <TR> <TD>CrossSubnetThreshold</TD> <TD>5 heartbeats</TD> <TD>10 heartbeats</TD> <TD>20 heartbeats</TD> </TR> <TR> <TD>CrossSiteDelay</TD> <TD>N/A</TD> <TD>1 second</TD> <TD>1 second</TD> </TR> <TR> <TD>CrossSiteThreshold</TD> <TD>N/A</TD> <TD>20 heartbeats</TD> <TD>20 heartbeats</TD> </TR> </TBODY> </TABLE> <P>For more information on these settings, please refer to the <A href="https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/tuning-failover-cluster-network-thresholds/ba-p/371834" target="_blank" rel="noopener">Tuning Failover Cluster Network Thresholds</A> blog.</P> <P>Planning networks for Failover Clustering is dependent on how the cluster will be used.
Let's take a look at some of the common network traffic a cluster could have.</P> <P><IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/221679iC638D736C51A45CB/image-size/large?v=v2&amp;px=999" alt="Network1.png" /></P> <P>If this were a Hyper-V cluster running virtual machines and Cluster Shared Volumes, Live Migrations are going to occur. Clients are also connecting to the virtual machines.</P> <P>Cluster communications and heartbeating will always be on the wire. If you are using Cluster Shared Volumes (CSV), there will be some redirection traffic.</P> <P>If this were a cluster that used iSCSI for its storage, you would have that as a network.</P> <P>If this were stretched (nodes in multiple sites), you may need an additional network for replication traffic (such as Storage Replica).</P> <P>If this is a Storage Spaces Direct cluster, additional traffic for the Storage Bus Layer (SBL) needs to be considered.</P> <P>As you can see, there are a lot of varied network traffic requirements depending on the type of cluster and the roles running. Obviously, you cannot have a dedicated network or network card for each, as that just isn't always possible.</P> <P>We do have a blog that will help with getting some of the Live Migration traffic isolated or limited in the bandwidth it uses. The blog <A href="https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/optimizing-hyper-v-live-migrations-on-an-hyperconverged/ba-p/396609" target="_blank" rel="noopener">Optimizing Hyper-V Live Migrations on an Hyperconverged Infrastructure</A> goes over some tips for setting it up.</P> <P>The last thing I wanted to talk about is stretch/multisite Failover Clusters. I have already mentioned the cluster-specific networking considerations, but now I want to talk about how virtual machines react in this type of environment.</P> <P>Let's say we have two datacenters and a four-node Failover Cluster with two nodes in each datacenter. As with most datacenters, each is in its own subnet, similar to this:</P> <P><IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/221681i10C3B2FD40AB0C4F/image-size/large?v=v2&amp;px=999" alt="stretch1.png" /></P> <P><SPAN>The first thing you want to consider is whether you want security between the cluster nodes on the wire. As a default, all cluster communication is signed. That may be fine for some, but others wish to have that extra level of security. We can set the cluster to encrypt all traffic between the nodes. It is simply a PowerShell command to change it.
Once you change it, the Cluster as a whole needs to be restarted.</SPAN></P> <P>&nbsp;</P> <P class="lia-indent-padding-left-30px"><FONT color="#3366FF">PS &gt; (Get-Cluster).SecurityLevel = 2</FONT></P> <P class="lia-indent-padding-left-30px">&nbsp;</P> <P class="lia-indent-padding-left-30px"><FONT color="#3366FF">0 = Clear Text</FONT></P> <P class="lia-indent-padding-left-30px"><FONT color="#3366FF">1 = Signed (default)</FONT></P> <P class="lia-indent-padding-left-30px"><FONT color="#3366FF">2 = Encrypt (slight performance decrease)</FONT></P> <P>&nbsp;</P> <P>Here is a virtual machine (VM1) that has an IP Address on the 1.0.0.0/8 network and clients are connecting to it. If the virtual machine moves over to Site2, which is a different network (172.0.0.0/16), there will not be any connectivity as it stands.</P> <P>&nbsp;</P> <P>To get around this, there are basically a couple of options.</P> <P>&nbsp;</P> <P>To prevent the virtual machine from moving due to a Cluster-initiated move (e.g., drain, node shutdown), consider using <A href="#" target="_blank" rel="noopener">sites</A>. When you create sites, the Cluster now has site awareness. This means that any Cluster-initiated move will always keep resources in the same site. Setting a preferred site will also keep it in the same site. If the virtual machine were ever to move to the second site, it would be due to a user-initiated move (e.g., Move-ClusterGroup) or a site failure.</P> <P>&nbsp;</P> <P>But you still have the IP Address of the virtual machine to deal with. During a migration of the virtual machine, one of the very last steps is registering the name and IP Address with DNS. If you are using a static IP Address for the virtual machine, a script would need to be manually run to change the IP Address to the local site it is on. If you are using DHCP, with DHCP servers in each site, the virtual machine will obtain a new address for the local site and register it. You then have to deal with DNS replication and the TTL of records a client may have. Instead of waiting for the timeout periods, a forced replication and a clearing of the TTL record on the client side would allow clients to connect again (a quick sketch of the client-side piece follows a bit further down).</P> <P>&nbsp;</P> <P>If you do not wish to go that route, a virtual LAN (VLAN) could be set up across the routers/switches so there is a single IP Address scheme. Doing this removes the need to change the IP Address of the virtual machine as it will always remain the same. However, stretching a VLAN (not a recommendation by Microsoft) is not always easy to do and the Networking Group within your company may not want to do this for various reasons.&nbsp;&nbsp;</P> <P>&nbsp;</P> <P>Another consideration is implementing a network device on the network that has a third IP Address that clients connect to; it holds the actual IP Address of the virtual machine so it will route clients appropriately. For example:</P> <P>&nbsp;</P> <P><span class="lia-inline-image-display-wrapper lia-image-align-center" image-alt="stretch2.png" style="width: 482px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/221682i3F64B327DA54B026/image-size/large?v=v2&amp;px=999" role="button" title="stretch2.png" alt="stretch2.png" /></span></P> <P>&nbsp;<SPAN style="font-family: inherit;">In the above example, we have a network device that holds the IP Address of the virtual machine as 30.0.30.1. It will register this with all DNS and will keep the same IP Address no matter which site the virtual machine is on. Your Networking Group would need to be involved with this and would need to control it; whether this can even be done within your network is something to consider as well.</SPAN></P>
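<P>&nbsp;</P> <P>Coming back to the DHCP/DNS option for a moment: rather than waiting out the TTL, the client-side resolver cache can be flushed and the virtual machine can be told to re-register its record. A minimal sketch (the first command runs on the client, the second inside the virtual machine):</P> <P class="lia-indent-padding-left-30px"><FONT color="#3366FF">PS &gt; Clear-DnsClientCache&nbsp;&nbsp;&nbsp;# on the client, clears cached DNS records</FONT></P> <P class="lia-indent-padding-left-30px"><FONT color="#3366FF">PS &gt; Register-DnsClient&nbsp;&nbsp;&nbsp;&nbsp;# in the VM, re-registers its name and IP Address in DNS</FONT></P>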
<P>&nbsp;</P> <P>We talked about virtual machines, but what about other resources, say, a file server?&nbsp; Unlike virtual machine roles, roles such as a file server have a Network Name and IP Address resource in the Cluster. In Windows 2008 Failover Cluster, we added the concept of “or” dependencies. Meaning, we can depend on this "or" that.</P> <P>&nbsp;</P> <P><span class="lia-inline-image-display-wrapper lia-image-align-center" image-alt="Or.png" style="width: 550px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/221683iDAA2590D95E0F7BE/image-size/large?v=v2&amp;px=999" role="button" title="Or.png" alt="Or.png" /></span></P> <P><SPAN style="font-family: inherit;">In the case of the scenario above, your Network Name could be dependent on 1.0.0.50 “or” 172.0.0.50. As long as one of the IP Address resources is online, the name is online and is what is published in DNS. To go a step further for the stretch scenario, we have two parameters that can be used.</SPAN></P> <P>&nbsp;</P> <P><STRONG>RegisterAllProvidersIP</STRONG>: (default = 0 for FALSE)&nbsp;&nbsp;&nbsp;&nbsp;</P> <P class="lia-indent-padding-left-30px">&nbsp;</P> <UL> <LI>Determines if all IP Addresses for a Network Name will be registered in DNS</LI> <LI>TRUE (1): IP Addresses can be online or offline and will still be registered</LI> <LI>Ensure the application is set to try all IP Addresses so clients can connect more quickly</LI> <LI>Not supported by all applications; check with the application vendor</LI> <LI>Supported by SQL Server starting with SQL Server 2012</LI> </UL> <P class="lia-indent-padding-left-30px">&nbsp;</P> <P><STRONG>HostRecordTTL</STRONG>: (default = 1200 seconds)</P> <P class="lia-indent-padding-left-30px">&nbsp;</P> <UL> <LI>Controls how long the DNS record for a cluster network name lives in the client cache</LI> <LI>Shorter TTL: DNS records for clients are updated sooner</LI> <LI><EM>Disclaimer: This does not speed up DNS replication</EM></LI> </UL> <P>&nbsp;</P> <P>By manipulating these parameters, clients will have quicker connection times. For example, say I want to register all of the IP Addresses in DNS, but I want the TTL to be 5 minutes (300 seconds). I would run the commands:</P> <P>&nbsp;</P> <P class="lia-indent-padding-left-30px"><FONT color="#3366FF">PS &gt; Get-ClusterResource FSNetworkName | Set-ClusterParameter RegisterAllProvidersIP 1</FONT></P> <P class="lia-indent-padding-left-30px">&nbsp;</P> <P class="lia-indent-padding-left-30px"><FONT color="#3366FF">PS &gt; Get-ClusterResource FSNetworkName | Set-ClusterParameter HostRecordTTL 300</FONT></P> <P>&nbsp;</P> <P>When setting these parameters, recycling (offline/online) the resource is needed for them to take effect. A short sketch that also covers the “or” dependency follows below.</P>
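<P>&nbsp;</P> <P>As a minimal, illustrative sketch, the “or” dependency itself can be set the same way, and the Network Name can then be recycled. FSNetworkName and the IP Address resource names here are examples; the bracketed names in the dependency expression must match your actual resource names exactly:</P> <P class="lia-indent-padding-left-30px"><FONT color="#3366FF">PS &gt; Set-ClusterResourceDependency -Resource FSNetworkName -Dependency "[IP Address 1.0.0.50] or [IP Address 172.0.0.50]"</FONT></P> <P class="lia-indent-padding-left-30px"><FONT color="#3366FF">PS &gt; Stop-ClusterResource FSNetworkName</FONT></P> <P class="lia-indent-padding-left-30px"><FONT color="#3366FF">PS &gt; Start-ClusterResource FSNetworkName</FONT></P>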
<P>&nbsp;</P> <P>There is more I could go into here with this subject, but I need to sign off for now. I hope this gives you some basics to consider when designing your Clusters while thinking through the networking aspects. Networking designs and considerations must be carefully thought out.</P> <P>&nbsp;</P> <P>Happy Clustering !!</P> <P>&nbsp;</P> <P>John Marlin</P> <P>Senior Program Manager</P> <P>High Availability and Storage</P> <P>Follow me on Twitter: <A href="#" target="_blank" rel="noopener">@johnmarlin_msft</A></P> Thu, 24 Sep 2020 01:19:20 GMT https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/failover-clustering-networking-basics-and-fundamentals/ba-p/1706005 John Marlin 2020-09-24T01:19:20Z Disaster Recovery in the next version of Azure Stack HCI https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/disaster-recovery-in-the-next-version-of-azure-stack-hci/ba-p/1027898 <P>Disaster can hit at any time.&nbsp; When thinking about disaster and recovery, I think of 3 things:</P> <P>&nbsp;</P> <OL> <LI>Be prepared</LI> <LI>Plan on not involving humans</LI> <LI>Automatic, not automated</LI> </OL> <P>Having a good strategy is a must.&nbsp; You want resources to automatically move out of one datacenter to the other without having to rely on someone to "flip the switch" to get things to move.</P> <P>&nbsp;</P> <P>As announced at <A href="#" target="_blank" rel="noopener">Microsoft Ignite 2019</A>, we are now going to be able to stretch Azure Stack HCI systems between multiple sites in the next version for disaster recovery purposes.&nbsp; This blog will give you some insight into how you can set it up along with videos showing how it works.</P> <P><EM>&nbsp;</EM></P> <P>To set this up, the configuration I am using is basic and common to how many networks are configured.</P> <P>&nbsp;</P> <P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Stretch-01-Config.png" style="width: 999px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/158540i0F9A1858D4F6B3B0/image-size/large?v=v2&amp;px=999" role="button" title="Stretch-01-Config.png" alt="Stretch-01-Config.png" /></span></P> <P>&nbsp;</P> <P>As you can see from the above, I have:</P> <P>&nbsp;</P> <UL> <LI>Two sites (Seattle and Redmond)</LI> <LI>Two nodes in each site</LI> <LI>A domain controller in each site</LI> <LI>Different IP Address schemes at each site</LI> <LI>Each site goes through a router to connect to the other site</LI> </UL> <P>&nbsp;</P> <P>When putting this scenario together, we considered multiple things.&nbsp; One of the main considerations is ease of configuration.&nbsp; In the past, setting up a Failover Cluster in a stretched environment could result in some inadvertent misconfigurations.&nbsp; We wanted to ensure, where we could, that misconfigurations are averted.&nbsp; We will be using Storage Replica as our replication method between the sites.&nbsp; Everything you need from a software perspective for this scenario will be in-box.</P> <P>&nbsp;</P> <P>One of the first things we looked at was the sites themselves.&nbsp; What we are doing is detecting if nodes are in different sites when Failover Clustering is first created.&nbsp; We do this with two different methods.</P> <P>&nbsp;</P> <OL> <LI>Sites are configured in Active Directory</LI> <LI>Nodes being added have different IP Address schemes</LI> </OL> <P>&nbsp;</P> <P>If we see sites are configured in Active Directory, we will create a site fault domain with the name of the site and add nodes to this fault domain.&nbsp; If sites are not configured but the nodes are in different IP Address schemes, we will create a site fault domain named after the IP Address scheme and add the 
nodes.&nbsp; So taking the above configuration:</P> <P>&nbsp;</P> <UL> <LI>Sites configured in Active Directory would be named <STRONG>SEATTLE</STRONG> and <STRONG>REDMOND</STRONG></LI> <LI>No sites configured in Active Directory, but different IP Address schemes, would be named <STRONG>1.0.0.0/8</STRONG> and <STRONG>172.0.0.0/16</STRONG></LI> </UL> <P>&nbsp;</P> <P>In this first video, you can see that Active Directory Sites and Services does have sites configured and the site fault domains are created for you.</P> <P>&nbsp;</P> <P><IFRAME src="https://www.youtube.com/embed/HQr-p1k01BQ" width="987" height="554" frameborder="0" allowfullscreen="allowfullscreen" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture"></IFRAME></P> <P>&nbsp;</P> <P>For more information on how to set up sites within Active Directory, please refer to the following blog:</P> <P>&nbsp;</P> <P><EM>Step-by-Step: Setting Up Active Directory Sites, Subnets, and Site-Links</EM></P> <P><A href="#" target="_blank" rel="noopener"><EM>https://blogs.technet.microsoft.com/canitpro/2015/03/03/step-by-step-setting-up-active-directory-sites-subnets-site-links/</EM></A></P> <P>&nbsp;</P> <P>The next item we addressed to ease configuration burdens is when Storage Spaces Direct is enabled.&nbsp; In Windows Server 2016 and 2019 Storage Spaces Direct, we supported one storage pool.&nbsp; In the stretch scenario with the next version, we are now supporting a pool per site.&nbsp; When Storage Spaces Direct is enabled, we are going to automatically create these pools and name them with the site they are created in.&nbsp; For example:</P> <P>&nbsp;</P> <UL> <LI><STRONG>Pool for site SEATTLE</STRONG> and <STRONG>Pool for site REDMOND</STRONG></LI> <LI><STRONG>Pool for site 1.0.0.0/8</STRONG> and <STRONG>Pool for site 172.0.0.0/16</STRONG></LI> </UL> <P>&nbsp;</P> <P>The other item we detect is the presence of the Storage Replica service.&nbsp; We will go out to each node to detect if Storage Replica has been installed on each of the nodes specified.&nbsp; If it is missing, we will stop, let you know it is missing, and let you know on which node.</P> <P>&nbsp;</P> <P><IFRAME src="https://www.youtube.com/embed/W3V0YnHgetc" width="987" height="554" frameborder="0" allowfullscreen="allowfullscreen" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture"></IFRAME></P> <P>&nbsp;</P> <P>As an FYI, keep in mind that this is a pre-released product at the time of this blog creation.&nbsp; The stopping of the Storage Spaces Direct enablement is subject to change to more of a warning at a later date.</P> <P>&nbsp;</P> <P>Once everything is in place and all services are present, you can see from this video the successful enablement of Storage Spaces Direct and the separate pools.</P> <P>&nbsp;</P> <P><IFRAME src="https://www.youtube.com/embed/JYM8aG57HNs" width="987" height="526" frameborder="0" allowfullscreen="allowfullscreen" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture"></IFRAME></P> <P>&nbsp;</P> <P>To point out here, in order to get this to work, we did a lot of work with the Health Service to allow this.&nbsp; The Health Service keeps track of the health of the entire cluster (resources, nodes, drives, pools, etc).&nbsp; With multiple pools and multiple sites, it needs to keep track of all this.&nbsp; I will go more into this a bit later and show you how it works in a few other scenarios with this configuration.</P>
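<P>&nbsp;</P> <P>If you want to see what was created on your own cluster, the site fault domains and the per-site pools can be inspected with PowerShell. A minimal sketch (the output will reflect your own site and pool names):</P> <P class="lia-indent-padding-left-30px"><FONT color="#3366FF">PS &gt; Get-ClusterFaultDomain</FONT></P> <P class="lia-indent-padding-left-30px"><FONT color="#3366FF">PS &gt; Get-ClusterFaultDomainXML</FONT></P> <P class="lia-indent-padding-left-30px"><FONT color="#3366FF">PS &gt; Get-StoragePool -IsPrimordial $false</FONT></P>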
<P>&nbsp;</P> <P>Now that we have pools, we need to create virtual disks at the sites.&nbsp; Since this is a stretch scenario and we want replication with Storage Replica, two disks need to be created at each site (one for data and one for logs).&nbsp; I use New-Volume to create them, but there is a caveat.&nbsp; When a virtual disk is created, Failover Clustering will automatically add it to the Available Storage group.&nbsp; Therefore, you need to ensure Available Storage is on the node you are creating the disks on.</P> <P>&nbsp;</P> <P>NOTE: We are aware of this caveat and will be working on getting a better story for this as we go along.</P> <P>&nbsp;</P> <P>In the examples below, I will use:</P> <P>&nbsp;</P> <UL> <LI>site names of <STRONG>SEATTLE</STRONG> and <STRONG>REDMOND</STRONG></LI> <LI>pool names of <STRONG>Pool for site SEATTLE</STRONG> and <STRONG>Pool for site REDMOND</STRONG></LI> <LI>NODE1 and NODE2 are in site <STRONG>SEATTLE</STRONG></LI> <LI>NODE3 and NODE4 are in site <STRONG>REDMOND</STRONG></LI> </UL> <P>&nbsp;</P> <P>So, let's create the disks.&nbsp; The first thing is to ensure the Available Storage group is on NODE1 in site SEATTLE so we can create those disks.</P> <P>&nbsp;</P> <P>From Failover Cluster Manager:</P> <P>&nbsp;</P> <OL> <LI>Go under <STRONG>Storage</STRONG> to <STRONG>Disks</STRONG></LI> <LI>In the far-right pane, click <STRONG>Move Available Storage</STRONG> and <STRONG>Select Node</STRONG></LI> <LI>Select <STRONG>NODE1</STRONG> and <STRONG>OK</STRONG></LI> </OL> <P>&nbsp;</P> <P>From PowerShell:</P> <P>&nbsp;</P> <P style="padding-left: 30px;"><FONT color="#3366FF">Move-ClusterGroup -Name "Available Storage" -Node NODE1</FONT></P> <P>&nbsp;</P> <P>Now we can create the two disks for site SEATTLE.&nbsp; I will be creating a 2-way mirror. From PowerShell, you can run the commands:</P> <P>&nbsp;</P> <P style="padding-left: 30px;"><FONT color="#3366FF">New-Volume -FriendlyName DATA_SEATTLE -StoragePoolFriendlyName "Pool for Site SEATTLE" -FileSystem ReFS -NumberOfDataCopies 2 -ProvisioningType Fixed -ResiliencySettingName Mirror -Size 125GB</FONT></P> <P style="padding-left: 30px;">&nbsp;</P> <P style="padding-left: 30px;"><FONT color="#3366FF">New-Volume -FriendlyName LOG_SEATTLE -StoragePoolFriendlyName "Pool for Site SEATTLE" -FileSystem ReFS -NumberOfDataCopies 2 -ProvisioningType Fixed -ResiliencySettingName Mirror -Size 10GB</FONT></P> <P>&nbsp;</P> <P>Now, we must create the same drives on the other pool.&nbsp; Before doing so, take the current disks offline and move the Available Storage group to NODE3 in site REDMOND.</P> <P>&nbsp;</P> <P style="padding-left: 30px;"><FONT color="#3366FF">Move-ClusterGroup -Name "Available Storage" -Node NODE3</FONT></P> <P>&nbsp;</P> <P>Once there, the commands would be:</P> <P>&nbsp;</P> <P style="padding-left: 30px;"><FONT color="#3366FF">New-Volume -FriendlyName DATA_REDMOND -StoragePoolFriendlyName "Pool for Site REDMOND" -FileSystem ReFS -NumberOfDataCopies 2 -ProvisioningType Fixed -ResiliencySettingName Mirror -Size 125GB</FONT></P> <P style="padding-left: 30px;">&nbsp;</P> <P style="padding-left: 30px;"><FONT color="#3366FF">New-Volume -FriendlyName LOG_REDMOND -StoragePoolFriendlyName "Pool for Site REDMOND" -FileSystem ReFS -NumberOfDataCopies 2 -ProvisioningType Fixed -ResiliencySettingName Mirror -Size 10GB</FONT></P> <P>&nbsp;</P> <P>We now have all the disks we want in the cluster.&nbsp; You can then move the Available Storage group back to NODE1 and copy your data onto the disks if you have any already.</P>
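<P>&nbsp;</P> <P>Before moving on, a quick, illustrative sanity check that all four disks landed in the Available Storage group:</P> <P style="padding-left: 30px;"><FONT color="#3366FF">Get-ClusterGroup "Available Storage" | Get-ClusterResource</FONT></P>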
<P>&nbsp;</P> <P>The next thing is to set the disks up with Storage Replica.&nbsp; I will not go through the steps here.&nbsp; But here is the link to the document on setting it up in the same cluster, as well as another video showing it being set up.</P> <P>&nbsp;</P> <P style="padding-left: 30px;"><EM><A href="#" target="_blank" rel="noopener">https://docs.microsoft.com/en-us/windows-server/storage/storage-replica/stretch-cluster-replication-using-shared-storage</A></EM></P> <P>&nbsp;</P> <P><IFRAME src="https://www.youtube.com/embed/mIfICmuVyWM" width="987" height="526" frameborder="0" allowfullscreen="allowfullscreen" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture"></IFRAME></P> <P>&nbsp;</P> <P>Everything is all set and resources can be created.&nbsp; If you have not seen what it looks like, here is a video of a site failure.&nbsp; Everything will move over and run.&nbsp; Storage Replica takes care of ensuring the right disks are utilized and halting replication until the site comes back.&nbsp;</P> <P>&nbsp;</P> <P>TIP:&nbsp; If you watch the tail end of the video, you will notice that the virtual machine automatically live migrates with the drives.&nbsp; We added this functionality back in Windows 2016; the VM will chase the CSV so they are not on different sites.&nbsp; Again, this is not another extra you have to set up, just something we did to ease administration burdens for you.</P> <P>&nbsp;</P> <P><IFRAME src="https://www.youtube.com/embed/I4KWIMFm33g" width="987" height="551" frameborder="0" allowfullscreen="allowfullscreen" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture"></IFRAME></P> <P>&nbsp;</P> <P>I talked earlier about all the work we did with the Health Service.&nbsp; We also did a lot of work with the way we autopool drives together by site.&nbsp; The next couple of videos will help show you.</P> <P>&nbsp;</P> <P>This first video is more of the brownfield scenario.&nbsp; Meaning, I have an existing cluster running and want to add nodes from a separate site.&nbsp; In the video, you will see that we add the nodes from the different site into the cluster and create a pool from the drives in that site.</P> <P>&nbsp;</P> <P><IFRAME src="https://www.youtube.com/embed/E8ZW529Bqss" width="987" height="551" frameborder="0" allowfullscreen="allowfullscreen" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture"></IFRAME></P> <P>&nbsp;</P> <P>In this video, I will show how we react to disk failures.&nbsp; I have removed a disk from each pool to simulate a failure.&nbsp; I then replace the disks with new ones.&nbsp; We detect which drive is in which site and pool them into the proper pool.</P> <P>&nbsp;</P> <P><IFRAME src="https://www.youtube.com/embed/oJLp78ooO68" width="987" height="551" frameborder="0" allowfullscreen="allowfullscreen" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture"></IFRAME></P> <P>&nbsp;</P> <P>I hope this gives you a glimpse into what we are doing with the next version of Azure Stack HCI.&nbsp; This scenario should be in the public preview build when it becomes available.&nbsp; Get a preview of what is coming and help us test and stabilize things.&nbsp; You also can suggest how it could be better.</P> <P>&nbsp;</P> <P>Thanks</P> <P>John Marlin</P> <P>Senior Program Manager</P> <P>High Availability and Storage Team</P> <P>Twitter: @johnmarlin_msft</P> Thu, 30 Jul 2020 20:22:30 GMT 
https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/disaster-recovery-in-the-next-version-of-azure-stack-hci/ba-p/1027898 John Marlin 2020-07-30T20:22:30Z Talking Failover Clustering and Azure Stack HCI Ignite 2019 Announcements https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/talking-failover-clustering-and-azure-stack-hci-ignite-2019/ba-p/1015497 <P style="margin: 0in; margin-bottom: .0001pt;">Microsoft Ignite 2019 at the Orange County Convention Center in Orlando, Florida was a huge success with approximately 26,000 in attendance. We had a great time meeting and talking with our partners and customers. We also had a few announcements about features coming in the next Windows Server LTSC for Azure Stack HCI and Failover Clustering that I wanted to highlight.</P> <P style="margin: 0in; margin-bottom: .0001pt;">&nbsp;</P> <P style="margin: 0in; margin-bottom: .0001pt;">In this blog, I will try to separate out each of the features announced. Some of the sessions contained multiple announcements, and I want to make sure you are aware of each one, so a particular session may be listed in multiple sections.&nbsp; I will also try to point out which things apply only to Windows Server vNext and which also apply to Windows Server 2016/2019.</P> <P style="margin: 0in; margin-bottom: .0001pt;">&nbsp;</P> <P style="margin: 0in; margin-bottom: .0001pt;">Ensure you go through this entire blog as there are numerous announcements, including one of the biggest asks from our customers at the end.&nbsp;</P> <P style="margin: 0in; margin-bottom: .0001pt;">&nbsp;</P> <P style="margin: 0in; margin-bottom: .0001pt;">I will start with this one.&nbsp; It doesn't necessarily apply to Failover Clustering or Azure Stack HCI, but it is an important one as the date is right around the corner.</P> <P style="margin: 0in; margin-bottom: .0001pt;">&nbsp;</P> <P style="margin: 0in; margin-bottom: .0001pt;"><STRONG>Windows 2008/2008R2 End of Life</STRONG></P> <P style="margin: 0in; margin-bottom: .0001pt;">&nbsp;</P> <P style="margin: 0in; margin-bottom: .0001pt;">Windows Server 2008/2008R2 are both reaching end of life; these two sessions talk about options for planning for this day.</P> <P style="margin: 0in; margin-bottom: .0001pt;">&nbsp;</P> <P style="margin: 0in; margin-bottom: .0001pt; text-align: center;" align="center"><EM>Plan for Z-Day 2020: Windows Server 2008 end of support is coming</EM></P> <P style="margin: 0in; margin-bottom: .0001pt; text-align: center;" align="center"><EM><A href="#" target="_blank" rel="noopener">https://myignite.techcommunity.microsoft.com/sessions/82850</A></EM></P> <P style="margin: 0in; margin-bottom: .0001pt; text-align: center;" align="center">&nbsp;</P> <P style="margin: 0in; margin-bottom: .0001pt; text-align: center;" align="center"><EM>It's 2019 and your servers are 2008</EM></P> <P style="margin: 0in; margin-bottom: .0001pt; text-align: center;" align="center"><EM><A href="#" target="_blank" rel="noopener">https://myignite.techcommunity.microsoft.com/sessions/89294</A></EM></P> <P style="margin: 0in; margin-bottom: .0001pt;">&nbsp;</P> <P style="margin: 0in; margin-bottom: .0001pt;"><STRONG>Quick list</STRONG></P> <P style="margin: 0in; margin-bottom: .0001pt;">&nbsp;</P> <P style="margin: 0in; margin-bottom: .0001pt;">For a quick list of announcements being made, you should catch this session listing 45 things in 45 minutes.&nbsp; This will give you a brief overview of things 
so that you can delve a little deeper into the other session announcements below.&nbsp; So if there is only one session you review from this list, this is the one.&nbsp; These will cover items for both Windows Server 2016/2019 as well as Windows Server vNext.</P> <P style="margin: 0in; margin-bottom: .0001pt;">&nbsp;</P> <P style="margin: 0in; margin-bottom: .0001pt; text-align: center;" align="center"><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="clipboard_image_15.png" style="width: 640px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/157498i8762686211D8CAF2/image-dimensions/640x357?v=v2" width="640" height="357" role="button" title="clipboard_image_15.png" alt="clipboard_image_15.png" /></span>&nbsp;</P> <P style="margin: 0in; margin-bottom: .0001pt; text-align: center;" align="center">What's new for Azure Stack HCI: 45 things in 45 minutes</P> <P style="margin: 0in; margin-bottom: .0001pt; text-align: center;" align="center"><A href="#" target="_blank" rel="noopener">https://myignite.techcommunity.microsoft.com/sessions/82905</A></P> <P style="margin: 0in; margin-bottom: .0001pt;">&nbsp;</P> <P style="margin: 0in; margin-bottom: .0001pt;"><STRONG>Azure Stack is now one portfolio</STRONG></P> <P style="margin: 0in; margin-bottom: .0001pt;">&nbsp;</P> <P style="margin: 0in; margin-bottom: .0001pt;">We have expanded Azure Stack into a PORTFOLIO of products, including Azure Stack Edge, Azure Stack HCI, and Azure Stack Hub.</P> <P style="margin: 0in; margin-bottom: .0001pt;">&nbsp;</P> <P style="margin: 0in 0in 0.0001pt; text-align: center;"><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="clipboard_image_0.png" style="width: 400px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/157481iF73660D77C3809AE/image-size/medium?v=v2&amp;px=400" role="button" title="clipboard_image_0.png" alt="clipboard_image_0.png" /></span>&nbsp;</P> <P style="margin: 0in; margin-bottom: .0001pt; text-align: center;" align="center"><EM>Discover Azure Stack HCI</EM></P> <P style="margin: 0in; margin-bottom: .0001pt; text-align: center;" align="center"><EM><A href="#" target="_blank" rel="noopener">https://myignite.techcommunity.microsoft.com/sessions/82907</A></EM></P> <P style="margin: 0in; margin-bottom: .0001pt; text-align: center;" align="center">&nbsp;</P> <P style="margin: 0in; margin-bottom: .0001pt; text-align: center;" align="center"><EM>Get started with Azure Stack HCI</EM></P> <P style="margin: 0in; margin-bottom: .0001pt; text-align: center;" align="center"><EM><A href="#" target="_blank" rel="noopener">https://myignite.techcommunity.microsoft.com/sessions/89352</A></EM></P> <P style="margin: 0in; margin-bottom: .0001pt;">&nbsp;</P> <P style="margin: 0in; margin-bottom: .0001pt;"><STRONG>Azure IaaS VM Guest Cluster support for Premium File Shares and Shared Azure Disk</STRONG></P> <P style="margin: 0in; margin-bottom: .0001pt;">&nbsp;</P> <P style="margin: 0in; margin-bottom: .0001pt;">This is a preview of file shares that act like shared disks for your Azure IaaS Failover Clusters. This is not Storage Spaces Direct; this is traditional Failover Clusters for SQL Server. 
If you do not wish to run Storage Spaces Direct, but would rather run a traditional Failover Cluster, Shared Azure Disk is now in limited preview for your shared storage.&nbsp; This one is also for all versions currently released as well as Windows Server vNext.</P> <P style="margin: 0in; margin-bottom: .0001pt;">&nbsp;</P> <P style="margin: 0in 0in 0.0001pt; text-align: center;"><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="clipboard_image_1.png" style="width: 400px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/157482i1321681BA61D539C/image-size/medium?v=v2&amp;px=400" role="button" title="clipboard_image_1.png" alt="clipboard_image_1.png" /></span>&nbsp;</P> <P style="margin: 0in; margin-bottom: .0001pt; text-align: center;" align="center"><EM>Windows Server on Azure Overview: Lift-and-Shift migrations for Enterprise Workloads</EM></P> <P style="margin: 0in; margin-bottom: .0001pt; text-align: center;" align="center"><EM><A href="#" target="_blank" rel="noopener">https://myignite.techcommunity.microsoft.com/sessions/81956</A></EM></P> <P style="margin: 0in; margin-bottom: .0001pt;">&nbsp;</P> <P style="margin: 0in; margin-bottom: .0001pt;"><STRONG>Windows Admin Center setup wizard for Storage Spaces Direct and Software Defined Networking</STRONG></P> <P style="margin: 0in; margin-bottom: .0001pt;">&nbsp;</P> <P style="margin: 0in; margin-bottom: .0001pt;">That's right. With the latest release of Windows Admin Center, you now have a walkthrough for creating an Azure Stack HCI system as well as SDN. PowerShell not needed!!&nbsp; Note: items in black are available now, while greyed-out items are coming.&nbsp; This one is not limited to vNext; it also works with currently released Windows Server 2016/2019.</P> <P style="margin: 0in; margin-bottom: .0001pt;">&nbsp;</P> <P style="margin: 0in 0in 0.0001pt; text-align: center;"><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="clipboard_image_2.png" style="width: 400px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/157483iEF8A4C4857BE2BF4/image-size/medium?v=v2&amp;px=400" role="button" title="clipboard_image_2.png" alt="clipboard_image_2.png" /></span>&nbsp;</P> <P style="margin: 0in 0in 0.0001pt; text-align: center;"><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="clipboard_image_3.png" style="width: 400px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/157485i30996D13E5A2CA83/image-size/medium?v=v2&amp;px=400" role="button" title="clipboard_image_3.png" alt="clipboard_image_3.png" /></span>&nbsp;</P> <P style="margin: 0in 0in 0.0001pt; text-align: center;">&nbsp;</P> <P style="margin: 0in; margin-bottom: .0001pt; text-align: center;" align="center"><EM>Jumpstart your Azure Stack HCI Deployment</EM></P> <P style="margin: 0in; margin-bottom: .0001pt; text-align: center;" align="center"><EM><A href="#" target="_blank" rel="noopener">https://myignite.techcommunity.microsoft.com/sessions/82906</A></EM></P> <P style="margin: 0in; margin-bottom: .0001pt; text-align: center;" align="center">&nbsp;</P> <P style="margin: 0in; margin-bottom: .0001pt; text-align: center;" align="center"><EM>Get started with Azure Stack HCI</EM></P> <P style="margin: 0in; margin-bottom: .0001pt; text-align: center;" align="center"><EM><A href="#" target="_blank" rel="noopener">https://myignite.techcommunity.microsoft.com/sessions/89352</A></EM></P> <P style="margin: 0in; 
margin-bottom: .0001pt;">&nbsp;</P> <P style="margin: 0in; margin-bottom: .0001pt;"><STRONG>Using Azure Services to manage and monitor on-premises clusters</STRONG></P> <P style="margin: 0in; margin-bottom: .0001pt;">&nbsp;</P> <P style="margin: 0in; margin-bottom: .0001pt;">See how various Azure Services can be used from the cloud with your on-premises clusters, all using Windows Admin Center. The "hybrid" way of doing things now.&nbsp; These services can be used with your current Windows 2016/2019 HCI and traditional clusters as well as Windows Server vNext.</P> <P style="margin: 0in; margin-bottom: .0001pt;">&nbsp;</P> <P style="margin: 0in 0in 0.0001pt; text-align: center;"><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="clipboard_image_4.png" style="width: 400px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/157484iA9DC06F4E097394E/image-size/medium?v=v2&amp;px=400" role="button" title="clipboard_image_4.png" alt="clipboard_image_4.png" /></span>&nbsp;</P> <P style="margin: 0in; margin-bottom: .0001pt; text-align: center;" align="center"><EM>Windows Server: What's new and what's next</EM></P> <P style="margin: 0in; margin-bottom: .0001pt; text-align: center;" align="center"><EM><A href="#" target="_blank" rel="noopener">https://myignite.techcommunity.microsoft.com/sessions/81704</A></EM></P> <P style="margin: 0in; margin-bottom: .0001pt;">&nbsp;</P> <P style="margin: 0in; margin-bottom: .0001pt; text-align: center;" align="center"><EM>Modernize your retail stores or branch offices with Azure Stack HCI</EM></P> <P style="margin: 0in; margin-bottom: .0001pt; text-align: center;" align="center"><EM><A href="#" target="_blank" rel="noopener">https://myignite.techcommunity.microsoft.com/sessions/82904</A></EM></P> <P style="margin: 0in; margin-bottom: .0001pt; text-align: center;" align="center">&nbsp;</P> <P style="margin: 0in; margin-bottom: .0001pt; text-align: center;" align="center"><EM>Clustering in the age of HCI and Hybrid</EM></P> <P style="margin: 0in; margin-bottom: .0001pt; text-align: center;" align="center"><EM><A href="#" target="_blank" rel="noopener">https://myignite.techcommunity.microsoft.com/sessions/83946</A></EM></P> <P style="margin: 0in; margin-bottom: .0001pt;">&nbsp;</P> <P style="margin: 0in; margin-bottom: .0001pt;"><STRONG>Shut down safeguard</STRONG></P> <P style="margin: 0in; margin-bottom: .0001pt;">&nbsp;</P> <P style="margin: 0in; margin-bottom: .0001pt;">Say you need to reboot your Azure Stack HCI systems. You first reboot your first node and bring it back up. Once it comes back up, it needs to go through a re-sync of the data. However, if you were to then reboot the second node before it finishes, bad things "could" happen to your data. We will now have a built-in safeguard to prevent this from happening.</P>
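<P style="margin: 0in; margin-bottom: .0001pt;">&nbsp;</P> <P style="margin: 0in; margin-bottom: .0001pt;">As context, the storage jobs in question are the repair/resync jobs you can already check for yourself. A minimal sketch of a manual pre-reboot check (this is just the manual equivalent, not the safeguard itself):</P> <P style="margin: 0in; margin-bottom: .0001pt;"><FONT color="#3366FF">PS &gt; Get-StorageJob | Select-Object Name, JobState, PercentComplete</FONT></P> <P style="margin: 0in; margin-bottom: .0001pt;">&nbsp;</P>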
<P style="margin: 0in; margin-bottom: .0001pt;">If there are storage jobs currently running, we will warn you not to proceed.&nbsp; Sorry, but this is where we get into the "it's only available in Windows Server vNext" areas of the list.</P> <P style="margin: 0in; margin-bottom: .0001pt;">&nbsp;</P> <P style="margin: 0in 0in 0.0001pt; text-align: center;"><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="clipboard_image_5.png" style="width: 400px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/157486i7509AC2AC0F7F5FA/image-size/medium?v=v2&amp;px=400" role="button" title="clipboard_image_5.png" alt="clipboard_image_5.png" /></span>&nbsp;</P> <P style="margin: 0in 0in 0.0001pt; text-align: center;">&nbsp;</P> <P style="margin: 0in; margin-bottom: .0001pt; text-align: center;" align="center"><EM>Modernize your retail stores or branch offices with Azure Stack HCI</EM></P> <P style="margin: 0in; margin-bottom: .0001pt; text-align: center;" align="center"><EM><A href="#" target="_blank" rel="noopener">https://myignite.techcommunity.microsoft.com/sessions/82904</A></EM></P> <P style="margin: 0in; margin-bottom: .0001pt;">&nbsp;</P> <P style="margin: 0in; margin-bottom: .0001pt;"><STRONG>Increase to 16 petabytes raw storage</STRONG></P> <P style="margin: 0in; margin-bottom: .0001pt;">&nbsp;</P> <P style="margin: 0in; margin-bottom: .0001pt;">In Windows Server 2019, we announced 4 petabytes of raw storage availability. In the next Windows Server LTSC, we have increased it even further to 16 petabytes of raw storage. This is a big win for backup server scenarios (and others).</P> <P style="margin: 0in; margin-bottom: .0001pt;">&nbsp;</P> <P style="margin: 0in 0in 0.0001pt; text-align: center;"><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="clipboard_image_6.png" style="width: 400px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/157489i22DEAF9D6C54A065/image-size/medium?v=v2&amp;px=400" role="button" title="clipboard_image_6.png" alt="clipboard_image_6.png" /></span>&nbsp;</P> <P style="margin: 0in 0in 0.0001pt; text-align: center;">&nbsp;</P> <P style="margin: 0in; margin-bottom: .0001pt; text-align: center;" align="center"><EM>What's next for software defined storage and networking for Windows Server</EM></P> <P style="margin: 0in; margin-bottom: .0001pt; text-align: center;" align="center"><EM><A href="#" target="_blank" rel="noopener">https://myignite.techcommunity.microsoft.com/sessions/81960</A></EM></P> <P style="margin: 0in; margin-bottom: .0001pt;">&nbsp;</P> <P style="margin: 0in; margin-bottom: .0001pt;"><STRONG>Switchless Clusters</STRONG></P> <P style="margin: 0in; margin-bottom: .0001pt;">&nbsp;</P> <P style="margin: 0in; margin-bottom: .0001pt;">Here again, we announced 2-node switchless clusters for Windows Server 2019 Azure Stack HCI and Failover Clustering. For Windows Server vNext, we are now going to support more than 2 nodes with full mesh connectivity. 
So how many network adapters can you add?</P> <P style="margin: 0in; margin-bottom: .0001pt;">&nbsp;</P> <P style="margin: 0in 0in 0.0001pt; text-align: center;"><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="clipboard_image_7.png" style="width: 400px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/157487iF80D33968DDA1F23/image-size/medium?v=v2&amp;px=400" role="button" title="clipboard_image_7.png" alt="clipboard_image_7.png" /></span>&nbsp;</P> <P style="margin: 0in 0in 0.0001pt; text-align: center;">&nbsp;</P> <P style="margin: 0in; margin-bottom: .0001pt; text-align: center;" align="center"><EM>What's next for software defined storage and networking for Windows Server</EM></P> <P style="margin: 0in; margin-bottom: .0001pt; text-align: center;" align="center"><EM><A href="#" target="_blank" rel="noopener">https://myignite.techcommunity.microsoft.com/sessions/81960</A></EM></P> <P style="margin: 0in; margin-bottom: .0001pt; text-align: center;" align="center">&nbsp;</P> <P style="margin: 0in; margin-bottom: .0001pt; text-align: center;" align="center"><EM>Get started with Azure Stack HCI</EM></P> <P style="margin: 0in; margin-bottom: .0001pt; text-align: center;" align="center"><EM><A href="#" target="_blank" rel="noopener">https://myignite.techcommunity.microsoft.com/sessions/89352</A></EM></P> <P style="margin: 0in; margin-bottom: .0001pt;">&nbsp;</P> <P style="margin: 0in; margin-bottom: .0001pt;"><STRONG>Repair/Resync times are much faster and can be throttled</STRONG></P> <P style="margin: 0in; margin-bottom: .0001pt;">&nbsp;</P> <P style="margin: 0in; margin-bottom: .0001pt;">When a reboot is necessary, resync times when the node comes back can take a while. We have changed the way we are doing these repairs/resyncs and made them much faster. We have also introduced a way to throttle the repair/resync, so you can now control whether you need it to complete even faster or run more in the background.&nbsp; This is something that will only be available in Windows Server vNext Azure Stack HCI.</P> <P style="margin: 0in; margin-bottom: .0001pt;">&nbsp;</P> <P style="margin: 0in 0in 0.0001pt; text-align: center;"><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="clipboard_image_8.png" style="width: 400px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/157488iD33DC6D464BD33D5/image-size/medium?v=v2&amp;px=400" role="button" title="clipboard_image_8.png" alt="clipboard_image_8.png" /></span>&nbsp;</P> <P style="margin: 0in 0in 0.0001pt; text-align: center;">&nbsp;</P> <P style="margin: 0in; margin-bottom: .0001pt; text-align: center;" align="center"><EM>What's next for software defined storage and networking for Windows Server</EM></P> <P style="margin: 0in; margin-bottom: .0001pt; text-align: center;" align="center"><EM><A href="#" target="_blank" rel="noopener">https://myignite.techcommunity.microsoft.com/sessions/81960</A></EM></P> <P style="margin: 0in; margin-bottom: .0001pt;">&nbsp;</P> <P style="margin: 0in; margin-bottom: .0001pt;"><STRONG>New Affinity/AntiAffinity rules</STRONG></P> <P style="margin: 0in; margin-bottom: .0001pt;">&nbsp;</P> <P style="margin: 0in; margin-bottom: .0001pt;">In the past, we have only had antiaffinity as an option to keep roles apart. Introducing new rules for affinity and antiaffinity was a must. With nodes in different sites, we also had to make it site aware.</P>
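<P style="margin: 0in; margin-bottom: .0001pt;">&nbsp;</P> <P style="margin: 0in; margin-bottom: .0001pt;">A hedged sketch of what creating one of these rules might look like with the new cmdlets announced for vNext (the rule and group names here are illustrative; verify cmdlet availability on your build):</P> <P style="margin: 0in; margin-bottom: .0001pt;"><FONT color="#3366FF">PS &gt; New-ClusterAffinityRule -Name "KeepSQLTogether" -RuleType SameFaultDomain</FONT></P> <P style="margin: 0in; margin-bottom: .0001pt;"><FONT color="#3366FF">PS &gt; Add-ClusterGroupToAffinityRule -Name "KeepSQLTogether" -Groups SQLVM1,SQLVM2</FONT></P> <P style="margin: 0in; margin-bottom: .0001pt;">&nbsp;</P>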
<P style="margin: 0in; margin-bottom: .0001pt;">So now, you can keep roles together or apart, as well as use storage affinity to keep virtual machines on the same node as their storage.&nbsp; These new rules will only be available in Windows Server vNext Azure Stack HCI and traditional failover clusters.</P> <P style="margin: 0in; margin-bottom: .0001pt;">&nbsp;</P> <P style="margin: 0in 0in 0.0001pt; text-align: center;"><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="clipboard_image_9.png" style="width: 400px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/157490i5B3347B170AC6F60/image-size/medium?v=v2&amp;px=400" role="button" title="clipboard_image_9.png" alt="clipboard_image_9.png" /></span>&nbsp;</P> <P style="margin: 0in; margin-bottom: .0001pt; text-align: center;" align="center"><EM>What's next for software defined storage and networking for Windows Server</EM></P> <P style="margin: 0in; margin-bottom: .0001pt; text-align: center;" align="center"><EM><A href="#" target="_blank" rel="noopener">https://myignite.techcommunity.microsoft.com/sessions/81960</A></EM></P> <P style="margin: 0in; margin-bottom: .0001pt;">&nbsp;</P> <P style="margin: 0in; margin-bottom: .0001pt;"><STRONG>BitLocker</STRONG></P> <P style="margin: 0in; margin-bottom: .0001pt;">&nbsp;</P> <P style="margin: 0in; margin-bottom: .0001pt;">This one had no slides, as we snuck it in with some time available. BitLocker has been available for Clusters for quite some time. The requirement was that the cluster nodes must all be in the same domain, as the BitLocker key is tied to the Cluster Name Object (CNO). However, for those clusters at the edge, workgroup clusters, and multidomain clusters, Active Directory may not be present. With no Active Directory, there is no CNO. These cluster scenarios had no data at-rest security. With Windows Server vNext, we introduced our own BitLocker key stored locally (encrypted, of course) for the cluster to use. Now you can feel safer as at-rest security is in place.&nbsp; This feature will only be available in Windows Server vNext Azure Stack HCI and traditional failover clusters.</P> <P style="margin: 0in; margin-bottom: .0001pt;">&nbsp;</P> <P style="margin: 0in; margin-bottom: .0001pt; text-align: center;" align="center"><EM>What's next for software defined storage and networking for Windows Server</EM></P> <P style="margin: 0in; margin-bottom: .0001pt; text-align: center;" align="center"><EM><A href="#" target="_blank" rel="noopener">https://myignite.techcommunity.microsoft.com/sessions/81960</A></EM></P> <P style="margin: 0in; margin-bottom: .0001pt;">&nbsp;</P> <P style="margin: 0in; margin-bottom: .0001pt;"><STRONG>New Network Validation tests</STRONG></P> <P style="margin: 0in; margin-bottom: .0001pt;">&nbsp;</P> <P style="margin: 0in; margin-bottom: .0001pt;">We come up with new things all the time, and networking continues to be an area of innovation. 
We are not standing pat and continue to add new tests.&nbsp;&nbsp;These new tests will only be available in Windows Server vNext Azure Stack HCI and traditional failover clusters.</P> <P style="margin: 0in; margin-bottom: .0001pt;">&nbsp;</P> <P style="margin: 0in 0in 0.0001pt; text-align: center;"><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="clipboard_image_10.png" style="width: 400px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/157492i62D12F7BA71DFEDA/image-size/medium?v=v2&amp;px=400" role="button" title="clipboard_image_10.png" alt="clipboard_image_10.png" /></span>&nbsp;</P> <P style="margin: 0in; margin-bottom: .0001pt; text-align: center;" align="center"><EM>What's next for software defined storage and networking for Windows Server</EM></P> <P style="margin: 0in; margin-bottom: .0001pt; text-align: center;" align="center"><EM><A href="#" target="_blank" rel="noopener">https://myignite.techcommunity.microsoft.com/sessions/81960</A></EM></P> <P style="margin: 0in; margin-bottom: .0001pt;">&nbsp;</P> <P style="margin: 0in; margin-bottom: .0001pt;"><STRONG>Windows Admin Center</STRONG></P> <P style="margin: 0in; margin-bottom: .0001pt;">&nbsp;</P> <P style="margin: 0in; margin-bottom: .0001pt;">I mentioned in previous descriptions here about Windows Admin Center. The Windows Admin Center Team works tirelessly to continue to add more and more in. For the general availability (GA) announcement of the latest Windows Admin Center, you can see just how much has been added and new. Everything from continued on-premises work, hybrid, third party extensions, and more.&nbsp; The work being done here is not just for the future, but all existing Azure Stack HCI and traditional failover clustering systems.</P> <P style="margin: 0in; margin-bottom: .0001pt;">&nbsp;</P> <P style="margin: 0in 0in 0.0001pt; text-align: center;"><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="clipboard_image_11.png" style="width: 400px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/157491i1B0CD179047AF5C6/image-size/medium?v=v2&amp;px=400" role="button" title="clipboard_image_11.png" alt="clipboard_image_11.png" /></span>&nbsp;</P> <P style="margin: 0in; margin-bottom: .0001pt; text-align: center;" align="center"><EM>Windows Server deep dive: Demopalooza</EM></P> <P style="margin: 0in; margin-bottom: .0001pt; text-align: center;" align="center"><EM><A href="#" target="_blank" rel="noopener">https://myignite.techcommunity.microsoft.com/sessions/81949</A></EM></P> <P style="margin: 0in; margin-bottom: .0001pt; text-align: center;" align="center">&nbsp;</P> <P style="margin: 0in; margin-bottom: .0001pt; text-align: center;" align="center"><EM>Live Q&amp;A: Manage your hybrid server environment with Windows Admin Center</EM></P> <P style="margin: 0in; margin-bottom: .0001pt; text-align: center;" align="center"><EM><A href="#" target="_blank" rel="noopener">https://myignite.techcommunity.microsoft.com/sessions/89341</A></EM></P> <P style="margin: 0in; margin-bottom: .0001pt; text-align: center;" align="center">&nbsp;</P> <P style="margin: 0in; margin-bottom: .0001pt; text-align: center;" align="center"><EM>Windows Admin Center: Unlock Azure Hybrid value</EM></P> <P style="margin: 0in; margin-bottom: .0001pt; text-align: center;" align="center"><EM><A href="#" target="_blank" rel="noopener">https://myignite.techcommunity.microsoft.com/sessions/81952</A></EM></P> <P style="margin: 0in; 
margin-bottom: .0001pt;">&nbsp;</P> <P style="margin: 0in; margin-bottom: .0001pt; text-align: center;" align="center"><EM>Automatically monitor, secure and update your on-premises servers from Azure with Windows Admin Center</EM></P> <P style="margin: 0in; margin-bottom: .0001pt; text-align: center;" align="center"><EM><A href="#" target="_blank" rel="noopener">https://myignite.techcommunity.microsoft.com/sessions/83942</A></EM></P> <P style="margin: 0in; margin-bottom: .0001pt; text-align: center;" align="center">&nbsp;</P> <P style="margin: 0in; margin-bottom: .0001pt; text-align: center;" align="center"><EM>Be a Windows Admin Center expert: Best practices for deployment, configuration, and security</EM></P> <P style="margin: 0in; margin-bottom: .0001pt; text-align: center;" align="center"><EM><A href="#" target="_blank" rel="noopener">https://myignite.techcommunity.microsoft.com/sessions/83943</A></EM></P> <P style="margin: 0in; margin-bottom: .0001pt;">&nbsp;</P> <P style="margin: 0in; margin-bottom: .0001pt; text-align: center;" align="center"><EM>Get more done with Windows Admin Center third-party extensions</EM></P> <P style="margin: 0in; margin-bottom: .0001pt; text-align: center;" align="center"><EM><A href="#" target="_blank" rel="noopener">https://myignite.techcommunity.microsoft.com/sessions/83944</A></EM></P> <P style="margin: 0in; margin-bottom: .0001pt; text-align: center;" align="center">&nbsp;</P> <P style="margin: 0in; margin-bottom: .0001pt; text-align: center;" align="center"><EM>Windows Admin Center: Better together with System Center and Microsoft Azure</EM></P> <P style="margin: 0in; margin-bottom: .0001pt; text-align: center;" align="center"><EM><A href="#" target="_blank" rel="noopener">https://myignite.techcommunity.microsoft.com/sessions/83945</A></EM></P> <P style="margin: 0in; margin-bottom: .0001pt;">&nbsp;</P> <P style="margin: 0in; margin-bottom: .0001pt;"><STRONG>New Windows Performance Monitor</STRONG></P> <P style="margin: 0in; margin-bottom: .0001pt;">&nbsp;</P> <P style="margin: 0in; margin-bottom: .0001pt;">Everyone is concerned with performance. We have in-box Performance Monitor, but it hasn't been updated in a long time (other than some counters with each release). In Windows Admin Center, we have also added a new Performance Monitor that can be run anywhere against any of the different versions of Windows Server. You can have multiple windows running, pause it, and much much more. 
We have also made it easier to look at.&nbsp; As with all things Windows Admin Center, this will work with current versions of Windows Server as well as future versions.&nbsp; It is also not limited to Azure Stack HCI or traditional failover clusters.</P> <P style="margin: 0in; margin-bottom: .0001pt;">&nbsp;</P> <P style="margin: 0in 0in 0.0001pt; text-align: center;"><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="clipboard_image_12.png" style="width: 400px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/157495i2F4A70EBD4765DCD/image-size/medium?v=v2&amp;px=400" role="button" title="clipboard_image_12.png" alt="clipboard_image_12.png" /></span>&nbsp;</P> <P style="margin: 0in; margin-bottom: .0001pt; text-align: center;" align="center"><EM>Windows Server: What's new and what's next</EM></P> <P style="margin: 0in; margin-bottom: .0001pt; text-align: center;" align="center"><EM><A href="#" target="_blank" rel="noopener">https://myignite.techcommunity.microsoft.com/sessions/81704</A></EM></P> <P style="margin: 0in; margin-bottom: .0001pt;">&nbsp;</P> <P style="margin: 0in; margin-bottom: .0001pt;"><STRONG>Badges</STRONG></P> <P style="margin: 0in; margin-bottom: .0001pt;">&nbsp;</P> <P style="margin: 0in; margin-bottom: .0001pt;">Badges? What are those? I don't need no stinkin' badges. Well let me tell you.&nbsp;Our partners have developed solutions optimized for running specific apps/workloads on Azure Stack HCI (both current and future). We are working with our partners to add these "badges" that will be in full view and searchable so you can ensure that the solution you are purchasing is the right solution for you.&nbsp;&nbsp;</P> <P style="margin: 0in; margin-bottom: .0001pt;">&nbsp;</P> <P style="margin: 0in 0in 0.0001pt; text-align: center;"><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="clipboard_image_13.png" style="width: 400px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/157493i524C87F46DAACE15/image-size/medium?v=v2&amp;px=400" role="button" title="clipboard_image_13.png" alt="clipboard_image_13.png" /></span>&nbsp;</P> <P style="margin: 0in; margin-bottom: .0001pt; text-align: center;" align="center"><EM>Windows Server: What's new and what's next</EM></P> <P style="margin: 0in; margin-bottom: .0001pt; text-align: center;" align="center"><EM><A href="#" target="_blank" rel="noopener">https://myignite.techcommunity.microsoft.com/sessions/81704</A></EM></P> <P style="margin: 0in; margin-bottom: .0001pt;">&nbsp;</P> <P style="margin: 0in; margin-bottom: .0001pt;"><STRONG>Stretch Azure Stack HCI for Disaster Recovery</STRONG></P> <P style="margin: 0in; margin-bottom: .0001pt;">&nbsp;</P> <P style="margin: 0in; margin-bottom: .0001pt;">I mentioned I was saving our biggest ask for last. In Windows Server vNext, we will now offer Stretch Azure Stack HCI for disaster recovery purposes. You will now be able to have automatic failovers in case of disaster. 
We are also making it easier, as there is less for you to configure; we will do a lot of this for you.&nbsp; This is also one of those "sorry, but this will be in Windows Server vNext" items.</P> <P style="margin: 0in; margin-bottom: .0001pt;">&nbsp;</P> <P style="margin: 0in 0in 0.0001pt; text-align: center;" align="center"><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="clipboard_image_14.png" style="width: 400px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/157494iD6671AEA9357D4C1/image-size/medium?v=v2&amp;px=400" role="button" title="clipboard_image_14.png" alt="clipboard_image_14.png" /></span></P> <P style="margin: 0in; margin-bottom: .0001pt; text-align: center;" align="center">&nbsp;</P> <P style="margin: 0in; margin-bottom: .0001pt; text-align: center;" align="center"><EM>Stretching Azure Stack HCI for disaster recovery: A glimpse into the future</EM></P> <P style="margin: 0in; margin-bottom: .0001pt; text-align: center;" align="center"><EM><A href="#" target="_blank" rel="noopener">https://myignite.techcommunity.microsoft.com/sessions/83962</A></EM></P> <P style="margin: 0in; margin-bottom: .0001pt; text-align: center;" align="center">&nbsp;</P> <P style="margin: 0in; margin-bottom: .0001pt; text-align: center;" align="center"><EM>What's next for software defined storage and networking for Windows Server</EM></P> <P style="margin: 0in; margin-bottom: .0001pt; text-align: center;" align="center"><EM><A href="#" target="_blank" rel="noopener">https://myignite.techcommunity.microsoft.com/sessions/81960</A></EM></P> <P style="margin: 0in; margin-bottom: .0001pt;">&nbsp;</P> <P style="margin: 0in; margin-bottom: .0001pt; text-align: center;" align="center"><EM>HCI is the name, futures are the game</EM></P> <P style="margin: 0in; margin-bottom: .0001pt; text-align: center;" align="center"><EM><A href="#" target="_blank" rel="noopener">https://myignite.techcommunity.microsoft.com/sessions/89330</A></EM></P> <P style="margin: 0in; margin-bottom: .0001pt;">&nbsp;</P> <P style="margin: 0in; margin-bottom: .0001pt;">As I have mentioned previously, we are not resting on our laurels. 
We are working on many, many other improvements for Windows Server vNext; we just aren't ready to announce them yet.</P> <P style="margin: 0in; margin-bottom: .0001pt;">&nbsp;</P> <P style="margin: 0in; margin-bottom: .0001pt;">Thank you,</P> <P style="margin: 0in; margin-bottom: .0001pt;">John Marlin</P> <P style="margin: 0in; margin-bottom: .0001pt;">Senior Program Manager</P> <P style="margin: 0in; margin-bottom: .0001pt;">Windows High Availability and Storage</P> Mon, 18 Nov 2019 20:03:53 GMT https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/talking-failover-clustering-and-azure-stack-hci-ignite-2019/ba-p/1015497 John Marlin 2019-11-18T20:03:53Z Windows Server 2019 Failover Clustering New Features https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/windows-server-2019-failover-clustering-new-features/ba-p/544029 <P>Greeting Failover Cluster fans!!&nbsp; John Marlin here and I own the Failover Clustering feature within the Microsoft product team.&nbsp; In this blog, I will be giving an overview of, and demoing, a lot of the new features in Windows Server 2019 Failover Clustering.&nbsp; I have held off on this to let some things settle down with some of the announcements regarding <A href="#" target="_self">Azure Stack HCI</A>, the upcoming <A href="#" target="_self">Windows Server Summit</A>, etc.</P> <P>&nbsp;</P> <P>I have broken these all down into 7 videos so that you can view them in smaller chunks rather than one massive long video.&nbsp; With each video, I am including a quick description of the features that will be covered.&nbsp; Each of the videos is approximately 15 minutes long.</P> <P>&nbsp;</P> <P><FONT size="5" color="#0000ff"><U><STRONG>Part 1</STRONG></U></FONT></P> <P>In this video, we will take a brief look back at Windows Server 2016 Failover Clustering and preview what we did in Windows Server 2019 to make things better.</P> <P>&nbsp;</P> <P><IFRAME src="https://www.youtube.com/embed/wU5ZJW3u3sc" width="987" height="616" frameborder="0" allowfullscreen="allowfullscreen" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture"></IFRAME></P> <P>&nbsp;</P> <P><FONT color="#0000ff"><U><STRONG><FONT size="5">Part 2</FONT></STRONG></U></FONT></P> <P>In Part 2 of the series, we take a look at&nbsp;<STRONG>Windows Admin Center</STRONG> and how it can make the user experience better, <STRONG>Cluster Performance History</STRONG> to get a history of how the cluster/nodes are performing, <STRONG>System Insights</STRONG> using predictive analytics (AI) and machine learning (ML), and <STRONG>Persistent Memory</STRONG>, which is the latest in storage/memory technology.</P> <P>&nbsp;</P> <P><IFRAME src="https://www.youtube.com/embed/2ycqvQP96wA" width="987" height="616" frameborder="0" allowfullscreen="allowfullscreen" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture"></IFRAME></P> <P>&nbsp;</P> <P><U><STRONG><FONT size="5" color="#0000ff">Part 3</FONT></STRONG></U></P> <P>In Part 3 of the series, we take a look at <STRONG>Cluster Sets</STRONG> as our new scaling technology, actual in-place <STRONG>Windows Server Upgrades</STRONG> which have not been supported in the past, and <STRONG>Microsoft Distributed Transaction Coordinator (MSDTC)</STRONG>.</P> <P>&nbsp;</P> <P><IFRAME src="https://www.youtube.com/embed/ywXZpQLY6yw" width="987" height="616" frameborder="0" allowfullscreen="allowfullscreen" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture"></IFRAME></P> 
<P>&nbsp;</P> <P><FONT size="5" color="#0000ff"><U><STRONG>Part 4</STRONG></U></FONT></P> <P>In this video, we take a look at <STRONG>Two-Node Hyperconverged</STRONG> and the new way of configuring resiliency, <STRONG>File Share Witness</STRONG> capabilities for achieving quorum at the edge, <STRONG>Split Brain</STRONG> detection and how we try to lessen the chances of nodes running independently of each other, and what we did with <STRONG>Security</STRONG> in mind.</P> <P>&nbsp;</P> <P><IFRAME src="https://www.youtube.com/embed/DLQTcIFsksE" width="987" height="616" frameborder="0" allowfullscreen="allowfullscreen" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture"></IFRAME></P> <P>&nbsp;</P> <P><STRONG><U><FONT size="5" color="#0000ff">Part 5</FONT></U></STRONG></P> <P>This video talks about <STRONG>Scale-Out File Servers</STRONG> and some of the connectivity enhancements, <STRONG>Cluster Shared Volumes&nbsp;(CSV)</STRONG> with caching and a security enhancement, <STRONG>Marginal Disk</STRONG> support and the way we are detecting drives that are starting to go bad, and <STRONG>Cluster Aware Updating</STRONG> enhancements for when you are patching your Cluster nodes.</P> <P>&nbsp;</P> <P><IFRAME src="https://www.youtube.com/embed/xc5RBEj74Dw" width="987" height="616" frameborder="0" allowfullscreen="allowfullscreen" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture"></IFRAME></P> <P>&nbsp;</P> <P><FONT size="5" color="#0000ff"><U><STRONG>Part 6</STRONG></U></FONT></P> <P>This video will talk about enhancements we made with the <STRONG>Cluster Network Name</STRONG>, changes made for when running <STRONG>Failover Clusters in Azure</STRONG>&nbsp;as IaaS virtual machines, and how <STRONG>Domain Migrations</STRONG> are no longer a pain point when moving between domains.</P> <P>&nbsp;</P> <P><IFRAME src="https://www.youtube.com/embed/JfFK17A1KEM" width="987" height="616" frameborder="0" allowfullscreen="allowfullscreen" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture"></IFRAME></P> <P>&nbsp;</P> <P><FONT size="5" color="#0000ff"><U><STRONG>Part 7</STRONG></U></FONT></P> <P>As a wrap up, we will take a look at a couple of announcements made and demonstrated at Microsoft Ignite 2018 regarding <STRONG>IOPs</STRONG> and <STRONG>Capacity</STRONG>.</P> <P>&nbsp;</P> <P><IFRAME src="https://www.youtube.com/embed/G4Opr3IbKoA" width="987" height="616" frameborder="0" allowfullscreen="allowfullscreen" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture"></IFRAME></P> <P>&nbsp;</P> <P>My hope is that you enjoy these videos and get a good understanding of our roadmap from Windows Server 2016 to Windows Server 2019 from a Failover Clustering standpoint.&nbsp; If there are any questions about any of these features, hit me up on Twitter (below).</P> <P>&nbsp;</P> <P>Thanks</P> <P>John Marlin</P> <P>Senior Program Manager</P> <P>High Availability and Storage</P> <P>Twitter:&nbsp;<A href="#" target="_self">@JohnMarlin_MSFT</A></P> Thu, 05 Dec 2019 00:59:48 GMT https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/windows-server-2019-failover-clustering-new-features/ba-p/544029 John Marlin 2019-12-05T00:59:48Z List of Failover Cluster Events in Windows 2016/2019 https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/list-of-failover-cluster-events-in-windows-2016-2019/ba-p/447150 <P>Let me tell you a story about myself and one of my asks while I was still in support.&nbsp;</P> <P>&nbsp;</P> <P>We
always thought it would be nice to have a listing of all Failover Clustering events for reference.&nbsp; Our customers ask for it, we ask for it.&nbsp;&nbsp; So logically, we approached the Product Group and asked for it.&nbsp; The response back wasn't what we really wanted to hear, which was, <EM>"<FONT color="#0000FF">It's in the code that you can pull out but will take you some time to piece it all together</FONT>"</EM>.&nbsp; Again, not what we wanted to hear, and we made this ask on multiple occasions.</P> <P>&nbsp;</P> <P>Then, little Johnny joined the big boys in the Product Group and became the PM owner of Failover Clustering infrastructure.&nbsp; Now, everyone keeps asking me for it.&nbsp; My response?&nbsp;</P> <P>&nbsp;</P> <P><EM>"<FONT color="#0000FF">It's in the code that you can pull out but will take you some time to piece it all together</FONT>"</EM></P> <P>&nbsp;</P> <P>Awfully hypocritical of me, but when I did take a look at it, yes, it would take a while to do (and I'm not talking about a day or so).&nbsp; I could see why we gave that response previously.</P> <P>&nbsp;</P> <P>Here's a bit of trivia I bet you did not know.</P> <P>&nbsp;</P> <P>Q: How many events are there for Failover Clustering?</P> <P>A:&nbsp;Between the System event log and the Microsoft-Windows-FailoverClustering/Operational channel, there are a total of 388 events in Windows 2019.</P> <P>&nbsp;</P> <P>&nbsp;While in Support, I did not realize how many people were asking for it.&nbsp; I was getting hit up from all over the place, both internally and externally.&nbsp; I was starting to look a lot like:</P> <P>&nbsp;</P> <P><span class="lia-inline-image-display-wrapper lia-image-align-center" image-alt="Cluster-Events-1.png" style="width: 200px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/108484i463693A40E54460D/image-size/small?v=v2&amp;px=200" role="button" title="Cluster-Events-1.png" alt="Cluster-Events-1.png" /></span></P> <P>&nbsp;</P> <P>Finally, I decided that I was going to do it.&nbsp; So I rolled up my sleeves, and a couple of weeks later, it was finally complete.&nbsp; FINALLY!!!!&nbsp; The spreadsheet with the events is attached to this blog.&nbsp; My hope is that you can make good use of it and that it is what you have been asking for.</P> <P>&nbsp;</P> <P>To explain it a bit, this list is for Windows 2016 and 2019 Failover Clustering.&nbsp; Many of these same events are in previous versions.&nbsp; We have not removed any events, only added with each version.&nbsp; I have separated it into two tabs, one for Windows 2016 and the other for the Windows 2019 new events.&nbsp;</P> <P>&nbsp;</P> <OL> <LI>There are a few duplicate event IDs.&nbsp; That is by design.&nbsp; The description of the event is going to depend on the call made, so they may differ slightly.&nbsp;&nbsp;</LI> <LI>I sorted it all by the severity levels.&nbsp; Feel free to sort however you wish.</LI> <LI>You may notice '%1', '%2', etc values in the description.&nbsp; When there is an event, we collect the values such as resource, group, etc as a variable and substitute the variable in the actual description.</LI> <LI>You may notice '%n' in some of the descriptions.&nbsp; I left those in the spreadsheet; they are carriage returns.&nbsp; I didn't want to replace them as it would distort things if you wanted to sort differently.</LI> </OL> <P>Is there a moral of the story for little Johnny?&nbsp; Don't know.&nbsp; But now that it is done, he feels much better.</P> <P><span class="lia-inline-image-display-wrapper
lia-image-align-center" image-alt="Cluster-Events-2.png" style="width: 256px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/108485i2EF3A4E1709C324A/image-size/medium?v=v2&amp;px=400" role="button" title="Cluster-Events-2.png" alt="Cluster-Events-2.png" /></span></P> <P>&nbsp;</P> <P>Happy Clustering !!!</P> <P>&nbsp;</P> <P>John Marlin</P> <P>Senior Program Manager</P> <P>High Availability and Storage</P> <P>&nbsp;</P> <P>&nbsp;</P> Thu, 26 Sep 2019 23:51:30 GMT https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/list-of-failover-cluster-events-in-windows-2016-2019/ba-p/447150 John Marlin 2019-09-26T23:51:30Z Optimizing Hyper-V Live Migrations on an Hyperconverged Infrastructure https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/optimizing-hyper-v-live-migrations-on-an-hyperconverged/ba-p/396609 <P>We have been doing some extensive testing in how to best configure Hyper-V live migration to achieve the best performance and highest level of availability.</P> <H2>Recommendations:</H2> <OL> <LI>Configure Live Migration to use SMB. This <STRONG>must be performed on all nodes</STRONG></LI> </OL> <PRE>&nbsp; &nbsp; <FONT color="#0000ff">Set-VMHost –VirtualMachineMigratio<SPAN style="font-family: inherit;">nPerformanceOption SMB</SPAN></FONT></PRE> <P>&nbsp;</P> <OL start="2"> <LI>Use RDMA enabled NIC’s to offload the CPU and improve network performance</LI> </OL> <P>&nbsp;</P> <OL start="3"> <LI>Configure SMB Bandwidth Limits to ensure live migrations do not saturate the network and throttle to 750 MB. This <STRONG>must be performed on all nodes</STRONG></LI> </OL> <P>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; First install the SMB Bandwidth Limits feature:</P> <PRE>&nbsp; <FONT color="#0000ff">Add-WindowsFeature -Name FS-SMBBW</FONT></PRE> <P>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Throttle to 750 MB</P> <PRE>&nbsp; &nbsp; &nbsp;<FONT color="#0000ff">Set-SmbBandwidthLimit -Category LiveMigration -BytesPerSecond 750MB</FONT></PRE> <P>&nbsp;</P> <OL start="4"> <LI>Configure a maximum of 2 simultaneous Live migrations (which is default). 
This <STRONG>must be performed on all nodes</STRONG></LI> </OL> <P>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Leave at the default value of 2; no changes required</P> <PRE>&nbsp; &nbsp; &nbsp;<FONT color="#0000ff">Set-VMHost -MaximumVirtualMachineMigrations 2</FONT></PRE> <P>&nbsp;</P> <H2>Background:</H2> <P>For those that want to understand the ‘why’ on the above recommendations, read on!</P> <P>&nbsp;</P> <P>What a live migration fundamentally does is take the memory allocated to a virtual machine and copy it over the network from one server to another.&nbsp; Let’s say you allocated 4 GB of memory to a virtual machine; when you invoke a live migration, that 4 GB of memory is copied over the network between the source and destination servers.&nbsp; Because the VM is running, that memory is changing while the 4 GB is copied.&nbsp; Those changes are tracked, and once the initial allocated memory is copied, a second pass occurs and the changed memory is copied.&nbsp; In the second pass, the amount of changed memory is smaller and takes less time to copy; yet memory is still changing while that happens.&nbsp; So a third pass happens, and so on, with each pass getting faster and the delta of changed memory getting smaller.&nbsp; Eventually the set of memory gets small enough: the VM is paused, the final set of changes is copied over, and the VM is resumed on the new server.&nbsp; While the VM is paused and the final memory copy occurs, the VM is not available; this is referred to as the blackout window.&nbsp; This is not unique to Hyper-V; all virtualization platforms have this.&nbsp; The magic of a live migration is that as long as the blackout window is within the TCP reconnect window, it is completely seamless to the applications.&nbsp; That’s how a live migration achieves zero downtime from an application perspective, even though there is a very small amount of downtime from an infrastructure perspective.&nbsp; Don’t get hung up on the blackout window (if it is within the TCP reconnect window); it’s all about the app!</P> <P>&nbsp;</P> <P>Live migration supports TCP/IP, Compression, and SMB as transports.&nbsp; Nearly all hyperconverged infrastructure (HCI) systems have RDMA-enabled network cards, and Server Message Block (SMB) has a feature called SMB Direct which can take advantage of RDMA.&nbsp; Using SMB as the protocol for the memory copy over the network results in drastically reduced CPU overhead to conduct the data copy, with the best network performance.&nbsp; This is important to minimize consuming CPU cycles from other running virtual machines and to keep the data copy windows small so that the number of passes to copy changed memory is minimized.&nbsp; Another feature of SMB is SMB Multichannel, which will stream the live migration across multiple network interfaces to achieve even better performance.</P> <P>&nbsp;</P> <P>An HCI system is a distributed system that is heavily dependent on reliable networking, as there is cluster communication and data replication also occurring over the network.&nbsp; From a network perspective, a live migration is a sudden burst of heavy network traffic.&nbsp; Using SMB bandwidth limits to achieve Network Quality of Service (QoS) is desired to keep this burst traffic from saturating the network and negatively impacting other aspects of the system.&nbsp; Our testing evaluated different bandwidth limits on dual 10 Gbps RDMA-enabled NICs, measured failures under stress conditions, and found
that throttling live migration to 750 MB achieved the highest level of availability to the system.&nbsp; On a system with higher bandwidth, you may be able to throttle to a value higher than 750 MB.</P> <P>&nbsp;</P> <P>When draining a node in an HCI system, multiple VMs can be live migrated at the same time.&nbsp; This parallelization can achieve faster overall times when moving large numbers of VMs off a host.&nbsp; As an example, instead of copying just 4 GB for a single machine, it will copy 4 GB for one VM and 4 GB for another VM.&nbsp; But there is a sweet spot: a single live migration at a time serializes and results in longer overall times, and having too many simultaneous live migrations can end up taking much longer.&nbsp; Remember that if the network becomes saturated with many large copies, each one takes longer…&nbsp; which means more memory is changing on each, which means more passes, and results in overall longer times for each. &nbsp;Two simultaneous live migrations were found to deliver the best balance in combination with a 750 MB throttle.</P> <P>&nbsp;</P> <P>Lastly, a live migration will not continue to make pass after pass indefinitely; on a very busy VM with a slow interconnect, live migration will eventually give up, freeze the VM, and make a final memory copy.&nbsp; This can result in longer blackout windows, and if that final copy exceeds the TCP reconnect window, it can impact the apps.&nbsp; This is why ensuring live migrations can complete in a timely manner is important.</P> <P>&nbsp;</P> <P>In our internal testing, we have found that these recommended settings achieve the fastest times to drain multiple VMs off of a server, the smallest blackout windows for application availability, the least impact to production VMs, and the greatest levels of availability for the infrastructure.</P> <P>&nbsp;</P> <P>Elden Christensen</P> <P>Principal Program Manager</P> <P>High Availability and Storage Team</P> Thu, 05 Sep 2019 15:12:35 GMT https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/optimizing-hyper-v-live-migrations-on-an-hyperconverged/ba-p/396609 John Marlin 2019-09-05T15:12:35Z So what exactly is the CLIUSR account https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/so-what-exactly-is-the-cliusr-account/ba-p/388832 <P>From time to time, people stumble across the local user account called CLIUSR and wonder what it is. While you really don’t need to worry about it, we will cover it for the curious in this blog.</P> <P>&nbsp;</P> <P>The CLIUSR account is a local user account created by the Failover Clustering feature when it is installed on Windows Server 2012 or later. Well, that’s easy enough, but why is this account here? Taking a step back, let’s take a look at why we are using this account.</P> <P>&nbsp;</P> <P>In Windows Server 2003 and previous versions of the Cluster Service, a domain user account was used to start the Cluster Service. This Cluster Service Account (CSA) was used for forming the Cluster, joining a node, registry replication, etc. Basically, any kind of authentication that was done between nodes used this user account as a common identity.</P> <P>&nbsp;</P> <P>A number of support issues were encountered as domain administrators were pushing down group policies that stripped rights away from domain user accounts, not taking into consideration that some of those user accounts were used to run services.
An example of this is the Logon as a Service right. If the Cluster Service account did not have this right, it was not going to be able to start the Cluster Service. If you were using the same account for multiple clusters, then you could incur production downtime across a number of critical systems. You also had to deal with password changes in Active Directory. If you changed the user account's password in AD, you also needed to change passwords across all Clusters/nodes that use the account.</P> <P>&nbsp;</P> <P>In Windows Server 2008, we learned and redesigned everything about the way we start the service to make it more resilient, less error-prone, and easier to manage. We started using the built-in Network Service to start the Cluster Service. Keep in mind that this is not the full-blown account, simply a reduced privilege set. Changing it to this reduced account was a solution for the group policy issues.</P> <P>&nbsp;</P> <P>For authentication purposes, it was switched over to use the computer object associated with the Cluster Name, known as the Cluster Name Object (CNO), for a common identity. Because this CNO is a machine account in the domain, it will automatically rotate the password as defined by the domain’s policy for you (which is every 30 days by default).</P> <P>&nbsp;</P> <P>Great!! No more domain user account and its password changes to account for. No more trying to remember which Cluster was using which account. Yes!! Ah, not so fast, my friend. While this solved some major pain, it did have some side effects.</P> <P>&nbsp;</P> <P>Starting in Windows Server 2008 R2, admins started virtualizing everything in their datacenters, including domain controllers. Cluster Shared Volumes (CSV) was also introduced and became the standard for private cloud storage. Some admins completely embraced virtualization and virtualized every server in their datacenter, going so far as to add domain controllers as virtual machines to a Cluster and utilize the CSV drive to hold the VHD/VHDX of the VM.</P> <P>&nbsp;</P> <P>This created a “chicken or the egg” scenario that many companies ended up in. In order to mount the CSV drive to get to the VMs, you had to contact a domain controller to get the CNO. However, you couldn’t start the domain controller because it was running on the CSV.</P> <P>&nbsp;</P> <P>Having slow or unreliable connectivity to domain controllers also had an effect on I/O to CSV drives. CSV does intra-cluster communication via SMB, much like connecting to file shares. To connect with SMB, it needs to authenticate, and in Windows Server 2008 R2, that involved authenticating the CNO with a remote domain controller.</P> <P>&nbsp;</P> <P>For Windows Server 2012, we had to think about how we could take the best of both worlds and get around some of the issues we were seeing. We are still using the reduced Network Service privilege to start the Cluster Service, but now, to remove all external dependencies, we have a local (non-domain) user account for authentication between the nodes.</P> <P>&nbsp;</P> <P>This local “user” account is not an administrative account or domain account. This account is automatically created for you on each of the nodes when you create a cluster or on a new node being added to the existing Cluster. This account is completely self-managed by the Cluster Service, which automatically handles rotating the account's password and synchronizing it across all the nodes for you.
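</P> <P>&nbsp;</P> <P>If you are curious what this account looks like on a node, here is a minimal sketch for viewing it with PowerShell (assuming a Windows Server version that ships the Microsoft.PowerShell.LocalAccounts module; treat this as illustrative):</P> <PRE><FONT color="#0000ff"># View the CLIUSR account and its description
Get-LocalUser -Name CLIUSR | Select-Object Name, Description, Enabled</FONT></PRE> <P>&nbsp;</P> <P>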
The CLIUSR password is rotated at the same frequency as the CNO, as defined by your domain policy (which is every 30 days by default). With it being a local account, it can authenticate and mount CSV so the virtualized domain controllers can start successfully. You can now virtualize all your domain controllers without fear. So we are increasing the resiliency and availability of the Cluster by reducing external dependencies.</P> <P>&nbsp;</P> <P>This account is the CLIUSR account and is identified by its description.</P> <P>&nbsp;</P> <P><span class="lia-inline-image-display-wrapper lia-image-align-center" image-alt="cliusr-1.png" style="width: 622px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/100141iB84A3A53348BD68D/image-size/large?v=v2&amp;px=999" role="button" title="cliusr-1.png" alt="cliusr-1.png" /></span></P> <P>&nbsp;</P> <P><SPAN>One question that we get asked is whether the CLIUSR account can be deleted. From a security standpoint, additional local accounts (not default) may get flagged during audits. If the network administrator isn’t sure what this account is for (i.e. they don’t read the description of “Failover Cluster Local Identity”), they may delete it without understanding the ramifications. For Failover Clustering to function properly, this account is necessary for authentication.</SPAN></P> <P>&nbsp;</P> <P><span class="lia-inline-image-display-wrapper lia-image-align-center" image-alt="cliusr-2.png" style="width: 588px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/100142i9E131CEF364F689D/image-size/large?v=v2&amp;px=999" role="button" title="cliusr-2.png" alt="cliusr-2.png" /></span></P> <P>&nbsp;</P> <OL> <LI>The joining node starts the Cluster Service and passes the CLIUSR credentials across.</LI> <LI>Everything passes, so the node is allowed to join.</LI> </OL> <P>&nbsp;</P> <P>There is one extra safeguard we added to ensure continued success. If you accidentally delete the CLIUSR account, it will be recreated automatically when a node tries to join the Cluster.</P> <P>&nbsp;</P> <P>Short story… the CLIUSR account is an internal component of the Cluster Service. It is completely self-managing and there is nothing you need to worry about regarding configuring and managing it. So leave it alone and let it do its job.</P> <P>&nbsp;</P> <P>In Windows Server 2016, we will be taking this a step even further by leveraging certificates to allow Clusters to operate without any external dependencies of any kind. This allows you to create Clusters out of servers that reside in different domains or no domains at all.
But that’s a blog for another day.</P> <P>&nbsp;</P> <P>Also, please be aware that there are security guides/blogs out there with settings that can block local non-administrative accounts from doing certain things.&nbsp; For example, this blog explains:</P> <P>&nbsp;</P> <P><EM>Blocking Remote Use of Local Accounts</EM><BR /><EM><A href="#" target="_blank" rel="noopener">https://blogs.technet.microsoft.com/secguide/2014/09/02/blocking-remote-use-of-local-accounts</A></EM></P> <P>&nbsp;</P> <P>In this blog, it talks about&nbsp;"Deny access to this computer from the network" with the default generic security identifier of S-1-5-113 (NT AUTHORITY\Local account).&nbsp; This will cause the Cluster Service not to be able to do what it needs to do (joins, management, etc).&nbsp; CLIUSR is a local account without administrative rights, so it really cannot do anything to the system in a disruptive manner.&nbsp; The blog goes on to explain what you should use instead, which is the generic identifier S-1-5-114 (NT AUTHORITY\Local account and member of Administrators group).&nbsp; This way, Cluster can still perform as Cluster should.</P> <P>&nbsp;</P> <P>As an FYI, this is not the only group policy.&nbsp; Others could include, but are not limited to:</P> <P>&nbsp;</P> <UL> <LI>Access this computer over the network</LI> <LI>Deny log on locally</LI> <LI>Deny log on locally as a service</LI> </UL> <P>&nbsp;</P> <P>Hopefully, this answers any questions you have regarding the CLIUSR account and its use.</P> <P>&nbsp;</P> <P>Happy Clustering !!!</P> <P>&nbsp;</P> <P>John Marlin</P> <P>Senior Program Manager</P> <P>High Availability and Storage Team</P> <P>Twitter:&nbsp;@JohnMarlin_MSFT</P> Tue, 19 Nov 2019 22:44:50 GMT https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/so-what-exactly-is-the-cliusr-account/ba-p/388832 John Marlin 2019-11-19T22:44:50Z No such thing as a Heartbeat Network https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/no-such-thing-as-a-heartbeat-network/ba-p/388121 <P>This is a blog that has been a long time coming.&nbsp; From time to time, we get a request about how to configure networking in Failover Clusters.&nbsp; One of the questions we get is how the heartbeat network should be configured, and that is the focus of this blog.&nbsp; I am here to say that there is no such thing as a heartbeat network, and there never was.</P> <P>&nbsp;</P> <P>Please allow me to give a little background and explain.</P> <P>&nbsp;</P> <P>In Windows 2003 and earlier Failover Clustering, you could define which network was used for Cluster Communication.&nbsp; Below is a picture for reference.</P> <P>&nbsp;</P> <P style="text-align: center;"><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Heartbeat1.jpg" style="width: 372px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/99808i7A1BFFE52D37672F/image-size/large?v=v2&amp;px=999" role="button" title="Heartbeat1.jpg" alt="Heartbeat1.jpg" /></span></P> <P style="text-align: center;">&nbsp;</P> <P>In the picture above, we would want to select Private for our Cluster Communication so as to not use the Public network, which carries all the WAN traffic.&nbsp; All Cluster Communication between nodes (joins, registry updates/changes, etc) would go only over this network if it is up.&nbsp; As the picture shows, the networks are called Public and Private.&nbsp; As years went by, some started calling the Private network a Heartbeat network.</P> <P>&nbsp;</P> <P>Heartbeats are small packets (134 bytes) that travel over UDP Port 3343 on all
networks configured for Cluster use between all nodes.&nbsp; They serve multiple purposes.</P> <P>&nbsp;</P> <UL> <LI>Establish whether a Cluster Network is up or down</LI> <LI>Establish routes between nodes</LI> <LI>Determine whether the health of a node is good or bad</LI> </UL> <P>So let's say I have Private set as my priority network for Cluster Communications.&nbsp; If it is up, we are sending our communication through it.&nbsp; But what happens if that network isn't reliable?&nbsp; If a node tries to join and packets are dropping, then the join could fail.&nbsp; If this is the case, you either determine where the problem is and fix it, or go back into the Cluster properties and set the Public as priority.</P> <P>&nbsp;</P> <P>Starting in Windows 2008 Failover Clusters, the concept of Public and Private networks went out the window.&nbsp; We will now send Cluster Communication over any of our networks.&nbsp; One of the reasons for this was reliability.&nbsp; With that change, we also gave the heartbeats an additional purpose.</P> <P>&nbsp;</P> <UL> <LI>Determine the fastest and most reliable routes between nodes</LI> </UL> <P>Since we are now determining the fastest and most reliable routes, we could use different networks between nodes for our communication.&nbsp; Take the below as an example.</P> <P>&nbsp;</P> <P style="text-align: center;"><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Heartbeat2.jpg" style="width: 384px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/99831i2DACE0C54E9095EF/image-size/large?v=v2&amp;px=999" role="button" title="Heartbeat2.jpg" alt="Heartbeat2.jpg" /></span></P> <P style="text-align: center;">&nbsp;</P> <P>We have three individual networks between our nodes:</P> <P>&nbsp;</P> <UL> <LI><FONT color="#0000FF"><SPAN>Blue</SPAN></FONT><SPAN>&nbsp;</SPAN>– 10 Gbps, used for backups and administration only</LI> <LI><FONT color="#008000">Green</FONT><SPAN style="font-family: inherit;">&nbsp;</SPAN><SPAN style="font-family: inherit;">– 40 Gbps, used for communicating out on the WAN to clients</SPAN></LI> <LI><FONT color="#FF0000"><SPAN>Red</SPAN></FONT><SPAN>&nbsp;</SPAN>– 40 Gbps, used for communicating out on the WAN to clients</LI> </UL> <P>&nbsp;</P> <P>As a refresher, here is what the heartbeats are doing:</P> <P>&nbsp;</P> <UL> <LI>Establish whether a Cluster Network is up or down</LI> <LI>Establish routes between nodes</LI> <LI>Determine whether the health of a node is good or bad</LI> <LI>Determine the fastest and most reliable routes between nodes</LI> </UL> <P>What the heartbeats are going to tell the Cluster is to use one of the faster networks for its communication.&nbsp; With that being the case, it is going to use either the Red or the Green network.&nbsp; If the heartbeats start detecting that neither of these is reliable (e.g.
dropping packets, network congestion, etc.), it will automatically switch and use the Blue network.&nbsp; That's it; there is nothing extra for you to configure.</P> <P>&nbsp;</P> <P>So to wrap things up, remember these things about Failover Clusters and Heartbeats.</P> <P>&nbsp;</P> <OL> <LI>There is no such thing as a heartbeat network or a network dedicated to heartbeats</LI> <LI>Heartbeat packets are lightweight (134 bytes in size)</LI> <LI>Heartbeats are sensitive to latency</LI> <LI>Bandwidth is not an important factor; quality of service is.&nbsp; If your network is all teamed, ensure you have set up network QoS policies for UDP 3343 traffic.</LI> </OL> <P>For more information regarding configuring networks in a Cluster, please see the Microsoft Ignite session:</P> <P>&nbsp;</P> <P><A href="#" target="_self">Failover Clustering Networking Essentials</A></P> <P>&nbsp;</P> <P><SPAN>Happy Clustering !!!!</SPAN></P> <P>&nbsp;</P> <P><SPAN>John Marlin</SPAN></P> <P><SPAN>Senior Program Manager</SPAN></P> <P><SPAN>Microsoft Corporation</SPAN></P> <P><SPAN>Twitter:&nbsp;@Johnmarlin_MSFT</SPAN></P> Mon, 25 Mar 2019 22:49:09 GMT https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/no-such-thing-as-a-heartbeat-network/ba-p/388121 John Marlin 2019-03-25T22:49:09Z Windows Server 2016/2019 Cluster Resource / Resource Types https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/windows-server-2016-2019-cluster-resource-resource-types/ba-p/372163 <P><STRONG> First published on MSDN on Jan 16, 2019 </STRONG> <BR />Over the years, we have been asked about what some of the Failover Cluster resources/resource types are and what they do. There are several resources that have been asked about on multiple occasions and we haven't really had a good definition to point you to. Well, not anymore. <BR /><BR />What I want to do with this blog is define what they are, what they do, and when they were added (or removed). I am only going to cover the in-box resource types that come with Failover Clustering. But first, I wanted to explain what a cluster "resource" and "resource type" are. <BR /><BR />Cluster resources are physical or logical entities, such as a file share, disk, or IP Address, managed by the Cluster Service. The operating system does not distinguish between cluster and local resources. Resources may provide a service to clients or be an integral part of the cluster. Examples of resources would be physical hardware devices such as disk drives, or logical items such as IP addresses, network names, applications, and services. They are the basic and smallest configurable unit managed by the Cluster Service. A resource can only run on a single node in a cluster at a time. <BR /><BR />Cluster resource types are dynamic-link library (DLL) plug-ins. These Resource DLLs are responsible for carrying out most operations on cluster resources.
A resource DLL is characterized as follows: <BR /><BR /></P> <UL> <UL> <LI>It contains the resource-specific code necessary to provide high availability for instances of one or more resource types.</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>It exposes this code through a standard interface consisting of a set of <A href="#" target="_blank" rel="noopener"> entry point functions </A>.</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>It is registered with the Cluster service to associate one or more resource type names with the name of the DLL.</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>It is always loaded into a Resource Monitor's process when in use.</LI> </UL> </UL> <P><BR /><BR />When the Cluster service needs to perform an operation on a resource, it sends the request to the Resource Monitor assigned to the resource. If the Resource Monitor does not have a DLL in its process that can handle that type of resource, it uses the registration information to load the DLL associated with the resource type. It then passes the Cluster service's request to one of the DLL's entry point functions. The resource DLL handles the details of the operation so as to meet the specific needs of the resource. <BR /><BR />You can define your own resource types to provide customized support for cluster-unaware applications, enhanced support for cluster-aware applications, or specialized support for new kinds of devices. For more information, see <A href="#" target="_blank" rel="noopener"> Creating Resource Types </A>. <BR /><BR />All resource types that are available in a Failover Cluster can be seen by right-clicking the name of the Cluster, choosing Properties, and selecting the Resource Types tab (<EM>shown below</EM>). <BR /><BR /><span class="lia-inline-image-display-wrapper lia-image-align-inline" style="width: 525px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90678i40FBACDA7CEAE51B/image-size/large?v=v2&amp;px=999" role="button" /></span> <BR /><BR />You can also get a list by running the PowerShell command <A href="#" target="_blank" rel="noopener"> Get-ClusterResourceType </A> (a short example appears at the end of the Windows Server 2016-only section below). Please keep in mind that not all resource types may show up or be available to you. For example, if the Hyper-V role is not installed, the virtual machine resource types will not be available. <BR /><BR />So enough about this; let's get to the resource types, when they were available and, for some, when they were last seen. <BR /><BR />Since there are multiple versions of Windows Clustering, this blog will only focus on the two latest versions (Windows Server 2016 and 2019). <BR /><BR /><BR /><BR /><STRONG> Windows Server 2016 / 2019 </STRONG> <BR /><BR /><STRONG> Cloud Witness (clusres.dll): </STRONG> Cloud Witness is a quorum witness that leverages Microsoft Azure as the arbitration point. It uses Azure Blob Storage to read/write a blob file which is then used as an arbitration point in case of split-brain resolution. <BR /><BR /><STRONG> DFS Replicated Folder (dfsrclus.dll): </STRONG> Manages a Distributed File System (DFS) replicated folder. When creating a DFS, this resource type is configured to ensure proper replication occurs. For more information regarding this, please refer to the <A href="#" target="_blank" rel="noopener"> 3-part blog series </A> on the topic. <BR /><BR /><STRONG> DHCP Service (clnetres.dll): </STRONG> The DHCP Service resource type supports the Dynamic Host Configuration Protocol (DHCP) Service as a cluster resource.
There can be only one instance of a resource of this type in the cluster (that is, a cluster can support only one DHCP Service). Dynamic Host Configuration Protocol (DHCP) is a client/server protocol that automatically provides an Internet Protocol (IP) host with its IP address and other related configuration information such as the subnet mask and default gateway. RFCs 2131 and 2132 define DHCP as an Internet Engineering Task Force (IETF) standard based on Bootstrap Protocol (BOOTP), a protocol with which DHCP shares many implementation details. DHCP allows hosts to obtain required TCP/IP configuration information from a DHCP server. <BR /><BR /><STRONG> Disjoint IPv4 Address (clusres.dll): </STRONG> An IPv4 resource type that can be used when setting up a site-to-site VPN gateway. It can only be configured by PowerShell, not by Failover Cluster Manager, the GUI tool on Windows Server (a short sketch follows the Distributed File System entry below). We added two IP addresses of this resource type, one for the internal network and one for the external network. <BR /><BR /></P> <UL> <UL> <LI>The internal address is plumbed down for the cluster network that is identified by Routing Domain ID and VLAN number. Remember, we mapped them to the internal network adapters on the Hyper-V hosts earlier. It should be noted that this address is the default gateway address for all machines on the internal network that need to connect to Azure.</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>The external address is plumbed down for the cluster network that is identified by the network adapter name. Remember, we renamed the external network adapter to “Internet” on both virtual machines.</LI> </UL> </UL> <P><BR /><BR /><STRONG> Disjoint IPv6 Address (clusres.dll): </STRONG> An IPv6 resource type that can be used when setting up a site-to-site VPN gateway. It can only be configured by PowerShell, not by Failover Cluster Manager, the GUI tool on Windows Server. We added two IP addresses of this resource type, one for the internal network and one for the external network. <BR /><BR /></P> <UL> <UL> <LI>The internal address is plumbed down for the cluster network that is identified by Routing Domain ID and VLAN number. Remember, we mapped them to the internal network adapters on the Hyper-V hosts earlier. It should be noted that this address is the default gateway address for all machines on the internal network that need to connect to Azure.</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>The external address is plumbed down for the cluster network that is identified by the network adapter name. Remember, we renamed the external network adapter to “Internet” on both virtual machines.</LI> </UL> </UL> <P><BR /><BR /><STRONG> Ras Cluster Resource (rasclusterres.dll): </STRONG> This resource object specifies where the site-to-site VPN configuration is stored. The file share can be anywhere the two virtual machines have read/write access to. It can only be configured by PowerShell, not by Failover Cluster Manager, the GUI tool on Windows Server. This resource type is only available after installing the VPN Roles in Windows Server. <BR /><BR /><STRONG> Distributed File System (clusres2.dll): </STRONG> Manages a Distributed File System (DFS) as a cluster resource. When creating a DFS, this resource type is configured to ensure proper replication occurs. For more information regarding this, please refer to the <A href="#" target="_blank" rel="noopener"> 3-part blog series </A> on the topic.
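<BR /><BR />Since a few of the types above (the Disjoint IPv4/IPv6 Address resources, for example) can only be configured through PowerShell, here is a minimal, hypothetical sketch of creating a resource of a given type with the Add-ClusterResource cmdlet. The resource and group names below are made up for illustration, and you should confirm the exact resource type name on your build with Get-ClusterResourceType first: <BR /><BR /><PRE><FONT color="#0000ff"># Hypothetical example: create a resource of a specific type in an existing group
Add-ClusterResource -Name "InternalAddress" -ResourceType "Disjoint IPv4 Address" -Group "MyVpnGroup"</FONT></PRE>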
<BR /><BR /><STRONG> Distributed Transaction Coordinator (mtxclu.dll): </STRONG> The Distributed Transaction Coordinator (DTC) resource type supports the Microsoft Distributed Transaction Coordinator (MSDTC). MSDTC is a Windows service providing transaction infrastructure for distributed systems, such as SQL Server. In this case, a transaction means a general way of structuring the interactions between autonomous agents in a distributed system. <BR /><BR /><STRONG> File Server (clusres2.dll): </STRONG> Manages the shares that are created as highly available. A file share is a location on the network where clients connect to access data, including documents, programs, images, etc. <BR /><BR /><STRONG> File Share Witness (clusres.dll): </STRONG> A File Share Witness is a witness (quorum) resource and is simply a file share created on a completely separate server from the cluster for tie-breaker scenarios when quorum needs to be established. A File Share Witness does not store cluster configuration data like a disk. It does, however, contain information about which version of the cluster configuration database is most recent. <BR /><BR /><STRONG> Generic Application (clusres2.dll): </STRONG> The Generic Application resource type manages <A href="#" target="_blank" rel="noopener"> cluster-unaware applications </A> as cluster resources, as well as <A href="#" target="_blank" rel="noopener"> cluster-aware applications </A> that are not associated with custom resource types. The Generic Application resource DLL provides only very basic application control. For example, it checks for application failure by determining whether the application's process still exists and takes the application offline by terminating the process. <BR /><BR /><STRONG> Generic Script (clusres2.dll): </STRONG> The Generic Script resource type works in conjunction with a script that you must provide to manage an application or service as a highly available cluster resource. In effect, the Generic Script resource type allows you to script your own resource DLL. For more information on how to use the Generic Script resource type, see <A href="#" target="_blank" rel="noopener"> Using the Generic Script Resource Type </A> . <BR /><BR /><STRONG> Generic Service (clusres2.dll): </STRONG> The Generic Service resource type manages services as cluster resources. Similar to the Generic Application resource type, the Generic Service resource type provides only the most basic functionality. For example, the failure of a Generic Service resource is determined by a query of the Service Control Manager (SCM). If the service is running, it is presumed to be online. To provide greater functionality, you can define a custom resource type (for information, see <A href="#" target="_blank" rel="noopener"> Creating Resource Types </A> ). <BR /><BR />A generic service resource type is usually used to manage a stateless service as a cluster resource, which can be failed over. However, generic services don't provide much state information other than their online state, so if they have an issue that doesn't cause the resource to go offline, it is more difficult to detect a service failure. <BR /><BR />Generic services should only be used when all of the following conditions are true; otherwise, you should create a resource DLL. <BR /><BR /></P> <UL> <UL> <LI>The resource is not a device. 
The generic resource types are not designed to manage hardware.</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>The resource is stateless.</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>The resource is not dependent on other resources.</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>The resource does not have unique attributes that should be managed with private properties.</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>The resource does not have special functionality that should be exposed through control codes.</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>The resource can be started and stopped easily without using special procedures.</LI> </UL> </UL> <P><BR /><BR /><STRONG> Health Service (healthres.dll): </STRONG> The Health Service constantly monitors your Storage Spaces Direct cluster to detect problems and generate "faults". Through either Windows Admin Center or PowerShell, it displays any current faults, allowing you to easily verify the health of your deployment without looking at every entity or feature in turn. Faults are designed to be precise, easy to understand, and actionable. <BR /><BR />Each fault contains five important fields: <BR /><BR /></P> <UL> <UL> <LI>Severity</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>Description of the problem</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>Recommended next step(s) to address the problem</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>Identifying information for the faulting entity</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>Its physical location (if applicable)</LI> </UL> </UL> <P><BR /><BR /><STRONG> IP Address (clusres.dll): </STRONG> The IP Address resource type is used to manage Internet Protocol (IP) network addresses. When an IP Address resource is included in a group with a Network Name resource, the group can be accessed by network clients as a failover cluster instance (formerly known as a virtual server). <BR /><BR /><STRONG> IPv6 Address (clusres.dll): </STRONG> The IPv6 Address resource type is used to manage Internet Protocol version 6 (IPv6) network addresses. When an IPv6 Address resource is included in a group with a Network Name resource, the group can be accessed by network clients as a failover cluster instance (formerly known as a virtual server). <BR /><BR /><STRONG> IPv6 Tunnel Address (clusres2.dll): </STRONG> The IPv6 Tunnel Address resource type is used to manage Internet Protocol version 6 (IPv6) network tunnel addresses. When an IPv6 Tunnel Address resource is included in a group with a Network Name resource, the group can be accessed by network clients as a failover cluster instance (formerly known as a virtual server). <BR /><BR /><STRONG> iSCSI Target Server (wtclusres.dll): </STRONG> Creates a highly available iSCSI Target server for machines to connect to for drives. <BR /><BR /><STRONG> Microsoft iSNS (isnsclusres.dll): </STRONG> Manages an Internet Storage Name Service (iSNS) server. iSNS provides discovery services for Internet Small Computer System Interface (iSCSI) storage area networks. iSNS processes registration requests, deregistration requests, and queries from iSNS clients. We would recommend not using this resource type moving forward as it is <A href="#" target="_blank" rel="noopener"> being removed </A> from the product. <BR /><BR /><STRONG> MSMQ (mqclus.dll): </STRONG> Message Queuing (MSMQ) technology enables applications running at different times to communicate across heterogeneous networks and systems that may be temporarily offline.
Applications send messages to queues and read messages from queues. <BR /><BR /><STRONG> MSMQTriggers (mqtgclus.dll): </STRONG> Message Queuing triggers allow you to associate the arrival of incoming messages at a destination queue with the functionality of one or more COM components or stand-alone executable programs. These triggers can be used to define business rules that can be invoked when a message arrives at the queue without doing any additional programming. Application developers no longer have to write any infrastructure code to provide this kind of message-handling functionality. <BR /><BR /><STRONG> Network File System (nfsres.dll): </STRONG> The NFS cluster resource has a dependency on one Network Name resource and can also depend on one or more disk resources in a resource group. For a given network name resource, there can be only one NFS resource in a resource group. The dependent disk resource hosts one or more of the NFS shared paths. The shares hosted on an NFS resource are scoped to the dependent network name resources. Shares scoped to one network name are not visible to clients that mount using other network names or node names residing on the same cluster. <BR /><BR /><STRONG> Network Name (clusres.dll): </STRONG> The Network Name resource type is used to provide an alternate computer name for an entity that exists on a network. When included in a group with an IP Address resource, a Network Name resource provides an identity to the role, allowing the role to be accessed by network clients as a Failover Cluster instance. <BR /><BR /><STRONG> Distributed Network Name (clusres.dll): </STRONG> A Distributed Network Name is a name in the Cluster that does not use a clustered IP Address.&nbsp; It is a name that is published in DNS using the IP Addresses of all the nodes in the Cluster.&nbsp; Client connectivity to this type of name is reliant on DNS round robin.&nbsp; In Azure, this type of name can be used in lieu of needing an Internal Load Balancer (ILB) address.&nbsp; The predominant usage of a Distributed Network Name is with a Scale-Out File Server (discussed next).&nbsp; In Windows Server 2019, we added the ability for the Cluster Name Object (CNO) to use a DNN.&nbsp; For more information on the CNO usage as a Distributed Network Name, please refer to the&nbsp;<A href="https://gorovian.000webhostapp.com/?exam=t5/Failover-Clustering/Windows-Server-2019-Failover-Clustering-New-Features/ba-p/544029" target="_self">Windows Server 2019 Failover Clustering New Features</A> blog.</P> <P>&nbsp;</P> <P><STRONG>Scale Out File Server (clusres.dll): </STRONG> A Scale Out File Server (SOFS) is a clustered file share that can be accessed by any of the nodes.&nbsp; It uses the Distributed Network Name as the client access point and does not use a clustered IP Address.&nbsp; The Distributed Network Name was discussed previously.<BR /><BR /><STRONG> Physical Disk (clusres.dll): </STRONG> The Physical Disk resource type manages a disk on a shared bus connected to two or more cluster nodes. Some groups may contain one or more Physical Disk resources as dependencies for other resources in the group. On a <A href="#" target="_blank" rel="noopener"> Storage Spaces Direct </A> cluster, the disks are local to each of the nodes. <BR /><BR /><STRONG> Hyper-V Network Virtualization Provider Address (provideraddressresource.dll): </STRONG> The IP address assigned by the hosting provider or the datacenter administrators based on their physical network infrastructure.
The PA appears in the packets on the network that are exchanged with the server running Hyper-V that is hosting network virtualized virtual machine(s). The PA is visible on the physical network, but not to the virtual machines. <BR /><BR /><STRONG>Storage Pool (clusres.dll): </STRONG> Manages a storage pool resource.&nbsp; It allows for the creation and deletion of Storage Spaces virtual disks. <BR /><BR /><STRONG> Storage QoS Policy Manager (clusres.dll): </STRONG> A resource type for the Policy Manager that collects the performance of storage resources allocated to the individual highly available virtual machines. It monitors the activity to help ensure storage is used fairly within the I/O performance established through any policies that may be configured. <BR /><BR /><STRONG> Storage Replica (wvrres.dll): </STRONG> Storage Replica is Windows Server technology that enables replication of volumes between servers or clusters for disaster recovery. This resource type enables you to create stretch failover clusters that span two sites, with all nodes staying in sync. A Stretch Cluster allows configuration of computers and storage in a single cluster, where some nodes share one set of asymmetric storage and some nodes share another, then synchronously or asynchronously replicate with site awareness. By stretching clusters, workloads can be run in multiple datacenters for quicker data access by local proximity users and applications, as well as better load distribution and use of compute resources. <BR /><BR /><STRONG> Task Scheduler (clusres.dll): </STRONG> Task Scheduler is a resource that is tied to tasks you wish to run against the Cluster. Clustered tasks are not created or shown in Failover Cluster Manager. To create or view a Clustered Scheduled Task, you would need to use PowerShell. <BR /><BR /></P> <UL> <UL> <LI><A href="#" target="_blank" rel="noopener"> Set-ClusteredScheduledTask </A></LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI><A href="#" target="_blank" rel="noopener"> Register-ClusteredScheduledTask </A></LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI><A href="#" target="_blank" rel="noopener"> Get-ClusteredScheduledTask </A></LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI><A href="#" target="_blank" rel="noopener"> Unregister-ClusteredScheduledTask </A></LI> </UL> </UL> <P><BR /><BR /><STRONG> Virtual Machine (vmclusres.dll): </STRONG> The Virtual Machine resource type is used to control the state of a virtual machine (VM). The following table shows the mapping between the state of the VM (indicated by the EnabledState property of the <A href="#" target="_blank" rel="noopener"> Msvm_ComputerSystem </A> instance representing the VM) and the state of the Virtual Machine resource (indicated by the State property of the <A href="#" target="_blank" rel="noopener"> MSCluster_Resource </A> class or the return of the <A href="#" target="_blank" rel="noopener"> GetClusterResourceState </A> function).
<BR /><BR /><BR /></P> <TABLE> <TBODY> <TR> <TD>VM State</TD> <TD>Virtual Machine resource state</TD> </TR> <TR> <TD>Disabled</TD> <TD>3</TD> </TR> <TR> <TD>Offline</TD> <TD>3</TD> </TR> <TR> <TD>Suspended</TD> <TD>32769</TD> </TR> <TR> <TD>Starting</TD> <TD>32770</TD> </TR> <TR> <TD>Online Pending</TD> <TD>129</TD> </TR> <TR> <TD>Online</TD> <TD>2</TD> </TR> <TR> <TD>Stopping</TD> <TD>32774</TD> </TR> <TR> <TD>Offline Pending</TD> <TD>130</TD> </TR> <TR> <TD>Saving</TD> <TD>32773</TD> </TR> <TR> <TD>Enabled</TD> <TD>2</TD> </TR> <TR> <TD>Paused</TD> <TD>32768</TD> </TR> <TR> <TD>Pausing</TD> <TD>32776</TD> </TR> <TR> <TD>Resuming</TD> <TD>32777</TD> </TR> </TBODY> </TABLE> <P><BR /><STRONG> Virtual Machine Cluster WMI (vmclusres.dll): </STRONG> The Virtual Machine Cluster WMI resource type is one used when virtual machine grouping (also known as&nbsp;virtual machine sets)&nbsp;has been configured. By grouping virtual machines together, managing the "group" is much easier than managing all of the virtual machines individually. VM Groups enable checkpoints, backup, and replication of VMs that form a guest cluster and that use a Shared VHDX. <BR /><BR /><STRONG> Virtual Machine Configuration (vmclusres.dll): </STRONG> The Virtual Machine Configuration resource type is used to control the state of a virtual machine configuration. <BR /><BR /><STRONG> Virtual Machine Replication Broker (vmclusres.dll): </STRONG> The replication broker is a prerequisite if you are replicating clusters using Hyper-V Replica. It acts as the point of contact for any replication requests, and can query the associated cluster database to decide which node is the correct one for redirecting VM-specific events such as Live Migration requests. The broker also handles authentication requests on behalf of the VMs. A new node can be added or removed from a cluster at any point, without the need to reconfigure the replication, as the communication between the primary and recovery clusters is directed to the respective brokers. <BR /><BR /><STRONG> Virtual Machine Replication Coordinator (vmclusres.dll): </STRONG> The Coordinator comes into the picture when we use the concept of a “collection” in Hyper-V Replica. This was introduced in Windows Server 2016 and is a prerequisite if you are using some of the latest features, for example, shared virtual hard disks. When VMs are replicated as part of a collection, the replication broker coordinates actions/events that affect the VM group – for example, taking a point-in-time snapshot which is app-consistent, applying the replication settings, modifying the interval for replication, etc. – and propagates the change across all the VMs in the collection. <BR /><BR /><STRONG> WINS Service (clnetres.dll): </STRONG> The WINS Service resource type supports the Windows Internet Name Service (WINS) as a cluster resource. There can be only one instance of a resource of this type in the cluster; in other words, a cluster can support only one WINS Service. Windows Internet Name Service (WINS) is a legacy computer name registration and resolution service that maps computer NetBIOS names to IP addresses. <BR /><BR /><BR /><BR /><STRONG> Windows Server 2016 only </STRONG> <BR /><BR /><STRONG> Cross Cluster Dependency Orchestrator (clusres.dll): </STRONG> This is a resource type that you can ignore; it does not do anything. It was to be a new feature, but it never came to fruition and the resource type was not removed. It is removed in Windows Server 2019.
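<BR /><BR />Before moving on to the Windows Server 2019-only types, here is the promised example of dumping the resource types that are actually registered on your own cluster, a minimal sketch using the Get-ClusterResourceType cmdlet mentioned at the top of this post:<BR /><BR /><PRE><FONT color="#0000ff"># List every registered resource type on the local cluster
Get-ClusterResourceType | Sort-Object Name | Format-Table Name, DisplayName -AutoSize</FONT></PRE>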
<BR /><BR /><BR /><BR /><STRONG> Windows Server 2019 only </STRONG> <BR /><BR /><STRONG> SDDC Management (sddcres.dll): </STRONG> SDDC Management is installed when the cluster is enabled for Storage Spaces Direct. It is the management API that Windows Admin Center uses to connect to and manage your Storage Spaces Direct deployment. It is an in-box resource type with Windows Server 2019 and is a download and manual addition to Windows Server 2016. For information regarding this, please refer to the <A href="#" target="_blank" rel="noopener"> Manage Hyper-Converged Infrastructure with Windows Admin Center </A> document. <BR /><BR /><STRONG> Scaleout Worker (scaleout.dll): </STRONG> This is used for Cluster Sets.&nbsp; In a Cluster Set deployment, the CS-Master interacts with a new cluster resource on the member Clusters called “Cluster Set Worker” (CS-Worker). CS-Worker acts as the only liaison on the cluster to orchestrate the local cluster interactions as requested by the CS-Master. Examples of such interactions include VM placement and cluster-local resource inventorying. There is only one CS-Worker instance for each of the member clusters in a Cluster Set. <BR /><BR /><STRONG> Scaleout Master (scaleout.dll): </STRONG> This is also used for Cluster Sets. In a Cluster Set, the communication between the member clusters is loosely coupled, and is coordinated by a new cluster resource called “Cluster Set Master” (CS-Master). Like any other cluster resource, CS-Master is highly available and resilient to individual member cluster failures and/or the management cluster node failures. Through a new Cluster Set WMI provider, CS-Master provides the management endpoint for all Cluster Set manageability interactions. <BR /><BR /><STRONG> Infrastructure File Server (clusres.dll): </STRONG> In hyper-converged configurations, an Infrastructure SOFS allows an SMB client (Hyper-V host) to communicate with guaranteed Continuous Availability (CA) to the Infrastructure SOFS SMB server. This hyper-converged SMB loopback CA is achieved via VMs accessing their virtual disk (VHDx) files where the owning VM identity is forwarded between the client and server. This identity forwarding allows ACL-ing VHDx files just as in standard hyper-converged cluster configurations as before. There can be at most one Infrastructure SOFS cluster role on a Failover Cluster. Each CSV volume created in the failover cluster automatically triggers the creation of an SMB Share with an auto-generated name based on the CSV volume name. An administrator cannot directly create or modify SMB shares under an Infrastructure SOFS role, other than via CSV volume create/modify operations. This role is commonly used with Cluster Sets. <BR /><BR /><BR /><BR />Thanks <BR />John Marlin <BR />Senior Program Manager <BR />High Availability and Storage <BR /><BR />Follow me on Twitter <A href="#" target="_blank" rel="noopener"> @JohnMarlin_MSFT </A></P> Tue, 03 Dec 2019 21:16:08 GMT https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/windows-server-2016-2019-cluster-resource-resource-types/ba-p/372163 John Marlin 2019-12-03T21:16:08Z Microsoft Ignite 2018 Clustering Sessions available https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/microsoft-ignite-2018-clustering-sessions-available/ba-p/372161 <HTML> <HEAD></HEAD><BODY> <STRONG> First published on MSDN on Nov 01, 2018 </STRONG> <BR /> For those who attended Microsoft Ignite 2018 in Orlando, Florida, we thank you for making it another huge success.
<BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90675i5D9E339C94BFA391" /> <BR /> <BR /> So much fun was had by all.&nbsp; We had the privilege of showing you what is new and coming in Windows Server 2019 with 700+ deep dive sessions and over 100 workshops. <BR /> <BR /> You got the latest insights and skills from technology leaders and practitioners shaping the future of cloud, data, business intelligence, teamwork, and productivity; immersed yourselves in the latest tools, tech, and experiences that matter; and heard the latest updates and ideas directly from the experts.&nbsp; There were demos galore throughout all the sessions. <BR /> <BR /> Who can forget the demo showing these previously unheard-of numbers running Windows Server 2019 Storage Spaces Direct and <A href="#" target="_blank"> Intel's </A> Optane DC Persistent Memory: <BR /> <BR /> <A href="#" target="_blank"> </A> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90676i87E47C48B629EB2A" /> <A href="#" target="_blank"> </A> <BR /> <BR /> Or, the storage limit increase to 4 petabytes.&nbsp; We are not just saying it because it's a big number; we showed it with the help of our friends at <A href="#" target="_blank"> Quanta Cloud Technology </A> , <A href="#" target="_blank"> Seagate </A> , and <A href="#" target="_blank"> Samsung </A> . <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90677iF666CC6A48A33028" /> <BR /> <BR /> In case you missed Ignite, attended but missed a session, or wish to view the sessions again, here is the link to all the sessions available for your viewing pleasure, both from the Microsoft Ignite pages and YouTube. <BR /> <BR /> To kick it all off, here is Satya Nadella's keynote from Microsoft Ignite 2018. <BR /> <P> Vision Keynote <BR /> <A href="#" target="_blank"> Ignite </A> , <A href="#" target="_blank"> YouTube </A> <BR /> Satya Nadella - Chief Executive Officer of Microsoft </P> <BR /> Since this is the Failover Clustering blog, I wanted to call out these sessions specifically for what we are doing in the hyper-converged infrastructure (HCI) space. <BR /> <P> BRK2035 - Windows Server 2019: What's new and what's next <BR /> <A href="#" target="_blank"> Ignite </A> , <A href="#" target="_blank"> YouTube </A> <BR /> Erin Chapple, Vijay Kumar <BR /> Windows Server is a key component in Microsoft's hybrid and on-premises strategy and in this session, hear what's new in Windows Server 2019. Join us as we discuss the product roadmap, Semi-Annual Channel, and demo some exciting new features. </P> <BR /> <P> BRK2241 - Windows Server 2019 deep dive <BR /> <A href="#" target="_blank"> Ignite </A> , <A href="#" target="_blank"> YouTube </A> <BR /> Jeff Woolsey <BR /> Hybrid at its core. Secure by design. With cloud application innovation and hyper-converged infrastructure built into the platform, backed by the world's most trusted cloud, Azure, Microsoft presents Windows Server 2019. In this session Jeff Woolsey - Principal Program Manager - dives into the details of what makes Windows Server 2019 an exciting platform for IT pros and developers looking into modernizing their infrastructure and applications.
</P> <BR /> <P> BRK2232 - Jumpstart your hyper-converged infrastructure deployment with Windows Server <BR /> <A href="#" target="_blank"> Ignite </A> , <A href="#" target="_blank"> YouTube </A> <BR /> Elden Christensen, Steven Ekren <BR /> The time is now to adopt hyper-converged infrastructure and Storage Spaces Direct. Where to start? This session covers design considerations and best practices, how to choose and procure the best hardware, sizing and planning, deployment, and how to validate your cluster is ready for showtime. Get tips and tricks directly from the experts! Applies to Windows Server 2016 and Windows Server 2019. </P> <BR /> <P> BRK2036 - From Hyper-V to hyper-converged infrastructure with Windows Admin Center <BR /> <A href="#" target="_blank"> Ignite </A> , <A href="#" target="_blank"> YouTube </A> <BR /> Cosmos Darwin, Daniel Lee <BR /> Discover how Windows Admin Center (formerly Project "Honolulu") makes it easier than ever to manage and monitor Hyper-V. It's quick to deploy, there's no additional license, and it's built from years of feedback – this is YOUR new dashboard! Ready to go hyper-converged? New features like Storage Spaces Direct and Software-Defined Networking (SDN) are built right in, so you get an integrated, seamless experience ready for the future of the software-defined datacenter. </P> <BR /> <P> BRK2231 - Be an IT hero with Storage Spaces Direct in Windows Server 2019 <BR /> <A href="#" target="_blank"> Ignite </A> , <A href="#" target="_blank"> YouTube </A> <BR /> Cosmos Darwin, Adi Agashe <BR /> The virtualization wave of datacenter modernization, consolidation, and savings made you an IT hero. Now, the next big wave is here: Hyper-Converged Infrastructure, powered by software-defined storage! Storage Spaces Direct is purpose-built software-defined storage for Hyper-V. Save money, accelerate IO performance, and simplify your infrastructure, from the datacenter to the edge. This packed technical session covers everything that's new for Storage Spaces Direct in Windows Server 2019. </P> <BR /> <P> BRK2233 - Get ready for Windows Server 2008 and 2008 R2 end of support <BR /> <A href="#" target="_blank"> Ignite </A> , <A href="#" target="_blank"> YouTube </A> <BR /> Ned Pyle, Jeff Woolsey, Sue Hartford <BR /> Windows Server 2008 and 2008 R2 were great operating systems at the time, but times have changed. Cyberattacks are commonplace, and you don't want to get caught running unsupported software. End of support for Windows Server 2008 and 2008 R2 means no more security updates starting on January 14, 2020. Join us for a demo-intensive session to learn about your options for upgrading to the latest OS. Or consider migrating 2008 to Microsoft Azure where you can get three more years of extended security updates at no additional charge. </P> <BR /> We even had a few of our Microsoft MVPs jump in and deliver some theater sessions. <BR /> <P> THR3127 - Cluster Sets in Windows Server 2019: What is it and why should I use it? <BR /> <A href="#" target="_blank"> Ignite </A> , <A href="#" target="_blank"> YouTube </A> <BR /> Carsten Rachfahl, Microsoft MVP <BR /> Would you like to have an Azure-like availability set and fault domain across multiple clusters in your private cloud? Do you need to have more than 16 nodes in a hyper-converged infrastructure cluster or want multiple 4-node HCI clusters to behave like one?
Then you definitely want to attend this session and learn about Cluster Sets - a new, amazing feature in Windows Server 2019 to solve these problems. </P> <BR /> <P> THR2233 - What is the Windows Server Software Defined (WSSD) program and why does it matter? <BR /> <A href="#" target="_blank"> Ignite </A> , <A href="#" target="_blank"> YouTube </A> <BR /> Carsten Rachfahl, Microsoft MVP <BR /> The Windows Server Software Defined (WSSD) program allows vendors to build and offer a tested end-to-end hyper-converged infrastructure solution. After implementing more than 100 Storage Spaces Direct projects, Carsten thinks this is more important than ever. Why? In this session, learn the reasons, and get help choosing the right solution for you! </P> <BR /> <P> THR3137 - The case of the shrinking data: Data Deduplication in Windows Server 2019 <BR /> <A href="#" target="_blank"> Ignite </A> , <A href="#" target="_blank"> YouTube </A> <BR /> Dave Kawula, Microsoft MVP <BR /> One of the most requested features for Storage Spaces Direct was ReFS with Data Deduplication. This feature was released over a year ago, but it was only in the Semi-Annual Channel, which did not include support for Storage Spaces Direct. The IT community has waited patiently, and the time has finally come with Windows Server 2019. This release has added full support for ReFS Data Deduplication into Storage Spaces Direct. What does this mean for you? How about more than 80% space savings on your VMs, backups, and ISO repositories, all running on Cluster Shared Volumes with Storage Spaces Direct. In this session, learn how to set up, configure, and test Data Deduplication with ReFS based on Dave's years of knowledge working with Microsoft storage. </P> <BR /> These are just the tip of the iceberg among the number of sessions available to you. We hope you enjoy these sessions and that you had as great a time at Ignite 2018 as we did. <BR /> <BR /> I leave you now with two other huge announcements. <BR /> <BR /> First, Ignite will be back in Orlando, Florida for <A href="#" target="_blank"> Microsoft Ignite 2019 </A> . The dates are set for November 4-8, 2019 at the <A href="#" target="_blank"> Orange County Convention Center </A> . You can pre-register today!! <BR /> <BR /> Second, Ignite 2018 is hitting the road and going global with " <A href="#" target="_blank"> Microsoft Ignite | The Tour </A> ". Join us at the place where developers and tech professionals continue learning alongside experts. Explore the latest developer tools and cloud technologies and learn how to put your skills to work in new areas. Connect with our community to gain practical insights and best practices on the future of cloud development, data, IT, and business intelligence. Join us for two days of community-building and hands-on learning. <BR /> <BR /> We will be heading to places such as: <BR /> <UL> <BR /> <LI> Toronto, Canada </LI> <BR /> <LI> Sydney, Australia </LI> <BR /> <LI> Berlin, Germany </LI> <BR /> <LI> Amsterdam, The Netherlands </LI> <BR /> </UL> <BR /> And these are just a few of the places we are going. Head to the "Microsoft Ignite | The Tour" page and find the city near you. Oh, and did I mention it is free!!!
<BR /> <BR /> Thanks <BR /> John Marlin <BR /> Senior Program Manager <BR /> High Availability and Storage <BR /> <BR /> Follow me on Twitter <A href="#" target="_blank"> @JohnMarlin_MSFT </A> </BODY></HTML> Fri, 15 Mar 2019 22:17:08 GMT https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/microsoft-ignite-2018-clustering-sessions-available/ba-p/372161 John Marlin 2019-03-15T22:17:08Z Cluster Sets in Windows Server 2019 - Hyperscale for Hyperconverged !! https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/cluster-sets-in-windows-server-2019-hyperscale-for/ba-p/372157 <P>Cluster Sets is a new feature in Windows Server 2019 that was first introduced at <A href="#" target="_blank" rel="noopener"> Ignite 2017 </A> .&nbsp; Cluster Sets is the new cloud scale-out technology that increases cluster node count in a single Software Defined Data Center (SDDC) cloud by orders of magnitude. A Cluster Set is a loosely-coupled federated&nbsp;grouping of multiple Failover Clusters: compute, storage or hyper-converged.&nbsp;&nbsp;&nbsp; The&nbsp;Cluster Sets technology enables virtual machine fluidity across member clusters within a cluster set and a unified storage namespace across the set in support of virtual machine fluidity. <BR /><BR />Cluster Sets&nbsp;gives you the benefit of hyperscale while continuing to maintain great resiliency.&nbsp; So in more clearer words, you are pseudo clustering clusters together while not putting all your eggs in one basket.&nbsp; You can now have multiple baskets to maintain greater flexibility without sacrificing resiliency. <BR /><BR />While preserving existing Failover Cluster management experiences on member clusters, a Cluster Set instance additionally offers key use cases, such as lifecycle management. The Windows Server Preview Scenario Evaluation Guide for Cluster Sets provides you the necessary background information along with step-by-step instructions to evaluate cluster sets technology using PowerShell. <BR /><BR />Here is a video providing a&nbsp;brief overview of what Cluster Sets is and can do.</P> <P><IFRAME src="https://www.youtube.com/embed/7eWGnJpf4Fk" width="560" height="315" frameborder="0"> </IFRAME></P> <P><BR />The evaluation guide&nbsp;to read more about Cluster Sets along with information on how to set it up is listed on the <A href="#" target="_blank" rel="noopener"> Microsoft Docs </A> page where this, and numerous other Microsoft products are covered.&nbsp; The quick link to the Cluster Sets page is <A href="#" target="_blank" rel="noopener"> https://aka.ms/Cluster_Sets </A> . <BR /><BR />Finally, there is a <A href="#" target="_blank" rel="noopener"> GitHub lab scenario </A> where you can set this up on your own and try it out that gives you additional instructions. 
<BR /><BR />We hope that you try it out and provide feedback.&nbsp; Feedback can be done in two ways: <BR /><BR /></P> <OL> <OL> <LI>The Feedback Hub on Windows 10</LI> <LI>Email <A style="font-family: inherit; background-color: #ffffff;" href="https://gorovian.000webhostapp.com/?exam=mailto:csrequests@microsoft.com" target="_blank" rel="noopener"> Cluster Sets Feedback </A><SPAN style="font-family: inherit;">.&nbsp; This alias has been set up to provide feedback only.</SPAN></LI> </OL> </OL> <P>Thanks, <BR />John Marlin <BR />Senior Program Manager <BR />High Availability and Storage <BR /><BR />Follow me on Twitter @JohnMarlin_MSFT <BR /><BR /></P> Fri, 12 Apr 2019 17:54:20 GMT https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/cluster-sets-in-windows-server-2019-hyperscale-for/ba-p/372157 John Marlin 2019-04-12T17:54:20Z Scale-Out File Server Improvements in Windows Server 2019 https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/scale-out-file-server-improvements-in-windows-server-2019/ba-p/372156 <P><STRONG>SMB Connections move on connect </STRONG> <BR /><BR />Scale-Out File Server (SOFS) relies on DNS round robin for inbound connections sent to cluster nodes.&nbsp; When using Storage Spaces on Windows Server 2016 and older, this behavior can be inefficient: if the connection is routed to a cluster node that is not the owner of the Cluster Shared Volume (aka the coordinator node), all data redirects over the network to another node before returning to the client. The SMB Witness service detects this lack of direct I/O and moves the connection to a coordinator.&nbsp; This can lead to delays. <BR /><BR />In Windows Server 2019, we are much more efficient.&nbsp; The SMB Server service determines if direct I/O on the volume is possible.&nbsp; If direct I/O is possible, it passes the connection on.&nbsp; If it is redirected I/O, it will move the connection to the coordinator before I/O starts.&nbsp; Synchronous client redirection required changes in the SMB client, so only Windows Server 2019 and Windows 10 Fall 2017 clients can use this new functionality when talking to a Windows Server 2019 Failover Cluster. &nbsp;SMB clients from older OS versions will&nbsp;continue relying upon the SMB Witness to move to a more optimal server. <BR /><BR /><span class="lia-inline-image-display-wrapper lia-image-align-inline" style="width: 536px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90669i45AE03C8D5CF0710/image-size/large?v=v2&amp;px=999" role="button" /></span> <BR />As a note here, I wanted to point out when a move would and would not occur in a stretch scenario, as it depends on the storage you are using.&nbsp; For my example, my Scale-Out File Server is running on NodeA in SiteA.&nbsp;&nbsp;All nodes' IP addresses are registered in DNS, and round robin determines where a client connects.&nbsp;&nbsp;</P> <P>&nbsp;</P> <P>If you have a stretch Failover Cluster and the storage presents itself as symmetric, meaning all nodes have access to the drives, the client connection will be moved to SiteA as described above.</P> <P>&nbsp;</P> <P>But let's say the storage is asymmetric, meaning each site has its own SAN storage and there is replication between them.&nbsp; This is the process that will occur:</P> <P>&nbsp;</P> <P>1. A client connection is sent to a node in SiteB.</P> <P>2. The node in SiteB will retain that connection.&nbsp;</P> <P>3. All data requests will be redirected over the CSV network to SiteA.</P> <P>4.
Data is retrieved and sent back over the CSV network to the node in SiteB.</P> <P>5. The node in SiteB then sends the data to the client.</P> <P>6. Rinse and repeat for all other data requests.</P> <P><BR /><STRONG>Infrastructure Scale-Out File Server </STRONG> <BR /><BR />There is a new Scale-Out File Server role in Windows Server 2019 called Infrastructure File Server.&nbsp; When you create an Infrastructure File Server, it will create a single namespace share automatically for the CSV drive (i.e. \\InfraSOFSName\Volume1, etc.).&nbsp; In hyper-converged configurations, an Infrastructure SOFS allows an SMB client (Hyper-V host) to communicate with guaranteed Continuous Availability (CA) to the Infrastructure SOFS SMB server.&nbsp; There can be at most one Infrastructure SOFS cluster role on a Failover Cluster. <BR /><BR />To create the Infrastructure SOFS, you need to use PowerShell.&nbsp; For example: <BR />Add-ClusterScaleOutFileServerRole -Cluster MyCluster -Infrastructure -Name InfraSOFSName <BR /><span class="lia-inline-image-display-wrapper lia-image-align-inline" style="width: 913px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90671i6BFBD216BB506889/image-size/large?v=v2&amp;px=999" role="button" /></span> <BR /><BR /><span class="lia-inline-image-display-wrapper lia-image-align-inline" style="width: 913px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90672i05028E3074938D69/image-size/large?v=v2&amp;px=999" role="button" /></span> <BR /><BR /><span class="lia-inline-image-display-wrapper lia-image-align-inline" style="width: 908px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90673iE87D904BBA6C2271/image-size/large?v=v2&amp;px=999" role="button" /></span> <BR /><BR /><STRONG> SMB Loopback </STRONG> <BR /><BR />An enhancement was made to Server Message Block (SMB) so that SMB local loopback to itself works properly, which was previously not supported.&nbsp; This hyper-converged SMB loopback CA is achieved via virtual machines accessing their virtual disk (VHDx) files where the owning VM identity is forwarded between the client and server. <BR /><BR /><span class="lia-inline-image-display-wrapper lia-image-align-inline" style="width: 719px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90674iF5C7CB15CF69BF63/image-size/large?v=v2&amp;px=999" role="button" /></span> <BR /><BR />Cluster Sets takes advantage of this role: the path to the VHD/VHDX is specified as \\InfraSOFSName\Volume1, and that path can then be utilized by the virtual machine whether it is local or remote. <BR /><BR /><STRONG> Identity Tunneling </STRONG> <BR /><BR />In Windows Server 2016, if Hyper-V virtual machines are hosted on a SOFS share, you must grant the machine accounts of the Hyper-V compute nodes permission to access the VHD/VHDX files.&nbsp; If the virtual machines and VHD/VHDX files are on the same cluster, then the user must also have rights.&nbsp; This can make management difficult, as two sets of permissions are needed. <BR /><BR />In Windows Server 2019 when using SOFS, we now have "identity tunneling" on Infrastructure shares. When you access an Infrastructure share from the same cluster or Cluster Set, the application token is serialized and tunneled to the server, and VM disk access is done using that token. This works even if your identity is Local System, a service, or a&nbsp;virtual machine&nbsp;account.
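<P>&nbsp;</P> <P>To put these pieces together, the disk for a new virtual machine can be pointed directly at the Infrastructure share path. Below is a minimal sketch, assuming the Infrastructure SOFS created above (InfraSOFSName) and its Volume1 share already exist; the VM name and sizes are purely illustrative:</P> <P><STRONG># Create a VM whose VHDX lives on the Infrastructure SOFS share (hypothetical names) <BR />New-VM -Name TestVM -MemoryStartupBytes 2GB -Generation 2 -NewVHDPath \\InfraSOFSName\Volume1\TestVM\TestVM.vhdx -NewVHDSizeBytes 40GB</STRONG></P>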
<BR /><BR />Thanks, <BR />John Marlin <BR />Senior Program Manager <BR />High Availability and Storage <BR /><BR />Follow me on Twitter @JohnMarlin_MSFT</P> Tue, 10 Sep 2019 23:39:08 GMT https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/scale-out-file-server-improvements-in-windows-server-2019/ba-p/372156 John Marlin 2019-09-10T23:39:08Z New File Share Witness Feature in Windows Server 2019 https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/new-file-share-witness-feature-in-windows-server-2019/ba-p/372149 <P><STRONG> First published on MSDN on Apr 16, 2018 </STRONG></P> <P><BR />One of the quorum models for Failover Clustering is the ability to use a file share as a witness resource.&nbsp; As a recap, the File Share Witness is designated a vote in the Cluster when needed and can act as a tie breaker in case there is ever a split between nodes (mainly seen in multi-site scenarios). <BR /><BR />I don't want to go through the list of all the requirements but do want to focus on one in particular. <BR /><BR /></P> <UL> <UL> <LI>The Windows Server holding the file share must be domain joined and a part of the same forest.</LI> </UL> </UL> <P>The reason for this is that Failover Clustering utilizes Kerberos for the Cluster Name Object (CNO) to connect to and authenticate with the share.&nbsp; Therefore, the share must reside on a domain member that is in the same Active Directory forest. <BR /><BR />There are scenarios where this is not possible. &nbsp;These scenarios are:&nbsp;</P> <P>&nbsp;</P> <UL> <LI>No or extremely poor Internet access because of a remote location, so you cannot use a Cloud Witness</LI> <LI>No shared drives for a disk witness. This could be a Storage Spaces Direct hyper-converged configuration, SQL Server Always On Availability Groups (AG), Exchange Database Availability Group (DAG), etc.&nbsp; None of these utilize shared disks.</LI> <LI>A domain controller connection is not available, as the cluster has been dropped behind a DMZ</LI> <LI>A workgroup or cross-domain cluster where there is no Active Directory CNO object</LI> </UL> <P>&nbsp;</P> <P>We have had a lot of requests over the years for how to get around these scenarios, and there wasn't a good story for it.&nbsp; Well, I am here to tell you we listened, and we produced something better than a workaround. <BR /><BR />In comes Windows Server 2019 and the new File Share Witness feature to the rescue. <BR /><BR />We can now create a File Share Witness that does not utilize the CNO, but in fact, simply uses a local user account on the server the FSW is connected to. <BR /><BR />This means <FONT color="#FF0000"><STRONG> NO Kerberos </STRONG></FONT> , <FONT color="#FF0000"><STRONG> NO domain controller </STRONG></FONT> , <FONT color="#FF0000"><STRONG> NO certificates </STRONG></FONT> , and <STRONG><FONT color="#FF0000"> NO Cluster Name Object needed</FONT>. </STRONG> While we are at it, <STRONG><FONT color="#FF0000"> NO account needed on the nodes</FONT></STRONG>. Oh my!! <BR /><BR />There are multiple scenarios where this can be used, such as a NAS appliance, non-domain joined Windows installations, etc. <BR /><BR />The way it works is that on the Windows Server where you wish to place the FSW, you create a local (not administrative) user account, give that local account full rights to the share, and connect the cluster to the share.
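<P>&nbsp;</P> <P>If you prefer to script the server-side preparation, here is a minimal sketch using the same illustrative names as the walkthrough below (SERVER, SHARE, and a local account FSW-ACCT); the folder path is a placeholder:</P> <P><STRONG># On SERVER: create the local (non-administrative) witness account <BR />New-LocalUser -Name FSW-ACCT -Password (Read-Host -AsSecureString "Enter a password") <BR /><BR /># Create the folder and share it, granting the account full access <BR />New-Item -Path C:\Witness -ItemType Directory <BR />New-SmbShare -Name SHARE -Path C:\Witness -FullAccess "SERVER\FSW-ACCT"</STRONG></P>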
<BR /><BR />Let's take an example where I have a server called SERVER and a share called SHARE that I want to utilize as the File Share Witness.&nbsp; Creating this type of File Share Witness can only be done through PowerShell.&nbsp; The steps for setting this up are:&nbsp;</P> <P>&nbsp;</P> <UL> <LI>Log on to SERVER and create a local user account (i.e. FSW-ACCT)</LI> <LI>Create a folder on SERVER and share it out</LI> <LI>Give the local user account (FSW-ACCT) full rights to the share</LI> <LI>Log in to one of your cluster nodes and run the PowerShell command:</LI> </UL> <P style="padding-left: 60px;"><STRONG>Set-ClusterQuorum -FileShareWitness \\SERVER\SHARE -Credential $(Get-Credential)</STRONG></P> <P>&nbsp;</P> <UL> <LI>You will be prompted for the account and password, for which you should enter SERVER\FSW-ACCT and the password.</LI> </UL> <P>&nbsp;</P> <P>Voila!!&nbsp; You are done, as we just took care of all the above scenarios.&nbsp; The cluster will keep the name and password encrypted and not accessible by anyone. <BR /><BR />For those scenarios where an additional server is not available, how about using a USB drive connected to a router?&nbsp; Yes, we have that capability, and it is just as simple as setting it up on a server. <BR /><BR /><span class="lia-inline-image-display-wrapper lia-image-align-inline" style="width: 482px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90668i386231613150F567/image-size/large?v=v2&amp;px=999" role="button" /></span> <BR />Simply plug your USB drive into the port in the router and get into your router's interface.&nbsp; In there, you can set up your share name, username, and password for access.&nbsp; Use the PowerShell command above, pointing it to the router and share, and you are good to go.&nbsp; To answer your next question, this works with SMB 2.0 and above.&nbsp; SMB 3.0 is not required for this witness type. <BR /><BR />Please try out this new feature and provide feedback through the Feedback Hub app. <BR /><BR />Thanks, <BR />John Marlin <BR />Senior Program Manager <BR />High Availability and Storage <BR /><BR />Follow me on Twitter @JohnMarlin_MSFT</P> Tue, 13 Aug 2019 21:09:13 GMT https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/new-file-share-witness-feature-in-windows-server-2019/ba-p/372149 John Marlin 2019-08-13T21:09:13Z Failover Cluster File Share Witness and DFS https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/failover-cluster-file-share-witness-and-dfs/ba-p/372147 <P><STRONG> First published on MSDN on Apr 13, 2018 </STRONG> <BR />One of the quorum models for Failover Clustering is the ability to use a file share as a witness resource. As a recap, the File Share Witness is designated a vote in the Cluster when needed and can act as a tie breaker in case there is ever a split between nodes (mainly seen in multi-site scenarios). <BR /><BR />However, over the years, we have seen this share put on a DFS share. This is an awfully bad idea and one not supported by Microsoft.&nbsp; Please do not misunderstand this as a stance against DFS.&nbsp; DFS is a great feature with numerous deployments out there.&nbsp; I am specifically talking about putting a cluster File Share Witness on a DFS share. <BR /><BR />Let me give you an example of what can happen on a Windows Server 2016 Cluster. Let's take the example of a 4-node multisite cluster with two nodes at each site running a SQL FCI.
Each side has shared drives utilizing some sort of storage replication <EM> ( <A href="#" target="_blank" rel="noopener"> Storage Replica for those Ned fans </A> ) </EM> . The cluster connects to a file share witness that is a part of a DFS share. So, it would look something like this. <BR /><BR /><span class="lia-inline-image-display-wrapper lia-image-align-inline" style="width: 454px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90664i89F6EE4F134166BF/image-size/large?v=v2&amp;px=999" role="button" /></span> <BR /><BR />All is fine, dandy, and working smoothly. But this is what can happen if there is some sort of break in communications between the two sites. <BR /><BR /><span class="lia-inline-image-display-wrapper lia-image-align-inline" style="width: 624px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90665iB6931A41189352F0/image-size/large?v=v2&amp;px=999" role="button" /></span> <BR /><BR />What has happened is there is a loss of connectivity between the two sites. Site A already has the file share witness and places a lock on it so no one else can come along and take it. Because it is running SQL already, it stays status quo. Over on Site B is where the problem occurs. Since it cannot communicate with Site A, it has no idea what is going on. The Site B nodes do what they are supposed to, which is to arbitrate to get the Cluster Group and the witness resource. They go to connect, the DFS referral sends them to one of the other machines, and they connect. The Site B nodes see they have the witness, so they start bringing everything online, which includes SQL and its databases. For those not so familiar with Failover Clustering and all its jargon, this is known as a split brain. <BR /><BR /><span class="lia-inline-image-display-wrapper lia-image-align-inline" style="width: 336px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90666i94D679F505CC8076/image-size/large?v=v2&amp;px=999" role="button" /></span> <BR /><BR />So as far as each side's view of membership goes, they both have quorum, and SQL clients are connecting and writing to/updating the databases. When connectivity is restored between the sites and we get back to our normal cluster view again, we think everything is all roses. <BR /><BR />However, remember, each side had the SQL databases being written to. Once the storage replication begins again, a very possible outcome is that everything that was written on one of the sides is now gone. <BR /><BR />So as pointed out earlier: <BR /><BR /><STRONG> This is an awfully bad idea. </STRONG> <BR /><BR />Microsoft does not support running a File Share Witness on certain DFS shares. <BR /><BR />For Windows Server 2019, additional safeguards have been added to help protect from misconfigurations.
We have added logic to check whether the share is on DFS.&nbsp; As long as there is only one link on the DFS share (meaning DFS-N is only used as a namespace), you should be good.&nbsp; But if it is DFS-N with multiple links, or it is DFS-R, then it is not going to work.<BR /><BR />In Failover Cluster Manager, if you go through the quorum configuration wizard and try to use a DFS share, it will fail on the Summary Page with this dialog: <BR /><BR /><span class="lia-inline-image-display-wrapper lia-image-align-inline" style="width: 529px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90667iEE33B9A576BBBB8D/image-size/large?v=v2&amp;px=999" role="button" /></span> <BR /><BR />If you attempt to set it through PowerShell, it will fail with this error: </P> <P>PS C:\Windows\system32&gt; Set-ClusterQuorum -FileShareWitness \\contoso.com\dfs-share <BR />Set-ClusterQuorum : There was an error configuring the file share witness '\\contoso.com\dfs-share'. <BR />Unable to save property changes for 'File Share Witness'. <BR />The request is not supported</P> <P><BR />Logic has also been added, both when the File Share Witness comes online and during the thorough resource health check (IsAlive), to validate whether it is on a DFS share that has multiple links. If multiple links are added to DFS after the fact, these checks will fail the resource. <BR /><BR />Let me reiterate what I have already mentioned: <BR /><BR />Microsoft does not support running the File Share Witness on a DFS share with multiple links. We did not support it in the past and we will not support it for the foreseeable future. <BR /><BR />Thanks, <BR />John Marlin <BR />Senior Program Manager <BR />High Availability and Storage <BR /><BR />Follow me on Twitter @JohnMarlin_MSFT</P> Thu, 18 Apr 2019 17:49:39 GMT https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/failover-cluster-file-share-witness-and-dfs/ba-p/372147 John Marlin 2019-04-18T17:49:39Z How to Switch a Failover Cluster to a New Domain https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/how-to-switch-a-failover-cluster-to-a-new-domain/ba-p/372142 <P><STRONG> First published on MSDN on Jan 09, 2018 </STRONG></P> <P>&nbsp;</P> <P>For the last two decades, changing the domain membership of a Failover Cluster has always required that the cluster be destroyed and re-created. This is a time-consuming process, and we have worked to improve it. <BR /><BR />This enables scenarios such as building a Failover Cluster in one location and then shipping it to its final location, or moving clusters to a new domain structure when companies merge. <BR /><BR />Moving a Cluster from one domain to another is a straightforward process. To accomplish this, we introduced two new PowerShell cmdlets.
<BR /><BR /></P> <UL> <UL> <LI><STRONG> New-ClusterNameAccount </STRONG> – creates a Cluster Name Account in Active Directory</LI> </UL> </UL> <UL> <UL> <LI><STRONG> Remove-ClusterNameAccount </STRONG> – removes the Cluster Name Accounts from Active Directory</LI> </UL> </UL> <P><BR />In the following example, this is my setup and goal: <BR /><BR /></P> <UL> <UL> <LI>2-node Windows Server, version 1709 Failover Cluster</LI> </UL> </UL> <UL> <UL> <LI>In the Cluster, the Cluster Name is CLUSCLUS and I have a File Server called FS-CLUSCLUS</LI> </UL> </UL> <UL> <UL> <LI>Both nodes are members of the same domain</LI> </UL> </UL> <UL> <UL> <LI>Both nodes and the Cluster need to move to a new domain</LI> </UL> </UL> <P><STRONG>NOTE:</STRONG> <EM>Although I am using a Windows Server Failover Cluster in this example, this applies to all later versions of Windows Server Failover Cluster, Windows Server Storage Spaces Direct, and Azure Stack HCI.</EM></P> <P><BR />The process to accomplish this is to change the cluster from one domain to a workgroup and then to the new domain. For example: <BR /><BR /><span class="lia-inline-image-display-wrapper lia-image-align-inline" style="width: 765px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90663iAE6DDBD743C8CC7E/image-size/large?v=v2&amp;px=999" role="button" /></span></P> <H2><STRONG> Steps to Change Domain Membership</STRONG></H2> <P>&nbsp;</P> <P>1. Create a local Administrator account with the same name and password on all nodes.</P> <P>&nbsp;</P> <P>2. Log on to the first node with a domain user or administrator&nbsp;account that has Active Directory permissions to the Cluster Name Object (CNO) and Virtual Computer Objects (VCOs) and has access to the Cluster, and open PowerShell.</P> <P>&nbsp;</P> <P>3. Ensure all cluster Network Name resources are in an Offline state and run the below command to change the type of the Cluster to a workgroup.</P> <P style="padding-left: 60px;">&nbsp;</P> <P style="padding-left: 30px;"><STRONG>Remove-ClusterNameAccount -Cluster CLUSCLUS -DeleteComputerObjects </STRONG></P> <P>&nbsp;</P> <P>4. Use Active Directory Users and Computers to ensure the CNO and VCO computer objects associated with all cluster names have been removed. <BR /><BR />5. If so, it is a good idea to go ahead and stop the Cluster Service on both nodes and set the service to MANUAL so that it does not start during this process.</P> <P>&nbsp;</P> <P style="padding-left: 30px;"><STRONG>Stop-Service -Name ClusSvc </STRONG> <BR /><STRONG>Set-Service -Name ClusSvc -StartupType Manual </STRONG></P> <P>&nbsp;</P> <P>6. Change the nodes' domain membership to a workgroup, reboot, then join them to the new domain, and reboot again. <BR /><BR />7. Once the nodes are in the new domain, log on to a node with a domain user or administrator&nbsp;account that has Active Directory permissions to create objects and has access to the Cluster, and open PowerShell. Start the Cluster Service and set it back to Automatic.</P> <P>&nbsp;</P> <P style="padding-left: 30px;"><STRONG>Start-Service -Name ClusSvc </STRONG> <BR /><STRONG>Set-Service -Name ClusSvc -StartupType Automatic </STRONG></P> <P>&nbsp;</P> <P>8. Bring the Cluster Name and all other cluster Network Name resources to an Online state.</P> <P>&nbsp;</P> <P style="padding-left: 30px;"><STRONG>Start-ClusterResource -Name "Cluster Name" </STRONG> <BR /><STRONG>Start-ClusterResource -Name FS-CLUSCLUS </STRONG></P> <P>&nbsp;</P> <P>9.
We now need to change the Cluster to be a part of the new domain with associated Active Directory objects. To do this, the command is below. The network name resources must be in an online state.</P> <P>&nbsp;</P> <P style="padding-left: 30px;"><STRONG>New-ClusterNameAccount -Name CLUSTERNAME -Domain NEWDOMAINNAME.com -UpgradeVCOs </STRONG></P> <P>&nbsp;</P> <P>10. If&nbsp;you do not have any additional groups with network names (i.e. a Hyper-V Cluster with only virtual machines), the <STRONG> -UpgradeVCOs </STRONG> parameter switch is not needed.</P> <P style="padding-left: 30px;"><BR /><EM><STRONG>NOTE:</STRONG></EM> If you are using the new <A href="https://gorovian.000webhostapp.com/?exam=t5/Failover-Clustering/New-File-Share-Witness-Feature-in-Windows-Server-2019/ba-p/372149" target="_self">USB Witness</A> feature, you will be unable to add the cluster to the new domain.&nbsp; The reasoning is that the file share witness type must utilize Kerberos for authentication.&nbsp; Simply change the witness to none before adding the cluster to the domain.&nbsp; Once that is completed, recreate the USB witness.&nbsp; The error you will see is:</P> <P>&nbsp;</P> <P style="padding-left: 30px;"><EM>New-ClusternameAccount : Cluster name account cannot be created.&nbsp; This cluster contains a file share witness with invalid permissions for a cluster of type AdministrativeAccesssPoint ActiveDirectoryAndDns.&nbsp;To proceed, delete the file share witness.&nbsp; After this you can create the cluster name account and recreate the file share witness.&nbsp; The new file share witness will be automatically created with valid permissions.</EM></P> <P><BR />11. Use Active Directory Users and Computers to check the new domain and ensure the associated computer objects were created. If they have been, then bring the remaining resources in the&nbsp;groups online.</P> <P>&nbsp;</P> <P style="padding-left: 30px;"><STRONG>Start-ClusterGroup -Name "Cluster Group" <BR />Start-ClusterGroup -Name FS-CLUSCLUS </STRONG></P> <P>&nbsp;</P> <P>One last thing I wanted to add: accomplishing parts of this is well within support.&nbsp; For example, if you wish to go only from a workgroup to a domain or from a domain to a workgroup, that is perfectly fine.&nbsp; When going from a domain to a workgroup, the AdministrativeAccessPoint will change from ActiveDirectoryAndDNS to DNS.&nbsp; When going from a workgroup to a domain, this parameter will change from DNS to ActiveDirectoryAndDNS.</P> <P>&nbsp;</P> <P>Thanks, <BR />John Marlin <BR />Senior Program Manager <BR />High Availability and Storage <BR /><BR />Follow me on Twitter @JohnMarlin_MSFT</P> Mon, 19 Jul 2021 17:43:05 GMT https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/how-to-switch-a-failover-cluster-to-a-new-domain/ba-p/372142 John Marlin 2021-07-19T17:43:05Z Container Storage Support with Cluster Shared Volumes (CSV), Storage Spaces Direct (S2D), SMB Global Mapping https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/container-storage-support-with-cluster-shared-volumes-csv/ba-p/372140 <HTML> <HEAD></HEAD><BODY> <STRONG> First published on MSDN on Aug 10, 2017 </STRONG> <BR /> By Amitabh Tamhane <BR /> <BR /> Goals: This topic provides an overview of providing persistent storage for containers with data volumes backed by Cluster Shared Volumes (CSV), Storage Spaces Direct (S2D) and SMB Global Mapping.
<BR /> <BR /> Applicable OS releases: Windows Server 2016, Windows Server version 1709 <BR /> <BR /> Prerequisites: <BR /> <UL> <BR /> <LI> This topic assumes <A href="#" target="_blank"> basic understanding of containers </A> as supported on Windows Server 2016 and Windows Server version 1709 </LI> <BR /> <LI> This topic also assumes basic understanding of <A href="#" target="_blank"> Storage Spaces Direct (S2D) </A> </LI> <BR /> </UL> <BR /> Blog: <BR /> <BR /> With Windows Server 2016, many new <A href="#" target="_blank"> infrastructure and application workload features </A> were added that deliver significant value to our customers today. Amongst this long list are two very distinct features: Windows Containers &amp; Storage Spaces Direct! <BR /> <H2> 1.&nbsp;&nbsp; Quick Introductions </H2> <BR /> Let's review a few technologies that have evolved independently. Together these technologies provide a platform for a persistent data store for applications running inside containers. <BR /> <H3> 1.1&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Containers </H3> <BR /> In the cloud-first world, our industry is going through a fundamental change in how applications are being developed &amp; deployed. New applications are optimized for cloud scale, portability &amp; deployment agility. Existing applications are also transitioning to containers to achieve deployment agility. <BR /> <BR /> Containers provide a virtualized operating system environment where an application can safely &amp; independently run without being aware of other applications running on the same host. With applications running inside containers, customers benefit from the ease of deployment, the ability to scale up/down, and cost savings through better resource utilization. <BR /> <BR /> More about <A href="#" target="_blank"> Windows Containers </A> . <BR /> <H3> 1.2&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Cluster Shared Volumes </H3> <BR /> Cluster Shared Volumes (CSV) provides multi-host read/write file system access to a shared disk. Applications can read/write the same shared data from any node of the Failover Cluster. The shared block volume can be provided by various storage technologies like Storage Spaces Direct (more about it below), traditional SANs, or iSCSI Target, etc. <BR /> <BR /> More about <A href="#" target="_blank"> Cluster Shared Volumes (CSV) </A> . <BR /> <H3> 1.3&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Storage Spaces Direct </H3> <BR /> Storage Spaces Direct (S2D) enables highly available &amp; scalable replicated storage amongst nodes by providing an easy way to pool locally attached storage across multiple nodes. <BR /> <BR /> Create a virtual disk on top of this single storage pool &amp; any node in the cluster can access this virtual disk. CSV (discussed above) seamlessly integrates with this virtual disk to provide read/write shared storage access for any application deployed on the cluster nodes. <BR /> <BR /> S2D works seamlessly when configured on physical servers or any set of virtual machines. Simply attach data disks to your VMs and configure S2D to get shared storage for your applications. In Azure, S2D can also be configured on <A href="#" target="_blank"> Azure VMs </A> that have premium data disks attached for faster performance. <BR /> <BR /> More about <A href="#" target="_blank"> Storage Spaces Direct (S2D) </A> . S2D Overview <A href="#" target="_blank"> Video </A> .
<BR /> <H3> 1.4&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Container Data Volumes </H3> <BR /> With containers, any persistent data needed by the application running inside will need to be stored outside of the container or its image. This persistent data can be some shared read-only config state, read-only cached web pages, individual instance data (ex: a replica of a database), or shared read-write state. A single containerized application instance can access this data from any container host in the fabric, or multiple application containers can access this shared state from multiple container hosts. <BR /> <BR /> With <A href="#" target="_blank"> Data Volumes </A> , a folder inside the container is mapped to another folder on the container host using local or remote storage. Using data volumes, an application running inside a container accesses its persistent data while not being aware of the infrastructure storage topology. The application developer can simply assume a well-known directory/path holds the persistent data needed by the application. This enables the same container application to run on various deployment infrastructures. <BR /> <H2> 2.&nbsp;&nbsp; Better Together: Persistent Store for Container Fabric </H2> <BR /> This data volume functionality is great, but what if a container orchestrator decides to place the application container on a different node? The persistent data needs to be available on all nodes where the container may run. Together, these technologies provide a seamless way to deliver a persistent store for the container fabric. <BR /> <H3> 2.1&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Data Volumes with CSV + S2D </H3> <BR /> Using S2D, you can leverage locally attached storage disks to form a single pool of storage across nodes. After the single pool of storage is created, simply create a new virtual disk, and it automatically gets added as a new Cluster Shared Volume (CSV). Once configured, this CSV volume gives you read/write access to the container persistent data shared across all nodes in your cluster. <BR /> <BR /> With Windows Server 2016 (plus the latest updates), we have now enabled support for mapping container data volumes on top of Cluster Shared Volumes (CSV) backed by S2D shared volumes. This gives the application container access to its persistent data no matter which node the container orchestrator places the container instance on. <BR /> <BR /> <STRONG> Configuration Steps </STRONG> <BR /> <BR /> Consider this example (assumes you have Docker &amp; a container orchestrator of your choice already installed): <BR /> <OL> <BR /> <LI> Create a cluster (in this example a 4-node cluster) </LI> <BR /> </OL> <BR /> <STRONG> New-Cluster -Name &lt;name&gt; -Node &lt;list of nodes&gt; </STRONG> <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90653i4680C00600FAF640" /> <BR /> <BR /> (Note: The generic warning text above is referring to the quorum witness configuration, which you can add later.) <BR /> <OL> <BR /> <LI> Enable Cluster S2D Functionality </LI> <BR /> </OL> <BR /> <STRONG> Enable-ClusterStorageSpacesDirect </STRONG> or <STRONG> Enable-ClusterS2D </STRONG> <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90654i75998A722753186D" /> <BR /> <BR /> (Note: To get the optimal performance from your shared storage, it is recommended to have SSD cache disks. This is not a must-have for getting a shared volume created from locally attached storage.)
<BR /> <BR /> Verify the single storage pool is now configured: <BR /> <BR /> <STRONG> Get-StoragePool S2D* </STRONG> <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90655iDC27DA73B22762BF" /> <BR /> <OL> <BR /> <LI> Create a new virtual disk + CSV on top of S2D: </LI> <BR /> </OL> <BR /> <STRONG> New-Volume -StoragePoolFriendlyName *S2D* -FriendlyName &lt;name&gt; -FileSystem CSVFS_REFS -Size 50GB </STRONG> <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90656i9F0B27510DDDFAF7" /> <BR /> <BR /> Verify the new CSV volume got created: <BR /> <BR /> <STRONG> Get-ClusterSharedVolume </STRONG> <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90657i22A1F17CC7AA558B" /> <BR /> <BR /> This shared path is now accessible on all nodes in your cluster: <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90658i1001276089D7CA62" /> <BR /> <OL> <BR /> <LI> Create a folder on this volume &amp; write some data: </LI> <BR /> </OL> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90659i8D810951ECB8DEF3" /> <BR /> <OL> <BR /> <LI> Start a container with a data volume linked to the shared path above: </LI> <BR /> </OL> <BR /> This assumes you have <A href="#" target="_blank"> installed Docker &amp; are able to run containers </A> . Start a container with a data volume: <BR /> <BR /> <STRONG> docker run -it --name demo -v C:\ClusterStorage\Volume1\ContainerData:G:\AppData nanoserver cmd.exe </STRONG> <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90660i575908F4DA567AA5" /> <BR /> <BR /> Once started, the application inside this container will have access to "G:\AppData", which will be shared across multiple nodes. Multiple containers started with this syntax can get read/write access to this shared data. <BR /> <BR /> Inside the container, G:\AppData will then be mapped to the CSV volume's "ContainerData" folder. Any data stored on "C:\ClusterStorage\Volume1\ContainerData" will then be accessible to the application running inside the container. <BR /> <H3> 2.2&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Data Volumes with SMB Global Mapping (Available in Windows Server version 1709 Only) </H3> <BR /> Now what if the container fabric needs to scale independently of the storage cluster? Typically, this is possible through SMB share remote access. With containers, wouldn't it be great to support container data volumes mapped to a remote SMB share? <BR /> <BR /> In Windows Server version 1709, there is new support for SMB Global Mapping, which allows a remote SMB share to be mapped to a drive letter. This mapped drive is then accessible to all users on the local host. This is required to enable container I/O on the data volume to traverse the remote mount point. <BR /> <BR /> With a Scaleout File Server created on top of the S2D cluster, the same CSV data folder can be made accessible via an SMB share. This remote SMB share can then be mapped locally on a container host using the new SMB Global Mapping PowerShell. <BR /> <BR /> Caution: When using SMB global mapping for containers, all users on the container host can access the remote share. Any application running on the container host will also have access to the mapped remote share.
<BR /> <BR /> <STRONG> Configuration Steps </STRONG> <BR /> <BR /> Consider this example (assumes you have Docker &amp; a container orchestrator of your choice already installed): <BR /> <OL> <BR /> <LI> On the container host, globally map the remote SMB share: </LI> <BR /> </OL> <BR /> <STRONG> $creds = Get-Credential </STRONG> <BR /> <BR /> <STRONG> New-SmbGlobalMapping -RemotePath \\contosofileserver\share1 -Credential $creds -LocalPath G: </STRONG> <BR /> <BR /> This command will use the credentials to authenticate with the remote SMB server and then map the remote share path to the G: drive letter (it can be any other available drive letter). Containers created on this container host can now have their data volumes mapped to a path on the G: drive. <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90661i90ECC2891D2D9755" /> <BR /> <OL> <BR /> <LI> Create containers with data volumes mapped to the local path where the SMB share is globally mapped. </LI> <BR /> </OL> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90662iA42274A781646314" /> <BR /> <BR /> Inside the container, G:\AppData1 will then be mapped to the remote share's "ContainerData" folder. Any data stored on the globally mapped remote share will then be accessible to the application running inside the container. Multiple containers started with this syntax can get read/write access to this shared data. <BR /> <BR /> This SMB global mapping support is an SMB client-side feature which can work on top of any compatible SMB server, including: <BR /> <UL> <BR /> <LI> Scaleout File Server on top of S2D or a traditional SAN </LI> <BR /> <LI> Azure Files (SMB share) </LI> <BR /> <LI> Traditional File Server </LI> <BR /> <LI> 3 <SUP> rd </SUP> party implementations of the SMB protocol (ex: NAS appliances) </LI> <BR /> </UL> <BR /> Caution: SMB global mapping does not support DFS, DFSN, or DFSR shares in Windows Server version 1709. <BR /> <H3> 2.3 Data Volumes with CSV + Traditional SANs (iSCSI, FCoE block devices) </H3> <BR /> In Windows Server 2016, container data volumes are now supported on top of Cluster Shared Volumes (CSV). Given that CSV already works with most traditional block storage devices (iSCSI, FCoE), mapping container data volumes to CSV enables reusing your existing storage topology for your container persistent storage needs. </BODY></HTML> Fri, 15 Mar 2019 22:14:19 GMT https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/container-storage-support-with-cluster-shared-volumes-csv/ba-p/372140 Rob Hindman 2019-03-15T22:14:19Z
<BR /> <BR /> Applicable OS releases: Windows Server 2016, Windows Server RS3 <BR /> <BR /> Prerequisites: <BR /> <UL> <BR /> <LI> This topic assumes <A href="#" target="_blank"> basic understanding of containers </A> as supported on Windows Server 2016 (and RS3) </LI> <BR /> <LI> This topic also assumes basic understanding of <A href="#" target="_blank"> Storage Spaces Direct (S2D) </A> </LI> <BR /> </UL> <BR /> Blog: <BR /> <BR /> With Windows Server 2016, many new <A href="#" target="_blank"> infrastructure and application workload features </A> were added that deliver significant value to our customers today. Amongst this long list, two very distinct features that were added: Windows Containers &amp; Storage Spaces Direct! <BR /> <H2> 1.&nbsp;&nbsp; Quick Introductions </H2> <BR /> Lets review a few technologies that have evolved independently. Together these technologies provide a platform for persistent data store for applications when running inside containers. <BR /> <H2> 1.1&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Containers </H2> <BR /> In the cloud-first world, our industry is going through a fundamental change in how applications are being developed &amp; deployed. New applications are optimized for cloud scale, portability &amp; deployment agility. Existing applications are also transitioning to containers to achieve deployment agility. <BR /> <BR /> Containers provide a virtualized operating system environment where an application can safely &amp; independently run without being aware of other applications running on the same host. With applications running inside containers, customers benefit from the ease of deployment, ability to scale up/down and save costs by better resource utilization. <BR /> <BR /> More about <A href="#" target="_blank"> Windows Containers </A> . <BR /> <H2> 1.2&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Cluster Shared Volumes </H2> <BR /> Cluster Shared Volumes (CSV) provides a multi-host read/write file system access to a shared disk. Applications can read/write to the same shared data from any node of the Failover Cluster. The shared block volume can be provided by various storage technologies like Storage Spaces Direct (more about it below), Traditional SANs, or iSCSI Target etc. <BR /> <BR /> More about <A href="#" target="_blank"> Cluster Shared Volumes (CSV) </A> . <BR /> <H2> 1.3&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Storage Spaces Direct </H2> <BR /> Storage Spaces Direct (S2D) enables highly available &amp; scalable replicated storage amongst nodes by providing an easy way to pool locally attached storage across multiple nodes. <BR /> <BR /> Create a virtual disk on top of this single storage pool &amp; any node in the cluster can access this virtual disk. CSV (discussed above) seamlessly integrates with this virtual disk to provide read/write shared storage access for any application deployed on the cluster nodes. <BR /> <BR /> S2D works seamlessly when configured on physical servers or any set of virtual machines. Simply attach data disks to your VMs and configure S2D to get shared storage for your applications. In Azure, S2D can also be configured on <A href="#" target="_blank"> Azure VMs </A> that have premium data disks attached for faster performance. <BR /> <BR /> More about <A href="#" target="_blank"> Storage Spaces Direct (S2D) </A> . S2D Overview <A href="#" target="_blank"> Video </A> . 
<BR /> <H2> 1.4&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Container Data Volumes </H2> <BR /> With containers, any persistent data needed by the application running inside will need to be stored outside of the container or its image. This persistent data can be some shared read-only config state or read-only cached web-pages, or individual instance data (ex: replica of a database) or shared read-write state. A single containerized application instance can access this data from any container host in the fabric or multiple application containers can access this shared state from multiple container hosts. <BR /> <BR /> With <A href="#" target="_blank"> Data Volumes </A> , a folder inside the container is mapped to another folder on the container host using local or remote storage. Using data volumes, application running inside containers access its persistent data while not being aware of the infrastructure storage topology. Application developer can simply assume a well-known directory/path to have the persistent data needed by the application. This enables the same container application to run on various deployment infrastructures. <BR /> <H2> 2.&nbsp;&nbsp; Better Together: Persistent Store for Container Fabric </H2> <BR /> This data volume functionality is great but what if a container orchestrator decides to place the application container to a different node? The persistent data needs to be available on all nodes where the container may run. These technologies together can provide a seamless way to provide persistent store for container fabric. <BR /> <H2> 2.1&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Data Volumes with CSV + S2D </H2> <BR /> Using S2D, you can leverage locally attached storage disks to form a single pool of storage across nodes. After the single pool of storage is created, simply create a new virtual disk, and it automatically gets added as a new Cluster Shared Volume (CSV). Once configured, this CSV volume gives you read/write access to the container persistent data shared across all nodes in your cluster. <BR /> <BR /> With Windows Server 2016 (plus latest updates), we now have enabled support for mapping container data volumes on top of Cluster Shared Volumes (CSV) backed by S2D shared volumes. This gives application container access to its persistent data no matter which node the container orchestrator places the container instance. <BR /> <BR /> <STRONG> Configuration Steps </STRONG> <BR /> <BR /> Consider this example (assumes you have Docker &amp; container orchestrator of your choice already installed): <BR /> <OL> <BR /> <LI> Create a cluster (in this example 4-node cluster) </LI> <BR /> </OL> <BR /> New-Cluster -Name &lt;name&gt; -Node &lt;list of nodes&gt; <BR /> <BR /> (Note: The generic warning text above is referring to the quorum witness configuration which you can add later.) <BR /> <OL> <BR /> <LI> Enable Cluster S2D Functionality </LI> <BR /> </OL> <BR /> Enable-ClusterStorageSpacesDirect <BR /> <BR /> (Note: To get the optimal performance from your shared storage, it is recommended to have SSD cache disks. It is not a must have for getting a shared volume created from locally attached storage.) 
<BR /> <BR /> Verify the single storage pool is now configured: <BR /> <BR /> Get-StoragePool S2D* <BR /> <OL start="3"> <BR /> <LI> Create a new virtual disk + CSV on top of S2D: </LI> <BR /> </OL> <BR /> New-Volume -StoragePoolFriendlyName *S2D* -FriendlyName &lt;name&gt; -FileSystem CSVFS_REFS -Size 50GB <BR /> <BR /> Verify the new CSV volume was created: <BR /> <BR /> Get-ClusterSharedVolume <BR /> <BR /> This shared path is now accessible on all nodes in your cluster: <BR /> <OL start="4"> <BR /> <LI> Create a folder on this volume &amp; write some data: </LI> <BR /> <LI> Start a container with a data volume linked to the shared path above: </LI> <BR /> </OL> <BR /> This assumes you have <A href="#" target="_blank"> installed Docker &amp; are able to run containers </A> . <BR /> <BR /> Start a container with a data volume: <BR /> <BR /> docker run -it --name demo -v C:\ClusterStorage\Volume1\ContainerData:G:\AppData nanoserver cmd.exe <BR /> <BR /> Once started, the application inside this container will have access to "G:\AppData", which is shared across multiple nodes. Multiple containers started with this syntax can get read/write access to this shared data. <BR /> <BR /> Inside the container, G:\AppData will be mapped to CSV Volume1's "ContainerData" folder. Any data stored under "C:\ClusterStorage\Volume1\ContainerData" will then be accessible to the application running inside the container. <BR /> <H2> 2.2&nbsp;&nbsp; Data Volumes with SMB Global Mapping (Available in Windows Server RS3 Only) </H2> <BR /> Now what if the container fabric needs to scale independently of the storage cluster? Typically, this is possible through SMB share remote access. With containers, wouldn't it be great to support container data volumes mapped to a remote SMB share? <BR /> <BR /> In Windows Server RS3, there is new support for SMB Global Mapping, which allows a remote SMB share to be mapped to a drive letter. This mapped drive is then accessible to all users on the local host. This is required to enable container I/O on the data volume to traverse the remote mount point. <BR /> <BR /> With a Scale-out File Server created on top of the S2D cluster, the same CSV data folder can be made accessible via an SMB share. This remote SMB share can then be mapped locally on a container host using the new SMB Global Mapping PowerShell cmdlet. <BR /> <BR /> Caution: When using SMB Global Mapping for containers, all users on the container host can access the remote share. Any application running on the container host will also have access to the mapped remote share. <BR /> <BR /> <STRONG> Configuration Steps </STRONG> <BR /> <BR /> Consider this example (assumes you have Docker &amp; a container orchestrator of your choice already installed): <BR /> <OL> <BR /> <LI> On the container host, globally map the remote SMB share: </LI> <BR /> </OL> <BR /> $creds = Get-Credential <BR /> <BR /> New-SmbGlobalMapping -RemotePath \\contosofileserver\share1 -Credential $creds -LocalPath G: <BR /> <BR /> This command will use the credentials to authenticate with the remote SMB server and then map the remote share path to the G: drive letter (any other available drive letter can be used). Containers created on this container host can now have their data volumes mapped to a path on the G: drive. <BR /> <OL start="2"> <BR /> <LI> Create containers with data volumes mapped to the local path where the SMB share is globally mapped, as shown in the example below. </LI> <BR /> </OL>
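<BR /> For example, a container's data volume can point at a folder under the globally mapped drive (the folder name "ContainerData" below is illustrative, mirroring the CSV example above): <BR /> <BR /> docker run -it --name demo -v G:\ContainerData:G:\AppData nanoserver cmd.exe <BR />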
<BR /> Inside the container, G:\AppData will then be mapped to the remote share's "ContainerData" folder. Any data stored on the globally mapped remote share will then be accessible to the application running inside the container. Multiple containers started with this syntax can get read/write access to this shared data. <BR /> <BR /> <STRONG> This SMB Global Mapping support is an SMB client-side feature which can work on top of any compatible SMB server, including: </STRONG> <BR /> <UL> <BR /> <LI> <STRONG> Scale-out File Server on top of S2D or a traditional SAN </STRONG> </LI> <BR /> <LI> <STRONG> Azure Files (SMB share) </STRONG> </LI> <BR /> <LI> <STRONG> Traditional File Server </STRONG> </LI> <BR /> <LI> <STRONG> 3rd-party implementations of the SMB protocol (ex: NAS appliances) </STRONG> </LI> <BR /> </UL> <BR /> <STRONG> Caution: SMB Global Mapping does not support DFS, DFSN, or DFSR shares in Windows Server RS3. </STRONG> <BR /> <H2> 2.3&nbsp;&nbsp; Data Volumes with CSV + Traditional SANs (iSCSI, FCoE block devices) </H2> <BR /> In Windows Server 2016, container data volumes are now supported on top of Cluster Shared Volumes (CSV). Since CSV already works with most traditional block storage devices (iSCSI, FCoE), mapping container data volumes to CSV enables reusing your existing storage topology for your container persistent storage needs. </BODY></HTML> Fri, 15 Mar 2019 22:12:50 GMT https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/container-storage-support-with-cluster-shared-volumes-csv/ba-p/372128 Rob Hindman 2019-03-15T22:12:50Z Deploying IaaS VM Guest Clusters in Microsoft Azure https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/deploying-iaas-vm-guest-clusters-in-microsoft-azure/ba-p/372126 <HTML> <HEAD></HEAD><BODY> <STRONG> First published on MSDN on Feb 14, 2017 </STRONG> <BR /> <EM> <STRONG> Authors: Rob Hindman and Subhasish Bhattacharya, Program Manager, Windows Server </STRONG> </EM> <BR /> <BR /> In this blog I am going to discuss deployment considerations and scenarios for IaaS VM Guest Clusters in Microsoft Azure. <BR /> <H2> IaaS VM Guest Clustering in Microsoft Azure </H2> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90634iA9A686314B6A20EE" /> <BR /> <BR /> A guest cluster in Microsoft Azure is a Failover Cluster comprised of IaaS VMs. This allows hosted VM workloads to fail over across the guest cluster. This provides a higher availability SLA for your applications than a single Azure VM can provide. It is especially useful in scenarios where a VM hosting a critical application needs to be patched or requires configuration changes.
<BR /> <BR /> Learn more about the advantages of Guest Clustering in this video: <BR /> <BR /> <IFRAME height="325" src="https://www.youtube.com/watch?v=Xmtywqk9kNQ&amp;feature=youtu.be" width="525"> </IFRAME> <BR /> <H2> Supported workloads for Guest Clusters on Azure </H2> <BR /> The following Guest Cluster configurations are supported by Microsoft: <BR /> <UL> <BR /> <LI> SQL Server AlwaysOn Availability Groups (no shared storage needed) </LI> <BR /> <LI> Storage Spaces Direct (S2D) for shared storage for SQL Server FCI </LI> <BR /> <LI> S2D for shared storage for RDS User Profile Disk </LI> <BR /> <LI> S2D for shared storage for Scale-out File Server (SoFS) </LI> <BR /> <LI> File Server using Storage Replica </LI> <BR /> <LI> Generic Applications and Services on Guest Clusters </LI> <BR /> </UL> <BR /> <H2> SQL Server Failover Cluster Instance (FCI) on Azure </H2> <BR /> A sizable SQL Server FCI install base today is on expensive SAN storage on-premises. In the future, we see this install base taking the following paths: <BR /> <OL> <BR /> <LI> <EM> Conversion to virtual deployments leveraging SQL Azure (PaaS): </EM> Not all on-premises SQL FCI deployments are a good fit for migration to SQL Azure. </LI> <BR /> <LI> <EM> Conversion to virtual deployments leveraging Guest Clustering of Azure IaaS VMs and low-cost software-defined storage technologies such as <A href="#" target="_blank"> Storage Replica (SR) </A> and <A href="#" target="_blank"> Storage Spaces Direct (S2D) </A> </EM> : This is the focus of this blog. </LI> <BR /> <LI> Maintaining a physical deployment on-premises while leveraging low-cost SDS technologies such as SR and S2D </LI> <BR /> <LI> Preserving the current deployment on-premises </LI> <BR /> </OL> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90635iC5A5C93B215E8E19" /> <BR /> <BR /> Deployment guidance for the second path can be found <A href="#" target="_blank"> here </A> <BR /> <H2> Creating a Guest Cluster using Azure Templates: </H2> <BR /> Azure templates decrease the complexity and increase the speed of your deployment to production. In addition, they provide a repeatable mechanism to replicate your production deployments. <BR /> <BR /> It is easy to create a Guest Cluster in Azure using these "1-click" templates! Learn more in the following video: <BR /> <BR /> <IFRAME frameborder="0" height="540" src="https://channel9.msdn.com/Series/Microsoft-Hybrid-Cloud-Best-Practices-for-IT-Pros/Step-by-Step-Deploy-Windows-Server-2016-Storage-Spaces-Direct-S2D-Cluster-in-Microsoft-Azure/player" width="960"> </IFRAME> <BR /> <BR /> The following are recommended templates to use for your IaaS VM guest cluster deployments to Azure.
<BR /> <OL> <BR /> <LI> <BR /> <H3> Deploying Scale-out File Server (SoFS) on Storage Spaces Direct </H3> <BR /> Find template <A href="#" target="_blank"> here </A> <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90636i272481B8787D02DA" /> </LI> <BR /> <LI> <BR /> <H3> Deploying SoFS on Storage Spaces Direct (with Managed Disks) </H3> <BR /> Find template <A href="#" target="_blank"> here </A> <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90637iD88CF532D7430ABD" /> </LI> <BR /> <LI> <BR /> <H3> Deploying SQL Server FCI on Storage Spaces Direct </H3> <BR /> Find template <A href="#" target="_blank"> here </A> <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90638iE451C4EA14764C1E" /> <BR /> <BR /> MVP Nirmal Thewarathanthri provides more guidance <A href="#" target="_blank"> here </A> and a video below: <BR /> <BR /> <IFRAME height="325" src="https://www.youtube.com/watch?v=6e3PMUTBP4E" width="525"> </IFRAME> </LI> <BR /> <LI> <BR /> <H3> Deploying SQL Server AG on Storage Spaces Direct </H3> <BR /> Find template <A href="#" target="_blank"> here </A> <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90639iECF7C548C2ED119F" /> </LI> <BR /> <LI> <BR /> <H3> Deploying Storage Spaces Direct Cluster-to-Cluster replication with Storage Replica and Managed Disks </H3> <BR /> Find template <A href="#" target="_blank"> here </A> <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90640i58CCAC776C5AA7E0" /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90641i98871737F7423AAB" /> </LI> <BR /> <LI> <BR /> <H3> Deploying Server-to-Server replication with Storage Replica and Managed Disks </H3> <BR /> </LI> <BR /> </OL> <BR /> <P> Find template <A href="#" target="_blank"> here </A> </P> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90642iCDA46E46241861CA" /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90643i46E896FEFB6B1F29" /> <BR /> <H2> Deployment Considerations: </H2> <BR /> <H3> Cluster Witness: </H3> <BR /> It is recommended to use a <A href="#" target="_blank"> Cloud Witness </A> for Azure Guest Clusters. <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90644iA1175F0E4EEE2283" /> <BR /> <H3> Cluster Authentication: </H3> <BR /> There are three options for Cluster Authentication for your guest cluster: <BR /> <OL> <BR /> <LI> <BR /> Traditional Domain Controller <BR /> This is the default and predominant cluster authentication model, where one or two (for higher availability) IaaS VM Domain Controllers are deployed. </LI> <BR /> </OL> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90645iCEEF9384C83489B6" /> <BR /> <BR /> An Azure template to create a new Azure VM with a new AD Forest can be found <A href="#" target="_blank"> here </A> <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90646iDD9B7960B7F6AC40" /> <BR /> <BR /> An Azure template to create a new AD Domain with 2 Domain Controllers can be found <A href="#" target="_blank"> here </A> <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90647i0C6AABACD643C129" /> <BR /> 2.
Workgroup Cluster <BR /> <P> A workgroup cluster reduces the cost of the deployment because no DC VMs are required. It also reduces dependencies on Active Directory, simplifying deployment. It is an ideal fit for small deployments and test environments. Learn more <A href="#" target="_blank"> here </A> . </P> <BR /> <P> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90648i8CCE0FECF80455DE" /> </P> <BR /> <BR /> 3. Using Azure Active Directory <BR /> <P> Azure Active Directory provides a multi-tenant, cloud-based directory and identity management service which can be leveraged for cluster authentication. Learn more <A href="#" target="_blank"> here </A> </P> <BR /> <P> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90649i38DC9CF4A4DC7BA4" /> </P> <BR /> <BR /> <H3> Cluster Storage: </H3> <BR /> There are three predominant options for cluster storage in Microsoft Azure: <BR /> <OL> <BR /> <LI> <BR /> Storage Spaces Direct <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90650iCA3A70A28E61792D" /> <BR /> <BR /> Creates virtual shared storage across Azure IaaS VMs. Learn more <A href="#" target="_blank"> here </A> </LI> <BR /> <LI> <BR /> Application Replication <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90651iB1534FA93FA08FF3" /> </LI> <BR /> </OL> <BR /> <P> Replicates data at the application layer across Azure IaaS VMs. A typical scenario is seen with SQL Server 2012 (or higher) Availability Groups (AG). </P> <BR /> <BR /> 3. Volume Replication <BR /> <P> Replicates data at the volume layer across Azure IaaS VMs. This is application-agnostic and works with any solution. In Windows Server 2016, volume replication is provided in-box with <A href="#" target="_blank"> Storage Replica </A> . 3rd-party solutions for volume replication include SIOS DataKeeper. </P> <BR /> <BR /> <H3> Cluster Networking: </H3> <BR /> The recommended approach to configure the IP address for the VCO (for instance, for the SQL Server FCI) is through an Azure load balancer. The load balancer holds the IP address on one cluster node at a time. The below video walks through the configuration of the VCO through a load balancer. <BR /> <BR /> <A href="https://msdnshared.blob.core.windows.net/media/2017/02/LoadBalancer.mp4" target="_blank"> LoadBalancer.mp4 (video) </A> <BR /> <BR /> <BR /> <H3> Storage Spaces Direct Requirements in Azure: </H3> <BR /> <UL> <BR /> <LI> <STRONG> Number of IaaS VMs: </STRONG> A minimum of 2 </LI> <BR /> <LI> <STRONG> Data Disks attached to VMs: </STRONG> <BR /> <UL> <BR /> <LI> A minimum of 4 data disks per cluster, i.e. 2 data disks per VM </LI> <BR /> <LI> Data disks must be Premium Azure Storage </LI> <BR /> <LI> Minimum data disk size of 128 GB </LI> <BR /> </UL> <BR /> </LI> <BR /> <LI> <STRONG> VM Size: </STRONG> The following are the guidelines for minimum VM deployment sizes. <BR /> <UL> <BR /> <LI> <EM> Small: </EM> DS2_V2 </LI> <BR /> <LI> <EM> Medium: </EM> DS5_V2 </LI> <BR /> <LI> <EM> Large: </EM> GS5 </LI> <BR /> <LI> It is recommended to run the DiskSpd utility to evaluate the IOPS provided for a VM deployment size. This will help in planning an appropriate deployment for your production environment. The following video outlines how to run the DiskSpd tool for this evaluation. </LI> <BR /> </UL> <BR /> </LI> <BR /> </UL>
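<BR /> For reference, a DiskSpd invocation generally has the following shape (illustrative parameters only; tune block size, duration, threads, outstanding I/Os, and write mix for your workload): <BR /> <BR /> diskspd.exe -b8K -d60 -t4 -o8 -r -w30 -c10G G:\testfile.dat <BR /> <BR /> Here -b8K is the block size, -d60 runs the test for 60 seconds, -t4 uses 4 threads, -o8 keeps 8 outstanding I/Os, -r makes the I/O random, -w30 issues 30% writes, and -c10G creates a 10 GB test file. <BR />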
<BR /> <A href="https://msdnshared.blob.core.windows.net/media/2017/02/DskSpd.mp4" target="_blank"> DskSpd.mp4 (video) </A> <BR /> <H2> Using Storage Replica for a File Server </H2> <BR /> The following are the workload characteristics for which Storage Replica is a better fit than Storage Spaces Direct for your guest cluster: <BR /> <UL> <BR /> <LI> A large number of small random reads and writes </LI> <BR /> <LI> Lots of metadata operations </LI> <BR /> <LI> Information Worker features that don't work with Cluster Shared Volumes </LI> <BR /> </UL> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90652i3DE3334BF714AD05" /> <BR /> <H2> UPD using File Share (SoFS) Guest Cluster </H2> <BR /> Remote Desktop Services (RDS) requires a domain-joined file server for user profile disks (UPDs). This can be facilitated by <A href="#" target="_blank"> deploying </A> a SoFS on a domain-joined IaaS VM guest cluster in Azure. Learn about UPDs and Remote Desktop Services <A href="#" target="_blank"> here </A> </BODY></HTML> Fri, 15 Mar 2019 22:12:42 GMT https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/deploying-iaas-vm-guest-clusters-in-microsoft-azure/ba-p/372126 John Marlin 2019-03-15T22:12:42Z Failover Clustering Sets for Start Ordering https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/failover-clustering-sets-for-start-ordering/ba-p/372105 <HTML> <HEAD></HEAD><BODY> <STRONG> First published on MSDN on Oct 10, 2016 </STRONG> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90632i396F26169DC2E456" /> In a private cloud there may be multi-tier applications which are deployed across a set of virtual machines, such as a database running in one VM and an application leveraging that database running in another VM. It may be desirable to have start ordering of highly available virtual machines which have dependencies. <BR /> <BR /> <H2> Sets: </H2> <BR /> Virtual machines and other clustered applications are controlled by cluster resources, and those resources are inside of a Cluster Group. A cluster group represents the smallest unit of failover within a cluster. <BR /> <BR /> A new concept is being introduced in Windows Server 2016 called a "Set". A set can contain one or more groups, and sets can have dependencies on each other. This enables creating dependencies between cluster groups for controlling start ordering. While Sets were primarily focused on virtual machines, they are generic cluster infrastructure which will work with any clustered role, such as SQL Server. <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90633i6AA8270F62A120F1" /> <BR /> <BR /> Here are some details on how to create and manage Sets.
<BR /> <H2> Basic Set Creation: </H2> <BR /> To create a new Set and place the App group in it: <BR /> PS C:\&gt; New-ClusterGroupSet -Name SetforApp -Group App <BR /> <BR /> Name&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : SetforApp <BR /> GroupNames&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : {App} <BR /> ProviderNames&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : {} <BR /> StartupDelayTrigger : Delay <BR /> StartupCount&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : 4294967295 <BR /> IsGlobal&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : False <BR /> StartupDelay&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : 20 <BR /> Now create a Set and place the Database group in it: <BR /> PS C:\&gt; New-ClusterGroupSet -Name SetforDatabase -Group Database <BR /> <BR /> Name&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : SetforDatabase <BR /> GroupNames&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : {Database} <BR /> ProviderNames&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : {} <BR /> StartupDelayTrigger : Delay <BR /> StartupCount&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : 4294967295 <BR /> IsGlobal&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : False <BR /> StartupDelay&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : 20 <BR /> To view the newly created Sets: <BR /> PS C:\&gt; Get-ClusterGroupSet <BR /> <BR /> Name&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : SetforApp <BR /> GroupNames&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : {App} <BR /> ProviderNames&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : {} <BR /> StartupDelayTrigger : Delay <BR /> StartupCount&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : 4294967295 <BR /> IsGlobal&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : False <BR /> StartupDelay&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : 20 <BR /> <BR /> Name&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : SetforDatabase <BR /> GroupNames&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : {Database} <BR /> ProviderNames&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : {} <BR /> StartupDelayTrigger : Delay <BR /> StartupCount&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : 4294967295 <BR /> IsGlobal&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : False <BR /> StartupDelay&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : 20 <BR /> Now let's add a dependency between the two newly created Sets, so that the App set will depend on the Database set: <BR /> PS C:\&gt; Add-ClusterGroupSetDependency -Name SetforApp -Provider SetforDatabase <BR /> To view the newly created dependency between the Sets:
You will see that the name of the set is "SetforApp", that it contains a single group named "App", and that the set depends (based on the ProviderNames property) on the set named "SetforDatabase": <BR /> PS C:\&gt; Get-ClusterGroupSetDependency <BR /> <BR /> <BR /> Name&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : SetforApp <BR /> GroupNames&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : {App} <BR /> ProviderNames&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : {SetforDatabase} <BR /> StartupDelayTrigger : Delay <BR /> StartupCount&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : 4294967295 <BR /> IsGlobal&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : False <BR /> StartupDelay&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : 20 <BR /> <BR /> Name&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : SetforDatabase <BR /> GroupNames&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : {} <BR /> ProviderNames&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : {} <BR /> StartupDelayTrigger : Delay <BR /> StartupCount&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : 4294967295 <BR /> IsGlobal&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : False <BR /> StartupDelay&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : 20 <BR /> After completing these steps, the result is that the Database group will be brought online first; once complete, there will be a delay of 20 seconds, and then the App group will be brought online. <BR /> <H2> Set Configuration: </H2> <BR /> With the defaults, dependencies between sets will start the next set 20 seconds after all the groups come online. There are a few configuration settings to modify the start behavior of dependencies between sets: <BR /> <UL> <BR /> <LI> <STRONG> StartupDelayTrigger </STRONG> – This defines what action should trigger the start, and it can have one of two values: <BR /> <UL> <BR /> <LI> Online – Waits until the group has reached an online state </LI> <BR /> <LI> Delay – Waits the number of seconds defined by StartupDelay (default) </LI> <BR /> </UL> <BR /> </LI> <BR /> <LI> <STRONG> StartupDelay </STRONG> – This defines a delay time in seconds (default value of 20) which is used if StartupDelayTrigger is set to Delay </LI> <BR /> <LI> <STRONG> StartupCount </STRONG> – This defines the number of groups in the set which must have achieved StartupDelayTrigger before the Set is considered started: <BR /> <UL> <BR /> <LI> -1 for all groups in the set (default) </LI> <BR /> <LI> 0 for a majority of the groups in the set </LI> <BR /> <LI> N (user defined) for a specific number of groups <BR /> <UL> <BR /> <LI> Note: If N exceeds the number of groups in the set, it effectively results in the All behavior. </LI> <BR /> </UL> <BR /> </LI> <BR /> </UL> <BR /> </LI> <BR /> </UL>
<BR /> You can view the configuration with the following syntax: <BR /> PS C:\&gt; Get-ClusterGroupSetDependency -Name SetforApp <BR /> <BR /> Name&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : SetforApp <BR /> GroupNames&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : {App} <BR /> ProviderNames&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : {SetforDatabase} <BR /> StartupDelayTrigger : Delay <BR /> StartupCount&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : 4294967295 <BR /> IsGlobal&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : False <BR /> StartupDelay&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : 20 <BR /> A set can be configured for Online with the following syntax: <BR /> PS C:\&gt; Set-ClusterGroupSet -Name SetforApp -StartupDelayTrigger Online <BR /> <H2> Infrastructure Groups </H2> <BR /> There may be some groups which you wish to start before all others, such as a utility VM. This might be a VM which runs a domain controller, a DNS server, or maybe a storage appliance. These infrastructure groups may need to be running before attempting to start any other tenant VM which is running apps. It would be cumbersome to create a set and make all other sets dependent on it. To simplify this configuration, a single property can be configured on a set. <BR /> <BR /> A set can be marked to start before all others with the following setting: <BR /> <UL> <BR /> <LI> <STRONG> IsGlobal </STRONG> – This defines if the set should start before all other sets </LI> <BR /> </UL> <BR /> Example of configuring a set: <BR /> PS C:\&gt; Set-ClusterGroupSet -Name SetforInfra -IsGlobal 1 <BR /> Now you can see the set is configured as True for IsGlobal: <BR /> PS C:\&gt; Get-ClusterGroupSetDependency <BR /> <BR /> Name&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : SetforInfra <BR /> GroupNames&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : {ApplianceVM} <BR /> ProviderNames&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : {} <BR /> StartupDelayTrigger : Delay <BR /> StartupCount&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : 4294967295 <BR /> IsGlobal&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : True <BR /> StartupDelay&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : 20 <BR /> <BR /> <H2> PowerShell cmdlet Reference </H2> <BR /> The only UI for VM Start Ordering is PowerShell; there is no Failover Cluster Manager support in Windows Server 2016. Here is a list of all the relevant Set cmdlets: <BR /> <UL> <BR /> <LI> New-ClusterGroupSet </LI> <BR /> <LI> Remove-ClusterGroupSet </LI> <BR /> <LI> Set-ClusterGroupSet </LI> <BR /> <LI> Get-ClusterGroupSet </LI> <BR /> <LI> Get-ClusterGroupSetDependency </LI> <BR /> <LI> Add-ClusterGroupToSet </LI> <BR /> <LI> Add-ClusterGroupSetDependency </LI> <BR /> <LI> Remove-ClusterGroupSetDependency </LI> <BR /> <LI> Remove-ClusterGroupFromSet </LI> <BR /> </UL> <BR /> Thanks! <BR /> Elden Christensen <BR /> Principal PM Manager <BR /> High Availability &amp; Storage </BODY></HTML> Fri, 15 Mar 2019 22:09:50 GMT https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/failover-clustering-sets-for-start-ordering/ba-p/372105 Elden Christensen 2019-03-15T22:09:50Z Failover Clustering @ Ignite 2016 https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/failover-clustering-ignite-2016/ba-p/372102 <HTML> <HEAD></HEAD><BODY> <STRONG> First published on MSDN on Sep 23, 2016 </STRONG> <BR /> I am packing my bags getting ready for Ignite 2016 in Atlanta, and I thought I would post all the cluster and related sessions you might want to check out.&nbsp; See you there!
<BR /> If you couldn't make it to Ignite this year, don't worry; you can stream all these sessions online. <BR /> Cluster <BR /> <UL> <BR /> <LI> <A href="#" target="_blank"> BRK3196 </A> - Keep the lights on with Windows Server 2016 Failover Clustering </LI> <BR /> <LI> <A href="#" target="_blank"> BRK2169 </A> - Explore Windows Server 2016 Software Defined Datacenter </LI> <BR /> </UL> <BR /> Storage Spaces Direct for clusters with no shared storage: <BR /> <UL> <BR /> <LI> <A href="#" target="_blank"> BRK3088 </A> - Discover Storage Spaces Direct, the ultimate software-defined storage for Hyper-V </LI> <BR /> <LI> <A href="#" target="_blank"> BRK2189 </A> - Discover Hyper-converged infrastructure with Windows Server 2016 </LI> <BR /> <LI> <A href="#" target="_blank"> BRK3085 </A> - Optimize your software-defined storage investment with Windows Server 2016 </LI> <BR /> <LI> <A href="#" target="_blank"> BRK2167 </A> - Enterprise-grade Building Blocks for Windows Server 2016 SDDC: Partner Offers </LI> <BR /> </UL> <BR /> Storage Replica for stretched clusters: <BR /> <UL> <BR /> <LI> <A href="#" target="_blank"> BRK3072 </A> - Drill into Storage Replica in Windows Server 2016 </LI> <BR /> </UL> <BR /> SQL Clusters <BR /> <UL> <BR /> <LI> <A href="#" target="_blank"> BRK3187 </A> - Learn how SQL Server 2016 on Windows Server 2016 are better together </LI> <BR /> <LI> <A href="#" target="_blank"> BRK3286 </A> - Design a Private and Hybrid Cloud for High Availability and Disaster Recovery with SQL Server 2016 </LI> <BR /> </UL> <BR /> Thanks! <BR /> Elden Christensen <BR /> Principal PM Manager <BR /> High Availability &amp; Storage </BODY></HTML> Fri, 15 Mar 2019 22:09:29 GMT https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/failover-clustering-ignite-2016/ba-p/372102 Elden Christensen 2019-03-15T22:09:29Z Using PowerShell script make any application highly available https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/using-powershell-script-make-any-application-highly-available/ba-p/372101 <HTML> <HEAD></HEAD><BODY> <STRONG> First published on MSDN on Jun 06, 2016 </STRONG> <BR /> <EM> Author: </EM> <BR /> <EM> Amitabh Tamhane </EM> <BR /> <EM> Senior Program Manager </EM> <BR /> <EM> Windows Server Microsoft </EM> <BR /> <BR /> <EM> OS releases: Applicable to Windows Server 2008 R2 or later </EM> <BR /> <BR /> Now you can use PowerShell scripts to make any application highly available with Failover Clusters!!! <BR /> <BR /> The Generic Script is a built-in resource type included in Windows Server Failover Clusters. Its advantage is flexibility: you can make applications highly available by writing a simple script. For instance, you can make any PowerShell script highly available! Interested? <BR /> <BR /> We created GenScript in ancient times, and even in Windows Server 2016 it supports only Visual Basic scripts. This means you can't <EM> directly </EM> configure PowerShell as a GenScript resource. However, in this blog post, I'll walk you through a sample Visual Basic script - and associated PS scripts - to build a custom GenScript resource that works well with PowerShell. <BR /> <BR /> <EM> Pre-requisites: This blog assumes you have a basic understanding of Windows Server Failover Clusters &amp; built-in resource types. </EM> <BR /> <BR /> <STRONG> <EM> Disclaimer: </EM> </STRONG> <EM> Microsoft does not intend to officially support any source code/sample scripts provided as part of this blog.
This blog is written only for a quick walk-through on how to run PowerShell scripts using GenScript resource. To make your application highly available, you are expected to modify all the scripts (Visual Basic/PowerShell) as per the needs of your application. </EM> <BR /> <H2> Visual Basic Shell </H2> <BR /> It so happens that Visual Basic Shell supports calling PowerShell script, then passing parameters and reading output. Here’s a Visual Basic Shell sample that uses some custom Private Properties: <BR /> <BR /> <BR /> '&lt;your application name&gt; Resource Type <BR /> <BR /> Function Open( ) <BR /> Resource.LogInformation "Enter Open()" <BR /> <BR /> If Resource.PropertyExists("PSScriptsPath") = False Then <BR /> Resource.AddProperty("PSScriptsPath") <BR /> End If <BR /> <BR /> If Resource.PropertyExists("Name") = False Then <BR /> Resource.AddProperty("Name") <BR /> End If <BR /> <BR /> If Resource.PropertyExists("Data1") = False Then <BR /> Resource.AddProperty("Data1") <BR /> End If <BR /> <BR /> If Resource.PropertyExists("Data2") = False Then <BR /> Resource.AddProperty("Data2") <BR /> End If <BR /> <BR /> If Resource.PropertyExists("DataStorePath") = False Then <BR /> Resource.AddProperty("DataStorePath") <BR /> End If <BR /> <BR /> '...Result... <BR /> Open = 0 <BR /> <BR /> Resource.LogInformation "Exit Open()" <BR /> End Function <BR /> <BR /> <BR /> Function Online( ) <BR /> Resource.LogInformation "Enter Online()" <BR /> <BR /> '...Check for required private properties... <BR /> <BR /> If Resource.PropertyExists("PSScriptsPath") = False Then <BR /> Resource.LogInformation "PSScriptsPath is a required private property." <BR /> Online = 1 <BR /> Exit Function <BR /> End If <BR /> '...Resource.LogInformation "PSScriptsPath is " &amp; Resource.PSScriptsPath <BR /> <BR /> If Resource.PropertyExists("Name") = False Then <BR /> Resource.LogInformation "Name is a required private property." <BR /> Online = 1 <BR /> Exit Function <BR /> End If <BR /> Resource.LogInformation "Name is " &amp; Resource.Name <BR /> <BR /> If Resource.PropertyExists("Data1") = False Then <BR /> Resource.LogInformation "Data1 is a required private property." <BR /> Online = 1 <BR /> Exit Function <BR /> End If <BR /> '...Resource.LogInformation "Data1 is " &amp; Resource.Data1 <BR /> <BR /> If Resource.PropertyExists("Data2") = False Then <BR /> Resource.LogInformation "Data2 is a required private property." <BR /> Online = 1 <BR /> Exit Function <BR /> End If <BR /> '...Resource.LogInformation "Data2 is " &amp; Resource.Data2 <BR /> <BR /> If Resource.PropertyExists("DataStorePath") = False Then <BR /> Resource.LogInformation "DataStorePath is a required private property." <BR /> Online = 1 <BR /> Exit Function <BR /> End If <BR /> '...Resource.LogInformation "DataStorePath is " &amp; Resource.DataStorePath <BR /> <BR /> PScmd = "powershell.exe -file " &amp; Resource.PSScriptsPath &amp; "\PS_Online.ps1 " &amp; Resource.PSScriptsPath &amp; " " &amp; Resource.Name &amp; " " &amp; Resource.Data1 &amp; " " &amp; Resource.Data2 &amp; " " &amp; Resource.DataStorePath <BR /> <BR /> Dim WshShell <BR /> Set WshShell = CreateObject("WScript.Shell") <BR /> <BR /> Resource.LogInformation "Calling Online PS script= " &amp; PSCmd <BR /> rv = WshShell.Run(PScmd, , True) <BR /> Resource.LogInformation "PS return value is: " &amp; rv <BR /> <BR /> '...Translate result from PowerShell ... 
<BR /> '...1 (True in PS) == 0 (True in VB) <BR /> '...0 (False in PS) == 1 (False in VB) <BR /> If rv = 1 Then <BR /> Resource.LogInformation "Online Success" <BR /> Online = 0 <BR /> Else <BR /> Resource.LogInformation "Online Error" <BR /> Online = 1 <BR /> End If <BR /> <BR /> Resource.LogInformation "Exit Online()" <BR /> End Function <BR /> <BR /> Function Offline( ) <BR /> Resource.LogInformation "Enter Offline()" <BR /> <BR /> '...Check for required private properties... <BR /> <BR /> If Resource.PropertyExists("PSScriptsPath") = False Then <BR /> Resource.LogInformation "PSScriptsPath is a required private property." <BR /> Offline = 1 <BR /> Exit Function <BR /> End If <BR /> '...Resource.LogInformation "PSScriptsPath is " &amp; Resource.PSScriptsPath <BR /> <BR /> If Resource.PropertyExists("Name") = False Then <BR /> Resource.LogInformation "Name is a required private property." <BR /> Offline = 1 <BR /> Exit Function <BR /> End If <BR /> Resource.LogInformation "Name is " &amp; Resource.Name <BR /> <BR /> If Resource.PropertyExists("Data1") = False Then <BR /> Resource.LogInformation "Data1 is a required private property." <BR /> Offline = 1 <BR /> Exit Function <BR /> End If <BR /> '...Resource.LogInformation "Data1 is " &amp; Resource.Data1 <BR /> <BR /> If Resource.PropertyExists("Data2") = False Then <BR /> Resource.LogInformation "Data2 is a required private property." <BR /> Offline = 1 <BR /> Exit Function <BR /> End If <BR /> '...Resource.LogInformation "Data2 is " &amp; Resource.Data2 <BR /> <BR /> If Resource.PropertyExists("DataStorePath") = False Then <BR /> Resource.LogInformation "DataStorePath is a required private property." <BR /> Offline = 1 <BR /> Exit Function <BR /> End If <BR /> '...Resource.LogInformation "DataStorePath is " &amp; Resource.DataStorePath <BR /> <BR /> PScmd = "powershell.exe -file " &amp; Resource.PSScriptsPath &amp; "\PS_Offline.ps1 " &amp; Resource.PSScriptsPath &amp; " " &amp; Resource.Name &amp; " " &amp; Resource.Data1 &amp; " " &amp; Resource.Data2 &amp; " " &amp; Resource.DataStorePath <BR /> <BR /> Dim WshShell <BR /> Set WshShell = CreateObject("WScript.Shell") <BR /> <BR /> Resource.LogInformation "Calling Offline PS script= " &amp; PSCmd <BR /> rv = WshShell.Run(PScmd, , True) <BR /> Resource.LogInformation "PS return value is: " &amp; rv <BR /> <BR /> '...Translate result from PowerShell ... <BR /> '...1 (True in PS) == 0 (True in VB) <BR /> '...0 (False in PS) == 1 (False in VB) <BR /> If rv = 1 Then <BR /> Resource.LogInformation "Offline Success" <BR /> Offline = 0 <BR /> Else <BR /> Resource.LogInformation "Offline Error" <BR /> Offline = 1 <BR /> End If <BR /> <BR /> Resource.LogInformation "Exit Offline()" <BR /> End Function <BR /> <BR /> Function LooksAlive( ) <BR /> '...Result... <BR /> LooksAlive = 0 <BR /> End Function <BR /> <BR /> Function IsAlive( ) <BR /> Resource.LogInformation "Entering IsAlive" <BR /> <BR /> '...Check for required private properties... <BR /> <BR /> If Resource.PropertyExists("PSScriptsPath") = False Then <BR /> Resource.LogInformation "PSScriptsPath is a required private property." <BR /> IsAlive = 1 <BR /> Exit Function <BR /> End If <BR /> '...Resource.LogInformation "PSScriptsPath is " &amp; Resource.PSScriptsPath <BR /> <BR /> If Resource.PropertyExists("Name") = False Then <BR /> Resource.LogInformation "Name is a required private property." 
<BR /> IsAlive = 1 <BR /> Exit Function <BR /> End If <BR /> Resource.LogInformation "Name is " &amp; Resource.Name <BR /> <BR /> If Resource.PropertyExists("Data1") = False Then <BR /> Resource.LogInformation "Data1 is a required private property." <BR /> IsAlive = 1 <BR /> Exit Function <BR /> End If <BR /> '...Resource.LogInformation "Data1 is " &amp; Resource.Data1 <BR /> <BR /> If Resource.PropertyExists("Data2") = False Then <BR /> Resource.LogInformation "Data2 is a required private property." <BR /> IsAlive = 1 <BR /> Exit Function <BR /> End If <BR /> '...Resource.LogInformation "Data2 is " &amp; Resource.Data2 <BR /> <BR /> If Resource.PropertyExists("DataStorePath") = False Then <BR /> Resource.LogInformation "DataStorePath is a required private property." <BR /> IsAlive = 1 <BR /> Exit Function <BR /> End If <BR /> '...Resource.LogInformation "DataStorePath is " &amp; Resource.DataStorePath <BR /> <BR /> PScmd = "powershell.exe -file " &amp; Resource.PSScriptsPath &amp; "\PS_IsAlive.ps1 " &amp; Resource.PSScriptsPath &amp; " " &amp; Resource.Name &amp; " " &amp; Resource.Data1 &amp; " " &amp; Resource.Data2 &amp; " " &amp; Resource.DataStorePath <BR /> <BR /> Dim WshShell <BR /> Set WshShell = CreateObject("WScript.Shell") <BR /> <BR /> Resource.LogInformation "Calling IsAlive PS script= " &amp; PSCmd <BR /> rv = WshShell.Run(PScmd, , True) <BR /> Resource.LogInformation "PS return value is: " &amp; rv <BR /> <BR /> '...Translate result from PowerShell ... <BR /> '...1 (True in PS) == 0 (True in VB) <BR /> '...0 (False in PS) == 1 (False in VB) <BR /> If rv = 1 Then <BR /> Resource.LogInformation "IsAlive Success" <BR /> IsAlive = 0 <BR /> Else <BR /> Resource.LogInformation "IsAlive Error" <BR /> IsAlive = 1 <BR /> End If <BR /> <BR /> Resource.LogInformation "Exit IsAlive()" <BR /> End Function <BR /> <BR /> Function Terminate( ) <BR /> Resource.LogInformation "Enter Terminate()" <BR /> <BR /> '...Check for required private properties... <BR /> <BR /> If Resource.PropertyExists("PSScriptsPath") = False Then <BR /> Resource.LogInformation "PSScriptsPath is a required private property." <BR /> Terminate = 1 <BR /> Exit Function <BR /> End If <BR /> '...Resource.LogInformation "PSScriptsPath is " &amp; Resource.PSScriptsPath <BR /> <BR /> If Resource.PropertyExists("Name") = False Then <BR /> Resource.LogInformation "Name is a required private property." <BR /> Terminate = 1 <BR /> Exit Function <BR /> End If <BR /> Resource.LogInformation "Name is " &amp; Resource.Name <BR /> <BR /> If Resource.PropertyExists("Data1") = False Then <BR /> Resource.LogInformation "Data1 is a required private property." <BR /> Terminate = 1 <BR /> Exit Function <BR /> End If <BR /> '...Resource.LogInformation "Data1 is " &amp; Resource.Data1 <BR /> <BR /> If Resource.PropertyExists("Data2") = False Then <BR /> Resource.LogInformation "Data2 is a required private property." <BR /> Terminate = 1 <BR /> Exit Function <BR /> End If <BR /> '...Resource.LogInformation "Data2 is " &amp; Resource.Data2 <BR /> <BR /> If Resource.PropertyExists("DataStorePath") = False Then <BR /> Resource.LogInformation "DataStorePath is a required private property." 
<BR /> Terminate = 1 <BR /> Exit Function <BR /> End If <BR /> '...Resource.LogInformation "DataStorePath is " &amp; Resource.DataStorePath <BR /> <BR /> PScmd = "powershell.exe -file " &amp; Resource.PSScriptsPath &amp; "\PS_Terminate.ps1 " &amp; Resource.PSScriptsPath &amp; " " &amp; Resource.Name &amp; " " &amp; Resource.Data1 &amp; " " &amp; Resource.Data2 &amp; " " &amp; Resource.DataStorePath <BR /> <BR /> Dim WshShell <BR /> Set WshShell = CreateObject("WScript.Shell") <BR /> <BR /> Resource.LogInformation "Calling Terminate PS script= " &amp; PSCmd <BR /> rv = WshShell.Run(PScmd, , True) <BR /> Resource.LogInformation "PS return value is: " &amp; rv <BR /> <BR /> '...Translate result from PowerShell ... <BR /> '...1 (True in PS) == 0 (True in VB) <BR /> '...0 (False in PS) == 1 (False in VB) <BR /> If rv = 1 Then <BR /> Terminate = 0 <BR /> Else <BR /> Terminate = 1 <BR /> End If <BR /> <BR /> Resource.LogInformation "Exit Terminate()" <BR /> End Function <BR /> <BR /> Function Close( ) <BR /> '...Result... <BR /> Close = 0 <BR /> End Function <BR /> <BR /> <BR /> <H2> Entry Points </H2> <BR /> In the above sample VB script, the following entry points are defined: <BR /> <UL> <BR /> <LI> Open – Ensures all necessary steps complete before starting your application </LI> <BR /> <LI> Online – Function to start your application </LI> <BR /> <LI> Offline – Function to stop your application </LI> <BR /> <LI> IsAlive – Function to validate your application startup and monitor health </LI> <BR /> <LI> Terminate – Function to forcefully cleanup application state (ex: Error during Online/Offline) </LI> <BR /> <LI> Close – Ensures all necessary cleanup completes after stopping your application </LI> <BR /> </UL> <BR /> Each of the above entry points is defined as a function (ex: “Function Online( )”). Failover Cluster then calls these entry point functions as part of the GenScript resource type definition. <BR /> <H2> Private Properties </H2> <BR /> For resources of any type, Failover Cluster supports two types of properties: <BR /> <UL> <BR /> <LI> Common Properties – Generic properties that can have unique value for each resource </LI> <BR /> <LI> Private Properties – Custom properties that are unique to that resource type. Each resource of that resource type has these private properties. </LI> <BR /> </UL> <BR /> When writing a GenScript resource, you need to evaluate if you need private properties. In the above VB sample script, I have defined five sample private properties (only as an example): <BR /> <UL> <BR /> <LI> PSScriptsPath – Path to the folder containing PS scripts </LI> <BR /> <LI> Name </LI> <BR /> <LI> Data1 – some custom data field </LI> <BR /> <LI> Data2 – another custom data field </LI> <BR /> <LI> DataStorePath – path to a common backend store (if any) </LI> <BR /> </UL> <BR /> The above private properties are shown as example only &amp; you are expected to modify the above VB script to customize it for your application. <BR /> <H2> PowerShell Scripts </H2> <BR /> The Visual Basic script simply connects the Failover Clusters’ RHS (Resource Hosting Service) to call PowerShell scripts. You may notice the “PScmd” parameter containing the actual PS command that will be called to perform the action (Online, Offline etc.) by calling into corresponding PS scripts. 
<BR /> <BR /> For this sample, here are four PowerShell scripts (the names match what the VB script invokes): <BR /> <UL> <BR /> <LI> PS_Online.ps1 – To start your application </LI> <BR /> <LI> PS_Offline.ps1 – To stop your application </LI> <BR /> <LI> PS_Terminate.ps1 – To forcefully clean up your application </LI> <BR /> <LI> PS_IsAlive.ps1 – To monitor the health of your application </LI> <BR /> </UL> <BR /> Example of PS scripts: <BR /> <H2> Entry Point: Online </H2> <BR /> Param( <BR /> # Sample properties… <BR /> [Parameter(Mandatory=$true, Position=0)] <BR /> [ValidateNotNullOrEmpty()] <BR /> [string] <BR /> $PSScriptsPath, <BR /> <BR /> # <BR /> [Parameter(Mandatory=$true, Position=1)] <BR /> [ValidateNotNullOrEmpty()] <BR /> [string] <BR /> $Name, <BR /> <BR /> # <BR /> [Parameter(Mandatory=$true, Position=2)] <BR /> [ValidateNotNullOrEmpty()] <BR /> [string] <BR /> $Data1, <BR /> <BR /> # <BR /> [Parameter(Mandatory=$true, Position=3)] <BR /> [ValidateNotNullOrEmpty()] <BR /> [string] <BR /> $Data2, <BR /> <BR /> # <BR /> [Parameter(Mandatory=$true, Position=4)] <BR /> [ValidateNotNullOrEmpty()] <BR /> [string] <BR /> $DataStorePath <BR /> ) <BR /> <BR /> $filePath = Join-Path $PSScriptsPath "Output_Online.log" <BR /> <BR /> @" <BR /> Starting Online... <BR /> Name= $Name <BR /> Data1= $Data1 <BR /> Data2= $Data2 <BR /> DataStorePath= $DataStorePath <BR /> "@ | Out-File -FilePath $filePath <BR /> <BR /> $error.clear() <BR /> <BR /> ### Do your online script logic here <BR /> <BR /> if ($errorOut -eq $true) <BR /> { <BR /> "Error $error" | Out-File -FilePath $filePath -Append <BR /> exit $false <BR /> } <BR /> <BR /> "Success" | Out-File -FilePath $filePath -Append <BR /> exit $true <BR /> <H2> Entry Point: Offline </H2> <BR /> Param( <BR /> # Sample properties… <BR /> [Parameter(Mandatory=$true, Position=0)] <BR /> [ValidateNotNullOrEmpty()] <BR /> [string] <BR /> $PSScriptsPath, <BR /> <BR /> # <BR /> [Parameter(Mandatory=$true, Position=1)] <BR /> [ValidateNotNullOrEmpty()] <BR /> [string] <BR /> $Name, <BR /> <BR /> # <BR /> [Parameter(Mandatory=$true, Position=2)] <BR /> [ValidateNotNullOrEmpty()] <BR /> [string] <BR /> $Data1, <BR /> <BR /> # <BR /> [Parameter(Mandatory=$true, Position=3)] <BR /> [ValidateNotNullOrEmpty()] <BR /> [string] <BR /> $Data2, <BR /> <BR /> # <BR /> [Parameter(Mandatory=$true, Position=4)] <BR /> [ValidateNotNullOrEmpty()] <BR /> [string] <BR /> $DataStorePath <BR /> ) <BR /> <BR /> $filePath = Join-Path $PSScriptsPath "Output_Offline.log" <BR /> <BR /> @" <BR /> Starting Offline...
<BR /> Name= $Name <BR /> Data1= $Data1 <BR /> Data2= $Data2 <BR /> DataStorePath= $DataStorePath <BR /> "@ | Out-File -FilePath $filePath <BR /> <BR /> $error.clear() <BR /> <BR /> ### Do your offline script logic here <BR /> <BR /> if ($errorOut -eq $true) <BR /> { <BR /> "Error $error" | Out-File -FilePath $filePath -Append <BR /> exit $false <BR /> } <BR /> <BR /> "Success" | Out-File -FilePath $filePath -Append <BR /> exit $true <BR /> <BR /> <BR /> <H2> Entry Point: Terminate </H2> <BR /> Param( <BR /> # Sample properties… <BR /> [Parameter(Mandatory=$true, Position=0)] <BR /> [ValidateNotNullOrEmpty()] <BR /> [string] <BR /> $PSScriptsPath, <BR /> <BR /> # <BR /> [Parameter(Mandatory=$true, Position=1)] <BR /> [ValidateNotNullOrEmpty()] <BR /> [string] <BR /> $Name, <BR /> <BR /> # <BR /> [Parameter(Mandatory=$true, Position=2)] <BR /> [ValidateNotNullOrEmpty()] <BR /> [string] <BR /> $Data1, <BR /> <BR /> # <BR /> [Parameter(Mandatory=$true, Position=3)] <BR /> [ValidateNotNullOrEmpty()] <BR /> [string] <BR /> $Data2, <BR /> <BR /> # <BR /> [Parameter(Mandatory=$true, Position=4)] <BR /> [ValidateNotNullOrEmpty()] <BR /> [string] <BR /> $DataStorePath <BR /> ) <BR /> <BR /> $filePath = Join-Path $PSScriptsPath "Output_Terminate.log" <BR /> <BR /> @" <BR /> Starting Terminate... <BR /> Name= $Name <BR /> Data1= $Data1 <BR /> Data2= $Data2 <BR /> DataStorePath= $DataStorePath <BR /> "@ | Out-File -FilePath $filePath <BR /> <BR /> $error.clear() <BR /> <BR /> ### Do your terminate script logic here <BR /> <BR /> if ($errorOut -eq $true) <BR /> { <BR /> "Error $error" | Out-File -FilePath $filePath -Append <BR /> exit $false <BR /> } <BR /> <BR /> "Success" | Out-File -FilePath $filePath -Append <BR /> exit $true <BR /> <BR /> <BR /> <H2> Entry Point: IsAlive </H2> <BR /> Param( <BR /> # Sample properties… <BR /> [Parameter(Mandatory=$true, Position=0)] <BR /> [ValidateNotNullOrEmpty()] <BR /> [string] <BR /> $PSScriptsPath, <BR /> <BR /> # <BR /> [Parameter(Mandatory=$true, Position=1)] <BR /> [ValidateNotNullOrEmpty()] <BR /> [string] <BR /> $Name, <BR /> <BR /> # <BR /> [Parameter(Mandatory=$true, Position=2)] <BR /> [ValidateNotNullOrEmpty()] <BR /> [string] <BR /> $Data1, <BR /> <BR /> # <BR /> [Parameter(Mandatory=$true, Position=3)] <BR /> [ValidateNotNullOrEmpty()] <BR /> [string] <BR /> $Data2, <BR /> <BR /> # <BR /> [Parameter(Mandatory=$true, Position=4)] <BR /> [ValidateNotNullOrEmpty()] <BR /> [string] <BR /> $DataStorePath <BR /> ) <BR /> <BR /> $filePath = Join-Path $PSScriptsPath "Output_IsAlive.log" <BR /> <BR /> @" <BR /> Starting IsAlive... <BR /> Name= $Name <BR /> Data1= $Data1 <BR /> Data2= $Data2 <BR /> DataStorePath= $DataStorePath <BR /> "@ | Out-File -FilePath $filePath <BR /> <BR /> $error.clear() <BR /> <BR /> ### Do your isalive script logic here <BR /> <BR /> if ($errorOut -eq $true) <BR /> { <BR /> "Error $error" | Out-File -FilePath $filePath -Append <BR /> exit $false <BR /> } <BR /> <BR /> "Success" | Out-File -FilePath $filePath -Append <BR /> exit $true <BR /> <BR /> <BR /> <H2> Parameters </H2> <BR /> The private properties are passed in as arguments to the PS script. In the sample scripts, these are all string values. You can potentially pass in different value types with more advanced VB script magic. <BR /> <BR /> <STRONG> Note: </STRONG> Another way to simplify this is by writing only one PS script, such that the entry points are all functions, with only a single primary function called by the VB script. 
To achieve this, you can pass in additional parameters giving the context of the expected action (ex: Online, Offline, etc.). <BR /> <H2> Step-By-Step Walk-Through </H2> <BR /> Great! Now that you have the VB Shell &amp; Entry Point Scripts ready, let's make the application highly available… <BR /> <H2> Copy VB + PS Scripts to Server </H2> <BR /> It is important to copy the VB script &amp; all PS scripts to a folder on each cluster node. Ensure that the scripts are copied to the same folder on all cluster nodes. In this walk-through, the VB + PS scripts are copied to the "C:\SampleScripts" folder: <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90622iBD762D88FC579137" /> <BR /> <H2> Create Group &amp; Resource </H2> <BR /> Using PowerShell: <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90623iDD0E669809AFB69A" /> <BR /> <BR /> The "ScriptFilePath" private property gets automatically added. This is the path to the VB script file. There are no other private properties which get added (see above). <BR /> <BR /> You can also create the Group &amp; Resource using the Failover Cluster Manager GUI: <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90624i0168E7AC1CC6DCEA" /> <BR /> <H2> Specify VB Script </H2> <BR /> To specify the VB script, set the "ScriptFilePath" private property as: <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90626i5AA685036EFC168D" /> <BR /> <BR /> When the VB script is specified, the cluster automatically calls the Open Entry Point (in the VB script). In the above VB script, additional private properties are added as part of the Open Entry Point. <BR /> <H2> Configure Private Properties </H2> <BR /> You can configure the private properties defined for the Generic Script resource as: <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90627i2EBD5D59F3FD61DB" /> <BR /> <BR /> In the above example, "PSScriptsPath" was specified as "C:\SampleScripts", which is the folder where all my PS scripts are stored. The additional example private properties Name, Data1, Data2, and DataStorePath are set with custom values as well. <BR /> <BR /> At this point, the Generic Script resource using PS scripts is now ready! <BR /> <H2> Starting Your Application </H2> <BR /> To start your application, you will simply need to start (aka bring online) the group (ex: SampleGroup) or resource (ex: SampleResUsingPS). You can start the group or resource using PS as: <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90628i9CDED6CB78728F43" /> <BR /> <BR /> You can use the Failover Cluster Manager GUI to start your Group/Role as well: <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90629i971CF2CEC87748ED" /> <BR /> <BR /> To view your application state in the Failover Cluster Manager GUI: <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90630i4A05363A516BD61B" /> <BR /> <H2> Verify PS script output: </H2> <BR /> In the sample PS scripts, the output log is stored in the same directory as the PS script corresponding to each entry point. You can see the output of the PS scripts for the Online &amp; IsAlive Entry Points below: <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90631i4DA976E2B76A9D08" /> <BR /> <BR /> Awesome!
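<BR /> You can also confirm the configured private property values from PowerShell at any time (a quick check, using the example resource name from this walk-through): <BR /> <BR /> Get-ClusterResource SampleResUsingPS | Get-ClusterParameter <BR />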
Now, let's see what it takes to customize the generic scripts for your application. <BR /> <H2> Customizing Scripts For Your Application </H2> <BR /> The sample VB script above is a generic shell that any application can reuse. There are a few important things that you may need to edit: <BR /> <OL> <BR /> <LI> Defining Custom Private Properties: The "Function Open" in the VB script defines sample private properties. You will need to edit those to add/remove private properties for your application. </LI> <BR /> <LI> Validating Custom Private Properties: "Function Online", "Function Offline", "Function Terminate", and "Function IsAlive" validate whether private properties are set (in addition to whether they are required). You will need to edit the validation checks for any private properties added/removed. </LI> <BR /> <LI> Calling the PS scripts: The "PSCmd" variable contains the exact syntax of the PS script which gets called. For any private properties added/removed, you will need to edit that PS script syntax as well. </LI> <BR /> <LI> PowerShell scripts: Parameters for the PowerShell scripts will need to be edited for any private properties added/removed. In addition, your application-specific logic will need to be added as indicated by the comments in the PS scripts. </LI> <BR /> </OL> <BR /> <H2> Summary </H2> <BR /> Now you can use PowerShell scripts to make any application highly available with Failover Clusters!!! <BR /> <BR /> The sample VB script &amp; the corresponding PS scripts allow you to take any custom application &amp; make it highly available using PowerShell scripts. <BR /> Thanks, <BR /> Amitabh </BODY></HTML> Fri, 15 Mar 2019 22:09:22 GMT https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/using-powershell-script-make-any-application-highly-available/ba-p/372101 John Marlin 2019-03-15T22:09:22Z NetFT Virtual Adapter Performance Filter https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/netft-virtual-adapter-performance-filter/ba-p/372090 <P>In this blog I will discuss what the NetFT Virtual Adapter Performance Filter in Windows Server 2012 and Windows Server 2012 R2 is, and the scenarios in which you should or should not enable it. <BR /><BR />The Microsoft Failover Cluster Virtual Adapter (NetFT) is a virtual adapter used by the Failover Clustering feature to build fault-tolerant communication routes between nodes in a cluster for intra-cluster communication.
<BR /><BR />When the Cluster Service communicates to another node in the cluster, it sends data down over TCP to the NetFT virtual adapter.&nbsp; NetFT then sends the data over UDP down to the physical network card, which then sends it over the network to another node.&nbsp; See the below diagram:</P> <P style="padding-left: 30px;"><BR /><span class="lia-inline-image-display-wrapper lia-image-align-inline" style="width: 209px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90619i96D8F60106053140/image-size/large?v=v2&amp;px=999" role="button" /></span></P> <P><BR />When the data is received by the other node, it follows the same flow.&nbsp; Up the physical adapter, then to NetFT, and finally up to the Cluster Service.&nbsp; The NetFT Virtual Adapter Performance Filter is a filter in Windows Server 2012 and Windows Server 2012 R2 which inspects traffic inbound on the physical NIC and reroutes&nbsp;cluster traffic addressed to NetFT directly to the NetFT driver.&nbsp; This bypasses the physical NIC UDP / IP stack and delivers increased cluster network performance.</P> <P style="padding-left: 30px;"><BR /><span class="lia-inline-image-display-wrapper lia-image-align-inline" style="width: 180px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90620iE15E10E1E7A9C7F3/image-size/large?v=v2&amp;px=999" role="button" /></span></P> <P>&nbsp;</P> <H2>When to Enable the NetFT Virtual Adapter Performance Filter</H2> <P><BR />The NetFT Virtual Adapter Performance Filter is disabled by default.&nbsp; The filter is disabled because it can cause issues with Hyper-V clusters which have a Guest Cluster running in VMs on top of them.&nbsp; Issues have been seen where the NetFT Virtual Adapter Performance Filter in the host incorrectly routes NetFT traffic bound for a guest VM to the host.&nbsp; This can result in communication issues with the guest cluster in the VM.&nbsp; More details can be found in this article: <BR /><BR /><A href="#" target="_blank" rel="noopener"> https://support.microsoft.com/en-us/kb/2872325 </A> <BR /><BR />If you are deploying any workload <STRONG> other </STRONG> than Hyper-V with guest clusters, enabling the NetFT Virtual Adapter Performance Filter will optimize and improve cluster performance. <BR /><BR /><span class="lia-inline-image-display-wrapper lia-image-align-inline" style="width: 389px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90621iB68C25A3EDEBF819/image-size/large?v=v2&amp;px=999" role="button" /></span></P> <H2>Windows Server 2016 and later</H2> <P><BR />Due to changes in the networking stack in Windows Server 2016, the NetFT Virtual Adapter Performance Filter&nbsp;has been removed. <BR /><BR />Thanks! <BR />Elden Christensen <BR />Principal PM Manager <BR />High-Availability &amp; Storage <BR />Microsoft</P> Thu, 08 Aug 2019 16:26:16 GMT https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/netft-virtual-adapter-performance-filter/ba-p/372090 Elden Christensen 2019-08-08T16:26:16Z Speeding Up Failover Tips-n-Tricks https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/speeding-up-failover-tips-n-tricks/ba-p/372086 <HTML> <HEAD></HEAD><BODY> <STRONG> First published on MSDN on Apr 29, 2016 </STRONG> <BR /> From time-to-time people ask me for suggestions on what tweaks they can do to make Windows Server Failover Cluster failover faster. In this blog I’ll discuss a few tips-n-tricks. 
<BR /> <OL> <BR /> <LI> <STRONG> Disable NetBIOS over TCP/IP </STRONG> - Unless you need legacy OS compatibility, NetBIOS is doing nothing but slowing you down.&nbsp; You want to disable NetBIOS in a couple of different places: <BR /> <OL> <BR /> <LI> <EM> Every Cluster IP Address resource </EM> -&nbsp;Here is the syntax (again, this needs to be set on all IP Address resources).&nbsp; Note: NetBIOS is disabled on all Cluster IP Addresses in Windows Server 2016 by default. <BR /> Get-ClusterResource "Cluster IP address" | Set-ClusterParameter EnableNetBIOS 0 <BR /> </LI> <BR /> <LI> <EM> Base Network Interfaces </EM> – In the Advanced TCP/IP Settings, go to the WINS tab, and select “Disable NetBIOS over TCP/IP”.&nbsp; This needs to be done on every network interface. <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90618iB07D0E14E09F17B6" /> </LI> <BR /> </OL> <BR /> </LI> <BR /> <LI> <STRONG> Go Pure IPv6 </STRONG> - Going pure IPv6 will give faster failover as a result of optimizations in how Duplicate Address Detection (DAD) works in the TCP/IP stack. </LI> <BR /> <LI> <STRONG> Avoid IPSec on Servers </STRONG> – Internet Protocol Security (IPsec) is a great security feature, especially for client scenarios. But it comes at a cost, and really should not be used on servers. Specifically, enabling a single IPSec policy will reduce overall network performance by ~30% and significantly delay failover times. </LI> <BR /> </OL> <BR /> These are a few things I've found you can do to speed up failover and reduce downtime. <BR /> <BR /> Thanks! <BR /> Elden Christensen <BR /> Principal PM Manager <BR /> High-Availability &amp; Storage <BR /> Microsoft </BODY></HTML> Fri, 15 Mar 2019 22:07:23 GMT https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/speeding-up-failover-tips-n-tricks/ba-p/372086 Elden Christensen 2019-03-15T22:07:23Z Failover Cluster VM Load Balancing in Windows Server 2016 https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/failover-cluster-vm-load-balancing-in-windows-server-2016/ba-p/372084 <HTML> <HEAD></HEAD><BODY> <STRONG> First published on MSDN on Apr 29, 2016 </STRONG> <BR /> Windows Server 2016 introduces the&nbsp;Virtual Machine Load&nbsp;Balancing&nbsp;feature to optimize the utilization of nodes in a Failover Cluster. During the lifecycle of your private cloud, certain operations (such as rebooting a node for patching) result in the Virtual Machines (VMs) in your cluster being moved. This could result in an unbalanced cluster where some nodes are hosting more VMs and others are underutilized (such as a freshly rebooted server). The&nbsp;VM Load Balancing&nbsp;feature seeks to identify overcommitted nodes and re-distribute VMs from those nodes. VMs are live migrated to idle nodes with no down time. Failure policies such as anti-affinity, fault domains and possible owners are honored. Thus, the VM Load Balancing feature (also known as Node Fairness) seamlessly balances your private cloud. 
<BR /> <H2> Heuristics for Balancing </H2> <BR /> VM Load Balancing evaluates a node's load based on the following heuristics: <BR /> <OL> <BR /> <LI> Current <EM> Memory pressure </EM> : Memory is the most common resource constraint on a Hyper-V host </LI> <BR /> <LI> <EM> CPU utilization </EM> of the Node averaged over a 5 minute window: Mitigates a node in the cluster becoming overcommitted </LI> <BR /> </OL> <BR /> <H2> Controlling Aggressiveness of Balancing </H2> <BR /> The aggressiveness of balancing based on the Memory and CPU heuristics can be configured using the cluster common property ‘AutoBalancerLevel’. To control the aggressiveness run the following in PowerShell: <BR /> <CODE> (Get-Cluster).AutoBalancerLevel = &lt;value&gt; </CODE> <BR /> <TABLE> <TBODY><TR> <TD> <STRONG> AutoBalancerLevel </STRONG> </TD> <TD> <STRONG> Aggressiveness </STRONG> </TD> <TD> <STRONG> Behavior </STRONG> </TD> </TR> <TR> <TD> 1 (default) </TD> <TD> Low </TD> <TD> Move when host is more than 80% loaded </TD> </TR> <TR> <TD> 2 </TD> <TD> Medium </TD> <TD> Move when host is more than 70% loaded </TD> </TR> <TR> <TD> 3 </TD> <TD> High </TD> <TD> Average nodes and move when host is more than 5% above average </TD> </TR> </TBODY></TABLE> <BR /> <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90615i7ABAE74BDF12FEC4" /> <BR /> <H2> Controlling VM Load Balancing </H2> <BR /> VM Load Balancing is enabled by default, and when load balancing occurs can be configured with the cluster common property ‘AutoBalancerMode’. To control when&nbsp;VM Load Balancing&nbsp;balances the cluster: <BR /> <BR /> <STRONG> Using&nbsp;Failover Cluster Manager: </STRONG> <BR /> <OL> <BR /> <LI> Right-click on your cluster name and select the "Properties" option </LI> <BR /> </OL> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90616iACECCE1A9C48A5F9" /> <BR /> <BR /> 2.&nbsp; Select the "Balancer" pane <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90617i0C8AF7611B73A9D0" /> <BR /> <BR /> <STRONG> Using&nbsp;PowerShell: </STRONG> <BR /> <BR /> Run the following: <BR /> <CODE> (Get-Cluster).AutoBalancerMode = &lt;value&gt; </CODE> <BR /> <TABLE> <TBODY><TR> <TD> <STRONG> AutoBalancerMode </STRONG> </TD> <TD> <STRONG> Behavior </STRONG> </TD> </TR> <TR> <TD> 0 </TD> <TD> Disabled </TD> </TR> <TR> <TD> 1 </TD> <TD> Load balance on node join </TD> </TR> <TR> <TD> 2 (default) </TD> <TD> Load balance on node join and every 30 minutes </TD> </TR> </TBODY></TABLE> <BR /> <BR /> <H2> VM Load Balancing vs. SCVMM Dynamic Optimization </H2> <BR /> The&nbsp;VM Load Balancing&nbsp;feature provides in-box functionality targeted towards deployments without System Center Virtual Machine Manager (SCVMM). SCVMM Dynamic Optimization is the recommended mechanism for balancing virtual machine load in your cluster for SCVMM deployments. SCVMM automatically disables the&nbsp;VM Load Balancing&nbsp;feature when Dynamic Optimization is enabled. 
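<BR /> <BR /> Putting the two properties together, here is a quick sketch (values taken from the tables above) that sets high-aggressiveness balancing, triggers it only when a node joins, and then reads both settings back to verify: <BR /> <PRE># Move VMs when a host is more than 5% above the cluster average
(Get-Cluster).AutoBalancerLevel = 3

# Balance only when a node joins the cluster
(Get-Cluster).AutoBalancerMode = 1

# Verify the settings
Get-Cluster | Format-List AutoBalancerLevel, AutoBalancerMode</PRE>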
<BR /> <H2> Short Video&nbsp;with an Overview of&nbsp;VM Load Balancing </H2> <BR /> <IFRAME frameborder="0" height="540" src="https://channel9.msdn.com/Blogs/windowsserver/Virtual-Machine-Load-Balancing-in-Windows-Server-2016/player" width="960"> </IFRAME> </BODY></HTML> Fri, 15 Mar 2019 22:07:01 GMT https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/failover-cluster-vm-load-balancing-in-windows-server-2016/ba-p/372084 John Marlin 2019-03-15T22:07:01Z Troubleshooting Hangs Using Live Dump https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/troubleshooting-hangs-using-live-dump/ba-p/372080 <HTML> <HEAD></HEAD><BODY> <STRONG> First published on MSDN on Mar 02, 2016 </STRONG> <BR /> In this blog post <A href="#" target="_blank"> https://blogs.msdn.microsoft.com/clustering/2014/12/08/troubleshooting-cluster-shared-volume-auto-pauses-event-5120/ </A> we&nbsp;discussed what a Cluster Shared Volumes (CSV) event&nbsp;ID&nbsp;5120 means, and how to troubleshoot it. In particular, we discussed the reason for auto-pause due to STATUS_IO_TIMEOUT (c00000b5), and some options on how to troubleshoot it. In this post we will discuss how to troubleshoot it using LiveDumps, which enable debugging the system with no downtime. <BR /> <BR /> First, let’s discuss what a LiveDump is. Some of you are probably familiar with kernel crash dumps <A href="#" target="_blank"> https://support.microsoft.com/en-us/kb/927069 </A> . You might face at least two challenges with kernel dumps: <BR /> <OL> <BR /> <LI> A bugcheck&nbsp;halts the system, resulting in&nbsp;downtime </LI> <BR /> <LI> The entire contents of memory are dumped to a file.&nbsp; On a system with a lot of memory, you might not have enough space on your system drive for the OS to save the dump </LI> <BR /> </OL> <BR /> The good news is that LiveDump solves both of these issues. LiveDump is a feature added in Windows Server 2012 R2. For the purpose of this discussion you can think of LiveDump as an OS feature that allows you to create a consistent snapshot of kernel memory and save it to a dump file for future analysis. Taking this snapshot will NOT cause a bugcheck, so there is no downtime. LiveDump does not include all kernel memory; it excludes information which is not valuable in debugging. It will not include pages from the standby list and file caches. The kind of LiveDump that the cluster collects for you also excludes pages consumed by the Hypervisor. In Windows Server 2016, the cluster also makes sure to exclude the CSV Cache from the LiveDump. As a result, a LiveDump has a much smaller file size compared to what you would get when you bugcheck the server, and does not require as much space on your system drive.&nbsp; In Windows Server 2016 there is a new bugcheck option called an "Active Dump", which similarly excludes unnecessary information to create a smaller dump file during bugchecks. <BR /> <BR /> You can create a LiveDump manually using LiveKD from Windows Sysinternals ( <A href="#" target="_blank"> https://technet.microsoft.com/en-us/sysinternals/bb897415.aspx </A> ). To generate a LiveDump, run the command “livekd -ml -o &lt;path to a dump file&gt;” from an elevated command prompt. The path to the dump file does not have to be on the system drive; you can save it to any location. Here is an example of creating a live dump on a Windows 10 desktop with 12 GB RAM, which&nbsp;resulted in a dump file&nbsp;of only about 2.7 GB. 
<BR /> D:\&gt;livekd -ml -o d1.dmp <BR /> LiveKd v5.40 - Execute kd/windbg on a live system <BR /> Sysinternals - <A href="#" target="_blank">www.sysinternals.com</A> <BR /> <BR /> Copyright (C) 2000-2015 Mark Russinovich and Ken Johnson <BR /> <BR /> Saving live dump to D:\d1.dmp... done. <BR /> <BR /> D:\&gt;dir *.dmp <BR /> <BR /> Directory of D:\ <BR /> <BR /> 02/25/2016 12:05 PM&nbsp;&nbsp;&nbsp;&nbsp; 2,773,164,032 d1.dmp <BR /> 1 File(s) 2,773,164,032 bytes <BR /> 0 Dir(s) 3,706,838,417,408 bytes free <BR /> <STRONG> If you are wondering how much disk space you would need for a LiveDump, you can generate one using LiveKD and check its size. </STRONG> <BR /> <BR /> You might wonder what is so great about LiveDump for troubleshooting. Logs and traces work well when something fails, because hopefully the log will contain a record where a component admits that it is failing operations and points at the cause. LiveDump is great when we need to troubleshoot a problem where something is taking a long time, and nothing is technically failing. If we start a watchdog when an operation starts, and the watchdog expires before the operation completes, then we can take a dump of the system, hoping that we can walk the wait chain for that operation and see who owns it and why it is not completing. Looking at a LiveDump is just like looking at a kernel dump. It requires some skill and an understanding of Windows Internals. It has a steep learning curve for customers, but it is a great tool for Microsoft support and product teams who already have that expertise. If you reach out to Microsoft support with an issue where something is stuck in the kernel, and a LiveDump was taken while it was stuck, then the chances of promptly root-causing the issue are much higher. <BR /> <BR /> Windows Server Failover Clustering has many watchdogs which control how long it should wait for cluster resources to execute calls like resource online or offline, or how long we should wait for CSVFS to complete a state transition. From our experience we know that in most cases some of these scenarios will be stuck in the kernel, so we automatically ask Windows Error Reporting to generate a LiveDump. It is important to note that LiveKd uses a different API that produces a LiveDump without checking any other conditions; the cluster uses Windows Error Reporting, which will throttle LiveDump creation. We use WER because it manages disk space consumption for us, and it also sends telemetry information about the incident to Microsoft, where we can see what issues are affecting customers. This helps us prioritize and strategize fixes. Starting with Windows Server 2016 you can control WER telemetry through common telemetry settings; before that there was a separate control panel applet to control what WER is allowed to share with Microsoft. <BR /> <BR /> By default, Windows Error Reporting will allow only one LiveDump per report type per 7 days and only 1 LiveDump per machine per 5 days. You can change that by setting the following registry keys <BR /> reg add "HKLM\Software\Microsoft\Windows\Windows Error Reporting\FullLiveKernelReports" /v SystemThrottleThreshold /t REG_DWORD /d 0 /f <BR /> reg add "HKLM\Software\Microsoft\Windows\Windows Error Reporting\FullLiveKernelReports" /v ComponentThrottleThreshold /t REG_DWORD /d 0 /f <BR /> Once a LiveDump is created, WER launches a user-mode process that creates a minidump from the LiveDump, and immediately after that deletes the LiveDump. 
A minidump is only a couple hundred kilobytes, but unfortunately it is not helpful here, because it would contain the call stack of only the thread that invoked LiveDump creation, and we need all the other threads in the kernel to track down where we are stuck. You can tell WER to keep the original LiveDumps using these two registry keys. <BR /> reg add "HKLM\Software\Microsoft\Windows\Windows Error Reporting\FullLiveKernelReports" /v FullLiveReportsMax /t REG_DWORD /d 10 /f <BR /> reg add "HKLM\SYSTEM\CurrentControlSet\Control\CrashControl" /v AlwaysKeepMemoryDump /t REG_DWORD /d 1 /f <BR /> Set FullLiveReportsMax to the number of dumps you want to keep; the&nbsp;decision on&nbsp;how many to keep depends on how much free space you have and the size of a LiveDump. <BR /> <STRONG> You need to reboot the machine for the Windows Error Reporting registry keys to take effect. <BR /> </STRONG> LiveDumps created by Windows Error Reporting are located in %SystemDrive%\Windows\LiveKernelReports. <BR /> <H2> Windows Server&nbsp;version 1709 </H2> <BR /> In the Windows Server&nbsp;version 1709&nbsp;release, the Windows Error Reporting&nbsp;registry keys that control LiveDump behavior changed to the following: <BR /> reg add "HKLM\SYSTEM\CurrentControlSet\Control\CrashControl\FullLiveKernelReports" /v FullLiveReportsMax /t REG_DWORD /d 10 /f <BR /> reg add "HKLM\SYSTEM\CurrentControlSet\Control\CrashControl\FullLiveKernelReports" /v SystemThrottleThreshold /t REG_DWORD /d 0 /f <BR /> reg add "HKLM\SYSTEM\CurrentControlSet\Control\CrashControl\FullLiveKernelReports" /v ComponentThrottleThreshold /t REG_DWORD /d 0 /f <BR /> reg add "HKLM\SYSTEM\CurrentControlSet\Control\CrashControl" /v AlwaysKeepMemoryDump /t REG_DWORD /d 1 /f <BR /> To keep your scripts simple you can set values in both the old and new locations. <BR /> <H2> Windows Server 2016 </H2> <BR /> In Windows Server 2016, Failover Cluster Live Dump creation is on by default. You can turn it on/off by manipulating the lowest bit of the cluster DumpPolicy public property. By default, this bit is set, which means the cluster is allowed to generate LiveDumps. <BR /> PS C:\Windows\system32&gt; (get-cluster).DumpPolicy <BR /> 1118489 <BR /> If you set this bit to 0, the cluster will stop generating LiveDumps. <BR /> PS C:\Windows\system32&gt; (get-cluster).DumpPolicy=1118488 <BR /> You can set it back to 1 to enable it again <BR /> PS C:\Windows\system32&gt; (get-cluster).DumpPolicy=1118489 <BR /> Changes take effect immediately on all cluster nodes. You do NOT need to reboot the cluster nodes. <BR /> <BR /> Here is the list of LiveDump report types generated by the cluster. Dump files will have the report type string as a prefix. <BR /> <TABLE> <TBODY><TR> <TD> <STRONG> Report Type </STRONG> </TD> <TD> <STRONG> Description </STRONG> </TD> </TR> <TR> <TD> <STRONG> CsvIoT </STRONG> </TD> <TD> A CSV volume AutoPaused due to STATUS_IO_TIMEOUT and cluster on the coordinating node created a LiveDump </TD> </TR> <TR> <TD> <STRONG> CsvStateIT </STRONG> </TD> <TD> CSV state transition to Init state is taking too long. 
</TD> </TR> <TR> <TD> <STRONG> CsvStatePT </STRONG> </TD> <TD> CSV state transition to Paused state is taking too long </TD> </TR> <TR> <TD> <STRONG> CsvStateDT </STRONG> </TD> <TD> CSV state transition to Draining state is taking too long </TD> </TR> <TR> <TD> <STRONG> CsvStateST </STRONG> </TD> <TD> CSV state transition to SetDownLevel state is taking too long </TD> </TR> <TR> <TD> <STRONG> CsvStateAT </STRONG> </TD> <TD> CSV state transition to Active state is taking too long </TD> </TR> </TBODY></TABLE> <BR /> You can learn more about CSV state transitions in this blog post: <BR /> <UL> <BR /> <LI> <A href="#" target="_blank"> http://blogs.msdn.com/b/clustering/archive/2014/10/27/10567706.aspx </A> </LI> <BR /> </UL> <BR /> The following is the list of LiveDump report types that the cluster generates when a cluster resource call is taking too long <TABLE> <TBODY><TR> <TD> <STRONG> Report Type </STRONG> </TD> <TD> <STRONG> Description </STRONG> </TD> </TR> <TR> <TD> <STRONG> ClusResCO </STRONG> </TD> <TD> Cluster resource Open call is taking too long </TD> </TR> <TR> <TD> <STRONG> ClusResCC </STRONG> </TD> <TD> Cluster resource Close call is taking too long </TD> </TR> <TR> <TD> <STRONG> ClusResCU </STRONG> </TD> <TD> Cluster resource Online call is taking too long </TD> </TR> <TR> <TD> <STRONG> ClusResCD </STRONG> </TD> <TD> Cluster resource Offline call is taking too long </TD> </TR> <TR> <TD> <STRONG> ClusResCK </STRONG> </TD> <TD> Cluster resource Terminate call is taking too long </TD> </TR> <TR> <TD> <STRONG> ClusResCA </STRONG> </TD> <TD> Cluster resource Arbitrate call is taking too long </TD> </TR> <TR> <TD> <STRONG> ClusResCR </STRONG> </TD> <TD> Cluster resource Control call is taking too long </TD> </TR> <TR> <TD> <STRONG> ClusResCT </STRONG> </TD> <TD> Cluster resource Type Control call is taking too long </TD> </TR> <TR> <TD> <STRONG> ClusResCI </STRONG> </TD> <TD> Cluster resource IsAlive call is taking too long </TD> </TR> <TR> <TD> <STRONG> ClusResCL </STRONG> </TD> <TD> Cluster resource LooksAlive call is taking too long </TD> </TR> <TR> <TD> <STRONG> ClusResCF </STRONG> </TD> <TD> Cluster resource Fail call is taking too long </TD> </TR> </TBODY></TABLE> <BR /> You can learn more about the cluster resource state machine in these two blog posts: <BR /> <UL> <BR /> <LI> <A href="#" target="_blank"> https://blogs.msdn.microsoft.com/clustering/2010/03/10/creating-a-cluster-resource-dll-part-1/ </A> </LI> <BR /> <LI> <A href="#" target="_blank"> https://blogs.msdn.microsoft.com/clustering/2010/03/29/creating-a-cluster-resource-dll-part-2/ </A> </LI> <BR /> </UL> <BR /> You can control which resource types will generate LiveDumps by changing the value of the lowest bit of the resource type DumpPolicy public property. 
Here are the default&nbsp;values: <BR /> <PRE>C:\&gt; Get-ClusterResourceType | ft Name,DumpPolicy

Name                                DumpPolicy
----                                ----------
Cloud Witness                       5225058576
DFS Replicated Folder               5225058576
DHCP Service                        5225058576
Disjoint IPv4 Address               5225058576
Disjoint IPv6 Address               5225058576
Distributed File System             5225058576
Distributed Network Name            5225058576
Distributed Transaction Coordinator 5225058576
File Server                         5225058576
File Share Witness                  5225058576
Generic Application                 5225058576
Generic Script                      5225058576
Generic Service                     5225058576
Health Service                      5225058576
IP Address                          5225058576
IPv6 Address                        5225058576
IPv6 Tunnel Address                 5225058576
iSCSI Target Server                 5225058576
Microsoft iSNS                      5225058576
MSMQ                                5225058576
MSMQTriggers                        5225058576
Nat                                 5225058576
Network File System                 5225058577
Network Name                        5225058576
Physical Disk                       5225058577
Provider Address                    5225058576
Scale Out File Server               5225058577
Storage Pool                        5225058577
Storage QoS Policy Manager          5225058577
Storage Replica                     5225058577
Task Scheduler                      5225058576
Virtual Machine                     5225058576
Virtual Machine Cluster WMI         5225058576
Virtual Machine Configuration       5225058576
Virtual Machine Replication Broker  5225058576
Virtual Machine Replication Coor... 5225058576
WINS Service                        5225058576</PRE> <BR /> By default, Physical Disk resources will produce a LiveDump. You can disable that by setting the lowest bit to 0. Here is an example of how to do that for the Physical Disk resource <BR /> (Get-ClusterResourceType -Name "Physical Disk").DumpPolicy=5225058576 <BR /> Later on you can enable it back <BR /> (Get-ClusterResourceType -Name "Physical Disk").DumpPolicy=5225058577 <BR /> Changes take effect&nbsp;immediately&nbsp;on all new calls; there is no need to offline/online the resource or restart the&nbsp;cluster. <BR /> <BR /> The last group is the report types that the cluster service generates when it observes that some operations are taking too long. <BR /> <TABLE> <TBODY><TR> <TD> <STRONG> Report Type </STRONG> </TD> <TD> <STRONG> Description </STRONG> </TD> </TR> <TR> <TD> <STRONG> ClusWatchDog </STRONG> </TD> <TD> Cluster service watchdog </TD> </TR> </TBODY></TABLE> <BR /> <H2> Windows Server 2012 R2 </H2> <BR /> We had such a positive experience troubleshooting issues using LiveDump on Windows Server 2016 that we’ve backported a subset of it to Windows Server 2012 R2. You need to make sure that you have all the <A href="#" target="_blank"> recommended patches outlined here </A> . 
On Windows Server 2012 R2, LiveDump will not be generated by default; it can be enabled&nbsp;using the following PowerShell command: <BR /> Get-Cluster | Set-ClusterParameter -create LiveDumpEnabled -value 1 <BR /> LiveDump can be disabled using the following command: <BR /> Get-Cluster | Set-ClusterParameter -create LiveDumpEnabled -value 0 <BR /> Only CSV report types were backported; as a result,&nbsp;you will not see LiveDumps from cluster resource calls or the cluster service watchdog.&nbsp; Windows Error Reporting throttling will also need to be adjusted, as&nbsp;discussed above. <BR /> <H2> CSV AutoPause due to STATUS_IO_TIMEOUT (c00000b5) </H2> <BR /> Let’s see how LiveDump helps in troubleshooting this issue. In the blog post <A href="#" target="_blank"> https://blogs.msdn.microsoft.com/clustering/2014/12/08/troubleshooting-cluster-shared-volume-auto-pauses-event-5120/ </A> we discussed that it is usually caused by an IO on the coordinating node taking a long time. As a result, CSVFS on a non-coordinating node gets the error STATUS_IO_TIMEOUT. CSVFS notifies the cluster service about that event, and the cluster service creates a LiveDump with report type CsvIoT on the coordinating node where the IO is taking a long time. If we are lucky and the IO has not completed before&nbsp;the LiveDump is generated,&nbsp;we can load the dump using WinDbg, try to find the IO that is taking a&nbsp;long time, and see who owns that IO. <BR /> <BR /> Thanks! <BR /> Vladimir Petter <BR /> Principal Software Engineer <BR /> High-Availability &amp; Storage <BR /> Microsoft <BR /> <BR /> <BR /> <H2> Additional Resources: </H2> <BR /> To learn more, here are others in the Cluster Shared Volume (CSV) blog series: <BR /> <BR /> Cluster Shared Volume (CSV) Inside Out <BR /> <A href="#" target="_blank"> http://blogs.msdn.com/b/clustering/archive/2013/12/02/10473247.aspx </A> <BR /> <BR /> Cluster Shared Volume Diagnostics <BR /> <A href="#" target="_blank"> http://blogs.msdn.com/b/clustering/archive/2014/03/13/10507826.aspx </A> <BR /> <BR /> Cluster Shared Volume Performance Counters <BR /> <A href="#" target="_blank"> http://blogs.msdn.com/b/clustering/archive/2014/06/05/10531462.aspx </A> <BR /> <BR /> Cluster Shared Volume Failure Handling <BR /> <A href="#" target="_blank"> http://blogs.msdn.com/b/clustering/archive/2014/10/27/10567706.aspx </A> <BR /> <BR /> Troubleshooting Cluster Shared Volume Auto-Pauses – Event 5120 <BR /> <A href="#" target="_blank"> http://blogs.msdn.com/b/clustering/archive/2014/12/08/10579131.aspx </A> <BR /> <BR /> Troubleshooting Cluster Shared Volume Recovery Failure – System Event 5142 <BR /> <A href="#" target="_blank"> http://blogs.msdn.com/b/clustering/archive/2015/03/26/10603160.aspx </A> <BR /> <BR /> Cluster Shared Volume – A Systematic Approach to Finding Bottlenecks <BR /> <A href="#" target="_blank"> https://blogs.msdn.microsoft.com/clustering/2015/07/29/cluster-shared-volume-a-systematic-approach-to-finding-bottlenecks/ </A> </BODY></HTML> Fri, 15 Mar 2019 22:06:27 GMT https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/troubleshooting-hangs-using-live-dump/ba-p/372080 Elden Christensen 2019-03-15T22:06:27Z Managing Failover Clusters with 5nine Manager https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/managing-failover-clusters-with-5nine-manager/ba-p/372079 <HTML> <HEAD></HEAD><BODY> <STRONG> First published on MSDN on Jan 21, 2016 </STRONG> <BR /> <P> Hi Cluster Fans, </P> <BR /> <P> It is nice <A href="#" target="_blank"> to be back </A> 
on the Cluster Team Blog!&nbsp; After founding this blog and working closely with the cluster team for almost eight years, I left Microsoft last year to join a Hyper-V software partner, <A href="#" target="_blank"> 5nine Software </A> .&nbsp; I’ve spoken with thousands of customers and I realized that Failover Clustering is so essential to Hyper-V that a majority of all VMs are using it, and it is businesses of all sizes that are doing this, not just enterprises.&nbsp; Most organizations need continual availability for their services to run 24/7, and their customers expect it.&nbsp; Failover Clustering is now commonplace even amongst small and medium-sized businesses.&nbsp; I was able to bring my passion for cluster management to 5nine’s engineering team, and into 5nine’s most popular SMB product, <A href="#" target="_blank"> 5nine Manager </A> .&nbsp; This blog provides an overview of how 5nine Manager can help you centralize management of your clustered resources. </P> <BR /> <H2> Create a Cluster </H2> <BR /> <P> 5nine Manager lets you discover hosts and create a Failover Cluster.&nbsp; It will allow you to specify nodes, run Cluster Validation, provide a client access point, and then create the cluster. </P> <BR /> <P> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90602iC066F07D6B80F0D0" /> </P> <BR /> <H2> Validate a Cluster </H2> <BR /> <P> Failover Cluster validation is an essential task in all deployments, as it is required for a supported cluster.&nbsp; With 5nine Manager you can test the health of your cluster during configuration, or afterwards as a troubleshooting tool.&nbsp; You can granularly select the different tests to run, and see the same graphical report that you are familiar with. </P> <BR /> <P> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90603i675D703EED4BC3BE" /> </P> <BR /> <H2> Host Best Practice Analyzer </H2> <BR /> <P> In addition to testing the clustering configuration, you can also run a series of advanced Hyper-V tests on each of the hosts and Scale-Out File Servers through 5nine Manager.&nbsp; The results will provide recommendations to enhance your node’s stability and performance. </P> <BR /> <P> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90604iB4B222243587249E" /> </P> <BR /> <H2> Configure Live Migration Settings </H2> <BR /> <P> It is important to have a dedicated network for Live Migration to ensure that its traffic does not interfere with cluster heartbeats or other important traffic.&nbsp; With 5nine Manager you can specify the number of simultaneous Live Migrations and Storage Live Migrations, and even copy those settings to the other cluster nodes. </P> <BR /> <P> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90605iAAD8EA34FA9BB361" /> </P> <BR /> <H2> View Cluster Summary </H2> <BR /> <P> 5nine Manager has a Summary Dashboard which centrally reports the health of the cluster and its VMs.&nbsp; It quickly identifies nodes or VMs with problems, and lists any alerts from its resources.&nbsp; This Summary Dashboard can also be refocused at the Datacenter, Cluster, Host, and VM level for more refined results. 
</P> <BR /> <P> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90606iE649B793E2D32EC4" /> </P> <BR /> <H2> Manage Cluster Nodes </H2> <BR /> <P> Using 5nine Manager you can configure your virtual disk and network settings.&nbsp; You can also perform standard maintenance tasks, such as Pausing and Resuming a cluster node, which can live migrate VMs to other nodes.&nbsp; A list of active and failed cluster tasks is also displayed through the interface. </P> <BR /> <P> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90607iD1B9CA89845983D4" /> </P> <BR /> <H2> Manage Clustered VMs </H2> <BR /> <P> You can manage any type of virtual machine that is supported by Hyper-V, including Windows Server, Windows, Linux, UNIX, and Windows Server 2016 Nano Server.&nbsp; 5nine Manager lets you centrally manage all your virtual machines, including the latest performance and security features for virtualization.&nbsp; The full GUI console even runs on all versions of Windows Server, including the otherwise GUI-less Windows Server Core and Hyper-V Server. </P> <BR /> <P> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90608iF0E995F590A60F50" /> </P> <BR /> <H2> Cluster Status Report </H2> <BR /> <P> It is now easy to create a report about the configuration and health of your cluster, showing you information about the configuration and settings for every resource.&nbsp; This document can be exported and retained for compliance. </P> <BR /> <P> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90609iC15D96C6463BC891" /> </P> <BR /> <H2> Host Load Balancing </H2> <BR /> <P> 5nine Manager allows you to pair cluster nodes and hosts to form a group that will load balance VMs.&nbsp; It live migrates the VMs between hosts when customizable thresholds are exceeded.&nbsp; This type of dynamic optimization ensures that a single host does not get overloaded, providing higher availability and greater performance for the VMs. </P> <BR /> <P> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90610i9728E11EEB8FDD79" /> </P> <BR /> <H2> Cluster Logs </H2> <BR /> <P> Sometimes it can be difficult to see all the events from across your cluster.&nbsp; 5nine Manager pulls together all the logs for your clusters, hosts and VMs to simplify troubleshooting. </P> <BR /> <P> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90611iDC0ED80CA98D1546" /> </P> <BR /> <H2> Cluster Monitoring </H2> <BR /> <P> 5nine Manager provides a Monitor Dashboard with current and historical data about the usage of your clusters, hosts and VMs.&nbsp; It will show you which VMs are consuming the most resources, the latest alarms, and a graphical view of CPU, memory, disk and network usage.&nbsp; You can also browse through previous performance data to help isolate a past issue. </P> <BR /> <P> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90612i5AEF36DC61295373" /> </P> <BR /> <H2> Hyper-V Replica with Clustering </H2> <BR /> <P> Hyper-V Replica allows a virtual machine’s virtual hard disk to be copied to a secondary location for disaster recovery.&nbsp; Using 5nine Manager you can configure the Replication Settings on a host, then apply them to other cluster nodes and hosts. 
</P> <BR /> <P> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90613iBC2D9361694D61D9" /> </P> <BR /> <P> You can also configure replication on a virtual machine that is running on a cluster node with the Hyper-V Replica Broker configured.&nbsp; The health state of the replica is also displayed in the centralized console. </P> <BR /> <P> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90614i1B5C54686DD83B3F" /> </P> <BR /> <P> Failover Clustering should be an integral part of your virtualized infrastructure, and <A href="#" target="_blank"> 5nine Manager </A> provides a way to centrally manage all your clustered VMs.&nbsp; Failover cluster support will continue to be enhanced in future releases of 5nine Manager. </P> <BR /> <P> Thanks! <BR /> Symon Perriman <BR /> VP, 5nine Software <BR /> Hyper-V MVP <BR /> <A href="#" target="_blank"> @SymonPerriman </A> </P> </BODY></HTML> Fri, 15 Mar 2019 22:06:20 GMT https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/managing-failover-clusters-with-5nine-manager/ba-p/372079 Rob Hindman 2019-03-15T22:06:20Z How can we improve the installation and patching of Windows Server? (Survey Request) https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/how-can-we-improve-the-installation-and-patching-of-windows/ba-p/372065 <HTML> <HEAD></HEAD><BODY> <STRONG> First published on MSDN on Sep 11, 2015 </STRONG> <BR /> <P> Do you want your server OS deployment and servicing to move faster? We're a team of Microsoft engineers who want your experiences and ideas around solving real problems of deploying and servicing your server OS infrastructure. We prefer that you don't love server OS deployment already, and we’re interested even if you don’t use Windows Server. We need to learn it and earn it. </P> <P> Click the link below if you wish to fill out a brief survey and perhaps participate in a short phone call. </P> <P> <A href="#" target="_blank"> <STRONG> https://aka.ms/deployland </STRONG> </A> </P> <P> Many Thanks!!! </P> <P> -Rob. </P> <P> </P> <BR /> </BODY></HTML> Fri, 15 Mar 2019 22:03:55 GMT https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/how-can-we-improve-the-installation-and-patching-of-windows/ba-p/372065 Rob Hindman 2019-03-15T22:03:55Z Configuring Site Awareness with Multi-active Disaggregated Datacenters https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/configuring-site-awareness-with-multi-active-disaggregated/ba-p/372064 <HTML> <HEAD></HEAD><BODY> <STRONG> First published on MSDN on Sep 10, 2015 </STRONG> <BR /> In a previous <A href="#" target="_blank"> blog </A>, I discussed the introduction of site-aware Failover Clusters in Windows Server 2016. In this blog, I am going to walk through how you can configure site-awareness for your multi-active disaggregated datacenters. You can learn more about Software Defined Storage and the advantages of a disaggregated datacenter <A href="#" target="_blank"> here </A> . <BR /> <BR /> Consider the following multi-active datacenters, with a compute and a storage cluster stretched across two datacenters. Each cluster has two nodes in each datacenter. 
<BR /> <BR /> To configure site-awareness for the stretched compute and storage clusters, proceed as follows: <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90601iC76B2FFB2685B06F" /> <BR /> <H2> Compute Stretch Cluster </H2> <BR /> 1)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Assign the nodes in the cluster to one of the two datacenters (sites). <BR /> <UL> <BR /> <LI> Open PowerShell as an Administrator and type: </LI> <BR /> </UL> <BR /> (Get-ClusterNode Node1).Site = 1 <BR /> (Get-ClusterNode Node2).Site = 1 <BR /> (Get-ClusterNode Node3).Site = 2 <BR /> (Get-ClusterNode Node4).Site = 2 <BR /> 2)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Configure the site for your primary datacenter. <BR /> (Get-Cluster).PreferredSite = 1 <BR /> <H2> Storage Stretch Cluster </H2> <BR /> In multi-active disaggregated datacenters, the storage stretch cluster hosts a <A href="#" target="_blank"> Scale-Out File Server </A> (SoFS). For optimal performance, ensure that the site hosting the Cluster Shared Volumes comprising the SoFS follows the site hosting the compute workload. This avoids the cost of cross-datacenter network traffic. <BR /> <BR /> 1)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;As in the case of the compute cluster, assign the nodes in the storage cluster to one of the two datacenters (sites). <BR /> (Get-ClusterNode Node5).Site = 1 <BR /> (Get-ClusterNode Node6).Site = 1 <BR /> (Get-ClusterNode Node7).Site = 2 <BR /> (Get-ClusterNode Node8).Site = 2 <BR /> 2)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;For each Cluster Shared Volume (CSV) in the cluster, configure the preferred site for the CSV group to be the same as the preferred site for the Compute Cluster. <BR /> $csv1 = Get-ClusterSharedVolume "Cluster Disk 1" | Get-ClusterGroup <BR /> ($csv1).PreferredSite = 1 <BR /> 3)&nbsp; Set each CSV group in the cluster to automatically fail back to the preferred site when it is available after a datacenter outage. <BR /> ($csv1).AutoFailbackType = 1 <BR /> <STRONG> Note: </STRONG> Steps 2 and 3 can also be used to configure the Preferred Site for a CSV group in a hyper-converged datacenter deployment. You can learn more about hyper-converged deployments in Windows Server 2016 <A href="#" target="_blank"> here </A> . <BR /> <BR /> </BODY></HTML> Fri, 15 Mar 2019 22:03:50 GMT https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/configuring-site-awareness-with-multi-active-disaggregated/ba-p/372064 John Marlin 2019-03-15T22:03:50Z Hyper-converged with Windows Server 2016 https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/hyper-converged-with-windows-server-2016/ba-p/372062 <HTML> <HEAD></HEAD><BODY> <STRONG> First published on MSDN on Sep 08, 2015 </STRONG> <BR /> One of the big hot features in Windows Server 2016 which has me really excited is Storage Spaces Direct (S2D).&nbsp; With S2D you will be able to create a hyper-converged private cloud.&nbsp; A hyper-converged infrastructure (HCI) consolidates compute and storage into a common set of servers.&nbsp; Leveraging internal storage which is replicated, you can create a true Software-defined Storage (SDS) solution. 
<BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90600i8C4BBA3FDB8F4C98" /> <BR /> <BR /> This is available in the Windows Server 2016 Technical Preview today!&nbsp; I encourage you to go try it out and give us some feedback.&nbsp; Here's where you can learn more: <BR /> <H3> Presentation from Ignite 2015: </H3> <BR /> <P> <EM> Storage Spaces Direct in Windows Server 2016 Technical Preview </EM> <BR /> <A href="#" target="_blank"> https://channel9.msdn.com/events/Ignite/2015/BRK3474 </A> </P> <BR /> <BR /> <H3> Deployment guide: </H3> <BR /> <P> <EM> Enabling Private Cloud Storage Using Servers with Local Disks </EM> </P> <BR /> <P> <A href="#" target="_blank"> https://technet.microsoft.com/en-us/library/mt126109.aspx </A> </P> <BR /> <BR /> <H3> Claus Joergensen's blog: </H3> <BR /> <P> <EM> Storage Spaces Direct </EM> <BR /> <A href="#" target="_blank"> http://blogs.technet.com/b/clausjor/archive/2015/05/14/storage-spaces-direct.aspx </A> </P> <BR /> Thanks! <BR /> Elden Christensen <BR /> Principal PM Manager <BR /> High-Availability &amp; Storage <BR /> Microsoft </BODY></HTML> Fri, 15 Mar 2019 22:03:30 GMT https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/hyper-converged-with-windows-server-2016/ba-p/372062 Elden Christensen 2019-03-15T22:03:30Z Site-aware Failover Clusters in Windows Server 2016 https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/site-aware-failover-clusters-in-windows-server-2016/ba-p/372060 <P>Windows Server 2016 debuts site-aware clusters. Nodes in stretched clusters can now be grouped based on their physical location (site). Cluster site-awareness enhances key operations during the cluster lifecycle such as failover behavior, placement policies, heartbeating between the nodes and quorum behavior. In the remainder of this blog I will explain how you can configure sites for your cluster, the notion of a “preferred site”, and how site awareness manifests itself in your cluster operations.</P> <P>&nbsp;</P> <P>Please note that when talking about sites, we are talking about stretching a cluster between two locations.&nbsp; In the case of Storage Spaces Direct, we have not tested, nor do we support, nodes of the same Storage Spaces Direct cluster being in different locations (sites).</P> <P>&nbsp;</P> <H2>Configuring Sites</H2> <P>&nbsp;</P> <P>A node’s site membership can be configured by assigning the node to a site fault domain. 
<BR /><BR />For example, in a four node cluster with nodes - Node1, Node2, Node3 and Node4, to assign the nodes to Sites 1 and Site 2, do the following: <BR /><BR /></P> <UL> <UL> <LI>Launch Microsoft PowerShell <SUP> © </SUP> as an Administrator and type:</LI> </UL> </UL> <PRE>#Create Site Fault Domains <BR />New-ClusterFaultDomain –Name Seattle –Type Site –Description “Primary” –Location “Seattle DC” <BR />New-ClusterFaultDomain –Name Denver –Type Site –Description “Secondary” –Location “Denver DC” <BR /><BR />#Set Fault Domain membership <BR />Set-ClusterFaultDomain –Name Node1 –Parent Seattle <BR />Set-ClusterFaultDomain –Name Node2 –Parent Seattle <BR /><BR />Set-ClusterFaultDomain –Name Node3 –Parent Denver <BR />Set-ClusterFaultDomain –Name Node4 –Parent Denver </PRE> <P><BR />Configuring sites enhances the operation of your cluster in the following ways:</P> <P>&nbsp;</P> <H3><STRONG>Failover Affinity </STRONG> <BR /><BR /></H3> <UL> <UL> <LI>Groups failover to a node within the same site, before failing to a node in a different site</LI> </UL> </UL> <UL> <UL> <LI>During <STRONG> Node Drain </STRONG> VMs are moved first to a node within the same site before being moved cross site</LI> </UL> </UL> <UL> <UL> <LI>The <STRONG> CSV load balancer </STRONG> will distribute within the same site</LI> </UL> </UL> <P><BR /><BR /><STRONG> Storage Affinity </STRONG> <BR />Virtual Machines (VMs) follow storage and are placed in same site where their associated storage resides. VMs will begin live migrating to the same site as their associated CSV after 1 minute of the storage being moved. <BR /><STRONG> Cross-Site Heartbeating </STRONG> <BR />You now have the ability to configure the thresholds for heartbeating between sites. These thresholds are controlled by the following new cluster properties:</P> <TABLE> <TBODY> <TR> <TD><BR /> <P><STRONG> Property </STRONG></P> </TD> <TD><BR /> <P><STRONG> Default Value </STRONG></P> </TD> <TD><BR /> <P><STRONG> Description </STRONG></P> </TD> </TR> <TR> <TD><BR /> <P>CrossSiteDelay</P> </TD> <TD><BR /> <P>1000</P> </TD> <TD><BR /> <P>Amount of time between each heartbeat sent to nodes on dissimilar sites in milliseconds</P> </TD> </TR> <TR> <TD><BR /> <P>CrossSiteThreshold</P> </TD> <TD><BR /> <P>20</P> </TD> <TD><BR /> <P>Missed heartbeats before interface considered down to nodes on dissimilar sites</P> </TD> </TR> </TBODY> </TABLE> <P><BR />To configure the above properties launch PowerShell <SUP> © </SUP> as an Administrator and type:</P> <PRE>(Get-Cluster).CrossSiteDelay = &lt;value&gt; <BR />(Get-Cluster).CrossSiteThreshold = &lt;value&gt; </PRE> <P>You can find more information on other properties controlling failover clustering heartbeating <A href="#" target="_blank" rel="noopener"> here </A> . 
<BR /><BR />The following rules define the applicability of the thresholds controlling heartbeating between two cluster nodes: <BR /><BR /></P> <UL> <UL> <LI>If the two cluster nodes are in two different sites and two different subnets, then the Cross-Site thresholds will override the Cross-Subnet thresholds.</LI> </UL> </UL> <UL> <UL> <LI>If the two cluster nodes are in two different sites and the same subnet, then the Cross-Site thresholds will override the Same-Subnet thresholds.</LI> </UL> </UL> <UL> <UL> <LI>If the two cluster nodes are in the same site and two different subnets, then the Cross-Subnet thresholds will be effective.</LI> </UL> </UL> <UL> <UL> <LI>If the two cluster nodes are in the same site and the same subnet, then the Same-Subnet thresholds will be effective.</LI> </UL> </UL> <P>&nbsp;</P> <H2>Configuring Preferred Site</H2> <P><BR />In addition to configuring the site a cluster node belongs to, a “Preferred Site” can be configured for the cluster. The Preferred Site is a preference for placement; the Preferred Site will be your primary datacenter site. <BR /><BR />Before the Preferred Site can be configured, the site being chosen as the preferred site needs to be assigned to a set of cluster nodes. To configure the Preferred Site for a cluster, launch PowerShell as an Administrator and type:</P> <PRE>(Get-Cluster).PreferredSite = &lt;Site assigned to a set of cluster nodes&gt; </PRE> <P>Configuring a Preferred Site for your cluster enhances operation in the following ways:</P> <P>&nbsp;</P> <P><STRONG>Cold Start </STRONG> <BR />During a cold start, VMs are placed in the preferred site</P> <P>&nbsp;</P> <P><STRONG>Quorum&nbsp;</STRONG></P> <UL> <UL> <LI>Dynamic Quorum drops weights from the Disaster Recovery site (the DR site, i.e. the site which is not designated as the Preferred Site) first, to ensure that the Preferred Site survives if all things are equal. In addition, nodes are pruned from the DR site first during regroup after events such as asymmetric network connectivity failures.</LI> </UL> </UL> <UL> <UL> <LI>During a Quorum Split, i.e. an even split of two datacenters with no witness, the Preferred Site is automatically elected to win&nbsp;<BR /> <UL> <UL> <LI>The nodes in the DR site drop out of cluster membership</LI> </UL> </UL> <BR /> <UL> <UL> <LI>This allows the cluster to survive a simultaneous 50% loss of votes</LI> </UL> </UL> <BR /> <UL> <UL> <LI>Note that the LowerQuorumPriorityNodeID property which previously controlled this behavior is deprecated in Windows Server 2016</LI> </UL> </UL> <BR /><BR /></LI> </UL> </UL> <P><STRONG> Preferred Site and Multi-master Datacenters </STRONG> <BR />The Preferred Site can also be configured at the granularity of a cluster group, i.e. a different preferred site can be configured for each group. This enables a datacenter to be active and preferred for specific groups/VMs. 
<BR /><BR />To configure the Preferred Site for a cluster group, launch PowerShell as an Administrator and type:</P> <PRE>(Get-ClusterGroup -Name &lt;GroupName&gt;).PreferredSite = &lt;Site assigned to a set of cluster nodes&gt;</PRE> <P><BR /><STRONG> Placement Priority </STRONG> <BR />Groups in a cluster are placed based on the following site priority:&nbsp;</P> <OL> <LI>Storage affinity site</LI> <LI>Group preferred site</LI> <LI>Cluster preferred site</LI> </OL> <P>&nbsp;</P> <H2>Additional Information:</H2> <P><BR />Fault Domains are being introduced for clustering in Windows Server 2016, which provide Node, Chassis, Rack, and Site awareness.&nbsp; See this blog as well as the videos below to learn more about this new feature: <A href="#" target="_blank" rel="noopener"> https://technet.microsoft.com/en-us/windows-server-docs/storage/storage-spaces/fault-domains-windows-server-2016 </A></P> <H2>Fault Domain Awareness in WS2016 - Part 1: Overview <BR /><IFRAME src="https://channel9.msdn.com/Blogs/windowsserver/Fault-Domain-Awareness-in-WS2016-Part-1-Overview/player" width="960" height="540" frameborder="0"> </IFRAME></H2> <P><BR /><BR /></P> <H2>Fault Domain Awareness in WS2016 - Part 2: Using PowerShell</H2> <P><BR /><IFRAME src="https://channel9.msdn.com/Blogs/windowsserver/Fault-Domain-Awareness-in-WS2016-Part-2-Using-PowerShell/player" width="960" height="540" frameborder="0"> </IFRAME> <BR /><BR /><BR /></P> <H2>Fault Domain Awareness in WS2016 - Part 3: Using&nbsp;XML</H2> <P>&nbsp;</P> <H2><IFRAME src="https://channel9.msdn.com/Blogs/windowsserver/Fault-Domain-Awareness-in-WS2016-Part-3-Using-XML/player" width="960" height="540" frameborder="0"> </IFRAME></H2> <P><BR /><BR /></P> <H2>Fault Domain Awareness in WS2016 - Part 4: Location, Description <BR /><IFRAME src="https://channel9.msdn.com/Blogs/windowsserver/Fault-Domain-Awareness-in-WS2016-Part-4-Location-Description/player" width="960" height="540" frameborder="0"> </IFRAME></H2> Thu, 21 May 2020 21:50:30 GMT https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/site-aware-failover-clusters-in-windows-server-2016/ba-p/372060 John Marlin 2020-05-21T21:50:30Z Workgroup and Multi-domain clusters in Windows Server 2016 https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/workgroup-and-multi-domain-clusters-in-windows-server-2016/ba-p/372059 <P><STRONG> First published on MSDN on Aug 17, 2015 </STRONG> <BR />In Windows Server 2012 R2 and previous versions, a cluster could only be created between member nodes joined to the same domain. Windows Server 2016 breaks down these barriers and introduces the ability to create a Failover Cluster without Active Directory dependencies. Failover Clusters can now therefore be created in the following configurations: <BR /><BR /></P> <UL> <UL> <LI><STRONG> Single-domain Clusters: </STRONG> Clusters with all nodes joined to the same domain</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI><STRONG> Multi-domain Clusters: </STRONG> Clusters with nodes which are members of different domains</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI><STRONG> Workgroup Clusters: </STRONG> Clusters with nodes which are member servers / workgroup (not domain joined)</LI> </UL> </UL> <P><BR /><BR /></P> <H2>Pre-requisites</H2> <P><BR />The prerequisites for Single-domain clusters are unchanged from previous versions of Windows Server. 
<BR /><BR /></P> <UL> <UL> <LI>All servers must be running Windows Server 2016.</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>All servers must have the Failover Clustering feature installed.</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>All servers must use hardware that has been logo-certified, and the collection of servers must pass all cluster validation tests. For more information, see <A href="#" target="_blank" rel="noopener"> Failover Clustering Hardware Requirements and Storage Options </A> and <A href="#" target="_blank" rel="noopener"> Validate Hardware for a Failover Cluster </A> .</LI> </UL> </UL> <P><BR /><BR />In addition to the pre-requisites of Single-domain clusters, the following are the pre-requisites for Multi-domain or Workgroup clusters in Windows Server 2016: <BR /><BR /></P> <UL> <UL> <LI>To create a new cluster or to add nodes to the cluster, a local account needs to be provisioned on all nodes of the cluster (as well as the node from which the operation is invoked) with the following requirements:</LI> </UL> </UL> <P><BR /><BR /></P> <OL> <OL> <OL> <LI>Create a local ‘User’ account on each node in the cluster</LI> </OL> </OL> </OL> <P>&nbsp;</P> <OL> <OL> <OL> <LI>The username and password of the account must be the same on all nodes</LI> </OL> </OL> </OL> <P>&nbsp;</P> <OL> <OL> <OL> <LI>The account must be a member of the local ‘Administrators’ group on each node</LI> </OL> </OL> </OL> <P>&nbsp;</P> <OL> <OL> <OL> <LI>When using&nbsp;a non-builtin local administrator account to create the cluster, set the LocalAccountTokenFilterPolicy registry policy to <STRONG> 1 </STRONG> on all the nodes of the cluster. Builtin administrator accounts include the 'Administrator' account. You can set the&nbsp;LocalAccountTokenFilterPolicy&nbsp;registry policy&nbsp;as follows:</LI> </OL> </OL> </OL> <P><BR /></P> <UL> <UL> <LI>On each node of the cluster launch a&nbsp;Microsoft PowerShell&nbsp;shell as an administrator and type:</LI> </UL> </UL> <PRE>new-itemproperty -path HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System -Name LocalAccountTokenFilterPolicy -Value 1</PRE> <P><span class="lia-inline-image-display-wrapper lia-image-align-inline" style="width: 998px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90591iC687BE4FFEFA5F51/image-size/large?v=v2&amp;px=999" role="button" /></span></P> <P>&nbsp;</P> <P>Without setting this policy you will see the following error while trying to create a cluster using non-builtin administrator accounts.</P> <P>&nbsp;</P> <P><span class="lia-inline-image-display-wrapper lia-image-align-inline" style="width: 823px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90592i39CE0B827BEBE541/image-size/large?v=v2&amp;px=999" role="button" /></span></P>
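<P>Putting these account requirements together, a minimal sketch of provisioning such an account on one node (the account name is hypothetical; repeat the same steps with the same username and password on every node):</P> <PRE># Create the identical local account on this node and make it a local Administrator
$password = Read-Host -AsSecureString -Prompt "Password for clusteradmin"
New-LocalUser -Name "clusteradmin" -Password $password
Add-LocalGroupMember -Group "Administrators" -Member "clusteradmin"

# Allow this non-builtin administrator account to create the cluster remotely
New-ItemProperty -Path HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System -Name LocalAccountTokenFilterPolicy -Value 1</PRE>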
<P><BR /><BR /><BR /></P> <UL> <UL> <LI>The Failover Cluster needs to be created as an <A href="#" target="_blank" rel="noopener"> Active Directory-Detached Cluster </A> without any associated computer objects. Therefore, the cluster needs to have a Cluster Network Name (also known as administrative access point) of type DNS.</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>Primary DNS Suffix Requirements <BR /><BR /> <UL> <UL> <LI>Each cluster node needs to have a primary DNS suffix.</LI> </UL> </UL> <BR /> <UL> <UL> <LI>For Multi-domain Clusters: The DNS suffix for all the domains in the cluster should be present on all cluster nodes…</LI> </UL> </UL> <BR /><BR /></LI> </UL> </UL> <P><BR /><BR /></P> <P><span class="lia-inline-image-display-wrapper lia-image-align-inline" style="width: 550px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90593i537896931FF9918B/image-size/large?v=v2&amp;px=999" role="button" /></span></P> <P><BR /><BR /></P> <H2>Deployment</H2> <P><BR />Workgroup and Multi-domain clusters may be deployed using the following steps: <BR /><BR /></P> <OL> <OL> <LI>Create consistent local user accounts on all nodes of the cluster. Ensure that the username and password of these accounts are the same on all the nodes, and add the account to the local Administrators group.</LI> </OL> </OL> <P><BR /><BR /></P> <P><span class="lia-inline-image-display-wrapper lia-image-align-inline" style="width: 752px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90594iCCEA7946494238DE/image-size/large?v=v2&amp;px=999" role="button" /></span></P> <P><BR />2.&nbsp;&nbsp;&nbsp;&nbsp;Ensure that each node to be joined to the cluster has a primary DNS suffix.</P> <P><span class="lia-inline-image-display-wrapper lia-image-align-inline" style="width: 418px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90595i64DC79ABE8B7AF2B/image-size/large?v=v2&amp;px=999" role="button" /></span></P> <P><BR />For Multi-domain Clusters, ensure that the&nbsp;DNS suffix for all the domains in the cluster is present on all cluster nodes. <BR /><BR /><span class="lia-inline-image-display-wrapper lia-image-align-inline" style="width: 927px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90596i0AA285F159DE5EA0/image-size/large?v=v2&amp;px=999" role="button" /></span></P> <P>3.&nbsp;&nbsp;&nbsp; Create a Cluster with the Workgroup nodes or nodes joined to different domains. 
You may use the Failover Cluster Manager or Microsoft PowerShell.</P> <P>&nbsp;</P> <P><EM> Using Failover Cluster Manager </EM></P> <P>&nbsp;</P> <P>The following video shows the steps to create a Workgroup or Multi-domain cluster using the Failover Cluster Manager UI.</P> <P><BR />[video: <A href="#" target="_blank" rel="noopener">https://msdnshared.blob.core.windows.net/media/2016/08/WorkgroupCluster.mp4</A>]</P> <P><EM> Using PowerShell </EM></P> <P>&nbsp;</P> <P>When creating the cluster, use the AdministrativeAccessPoint switch to specify a type of DNS so that the cluster does not attempt to create computer objects.</P> <P><span class="lia-inline-image-display-wrapper lia-image-align-inline" style="width: 534px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90597iF7ED1E877965F182/image-size/large?v=v2&amp;px=999" role="button" /></span></P> <PRE>New-Cluster –Name &lt;Cluster Name&gt; -Node &lt;Nodes to Cluster&gt; -AdministrativeAccessPoint DNS</PRE> <P><span class="lia-inline-image-display-wrapper lia-image-align-inline" style="width: 845px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90598i25B85196E10510A7/image-size/large?v=v2&amp;px=999" role="button" /></span></P> <P>&nbsp;</P> <P><span class="lia-inline-image-display-wrapper lia-image-align-inline" style="width: 561px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90599i569A3447359A8C64/image-size/large?v=v2&amp;px=999" role="button" /></span></P> <P><BR /><BR /></P> <H2>Workload</H2> <P><BR />The following table summarizes the workload support for Workgroup and Multi-domain clusters.</P> <TABLE> <TBODY> <TR> <TD><BR /> <P><STRONG> Cluster Workload </STRONG></P> </TD> <TD><BR /> <P><STRONG> Supported/Not Supported </STRONG></P> </TD> <TD><BR /> <P><STRONG> More Information </STRONG></P> </TD> </TR> <TR> <TD><BR /> <P>SQL Server</P> </TD> <TD><BR /> <P>Supported</P> </TD> <TD><BR /> <P>We recommend that you use SQL Server Authentication.&nbsp; This applies only to SQL Server Always On Availability Groups (AGs).&nbsp; SQL Server Failover Cluster Instances (FCI) require Kerberos for Active Directory authentication.</P> </TD> </TR> <TR> <TD><BR /> <P>File Server</P> </TD> <TD><BR /> <P>Supported, but not recommended</P> </TD> <TD><BR /> <P>Kerberos authentication, which is not available in these configurations, is the preferred authentication protocol for Server Message Block (SMB) traffic.</P> </TD> </TR> <TR> <TD><BR /> <P>Hyper-V</P> </TD> <TD><BR /> <P>Supported, but not recommended</P> </TD> <TD><BR /> <P>Live migration is not supported. Quick migration is supported.</P> </TD> </TR> <TR> <TD><BR /> <P>Message Queuing (MSMQ)</P> </TD> <TD><BR /> <P>Not supported</P> </TD> <TD><BR /> <P>Message Queuing stores properties in AD DS.</P> </TD> </TR> </TBODY> </TABLE> <P>&nbsp;</P> <H2>Quorum Configuration</H2> <P><BR />The witness type recommended for Workgroup clusters and Multi-domain clusters is a <A href="#" target="_blank" rel="noopener"> Cloud Witness </A> or Disk Witness. &nbsp;File Share Witness (FSW) is not supported with a Workgroup or Multi-domain cluster.</P>
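<P>As a sketch, configuring a Cloud Witness or a Disk Witness with Set-ClusterQuorum; the storage account name, access key, and disk name are hypothetical placeholders:</P> <PRE># Cloud Witness backed by an Azure storage account
Set-ClusterQuorum -CloudWitness -AccountName "mystorageaccount" -AccessKey "&lt;storage account access key&gt;"

# Or a Disk Witness on a clustered disk
Set-ClusterQuorum -DiskWitness "Cluster Disk 1"</PRE>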
<H2>Servicing</H2> <P><BR />It is recommended that nodes in a cluster have a consistent configuration.&nbsp; Multi-domain and Workgroup clusters introduce a higher risk of configuration drift. When deploying, ensure that: <BR /><BR /></P> <UL> <UL> <LI>The same set of Windows patches is applied to all nodes in the cluster</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>If group policies are rolled out to the cluster nodes, they are not conflicting.</LI> </UL> </UL> <P><BR /><BR /></P> <H2>DNS Replication</H2> <P><BR />Ensure that the cluster node and network names for Workgroup and Multi-domain clusters are replicated to the DNS servers authoritative for the cluster nodes.</P> Tue, 09 Apr 2019 17:57:00 GMT https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/workgroup-and-multi-domain-clusters-in-windows-server-2016/ba-p/372059 John Marlin 2019-04-09T17:57:00Z Cluster Shared Volume - A Systematic Approach to Finding Bottlenecks https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/cluster-shared-volume-a-systematic-approach-to-finding/ba-p/372049 <P><STRONG> First published on MSDN on Jul 29, 2015 </STRONG></P> <DIV><BR /> <P>In this post we will discuss how to determine whether the performance you observe on a Cluster Shared Volume (CSV) is what you expect, and how to find which layer in your solution may be the bottleneck. This blog assumes you have read the previous blogs in the CSV series (see the bottom of this blog for links to all the blogs in the series).</P> <BR /> <P>Cluster Shared Volume (CSV) Inside Out<BR /><A href="https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/cluster-shared-volume-csv-inside-out/ba-p/371872" target="_self">https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/cluster-shared-volume-csv-inside-out/ba-p/371872</A></P> <P>&nbsp;</P> <P>Cluster Shared Volume Diagnostics<BR /><A href="https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/cluster-shared-volume-diagnostics/ba-p/371908" target="_self">https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/cluster-shared-volume-diagnostics/ba-p/371908</A></P> <P>&nbsp;</P> <P>Cluster Shared Volume Performance Counters<BR /><A href="https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/cluster-shared-volume-performance-counters/ba-p/371980" target="_self">https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/cluster-shared-volume-performance-counters/ba-p/371980</A></P> <P>&nbsp;</P> <P>Cluster Shared Volume Failure Handling<BR /><A href="https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/cluster-shared-volume-failure-handling/ba-p/371989" target="_self">https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/cluster-shared-volume-failure-handling/ba-p/371989</A></P> <P>&nbsp;</P> <P>Troubleshooting Cluster Shared Volume Auto-Pauses – Event 5120<BR /><A href="https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/troubleshooting-cluster-shared-volume-auto-pauses-8211-event/ba-p/371994" target="_self">https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/troubleshooting-cluster-shared-volume-auto-pauses-8211-event/ba-p/371994</A></P> <P>&nbsp;</P> <P>Troubleshooting Cluster Shared Volume Recovery Failure – System Event 5142<BR /><A href="https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/troubleshooting-cluster-shared-volume-recovery-failure-8211/ba-p/371997" target="_self">https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/troubleshooting-cluster-shared-volume-recovery-failure-8211/ba-p/371997</A></P> <P>&nbsp;</P> </DIV> <P>Sometimes someone asks why CSV
performance does not match their expectations, and how to investigate. The answer is that CSV consists of multiple layers, and the most straightforward troubleshooting approach is a process of elimination: first remove all the layers and test the speed of the disk by itself, then add the layers back one by one until you find the one causing the issue. <BR /><BR />You might be tempted to use file copy as a quick way to test performance. While file copy is an important workload, it is not the best way to test your storage performance. Review this blog, which goes into more detail on why it does not work well.</P> <P>&nbsp;</P> <P>Using file copy to measure storage performance – Why it’s not a good idea and what you should do instead</P> <P><A href="#" target="_blank" rel="noopener">https://docs.microsoft.com/en-us/archive/blogs/josebda/using-file-copy-to-measure-storage-performance-why-its-not-a-good-idea-and-what-you-should-do-instead</A></P> <P>&nbsp;</P> <P>It is still important to understand the file copy performance you can expect from your storage, so I would suggest running file copy after you are done with micro-benchmarks, as part of workload testing. <BR /><BR />To test performance you can use DiskSpd, which is described in this blog post.</P> <P>&nbsp;</P> <P>DiskSpd, PowerShell and storage performance: measuring IOPs, throughput and latency for both local disks and SMB file shares</P> <P><A href="#" target="_blank" rel="noopener">https://docs.microsoft.com/en-us/archive/blogs/josebda/diskspd-powershell-and-storage-performance-measuring-iops-throughput-and-latency-for-both-local-disks-and-smb-file-shares</A><BR /><BR />When selecting the size of the file you will run the tests on, be aware of the caches and tiers in your storage. For instance, a storage array might have a cache on NVRAM or NVMe. Writes that go to the fast tier might be very fast, but once you have used up all the space in the cache you will be running at the speed of the next, slower tier. If your intention is to test the cache then create a file that fits into the cache; otherwise create a file that is larger than the cache. <BR /><BR />Some LUNs might have some offsets mapped to SSDs while others map to HDDs; an example would be a tiered space. When creating a file, be aware of what tier the blocks of the file are located on. <BR /><BR />Additionally, when measuring performance do not assume that two LUNs created with similar characteristics will give you identical performance. If the LUNs are laid out differently on the physical spindles, that alone might be enough to cause completely different performance behavior. To avoid surprises as you are running tests through the different layers (as will be described below), ALWAYS use the same LUN. Several times we’ve seen cases where someone would run tests against one LUN, then run tests over CSVFS with another LUN that was believed to be similar, observe worse results in the CSVFS case, and incorrectly conclude that CSVFS was the problem; in the end, removing the disk from CSV and running the test directly on the LUN showed that the two LUNs had different performance. <BR /><BR />The sample numbers you will see in this post were collected on a 2-node cluster:</P> <P>CPU: Intel(R) Xeon(R) CPU E5-2450L 0 @ 1.80GHz, Intel64 Family 6 Model 45 Stepping 7, GenuineIntel, <BR />2 NUMA nodes, 8 cores each, with Hyperthreading disabled. <BR />RAM: 32 GB DDR3. <BR />Network: one RDMA Mellanox ConnectX-3 IPoIB Adapter, 54 Gbps, and one Intel(R) I350 Gigabit network adapter. 
<BR />The shared disk is a single HDD connected using SAS. Model HP EG0300FBLSE Firmware version HPD6. Disk cache is disabled.</P> <P><BR /><span class="lia-inline-image-display-wrapper lia-image-align-inline" style="width: 408px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90571iDCA92081369643A5/image-size/large?v=v2&amp;px=999" role="button" /></span> <BR /><BR />With this hardware my expectation is that the disk should be the bottleneck, and going over the network should not have any impact on throughput. <BR /><BR />In the samples you will see below I was running a single threaded test application, which at any time was keeping eight 8K outstanding IOs on the disk. In your tests you might want to add more variations with different queue depth and different IO sizes, and different number of threads/CPU cores utilized. To help, I have provided the table below which outlines some tests to run and data to capture to get a more exhaustive picture of your disk performance. Running all these variation may take several hours. If you know IO patterns of your workloads then you can significantly reduce the test matrix.</P> <P>&nbsp;</P> <TABLE width="594"> <TBODY> <TR> <TD width="42"> <P>&nbsp;</P> </TD> <TD width="36"> <P>&nbsp;</P> </TD> <TD width="138"> <P>&nbsp;</P> </TD> <TD colspan="7" width="378"> <P>Queue Depth</P> </TD> </TR> <TR> <TD width="42"> <P>&nbsp;</P> </TD> <TD width="36"> <P>&nbsp;</P> </TD> <TD width="138"> <P>&nbsp;</P> </TD> <TD width="54"> <P>1</P> </TD> <TD width="54"> <P>4</P> </TD> <TD width="54"> <P>16</P> </TD> <TD width="54"> <P>32</P> </TD> <TD width="54"> <P>64</P> </TD> <TD width="54"> <P>128</P> </TD> <TD width="54"> <P>256</P> </TD> </TR> <TR> <TD rowspan="40" width="42"> <P>Unbuffered Write-Trough</P> </TD> <TD rowspan="5" width="36"> <P>4K</P> </TD> <TD width="138"> <P>sequential read</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> </TR> <TR> <TD width="138"> <P>sequential write</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> </TR> <TR> <TD width="138"> <P>random read</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> </TR> <TR> <TD width="138"> <P>random write</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> </TR> <TR> <TD width="138"> <P>random 70% reads 30 % writes</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> </TR> <TR> <TD rowspan="5" width="36"> <P>8K</P> </TD> <TD width="138"> <P>sequential read</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> 
</TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> </TR> <TR> <TD width="138"> <P>sequential write</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> </TR> <TR> <TD width="138"> <P>random read</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> </TR> <TR> <TD width="138"> <P>random write</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> </TR> <TR> <TD width="138"> <P>random 70% reads 30 % writes</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> </TR> <TR> <TD rowspan="5" width="36"> <P>16K</P> </TD> <TD width="138"> <P>sequential read</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> </TR> <TR> <TD width="138"> <P>sequential write</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> </TR> <TR> <TD width="138"> <P>random read</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> </TR> <TR> <TD width="138"> <P>random write</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> </TR> <TR> <TD width="138"> <P>random 70% reads 30 % writes</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> </TR> <TR> <TD rowspan="5" width="36"> <P>64K</P> </TD> <TD width="138"> <P>sequential read</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> </TR> <TR> <TD width="138"> <P>sequential write</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD 
width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> </TR> <TR> <TD width="138"> <P>random read</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> </TR> <TR> <TD width="138"> <P>random write</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> </TR> <TR> <TD width="138"> <P>random 70% reads 30 % writes</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> </TR> <TR> <TD rowspan="5" width="36"> <P>128K</P> </TD> <TD width="138"> <P>sequential read</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> </TR> <TR> <TD width="138"> <P>sequential write</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> </TR> <TR> <TD width="138"> <P>random read</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> </TR> <TR> <TD width="138"> <P>random write</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> </TR> <TR> <TD width="138"> <P>random 70% reads 30 % writes</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> </TR> <TR> <TD rowspan="5" width="36"> <P>256K</P> </TD> <TD width="138"> <P>sequential read</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> </TR> <TR> <TD width="138"> <P>sequential write</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> </TR> <TR> <TD width="138"> <P>random read</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> 
<P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> </TR> <TR> <TD width="138"> <P>random write</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> </TR> <TR> <TD width="138"> <P>random 70% reads 30 % writes</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> </TR> <TR> <TD rowspan="5" width="36"> <P>512K</P> </TD> <TD width="138"> <P>sequential read</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> </TR> <TR> <TD width="138"> <P>sequential write</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> </TR> <TR> <TD width="138"> <P>random read</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> </TR> <TR> <TD width="138"> <P>random write</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> </TR> <TR> <TD width="138"> <P>random 70% reads 30 % writes</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> </TR> <TR> <TD rowspan="5" width="36"> <P>1MB</P> </TD> <TD width="138"> <P>sequential read</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> </TR> <TR> <TD width="138"> <P>sequential write</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> </TR> <TR> <TD width="138"> <P>random read</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> </TR> <TR> <TD width="138"> <P>random write</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> 
</TD> <TD width="54"> <P>&nbsp;</P> </TD> </TR> <TR> <TD width="138"> <P>random 70% reads 30 % writes</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> <TD width="54"> <P>&nbsp;</P> </TD> </TR> </TBODY> </TABLE> <P><BR />If you have Storage Spaces then it might be useful to first collect performance numbers of the individual disks this Space will be created with. This will help set expectations around what kind of performance you should expect in best/worst case scenario from the Space. <BR /><BR />As you are testing individual spindles that will be used to build Storage Spaces pay attention to different MPIO (Multi Path IO) modes. For instance you might expect that round robin over multiple paths would be faster than fail over, but for some HDDs you might find that they give you better throughput with fail over than with round robin. When it comes to SAN MPIO considerations are different. In case of SAN, MPIO is between the computer and a controller in the SAN storage box. In case of Storage Spaces MPIO is between computer and the HDD, so it comes to how efficient is the HDD’s firmware handling IO from different paths. In production for a JBOD connected to multiple computers IO will be coming from different computers so in any case HDD firmware need to be able to efficiently handle IOs coming from multiple computers/paths. Like with any kind of performance testing you should not jump to a conclusion that a particular MPIO mode is good or bad, always test first. <BR /><BR />Another commonly discussed topic is what should be the file system allocation unit size (A.K.A cluster size). There is a variety of options between 4K and 64K.</P> <P>&nbsp;</P> <P><span class="lia-inline-image-display-wrapper lia-image-align-inline" style="width: 333px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90572i49225B8E63E11AAC/image-size/large?v=v2&amp;px=999" role="button" /></span></P> <P><BR />For starters, CSVFS has no requirements for the underlying file system cluster size.&nbsp; It is fully compatible with all cluster sizes.&nbsp; The primary influencer for the cluster size is driven by the workload.&nbsp; For Hyper-V and SQL Server data and log files it is recommended to use a 64K cluster size with NTFS.&nbsp; Since CSV is most commonly used to host VHD’s in one form or another, 64K is the recommended allocation unit size with NTFS and 4k with ReFS.&nbsp; Another influencer is your storage array, so it is good to have a discussion with your storage vendor for any optimizations unique to your storage device they recommend.&nbsp; There are also a few other considerations, let’s discuss: <BR /><BR /></P> <OL> <OL> <LI><STRONG> File system fragmentation </STRONG> . If for the moment, we forget about the storage underneath the file system aside and look only at the file system layer by itself then <BR /><BR /> <OL> <OL> <LI>Smaller blocks mean better space utilization on the disk because if your file is only 1K then with 64K cluster size this file will consume 64K on the disk while with 4K cluster size it will consume only 4K, and you can have (64/4) 16 1K files on 64K. If you have lots of small files, then small cluster size might be a good choice.</LI> </OL> </OL> <BR /> <OL> <OL> <LI>On the other hand, if you have large files that are growing then smaller cluster size means more fragmentation. 
For instance, in the worst case scenario a 1 GB file with 4K clusters might have up to (1024x1024/4) 262,144 fragments (a.k.a. runs), while with 64K clusters it will have only (1024x1024/64) 16,384 fragments. So why does fragmentation matter? <BR /><BR /> <OL> <OL> <LI>If you are constrained on RAM you may care more, as more fragments means more RAM is needed to track all this metadata.</LI> </OL> </OL> <BR /> <OL> <OL> <LI>If your workload generates IO larger than the cluster size, and if you do not run defrag frequently enough and consequently have lots of fragments, then the workload's IO might need to be split more often when the cluster size is smaller. For instance, if on average the workload generates a 32K IO, then in the worst case scenario with a 4K cluster size this IO might need to be split into (32/4) 8 4K IOs to the volume, while with a 64K cluster size it would never get split. Why does splitting matter? A production workload will usually be close to random IO, but the larger the blocks are, the more throughput you will see on average, so ideally we should avoid splitting IO when it is not necessary.</LI> </OL> </OL> <BR /> <OL> <OL> <LI>If you are using storage copy offload, some storage boxes support it only at a 64K granularity and it will fail if the cluster size is smaller. You need to check with your storage vendor.</LI> </OL> </OL> <BR /> <OL> <OL> <LI>If you anticipate lots of large file-level trim commands (trim is the file system counterpart of the storage block UNMAP). You might care about trim if you are using a thinly provisioned LUN or if you have SSDs. SSD garbage collection logic in the firmware benefits from knowing that certain blocks are not being used by a workload and can be used for garbage collection. For example, let’s assume we have a VHDX with NTFS inside, and this VHDX file itself is very fragmented. When you run defrag on the NTFS inside the VHDX (most likely inside a VM), then among other steps defrag will do free space consolidation, and then it will issue a file-level trim to reclaim the free blocks. If there is lots of free space this might be a trim for a very large block. This trim will come to the NTFS that hosts the VHDX. Then NTFS will need to translate this large file trim into a block unmap for each fragment of the file. If the file is highly fragmented then this may take a significant amount of time. A similar scenario might happen when you delete a large file or lots of files at once.</LI> </OL> </OL> <BR /> <OL> <OL> <LI>The list above is not exhaustive by any means; I am focusing on what I view as the more relevant considerations.</LI> </OL> </OL> <BR /> <OL> <OL> <LI>From the file system perspective, the rule of thumb would be to prefer a larger cluster size unless you are planning to have lots of tiny files and the disk space saving from the smaller cluster size is important. No matter what cluster size you choose, you will be better off periodically running defrag. You can monitor how much fragmentation is affecting your workload by looking at the CSV File System Split IO and PhysicalDisk Split IO performance counters.</LI> </OL> </OL> <BR /><BR /></LI> </OL> </OL> <BR /><BR /></LI> </OL> </OL> <P>&nbsp;</P> <OL> <OL> <LI><STRONG> File system block alignment and storage block alignment </STRONG> . When you create a LUN on a SAN or a Storage Space, it may be created out of multiple disks with different performance characteristics. 
For instance, a mirrored space (<A href="#" target="_blank" rel="noopener">http://blogs.msdn.com/b/b8/archive/2012/01/05/virtualizing-storage-for-scale-resiliency-and-efficiency.aspx</A>) would contain slabs on many disks; some slabs will be acting as mirrors, and then the entire space address range will be subdivided into 64K blocks and round-robined across these slabs on different disks in RAID0 fashion to give you the better aggregated throughput of multiple spindles.</LI> </OL> </OL> <P><BR /><BR /><span class="lia-inline-image-display-wrapper lia-image-align-inline" style="width: 247px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90573i79FDB684C3836CE1/image-size/large?v=v2&amp;px=999" role="button" /></span> <BR /><BR />This means that if you have a 128K IO it will have to be split into two 64K IOs that go to different spindles. What if your file system is formatted with a cluster size smaller than 64K? That means a contiguous block in the file system might not be 64K aligned. For example, if the file system is formatted with 4K clusters and we have a file that is 128K, then the file can start at any 4K alignment. If my application performs a 128K read, then it is possible this 128K block will map to up to three 64K blocks on the storage space. <BR /><BR /><span class="lia-inline-image-display-wrapper lia-image-align-inline" style="width: 244px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90574iEDB8BED2730B4EC9/image-size/large?v=v2&amp;px=999" role="button" /></span> <BR /><BR />If you format your file system with a 64K cluster size, then file allocations are&nbsp;always 64K aligned and on average you will see fewer IOPS on the spindles.&nbsp; The performance difference will be even larger when it comes to writes to parity, RAID5, or RAID6 style LUNs. When you are overwriting part of a block, the storage has to do a read-modify-write, multiplying the number of IOPS hitting your spindles. If you overwrite the entire block then it will be exactly one IO.&nbsp; If you want to be accurate, you need to evaluate the average block size you expect your workload to produce. If it is larger than 4K, then you want the file system cluster size to be at least as large as your average IO size, so that on average it does not get split at the storage layer.&nbsp; A rule of thumb might be to simply use the same cluster size as the block size used by the storage layer.&nbsp; Always consult your storage vendor for advice;&nbsp;modern storage arrays have very sophisticated tiering and load balancing logic, and unless you understand everything about how your storage box works you might end up with unexpected results. Alternatively, you can run a variety of performance tests with different cluster sizes and see which one gives you better results. If you do not have time to do that, then I recommend a 64K cluster size. <BR /><BR />Performance of an HDD/SSD might change after updating the disk or storage box firmware, so it might save you time if you rerun the performance tests after an update. <BR /><BR />As you are running the tests you can use the performance counters described here</P> <P>&nbsp;</P> <P>Cluster Shared Volume Performance Counters<BR /><A href="https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/cluster-shared-volume-performance-counters/ba-p/371980" target="_self">https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/cluster-shared-volume-performance-counters/ba-p/371980</A></P> <P>&nbsp;</P> <P>to get further insight into the behavior of each layer by monitoring average queue depth, latency, throughput, and IOPS at the CSV, SMB, and physical disk layers. For instance, if your disk is the bottleneck, then latency and queue depth at all of these layers will be the same. Once you see that queue depth and latency at a higher layer are above what you see on the disk, that layer might be the bottleneck.</P> <P><span class="lia-inline-image-display-wrapper lia-image-align-inline" style="width: 701px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90575i2807A8F25BF4BCD2/image-size/large?v=v2&amp;px=999" role="button" /></span></P>
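<P>For instance, a sketch of sampling the disk-level counters used later in this post with PowerShell while a test is running (instance names will vary on your system):</P> <PRE># Watch queue depth and latency on the physical disks every 5 seconds
Get-Counter -Counter "\PhysicalDisk(*)\Avg. Disk Queue Length", "\PhysicalDisk(*)\Avg. Disk sec/Transfer" -SampleInterval 5 -Continuous</PRE>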
<P><BR />Run performance tests only on hardware that is not currently used by any other workloads/tests, otherwise your results may not be valid because of too much variability. You also might want to rerun each variation several times to make sure there is no variability.</P> <H2>Baseline 1 – No CSV; Measure Performance of NTFS</H2> <P><BR />In this case IO has to traverse the NTFS file system and the disk stack in the OS, so conceptually we can represent it this way:</P> <P>&nbsp;</P> <P><span class="lia-inline-image-display-wrapper lia-image-align-inline" style="width: 155px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90576i175C481DEE5FBF22/image-size/large?v=v2&amp;px=999" role="button" /></span></P> <P><BR />For most disks, the expectation is that sequential read &gt;= sequential write &gt;= random read &gt;= random write. For an SSD you may observe no difference between random and sequential, while for an HDD the difference may be significant. The difference between read and write will vary from disk to disk. <BR /><BR />As you run this test, keep an eye on whether you are saturating the CPU. This might happen when your disk is very fast, for instance if you are using a Simple Space backed by 40 SSDs.</P>
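<P>As a sketch, here are DiskSpd invocations roughly matching the test used in this post (8K IOs, one thread, eight outstanding IOs, unbuffered with write-through). The drive letter, file size, and duration are hypothetical; -Suw disables software caching and requests write-through:</P> <PRE># 8K sequential reads, 1 thread, queue depth 8; -c10G creates a 10 GB test file
diskspd.exe -b8K -t1 -o8 -Suw -d60 -L -c10G K:\testfile.dat

# 8K random writes against the same file (destructive to the file contents)
diskspd.exe -b8K -t1 -o8 -r -w100 -Suw -d60 -L K:\testfile.dat</PRE>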
<P>Run baseline tests multiple times. If you see variance at this level, then most likely it is coming from the disk and it will affect the other tests as well.&nbsp; Below you can see the numbers I’ve collected on my hardware; the results match expectations.</P> <P>&nbsp;</P> <TABLE> <TBODY> <TR> <TD colspan="4">&nbsp;</TD> <TD><STRONG>Queue Depth 8</STRONG></TD> </TR> <TR> <TD rowspan="8">Unbuffered Write-Through</TD> <TD rowspan="8">8K</TD> <TD rowspan="2">sequential read</TD> <TD>IOPS</TD> <TD>19906</TD> </TR> <TR> <TD>MB/Sec</TD> <TD>155</TD> </TR> <TR> <TD rowspan="2">sequential write</TD> <TD>IOPS</TD> <TD>17311</TD> </TR> <TR> <TD>MB/Sec</TD> <TD>135</TD> </TR> <TR> <TD rowspan="2">random read</TD> <TD>IOPS</TD> <TD>359</TD> </TR> <TR> <TD>MB/Sec</TD> <TD>2</TD> </TR> <TR> <TD rowspan="2">random write</TD> <TD>IOPS</TD> <TD>273</TD> </TR> <TR> <TD>MB/Sec</TD> <TD>2</TD> </TR> </TBODY> </TABLE> <P>&nbsp;</P> <H2>Baseline 2 - No CSV; Measure SMB Performance between Cluster Nodes</H2> <P><BR />To run this test, online the clustered disk on one cluster node and assign it a drive letter - for example K:. Run the test from another node over SMB using an admin share; for instance, your path might look like \\Node1\K$. In this case IO has to go over the following layers:</P> <P>&nbsp;</P> <P><span class="lia-inline-image-display-wrapper lia-image-align-inline" style="width: 418px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90577i23BEB6E2409A77FD/image-size/large?v=v2&amp;px=999" role="button" /></span></P> <P><BR />You need to be aware of SMB Multichannel and make sure that you are using only the NICs that you expect the cluster to use for intra-node traffic. You can read more about SMB Multichannel in a clustered environment in this blog post.</P> <P>&nbsp;</P> <P><A href="#" target="_blank" rel="noopener">http://blogs.msdn.com/b/emberger/archive/2014/09/15/force-network-traffic-through-a-specific-nic-with-smb-multichannel.aspx </A> <BR /><BR />If you have an RDMA network, or when your disk is slower than what SMB can pump through all channels and you have a sufficiently large queue depth, then you might see Baseline 2 come close or even equal to Baseline 1. That means your bottleneck is the disk, and not the network. <BR /><BR />Run the baseline test several times. If you see variance at this level, then most likely it is coming from the disk or the network, and it will affect the other tests as well. Assuming you’ve already sorted out the variance coming from the disk while collecting Baseline 1, you should now focus on variance caused by the network.</P>
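<P>Continuing the DiskSpd sketch from Baseline 1, the same run can simply be pointed at the admin share instead of the local volume (the node name and path are hypothetical):</P> <PRE># Same 8K, single-thread, queue depth 8 run, now over SMB via the admin share
diskspd.exe -b8K -t1 -o8 -Suw -d60 -L \\Node1\K$\testfile.dat</PRE>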
<P><BR />Here are the numbers I’ve collected on my hardware. To make it easier for you to compare, I am repeating the Baseline 1 numbers here.</P> <P>&nbsp;</P> <TABLE> <TBODY> <TR> <TD colspan="4">&nbsp;</TD> <TD><STRONG>Queue Depth 8</STRONG></TD> <TD><STRONG>Baseline 1</STRONG></TD> </TR> <TR> <TD rowspan="8">Unbuffered Write-Through</TD> <TD rowspan="8">8K</TD> <TD rowspan="2">sequential read</TD> <TD>IOPS</TD> <TD>19821</TD> <TD>19906</TD> </TR> <TR> <TD>MB/Sec</TD> <TD>154</TD> <TD>155</TD> </TR> <TR> <TD rowspan="2"><STRONG><FONT color="#993300">sequential write</FONT></STRONG></TD> <TD><STRONG><FONT color="#993300">IOPS</FONT></STRONG></TD> <TD><STRONG><FONT color="#993300">810</FONT></STRONG></TD> <TD><STRONG><FONT color="#993300">17311</FONT></STRONG></TD> </TR> <TR> <TD><STRONG><FONT color="#993300">MB/Sec</FONT></STRONG></TD> <TD><STRONG><FONT color="#993300">6</FONT></STRONG></TD> <TD><STRONG><FONT color="#993300">135</FONT></STRONG></TD> </TR> <TR> <TD rowspan="2">random read</TD> <TD>IOPS</TD> <TD>353</TD> <TD>359</TD> </TR> <TR> <TD>MB/Sec</TD> <TD>2</TD> <TD>2</TD> </TR> <TR> <TD rowspan="2">random write</TD> <TD>IOPS</TD> <TD>272</TD> <TD>273</TD> </TR> <TR> <TD>MB/Sec</TD> <TD>2</TD> <TD>2</TD> </TR> </TBODY> </TABLE> <P><BR /><BR />In my case I verified that IO is going over RDMA, and the network indeed adds almost no latency, but there is a difference in sequential write IOPS compared with Baseline 1, which seems odd. First, I looked at the performance counters: <BR /><BR />Physical disk performance counters for Baseline 1 <BR /><BR /><span class="lia-inline-image-display-wrapper lia-image-align-inline" style="width: 600px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90578i94318620729F80F1/image-size/large?v=v2&amp;px=999" role="button" /></span> <BR /><BR />Physical disk and SMB Server Share performance counters for Baseline 2 <BR /><BR /><span class="lia-inline-image-display-wrapper lia-image-align-inline" style="width: 600px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90579iF380C11085FB683C/image-size/large?v=v2&amp;px=999" role="button" /></span> <BR /><BR />SMB Client Share and SMB Direct Connection performance counters for Baseline 2 <BR /><BR /><span class="lia-inline-image-display-wrapper lia-image-align-inline" style="width: 482px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90580i5811E0BBF4566E16/image-size/large?v=v2&amp;px=999" role="button" /></span> <BR /><BR />Observe that in both cases PhysicalDisk\Avg. Disk Queue Length is the same. That tells us SMB does not queue IO, and the disk has all the pending IOs all the time. Second, observe that PhysicalDisk\Avg. Disk sec/Transfer in Baseline 1 is 0 while in Baseline 2 it is 10 milliseconds. 
Huh!</P> <P><BR />This tells me that the disk got slower because the requests came over SMB!? <BR /><BR />The next step was to record a trace using the Windows Performance Toolkit (<A href="#" target="_blank" rel="noopener">http://msdn.microsoft.com/en-us/library/windows/hardware/hh162962.aspx</A>) with Disk IO for both Baseline 1 and Baseline 2. Looking at the traces, I noticed the disk service time for some reason got longer for Baseline 2! Then I also noticed that when requests were coming over SMB they hit the disk from 2 threads, while with my test utility all requests were issued from a single thread. Remember that we are investigating sequential write. Even though, when running over SMB, the test is issuing all writes from one thread in sequential order, SMB on the server was dispatching these writes to the disk using 2 threads, and sometimes the writes would get reordered. Consequently, the IOPS I am getting for sequential write are close to random write. To verify that, I reran the test for Baseline 1 with 2 threads, and bingo! I got matching numbers. <BR /><BR />Here is what you would see in WPA for IO over SMB. <BR /><BR /><span class="lia-inline-image-display-wrapper lia-image-align-inline" style="width: 998px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90581iC916AB85454315DC/image-size/large?v=v2&amp;px=999" role="button" /></span> <BR /><BR />Average disk service time is about 8.1 milliseconds, and IO time is about 9.6 milliseconds. The green and violet colors correspond to IO issued by different threads. If you look closely (expand the table, remove Thread Id from grouping, and sort by Init Time), you can see how the IOs are interleaved and Min Offset is not strictly sequential: <BR /><BR /><span class="lia-inline-image-display-wrapper lia-image-align-inline" style="width: 999px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90582i7B76658A8F7ECAE1/image-size/large?v=v2&amp;px=999" role="button" /></span> <BR /><BR />Without SMB, all IOs came on one thread; disk service time is about 600 microseconds, and IO time is about 4 milliseconds. <BR /><BR /><span class="lia-inline-image-display-wrapper lia-image-align-inline" style="width: 998px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90583iC5EAA5CA17164752/image-size/large?v=v2&amp;px=999" role="button" /></span> <BR /><BR />If you expand and sort by Init Time, you will see Min Offset is strictly increasing. <BR /><BR /><span class="lia-inline-image-display-wrapper lia-image-align-inline" style="width: 999px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90584i5360AADFFC579BBA/image-size/large?v=v2&amp;px=999" role="button" /></span> <BR /><BR />In production, in most cases you will have a workload that is close to random IO; sequential IO only gives you a theoretical best-case scenario. <BR /><BR />The next interesting question is why we do not see similar degradation for sequential read. The theory is that for reads the disk might be reading the entire track and keeping it in its cache, so even when reads are rearranged the track is already in the cache, and reads on average stay unaffected. Since I disabled the disk cache, writes always have to hit the spindle and more often pay the seek cost.</P> <H2>Baseline 3 - No CSV; Measure SMB Performance between Compute Nodes and Cluster Nodes</H2> <P><BR />If you are planning to run workload and storage on the same set of nodes, then you can skip this step. 
If you are planning to disaggregate workload and storage and access storage using a Scale-Out File Server (SOFS), then you should run the same test as Baseline 2; just in this case select a compute node as the client, and make sure that over the network you are using the NICs that will be used to handle compute-to-storage traffic once you create the cluster. <BR /><BR />Remember that for reliability reasons files over SOFS are always opened with write-through, so we would suggest always adding write-through to your tests. As an option, you can create a classic singleton (non-SOFS) file server over a clustered disk, create a Continuously Available share on that file server, and run your test there. This will make sure the traffic goes only over networks marked in the cluster as public, and because this is a CA share all opens will be write-through. <BR /><BR />The layers diagram and performance considerations in this case are exactly the same as in the case of Baseline 2.</P> <H2>CSVFS Case 1 - CSV Direct IO</H2> <P><BR />Now add the disk to CSVFS. <BR /><BR />You can run the same test on the coordinating node and on a non-coordinating node, and you should see the same results. The numbers should match Baseline 1. The length of the code path is the same; you just have CSVFS instead of NTFS. The following diagram represents the layers IO will be going through:</P> <P>&nbsp;</P> <P><span class="lia-inline-image-display-wrapper lia-image-align-inline" style="width: 165px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90585i5F62D5CE817BF36B/image-size/large?v=v2&amp;px=999" role="button" /></span></P> <P><BR />Here are the numbers I’ve collected on my hardware; to make it easier for you to compare, I am repeating the Baseline 1 numbers here.</P> <P><BR />On the coordinating node:</P> <P>&nbsp;</P> <TABLE> <TBODY> <TR> <TD colspan="4">&nbsp;</TD> <TD><STRONG>Queue Depth 8</STRONG></TD> <TD><STRONG>Baseline 1</STRONG></TD> </TR> <TR> <TD rowspan="8">Unbuffered Write-Through</TD> <TD rowspan="8">8K</TD> <TD rowspan="2">sequential read</TD> <TD>IOPS</TD> <TD>19808</TD> <TD>19906</TD> </TR> <TR> <TD>MB/Sec</TD> <TD>154</TD> <TD>155</TD> </TR> <TR> <TD rowspan="2">sequential write</TD> <TD>IOPS</TD> <TD>17590</TD> <TD>17311</TD> </TR> <TR> <TD>MB/Sec</TD> <TD>137</TD> <TD>135</TD> </TR> <TR> <TD rowspan="2">random read</TD> <TD>IOPS</TD> <TD>356</TD> <TD>359</TD> </TR> <TR> <TD>MB/Sec</TD> <TD>2</TD> <TD>2</TD> </TR> <TR> <TD rowspan="2">random write</TD> <TD>IOPS</TD> <TD>273</TD> <TD>273</TD> </TR> <TR> <TD>MB/Sec</TD> <TD>2</TD> <TD>2</TD> </TR> </TBODY> </TABLE> <P><BR /><BR />On a non-coordinating node:</P> <P>&nbsp;</P> 
width="95"> <P>Queue Depth</P> </TD> <TD width="95"> <P>Baseline 1</P> </TD> </TR> <TR> <TD width="95"> <P>8</P> </TD> <TD width="95"> <P>&nbsp;</P> </TD> </TR> <TR> <TD rowspan="8" width="72"> <P>Unbuffered Write-Trough</P> </TD> <TD rowspan="8" width="36"> <P>8K</P> </TD> <TD rowspan="2" width="114"> <P>sequential read</P> </TD> <TD width="61"> <P>IOPS</P> </TD> <TD width="95"> <P>19793</P> </TD> <TD width="95"> <P>19906</P> </TD> </TR> <TR> <TD width="61"> <P>MB/Sec</P> </TD> <TD width="95"> <P>154</P> </TD> <TD width="95"> <P>155</P> </TD> </TR> <TR> <TD rowspan="2" width="114"> <P>sequential write</P> </TD> <TD width="61"> <P>IOPS</P> </TD> <TD width="95"> <P>177880</P> </TD> <TD width="95"> <P>17311</P> </TD> </TR> <TR> <TD width="61"> <P>MB/Sec</P> </TD> <TD width="95"> <P>138</P> </TD> <TD width="95"> <P>135</P> </TD> </TR> <TR> <TD rowspan="2" width="114"> <P>random read</P> </TD> <TD width="61"> <P>IOPS</P> </TD> <TD width="95"> <P>359</P> </TD> <TD width="95"> <P>359</P> </TD> </TR> <TR> <TD width="61"> <P>MB/Sec</P> </TD> <TD width="95"> <P>2</P> </TD> <TD width="95"> <P>2</P> </TD> </TR> <TR> <TD rowspan="2" width="114"> <P>random write</P> </TD> <TD width="61"> <P>IOPS</P> </TD> <TD width="95"> <P>273</P> </TD> <TD width="95"> <P>273</P> </TD> </TR> <TR> <TD width="61"> <P>MB/Sec</P> </TD> <TD width="95"> <P>2</P> </TD> <TD width="95"> <P>2</P> </TD> </TR> </TBODY> </TABLE> <H2>&nbsp;</H2> <H2>CSVFS Case 2 - CSV File System Redirected IO on Coordinating Node</H2> <P><BR />In this case we are not traversing network, but we do traverse 2 file systems.&nbsp; If you are disk bound you should see numbers matching Baseline 1.&nbsp; If you have very fast storage and you are CPU bound then you will saturate CPU a bit faster and will be about 5-10% below Baseline 1.</P> <P>&nbsp;</P> <P><span class="lia-inline-image-display-wrapper lia-image-align-inline" style="width: 232px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90586iD31C7225CD2D90F8/image-size/large?v=v2&amp;px=999" role="button" /></span></P> <P><BR />Here are the numbers I’ve got on my hardware. 
To make it easier for you to compare, I am repeating Baseline 1 and Baseline 2 numbers here.</P> <P>&nbsp;</P> <TABLE style="height: 342px;" width="604"> <TBODY> <TR style="height: 57px;"> <TD colspan="4" rowspan="2" style="height: 87px; width: 305px;"> <P>&nbsp;</P> </TD> <TD style="height: 57px; width: 98px;"> <P>Queue Depth</P> </TD> <TD style="height: 57px; width: 100px;"> <P>Baseline 1</P> </TD> <TD style="height: 57px; width: 100px;"> <P>Baseline 2</P> </TD> </TR> <TR style="height: 30px;"> <TD style="height: 30px; width: 98px;"> <P>8</P> </TD> <TD style="height: 30px; width: 100px;"> <P>&nbsp;</P> </TD> <TD style="height: 30px; width: 100px;"> <P>&nbsp;</P> </TD> </TR> <TR style="height: 30px;"> <TD rowspan="8" style="height: 255px; width: 92px;"> <P>Unbuffered Write-Through</P> </TD> <TD rowspan="8" style="height: 255px; width: 40px;"> <P>8K</P> </TD> <TD rowspan="2" style="height: 60px; width: 107px;"> <P>sequential read</P> </TD> <TD style="height: 30px; width: 66px;"> <P>IOPS</P> </TD> <TD style="height: 30px; width: 98px;"> <P>19807</P> </TD> <TD style="height: 30px; width: 100px;"> <P>19906</P> </TD> <TD style="height: 30px; width: 100px;"> <P>19821</P> </TD> </TR> <TR style="height: 30px;"> <TD style="height: 30px; width: 66px;"> <P>MB/Sec</P> </TD> <TD style="height: 30px; width: 98px;"> <P>154</P> </TD> <TD style="height: 30px; width: 100px;"> <P>155</P> </TD> <TD style="height: 30px; width: 100px;"> <P>154</P> </TD> </TR> <TR style="height: 45px;"> <TD rowspan="2" style="height: 75px; width: 107px;"> <P><FONT color="#993300"><STRONG>sequential write</STRONG></FONT></P> </TD> <TD style="height: 45px; width: 66px;"> <P><FONT color="#993300"><STRONG>IOPS</STRONG></FONT></P> </TD> <TD style="height: 45px; width: 98px;"> <P><FONT color="#993300"><STRONG>5670</STRONG></FONT></P> </TD> <TD style="height: 45px; width: 100px;"> <P><FONT color="#993300"><STRONG>17311</STRONG></FONT></P> </TD> <TD style="height: 45px; width: 100px;"> <P><FONT color="#993300"><STRONG>810</STRONG></FONT></P> </TD> </TR> <TR style="height: 30px;"> <TD style="height: 30px; width: 66px;"> <P><FONT color="#993300"><STRONG>MB/Sec</STRONG></FONT></P> </TD> <TD style="height: 30px; width: 98px;"> <P><FONT color="#993300"><STRONG>44</STRONG></FONT></P> </TD> <TD style="height: 30px; width: 100px;"> <P><FONT color="#993300"><STRONG>135</STRONG></FONT></P> </TD> <TD style="height: 30px; width: 100px;"> <P><FONT color="#993300"><STRONG>6</STRONG></FONT></P> </TD> </TR> <TR style="height: 30px;"> <TD rowspan="2" style="height: 60px; width: 107px;"> <P>random read</P> </TD> <TD style="height: 30px; width: 66px;"> <P>IOPS</P> </TD> <TD style="height: 30px; width: 98px;"> <P>354</P> </TD> <TD style="height: 30px; width: 100px;"> <P>359</P> </TD> <TD style="height: 30px; width: 100px;"> <P>353</P> </TD> </TR> <TR style="height: 30px;"> <TD style="height: 30px; width: 66px;"> <P>MB/Sec</P> </TD> <TD style="height: 30px; width: 98px;"> <P>2</P> </TD> <TD style="height: 30px; width: 100px;"> <P>2</P> </TD> <TD style="height: 30px; width: 100px;"> <P>2</P> </TD> </TR> <TR style="height: 30px;"> <TD rowspan="2" style="height: 60px; width: 107px;"> <P>random write</P> </TD> <TD style="height: 30px; width: 66px;"> <P>IOPS</P> </TD> <TD style="height: 30px; width: 98px;"> <P>271</P> </TD> <TD style="height: 30px; width: 100px;"> <P>273</P> </TD> <TD style="height: 30px; width: 100px;"> <P>272</P> </TD> </TR> <TR style="height: 30px;"> <TD style="height: 30px; width: 66px;"> <P>MB/Sec</P> </TD> <TD style="height: 30px; width: 98px;"> <P>2</P> </TD> <TD style="height: 30px; width: 100px;"> <P>2</P> </TD> <TD style="height: 30px; width: 100px;"> <P>2</P> </TD> </TR> </TBODY> </TABLE> <P><BR />It looks like some IO reordering is happening in this case too, so you can see the sequential write numbers are somewhere between Baseline 1 and Baseline 2. All other numbers line up perfectly with expectations.</P> <H2>CSVFS Case 3 - CSV File System Redirected IO on Non-Coordinating Node</H2> <P><BR />You can put a CSV in file system redirected mode using the cluster UI <BR /><BR /><span class="lia-inline-image-display-wrapper lia-image-align-inline" style="width: 624px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90587iCF9142CA80FDC865/image-size/large?v=v2&amp;px=999" role="button" /></span> <BR /><BR />or by using the PowerShell cmdlet Suspend-ClusterResource with the –RedirectedAccess parameter. <BR /><BR />This is the longest IO path, where we are not only traversing 2 file systems, but also going over SMB and the network.&nbsp; If you are network bound then you should see numbers close to Baseline 2.&nbsp; If your network is very fast and your bottleneck is storage, then numbers will be close to Baseline 1.&nbsp; If storage is also very fast and you are CPU bound, then numbers should be 10-15% below Baseline 1. <BR /><BR /><span class="lia-inline-image-display-wrapper lia-image-align-inline" style="width: 469px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90588iA5FA3270E449A93F/image-size/large?v=v2&amp;px=999" role="button" /></span> <BR /><BR />Here are the numbers I’ve got on my hardware. To make it easier for you to compare, I am repeating Baseline 1 and Baseline 2 numbers here.</P> <P>&nbsp;</P> <TABLE width="622"> <TBODY> <TR> <TD width="72"> <P>&nbsp;</P> </TD> <TD width="36"> <P>&nbsp;</P> </TD> <TD width="114"> <P>&nbsp;</P> </TD> <TD width="61"> <P>&nbsp;</P> </TD> <TD width="113"> <P>Queue Depth</P> </TD> <TD width="113"> <P>Baseline 1</P> </TD> <TD width="113"> <P>Baseline 2</P> </TD> </TR> <TR> <TD width="72"> <P>&nbsp;</P> </TD> <TD width="36"> <P>&nbsp;</P> </TD> <TD width="114"> <P>&nbsp;</P> </TD> <TD width="61"> <P>&nbsp;</P> </TD> <TD width="113"> <P>8</P> </TD> <TD width="113"> <P>&nbsp;</P> </TD> <TD width="113"> <P>&nbsp;</P> </TD> </TR> <TR> <TD rowspan="8" width="72"> <P>Unbuffered Write-Through</P> </TD> <TD rowspan="8" width="36"> <P>8K</P> </TD> <TD rowspan="2" width="114"> <P>sequential read</P> </TD> <TD width="61"> <P>IOPS</P> </TD> <TD width="113"> <P>19793</P> </TD> <TD width="113"> <P>19906</P> </TD> <TD width="113"> <P>19821</P> </TD> </TR> <TR> <TD width="61"> <P>MB/Sec</P> </TD> <TD width="113"> <P>154</P> </TD> <TD width="113"> <P>155</P> </TD> <TD width="113"> <P>154</P> </TD> </TR> <TR> <TD rowspan="2" width="114"> <P>sequential write</P> </TD> <TD width="61"> <P>IOPS</P> </TD> <TD width="113"> <P>835</P> </TD> <TD width="113"> <P>17311</P> </TD> <TD width="113"> <P>810</P> </TD> </TR> <TR> <TD width="61"> <P>MB/Sec</P> </TD> <TD width="113"> <P>6</P> </TD> <TD width="113"> <P>135</P> </TD> <TD width="113"> <P>6</P> </TD> </TR> <TR> <TD rowspan="2" width="114"> <P>random read</P> </TD> <TD width="61"> <P>IOPS</P> </TD> <TD width="113"> <P>352</P> </TD> <TD width="113"> <P>359</P> </TD> <TD width="113"> <P>353</P> </TD> </TR> <TR> <TD width="61"> <P>MB/Sec</P> </TD> <TD width="113"> <P>2</P> </TD> <TD width="113"> <P>2</P> </TD> <TD width="113"> <P>2</P> </TD> </TR> <TR> <TD rowspan="2" width="114"> <P>random write</P> </TD> <TD width="61"> <P>IOPS</P> </TD> <TD width="113"> <P>273</P> </TD> <TD width="113"> <P>273</P> </TD> <TD width="113"> <P>272</P> </TD> </TR> <TR> <TD width="61"> <P>MB/Sec</P> </TD> <TD width="113"> <P>2</P> </TD> <TD width="113"> <P>2</P> </TD> <TD width="113"> <P>2</P> </TD> </TR> </TBODY> </TABLE> <P><BR /><BR />In my case the numbers match Baseline 2 and, in all cases except sequential write, are close to Baseline 1.</P> <H2>CSVFS Case 4 - CSV Block Redirected IO on Non-Coordinating Node</H2> <P><BR />If you have a SAN, you can use LUN masking to hide the LUN from the node where you will run this test. If you are using Storage Spaces, then a Mirrored Space is always attached only on the Coordinator node, and any non-coordinator node will be in block redirected mode as long as you do not have the tiering heatmap enabled on this volume. See this blog post for more details&nbsp;on how Storage Spaces tiering affects the CSV IO mode.&nbsp;</P> <P>&nbsp;</P> <P>Cluster Shared Volume Diagnostics</P> <P><A href="#" target="_blank" rel="noopener">https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/cluster-shared-volume-diagnostics/ba-p/371908</A><BR /><BR />Please note that CSV never uses Block Redirected IO on the Coordinator node. Since the disk is always attached on the coordinator node, CSV will always use Direct IO there. So remember to run this test on a non-coordinating node.&nbsp; If you are network bound then you should see numbers close to Baseline 2.&nbsp; If your network is very fast and your bottleneck is storage, then numbers will be close to Baseline 1.&nbsp; If storage is also very fast and you are CPU bound, then numbers should be about 10-15% below Baseline 1.</P> <P>&nbsp;</P> <P><span class="lia-inline-image-display-wrapper lia-image-align-inline" style="width: 325px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90589iF30B78CB9FF945B7/image-size/large?v=v2&amp;px=999" role="button" /></span></P> <P><BR />Here are the numbers I’ve got on my hardware.
To make it easier for you to compare, I am repeating Baseline 1 and Baseline 2 numbers here.</P> <P>&nbsp;</P> <TABLE width="604"> <TBODY> <TR> <TD width="72"> <P>&nbsp;</P> </TD> <TD width="36"> <P>&nbsp;</P> </TD> <TD width="114"> <P>&nbsp;</P> </TD> <TD width="61"> <P>&nbsp;</P> </TD> <TD width="107"> <P>Queue Depth</P> </TD> <TD width="107"> <P>Baseline 1</P> </TD> <TD width="107"> <P>Baseline 2</P> </TD> </TR> <TR> <TD width="72"> <P>&nbsp;</P> </TD> <TD width="36"> <P>&nbsp;</P> </TD> <TD width="114"> <P>&nbsp;</P> </TD> <TD width="61"> <P>&nbsp;</P> </TD> <TD width="107"> <P>8</P> </TD> <TD width="107"> <P>&nbsp;</P> </TD> <TD width="107"> <P>&nbsp;</P> </TD> </TR> <TR> <TD rowspan="8" width="72"> <P>Unbuffered Write-Through</P> </TD> <TD rowspan="8" width="36"> <P>8K</P> </TD> <TD rowspan="2" width="114"> <P>sequential read</P> </TD> <TD width="61"> <P>IOPS</P> </TD> <TD width="107"> <P>19773</P> </TD> <TD width="107"> <P>19906</P> </TD> <TD width="107"> <P>19821</P> </TD> </TR> <TR> <TD width="61"> <P>MB/Sec</P> </TD> <TD width="107"> <P>154</P> </TD> <TD width="107"> <P>155</P> </TD> <TD width="107"> <P>154</P> </TD> </TR> <TR> <TD rowspan="2" width="114"> <P><FONT color="#993300"><STRONG>sequential write</STRONG></FONT></P> </TD> <TD width="61"> <P><FONT color="#993300"><STRONG>IOPS</STRONG></FONT></P> </TD> <TD width="107"> <P><FONT color="#993300"><STRONG>820</STRONG></FONT></P> </TD> <TD width="107"> <P><FONT color="#993300"><STRONG>17311</STRONG></FONT></P> </TD> <TD width="107"> <P><FONT color="#993300"><STRONG>810</STRONG></FONT></P> </TD> </TR> <TR> <TD width="61"> <P><FONT color="#993300"><STRONG>MB/Sec</STRONG></FONT></P> </TD> <TD width="107"> <P><FONT color="#993300"><STRONG>6</STRONG></FONT></P> </TD> <TD width="107"> <P><FONT color="#993300"><STRONG>135</STRONG></FONT></P> </TD> <TD width="107"> <P><FONT color="#993300"><STRONG>6</STRONG></FONT></P> </TD> </TR> <TR> <TD rowspan="2" width="114"> <P>random read</P> </TD> <TD width="61"> <P>IOPS</P> </TD> <TD width="107"> <P>352</P> </TD> <TD width="107"> <P>359</P> </TD> <TD width="107"> <P>353</P> </TD> </TR> <TR> <TD width="61"> <P>MB/Sec</P> </TD> <TD width="107"> <P>2</P> </TD> <TD width="107"> <P>2</P> </TD> <TD width="107"> <P>2</P> </TD> </TR> <TR> <TD rowspan="2" width="114"> <P>random write</P> </TD> <TD width="61"> <P>IOPS</P> </TD> <TD width="107"> <P>274</P> </TD> <TD width="107"> <P>273</P> </TD> <TD width="107"> <P>272</P> </TD> </TR> <TR> <TD width="61"> <P>MB/Sec</P> </TD> <TD width="107"> <P>2</P> </TD> <TD width="107"> <P>2</P> </TD> <TD width="107"> <P>2</P> </TD> </TR> </TBODY> </TABLE> <P><BR /><BR />In my case the numbers match Baseline 2 and are very close to Baseline 1.</P> <H2>Scale-out File Server (SoFS)</H2> <P><BR />To test a Scale-Out File Server you need to create the SOFS resource using Failover Cluster Manager or PowerShell, and add a share that maps to the same CSV volume that you have been using for the tests so far. Now your baselines will be the CSVFS cases. In the case of SOFS, SMB will deliver IO to CSVFS on a coordinating or non-coordinating node (depending on where the client is connected; you can use the PowerShell cmdlet Get-SmbWitnessClient to learn client connectivity), and then it will be up to CSVFS to deliver the IO to the disk. The path that CSVFS will take is predictable, but depends on the nature of your storage and current connectivity. You will need to select the appropriate baseline from CSVFS Cases 1 through 4.
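<BR /><BR />To help pick the right baseline, you can check where the SMB clients landed and which node currently coordinates the CSV. Here is a minimal PowerShell sketch (assuming the in-box FailoverClusters and SmbWitness cmdlets; exact output property names can vary between OS versions): <BR /><BR /># Which cluster node is each SOFS client connected to? <BR />Get-SmbWitnessClient | Select-Object ClientName, FileServerNodeName <BR /><BR /># Which node currently owns (coordinates) each CSV? <BR />Get-ClusterSharedVolume | Select-Object Name, OwnerNode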
<BR /><BR />If the numbers are similar to the CSV baseline, then you know that SMB above CSV is not adding overhead, and you can look at the numbers collected for the CSV baseline to detect where the bottleneck is.&nbsp; If the numbers are lower compared to the CSV baseline, then your client network is the bottleneck, and you should validate that it matches the difference between Baseline 3 and Baseline 1.</P> <P>&nbsp;</P> <P><span class="lia-inline-image-display-wrapper lia-image-align-inline" style="width: 314px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90590i401DF3CC3D0CCDD7/image-size/large?v=v2&amp;px=999" role="button" /></span></P> <P><BR /><BR /></P> <H2>Summary</H2> <P><BR />In this blog post we looked at how to tell if CSVFS performance for reads and writes is at expected levels. You can achieve that by running performance tests before and after adding the disk to CSV. You will use the ‘before’ numbers as your baseline. Then add the disk to CSV and test the different IO dispatch modes. Compare the observed numbers to the baselines to learn which layer is your bottleneck. <BR /><BR />Thanks! <BR />Vladimir Petter <BR />Principal Software Engineer <BR />High-Availability &amp; Storage <BR />Microsoft</P> <H3>To learn more, here are others in the&nbsp;Cluster Shared Volume (CSV)&nbsp;blog series:</H3> <P>&nbsp;</P> <P>Cluster Shared Volume (CSV) Inside Out<BR /><A href="https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/cluster-shared-volume-csv-inside-out/ba-p/371872" target="_self">https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/cluster-shared-volume-csv-inside-out/ba-p/371872</A></P> <P>&nbsp;</P> <P>Cluster Shared Volume Diagnostics<BR /><A href="https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/cluster-shared-volume-diagnostics/ba-p/371908" target="_self">https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/cluster-shared-volume-diagnostics/ba-p/371908</A></P> <P>&nbsp;</P> <P>Cluster Shared Volume Performance Counters<BR /><A href="https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/cluster-shared-volume-performance-counters/ba-p/371980" target="_self">https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/cluster-shared-volume-performance-counters/ba-p/371980</A></P> <P>&nbsp;</P> <P>Cluster Shared Volume Failure Handling<BR /><A href="https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/cluster-shared-volume-failure-handling/ba-p/371989" target="_self">https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/cluster-shared-volume-failure-handling/ba-p/371989</A></P> <P>&nbsp;</P> <P>Troubleshooting Cluster Shared Volume Auto-Pauses – Event 5120<BR /><A href="https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/troubleshooting-cluster-shared-volume-auto-pauses-8211-event/ba-p/371994" target="_self">https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/troubleshooting-cluster-shared-volume-auto-pauses-8211-event/ba-p/371994</A></P> <P>&nbsp;</P> <P>Troubleshooting Cluster Shared Volume Recovery Failure – System Event 5142<BR /><A href="https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/troubleshooting-cluster-shared-volume-recovery-failure-8211/ba-p/371997" target="_self">https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/troubleshooting-cluster-shared-volume-recovery-failure-8211/ba-p/371997</A></P> Fri, 31 Jan 2020 23:05:12 GMT
https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/cluster-shared-volume-a-systematic-approach-to-finding/ba-p/372049 Elden Christensen 2020-01-31T23:05:12Z Virtual Machine Compute Resiliency in Windows Server 2016 https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/virtual-machine-compute-resiliency-in-windows-server-2016/ba-p/372027 <P>In today’s cloud scale environments, commonly comprised of commodity hardware, transient failures have become more common than hard failures. In these circumstances, reacting aggressively to handle these transient failures can cause more downtime than it prevents. Windows Server 2016, therefore, introduces increased Virtual Machine (VM) resiliency to intra-cluster communication failures in your compute cluster.</P> <P>&nbsp;</P> <H2><STRONG> Interesting Transient Failure Scenarios </STRONG></H2> <P>The following are some potentially transient scenarios where it would be beneficial for your VM to be more resilient to intra-cluster communication failures: <BR /><BR /></P> <UL> <UL> <LI><STRONG> Node disconnected: </STRONG> The cluster service attempts to connect to all active nodes. The disconnected (Isolated) node cannot talk to any node in an active cluster membership.</LI> </UL> </UL> <UL> <UL> <LI><STRONG> Cluster Service crash: </STRONG> The Cluster Service on a node is down. The node is not communicating with any other node.</LI> </UL> </UL> <UL> <UL> <LI><STRONG> Asymmetric disconnect: </STRONG> The Cluster Service is attempting to connect to all active nodes. The isolated node can talk to at least one node in active cluster membership.</LI> </UL> </UL> <P><BR /><BR /></P> <H2><STRONG> New Failover Clustering States </STRONG></H2> <P>In Windows Server 2016, to reflect the new Failover Cluster workflow in the event of transient failures, three new states have been introduced: <BR /><BR /></P> <UL> <UL> <LI>A new VM state, <STRONG> Unmonitored </STRONG>, has been introduced in Failover Cluster Manager to reflect a VM that is no longer monitored by the cluster service.</LI> </UL> </UL> <P><span class="lia-inline-image-display-wrapper lia-image-align-inline" style="width: 692px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90566i7C17175D5B455013/image-size/large?v=v2&amp;px=999" role="button" /></span></P> <P><BR /><BR /></P> <UL> <LI>Two new cluster node states have been introduced to reflect nodes which are not in active membership but were host to VM role(s) before being removed from active membership:</LI> <UL> <LI><STRONG>Isolated:</STRONG> <UL> <LI>The node is no longer in an active membership</LI> <LI>The node continues to host the VM role</LI> </UL> </LI> </UL> </UL> <P><span class="lia-inline-image-display-wrapper lia-image-align-inline" style="width: 687px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90567i9BD4423EDDC472C8/image-size/large?v=v2&amp;px=999" role="button" /></span></P> <P><BR /><BR /></P> <UL> <LI><STRONG>Quarantine:</STRONG> <UL> <LI>The node is no longer allowed to join the cluster for a fixed time period (default: 2 hours)</LI> <LI>This action prevents flapping nodes from negatively impacting other nodes and the overall cluster health</LI> <LI>By default, a node is quarantined if it ungracefully leaves the cluster three times within an hour</LI> <LI>VMs hosted by the node are gracefully drained once quarantined</LI> <LI>No more than 25% of nodes can be quarantined at any given time</LI> </UL> </LI> </UL> <P><span class="lia-inline-image-display-wrapper lia-image-align-inline" style="width: 701px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90568iF88370F26F1E352B/image-size/large?v=v2&amp;px=999" role="button" /></span></P> <P>&nbsp;</P> <UL> <UL> <UL> <LI>The node can be brought out of quarantine by running the Failover Clustering PowerShell cmdlet Start-ClusterNode with the –CQ or –ClearQuarantine flag.</LI> </UL> </UL> </UL> <H2><span class="lia-inline-image-display-wrapper lia-image-align-inline" style="width: 301px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90569i91717F1F2DC4C998/image-size/large?v=v2&amp;px=999" role="button" /></span></H2> <P>&nbsp;</P> <H2><STRONG> VM Compute Resiliency Workflow in Windows Server 2016 </STRONG></H2> <P><BR />The VM resiliency workflow in a compute cluster is as follows: <BR /><BR /></P> <UL> <UL> <LI>In the event of a “transient” intra-cluster communication failure on a node hosting VMs, the node is placed into an <STRONG> Isolated </STRONG> state and removed from its active cluster membership. The VM on the node is now considered to be in an <STRONG> Unmonitored </STRONG> state by the cluster service. <BR /><BR /> <UL> <UL> <LI>File Storage backed (SMB): The VM continues to run in the <STRONG> Online </STRONG> state.</LI> </UL> </UL> <BR /> <UL> <UL> <LI>Block Storage backed (FC / FCoE / iSCSI / SAS): The VM is placed in the <STRONG> Paused Critical </STRONG> state. This is because the isolated node no longer has access to the Cluster Shared Volumes in the cluster.</LI> </UL> </UL> <BR /> <UL> <UL> <LI>Note that you can monitor the “true” state of the VM using the same tools as you would for a stand-alone VM (such as Hyper-V Manager).</LI> </UL> </UL> <BR /><BR /></LI> </UL> </UL> <P><BR /><BR /></P> <P><span class="lia-inline-image-display-wrapper lia-image-align-inline" style="width: 718px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90570i55BF44E598F54DDD/image-size/large?v=v2&amp;px=999" role="button" /></span></P> <P><BR /><BR /><BR /></P> <UL> <UL> <LI>If the isolated node continues to experience intra-cluster communication failures, after a certain period (default of 4 minutes), the VM is failed over to a suitable node in the cluster, and the node is then moved to a <STRONG> Down </STRONG> state.</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>If a node is isolated&nbsp;a certain number of&nbsp;times (default&nbsp;three times)&nbsp;within an hour, it is placed into a <STRONG> Quarantine </STRONG> state for a certain period (default two hours) and all the VMs from the node are drained to a suitable node in the cluster.</LI> </UL> </UL> <P><BR /><BR /></P> <H2><STRONG> Configuring Node Isolation and Quarantine settings </STRONG></H2> <P><BR />To achieve the desired Service Level Agreement guarantees for your environment, you can configure the following cluster settings, controlling how your node is placed in isolation or quarantine: <BR /><BR /><BR /></P> <TABLE> <TBODY> <TR> <TD><STRONG> Setting </STRONG></TD> <TD><STRONG> Description </STRONG></TD> <TD><STRONG> Default </STRONG></TD> <TD><STRONG> Values </STRONG></TD> </TR> <TR> <TD><STRONG> ResiliencyLevel </STRONG></TD> <TD><BR /> <P>Defines how unknown failures are handled</P> </TD> <TD>2</TD> <TD> <P>1 – Allow the node to be in the <STRONG> Isolated </STRONG> state only if the node gave a notification and it went away for a known reason; otherwise fail immediately.
Known reasons include a Cluster Service crash or Asymmetric Connectivity between nodes.</P> <BR /> <P>2 – Always let a node go to an <STRONG> Isolated </STRONG> state and give it time before taking over ownership of the VMs.</P> <P>PowerShell:</P> <PRE>(Get-Cluster).ResiliencyLevel = &lt;value&gt;</PRE> </TD> </TR> <TR> <TD> <P><STRONG>ResiliencyPeriod </STRONG></P> </TD> <TD> <P>Duration to allow the VM to run isolated (in seconds)</P> </TD> <TD><BR /> <P>240</P> </TD> <TD> <P>0 – Reverts to pre-Windows Server 2016 behavior</P> <BR /> <P>PowerShell:</P> <P><EM> Cluster property:&nbsp;</EM></P> <PRE>(Get-Cluster).ResiliencyDefaultPeriod = &lt;value&gt;</PRE> <P><EM>Group common&nbsp;property for granular control:&nbsp;</EM></P> <PRE>(Get-ClusterGroup "My VM").ResiliencyPeriod = &lt;value&gt;</PRE> <P>Note:&nbsp; A value of -1 for the&nbsp;group property causes the cluster property to be used.</P> </TD> </TR> <TR> <TD> <P><STRONG>QuarantineThreshold&nbsp;</STRONG></P> </TD> <TD> <P>Number of failures before a node is Quarantined.</P> </TD> <TD><BR /> <P>3</P> </TD> <TD> <P>PowerShell:</P> <PRE>(Get-Cluster).QuarantineThreshold = &lt;value&gt;</PRE> </TD> </TR> <TR> <TD> <P><STRONG>QuarantineDuration </STRONG></P> </TD> <TD> <P>Duration to disallow the cluster node from joining (in seconds)</P> </TD> <TD><BR /> <P>7200</P> </TD> <TD> <P>0xFFFFFFFF – Never allow the node to join</P> <BR /> <P>PowerShell:</P> <PRE>(Get-Cluster).QuarantineDuration = &lt;value&gt;</PRE> </TD> </TR> </TBODY> </TABLE> Fri, 20 Sep 2019 17:18:35 GMT https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/virtual-machine-compute-resiliency-in-windows-server-2016/ba-p/372027 John Marlin 2019-09-20T17:18:35Z Microsoft Virtual Academy – Learn Failover Clustering & Hyper-V https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/microsoft-virtual-academy-8211-learn-failover-clustering-hyper-v/ba-p/372020 <HTML> <HEAD></HEAD><BODY> <STRONG> First published on MSDN on Jun 02, 2015 </STRONG> <BR /> <P> Would you like to learn how to deploy, manage, and optimize a Windows Server 2012 R2 failover cluster?&nbsp; The <A href="#" target="_blank"> Microsoft Virtual Academy </A> is a free training website for IT Pros with over 2.7 million students.&nbsp; This technical course can teach you everything you want to know about Failover Clustering and Hyper-V high-availability and disaster recovery, and you don’t even need prior clustering experience!&nbsp; Start today: <A href="#" target="_blank"> http://www.microsoftvirtualacademy.com/training-courses/failover-clustering-in-windows-server-2012-r2 </A> . </P> <BR /> <P> Join clustering experts <A href="#" target="_blank"> Symon Perriman </A> (VP at <A href="#" target="_blank"> 5nine Software </A> and former Microsoft Technical Evangelist) and Elden Christensen (Principal Program Manager Lead for <A href="#" target="_blank"> Microsoft’s high-availability team </A> ) to explore the basic requirements for a failover cluster and how to deploy and validate it. Learn how to optimize the networking and storage configuration, and create a Scale-Out File Server. Hear the best practices for configuring and optimizing highly available Hyper-V virtual machines (VMs), and explore disaster recovery solutions with both Hyper-V Replica and multi-site clustering. Next look at advanced administration and troubleshooting techniques, then learn how System Center 2012 R2 can be used for large-scale failover cluster management and optimization.
</P> <BR /> <P> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90565iF0D7A6109155F23F" /> </P> <BR /> <P> This full day of training includes the following modules: </P> <BR /> <OL> <BR /> <LI> Introduction to Failover Clustering </LI> <BR /> <LI> Cluster Deployment and Upgrades </LI> <BR /> <LI> Cluster Networking </LI> <BR /> <LI> Cluster Storage &amp; Scale-Out File Server </LI> <BR /> <LI> Hyper-V Clustering </LI> <BR /> <LI> Multi-Site Clustering &amp; Scale-Out File Server </LI> <BR /> <LI> Advanced Cluster Administration &amp; Troubleshooting </LI> <BR /> <LI> Managing Clusters with System Center 2012 R2 </LI> <BR /> </OL> <BR /> <P> Learn everything you need to know about Failover Clustering on the Microsoft Virtual Academy: <A href="#" target="_blank"> http://www.microsoftvirtualacademy.com/training-courses/failover-clustering-in-windows-server-2012-r2 </A> </P> </BODY></HTML> Fri, 15 Mar 2019 21:57:59 GMT https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/microsoft-virtual-academy-8211-learn-failover-clustering-hyper-v/ba-p/372020 Elden Christensen 2019-03-15T21:57:59Z Storage Spaces Direct using Windows Server 2016 virtual machines https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/storage-spaces-direct-using-windows-server-2016-virtual-machines/ba-p/372018 <HTML> <HEAD></HEAD><BODY> <STRONG> First published on MSDN on May 27, 2015 </STRONG> <BR /> Windows Server 2016&nbsp;introduces Storage Spaces Direct (S2D), which enables building highly available storage systems that provide virtual shared storage across the servers&nbsp;using&nbsp;local disks. This is a significant step forward in Microsoft's Windows Server Software-defined Storage (SDS) story, as it simplifies the deployment and management of SDS systems and also unlocks the use of new classes of disk devices, such as SATA disk devices, that were previously unavailable to clustered Storage Spaces with shared disks. The following document has more details about the technology, functionality, and how to deploy on physical hardware. <BR /> <BR /> <A href="#" target="_blank"> Storage Spaces Direct Experience and Installation Guide </A> <BR /> <BR /> That experience and install guide notes that to be reliable and perform well in production, you need specific hardware (see the document for details).&nbsp; However, we recognize that you may want to experiment and kick the tires a bit in a test environment before you go and purchase hardware, and using Virtual Machines is an easy way to do that. You can configure Storage Spaces Direct inside a VM on top of any cloud... be it Hyper-V, Azure, or your hypervisor of preference. <BR /> <H2> Assumptions for this Blog </H2> <BR /> <UL> <BR /> <LI> You have a working knowledge of how to configure and manage Virtual Machines (VMs) </LI> <BR /> <LI> You have a basic knowledge of Windows Server Failover Clustering </LI> <BR /> </UL> <BR /> <H2> Pre-requisites </H2> <BR /> <UL> <BR /> <LI> Windows Server 2012 R2 or Windows Server 2016 host with the Hyper-V Role installed and configured to host VMs </LI> <BR /> <LI> Enough capacity to host two VMs with the configuration requirements noted below </LI> <BR /> <LI> Hyper-V hosts can be part of a host failover cluster, or stand-alone </LI> <BR /> <LI> VMs can be located on the same server, or distributed across servers (as long as the networking connectivity allows traffic to be routed to all VMs with as much throughput and as little latency as possible)
</LI> <BR /> </UL> <BR /> <H2> Overview of Storage Spaces Direct </H2> <BR /> S2D uses disks that are exclusively connected to one node of a Windows Server 2016 Failover Cluster and allows Storage Spaces to create pools using those disks.&nbsp;Virtual Disks (Spaces) that are configured on the pool will have their redundant data (mirrors or parity) spread across the disks in different nodes of the cluster.&nbsp; Since copies of the data are distributed, this allows access to the data even when a node fails or is shut down for maintenance.&nbsp; Documents that go into detail on Storage Spaces Direct can be found here: <A href="#" target="_blank"> http://aka.ms/S2D </A> <BR /> <BR /> You can implement S2D in VMs, with each VM configured with two or more virtual disks connected to the VM’s SCSI Controller.&nbsp; Each node of the cluster running inside a VM will be able to connect to its own disks, but S2D will allow all the disks to be used in Storage Pools that span the cluster nodes. <BR /> <BR /> S2D leverages SMB3 as the transport protocol to send the redundant data for the mirror or parity spaces to be distributed across the nodes. <BR /> <BR /> Effectively, this emulates the configuration in the following diagram: <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90556iFAAC697525822A77" /> <BR /> <H2> General Suggestions: </H2> <BR /> <UL> <BR /> <LI> <STRONG> Network </STRONG> .&nbsp; Since the network between the VMs transports the&nbsp;replication of&nbsp;data, the bandwidth and latency of the network will be a significant factor in the performance of the system.&nbsp; Keep this in mind as you test configurations. </LI> <BR /> <LI> <STRONG> VHDx location optimization </STRONG> .&nbsp; If you have a Storage Space that is configured for a three-way mirror, then the writes will be going to three separate disks (implemented as VHDx files on the hosts), each on different nodes of the cluster.&nbsp;Distributing the VHDx files across disks on the Hyper-V hosts will provide better response to the I/Os.&nbsp; For instance, if you have four disks or CSV volumes available on the Hyper-V hosts, and four VMs, then put the VHDx files for each VM on a separate disk (VM1 using CSV Volume 1, VM2 using CSV Volume 2, etc.). </LI> <BR /> </UL> <BR /> <H2> Enabling Storage Spaces Direct in Virtual Machines: </H2> <BR /> Windows Server 2016&nbsp;includes enhancements that&nbsp;automatically configure&nbsp;the&nbsp;storage&nbsp;pool and storage tiers in "Enable-ClusterStorageSpacesDirect".&nbsp;&nbsp;It&nbsp;uses a combination of&nbsp;bus type and media type to determine the devices to use for caching and the automatic configuration of the storage pool and storage tiers. <BR /> <BR /> Below is an example of the steps to do this: <BR /> #Create cluster <BR /> New-Cluster -Name &lt;ClusterName&gt; -Node &lt;node1&gt;,&lt;node2&gt;,&lt;node3&gt; -NoStorage <BR /> <BR /> #Enable Storage Spaces Direct <BR /> Enable-ClusterS2D <BR /> <BR /> #Create a volume <BR /> New-Volume -StoragePool "S2d*" -FriendlyName &lt;friendlyname&gt; -FileSystem CSVFS_REFS -StorageTiersFriendlyNames Performance, Capacity -StorageTierSizes &lt;2GB&gt;, &lt;10GB&gt; <BR /> <BR /> #Note: The values for the -StorageTierSizes parameter above are examples; you can specify the sizes you prefer. The -StorageTiersFriendlyNames of Performance and Capacity are the names of the default tiers created with the Enable-ClusterS2D cmdlet.
There are some cases where there may be only one of them, or someone could have added more tier definitions to the system. Use Get-StorageTier to confirm what storage tiers exist on your system. <BR /> <BR /> <BR /> <BR /> <H2> Configuration Option #1:&nbsp; Single Hyper-V Server (or Client) hosting VMs </H2> <BR /> The simplest configuration is one machine hosting all of the VMs used for the S2D system.&nbsp; In my case, a Windows Server 2016&nbsp;system running on a desktop class machine with 16GB of RAM and a modern 4-core processor. <BR /> <BR /> The VMs are configured identically. I have a virtual switch connected to the host’s network that goes out to the world for clients to connect, and I created a second virtual switch that is set to Internal network, to provide another network path for S2D to utilize between the VMs. <BR /> <BR /> The configuration looks like the following diagram: <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90557iB0638F4A07139F38" /> <BR /> <H3> Hyper-V Host Configuration </H3> <BR /> <UL> <BR /> <LI> <STRONG> Configure the virtual switches: </STRONG> Configure a virtual switch connected to the machine’s physical NIC, and another virtual switch configured for internal only. </LI> <BR /> </UL> <BR /> <STRONG> Example: </STRONG> Two virtual switches.&nbsp;One configured to allow network traffic out to the world, which I labeled “Public”.&nbsp; The other is configured to only allow network traffic between VMs configured on the same host, which I labeled “InternalOnly”. <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90558iBFE25B101B9B0A79" /> <BR /> <BR /> <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90559i470CD325971F02AC" /> <BR /> <H3> VM Configuration </H3> <BR /> <UL> <BR /> <LI> <STRONG> Virtual Machines: </STRONG> Create two or more Virtual Machines <BR /> <UL> <BR /> <LI> The servers which are going to be nodes in the S2D cluster cannot be configured as&nbsp;Domain Controllers </LI> <BR /> </UL> <BR /> </LI> <BR /> <LI> <STRONG> Memory: </STRONG> If using Dynamic Memory, the default of 1024 MB Startup RAM will be sufficient.&nbsp; If using Fixed Memory you should configure 4GB or more. </LI> <BR /> <LI> <STRONG> Network: </STRONG> Configure each VM with two network adapters.&nbsp; One connected to the virtual switch with the external connection, the other network adapter connected to the virtual switch that is configured for internal only. <BR /> <UL> <BR /> <LI> It’s always recommended to have more than one network, each connected to a separate virtual switch, for resiliency&nbsp;so that if one stops flowing network traffic, the other(s) can be used and allow the cluster and Storage Spaces Direct system to remain running. </LI> <BR /> </UL> <BR /> </LI> <BR /> <LI> <STRONG> Virtual Disks: </STRONG> Each VM needs a virtual disk that is used as a boot/system disk, and two or more virtual disks to be used for Storage Spaces Direct. <BR /> <UL> <BR /> <LI> Disks used for Storage Spaces Direct must be connected to the VM’s virtual SCSI Controller. </LI> <BR /> <LI> Like all other systems, the boot/system disks need to have unique SIDs, meaning they need to be installed from ISO or other install methods, and if using a duplicated VHDx it needs to have been generalized (for example using Sysprep.exe) before the copy was made.
</LI> <BR /> <LI> VHDx type and size:&nbsp; You need a&nbsp;minimum of&nbsp;two VHDx data disks presented to each node, in addition to the OS VHDx disk.&nbsp; The data disks can be either “dynamically expanding” or “fixed size”. </LI> <BR /> </UL> <BR /> </LI> <BR /> </UL> <BR /> <STRONG> Example: </STRONG> The following is the Settings dialog for a VM that is configured to be part of an S2D system on one of my Hyper-V hosts.&nbsp; It’s booting from the Windows Server VHD that I downloaded from Microsoft’s external download site, and that is connected to IDE Controller 0 (this had to be a Gen1 VM since the TP2 file that I downloaded is a VHD and not a VHDx).&nbsp;I created two VHDx files to be used by S2D, and they are connected to the SCSI Controller.&nbsp; Also note the VM is connected to the Public and InternalOnly virtual switches. <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90560iA5E771008C481501" /> <BR /> <BR /> <STRONG> Note: </STRONG> Do not enable the virtual machine’s Processor Compatibility setting.&nbsp; This setting disables certain processor capabilities that S2D requires inside the VM.&nbsp;This option is unchecked by default, and needs to stay that way.&nbsp; You can see this setting here: <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90561i624146F9C0ADC598" /> <BR /> <H2> Guest Cluster Configuration </H2> <BR /> Once the VMs are configured, creating and managing the S2D system inside the VMs is almost identical to the steps for supported physical hardware: <BR /> <OL> <BR /> <LI> Start the VMs </LI> <BR /> <LI> Configure the Storage Spaces Direct system, using the “Installation and Configuration” section of the guide linked here: <A href="#" target="_blank"> Storage Spaces Direct Experience and Installation Guide </A> <BR /> <OL> <BR /> <LI> Since this is in VMs using only VHDx files as its storage, there is no SSD or other faster media to allow tiers.&nbsp; Therefore, skip the steps that enable or configure tiers. </LI> <BR /> </OL> <BR /> </LI> <BR /> </OL> <BR /> <BR /> <BR /> <BR /> <H2> Configuration Option #2:&nbsp; VMs Spread Across Two or more Hyper-V Servers </H2> <BR /> You may not have a single machine with enough resources to host all VMs, or you may wish to spread the VMs across hosts to have greater resiliency to host failures.&nbsp; Here is a diagram showing a configuration spread across two nodes, as an example: <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90562iF8ABEB2585B74793" /> <BR /> <BR /> This configuration is very similar to the single host configuration.&nbsp; The differences are: <BR /> <BR /> <BR /> <H3> Hyper-V Host Configuration </H3> <BR /> <UL> <BR /> <LI> <STRONG> Virtual Switches </STRONG> :&nbsp; Each host is recommended to have a minimum of two virtual switches for the VMs to use.&nbsp; They need to be connected externally to different NICs on the systems.&nbsp; One can be on a network that is routed to the world for client access, and the other can be on a network that is not externally routed.&nbsp; Or, they both can be on externally routed networks.&nbsp; You can choose to use a single network, but then it will have all the client traffic and S2D traffic sharing common bandwidth, and if the single network goes down there is no redundancy for the S2D VMs to stay connected.
However, since this is for testing and verification of S2D, you are not bound by the network-loss resiliency requirements that we strongly suggest for production deployments. </LI> <BR /> </UL> <BR /> <STRONG> Example: </STRONG> On this system I have an internal 10/100 Intel NIC and a dual port Pro/1000 1gb card.&nbsp;All three NICs have virtual switches.&nbsp;I labeled one Public and connected it to the 10/100 NIC since my connection to the rest of the world is through a 100Mb infrastructure.&nbsp; I then have the 1gb NICs connected to a 1gb desktop switch (two different switches), and that provides my hosts two network paths between each other for S2D to use.&nbsp;As noted, three networks are not a requirement, but I have this available on my hosts so I use them all. <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90563i90CDCDFFD5DE4C52" /> <BR /> <H3> VM Configuration </H3> <BR /> <UL> <BR /> <LI> <STRONG> Network: </STRONG> If you choose to have a single network, then each VM will only have one network adapter in its configuration. </LI> <BR /> </UL> <BR /> <STRONG> Example: </STRONG> Below is a snip of a VM configuration on my two host configuration.&nbsp;You will note the following: <BR /> <UL> <BR /> <LI> <STRONG> Memory: </STRONG> I have this configured with 4GB of RAM instead of dynamic memory.&nbsp; It was a choice since I have enough memory resources on my nodes to dedicate memory. </LI> <BR /> <LI> <STRONG> Boot Disk: </STRONG> The boot disk is a VHDx, so I was able to use a Gen2 VM. </LI> <BR /> <LI> <STRONG> Data Disks: </STRONG> I chose to configure four data disks per VM.&nbsp; The minimum is two; I wanted to try four.&nbsp;All VHDx files are configured on the SCSI Controller (in Gen2 VMs there is no other choice). </LI> <BR /> <LI> <STRONG> Network Adapters: </STRONG> I have three adapters, each connected to one of the three virtual switches on the host to utilize the available network bandwidth that my hosts provide. </LI> <BR /> </UL> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90564i4D5E4E8155215EBB" /> <BR /> <H2> FAQ: </H2> <BR /> <H3> How does this differ from what I can do in VMs with Shared VHDx? </H3> <BR /> Shared VHDx remains a valid and recommended solution to provide shared storage to a guest cluster (a cluster running inside of VMs).&nbsp; It allows a VHDx to be accessed by multiple VMs at the same time in order to provide clustered shared storage.&nbsp; If any nodes (VMs) fail, the others have access to the VHDx and the clustered roles using the storage in the VMs can continue to access their data.
<BR /> <BR /> S2D allows clustered roles access to clustered storage spaces inside of the VMs without provisioning a shared VHDx on the host.&nbsp; S2D is also useful in scenarios where the private / public cloud does not support shared storage, such as Azure IaaS VMs.&nbsp; See this blog for more information on configuring Guest Clusters on Azure IaaS VMs, including with S2D: <A href="#" target="_blank"> https://blogs.msdn.microsoft.com/clustering/2017/02/14/deploying-an-iaas-vm-guest-clusters-in-microsoft-azure/ </A> <BR /> <H2> References </H2> <BR /> <A href="#" target="_blank"> Storage Spaces Direct Experience and Installation Guide </A> </BODY></HTML> Fri, 15 Mar 2019 21:57:43 GMT https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/storage-spaces-direct-using-windows-server-2016-virtual-machines/ba-p/372018 John Marlin 2019-03-15T21:57:43Z Windows Server 2016 Failover Cluster Troubleshooting Enhancements - Active Dump https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/windows-server-2016-failover-cluster-troubleshooting/ba-p/372008 <HTML> <HEAD></HEAD><BODY> <STRONG> First published on MSDN on May 18, 2015 </STRONG> <BR /> <H2> Active Dump </H2> <BR /> The following enhancement is not specific to Failover Cluster or even Windows Server.&nbsp; However, it has significant advantages when you are troubleshooting and getting memory.dmp files from servers running Hyper-V. <BR /> <H2> Memory Dump Enhancement – Active memory dump </H2> <BR /> Servers that are used as Hyper-V hosts tend to have a significant amount of RAM, and a complete memory dump includes processor state as well as a dump of what is in RAM. This results in an extremely large dmp file for a Full Dump.&nbsp; On these Hyper-V hosts, the parent partition is usually a small percentage of the overall RAM of the system, with the majority of the RAM allocated to Virtual Machines (VMs).&nbsp; It’s the parent partition memory that is interesting in debugging a bugcheck or other bluescreen, and the VM memory pages are not important for diagnosing most problems. <BR /> <BR /> Windows Server 2016 introduces a dump type of “Active memory dump”, which filters out most memory pages allocated to VMs and therefore makes the memory.dmp much smaller and easier to save/copy.
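<BR /> <BR /> Before experimenting, you may want to confirm what dump type a host is currently configured for. Here is a minimal PowerShell sketch that simply reads the same CrashControl registry values discussed in the Configuration section below (FilterPages will be absent unless Active memory dump has already been selected): <BR /> # Read the current crash dump configuration from the registry <BR /> Get-ItemProperty -Path HKLM:\System\CurrentControlSet\Control\CrashControl | Select-Object CrashDumpEnabled, FilterPages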
<BR /> <BR /> As an example, I have a system with 16GB of RAM running Hyper-V and I initiated bluescreens with different crash dump settings to see what the resulting memory.dmp file size would be.&nbsp; I also tried “Active memory dump” with no VMs running and with 2 VMs taking up 8 of the 16GB of memory to see how effective it would be: <BR /> <TABLE> <TBODY><TR> <TD> </TD> <TD> <STRONG> Memory.dmp in KB </STRONG> </TD> <TD> <STRONG> % Compared to Complete </STRONG> </TD> </TR> <TR> <TD> <P> Complete Dump: </P> </TD> <TD> <P> 16,683,673 </P> </TD> <TD> </TD> </TR> <TR> <TD> <P> Active Dump (no VMs): </P> </TD> <TD> <P> 1,586,493 </P> </TD> <TD> <P> 10% </P> </TD> </TR> <TR> <TD> <P> Active Dump (VMs with 8GB RAM total): </P> </TD> <TD> <P> 1,629,497 </P> </TD> <TD> <P> 10% </P> </TD> </TR> <TR> <TD> <P> Kernel Dump (VMs with 8GB RAM total) </P> </TD> <TD> <P> 582,261 </P> </TD> <TD> <P> 3% </P> </TD> </TR> <TR> <TD> <P> Automatic Dump (VMs with 8GB RAM total) </P> </TD> <TD> <P> 587,941 </P> </TD> <TD> <P> 4% </P> </TD> </TR> </TBODY></TABLE> <BR /> *The size of the Active Dump as compared to a complete dump will vary depending on the total host memory and what is running on the system. <BR /> <BR /> In looking at the numbers in the table above, keep in mind that the Active Dump is larger than the kernel dump, but includes the usermode space of the parent partition, while being 10% of the size of the complete dump that would have normally been required to get the usermode space. <BR /> <H2> Configuration </H2> <BR /> The new dump type can be chosen through the Startup and Recovery dialog as shown here: <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90554i0200C178AFF871CE" /> <BR /> The memory.dmp type can also be set through the registry under the following key.&nbsp; If changing it directly in the registry, the change will not take effect until the system is restarted: <STRONG> HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\CrashControl\ </STRONG> <BR /> <BR /> <STRONG> Note: </STRONG> Information on setting memory dump types directly in the registry for previous versions can be found in a blog <A href="#" target="_blank"> here </A> . <BR /> <BR /> To configure the Active memory.dmp there are 2 values that need to be set; both are REG_DWORD values. <BR /> <BR /> HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\CrashControl\ <STRONG> CrashDumpEnabled </STRONG> <BR /> <BR /> The <STRONG> CrashDumpEnabled </STRONG> value needs to be 1, which is the same as a complete dump. <BR /> <BR /> And <BR /> <BR /> HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\CrashControl\ <STRONG> FilterPages. </STRONG> <BR /> <BR /> The <STRONG> FilterPages </STRONG> value needs to be set to 1 <STRONG> . </STRONG> <BR /> <BR /> <STRONG> Note: </STRONG> The FilterPages value will not be found under the HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\CrashControl\ key unless the GUI “Startup and Recovery” dialog is used to set the dump type to “Active memory dump”, or you manually create and set the value.
<BR /> <BR /> <BR /> <BR /> If you would like to set this via Windows PowerShell, here is the flow and example: <BR /> <OL> <BR /> <LI> Gets the value of CrashDumpEnabled </LI> <BR /> <LI> Sets the value of CrashDumpEnabled to 1 (so effectively this is now set to Complete dump). </LI> <BR /> <LI> Gets the value of FilterPages (note that there is an error because this value doesn’t exist yet). </LI> <BR /> <LI> Sets the value of FilterPages to 1 (this changes it from Complete dump to Active dump) </LI> <BR /> <LI> Gets the value of FilterPages to verify it was set correctly and exists now. </LI> <BR /> </OL> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90555iD5535B45B4A1EAB9" /> <BR /> <BR /> <BR /> <BR /> Here is a TXT version of what is shown above, to make it easier to copy/paste: <BR /> Get-ItemProperty -Path HKLM:\System\CurrentControlSet\Control\CrashControl -Name CrashDumpEnabled <BR /> Get-ItemProperty -Path HKLM:\System\CurrentControlSet\Control\CrashControl -Name FilterPages <BR /> Set-ItemProperty -Path HKLM:\System\CurrentControlSet\Control\CrashControl -Name CrashDumpEnabled -Value 1 <BR /> Set-ItemProperty -Path HKLM:\System\CurrentControlSet\Control\CrashControl -Name FilterPages -Value 1 </BODY></HTML> Fri, 15 Mar 2019 21:56:15 GMT https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/windows-server-2016-failover-cluster-troubleshooting/ba-p/372008 John Marlin 2019-03-15T21:56:15Z Windows Server 2016 Failover Cluster Troubleshooting Enhancements - Cluster Log https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/windows-server-2016-failover-cluster-troubleshooting/ba-p/372005 <HTML> <HEAD></HEAD><BODY> <STRONG> First published on MSDN on May 14, 2015 </STRONG> <BR /> <H2> Cluster Log Enhancements </H2> <BR /> This is the first in a series of blogs that will provide details about the improvements we have made in the tools and methods for troubleshooting Failover Clusters with Windows Server 2016. <BR /> <BR /> Failover Cluster has diagnostic logs running on each server that allow in-depth troubleshooting of problems without having to reproduce the issue. This log is valuable for Microsoft’s support as well as those out there who have expertise at troubleshooting failover clusters. <BR /> <BR /> <STRONG> Tip: </STRONG> Always go to the System event log first when troubleshooting an issue.&nbsp;Failover cluster posts events in the System event log that are often enough to understand the nature and scope of the problem.&nbsp;It also gives you the specific date/time of the problem, which is useful if you do look at other event logs or dig into the cluster.log if needed. <BR /> <H3> Generating the Cluster.log </H3> <BR /> This is not new, but will be helpful information for those that aren’t familiar with generating the cluster log. <BR /> <BR /> <STRONG> Get-ClusterLog </STRONG> is the Windows PowerShell cmdlet that will generate the cluster.log on each server that is a member of the cluster and is currently running. The output looks like this on a 3-node cluster: <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90550i6FA37884C15C4AC6" /> <BR /> <BR /> The Cluster.log files can be found in the <STRONG> &lt;systemroot&gt;\cluster\reports </STRONG> directory (usually c:\windows\cluster\Reports) on each node.
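<BR /> <BR /> As a minimal, copy-pasteable sketch (the screenshot above shows the same thing), run the following from any node; it is part of the in-box FailoverClusters module: <BR /> # Generate cluster.log on every currently running node of the cluster <BR /> Get-ClusterLog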
<BR /> <BR /> You can use the <STRONG> –Destination </STRONG> parameter to cause the files to be copied to a specified directory with the server’s name appended to the log name, which makes it much easier to get and analyze logs from multiple servers: <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90551i659555D30A750811" /> <BR /> <BR /> Other useful parameters are discussed in the rest of this&nbsp;blog. <BR /> <H2> What’s New </H2> <BR /> I’m going to highlight the enhancements to the information in the Windows Server 2016 cluster.log that will be the most interesting and useful to the general audience interested in troubleshooting failover clusters,&nbsp;and leave detailing every item in the log to a future blog(s).&nbsp; I’m including references and links to resources related to troubleshooting clusters and using the cluster log at the end of this blog. <BR /> <H3> TimeZone Information </H3> <BR /> The cluster.log is a dump of the information from the system and captured in a text file. The time stamps default to UTC (which some people call GMT).&nbsp; Therefore, if you are in a time zone that is UTC+8 you need to look at the time stamp in the cluster log and add 8 hours.&nbsp;For instance, if you are in that time zone and a problem occurred at 1:38pm (13:38) local time, the UTC time stamp in the cluster log would be 05:38. <BR /> <BR /> We offer 2 enhancements in the cluster.log that make this time zone and UTC offset easier to discover and work with: <BR /> <OL> <BR /> <LI> <STRONG> UTC&nbsp;offset of the server: </STRONG> The top of the cluster.log notes the UTC offset of&nbsp;the originating server.&nbsp; In the example&nbsp;below, it notes that UTC is the server's local time plus a 7-hour offset (420&nbsp;minutes).&nbsp; Specifically noting this&nbsp;offset in the log removes the guesswork related to the system’s time zone&nbsp;setting. </LI> <BR /> <LI> <STRONG> Cluster&nbsp;log uses UTC or local time </STRONG> . The top of the cluster.log notes whether the&nbsp;log was created using UTC or local time for the timestamps.&nbsp; The <STRONG> –UseLocalTime </STRONG> parameter for <STRONG> Get-ClusterLog </STRONG> causes the cluster.log to write&nbsp;timestamps that are already adjusted for the server’s time zone instead of&nbsp;using UTC.&nbsp; This is not new, but it&nbsp;became obvious that it’s helpful to know if that parameter was used or not, so&nbsp;it’s noted in the log. </LI> <BR /> </OL> <BR /> [===Cluster ===] <BR /> UTC= localtime + time zone offset; with daylight savings, the time zone offset of&nbsp;this machine is 420 minutes, or 7 hours <BR /> <BR /> The&nbsp;logs were generated using Coordinated Universal Time (UTC). 'Get-ClusterLog&nbsp;-UseLocalTime' will generate in local time. <BR /> <BR /> <STRONG> Tip: </STRONG> The sections of the cluster.log are encased in&nbsp;[===&nbsp;&nbsp; ===], which makes it easy to&nbsp;navigate down the log to each section by doing a find on “[===”.&nbsp; As a bit of trivia, this format was chosen&nbsp;because it kind of looks like a Tie Fighter and we thought it looked cool. <BR /> <H3> Cluster Objects </H3> <BR /> The cluster has objects that are part of its&nbsp;configuration.&nbsp; Getting the details of&nbsp;these objects can be useful in diagnosing problems.&nbsp; These objects include resources, groups,&nbsp;resource types, nodes, networks, network interfaces, and volumes.&nbsp; The cluster.log now dumps these objects in a&nbsp;Comma Separated Values list with headers.
<BR /> Here is an example: <BR /> [===Networks ===] <BR /> <BR /> Name,Id,description,role,transport,ignore,AssociatedInterfaces,PrefixList,address,addressMask,ipV6Address,state,linkSpeed,rdmaCapable,rssCapable,autoMetric,metric, <BR /> <BR /> Cluster Network 1,27f2d19b-7e23-4ee3-a226-287d4ebe9113,,1,TCP/IP,false,82e5107c-5375-473a-ab9f-5b6450bf5c7f30ff5ff6-00a3-494b-84b6-62a27ef99bb3 187c582d-f23c-48f4-8c37-6a452b2a238b,10.10.1.0/24 ,10.10.1.0,255.255.255.0,,3,1000000000,false,false,true,39984, <BR /> <BR /> Cluster Network 2,e6efd1f6-474b-410a-bd7b-5ece99476cd8,,1,TCP/IP,false,57d9b74d-8d9e-4afe-8667-e91e0bd23412617bb075-3803-4e5e-a039-db513d60603d 51c4fd42-9cb4-4f2e-a65c-01fea9bfa582,10.10.3.0/24 ,10.10.3.0,255.255.255.0,,3,1000000000,false,false,true,39985, <BR /> <BR /> Cluster Network 3,1a5029c7-7961-40bb-b6b9-dcbbe4187034,,3,TCP/IP,false,d3cdef35-82bc-4a60-8ed4-5c2b278f7c0e83c7c4b8-b588-425c-bfae-0c69d7a45bcd c1fb12d2-071b-4cb2-8ca7-fa04e972cd1c,157.59.132.0/22 2001:4898:28:4::/64,157.59.132.0,255.255.252.0,2001:4898:28:4::,3,100000000,false,false,true,80000, <BR /> These sections can be consumed by any application that can parse CSV text.&nbsp; Or, you can copy/paste into an Excel spreadsheet, which makes it easier to read as well as provides filter/sort/search.&nbsp; For the example below, I pasted the above section into a spreadsheet and then used the “Text to Columns” action in the “DATA” tab of Microsoft’s Excel. <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90552i4E9878C0594C9EDD" /> <BR /> <H3> New Verbose Log </H3> <BR /> New for Windows Server 2016 is the DiagnosticVerbose event&nbsp;channel.&nbsp; This is a new channel that is&nbsp;in addition to the Diagnostic channel for FailoverClustering. <BR /> <BR /> In most cases the diagnostic channel, with the log&nbsp;level set to the default of 3, captures enough information that an expert&nbsp;troubleshooter or Microsoft’s support engineers can understand a problem.&nbsp; However, there are occasions where we need&nbsp;more verbose logging, and it’s necessary to set the cluster log level to 5, causing the diagnostic channel to start adding the verbose level of events to&nbsp;the log.&nbsp; After changing the log level&nbsp;you have to reproduce the problem and analyze the logs again. <BR /> <BR /> The question arises, why don’t we suggest keeping the log&nbsp;level at 5?&nbsp; The answer is that it causes&nbsp;the logs to have more events and therefore wrap faster.&nbsp; Being able to go back for hours or days in&nbsp;the logs is also desirable, so the quicker wrapping poses its own&nbsp;troubleshooting problem. <BR /> <BR /> To accommodate wanting verbose logging for the most recent&nbsp;time frame, and having logging that provides adequate history, we implemented a&nbsp;parallel diagnostic channel we call DiagnosticVerbose.&nbsp; The DiagnosticVerbose log is always set for&nbsp;the equivalent of the cluster log level 5 (verbose) and runs in parallel to the Diagnostic channel for FailoverClustering.
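<BR /> <BR /> For reference, if you do decide to raise the classic Diagnostic channel to verbose anyway, here is a minimal sketch using the in-box cmdlets (3 is the default level; remember the faster-wrapping trade-off described above): <BR /> # Check the current cluster log level <BR /> (Get-Cluster).ClusterLogLevel <BR /> # Raise it to verbose <BR /> Set-ClusterLog -Level 5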
<BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90553iB56384B32331446B" /> <BR /> <BR /> You can find the DiagnosticVerbose section in the&nbsp;cluster.log by doing a find on “DiagnosticVerbose”.&nbsp; It will take you to the section header: <BR /> [===&nbsp;Microsoft-Windows-FailoverClustering/DiagnosticVerbose ===] <BR /> <BR /> [Verbose]&nbsp;00000244.00001644::2015/04/22-01:04:29.623 DBG <BR /> [RCM] rcm::PreemptionTracker::GetPreemptedGroups() <BR /> [Verbose]&nbsp;00000244.00001644::2015/04/22-01:04:29.623 DBG <BR /> [RCM] got asked for preempted groups, returning 0 records <BR /> The Diagnostic channel (default log level of 3) can be found&nbsp;by doing a find on “Cluster Logs”: <BR /> [=== Cluster Logs ===] <BR /> <BR /> 00000e68.00000cfc::2015/03/23-22:12:24.682 DBG&nbsp;&nbsp; [NETFTAPI] received NsiInitialNotification <BR /> 00000e68.00000cfc::2015/03/23-22:12:24.684 DBG&nbsp;&nbsp; [NETFTAPI] received NsiInitialNotification <BR /> <H3> Events From Other Channels </H3> <BR /> There is a “Tip” above that notes the recommendation to start in the system event log first.&nbsp; However, it’s not uncommon for someone to generate the cluster logs and send them to their internal 3 <SUP> rd </SUP> tier support or to other experts.&nbsp; Going back and getting the system or other event logs that may be useful in diagnosing the problem can take time, and sometimes the logs have already wrapped or have been cleared. <BR /> <BR /> New in the Windows Server 2016 cluster log, the following event channels will also be dumped into the cluster.log for each node.&nbsp; Since they are all in one file, you no longer need to go to the nodes and pull each log individually. <BR /> [=== System ===] <BR /> <BR /> [=== Microsoft-Windows-FailoverClustering/Operational logs ===] <BR /> <BR /> [=== Microsoft-Windows-ClusterAwareUpdating-Management/Admin logs ===] <BR /> <BR /> [=== Microsoft-Windows-ClusterAwareUpdating/Admin logs ===] <BR /> Here is an example: <BR /> [=== System ===] <BR /> <BR /> [System] <BR /> 00000244.00001b3c::2015/03/24-19:46:34.671 ERR <BR /> Cluster resource 'Virtual Machine &lt;name&gt;' of type 'Virtual <BR /> Machine' in clustered role '&lt;name&gt;' failed. <BR /> Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it.&nbsp;Check the resource and group state using Failover Cluster Manager or the Get-ClusterResource Windows PowerShell cmdlet. <BR /> [System] 00000244.000016dc::2015/04/14-23:43:09.458 INFO&nbsp;The Cluster service has changed the password of account 'CLIUSR' on node '&lt;node name&gt;'. <BR /> <STRONG> Tip: </STRONG> If the size of the cluster.log file is bigger than you desire, the –TimeSpan switch for Get-ClusterLog will limit how far back in time (in minutes) it will go for events.&nbsp; For instance, Get-ClusterLog –TimeSpan 10 will cause the cluster.log on each node to be created and only include events from the last 10 minutes.&nbsp; That includes the Diagnostic, DiagnosticVerbose, and other channels that are included in the report.
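<BR /> <P> Pulling the Get-ClusterLog parameters discussed in this blog together, here is a minimal sketch (the folder path is an arbitrary example, not from the original post): </P> <DIV> <P> # Generate cluster.log on every node and copy the files into one folder, <BR /> # using local time stamps and only the last 30 minutes of events <BR /> Get-ClusterLog -Destination C:\ClusterLogs -UseLocalTime -TimeSpan 30 <BR /> <BR /> # Jump between sections in a collected log by searching for the "[===" headers <BR /> Select-String -Path C:\ClusterLogs\*.log -Pattern "[===" -SimpleMatch </P> </DIV> <P> The CSV-formatted object sections described earlier can likewise be copied out of the log and piped through ConvertFrom-Csv if you prefer sortable objects over a spreadsheet. </P>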
<BR /> <H3> Cluster.log References: </H3> <BR /> Troubleshooting Windows Server 2012 Failover Clusters, How to get to the root of the problem: <A href="#" target="_blank"> http://windowsitpro.com/windows-server-2012/troubleshooting-windows-server-2012-failover-clusters </A> <BR /> <BR /> Get-ClusterLog: <A href="#" target="_blank"> https://technet.microsoft.com/en-us/library/hh847315.aspx </A> <BR /> <BR /> Set-ClusterLog: <A href="#" target="_blank"> https://technet.microsoft.com/en-us/library/ee461043.aspx </A> </BODY></HTML> Fri, 15 Mar 2019 21:55:50 GMT https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/windows-server-2016-failover-cluster-troubleshooting/ba-p/372005 John Marlin 2019-03-15T21:55:50Z Invitation: Provide feedback, comments, and vote on Cluster UserVoice page https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/invitation-provide-feedback-comments-and-vote-on-cluster/ba-p/371999 <HTML> <HEAD></HEAD><BODY> <STRONG> First published on MSDN on May 11, 2015 </STRONG> <BR /> The clustering team has a new UserVoice page here: <A href="#" target="_blank"> http://windowsserver.uservoice.com/forums/295074-clustering </A> that is part of the Windows Server UserVoice page: <A href="#" target="_blank"> http://windowsserver.uservoice.com/forums/295047-general-feedback </A> . <BR /> <BR /> We welcome your feedback, comments, and votes – we would like to make Windows Server 2016 the best operating system for you and your customers. <BR /> <BR /> PS – You can find Windows Server 2016 Technical Preview 2 (TP2) here: <A href="#" target="_blank"> http://www.microsoft.com/en-us/evalcenter/evaluate-windows-server-technical-preview </A> <BR /> <BR /> <P> -RH. </P> </BODY></HTML> Fri, 15 Mar 2019 21:55:07 GMT https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/invitation-provide-feedback-comments-and-vote-on-cluster/ba-p/371999 Rob Hindman 2019-03-15T21:55:07Z Failover Clustering Sessions @ Ignite 2015 in Chicago https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/failover-clustering-sessions-ignite-2015-in-chicago/ba-p/371998 <HTML> <HEAD></HEAD><BODY> <STRONG> First published on MSDN on Apr 30, 2015 </STRONG> <BR /> If you are going to&nbsp;Ignite 2015 in Chicago next week, here are&nbsp;the cluster-related sessions you might want to check out.&nbsp; We'll be talking about some exciting new enhancements coming in vNext.&nbsp; Weren't able to make&nbsp;it this year?&nbsp; Don't worry, you can stream all the sessions live, and they are also recorded so that you can watch them any time at <A href="#" target="_blank"> https://channel9.msdn.com/ </A> <BR /> <BR /> <STRONG> BRK3474 - <A href="#" target="_blank"> Enabling Private Cloud Storage Using Servers With Local Disks </A> </STRONG> <BR /> Have you ever wanted to build a Scale-Out File Server using shared nothing Direct Attached Storage (DAS) hardware like SATA or NVMe disks? We cover advances in Microsoft Software Defined Storage that enable service providers to build Scale-Out File Servers using Storage Spaces with shared nothing DAS hardware. <BR /> <BR /> <STRONG> BRK3484 - <A href="#" target="_blank"> Upgrading Your Private Cloud to Windows Server 2012 R2 and Beyond! </A> </STRONG> <BR /> We are moving fast, and want to help you to keep on top of the latest technology! This session covers the features and capabilities that will enable you to upgrade to Windows Server 2012 R2 and to Windows Server vNext with the least disruption.
Understand cluster role migration, cross version live migration, rolling upgrades, and more. <BR /> <BR /> <STRONG> BRK3487 - <A href="#" target="_blank"> Stretching Failover Clusters and Using Storage Replica in Windows Server vNext </A> </STRONG> <BR /> In this session we discuss the deployment considerations of taking a Windows Server Failover Cluster and stretching across sites to achieve disaster recovery. This session discusses the networking, storage, and quorum model considerations. This session also discusses new enhancements coming in vNext to enable multi-site clusters. <BR /> <BR /> <STRONG> BRK3489 - <A href="#" target="_blank"> Exploring Storage Replica in Windows Server vNext </A> </STRONG> <BR /> Delivering business continuity involves more than just high availability; it means disaster preparedness. In this session, we discuss the new Storage Replica feature, including scenarios, architecture, requirements, and demos. Along with our new stretch cluster option, it also covers use of Storage Replica in cluster-to-cluster and non-clustered scenarios. And we have swag! <BR /> <BR /> <STRONG> BRK3558 - <A href="#" target="_blank"> Microsoft SQL Server End-to-End High Availability and Disaster Recovery </A> </STRONG> <BR /> In this session we look at options which are available to the administrator of a Microsoft SQL Server 2014 database server so that the system can provide the 99.99% or higher uptime that customers demand. These options include Failover Cluster Instances, as well as AlwaysOn Availability Groups within a single site, stretching across multiple sites, as well as stretching into the Microsoft Azure public cloud. Learn when to use each technique, how to decide which option to implement, and how to implement these solutions. <BR /> <BR /> <STRONG> BRK4105 - <A href="#" target="_blank"> Under the Hood with DAGs </A> </STRONG> <BR /> Join this session to learn from the DAG master Tim McMichael. The session examines how a Microsoft Exchange 2013 Database Availability Group leverages the Windows Failover Clustering service. As a bonus, it provides a sneak peek at how this will evolve with Exchange Server vNext and Windows 10. It also explores registry replication, cluster networking, and cluster features such as dynamic quorum and dynamic witness. After this session, administrators should have an understanding of cluster integration and basic support knowledge. <BR /> <BR /> <P> <STRONG> BRK3496 - <A href="#" target="_blank"> Deploying Private Cloud Storage with Dell Servers and Windows Server vNext </A> </STRONG> <BR /> The storage industry is going through strategic tectonic shifts. In this session, we’ll walk through Dell’s participation in the Microsoft Software Defined Storage journey and how cloud scale scenarios are shaping solutions. We will provide technical guidance for building Storage Spaces in Windows Server vNext clusters on the Dell PowerEdge R730xd platform.
</P> </BODY></HTML> Fri, 15 Mar 2019 21:55:03 GMT https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/failover-clustering-sessions-ignite-2015-in-chicago/ba-p/371998 Elden Christensen 2019-03-15T21:55:03Z Troubleshooting Cluster Shared Volume Recovery Failure – System Event 5142 https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/troubleshooting-cluster-shared-volume-recovery-failure-8211/ba-p/371997 <HTML> <HEAD></HEAD><BODY> <STRONG> First published on MSDN on Mar 26, 2015 </STRONG> <BR /> In the last post <A href="#" target="_blank"> http://blogs.msdn.com/b/clustering/archive/2014/12/08/10579131.aspx </A> we discussed event 5120, which indicates that Cluster Shared Volumes (CSV) observed an error and attempted to recover. In this post we will discuss cases when recovery does not succeed. When CSV recovery does not succeed, an Event 5142 is logged to the System event log. <BR /> <DIV> <BR /> Log Name:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; System <BR /> Source:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Microsoft-Windows-FailoverClustering <BR /> Event ID:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 5142 <BR /> Task Category: Cluster Shared Volume <BR /> Level:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Error <BR /> Description: <BR /> Cluster Shared Volume 'Volume1' ('Cluster Disk 1') is no longer accessible from this cluster node because of error '(1460)'. Please troubleshoot this node's connectivity to the storage device and network connectivity. <BR /> </DIV> <BR /> In this post we will go over several possible root causes which may result in a 5142, and how to identify which issue you are hitting. <BR /> <H2> Cluster Service Failed </H2> <BR /> When the Cluster Service fails on a node, the Cluster Shared Volumes file system (CSVFS) will invalidate all file objects on all the volumes on that node. You may not see an event 5142 in this case because the cluster may not have an opportunity to log it before the service fails. You can find these cases by scanning Microsoft-Windows-FailoverClustering-CsvFs/Operational for the following sequence of events: <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90548i7AF2F7A2EB6FBBFF" /> <BR /> <BR /> <BR /> <BR /> The first 8960 event indicates that CSVFS is moving the volume to the Init state and that DcmSequenceId is empty, which means that this command did not come from the cluster service; CSVFS initiated this activity on its own. <BR /> <DIV> <BR /> <P> Log Name:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Microsoft-Windows-FailoverClustering-CsvFs/Operational <BR /> Source:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Microsoft-Windows-FailoverClustering-CsvFs-Diagnostic <BR /> Event ID:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 8960 <BR /> Task Category: Volume State Change Started <BR /> Level:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Information <BR /> Keywords:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Volume State <BR /> Description: <BR /> Volume {ca4ce06f-6b06-4405-b058-fd9d1cf869b3} transitioning from Init to Init.
<BR /> Event Xml: <BR /> &lt;Event xmlns="<A href="#" target="_blank">http://schemas.microsoft.com/win/2004/08/events/event</A>"&gt; <BR /> &lt;EventData&gt; <BR /> &lt;Data Name="Volume"&gt;0xffffe000badfb1b0&lt;/Data&gt; <BR /> &lt;Data Name="VolumeId"&gt;{CA4CE06F-6B06-4405-B058-FD9D1CF869B3}&lt;/Data&gt; <BR /> &lt;Data Name="CurrentState"&gt;0&lt;/Data&gt; <BR /> &lt;Data Name="NewState"&gt;0&lt;/Data&gt; <BR /> &lt;Data Name="DcmSequenceId"&gt; <BR /> &lt;/Data&gt; <BR /> &lt;/EventData&gt; <BR /> &lt;/Event&gt; </P> <BR /> <BR /> </DIV> <BR /> The next event, 9216, tells us that CSVFS successfully finished the transition to the Init state. <DIV> <BR /> <P> Log Name:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Microsoft-Windows-FailoverClustering-CsvFs/Operational <BR /> Source:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Microsoft-Windows-FailoverClustering-CsvFs-Diagnostic <BR /> Event ID:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 9216 <BR /> Task Category: Volume State Change Completed <BR /> Level:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Information <BR /> Keywords:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Volume State <BR /> Description: <BR /> Volume {ca4ce06f-6b06-4405-b058-fd9d1cf869b3} moved to state Init. Reason Transition to Init; Status 0x0. <BR /> Event Xml: <BR /> &lt;Event xmlns="<A href="#" target="_blank">http://schemas.microsoft.com/win/2004/08/events/event</A>"&gt; <BR /> &lt;EventData&gt; <BR /> &lt;Data Name="Volume"&gt;0xffffe000badfb1b0&lt;/Data&gt; <BR /> &lt;Data Name="VolumeId"&gt;{CA4CE06F-6B06-4405-B058-FD9D1CF869B3}&lt;/Data&gt; <BR /> &lt;Data Name="State"&gt;0&lt;/Data&gt; <BR /> &lt;Data Name="Source"&gt;8&lt;/Data&gt; <BR /> &lt;Data Name="Status"&gt;0x0&lt;/Data&gt; <BR /> &lt;Data Name="DcmSequenceId"&gt; <BR /> &lt;/Data&gt; <BR /> &lt;/EventData&gt; <BR /> &lt;/Event&gt; </P> <BR /> <BR /> </DIV> <BR /> And finally an event 49152 is logged, which provides details on why this transition was done: in this case, because CSVFS observed that the Cluster Service was terminating. <DIV> <BR /> <P> Log Name:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Microsoft-Windows-FailoverClustering-CsvFs/Operational <BR /> Source:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Microsoft-Windows-FailoverClustering-CsvFs-Diagnostic <BR /> Event ID:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 49152 <BR /> Task Category: ClusterDisconnected <BR /> Level: &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Information <BR /> Keywords:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ClusterServiceState <BR /> Description: <BR /> Cluster service disconnected. <BR /> Event Xml: <BR /> &lt;Event xmlns="<A href="#" target="_blank">http://schemas.microsoft.com/win/2004/08/events/event</A>"&gt; <BR /> &lt;EventData&gt; <BR /> &lt;Data Name="FileObject"&gt;0xffffe000bab597c0&lt;/Data&gt; <BR /> &lt;Data Name="ProcessId"&gt;0x1070&lt;/Data&gt; <BR /> &lt;/EventData&gt; <BR /> &lt;/Event&gt; </P> <BR /> <BR /> </DIV>
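<BR /> <P> If you would rather not scroll through Event Viewer, this event sequence can be pulled out of the channel with PowerShell. A minimal, hedged sketch (the -MaxEvents cap is an arbitrary example, not from the original post): </P> <DIV> <P> # List recent volume state-change (8960, 9216) and cluster-disconnect (49152) <BR /> # events from the CSVFS operational channel discussed above <BR /> Get-WinEvent -LogName 'Microsoft-Windows-FailoverClustering-CsvFs/Operational' -MaxEvents 200 | <BR /> &nbsp;&nbsp;&nbsp;&nbsp;Where-Object { $_.Id -in 8960, 9216, 49152 } | <BR /> &nbsp;&nbsp;&nbsp;&nbsp;Select-Object TimeCreated, Id, TaskDisplayName, Message </P> </DIV>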
<BR /> In the cluster logs, every time the cluster starts you will see a log statement which contains “--------------+”, so you can look for the last statement from the previous cluster instance to see what the cluster service was doing right before it terminated. <BR /> <BR /> Here is what was logged in the cluster log when I terminated the Cluster Service: <BR /> 00001070.000015e0::2014/10/23-00:12:29.885 DBG&nbsp;&nbsp; [API] s_ApiOpenKey: "ServerForNFS\ReadConfig" failed with error 2 <BR /> <BR /> And then the ClusSvc was started again: <BR /> 00000f10.00000fa4::2014/10/23-00:13:03.287 INFO&nbsp; -----------------------------+ LOG BEGIN +----------------------------- <BR /> <BR /> In the event that it is unknown why the cluster service terminated, you can read the cluster logs backwards from this point to try to understand why the service went down. <BR /> <BR /> When the cluster service fails on one of the nodes, the CSV volumes on that node will go down. CSV volumes will stay up on the other nodes. If the node with the failed cluster service was the coordinator, then CSV on the other nodes will be paused until the cluster fails over and brings the disk online on a surviving node. <BR /> <H2> Disk Failure or Offline </H2> <BR /> When the cluster exhausts all restart attempts to bring a disk online after too many failures, or when a user manually takes the disk offline, the cluster will move the CSV volumes corresponding to this disk to the Init state. <BR /> <BR /> For instance, if you take a disk offline using Failover Cluster Manager, then in the Microsoft-Windows-FailoverClustering-CsvFs/Operational channel we would see the following events: <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90549iD28EF0B3832401F1" /> <BR /> <P> Log Name:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Microsoft-Windows-FailoverClustering-CsvFs/Operational <BR /> Source:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Microsoft-Windows-FailoverClustering-CsvFs-Diagnostic <BR /> Event ID:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 8960 <BR /> Task Category: Volume State Change Started <BR /> Level:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Information <BR /> Keywords:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Volume State <BR /> Description: <BR /> Volume {ca4ce06f-6b06-4405-b058-fd9d1cf869b3} transitioning from Active to Init. <BR /> Event Xml: <BR /> &lt;Event xmlns="<A href="#" target="_blank">http://schemas.microsoft.com/win/2004/08/events/event</A>"&gt; <BR /> &lt;EventData&gt; <BR /> &lt;Data Name="Volume"&gt;0xffffe000885581b0&lt;/Data&gt; <BR /> &lt;Data Name="VolumeId"&gt;{CA4CE06F-6B06-4405-B058-FD9D1CF869B3}&lt;/Data&gt; <BR /> &lt;Data Name="CurrentState"&gt;4&lt;/Data&gt; <BR /> &lt;Data Name="NewState"&gt;0&lt;/Data&gt; <BR /> &lt;Data Name="DcmSequenceId"&gt;&amp;lt;1:60129542151&amp;gt;&amp;lt;60129542147&amp;gt;&lt;/Data&gt; <BR /> &lt;/EventData&gt; <BR /> &lt;/Event&gt; </P> <BR /> <BR /> <DIV> <BR /> <P> Log Name:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Microsoft-Windows-FailoverClustering-CsvFs/Operational <BR /> Source:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Microsoft-Windows-FailoverClustering-CsvFs-Diagnostic <BR /> Event ID:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 9216 <BR /> Task Category: Volume State Change Completed <BR /> Level:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Information <BR /> Keywords:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Volume State <BR /> Description: <BR /> Volume {ca4ce06f-6b06-4405-b058-fd9d1cf869b3} moved to state Init. Reason Transition to Init; Status 0x0.
<BR /> Event Xml: <BR /> &lt;Event xmlns="<A href="#" target="_blank">http://schemas.microsoft.com/win/2004/08/events/event</A>"&gt; <BR /> &lt;EventData&gt; <BR /> &lt;Data Name="Volume"&gt;0xffffe000885581b0&lt;/Data&gt; <BR /> &lt;Data Name="VolumeId"&gt;{CA4CE06F-6B06-4405-B058-FD9D1CF869B3}&lt;/Data&gt; <BR /> &lt;Data Name="State"&gt;0&lt;/Data&gt; <BR /> &lt;Data Name="Source"&gt;8&lt;/Data&gt; <BR /> &lt;Data Name="Status"&gt;0x0&lt;/Data&gt; <BR /> &lt;Data Name="DcmSequenceId"&gt;&amp;lt;1:60129542151&amp;gt;&amp;lt;60129542147&amp;gt;&lt;/Data&gt; <BR /> &lt;/EventData&gt; <BR /> &lt;/Event&gt; </P> <BR /> <BR /> </DIV> <BR /> DcmSequenceId is not empty, so this means that the command came to CSVFS from the cluster. Using DcmSequenceId &lt;1:60129542151&gt;&lt;60129542147&gt; you can correlate this to the place in the cluster log where the cluster service initiated that state transition: <BR /> [Verbose] 000004dc.00001668::2014/10/23-00:57:00.587 INFO&nbsp; [DCM] FilterAgent: ChangeCsvFsState: uniqueId ca4ce06f-6b06-4405-b058-fd9d1cf869b3, state CsvFsVolumeStateInit, sequence &lt;1:60129542151&gt;&lt;60129542147&gt;
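<BR /> <P> If you need to harvest these DcmSequenceId values across many events, here is a small hedged sketch that parses them out of the event XML shown above (the -MaxEvents cap is an arbitrary example): </P> <DIV> <P> # Extract the time, event ID, and DcmSequenceId from recent 8960/9216 events <BR /> Get-WinEvent -LogName 'Microsoft-Windows-FailoverClustering-CsvFs/Operational' -MaxEvents 200 | <BR /> &nbsp;&nbsp;&nbsp;&nbsp;Where-Object { $_.Id -in 8960, 9216 } | <BR /> &nbsp;&nbsp;&nbsp;&nbsp;ForEach-Object { <BR /> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;$xml = [xml]$_.ToXml() <BR /> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;$seq = ($xml.Event.EventData.Data | Where-Object Name -eq 'DcmSequenceId').'#text' <BR /> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[pscustomobject]@{ Time = $_.TimeCreated; Id = $_.Id; DcmSequenceId = $seq } <BR /> &nbsp;&nbsp;&nbsp;&nbsp;} </P> </DIV> <P> The resulting sequence strings can then be matched against the [DCM] entries in the cluster log, as shown above. </P>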
<BR /> <BR /> And working from this point backwards, I see that there was a manual offline of the disk: <BR /> [Verbose] 000004dc.000012dc::2014/10/23-00:57:00.527 INFO&nbsp; [RCM] rcm::RcmApi::OfflineResource: (Cluster Disk 3, 0) <BR /> <BR /> If the disk is failing, then you will find records about that, and the reason for the failure, in the cluster logs. <BR /> <H2> Volume Is Failing Too Often </H2> <BR /> If the cluster observes that a CSV volume on one of the nodes is failing too often, and the volume is unable to stay in a good state for 5 minutes without running into an auto-pause, then the volume will be taken down on that node. On the other nodes the volume will remain active. After several minutes the cluster will attempt to revive the volume back to the active state. In this case you would see statements in the cluster logs similar to this: <BR /> 00000ab8.0000109c::2014/10/21-02:36:23.388 INFO&nbsp; [DCM] UnmapPolicy::enter_CountingToBad(aec3c2e8-a7eb-45e9-9509-f63190659ba4): goodTimer P0...75, badTimer R0...150, badCounter 1 state CountingToBad <DIV> <BR /> <P> 00000ab8.0000109c::2014/10/21-02:36:23.544 INFO&nbsp; [DCM] CsvFs Listener: state [volume aec3c2e8-a7eb-45e9-9509-f63190659ba4, sequence &lt;&gt;&lt;145&gt;, state CsvFsVolumeStateChangeFromInit-&gt;CsvFsVolumeStateInit, status 0x0] </P> <BR /> <BR /> </DIV> <BR /> And in the Microsoft-Windows-FailoverClustering-CsvFs/Operational channel you will see correlating events 8960 and 9216 with a matching DcmSequenceId - &lt;&gt;&lt;145&gt;. Note that the first part of the sequence Id is empty. This is because the action is not global for all nodes, but is only for the current node. In general the sequence format is: <BR /> <P> &lt;Id of the cluster node that initiated this action:Sequence number of the action on the node that initiated this action&gt;&lt;Sequence number of the action on the node where the action was executed&gt; </P> <BR /> <BR /> <H2> Recovery Is Taking Too Long </H2> <BR /> <H2> Disk Is Timing Out During Online </H2> <BR /> Once CSVFS completes a state transition it waits for the cluster to start the next one, but CSVFS will not wait indefinitely. Depending on the volume type and state, CSVFS will wait from 1 to 10 minutes. For example, with a snapshot volume CSVFS would wait for only 1 minute. For a volume which has some dirty pages in the file system cache, CSVFS would wait for 3 minutes. For a volume which has no dirty pages, CSVFS would wait up to 10 minutes. If the cluster does not start the next state transition in that time, then CSVFS will move the volume to the Init state. <BR /> <BR /> In the Microsoft-Windows-FailoverClustering-CsvFs/Operational channel you will see events 8960 and 9216 similar to the case when the cluster service was terminated. The DcmSequenceId will be empty because the state change was initiated by CSVFS. You will NOT see event 49152 saying that the cluster service has disconnected. In the cluster logs you will see a log record that the volume went to the Init state, and the sequence number will be empty. <BR /> 00000af0.00000f88::2014/10/17-06:37:58.325 INFO&nbsp; [DCM] CsvFs Listener: state [volume aec3c2e8-a7eb-45e9-9509-f63190659ba4, sequence, state CsvFsVolumeStateChangeFromInit-&gt;CsvFsVolumeStateInit, status 0x0] <BR /> <BR /> The next step is to find the node which owns the cluster Physical Disk resource at the moment of the failure and use the cluster logs to try to identify why the disk online is taking so much time. <BR /> <H2> CSV State Transition Is Taking Too Long </H2> <BR /> This case is similar to the case above – CSV transitioned the volume to the Init state because it timed out waiting for the cluster to start the next state transition. The reason why this happens may vary. In this case the disk may be staying online and healthy the whole time, but CSVFS on another node might take too long to finish its state transitions. The result will be similar, and the events that you find in the CSVFS logs and cluster logs will be similar. <BR /> <H2> Summary </H2> <BR /> In this blog post we went over common reasons for the event 5142, when the Cluster Service fails to recover CSV and you see your workloads failing. This blog post explained the sequences of events you will see in this case in the CSVFS operational channel, and how to correlate them with cluster logs. <BR /> <BR /> Thanks!
<BR /> Vladimir Petter <BR /> Principal Software Engineer <BR /> High-Availability &amp; Storage <BR /> Microsoft <BR /> <H3> To learn more, here are others in the&nbsp;Cluster Shared Volume (CSV)&nbsp;blog series: </H3> <BR /> Cluster Shared Volume (CSV) Inside Out <BR /> <A href="#" target="_blank"> http://blogs.msdn.com/b/clustering/archive/2013/12/02/10473247.aspx </A> <BR /> <BR /> Cluster Shared Volume Diagnostics <BR /> <A href="#" target="_blank"> http://blogs.msdn.com/b/clustering/archive/2014/03/13/10507826.aspx </A> <BR /> <BR /> Cluster Shared Volume Performance Counters <BR /> <A href="#" target="_blank"> http://blogs.msdn.com/b/clustering/archive/2014/06/05/10531462.aspx </A> <BR /> <BR /> Cluster Shared Volume Failure Handling <BR /> <A href="#" target="_blank"> http://blogs.msdn.com/b/clustering/archive/2014/10/27/10567706.aspx </A> <BR /> <BR /> Troubleshooting Cluster Shared Volume Auto-Pauses – Event 5120 <BR /> <A href="#" target="_blank"> http://blogs.msdn.com/b/clustering/archive/2014/12/08/10579131.aspx </A> </BODY></HTML> Fri, 15 Mar 2019 21:54:54 GMT https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/troubleshooting-cluster-shared-volume-recovery-failure-8211/ba-p/371997 Elden Christensen 2019-03-15T21:54:54Z Troubleshooting Cluster Shared Volume Auto-Pauses – Event 5120 https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/troubleshooting-cluster-shared-volume-auto-pauses-8211-event/ba-p/371994 <HTML> <HEAD></HEAD><BODY> <STRONG> First published on MSDN on Dec 08, 2014 </STRONG> <BR /> In the previous post <A href="#" target="_blank"> http://blogs.msdn.com/b/clustering/archive/2014/10/27/10567706.aspx </A> we discussed how CSVFS abstracts failures from applications by going through the pause/resume state machine, and we also explained what an auto-pause is. The focus for this blog post will be auto-pauses. <BR /> <BR /> CSV auto-pauses when it receives any failure from Direct IO or Block Redirected IO, with a few exceptions like STATUS_INVALID_USER_BUFFER, STATUS_CANCELLED, STATUS_DEVICE_DATA_ERROR or STATUS_VOLMGR_PACK_CONFIG_OFFLINE, which indicate either a user error or that storage is misconfigured. In both cases there is no value in trying to abstract the failure in CSV because as soon as the IO is retried it will get the same error. <BR /> <BR /> When File System Redirected IO fails (including any metadata IO), CSV auto-pauses only when the error is one of the well-known status codes.
Here is the list that we have as of Windows Server Technical Preview for vNext: <BR /> <DIV> <BR /> <P> STATUS_BAD_NETWORK_PATH <BR /> STATUS_BAD_NETWORK_NAME <BR /> STATUS_CONNECTION_DISCONNECTED <BR /> STATUS_UNEXPECTED_NETWORK_ERROR <BR /> STATUS_NETWORK_UNREACHABLE <BR /> STATUS_IO_TIMEOUT <BR /> STATUS_CONNECTION_RESET <BR /> STATUS_CONNECTION_ABORTED <BR /> STATUS_NO_SUCH_DEVICE <BR /> STATUS_DEVICE_DOES_NOT_EXIST <BR /> STATUS_VOLUME_DISMOUNTED <BR /> STATUS_NETWORK_NAME_DELETED <BR /> STATUS_VOLMGR_VOLUME_LENGTH_INVALID <BR /> STATUS_CLUSTER_CSV_AUTO_PAUSE_ERROR <BR /> STATUS_LOGON_FAILURE <BR /> STATUS_NETWORK_SESSION_EXPIRED <BR /> STATUS_CLUSTER_CSV_VOLUME_DRAINING <BR /> STATUS_CLUSTER_CSV_VOLUME_DRAINING_SUCCEEDED_DOWNLEVEL <BR /> STATUS_DEVICE_BUSY <BR /> STATUS_DEVICE_NOT_CONNECTED <BR /> STATUS_CLUSTER_CSV_NO_SNAPSHOTS <BR /> STATUS_FT_WRITE_FAILURE <BR /> STATUS_USER_SESSION_DELETED </P> <BR /> <BR /> </DIV> <BR /> This list is based on our experience and many years of testing, and includes status codes that you would see when the communication channel fails or when the storage stack is failing. Please note that this list evolves and changes as we discover new scenarios that we can help make more resilient using auto-pause. This list contains status codes that indicate communication/authentication/configuration failure, status codes that indicate that NTFS or the disk on the coordinating node is failing, and a few CSV-specific status codes. <BR /> <BR /> There are also a few cases when CSV might auto-pause itself to handle some inconsistency that it observes in its state, or when it cannot get to the desired state without compromising data correctness. An example would be when a file is opened from multiple computers and, on a write from one cluster node, CSV needs to purge the cache on another node; if that purge fails because someone has locked the pages, then we would auto-pause to see if a retry would avoid the problem. In these cases you might see auto-pauses with status codes like STATUS_UNSUCCESSFUL, STATUS_PURGE_FAILED, or STATUS_CACHE_PAGE_LOCKED. <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90546iC238E07CD3C0E6F6" /> <BR /> <BR /> When CSV conducts an auto-pause, an event 5120 is written to the System event log.&nbsp; The description field will contain the specific status code that resulted in the auto-pause. <BR /> <DIV> <BR /> <P> Log Name:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; System <BR /> Source:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Microsoft-Windows-FailoverClustering <BR /> Event ID:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 5120 <BR /> Task Category: Cluster Shared Volume <BR /> Level:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Error <BR /> Description: <BR /> Cluster Shared Volume 'Volume1' ('Cluster Disk 1') is no longer available on this node because of 'STATUS_VOLUME_DISMOUNTED(C000026E)'. All I/O will temporarily be queued until a path to the volume is reestablished.
</P> <BR /> <BR /> </DIV> <BR /> Additional information is available in the CSV operational log channel Microsoft-Windows-FailoverClustering-CsvFs/Operational.&nbsp; This can be found in Event Viewer under ‘Applications and Services Logs \ Microsoft \ Windows \ FailoverClustering-CsvFs \ Operational’.&nbsp; Here is an Event 9296 logged to that channel: <BR /> <DIV> <BR /> <P> Log Name:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Microsoft-Windows-FailoverClustering-CsvFs/Operational <BR /> Source:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Microsoft-Windows-FailoverClustering-CsvFs-Diagnostic <BR /> Event ID:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 9296 <BR /> Task Category: Volume Autopause <BR /> Level:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Information <BR /> Keywords:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Volume State <BR /> Description: <BR /> Volume {ca4ce06f-6bAE-4405-b328-fd9d123469b3} is autopaused. Status 0xC000026E. Source: Tunneled metadata IO </P> <P> Event Xml: <BR /> &lt;Event xmlns="<A href="#" target="_blank">http://schemas.microsoft.com/win/2004/08/events/event</A>"&gt; <BR /> &lt;EventData&gt; <BR /> &lt;Data Name="Volume"&gt;0xffffe000badfb1b0&lt;/Data&gt; <BR /> &lt;Data Name="VolumeId"&gt;{CA4CE06F-6B06-4405-B058-FD9D1CF869B3}&lt;/Data&gt; <BR /> &lt;Data Name="CountersName"&gt;Volume1me3&lt;/Data&gt; <BR /> &lt;Data Name="FromDirectIo"&gt;false&lt;/Data&gt; <BR /> &lt;Data Name="Irp"&gt;0xffffcf800fb72990&lt;/Data&gt; <BR /> &lt;Data Name="Status"&gt;0xc000026e&lt;/Data&gt; <BR /> &lt;Data Name="Source"&gt;11&lt;/Data&gt; <BR /> &lt;Data Name="Parameter1"&gt;0x0&lt;/Data&gt; <BR /> &lt;Data Name="Parameter2"&gt;0x0&lt;/Data&gt; <BR /> &lt;/EventData&gt; <BR /> &lt;/Event&gt; </P> <BR /> <BR /> </DIV> <BR /> In addition to the status code, Event 9296 will contain the source of the auto-pause, and in some cases may contain additional parameters helping to further narrow down the scenario. Here is the complete list of sources. <BR /> <DIV> <BR /> <OL> <BR /> <LI> Unknown </LI> <BR /> <LI> Tunneled metadata IO </LI> <BR /> <LI> Apply byte range lock on down-level file system </LI> <BR /> <LI> Remove all byte range locks </LI> <BR /> <LI> Remove byte range lock </LI> <BR /> <LI> Continuous availability resume complete </LI> <BR /> <LI> Continuous availability resume complete for paging file object </LI> <BR /> <LI> Continuous availability set bypass </LI> <BR /> <LI> Continuous availability suspend handle on close </LI> <BR /> <LI> Stop buffering on file close </LI> <BR /> <LI> Remove all byte range locks on file close </LI> <BR /> <LI> User requested </LI> <BR /> <LI> Purge on oplock break </LI> <BR /> <LI> Advance VDL on oplock break </LI> <BR /> <LI> Flush on oplock break </LI> <BR /> <LI> Memory allocation to stop buffering </LI> <BR /> <LI> Stopping buffering </LI> <BR /> <LI> Setting maximum oplock level </LI> <BR /> <LI> Oplock break acknowledge to CSV filter </LI> <BR /> <LI> Oplock break acknowledge </LI> <BR /> <LI> Downgrade buffering asynchronous </LI> <BR /> <LI> Oplock upgrade </LI> <BR /> <LI> Query oplock status </LI> <BR /> <LI> Single client notification complete </LI> <BR /> <LI> Single client notification stop oplock </LI> <BR /> </OL> <BR /> </DIV> <BR /> <H2> Auto Pause due to STATUS_IO_TIMEOUT </H2> <BR /> One of the common auto-pause reasons is STATUS_IO_TIMEOUT, caused by issues with intra-cluster communication over the network.&nbsp; This happens when the SMB client observes that an IO is taking over 1-4 minutes (depending on the IO type).
If an IO times out, the SMB client will attempt to fail IOs over to another channel in a multichannel configuration; if all channels are exhausted, it will fail the IO back to the caller. <BR /> <BR /> <EM> You can learn more about SMB multichannel in the following blog posts </EM> <BR /> <P> Configuring IP Addresses and Dependencies for Multi-Subnet Clusters <BR /> <A href="#" target="_blank"> <EM> http://blogs.msdn.com/b/clustering/archive/2011/01/05/10112055.aspx </EM> </A> </P> <BR /> <P> Configuring IP Addresses and Dependencies for Multi-Subnet Clusters - Part II <BR /> <A href="#" target="_blank"> <EM> http://blogs.msdn.com/b/clustering/archive/2011/01/19/10117423.aspx </EM> </A> </P> <BR /> <P> Configuring IP Addresses and Dependencies for Multi-Subnet Clusters - Part III <BR /> <A href="#" target="_blank"> <EM> http://blogs.msdn.com/b/clustering/archive/2011/08/31/10204142.aspx </EM> </A> </P> <BR /> <P> Force network traffic through a specific NIC with SMB multichannel <BR /> <A href="#" target="_blank"> <EM> http://blogs.msdn.com/b/emberger/archive/2014/09/15/force-network-traffic-through-a-specific-nic-with-smb-multichannel.aspx </EM> </A> </P> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90547i9B436D608AAE56B5" /> <BR /> <BR /> On the diagram above you can see a two-node cluster, where Node 2 is the coordinator node. Let’s say an application running on Node 1 issued an IO or metadata operation that CSVFS forwarded to NTFS over SMB (follow the red path on the diagram above). Any of the components along the red path (network, file system drivers attached to NTFS, volume and disk drivers, software and hardware on the storage box, firmware on the disk) can take a long time. Once the SMB client sends an IO, it starts a timer. If the IO does not complete in 1-4 minutes, the SMB client will suspect that there might be something wrong with the network. It will disconnect the socket and retry all IOs using another socket on another channel. If all channels have been tried, the IO will fail with STATUS_IO_TIMEOUT. In the case of CSV there are some internal controls (for example an oplock request) that cannot simply be retried on another channel, so the SMB client will fail them back to CSVFS, which will trigger an auto-pause with STATUS_IO_TIMEOUT. <BR /> <BR /> Please note that CSVFS on the coordinating node does not use SMB to communicate to NTFS, so these IOs would not complete with STATUS_IO_TIMEOUT from the SMB client. <BR /> <BR /> The next question is how we can find what operation is taking time, and why. <BR /> <BR /> First, please note that an auto-pause with STATUS_IO_TIMEOUT would be reported on a non-coordinating node (Node 1 on the diagram above) while the IO is stuck on the coordinating node (Node 2 on the diagram above). <BR /> <BR /> Second, please note that the nature of the issue we are dealing with is a hang, and in this case traces are not particularly helpful because in the traces it is hard to tell what activity took time, and where it was stuck. We found two approaches to be helpful when troubleshooting these sorts of issues: <BR /> <OL> <BR /> <LI> Collect a dump file on the coordinating node while the hanging IO is in flight.
There are a number of options for how you can create a dump file, starting with the most brutal: <BR /> <OL> <BR /> <LI> Bugchecking your machine using Sysinternals NotMyFault ( <A href="#" target="_blank"> http://technet.microsoft.com/en-us/sysinternals/bb963901 </A> ) </LI> <BR /> <LI> Configuring KD and using Sysinternals LiveKd ( <A href="#" target="_blank"> http://technet.microsoft.com/en-us/sysinternals/bb897415.aspx </A> ) </LI> <BR /> <LI> WinDbg. In fact, this approach was so productive that starting with Windows Server Technical Preview, the cluster, on observing an auto-pause due to STATUS_IO_TIMEOUT on a non-coordinating node, will collect a kernel live dump on the coordinating node. We can open the dump file using WinDbg ( <A href="#" target="_blank"> http://msdn.microsoft.com/en-us/library/windows/hardware/ff551063(v=vs.85).aspx </A> ) and try to find out what IO is taking a long time and why. </LI> <BR /> </OL> <BR /> </LI> <BR /> <LI> On the coordinating node keep a Windows Performance Toolkit ( <A href="#" target="_blank"> http://msdn.microsoft.com/en-us/library/windows/apps/dn391696.aspx </A> ) session running with wait analysis enabled ( <A href="#" target="_blank"> https://channel9.msdn.com/Shows/Defrag-Tools/Defrag-Tools-43-WPT-Wait-Analysis </A> ). When the non-coordinating node auto-pauses with STATUS_IO_TIMEOUT, stop the WPT session and collect the etl file. Open the etl using WPA and try to locate the IO that is taking a long time, the thread that is executing this IO, and what this thread has been blocked on. In some cases it might be helpful to also keep the WPT sampling profiler enabled, in case the thread that is handling the IO is not stuck forever but periodically makes some forward progress. </LI> <BR /> </OL> <BR /> The reason for STATUS_IO_TIMEOUT may vary from software to configuration to hardware issues. Always check your system event log for events indicating HBA or disk failures. Make sure you have all the latest updates. <BR /> <P> Recommended hotfixes and updates for Windows Server 2012 R2-based failover clusters <BR /> <A href="#" target="_blank"> http://support.microsoft.com/kb/2920151 </A> </P> <BR /> <P> Recommended hotfixes and updates for Windows Server 2012-based failover clusters <BR /> <A href="#" target="_blank"> http://support.microsoft.com/kb/2784261 </A> </P> <BR /> Make sure your storage and disks have the latest firmware supported for your environment. If the problem does not go away, troubleshoot it using one of the approaches described above and analyze the dump or trace. <BR /> <BR /> You may at times see Event 5120s in the System event log. I would suggest not worrying about infrequent 5120s as long as they happen only once in a while (once a month or once a week), the cluster recovers, and you do not see workload failures. But I would suggest monitoring them and doing some data mining on the frequency and type (source and status code) of the auto-pauses (a sketch follows at the end of this section).&nbsp; In some scenarios, an Event 5120 may be expected.&nbsp; This blog is an example of when an Event 5120 is expected during snapshot deletion: <A href="#" target="_blank"> http://blogs.msdn.com/b/clustering/archive/2014/02/26/10503497.aspx </A> <BR /> <BR /> For instance, if you see that the frequency of auto-pauses increased after a certain date, check whether you installed or enabled a feature around that time that was not on or not used before. <BR /> <BR /> You might be able to correlate an auto-pause with some other activity that was happening on one of the cluster nodes around the same time. For example, a backup or an antivirus scan. <BR /> <BR /> Or perhaps you see auto-pauses happening only when a certain node is coordinating. Then there might be some issue with the hardware on that node. <BR /> <BR /> Or perhaps a physical disk is going bad and causing failures; then try to look for storage errors in the system event log and query the disk resiliency counters using PowerShell <BR /> <DIV> <BR /> <P> Get-PhysicalDisk | Get-StorageReliabilityCounter |&nbsp; ft DeviceId,ReadErrorsTotal,ReadLatencyMax,WriteErrorsTotal,WriteLatencyMax -AutoSize </P> <BR /> <BR /> </DIV> <BR /> The list above is not exhaustive, but might give you some ideas on how to approach the problem.
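<BR /> <P> Putting the data-mining suggestion above into practice, here is a minimal hedged sketch that counts recent 5120 auto-pauses per day from the System log, to help you spot a change in frequency after a particular date (the grouping is illustrative, not from the original post): </P> <DIV> <P> # Count CSV auto-pause events (ID 5120) in the System log by day <BR /> Get-WinEvent -FilterHashtable @{ LogName = 'System'; ProviderName = 'Microsoft-Windows-FailoverClustering'; Id = 5120 } | <BR /> &nbsp;&nbsp;&nbsp;&nbsp;Group-Object { $_.TimeCreated.ToString('yyyy-MM-dd') } | <BR /> &nbsp;&nbsp;&nbsp;&nbsp;Sort-Object Name | <BR /> &nbsp;&nbsp;&nbsp;&nbsp;Format-Table Name, Count -AutoSize </P> </DIV> <P> The same idea can be extended to the 9296 events in the CSVFS operational channel if you also want to break the auto-pauses down by source and status code. </P>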
<BR /> <H2> Summary </H2> <BR /> In this blog post we went over possible causes for the event 5120, what they might mean, and how to approach troubleshooting. Windows Server has plenty of tools that will help you with troubleshooting a 5120. Keep in mind that a 5120 does not mean that your workload failed. Most likely the cluster will successfully recover from the failure, and your workload will keep running. If recovery is not successful you will see event 5142, and that will be the subject of the next post. <BR /> <BR /> Thanks! <BR /> Vladimir Petter <BR /> Principal Software Engineer <BR /> Clustering &amp; High-Availability <BR /> Microsoft <BR /> <BR /> <BR /> <H3> To learn more, here are others in the&nbsp;Cluster Shared Volume (CSV)&nbsp;blog series: </H3> <BR /> Cluster Shared Volume (CSV) Inside Out <BR /> <A href="#" target="_blank"> http://blogs.msdn.com/b/clustering/archive/2013/12/02/10473247.aspx </A> <BR /> <BR /> Cluster Shared Volume Diagnostics <BR /> <A href="#" target="_blank"> http://blogs.msdn.com/b/clustering/archive/2014/03/13/10507826.aspx </A> <BR /> <BR /> Cluster Shared Volume Performance Counters <BR /> <A href="#" target="_blank"> http://blogs.msdn.com/b/clustering/archive/2014/06/05/10531462.aspx </A> <BR /> <BR /> Cluster Shared Volume Failure Handling <BR /> <A href="#" target="_blank"> http://blogs.msdn.com/b/clustering/archive/2014/10/27/10567706.aspx </A> </BODY></HTML> Fri, 15 Mar 2019 21:54:27 GMT https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/troubleshooting-cluster-shared-volume-auto-pauses-8211-event/ba-p/371994 Elden Christensen 2019-03-15T21:54:27Z vNext Failover Clustering in Windows Server Technical Preview https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/vnext-failover-clustering-in-windows-server-technical-preview/ba-p/371991 <HTML> <HEAD></HEAD><BODY> <STRONG> First published on MSDN on Nov 12, 2014 </STRONG> <BR /> <P> Interested in the new features coming for Failover Clustering?&nbsp; I recently stopped over at Channel 9 and did an interview where we discussed&nbsp;some of the big clustering and availability features in&nbsp;Windows Server Technical Preview. </P> <BR /> <P> <STRONG> Here's the link: </STRONG> <BR /> <A href="#" target="_blank"> https://channel9.msdn.com/Shows/Edge/Edge-Show-125 </A> </P> <BR /> <P> </P> <BR /> <P> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90545i414B4C181D1FB169" /> </P> <BR /> <P> </P> <BR /> <P> Thanks!
<BR /> Elden Christensen <BR /> Principal Program Manager Lead <BR /> Clustering &amp; High-Availability <BR /> Microsoft </P> </BODY></HTML> Fri, 15 Mar 2019 21:54:02 GMT https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/vnext-failover-clustering-in-windows-server-technical-preview/ba-p/371991 Elden Christensen 2019-03-15T21:54:02Z Cluster Shared Volume Failure Handling https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/cluster-shared-volume-failure-handling/ba-p/371989 <HTML> <HEAD></HEAD><BODY> <STRONG> First published on MSDN on Oct 27, 2014 </STRONG> <BR /> This is the fourth blog post in a series about Cluster Shared Volumes (CSV). In this post we will explain how CSV handles storage failures and how it hides failures from applications. This blog will build on prior knowledge with the assumption that the reader is familiar with the previous blog posts: <BR /> <P> <STRONG> Cluster Shared Volume (CSV) Inside Out <BR /> </STRONG> <A href="#" target="_blank"> http://blogs.msdn.com/b/clustering/archive/2013/12/02/10473247.aspx </A> </P> <BR /> Which explains CSV components and different CSV IO modes. <BR /> <P> <STRONG> Cluster Shared Volume Diagnostics </STRONG> <BR /> <A href="#" target="_blank"> <STRONG> http://blogs.msdn.com/b/clustering/archive/2014/03/13/10507826.aspx </STRONG> </A> </P> <BR /> Which explains tools that help you to understand why a CSV volume uses one or another mode for IO. <BR /> <P> <STRONG> Cluster Shared Volume Performance Counters </STRONG> <BR /> <A href="#" target="_blank"> http://blogs.msdn.com/b/clustering/archive/2014/06/05/10531462.aspx </A> </P> <BR /> Which is a reference and guide to the CSV-related performance counters. <BR /> <H2> Failure Handling </H2> <BR /> CSV is designed to increase availability by abstracting failures away from applications, making them resilient to failures of network, storage, and nodes. CSV accomplishes this by virtualizing file opens. When an application opens a file on CSVFS, this open is claimed by CSVFS. CSVFS then in turn opens another handle on NTFS. When handling failures, CSVFS can reestablish its file open on NTFS while keeping the virtual handle to the application open on CSVFS valid. To better understand that, let’s look at how a hypothetical failure handling might go. We will do that with the help of a diagram where we will remove many components that are part of CSV to keep the picture simple. <BR /> <BR /> Let’s assume that we start in the state where the disk is mounted on Node 2, and there are applications running on both nodes using files on this CSV volume. <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90540i9344FD6535D3DD57" /> <BR /> <BR /> Let’s take a failure scenario where Node 2 loses connectivity to the Disk. <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90541i9FB7E1569ACDB294" /> <BR /> <BR /> For instance, this might be caused by the HBA on that node going bad or by someone unintentionally misconfiguring the LUN masking while making another change. In this example, there are many different IOs in flight at the moment of failure. For instance, there might be File System or Block Redirected IO from Node 1, or any IO from Node 2, or any metadata IO. Because connectivity to the storage was lost, NTFS will start failing these IOs with a status code indicating that the device object has been removed. Once CSVFS observes a failed IO, it will switch the volume to the Draining state.
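<BR /> <BR /> <P> As an aside, you can observe a CSV volume’s per-node state from PowerShell on Windows Server 2012 and later; a minimal sketch (the disk name below is the example name used elsewhere in this post): </P> <DIV> <P> # Show how each node is currently reaching this CSV volume <BR /> # (for example Direct, FileSystemRedirected, or BlockRedirected) <BR /> Get-ClusterSharedVolumeState -Name "Cluster Disk 1" <BR /> <BR /> # Or, with no parameters, report the state of all CSV volumes in the cluster <BR /> Get-ClusterSharedVolumeState </P> </DIV>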
<BR /> <BR /> When CSVFS switches itself to the Draining state because it has observed a failure from Disk or SMB, we refer to that as a CSVFS “Autopause”. This indicates that the volume has automatically put itself in a recovery state. For instance, when a user invokes an action to move a physical disk resource from one cluster node to another, the CSV volume will also be put in the Draining state. But in that case it happens because of an explicit administrative action, and the volume is not considered to be in an Autopause. <BR /> <BR /> In the Draining state the volume pends all new IOs and any failed IOs. The cluster will first put CSVFS for that volume into the ‘Draining’ state on all the nodes in the cluster. Once the state transition to Draining is complete, the cluster will then tell CSVFS on all the nodes to move to the ‘Paused’ state. During the transition to the Paused state CSVFS will wait for all ongoing IOs to complete, and once all IOs have completed, so that there are no longer any IOs in flight, it will close the underlying file opens to NTFS. Meanwhile the cluster will discover that the path to the disk is gone and will dismount NTFS on Node 2. <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90542i446814CA78EFD250" /> <BR /> <BR /> Clustering has a component called the Storage Topology Manager (STM), which has a view of all the nodes’ disk connectivity; it will discover that Node 1 can see the disk. The cluster will mount NTFS on Node 1. <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90543i43D04EE5B070BB44" /> <BR /> <BR /> Once the mount is done, the cluster will tell CSVFS to transition to the ‘Set Down Level’ state. During that transition CSVFS re-opens files on NTFS. Once all nodes are in the Set Down Level state, the cluster tells CSVFS on all nodes to go to the ‘Active’ state. While transitioning to the Active state CSVFS will resume all paused IOs and will stop pausing any new IOs. From this point on CSV has fully recovered from the disk failure and is back to a fully operational state. <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90544iFA89A0EA1A72972B" /> <BR /> <BR /> Applications running on CSVFS would perceive this failure as if IOs for some reason took longer than usual, but they will not observe the failure. <BR /> <BR /> On the nodes where CSVFS observed the failure due to the disk disconnect, and automatically put itself into the ‘Draining’ state (a.k.a. Autopaused) before the cluster told it to do so, you will see System Event log message 5120, which would look like this: <BR /> <DIV> <BR /> <P> Log Name:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; System <BR /> Source:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Microsoft-Windows-FailoverClustering <BR /> Event ID:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 5120 <BR /> Task Category: Cluster Shared Volume <BR /> Level:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Error <BR /> Description: <BR /> Cluster Shared Volume 'Volume1' ('Cluster Disk 1') is no longer available on this node because of 'STATUS_VOLUME_DISMOUNTED(C000026E)'. All I/O will temporarily be queued until a path to the volume is reestablished.
</P> <BR /> <BR /> </DIV> <BR /> If the cluster was not able to recover from the failure and had to take CSVFS down, you will also see System Event log message 5142: <DIV> <BR /> <P> Log Name:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; System <BR /> Source:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Microsoft-Windows-FailoverClustering <BR /> Event ID:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 5142 <BR /> Task Category: Cluster Shared Volume <BR /> Level:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Error <BR /> Description: <BR /> Cluster Shared Volume 'Volume1' ('Cluster Disk 1') is no longer accessible from this cluster node because of error '(1460)'. Please troubleshoot this node's connectivity to the storage device and network connectivity. </P> <BR /> <BR /> </DIV> <BR /> <H2> Summary </H2> <BR /> CSV is a clustered file system which also helps increase availability by being resilient to underlying failures.&nbsp; In this blog post we went into detail on how CSV abstracts storage failures from applications. <BR /> <BR /> Thanks! <BR /> Vladimir Petter <BR /> Principal Software Development Engineer <BR /> Clustering &amp; High-Availability <BR /> Microsoft <BR /> <BR /> <BR /> <H3> To learn more, here are others in the&nbsp;Cluster Shared Volume (CSV)&nbsp;blog series: </H3> <BR /> Cluster Shared Volume (CSV) Inside Out <BR /> <A href="#" target="_blank"> http://blogs.msdn.com/b/clustering/archive/2013/12/02/10473247.aspx </A> <BR /> <BR /> Cluster Shared Volume Diagnostics <BR /> <A href="#" target="_blank"> http://blogs.msdn.com/b/clustering/archive/2014/03/13/10507826.aspx </A> <BR /> <BR /> Cluster Shared Volume Performance Counters <BR /> <A href="#" target="_blank"> http://blogs.msdn.com/b/clustering/archive/2014/06/05/10531462.aspx </A> <BR /> <BR /> Cluster Shared Volume Failure Handling <BR /> <A href="#" target="_blank"> http://blogs.msdn.com/b/clustering/archive/2014/10/27/10567706.aspx </A> <BR /> <BR /> </BODY></HTML> Fri, 15 Mar 2019 21:53:46 GMT https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/cluster-shared-volume-failure-handling/ba-p/371989 Elden Christensen 2019-03-15T21:53:46Z Symantec ApplicationHA for Hyper-V https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/symantec-applicationha-for-hyper-v/ba-p/371983 <HTML> <HEAD></HEAD><BODY> <STRONG> First published on MSDN on Aug 28, 2014 </STRONG> <BR /> <P> In previous blogs I discussed the Failover Clustering VM Monitoring feature added in Windows Server 2012. You can find these blogs <A href="#" target="_blank"> here </A> and <A href="#" target="_blank"> here </A> . Below is a guest blog describing Symantec ApplicationHA for Hyper-V, which leverages the functionality provided by VM Monitoring. </P> <BR /> <P> </P> <BR /> <P> Hi, my name is Lorenzo Galelli, Senior Technical Product Manager at Symantec Corporation, here to talk about some exciting technologies from Symantec that focus on Hyper-V and Failover Clustering. Before I jump into those new technologies I just wanted to say thanks to the Clustering and High Availability Windows team for the invite to write on their blog, thanks guys! </P> <BR /> <P> So let’s talk about the new and exciting technologies that I am sure will make your job easier when deploying applications within Hyper-V.
ApplicationHA is the first technology that I would like to showcase. It has actually been around for a couple of years providing application uptime on other hypervisors, and with the release of our new version we have added support for Hyper-V running within Windows 2012 and Windows 2012 R2. So what’s so great about ApplicationHA, I hear you ask? Well, a number of things. First, it monitors your applications running within the virtual machines and automatically remediates any faults that occur by trying to restart them. Second, it integrates with Failover Clustering, specifically the heartbeat service, and leverages a common set of APIs that we can hook into to assist remediation tasks. We also removed a lot of the headaches revolving around availability configuration: ApplicationHA will auto-discover the majority of the application configuration, so all the admin needs to do is decide what needs monitoring, and with a couple of clicks through the configuration wizard you’re all set. We also provide management and operations through a web interface and plan to have SCVMM extensibility in the coming release. So if you’re virtualizing SharePoint, Exchange, SQL, IIS, SAP or Oracle, we have a wizard to support that app along with many others, as well as support for custom or in-house applications. </P> <BR /> <P> Below is a diagram that explains how ApplicationHA for Hyper-V leverages the Microsoft Failover Cluster heartbeat service which Microsoft added to Windows 2012. If ApplicationHA is unable to restart the application within the virtual machine, it leverages this heartbeat function to communicate to Failover Cluster that a heartbeat fault has occurred; ApplicationHA will attempt to remediate the fault a number of times before it communicates with the heartbeat service. <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90539i026DEB2F5D3DE4E3" /> <BR /> </P> <BR /> <OL> <BR /> <LI> Microsoft Failover Cluster detects issues with virtual machines if faults occur and moves the affected VM. </LI> <BR /> <LI> ApplicationHA detects issues with the application under control and attempts to restart the faulted application. </LI> <BR /> <LI> In the event that ApplicationHA is unable to start the application, it signals a heartbeat fault to Failover Cluster. </LI> <BR /> <LI> Failover Cluster reboots the VM or moves the VM to another host if the application still has issues starting. </LI> <BR /> </OL> <BR /> <P> For more information on ApplicationHA 6.1 for Hyper-V, be sure to check out the new whitepaper which describes in detail how ApplicationHA works with Failover Cluster. <A href="#" target="_blank"> http://www.symantec.com/connect/sites/default/files/White_Paper_Confidently_Virtualize_Business-critical_Applications_in_Microsoft_Hyper-V_with_Symantec_ApplicationHA.pdf </A> </P> <BR /> <P> For more information on Symantec ApplicationHA, be sure to check out the Symantec ApplicationHA website <A href="#" target="_blank"> http://www.symantec.com/application-ha </A> </P> <BR /> <P> Next up is Virtual Business Service, an application availability multi-tier orchestration tool which provides the ability to link applications together and control them as a single entity. Applications can be hosted on physical as well as virtual machines, and as long as the application is using a Symantec availability solution like ApplicationHA, or Microsoft Failover Clustering, you’re good to go.
</P> <BR /> <P> If you want to review this capability in more detail, I have posted a number of videos on the Symantec user group forum, Symantec Connect, which walk through the installation and configuration from start to finish. </P> <BR /> <P> <A href="#" target="_blank"> http://www.symantec.com/connect/videos/applicationha-61-hyper-v-install-configure-and-manage-part-1 </A> </P> <BR /> <P> <A href="#" target="_blank"> http://www.symantec.com/connect/videos/applicationha-61-hyper-v-install-configure-and-manage-part-2 </A> </P> <BR /> <P> <A href="#" target="_blank"> http://www.symantec.com/connect/videos/applicationha-61-hyper-v-install-configure-and-manage-part-3 </A> </P> <BR /> <P> <A href="#" target="_blank"> http://www.symantec.com/connect/videos/applicationha-61-hyper-v-install-configure-and-manage-part-4 </A> </P> </BODY></HTML> Fri, 15 Mar 2019 21:52:55 GMT https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/symantec-applicationha-for-hyper-v/ba-p/371983 John Marlin 2019-03-15T21:52:55Z Planning Failover Cluster Node Sizing https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/planning-failover-cluster-node-sizing/ba-p/371981 <P>In this blog I will discuss considerations on planning the number of nodes in a Windows Server Failover Cluster. <BR /><BR />Starting with Windows Server 2012, Failover Clustering can support up to 64 nodes in a single cluster, making it industry-leading in scale for a private cloud.&nbsp; While this is exciting, the reality is that it is probably bigger than the average person cares to do.&nbsp; There is also no limitation on cluster sizes in different versions of Windows Server (Standard vs. Datacenter, etc…).&nbsp; Since there is no practical limitation on scale for the average IT admin, how many nodes should you deploy with your cluster?&nbsp; The primary consideration comes down to defining a fault domain.&nbsp; Let’s discuss the considerations…</P> <P>&nbsp;</P> <H2>Resiliency to Hardware Faults</H2> <P>When thinking about fault domains, hardware resiliency is one of the biggest considerations.&nbsp; Be it chassis, rack, or datacenter.&nbsp; Let’s start with blades as an example; you probably don’t want a chassis to be a single point of failure.&nbsp; To mitigate a chassis failure you probably want to span across multiple chassis.&nbsp; If you have eight blades per chassis, you would want your nodes to reside across two different chassis for resiliency, so you create a 16-node cluster with eight nodes in each chassis.&nbsp; Or maybe you want to have rack resiliency; in that case, create a cluster out of nodes that span multiple racks.&nbsp; The number of nodes in the cluster will be influenced by how many servers you have in the rack.&nbsp; If you want your cluster to achieve disaster recovery in addition to high availability, you will have nodes in the cluster which will span across datacenters.&nbsp; Defining fault domains can protect you from hardware class failures.</P> <P>&nbsp;</P> <H2>Multi-site Clusters</H2> <P>To expand upon the previous topic a little… when thinking about disaster recovery scenarios and having a Failover Cluster that can achieve not only high availability, but also disaster recovery, you may span clusters across physical locations.&nbsp; Generally speaking, local failover is less expensive than site failover.&nbsp; Meaning that on a site failover, data replication needs to flip, IPs may switch to different subnets, and failover times may be longer.&nbsp; In fact, switching over to another site
may require IT leadership approval.&nbsp; When deploying a multi-site cluster it is recommended to scale up the number of nodes so that there are 2+ nodes in each site.&nbsp; The goal is that when there is a server failure, there is fast failover to a site-local node.&nbsp; Then when there is a catastrophic site failure, services fail over to the disaster recovery site.&nbsp; Defining multiple nodes per fault domain can give you better service level agreements.</P> <P>&nbsp;</P> <H2>All your Eggs in One Basket</H2> <P>There are no technical limitations which make one cluster size better than another.&nbsp; While we hope that there is never a massive failure which takes an entire cluster down, some might point out that they have seen it happen… so there's the question of how many eggs you want in one basket.&nbsp; By breaking up your clusters you create multiple fault domains, which mitigates the impact of losing an entire cluster.&nbsp; So let's say you have 1,000 VMs… if you have a single 32-node cluster and the entire cluster goes down, all 1,000 VMs go down.&nbsp; Whereas if you had them broken into two 16-node clusters, only 500 VMs (half) go down.&nbsp; Defining fault domains can protect you from cluster-class failures.</P> <P>&nbsp;</P> <H2>Flexibility with a Larger Pool of Nodes</H2> <P>System Center Virtual Machine Manager has a feature called Dynamic Optimization which analyzes the load across the nodes and moves VMs around to load balance the cluster. &nbsp;The larger the cluster, the more nodes Dynamic Optimization has to work with and the better balancing it can achieve.&nbsp; So while creating multiple smaller clusters may divide up fault domains, creating clusters that are too small can increase management overhead and keep them from being utilized optimally.&nbsp; A larger cluster gives finer granularity to spread and move load across.</P> <P>&nbsp;</P> <H2>Greater Resiliency to Failures</H2> <P>The more nodes you have in your cluster, the less impactful losing each node becomes.&nbsp; So let's say you create a bunch of little 2-node clusters; if you were to lose 2 nodes… all the VMs go down.&nbsp; Whereas if you had a 4-node cluster, you can lose 2 servers and the cluster stays up and keeps running.&nbsp; Again, this ties back to the hardware fault domains discussion. <BR /><BR />Another aspect is that when a node fails, the more surviving nodes you have, the more broadly the load can be redistributed.&nbsp; So let's say you have 2 nodes… if you lose 1 node, the surviving node is now running at 200% capacity (running everything it was before, plus everything from the failed node).&nbsp; If you scale up the number of nodes, the VMs can be spread out across more hosts, and the loss of an individual node is less impactful.&nbsp; If you have a 3-node cluster and lose a node, each surviving node is operating at 150% capacity.
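<BR /><BR />To make that arithmetic concrete, here is a minimal PowerShell sketch that computes the surviving-node load for a given cluster size and number of failed nodes (the function name is mine, and it assumes the load was evenly distributed before the failure): </P> <DIV><BR /># Surviving-node load, assuming an evenly loaded cluster before the failure <BR />function Get-SurvivingNodeLoad { <BR />&nbsp;&nbsp;&nbsp;&nbsp;param([int]$NodeCount, [int]$FailedNodes) <BR />&nbsp;&nbsp;&nbsp;&nbsp;$surviving = $NodeCount - $FailedNodes <BR />&nbsp;&nbsp;&nbsp;&nbsp;if ($surviving -le 0) { throw 'No surviving nodes - the cluster is down.' } <BR />&nbsp;&nbsp;&nbsp;&nbsp;# Each survivor carries the cluster's former total load split among the survivors <BR />&nbsp;&nbsp;&nbsp;&nbsp;[math]::Round(($NodeCount / $surviving) * 100, 0) <BR />} <BR /><BR />Get-SurvivingNodeLoad -NodeCount 2 -FailedNodes 1&nbsp;&nbsp; # 200 (%) <BR />Get-SurvivingNodeLoad -NodeCount 3 -FailedNodes 1&nbsp;&nbsp; # 150 (%) <BR />Get-SurvivingNodeLoad -NodeCount 32 -FailedNodes 1&nbsp; # 103 (%) <BR /></DIV> <P>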
<BR /><BR />Another way to think of it: how much stress do you want to put on yourself?&nbsp; If you have a 2-node cluster and you lose a node… you will probably have a fire drill to urgently get that node fixed.&nbsp; Whereas if you lose a node in a 32-node cluster… you might be ok finishing your round of golf before you worry about it.&nbsp; Increasing scale can protect you from larger numbers of failures and makes an individual failure less impactful.</P> <P>&nbsp;</P> <H2>Diagnosability</H2> <P>Troubleshooting a large cluster may at times be more difficult than troubleshooting smaller clusters.&nbsp; Say, for example, you have a problem on your 64-node cluster; that may involve pulling and correlating logs across all 64 servers, which can be complex and cumbersome.&nbsp; Another example: the cluster Validation tool is a functional test tool and will take longer to run on larger clusters when things go wrong and you want to check your cluster.&nbsp; Some IT admins prefer smaller fault domains when troubleshooting problems.</P> <P>&nbsp;</P> <H2>Workload</H2> <P>You also scale different types of clusters differently based on the workload they are running:&nbsp;</P> <UL> <UL> <LI><STRONG> Hyper-V: </STRONG> You want your private cloud to be one fluid system where VMs are dynamically moving around and adjusting.&nbsp; Tools like SCVMM Dynamic Optimization really start to shine with larger clusters, monitoring the load of the nodes and seamlessly moving VMs around to optimize and load balance the cluster.&nbsp; Hyper-V clusters are usually the biggest and may have 16, 24, 32, or more nodes.</LI> </UL> </UL> <UL> <UL> <LI><STRONG> Scale-out File Server: </STRONG> File-based storage for your applications with a SoFS should usually be 2 – 4 nodes.&nbsp; For example, internal Microsoft SoFS clusters are deployed with 4 nodes.</LI> </UL> </UL> <UL> <UL> <LI><STRONG> Traditional File Server: </STRONG> Traditional information worker File Clusters tend to also be smaller, again in the 2 – 4 node range.</LI> </UL> </UL> <UL> <UL> <LI><STRONG> SQL Server: </STRONG> Most SQL Server clusters deployed with a failover cluster instance (FCI) are 2-node, but that has more to do with SQL Server licensing and the ability to create a 2-node FCI with SQL Server Standard edition.&nbsp; The other consideration is that each SQL instance requires a drive letter, which caps you at roughly 24 instances.&nbsp; This is addressed with SQL Server 2014 support for Cluster Shared Volumes…&nbsp;&nbsp; but generally speaking, it doesn't make much sense to deploy a 32-node SQL Server cluster. &nbsp;So think smaller…&nbsp; 2, 4, or maybe up to 8.&nbsp; A SQL cluster with an Availability Group (AG) is usually a multi-site cluster and will have more nodes than an FCI.</LI> </UL> </UL> <H2>Conclusion</H2> <P>There's no right or wrong answer on how many nodes to have in your cluster.&nbsp; Many other vendors have strong recommendations to work around limitations they may have… but those don't apply to Windows Server Failover Clustering.&nbsp; It's largely about thinking through your fault domains, plus personal preference.&nbsp; Big clusters are cool and come with serious bragging rights, but have some considerations…&nbsp;&nbsp; little clusters seem simple, but don't really shine as well as they should…&nbsp;&nbsp; you will likely find the right fit for you somewhere in between. <BR /><BR />Thanks!
<BR />Elden Christensen <BR />Principal PM&nbsp;Manager <BR />Clustering &amp; High-Availability <BR />Microsoft</P> Thu, 08 Aug 2019 16:16:35 GMT https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/planning-failover-cluster-node-sizing/ba-p/371981 Elden Christensen 2019-08-08T16:16:35Z Cluster Shared Volume Performance Counters https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/cluster-shared-volume-performance-counters/ba-p/371980 <P><STRONG> First published on MSDN on Jun 05, 2014 </STRONG> <BR />This is the third blog post in a series about Cluster Shared Volumes (CSV). In this post we will go over performance monitoring. We assume that the reader is familiar with the previous blog posts. &nbsp;The blog post <A href="#" target="_blank" rel="noopener"> http://blogs.msdn.com/b/clustering/archive/2013/12/02/10473247.aspx </A> explains CSV components and the different CSV IO modes. The second blog post <STRONG> <A href="#" target="_blank" rel="noopener"> http://blogs.msdn.com/b/clustering/archive/2014/03/13/10507826.aspx </A> </STRONG> explains tools that help you understand why a CSV volume uses one IO mode or another. </P> <H2>Performance Counters</H2> <P><BR />Now let's look at the various performance counters which you can leverage to monitor what is happening with a CSV volume. </P> <H2>Physical Disks Performance Counters</H2> <P><BR />These performance counters are not CSV specific. You can find "Physical Disk" performance counters on every node where the disk is physically connected. <BR /><BR />There are a large number of good articles that describe how to use Physical Disk performance counters (for instance here <A href="#" target="_blank" rel="noopener"> http://blogs.technet.com/b/askcore/archive/2012/03/16/windows-performance-monitor-disk-counters-explained.aspx </A> ) so I am not going to spend much time explaining them. The most important consideration when looking at counters on a CSV is to keep in mind that counter values are not aggregated across nodes. For instance, if one node tells you that "Avg. Disk Queue Length" is 10 and another tells you 5, then the actual queue length on the disk is about 15. <BR /><BR /><span class="lia-inline-image-display-wrapper lia-image-align-inline" style="width: 716px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90523i92E43ABDA4D8D3FF/image-dimensions/716x665?v=v2" width="716" height="665" role="button" /></span> <BR /><BR /><BR /></P> <H2>SMB Client and Server Performance Counters</H2> <P><BR />CSV uses SMB to redirect traffic to the Coordinating node for File System Redirected IO and Block Level Redirected IO. Consequently, SMB performance counters can be a valuable source of insight. <BR /><BR />On a non-coordinating node you would want to use the SMB Client Shares performance counters. The following blog post explains how to read these counters: <A href="#" target="_blank" rel="noopener"> http://blogs.technet.com/b/josebda/archive/2012/11/19/windows-server-2012-file-server-tip-new-per-share-smb-client-performance-counters-provide-great-insight.aspx </A> .
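<BR /><BR />Both of the counter sets above share the one-node-at-a-time caveat, so to get a cluster-wide number you have to query each node and do the math. Here is a minimal sketch using Get-Counter (the node names and the disk-number wildcard are assumptions for illustration): </P> <DIV><BR /># Sum 'Avg. Disk Queue Length' for one disk across the nodes that see it <BR />$nodes = 'clus01', 'clus02'&nbsp;&nbsp; # your cluster node names <BR />$samples = Get-Counter -ComputerName $nodes -Counter '\PhysicalDisk(7*)\Avg. Disk Queue Length' <BR /># One sample per node; add them up for the cluster-wide queue length <BR />($samples.CounterSamples | Measure-Object -Property CookedValue -Sum).Sum <BR /></DIV> <P>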
<BR /><BR /><span class="lia-inline-image-display-wrapper lia-image-align-inline" style="width: 685px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90524i89BC821E5D4BFA69/image-dimensions/685x829?v=v2" width="685" height="829" role="button" /></span> <BR /><BR /><BR /><BR />On the coordinating node you can use the SMB Server Shares performance counters, which work in a similar way and allow you to monitor all the traffic that comes to the Coordinating node on a given share. <BR /><BR /><span class="lia-inline-image-display-wrapper lia-image-align-inline" style="width: 725px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90525i43D15D3F447368E5/image-dimensions/725x700?v=v2" width="725" height="700" role="button" /></span> <BR /><BR />To map the CSV volume to the hidden SMB share that CSV uses to redirect traffic, you can run the following command to find the CSV volume ID: </P> <DIV><BR /><BR />PS C:\Windows\system32&gt; Get-ClusterSharedVolume | fl * <BR /><BR />Name&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : Cluster Disk 1 <BR />State&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : Online <BR />OwnerNode&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : clus01 <BR />SharedVolumeInfo : {C:\ClusterStorage\Volume1} <BR />Id&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : 6861be1f-bf50-4bdb-941d-0a2dd2a46711 <BR /><BR /></DIV> <P><BR /><BR /><BR />The CSV volume ID is also used as the SMB share name. As we discussed in the previous post, to get the list of CSV hidden shares you can use Get-SmbShare. Starting with Windows Server 2012 R2 you also need to add the -SmbInstance CSV parameter to that cmdlet to see the CSV internal hidden shares. Here is an example: </P> <DIV><BR /><BR />PS C:\Windows\system32&gt; Get-SmbShare -IncludeHidden -SmbInstance CSV <BR /><BR />Name&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ScopeName&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Path&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Description <BR />----&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ---------&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;----&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ----------- <BR />6861be1f-bf50-4bdb-941d-0a... * <A href="https://gorovian.000webhostapp.com/?exam=\\?\GLOBALROOT\Device\Hard" target="_blank" rel="noopener"> \\?\GLOBALROOT\Device\Hard </A> ... <BR />CSV$&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; *&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;C:\ClusterStorage <BR /><BR /></DIV> <P><BR /><BR /><BR />All File System Redirected IO will be sent to the share named 6861be1f-bf50-4bdb-941d-0a2dd2a46711 on the Coordinating node. All Block Level Redirected IO will be sent to the share CSV$ on the Coordinating node. <BR /><BR />If you are using RDMA you can use the SMB Direct Connection performance counters.
For instance, if you are wondering whether RDMA is used, you can simply look at these performance counters on the client and on the server. <BR /><BR />If you are using Scale-Out File Server then SMB performance counters will also be helpful to monitor IO that comes from the clients to the SOFS and CSVFS. </P> <H2>Cluster CSV File System Performance Counters</H2> <P><BR />CSVFS provides a large number of performance counters. Logically we can split these counters into 4 categories <BR /><BR /></P> <UL> <UL> <LI><STRONG> Redirected: </STRONG> All counters that start with the prefix "Redirected" help you to monitor whether IO is forwarded using File System Redirected IO and its performance. Please note that these counters do NOT include the IO that is forwarded using Block Redirected IO. These counters are based on measuring the time from when the IO was sent to SMB (if we are on a non-Coordinating node) or to NTFS (if we are on the Coordinating node) until that component completed the IO. It does not include the time the IO spent inside CSVFS. If we are on a non-Coordinating node then the values you observe through these counters should be very close to the corresponding values you would see using the SMB Client Shares performance counters.</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI><STRONG> IO: </STRONG> All counters that start with the prefix "IO" help you to monitor whether IO is forwarded using Direct IO or Block Level Redirected IO and its performance. These counters are based on measuring the time from when the IO was sent to the CSV Volume Manager until that component completed the IO. It does not include the time the IO spent inside CSVFS. If the CSV Volume Manager does not forward any IO using the Block Level Redirected IO path, but all the IO is dispatched using Direct IO, then the values you will observe using these counters will be very close to what you would see using the corresponding Physical Disk performance counters on this node.</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI><STRONG> Volume: </STRONG> All counters that start with the prefix "Volume" help you to monitor the current CSVFS state and its history.</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI><STRONG> Latency: </STRONG> All other counters help you to monitor how long IO took in CSVFS. This time includes how long the IO spent inside CSVFS waiting for its turn, as well as how long CSVFS was waiting for its completion from the underlying components. If an IO is paused/resumed during failure handling then this time is also included.</LI> </UL> </UL> <P><BR /><BR />This diagram demonstrates what is measured by the CSVFS performance counters on a non-Coordinating node. <BR /><BR /><span class="lia-inline-image-display-wrapper lia-image-align-inline" style="width: 575px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90526iDDA4B6DAFC4A4375/image-dimensions/575x603?v=v2" width="575" height="603" role="button" /></span> <BR /><BR />As we've discussed before, on a non-Coordinating node File System Redirected IO and Block Redirected IO go to the SMB client. On the Coordinating node you will see a similar picture, except that File System Redirected IO will be sent directly to NTFS, and we would never use Block Level Redirected IO.
<BR /><BR /><span class="lia-inline-image-display-wrapper lia-image-align-inline" style="width: 703px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90527iB4AF6ABDF7462E36/image-dimensions/703x766?v=v2" width="703" height="766" role="button" /></span> <BR /><BR /><BR /></P> <H2>Counters Reference</H2> <P><BR />In this section we will step through each counter group and go into detail on specific counters.&nbsp; If you find this section too tedious to read, do not worry: you can skip over it and go directly to the scenarios. This chapter works as a reference. <BR /><BR />Now let's go through the performance counters in each of these groups, starting with the counters with the prefix IO. I want to remind you again that this group of counters tells you only about IOs that are sent to the CSV Volume Manager, and does NOT tell you how long an IO spent inside CSVFS. It only measures how long it took the CSV Volume Manager and all components below it to complete the IO. For a disk it is unusual to see some IOs go Direct IO while others go Block Redirected IO. The CSV Volume Manager always prefers Direct IO, and uses Block Redirected IO only if the disk is not connected or if the disk completes IO with an error. Normally all IO is sent using either Direct IO or Block Redirected IO; if you see a mix, something may be wrong with the path to the disk from this node. </P> <H2>IO</H2> <P><BR /><BR /></P> <UL> <UL> <LI>IO Write Avg. Queue Length</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>IO Read Avg. Queue Length</LI> </UL> </UL> <P><BR /><BR />These two counters tell how many outstanding reads and writes we have on average per second. If we assume that all IOs are dispatched using Direct IO then the values of these counters will be approximately equal to PhysicalDisk\Avg. Disk Write Queue Length and PhysicalDisk\Avg. Disk Read Queue Length accordingly. If IO is sent using Block Level Redirected IO then these counters will reflect SMB latency, which you can monitor using SMB Client Shares\Avg. Write Queue Length and SMB Client Shares\Avg. Read Queue Length. <BR /><BR /></P> <UL> <UL> <LI>IO Write Queue Length</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>IO Read Queue Length</LI> </UL> </UL> <P><BR /><BR />These two counters tell how many outstanding reads and writes we have at the moment of the sample. A reader might wonder why we need the average queue length as well as the current queue length, and when to look at one versus the other. The only shortcoming of the average counter is that it is updated on IO completion. Let's assume you are using perfmon, which by default samples performance counters every second. If you have an IO that takes 1 minute, then for that minute the average queue length will be 0, and once the IO completes it will show 60 for one second. The current queue length, on the other hand, reports the length of the IO queue at the moment of the sample, so it will report 1 for all 60 samples while this IO is in progress. Conversely, if IOs are completing very fast (microseconds) then there is a high chance that at the moment of the sample the IO queue length will be 0, because we just happened to sample at a time when there were no IOs; in that case the average queue length is much more meaningful. Reads and writes usually complete in microseconds or milliseconds, so in the majority of cases you want to look at the average queue length.
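<BR /><BR />If you want to see this sampling effect for yourself, you can watch both counters side by side; a minimal sketch using the counter names above (one sample per second for a minute, across all CSV instances): </P> <DIV><BR /># Current vs. average write queue length, sampled once per second for 60 seconds <BR />Get-Counter -SampleInterval 1 -MaxSamples 60 -Counter @( <BR />&nbsp;&nbsp;&nbsp;&nbsp;'\Cluster CSV File System(*)\IO Write Queue Length', <BR />&nbsp;&nbsp;&nbsp;&nbsp;'\Cluster CSV File System(*)\IO Write Avg. Queue Length' <BR />) <BR /></DIV> <P>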
<BR /><BR /></P> <UL> <UL> <LI>IO Writes/sec</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>IO Reads/sec</LI> </UL> </UL> <P><BR /><BR />Tell you how many read/write operations on average have completed in the past second. When all IO is sent using Direct IO this value should be very close to PhysicalDisk\Disk Reads/sec and PhysicalDisk\Disk Writes/sec accordingly. If IO is sent using Block Level Redirected IO then the counter values should be close to SMB Client Shares\Write Requests/sec and SMB Client Shares\Read Requests/sec. <BR /><BR /></P> <UL> <UL> <LI>IO Writes</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>IO Reads</LI> </UL> </UL> <P><BR /><BR />Tell you how many read/write operations have completed since the volume was mounted. Keep in mind that the values of these counters are reset every time the file system dismounts and mounts again, for instance when you offline/online the corresponding cluster disk resource. <BR /><BR /></P> <UL> <UL> <LI>IO Write Latency</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>IO Read Latency</LI> </UL> </UL> <P><BR /><BR />Tell you how many seconds read/write operations take on average.&nbsp; If you see a value of 0.003, that means IO takes 3 milliseconds. When all IO is sent using Direct IO this value should be very close to PhysicalDisk\Avg. Disk sec/Read and PhysicalDisk\Avg. Disk sec/Write accordingly. If IO is sent using Block Level Redirected IO then the counter values should be close to SMB Client Shares\Avg. sec/Write and SMB Client Shares\Avg. sec/Read. <BR /><BR /></P> <UL> <UL> <LI>IO Read Bytes/sec</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>IO Read Bytes</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>IO Write Bytes/sec</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>IO Write Bytes</LI> </UL> </UL> <P><BR /><BR />These counters are similar to the counters above, except that instead of telling you the number of read and write operations (a.k.a. IOPS) they tell you throughput measured in bytes. <BR /><BR /></P> <UL> <UL> <LI>IO Split Reads/sec</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>IO Split Reads</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>IO Split Writes/sec</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>IO Split Writes</LI> </UL> </UL> <P><BR /><BR />These counters help you to monitor how often CSVFS needs to split an IO into multiple IOs due to disk fragmentation, when contiguous file offsets map to disjoint blocks on the disk. You can reduce fragmentation by running defrag. Please remember that to run defrag on CSVFS you need to put the volume into File System Redirected mode, so that CSVFS does not disable block moves on NTFS. There is no straight answer to the question of whether a particular value of the counter is bad. Remember that this counter does not tell you how much fragmentation is on the disk; it only tells you how much fragmentation is being hit by ongoing IOs. For instance, you might have all IOs going to a couple of locations on the disk that happen to be fragmented while the rest of the volume is not fragmented. Should you worry then? It depends… if you are using SSDs it might not matter, but if you are using HDDs running defrag might improve throughput by making IO more sequential. Another common reason to run defrag is to consolidate free space so it can be trimmed. This is particularly important with SSDs or thinly provisioned disks. The CSVFS IO Split performance counters would not help with monitoring free space fragmentation.
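<BR /><BR />As a heavily hedged sketch of that defrag workflow (the resource name is an assumption, and you should confirm the exact cmdlet parameters for your OS version before relying on this): </P> <DIV><BR /># Put the CSV into file system redirected mode so NTFS block moves are allowed <BR />Suspend-ClusterResource -Name 'Cluster Disk 1' -RedirectedAccess -Force <BR /><BR /># Defragment the volume through its CSV mount point (/U shows progress) <BR />defrag.exe C:\ClusterStorage\Volume1 /U <BR /><BR /># Return the volume to direct IO once defrag completes <BR />Resume-ClusterResource -Name 'Cluster Disk 1' <BR /></DIV> <P>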
<BR /><BR /></P> <UL> <UL> <LI>IO Single Reads/sec</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>IO Single Reads</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>IO Single Writes/sec</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>IO Single Writes</LI> </UL> </UL> <P><BR /><BR />The last set of counters in this group tells you how many IOs were dispatched without needing to be split. It is the complement of the corresponding "IO Split" counters and is not that interesting for performance monitoring. </P> <H2>Redirected</H2> <P><BR />Next we will go through the performance counters with the prefix Redirected. I want to remind you that this group of counters tells you only about IOs that are sent to NTFS directly (on the Coordinating node) or over SMB (from a non-Coordinating node), and does NOT tell you how long an IO spent inside CSVFS; it only measures how long it took SMB/NTFS and all components below them to complete the IO. <BR /><BR /><BR /><BR /></P> <UL> <UL> <LI>Redirected Writes Avg. Queue Length</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>Redirected Reads Avg. Queue Length</LI> </UL> </UL> <P><BR /><BR />These two counters tell how many outstanding reads and writes we have on average per second. They reflect SMB latency, which you can monitor using SMB Client Shares\Avg. Write Queue Length and SMB Client Shares\Avg. Read Queue Length. <BR /><BR /></P> <UL> <UL> <LI>Redirected Write Queue Length</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>Redirected Read Queue Length</LI> </UL> </UL> <P><BR /><BR />These two counters tell how many outstanding reads and writes we have at the moment of the sample. Please read the comments for the IO Write Queue Length and IO Read Queue Length counters if you are wondering when to look at the average queue length versus the current queue length. <BR /><BR /></P> <UL> <UL> <LI>Redirected Write Latency</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>Redirected Read Latency</LI> </UL> </UL> <P><BR /><BR />Tell you how many seconds read/write operations take on average.&nbsp; The counter values should be close to SMB Client Shares\Avg. sec/Write and SMB Client Shares\Avg. sec/Read. <BR /><BR /></P> <UL> <UL> <LI>Redirected Read Bytes/sec</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>Redirected Read Bytes</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>Redirected Reads/sec</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>Redirected Reads</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>Redirected Write Bytes/sec</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>Redirected Write Bytes</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>Redirected Writes/sec</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>Redirected Writes</LI> </UL> </UL> <P><BR /><BR />These counters help you to monitor IOPS and throughput and do not require much explaining. Note that when CSVFS sends IO using File System Redirected IO it never splits IO on fragmented files, because it forwards the IO to NTFS, and NTFS performs the translation of file offsets to volume offsets and splits the IO into multiple IOs if required due to file fragmentation. </P> <H2>Volume</H2> <P><BR />Next we will go through the performance counters with the prefix Volume. For all the counters in this group please keep in mind that values start fresh from 0 every time CSVFS mounts. For instance, offlining and onlining the corresponding cluster physical disk resource will reset the counters. <BR /><BR /></P> <UL> <UL> <LI>Volume State <BR /><BR /> <UL> <UL> <LI>Tells the current CSVFS volume state. The volume might be in one of the following states.
<BR /><BR /> <UL> <UL> <LI>0 - Init state. In this state all files are invalidated and all IOs except volume IOs fail.</LI> </UL> </UL> <BR /> <UL> <UL> <LI>1 - Paused state. In this state the volume pauses any new IO and the down-level state is cleaned.</LI> </UL> </UL> <BR /> <UL> <UL> <LI>2 - Draining state. In this state the volume pauses any new IO, but down-level files are still open and some down-level IOs might still be in progress.</LI> </UL> </UL> <BR /> <UL> <UL> <LI>3 - Set Down Level state. In this state the volume pauses any new IO. The down-level state has already been reapplied.</LI> </UL> </UL> <BR /> <UL> <UL> <LI>4 - Active state. In this state all IO proceeds as normal.</LI> </UL> </UL> <BR /><BR /></LI> </UL> </UL> <BR /> <UL> <UL> <LI>"Down-level" in the state descriptions above refers to the state that CSVFS maintains on NTFS. Examples of that state are files opened by CSVFS on NTFS, byte range locks, file delete dispositions, oplock states, etc.</LI> </UL> </UL> <BR /><BR /></LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>Volume Pause Count – Total <BR /><BR /> <UL> <UL> <LI>Number of times the volume was paused. This includes pauses that occur because a user told the cluster to move the corresponding physical disk resource from one cluster node to another, or because a customer turned volume redirection on or off.</LI> </UL> </UL> <BR /><BR /></LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>Volume Pause Count - Other</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>Volume Pause Count - Network</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>Volume Pause Count – Disk <BR /><BR /> <UL> <UL> <LI>Number of times this node experienced a network, disk, or other failure that caused CSVFS to pause all IO on the volume and go through the recovery cycle.</LI> </UL> </UL> <BR /><BR /></LI> </UL> </UL> <P><BR /><BR /></P> <H2>Latency</H2> <P><BR />And here comes the last set of performance counters from the CSVFS group. Counters in this group do not have a designated prefix. These counters measure an IO from the time it arrives at CSVFS and include all the time the IO spends at any layer inside or below CSVFS. <BR /><BR /></P> <UL> <UL> <LI>Write Queue Length</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>Read Queue Length</LI> </UL> </UL> <P><BR /><BR />These two counters tell how many outstanding reads and writes we have at the moment of the sample. <BR /><BR /></P> <UL> <UL> <LI>Write Latency</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>Read Latency</LI> </UL> </UL> <P><BR /><BR />These counters tell you how much time, on average, passes from when an IO arrives at CSVFS until CSVFS completes it. This includes the time the IO spends at any layer below CSVFS. Consequently it includes IO Write Latency, IO Read Latency, Redirected Write Latency, and Redirected Read Latency, depending on the type of IO and how the IO was dispatched by CSVFS. <BR /><BR /></P> <UL> <UL> <LI>Writes/sec</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>Reads/sec</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>Writes</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>Reads</LI> </UL> </UL> <P><BR /><BR />These counters will help you monitor IOPS and throughput, and hopefully do not require much explaining.
<BR /><BR /></P> <UL> <UL> <LI>Flushes/sec</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>Flushes</LI> </UL> </UL> <P><BR /><BR />These counters tell you how many flushes come to CSVFS on all the file objects that are open. <BR /><BR /></P> <UL> <UL> <LI>Files Opened</LI> </UL> </UL> <P><BR /><BR />Tells how many files are currently open on this CSV volume. <BR /><BR /></P> <UL> <UL> <LI>Files Invalidated - Other</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>Files Invalidated - During Resume</LI> </UL> </UL> <P><BR /><BR />CSVFS provides fault tolerance and attempts to hide various failures from applications, but in some cases it might need to indicate that recovery was not successful. It does that by invalidating the application's file open and by failing all IOs issued on this open. These two counters let you see how many file opens were invalidated. Please note that invalidating an open does not do anything bad to the file on the disk. It simply means that the application will see an IO failure and will need to reopen the file and reissue those IOs. <BR /><BR /></P> <UL> <UL> <LI>Create File/sec</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>Create File</LI> </UL> </UL> <P><BR /><BR />Allow you to monitor how many file opens are happening on the volume. <BR /><BR /></P> <UL> <UL> <LI>Metadata IO/sec</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>Metadata IO</LI> </UL> </UL> <P><BR /><BR />This is a catch-all for all other operations that are not covered by the counters above. These counters are incremented when you query or set file information or issue an FSCTL on a file. </P> <H2>Performance Counter Relationships</H2> <P><BR />To better understand the relationship between the different groups of CSVFS performance counters, let's go through the lifetime of a hypothetical non-cached write operation. <BR /><BR /></P> <OL> <OL> <LI>A non-cached write comes to CSVFS <BR /><BR /> <OL> <OL> <LI>CSVFS increments Write Queue Length</LI> </OL> </OL> <BR /> <OL> <OL> <LI>CSVFS remembers the timestamp when the IO arrived.</LI> </OL> </OL> <BR /><BR /></LI> </OL> </OL> <P>&nbsp;</P> <OL> <OL> <LI>Let's assume CSVFS decides that it can perform Direct IO on the file and it dispatches the IO to the CSV Volume Manager. <BR /><BR /> <OL> <OL> <LI>CSVFS increments IO Write Queue Length</LI> </OL> </OL> <BR /> <OL> <OL> <LI>CSVFS remembers the timestamp when the IO was forwarded to the volume manager</LI> </OL> </OL> <BR /><BR /></LI> </OL> </OL> <P>&nbsp;</P> <OL> <OL> <LI>Let's assume the IO fails because something has happened to the disk and the CSV Volume Manager is not able to deal with that. CSVFS will pause this IO and will go through recovery. <BR /><BR /> <OL> <OL> <LI>CSVFS decrements IO Write Queue Length</LI> </OL> </OL> <BR /> <OL> <OL> <LI>CSVFS takes the completion timestamp and subtracts the timestamp taken in step 2.ii. This tells us how long the CSV Volume Manager took to complete this IO. Using this value CSVFS updates IO Write Avg. Queue Length and IO Write Latency</LI> </OL> </OL> <BR /> <OL> <OL> <LI>CSVFS increments the IO Writes and IO Writes/sec counters</LI> </OL> </OL> <BR /> <OL> <OL> <LI>Depending on whether this write had to be split due to file fragmentation, CSVFS increments either IO Single Writes and IO Single Writes/sec or IO Split Writes and IO Split Writes/sec.</LI> </OL> </OL> <BR /><BR /></LI> </OL> </OL> <P>&nbsp;</P> <OL> <OL> <LI>Once CSVFS has recovered it will reissue the paused write.
Let's assume that this time CSVFS finds that it has to dispatch the IO using File System Redirected IO <BR /><BR /> <OL> <OL> <LI>CSVFS increments Redirected Write Queue Length</LI> </OL> </OL> <BR /> <OL> <OL> <LI>CSVFS remembers the timestamp when the IO was forwarded to NTFS directly or over SMB</LI> </OL> </OL> <BR /><BR /></LI> </OL> </OL> <P>&nbsp;</P> <OL> <OL> <LI>Let's assume SMB completes the write successfully <BR /><BR /> <OL> <OL> <LI>CSVFS decrements Redirected Write Queue Length</LI> </OL> </OL> <BR /> <OL> <OL> <LI>CSVFS takes the completion timestamp and subtracts the timestamp taken in step 4.ii. This tells us how long SMB and NTFS took to complete this IO. Using this value CSVFS updates Redirected Writes Avg. Queue Length and Redirected Write Latency</LI> </OL> </OL> <BR /> <OL> <OL> <LI>CSVFS increments the Redirected Writes and Redirected Writes/sec counters.</LI> </OL> </OL> <BR /><BR /></LI> </OL> </OL> <P>&nbsp;</P> <OL> <OL> <LI>If necessary CSVFS will do any post-processing after the IO completion and finally will complete the IO <BR /><BR /> <OL> <OL> <LI>CSVFS decrements Write Queue Length</LI> </OL> </OL> <BR /> <OL> <OL> <LI>CSVFS takes the completion timestamp and subtracts the timestamp taken in step 1.ii. This tells us how long CSVFS took to complete this IO. Using this value CSVFS updates Write Latency</LI> </OL> </OL> <BR /> <OL> <OL> <LI>CSVFS increments the Writes and Writes/sec counters.</LI> </OL> </OL> <BR /><BR /></LI> </OL> </OL> <P><BR /><BR />The steps above emphasize the relationship between the performance counter groups. We can also describe this scenario using the following diagram, where you can see how the time ranges nest <BR /><BR /><span class="lia-inline-image-display-wrapper lia-image-align-inline" style="width: 519px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90528iF4C6F18CFD2DF7A4/image-dimensions/519x89?v=v2" width="519" height="89" role="button" /></span> <BR /><BR />A CSV volume pause is a very rare event and very few IOs run into it. For the majority of IOs the timeline will look one of the following ways <BR /><BR /><span class="lia-inline-image-display-wrapper lia-image-align-inline" style="width: 577px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90529iC0CD7DEA13851950/image-dimensions/577x104?v=v2" width="577" height="104" role="button" /></span> <BR /><BR />In these cases Read Latency and Write Latency will be the same as IO Read Latency, IO Write Latency, Redirected Read Latency, and Redirected Write Latency, depending on the IO type and how it was dispatched. <BR /><BR />In some cases the file system might hold an IO. For instance, if an IO extends a file it will be serialized with other extending IOs on the same file. In that case the timeline might look like this <BR /><BR /><span class="lia-inline-image-display-wrapper lia-image-align-inline" style="width: 597px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90530iEE6DD0565764E58E/image-dimensions/597x119?v=v2" width="597" height="119" role="button" /></span> <BR /><BR />The important point here is that Read Latency and Write Latency are expected to be slightly larger than their IO* and Redirected* partner counters. <BR /><BR /><BR /></P> <H2>Cluster CSV Volume Manager</H2> <P><BR />One thing that we cannot tell using the CSVFS performance counters is whether an IO was sent using Direct IO or Block Level Redirected IO.
This is because CSVFS does not have that information, as it happens at a lower layer; only the CSV Volume Manager knows that. You can get visibility into what is going on in the CSV Volume Manager using the Cluster CSV Volume Manager counter set. <BR /><BR />Performance counters in this group can be split into 2 categories: the first category has Redirected in its name and the second does not. All the counters that do NOT have Redirected in the name describe what is going on with Direct IO, and all the counters that do describe Block Level Redirected IO. Most of the counters are self-explanatory; two require some explaining. If the disk is connected then the CSV Volume Manager always first attempts to send IO directly to the disk. If the disk fails the IO, the CSV Volume Manager increments Direct IO Failure Redirection and Direct IO Failure Redirection/sec and retries this IO using the Block Level Redirected IO path over SMB. So these two counters help you tell whether IO is redirected because the disk is not physically connected or because the disk is failing IO for some reason. <BR /><BR /><span class="lia-inline-image-display-wrapper lia-image-align-inline" style="width: 696px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90531iCAE24972AB706811/image-dimensions/696x568?v=v2" width="696" height="568" role="button" /></span> </P> <H2>Common Scenarios</H2> <P><BR />In this section we will go over common scenarios/questions that can be answered using performance counters. <BR /><BR /><EM> Disclaimer:&nbsp; Please do not read much into the actual values of the counters in the samples below, because the samples were taken on test machines backed by extremely slow storage and with a bunch of debugging features enabled. These samples are here to help you understand the relationships between counters. </EM> </P> <H2>Is Direct IO happening?</H2> <P><BR />Simply check IOPS and throughput using the following CSV Volume Manager counters <BR /><BR /></P> <UL> <UL> <LI>\Cluster CSV Volume Manager(*)\IO Reads/sec</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>\Cluster CSV Volume Manager(*)\IO Writes/sec</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>\Cluster CSV Volume Manager(*)\IO Read-Bytes/sec</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>\Cluster CSV Volume Manager(*)\IO Write-Bytes/sec</LI> </UL> </UL> <P><BR /><BR />Values should be approximately equal to the load that you are placing on the volume.
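<BR /><BR />For example, a one-shot snapshot of those four counters from PowerShell (a sketch; the instance wildcard covers every CSV, so filter to your volume as needed): </P> <DIV><BR /># Snapshot CSV Volume Manager Direct IO activity on this node <BR />(Get-Counter -Counter @( <BR />&nbsp;&nbsp;&nbsp;&nbsp;'\Cluster CSV Volume Manager(*)\IO Reads/sec', <BR />&nbsp;&nbsp;&nbsp;&nbsp;'\Cluster CSV Volume Manager(*)\IO Writes/sec', <BR />&nbsp;&nbsp;&nbsp;&nbsp;'\Cluster CSV Volume Manager(*)\IO Read-Bytes/sec', <BR />&nbsp;&nbsp;&nbsp;&nbsp;'\Cluster CSV Volume Manager(*)\IO Write-Bytes/sec' <BR />)).CounterSamples | Select-Object InstanceName, Path, CookedValue <BR /></DIV> <P>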
<BR /><BR />You can also verify that no unexpected redirected IO is happening by checking IOPS and throughput on the CSV File System Redirected IO path using the CSVFS performance counters <BR /><BR /></P> <UL> <UL> <LI>\Cluster CSV File System(*)\Redirected Reads/sec</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>\Cluster CSV File System(*)\Redirected Writes/sec</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>\Cluster CSV File System(*)\Redirected Read Bytes/sec</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>\Cluster CSV File System(*)\Redirected Write Bytes/sec</LI> </UL> </UL> <P><BR /><BR />and the CSV Volume Redirected IO path <BR /><BR /></P> <UL> <UL> <LI>\Cluster CSV Volume Manager(*)\IO Reads/sec - Redirected</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>\Cluster CSV Volume Manager(*)\IO Writes/sec - Redirected</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>\Cluster CSV Volume Manager(*)\IO Read-Bytes/sec - Redirected</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>\Cluster CSV Volume Manager(*)\IO Write-Bytes/sec – Redirected</LI> </UL> </UL> <P><BR /><BR /><span class="lia-inline-image-display-wrapper lia-image-align-inline" style="width: 905px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90532iF193CAF56525891B/image-size/large?v=v2&amp;px=999" role="button" /></span> <BR /><BR />Above is an example of how Performance Monitor looks on the Coordinator node when all IO is going through Direct IO. You can see that the \Cluster CSV File System(*)\Redirected* counters are all 0, and the \Cluster CSV Volume Manager(*)\* - Redirected performance counters are 0 as well. This tells us that there is no File System Redirected IO or Block Level Redirected IO. </P> <H2>What is total IOPS and throughput?</H2> <P><BR />You can check how much overall IO is going through a CSV volume using the following CSVFS performance counters <BR /><BR /></P> <UL> <UL> <LI>\Cluster CSV File System(*)\Reads/sec</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>\Cluster CSV File System(*)\Writes/sec</LI> </UL> </UL> <P><BR /><BR />The values of the counters above will equal the sum of the IO going down the File System Redirected IO path <BR /><BR /></P> <UL> <UL> <LI>\Cluster CSV File System(*)\Redirected Reads/sec</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>\Cluster CSV File System(*)\Redirected Writes/sec</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>\Cluster CSV File System(*)\Redirected Read Bytes/sec</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>\Cluster CSV File System(*)\Redirected Write Bytes/sec</LI> </UL> </UL> <P><BR /><BR />and the IO going to the CSV Volume Manager, which the volume manager might dispatch using Direct IO or Block Redirected IO <BR /><BR /></P> <UL> <UL> <LI>\Cluster CSV File System(*)\IO Reads/sec</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>\Cluster CSV File System(*)\IO Writes/sec</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>\Cluster CSV File System(*)\IO Read Bytes/sec</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>\Cluster CSV File System(*)\IO Write Bytes/sec</LI> </UL> </UL> <P><BR /><BR /></P> <H2>What is Direct IO IOPS and throughput?</H2> <P><BR />You can check how much IO the CSV Volume Manager sends directly to the disk connected to the cluster node using the following performance counters <BR /><BR /></P> <UL> <UL> <LI>\Cluster CSV Volume Manager(*)\IO Reads/sec</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>\Cluster CSV Volume Manager(*)\IO Writes/sec</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>\Cluster CSV Volume Manager(*)\IO Read-Bytes/sec</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL>
<LI>\Cluster CSV Volume Manager(*)\IO Write-Bytes/sec</LI> </UL> </UL> <P><BR /><BR />The values of these counters will be approximately equal to the values of the following performance counters of the corresponding physical disk <BR /><BR /></P> <UL> <UL> <LI>\PhysicalDisk(*)\Disk Reads/sec</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>\PhysicalDisk(*)\Disk Writes/sec</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>\PhysicalDisk(*)\Disk Read Bytes/sec</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>\PhysicalDisk(*)\Disk Write Bytes/sec</LI> </UL> </UL> <P><BR /><BR /></P> <H2>What is File System Redirected IOPS and throughput?</H2> <P><BR />You can check how much IO CSVFS sends down the File System Redirected IO path using the following performance counters <BR /><BR /></P> <UL> <UL> <LI>\Cluster CSV File System(*)\Redirected Reads/sec</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>\Cluster CSV File System(*)\Redirected Writes/sec</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>\Cluster CSV File System(*)\Redirected Read Bytes/sec</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>\Cluster CSV File System(*)\Redirected Write Bytes/sec</LI> </UL> </UL> <P><BR /><BR /><span class="lia-inline-image-display-wrapper lia-image-align-inline" style="width: 990px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90533i7F0099096DFDB628/image-size/large?v=v2&amp;px=999" role="button" /></span> <BR /><BR />The picture above shows what you would see on the Coordinating node if you put Volume1 in File System Redirected mode. You can see that only the \Cluster CSV File System(*)\Redirected* counters are changing, while the \Cluster CSV File System(*)\IO* counters are all 0. Since File System Redirected IO does not go through the CSV Volume Manager, its counters stay 0. File System Redirected IO goes to NTFS, and NTFS sends this IO to the disk, so you can see the Physical Disk counters match the CSV File System Redirected IO counters. </P> <H2>What is Block Level Redirected IOPS and throughput?</H2> <P><BR />You can check how much IO the CSV Volume Manager dispatches to the Coordinating node over SMB using the following performance counters. <BR /><BR /></P> <UL> <UL> <LI>\Cluster CSV Volume Manager(*)\IO Reads/sec - Redirected</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>\Cluster CSV Volume Manager(*)\IO Writes/sec - Redirected</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>\Cluster CSV Volume Manager(*)\IO Read-Bytes/sec - Redirected</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>\Cluster CSV Volume Manager(*)\IO Write-Bytes/sec – Redirected</LI> </UL> </UL> <P><BR /><BR />Please note that since the disk is always present on the Coordinating node, CSV will always use Direct IO there and never Block Redirected IO, so on the Coordinating node the values of these counters should stay 0.
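<BR /><BR />Tying the IOPS sections above together, you can sample the total, File System Redirected, and volume-manager-bound read counters in one call and check that the first is approximately the sum of the other two (a sketch; the values are per node and sampled independently, so expect only rough equality): </P> <DIV><BR /># Reads/sec should roughly equal Redirected Reads/sec + IO Reads/sec <BR />(Get-Counter -Counter @( <BR />&nbsp;&nbsp;&nbsp;&nbsp;'\Cluster CSV File System(*)\Reads/sec', <BR />&nbsp;&nbsp;&nbsp;&nbsp;'\Cluster CSV File System(*)\Redirected Reads/sec', <BR />&nbsp;&nbsp;&nbsp;&nbsp;'\Cluster CSV File System(*)\IO Reads/sec' <BR />)).CounterSamples | Select-Object Path, CookedValue <BR /></DIV> <P>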
</P> <H2>What is average Direct IO and Block Level Redirected IO latency?</H2> <P><BR />To find out the Direct IO latency you need to look at the counters <BR /><BR /></P> <UL> <UL> <LI>\Cluster CSV File System(*)\IO Read Latency</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>\Cluster CSV File System(*)\IO Write Latency</LI> </UL> </UL> <P><BR /><BR />To understand where this latency is coming from, first look at the following CSV Volume Manager performance counters to see whether IO is going Direct IO or Block Redirected IO <BR /><BR /></P> <UL> <UL> <LI>\Cluster CSV Volume Manager(*)\IO Reads/sec</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>\Cluster CSV Volume Manager(*)\IO Writes/sec</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>\Cluster CSV Volume Manager(*)\IO Reads/sec - Redirected</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>\Cluster CSV Volume Manager(*)\IO Writes/sec – Redirected</LI> </UL> </UL> <P><BR /><BR />If IO goes through Direct IO, next compare the CSVFS latency to the latency reported by the disk <BR /><BR /></P> <UL> <UL> <LI>\PhysicalDisk(*)\Avg. Disk sec/Read</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>\PhysicalDisk(*)\Avg. Disk sec/Write</LI> </UL> </UL> <P><BR /><BR />If IO goes through Block Level Redirected IO and you are on the Coordinator node, you still need to look at the Physical Disk performance counters. If you are on a non-Coordinator node, look at the latency reported by SMB on the CSV$ share using the following counters <BR /><BR /></P> <UL> <UL> <LI>\SMB Client Shares(*)\Avg sec/Write</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>\SMB Client Shares(*)\Avg sec/Read</LI> </UL> </UL> <P><BR /><BR />Then compare the SMB client latency to the Physical Disk latency on the Coordinator node. <BR /><BR />Below you can see a sample where all IO takes the Direct IO path and the latency reported by the physical disk matches the latency reported by CSVFS, which means the disk is the only source of latency and CSVFS does not add to it. <BR /><BR /><span class="lia-inline-image-display-wrapper lia-image-align-inline" style="width: 523px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90534i92BFABB7C15A795E/image-size/large?v=v2&amp;px=999" role="button" /></span> </P> <H2>What is File System Redirected IO latency?</H2> <P><BR />To find out the File System Redirected IO latency you need to look at the counters <BR /><BR /></P> <UL> <UL> <LI>\Cluster CSV File System(*)\Redirected Read Latency</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>\Cluster CSV File System(*)\Redirected Write Latency</LI> </UL> </UL> <P><BR /><BR />To find out where this latency is coming from on the Coordinator node, compare it to the latency reported by the physical disk. <BR /><BR /></P> <UL> <UL> <LI>\PhysicalDisk(*)\Avg. Disk sec/Read</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>\PhysicalDisk(*)\Avg. Disk sec/Write</LI> </UL> </UL> <P><BR /><BR />If you see that the latency reported by the physical disk is much lower than what is reported by CSVFS, then one of the components located between CSVFS and the disk is queuing/serializing the IO.
<BR /><BR /><span class="lia-inline-image-display-wrapper lia-image-align-inline" style="width: 523px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90535i2409E92F12054200/image-size/large?v=v2&amp;px=999" role="button" /></span> <BR /><BR />Above is an example comparing the latency reported by CSVFS with the latency reported by the physical disk <BR /><BR /></P> <UL> <UL> <LI>\Cluster CSV File System(*)\Write Latency is 19 milliseconds</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>\Cluster CSV File System(*)\Redirected Write Latency is 19 milliseconds</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>\PhysicalDisk(*)\Avg. Disk sec/Write is 18 milliseconds</LI> </UL> </UL> <P><BR /><BR /><BR /><BR /></P> <UL> <UL> <LI>\Cluster CSV File System(*)\Read Latency is 24 milliseconds</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>\Cluster CSV File System(*)\Redirected Read Latency is 23 milliseconds</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>\PhysicalDisk(*)\Avg. Disk sec/Read is 23 milliseconds</LI> </UL> </UL> <P><BR /><BR />Given statistical error, and that snapshotting the values of different counters is not synchronized, we can ignore the 1 millisecond that CSVFS adds and say that most of the latency comes from the physical disk. <BR /><BR />If you are on a non-Coordinating node then you need to look at the SMB Client Shares performance counters for the volume share <BR /><BR /></P> <UL> <UL> <LI>\SMB Client Shares(*)\Avg sec/Write</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>\SMB Client Shares(*)\Avg sec/Read</LI> </UL> </UL> <P><BR /><BR />After that, look at the latency reported by the physical disk on the Coordinator node to see how much latency is coming from SMB itself <BR /><BR /><span class="lia-inline-image-display-wrapper lia-image-align-inline" style="width: 916px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90536i369F78349A52EE31/image-size/large?v=v2&amp;px=999" role="button" /></span> <BR /><BR />In the sample above you can see <BR /><BR /></P> <UL> <UL> <LI>\Cluster CSV File System(*)\Write Latency is 18 milliseconds</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>\Cluster CSV File System(*)\Redirected Write Latency is 18 milliseconds</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>\SMB Client Shares(*)\Avg sec/Write is 18 milliseconds</LI> </UL> </UL> <P><BR /><BR /><BR /><BR /></P> <UL> <UL> <LI>\Cluster CSV File System(*)\Read Latency is 23 milliseconds</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>\Cluster CSV File System(*)\Redirected Read Latency is 23 milliseconds</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>\SMB Client Shares(*)\Avg sec/Read is 22 milliseconds</LI> </UL> </UL> <P><BR /><BR />In the earlier sample we saw that the disk read latency is 23 milliseconds and the write latency is 18 milliseconds, so we can conclude that the disk is the biggest source of latency. </P> <H2>Is my disk the bottleneck?</H2> <P><BR />To answer this question you need to look at the sum/average of the following performance counters across all cluster nodes that perform IO on this disk. Each node's counters tell you how much IO is done by that node, and you will need to do the math to find the aggregate values. <BR /><BR /></P> <UL> <UL> <LI>\PhysicalDisk(*)\Avg. Disk Read Queue Length</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>\PhysicalDisk(*)\Avg. Disk Write Queue Length</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>\PhysicalDisk(*)\Avg. Disk sec/Read</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>\PhysicalDisk(*)\Avg.
Disk sec/Write</LI> </UL> </UL> <P><BR /><BR />You can play with different queue lengths by changing the load on the disk and checking against your target. There is really no right or wrong answer here; it all depends on what your application's expectations are. <BR /><BR /><span class="lia-inline-image-display-wrapper lia-image-align-inline" style="width: 523px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90537iEC30AA82D65E5819/image-size/large?v=v2&amp;px=999" role="button" /></span> <BR /><BR />In the sample above you can see that the total IO queue length on the disk is about (8.951+8.538+10.336+10.5) 38.3, and the average latency is about ((0.153+0.146+0.121+0.116)/4) 134 milliseconds. <BR /><BR />Please note that the physical disk number in this sample happens to be the same – 7 – on both cluster nodes. You should not assume it will be the same. On the Coordinator node you can find it using the Cluster Administrator UI by looking at the Disk Number column. <BR /><BR /><span class="lia-inline-image-display-wrapper lia-image-align-inline" style="width: 837px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90538i595249D82BB287F0/image-size/large?v=v2&amp;px=999" role="button" /></span> <BR /><BR />On a non-Coordinator node your options are more limited; the Disk Management MMC snap-in is the best tool there. <BR /><BR /><EM> To find the physical disk number on all cluster nodes you can move the Cluster Disk from node to node, writing down the Disk Number on each node. But be careful, especially when an actual workload is running, because while moving the volume CSV will pause all IOs, which will impact your workload throughput. </EM> </P> <H2>Is my network the bottleneck?</H2> <P><BR />When you are looking at the cluster networks, keep in mind that the cluster splits networks into several categories and each type of traffic uses only some of the categories. You can read more about that in the following blog post <A href="#" target="_blank" rel="noopener"> http://blogs.msdn.com/b/clustering/archive/2011/06/17/10176338.aspx </A> . <BR /><BR />If you know the network bandwidth you can always do the math to verify that it is large enough to handle your load, but you should verify it empirically before you put your cluster into production. &nbsp;I would suggest the following steps: <BR /><BR /></P> <OL> <OL> <LI>Online the disk on a cluster node and stress/saturate your disks by putting lots of non-cached IO on the disk. Monitor IOPS and MBPS on the disk using the PhysicalDisk performance counters <BR /><BR /> <OL> <OL> <LI>\PhysicalDisk(*)\Disk Reads/sec</LI> </OL> </OL> <BR /> <OL> <OL> <LI>\PhysicalDisk(*)\Disk Writes/sec</LI> </OL> </OL> <BR /> <OL> <OL> <LI>\PhysicalDisk(*)\Disk Read Bytes/sec</LI> </OL> </OL> <BR /> <OL> <OL> <LI>\PhysicalDisk(*)\Disk Write Bytes/sec</LI> </OL> </OL> <BR /><BR /></LI> </OL> </OL> <P>&nbsp;</P> <OL> <OL> <LI>From another cluster node run the same test over SMB, and now monitor the SMB Client Shares performance counters (see the sketch after this list) <BR /><BR /> <OL> <OL> <LI>\SMB Client Shares(*)\Write Bytes/Sec</LI> </OL> </OL> <BR /> <OL> <OL> <LI>\SMB Client Shares(*)\Read Bytes/Sec</LI> </OL> </OL> <BR /> <OL> <OL> <LI>\SMB Client Shares(*)\Writes/Sec</LI> </OL> </OL> <BR /> <OL> <OL> <LI>\SMB Client Shares(*)\Reads/Sec</LI> </OL> </OL> <BR /><BR /></LI> </OL> </OL> <P><BR /><BR />If you see that you are getting the same IOPS and MBPS, then your bottleneck is the disk and the network is fine.
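<BR /><BR />As a minimal sketch of step 2 above, run from the second node while the stress test is going (counter paths as listed; 30 one-second samples): </P> <DIV><BR /># Watch SMB client share IOPS and throughput during the SMB leg of the test <BR />Get-Counter -SampleInterval 1 -MaxSamples 30 -Counter @( <BR />&nbsp;&nbsp;&nbsp;&nbsp;'\SMB Client Shares(*)\Reads/sec', <BR />&nbsp;&nbsp;&nbsp;&nbsp;'\SMB Client Shares(*)\Writes/sec', <BR />&nbsp;&nbsp;&nbsp;&nbsp;'\SMB Client Shares(*)\Read Bytes/sec', <BR />&nbsp;&nbsp;&nbsp;&nbsp;'\SMB Client Shares(*)\Write Bytes/sec' <BR />) <BR /></DIV> <P>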
<BR /><BR />If you are using Scale-Out File Server (SOFS) you can verify that the network is not the bottleneck in a similar way, but in this case use the Cluster CSV File System performance counters instead of the PhysicalDisk counters <BR /><BR /></P> <UL> <UL> <LI>\Cluster CSV File System(*)\Reads/sec</LI> </UL> </UL> <P>&nbsp;</P> <UL> <UL> <LI>\Cluster CSV File System(*)\Writes/sec</LI> </UL> </UL> <P><BR /><BR />When using RDMA it is a good idea to verify that RDMA is actually working by looking at the \SMB Direct Connection(*)\* family of counters. </P> <H2>Performance Counters Summary</H2> <P><BR />We went over the 4 counter sets that are most useful when investigating CSVFS performance. Using PowerShell cmdlets we are able to see whether Direct IO is possible. Using performance counters we can verify that IO is indeed flowing according to our expectations, and by looking at counters at different layers we can find where the bottleneck is. <BR /><BR />Remember that none of the performance counters we talked about are aggregated across multiple cluster nodes; each of them provides a single node's view. <BR /><BR />If you want to automate collection of performance counters from multiple nodes, consider using this simple script <A href="#" target="_blank" rel="noopener"> http://blogs.msdn.com/b/clustering/archive/2009/10/30/9915526.aspx </A> that is just a convenience wrapper around logman.exe. It is described in the following blog post <A href="#" target="_blank" rel="noopener"> http://blogs.msdn.com/b/clustering/archive/2009/11/10/9919999.aspx </A> . <BR /><BR />Thanks! <BR />Vladimir Petter <BR />Principal Software Development Engineer <BR />Clustering &amp; High-Availability <BR />Microsoft <BR /><BR /><BR /></P> <H3>To learn more, here are others in the&nbsp;Cluster Shared Volume (CSV)&nbsp;blog series:</H3> <P><BR />Cluster Shared Volume (CSV) Inside Out <BR /><A href="#" target="_blank" rel="noopener"> http://blogs.msdn.com/b/clustering/archive/2013/12/02/10473247.aspx </A> <BR /><BR />Cluster Shared Volume Diagnostics <BR /><A href="#" target="_blank" rel="noopener"> http://blogs.msdn.com/b/clustering/archive/2014/03/13/10507826.aspx </A> <BR /><BR />Cluster Shared Volume Performance Counters <BR /><A href="#" target="_blank" rel="noopener"> http://blogs.msdn.com/b/clustering/archive/2014/06/05/10531462.aspx </A> <BR /><BR />Cluster Shared Volume Failure Handling <BR /><A href="#" target="_blank" rel="noopener"> http://blogs.msdn.com/b/clustering/archive/2014/10/27/10567706.aspx </A> <BR /><BR /></P> Thu, 05 Dec 2019 19:54:19 GMT https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/cluster-shared-volume-performance-counters/ba-p/371980 Elden Christensen 2019-12-05T19:54:19Z Sessions at TechEd Houston 2014 from the Cluster team https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/sessions-at-teched-houston-2014-from-the-cluster-team/ba-p/371963 <HTML> <HEAD></HEAD><BODY> <STRONG> First published on MSDN on May 16, 2014 </STRONG> <BR /> The Cluster team presented multiple exciting sessions at the sold-out TechEd Houston from May 12-15 <SUP> th </SUP>! If you didn't get a chance to attend the conference in person, the sessions are now posted online so you can watch on demand!&nbsp; Here are the sessions, their descriptions, and links to the videos.
<BR /> <BR /> <STRONG> The sessions from the clustering team at TechEd Houston: </STRONG> <BR /> <BR /> <STRONG> 1) </STRONG> <A href="#" target="_blank"> DCIM-B354 Failover Clustering: What's New in Windows Server 2012 R2 </A> <BR /> <BR /> This session will give the complete roadmap of the wealth of new Failover Clustering features and other features which enable high availability scenarios in Windows Server 2012 R2. If you are going to attend one session at TechEd on clustering / high availability… this is it! This session will cover all the incremental feature improvements from Windows Server 2012 to Windows Server 2012 R2 for clustering and availability. <BR /> <BR /> <STRONG> 2) </STRONG> <A href="#" target="_blank"> DCIM-B364 Step-by-Step to Deploying Microsoft SQL Server 2014 with Cluster Shared Volumes </A> <BR /> <BR /> SQL Server 2014 now supports deploying Failover Cluster Instances on top of Windows Server 2012 R2 Cluster Shared Volumes (CSV). You can now leverage the same CSV storage deployment model you use for your Hyper-V and Scale-Out File Server deployments with SQL Server. This session walks through how to configure a highly available SQL Server 2014 on top of Cluster Shared Volumes (CSV). It discusses best practices and recommendations. <BR /> <BR /> <STRONG> 3) </STRONG> <A href="#" target="_blank"> DCIM-B349 Software-Defined Storage in Windows Server 2012 R2 and Microsoft System Center 2012 R2 </A> <BR /> <BR /> Regardless of whether you’re building on a private infrastructure, in a hybrid environment, or deploying to a public cloud, there are optimizations you can make in storage and availability that will improve the manageability and performance of your application and environment. Join this session to hear more about the end-to-end scale, performance, and availability improvements coming with Windows Server. We dive into deploying and managing storage using SMB shares, show the improved experience in everyday storage management such as deploying patches to the cloud, and share how to leverage faster live migration when responding to new load demands. Starting with Windows Server 2012, Microsoft offered a complete storage stack from the hardware up, leveraging commodity JBODs surfaced as Virtual Disks via Storage Spaces, hosted by Scale-Out File Server nodes and servicing client requests via SMB 3.0. Now, with major features added in Windows Server 2012 R2 (e.g., Storage Tiering and SMB Redirection), the story gets even better! As a critical piece in the Modern Datacenter (i.e., a Software-Defined Datacenter), SDS plays a crucial role in improving utilization and increasing cost efficiency, scalability, and elasticity. This session empowers you to architect, implement, and monitor this key capability. Come learn how to design, deploy, configure, and automate a storage solution based completely on Windows technologies, as well as how to troubleshoot and manage via the in-box stack and Microsoft System Center. Live demos galore! <BR /> <BR /> <STRONG> 4) </STRONG> <A href="#" target="_blank"> FDN06 Transform the Datacenter: Making the Promise of Connected Clouds a Reality </A> <BR /> <BR /> Cloud computing continues to shift the technology landscape, but most are still working to truly harness its potential for their organizations. How can you bring cloud computing models and technologies into your datacenter? 
How can you implement a hybrid approach across clouds that delivers value and meets your unique needs? This foundational session is all about making it real from the ground up. Topics include new innovations spanning server virtualization and infrastructure as a service, network virtualization and connections to the cloud, ground-breaking on-premises and cloud storage technologies, business continuity, service delivery, and more. Come learn from Microsoft experts across Windows Server, System Center, and Microsoft Azure about how you can apply these technologies to bring your datacenter into the modern era! <BR /> <BR /> <BR /> <BR /> <P> Thanks! <BR /> Subhasish Bhattacharya <BR /> Program Manager <BR /> Clustering &amp; High Availability <BR /> Microsoft </P> </BODY></HTML> Fri, 15 Mar 2019 21:50:09 GMT https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/sessions-at-teched-houston-2014-from-the-cluster-team/ba-p/371963 John Marlin 2019-03-15T21:50:09Z Deploying SQL Server 2014 with Cluster Shared Volumes https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/deploying-sql-server-2014-with-cluster-shared-volumes/ba-p/371962 <HTML> <HEAD></HEAD><BODY> <STRONG> First published on MSDN on May 08, 2014 </STRONG> <BR /> An exciting new feature in SQL Server 2014 is the support for the deployment of a Failover Cluster Instance (FCI) with Cluster Shared Volumes (CSV). In this blog, I am going to discuss the value of deploying SQL Server with CSV as well as how you can deploy SQL with CSV. I also did a session at TechEd North America 2014; if you would like to learn more beyond this blog, the recording is below: <BR /> <IFRAME frameborder="0" height="540" src="https://channel9.msdn.com/Events/TechEd/NorthAmerica/2014/DCIM-B364/player" width="960"> </IFRAME> <BR /> <BR /> <BR /> <BR /> <STRONG> Value of Deploying SQL 2014 with CSV </STRONG> <BR /> <BR /> A SQL 2014 deployment with Cluster Shared Volumes provides several advantages over a deployment on “traditional” cluster storage. <BR /> <BR /> <STRONG> Scalability </STRONG> <BR /> <BR /> <EM> Consolidation of multiple SQL instances </EM> : With traditional cluster storage, each SQL instance requires a separate LUN to be carved out. This is because the LUN would need to fail over with the SQL instance. CSV allows nodes in the cluster to have shared access to storage. This facilitates the consolidation of SQL instances by storing multiple SQL instances on a single CSV. <BR /> <BR /> <EM> Better capacity planning, storage utilization </EM> : Consolidating multiple SQL instances on a single LUN makes the storage utilization more efficient. <BR /> <BR /> <EM> Addresses drive letter limitation: </EM> Traditionally, the number of SQL instances that can be deployed on a cluster is limited to the number of drive letters (24 excluding the system drive and a drive for a peripheral device). There is no limit to the number of mount points for a CSV. Therefore, scalability of your SQL deployment is enhanced. <BR /> <BR /> <STRONG> Availability </STRONG> <BR /> <BR /> <EM> Resilience from storage failures </EM> : When storage connectivity on a node is disrupted, CSV routes traffic over the network using SMB 3.0, allowing the SQL instance to remain operational. In a traditional deployment, the SQL instance would need to be failed over to a node with connectivity to the storage, resulting in downtime. 
<BR /> <BR /> <EM> Fast failover </EM> : Given that nodes in a cluster have shared access to storage, a SQL Server failover no longer requires the dismounting and remounting of volumes. Additionally, the SQL Server DB is moved without drive ownership changes. <BR /> <BR /> <EM> Zero downtime Chkdsk: </EM> CSV integrates with <A href="#" target="_blank"> the improvements in Chkdsk in Windows Server 2012 </A> to provide a disk repair without any SQL Server downtime. <BR /> <BR /> <STRONG> Operability </STRONG> <BR /> <BR /> With CSV, the management of your SQL Server instance is simplified. You are able to manage the underlying storage from any node, as there is an abstraction over which node owns the disk. <BR /> <BR /> <STRONG> Performance and Security </STRONG> <BR /> <BR /> <A href="#" target="_blank"> <EM> CSV Block Cache </EM> </A> : CSV provides a distributed read-only cache for unbuffered I/O to SQL databases. <BR /> <BR /> <A href="#" target="_blank"> <EM> BitLocker Encrypted CSV </EM> </A> : With the CSV integration with BitLocker you have an option to secure your deployments outside your datacenters, such as at branch offices. Volume level encryption allows you to meet compliance requirements. <BR /> <BR /> <BR /> <BR /> <STRONG> How to deploy a SQL Server 2014 FCI on CSV </STRONG> <BR /> <BR /> You can deploy a SQL Server 2014 FCI on CSV with the following steps: <BR /> <BR /> <STRONG> Note: </STRONG> The steps to deploy a SQL Server FCI with CSV are identical to those with traditional storage, except for Steps 3, 4, and 19 below. The remaining steps have been provided as a reference. For detailed instructions on the installation steps for a "traditional" FCI deployment refer to: <A href="#" target="_blank"> <STRONG> http://technet.microsoft.com/en-us/library/hh231721.aspx </STRONG> </A> <BR /> <BR /> 1) <A href="#" target="_blank"> Create the cluster </A> which will host the FCI deployment. <BR /> <BR /> 2)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Run validation on your cluster and <A href="#" target="_blank"> ensure that there are no errors </A> . <BR /> <BR /> 3)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Provision storage for your cluster. Add the storage to the cluster. You may rename the cluster disks corresponding to the storage for your convenience. <A href="#" target="_blank"> Add the cluster disks to CSV </A> . <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90490iC2F5B903D0ABB45D" /> <BR /> <BR /> 4)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Rename your CSV mount points to enhance your manageability. <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90491iACAB481255280240" /> <BR /> <BR /> 5)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Install .NET Framework 3.5 <BR /> <BR /> <A href="#" target="_blank"> Using Windows PowerShell® </A> <BR /> <BR /> <A href="#" target="_blank"> Using Server Manager </A> <BR /> <BR /> 6)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Begin SQL installation on the first cluster node. Choose the <STRONG> Installation </STRONG> tab and choose the <STRONG> New SQL Server failover cluster installation </STRONG> option. 
<BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90492i36EE0EBACC6563E1" /> <BR /> <BR /> 7)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Enter the Product Key. <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90493i60D00EA6BD6AF216" /> <BR /> <BR /> 8)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Accept the License Terms. <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90494i678D9E95B4B72BC1" /> <BR /> <BR /> 9)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Choose to use Microsoft Update to check for updates. <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90495i07876376FAE1F703" /> <BR /> <BR /> 10)&nbsp;&nbsp;Failover Cluster rules will be installed. It is essential that this step completes without errors. <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90496iA441E30F88B7E0F4" /> <BR /> <BR /> 11)&nbsp;&nbsp;Choose the <STRONG> SQL Server Feature Installation </STRONG> option. <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90497iBB058F571D94BC08" /> <BR /> <BR /> 12)&nbsp;&nbsp;Select the <STRONG> Database Engine Services </STRONG> and <STRONG> Management Tools – Basic </STRONG> features. <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90498iF3B7C83B3DB7EB36" /> <BR /> <BR /> <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90499i61349967A09A2A90" /> <BR /> <BR /> 13)&nbsp;&nbsp;Provide a Network Name for your SQL instance. <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90500i1C08F2EF21BA88E6" /> <BR /> <BR /> 14)&nbsp;&nbsp;Specify a name for the SQL Server cluster resource group, or proceed with the default. <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90501i5DB9C4250913A5F0" /> <BR /> <BR /> 15)&nbsp;&nbsp;Proceed with the default Cluster Disk selected. <STRONG> We will adjust this selection in step 19. </STRONG> <BR /> <BR /> <STRONG> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90502i26B15F4E724E5064" /> </STRONG> <BR /> <BR /> 16)&nbsp;&nbsp;Choose both the IPv4 and IPv6 networks if available. <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90503iD3C24FF54441A8CD" /> <BR /> <BR /> 17)&nbsp;&nbsp;Configure your SQL Server Agent and Database Engine accounts. <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90504i4D681F6519D565DD" /> <BR /> <BR /> 18)&nbsp;&nbsp;Specify your SQL Server administrators and choose your authentication mode. <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90505iD1BC19E3C0E3A5F6" /> <BR /> <BR /> 19)&nbsp;&nbsp;Select the <STRONG> Data Directories </STRONG> tab. This allows you to customize the <STRONG> Cluster Shared Volumes paths </STRONG> where you want to store the files corresponding to your SQL Database. <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90506i08DB22CF986DF5AD" /> <BR /> 20)&nbsp;&nbsp;Proceed with the final SQL Server installation. 
<BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90507iE565BB7A0035D4B5" /> <BR /> <BR /> <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90508iC9295A61B653C083" /> <BR /> <BR /> On completion of installation you will now see the FCI data files stored in the CSV volumes specified. <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90509iBCE3D2C426883F71" /> <BR /> <BR /> Failover Cluster Manager (type <STRONG> cluadmin.msc </STRONG> at an elevated command prompt to launch) will reflect the SQL Server instance deployed. <BR /> <BR /> <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90510i7C9DEBA0C46A2446" /> <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90511i2FBC0DEAEABB168F" /> <BR /> <BR /> 21)&nbsp;&nbsp;Now add the other cluster nodes to the FCI. In the SQL Server Installation Center, choose the <STRONG> Add node to a SQL Server failover cluster </STRONG> option. <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90512i55E2339C2970F577" /> <BR /> <BR /> 22)&nbsp;&nbsp;Analogous to the installation on node 1, proceed with the addition of the cluster node to the FCI. <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90513iBB12FBBEA98642FF" /> <BR /> <BR /> <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90514iDE49CCD7DAF1F027" /> <BR /> <BR /> <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90515iF3C94F8650182430" /> <BR /> <BR /> <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90516iCB8EA96FA30FCB7C" /> <BR /> <BR /> <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90517i2881B016C76CC3AF" /> <BR /> <BR /> <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90518i056C768120019E41" /> <BR /> <BR /> <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90519i3C3D479D1087CD43" /> <BR /> <BR /> Once your installation is done you can test a failover of your SQL instance through the Failover Cluster Manager. <STRONG> Right-click </STRONG> on the SQL Server role and choose to <STRONG> Move </STRONG> to the <STRONG> Best Possible Node. </STRONG> <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90520i1320815A172DBF18" /> <BR /> <BR /> Note the difference with CSV. Your CSV will remain online for the duration of the SQL Server failover. There is no need to fail over the storage to the node the SQL Server instance is moved to. <BR /> <BR /> <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90521iC861880775457479" /> <BR /> <BR /> <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90522iA402BC93495D1E4D" /> <BR /> <BR /> <BR /> <BR /> Thanks! 
<BR /> <BR /> Subhasish Bhattacharya <BR /> Senior Program Manager <BR /> Clustering &amp; High Availability <BR /> Microsoft </BODY></HTML> Fri, 15 Mar 2019 21:50:06 GMT https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/deploying-sql-server-2014-with-cluster-shared-volumes/ba-p/371962 John Marlin 2019-03-15T21:50:06Z Configuring a File Share Witness on a Scale-Out File Server https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/configuring-a-file-share-witness-on-a-scale-out-file-server/ba-p/371927 <HTML> <HEAD></HEAD><BODY> <STRONG> First published on MSDN on Mar 31, 2014 </STRONG> <BR /> In this blog, I am going to discuss the considerations for configuring a File Share Witness (FSW) for the Failover Cluster hosting your workloads, on a separate <A href="#" target="_blank"> Scale-Out File Server </A> cluster. You can find more information on Failover Clustering quorum <A href="#" target="_blank"> here </A> . <BR /> <BR /> <STRONG> File Share Witness on a Scale-Out File Server </STRONG> <BR /> <BR /> It is supported to use a file share as a witness that is hosted on a <A href="#" target="_blank"> Scale-Out File Server </A> cluster. It is recommended that the following guidelines be considered when configuring your File Share Witness on a Scale-Out File Server: <BR /> <UL> <BR /> <LI> Starting in Windows Server 2012 R2, the recommendation is to always configure a Witness for your cluster. The cluster will now automatically determine if the Witness is to have a vote in determining quorum for the cluster. </LI> <BR /> <LI> Create a new Server Message Block (SMB) share on the Scale-Out File Server for exclusive use as a witness. Note that the same share can be used as a witness for multiple clusters. </LI> <BR /> <LI> Ensure that the File Share has a minimum of 5 MB provisioned per cluster it is used for. </LI> <BR /> <LI> The Scale-Out File Server hosting the file share to be used as a quorum witness should not be created within a Virtual Machine hosted on the same cluster for which the File Share Witness is being created. </LI> <BR /> <LI> Multi-site stretched clusters: <BR /> <UL> <BR /> <LI> With a Service Level Agreement (SLA) of automatic failover across sites, it is necessary that the Scale-Out File Server backing the File Share Witness be hosted in an independent third site. This gives the sites with nodes participating in quorum an equal opportunity to survive in case a site experiences a power outage or the WAN link connectivity breaks. </LI> <BR /> <LI> With an SLA of manual failover across sites, we still recommend that the Scale-Out File Server backing the File Share Witness be hosted in an independent third site. This simplifies the recovery steps necessary in case of a primary site power outage. You may also configure the Scale-Out File Server to be hosted in the primary site. However, note that this would require recreating the quorum witness while recovering the cluster from the Backup Disaster Recovery site. </LI> <BR /> </UL> <BR /> </LI> <BR /> <LI> Create a non-CA file share for the witness on the Scale-Out File Server. A non-CA file share can result in a faster failover of the file share witness resource in the event the Scale-Out File Server cluster is unavailable. For a CA share, the file share witness resource may not experience an immediate failure and may only time out after the 90 second quorum timeout window. 
On a non-CA share, the file share witness resource will fail immediately, triggering remedial actions from the cluster service. Setting up the configuration for a non-CA share is explained in the next section. </LI> <BR /> <LI> The Scale-Out File Server hosting the File Share Witness should be a member of a domain in the same forest as the cluster it is a Witness for. This is because the Cluster uses the <A href="#" target="_blank"> Cluster Name Object </A> to set the permissions on a folder in the share containing the cluster specific information. This ensures that the Cluster has the permissions needed to maintain its cluster state in the share. Additionally, the cluster administrator configuring the File Share Witness needs to have Full Control permissions to the share. This is necessary to set the permissions for the Cluster Name Object to the folder in the share. </LI> <BR /> <LI> It is important that the file share created on the Scale-Out File Server is not part of a Distributed File System (DFS) Namespace. The cluster needs to be able to arbitrate a single point for quorum. </LI> <BR /> </UL> <BR /> <BR /> <BR /> <STRONG> Configuring a File Share Witness on a Scale-Out File Server </STRONG> <BR /> <BR /> In this section, I will explain how you can create a file share on a Scale-Out File Server that will act as a witness for the cluster hosting your workloads. Therefore, you have two clusters – a “storage” cluster hosting your file share witness and a “compute” cluster hosting your highly available workloads. <BR /> <BR /> You can configure a File Share Witness on a Scale-Out File Server as follows: <BR /> <BR /> 1)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Create the Scale-Out File Server as described in Section 2.1 of this <A href="#" target="_blank"> article </A> . <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90477i04856E65F2488F85" /> <BR /> <BR /> 2)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Create a file share on the Scale-Out File Server as described in Section 2.2 of this <A href="#" target="_blank"> article </A> . <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90478i45345E314B19A5E3" /> <BR /> <BR /> <STRONG> a. </STRONG> Modify the properties of the share to make it a non-CA share. Right-click on the share, select <STRONG> Settings </STRONG> and uncheck the <STRONG> Enable continuous availability </STRONG> checkbox. <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90479i1A3178246307BDBE" /> <BR /> <BR /> b.&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Ensure that you have Full Control to the newly created share. <BR /> <BR /> <STRONG> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90480iA3F4DA1AA060CF7A" /> </STRONG> <BR /> <BR /> <STRONG> 3) </STRONG> <STRONG> Configure the File Share as a Witness on your cluster </STRONG> <BR /> <BR /> <STRONG> a. 
</STRONG> <STRONG> Using the Failover Cluster Manager </STRONG> <BR /> <BR /> <STRONG> <BR /> </STRONG> i.&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Type cluadmin.msc on an elevated command prompt <BR /> <BR /> <STRONG> <BR /> </STRONG> ii.&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Launch the Quorum Wizard by Right-clicking on the Cluster Name, Selecting <STRONG> More Actions </STRONG> and then selecting <STRONG> Configure Quorum Settings </STRONG> <BR /> <BR /> <STRONG> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90481i10331AB29243DE34" /> <BR /> </STRONG> iii.&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Select <STRONG> Next </STRONG> and then choose the <STRONG> Select the quorum witness </STRONG> option and select <STRONG> Next </STRONG> . <BR /> <BR /> <STRONG> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90482i47AE392490111588" /> </STRONG> <BR /> <BR /> <STRONG> <BR /> </STRONG> iv.&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Choose the <STRONG> Configure a file share witness </STRONG> option and select <STRONG> Next </STRONG> . <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90483iAB3B2669ACEB38CA" /> <BR /> <BR /> <STRONG> <BR /> </STRONG> v.&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Specify the path to the File Share on your Scale-Out File Server and select <STRONG> Next </STRONG> . <BR /> <BR /> <STRONG> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90484i810CADDCC50D0310" /> </STRONG> <BR /> <BR /> <STRONG> b. </STRONG> <STRONG> Using Windows PowerShell </STRONG> <BR /> i.&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Open a Windows PowerShell® console as an Administrator <BR /> ii.&nbsp;&nbsp; Type Set-ClusterQuorum –FileShareWitness &lt;File Share Witness Path&gt; <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90485i79C672839A78B957" /> <BR /> <BR /> You should now see the File Share Witness configured for your Cluster. <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90486i1238E47C76DC96A3" /> <BR /> <BR /> When you navigate to your File Share Witness share you will see a folder created for your Cluster. <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90487i8FBF89357DD7543D" /> <BR /> <BR /> This folder will have permissions for the Cluster Name Object of your “Compute” Cluster so that the entries in the folder can be modified on Cluster membership changes. <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90488i3097F125A1B85EFB" /> <BR /> <BR /> You will also notice a file <STRONG> Witness.log </STRONG> which contains the membership information for the Cluster. <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90489iFF3695BCC6EB7A29" /> <BR /> <BR /> You have now successfully configured a File Share Witness on a Scale-Out File Server, for the cluster hosting your workloads. <BR /> <BR /> <BR /> <BR /> Thanks! 
<BR /> Subhasish Bhattacharya <BR /> Program Manager <BR /> Clustering and High Availability <BR /> Microsoft </BODY></HTML> Fri, 15 Mar 2019 21:44:36 GMT https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/configuring-a-file-share-witness-on-a-scale-out-file-server/ba-p/371927 John Marlin 2019-03-15T21:44:36Z Failover Clustering and IPv6 in Windows Server 2012 R2 https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/failover-clustering-and-ipv6-in-windows-server-2012-r2/ba-p/371912 <HTML> <HEAD></HEAD><BODY> <STRONG> First published on MSDN on Mar 24, 2014 </STRONG> <BR /> <P> In this blog, I will discuss some common questions pertaining to IPv6 and Windows Server 2012 R2 Failover Clusters. </P> <BR /> <STRONG> What network protocol does Failover Clustering default to? </STRONG> <BR /> <P> If both IPv4 and IPv6 are enabled (which is the default configuration), IPv6 will always be used by clustering. The key takeaway is that it is not required to configure IPv4 when the IPv6 stack is enabled, and you can go as far as to unbind IPv4. Additionally, you can use a link-local (fe80) IPv6 address for your internal cluster traffic, so IPv6 can be used for clustering even if you don’t use IPv6 for your public facing interfaces. Note that you can only have one cluster network using IPv6 link-local (fe80) addresses in your cluster. All networks that have IPv6 also have an IPv6 link-local address, which is ignored if any IPv4 or other IPv6 prefix is present. </P> <BR /> <STRONG> Should IPv6 be disabled for Failover Clustering? </STRONG> <BR /> <P> The recommendation for Failover Clustering and Windows in general, starting in 2008 RTM, is to not disable IPv6 for your Failover Clusters. The majority of the internal testing for Failover Clustering is done with IPv6 enabled. Therefore, having IPv6 enabled will result in the safest configuration for your production deployment. </P> <BR /> <STRONG> Will Failover Clustering cease to work if IPv6 is disabled? </STRONG> <BR /> <P> A common misconception is that Failover Clustering will cease to work if IPv6 is disabled. This is incorrect. The Failover Clustering release criteria include functional validation in an IPv4-only environment. </P> <BR /> <STRONG> How does Failover Clustering handle IPv6 being disabled? </STRONG> <BR /> <P> There are two levels at which IPv6 can be disabled: </P> <BR /> <P> 1)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; At the adapter level: This is done by unbinding the IPv6 stack by launching ncpa.cpl and unchecking “Internet Protocol Version 6 (TCP/IPv6)”. </P> <BR /> <P> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90474iB8A83EC7391CF76F" /> </P> <BR /> <P> <STRONG> Failover Clustering behavior: </STRONG> NetFT, the virtual cluster adapter, will still tunnel traffic using IPv6 over IPv4. </P> <BR /> <P> 2)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; At the registry level: This can be done using the following steps: </P> <BR /> <OL> <BR /> <LI> Launch regedit.exe </LI> <BR /> <LI> Navigate to the <EM> HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\TCPIP6\Parameters </EM> key. </LI> <BR /> <LI> Right-click <EM> Parameters </EM> in the left sidebar, choose <EM> New-&gt;DWORD (32 bit) Value </EM> , and create an entry <EM> DisabledComponents </EM> with the hexadecimal value FF. 
</LI> <BR /> <LI> Restart your computer to disable IPv6 </LI> <BR /> </OL> <BR /> <P> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90475i47F6E55AD810D9FB" /> </P> <BR /> <P> <STRONG> Failover Clustering behavior: </STRONG> This is the only scenario where NetFT traffic will be sent entirely over IPv4. It is to be noted that this is not recommended and not the mainstream tested code path. </P> <BR /> <STRONG> Any gotchas with using Symantec Endpoint Protection and Failover Clustering? </STRONG> <BR /> <P> A default Symantec Endpoint Protection (SEP) firewall policy has rules to block IPv6 communication and IPv6 over IPv4 communication, which conflicts with the Failover Clustering communication over IPv6 or IPv6 over IPv4. Currently the Symantec Endpoint Protection Firewall doesn't support IPv6. This is also indicated in the guidance from Symantec <A href="#" target="_blank"> here </A> . The default Firewall policies in SEP Manager are shown below: </P> <BR /> <P> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90476i1C642824859BDA74" /> </P> <BR /> <P> It is therefore recommended that if SEP is used on a Failover Cluster, the rules indicated above blocking IPv6 and IPv6 over IPv4 traffic be disabled. Also, refer to the following article - <A href="#" target="_blank"> <EM> About Windows and Symantec firewalls </EM> </A> </P> <BR /> <STRONG> Do Failover Clusters support static IPv6 addresses? </STRONG> <BR /> <P> The Failover Cluster Manager and clustering in general are streamlined for the most common case (in which customers do not use static IPv6 addresses). Networks are configured automatically, in that the cluster will automatically generate IPv6 addresses for the IPv6 Address resources on your networks. If you prefer to select your own statically assigned IPv6 addresses, you can reconfigure the IPv6 Address resources using PowerShell as follows (it cannot be specified when the cluster is created): </P> <BR /> <P> Open a Windows PowerShell® console as an Administrator and do the following: </P> <BR /> <P> 1)&nbsp; Create a new IPv6 Cluster IP Resource </P> <BR /> <P> Add-ClusterResource -Name "IPv6 Cluster Address" -ResourceType "IPv6 Address" -Group "Cluster Group" </P> <BR /> <P> 2)&nbsp; Set the properties for the newly created IP Address resource </P> <BR /> <P> Get-ClusterResource "IPv6 Cluster Address" | Set-ClusterParameter –Multiple @{"Network"="Cluster Network 1"; "Address"= "2001:4898:28:4::";"PrefixLength"=64} </P> <BR /> <P> 3)&nbsp; Stop the netname which corresponds to this static IPv6 address </P> <BR /> <P> Stop-ClusterResource "Cluster Name" </P> <BR /> <P> 4)&nbsp; Create a dependency between the netname and the static IPv6 address </P> <BR /> <P> Set-ClusterResourceDependency "Cluster Name" "[Ipv6 Cluster Address]" </P> <BR /> <P> You might consider having an OR dependency between the netname and the static IPv6 and IPv4 addresses, as follows: </P> <BR /> <P> Set-ClusterResourceDependency "Cluster Name" "[Ipv6 Cluster Address] or [Ipv4 Cluster Address]" </P> <BR /> <P> 5)&nbsp; Restart the netname </P> <BR /> <P> Start-ClusterResource "Cluster Name" </P> <BR /> <P> </P> <BR /> <P> For name resolution, if you prefer not to use dynamic DNS, you can configure DNS mappings for the address automatically generated by the cluster, or you can configure DNS mappings for your static address. Also note that Cluster IPv6 Address resources do not support DHCPv6. </P> <BR /> <P> </P> <BR /> <P> Thanks! 
</P> <BR /> <P> Subhasish Bhattacharya <BR /> <BR /> Program Manager <BR /> <BR /> Clustering &amp; High Availability <BR /> <BR /> Microsoft </P> </BODY></HTML> Fri, 15 Mar 2019 21:42:34 GMT https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/failover-clustering-and-ipv6-in-windows-server-2012-r2/ba-p/371912 John Marlin 2019-03-15T21:42:34Z Cluster Shared Volume Diagnostics https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/cluster-shared-volume-diagnostics/ba-p/371908 <HTML> <HEAD></HEAD><BODY> <STRONG> First published on MSDN on Mar 13, 2014 </STRONG> <BR /> This is the second blog post in a series about Cluster Shared Volumes (CSV). In this post we will go over diagnostics. We assume that the reader is familiar with the previous blog post that explains CSV components and the different CSV IO modes <A href="#" target="_blank"> http://blogs.msdn.com/b/clustering/archive/2013/12/02/10473247.aspx </A> <BR /> <H2> Is Direct IO on this Volume Possible? </H2> <BR /> Let’s assume you have created a cluster, added a disk to Cluster Shared Volumes, you see that the disk is online, and the path to the volume (let’s say c:\ClusterStorage\Volume1) is accessible.&nbsp; The very first question you might have is whether Direct IO is even possible on this volume. With Windows Server 2012 R2 there is a PowerShell cmdlet that attempts to answer exactly that question: <BR /> <DIV> Get-ClusterSharedVolumeState [[-Name] &lt;StringCollection&gt;] [-Node &lt;StringCollection&gt;] [-InputObject &lt;psobject&gt;]&nbsp;&nbsp;&nbsp; [-Cluster &lt;string&gt;]&nbsp; [&lt;CommonParameters&gt;] </DIV> <BR /> <BR /> <BR /> If you run this PowerShell cmdlet providing the name of the cluster Physical Disk resource, then for each cluster node it will tell you whether the volume is in File System Redirected mode or Block Level Redirected mode on that node, and will tell you the reason. 
<BR /> <BR /> Here is how the output looks if Direct IO is possible: <BR /> <DIV> PS C:\Windows\system32&gt; Get-ClusterSharedVolumeState -Name "Cluster Disk 1" <BR /> <BR /> Name&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : Cluster Disk 1 <BR /> VolumeName&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : \\?\Volume{1c67fa80-1171-4a9e-9f41-0bb132e88ee4}\ <BR /> Node&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : clus01 <BR /> StateInfo&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : Direct <BR /> VolumeFriendlyName&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : Volume1 <BR /> FileSystemRedirectedIOReason : NotFileSystemRedirected <BR /> BlockRedirectedIOReason&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : NotBlockRedirected <BR /> <BR /> Name&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : Cluster Disk 1 <BR /> VolumeName&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : \\?\Volume{1c67fa80-1171-4a9e-9f41-0bb132e88ee4}\ <BR /> Node&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : clus02 <BR /> StateInfo&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : Direct <BR /> VolumeFriendlyName&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : Volume1 <BR /> FileSystemRedirectedIOReason : NotFileSystemRedirected <BR /> BlockRedirectedIOReason&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : NotBlockRedirected </DIV> <BR /> <BR /> In the output above you can see that Direct IO on this volume is possible on both cluster nodes. 
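<BR /> <BR /> To spot-check every CSV in the cluster at once instead of naming disks one at a time, you can filter on the StateInfo value shown above. Here is a minimal sketch, assuming (per the cmdlet syntax shown earlier) that Get-ClusterSharedVolumeState accepts piped CSV objects through its InputObject parameter: <BR /> <DIV> # Report any CSV/node combination that is not doing Direct IO, together with the reason. <BR /> Get-ClusterSharedVolume | Get-ClusterSharedVolumeState | <BR /> &nbsp;&nbsp;&nbsp;&nbsp;Where-Object { $_.StateInfo -ne 'Direct' } | <BR /> &nbsp;&nbsp;&nbsp;&nbsp;Format-Table Node, VolumeFriendlyName, StateInfo, FileSystemRedirectedIOReason, BlockRedirectedIOReason -AutoSize </DIV>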
<BR /> <BR /> If we put this disk in File System Redirected mode using <BR /> <DIV> PS C:\Windows\system32&gt; Suspend-ClusterResource -Name "Cluster Disk 1" -RedirectedAccess -Force <BR /> <BR /> Name&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; State&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Node <BR /> ----&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; -----&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ---- <BR /> Cluster Disk 1&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Online(Redirected)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; clus01 </DIV> <BR /> then the output of Get-ClusterSharedVolumeState will change to <BR /> <DIV> PS C:\Windows\system32&gt; Get-ClusterSharedVolumeState -Name "Cluster Disk 1" <BR /> <BR /> Name&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : Cluster Disk 1 <BR /> VolumeName&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : \\?\Volume{1c67fa80-1171-4a9e-9f41-0bb132e88ee4}\ <BR /> Node&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : clus01 <BR /> StateInfo&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : FileSystemRedirected <BR /> VolumeFriendlyName&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : Volume1 <BR /> FileSystemRedirectedIOReason : UserRequest <BR /> BlockRedirectedIOReason&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : NotBlockRedirected <BR /> <BR /> Name&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : Cluster Disk 1 <BR /> VolumeName&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : \\?\Volume{1c67fa80-1171-4a9e-9f41-0bb132e88ee4}\ <BR /> Node&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : clus02 <BR /> StateInfo&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : FileSystemRedirected <BR /> VolumeFriendlyName&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : Volume1 <BR /> FileSystemRedirectedIOReason : UserRequest <BR /> BlockRedirectedIOReason&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : NotBlockRedirected </DIV> <BR /> <BR /> You can turn off File System Redirected 
mode using the following cmdlet <BR /> <DIV> PS C:\Windows\system32&gt; Resume-ClusterResource -Name "Cluster Disk 1" <BR /> <BR /> Name&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; State&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Node <BR /> ----&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; -----&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ---- <BR /> Cluster Disk 1&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Online&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; clus01 </DIV> <BR /> <BR /> The state of a CSV volume does not have to be the same on all nodes. For instance, if the disk is not connected to all the nodes, then you might see the volume in Direct mode on the nodes where the disk is connected and in Block Redirected mode on the nodes where it is not connected. <BR /> <BR /> A CSV volume might be in Block Level Redirected mode for one of the following reasons <BR /> <UL> <BR /> <LI> <STRONG> NoDiskConnectivity </STRONG> – The disk is not visible on/connected to this node. You need to validate your SAN settings. </LI> <BR /> <LI> <STRONG> StorageSpaceNotAttached </STRONG> – The Space is not attached on this node. Many Storage Spaces on-disk formats are not trivial, and cannot be accessed for read/write by multiple nodes at the same time. The cluster enforces that a Space is accessible by only one cluster node at a time. The Space is detached on all other nodes and is attached only on the node where the corresponding Physical Disk Resource is online. The only type of Space that can be attached on multiple nodes is a Simple Space, which does not have a write-back cache. </LI> <BR /> </UL> <BR /> When you are using a Mirrored or Parity Space, most often you will see that the volume is in Direct IO mode on the coordinator node and in Block Redirected mode on all other nodes, and the reason for Block Redirected mode is StorageSpaceNotAttached. Please note that if a Space uses write-back cache then it will always be in Block Redirected mode, even if it is a Simple Space. <BR /> <BR /> <BR /> <BR /> CSV might be in File System Redirected mode for one of the following reasons <BR /> <UL> <BR /> <LI> <STRONG> UserRequest </STRONG> – A user put the volume in redirected state. This can be done using the Failover Cluster Manager snap-in or the PowerShell cmdlet Suspend-ClusterResource. </LI> <BR /> <LI> <STRONG> IncompatibleFileSystemFilter </STRONG> – An incompatible file system filter is attached to the NTFS/REFS file system. Use “fltmc instances”, the system event log, and the cluster log to learn more. Usually that means you have installed a storage solution that uses a file system filter. In the previous blog post you can find samples of fltmc output. 
To resolve that you can either disable or uninstall the filter. The presence of a Legacy File System filter will always disable Direct IO. If the solution uses a File System Minifilter Driver, then filters present at the following altitudes will cause CSV to stay in File System Redirected mode <BR /> <UL> <BR /> <LI> 300000 – 309999 Replication </LI> <BR /> <LI> 280000 – 289999 Continuous Backup </LI> <BR /> <LI> 180000 – 189999 HSM </LI> <BR /> <LI> 160000 – 169999 Compression </LI> <BR /> <LI> 140000 – 149999 Encryption </LI> <BR /> </UL> <BR /> The reason is that some of these filters might do something that is not compatible with Direct IO or Block Level Redirected IO. For instance, a replication filter might assume that it will observe all IO so it can then replicate data to the remote site. A compression or encryption filter might need to modify data before it goes to/from the disk. If we perform Direct IO or Block Redirected IO we will bypass these filters attached to NTFS and consequently might corrupt data. Our choice is to be safe by default, so we put the volume in File System Redirected mode if we notice that a filter at one of the above altitudes is attached to this volume. You can explicitly inform the cluster that a filter is compatible with Direct IO by adding the minifilter name to the cluster common property SharedVolumeCompatibleFilters. Conversely, if you have a filter that is not at one of the incompatible altitudes above, but you know that it is not compatible with Direct IO, then you can add this minifilter to the cluster property SharedVolumeIncompatibleFilters (see the sketch at the end of this section). </LI> <BR /> <LI> <STRONG> IncompatibleVolumeFilter </STRONG> - An incompatible volume filter is attached below NTFS/REFS. Use the system event log and cluster log to learn more. The reasons and solution are similar to what we’ve discussed above. </LI> <BR /> <LI> <STRONG> FileSystemTiering </STRONG> - The volume is in File System Redirected mode because the volume is a Tiered Space with heatmap tracking enabled. The tiering heatmap assumes that it can see every IO. Information about IO operations is produced by REFS/NTFS. If we perform Direct IO then the statistics will be incorrect and the tiering engine could make incorrect placement decisions by moving hot data to a cold tier or vice versa. You can control whether the per-volume heatmap is enabled/disabled using <BR /> <BR /> fsutil.exe tiering setflags/clearflags with the flag /TrNH. <BR /> <BR /> If you choose to disable the heatmap then you can control which files should go to what tier by pinning them to a tier using the PowerShell cmdlet Set-FileStorageTier, and then running Optimize-Volume with –TierOptimize. Please note that for Optimize-Volume to work on a CSV volume you need to put the volume in File System Redirected mode using Suspend-ClusterResource. You can learn more about Storage Spaces tiering from this blog post <A href="#" target="_blank"> http://blogs.technet.com/b/josebda/archive/2013/08/28/step-by-step-for-storage-spaces-tiering-in-windows-server-2012-r2.aspx </A> . </LI> <BR /> <LI> <STRONG> BitLockerInitializing </STRONG> – The volume is in redirected state because we are waiting for BitLocker to finish the initial encryption of this volume. </LI> <BR /> </UL> <BR /> If Get-ClusterSharedVolumeState reports that a volume on a node is in the Direct IO state, does it mean that absolutely all IO will go the Direct IO way? The answer is: it is not so simple. 
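<BR /> <BR /> Before moving on: here is a minimal sketch of inspecting and setting the filter compatibility properties referenced in the list above. This is hedged: "MyReplFlt" is a hypothetical minifilter name, and since these are multi-valued cluster common properties, assigning a new value replaces the current list, so merge in any existing entries first. <BR /> <DIV> # Inspect the current Direct IO filter compatibility lists on the cluster. <BR /> (Get-Cluster).SharedVolumeCompatibleFilters <BR /> (Get-Cluster).SharedVolumeIncompatibleFilters <BR /> <BR /> # Declare a hypothetical minifilter named 'MyReplFlt' safe for Direct IO (replaces the list). <BR /> (Get-Cluster).SharedVolumeCompatibleFilters = @("MyReplFlt") </DIV>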
<BR /> <BR /> Here is another blog post that covers the Get-ClusterSharedVolumeState PowerShell cmdlet <A href="#" target="_blank"> http://blogs.msdn.com/b/clustering/archive/2013/12/05/10474312.aspx </A> . <BR /> <H2> Is Direct IO on this File Possible? </H2> <BR /> Even if the CSV volume is in Direct IO or Block Level Redirected mode, there are a number of preconditions that have to be true before Direct IO is possible on a file: <BR /> <UL> <BR /> <LI> CSVFS understands the on-disk file format <BR /> <UL> <BR /> <LI> Such as: the file is not sparse, compressed, encrypted, resilient, etc. </LI> <BR /> </UL> <BR /> </LI> <BR /> <LI> There are no File System filters that might modify the file layout or expect to see all IO <BR /> <UL> <BR /> <LI> File System minifilters that provide compression, encryption, replication, etc. </LI> <BR /> </UL> <BR /> </LI> <BR /> <LI> There are no File System filters that object to Direct IO on the stream. An example would be the Windows Server Deduplication feature. When you install deduplication and enable it on a CSV volume it will NOT disable Direct IO on all files. Instead it will veto Direct IO only for the files that have been optimized by dedup. </LI> <BR /> <LI> CSVFS was able to make sure NTFS/REFS will not change the location of the file data on the volume – the file is pinned. If NTFS relocates the file’s blocks while CSVFS does Direct IO, that could result in volume corruption. </LI> <BR /> <LI> There are no applications that need to make sure IO is observed by the NTFS/REFS stack. There is an FSCTL that an application can send to the file system to tell it to keep the file in File System Redirected mode for as long as this application has the file opened. The file will be switched out of redirected mode as soon as the application closes the file. </LI> <BR /> <LI> CSVFS has the appropriate oplock level. Oplocks guarantee cross-node cache coherency. Oplocks are documented on MSDN <A href="#" target="_blank"> http://msdn.microsoft.com/en-us/library/windows/hardware/ff551011(v=vs.85).aspx </A> <BR /> <UL> <BR /> <LI> <A> Read-Write-Handle </A> (RWH) or Read-Write (RW) for writes. If CSVFS was able to obtain this level of oplock, that means this file is opened only from this node. </LI> <BR /> <LI> Read-Write-Handle (RWH) or Read-Handle (RH) or Read-Write (RW) or Read (R) for reads. If CSVFS was able to obtain an RH or R oplock, then this file is opened from multiple nodes, but all nodes perform only file reads or other operations that do not modify the file content. </LI> <BR /> </UL> <BR /> </LI> <BR /> <LI> CSVFS was able to purge the cache on NTFS/REFS, making sure there is no stale cache on NTFS/REFS. </LI> <BR /> </UL> <BR /> If any of the preconditions are not true then the IO is dispatched using File System Redirected mode. If all preconditions are true then CSVFS will translate the IO from file offsets to volume offsets and will send it to the CSV Volume Manager. Keep in mind that the CSV Volume Manager might send it using Direct IO to the disk when the disk is connected, or it might send it over SMB to the disk on the coordinator node using Block Level Redirected IO. The CSV Volume Manager always prefers Direct IO, and Block Level Redirected IO is used only when the disk is not connected or when the disk fails the IO. <BR /> <H2> Summary </H2> <BR /> To provide high availability and good performance, CSVFS has several alternative ways to dispatch IO to the disk. This post demonstrated some of the tools that can be used to analyze why a CSV volume chooses one IO path versus another. <BR /> <BR /> Thanks! 
<BR /> Vladimir Petter <BR /> Principal Software Development Engineer <BR /> Clustering &amp; High-Availability <BR /> Microsoft <BR /> <BR /> <BR /> <H3> To learn more, here are others in the&nbsp;Cluster Shared Volume (CSV)&nbsp;blog series: </H3> <BR /> Cluster Shared Volume (CSV) Inside Out <BR /> <A href="#" target="_blank"> http://blogs.msdn.com/b/clustering/archive/2013/12/02/10473247.aspx </A> <BR /> <BR /> Cluster Shared Volume Diagnostics <BR /> <A href="#" target="_blank"> http://blogs.msdn.com/b/clustering/archive/2014/03/13/10507826.aspx </A> <BR /> <BR /> Cluster Shared Volume Performance Counters <BR /> <A href="#" target="_blank"> http://blogs.msdn.com/b/clustering/archive/2014/06/05/10531462.aspx </A> <BR /> <BR /> Cluster Shared Volume Failure Handling <BR /> <A href="#" target="_blank"> http://blogs.msdn.com/b/clustering/archive/2014/10/27/10567706.aspx </A> </BODY></HTML> Fri, 15 Mar 2019 21:42:03 GMT https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/cluster-shared-volume-diagnostics/ba-p/371908 Elden Christensen 2019-03-15T21:42:03Z Event ID 5120 in System Event Log https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/event-id-5120-in-system-event-log/ba-p/371907 <HTML> <HEAD></HEAD><BODY> <STRONG> First published on MSDN on Feb 26, 2014 </STRONG> <BR /> When conducting backups of a Windows Server 2012 or later Failover Cluster using Cluster Shared Volumes (CSV), you may encounter the following event in the System event log: <BR /> Log Name:&nbsp; System <BR /> Source:&nbsp; Microsoft-Windows-FailoverClustering <BR /> Event ID:&nbsp; 5120 <BR /> Task Category: Cluster Shared Volume <BR /> Level:&nbsp; Error <BR /> Description:&nbsp; Cluster Shared Volume 'VolumeName' ('ResourceName') is no longer available on this node because of 'STATUS_CLUSTER_CSV_AUTO_PAUSE_ERROR(c0130021)'. All I/O will temporarily be queued until a path to the volume is reestablished. <BR /> Having an Event ID 5120 logged may or may not be the sign of a problem with the cluster, based on the error code logged.&nbsp; An Event 5120 with an error code of STATUS_CLUSTER_CSV_AUTO_PAUSE_ERROR or the error code c0130021 may be expected and can be safely ignored in most situations. <BR /> <BR /> An Event ID 5120 with an error code of STATUS_CLUSTER_CSV_AUTO_PAUSE_ERROR is logged on the node which owns the cluster Physical Disk resource when there was a VSS software snapshot which clustering knew of, but the software snapshot was deleted.&nbsp; When a snapshot is deleted which Failover Clustering had knowledge of, clustering must resynchronize its view of the snapshots. <BR /> <BR /> One scenario where an Event ID 5120 with an error code of STATUS_CLUSTER_CSV_AUTO_PAUSE_ERROR may be logged is when using System Center Data Protection Manager (DPM), as DPM may delete a software snapshot once a backup has completed.&nbsp; When DPM requests deletion of a software snapshot, volsnap will mark the software snapshot for deletion.&nbsp; However, volsnap conducts the deletion asynchronously, at a later point in time.&nbsp; Even though the snapshot has been marked for deletion, Clustering will detect that the software snapshot still exists and needs to handle it appropriately.&nbsp; Eventually volsnap will perform the actual deletion operation of the software snapshot. 
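<BR /> <BR /> To see which error codes your 5120 events actually carry, which is the key triage point in the troubleshooting section below, here is a minimal sketch. It is a starting point only; since the event is logged per node, run it on each node in the cluster: <BR /> <DIV> # List recent 5120 events from the FailoverClustering provider, with their full messages. <BR /> Get-WinEvent -FilterHashtable @{ LogName = 'System'; ProviderName = 'Microsoft-Windows-FailoverClustering'; Id = 5120 } | <BR /> &nbsp;&nbsp;&nbsp;&nbsp;Select-Object TimeCreated, Message | Format-List </DIV>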
<BR /> <BR /> When clustering then notices that a software snapshot it knew of was deleted, it must resynchronize its view of the snapshots. Think of it as clustering getting surprised by an un-notified software snapshot deletion, and the cluster service telling the various internal components of the cluster service that they need to resynchronize their views of the snapshots. <BR /> <BR /> There are also a few other expected scenarios where volsnap will delete snapshots, and as a result clustering will need to resynchronize its snapshot view, such as if a copy-on-write fails due to lack of space or an IO error. In these conditions volsnap will log an event in the system event log associated with those failures.&nbsp; So review the system event logs for other events accompanying the event 5120; these could be logged on any node in the cluster. <BR /> <BR /> <BR /> <H2> Troubleshooting: </H2> <BR /> <OL> <BR /> <LI> If you see a few random event 5120s with an error code of STATUS_CLUSTER_CSV_AUTO_PAUSE_ERROR or the error code c0130021, they can be safely ignored.&nbsp; We recognize this is not optimal, as they create false positive alarms and trigger alerts in management software.&nbsp; We are investigating breaking out cluster state resynchronization into a separate non-error event in the future. </LI> <BR /> <LI> If you are seeing many Event 5120s being logged, this is a sign that clustering constantly needs to resynchronize its snapshot state.&nbsp; This could be a sign of a problem and may require engaging Microsoft support for investigation. </LI> <BR /> <LI> If you are seeing event 5120s logged with error codes other than STATUS_CLUSTER_CSV_AUTO_PAUSE_ERROR, it is a sign of a problem.&nbsp; Be diligent and review the error code in the description of all of the 5120s logged to be certain.&nbsp; Be careful not to dismiss the events because of a single event with STATUS_CLUSTER_CSV_AUTO_PAUSE_ERROR.&nbsp; If you see other errors logged, there are fixes available that need to be applied.&nbsp; Your first troubleshooting step should be to apply the recommended hotfixes in the appropriate article for your OS version: <BR /> Recommended hotfixes and updates for Windows Server 2012-based failover clusters <BR /> <A href="#" target="_blank"> http://support.microsoft.com/kb/2784261 </A> <BR /> Recommended hotfixes and updates for Windows Server 2012 R2-based failover clusters <BR /> <A href="#" target="_blank"> http://support.microsoft.com/kb/2920151 </A> </LI> <BR /> <LI> If an Event 5120 is accompanied by other errors, such as an Event 5142 as below, it is a sign of a failure and should not be ignored. </LI> <BR /> </OL> <BR /> Log Name:&nbsp; System <BR /> Source:&nbsp; Microsoft-Windows-FailoverClustering <BR /> Event ID:&nbsp; 5142 <BR /> Task Category: Cluster Shared Volume <BR /> Level:&nbsp; Error <BR /> Description:&nbsp; Cluster Shared Volume 'VolumeName' ('ResourceName') is no longer accessible from this cluster node because of error 'ERROR_TIMEOUT(1460)'. Please troubleshoot this node's connectivity to the storage device and network connectivity. <BR /> Thanks! 
<BR /> Elden Christensen <BR /> Principal PM Manager <BR /> Clustering &amp; High-Availability <BR /> Microsoft </BODY></HTML> Fri, 15 Mar 2019 21:41:56 GMT https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/event-id-5120-in-system-event-log/ba-p/371907 Elden Christensen 2019-03-15T21:41:56Z How to Run ChkDsk and Defrag on Cluster Shared Volumes in Windows Server 2012 R2 https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/how-to-run-chkdsk-and-defrag-on-cluster-shared-volumes-in/ba-p/371905 <HTML> <HEAD></HEAD><BODY> <STRONG> First published on MSDN on Jan 01, 2014 </STRONG> <BR /> Cluster Shared Volumes (CSV) is a layer of abstraction on either the ReFS or NTFS file system (which is used to format the underlying private cloud storage). Just as with a non-CSV volume, at times it may be necessary to run ChkDsk and Defrag on the file system. In this blog, I am going to first address the recommended procedure to run Defrag on your CSV in Windows Server 2012 R2 and later. I will then discuss how ChkDsk is run on your CSVs. <BR /> <H2> Running Defrag on&nbsp;a CSV: </H2> <BR /> Fragmentation of files on a CSV can impact the perceived file system performance by increasing the seek time to retrieve file system metadata. It is therefore recommended to periodically run Defrag on your CSV volume. Fragmentation is primarily a concern when running dynamic VHDs and is less prevalent with static VHDs. On a stand-alone server defrag runs as part of the “Maintenance Task”, so it runs automatically. However, on a CSV volume it will never run automatically, so you need to run it manually or script it to run (potentially using a <A href="#" target="_blank"> Clustered Scheduled Task </A> ). It is recommended to conduct this process during non-peak production times, as performance may be impacted.&nbsp; The following are the steps to defragment your CSV: <BR /> <BR /> 1.&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Determine if defragmentation is required for your CSV by running the following on an elevated command prompt: <BR /> Defrag.exe &lt;CSV Mount Point&gt; /A /U /V <BR /> /A &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Perform analysis on the specified volumes <BR /> /U &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Print the progress of the operation on the screen <BR /> /V &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Print verbose output containing the fragmentation statistics <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90461i6BFCEAB2C45C24F0" /> <BR /> <BR /> <STRONG> Note: </STRONG> If your CSV is backed by thinly provisioned storage, slab consolidation analysis (not the actual slab consolidation) is run during defrag analysis.&nbsp;Slab consolidation analysis&nbsp;requires the CSV to be placed in redirected mode before execution. Please refer to step 2 for instructions on how to place your CSV into redirected mode.
<BR /> <BR /> <BR /> <BR /> 2.&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; If defragmentation is required for your CSV, put the CSV into redirected mode.&nbsp; This can be achieved in either of the following ways: <BR /> <P> a.&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Using Windows PowerShell© open a new elevated Windows PowerShell console and run the following: </P> <BR /> <BR /> <A href="#" target="_blank"> Suspend-ClusterResource </A> &lt;Cluster Disk Name&gt; -RedirectedAccess <BR /> <P> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90462i0B57FC5744435A37" /> </P> <BR /> <P> b.&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Using the Failover Cluster Manager right-click on the CSV and select “Turn On Redirected Access”: </P> <BR /> <P> <STRONG> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90463iC80171275F0761B0" /> </STRONG> </P> <BR /> <STRONG> Note: </STRONG> If you attempt to run Defrag on a CSV without first putting it in redirected mode, it will fail with the following error: <BR /> <P> <EM> CSVFS failed operation as volume is not in redirected mode. (0x8007174F) </EM> </P> <BR /> <P> <EM> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90464i186FDC29E2989152" /> </EM> </P> <BR /> <P> You may also run into the following error: <EM> This operation is not supported on this filesystem. (0x89000020) </EM> </P> <BR /> 3.&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Run defrag on your CSV by running the following on an elevated command prompt: <BR /> Defrag.exe &lt;CSV Mount Point&gt; <BR /> <P> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90465iDE3036E137FE092A" /> </P> <BR /> 4.&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Once defrag has completed, revert the CSV back into direct mode by using either of the following methods: <BR /> <P> a.&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Using Windows PowerShell© open a new elevated Windows PowerShell console and run the following: </P> <BR /> <BR /> <A href="#" target="_blank"> Resume-ClusterResource </A> &lt;Cluster Disk Name&gt; <BR /> <P> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90466iA843E2F2F385046B" /> </P> <BR /> <P> b.&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Using the Failover Cluster Manager right-click on the CSV and select “Turn Off Redirected Access”: </P> <BR /> <P> <STRONG> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90467iC7920AB92642A13C" /> </STRONG> </P> <BR /> <BR /> <BR /> <BR /> <BR /> <H2> Running&nbsp;ChkDsk&nbsp;on&nbsp;a CSV: </H2> <BR /> During the lifecycle of your file system, corruptions may occur which require resolution through ChkDsk. As you are aware, CSVs in Windows Server 2012 R2 also support the ReFS file system. However, the ReFS file system achieves self-healing through integrity checks on metadata. As a consequence, ChkDsk does not need to be run for CSV volumes with the ReFS file system. Thus, this discussion is scoped to corruptions in CSV with the NTFS file system. Also, note the <A href="#" target="_blank"> redesigned ChkDsk operation </A> introduced with Windows Server 2012, which separates the ChkDsk scan for errors (online operation) and the ChkDsk fix (offline operation). This results in higher availability for your Private Cloud storage since you only need to take your storage offline to fix corruptions (which is a significantly faster process than the scan for corruptions).
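<BR /> <BR /> To make the scan/fix split concrete, here is a minimal PowerShell sketch that runs the online scan against every NTFS CSV mount point on the cluster. The exit-code handling is a simplifying assumption for illustration; verify ChkDsk's return values on your OS version before relying on them: <BR /> <DIV><FONT color="#000080"># Enumerate the mount point of every CSV and run the online corruption scan against it <BR /> $csvPaths = Get-ClusterSharedVolume | ForEach-Object { $_.SharedVolumeInfo.FriendlyVolumeName } <BR /> foreach ($path in $csvPaths) { <BR /> &nbsp;&nbsp;&nbsp;&nbsp;chkdsk.exe $path /scan <BR /> &nbsp;&nbsp;&nbsp;&nbsp;# Assumption: a non-zero exit code indicates the volume may need the offline /SpotFix pass <BR /> &nbsp;&nbsp;&nbsp;&nbsp;if ($LASTEXITCODE -ne 0) { Write-Warning "$path reported issues; consider running chkdsk.exe &lt;CSV mount point&gt; /SpotFix" } <BR /> }</FONT></DIV>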
In Windows Server 2012, we integrated ChkDsk /SpotFix into the cluster IsAlive health check for the Physical Disk Resource corresponding to the CSV. As a consequence we will now attempt to fix corruptions in your CSV without any perceptible downtime for your application. <BR /> <H2> Detection of Corruptions – ChkDsk /Scan: </H2> <BR /> The following is the workflow on Windows Server 2012 R2 or later&nbsp;systems to scan for NTFS corruptions: <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90468iC8F09F92FA42DDE4" /> <BR /> <BR /> <STRONG> Note: </STRONG> If the system is never idle it is possible that the ChkDsk scan will never be run. In this case the administrator will need to invoke this operation manually. To invoke this operation manually, on an elevated command prompt run the following: <BR /> chkdsk.exe &lt;CSV mount point name&gt; /scan <BR /> <P> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90469iE59EE4126C648C73" /> </P> <BR /> <BR /> <H2> Resolution of CSV corruptions during Physical Disk Resource IsAlive Checks: </H2> <BR /> The following is the CSV workflow in Windows Server 2012 R2&nbsp;or later to fix corruptions: <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90470iAB5AAB63872F4CB8" /> <BR /> <BR /> <STRONG> Note: </STRONG> In the rare event that a single CSV corruption takes greater than 15 seconds to fix, the above workflow will not resolve the error. &nbsp;In this case the administrator will need to manually fix this error. &nbsp;A CSV does not need to be placed in maintenance or redirected mode before invoking ChkDsk. The CSV will re-establish its state automatically once the ChkDsk run has completed. To invoke this operation manually, on an elevated command prompt run the following: <BR /> chkdsk.exe &lt;CSV mount point name&gt; /SpotFix <BR /> <H2> Running Defrag or ChkDsk through Repair-ClusterSharedVolume cmdlet: </H2> <BR /> Running Defrag or ChkDsk on your CSV through the Repair-ClusterSharedVolume cmdlet is deprecated with Windows Server 2012 R2, and the cmdlet has been removed completely in Windows Server 2016. It is instead highly encouraged to directly use either Defrag.exe or ChkDsk.exe for your CSV, using the procedure indicated in the preceding sections. While not recommended, the use of the Repair-ClusterSharedVolume cmdlet is still supported by Microsoft. To use this cmdlet to run ChkDsk or Defrag, run the following on a new elevated Windows PowerShell console: <BR /> <A href="#" target="_blank"> Repair-ClusterSharedVolume </A> &lt;Cluster Disk Name&gt; -ChkDsk –Parameters &lt;ChkDsk parameters&gt; <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90471i9C857D644DA09E13" /> <BR /> <A href="#" target="_blank"> Repair-ClusterSharedVolume </A> &lt;Cluster Disk Name&gt; –Defrag –Parameters &lt;Defrag parameters&gt; <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90472i485F5AB895A2BE12" /> <BR /> <BR /> You can determine the Cluster Disk Name corresponding to your CSV using the Get-ClusterSharedVolume cmdlet by running the following: <BR /> Get-ClusterSharedVolume | fl * <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90473i10A44439B04BC9D5" /> <BR /> <BR /> Thanks!
<BR /> <BR /> Subhasish Bhattacharya <BR /> Senior Program Manager <BR /> Clustering and High Availability <BR /> Microsoft </BODY></HTML> Fri, 15 Mar 2019 21:41:49 GMT https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/how-to-run-chkdsk-and-defrag-on-cluster-shared-volumes-in/ba-p/371905 Rob Hindman 2019-03-15T21:41:49Z Understanding the Repair Active Directory Object Recovery Action https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/understanding-the-repair-active-directory-object-recovery-action/ba-p/371891 <P><STRONG> First published on MSDN on Dec 13, 2013 </STRONG> </P> <P>&nbsp;</P> <P>One of the responsibilities of the cluster Network Name resource is to rotate the password of the computer object in Active Directory associated with it.&nbsp; When the Network Name resource&nbsp;is online,&nbsp;it will rotate the&nbsp;password according to domain and local machine policy (which is 30 days by default).</P> <P>&nbsp;</P> <P>If the&nbsp;password is different from what is stored in the cluster database,&nbsp;the cluster service will be unable to log on to the computer object and the Network Name will&nbsp;fail to&nbsp;come online.&nbsp; This may also&nbsp;cause issues such as Kerberos errors, failure to register in a secure DNS zone, and live migration failures.</P> <P><BR />The Repair Active Directory Object option is a recovery tool to re-synchronize&nbsp;the password for cluster computer objects.&nbsp;&nbsp;It can be found in Failover Cluster Manager (CluAdmin.msc)&nbsp;by right-clicking on the Network Name, selecting More Actions…, and then clicking Repair Active Directory Object.</P> <P>&nbsp;</P> <UL> <UL> <LI> <DIV><STRONG>Cluster Name Object (CNO) </STRONG> - The CNO is the computer object associated with the Cluster Name resource.&nbsp; When using Repair on the Cluster Name, it will use the credentials of the currently logged on user and reset the computer object's password.&nbsp; To run Repair, you must have the "Reset Password" permissions to the CNO computer object.</DIV> </LI> </UL> </UL> <UL> <UL> <LI> <DIV><STRONG>Virtual Computer Object (VCO) </STRONG> - The CNO is responsible for managing the passwords on all other computer objects (VCO's) for other cluster network names in the cluster.&nbsp; If the password for a VCO falls out of sync, the CNO will reset the password and self-heal automatically.&nbsp; Therefore, there is no need to run Repair to reset the password for a VCO.&nbsp; In Windows Server 2012 a Repair action was added for all other cluster Network Names, and it is a little different.&nbsp; Repair&nbsp;will check to see if the associated computer object exists in Active Directory.&nbsp; If the VCO has been accidentally deleted, using Repair will re-create the computer object.&nbsp; The recommended process to recover deleted computer objects is&nbsp;with the AD Recycle Bin feature; using Repair&nbsp;to re-create computer objects when they have been deleted should be a last-resort recovery action.&nbsp; This is because some applications store attributes in the computer object (namely MSMQ), and re-creating the&nbsp;computer object&nbsp;will break the application.&nbsp; Repair is a safe action to perform&nbsp;on any SQL Server or File Server deployment.&nbsp; The CNO must have "Create Computer Objects" permissions on the OU in which it resides to recreate the VCO's.</DIV> </LI> </UL> </UL> <P>&nbsp;</P> <P><span class="lia-inline-image-display-wrapper lia-image-align-inline" style="width: 551px;"><img
src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90460iE497D82F69FC213A/image-size/large?v=v2&amp;px=999" role="button" /></span></P> <P>&nbsp;</P> <P>To run Repair, the Network Name resource must be in a "Failed" or "Offline" state.&nbsp;&nbsp; Otherwise the option will be grayed out.</P> <P><BR />Repair is only available through the Failover Cluster Manager snap-in, there is no Powershell cmdlet available to script the action.</P> <P>&nbsp;</P> <P>If you are running Windows Server 2012 and find that you are having to repeatedly run Repair every ~30 days, ensure you have hotfix KB2838043 installed.</P> <P>&nbsp;</P> <P>Matt Kurjanowicz <BR />Senior Software Development Engineer <BR />Clustering &amp; High-Availability <BR />Microsoft</P> <P>&nbsp;</P> <P>&nbsp;</P> Wed, 07 Aug 2019 18:52:18 GMT https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/understanding-the-repair-active-directory-object-recovery-action/ba-p/371891 John Marlin 2019-08-07T18:52:18Z Understanding the state of your Cluster Shared Volumes https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/understanding-the-state-of-your-cluster-shared-volumes/ba-p/371889 <HTML> <HEAD></HEAD><BODY> <STRONG> First published on MSDN on Dec 05, 2013 </STRONG> <BR /> Cluster Shared Volumes (CSV) is the clustered file system for the Microsoft Private cloud, first introduced in Windows Server 2008 R2. In Windows Server 2012, we radically improved the CSV architecture. We presented a deep dive of these architecture improvements at <A href="#" target="_blank"> TechEd 2012 </A> . Building on this new and improved architecture, in Windows Server 2012 R2, we have introduced several <A href="#" target="_blank"> new CSV features </A> . In this blog, I am going to discuss one of these new features – the new Get-ClusterSharedVolumeState Windows Server Failover Clustering PowerShell® cmdlet. This cmdlet enables you to view the state of your CSV. Understanding the state of your CSV is useful in troubleshooting failures as well as optimizing the performance of your CSV. In the remainder of this blog, I will explain how to use this cmdlet as well as how to interpret the information provided by the cmdlet. <BR /> <H2> Get-ClusterSharedVolumeState Windows PowerShell® cmdlet </H2> <BR /> The Get-ClusterSharedVolumeState cmdlet allows you to view the state of your CSV on a node in the cluster. Note that the state of your CSV can vary between the nodes of a cluster. Therefore, it might be useful to determine the state of your CSV on multiple or all nodes of your cluster. 
<BR /> <BR /> To use the <A href="#" target="_blank"> Get-ClusterSharedVolumeState </A> cmdlet, open a new Windows PowerShell console and run the following: <BR /> <UL> <BR /> <LI> To view the state of all CSVs on all the nodes of your cluster </LI> <BR /> </UL> <BR /> Get-ClusterSharedVolumeState <BR /> <UL> <BR /> <LI> To view the state of all CSVs on a subset of the nodes in your cluster </LI> <BR /> </UL> <BR /> Get-ClusterSharedVolumeState –Node clusternode1,clusternode2 <BR /> <UL> <BR /> <LI> To view the state of a subset of CSVs on all the nodes of your cluster </LI> <BR /> </UL> <BR /> Get-ClusterSharedVolumeState –Name "Cluster Disk 2","Cluster Disk 3" <BR /> <P> OR </P> <BR /> <BR /> Get-ClusterSharedVolume "Cluster Disk 2" | Get-ClusterSharedVolumeState <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90445i5F0FFBA71525FC80" /> <BR /> <H2> Understanding the state of your CSV </H2> <BR /> The Get-ClusterSharedVolumeState cmdlet output provides two important pieces of information for a particular CSV – the state of the CSV and the reason why the CSV is in that particular state. There are three states of a CSV – Direct, File System Redirected and Block Redirected. I will now examine the output of this cmdlet for each of these states. <BR /> <H2> Direct Mode </H2> <BR /> In Direct Mode, I/O operations from the application on the cluster node can be sent directly to the storage. It therefore bypasses the NTFS or ReFS volume stack. <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90446iB7BBB8457FCB7BE1" /> <BR /> <H2> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90447i3C5A4FAD9497E49A" /> </H2> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90448i0039DFB6DCDD279B" /> <BR /> <H2> File System Redirected Mode </H2> <BR /> In File System Redirected mode, I/O on a cluster node is redirected at the top of the CSV pseudo-file system stack over SMB to the disk. This traffic is written to the disk via the NTFS or ReFS file system stack on the coordinator node. <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90449i23148B8ECA5968D9" /> <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90450iFEA565AA047A576E" /> <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90451i102AED84E0D81CC0" /> <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90452i1F662538170564A6" /> <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90453iDB5348023EAA2146" /> <BR /> <BR /> <STRONG> Note: </STRONG> <BR /> <UL> <BR /> <LI> When a CSV is in File System Redirected Mode, I/O for the volume will not be cached in the <A href="#" target="_blank"> CSV Block Cache </A> . </LI> <BR /> <LI> Data deduplication occurs on a per file basis. Therefore, when a file on a CSV volume is deduped, all I/O for that file will occur in File System Redirected mode. I/O for the file will not be cached in the CSV Block Cache – it will instead be cached in the Deduplication Cache. For the remaining non-deduped files, CSV will be in direct mode. The state of the CSV will be reflected as being in <STRONG> Direct </STRONG> mode.
</LI> <BR /> <LI> The Failover Cluster Manager will show a volume as in <STRONG> Redirected Access </STRONG> only when it is in File System Redirected Mode and the <STRONG> FileSystemRedirectedIOReason </STRONG> is <STRONG> UserRequest </STRONG> . </LI> <BR /> </UL> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90454i3787AF70569147B4" /> <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90455i84A5EEE69B377EBE" /> <BR /> <H2> Block Redirected Mode </H2> <BR /> In Block level redirected mode, I/O passes through the local CSVFS proxy file system stack and is written directly to Disk.sys on the coordinator node. As a result, it avoids traversing the NTFS/ReFS file system stack twice. <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90456iC49EDD1AE977E635" /> <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90457i3FA5A74E49468ECD" /> <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90458i37318E0CBE86900C" /> <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90459iACEC922FE952F2F2" /> <BR /> <BR /> In conclusion, the Get-ClusterSharedVolumeState cmdlet is a powerful tool that enables you to understand the state of your Cluster Shared Volume and thus troubleshoot failures and optimize the performance of your private cloud storage infrastructure. <BR /> <BR /> Thanks! <BR /> Subhasish Bhattacharya <BR /> Senior Program Manager <BR /> Clustering and High Availability <BR /> Microsoft </BODY></HTML> Fri, 15 Mar 2019 21:39:31 GMT https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/understanding-the-state-of-your-cluster-shared-volumes/ba-p/371889 Rob Hindman 2019-03-15T21:39:31Z Cluster Shared Volume (CSV) Inside Out https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/cluster-shared-volume-csv-inside-out/ba-p/371872 <P><STRONG> First published on MSDN on Dec 02, 2013 </STRONG></P> <P><BR />In this blog we will take a look under the hood of the cluster file system in Windows Server 2012 R2 called Cluster Shared Volumes (CSV). This blog post&nbsp;is targeted at developers and ISV’s who are looking to integrate their storage solutions with CSV. <BR /><BR />Note: Throughout this blog, I will refer to C:\ClusterStorage assuming that Windows is installed on the C:\ drive. Windows can be installed on any available drive and the CSV namespace will be built on the system drive, but instead of using %SystemDrive%\ClusterStorage\ I’ve used C:\ClusterStorage for better readability since C:\ is used as the system drive most of the time.</P> <P>&nbsp;</P> <P><STRONG><FONT size="4" color="#000000">Components</FONT></STRONG></P> <P><BR />Cluster Shared Volume in Windows Server 2012 is a completely re-architected solution from Cluster Shared Volumes you knew in Windows Server 2008 R2. Although it may look similar in the user experience – just a bunch of volumes mapped under C:\ClusterStorage\, with the regular Windows file system interface used to work with the files on these volumes – under the hood, these are two completely different architectures. One of the main goals was to expand CSV beyond the Hyper-V workload: in Windows Server 2012, CSV also supports Scale-out File Server, and in Windows Server 2012 R2, CSV is also supported with SQL Server 2014.
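<P>Since the CSV namespace is surfaced as ordinary directories, you can browse it from any node with standard tooling. The following is a trivial sketch, paired with the cluster cmdlet that maps each cluster disk back to its mount point (the SharedVolumeInfo property path is the commonly used one; verify it on your build):</P> <DIV><FONT color="#000080"># Each CSV volume appears as a mount point under the ClusterStorage root <BR /> Get-ChildItem -Path C:\ClusterStorage <BR /> <BR /> # Map each cluster disk to its friendly mount point name <BR /> Get-ClusterSharedVolume | ForEach-Object { $_.Name + ' -&gt; ' + $_.SharedVolumeInfo.FriendlyVolumeName }</FONT></DIV>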
<BR /><BR />First, let us look under the hood of CsvFs at the components that constitute the solution.</P> <P style="text-align: center;"><BR /><span class="lia-inline-image-display-wrapper lia-image-align-inline" style="width: 999px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90437i40C1F928B1249AC9/image-size/large?v=v2&amp;px=999" role="button" /></span> <BR /><BR /><STRONG> Figure </STRONG> <STRONG> 1: </STRONG></P> <P style="text-align: center;"><STRONG>CSV Components and Data Flow Diagram </STRONG></P> <P><BR /><BR />The diagram above shows a 3 node cluster. There is one shared disk that is visible to Node 1 and Node 2. Node 3 in this diagram has no direct connectivity to the storage. The disk was first clustered and then added to the Cluster Shared Volume. From the user’s perspective, everything will look the same as in Windows Server 2008 R2. On every cluster node you will find a mount point to the volume: C:\ClusterStorage\Volume1. The “VolumeX” naming can be changed; just use Windows Explorer and rename it like you would any other directory.&nbsp; CSV will then take care of synchronizing the updated name around the cluster to ensure all nodes are consistent.&nbsp; Now let’s look at the components that are backing these mount points. <BR /><BR /><EM> Terminology </EM></P> <P><EM> The node where NTFS for the clustered CSV disk is mounted is called the Coordinator Node. In this context, any other node that does not have the clustered disk mounted is called a Data Server (DS). Note that the coordinator node is always a data server node at the same time; in other words, the coordinator is a special data server node where NTFS is mounted. </EM></P> <P>&nbsp;</P> <P><EM> If you have multiple disks in CSV, you can place them on different cluster nodes. The node that hosts a disk will be a Coordinator Node only for the volumes that are located on that disk. Since each node might be hosting a disk, each of them might be a Coordinator Node, but for different disks. So technically, to avoid ambiguity, we should always qualify “Coordinator Node” with the volume name. For instance we should say: “Node 2 is a Coordinator Node for the Volume1”. For simplicity, most of the examples we will go through in this blog post will have only one CSV disk in the cluster, so we will drop the qualification and just say Coordinator Node to refer to the node that has this disk online. </EM></P> <P>&nbsp;</P> <P><EM> Sometimes we will use the terms “disk” and “volume” interchangeably because in the samples we will be going through, one disk will have only one NTFS volume, which is the most common deployment configuration. In practice, you can create multiple volumes on a disk and CSV fully supports that as well. When you move disk ownership from one cluster node to another, all the volumes will travel along with the disk and any given node will be the coordinator for all volumes on a given disk. Storage Spaces would be one exception from that model, but we will ignore that possibility for now. </EM></P> <P><BR />This diagram is complicated, so let’s try to break it up into pieces, discuss each piece separately, and then hopefully the whole picture will make more sense. <BR /><BR />On Node 2, you can see the following stack that represents the mounted NTFS.
The cluster guarantees that only one node has NTFS in the state where it can write to the disk; this is important because NTFS is not a clustered file system.&nbsp; CSV provides a layer of orchestration that enables NTFS or ReFS (with Windows Server 2012 R2) to be accessed concurrently by multiple servers. The following blog post explains how the cluster leverages SCSI-3 Persistent Reservation commands with disks to implement that guarantee: <A href="https://gorovian.000webhostapp.com/?exam=t5/Failover-Clustering/Cluster-Shared-Volumes-CSV-Disk-Ownership/ba-p/371352" target="_blank" rel="noopener">https://gorovian.000webhostapp.com/?exam=t5/Failover-Clustering/Cluster-Shared-Volumes-CSV-Disk-Ownership/ba-p/371352</A>.</P> <P style="text-align: center;"><BR /><BR /><span class="lia-inline-image-display-wrapper lia-image-align-inline" style="width: 158px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90438i9ECC8FDF01DC3DBD/image-size/large?v=v2&amp;px=999" role="button" /></span> <BR /><BR /><STRONG> Figure 2:</STRONG></P> <P style="text-align: center;"><STRONG>CSV NTFS stack </STRONG></P> <P><BR /><BR />The cluster makes this volume hidden so that Volume Manager (Volume in the diagram above) does not assign a volume GUID to this volume and there will be no drive letter assigned. You also would not see this volume using mountvol.exe or using the <A href="#" target="_blank" rel="noopener"> FindFirstVolume() </A> and <A href="#" target="_blank" rel="noopener"> FindNextVolume() </A> Win32 APIs. <BR /><BR />On the NTFS stack the cluster will attach an instance of a file system mini-filter driver called CsvFlt.sys at the altitude 404800. You can see that filter attached to the NTFS volume used by CSV if you run the following command:</P> <DIV><BR /> <P><FONT color="#000080">&gt;fltmc.exe instances</FONT></P> <P><BR /><FONT color="#000080">Filter&nbsp; &nbsp; &nbsp; &nbsp;Volume Name&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Altitude&nbsp;&nbsp;&nbsp;Instance Name</FONT></P> <P><FONT color="#000080">---------&nbsp; ---------------------------&nbsp; &nbsp; &nbsp; ---------&nbsp; ----------------------</FONT></P> <P><FONT color="#999999"><EM>&lt;skip&gt;</EM></FONT><BR /><FONT color="#000080">CsvFlt&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;\Device\HarddiskVolume7&nbsp; &nbsp;404800&nbsp;&nbsp;&nbsp;&nbsp; CsvFlt Instance</FONT></P> <P><FONT color="#999999"><EM>&lt;skip&gt;</EM></FONT></P> </DIV> <P><BR />Applications are not expected to access the NTFS stack and we even go an extra mile to block access to this volume from user mode applications. CsvFlt will check all create requests coming from user mode against the security descriptor that is kept in the cluster public property SharedVolumeSecurityDescriptor. You can use the PowerShell cmdlet “Get-Cluster | fl SharedVolumeSecurityDescriptor” to get to that property.
The output of this PowerShell cmdlet shows the value of the security descriptor in self-relative binary format (<A href="#" target="_blank" rel="noopener"> http://msdn.microsoft.com/en-us/library/windows/desktop/aa374807(v=vs.85).aspx </A>):</P> <DIV><BR /><FONT color="#000080">PS &gt; Get-Cluster | fl SharedVolumeSecurityDescriptor </FONT></DIV> <DIV><FONT color="#000080">SharedVolumeSecurityDescriptor : {1, 0, 4, 128...}</FONT></DIV> <P><BR />CsvFlt plays several roles: <BR /><BR /></P> <UL> <UL> <LI>Provides an extra level of protection for the hidden NTFS volume used for CSV</LI> </UL> </UL> <UL> <UL> <LI>Helps provide a local volume experience (after all CsvFs does look like a local volume). For instance, you cannot open the volume over SMB or read the USN journal. To enable these kinds of scenarios, CsvFs oftentimes marshals the operation that needs to be performed over to CsvFlt, disguising it behind a tunneling file system control. CsvFlt is responsible for converting the tunneled information back to the original request before forwarding it down the stack to NTFS.</LI> </UL> </UL> <UL> <UL> <LI>It implements several mechanisms to help coordinate certain states across multiple nodes. We will touch on them in future posts; File Revision Number is one example.</LI> </UL> </UL> <P>The next stack we will look at is the system volume stack. In the diagram above you see this stack only on the coordinator node, which has NTFS mounted. In practice, exactly the same stack exists on all nodes.</P> <P style="text-align: center;"><BR /><BR /><span class="lia-inline-image-display-wrapper lia-image-align-inline" style="width: 158px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90439i31E8E7E5D3F17E72/image-size/large?v=v2&amp;px=999" role="button" /></span> <BR /><BR /><STRONG> Figure 3:</STRONG></P> <P style="text-align: center;"><STRONG>System Volume Stack </STRONG></P> <P><BR /><BR />The CSV Namespace Filter (CsvNsFlt.sys) is a file system mini-filter driver at an altitude of 404900:</P> <DIV><BR /><FONT color="#000080">&gt;fltmc instances </FONT></DIV> <DIV><BR /><FONT color="#000080">Filter&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Volume Name&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Altitude&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Instance Name </FONT><BR /><FONT color="#000080">------------&nbsp; ---------------------&nbsp; ------------&nbsp; ---------------------- </FONT><BR /><EM><FONT color="#000080"><FONT color="#999999">&lt;skip&gt;</FONT> </FONT></EM><BR /><FONT color="#000080">CsvNSFlt&nbsp; &nbsp; &nbsp; C:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;404900&nbsp; &nbsp; &nbsp; &nbsp; CsvNSFlt Instance </FONT><BR /><EM><FONT color="#999999">&lt;skip&gt;</FONT></EM></DIV> <P><BR />CsvNsFlt plays the following roles: <BR /><BR /></P> <UL> <UL> <LI>It protects C:\ClusterStorage by blocking unauthorized attempts that are not coming from the cluster service to delete or create any files or subfolders in this folder or change any attributes on the files. Other than opening these folders, about the only other operation that is not blocked is renaming them. You can use the command prompt or explorer to rename C:\ClusterStorage\Volume1 to something like C:\ClusterStorage\Accounting. &nbsp;The directory name will be synchronized and updated on all nodes in the cluster.</LI> </UL> </UL> <UL> <UL> <LI>It helps us to dispatch the block level redirected IO.
We will cover this in more detail when we talk about the block level redirected IO later on in this post.</LI> </UL> </UL> <P>The last stack we will look at is the stack of the CSV file system. Here you will see two modules: CSV Volume Manager (csvvbus.sys) and CSV File System (CsvFs.sys). CsvFs is a file system driver, and mounts exclusively to the volumes surfaced by CsvVbus.</P> <P style="text-align: center;"><BR /><BR /><span class="lia-inline-image-display-wrapper lia-image-align-inline" style="width: 184px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90440iB2F77B1E45FFEC2D/image-size/large?v=v2&amp;px=999" role="button" /></span> <BR /><BR /><STRONG> Figure 5:</STRONG></P> <P style="text-align: center;"><STRONG>CsvFs stack</STRONG></P> <P>&nbsp;</P> <H2>Data Flow</H2> <P><BR />Now that we are familiar with the components and how they are related to each other, let’s look at the data flow. <BR /><BR />First let’s look at how <STRONG> Metadata </STRONG> flows. Below you can see the same diagram as in Figure 1. I’ve kept only the arrows and blocks that are relevant to the metadata flow and removed the rest from the diagram.</P> <P style="text-align: center;"><BR /><BR /><span class="lia-inline-image-display-wrapper lia-image-align-inline" style="width: 998px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90441iA8B628385AD9ABA5/image-size/large?v=v2&amp;px=999" role="button" /></span> <BR /><BR /><STRONG> Figure </STRONG> <STRONG> 6:</STRONG></P> <P style="text-align: center;"><STRONG>Metadata Flow </STRONG></P> <P><BR /><BR />Our definition of a metadata operation is everything except read and write. Examples of metadata operations would be create file, close file, rename file, change file attributes, delete file, change file size, any file system control, etc. Some writes may also, as a side effect, cause a metadata change. For instance, an extending write will cause CsvFs to extend all or some of the following: file allocation size, file size and valid data length. A read might cause CsvFs to query some information from NTFS. <BR /><BR />In the diagram above you can see that metadata from any node goes to the NTFS stack on Node 2. Data server nodes (Node 1 and Node 3) use Server Message Block (SMB) as the protocol to forward metadata over. <BR /><BR />Metadata is always forwarded to NTFS. On the coordinator node, CsvFs will forward metadata IO directly to the NTFS volume, while other nodes will use SMB to forward the metadata over the network.</P> <P>Next, let’s look at the data flow for the <STRONG> Direct IO </STRONG> . The following diagram is produced from the diagram in Figure 1 by removing any blocks and lines that are not relevant to the Direct IO. By definition, Direct IO is the reads and writes that never go over the network, but go from CsvFs through CsvVbus straight to the disk stack.
To make sure there is no ambiguity, I’ll repeat it again: Direct IO bypasses the volume stack and goes directly to the disk.</P> <P>&nbsp;</P> <P>&nbsp;</P> <P style="text-align: center;"><span class="lia-inline-image-display-wrapper lia-image-align-inline" style="width: 747px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90442i0D1524436945D2E9/image-size/large?v=v2&amp;px=999" role="button" /></span></P> <P style="text-align: center;"><BR /><STRONG> Figure </STRONG> <STRONG> 7:</STRONG></P> <P style="text-align: center;"><STRONG>Direct IO Flow </STRONG></P> <P><BR /><BR />Both Node 1 and Node 2 can see the shared disk - they can send reads and writes directly to the disk, completely avoiding sending data over the network. Node 3 is not in the diagram in Figure 7 since it cannot perform Direct IO, but it is still part of the cluster and will use block level redirected IO for reads and writes. <BR /><BR />The next diagram shows how a <STRONG> File System </STRONG> <STRONG> Redirected IO </STRONG> request flows. The diagram and data flow for the redirected IO are very similar to those for the metadata in Figure 6:</P> <P style="text-align: center;"><BR /><BR /><span class="lia-inline-image-display-wrapper lia-image-align-inline" style="width: 999px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90443i00E8EDAC2C8C938C/image-size/large?v=v2&amp;px=999" role="button" /></span> <BR /><BR /><STRONG> Figure 8<BR />File System Redirected IO Flow </STRONG></P> <P><BR /><BR />Later we will discuss when CsvFs uses the file system redirected IO to handle reads and writes and how it compares to what we see in the next diagram – <STRONG> Block Level Redirected IO </STRONG> :</P> <P style="text-align: center;"><BR /><BR /><span class="lia-inline-image-display-wrapper lia-image-align-inline" style="width: 999px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90444i603CA859EFB94CEF/image-size/large?v=v2&amp;px=999" role="button" /></span> <BR /><BR /><STRONG> Figure 9:</STRONG></P> <P style="text-align: center;"><STRONG>Block Level Redirected IO Flow </STRONG></P> <P><BR /><BR />Note that in this diagram I have completely removed the CsvFs stack and the CSV NTFS stack from the Coordinator Node, leaving only the system volume NTFS stack. The CSV NTFS stack is removed because Block Level Redirected IO completely bypasses it and goes to the disk (yes, like Direct IO it bypasses the volume stack and goes straight to the disk) below the NTFS stack. The CsvFs stack is removed because on the coordinating node CsvFs would never use Block Level Redirected IO, and would always talk to the disk. The reason why Node 3 would use Redirected IO is because Node 3 does not have physical connectivity to the disk. A curious reader might wonder why Node 1, which can see the disk, would ever use Block Level Redirected IO. There are at least two cases when this might happen. Although the disk might be visible on the node, it is possible that IO requests will fail because the adapter or storage network switch is misbehaving. In this case, CsvVbus will first attempt to send IO to the disk and on failure will forward the IO to the Coordinator Node using the Block Level Redirected IO.
The other example is Storage Spaces - if the disk is a Mirrored Storage Space, then CsvFs will never use Direct IO on a data server node, but instead it will send the block level IO to the Coordinating Node using Block Level Redirected IO.&nbsp; In Windows Server 2012 R2 you can use the <A href="#" target="_self">Get-ClusterSharedVolumeState</A> cmdlet to query the CSV state (direct / file level redirected / block level redirected) and if redirected, it will state why. <BR /><BR />Note that CsvFs sends the Block Level Redirected IO to the CsvNsFlt filter attached to the system volume stack on the Coordinating Node. This filter dispatches this IO directly to the disk, bypassing NTFS and the volume stack, so no other filters below CsvNsFlt on the system volume will see that IO. Since CsvNsFlt sits at a very high altitude, in practice no one besides this filter will see these IO requests. This IO is also completely invisible to the CSV NTFS stack. You can think of Block Level Redirected IO as Direct IO that CsvVbus ships to the Coordinating Node, where, with the help of CsvNsFlt, it is dispatched directly to the disk, just as CsvVbus dispatches Direct IO directly to the disk locally.</P> <P>&nbsp;</P> <H2>What are these SMB shares?</H2> <P><BR />CSV uses the Server Message Block (SMB) protocol to communicate with the Coordinator Node. As you know, SMB3 requires certain configuration to work. For instance, it requires file shares. Let’s take a look at how the cluster configures SMB to enable CSV. <BR /><BR />If you dump the list of SMB file shares on a cluster node with CSV volumes, you will see the following:</P> <DIV><BR /> <P><FONT color="#000080">&gt; Get-SmbShare </FONT></P> <P><BR /><FONT color="#000080">Name&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; ScopeName&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Path&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Description </FONT><BR /><FONT color="#000080">--------&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; -------------&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;----&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; ----------- </FONT><BR /><FONT color="#000080">ADMIN$&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;*&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;C:\Windows&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;Remote Admin </FONT><BR /><FONT color="#000080">C$&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;*&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;C:\&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;Default share </FONT><BR /><FONT color="#000080">ClusterStorage$&nbsp; &nbsp;CLUS030512&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;C:\ClusterStorage&nbsp; &nbsp;Cluster Shared Volumes Def... </FONT><BR /><FONT color="#000080">IPC$&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; *&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;Remote IPC</FONT></P> </DIV> <P><BR />There is a hidden admin share that is created for CSV, shared as ClusterStorage$. This share is created by the cluster to facilitate remote administration.
You should use it in the scenarios where you would normally use an admin share on any other volume (such as D$). This share is scoped to the Cluster Name. Cluster Name is a special kind of Network Name that is designed to be used to manage a cluster. You can learn more about Network Name in the following <A href="https://gorovian.000webhostapp.com/?exam=t5/Failover-Clustering/DNS-Registration-with-the-Network-Name-Resource/ba-p/371482" target="_self">blog</A> post.&nbsp; You can access this share using the Cluster Name, i.e. <EM>\\&lt;cluster name&gt;\ClusterStorage$</EM> <BR /><BR />Since this is an admin share, it is ACL’d so only members of the Administrators group have full access to this share. In the output, the access control list is defined using Security Descriptor Definition Language (SDDL). You can learn more about SDDL here <A href="#" target="_blank" rel="noopener"> http://msdn.microsoft.com/en-us/library/windows/desktop/aa379567(v=vs.85).aspx </A></P> <DIV><BR /><FONT color="#000080">ShareState&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; : Online </FONT><BR /><FONT color="#000080">ClusterType&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; : ScaleOut </FONT><BR /><FONT color="#000080">ShareType&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; : FileSystemDirectory </FONT><BR /><FONT color="#000080">FolderEnumerationMode : Unrestricted </FONT><BR /><FONT color="#000080">CachingMode&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;: Manual </FONT><BR /><FONT color="#000080">CATimeout&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; : 0 </FONT><BR /><FONT color="#000080">ConcurrentUserLimit&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;: 0 </FONT><BR /><FONT color="#000080">ContinuouslyAvailable&nbsp; &nbsp; &nbsp; : False </FONT><BR /><FONT color="#000080">CurrentUsers&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;: 0 </FONT><BR /><FONT color="#000080">Description&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; : Cluster Shared Volumes Default Share </FONT><BR /><FONT color="#000080">EncryptData&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;: False </FONT><BR /><FONT color="#000080">Name&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;: ClusterStorage$ </FONT><BR /><FONT color="#000080">Path&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;: C:\ClusterStorage </FONT><BR /><FONT color="#000080">Scoped&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; : True </FONT><BR /><FONT color="#000080">ScopeName&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;: CLUS030512 </FONT><BR /><FONT color="#000080">SecurityDescriptor&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;: D:(A;;FA;;;BA)</FONT></DIV> <P><BR />There are also a couple of hidden shares that are used by CSV. You can see them if you add the <STRONG> IncludeHidden </STRONG> parameter to the Get-SmbShare cmdlet. These shares are used only on the Coordinator Node.
Other nodes either do not have these shares or these shares are not used:</P> <DIV><BR /><FONT color="#000080">&gt; Get-SmbShare -IncludeHidden </FONT></DIV> <DIV><BR /><FONT color="#000080">Name&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;ScopeName&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;Path&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Description </FONT><BR /><FONT color="#000080">----&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;---------&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;----&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; ----------- </FONT><BR /><FONT color="#000080">17f81c5c-b533-43f0-a024-dc...&nbsp; &nbsp; *&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;\\?\GLOBALROOT\Device\Hard ... </FONT><BR /><FONT color="#000080">ADMIN$&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;*&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; C:\Windows&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;Remote Admin </FONT><BR /><FONT color="#000080">C$&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;*&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; C:\&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Default share </FONT><BR /><FONT color="#000080">ClusterStorage$&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;VPCLUS030512&nbsp; &nbsp; C:\ClusterStorage&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Cluster Shared Volumes Def...
</FONT><BR /><FONT color="#000080">CSV$&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; *&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;C:\ClusterStorage </FONT><BR /><FONT color="#000080">IPC$&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;*&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;Remote IPC</FONT></DIV> <P><BR />Each Cluster Shared Volume hosted on a coordinating node cluster creates a share with a name that looks like a GUID. This is used by CsvFs to communicate to the hidden CSV NTFS stack on the coordinating node. This share points to the hidden NTFS volume used by CSV. Metadata and the File System Redirected IO are flowing to the Coordinating Node using this share.</P> <DIV><BR /><FONT color="#000080">ShareState&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;: Online </FONT><BR /><FONT color="#000080">ClusterType&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; : CSV </FONT><BR /><FONT color="#000080">ShareType&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; : FileSystemDirectory </FONT><BR /><FONT color="#000080">FolderEnumerationMode&nbsp; &nbsp; : Unrestricted </FONT><BR /><FONT color="#000080">CachingMode&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; : Manual </FONT><BR /><FONT color="#000080">CATimeout&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;: 0 </FONT><BR /><FONT color="#000080">ConcurrentUserLimit&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; : 0 </FONT><BR /><FONT color="#000080">ContinuouslyAvailable&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; : False </FONT><BR /><FONT color="#000080">CurrentUsers&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; : 0 </FONT><BR /><FONT color="#000080">Description&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;: </FONT><BR /><FONT color="#000080">EncryptData&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; : False </FONT><BR /><FONT color="#000080">Name&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; : 17f81c5c-b533-43f0-a024-dc431b8a7ee9-1048576$ </FONT><BR /><FONT color="#000080">Path&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; : <A href="https://gorovian.000webhostapp.com/?exam=\\?\GLOBALROOT\Device\Harddisk2\ClusterPartition1\&quot; data-mce-href=" target="_blank" rel="noopener"> \\?\GLOBALROOT\Device\Harddisk2\ClusterPartition1\ </A> </FONT><BR /><FONT color="#000080">Scoped&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; : False </FONT><BR /><FONT color="#000080">ScopeName&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; : 
* </FONT><BR /><FONT color="#000080">SecurityDescriptor&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; : O:SYG:SYD:(A;;FA;;;SY)(A;;FA;;;S-1-5-21-2310202761-1163001117-2437225037-1002) </FONT><BR /><FONT color="#000080">ShadowCopy&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;: False </FONT><BR /><FONT color="#000080">Special&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;: True </FONT><BR /><FONT color="#000080">Temporary&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;: True</FONT></DIV> <P><BR />On the Coordinating Node you also will see a share with the name CSV$. This share is used to forward Block Level Redirected IO to the Coordinating Node. There is only one CSV$ share on every Coordinating Node:</P> <DIV><BR /><FONT color="#000080">ShareState&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; : Online </FONT><BR /><FONT color="#000080">ClusterType&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;: CSV </FONT><BR /><FONT color="#000080">ShareType&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;: FileSystemDirectory </FONT><BR /><FONT color="#000080">FolderEnumerationMode&nbsp; &nbsp;: Unrestricted </FONT><BR /><FONT color="#000080">CachingMode&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;: Manual </FONT><BR /><FONT color="#000080">CATimeout&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; : 0 </FONT><BR /><FONT color="#000080">ConcurrentUserLimit&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;: 0 </FONT><BR /><FONT color="#000080">ContinuouslyAvailable&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;: False </FONT><BR /><FONT color="#000080">CurrentUsers&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;: 0 </FONT><BR /><FONT color="#000080">Description&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; : </FONT><BR /><FONT color="#000080">EncryptData&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;: False </FONT><BR /><FONT color="#000080">Name&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; : CSV$ </FONT><BR /><FONT color="#000080">Path&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;: C:\ClusterStorage </FONT><BR /><FONT color="#000080">Scoped&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; : False </FONT><BR /><FONT color="#000080">ScopeName&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;: * </FONT><BR /><FONT color="#000080">SecurityDescriptor&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;: O:SYG:SYD:(A;;FA;;;SY)(A;;FA;;;S-1-5-21-2310202761-1163001117-2437225037-1002) </FONT><BR /><FONT color="#000080">ShadowCopy&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;: False </FONT><BR /><FONT color="#000080">Special&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;: True </FONT><BR /><FONT color="#000080">Temporary&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;: True</FONT></DIV> <P><BR />Users are not expected to use these shares - they are ACL’d so only Local System and the Failover Cluster Identity user (CLIUSR) have access to the share. <BR /><BR />All of these shares are temporary - information about these shares is not in any persistent storage, and when a node reboots they will be removed from the Server Service. The cluster takes care of creating the shares during CSV startup.</P> <P>&nbsp;</P> <H2>Conclusion</H2> <P><BR />You can see that&nbsp;Cluster Shared Volumes&nbsp;in Windows Server 2012 R2 is built on a solid foundation of the Windows storage stack, CSVv1, and SMB3. <BR /><BR />Thanks! <BR />Vladimir Petter <BR />Principal Software Development Engineer <BR />Clustering &amp; High-Availability <BR />Microsoft <BR /><BR /><BR /></P> <H2>Additional Resources:</H2> <P><BR />To learn more, here are others in the Cluster Shared Volume (CSV) blog series: <BR /><BR /><A href="https://gorovian.000webhostapp.com/?exam=t5/Failover-Clustering/Cluster-Shared-Volume-CSV-Inside-Out/ba-p/371872" target="_self">Cluster Shared Volume (CSV) Inside Out</A> <BR /><BR /><A href="https://gorovian.000webhostapp.com/?exam=t5/Failover-Clustering/Cluster-Shared-Volume-Diagnostics/ba-p/371908" target="_self">Cluster Shared Volume Diagnostics</A> <BR /><BR /><A href="https://gorovian.000webhostapp.com/?exam=t5/Failover-Clustering/Cluster-Shared-Volume-Performance-Counters/ba-p/371980" target="_self">Cluster Shared Volume Performance Counters</A> <BR /><BR /><A href="https://gorovian.000webhostapp.com/?exam=t5/Failover-Clustering/Cluster-Shared-Volume-Failure-Handling/ba-p/371989" target="_self">Cluster Shared Volume Failure Handling</A> <BR /><BR /><A href="https://gorovian.000webhostapp.com/?exam=t5/Failover-Clustering/Troubleshooting-Cluster-Shared-Volume-Auto-Pauses-8211-Event/ba-p/371994" target="_self">Troubleshooting Cluster Shared Volume Auto-Pauses – Event 5120</A> <BR /><BR /><A href="https://gorovian.000webhostapp.com/?exam=t5/Failover-Clustering/Troubleshooting-Cluster-Shared-Volume-Recovery-Failure-8211/ba-p/371997" target="_self">Troubleshooting Cluster Shared Volume Recovery Failure – System Event 5142</A> <BR /><BR /><A href="https://gorovian.000webhostapp.com/?exam=t5/Failover-Clustering/Cluster-Shared-Volume-A-Systematic-Approach-to-Finding/ba-p/372049" target="_self">Cluster Shared Volume – A Systematic Approach to Finding Bottlenecks</A> <BR /><BR /></P> Tue, 09 Jul 2019 22:22:52 GMT https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/cluster-shared-volume-csv-inside-out/ba-p/371872 Elden Christensen 2019-07-09T22:22:52Z Decoding Bugcheck 0x0000009E https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/decoding-bugcheck-0x0000009e/ba-p/371863 <P><STRONG> First published on MSDN on Nov 13, 2013 </STRONG></P> <P><BR />In the System event log you may find an event similar to the following:</P> <P><BR /><FONT color="#3366FF">Event ID: 1001 </FONT><BR /><FONT color="#3366FF">Source:&nbsp; Microsoft-Windows-WER-SystemErrorReporting </FONT><BR /><FONT color="#3366FF">Description:&nbsp; The computer has rebooted from a bugcheck.&nbsp; The bugcheck was: 0x0000009e (0x0000000000000000, 0x0000000000000000, 0x0000000000000000, 0x0000000000000000).
A dump was saved in: C:\Windows\MEMORY.DMP.</FONT></P> <P><BR />Let's start out by discussing what a STOP 0x9e is.&nbsp; Failover Clustering actively conducts health monitoring of many components at different layers of a server; one of the attributes of a highly available system is having health detection mechanisms in place to detect when something goes wrong and to react.&nbsp; Under some conditions, when an extreme failure occurs, the cluster service may intentionally bugcheck the server in an attempt to recover.&nbsp; The bugcheck will be a USER_MODE_HEALTH_MONITOR (9e) and is invoked by the Failover Cluster kernel mode driver NetFT.sys. <BR /><BR />The first and most important thing to understand is that this is normal cluster health detection and recovery; it is intended recovery behavior.&nbsp; It is not a “bug” in clustering, nor is it a bug in NetFT.sys... it is a feature, not a flaw.&nbsp; I say this because the most common first troubleshooting step I see is that customers apply the latest hotfix for NetFT.sys… and that won’t help. <BR /><BR />By far the most common reason for a 0x9e is that Failover Clustering is conducting health monitoring between the NetFT kernel mode driver and the user mode cluster service.&nbsp; If NetFT stops receiving heartbeats, then user mode is considered to be non-responsive and clustering will bugcheck the box in an effort to force a recovery. <BR /><BR />So the next question is: what caused user mode to become unresponsive?&nbsp; In general, you can troubleshoot this like any other user mode hang… you can set up PerfMon and look for memory leaks, and so on.&nbsp; The most valuable diagnostic tool will be that when clustering bugchecks the box, you can capture a dump and analyze it to reach root cause.&nbsp; This will involve a call to Microsoft support to help debug the dump.</P> <P>&nbsp;</P> <P>One of the questions we receive is: what type of memory dump should be configured (small, kernel, active, or complete)?&nbsp; This is a good question, and not everyone takes it into consideration.&nbsp; As this blog has discussed, we are blue screening due to user mode processes.&nbsp; User mode processes are not contained within a small or kernel memory dump, but they are contained within an active or complete dump.&nbsp; There are very few cases, but still a few, that can be diagnosed from a kernel dump.&nbsp; To properly follow the path to the user mode hang causing this blue screen, the debugging must start in the user mode Resource Hosting Subsystem (RHS) process, necessitating an active or complete dump.</P>
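<P>&nbsp;</P> <P>If you want the node to capture an active dump on the next bugcheck, one approach is to set the dump type under the CrashControl registry key. The sketch below is a hedged example, not guidance from this article: the CrashDumpEnabled and FilterPages value names and meanings are assumptions based on my understanding of Windows Server 2012 R2 and later (CrashDumpEnabled = 1 selects a complete dump, and adding FilterPages = 1 turns it into an active dump). Verify against your OS documentation, and note that a reboot is required for the change to take effect.</P> <PRE># Hedged sketch: switch the server to an Active memory dump.
# Assumption: CrashDumpEnabled = 1 (Complete) plus FilterPages = 1
# yields an Active dump on Windows Server 2012 R2 and later.
$key = 'HKLM:\SYSTEM\CurrentControlSet\Control\CrashControl'
Set-ItemProperty -Path $key -Name CrashDumpEnabled -Value 1 -Type DWord
Set-ItemProperty -Path $key -Name FilterPages -Value 1 -Type DWord</PRE>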
<P><BR />There are a couple of different conditions which can invoke a bugcheck 0x9e.&nbsp; In this blog I will discuss the different parameters logged in the Event ID 1001 and what they mean.</P> <P>&nbsp;</P> <H2>Decoding STOP 0x0000009E</H2> <P><BR />The bugcheck code will have the following format with the following parameters: <BR />Stop 0x0000009E (Parameter1, Parameter2, Parameter3, Parameter4)</P> <P>&nbsp;</P> <H2>Parameter1 value meaning:</H2> <P><BR />The process that failed to satisfy a health check within the configured timeout</P> <P>&nbsp;</P> <H2>Parameter2 value meaning:</H2> <P><BR />A hex value which defines the time in seconds for the timeout which was hit.&nbsp; This details how long it took for the bugcheck to be invoked.</P> <P>&nbsp;</P> <H2>Parameter3 value meaning:</H2> <P>&nbsp;</P> <TABLE> <TBODY> <TR> <TD><STRONG> Value </STRONG></TD> <TD><STRONG> Description </STRONG></TD> </TR> <TR> <TD>0x0000000000000000</TD> <TD>The source of the reason for the bugcheck was not specified.&nbsp; In OS versions prior to Win2012 R2 this will always be the value.</TD> </TR> <TR> <TD>0x0000000000000001</TD> <TD>The node has been bugchecked because the RHS process was attempting to gracefully close and did not complete successfully.</TD> </TR> <TR> <TD>0x0000000000000002</TD> <TD>The node has been bugchecked because a resource did not respond to a resource entry point call within the configured 'DeadlockTimeout' timeout.&nbsp; The node was configured to bugcheck by the 'DebugBreakOnDeadlock' registry key being set to a value of 3.</TD> </TR> <TR> <TD>0x0000000000000003</TD> <TD>The node has been bugchecked because of an unhandled exception in one of the cluster resources, and when attempting to recover, the RHS process did not terminate successfully within 20 minutes.</TD> </TR> <TR> <TD>0x0000000000000004</TD> <TD>The node has been bugchecked because of an unhandled exception in the Resource Hosting Subsystem (RHS), and when attempting to recover, the RHS process did not terminate successfully within 20 minutes.</TD> </TR> <TR> <TD>0x0000000000000005</TD> <TD>The node has been bugchecked because a resource did not respond to a resource entry point call within the 'DeadlockTimeout' timeout (5 minutes by default)&nbsp;and an attempt was made to terminate the RHS process to recover.&nbsp; However, the RHS process did not terminate successfully within the timeout, which is&nbsp;four times the&nbsp;'DeadlockTimeout' timeout (20 minutes by default).</TD> </TR> <TR> <TD>0x0000000000000006</TD> <TD>The node has been bugchecked because a resource type did not respond to a resource entry point call within the 'DeadlockTimeout' timeout and an attempt was made to terminate the RHS process to recover.&nbsp; However, the RHS process did not terminate successfully.</TD> </TR> <TR> <TD>0x0000000000000007</TD> <TD>The node has been bugchecked because of an unhandled exception in the Cluster Service (ClusSvc), and when attempting to recover, the ClusSvc process did not terminate successfully within 20 minutes.</TD> </TR> <TR> <TD>0x0000000000000008</TD> <TD>The node has been bugchecked at the request of another node in the Failover Cluster.</TD> </TR> <TR> <TD>0x0000000000000009</TD> <TD>The node has been bugchecked because the cluster service detected that an internal&nbsp;subcomponent of the cluster service was unresponsive.
The system was configured to bugcheck by the 'HangRecoveryAction' setting being set to a value of 4.</TD> </TR> <TR> <TD>0x000000000000000A</TD> <TD>The node has been bugchecked because the kernel mode NetFT driver did not receive a heartbeat from the user mode Cluster Service within the configured 'ClusSvcHangTimeout' timeout.&nbsp; The recovery action was configured to bugcheck by the 'HangRecoveryAction' cluster common property being set to a value of 3 (default)&nbsp;or 4.</TD> </TR> </TBODY> </TABLE> <P><BR />Note:&nbsp; Parameter3 is a new value introduced in Windows Server 2012 R2 and will always be 0x0000000000000000 in previous releases.</P> <P>&nbsp;</P> <H2>Parameter4 value meaning:</H2> <P><BR />Currently unused / reserved for future use; it will always be 0x0000000000000000. <BR /><BR />Thanks! <BR />Elden Christensen <BR />Principal PM Manager <BR />Clustering &amp; High-Availability <BR />Microsoft</P> Fri, 19 Jul 2019 21:16:38 GMT https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/decoding-bugcheck-0x0000009e/ba-p/371863 Elden Christensen 2019-07-19T21:16:38Z Server Virtualization w/ Windows Server Hyper-V & System Center Jump Start https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/server-virtualization-w-windows-server-hyper-v-system-center/ba-p/371862 <HTML> <HEAD></HEAD><BODY> <STRONG> First published on MSDN on Nov 08, 2013 </STRONG> <BR /> Add Hyper-V to your server virtualization skillset and improve your career options: Register for this free course, led by Microsoft experts <A href="#" target="_blank"> Symon Perriman </A> and Corey Hynes: <A href="#" target="_blank"> Server Virtualization w/ Windows Server Hyper-V &amp; System Center Jump Start </A>.&nbsp; Windows Server 2012 R2 Failover Clustering will be covered in depth, including cluster validation, configuration, management, and best practices for virtualization, from both a Hyper-V and System Center 2012 R2 perspective. This course helps you prepare for the new Microsoft virtualization certification: Microsoft Specialist Exam <A href="#" target="_blank"> 74-409 </A> - Server Virtualization with Windows Server Hyper-V and System Center. Event attendees will get a free voucher for the exam*, normally $150. <BR /> <BR /> Already familiar with other virtualization platforms such as VMware or Citrix? Upgrading virtualization platforms?&nbsp; New to virtualization?&nbsp; If any of these are true, then this course is intended for you. Get expert instruction on Microsoft Server Virtualization with Windows Server 2012 R2 Hyper-V and System Center 2012 R2 Virtual Machine Manager in this two-day Jump Start. You will learn how to configure, manage, and maintain Windows Server 2012 R2 Hyper-V and System Center 2012 R2 Virtual Machine Manager, including networking and storage services. You will also learn how to configure key Microsoft server virtualization features such as Generation 2 Virtual Machines, Replication Extension, Online Export, Cross-Version Live Migration, Online VHDX Resizing, Live Migration Performance tuning, as well as Dynamic Virtual Switch Load Balancing and virtual Receive Side Scaling (vRSS). <P> </P> <A href="#" target="_blank"> Server Virtualization w/ Windows Server Hyper-V &amp; System Center Jump Start </A> <UL> <LI> Date:&nbsp; November 19 &amp; 20, 2013 </LI> <LI> Time: 9:00am – 4:30pm </LI> <LI> Where: Live, online virtual classroom </LI> <LI> Cost: Free! </LI> </UL> Put your career in hyperdrive and <A href="#" target="_blank"> Register now </A>.
<P> Additional Resources: </P> <UL> <LI> Free expert-led <A href="#" target="_blank"> MVA courses on Virtualization </A> </LI> <LI> Download <A href="#" target="_blank"> Windows Server 2012 R2 </A> </LI> <LI> Download <A href="#" target="_blank"> Hyper-V Server 2012 R2 </A> </LI> <LI> Download <A href="#" target="_blank"> System Center 2012 R2 </A> </LI> </UL> <P> <BR /> <BR /> </P> <P> *The number of free exams is limited, so be sure to schedule your appointment to lock in your free exam. Vouchers expire and all exams must be taken by June 30, 2014. </P> </BODY></HTML> Fri, 15 Mar 2019 21:35:55 GMT https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/server-virtualization-w-windows-server-hyper-v-system-center/ba-p/371862 Elden Christensen 2019-03-15T21:35:55Z Windows Server 2012 R2 Virtual Machine Recovery from Network Disconnects https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/windows-server-2012-r2-virtual-machine-recovery-from-network/ba-p/371861 <HTML> <HEAD></HEAD><BODY> <STRONG> First published on MSDN on Sep 04, 2013 </STRONG> <BR /> <H3> Overview </H3> <BR /> <P> Windows Server Failover Clustering has always monitored the running state of virtual machines and the health state of clustered networks and clustered storage.&nbsp; We are furthering the failure detection to include monitoring of the virtual machine network and virtual switch connectivity. </P> <BR /> <P> Windows Server 2012 R2 introduces new functionality that allows a virtual machine (VM) to be moved to another node in a failover cluster, using live migration, if a network that it is using becomes disconnected.&nbsp; This improves availability in cases where a network connection issue would otherwise cut clients off from the services running inside the VM: the VM is moved to a node that can provide it with network access.&nbsp; By default, the Protected Network setting is enabled for all virtual adapters, with the assumption that most networks that a VM uses will be important enough to warrant relocating the VM if it becomes disconnected. </P> <BR /> <P> The live migration of the VM to another node of the cluster will not occur if the destination node doesn’t have the network available that is disconnected on the current cluster node.&nbsp; This avoids moving a virtual machine to a node that doesn’t have the resources that triggered the move in the first place.&nbsp; Another node of the cluster will be selected to move the VM to, unless there are no nodes of the cluster available that have the required network and system resources. </P> <BR /> <P> VM live migrations are queued if there are more VMs affected by a network issue on a host than can be concurrently live migrated.&nbsp; If the disconnected network becomes available again while there are VMs in the queue to be live migrated, the pending live migrations will be canceled.
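</P> <BR /> <P> Because the cluster will only select a destination node where the affected network is available, it can help to review what the cluster knows about its networks before relying on this behavior. A minimal sketch using the in-box Failover Clustering cmdlets (run from any node of the cluster): </P> <PRE># List the cluster networks with their overall state and role
Get-ClusterNetwork | Format-Table Name, State, Role

# Show the per-node state of each network interface the cluster is tracking
Get-ClusterNetworkInterface | Format-Table Node, Network, State</PRE>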
<BR /> <P> The VM’s network adapter settings have a new property in the advanced configuration section that allows you to select whether the network that the adapter is connected to is important enough to the availability of the VM to have it moved if it fails.&nbsp; For instance, if you have an external network where clients connect to the application running inside of the VM, and another network that is used for backups, you can disable the property for the network adapter used for backups but leave it enabled for the external network.&nbsp; If the backup network becomes disconnected, the VM will not be moved.&nbsp; If the client access network is disconnected, the VM will be live migrated to a node that has the network enabled. </P> <BR /> <P> It is important to note that we do recommend using network teaming for any critical networks, for redundancy and seamless handling of many network failures. </P> <BR /> <H3> Walkthrough </H3> <BR /> <P> Let’s walk through some of the concepts to illustrate how this functionality works and ways to configure it. </P> <BR /> <P> The diagram below (Diagram 1) shows a simple 2 node cluster with a VM running on it. </P> <BR /> <P> <EM> (Note: the network configuration depicted in this document is used as an example; the network configuration for your systems may vary depending on the number of adapters, the speed of the adapters, and other network considerations) </EM> </P> <BR /> <P> The parent partition, sometimes referred to as the management partition, has a dedicated network adapter on each node. There is a second adapter on each node that is configured with a Hyper-V virtual switch.&nbsp; The virtual machine has a synthetic network adapter that is configured to connect to the virtual switch. </P> <BR /> <P> If the physical network adapter that the virtual switch is using becomes disconnected, then the virtual machine will be live migrated to node B, since node B still has a connection to the network that the virtual machine uses.&nbsp; The virtual machine can be live migrated from node A to B because the private network between those servers is still functioning. </P> <BR /> <P> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90434i81FD5335BE4E509E" /> </P> <BR /> <P> Diagram 1 </P> <BR /> <H3> Configuring a VM’s virtual network adapter to not cause the VM to be moved if it is disconnected </H3> <BR /> <P> Let’s take the same configuration and add another network adapter to each of the nodes and connect it to another virtual switch on each node (see Diagram 2 below).&nbsp; We then configure the VM with a second virtual adapter and connect it to the new virtual switch. &nbsp;For this scenario, the network may be used for backups, or for communications between VMs for which a short outage doesn’t affect the clients that use the VM.&nbsp; Let’s call this new network “Backup”. </P> <BR /> <P> Because this new network can tolerate short outages, we want to configure the VM’s virtual adapter to not be considered a critical network.&nbsp; That will allow the Backup network to become disconnected without causing the VM to be moved to another node of the cluster. </P> <BR /> <P> To do this, open the VM’s settings, go to the virtual adapter for the Backup network, and then expand it so you see the “Advanced Features” item.&nbsp; The option to clear the “Protected Network” check box will be shown (see Screen Shot 1 below).
</P> <BR /> <P> By default, the Protected Network setting is enabled for all virtual adapters with the assumption that most networks that a VM uses will be important enough to want to relocate the VM if it becomes disconnected. </P> <BR /> <P> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90435i75846C7216F37DCE" /> </P> <BR /> <P> Diagram 2 </P> <BR /> <P> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90436iD950D9C82FD9C54E" /> </P> <BR /> <P> Screen Shot 1 </P> <BR /> <H3> Configuring a VMs network adapter to not react to a network disconnect using Windows PowerShell </H3> <BR /> <P> Here is the Windows PowerShell command and output that will show the virtual network adapters for a VM named “VM1”. This command will work from any node of the cluster, even if the VM is not being hosted on the node that you initiate the command from.&nbsp; If you want to run the command from a node that is not part of the cluster, you can add the Get-Cluster cmdlet at the start of the command line and specify the cluster name. </P> <BR /> <P> <STRONG> PS C:\Windows\system32&gt; Get-ClusterGroup VM1 |Get-VM | Get-VMNetworkAdapter | FL VMName,SwitchName,MacAddress,ClusterMonitored </STRONG> </P> <BR /> <P> <STRONG> </STRONG> </P> <BR /> <P> <STRONG> VMName&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : VM1 </STRONG> </P> <BR /> <P> <STRONG> SwitchName&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : Corp </STRONG> </P> <BR /> <P> <STRONG> MacAddress&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : 00155D867239 </STRONG> </P> <BR /> <P> <STRONG> ClusterMonitored : True </STRONG> </P> <BR /> <P> <STRONG> </STRONG> </P> <BR /> <P> <STRONG> VMName&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : VM1 </STRONG> </P> <BR /> <P> <STRONG> SwitchName&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : Storage </STRONG> </P> <BR /> <P> <STRONG> MacAddress&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : 00155D86723A </STRONG> </P> <BR /> <P> <STRONG> ClusterMonitored : True </STRONG> </P> <BR /> <P> <STRONG> </STRONG> </P> <BR /> <P> <STRONG> VMName&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : VM1 </STRONG> </P> <BR /> <P> <STRONG> SwitchName&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : Private </STRONG> </P> <BR /> <P> <STRONG> MacAddress&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : 00155D86723B </STRONG> </P> <BR /> <P> <STRONG> ClusterMonitored : True </STRONG> </P> <BR /> <P> </P> <BR /> <P> Here is the Windows PowerShell command that will disable the ClusterMonitored property for network adapter that is configured to use the virtual switch named “Private”. </P> <BR /> <P> <STRONG> <EM> (Note that the Property is “ClusterMonitored” but the parameter to change it is “NotMonitoredInCluster.&nbsp; Therefore, specifying -NotMonitoredInCluster with True actually changes the ClusterMonitored property to false, and vice-versa.) 
</EM> </STRONG> : </P> <BR /> <P> <STRONG> PS C:\Windows\system32&gt; Get-ClusterGroup VM1 |Get-VM | Get-VMNetworkAdapter | Where-Object {$_.SwitchName -eq "Private"} | Set-VmNetworkAdapter -NotMonitoredInCluster $True </STRONG> </P> <BR /> <P> <STRONG> PS C:\Windows\system32&gt; Get-ClusterGroup VM1 |Get-VM | Get-VMNetworkAdapter | FL VMName,SwitchName,MacAddress,ClusterMonitored </STRONG> </P> <BR /> <P> <STRONG> </STRONG> </P> <BR /> <P> <STRONG> </STRONG> </P> <BR /> <P> <STRONG> VMName&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : VM1 </STRONG> </P> <BR /> <P> <STRONG> SwitchName&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : Corp </STRONG> </P> <BR /> <P> <STRONG> MacAddress&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : 00155D867239 </STRONG> </P> <BR /> <P> <STRONG> ClusterMonitored : True </STRONG> </P> <BR /> <P> <STRONG> </STRONG> </P> <BR /> <P> <STRONG> VMName&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : VM1 </STRONG> </P> <BR /> <P> <STRONG> SwitchName&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : Storage </STRONG> </P> <BR /> <P> <STRONG> MacAddress&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : 00155D86723A </STRONG> </P> <BR /> <P> <STRONG> ClusterMonitored : True </STRONG> </P> <BR /> <P> <STRONG> </STRONG> </P> <BR /> <P> <STRONG> VMName&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : VM1 </STRONG> </P> <BR /> <P> <STRONG> SwitchName&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : Private </STRONG> </P> <BR /> <P> <STRONG> MacAddress&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : 00155D86723B </STRONG> </P> <BR /> <P> <STRONG> ClusterMonitored : False </STRONG> </P> <BR /> <P> </P> <BR /> <H3> Testing </H3> <BR /> <P> You can test this behavior by disconnecting the network cable for the physical adapter of a server where a VM is running. </P> <BR /> <P> It may take up to 1 minute for the cluster to detect that a virtual machine is affected by a network disconnect.&nbsp; Each virtual machine on a cluster has a cluster resource that monitors the virtual machine for failures.&nbsp; By default the cluster resource will check the state of each virtual switch that a VM is using every 60 seconds. </P> <BR /> <P> This means that the time a specific VM takes to identify that a virtual switch is connected to a disconnected physical NIC can be very short or up to 60 seconds, depending on the timing of when the disconnect happened and when the next check for the VM will occur. </P> <BR /> <P> This means that if you have more than one VM using a switch that becomes disconnected, not all the VMs will go into the state that will cause them to be live migrated at the same time. </P> <BR /> <P> As noted previously, if the network becomes connected again, if there are any VMs that are queued to be moved, they will be removed from the queue and remain on the same server.&nbsp; Any live migrations in progress will finish. 
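</P> <BR /> <P> While testing, one simple way to observe the reaction from another node is to poll the VM’s cluster group and watch the owner node change. This is only a sketch; the group name “VM1” matches the example VM used earlier in this post. </P> <PRE># Poll the VM's cluster group every few seconds to watch it move
while ($true) {
    Get-ClusterGroup VM1 | Format-Table Name, OwnerNode, State
    Start-Sleep -Seconds 5
}</PRE>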
<BR /> <P> <EM> Steven Ekren <BR /> Senior Program Manager <BR /> Windows Server Failover Clustering and High Availability </EM> </P> </BODY></HTML> Fri, 15 Mar 2019 21:35:50 GMT https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/windows-server-2012-r2-virtual-machine-recovery-from-network/ba-p/371861 Rob Hindman 2019-03-15T21:35:50Z How to Properly Shutdown a Failover Cluster or a Node https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/how-to-properly-shutdown-a-failover-cluster-or-a-node/ba-p/371857 <P>This blog will discuss the proper process for shutting down an individual node in a Windows Server 2012 R2 Failover Cluster, or the entire cluster with all the nodes.&nbsp; Note:&nbsp; While the steps outlined are specific to Windows Server 2012 R2, the process applies to a cluster of any OS version.</P> <H2>Shutting Down a Node</H2> <P>When shutting down or rebooting a node in a Failover Cluster, you first want to drain (move off) any roles running on that server (such as a virtual machine).&nbsp; This ensures that shutting down the node is graceful for any applications running on that node.&nbsp;</P> <OL> <OL> <LI>Open Failover Cluster Manager (CluAdmin.msc)</LI> <LI>Click on “ <STRONG> Nodes </STRONG> ”</LI> <LI>Right-click on the node name and under ‘ <STRONG> Pause </STRONG> ’ click on ‘ <STRONG> Drain Roles </STRONG> ’</LI> <LI>Under Status the node will appear as ‘Paused’.&nbsp; At the bottom of the center pane click on the ‘Roles’ tab.&nbsp; Once all roles have moved off this node, it is safe to shut down or reboot the node.</LI> </OL> </OL> <P>&nbsp;</P> <P><span class="lia-inline-image-display-wrapper lia-image-align-inline" style="width: 484px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90432i694E4B03EAF18660/image-size/large?v=v2&amp;px=999" role="button" /></span></P> <P>&nbsp;</P> <H2>To resume the node after it has been restarted…</H2> <P>When the node is once again powered on and ready to be put back into service, use the Resume action to re-enable the node to host roles.&nbsp;</P> <OL> <OL> <LI>Open Failover Cluster Manager (CluAdmin.msc)</LI> <LI>Click on “ <STRONG> Nodes </STRONG> ”</LI> <LI>Right-click on the node name and select ‘ <STRONG> Resume </STRONG> ’, then select either:</LI> <LI>‘ <STRONG> Fail Roles Back </STRONG> ’ – This will resume the node and move back any roles which were running on the node prior to the pause.&nbsp; Caution:&nbsp; This could incur downtime, depending on the role</LI> <LI>‘ <STRONG> Do Not Fail Roles Back </STRONG> ’ – This will resume the node and not move any roles back.</LI> </OL> </OL> <P>&nbsp;</P> <P><span class="lia-inline-image-display-wrapper lia-image-align-inline" style="width: 488px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90433i49B16F5D405672D3/image-size/large?v=v2&amp;px=999" role="button" /></span></P> <P>&nbsp;</P> <H2>Shutting Down a Node with Windows PowerShell®</H2> <OL> <OL> <LI>Open a PowerShell window as Administrator and type: <BR /><A href="#" target="_blank" rel="noopener"> Suspend-ClusterNode -Drain </A></LI> <LI>Verify that there are no roles listed under “OwnerNode” for that node by running: <BR /><A href="#" target="_blank" rel="noopener">Get-ClusterGroup </A></LI> <LI>This could be scripted with the following syntax, which should return nothing once the node has been fully drained: <BR />PS C:\&gt; Get-ClusterGroup | Where-Object { $_.OwnerNode -eq "NodeBeingDrained" }</LI> </OL> </OL> <P>&nbsp;</P> <OL> <OL> <LI>Shutdown or restart the computer by typing either:
<BR /><A href="#" target="_blank" rel="noopener"> Stop-Computer </A> <BR /><A href="#" target="_blank" rel="noopener"> Restart-Computer </A></LI> </OL> </OL> <H2>To resume the node after it has been restarted…</H2> <OL> <OL> <LI>Open a PowerShell window as Administrator and type: <BR /><A href="#" target="_blank" rel="noopener"> Resume-ClusterNode </A></LI> </OL> </OL> <P>Or, if you wish to fail back the roles which were previously running on this node, type: <BR />Resume-ClusterNode -Failback Immediate</P> <P>&nbsp;</P> <H1>Shutting Down a Cluster</H1> <P>Shutting down the entire cluster involves stopping all roles and then stopping the Cluster Service on all nodes.&nbsp; While you can shut down each node in the cluster individually, using the cluster UI will ensure the shutdown is done gracefully.&nbsp;</P> <OL> <OL> <LI>Open Failover Cluster Manager (CluAdmin.msc)</LI> <LI>Right-click on the cluster name, select ‘ <STRONG> More Actions </STRONG> ’, then “ <STRONG> Shut Down Cluster… </STRONG> ”</LI> <LI>When prompted if you are sure you want to shut down the cluster, click “ <STRONG> Yes </STRONG> ”</LI> </OL> </OL> <H2>Shutting Down a Cluster with PowerShell</H2> <OL> <OL> <LI>Open a PowerShell window as Administrator and type: <BR /><A href="#" target="_blank" rel="noopener"> Stop-Cluster </A></LI> </OL> </OL> <H3>Controlling VM Behavior on Shutdown</H3> <P>When the cluster is shut down, the VMs will be placed in a Saved state.&nbsp; This can be controlled using the OfflineAction property. <BR /><BR />To configure the shutdown action for an individual VM (where "Virtual Machine" is the name of the VM): </P> <PRE>Get-ClusterResource "Virtual Machine" | Set-ClusterParameter OfflineAction 1</PRE> <TABLE> <TBODY> <TR> <TD><STRONG>Value </STRONG></TD> <TD><STRONG> Effect </STRONG></TD> </TR> <TR> <TD><STRONG> 0 </STRONG></TD> <TD>The VM is turned off</TD> </TR> <TR> <TD><STRONG> 1 (default) </STRONG></TD> <TD>The VM is saved</TD> </TR> <TR> <TD><STRONG> 2 </STRONG></TD> <TD>The guest OS is shut down</TD> </TR> <TR> <TD><STRONG> 3 </STRONG></TD> <TD>The guest OS is shut down forcibly</TD> </TR> </TBODY> </TABLE> <P>&nbsp;</P> <H2>To start the cluster after it has been shut down</H2> <OL> <OL> <LI>To start the cluster type: <BR /><A href="#" target="_blank" rel="noopener"> Start-Cluster </A></LI> </OL> </OL> <P><BR />Thanks! <BR />Elden Christensen <BR />Principal PM Manager <BR />Clustering &amp; High-Availability <BR />Microsoft</P> Thu, 08 Aug 2019 16:08:16 GMT https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/how-to-properly-shutdown-a-failover-cluster-or-a-node/ba-p/371857 Elden Christensen 2019-08-08T16:08:16Z How to Enable CSV Cache https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/how-to-enable-csv-cache/ba-p/371854 <P>Cluster Shared Volumes (CSV) Cache is a feature which allows you to allocate system memory (RAM) as a write-through cache.&nbsp; The CSV Cache provides caching of read-only unbuffered I/O.&nbsp; This can improve performance for applications such as Hyper-V, which conducts unbuffered I/O when accessing a VHD or VHDX file.&nbsp; Unbuffered I/Os are operations which are not cached by the Windows Cache Manager.&nbsp; What CSV Block Cache delivers is caching which can boost the performance of read requests, while write requests are written through and never cached.
<BR /><BR />CSV Cache delivers caching at the block level, which enables it to cache pieces of data being accessed within the VHD file.&nbsp; The primary difference from caching solutions in the form of a PCI flash card which you add to the server is that CSV Block Cache reserves its cache from system memory.&nbsp; The CSV Cache also tracks VM mobility and invalidates the cache when a VM moves from host to host; this removes the need to replicate and keep the cache coherent on all nodes in the cluster.&nbsp; This improves efficiency by not having to cache all VMs on all nodes, and it reduces the performance overhead of pushing the data between nodes. <BR /><BR />CSV Cache is completely integrated into the Failover Clustering feature and handles orchestration across the set of nodes in the cluster.&nbsp;</P> <H2>Deployment Planning</H2> <P>CSV Cache will deliver the most value in scenarios where VMs issue heavy read requests and are less write intensive, such as pooled VDI VMs, or for reducing VM boot storms.&nbsp; Because the applicability of CSV Cache depends on the workload and your specific deployment considerations, it is disabled by default.&nbsp; The customer feedback on CSV Cache has been overwhelmingly positive, and we generally recommend turning it on for all scenarios, including both Hyper-V Clusters using CSV and Scale-out File Servers using CSV. <BR /><BR />You can allocate up to 80% of the total physical RAM for the CSV write-through cache, which will be consumed from non-paged pool memory.&nbsp;</P> <UL> <UL> <LI><STRONG> Hyper-V </STRONG> – Our preliminary testing has found 512 MB to deliver excellent gain at minimal cost, and it is the recommended starting point / minimal value if enabled.&nbsp; Then, based on your specific deployment and the I/O characteristics of the workloads in the VMs, you may wish to increase the amount of memory allocated.&nbsp; Since system memory is a contended resource on a Hyper-V cluster, it is recommended to keep the CSV Cache size moderate, such as 512 MB, 1 GB, or 2 GB.</LI> </UL> </UL> <UL> <UL> <LI><STRONG> Scale-out File Server </STRONG> – It is recommended to allocate a significantly larger CSV Cache on a SoFS, as physical memory is typically not a contended resource; you may want to allocate 4 GB, 6 GB, or even more...</LI> </UL> </UL> <P>There are two configuration settings that allow you to control CSV Cache.&nbsp;</P> <UL> <UL> <LI><STRONG> BlockCacheSize </STRONG> – This is a cluster common property that allows you to define how much memory (in megabytes) you wish to reserve for the CSV Cache on each node in the cluster.&nbsp; If a value of 512 is defined, then 512 MB of system memory will be reserved on each node in the Failover Cluster.&nbsp; Configuring a value of 0 disables CSV Block Cache.</LI> </UL> </UL> <UL> <UL> <LI><STRONG> EnableBlockCache </STRONG> – This is a private property of the cluster Physical Disk resource.&nbsp; It allows you to enable/disable caching on an individual disk.&nbsp; This gives you the flexibility to configure caching for read intensive VMs running on some disks, while disabling it on other disks so that random I/O does not purge the cache.&nbsp; For example, you would enable caching on Disk1 holding read-heavy parent VHDs, and disable the CSV cache on Disk2 holding write-heavy differencing disks.&nbsp; The default setting is 1 (enabled).</LI> </UL> </UL> <H2>Configuring CSV Cache</H2> <P>The CSV Cache is
disabled by default; to enable it, you only need to define how much memory you want to allocate, using the following process:&nbsp;</P> <OL> <OL> <LI>Open an elevated Windows PowerShell prompt</LI> <LI>Define the size of the cache to be reserved (example of setting it to 1 GB)</LI> </OL> </OL> <PRE>(Get-Cluster).BlockCacheSize = 1024 </PRE> <H3>Disabling on a per disk basis</H3> <P>Once the CSV Cache is enabled, all disks on all nodes will be cached by default.&nbsp; You have the flexibility to disable the CSV Cache on an individual disk using the following process: </P> <PRE>Get-ClusterSharedVolume “Cluster Disk 1” | Set-ClusterParameter EnableBlockCache 0</PRE> <H2>Optimizing CSV Cache</H2> <P>The CSV Cache also provides a set of counters you can use to monitor the performance of the cache.&nbsp; You can leverage the Performance Monitor tool (PerfMon.msc) to add the following counters to monitor different aspects of the CSV Cache. <BR /><BR />Open Performance Monitor, and under Add Counters you will find “Cluster CSV Volume Cache” with the following counters. </P> <P><span class="lia-inline-image-display-wrapper lia-image-align-inline" style="width: 365px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90431iC116565808F91F79/image-size/large?v=v2&amp;px=999" role="button" /></span></P> <P><BR />I/O satisfied from cache:&nbsp;</P> <UL> <UL> <LI>Cache IO Read-Bytes</LI> <LI>Cache IO Read-Bytes/Sec</LI> <LI>Cache Read</LI> <LI>Cache Read/Sec</LI> </UL> </UL> <P>I/O satisfied from disk:&nbsp;</P> <UL> <UL> <LI>Disk IO Read-Bytes</LI> <LI>Disk IO Read-Bytes/Sec</LI> <LI>Disk Read</LI> <LI>Disk Read/Sec</LI> </UL> </UL> <P>Total I/O:&nbsp;</P> <UL> <UL> <LI>IO Read-Bytes</LI> <LI>IO Read-Bytes/Sec</LI> <LI>IO Read</LI> <LI>IO Read/Sec</LI> </UL> </UL> <H2>Considerations:</H2> <P>The CSV Cache was introduced in Windows Server 2012, and the above applies to all releases following.&nbsp; The CSV Cache has evolved over the releases, and <STRONG> below is a list of considerations for previous releases </STRONG> .
</P> <H3>Windows Server 2012 R2:</H3> <UL> <UL> <LI>Enabling CSV Cache on an individual disk requires that the Physical Disk resource be recycled (taken Offline / Online) for it to take effect.&nbsp; This can be done with no downtime by simply moving ownership of the Physical Disk resource from one node to another.</LI> <LI>It is recommended not to exceed allocating 64 GB on Windows Server 2012 R2</LI> <LI>CSV Cache will be disabled on:&nbsp; <UL> <LI>ReFS volumes with integrity streams enabled</LI> <LI>Tiered Storage Spaces with heat map tracking enabled</LI> <LI>Deduplicated files using the in-box Windows Server Data Deduplication feature&nbsp;(Note:&nbsp; Data will instead be cached by the dedup cache)</LI> </UL> </LI> </UL> </UL> <H3>Windows Server 2012:</H3> <UL> <UL> <LI>A maximum of 20% of the total physical RAM can be allocated to the CSV write-through cache with Windows Server 2012</LI> <LI>The cache size can be modified with no downtime; however, with Windows Server 2012 a server reboot is required for the Hyper-V root memory reserve in the parent partition to be adjusted to accommodate the memory allocated to the CSV cache.&nbsp; To ensure resource contention is avoided, it is recommended to reboot each node in the cluster after modifying the memory allocated to the CSV cache.</LI> <LI>Enabling CSV Cache on an individual disk requires that the Physical Disk resource be recycled (taken Offline / Online) for it to take effect.&nbsp; This can be done with no downtime by simply moving ownership of the Physical Disk resource from one node to another.</LI> <LI>The EnableBlockCache private property is named CsvEnableBlockCache in Windows Server 2012</LI> <LI>The BlockCacheSize common property is named SharedVolumeBlockCacheSizeInMB in Windows Server 2012</LI> <LI>The way it is enabled is also slightly different; here is the process:</LI> </UL> </UL> <OL> <OL> <LI>Open an elevated Windows PowerShell prompt</LI> <LI>Define the size of the cache to be reserved (example of setting it to 1 GB) </LI> </OL> </OL> <PRE>(Get-Cluster).SharedVolumeBlockCacheSizeInMB = 1024 </PRE> <P>&nbsp;</P> <OL> <OL> <LI>Enable CSV Cache on an individual disk (must be executed for every disk on which you wish to enable caching)</LI> </OL> </OL> <PRE>Get-ClusterSharedVolume “Cluster Disk 1” | Set-ClusterParameter CsvEnableBlockCache 1 </PRE> <P><BR />Thanks!
<BR />Elden Christensen <BR />Principal PM Manager <BR />Clustering &amp; High-Availability <BR />Microsoft</P> Thu, 08 Aug 2019 15:56:36 GMT https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/how-to-enable-csv-cache/ba-p/371854 Elden Christensen 2019-08-08T15:56:36Z Failover Clustering Sessions @ TechEd 2013 https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/failover-clustering-sessions-teched-2013/ba-p/371852 <HTML> <HEAD></HEAD><BODY> <STRONG> First published on MSDN on Jun 11, 2013 </STRONG> <BR /> If you were not able to make it to TechEd 2013 this year, you can still watch the sessions and learn about the new enhancements coming.&nbsp; Here are links to the recorded cluster sessions at TechEd 2013: <BR /> <BR /> Continuous Availability: Deploying and Managing Clusters Using Windows Server 2012 R2 <BR /> <A href="#" target="_blank"> https://channel9.msdn.com/Events/TechEd/NorthAmerica/2013/MDC-B305#fbid=WOoBzkT2vlt </A> <BR /> <BR /> Failover Cluster Networking Essentials <BR /> <A href="#" target="_blank"> https://channel9.msdn.com/Events/TechEd/NorthAmerica/2013/MDC-B337#fbid=WOoBzkT2vlt </A> <BR /> <BR /> Upgrading Your Private Cloud with Windows Server 2012 R2 <BR /> <A href="#" target="_blank"> https://channel9.msdn.com/Events/TechEd/NorthAmerica/2013/MDC-B331#fbid=WOoBzkT2vlt </A> <BR /> <BR /> Application Availability Strategies for the Private Cloud <BR /> <A href="#" target="_blank"> https://channel9.msdn.com/Events/TechEd/NorthAmerica/2013/MDC-B311#fbid=WOoBzkT2vlt </A> <BR /> <BR /> Storage and Availability Improvements in Windows Server 2012 R2 <BR /> <A href="#" target="_blank"> https://channel9.msdn.com/Events/TechEd/NorthAmerica/2013/MDC-B333#fbid=WOoBzkT2vlt </A> <BR /> <BR /> Understanding the Hyper-V over SMB Scenario, Configurations, and End-to-End Performance <BR /> <A href="#" target="_blank"> https://channel9.msdn.com/Events/TechEd/NorthAmerica/2013/MDC-B335#fbid=WOoBzkT2vlt </A> <BR /> <BR /> <P> Thanks! <BR /> Elden Christensen <BR /> Principal Program Manager Lead <BR /> Clustering &amp; High-Availability <BR /> Microsoft </P> </BODY></HTML> Fri, 15 Mar 2019 21:34:37 GMT https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/failover-clustering-sessions-teched-2013/ba-p/371852 Elden Christensen 2019-03-15T21:34:37Z Validate Storage Spaces Persistent Reservation Test Results with Warning https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/validate-storage-spaces-persistent-reservation-test-results-with/ba-p/371851 <HTML> <HEAD></HEAD><BODY> <STRONG> First published on MSDN on May 24, 2013 </STRONG> <BR /> I have seen questions from customers who get a warning in the results of their failover cluster validation that indicates the storage doesn’t support persistent reservations for Storage Spaces. They want to know why they got the warning, what it means, and what they should do about it. First, here is the text you will see in the report from the failover cluster validation.&nbsp; It will be highlighted in yellow and the test may have a yellow triangle icon next to it: <BR /> <P> Validate Storage Spaces Persistent Reservation </P> <BR /> <P> Validate that storage supports the SCSI-3 Persistent Reservation commands needed by Storage Spaces to support clustering. </P> <BR /> <P> Test Disk &lt;number X&gt; does not support SCSI-3 Persistent Reservations commands needed by clustered storage pools that use the Storage Spaces subsystem.
Some storage devices require specific firmware versions or settings to function properly with failover clusters. Contact your storage administrator or storage vendor for help with configuring the storage to function properly with failover clusters that use Storage Spaces. </P> <BR /> <BR /> <STRONG> Question: <EM> Why did I get this warning? </EM> </STRONG> <BR /> Failover Clustering requires a specific set of SCSI-3 persistent reservation commands to be implemented by the storage so that Storage Spaces can be properly managed as clustered disks.&nbsp; The commands that are specifically needed for Storage Spaces are tested, and if they are not implemented in the way that the cluster requires, this warning will be given. <BR /> <STRONG> Question: <EM> What does this mean and why is it a warning and not a failure? </EM> </STRONG> <BR /> Failover clustering has multiple tests that check how the storage implements SCSI-3 persistent reservations.&nbsp; This particular test for Storage Spaces is a warning instead of a failure because clustered disks that aren’t going to use Storage Spaces will work correctly if the other SCSI-3 persistent reservation tests pass. <BR /> <STRONG> Question: <EM> What should I do when I get this warning? </EM> </STRONG> <BR /> To be compatible with Storage Spaces, the storage must support the additional PERSISTENT RESERVE OUT Register (00h) persistent reservation command. <STRONG> If you are not using Storage Spaces with this cluster (such as on a SAN), then this test is not applicable and you can ignore any results of the "Validate Storage Spaces Persistent Reservation" test, including this warning </STRONG>.&nbsp; To avoid future warnings, you can exclude this test (see the example after the note below). <BR /> <BR /> If you intend to use the disks with Storage Spaces on the cluster, they are not compatible, and you should check your storage configuration and documentation to see if there are settings or firmware/driver versions required to support clustered storage spaces. <BR /> <BR /> The following note is in the KB article that states the support policy for Windows Server 2012 failover clusters.&nbsp; The yellow yield sign mentioned refers to a warning in the validation test results. <A href="#" target="_blank"> http://support.microsoft.com/kb/2775067 </A> <BR /> <P> <STRONG> Note </STRONG> The yellow yield sign indicates that the aspect of the proposed failover cluster that is being tested is not in alignment with Microsoft best practices. Investigate this aspect to make sure that the configuration of the cluster is acceptable for the environment of the cluster, for the requirements of the cluster, and for the roles that the cluster hosts.
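</P> <BR /> <P> If you have confirmed the warning is not applicable (for example, the disks are on a SAN and you will not use clustered Storage Spaces), here is a sketch of excluding the test from future validation runs using the Test-Cluster cmdlet's -Ignore parameter. The node names are placeholders, and the test name string is an assumption taken from the report heading above; adjust it if your validation report shows a different name. </P> <PRE># Re-run cluster validation while skipping the Storage Spaces
# persistent reservation test (test name assumed from the report)
Test-Cluster -Node Node1,Node2 -Ignore "Storage Spaces Persistent Reservation"</PRE>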
<BR /> Here are some links to more information regarding clustered storage spaces, cluster validation, and the support policies regarding the validation tests: <BR /> <UL> <BR /> <LI> Blog: “How to Configure a Clustered Storage Space in Windows Server 2012” <A href="#" target="_blank"> http://blogs.msdn.com/b/clustering/archive/2012/06/02/10314262.aspx </A> </LI> <BR /> <LI> TechNet: Deploy Clustered Storage Spaces <A href="#" target="_blank"> http://technet.microsoft.com/en-us/library/jj822937.aspx </A> </LI> <BR /> <LI> TechNet: Validate Hardware for a Windows Server 2012 Failover Cluster <A href="#" target="_blank"> http://technet.microsoft.com/en-us/library/jj134244.aspx </A> </LI> <BR /> <LI> Microsoft Knowledge Base Article: The Microsoft support policy for Windows Server 2012 failover clusters <A href="#" target="_blank"> http://support.microsoft.com/kb/2775067 </A> </LI> <BR /> </UL> <BR /> Steven Ekren <BR /> Senior Program Manager <BR /> Windows Server Failover Clustering </BODY></HTML> Fri, 15 Mar 2019 21:34:31 GMT https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/validate-storage-spaces-persistent-reservation-test-results-with/ba-p/371851 Rob Hindman 2019-03-15T21:34:31Z Optimizing CSV Backup Performance https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/optimizing-csv-backup-performance/ba-p/371850 <HTML> <HEAD></HEAD><BODY> <STRONG> First published on MSDN on May 06, 2013 </STRONG> <BR /> Cluster Shared Volumes (CSV) is a clustered file system available in Windows Server 2012 where all nodes in a Windows Server Failover Cluster can simultaneously access a common shared NTFS volume.&nbsp; CSV has a distributed backup infrastructure which enables backups to be taken from any node in the cluster.&nbsp; In this blog I will discuss some considerations for how backups work with CSV that can help optimize the performance of backups. <BR /> <BR /> When a volume level backup is taken, the cluster service returns all the VMs hosted on the volume(s) to the requester (backup application), including VMs running on non-requester nodes. The requester can choose to pick only the VMs that are running on the node where the backup was initiated (this becomes a local node VM backup), or it can choose to include VMs that are running across different nodes (this becomes a distributed VM backup). &nbsp;The snapshot creation has some differences based on the type of snapshot configured: <BR /> <UL> <BR /> <LI> <STRONG> Hardware snapshots </STRONG> – The snapshot will be created and surfaced on the node where the backup was invoked by the requester, which need not be the coordinator node.&nbsp; The backup will then be taken from the local snapshot. </LI> <BR /> <LI> <STRONG> Software snapshots </STRONG> – The underlying snapshot device will be created via volsnap.sys on the coordinator node, and a CSV snapshot volume will be surfaced on every node that points to this volsnap device. On non-coordinator nodes, the CSV snapshot device will access the volsnap snapshot over SMB. &nbsp;This is transparent to the requester, as the CSV snapshot volume appears like a local device; however, all access to the snapshot will happen over the network unless the requester happens to be running on the coordinator node.
</LI> <BR /> </UL> <BR /> <H3> Considerations: </H3> <BR /> A backup of a CSV volume can be taken from any node.&nbsp; However, when using software snapshots the snapshot device will be created on the coordinator node, and if the backup was initiated on a non-coordinator node, the backup data will be accessed remotely.&nbsp; This means that the data for the backup will be streamed over the network from the coordinator node to the node where the backup was initiated.&nbsp; If you have maintenance window requirements that require shortening the overall backup time, you may wish to optimize the performance of backups when using software snapshots in one of the following ways: <BR /> <UL> <BR /> <LI> <STRONG> Initiate Backups on the Coordinator Node </STRONG> – When using software snapshots, the snapshot device will always be created on the node which currently owns the cluster Physical Disk resource associated with the CSV volume.&nbsp; If the backup is conducted locally on the coordinator node, then the data access will be local and backup performance may be improved.&nbsp; This can be achieved either by initiating the backup application on the coordinator node or by moving the Physical Disk resource to that node before initiating the backup.&nbsp; CSV ownership can be moved seamlessly with no downtime. </LI> <BR /> <LI> <STRONG> Scale Intra-node Communication </STRONG> – If you wish to have the flexibility of invoking backups with software snapshots from any node, scale up the performance of intra-node communication to achieve optimized backup performance.&nbsp; It is recommended to use a minimum of 10 gigabit Ethernet (10 GbE) or InfiniBand.&nbsp; You may also wish to aggregate network bandwidth with NIC Teaming or SMB Multichannel to increase network performance between the nodes in the Failover Cluster. </LI> <BR /> </UL> <BR /> <H3> Recommendations: </H3> <BR /> <OL> <BR /> <LI> To achieve the highest levels of backup performance on a Cluster Shared Volume, it is recommended to use Hardware snapshots over Software snapshots. </LI> <BR /> <LI> To achieve the highest levels of performance with Software snapshots on a Cluster Shared Volume, it is recommended either to initiate the backup locally on the CSV coordinator node or to scale up the bandwidth of intra-node communication. </LI> <BR /> </OL> <BR /> Thanks! <BR /> Elden Christensen <BR /> Principal PM Manager <BR /> Clustering &amp; High-Availability <BR /> Microsoft </BODY></HTML> Fri, 15 Mar 2019 21:34:23 GMT https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/optimizing-csv-backup-performance/ba-p/371850 Elden Christensen 2019-03-15T21:34:23Z MSMQ Errors in the Cluster.log https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/msmq-errors-in-the-cluster-log/ba-p/371849 <P><STRONG> First published on MSDN on Apr 05, 2013 </STRONG> <BR />After using the <A href="#" target="_blank" rel="noopener"> Get-ClusterLog </A> cmdlet to generate the Cluster.log, you may notice the following errors and warnings in the cluster log:</P> <PRE>ERR&nbsp;&nbsp; [RHS] s_RhsRpcCreateResType: ERROR_NOT_READY(21)' because of 'Startup routine for ResType MSMQ returned 21.' <BR />WARN&nbsp; [RCM] Failed to load restype 'MSMQ': error 21. <BR /><BR />ERR&nbsp;&nbsp; [RHS] s_RhsRpcCreateResType: ERROR_NOT_READY(21)' because of 'Startup routine for ResType MSMQTriggers returned 21.' <BR />WARN&nbsp; [RCM] Failed to load restype 'MSMQTriggers': error 21.
<BR /><BR />WARN&nbsp; [RCM] ResourceTypeChaseTheOwnerLoop::DoCall: ResType MSMQ's DLL is not present on this node.&nbsp; Attempting to find a good node... <BR />WARN&nbsp; [RCM] ResourceTypeChaseTheOwnerLoop::DoCall: ResType MSMQTriggers's DLL is not present on this node.&nbsp; Attempting to find a good node... </PRE> <H2>Root Cause:</H2> <P>These events are logged because the MSMQ and MSMQ Triggers resource types are registered with the cluster service, but the MSMQ resource DLL cannot be loaded because the MSMQ feature is not installed.&nbsp; The MSMQ and MSMQ Triggers resource types are registered by default when the Failover Clustering feature is installed. </P> <H2>Possible Solutions:</H2> <OL> <OL> <LI><STRONG> Ignore </STRONG> - These are benign events in a debug log and can be safely ignored.&nbsp; They have no impact on the functionality of the cluster, nor do they indicate a failure.</LI> <LI><STRONG>Install MSMQ </STRONG> - If you plan to make MSMQ highly available on this cluster, open Server Manager and install the “Message Queuing” feature on all nodes in the cluster.&nbsp; The above errors will no longer be logged.</LI> <LI><STRONG>Unregister MSMQ Resources </STRONG> - If this is a non-MSMQ cluster, you can unregister the MSMQ and MSMQ Triggers resource types with the Cluster Service, and the above errors will no longer be logged.&nbsp; This can be accomplished with the <A href="#" target="_blank" rel="noopener"> Remove-ClusterResourceType </A> cmdlet.&nbsp; Open a PowerShell window and type the following:</LI> </OL> </OL> <PRE>Remove-ClusterResourceType MSMQ <BR />Remove-ClusterResourceType MSMQTriggers </PRE> <P>In summary... just ignore them; they are just noise.&nbsp; If they annoy you and you don't plan to use MSMQ, then unregister the MSMQ resource types. <BR /><BR />Thanks! <BR />Elden Christensen <BR />Principal PM Manager <BR />Clustering &amp; High-Availability <BR />Microsoft</P> Thu, 08 Aug 2019 15:27:16 GMT https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/msmq-errors-in-the-cluster-log/ba-p/371849 Elden Christensen 2019-08-08T15:27:16Z Configuring How VMs Are Moved when a Node is Drained https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/configuring-how-vms-are-moved-when-a-node-is-drained/ba-p/371848 <HTML> <HEAD></HEAD><BODY> <STRONG> First published on MSDN on Mar 21, 2013 </STRONG> <BR /> <H2> Introducing some concepts: </H2> <BR /> Windows Server 2012 introduced the ability to drain a node, which is sometimes also referred to as Node Maintenance Mode.&nbsp; When you drain a node, the Windows Server Failover Cluster will Pause the node to prevent any new VMs from moving to that node, and then it will automatically move virtual machines (VMs) and other workloads off of the node to other nodes in the cluster.&nbsp; I like to call the moves that aren’t initiated by a user or an external management tool (like System Center Virtual Machine Manager) “cluster initiated moves.” <BR /> <BR /> Windows Server 2012 also introduced the concept of priorities for cluster roles.&nbsp; You can set the priority to High (3000), Medium (2000) or Low (1000).&nbsp; The failover cluster uses this priority setting for a number of things, such as the order in which VMs are started.&nbsp; It is also used to define how VMs should be moved when a node is drained. <BR /> <BR /> Live migration moves VMs to other nodes without clients losing connection. However, it can use significant network bandwidth.
<BR /> <BR /> Quick migration involves putting a VM into saved state, moving it to another node, and then resuming from saved state.&nbsp; Quick migrations are usually faster than live migrations and use less network bandwidth.&nbsp; However, quick migration does cause clients to be disconnected during the move. <BR /> <H2> Virtual machines and cluster initiated moves </H2> <BR /> The default behavior for VMs is to live migrate the high and medium priority VMs, and quick migrate the low priority VMs.&nbsp; The logic is that additional time and resources will be spent to move important VMs with no downtime, and that it is OK for non-critical VMs to have downtime during the move. <BR /> <BR /> However, this behavior is fully configurable: if you wish for non-critical VMs to also be live migrated, you can change the default behavior so that low priority VMs are also live migrated during cluster initiated moves. Or, you can change it so that medium, or both medium and high, priority VMs use quick migration. <BR /> <BR /> The “Virtual Machine” resource type has a parameter (sometimes called a private property) that is named MoveTypeThreshold.&nbsp; Resource type parameters are settings that affect all cluster resources of that type.&nbsp; Changing this parameter value changes how all VMs on the cluster are moved during automatic moves. <BR /> <BR /> The default value for the MoveTypeThreshold parameter is 2000; this means that medium priority, and any priority higher than medium, will use live migration for cluster initiated moves (like node drain).&nbsp; If you set the value to 1000, low priority and any priority higher than low will use live migration.&nbsp; If you set it to 3000, only high priority VMs will be live migrated, and the medium and low priority VMs will be quick migrated.&nbsp; If you want to only use quick migration, set it to 3001 or higher.
<BR /> <BR /> Here is an example of using Windows PowerShell to get the setting and to set it: <BR /> PS C:\Windows\system32&gt; Get-ClusterResourceType "Virtual Machine" | Get-ClusterParameter MoveTypeThreshold | fl * <BR /> <BR /> ClusterObject : Virtual Machine <BR /> Name&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : MoveTypeThreshold <BR /> IsReadOnly&nbsp;&nbsp;&nbsp; : False <BR /> ParameterType : UInt32 <BR /> Value&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : 2000 <BR /> <BR /> PS C:\Windows\system32&gt; Get-ClusterResourceType "Virtual Machine" | Set-ClusterParameter MoveTypeThreshold 1000 <BR /> <BR /> PS C:\Windows\system32&gt; Get-ClusterResourceType "Virtual Machine" | Get-ClusterParameter MoveTypeThreshold | fl * <BR /> <BR /> ClusterObject : Virtual Machine <BR /> Name&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : MoveTypeThreshold <BR /> IsReadOnly&nbsp;&nbsp;&nbsp; : False <BR /> ParameterType : UInt32 <BR /> Value&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; : 1000 <BR /> <BR /> <H3> MoveTypeThreshold values and move types </H3> <BR /> <TABLE> <TBODY><TR> <TD> <BR /> <P> <STRONG> MoveTypeThreshold </STRONG> </P> <BR /> <P> <STRONG> Value </STRONG> </P> <BR /> </TD> <TD> <BR /> <P> <STRONG> VM Priority </STRONG> </P> <BR /> </TD> </TR> <TR> <TD> <BR /> <P> <STRONG> Low </STRONG> </P> <BR /> </TD> <TD> <BR /> <P> <STRONG> Medium </STRONG> </P> <BR /> </TD> <TD> <BR /> <P> <STRONG> High </STRONG> </P> <BR /> </TD> </TR> <TR> <TD> 1000 </TD> <TD> Live </TD> <TD> Live </TD> <TD> Live </TD> </TR> <TR> <TD> 2000 </TD> <TD> Quick </TD> <TD> Live </TD> <TD> Live </TD> </TR> <TR> <TD> 3000 </TD> <TD> Quick </TD> <TD> Quick </TD> <TD> Live </TD> </TR> <TR> <TD> 3001 </TD> <TD> Quick </TD> <TD> Quick </TD> <TD> Quick </TD> </TR> </TBODY></TABLE> <BR /> <BR /> <H2> Changing the behavior for specific virtual machines </H2> <BR /> The MoveTypeThreshold parameter of the Virtual Machine resource type affects the behavior for all of the VMs in the failover cluster.&nbsp; If you wish to have different behavior for different VMs that can also be accomplished, for example you want all low priority VMs to be quick migrated but there is one VM that you want to be live migrated.&nbsp; Each Virtual Machine resource has a DefaultMoveType private property which by default it is set to a value of -1 (This shows as 4294967295 if you look at the value of the parameter in Windows PowerShell).&nbsp; The value of -1 tells the individual Virtual Machine resource that it has no unique setting and that it should inherit its settings from the Virtual Machine resource type. <BR /> <BR /> Note: DefaultMoveType is a parameter of each virtual machine’s Virtual Machine resource.&nbsp; Each VM will have its own Virtual Machine resource. <BR /> <H3> DefaultMoveType parameter values and their behavior: </H3> <BR /> <TABLE> <TBODY><TR> <TD> <STRONG> Value </STRONG> </TD> <TD> <STRONG> Behavior </STRONG> </TD> </TR> <TR> <TD> -1 (4294967295) </TD> <TD> Use global setting (MoveTypeThreshold) </TD> </TR> <TR> <TD> 0 </TD> <TD> Turn off VM </TD> </TR> <TR> <TD> 1 </TD> <TD> Save VM (quick migration) </TD> </TR> <TR> <TD> 2 </TD> <TD> Shut down VM </TD> </TR> <TR> <TD> 3 </TD> <TD> Shut down VM (forced) </TD> </TR> <TR> <TD> 4 </TD> <TD> Live migrate VM </TD> </TR> </TBODY></TABLE> <BR /> <BR /> <BR /> *Note: Value 2 and 3 have the same behavior. 
<BR /> <BR /> Here is an example of using Windows PowerShell to get the setting and to set it: <BR /> <BR /> Find the resource name: <BR /> <PRE>PS C:\Windows\system32&gt; Get-ClusterResource | ft Name,ResourceType

Name                                  ResourceType
----                                  ------------
Cluster Disk 1                        Physical Disk
Cluster IP Address                    IP Address
Cluster IP Address                    IPv6 Address
Cluster Name                          Network Name
Virtual Machine Configuration test1   Virtual Machine Configuration
Virtual Machine test1                 Virtual Machine</PRE> <BR /> Get the resource’s private properties using the Get-ClusterParameter cmdlet: <BR /> <PRE>PS C:\Windows\system32&gt; Get-ClusterResource "Virtual Machine Test1" | Get-ClusterParameter | ft Name,Value

Name                     Value
----                     -----
VmID                     76138d6e-ff1d-45da-bce3-d6ddc46a9bae
OfflineAction            1
ShutdownAction           0
DefaultMoveType          4294967295
CheckHeartbeat           1
MigrationState           0
MigrationProgress        0
VmState                  3
MigrationFailureReason   0
StartMemory              512
VirtualNumaCount         1</PRE> <BR /> Set the private property using the Set-ClusterParameter cmdlet: <BR /> <PRE>PS C:\Windows\system32&gt; Get-ClusterResource "Virtual Machine Test1" | Set-ClusterParameter DefaultMoveType 1</PRE> <BR /> Check that the private property was changed: <BR /> <PRE>PS C:\Windows\system32&gt; Get-ClusterResource "Virtual Machine Test1" | Get-ClusterParameter | ft Name,Value

Name                     Value
----                     -----
VmID                     76138d6e-ff1d-45da-bce3-d6ddc46a9bae
OfflineAction            1
ShutdownAction           0
DefaultMoveType          1
CheckHeartbeat           1
MigrationState           0
MigrationProgress        0
VmState                  3
MigrationFailureReason   0
StartMemory              512
VirtualNumaCount         1</PRE> </BODY></HTML> Fri, 15 Mar 2019 21:34:09 GMT https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/configuring-how-vms-are-moved-when-a-node-is-drained/ba-p/371848 Elden Christensen 2019-03-15T21:34:09Z Understanding how Failover Clustering Recovers from Unresponsive Resources https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/understanding-how-failover-clustering-recovers-from-unresponsive/ba-p/371847 <P><STRONG> First published on MSDN on Jan 24, 2013 </STRONG> <BR />In this blog I will discuss how Failover Clustering communicates with cluster resources, along with how clustering detects and recovers when something goes wrong.&nbsp; For the sake of simplicity I will use a Virtual Machine as an example throughout this blog, but the logic is generic and applies to all workloads. <BR /><BR />When a Virtual Machine is clustered, a cluster “Virtual Machine” resource is created which controls that VM.&nbsp; The “Virtual Machine” resource and its associated resource DLL communicate with the VMMS service, tell the VM when to start and when to stop, and also perform health checks to ensure the VM is ok. <BR /><BR />Resources all run in a component of the Failover Clustering feature called the Resource Hosting Subsystem (RHS).&nbsp; VM actions from the user map to entry point calls that RHS makes to resources, such as Online, Offline, IsAlive, and LooksAlive.&nbsp; You can find the full list of resource DLL entry-point functions <A href="#" target="_blank" rel="noopener"> here </A> . <BR /><BR />The most interesting cases, where resources go unresponsive and clustering needs to recover, involve the LooksAlive and IsAlive health checks to the resource. <BR /><BR /></P> <UL> <LI>LooksAlive is a quick, lightweight check that happens every 5 seconds by default.</LI> <LI>IsAlive is a more thorough check that happens every 60 seconds by default.</LI> </UL> <P><BR />Health check calls to the resource continue constantly while resources are online.&nbsp; If a resource returns a failure for the lightweight LooksAlive health check, RHS will immediately perform a more comprehensive health check and call IsAlive to see if the resource is really healthy.&nbsp; A resource is considered failed as the result of an IsAlive failure.
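<P><BR />The polling frequencies themselves are exposed as the LooksAlivePollInterval and IsAlivePollInterval common properties (in milliseconds), at both the resource and the resource type level.&nbsp; A quick way to inspect them, as a minimal sketch assuming the in-box "Virtual Machine" resource type and a resource named "Virtual Machine test1":</P> <PRE># Resource type defaults: 5000 = LooksAlive every 5 seconds, 60000 = IsAlive every 60 seconds
PS C:\&gt; Get-ClusterResourceType "Virtual Machine" | fl LooksAlivePollInterval,IsAlivePollInterval

# On an individual resource, 4294967295 (-1) means "inherit the resource type value"
PS C:\&gt; (Get-ClusterResource "Virtual Machine test1").IsAlivePollInterval</PRE>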
<BR /><BR />Think of it like this… Every 60 seconds RHS calls IsAlive and essentially asks the resource “Are you ok?”.&nbsp; The resource then responds to RHS “Yes, I am doing fine.”&nbsp; This periodic health check goes on and on… until something happens to the resource and it doesn’t respond.&nbsp; Think of it like a dropped call on your cell phone: how long are you willing to sit there going “Hello?&nbsp; Hello?&nbsp; Hello?” before you give up and call the person back?&nbsp; Basically resetting the connection… <BR /><BR />Failover Clustering has this same concept.&nbsp; RHS will sit there waiting for the resource to respond to an IsAlive call, and eventually it will give up and take recovery action.&nbsp; By default RHS will wait 5 minutes for the resource to respond to an entry point call.&nbsp; This is configurable with the resource <A href="#" target="_blank" rel="noopener"> DeadlockTimeout </A> common property. <BR /><BR />To modify the DeadlockTimeout property of an individual resource, you can use the following PowerShell command:</P> <P><BR />(Get-ClusterResource “Resource Name”).DeadlockTimeout = 300000</P> <P><BR />Or, if you want to modify the DeadlockTimeout for all resources of a given type, you can set it at the resource type level with the following syntax (this example is for all virtual machine resources):</P> <P><BR />(Get-ClusterResourceType “Virtual Machine”).DeadlockTimeout = 300000</P> <P><BR />Resources are expected to respond to an IsAlive or LooksAlive within a few hundred milliseconds, so waiting 5 minutes for a resource to respond is a really long time.&nbsp; Something pretty bad has happened if a resource which normally responds in milliseconds suddenly takes longer than 5 minutes.&nbsp; So it is generally recommended to stay with the default values. <BR /><BR />If the resource doesn’t respond in 5 minutes, RHS decides that there must be something wrong with the resource and that it should take recovery action to get it back up and running.&nbsp; Remember that the resource has gone silent; RHS has no idea what is wrong with it.&nbsp; The only way to recover is to terminate the RHS process and restart it, which in turn restarts the resource and brings everything back up and running.&nbsp; You may also see the associated entries in the System event log: </P> <P>&nbsp;</P> <P><STRONG> Event ID 1230 </STRONG> <BR />Cluster resource ‘<EM>Resource Name</EM>’ (resource type ‘<EM>Resource Type Name</EM>’, DLL ‘<EM>DLL Name</EM>’) did not respond to a request in a timely fashion. Cluster health detection will attempt to automatically recover by terminating the Resource Hosting Subsystem (RHS) process running this resource.</P> <P>&nbsp;</P> <P><STRONG> Event ID 1146 </STRONG> <BR />The cluster Resource Hosting Subsystem (RHS) stopped unexpectedly. An attempt will be made to restart it.
This is usually associated with recovery of a crashed or deadlocked resource.</P> <P><BR />The next layer of protection is that when clustering issues a request to terminate the RHS process, it will wait four times the DeadlockTimeout value (20 minutes by default)&nbsp;for the RHS process to terminate.&nbsp; If RHS does not terminate in 20 minutes, clustering will deem that the server has serious health issues and will bugcheck&nbsp;the server to force failover and recovery.&nbsp; The bugcheck code will be Stop 0x0000009E ( Parameter1 , Parameter2 , 0x0000000000000005 , Parameter4 ).&nbsp; Note that Parameter3 will always be a value of 0x5 if it is the result of an RHS process failing to terminate. <BR /><BR />This is the way clustering is designed to work… it is monitoring the health of the system, it detects something is wrong, and it recovers.&nbsp; This is a good thing! </P> <H2>Summary of Recovery Behavior:</H2> <OL> <LI>RHS calls an entry point&nbsp;on the resource</LI> <LI>RHS waits DeadlockTimeout (5 minutes) for the resource to respond</LI> <LI>If the resource does not respond, the Cluster Service terminates the RHS process to recover from the unresponsive resource</LI> <LI>The Cluster Service waits DeadlockTimeout x 4 (20 minutes) for the RHS process to terminate</LI> <LI>If the RHS process does not terminate, the Cluster Service calls NetFT to bugcheck the&nbsp;node to recover from the RHS termination failure</LI> <LI>NetFT bugchecks the node with a STOP&nbsp;0x9e</LI> </OL> <P>&nbsp;</P> <P><FONT size="5"><STRONG>Impact of RHS Recovery</STRONG></FONT></P> <P><BR />The Resource Hosting Subsystem (RHS) is the process which hosts resources; multiple resources currently online on a given node may share a common RHS process.&nbsp; For example, if you had 5 clustered VMs running on the same node, all the resources associated with those VMs would be running in the same RHS process. <BR /><BR />There are some side effects from terminating the RHS process when a resource goes unresponsive.&nbsp; If there are multiple resources hosted on that node, they may be hosted in the same RHS process.&nbsp; That means when RHS terminates and restarts to recover an individual resource, all resources being hosted in that specific RHS process are also restarted.&nbsp; With Windows Server 2008 R2, if you have 5 VMs running on a node, all 5 VMs are going to get restarted. <BR /><BR />If a resource becomes unresponsive and causes an RHS crash, the cluster service will deem that specific resource to be suspect and in need of isolation.&nbsp; Think of it as: one strike and you are out!&nbsp; The cluster service will automatically set the resource common property <A href="#" target="_blank" rel="noopener"> SeparateMonitor </A> to mark that resource to run in its own dedicated RHS process, so that in the event the resource becomes unresponsive again, it will not affect others.&nbsp; This setting is also configurable: you can either manually enable a resource to run in its own RHS process, or disable it from doing so once a past issue has been addressed.
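<P><BR />To see which resources the cluster has already isolated (or that you have isolated manually), the SeparateMonitor property can be listed across all resources.&nbsp; A minimal sketch, assuming the FailoverClusters PowerShell module:</P> <PRE># SeparateMonitor of 1 (True) = resource runs in its own dedicated RHS process
PS C:\&gt; Get-ClusterResource | ft Name,ResourceType,SeparateMonitor</PRE>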
<BR /><BR />To modify the SeparateMonitor property of an individual resource, you can use the following PowerShell command:</P> <P><BR />(Get-ClusterResource “Resource Name”).SeparateMonitor = 0</P> <P><BR />The impact of running resources in their own dedicated RHS processes is that each RHS process consumes a little more system resources.&nbsp; If you open Task Manager you will see a series of “Failover Cluster Resource Host Subsystem” processes running, each consuming a few MB of RAM. <BR /><BR />In general, clustering will self-manage misbehaving resources.&nbsp; Resources will be given a chance to play nicely with everyone else, and if they don’t they will be automatically isolated to minimize impact.&nbsp; So it is generally recommended to stay with the default values. </P> <H2>Improvements in Windows Server 2012</H2> <P><BR />There are some feature enhancements in Windows Server 2012 to mitigate the impact of non-responsive resource recovery. <BR /><BR /></P> <P><STRONG> Resource Re-attach: </STRONG>&nbsp; When a resource goes unresponsive the RHS process will recycle just as before, but any healthy resources in a running state will re-attach to the new RHS process without having to be restarted.&nbsp; This means the impact from recovery is reduced: just 1 VM gets restarted and the other 4 are not impacted. <BR /><BR /></P> <UL> <LI>The resource DLL must support resource re-attach to be compatible with this new feature.&nbsp; In-box resource types such as Virtual Machine and Physical Disk have been enhanced in Windows Server 2012 to take advantage of this new feature.</LI> </UL> <P><STRONG>Isolation of Core resources: </STRONG>&nbsp; Resources are now segmented by default into multiple RHS processes to keep application resource deadlocks from impacting core cluster functionality. <BR /><BR /></P> <UL> <LI>All in-box resources (in ClusRes.dll) run in a dedicated Core RHS process</LI> <LI>All “Physical Disk” resources run in a dedicated Storage RHS process</LI> <LI>3rd party resources run in a dedicated RHS process</LI> </UL> <P>Additionally, resources can also be marked with the SeparateMonitor property to run in their own dedicated RHS process in Windows Server 2012, as they could in previous releases. </P> <H2>How to Troubleshoot RHS Recovery</H2> <P><BR />Everything we have discussed in this blog to this point has described the expected behavior of how Failover Clustering recovers when something goes wrong with a resource and it becomes unresponsive.&nbsp; Now the most important question… <STRONG> <EM> What do you do about it?
</EM> </STRONG> <BR /><BR /><STRONG> Troubleshooting Steps: </STRONG> <BR /><BR /></P> <OL> <LI>Open Event Viewer and look for an Event ID 1230</LI> <LI>Identify the date / time as well as the resource name and resource type</LI> <LI>Generate the cluster.log with the <A href="#" target="_blank" rel="noopener"> Get-ClusterLog </A> cmdlet</LI> <LI>Go to the C:\Windows\Cluster\Reports folder and open the Cluster.log file</LI> <LI>Using the time stamp from the Event ID 1230, find the point of the failure <BR /><BR /> <UL> <LI>Reminder:&nbsp; The event log is in local time and the cluster.log is in GMT.&nbsp; With Windows Server 2012 you can use Get-ClusterLog –UseLocalTime to generate the Cluster.log in local time.&nbsp; This will make correlating with the event log easier.</LI> </UL> <BR /></LI> <LI>Identify which entry point was being called on the resource.&nbsp; There will be a log entry similar to: <BR />ERR&nbsp;&nbsp; [RHS] RhsCall::DeadlockMonitor: Call ISALIVE timed out for resource 'ResourceName'. <BR />INFO&nbsp; [RHS] Enabling RHS termination watchdog with timeout 1200000 and recovery action 3. <BR />ERR&nbsp;&nbsp; [RHS] Resource ResourceName handling deadlock. Cleaning current operation and terminating RHS process.</LI> <LI>Look up what that entry point for that resource type is attempting to do.&nbsp; For in-box resources you will find them documented here: <BR /><A href="#" target="_blank" rel="noopener"> KB914458 </A> – Behavior of the LooksAlive and IsAlive functions for the resources that are included in the Windows server Clustering component of Windows Server 2003.</LI> <LI>Now that you understand what entry point was being called, on which resource, and when, you need to investigate the underlying component.&nbsp; For example: <BR /><BR /> <OL> <LI>Physical Disk resource IsAlive will effectively attempt to enumerate the file system, so you should troubleshoot your storage subsystem to determine why I/Os are not completing.</LI> <LI>File Server LooksAlive will attempt to retrieve the properties of the SMB shares, so you should troubleshoot the Server service.</LI> </OL> <BR /></LI> </OL> <P><BR /><BR /><STRONG> Advanced Troubleshooting: </STRONG> <BR /><BR /></P> <OL> <LI>When RHS recovers from a deadlock it will generate a Windows Error Reporting report and a user-mode dump of the RHS process.&nbsp; With the user-mode dump you can determine what the resource DLL was attempting to do when it failed to respond.&nbsp; See this blog for details on how to debug the user-mode dump to troubleshoot the resource: <A href="#" target="_blank" rel="noopener"> http://blogs.msdn.com/b/ntdebugging/archive/2011/05/30/what-is-in-a-rhs-dump-file-created-by-windows-error-reporting.aspx </A> <BR /><BR /> <UL> <LI>Note:&nbsp; KB914458 will generally provide sufficient information on what the resource DLL was attempting to do, so this should not normally be necessary.</LI> </UL> <BR /></LI> <LI>To help pinpoint root cause, a user-mode dump alone may not be enough; you can also configure RHS to bugcheck the box and generate a full memory dump.&nbsp; This can be enabled by setting the following registry DWORD to a value of 3: <BR />HKEY_LOCAL_MACHINE\Software\Microsoft\Windows NT\CurrentVersion\Failover Clusters\DebugBreakOnDeadlock <BR />Starting with Windows Server 2012 R2, set the following&nbsp;registry DWORD to a value of 3 instead: <BR />HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\ClusSvc\Parameters\DebugBreakOnDeadlock</LI> </OL> <P><STRONG>NOTE:</STRONG>&nbsp;<EM>DebugBreakOnDeadlock will only create a dump if the RHS process itself deadlocks, not a resource.&nbsp; When a resource deadlocks, RHS will attempt to terminate it.&nbsp; As part of the termination, it should create a WER report with a small heap and process dump.&nbsp; Once those are completed, RHS will terminate the resource.&nbsp; If the termination is successful, RHS itself does not deadlock, and no dump is created.&nbsp; So this setting may not be needed.</EM><BR /><BR />The key take-away is that RHS recovery is expected behavior for a resource that has become unresponsive.&nbsp; To address the root cause you need to dig into which resource is failing; by understanding what it was attempting to do, you can identify why it didn’t respond. <BR /><BR />For additional information on troubleshooting resources that result in RHS recovery, see the blogs below.&nbsp; Microsoft support is also available to assist in advanced debugging to help you identify root cause. </P> <H2>Additional Resources</H2> <P><BR />Resource Hosting Subsystem (RHS) In Windows Server 2008 Failover Clusters <BR /><A href="#" target="_blank" rel="noopener"> http://blogs.technet.com/b/askcore/archive/2009/11/23/resource-hosting-subsystem-rhs-in-windows-server-2008-failover-clusters.aspx </A> <BR /><BR />Thanks! <BR />Elden Christensen <BR />Principal&nbsp;PM Manager <BR />Clustering &amp; High-Availability <BR />Microsoft</P> Wed, 08 May 2019 16:37:38 GMT https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/understanding-how-failover-clustering-recovers-from-unresponsive/ba-p/371847 Elden Christensen 2019-05-08T16:37:38Z How to Setup a Failover Cluster in a RODC Environment https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/how-to-setup-a-failover-cluster-in-a-rodc-environment/ba-p/371846 <HTML> <HEAD></HEAD><BODY> <STRONG> First published on MSDN on Dec 13, 2012 </STRONG> <BR /> In Windows Server 2012, a Failover Cluster can be created in an environment that has access only to a Read Only Domain Controller (RODC) and not a Read Write Domain Controller (RWDC). This deployment model can be useful in a branch office with unreliable network connectivity or in a perimeter network (DMZ) where the cluster resides outside a firewall. <BR /> <BR /> In a previous blog, we discussed <A href="#" target="_blank"> how a cluster can be created in a restrictive active directory environment </A> . In that blog, we explained the role of a Cluster Name Object (CNO) and Virtual Computer Object (VCO) in a Failover Cluster. With a Read Only Domain Controller, the Cluster Service is unable to create a CNO or VCO. Therefore, these computer objects will need to be pre-created on a RWDC and then replicated to the cluster RODC, before the cluster creation process is commenced.
This blog provides the steps on how this can be done: 1) using the graphical interface, and 2) using Windows PowerShell©. These steps should be followed to first pre-create a CNO (a computer object that has the same name as your cluster) and a VCO for each clustered service or application (a computer object that has the same name as the clustered service or application and is created in the same container as the CNO). <BR /> <H2> Steps to configure the CNO and VCO for a RODC Environment: </H2> <BR /> 1.&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;On an RWDC,&nbsp;launch the Active Directory Users and Computers snap-in (type dsa.msc) or, to configure using a script, open a Windows PowerShell© prompt in Administrator mode. <BR /> <BR /> 2.&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Right-click Computers or the organizational unit (OU) container in which computer accounts are created in your domain and create a new Computer object for your cluster CNO (Cluster Name) or VCO (Clustered Application or Service Name): <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90420iB3ABF74DD2F064F2" /> <BR /> Using PowerShell: <BR /> To create the Computer object in the default Computers container: <BR /> new-adComputer -name “myclusterCNO” -dnshostname “testcluster.com” <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90421i53B5272568C7C1B8" /> <BR /> <BR /> To create the Computer object in an alternate OU: <BR /> new-adComputer -name “myclusterCNO” -dnshostname “testcluster.com” -Path $OUDistinguishName -Enabled $true <BR /> 3.&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; For a CNO, give the user account that will be used to create the cluster full control of the computer object created. For VCOs, ensure that you give the cluster account (CNO) full permission to access the object. For instance, for a cluster myclusterCNO in domain testcluster, the account testcluster\myclusterCNO should have permission to the VCO. <BR /> <UL> <BR /> <LI> On the View menu ensure that Advanced Features is selected. </LI> <BR /> <LI> Right-click on the computer object created in step 2 and select Properties: </LI> <BR /> </UL> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90422i156CBBCB75287B71" /> <BR /> <BR /> <BR /> <UL> <BR /> <LI> Select the Security tab and add the user account used for cluster creation.
</LI> <BR /> <LI> Select the newly created user account and give it Full Control for the computer object: </LI> <BR /> </UL> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90423i595C922EAA152384" /> <BR /> Using PowerShell: <BR /> $objUser = New-Object System.Security.Principal.NTAccount(“domain\user”) <BR /> $objADAR = New-Object System.DirectoryServices.ActiveDirectoryAccessRule($objUser, “GenericAll”,"Allow") <BR /> $adName = get-AdComputer -Identity “myclusterCNO” <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90424iA3D197C6E298392E" /> <BR /> <BR /> <BR /> $targetObj = get-adobject -Identity $adName.DistinguishedName -properties * <BR /> $ntSecurityObj = $targetObj.nTSecurityDescriptor <BR /> $ntSecurityObj.AddAccessRule($objADAR) <BR /> Set-ADObject $adName –Replace @{ntSecurityDescriptor=$ntSecurityObj} <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90425i3F51118B83B8A66A" /> <BR /> <BR /> You can verify through the graphical interface that the permissions have now propagated for the user account: <BR /> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90426iEE0BBEF64E206A6F" /> <BR /> <BR /> 4.&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Next, modify the following attributes for the computer object: <BR /> <TABLE> <TBODY><TR> <TD> <BR /> <P> Attribute Name </P> <BR /> </TD> <TD> <BR /> <P> Value </P> <BR /> </TD> </TR> <TR> <TD> <BR /> <P> dNSHostName </P> <BR /> </TD> <TD> <BR /> <P> &lt;FQDN&gt; </P> <BR /> </TD> </TR> <TR> <TD> <BR /> <P> sAMAccountName </P> <BR /> <P> Must be less than 15 characters </P> <BR /> </TD> <TD> <BR /> <P> &lt;Cluster name&gt;$ </P> <BR /> </TD> </TR> <TR> <TD> <BR /> <P> msDS-SupportedEncryptionTypes </P> <BR /> </TD> <TD> <BR /> <P> 28 </P> <BR /> </TD> </TR> <TR> <TD> <BR /> <P> Service Principal Name </P> <BR /> </TD> <TD> <BR /> <P> List which includes the following entries: </P> <BR /> <P> Host/&lt;computer object name&gt; </P> <BR /> <P> Host/&lt;FQDN&gt; </P> <BR /> <P> MSClusterVirtualServer/&lt;computer object Name&gt; </P> <BR /> <P> MSClusterVirtualServer/&lt;FQDN&gt; </P> <BR /> <P> MSServerClusterMgmtAPI/&lt;Computer Object Name&gt; </P> <BR /> <P> MSServerClusterMgmtAPI/&lt;FQDN&gt; </P> <BR /> <P> For CNO also add: </P> <BR /> <P> MSServerCluster/&lt;computer object Name&gt; </P> <BR /> <P> MSServerCluster/&lt;FQDN&gt; </P> <BR /> </TD> </TR> </TBODY></TABLE> <BR /> <BR /> <UL> <BR /> <LI> You can modify the attributes by selecting the Attribute Editor tab on the computer object properties page: </LI> <BR /> </UL> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90427iD3DB43C3A73D8291" /> <BR /> Using PowerShell: <BR /> $adName = get-AdComputer -Identity “myclusterCNO” <BR /> $dn = $adName.DistinguishedName <BR /> set-adcomputer -Identity $dn -add @{'msds-supportedencryptiontypes'= 28} <BR /> set-adComputer -Identity $dn&nbsp; -ServicePrincipalNames @{Add="Host/myclusterCNO", "Host/testcluster.com", "MSClusterVirtualServer/myclusterCNO", "MSClusterVirtualServer/testcluster.com", "MSServerClusterMgmtAPI/myclusterCNO", "MSServerClusterMgmtAPI/testcluster.com"} <BR /> For CNO also add: <BR /> set-adComputer -Identity $dn&nbsp; -ServicePrincipalNames @{Add="MSServerCluster/myclusterCNO", "MSServerCluster/testcluster.com"} <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90428i3D06E8C758D93E0A" /> <BR />
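<BR /> Before moving on, it is worth verifying that the SPN list landed on the pre-staged object as intended. One way to check, as a sketch assuming the “myclusterCNO” object from the steps above: <BR /> <PRE># List every SPN registered on the pre-staged computer object
PS C:\&gt; Get-ADComputer -Identity "myclusterCNO" -Properties ServicePrincipalName | Select-Object -ExpandProperty ServicePrincipalName</PRE> <BR />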
<BR /> <BR /> <BR /> 5.&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Add the CNO or the VCO SAM account name to the Allow RODC Password Replication Group <BR /> <UL> <BR /> <LI> Select the Domain Controller container from dsa.msc </LI> <BR /> <LI> Right-click on the Computer Object corresponding to the RODC </LI> <BR /> </UL> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90429i8BCFAAC27AFB6737" /> <BR /> <UL> <BR /> <LI> Select the Password Replication Policy tab in the property pane for the RODC Computer Object. </LI> <BR /> <LI> Add the CNO and VCO SAM account names (with $ at the end) to the Allow RODC Password Replication Group: </LI> <BR /> </UL> <BR /> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90430i47E396BA279A1547" /> <BR /> Using PowerShell: <BR /> Add-ADDomainControllerPasswordReplicationPolicy -Identity “RODC” -AllowedList "testCluster$","vcoName$” <BR /> <UL> <BR /> <LI> Supply the CNO and VCO SAM account names (with $ at the end) as arguments to the&nbsp;AllowedList parameter </LI> <BR /> </UL> <BR /> 6.&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Finally, replicate the CNO or VCO computer object created on the RWDC to the RODC: <BR /> repadmin /rodcpwdrepl&nbsp; &lt;RODC server name&gt; &lt;RWDC server name&gt; &lt;distinguished name of the CNO or VCO without quotes, e.g.: CN=myClusterCNO,CN=Computers,DC=testcluster,DC=com&gt; <BR /> Now that you have the computer objects pre-staged and replicated to your RODC, you are ready to create a cluster in a RODC environment. In a previous blog we provided the <A href="#" target="_blank"> steps to create a Failover Cluster </A> . <BR /> <BR /> <BR /> <BR /> Thanks! <BR /> Subhasish Bhattacharya <BR /> Senior Program Manager <BR /> Clustering and High Availability <BR /> Microsoft Corporation </BODY></HTML> Fri, 15 Mar 2019 21:33:56 GMT https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/how-to-setup-a-failover-cluster-in-a-rodc-environment/ba-p/371846 Rob Hindman 2019-03-15T21:33:56Z Tuning Failover Cluster Network Thresholds https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/tuning-failover-cluster-network-thresholds/ba-p/371834 <P>Windows Server Failover Clustering is a high availability platform that is constantly monitoring the network connections and health of the nodes in a cluster.&nbsp; If a node is not reachable over the network, then recovery action is taken to recover and bring applications and services online on another node in the cluster. <BR /><BR />Failover Clustering by default is configured to deliver the highest levels of availability, with the smallest amount of downtime.&nbsp; The default settings out of the box are optimized for failures where there is a complete loss of a server, what we will refer to in this blog as a ‘hard’ failure.&nbsp; These would be unrecoverable failure scenarios such as the failure of non-redundant hardware or power.&nbsp; In these situations the server is lost and the goal is for Failover Clustering to very quickly detect the loss of the server and rapidly recover on another server in the cluster.&nbsp; To accomplish this fast recovery from hard failures, the default settings for cluster health monitoring are fairly aggressive.&nbsp; However, they are fully configurable to allow flexibility for a variety of scenarios.
<BR /><BR />These default settings deliver the best behavior for most customers; however, as clusters are stretched from being inches to possibly miles apart, the cluster may become exposed to additional, and potentially unreliable, networking components between the nodes.&nbsp; Another factor is that the quality of commodity servers is constantly increasing; coupled with augmented resiliency through redundant components (such as dual power supplies, NIC teaming, and multi-path I/O), non-redundant hardware failures may be fairly rare.&nbsp; Because hard failures may be less frequent, some customers may wish to tune the cluster for transient failures, so that the cluster is more resilient to brief network failures between the nodes.&nbsp; By increasing the default failure thresholds you can decrease the sensitivity to brief network issues that last a short period of time.</P> <H2>Trade-offs</H2> <P>It is important to understand that there is no right answer here; the optimal setting may vary by your specific business requirements and service level agreements.&nbsp;</P> <UL> <UL> <LI><STRONG> Aggressive Monitoring </STRONG> – Provides the fastest failure detection and recovery from hard failures, which delivers the highest levels of availability.&nbsp; Clustering is less forgiving of transient failures and may in some situations prematurely fail over resources when there are transient network outages.</LI> </UL> </UL> <UL> <UL> <LI><STRONG> Relaxed Monitoring </STRONG> – Provides more forgiving failure detection with greater tolerance of brief transient network issues.&nbsp; These longer time-outs will&nbsp;result in cluster recovery from hard failures taking more time and increasing downtime.</LI> </UL> </UL> <P><BR />Think of it like your cell phone: when the other end goes silent, how long are you willing to sit there going <EM> “Hello?... Hello?... Hello?” </EM> before you hang up the phone and call the person back?&nbsp; When the other end goes silent, you don’t know when or even if they will come back.
<BR /><BR />The key question you need to ask yourself is:&nbsp; What is more important to you?&nbsp; To quickly recover when you pull out the power cord or to be tolerant to a network hiccup?</P> <H2>Settings</H2> <P>There are four primary settings that affect cluster heartbeating and health detection between nodes.&nbsp;</P> <UL> <UL> <LI><STRONG> Delay </STRONG> – This defines the frequency at which cluster heartbeats are sent between nodes.&nbsp; The delay is the number of seconds before the next heartbeat is sent.&nbsp; Within the same cluster there can be different delays between nodes on the same subnet, between nodes which are on different subnets, and in Windows Server 2016 between nodes in different fault domain sites.</LI> </UL> </UL> <UL> <UL> <LI><STRONG> Threshold </STRONG> – This defines the number of heartbeats which are missed before the cluster takes recovery action.&nbsp; The threshold is a number of heartbeats.&nbsp; Within the same cluster there can be different thresholds between nodes on the same subnet, between nodes which are on different subnets,&nbsp;and in Windows Server 2016 between nodes in different fault domain sites.</LI> </UL> </UL> <P>It is important to understand that both the delay and threshold have a cumulative effect on the total health detection.&nbsp; For example setting CrossSubnetDelay to send a heartbeat every 2 seconds and setting the CrossSubnetThreshold to 10 heartbeats missed before taking recovery, means that the cluster can have a total network tolerance of 20 seconds before recovery action is taken.&nbsp; In general, continuing to send frequent heartbeats but having greater thresholds is the preferred method.&nbsp; The primary scenario for increasing the Delay, is if there are ingress / egress charges for data sent&nbsp;between nodes.&nbsp; The table below lists properties to tune cluster heartbeats along with default and maximum values.</P> <TABLE> <TBODY> <TR> <TD><STRONG> Parameter </STRONG></TD> <TD><STRONG> Win2012 R2 </STRONG></TD> <TD><STRONG> Win2016 </STRONG></TD> <TD><STRONG> Maximum </STRONG></TD> </TR> <TR> <TD><STRONG> SameSubnetDelay </STRONG></TD> <TD>1 second</TD> <TD>1 second</TD> <TD>2 seconds</TD> </TR> <TR> <TD><STRONG> SameSubnetThreshold </STRONG></TD> <TD>5 heartbeats</TD> <TD>10 heartbeats</TD> <TD>120 heartbeats</TD> </TR> <TR> <TD><STRONG> CrossSubnetDelay </STRONG></TD> <TD>1 second</TD> <TD>1 seconds</TD> <TD>4 seconds</TD> </TR> <TR> <TD><STRONG> CrossSubnetThreshold </STRONG></TD> <TD>5 heartbeats</TD> <TD>20 heartbeats</TD> <TD>120 heartbeats</TD> </TR> <TR> <TD><STRONG> CrossSiteDelay </STRONG></TD> <TD>NA</TD> <TD>1 second</TD> <TD>4 seconds</TD> </TR> <TR> <TD><STRONG> CrossSiteThreshold </STRONG></TD> <TD>NA</TD> <TD>20 heartbeats</TD> <TD>120 heartbeats</TD> </TR> </TBODY> </TABLE> <P><BR />To be more tolerant of transient failures it is recommended on Win2008&nbsp;/ Win2008 R2 / Win2012 /&nbsp;Win2012 R2 to increase the SameSubnetThreshold and CrossSubnetThreshold values to the higher Win2016 values.&nbsp; Note:&nbsp; If the Hyper-V role is installed on a Windows Server 2012 R2 Failover Cluster, the SameSubnetThreshold default will automatically be increased to 10 and the CrossSubnetThreshold default will automatically be increased to 20.&nbsp; After installing the following hotfix the default heartbeat values will be increased on Windows Server 2012 R2 to the Windows Server 2016 values. 
<BR /><BR /><A href="#" target="_blank" rel="noopener"> https://support.microsoft.com/en-us/kb/3153887 </A> <BR /><BR />Disclaimer:&nbsp; When increasing the cluster thresholds, it is recommended to increase in moderation.&nbsp; It is important to understand that increasing resiliency to network hiccups comes at the cost of increased downtime when a hard failure occurs.&nbsp; In most customers’ minds, the definition of a server being down on the network is when it is no longer accessible to clients.&nbsp; Traditionally for TCP-based applications this means the resiliency of the TCP reconnect window.&nbsp; While the cluster thresholds can be configured for durations of minutes, to achieve reasonable recovery times for clients it is generally not recommended to exceed the TCP reconnect timeouts.&nbsp; Generally, this means not going above ~20 seconds. <BR /><BR />It is critical to recognize that cranking up the thresholds to high values does not fix or resolve the transient network issue; it simply masks the problem by making health monitoring less sensitive.&nbsp; The #1 mistake customers make is assuming that because cluster health detection was not triggered, the issue is resolved (which is not true!).&nbsp; I like to think of it this way: just because you choose not to go to the doctor does not mean you are healthy.&nbsp; In other words, the lack of someone telling you that you have a problem does not mean the problem went away.</P> <H2>Configuration:</H2> <P>Cluster heartbeat configuration settings are considered advanced settings which are only exposed via PowerShell.&nbsp; These settings can be changed while the cluster is up and running with no downtime, and they take effect immediately with no need to reboot or restart the cluster. <BR /><BR />To view the current heartbeat configuration values:</P> <PRE>PS C:\&gt; get-cluster | fl *subnet* </PRE> <P><span class="lia-inline-image-display-wrapper lia-image-align-inline" style="width: 717px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90418i95848AF55D8FD1C2/image-size/large?v=v2&amp;px=999" role="button" /></span> <BR /><BR />The setting can be modified with the following syntax:</P> <PRE>PS C:\&gt; (get-cluster).SameSubnetThreshold = 20 </PRE> <P><span class="lia-inline-image-display-wrapper lia-image-align-inline" style="width: 562px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90419i21AEE2943DD4FE4A/image-size/large?v=v2&amp;px=999" role="button" /></span> <BR /><BR /><BR /></P> <H3>Additional Considerations for Logging:</H3> <P>In Windows Server 2012 there is additional logging to the Cluster.log for heartbeat traffic when heartbeats are dropped.&nbsp; By default the RouteHistoryLength setting is set to 10, which is two times the default threshold.&nbsp; If you increase the SameSubnetThreshold or CrossSubnetThreshold values, it is recommended to increase the RouteHistoryLength value to twice the threshold value, to ensure there is sufficient logging if you ever need to troubleshoot dropped heartbeat packets.&nbsp; This can be done with the following syntax:</P> <PRE>PS C:\&gt; (get-cluster).RouteHistoryLength = 20 </PRE> <P>For more information on troubleshooting issues with nodes being removed from cluster membership due to network communication issues, please see the following blog: <BR /><A href="#" target="_blank" rel="noopener">
http://blogs.technet.com/b/askcore/archive/2012/02/08/having-a-problem-with-nodes-being-removed-from-active-failover-cluster-membership.aspx </A></P> <H2><BR />Additional Resources:</H2> <P>To learn more,&nbsp;see this Failover Cluster Networking Essentials session I did at TechEd: <BR /><IFRAME src="https://channel9.msdn.com/Events/TechEd/NorthAmerica/2013/MDC-B337/player" width="960" height="540" frameborder="0"> </IFRAME> <BR />Thanks! <BR />Elden Christensen <BR />Principal PM Manager <BR />Clustering &amp; High-Availability <BR />Microsoft</P> Thu, 08 Aug 2019 15:36:48 GMT https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/tuning-failover-cluster-network-thresholds/ba-p/371834 Elden Christensen 2019-08-08T15:36:48Z BUILD 2012: Designing applications for highly-availability with Windows Server 2012 Failover Clustering https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/build-2012-designing-applications-for-highly-availability-with/ba-p/371831 <HTML> <HEAD></HEAD><BODY> <STRONG> First published on MSDN on Nov 05, 2012 </STRONG> <BR /> BUILD 2012 wrapped up last week.&nbsp; In case you were not able to make it, the sessions are now posted for you to watch.&nbsp; Here is a link to the cluster session. <BR /> <BR /> <UL> <LI> <STRONG> Session: </STRONG> 3-051 </LI> <LI> <STRONG> Title: </STRONG> Designing applications for highly-availability with Windows Server 2012 Failover Clustering </LI> <LI> <STRONG> Abstract: </STRONG> Learn how to leverage the power of the Failover Clustering features in Windows Server 2012 to deliver your application the highest levels of availability.&nbsp; This session will cover the options available to integrate your application to achieve failover, considerations when building on top of Microsoft’s clustered file system (CSVFS), and how to deliver increased availability when running your application on a private cloud by leveraging Guest Clustering and the new VM Monitoring feature. </LI> <LI> <STRONG> Link: </STRONG> <A href="#" target="_blank"> https://channel9.msdn.com/Events/Build/2012/3-051 </A> </LI> </UL> <BR /> <BR /> Remember that BUILD is a developer conference, so this session is focused on ISVs, IHVs, OEMs, and any developer working on integrating with or leveraging Failover Clustering to achieve high availability with your applications and services. <BR /> <BR /> <P> Thanks! <BR /> Elden Christensen <BR /> Principal Program Manager Lead <BR /> Clustering &amp; High-Availability <BR /> Microsoft </P> </BODY></HTML> Fri, 15 Mar 2019 21:31:23 GMT https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/build-2012-designing-applications-for-highly-availability-with/ba-p/371831 Elden Christensen 2019-03-15T21:31:23Z VM Monitoring in Windows Server 2012 – Frequently Asked Questions https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/vm-monitoring-in-windows-server-2012-8211-frequently-asked/ba-p/371830 <HTML> <HEAD></HEAD><BODY> <STRONG> First published on MSDN on Oct 30, 2012 </STRONG> <BR /> <P> In a previous <A href="#" target="_blank"> blog post </A> I explained how VM Monitoring can be configured in Windows Server 2012. In this blog I will answer the three most frequently asked questions related to the VM Monitoring feature. </P> <BR /> <P> <B> 1) </B> <B> Why can I no longer make my Print Server role Highly Available in Windows Server 2012? I was able to do this in Windows Server 2008 R2.
</B> </P> <BR /> <P> In contrast to previous versions of Windows Server, Windows Server 2012 defines a highly available print server as a Hyper-V virtual machine(VM) running on a node in a cluster. A single virtual machine with the Print Server role installed can then be migrated from one node in the Hyper-V cluster to the other using either manual or automatic methods. </P> <BR /> <P> In Windows Server 2012, the print spooler service is no longer a clustered resource and instead the entire virtual machine is migrated from one Hyper-V node to the other. This new model provides the same seamless user experience as previous versions of Windows but with the following added benefits: </P> <BR /> <UL> <BR /> <LI> Using Windows Server 2012 as the Hyper-V and failover clustering host allows access to the <B> VM Monitoring </B> feature. This allows greater flexibility and control over recovery actions. <B> </B> </LI> <BR /> <LI> Windows Server 2012 Print Servers can utilize the Live Migration and Quick Migration features of Hyper-V. </LI> <BR /> <LI> Windows Server 2012 Highly Available Print Servers are easier to deploy and have reduced complexity. </LI> <BR /> <LI> Print devices and drivers are deployed the same as on a physical machine which provides consistency for management.&nbsp; When deployed in a virtual machine, availability can be enhanced using the VM Monitoring feature. </LI> <BR /> <LI> Printer manufacturers will have a single driver so they can focus on higher quality and reduce the cost for creating drivers. </LI> <BR /> </UL> <BR /> <P> In a nutshell, using the VM Monitoring feature, the new print spooler HA model is able to streamline the deployment and management while providing higher availability for your users. </P> <BR /> <P> For additional information please refer to: </P> <BR /> <P> <A href="#" target="_blank"> High Availability Printing Overview </A> </P> <BR /> <P> <A href="#" target="_blank"> Install and Configure High Availability Printing </A> </P> <BR /> <P> <B> Note: </B> </P> <BR /> <UL> <BR /> <LI> The default VM Monitoring configuration will reboot or failover the virtual machine for every third Print Service failure in a 15 minute window. The duration of this virtual machine reboot is typically in the order of seconds. During this interval the print service will not be available. Later in this blog I will explain how this default recovery action can be customized. </LI> <BR /> <LI> During patching, the Print Service on the hosted virtual machine can be temporarily unavailable, if a reboot is required. The impact of this planned downtime can be mitigated by having an additional Print Server hosted on a Hyper-V node as a backup. </LI> <BR /> </UL> <BR /> <P> <B> 2) </B> <B> I want to configure VM Monitoring for a mission critical Virtual Machine. I do not want to take automated recovery actions such as rebooting my VM. I want to be notified when my VM encounters a critical condition so that my administrator can investigate the failure. How do I do this? </B> </P> <BR /> <P> <B> I have System Center Operations Manager deployed on my host (cluster node hosting the VM). How do I configure Operations Manager to work with VM Monitoring? </B> </P> <BR /> <P> <B> I want to customize the recovery action taken by the VM Monitoring. I don’t want to restart the VM or failover the VM on a failure. How do I do this? 
</B> </P> <BR /> <P> If the <B> “Enable automatic recovery for application health monitoring” </B> option is deselected, the cluster service does not take any automatic recovery actions when a VM critical condition occurs. It does however log event ID 1250, to indicate that a critical condition occurred in your VM. To deselect this setting: </P> <BR /> <P> <B> Using Failover Cluster Manager: </B> </P> <BR /> <P> A)&nbsp;&nbsp;&nbsp;&nbsp; Select the VM that you want to configure this setting for </P> <BR /> <P> B)&nbsp;&nbsp;&nbsp;&nbsp; Click on the <B> Resources </B> tab </P> <BR /> <P> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90414iACB4FFE9E64DA630" /> </P> <BR /> <P> C)&nbsp;&nbsp;&nbsp;&nbsp; Right click on the Virtual Machine resource and select the <B> Properties </B> option. </P> <BR /> <P> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90415i0AFD9A660BB2931E" /> </P> <BR /> <P> </P> <BR /> <P> D)&nbsp;&nbsp;&nbsp;&nbsp; Select the <B> Settings </B> tab and uncheck <B> Enable automatic recovery for application health monitoring </B> </P> <BR /> <P> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90416i485138F70F227C42" /> </P> <BR /> <P> </P> <BR /> <P> <B> Using Windows PowerShell <SUP> © </SUP> </B> </P> <BR /> <P> a)&nbsp;&nbsp;&nbsp;&nbsp; Open a Windows PowerShell shell as an Administrator </P> <BR /> <P> b)&nbsp;&nbsp;&nbsp;&nbsp; Set the EmbeddedFailureAction property for the VM resource: </P> <BR /> <P> (Get-ClusterResource "*e test-VM").EmbeddedFailureAction = 1 </P> <BR /> <P> <B> Note: </B> To re-enable automatic recovery actions on VM Critical failures this property should be set to 2 (default). <B> </B> </P> <BR /> <P> You can monitor <B> Event 1250 </B> to customize recovery action on VM Critical failures. Some options include: </P> <BR /> <P> A) <A href="#" target="_blank"> Setting up a Cluster Scheduled task </A> to carry out a desired sequence of actions on the occurrence of the VM event or service failure being monitored e.g.: Initiate a live migration on a VM network failure or send an email to a cluster administrator indicating the failure condition. </P> <BR /> <P> B)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Configure System Center Operations Manager to take recovery actions when the event is triggered on the host. </P> <BR /> <P> C)&nbsp;&nbsp;&nbsp;&nbsp; Use a 3 <SUP> rd </SUP> party solution such as Symantec ApplicationHA <B> <SUP> © </SUP> </B> for Hyper-V which provides advanced customization of recovery actions. </P> <BR /> <P> </P> <BR /> <P> The administrator can investigate the VM in critical state as follows: </P> <BR /> <P> a)&nbsp;&nbsp;&nbsp;&nbsp; Log onto the VM </P> <BR /> <P> b)&nbsp;&nbsp;&nbsp;&nbsp; Launch Task Scheduler </P> <BR /> <P> c)&nbsp;&nbsp;&nbsp;&nbsp; Navigate to the <B> Microsoft/FailoverClustering/VM Monitoring </B> node </P> <BR /> <P> d)&nbsp;&nbsp;&nbsp;&nbsp; Examine when the last event or service failure occurred. 
<P> <B> 3) The Virtual Machine I want to monitor is not in the same domain as the cluster node it is hosted on. Can I configure VM Monitoring? </B> </P> <BR /> <P> In this configuration, VM Monitoring needs to be configured using Windows PowerShell by logging into the guest (virtual machine). </P> <BR /> <P> <B> Pre-requisites: </B> </P> <BR /> <UL> <BR /> <LI> <A href="#" target="_blank"> Failover Clustering Admin Tools installed </A> in the guest. This installs the FailoverClusters PowerShell module. </LI> <BR /> </UL> <BR /> <P> <B> Steps to configure in the guest using PowerShell: </B> </P> <BR /> <P> a)&nbsp;&nbsp;&nbsp;&nbsp; Open a Windows PowerShell shell as an Administrator </P> <BR /> <P> b)&nbsp;&nbsp;&nbsp;&nbsp; Run the Add-ClusterVMMonitoredItem cmdlet inside the guest to configure monitoring </P> <BR /> <P> <B> Example: </B> Add-ClusterVMMonitoredItem -Service spooler </P>
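<BR /> <P> To confirm the configuration took effect, the monitored items can be listed from inside the same guest, assuming the Failover Clustering Admin Tools noted above are installed: </P> <BR /> <P> # Run inside the guest as Administrator; lists the services and events configured for monitoring <BR /> Get-ClusterVMMonitoredItem </P> <BR />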
<P> Thanks! </P> <BR /> <P> Subhasish Bhattacharya </P> <BR /> <P> Program Manager </P> <BR /> <P> Clustering and High Availability </P> <BR /> <P> Microsoft </P> <BR /> <P> </P> </BODY></HTML> Fri, 15 Mar 2019 21:31:18 GMT https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/vm-monitoring-in-windows-server-2012-8211-frequently-asked/ba-p/371830 Rob Hindman 2019-03-15T21:31:18Z How to Configure BitLocker Encrypted Clustered Disks in Windows Server 2012 https://gorovian.000webhostapp.com/?exam=t5/failover-clustering/how-to-configure-bitlocker-encrypted-clustered-disks-in-windows/ba-p/371825 <HTML> <HEAD></HEAD><BODY> <STRONG> First published on MSDN on Jul 20, 2012 </STRONG> <BR /> Windows Server 2012 introduces the ability to encrypt Cluster Shared Volumes (CSV) using BitLocker®. &nbsp;You can learn more about BitLocker <A href="#" target="_blank"> here </A> . <BR /> <BR /> Data on lost or stolen storage is vulnerable to unauthorized access, either through a software attack tool run against it or by transferring the storage to a different server. BitLocker helps mitigate unauthorized data access by enhancing file and system protections. BitLocker also helps render data inaccessible when BitLocker-protected storage is decommissioned or recycled. BitLocker on a clustered disk, either a traditional Physical Disk Resource (PDR) or a Cluster Shared Volume, therefore provides an additional layer of protection for administrators wishing to protect sensitive, highly available data. By adding additional protectors to the clustered volume, administrators can also restrict access to resources within an organization by allowing only certain user accounts to unlock the BitLocker volume. <BR /> <BR /> This blog outlines the sequence of steps to configure BitLocker on a clustered disk using Windows PowerShell®. <BR /> <H2> Prerequisites: </H2> <BR /> <UL> <BR /> <LI> A Windows Server 2012 Domain Controller (DC) is reachable from all nodes in the cluster. </LI> <BR /> <LI> Enable the policy - <STRONG> Choose how BitLocker-protected fixed drives can be recovered. </STRONG> </LI> <BR /> <LI> The BitLocker Drive Encryption feature is installed on all nodes in the cluster. To install, open a Windows PowerShell console and run: </LI> <BR /> </UL> <BR /> <A href="#" target="_blank"> Add-WindowsFeature </A> BitLocker <BR /> <P> Note: The cluster node will need to be restarted after installing the BitLocker Drive Encryption feature. </P> <BR /> <UL> <BR /> <LI> Ensure that the disk to be encrypted is formatted with NTFS. For a traditional PDR, you need to assign a drive letter to the disk. For CSV, you can use the mount point for the volume. Partition, initialize and format the disk if required. Open a Windows PowerShell console and run: </LI> <BR /> </UL> <BR /> <A href="#" target="_blank"> Initialize-Disk </A> -Number &lt;num&gt; -PartitionStyle &lt;style&gt; <BR /> <BR /> $partition = <A href="#" target="_blank"> New-Partition </A> -DiskNumber &lt;num&gt; -DriveLetter &lt;letter&gt; <BR /> <BR /> <A href="#" target="_blank"> Format-Volume </A> -Partition $partition -FileSystem NTFS <BR /> <H2> Steps to configure using Windows PowerShell </H2> <BR /> BitLocker Drive Encryption can be turned on for both traditional failover cluster disks and Cluster Shared Volumes (CSV). BitLocker encrypts at the volume level, so if a clustered disk consists of more than one volume and you want to protect the entire disk, turn on BitLocker protection for each volume of the disk. <BR /> <BR /> Volumes can be encrypted before adding them to a cluster. Additionally, data volumes already in use by clustered workloads can be encrypted. <BR /> <BR /> To configure, open a Windows PowerShell console and perform the following steps: <BR /> <BR /> 1.&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; If the clustered disk is currently added to a cluster and Online, put it into maintenance mode. <BR /> <P> <EM> Traditional PDR: </EM> </P> <BR /> <A href="#" target="_blank"> Get-ClusterResource </A> "Cluster Disk 1" | <A href="#" target="_blank"> Suspend-ClusterResource </A> <BR /> <P> <EM> Cluster Shared Volumes: </EM> </P> <BR /> <A href="#" target="_blank"> Get-ClusterSharedVolume </A> "Cluster Disk 1" | <A href="#" target="_blank"> Suspend-ClusterResource </A> <BR /> <P> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90409i2A632220AC741AF2" /> </P> <BR /> 2.&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Configure BitLocker® on the volume using your choice of protector. <BR /> <BR /> <EM> To enable using a password protector: </EM> <BR /> $SecureString = ConvertTo-SecureString &lt;password&gt; -AsPlainText -Force <BR /> <BR /> Enable-BitLocker &lt;drive letter or CSV mount point&gt; -PasswordProtector -Password $SecureString
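<BR /> <P> Putting step 1 and the password protector together, a minimal sketch for a CSV might look like the following. The disk name "Cluster Disk 1", the mount point and the password are hypothetical placeholders; in practice, avoid hard-coding a password in a script. </P> <BR /> <P> # Sketch: suspend the CSV, then enable BitLocker with a password protector <BR /> Get-ClusterSharedVolume "Cluster Disk 1" | Suspend-ClusterResource <BR /> $SecureString = ConvertTo-SecureString "P@ssw0rd-Example!" -AsPlainText -Force <BR /> Enable-BitLocker "C:\ClusterStorage\Volume1" -PasswordProtector -Password $SecureString <BR /> # Resume the volume once the remaining protectors have been configured <BR /> Get-ClusterSharedVolume "Cluster Disk 1" | Resume-ClusterResource </P> <BR />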
<EM> Recovery Password Protector </EM> <BR /> <BR /> Creating a recovery password and backing up the password in Active Directory (AD) provides a mechanism to restore access to a BitLocker-protected drive in the event that the drive cannot be unlocked normally. A domain administrator can obtain the recovery password from AD and use it to unlock and access the drive. Some of the reasons a BitLocker recovery might be necessary include: <BR /> <UL> <BR /> <LI> The CNO used to establish a SID protector in step 4 has been accidentally deleted from AD. </LI> <BR /> <LI> An attacker has modified your server. This is applicable for a computer with a Trusted Platform Module (TPM), because the TPM checks the integrity of boot components during startup. </LI> <BR /> </UL> <BR /> <EM> To enable using a recovery password protector and back up the protector to Active Directory: </EM> <BR /> Enable-BitLocker &lt;drive letter or CSV mount point&gt; -RecoveryPasswordProtector <BR /> <P> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90410i84667C2EF2581EF9" /> </P> <BR /> <BR /> $protectorId = (Get-BitLockerVolume &lt;drive or CSV mount point&gt;).KeyProtector | Where-Object {$_.KeyProtectorType -eq "RecoveryPassword"} <BR /> Backup-BitLockerKeyProtector &lt;drive or CSV mount point&gt; -KeyProtectorId $protectorId.KeyProtectorId <BR /> <P> <IMG src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/90411i05670371E53BAB0E" /> </P> <BR /> <EM> To disable: </EM> <BR /> Disable-BitLocker &lt;drive letter or CSV mount point&gt; <BR /> <STRONG> Warning: It is important to capture and secure the password protector for future use. </STRONG> <BR /> <BR /> Note: During encryption, a CSV volume will be in redirected mode until BitLocker builds its metadata and watermark on all data present on the encrypted volume. The duration of redirected mode will be proportional to the size of the volume, the amount of actual data, and the BitLocker encryption mode chosen (DataOnly or Full). The BitLocker encryption rate is typically on the order of minutes per gigabyte. The cluster service will switch back to Direct I/O mode within 3 minutes after encryption has completed. <BR /> <BR /> 3.&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Determine the Cluster Name Object (CNO) for your cluster: <BR /> $cno = (Get-Cluster).Name + "$" <BR /> 4.&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Add an Active Directory Security Identifier (SID) protector to the CSV disk using the Cluster Name Object (CNO). <BR /> <BR /> The Active Directory protector is a domain security identifier (SID) based protector for protecting clustered volumes held within the Active Directory infrastructure. It can be bound to a user account, machine account or group.&nbsp; When an unlock request is made for a protected volume, the cluster service unlocks the volume using the CNO-based SID protector.
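<BR /> <P> As a minimal sketch of step 4, the SID protector can be added with the Add-BitLockerKeyProtector cmdlet; the mount point below is a hypothetical placeholder. </P> <BR /> <P> # Sketch: bind an AD SID protector to the Cluster Name Object <BR /> $cno = (Get-Cluster).Name + "$" <BR /> # "C:\ClusterStorage\Volume1" is a hypothetical placeholder mount point <BR /> Add-BitLockerKeyProtector "C:\ClusterStorage\Volume1" -ADAccountOrGroupProtector -ADAccountOrGroup $cno </P>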