OCS - Automatic site failover

  • Question

  • sip domain: companyname.org
    Exchange domain: companyname.org
    AD name: companyname.local
    External Access required: Yes
    Users: 5000

    Hope someone can help me out with this. I've read through the various blogs, whitepapers, etc. but it's still unclear.
    We have two sites (siteA & siteB) and are looking to deploy OCS with all users connecting to the pool at siteA. However, should we completely lose siteA, management have said that OCS services should fail over to siteB, and all users, regardless of their physical location, should be able to use OCS services as before.

    What is the best way to achieve this? Is it possible to create one pool that covers both locations?
    I should mention that we have ISA 2006 here as well, so any solution needs to include this.
    My gut feeling is that I can't do automatic failover because of the changes that would need to happen on the external DNS zone to redirect connections over to siteB, but I'm happy to be told I'm wrong!

    The free OCS planning tool shows me 2x Edge and 2x FE at each location with the necessary load balancer(s), but makes no mention of how site failover would work (if at all).

    TIA

    Andy
    Tuesday, August 12, 2008 1:34 PM

All replies

  • In OCS the pool (or standard server) owns the user account, and there's no supported way to do automatic failover.  The only supported way that a pool can span physical locations is if those locations are on the same LAN (typically a fiber run between sites).  However, you can be prepared for failover by running periodic backups.  Take a look at the OCS Backup and Restore Guide for more details and let us know if you have further questions after you've reviewed that.

     

    http://www.microsoft.com/downloads/details.aspx?familyid=5c6e6ac7-079a-4326-b517-3c117fadb44e&displaylang=en&tm

    Tuesday, August 12, 2008 1:59 PM
    Moderator
  • My understanding is that spanning a single pool across physical sites is unsupported by Microsoft, even though this would, in my opinion, be the most elegant way to do both High Availability and Disaster Recovery. Why? Because you achieve both results with the fewest servers, and without having to run manual restore procedures that are likely to fail if done incorrectly. Not to mention the downtime involved in spinning up the secondary environment when disaster strikes.

    Provided you can span a subnet across physical datacenters (something most large organizations have been doing for some time now) and replicate the SQL database between those datacenters (again, something many organizations do today), you ought to be able to provide, for example, 15,000 OCS users with both HA and DR using a Consolidated Enterprise Topology with four front-end servers spanned across two physical sites in a single pool. This conservative estimate would allow for a full site failure and still let all 15,000 users access IM, Presence, Conferencing and even Voice, provided that the appropriate load balancing is in place and the necessary supporting OCS servers are available in both datacenters (i.e. Mediation, Reverse Proxy, Media Gateway). There are supported SQL Server replication methods that can be used to achieve database replication across sites. The same goes for load balancers that have survivability across sites.

    So why are we burdened with having to deploy twice the server count, redundant standby pools, and backup and restore scripts that all amount to potentially hours of downtime in the event of an unplanned outage at the primary datacenter? Exchange Server and SQL Server can handle being spanned across multiple datacenters, so why can’t OCS, which is now in its third major revision? I’ve had more than one major customer ask the same question, and I don’t really have a good answer for them other than “it’s not supported…”

     

    I’m hoping to create some intelligent discussion around this topic and welcome everyone including MS product group folks to chime in to discuss the reasons why we can’t offer this solution to customers today and what is being done to address this situation in future.  I truly think OCS 2007 is an incredible product but leaves much to be desired when it comes to high availability across multiple sites.

     

    Regards,
    Dino

     

    Wednesday, August 13, 2008 3:43 AM
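
The sizing claim in the post above (four front ends split across two sites, still serving all 15,000 users after a full site failure) can be sanity-checked with a little arithmetic. The sketch below is only an illustration: the function names are made up for this example, and the per-front-end capacity is an assumed input, not a published Microsoft figure. With two front ends per site, the claim only holds if each front end can carry about 7,500 users on its own.

```python
def worst_case_capacity(front_ends_per_site, users_per_front_end):
    """Users the pool can still serve after losing its largest site.

    front_ends_per_site: list of front-end counts, one entry per site.
    users_per_front_end: ASSUMED capacity of one front end (hypothetical input).
    """
    total_front_ends = sum(front_ends_per_site)
    # Worst case: the site holding the most front ends is the one that fails.
    surviving_front_ends = total_front_ends - max(front_ends_per_site)
    return surviving_front_ends * users_per_front_end


def survives_site_loss(users, front_ends_per_site, users_per_front_end):
    """True if the remaining front ends can carry every user after a site loss."""
    return worst_case_capacity(front_ends_per_site, users_per_front_end) >= users


if __name__ == "__main__":
    # Topology from the post: two sites, two front ends each, 15,000 users.
    # 7,500 and 5,000 users per front end are illustrative values only.
    print(survives_site_loss(15000, [2, 2], 7500))
    print(survives_site_loss(15000, [2, 2], 5000))
```

Plugging in whatever per-server figure your own capacity planning supports shows quickly whether a two-plus-two layout leaves enough headroom, or whether each site needs more front ends.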
  • Dino,

    Agree totally with your post above. I work mainly as an Exchange consultant, and the current project here is to deliver a shiny new EX2K7 org using CCR and SCR to provide resilience plus site failover. OCS was chucked at me at the last minute, so I was kind of surprised that the more I read on the topic, the less certain I was that I could go for what seemed the simplest option: "Consolidated Enterprise Topology using four front-end servers spanned across two physical sites in a single pool." Deploying twice the server count doesn't add up for me either. Like you, I would love to hear from MS themselves on this, as spinning up standby servers looks clunky at best compared to the features in EX2K7.

    thanks for your post

    Andy
    Wednesday, August 13, 2008 8:19 AM
  • As I understand it the main issue is testing resources in the product group - they have a ton of scenarios to test with very limited resources, so they test the scenarios that are more likely to apply to the largest segment of customers.  If it's not tested then it's not supported.  In my opinion there are 3 key scenarios that they should test for supportability - geographically dispersed pool servers, virtualization of non-voice roles (especially the ancillary roles such as CWA, device update, QMS, etc.), and warm standby backup scenarios (I'll spare you the details on my design options for this item).

     

    In any case, the support stance for an Enterprise pool is that the pool servers and SQL must be on the same LAN.  If you have a LAN that spans geographic locations, then you are good to go.

     

    Wednesday, August 13, 2008 1:37 PM
    Moderator
  • Thanks for your reply, and let’s hope that your 3 scenarios are at the top of the development team's list for supportability! Along with these scenarios, I also think a good white paper should outline the minimum requirements for things like latency (much like was done when MS started to support geo-clustering of Exchange servers across sites).

     

    I'm hoping someone from MS chimes in about the supportability of a single pool across a spanned subnet, as I've been told by MS folks on more than one occasion that this was explicitly not supported because the pool members were not physically in the same datacenter. I personally think this is short-sighted, given that from a network perspective it is irrelevant whether they are located beside each other or 40 km apart, provided that latency isn't an issue. It would be nice to know officially what the maximum acceptable latency is for this configuration. I have personally spoken with three major financial institutions that are either delaying or seriously rethinking deploying OCS because there isn't a fully supported geo-dispersed HA/DR enterprise deployment model.

     

    It sounds like you are confident that it is supported, provided the subnet is spanned and you meet all the other requirements. Could you share more info on whether anyone is actually doing this, what kind of setup they used, and what the latency between the datacenters was? In my opinion, if the customer datacenters meet the certification specifications for Exchange geo-clustering or CCR, then spanning front-end servers and a SQL back-end cluster for OCS should also work.

    Dino

    Wednesday, August 13, 2008 4:41 PM
  • The deployments I've done with this topology always had latency under 25ms.  The topology is always the same - SQL cluster with nodes in each site and one or more front ends in each site.
    Thursday, August 14, 2008 2:36 PM
    Moderator
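
For anyone who wants to compare their own inter-site link against the "under 25ms" figure mentioned above, here is a rough sketch that times TCP connects from a server in one site to a server in the other. Treat it as an approximation only: the 25 ms budget is just the number observed in this thread, not a documented support limit, and the host name and port in the example are placeholders you would replace with your own.

```python
import socket
import time


def tcp_connect_ms(host, port, timeout=2.0):
    """Time one TCP connect in milliseconds (roughly one network round trip)."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass  # connection opened successfully; close it straight away
    return (time.perf_counter() - start) * 1000.0


def within_budget(samples_ms, budget_ms=25.0):
    """True only if every sample is below the budget (conservative check).

    budget_ms defaults to the 25 ms figure quoted in the thread; this is an
    observed value from the poster's deployments, not an official threshold.
    """
    if not samples_ms:
        return False  # no measurements means nothing has been demonstrated
    return max(samples_ms) < budget_ms


if __name__ == "__main__":
    # Hypothetical target: a front end or SQL node in the other site.
    remote_host, remote_port = "fe01.siteb.example", 5061
    samples = [tcp_connect_ms(remote_host, remote_port) for _ in range(10)]
    print("max ms:", max(samples), "within budget:", within_budget(samples))
```

Using the worst sample rather than the average is deliberate, since occasional spikes matter more than a good mean when servers in one pool talk to a shared SQL back end across the link.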