TLS handshake fails, can't validate any OCS Front-End or Web Conference services.

  • Question

  • I've searched high and I've searched low. I think I've read every single thread on the Internet with this error. I have a case open with PSS (Office Comm Server team) & sub-case opened with the PKI Team. So far nobody can figure this out in about 12-15 hours of troubleshooting, plus hours upon hours of my own time. It all seems to point to a certificate error, but everyone who's looked at the certificate can't find a problem yet.

     

    For any MSFT employees reading, SRX080717601250 and SRX080717601250-1.

     

    Here is a rundown of the situation:

    • Consolidated deployment of OCS 2007 Enterprise on a single server.
    • Backend is a SQL 2005 Cluster
    • No load balancer.
    • Autoconfig SRV records are all configured in DNS.
    • Per PSS's suggestion, we added a second IP to the NIC. DNS records point to IP1 for ocs pool name and IP2 for the servername itself.
    • NSLOOKUPs work fine. Using netmon to watch a client do it automatically shows all successes as well.
    • Communicator clients attempting to log in with TLS (manual or auto) fail.
    • Communicator clients attempting to log in with TCP work perfectly.
    • Web Components Validation completes successfully.
    • A/V Conferencing Validation completes successfully.
    • Web Conferencing Validation fails.
      • TLS handshake failed: IPADDRESS:8057 Error Code: 0x80090308 outgoing TLS negotiation failed; HRESULT=-2146893048
    • Front-End Validation fails.
      • TLS handshake failed: IPADDRESS:5061 Error Code: 0x80090308 outgoing TLS negotiation failed; HRESULT=-2146893048
      • User 1 Kerberos: Failed to send SIP request: outgoing TLS negotiation failed; HRESULT=-2146893048
      • User 1 NTLM: Failed to send SIP request: outgoing TLS negotiation failed; HRESULT=-2146893048
      • User 2 Kerberos: Failed to send SIP request: Remote disconnected while incoming tls negotation was in progress
      • User 2 NTLM: Failed to send SIP request: outgoing TLS negotiation failed; HRESULT=-2146893048
    • Going to any https:// address on the pool works fine, and there are no errors in the certificate chain shown. I assume from this that CRL checking is working fine. I even imported our CRL manually to be safe.
    • The cert issued for the pool (from our internal 2003 Enterprise CA) has:
      • Subject = ocs pool fqdn
      • SAN = ocs pool fqdn, ocs server fqdn, sipdomain1 fqdn, sipdomain2 fqdn
      • Server & Client Authentication (Have tried server auth only, that doesn't work either.)
    • All services start fine.
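
    Since everything points at the certificate, one check worth scripting is whether every name the clients and services connect to actually appears in the SAN list. The following is only a minimal sketch in Python. It assumes the certificate has already been decoded into a dictionary shaped like the result of ssl.SSLSocket.getpeercert(), and the example.com names are hypothetical placeholders for the pool, server, and SIP domain FQDNs (the sample deliberately omits one name to show what a gap looks like).

```python
def missing_san_names(cert, required_names):
    """Return the required DNS names that are absent from the cert's SAN list.

    `cert` is assumed to be a dict shaped like the result of
    ssl.SSLSocket.getpeercert(): its 'subjectAltName' entry is a tuple of
    (type, value) pairs such as ('DNS', 'ocspool.example.com').
    """
    san_dns = {
        value.lower()
        for kind, value in cert.get("subjectAltName", ())
        if kind == "DNS"
    }
    # DNS names are case-insensitive, so compare in lower case.
    return [name for name in required_names if name.lower() not in san_dns]


# Hypothetical sample data: placeholders, not the real deployment names.
sample_cert = {
    "subject": ((("commonName", "ocspool.example.com"),),),
    "subjectAltName": (
        ("DNS", "ocspool.example.com"),
        ("DNS", "ocsserver.example.com"),
        ("DNS", "sipdomain1.example"),
    ),
}
required = [
    "ocspool.example.com",
    "ocsserver.example.com",
    "sipdomain1.example",
    "sipdomain2.example",
]

print(missing_san_names(sample_cert, required))
```

    An empty list means every required name is covered. That rules out a naming mismatch but says nothing about the trust chain.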

    The only error during startup is this...

     

    Event Type: Error
    Event Source: OCS MCU Infrastructure
    Event Category: (1022)
    Event ID: 61013
    Date:  7/22/2008
    Time:  7:17:43 PM
    User:  N/A
    Description:
    The process DataMCUSvc(2772) failed to send health notifications to the MCU factory at https://ocspool.f.q.d.n:444/LiveServer/MCUFactory/.
    Failure occurrences: 5, since 7/22/2008 7:16:43 PM.

     

    I've not found anything useful on the 'net for dealing with that. I checked out one blog's suggestion, but the most usual (and only given) cause did not exist on my server. If I go to the URL manually from the OCS server itself, I get this window:

     

    Choose a digital certificate

     

    Identification: The website you want to view requests identification. Please choose a certificate.

     

    OK, that's fine, but the list in the window is empty and there is no cert to select.

     

     

    Umm... help? Thank you.

     

     

    Tuesday, July 22, 2008 11:57 PM

All replies

  • I forgot the client info, sorry about that. This is the trace from a client trying TLS.

     

     

    07/22/2008|20:02:59.606 1068:BD4 TRACE :: Async work item posted for TLS negotiation: this 031AAEA0
    07/22/2008|20:02:59.608 1068:BD4 TRACE :: ASYNC_SOCKET:: SendOrQueueIfSendIsBlocking sending sendBuffer 00B6BFB8, this 031AAEA0
    07/22/2008|20:02:59.608 1068:BD4 TRACE :: ASYNC_SOCKET:: SendHelperFn sendBuffer 00B6BFB8 sent, this 031AAEA0
    07/22/2008|20:02:59.686 1068:BD4 ERROR :: recv failed 0x2746
    07/22/2008|20:02:59.686 1068:BD4 ERROR :: OnRecvComplete Error: 0x80072746 BytesRcvd: 0
    07/22/2008|20:02:59.686 1068:BD4 ERROR :: ASYNC_SOCKET:: OnError (0x80072746) - enter
    07/22/2008|20:02:59.686 1068:BD4 ERROR :: ASYNC_SOCKET:: OnConnectError (0x80072746) - enter
    07/22/2008|20:02:59.690 1068:BD4 TRACE :: SIP_MSG_PROCESSOR:: OnRequestSocketConnectComplete - Enter this: 00AE2F60, callid=(null), ErrorCode: 0x80072746
    07/22/2008|20:02:59.690 1068:BD4 ERROR :: Releasing socket and notifying transactions
    07/22/2008|20:02:59.690 1068:BD4 ERROR :: SIP_MSG_PROCESSOR:: NotifyRequestSocketConnectComplete - Error: 80072746
    07/22/2008|20:02:59.690 1068:BD4 ERROR :: OUTGOING_TRANSACTION:: OnRequestSocketConnectComplete - connection failed error 80072746
    07/22/2008|20:02:59.690 1068:BD4 TRACE :: CUccServerEndpoint:: UpdateEndpointState - Update state from 1 to 0. Status 80072746. Status text (null).
    07/22/2008|20:02:59.690 1068:BD4 INFO  :: Function: CUccServiceOperationManager:: DisableServManager
    07/22/2008|20:02:59.690 1068:BD4 ERROR :: Condition failed with 80ee0061: 'm_fServMgrEnabled'
    07/22/2008|20:02:59.690 1068:BD4 INFO  :: Function: CUccServerEndpoint:: UpdateEndpointState
    07/22/2008|20:02:59.690 1068:BD4 ERROR :: HRESULT API failed: 80ee0061 = hr. DisableServManager
    07/22/2008|20:02:59.690 1068:BD4 INFO  :: Function: CUccOperationProgressEvent::get_StatusText
    07/22/2008|20:02:59.690 1068:BD4 ERROR :: Condition failed with 00000001: 'm_swszText != 0'
    07/22/2008|20:02:59.731 1068:BD4 TRACE :: SIP_STACK:: DeleteProviderProfile freed profile at index 0

     

    From lcserror.exe: 0x80072746 = (System) An existing connection was forcibly closed by the remote host.
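
    The decimal and hex codes in this thread are the same kind of value, a 32-bit HRESULT, and can be unpacked by hand. Here is a minimal Python sketch of that; decode_hresult is a hypothetical helper name, not part of any OCS tool. As far as I know, facility 9 is the SSPI/security facility and 0x80090308 is SEC_E_INVALID_TOKEN, while facility 7 wraps a Win32 error, here 10054 (WSAECONNRESET), which matches what lcserror.exe reports.

```python
def decode_hresult(value):
    """Split an HRESULT (signed or unsigned) into its standard bit fields."""
    # Fold a negative 32-bit value into the range 0..2**32-1.
    unsigned = value & 0xFFFFFFFF
    return {
        "hex": f"0x{unsigned:08X}",
        "failure": bool(unsigned >> 31),       # top bit set means failure
        "facility": (unsigned >> 16) & 0x7FF,  # facility field of the upper word
        "code": unsigned & 0xFFFF,             # low 16 bits: the error code
    }


# The server-side validation error and the client-side trace error.
for raw in (-2146893048, 0x80072746):
    print(raw, decode_hresult(raw))
```

    The first value shows that HRESULT=-2146893048 and Error Code 0x80090308 from the validation wizard are one and the same error. The second shows that the client only saw a connection reset, so the more specific failure is the one on the server side.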

    Wednesday, July 23, 2008 12:07 AM
  •  

    Oh god.... after all of this I think I just fixed it.

     

    We have an Offline Root Server 2003 Enterprise Standalone CA and an Online Server 2003 Enterprise Intermediate CA that issues all of our internal certs. For the *** of it, I tried issuing a cert directly from the offline Root. Once I did this, bang... TLS worked, all validations passed, and Communicator logged on with TLS without an issue.

     

    What in the world is different about how the Intermediate issued the other certs? I used the same OCS certificate wizard, except that for this one I generated a request file and used the /certsrv website to request the cert.
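
    One way to start on that question is to dump the working and failing certs and compare them field by field. The sketch below is only an illustration in Python. It assumes both certificates have already been decoded into dictionaries shaped like the result of ssl.SSLSocket.getpeercert(), and the CA and pool names in the sample data are hypothetical.

```python
def diff_cert_fields(cert_a, cert_b):
    """Return {field: (value_in_a, value_in_b)} for every field that differs."""
    fields = sorted(set(cert_a) | set(cert_b))
    return {
        field: (cert_a.get(field), cert_b.get(field))
        for field in fields
        if cert_a.get(field) != cert_b.get(field)
    }


# Hypothetical decoded certs: same subject and SAN, different issuer.
cert_from_intermediate = {
    "subject": ((("commonName", "ocspool.example.com"),),),
    "issuer": ((("commonName", "Example Intermediate CA"),),),
    "subjectAltName": (("DNS", "ocspool.example.com"),),
}
cert_from_root = {
    "subject": ((("commonName", "ocspool.example.com"),),),
    "issuer": ((("commonName", "Example Root CA"),),),
    "subjectAltName": (("DNS", "ocspool.example.com"),),
}

for field, (a, b) in diff_cert_fields(cert_from_intermediate, cert_from_root).items():
    print(f"{field}: {a!r} != {b!r}")
```

    If the issuer is the only difference, the certificate contents themselves are probably not at fault, and the next thing to look at would be how the server builds and trusts the chain through the Intermediate.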

     

    Ok... I'm going to relay this to PSS to see if they can find root cause.

    Wednesday, July 23, 2008 3:24 AM
  •  

     

    I am having a similar problem.  Fresh install of OCS Standard was working fine for 2 days.  Now after the latest rounds of MS patches, I am receiving a flood of MCU errors.  All services start EXCEPT THE FRONT END (RTCSRV)

     

    I also get the "Choose a digital certificate" prompt when navigating to one of the https MCU hyperlinks, and I am not able to select a certificate. (Again, this WAS working fine 2 days ago.)

     

    Below is another post I created:

    http://forums.microsoft.com/unifiedcommunications/ShowPost.aspx?PostID=1967598&SiteID=57

     

    Our internal 2-tier PKI (offline standalone root + enterprise sub) has the All Issuance policy set, so there shouldn't be conflicting policies.

     

    HELP!

     

     

    Friday, August 1, 2008 4:15 PM
  •  

    Update:

     

    I do not know the true cause to this problem or the true solution to this problem.  However, I fixed our issue.

     

    First, I deactivated the OCS SE services (in the order below):

    A/V

    Web Conf

    Web Components

    Front End (make sure your users are not enabled for or associated with the OCS SE pool in AD. If they are, the wizard will fail or you will end up forcing the wizard to complete.)

     

    Second, uninstall OCS SE services (add/remove programs)

     

    A/V

    Web Conf

    Standard Edition

     

    Third, wait for AD to replicate the changes (typically 15-60mins).

     

    Fourth, reinstall OCS SE.

     

    Seems to work now...

    Friday, August 1, 2008 7:18 PM