Public IM Issue

Question
-
Our federation with the public IM providers has been set up by the LCS/OCS provisioning team, but I cannot get internal and public IM clients to talk to each other, and I get failures in the verification logs on the Edge server. I am troubleshooting now, but any assistance would be appreciated.
Description:
Neither the Communicator client nor the AOL/MSN/Yahoo clients can get presence or send IMs to each other. Test users here are enabled for Public IM connectivity, and we try to send IMs to the email addresses and screen names tied to the public IM accounts (user@live.com, user@yahoo.com, user@aol.com, etc.), but they are never received.
- Yahoo and AOL users configure contacts with the "LCS" option, with the address being the SIP address, which is the same as our email address.
- Our public DNS A and SRV records are in place, and we believe all certs are correct.
- The external cert contains each SIP domain we are using, prefixed with "sip" (<sip.domain.agency.gov>), in the SAN field.
- Monitoring the Edge server shows traffic on TCP port 5061 coming in from public IM IPs and going out with test attempts.
- The Edge server seems to have no name resolution or connectivity problems: it can establish TCP connections with other machines on the Internet and can be reached on TCP 5061. It can do NSLOOKUPs for all of the names involved.
- We think we have Public IM enabled everywhere we should: users in AD, the Edge server, and the pool.
- The internal and external names of the Edge server are the same; internally the name resolves to an internal IP address. The internal and external certs therefore have the same hostname, in case that could be an issue. We are using the same VeriSign cert for both interfaces.
- We don't have any other external federations set up yet, just the public IMs at this point.
- We aren't using a Director and have no remote access. Also no other services yet, just basic IM.
Edge Server event log shows this entry:
Type:
Date: 2/22/2008
Time: 11:08:28 AM
Source: OCS Protocol Stack
Category: 1001
Event Id: 14380
Some requests were rejected as they exhausted the Max-Forwards limit.
In the past 0 minutes, the protocol stack rejected 1 requests that were looping and exhausted the Max-Forwards limit. The last such request had the From uri (sip:ashley.smoot@itrd.senate.gov) and the To uri (sip:ashleyitrd@aol.com).
Cause: This usually indicates an incorrect server configuration or a bad routing rule.
Resolution:
None needed unless the number of reported errors is large (> 100). Check whether all server routing rules are properly configured.
On the pool server, running validation tools gets this result for public IM…
****************************************************************************
Attempting to establish SIP dialog from first.last@domain.agency.gov to sip:tasmoot@yahoo.com using pool.agency.us
Failure
[0xC3FC200D] One or more errors were detected
Attempting to establish SIP dialog from first.last@domain.agency.gov to sip:tasmoot@yahoo.com using pool.agency.us
Maximum hops: 2
Check two-party IM: Discovered a new SIP server in the path.
Maximum hops: 3
Received a failure SIP response: User sip:tasmoot@yahoo.com @ Server pool.agency.us
Received a failure SIP response: [
SIP/2.0 408 Request Timeout
FROM: "Smoot, Ashley (ITRD)"<sip:first.last@domain.agency.gov>;tag=6726dd9837f519e6de3;epid=epid01
TO: <sip:tasmoot@yahoo.com>;tag=09BDBCAF7D9B43F8E6712409276BF810
CSEQ: 16 INVITE
CALL-ID: 1d520ac16095477fbd4b38e5814683b9
VIA: SIP/2.0/TLS 172.26.5.87:1170;branch=z9hG4bKfd42b98f;ms-received-port=1170;ms-received-cid=2700
CONTENT-LENGTH: 0
AUTHENTICATION-INFO: NTLM rspauth="01000000000000002EA517D95582176A", srand="E47CFE83", snum="17", opaque="151CA8E2", qop="auth", targetname="ocs1.itrd.ussenate.us", realm="SIP Communications Service"
ms-edge-proxy-message-trust: ms-source-type=EdgeProxyGenerated;ms-ep-fqdn=edge-im.agency.gov;ms-source-verified-user=verified;ms-source-network=federation
ms-diagnostics: 5001;reason="Request Timed-out";source="itrd-im.senate.gov";AppUri="http://www.microsoft.com/LCS/ApiModule"
]
Suggested Resolution: Use the maximum hop count to determine the server that generated this error. For example, if the maximum hop value is 2, then it is likely that this error was generated by a server that is 1 (immediate target) or 2 hops away. Check whether the target user is a valid user and that the target user domain is trusted by the source user's pool. Check the connectivity between the source and target pools.
****************************************************************************
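The `ms-diagnostics` header in the response above carries the most specific clue: an error code, a reason, and the server that generated the failure. A minimal sketch for pulling those fields out of a logged header line follows. The function name `parse_ms_diagnostics` is hypothetical, and the parsing assumes the header always has the shape seen in this log (a numeric code followed by `key="value"` pairs separated by semicolons).

```python
import re

# Matches key="value" pairs such as reason="Request Timed-out".
_PARAM = re.compile(r'(\w[\w-]*)="([^"]*)"')


def parse_ms_diagnostics(header_line: str) -> dict:
    """Split an ms-diagnostics header line into its code and parameters.

    Hypothetical helper; assumes the layout seen in the log above.
    """
    name, _, value = header_line.partition(":")
    if name.strip().lower() != "ms-diagnostics":
        raise ValueError("not an ms-diagnostics header")
    # The numeric code comes first, then the quoted parameters.
    code_text, _, rest = value.partition(";")
    result = {"code": int(code_text.strip())}
    for key, val in _PARAM.findall(rest):
        result[key.lower()] = val
    return result


# Header line copied from the validation output above.
line = ('ms-diagnostics: 5001;reason="Request Timed-out";'
        'source="itrd-im.senate.gov";'
        'AppUri="http://www.microsoft.com/LCS/ApiModule"')
info = parse_ms_diagnostics(line)
print(info["code"], info["reason"], info["source"])
```

Here the `source` value names the Edge server itself, which suggests the timeout was raised at the Edge while it waited on the public IM side rather than inside the pool.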
Notes about the validation results:
· The Edge server and the pool can route to each other and don't seem to have a name resolution problem; all machines can ping each other by pool name, machine name, etc.
· The Edge server is listed in all the requisite places in the OCS console. When I launch the validation tool on the pool front end servers, I see traffic with the public IMs on the DMZ interface of the Edge server, including the certificate handshakes with AOL, but the tests still fail.
· I don't know for sure about the valid target user part, but I have test accounts in the 3 public IMs that are logged in and can see other people. I configure the contacts as userID@yahoo.com, userID@aol.com, and userID@live.com. My test users on Live and Yahoo can exchange messages using the same IDs, although the behavior is flaky: a number of messages disappear or are seriously delayed.
Thanks for any assistance.
Saturday, February 23, 2008 5:08 AM -
Answers
-
After getting Microsoft support involved, it turned out to be MTLS and our certificate's subject name. I was using a DNS alias (CNAME) in the subject name of the cert because this was a lab server, and I was waiting until production rollout to use the "real" hostname in the Subject Name of a different certificate. Using DNS tricks with CNAMEs and A records works fine for the internal pool and all other TLS/MTLS communications on the inside of your network, but not on the outside for Public IM Connectivity. The logging tool in the resource kit is very useful in pointing this out; I strongly recommend installing it on all OCS servers.
The external PIC configuration requires that the Subject Name of the certificate exactly match the A record in DNS that is registered with Microsoft as your IM Proxy, which is your OCS Edge server.
Wednesday, March 12, 2008 2:45 PM
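The rule in the answer above (the certificate's Subject Name must exactly match the registered A record name) can be checked with a short script. This is a minimal sketch, not a Microsoft tool: the helper names and the `example.gov` hostnames are hypothetical, and the comparison works on a dictionary in the shape returned by Python's `SSLSocket.getpeercert()`. It does not tell you whether a name is a CNAME or an A record; that still needs a DNS lookup such as NSLOOKUP.

```python
import socket
import ssl


def subject_common_names(cert: dict) -> list[str]:
    """Return the commonName values from a getpeercert()-style dict."""
    names = []
    for rdn in cert.get("subject", ()):
        for key, value in rdn:
            if key == "commonName":
                names.append(value)
    return names


def subject_matches(cert: dict, registered_fqdn: str) -> bool:
    """True only if a subject commonName equals the registered FQDN.

    Case-insensitive exact match. SAN entries are deliberately ignored,
    because the answer above is about the Subject Name.
    """
    wanted = registered_fqdn.rstrip(".").lower()
    return any(cn.rstrip(".").lower() == wanted
               for cn in subject_common_names(cert))


def fetch_peer_cert(host: str, port: int = 5061, timeout: float = 5.0) -> dict:
    """Fetch the certificate a server presents on the given port.

    Chain validation stays on, but hostname checking is turned off so a
    certificate with a mismatched name can still be inspected. A server
    that insists on a client certificate may refuse the connection.
    """
    context = ssl.create_default_context()
    context.check_hostname = False
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            return tls.getpeercert()


# Hypothetical lab certificate issued to an alias instead of the A record name.
lab_cert = {"subject": ((("commonName", "im-alias.example.gov"),),)}
print(subject_matches(lab_cert, "edge-im.example.gov"))   # alias, not the A record
print(subject_matches(lab_cert, "IM-Alias.example.gov"))  # same name, different case
```

Against a live server, `subject_matches(fetch_peer_cert("edge-im.example.gov"), "edge-im.example.gov")` would run the same comparison on the certificate actually presented on port 5061.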
All replies
-
Hi,
I haven't played with Public IM yet, but I do see a possible issue with your config:
"The internal and external name of the Edge server is the same, internally it resolves to an internal IP address. Therefore internal and external certs have same hostname, just in case that could be an issue. We are using the same Verisign cert for both interfaces."This will cause issues with the front end server because if it is attempting to resolve the internal hostname, and that hostname is associated with two IP addresses the front end server could be attempting to contact the edge server on the external interface instead of the internal interface.You can have one certificate for all of your internal and external interfaces, but each interface must have a unique FQDN for everything to work properly.Cheers,JoeMonday, February 25, 2008 1:48 PM -
Thanks Joe, that is my suspicion.
Just to try to eliminate this as a possible cause, I have already requested a new cert and am changing internal DNS so it uses a different name. I will post results later today when I get it in place.
My first thought was that the FE would never get confused because it knows nothing about the external IP address, which is only published on the Internet. NSLOOKUPs, pings, and telnet to 5061 from the FE to the Edge return only the correct IP when using the Edge DNS name, so I thought this configuration would work. Perhaps there is something in the MTLS chain that doesn't like this configuration. I couldn't find anything in the MSFT whitepapers to rule it out, so I tried it. I will post back when I try something different.
Regards,
Ash
Tuesday, February 26, 2008 5:23 PM -
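The name resolution check described in the reply above (confirming that the front end only ever sees one address for the Edge name) can be repeated with a few lines of Python run on the front end server. This is a minimal sketch; `resolve_all` is a hypothetical helper built on the standard `socket.getaddrinfo` call, and it reports whatever the local resolver returns, including hosts file entries.

```python
import socket


def resolve_all(hostname: str, port: int = 5061) -> list[str]:
    """Return every distinct address the local resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string.
    return sorted({info[4][0] for info in infos})


# A literal IP address resolves to itself, which makes a quick self-check.
print(resolve_all("127.0.0.1"))
```

If `resolve_all` returns more than one address for the Edge server's internal name, the front end could be choosing the wrong interface, which is the scenario raised earlier in this thread.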
Well... I got a new cert for the Edge server's internal interface with a different FQDN for the inside and changed the internal DNS records. I deactivated the Edge server role and reinstalled using the new internal name. Unfortunately, no change in the results. Logs are still the same as above.
Also, one note: the FE cannot resolve or route to the external FQDN of the Edge, so I don't think we had a name resolution or routing issue; all the packets seem to be going to the right places.
Still trying...
Wednesday, February 27, 2008 4:02 AM -
I have exactly the same issue, so I would be interested if you find a fix. I also have a test OCS server at home, and federation is working fine. I've been trying to find someone else on a known working OCS server to federate with.
But all Public IM gives the same error:
ms-diagnostics: 5001;reason="Request Timed-out";source="itrd-im.senate.gov";AppUri="http://www.microsoft.com/LCS/ApiModule"
Thursday, February 28, 2008 9:20 PM -