Server's System.Web.WebSockets.AspNetWebSocket.SendAsync sometimes hangs forever when client sends Ping

  • Question

  • BACKGROUND:
    We have a client-server architecture with communication over websockets. Server is written in C# and uses .NET Framework 4.5.2. The production server is hosted as a Microsoft Azure web app.

    We need the full duplex capabilities of websockets. Both the server and the client can at any time decide to send a message to the other party.

    We also use websocket ping/pong to make sure the connection is alive. The client sends a ping every 5 seconds, and the server replies with a pong. The server does not send pings. Note that websocket ping/pong is not the same thing as ICMP ping/pong. See the websocket RFC here: https://tools.ietf.org/html/rfc6455#page-37

    Also note that the client-initiated pings never reach our own C# server code - some earlier/lower layer in the network stack takes care of this and replies with a pong. I'm not sure of the name of that earlier layer - maybe IIS, or the .NET runtime. See illustration 1: (which I can't inline because of the forum software's anti-spam measures)

    http://user.dreamler.com/Content/images/wsbug-illustration1.png

    The relevant part of server code that sends messages looks like this:

    var sendBuffer = new ArraySegment<byte>(Encoding.UTF8.GetBytes(sendString));
    var sendTask = Socket.SendAsync(sendBuffer, WebSocketMessageType.Text, true, _sendToken);
    try
    {
        _sendTokenSource.CancelAfter(10000);
        await sendTask.ConfigureAwait(false);
        _sendTokenSource.CancelAfter(-1);
    }
    catch (OperationCanceledException e)
    {
        ...
    }


    PROBLEM:
    Rarely (a few times per hour) the server-side websocket enters a weird half-broken state, where it can respond to pings, and receive messages, but not send messages. See illustration 2.

    http://user.dreamler.com/Content/images/wsbug-illustration2.png

    SYMPTOMS OF PROBLEM - THE BROKEN HALF:
    The first symptom is that the 'await sendTask' line never finishes, and the client never gets the message.

    During normal operation the 'CancelAfter' works fine, but in this half-broken state, the 'CancelAfter' has no effect - we don't get the 'OperationCanceledException'. If we call '_sendTokenSource.Cancel()' from another thread, it also has no effect. In code:

    try
    {
        _sendTokenSource.CancelAfter(10000);
        // when the problem occurs, we reach this point...
        await sendTask.ConfigureAwait(false);
        // ... but then we never reach this point
        _sendTokenSource.CancelAfter(-1);
    }
    catch (OperationCanceledException e)
    {
        // ... nor this point
    }
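
    Because the token is not honoured in this state, the wait can only be bounded from the outside. A minimal sketch, assuming a hypothetical helper `SendWithTimeoutAsync` (not part of the code above or of .NET) that races the send against `Task.Delay`. It does not un-stick the socket; it only lets the calling code notice the hang and carry on.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper for illustration only.
public static class SendWatchdog
{
    // Returns true if the send finished, false if the timeout elapsed first.
    public static async Task<bool> SendWithTimeoutAsync(
        Func<CancellationToken, Task> send, TimeSpan timeout)
    {
        using (var cts = new CancellationTokenSource())
        {
            Task sendTask = send(cts.Token);
            Task timer = Task.Delay(timeout);

            // Whichever task finishes first decides the outcome.
            Task first = await Task.WhenAny(sendTask, timer).ConfigureAwait(false);
            if (first == sendTask)
            {
                // Re-await so that any exception from the send is rethrown here.
                await sendTask.ConfigureAwait(false);
                return true;
            }

            // Still request cancellation; in the half-broken state described
            // above this is expected to have no effect on the pending send.
            cts.Cancel();
            return false;
        }
    }
}
```

    A caller would pass `token => Socket.SendAsync(sendBuffer, WebSocketMessageType.Text, true, token)` and, on `false`, drop the connection and reconnect, which is the only cure found so far.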

    If we then try to 'SendAsync' on the same socket from another thread, it fails with an exception. This is not surprising, because "Exactly one send and one receive is supported on each WebSocket object in parallel." according to the documentation here: https://msdn.microsoft.com/en-us/library/system.net.websockets.websocket.sendasync(v=vs.110).aspx
    and here: https://docs.microsoft.com/en-us/dotnet/api/system.net.websockets.websocket.sendasync?view=netframework-4.5.2#System_Net_WebSockets_WebSocket_SendAsync_System_ArraySegment_System_Byte__System_Net_WebSockets_WebSocketMessageType_System_Boolean_System_Threading_CancellationToken_
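
    To stay within that one-send-at-a-time rule when several threads want to send, the calls can be queued behind a gate. A minimal sketch, assuming a hypothetical `SerializedSender` wrapper (the name and the delegate shape are illustrative, not an existing API):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical wrapper: lets at most one send run at a time.
public sealed class SerializedSender
{
    private readonly SemaphoreSlim _gate = new SemaphoreSlim(1, 1);
    private readonly Func<ArraySegment<byte>, CancellationToken, Task> _send;

    // 'send' would typically wrap WebSocket.SendAsync for one socket.
    public SerializedSender(Func<ArraySegment<byte>, CancellationToken, Task> send)
    {
        _send = send;
    }

    public async Task SendAsync(ArraySegment<byte> data, CancellationToken token)
    {
        // Wait for our turn; this wait itself observes the token.
        await _gate.WaitAsync(token).ConfigureAwait(false);
        try
        {
            await _send(data, token).ConfigureAwait(false);
        }
        finally
        {
            // Always release, even if the send threw.
            _gate.Release();
        }
    }
}
```

    Note that if the underlying send hangs as described above, later callers wait at the gate until their own token is cancelled instead of getting an exception. The wrapper does not fix the hang.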


    SYMPTOMS OF PROBLEM - THE NON-BROKEN HALF:
    Meanwhile, another thread is happily continuing to receive data from the client using 'ReceiveAsync' as normal. And the client continues to send pings and get back pongs.


    CURE:
    So far, the only cure we have found to get out of this half-broken state is to reconnect from scratch.


    FURTHER CLUES:

    • We can reproduce the problem with a minimal client which only pings, and a minimal server which only calls 'SendAsync' in a loop at regular intervals.
    • If we increase the rate / decrease the interval of client-initiated pings, the problem occurs more often. In the extreme case, if we ping every 0.02 seconds, the problem happens within a few seconds after connecting. There's a clear correlation - the more we ping, the more the problem occurs - but it's still quite random.
    • If we increase the rate of 'SendAsync' calls, the problem happens more often.
    • If we don't ping at all, the problem doesn't seem to happen at all - we've tested it for several hours.
    • If we don't call 'SendAsync' at all, the problem doesn't seem to happen at all.
    • If we don't call 'ReceiveAsync' at all, it makes no difference to the problem.
    • It makes no difference whether the connection is SSL encrypted or not.
    • It makes no difference how many clients are connected. This can happen with only 1 client connected.
    • The problem doesn't just happen on the production server on Azure. I can also reproduce when developing on my local machine, debugging from Visual Studio and IIS Express.
    • No exceptions are thrown, and WebSocket.State stays Open.
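
    For anyone who wants to try reproducing this, the minimal server from the first clue can be sketched roughly as follows. The class name `SendLoop`, the payload and the trace output are illustrative assumptions. On ASP.NET the loop would be started from the callback given to `HttpContext.AcceptWebSocketRequest`, passing in `AspNetWebSocketContext.WebSocket`. The pinging client is not shown.

```csharp
using System;
using System.Diagnostics;
using System.Net.WebSockets;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical minimal send loop for reproduction attempts.
public static class SendLoop
{
    // Sends a small text message at a fixed interval while the socket is open.
    // Returns the number of sends that completed.
    public static async Task<int> RunAsync(
        WebSocket socket, TimeSpan interval, CancellationToken stop)
    {
        int sent = 0;
        while (socket.State == WebSocketState.Open && !stop.IsCancellationRequested)
        {
            var payload = new ArraySegment<byte>(Encoding.UTF8.GetBytes("tick " + sent));

            // If the hang occurs, this await is the one that never returns.
            await socket.SendAsync(payload, WebSocketMessageType.Text, true,
                                   CancellationToken.None).ConfigureAwait(false);

            sent++;
            Trace.WriteLine("sent " + sent);
            await Task.Delay(interval).ConfigureAwait(false);
        }
        return sent;
    }
}
```

    With a client pinging every 0.02 seconds, the expectation from the clues above is that the trace lines stop within a few seconds while `State` still reports `Open`.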



    OUR HYPOTHESIS:
    We guess the problem happens if the lower level tries to handle a 'SendAsync' and it also receives a ping at almost the same time. See illustration 3. We tried to investigate this hypothesis further, by trying to reproduce the problem with a single well-timed ping, but such precise timing is hard, so we gave up on that.

    http://user.dreamler.com/Content/images/wsbug-illustration3.png

    Obviously this is just a guess - we have no insight into the inner workings of .NET. But there could be some kind of collision or race condition that causes the thread that called 'SendAsync' to hang forever. On the other hand, I find it hard to believe that such a bug could exist in a framework used by millions...


    WHAT WE WOULD LIKE NOW:

    • Contact with someone (ideally an engineer at Microsoft) who has deep knowledge of .NET websockets and ping/pong.
    • Help figuring out if this is actually a bug in .NET or if we are doing something wrong.
    • Someone trying to reproduce the issue.
    • Relevant quotes from the websocket RFC or the MSDN documentation which we might have missed.
    • Links to an existing bug report that describes the same issue.
    • Links to changelog / release notes of some version where the issue was fixed.


    Oh, and please don't suggest that we should work around the issue by not using websocket ping/pong and implementing our own application-level ping/pong instead. We have already thought of that. If the bug is in .NET it should be reported, and if the bug is on our end, we would like to find it.

    Friday, November 17, 2017 5:29 PM

All replies

  • Hi Emil Hall, www.dreamler.com,

    Thank you for posting here.

    Since your question is more closely related to ASP.NET, you could post a new thread in the ASP.NET forum for more suitable support.

    The CLR forum is for discussing and asking questions about the .NET Framework Base Class Library (BCL), such as Collections, I/O, Registry, Globalization, and Reflection. It also covers the other Microsoft libraries that are built on or extend the .NET Framework, including Managed Extensibility Framework (MEF), Charting Controls, CardSpace, Windows Identity Foundation (WIF), Point of Sale (POS), and Transactions.

    Best Regards,

    Wendy


    Monday, November 20, 2017 8:18 AM
  • Hi, did you figure anything out? I have the same problem in ASP.NET Core 3.1: SendAsync hangs forever and does not respect the cancellation token. The WebSocket connection stays open forever, even if the client died long ago.

    The client logic seems to work, though: it detects that the "line" is down and crashes (probably due to a missing ping from the server). The server should have responded the same way; a missing pong should have made it crash too.

    Sunday, September 20, 2020 12:29 PM