500 errors back (again)

All replies

  • We saw a number of 500 errors today during the release. These were resolved by cache refreshes and server rotations.

    This week, folks have been reporting sporadic 500 errors, as you've mentioned, and we found 2 to 3 front-end servers having issues. We took those servers out and haven't seen reports from other folks since then.

    Our support team is monitoring this and has been working closely with our operations team to ensure uptime reliability.

    If you're still having issues (remainder of the week), post back and we'll look into the servers.

    I've been connecting to Forums from home (today and yesterday) and haven't encountered 500 errors (so far). I'm also connecting straight through my ISP and not using any RAS to other services.

    Thanks
    STO Forums Test Lead
    Friday, July 10, 2009 3:55 AM
  • This is another V2 problem that's making a comeback.  There's been an increasing number of server failures lately.  Smells like the servers are starting to age.  Roughly 48 servers can generate an HTTP response for an individual page request, and it typically takes 6 or so responses to complete a page.  Which 6 of those 48 servers supply the responses is random.  Problems start when one of those servers starts dying.
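To put rough numbers on that, here is a minimal sketch of how often a page load would touch a failing server. It assumes each of the 6 or so responses is served by an independently and uniformly chosen server out of the 48, which is a simplification; the function name and parameters are made up for illustration.

```python
def chance_of_bad_page(failing, servers=48, responses_per_page=6):
    """Probability that at least one response for a page comes from a failing server.

    Assumes every response is served by an independently, uniformly chosen
    server. That is a simplification, not a description of the real farm.
    """
    if not 0 <= failing <= servers:
        raise ValueError("failing must be between 0 and servers")
    healthy_fraction = (servers - failing) / servers
    # The page is clean only if every one of its responses came from a healthy server.
    return 1.0 - healthy_fraction ** responses_per_page


if __name__ == "__main__":
    for failing in (1, 2, 3):
        print(failing, round(chance_of_bad_page(failing), 3))
```

Under these assumptions a single failing server already affects roughly 12% of page loads, which fits how noticeable one dying machine is.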

    This kind of setup is common in web farms.  They normally have a subsystem in place that detects a misbehaving server and automatically takes it out of rotation.  That's pretty important, since running this many machines divides the MTBF of the farm by the number of servers.
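As an illustration of what such a subsystem does, here is a minimal sketch that ejects a server once too many of its recent responses are 5xx errors. The `Rotation` class, the window size and the threshold are all hypothetical; this is not how any particular load balancer is implemented.

```python
from collections import defaultdict, deque


class Rotation:
    """Tracks recent responses per server and ejects servers that fail too often."""

    def __init__(self, servers, window=20, max_error_rate=0.5):
        self.active = set(servers)
        self.window = window                  # hypothetical: responses remembered per server
        self.max_error_rate = max_error_rate  # hypothetical: tolerated fraction of 5xx responses
        self.history = defaultdict(lambda: deque(maxlen=window))

    def record(self, server, status):
        """Record one response; return True if this response got the server ejected."""
        if server not in self.active:
            return False
        recent = self.history[server]
        recent.append(status >= 500)
        if len(recent) < self.window:
            return False  # not enough evidence yet
        if sum(recent) / len(recent) > self.max_error_rate:
            self.active.discard(server)
            return True
        return False
```

Waiting for a full window before ejecting avoids pulling a server over a single stray error, at the cost of reacting a little later.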

    Microsoft doesn't have one; they have no way to discover a failing server.  We have to tell them.  That requires a tool that records the HTTP response headers.  I use Firebug; Microsoft likes Fiddler because it works with IE.  When you see a 500 response, look at the response headers.  The server name is listed in the Server header.  For example:

      Server: Microsoft-IIS/7.0, CO1VB46

    CO1VB46 is what they need to know.  Send it to them through the fissues email alias.  Alicia usually sends it on to Ops.  Do make sure that you saw at least several 500 responses from that server, since the error has many other possible causes.  Also beware that by the time you get fed up enough to do this, there is usually more than one failing server.
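If you have a list of responses captured from such a tool, the bookkeeping can be sketched as below. This is only an illustration: it assumes the Server header always looks like the example above (product, comma, machine name), the function names are made up, and the threshold of 3 is an arbitrary stand-in for "at least several".

```python
import re
from collections import Counter

# Matches the layout of the example header: "Microsoft-IIS/7.0, CO1VB46".
# Other layouts are not handled (assumption).
SERVER_NAME = re.compile(r",\s*(\S+)\s*$")


def server_name(server_header):
    """Return the machine name after the last comma, or None if there is none."""
    match = SERVER_NAME.search(server_header or "")
    return match.group(1) if match else None


def tally_500s(responses):
    """Count 500 responses per server from (status, server_header) pairs."""
    counts = Counter()
    for status, header in responses:
        if status == 500:
            counts[server_name(header) or "(unnamed)"] += 1
    return counts


def servers_to_report(responses, min_failures=3):
    """Names of servers that returned at least min_failures 500 responses."""
    counts = tally_500s(responses)
    return sorted(
        name for name, n in counts.items()
        if n >= min_failures and name != "(unnamed)"
    )
```

Servers that omit their name end up under "(unnamed)" and are never reported, since there is nothing to send on.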

    Don: one action item.  I'm seeing IIS6 servers that don't give their name.

     
    Hans Passant.
    Friday, July 10, 2009 10:17 AM
  • I'm seeing this problem too, lots of 500s.  They all come from CO1VB47.  I've sent the alert.

    Hans Passant.
    Friday, July 10, 2009 10:43 AM
  • Got your alert, Hans. I've reported this problem. We're actually debugging an issue with servers pegging their memory (a leak), not only in the Forums application but in other apps as well (Search, Expression Galleries, etc.).

    Operations is aware of the issue and working towards a solution.

    STO Forums Test Lead
    Friday, July 10, 2009 10:48 AM
  • We pulled this server out of rotation for now while debugging other issues on the server farm.
    STO Forums Test Lead
    Friday, July 10, 2009 10:54 AM
  • That worked, 500s have stopped.  Pulling an all-nighter?

    Hans Passant.
    Friday, July 10, 2009 11:35 AM
  • Yup. I'm up again.
    STO Forums Test Lead
    Friday, July 10, 2009 4:11 PM
  • Posting back on this.

    We identified two issues last week:

    1) One or two servers were having issues, so we removed them from the server cluster.
    2) We had a memory leak and fixed the problem on 7/10 (Friday afternoon, PST).

    If you're still seeing intermittent issues, feel free to post back.

    Thanks.
    STO Forums Test Lead
    Monday, July 13, 2009 5:31 PM