Topic: Backup SIP server killing primary SIP service on initialization.

SoObviouslyNoob

  • Guest
Backup sip server killing primary sip service on initialization.
« on: September 17, 2009, 10:35:45 PM »
Wondering if anyone has experienced a similar problem.

The HA behavior of the primary and backup servers works great in a controlled environment, i.e., if I stop and start the primary or backup server through SCI, the servers swap roles back and forth with no issue.

The problem seems to happen only when an unexpected outage hits ONE of the two servers, i.e., if a network connection to the server is lost or the server shuts down unexpectedly.

The important part to note is that the primary server continues to operate fine until the other server returns to service. When the failed server reconnects and its apps restart, the primary SIP service appears to keep running, but calls begin to fail: outbound calls get fast busy tones, the SIP phones sometimes go offline, and calls to the route points get dead air or fast busy tones.

There are two cases where this happens. In the first, the primary server is lost. The backup server then kicks in and assumes the primary role, and calls are still routing at this point. When the original primary server (the one that was lost) starts up, it initializes as the backup server. This effectively kills the SIP service currently running on the original backup server (now acting as primary).

The other scenario is more straightforward. The backup server drops and the primary continues to handle calls. When the backup re-initializes as the backup, calls begin to fail.

Any ideas of what could be causing this?

I've done lots of troubleshooting on this, so fire away with the questions please!


René

  • Administrator
Re: Backup sip server killing primary sip service on initialization.
« Reply #1 on: September 20, 2009, 08:42:31 PM »
    Hi,

    You haven't written much about your deployment (SIP Server version? Operating system? etc.), so I'm just guessing... It seems to me the issue is somehow related to the configuration of the virtual network interface. Have you checked that the network interface on the host where the primary SIP Server runs remains active after the backup host is rebooted or its network is reconnected?

    R.


    soobviouslynoob

    • Guest
    Re: Backup sip server killing primary sip service on initialization.
    « Reply #2 on: September 24, 2009, 02:03:00 PM »
    I've narrowed the issue down. The problem is actually with our NLB management. When the server shuts down unexpectedly, it restarts and NLB is issued a start command, which sends the load to the backup server. I'll have to go through and see if I can find a way to change the start-up process for the NLB manager.

    victor

    • Administrator
    Re: Backup sip server killing primary sip service on initialization.
    « Reply #3 on: September 28, 2009, 06:24:44 PM »
    We are using SIP Server with HA, and so far have not had a similar problem. For NLB, we are using Windows NLB with a custom script to switch between machines.

    As you have pointed out, the switching is a bit tricky, because there are quite a lot of different scenarios you have to consider. In the end, we decided to handle two kinds of failure separately: for hardware failures, NLB decides how to behave; for SIP Server failures, SCS mandates the switch-over.

    My biggest beef is the switch-over time and the lack of ability to fully control existing calls once the switch has happened. In a multi-site environment it becomes a real problem, especially when the switch-over occurs during ringing... We were told that it was fixed in version 8.1 (!!!!)... go figure.

    SoOBviouslyNoob

    • Guest
    Re: Backup sip server killing primary sip service on initialization.
    « Reply #4 on: September 29, 2009, 02:51:27 PM »
    Hi Victor,

    Funny, I was just reading an old post of yours from February about some of the struggles you were having with SIP servers/NLB/HA.

    You mentioned that you haven't experienced the same problem as myself.

    Our scripts are configured as directed by the Genesys white paper. They are pretty simple and are set up to enable cluster 2 and disable cluster 1 if the primary SIP server fails. The reverse happens if the backup SIP server is running as primary at the time of failure.
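For readers following along, here is a minimal sketch of the enable/disable logic described above. It only models cluster state in memory and does not call any NLB command. The names `FAILOVER_ACTIONS` and `apply_failover` are made up for illustration, and the assumption that each cluster is simply on or off is a simplification of a real NLB setup.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping: which SIP Server failed while acting as primary,
# and the enable/disable actions the script takes in response.
FAILOVER_ACTIONS: Dict[str, List[Tuple[str, str]]] = {
    "primary": [("enable", "cluster 2"), ("disable", "cluster 1")],
    "backup": [("enable", "cluster 1"), ("disable", "cluster 2")],
}


def apply_failover(state: Dict[str, bool], failed: str) -> Dict[str, bool]:
    """Return the cluster state after the script for `failed` has run."""
    new_state = dict(state)  # leave the caller's state untouched
    for verb, cluster in FAILOVER_ACTIONS[failed]:
        new_state[cluster] = verb == "enable"
    return new_state


# Primary SIP server fails while cluster 1 is carrying the traffic.
before = {"cluster 1": True, "cluster 2": False}
print(apply_failover(before, "primary"))
```

Note that nothing in this logic covers a host coming back after a crash, which is the case the thread is about.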

    The one setting I noticed that may be off is that within the NLB manager the "initial host state" is set to "Started". The NLB documentation I have says that issuing the start command enables all ports that were previously disabled. Is your initial host state different from mine? Are your scripts (assuming you have 4 scripts) more robust than just simple enable and disable commands?
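To illustrate the suspicion about the initial host state, here is a rough in-memory simulation. It assumes, as the documentation quoted above says, that a start re-enables ports that had been disabled. `NlbHost`, `simulate_reboot`, and the host names are hypothetical, and the model collapses the two-cluster setup into two hosts for simplicity.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NlbHost:
    name: str
    ports_enabled: bool = False  # True when the host accepts cluster traffic

    def start(self) -> None:
        # Assumption taken from the thread: "start" enables all ports,
        # including ones a failover script had disabled earlier.
        self.ports_enabled = True

    def boot(self, initial_host_state: str) -> None:
        # A reboot clears runtime state, then the configured
        # initial host state is applied.
        self.ports_enabled = False
        if initial_host_state == "Started":
            self.start()


def simulate_reboot(initial_host_state: str) -> List[str]:
    """Return the hosts accepting traffic after the failed host comes back."""
    active = NlbHost("host_b")  # took over when host_a went down
    active.start()
    returning = NlbHost("host_a")  # crashed host, now rebooting
    returning.boot(initial_host_state)
    return [h.name for h in (active, returning) if h.ports_enabled]


print(simulate_reboot("Started"))  # ['host_b', 'host_a']
print(simulate_reboot("Stopped"))  # ['host_b']
```

With "Started", the returning host accepts traffic again even though its SIP Server comes up as backup, which would match the failing calls. With "Stopped", it stays out until a script explicitly brings it in.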

    Thanks!

    SoObviouslyNoob

    • Guest
    bump for response from victor.