Pelicar Game Website


Network Issues...

Started by Head, Jul 12, 2008, 04:26:55 pm


Head

Webhost is having network issues.  Something also caused the server to be rebooted at 10:15am or so.  Things are still a bit wacky.  Tracerts to the server are all over the place, so there may be some Level 3 backbone or DNS resolution issues.

Time to watch  :car12_animated:
"Drawing on my fine command of the English language, I said nothing." - Robert Benchley
Twitter: @mrheadrick

Head

Here's what happened on Saturday:

Quote:
Greetings to all,

I apologize for the delay in getting this up. Following the power issues we ran into on May 23 2008, when we mentioned that things were being done to be pro-active in ensuring no such outages occurred again, we had scheduled planned UPS / battery maintenance to replace / add some things to our systems. This was scheduled to be done Saturday July 12 2008 between 6AM EST and 10AM EST. There was no public announcement made about this because this maintenance was not supposed to affect any clients or internal systems. However, just in case, we planned it for a weekend during early morning hours, so that if something did go wrong we would be on standby and, being the weekend, it would have the least possible impact on clients.

Shortly before 10AM EST, when the maintenance was about to be completed, there was what we thought at the time was a network-wide DNS issue affecting some servers in the data center. We noticed the call volume increase and immediately began to investigate. We advised staff taking calls to let everyone know it was an intermittent connection issue, as we didn't know what the real cause was just yet.

Shortly after that we noticed that the Eaton UPS maintenance staff had completed the maintenance and attempted to move our entire DC floor load back to the UPS. The whole data center runs off 3 phases. When the Eaton UPS staff switched the load to the UPS, one of the 3 phases failed to fully switch and caused over 33% of the servers in the data center to power cycle. It was so sudden and so fast that we initially thought it was a DNS issue, which is why it was first described as a possible network issue, when in reality it was one of the 3 phases failing to fully switch over to the UPS. During this time there were about 2-3 intermittent losses of power to the 3rd phase of the load, which is why some of you saw your servers go on and off more than once.

We immediately went to the Eaton UPS staff and advised them what had happened, as this maintenance was not supposed to cause any service interruption to any clients. One of the tasks of the maintenance was to replace the capacitors in the system. The UPS system has over 50 of these capacitors, which provide power via 3 electrical phases to the entire NOC. The Eaton staff immediately admitted that one of those capacitors was not wired correctly from the factory, which is why 33% of the DC load power cycled on and off during attempts to move the load to the UPS: the miswired capacitor shorted out the phase.

They immediately corrected it, and shortly after that the entire NOC power load was fully running on the UPS. This was a very, very unfortunate situation, and we have scheduled a call with Eaton today to get an explanation from them of how something this important was not wired correctly from the factory. Eaton is a public company http://www.eaton.com/ and we are shocked that this happened and are determined to get to the bottom of it. We had an entire team of 25 staff on standby in case anything went wrong and got 98% of systems online within 1-2 hours.

The situation is still unacceptable under any circumstances, but it was out of our control. Our culture is built upon integrity and everything stated is 100% true. The only good news that can come out of this is that all the needed maintenance/changes/upgrades to our power systems are 100% complete and we are looking forward to the next 5+ years of uptime.
We take downtime seriously and have acted pro-actively to ensure you get 100% uptime each and every single month. Services have remained at 100% since then and are expected to remain at 100% from here on out.

If there is anything we can do for you, please just reach out to us and we will make it possible. We are determined to make your business relationship with us something you can count on with peace of mind. We put our entire soul and mind into this company, with our entire 100+ staff team, with a desire to give you the best level of service and support. We are at your disposal at any time.
__________________
Emmanuel :: Surpass Hosting Network Admin
http://www.SurpassHosting.com

Lew

 :eh:

I'm sure they would understand a deep discussion of the physics of Pelicar just as well as I understood this.  Summary of the letter:  It was fucked up, we fixed it.  If it happens again, we'll fix it again.  In the meantime get over it, shit happens.
So many subplots

Cope

There sure are a lot of acronyms in that mess...

Maybe that's what we need in Pelicar....acronyms.
We cannot banish dangers, but we can banish fears.  We must not demean life by standing in awe of death.

Lew

PS, MP, TTU's, Pink Eye's Clandosomethingerother.  We're getting there.

Dj

ClandOoZ...damn, it's not easy squeezing an eight-syllable title into a simple slimy-sounding abbreviation.
Thank you Mario! But our Princess is in another castle!