Author Topic: Netscaler hard drive error, booting problem, please help.. Thanks.. : )  (Read 2576 times)

Offline ohmhyde

  • VIP Member
  • ***
  • Posts: 28
  • Karma: 4
Hi everyone, this is my first post here.
I have a serious problem booting the box. Could anyone please advise me on what exactly is happening?

During office hours, under heavy production traffic, the system just seemed to disappear, as if there were no box out there (the box is at a co-location IDC). I couldn't connect to any of the IPs on the box.
So I went to check the box in person: the power was on but the LCD display was off (dark screen).
Normally it should display the status, right?

So I tried rebooting the box.
During the reboot it shows these error messages:
///////////////////////////////////////////////////////  error message from cli ///////////////////////////////////
ad2s1e: hard error reading fsbn 34997504 of 34997472-34997599 (ad2s1 bn 34997504; cn 2178 tn 125 sn 59) status=51 error=40

** /dev/ad2s1e
** Last Mounted on /var
** Phase 1 - Check Blocks and Sizes
ad2s1e: hard error reading fsbn 34997504 of 34997472-34997599 (ad2s1 bn 34997504; cn 2178 tn 125 sn 59) status=51 error=40


CONTINUE? yes

ad2s1e: hard error reading fsbn 34997516 (ad2s1 bn 34997516; cn 2178 tn 126 sn 8) status=51 error=40
ad2s1e: hard error reading fsbn 34997517 (ad2s1 bn 34997517; cn 2178 tn 126 sn 9) status=51 error=40
ad2s1e: hard error reading fsbn 34997518 (ad2s1 bn 34997518; cn 2178 tn 126 sn 10) status=51 error=40
ad2s1e: hard error reading fsbn 34997520 (ad2s1 bn 34997520; cn 2178 tn 126 sn 12) status=51 error=40
THE FOLLOWING DISK SECTORS COULD NOT BE READ: 34997516, 34997517, 34997518, 34997520,

///////////////////////////////////////////////////////  error message from cli ///////////////////////////////////

If I wait after this, sometimes it finishes booting and sometimes it doesn't.
And when it does boot, it takes a very long time.
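
For anyone triaging a similar console capture, here is a minimal Python sketch that pulls the failing block numbers out of "hard error reading fsbn" lines like the ones above. The function name `bad_blocks` and the `SAMPLE` text are illustrative assumptions, not part of any NetScaler tool. As far as I know, `status=51 error=40` on an ATA disk is usually read as an uncorrectable media error, so a growing list of blocks points at failing hardware rather than a software issue.

```python
import re

# Matches kernel disk-error lines in the format shown in the log above.
# The " of <first>-<last>" range only appears on some lines, so it is optional.
HARD_ERROR = re.compile(
    r"^(?P<dev>\w+): hard error reading fsbn (?P<fsbn>\d+)"
    r"(?: of (?P<first>\d+)-(?P<last>\d+))?"
    r" \((?P<slice>\w+) bn (?P<bn>\d+); cn (?P<cn>\d+) tn (?P<tn>\d+) sn (?P<sn>\d+)\)"
    r" status=(?P<status>[0-9a-fA-F]+) error=(?P<error>[0-9a-fA-F]+)"
)


def bad_blocks(log_text):
    """Map each device name to the sorted list of distinct failing fsbn values."""
    found = {}
    for line in log_text.splitlines():
        match = HARD_ERROR.match(line.strip())
        if match:
            found.setdefault(match.group("dev"), set()).add(int(match.group("fsbn")))
    return {dev: sorted(blocks) for dev, blocks in found.items()}


# Hypothetical sample, copied from the kind of lines seen during boot.
SAMPLE = """\
ad2s1e: hard error reading fsbn 34997504 of 34997472-34997599 (ad2s1 bn 34997504; cn 2178 tn 125 sn 59) status=51 error=40
ad2s1e: hard error reading fsbn 34997516 (ad2s1 bn 34997516; cn 2178 tn 126 sn 8) status=51 error=40
ad2s1e: hard error reading fsbn 34997517 (ad2s1 bn 34997517; cn 2178 tn 126 sn 9) status=51 error=40
"""

if __name__ == "__main__":
    for dev, blocks in bad_blocks(SAMPLE).items():
        print(dev, len(blocks), "unreadable blocks:", blocks)
```

Comparing the output of two captures taken a few days apart would show whether the number of unreadable blocks is growing.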

One more thing: after it finishes booting, the box shows no licenses at all, and every feature is disabled
(the cnas_ns.lic file is still in /nsconfig/license).
This customer bought the NetScaler, an NS9010c Platinum Edition, only about 4 months ago,
so the license shouldn't have expired, right?

This problem has happened before. We did an RMA and returned the box, but when it came back I wasn't sure whether Citrix had replaced it with a 100% new one or just repaired it.
It worked for only 2-3 months and then crashed again with the same problem.
 ???

The box is an NS9010c Platinum Edition, 8.1 build 63.7.
We have already sent it back to Citrix again and are still waiting.
So I just want to know: has anyone met this problem before? What is the cause, and how can I avoid it?

This customer runs a stock trading website, so it's very serious if it goes down again.
Then I'll get a phone call at 2 AM again!! Arrrgh..
Thanks very much to everyone.   :)

Offline Marco Schirrmeister

  • Hero Member
  • *****
  • Posts: 101
  • Karma: 14
If you check the serial number after you get the box back, I think you can be sure whether they replaced the whole box.

But the other thing is: if that really is such an important site, then you should install an HA pair. If one crashes, the other one will take over immediately.
The failover is so fast that you don't notice it.


Marco

Offline ZManGT

  • VIP Member
  • ***
  • Posts: 94
  • Karma: 12
I would call Citrix and explain the situation. I ran into the same problem with 4 units in a row. They finally got me a good unit but they need to add an option for RAID hard drives or 2 hard drives. Something to add some redundancy.

Offline Marco Schirrmeister

  • Hero Member
  • *****
  • Posts: 101
  • Karma: 14
Wow, 4 units is really a lot.

I'm lucky that I have never had a problem like this so far with any of my old 9800 devices. (From the good old NetScaler Inc days)

The only problem was that, after shipping two devices from the US to Germany, one device would not start up. I just heard an endless beeping.
But Citrix was really quick on that. Two days later I had a new device shipped to me from the US. And that's only because they don't keep these old legacy models in a European warehouse.

I'm curious how it will go with hardware issues on the newer models in the near future.

Offline evildani

  • Administrator
  • Hero Member
  • *****
  • Posts: 389
  • Karma: 22
Funny....

I have a client for whom I personally replaced two NS 9010c units in two months... so expect the following:
If the drive fails during operation, the NS will continue to operate normally; you will get a syslog and nslog alert, that's it.
If the drive fails on a reboot, the license process will not kick in and you will see all license files as not installed; the NS will not be operational and the FAILOVER will occur.

The process you have to follow is to boot in single-user mode: when you start the NS, it waits a few seconds for you to interrupt the boot (something like pressing Ctrl-X), and then you type "boot -s".
Once in the CLI go to shell and do a fdisk /dev/....
That will attempt to fix the drive...

If it happens again, ask for an RMA...

Daniel

Offline ohmhyde

  • VIP Member
  • ***
  • Posts: 28
  • Karma: 4
Thanks very much....
mschirrmeister, ZManGT, evildani, and anyone who posts later.
This board helps me a lot....   : )

Umm yes, for a stock website HA is the best, but this customer's budget only covers 2 boxes,
1 at each site. So I should do GSLB instead, right??


evildani,
Umm, yes, it should continue working, right? Because everything is in /flash.
This has happened once before, same problem. Just boom!! The box is gone..
No blinking LCD, all IPs on the box just disappear, all gone.
When I received the box back from RMA that time, I tested rebooting many times.
I still saw some strange messages during boot, but it finished booting and worked OK that time
(my bad too).
Compared to this one, the boot process on a normal box is much smoother..

The other box that came with it has worked fine since it was installed..

OK, next time I will check the boot process more carefully
and check the S/N and system ID too.

Thanks everyone... ;D

Offline Marco Schirrmeister

  • Hero Member
  • *****
  • Posts: 101
  • Karma: 14
Your customer should try to explain to his finance guy how much money he will lose if his stock site is down for some hours or more than a day.
After that they will maybe open the jewel case. ;-)

I would definitely recommend GSLB if all your backend infrastructure is capable of it. I mean, if you can make all the backend stuff (storage, databases, whatever you have) active/active on both sites, then you should think about this.

By active/active I mean no big delay in replicating all the data to the second location and making it available for use after a failover. I think it also depends on how long your failover window can be.

You should also keep in mind that users may have to close their browsers once a failover has occurred. These damn browsers have their own caches for resolved IP addresses. A browser caches an address longer even if you set the TTL in DNS to 5 or 10 seconds.
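
To make the caching point concrete, here is a toy Python model of a client that pins a resolved address for a minimum time regardless of the DNS TTL. Everything in it is an assumption for illustration: the class name `PinningResolver`, the 60-second pin, the hostname, and the documentation-range IPs. Real browsers differ in how long they pin, so treat the numbers as placeholders.

```python
class PinningResolver:
    """Toy client-side cache that keeps an answer for at least min_pin seconds,
    even when the DNS TTL returned by the lookup is shorter."""

    def __init__(self, lookup, min_pin):
        self.lookup = lookup      # callable: hostname -> (ip, ttl_seconds)
        self.min_pin = min_pin    # hypothetical browser pin time in seconds
        self.cache = {}           # hostname -> (ip, expires_at)

    def resolve(self, host, now):
        cached = self.cache.get(host)
        if cached is not None and now < cached[1]:
            return cached[0]      # still pinned, DNS is not asked again
        ip, ttl = self.lookup(host)
        self.cache[host] = (ip, now + max(ttl, self.min_pin))
        return ip


if __name__ == "__main__":
    active = {"ip": "192.0.2.10"}            # site 1 answers first

    def gslb_lookup(host):
        return active["ip"], 10              # DNS TTL of 10 seconds

    client = PinningResolver(gslb_lookup, min_pin=60)
    print(client.resolve("trade.example.com", now=0))    # site 1
    active["ip"] = "198.51.100.10"                       # failover to site 2
    print(client.resolve("trade.example.com", now=30))   # still site 1
    print(client.resolve("trade.example.com", now=61))   # now site 2
```

In this model the client keeps hitting the dead site until the pin expires, which is why a short TTL alone does not bound the failover window.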

GSLB is really nice and we use it a lot. But you should think it through before putting an application into production. It also differs from case to case and application to application.
Maybe you want to read the GSLB Page of Shame for some comments on GSLB: ;-) http://www.tenereillo.com/GSLBPageOfShame.htm

Marco

Offline ohmhyde

  • VIP Member
  • ***
  • Posts: 28
  • Karma: 4
Thanks Marco.
Yeah, that day the customer lost 30-40% margin from stock trading .. Arrgh!!

Umm, GSLB would affect a lot of their production setup.
Right now the two sites have the same infrastructure and applications but different domains,
like http://trade1.abc.com, http://trade2.abc.com, and so on.
They have to make these 2 sites the same, right?
Then GSLB can hand out the IP of the VIP at the site which is alive and has the best performance.
Ok I'll go read the link you sent me.
Maybe not now for GSLB  ha ha..    : )
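
The idea of answering with the VIP of the site that is alive and performing best can be sketched in a few lines of Python. This is only a simplified model, not how NetScaler GSLB is actually implemented: the `Site` fields, the `pick_vip` function, the response-time metric, and the addresses are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Site:
    name: str
    vip: str            # address handed out in the DNS answer
    alive: bool         # result of a health check (assumed to exist)
    response_ms: float  # assumed performance metric, lower is better


def pick_vip(sites: List[Site]) -> Optional[str]:
    """Return the VIP of the fastest healthy site, or None if all are down."""
    healthy = [site for site in sites if site.alive]
    if not healthy:
        return None
    return min(healthy, key=lambda site: site.response_ms).vip


if __name__ == "__main__":
    sites = [
        Site("site1", "192.0.2.10", alive=True, response_ms=40.0),
        Site("site2", "198.51.100.10", alive=True, response_ms=25.0),
    ]
    print(pick_vip(sites))   # site2 is faster
    sites[1].alive = False
    print(pick_vip(sites))   # falls back to site1
```

For this to work, both sites would have to serve the same hostname, which is why the separate trade1/trade2 domains would need to be unified first.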

Offline Paul B

  • Hero Member
  • *****
  • Posts: 193
  • Karma: 20
Quote from: ohmhyde
Thanks Marco.
Yeah, that day the customer lost 30-40% margin from stock trading .. Arrgh!!

It amazes me how stupid some customers can be. They lose loads of $$, you give them a solution that will guarantee it won't happen again, at a price LESS than the loss, and they still won't do it.

It's kinda like the person who buys an expensive car, with third-party-only insurance.

They must just have too much money, and like wasting it!