Web Hosting Talk

Auto reboot device for FreeBSD


ClusterMania
04-05-2002, 12:12 AM
Hi, I need an auto-reboot device for FreeBSD machines. I have looked at watchdog cards but found none for FreeBSD. I need something automatic, since it's not possible to babysit every server 24 hours a day.

Starhost
04-05-2002, 08:43 AM
What do you mean by auto-reboot? My server only reboots when it loses power or something similar, and it will always come back up automatically unless the hardware is faulty or the OS is ****ed up.

kunal
04-05-2002, 02:38 PM
How about writing a little cron script??

Starhost
04-05-2002, 02:52 PM
Why would you want your server to reboot automatically at set times? That isn't useful, or am I missing something?

ClusterMania
04-05-2002, 09:32 PM
I want my server to automatically reboot or power cycle if it crashes and stops responding.

Blazing
04-05-2002, 10:24 PM
You want a remote reboot from telnet/ssh through the UPS?

If so, I think you need an APC unit... not sure on this one.

ClusterMania
04-05-2002, 10:52 PM
I know about the MasterSwitch Plus. I just need something automated: I want it to power cycle my server automatically if the server stops responding.
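On the cron idea: a hung server cannot run its own cron jobs, so automatic power cycling has to be driven from outside, either by a watchdog card or by a second machine that controls a switched power strip like the MasterSwitch. Here is a minimal sketch of the second-machine approach, assuming the server answers on TCP port 80 when healthy. `power_cycle` is a hypothetical placeholder for whatever command your power switch accepts, and the threshold and interval values are assumptions, not recommendations.

```python
import socket
import time
from typing import Callable


def port_is_open(host: str, port: int = 80, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


class Watchdog:
    """Decide when to power cycle, based on consecutive failed probes.

    Requiring `threshold` failures in a row keeps a single dropped probe
    from triggering a reboot.
    """

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold
        self.failures = 0

    def record(self, ok: bool) -> bool:
        """Record one probe result; return True when a power cycle is due."""
        if ok:
            self.failures = 0
            return False
        self.failures += 1
        if self.failures >= self.threshold:
            self.failures = 0  # start counting afresh after a power cycle
            return True
        return False


def monitor(host: str, power_cycle: Callable[[], None], port: int = 80,
            interval: float = 60.0, threshold: int = 3) -> None:
    """Probe host:port forever; call power_cycle() after repeated failures.

    `power_cycle` is a hypothetical hook for your remote power switch;
    nothing in this sketch talks to real hardware.
    """
    dog = Watchdog(threshold)
    while True:
        if dog.record(port_is_open(host, port)):
            power_cycle()
        time.sleep(interval)
```

In practice you would also want a grace period after each power cycle, so that a slow boot is not mistaken for another hang.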

taz0
04-05-2002, 11:08 PM
Originally posted by ClusterMania
I know about the MasterSwitch Plus. I just need something automated: I want it to power cycle my server automatically if the server stops responding.

Does this happen often?
Check this: http://www.etinc.com/watchdog.htm

ClusterMania
04-05-2002, 11:11 PM
http://datahawk.verifast.net/mrtg/66.28.252.1_27.html

I have other servers, and this is the only one that keeps crashing a lot. I need to fix this.


I have looked at watchdog cards myself, but I need one that is:

1) PCI
2) Fits In a 1U
3) Supports FreeBSD

I have found cards that meet 1) and 2), but now I am thinking about formatting my hard drive and installing Linux, since there are hardly any cards for FreeBSD.

taz0
04-05-2002, 11:13 PM
Originally posted by ClusterMania
http://datahawk.verifast.net/mrtg/66.28.252.1_27.html

I have other servers, and this is the only one that keeps crashing a lot. I need to fix this.

It's surely a hardware problem: bad RAM, CPU or motherboard.
I've had FreeBSD servers running for 365+ days.

ClusterMania
04-05-2002, 11:55 PM
I have a pretty powerful machine: dual P3T with 2 GB of RAM. I am sure it can handle a high load, but I don't know what's wrong. It should do a lot more than this without crashing =/


last pid: 1050; load averages: 0.05, 0.14, 0.19 up 0+00:39:31 12:40:20
328 processes: 1 running, 327 sleeping
CPU states: 0.4% user, 0.0% nice, 0.6% system, 0.2% interrupt, 98.8% idle
Mem: 155M Active, 60M Inact, 106M Wired, 15M Cache, 141M Buf, 1676M Free
Swap: 2048M Total, 2048M Free

PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU CPU COMMAND
323 root 2 0 8900K 6684K select 0 0:01 0.00% 0.00% httpd
448 apache 2 0 9468K 6848K sbwait 0 0:01 0.00% 0.00% httpd
608 apache 2 0 9448K 6844K sbwait 0 0:01 0.00% 0.00% httpd
570 apache 2 0 9088K 6888K sbwait 0 0:01 0.00% 0.00% httpd
511 apache 2 0 8972K 6772K sbwait 1 0:01 0.00% 0.00% httpd
621 apache 2 0 8964K 6764K sbwait 1 0:01 0.00% 0.00% httpd
459 apache 2 0 8972K 6772K sbwait 1 0:01 0.00% 0.00% httpd
499 apache 2 0 8972K 6780K sbwait 0 0:01 0.00% 0.00% httpd
550 apache 2 0 9484K 6908K sbwait 0 0:01 0.00% 0.00% httpd
514 apache 2 0 8964K 6764K sbwait 1 0:01 0.00% 0.00% httpd
651 apache 2 0 8972K 6772K sbwait 0 0:01 0.00% 0.00% httpd
643 apache 2 0 8972K 6772K sbwait 1 0:01 0.00% 0.00% httpd
463 apache 2 0 9136K 6936K sbwait 0 0:01 0.00% 0.00% httpd
654 apache 2 0 8964K 6764K sbwait 1 0:01 0.00% 0.00% httpd
460 apache 18 0 8964K 6764K lockf 1 0:01 0.00% 0.00% httpd
453 apache 2 0 8964K 6768K sbwait 1 0:01 0.00% 0.00% httpd
537 apache 18 0 8964K 6764K lockf 1 0:01 0.00% 0.00% httpd
645 apache 18 0 8972K 6772K lockf 1 0:01 0.00% 0.00% httpd
509 apache 2 0 10040K 6820K sbwait 1 0:01 0.00% 0.00% httpd
471 apache 2 0 9152K 6952K sbwait 0 0:01 0.00% 0.00% httpd
455 apache 2 0 8964K 6768K sbwait 0 0:01 0.00% 0.00% httpd
631 apache 2 0 8972K 6780K sbwait 0 0:01 0.00% 0.00% httpd
594 apache 18 0 8972K 6772K lockf 1 0:01 0.00% 0.00% httpd
481 apache 18 0 8972K 6768K lockf 1 0:01 0.00% 0.00% httpd
668 apache 2 0 10024K 6836K sbwait 1 0:01 0.00% 0.00% httpd
560 apache 2 0 9072K 6872K sbwait 1 0:01 0.00% 0.00% httpd
632 apache 18 0 8972K 6772K lockf 1 0:01 0.00% 0.00% httpd
461 apache 2 0 8964K 6768K sbwait 0 0:01 0.00% 0.00% httpd
653 apache 2 0 8964K 6764K sbwait 1 0:01 0.00% 0.00% httpd
536 apache 2 0 8972K 6772K sbwait 1 0:01 0.00% 0.00% httpd
564 apache 18 0 8972K 6768K lockf 1 0:01 0.00% 0.00% httpd
467 apache 2 0 8964K 6764K sbwait 0 0:01 0.00% 0.00% httpd
644 apache 2 0 9124K 6924K sbwait 0 0:01 0.00% 0.00% httpd
486 apache 2 0 8964K 6764K sbwait 0 0:01 0.00% 0.00% httpd
658 apache 2 0 8964K 6764K sbwait 0 0:01 0.00% 0.00% httpd
589 apache 18 0 8964K 6772K lockf 1 0:01 0.00% 0.00% httpd
451 apache 18 0 8964K 6764K lockf 1 0:00 0.00% 0.00% httpd
597 apache 2 0 8972K 6772K sbwait 1 0:00 0.00% 0.00% httpd
646 apache 2 0 9072K 6880K sbwait 0 0:00 0.00% 0.00% httpd
660 apache 2 0 9488K 7028K sbwait 0 0:00 0.00% 0.00% httpd
475 apache 2 0 8964K 6764K sbwait 1 0:00 0.00% 0.00% httpd
554 apache 2 0 9064K 6864K sbwait 1 0:00 0.00% 0.00% httpd
490 apache 18 0 8972K 6772K lockf 1 0:00 0.00% 0.00% httpd
457 apache 18 0 8964K 6764K lockf 1 0:00 0.00% 0.00% httpd
527 apache 2 0 9168K 6976K sbwait 0 0:00 0.00% 0.00% httpd
581 apache 18 0 8972K 6776K lockf 1 0:00 0.00% 0.00% httpd
663 apache 2 0 8992K 6792K sbwait 0 0:00 0.00% 0.00% httpd
567 apache 2 0 8972K 6772K sbwait 1 0:00 0.00% 0.00% httpd
562 apache 2 0 9020K 6812K sbwait 0 0:00 0.00% 0.00% httpd
535 apache 2 0 8972K 6772K sbwait 1 0:00 0.00% 0.00% httpd

NyteOwl
04-06-2002, 03:54 AM
There is a PCI Watchdog card at this site:

http://www.berkprod.com/pci_pc_watchdog.htm

There is a driver for the ISA version of this card for BSD at this site:

http://www.cs.ndsu.nodak.edu/~tinguely/


Might be worth dropping the author of the driver a note about possible compatibility with the PCI card.

I've used B&B Electronics watchdog cards in the past, but they don't seem to have a driver (or driver data to write your own) for *nix.

Good Luck!

Starhost
04-06-2002, 04:51 AM
I also think it is bad RAM. My FreeBSD server with bad RAM also started crashing for no apparent reason. After running memtest (which is in the FreeBSD ports collection), I knew it was bad RAM.

I replaced the RAM, and the server has been running great ever since.

ClusterMania
04-06-2002, 05:32 AM
Originally posted by NyteOwl
There is a PCI Watchdog card at this site:

http://www.berkprod.com/pci_pc_watchdog.htm

There is a driver for the ISA version of this card for BSD at this site:

http://www.cs.ndsu.nodak.edu/~tinguely/


Might be worth dropping the author of the driver a note about possible compatibility with the PCI card.

I've used B&B Electronics watchdog cards in the past, but they don't seem to have a driver (or driver data to write your own) for *nix.

Good Luck!

I tried contacting the guy long ago, but he only has it working for ISA, which only really old machines have. No luck finding one for PCI. I guess I should have the memory tested.

Starhost
04-06-2002, 10:06 AM
Could you please post the output of memtest when you run it?

Don't forget to stop it, because memtest will keep running until you do!!

Mike the newbie
04-06-2002, 10:12 AM
Originally posted by ClusterMania
I want my server to auto reboot or auto power cycle if my server crashes and stops responding.

It may be a better use of your energy to track down and fix the cause of your server not responding, rather than to accommodate it.

taz0
04-06-2002, 03:04 PM
You may find this article very interesting:
http://www.onlamp.com/pub/a/bsd/2002/03/21/Big_Scary_Daemons.html

ClusterMania
04-07-2002, 12:26 AM
I changed my httpd.conf settings to the following, and my process count doubled. I think my load is fine, but my server still keeps crashing.

Timeout 8
KeepAliveTimeout 15
MinSpareServers 200
MaxSpareServers 300
StartServers 200
MaxClients 2048



last pid: 1084; load averages: 1.63, 2.61, 1.47 up 0+00:31:27 14:10:46
447 processes: 2 running, 445 sleeping
CPU states: 1.0% user, 0.0% nice, 3.7% system, 0.2% interrupt, 95.1% idle
Mem: 157M Active, 60M Inact, 101M Wired, 20M Cache, 114M Buf, 1673M Free
Swap: 2048M Total, 2048M Free

PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU CPU COMMAND
678 apache 18 0 8964K 6764K lockf 1 0:00 0.05% 0.05% httpd
987 apache 2 0 8964K 6760K sbwait 0 0:00 0.05% 0.05% httpd
1069 root 28 0 2428K 1616K CPU0 0 0:01 0.00% 0.00% top
342 root 2 0 8900K 6684K select 0 0:01 0.00% 0.00% httpd
480 apache 2 0 10012K 7600K sbwait 0 0:00 0.00% 0.00% httpd
517 apache 18 0 8956K 6760K lockf 1 0:00 0.00% 0.00% httpd
643 apache 18 0 8956K 6756K lockf 1 0:00 0.00% 0.00% httpd
653 apache 18 0 8956K 6756K lockf 1 0:00 0.00% 0.00% httpd
740 apache 2 0 9092K 6892K sbwait 0 0:00 0.00% 0.00% httpd
471 apache 18 0 8964K 6764K lockf 1 0:00 0.00% 0.00% httpd
608 apache 2 0 8956K 6756K sbwait 0 0:00 0.00% 0.00% httpd
672 apache 2 0 9116K 6916K sbwait 1 0:00 0.00% 0.00% httpd
500 apache 2 0 8956K 6760K sbwait 0 0:00 0.00% 0.00% httpd
556 apache 2 0 9084K 6884K sbwait 0 0:00 0.00% 0.00% httpd
561 apache 2 0 10012K 6812K sbwait 0 0:00 0.00% 0.00% httpd
518 apache 18 0 8956K 6756K lockf 1 0:00 0.00% 0.00% httpd
664 apache 18 0 8956K 6756K lockf 1 0:00 0.00% 0.00% httpd
711 apache 18 0 8964K 6772K lockf 1 0:00 0.00% 0.00% httpd
523 apache 18 0 8956K 6756K lockf 1 0:00 0.00% 0.00% httpd
473 apache 2 0 9024K 6816K select 1 0:00 0.00% 0.00% httpd
551 apache 18 0 8956K 6756K lockf 1 0:00 0.00% 0.00% httpd
677 apache 2 0 9476K 6924K sbwait 1 0:00 0.00% 0.00% httpd
590 apache 18 0 8956K 6756K lockf 1 0:00 0.00% 0.00% httpd
467 apache 2 0 8956K 6756K sbwait 1 0:00 0.00% 0.00% httpd
706 apache 2 0 9092K 6888K sbwait 0 0:00 0.00% 0.00% httpd
739 apache 18 0 8964K 6764K lockf 1 0:00 0.00% 0.00% httpd
760 apache 18 0 8964K 6764K lockf 1 0:00 0.00% 0.00% httpd
534 apache 2 0 9132K 6936K RUN 1 0:00 0.00% 0.00% httpd
624 apache 18 0 8956K 6756K lockf 1 0:00 0.00% 0.00% httpd
543 apache 2 0 9084K 6884K select 0 0:00 0.00% 0.00% httpd
559 apache 2 0 8956K 6756K sbwait 1 0:00 0.00% 0.00% httpd
535 apache 18 0 8964K 6764K lockf 1 0:00 0.00% 0.00% httpd
646 apache 18 0 8956K 6756K lockf 1 0:00 0.00% 0.00% httpd
491 apache 2 0 9096K 6896K sbwait 0 0:00 0.00% 0.00% httpd
558 apache 2 0 8956K 6756K sbwait 0 0:00 0.00% 0.00% httpd
589 apache 18 0 8956K 6756K lockf 1 0:00 0.00% 0.00% httpd
699 apache 2 0 8964K 6764K sbwait 0 0:00 0.00% 0.00% httpd
571 apache 2 0 9480K 7228K sbwait 1 0:00 0.00% 0.00% httpd
510 apache 18 0 8956K 6756K lockf 1 0:00 0.00% 0.00% httpd
693 apache 18 0 8964K 6764K lockf 1 0:00 0.00% 0.00% httpd
591 apache 2 0 9476K 7008K sbwait 0 0:00 0.00% 0.00% httpd
726 apache 2 0 8972K 6772K sbwait 0 0:00 0.00% 0.00% httpd
753 apache 2 0 9492K 7260K sbwait 1 0:00 0.00% 0.00% httpd
573 apache 2 0 8956K 6756K sbwait 0 0:00 0.00% 0.00% httpd
636 apache 18 0 8956K 6760K lockf 1 0:00 0.00% 0.00% httpd
578 apache 2 0 9108K 6912K sbwait 0 0:00 0.00% 0.00% httpd
485 apache 18 0 8964K 6760K lockf 1 0:00 0.00% 0.00% httpd
470 apache 18 0 8956K 6760K lockf 1 0:00 0.00% 0.00% httpd
640 apache 18 0 8964K 6764K lockf 1 0:00 0.00% 0.00% httpd
770 apache 2 0 8964K 6764K sbwait 1 0:00 0.00% 0.00% httpd
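A rough back-of-the-envelope check on these settings: the top listings in this thread show each httpd child with a RES of about 6.8 MB (6756K to 7260K). Multiplying that by the configured process counts gives a pessimistic upper bound on Apache's memory use. This is a sketch under simplifying assumptions: 6800K per child is an assumed round figure, and because RES also counts pages the children share (the httpd binary, shared libraries), the real total will be lower.

```python
def worst_case_mb(max_clients: int, rss_kb: int) -> float:
    """Upper bound on memory if every child's RES were private: clients x RES, in MB."""
    return max_clients * rss_kb / 1024


def clients_that_fit(free_mb: int, rss_kb: int) -> int:
    """How many children of rss_kb each fit into free_mb of RAM (rounded down)."""
    return (free_mb * 1024) // rss_kb


# Assumed round figure: roughly 6800K RES per httpd child, as in the top output.
RSS_KB = 6800

print(worst_case_mb(2048, RSS_KB))     # MaxClients 2048   -> 13600.0
print(worst_case_mb(200, RSS_KB))      # StartServers 200  -> 1328.125
print(clients_that_fit(1676, RSS_KB))  # 1676M free (first top listing) -> 252
```

On these numbers, 2048 children could in the worst case want about 13.6 GB, several times the 2 GB installed, while only about 250 such children fit into the 1676M shown free in the first top listing.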

Starhost
04-07-2002, 06:01 AM
Originally posted by ClusterMania
I changed my httpd.conf settings to the following, and my process count doubled. I think my load is fine, but my server still keeps crashing.


As I and others told you, it is almost certainly a hardware problem. You haven't run memtest yet, so run it to check whether the RAM is the culprit. Here are the instructions:


- Log in as root.

cd /usr/ports/sysutils/memtest/
make install clean
memtest 100M

(100M means that memtest allocates and tests a 100 MB block of memory.)


Now just wait and see whether everything passes. After a while, press ^C to stop the program.

I'm almost certain that you will get errors. If so, replace the RAM.
Good luck.

ClusterMania
04-07-2002, 08:28 AM
Thanks for the instructions. I think it installed OK, but when I run memtest 100 I get "command not found".

Starhost
04-07-2002, 08:36 AM
No problem :)


If there are any other problems, please post them. And if it turns out to be bad RAM, post that too :)

ClusterMania
04-07-2002, 08:48 AM
=/ I wish everything was GUI. I've never been good at command-line stuff.

===> Extracting for memtest-2.93.1
>> Checksum OK for memtester-2.93.1.tar.bz2.
===> memtest-2.93.1 depends on executable: gmake - found
===> Patching for memtest-2.93.1
===> Applying FreeBSD patches for memtest-2.93.1
===> Configuring for memtest-2.93.1
===> Building for memtest-2.93.1
gcc -O -pipe -Wall -g -c -o memtest.o memtest.c
gcc -O -pipe -Wall -g -c -o memtest-tests.o memtest-tests.c
gcc -g -o memtest memtest.o memtest-tests.o
bsd2# make install
===> Installing for memtest-2.93.1
===> memtest-2.93.1 is already installed - perhaps an older version?
If so, you may wish to ``make deinstall'' and install
this port again by ``make reinstall'' to upgrade it properly.
If you really wish to overwrite the old port of memtest-2.93.1
without deleting it first, set the variable "FORCE_PKG_REGISTER"
in your environment or the "make install" command line.
*** Error code 1

Stop in /usr/ports/sysutils/memtest.
*** Error code 1

Stop in /usr/ports/sysutils/memtest.
bsd2# memtest
memtest: Command not found.
bsd2# memtest 100
memtest: Command not found.

ClusterMania
04-07-2002, 09:21 AM
BYTEmark* Native Mode Benchmark ver. 2 (10/95)
Index-split by Andrew D. Balsa (11/97)
Linux/Unix* port by Uwe F. Mayer (12/96,11/97)

TEST : Iterations/sec. : Old Index : New Index
: : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
top
NUMERIC SORT : 573.85 : 14.72 : 4.83
STRING SORT : 42.422 : 18.96 : 2.93
BITFIELD : 1.7243e+08 : 29.58 : 6.18
FP EMULATION : 37.124 : 17.81 : 4.11
FOURIER : 9404.1 : 10.70 : 6.01
Segmentation fault (core dumped)


I actually got something to work

Starhost
04-07-2002, 09:27 AM
OK, it is already installed. Then do this:

cd /usr/ports/sysutils/memtest/
make deinstall


And then follow the steps I posted before.

ClusterMania
04-07-2002, 09:57 AM
Originally posted by Starhost
OK, it is already installed. Then do this:

cd /usr/ports/sysutils/memtest/
make deinstall


And then follow the steps I posted before.

bsd2# cd /usr/ports/sysutils/memtest/
bsd2# make deinstall
===> Deinstalling for memtest-2.93.1
bsd2# mtest 100m
multicast membership test program; enter ? for list of commands

?
j g.g.g.g i.i.i.i - join IP multicast group
l g.g.g.g i.i.i.i - leave IP multicast group
a ifname e.e.e.e.e.e - add ether multicast address
d ifname e.e.e.e.e.e - del ether multicast address
m ifname 1/0 - set/clear ether allmulti flag
p ifname 1/0 - set/clear ether promisc flag
q - quit

Starhost
04-07-2002, 10:10 AM
Mmm


What happens when you just type:

memtest 100M


??

Does it start running?

ClusterMania
04-07-2002, 10:15 AM
OK, I got it running, but only on my second server, since my first one keeps crashing every 10 minutes. What other ports are good for testing my system? If this one is fine, I am thinking about swapping hard drives between the two servers and having someone troubleshoot the other server.

bsd2# make install
===> Extracting for crashme-2.4
>> Checksum mismatch for crashme.tgz.
Make sure the Makefile and distinfo file (/usr/ports/sysutils/crashme/distinfo)
are up to date. If you are absolutely sure you want to override this
check, type "make NO_CHECKSUM=yes [other args]".
*** Error code 1

How do I update it? Is there a command that makes it download the updates automatically? Hmm, thanks to you I am learning a lot today.

Oh, yeah, it keeps running. Is it supposed to do 100 tests?

Run 3:
Test 1: Stuck Address: Testing...Passed.
Test 2: Random value: Setting...Testing...Passed.
Test 3: XOR comparison: Setting...Testing...Passed.
Test 4: SUB comparison: Setting...Testing...Passed.
Test 5: MUL comparison: Setting...Testing...Passed.
Test 6: DIV comparison: Setting...Testing...Passed.
Test 7: OR comparison: Setting...Testing...Passed.
Test 8: AND comparison: Setting...Testing...Passed.
Test 9: Sequential Increment: Setting...Testing...Passed.
Test 10: Solid Bits: Testing...Passed.
Test 11: Block Sequential: Testing...Passed.
Test 12: Checkerboard: Testing...Passed.
Test 13: Bit Spread: Testing...Passed.
Test 14: Bit Flip: Testing...Passed.
Test 15: Walking Ones: Testing...Passed.
Test 16: Walking Zeroes: Testing...Passed.
Run 3 completed in 386 seconds (0 tests showed errors).
Run 4:
Test 1: Stuck Address: Testing...Passed.
Test 2: Random value: Setting...Testing...Passed.
Test 3: XOR comparison: Setting...Testing...Passed.
Test 4: SUB comparison: Setting...Testing...Passed.
Test 5: MUL comparison: Setting...Testing...Passed.
Test 6: DIV comparison: Setting...Testing...Passed.
Test 7: OR comparison: Setting...Testing...Passed.
Test 8: AND comparison: Setting...Testing...Passed.
Test 9: Sequential Increment: Setting...Testing...Passed.
Test 10: Solid Bits: Testing...Passed.
Test 11: Block Sequential: Testing...Passed.
Test 12: Checkerboard: Testing...Passed.
Test 13: Bit Spread: Testing...Passed.
Test 14: Bit Flip: Testing...Passed.
Test 15: Walking Ones: Testing...Passed.
Test 16: Walking Zeroes: Testing...Passed.
Run 4 completed in 387 seconds (0 tests showed errors).
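The memtest runs above list named tests such as "Walking Ones". As a rough illustration of what that kind of test does (a simplified sketch, not memtester's actual code), the toy example below fills a block with a single set bit, shifts that bit through every position, and flags any address that reads back wrong. `ToyRAM` is a hypothetical stand-in that can simulate a stuck bit; Python itself cannot probe physical RAM this way.

```python
from typing import List


class ToyRAM:
    """A byte array standing in for RAM, with an optional stuck-at-0 bit."""

    def __init__(self, size: int, stuck_addr: int = -1, stuck_bit: int = 0) -> None:
        self.cells = bytearray(size)
        self.stuck_addr = stuck_addr
        self.stuck_mask = 0xFF ^ (1 << stuck_bit)  # clears the faulty bit

    def __len__(self) -> int:
        return len(self.cells)

    def write(self, addr: int, value: int) -> None:
        if addr == self.stuck_addr:
            value &= self.stuck_mask  # simulate a bit that can never be set
        self.cells[addr] = value

    def read(self, addr: int) -> int:
        return self.cells[addr]


def walking_ones(ram: ToyRAM) -> List[int]:
    """Write each single-bit pattern to every address; return addresses that read back wrong."""
    bad = set()
    for bit in range(8):
        pattern = 1 << bit
        for addr in range(len(ram)):
            ram.write(addr, pattern)
        for addr in range(len(ram)):
            if ram.read(addr) != pattern:
                bad.add(addr)
    return sorted(bad)
```

A healthy ToyRAM returns an empty list; one with a stuck bit at address 10 returns [10], which is the kind of failure a real memory test reports as a failed test.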

allera
04-07-2002, 10:59 AM
On the machine that keeps crashing, do this:

cd /usr/ports/sysutils/memtest
make deinstall
make install clean

When it's done, log out and log back in (or type rehash if your shell is csh/tcsh, so it picks up the newly installed command), then run

memtest 100M

and it should start. Hopefully it won't crash while the test is running. Also, have you _tried_ replacing the RAM? You might want to do that instead of sitting there waiting for all 2 GB to be tested. If it stays up with the new RAM (128 MB? whatever), then run memtest on one 1 GB stick and then on the other to see which one is faulty.

Hope that helps at least a little. :)

ClusterMania
04-07-2002, 11:15 AM
Is each run for 1 MB of RAM or 100 MB? I can let it run while I sleep. I also want to stress test my system, but it won't let me get crashme.

ClusterMania
04-07-2002, 11:36 AM
I got my server power cycled, and it crashed within 5 minutes. Here's everything it got through before it crashed:


bash-2.05# memtest 100M
memtest v. 2.93.1
(C) 2000 Charles Cazabon <memtest@discworld.dyndns.org>
Original v.1 (C) 1999 Simon Kirby <sim@stormix.com> <sim@neato.org>

Current limits:
RLIMIT_RSS 0xffffffff
RLIMIT_VMEM 0xffffffff
Raising limits...
Allocated 104857600 bytes...trying mlock...success. Starting tests...

Testing 104857600 bytes at 0x0804e000 (0 bytes lost to page alignment).

Run 1:
Test 1: Stuck Address: Testing...Passed.
Test 2: Random value: Setting...Testing...Passed.
Test 3: XOR comparison: Setting...Testing...Passed.
Test 4: SUB comparison: Setting...Testing...Passed.
Test 5: MUL comparison: Setting...Testing...Passed.
Test 6: DIV comparison: Setting...Testing...Passed.
Test 7: OR comparison: Setting...Testing...Passed.
Test 8: AND comparison: Setting...Testing...Passed.
Test 9: Sequential Increment: Setting...Testing...Passed.
Test 10: Solid Bits: Testing...Passed.
Test 11: Block Sequential: Setting... 35
Apr 8 01:16:06 bsd1 /kernel: pmap_collect: collecting pv entries -- suggest increasing PMAP_SHPGPERPROC
Apr 8 01:16:06 bsd1 /kernel: pmap_collect: collecting pv entries -- suggest increasing PMAP_SHPGPERPROC
Testing...Passed.
Test 12: Checkerboard: Testing...Passed.
Test 13: Bit Spread: Testing...Passed.
Test 14: Bit Flip: Testing...Passed.
Test 15: Walking Ones: Testing... 3

taz0
04-07-2002, 11:49 AM
Do you read your log files??
This may be the cause of your crashes:
Apr 8 01:16:06 bsd1 /kernel: pmap_collect: collecting pv entries -- suggest increasing PMAP_SHPGPERPROC

allera
04-07-2002, 11:51 AM
Try replacing the RAM to see if that's even the problem. However, your log shows an entry that I don't like. :) Search Google and see what you come up with. It might be something to do with your NIC. What else is hiding in your logs?

Starhost
04-07-2002, 12:50 PM
Run another memtest,

but first shut down all programs you don't need, such as httpd, mysql, etc.

Then run memtest again and see what happens. Also check your /var/log/messages to see if there are things that go wrong just before the crash.
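One quick way to do that log check is to pull out only the kernel's lines and count repeats, so that something like the pmap_collect warning above stands out. This is a minimal sketch assuming the syslog line format seen in this thread ("Apr 8 01:16:06 bsd1 /kernel: ..."); keep in mind that a hard lockup may leave nothing in the log at all.

```python
import re
from collections import Counter
from typing import Iterable

# Matches syslog lines like:
#   Apr 8 01:16:06 bsd1 /kernel: pmap_collect: collecting pv entries -- ...
# The leading "/" is optional so "kernel:" lines also match.
KERNEL_LINE = re.compile(
    r"^\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2} \S+ /?kernel: (?P<msg>.*)$"
)


def kernel_messages(lines: Iterable[str]) -> Counter:
    """Count each distinct kernel message, ignoring the timestamp and host name."""
    counts: Counter = Counter()
    for line in lines:
        match = KERNEL_LINE.match(line.rstrip("\n"))
        if match:
            counts[match.group("msg")] += 1
    return counts
```

For example, pass it an open /var/log/messages file and print the result of most_common(20) to see the most frequent kernel messages first.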

ClusterMania
04-07-2002, 12:55 PM
How do I shut them all down?

Starhost
04-07-2002, 01:08 PM
As root, type:

ps aux


Then start killing the processes.
Or just go to:

/usr/local/etc/rc.d/

and run every script there with the stop argument,

for example:

proftpd.sh stop
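The "run every script with stop" step can be scripted. A minimal sketch, assuming the scripts live in /usr/local/etc/rc.d, end in .sh, and accept a stop argument the way proftpd.sh does. It defaults to a dry run that only returns the commands, so you can review them before anything is actually stopped.

```python
import os
import subprocess
from pathlib import Path
from typing import List


def stop_rc_scripts(rc_dir: str = "/usr/local/etc/rc.d",
                    dry_run: bool = True) -> List[List[str]]:
    """Build (and optionally run) "<script> stop" for every executable *.sh in rc_dir."""
    scripts = sorted(
        p for p in Path(rc_dir).glob("*.sh")
        if p.is_file() and os.access(p, os.X_OK)  # skip disabled (non-executable) scripts
    )
    # Stop in reverse alphabetical order, assuming they were started alphabetically at boot.
    commands = [[str(p), "stop"] for p in reversed(scripts)]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=False)  # keep going even if one script fails
    return commands
```

Call it once with the default dry_run=True to check the list, then again with dry_run=False as root to actually stop the services.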

ho247
04-07-2002, 01:16 PM
Originally posted by ClusterMania
I changed my httpd.conf settings to the following, and my process count doubled. I think my load is fine, but my server still keeps crashing.

Timeout 8
KeepAliveTimeout 15
MinSpareServers 200
MaxSpareServers 300
StartServers 200
MaxClients 2048


ClusterMania, have you tried scaling down the number of httpd processes? Try changing your httpd.conf file to use the following settings:

Timeout 150
KeepAliveTimeout 15
MinSpareServers 8
MaxSpareServers 18
StartServers 8

Try that and see if it still crashes. Keeping your MaxClients at 2048 should be okay with those server specs.

Alan

ClusterMania
04-07-2002, 01:26 PM
I did su -l to try to log in as root and go into /usr/local/etc/rc.d/, but I don't have access.


Do I just run ps and then kill #####?

How do I shut down apache?

taz0
04-07-2002, 01:28 PM
Originally posted by ClusterMania
I did su -l to try to log in as root and go into /usr/local/etc/rc.d/, but I don't have access.


Do I just run ps and then kill #####?

How do I shut down apache?

/usr/local/etc/rc.d/apache.sh stop
or killall -9 httpd

ClusterMania
04-07-2002, 01:29 PM
Originally posted by ho247


ClusterMania, have you tried scaling down the number of httpd processes? Try changing your httpd.conf file to use the following settings:

Timeout 150
KeepAliveTimeout 15
MinSpareServers 8
MaxSpareServers 18
StartServers 8

Try that and see if it still crashes. Keeping your MaxClients at 2048 should be okay with those server specs.

Alan

Hmm, I think dilhole has the same settings as me and he's doing fine. I thought my server would be able to handle it with such hardware.

ho247
04-07-2002, 01:34 PM
Your server will be able to handle the amount of traffic, but given the number of processes that have started, scaling them down would be a way to see whether that actually stops the crashes. I had that same problem previously with a server similar to yours (not as much RAM, though).

Alan

ClusterMania
04-07-2002, 01:45 PM
Oh man, I had the people there power cycle my server, and it crashed within 15 seconds. Not even enough time to do anything =/ sigh....

OK, I found a copy of crashme on another site:
http://wuarchive.wustl.edu/systems/unix/NetBSD/packages/1.5.2/vax/sysutils/

Is there a command to download and install it? I want to test my second server to make sure everything is OK, then swap hard drives and get the buggy server tested.

ClusterMania
04-07-2002, 06:25 PM
This is how far it got before my buggy server crashed again. Am I supposed to let it run until it stops by itself?

memtest v. 2.93.1
(C) 2000 Charles Cazabon <memtest@discworld.dyndns.org>
Original v.1 (C) 1999 Simon Kirby <sim@stormix.com> <sim@neato.org>

Current limits:
RLIMIT_RSS 0xffffffff
RLIMIT_VMEM 0xffffffff
Raising limits...
Allocated 104857600 bytes...trying mlock...success. Starting tests...

Testing 104857600 bytes at 0x0804e000 (0 bytes lost to page alignment).

Run 1:
Test 1: Stuck Address: Testing...Passed.
Test 2: Random value: Setting...Testing...Passed.
Test 3: XOR comparison: Setting...Testing...Passed.
Test 4: SUB comparison: Setting...Testing...Passed.
Test 5: MUL comparison: Setting...Testing...Passed.
Test 6: DIV comparison: Setting...Testing...Passed.
Test 7: OR comparison: Setting...Testing...Passed.
Test 8: AND comparison: Setting...Testing...Passed.
Test 9: Sequential Increment: Setting...Testing...Passed.
Test 10: Solid Bits: Testing...Passed.
Test 11: Block Sequential: Testing...Passed.
Test 12: Checkerboard: Testing...Passed.
Test 13: Bit Spread: Testing...Passed.
Test 14: Bit Flip: Testing...Passed.
Test 15: Walking Ones: Testing...Passed.
Test 16: Walking Zeroes: Testing...Passed.
Run 1 completed in 406 seconds (0 tests showed errors).
Run 2:
Test 1: Stuck Address: Testing...Passed.
Test 2: Random value: Setting...Testing...Passed.
Test 3: XOR comparison: Setting...Testing...Passed.
Test 4: SUB comparison: Setting...Testing...Passed.
Test 5: MUL comparison: Setting...Testing...Passed.
Test 6: DIV comparison: Setting...Testing...Passed.
Test 7: OR comparison: Setting...Testing...Passed.
Test 8: AND comparison: Setting...Testing...Passed.
Test 9: Sequential Increment: Setting...Testing...Passed.
Test 10: Solid Bits: Testing...Passed.
Test 11: Block Sequential: Testing...Passed.
Test 12: Checkerboard: Testing...Passed.
Test 13: Bit Spread: Testing...Passed.
Test 14: Bit Flip: Testing...Passed.
Test 15: Walking Ones: Testing...Passed.
Test 16: Walking Zeroes: Testing...Passed.
Run 2 completed in 407 seconds (0 tests showed errors).
Run 3:
Test 1: Stuck Address: Testing...Passed.
Test 2: Random value: Setting...Testing...Passed.
Test 3: XOR comparison: Setting...Testing...Passed.
Test 4: SUB comparison: Setting...Testing...Passed.
Test 5: MUL comparison: Setting...Testing...Passed.
Test 6: DIV comparison: Setting...Testing...Passed.
Test 7: OR comparison: Setting...Testing...Passed.
Test 8: AND comparison: Setting...Testing...Passed.
Test 9: Sequential Increment: Setting...Testing...Passed.
Test 10: Solid Bits: Testing...Passed.
Test 11: Block Sequential: Testing...Passed.
Test 12: Checkerboard: Testing...Passed.
Test 13: Bit Spread: Testing...Passed.
Test 14: Bit Flip: Testing...Passed.
Test 15: Walking Ones: Testing...Passed.
Test 16: Walking Zeroes: Testing...Passed.
Run 3 completed in 408 seconds (0 tests showed errors).
Run 4:
Test 1: Stuck Address: Testing...Passed.
Test 2: Random value: Setting...Testing...Passed.
Test 3: XOR comparison: Setting...Testing...Passed.
Test 4: SUB comparison: Setting...Testing...Passed.
Test 5: MUL comparison: Setting...Testing...Passed.
Test 6: DIV comparison: Setting...Testing...Passed.
Test 7: OR comparison: Setting...Testing...Passed.
Test 8: AND comparison: Setting...Testing...Passed.
Test 9: Sequential Increment: Setting...Testing...Passed.
Test 10: Solid Bits: Testing...Passed.
Test 11: Block Sequential: Testing...Passed.
Test 12: Checkerboard: Testing...Passed.
Test 13: Bit Spread: Testing...Passed.
Test 14: Bit Flip: Testing...Passed.
Test 15: Walking Ones: Testing...Passed.
Test 16: Walking Zeroes: Testing...Passed.
Run 4 completed in 408 seconds (0 tests showed errors).
Run 5:
Test 1: Stuck Address: Testing...Passed.
Test 2: Random value: Setting...Testing...Passed.
Test 3: XOR comparison: Setting...Testing...Passed.
Test 4: SUB comparison: Setting...Testing...Passed.
Test 5: MUL comparison: Setting...Testing...Passed.
Test 6: DIV comparison: Setting...Testing...Passed.
Test 7: OR comparison: Setting...Testing...Passed.
Test 8: AND comparison: Setting...Testing...Passed.
Test 9: Sequential Increment: Setting...Testing...Passed.
Test 10: Solid Bits: Testing...Passed.
Test 11: Block Sequential: Testing...Passed.
Test 12: Checkerboard: Testing...Passed.
Test 13: Bit Spread: Testing...Passed.
Test 14: Bit Flip: Testing...Passed.
Test 15: Walking Ones: Testing...Passed.
Test 16: Walking Zeroes: Testing...Passed.
Run 5 completed in 408 seconds (0 tests showed errors).
Run 6:
Test 1: Stuck Address: Testing...Passed.
Test 2: Random value: Setting...Testing...Passed.
Test 3: XOR comparison: Setting...Testing...Passed.
Test 4: SUB comparison: Setting...Testing...Passed.
Test 5: MUL comparison: Setting...Testing...Passed.
Test 6: DIV comparison: Setting...Testing...Passed.
Test 7: OR comparison: Setting...Testing...Passed.
Test 8: AND comparison: Setting...Testing...Passed.
Test 9: Sequential Increment: Setting...Testing...Passed.
Test 10: Solid Bits: Testing...Passed.
Test 11: Block Sequential: Testing... 54

Starhost
04-07-2002, 06:30 PM
And what's in /var/log/messages?

Any sign of kernel panics or whatever?

ClusterMania
04-08-2002, 03:29 PM
They finally did the hard drive swap. I tried running memtest 2G, but it won't let me =) Anyway, it automatically reduced the allocation itself: "Allocated 535822336 bytes...trying mlock...success. Starting tests..."

I hope I am putting stress on the server, because I want to see if it will crash.

bacid
04-08-2002, 05:40 PM
Have you tried checking your BIOS settings?

I had a similar problem a few months back, and it turned out my motherboard's BIOS was set to "optimal" settings by default. I changed this to "stable", which meant that it wasn't overclocking certain things and used very conservative settings. That resolved the problem.

ClusterMania
04-09-2002, 01:07 AM
How do I check and edit my BIOS through ssh?

last pid: 2795; load averages: 7.74, 6.54, 3.80 up 0+09:54:25 14:48:51
443 processes: 2 running, 441 sleeping
CPU states: 0.4% user, 0.0% nice, 5.3% system, 0.0% interrupt, 94.4% idle
Mem: 260M Active, 241M Inact, 159M Wired, 28M Cache, 214M Buf, 1324M Free
Swap: 2048M Total, 2048M Free

http://datahawk.verifast.net/mrtg/66.28.252.1_26.html

My graph looks funky, but my system says it's been up for almost 10 hours.