ClusterMania
04-05-2002, 12:12 AM
Hi, I need an auto-reboot device for FreeBSD machines. I have looked at watchdog cards but found none for FreeBSD. I need something automatic, since it's not possible to babysit every server 24 hours a day.
Starhost
04-05-2002, 08:43 AM
What do you mean by auto reboot? My server only gets rebooted when it loses power or whatever. It will always reboot automatically unless the hardware is corrupt or the OS is ****ed up.

kunal
04-05-2002, 02:38 PM
How about writing a little cron script?

Starhost
04-05-2002, 02:52 PM
Why would you let your server reboot automatically at certain times? That isn't useful, or am I missing something?

ClusterMania
04-05-2002, 09:32 PM
I want my server to auto reboot or auto power cycle if my server crashes and stops responding.

Blazing
04-05-2002, 10:24 PM
You want remote reboot over telnet/ssh through a UPS? If so, I think you need an APC... not sure on this one.

ClusterMania
04-05-2002, 10:52 PM
I know about the MasterSwitch Plus. I just need something that's automated. I want it to power cycle my server automatically if it stops responding.

taz0
04-05-2002, 11:08 PM
Originally posted by ClusterMania
I know about the MasterSwitch Plus. I just need something that's automated. I want it to power cycle my server automatically if it stops responding.

Does this happen often? Check this: http://www.etinc.com/watchdog.htm

ClusterMania
04-05-2002, 11:11 PM
http://datahawk.verifast.net/mrtg/66.28.252.1_27.html
I have other servers and this is the only one that keeps crashing a lot. I need to fix this. I have looked at watchdog cards myself, but I need one that:
1) is PCI
2) fits in a 1U
3) supports FreeBSD
I have found cards that meet 1) and 2), but now I am thinking about formatting my hard drive and installing Linux, since there are hardly any cards for FreeBSD.
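kunal's cron suggestion and ClusterMania's requirement (power cycle only when the server stops responding) can be combined into a software heartbeat that runs from cron on a second, healthy machine, since a hung server cannot run its own cron jobs. The sketch below is a minimal illustration under stated assumptions: the host name, the state-file path and both commands are hypothetical placeholders, and CYCLE_CMD has to be replaced by whatever actually switches the outlet (for example a MasterSwitch driven from the monitoring host). No real MasterSwitch interface is shown here.

```shell
#!/bin/sh
# Minimal heartbeat sketch, meant to run from cron on a SECOND machine.
# All settings below are hypothetical placeholders; override them as needed.

STATE_FILE=${STATE_FILE:-/var/tmp/heartbeat.fails}      # consecutive-failure counter
THRESHOLD=${THRESHOLD:-3}                               # failed checks before cycling
CHECK_CMD=${CHECK_CMD:-"ping -c 3 server.example.com"}  # hypothetical host
CYCLE_CMD=${CYCLE_CMD:-"echo 'no power-cycle command configured'"}  # placeholder

heartbeat() {
    # Read the previous failure count; a missing or garbled file counts as 0.
    fails=0
    if [ -f "$STATE_FILE" ]; then
        fails=$(cat "$STATE_FILE")
    fi
    case $fails in
        ''|*[!0-9]*) fails=0 ;;
    esac

    if sh -c "$CHECK_CMD" >/dev/null 2>&1; then
        fails=0                  # server answered: reset the counter
    else
        fails=$((fails + 1))
        if [ "$fails" -ge "$THRESHOLD" ]; then
            sh -c "$CYCLE_CMD"   # act only after several misses in a row
            fails=0              # give the box THRESHOLD more runs to boot
        fi
    fi
    echo "$fails" > "$STATE_FILE"
}

# Perform one check only when invoked as "heartbeat.sh run".
if [ "${1:-}" = run ]; then
    heartbeat
fi
```

A crontab entry such as `*/5 * * * * /bin/sh /usr/local/sbin/heartbeat.sh run` (path hypothetical) would check every five minutes, so the outlet is cycled only after three consecutive failures, i.e. roughly 10 to 15 minutes of silence. A ping reply only shows that the kernel is alive; a check that fetches a page from httpd would catch more failure modes.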
taz0
04-05-2002, 11:13 PM
Originally posted by ClusterMania
http://datahawk.verifast.net/mrtg/66.28.252.1_27.html
I have other servers and this is the only one that keeps crashing a lot. I need to fix this.

It's surely a hardware problem: bad RAM, CPU or motherboard. I've had FreeBSD servers running for 365+ days.

ClusterMania
04-05-2002, 11:55 PM
I've got a pretty powerful machine. I am sure it can handle a high load, but I don't know what's wrong. Dual P3T with 2 GB of RAM. It should do a lot more than this without crashing =/

last pid: 1050; load averages: 0.05, 0.14, 0.19 up 0+00:39:31 12:40:20
328 processes: 1 running, 327 sleeping
CPU states: 0.4% user, 0.0% nice, 0.6% system, 0.2% interrupt, 98.8% idle
Mem: 155M Active, 60M Inact, 106M Wired, 15M Cache, 141M Buf, 1676M Free
Swap: 2048M Total, 2048M Free

PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU CPU COMMAND
323 root 2 0 8900K 6684K select 0 0:01 0.00% 0.00% httpd
448 apache 2 0 9468K 6848K sbwait 0 0:01 0.00% 0.00% httpd
608 apache 2 0 9448K 6844K sbwait 0 0:01 0.00% 0.00% httpd
570 apache 2 0 9088K 6888K sbwait 0 0:01 0.00% 0.00% httpd
511 apache 2 0 8972K 6772K sbwait 1 0:01 0.00% 0.00% httpd
621 apache 2 0 8964K 6764K sbwait 1 0:01 0.00% 0.00% httpd
459 apache 2 0 8972K 6772K sbwait 1 0:01 0.00% 0.00% httpd
499 apache 2 0 8972K 6780K sbwait 0 0:01 0.00% 0.00% httpd
550 apache 2 0 9484K 6908K sbwait 0 0:01 0.00% 0.00% httpd
514 apache 2 0 8964K 6764K sbwait 1 0:01 0.00% 0.00% httpd
651 apache 2 0 8972K 6772K sbwait 0 0:01 0.00% 0.00% httpd
643 apache 2 0 8972K 6772K sbwait 1 0:01 0.00% 0.00% httpd
463 apache 2 0 9136K 6936K sbwait 0 0:01 0.00% 0.00% httpd
654 apache 2 0 8964K 6764K sbwait 1 0:01 0.00% 0.00% httpd
460 apache 18 0 8964K 6764K lockf 1 0:01 0.00% 0.00% httpd
453 apache 2 0 8964K 6768K sbwait 1 0:01 0.00% 0.00% httpd
537 apache 18 0 8964K 6764K lockf 1 0:01 0.00% 0.00% httpd
645 apache 18 0 8972K 6772K lockf 1 0:01 0.00% 0.00% httpd
509 apache 2 0 10040K 6820K sbwait 1 0:01 0.00% 0.00% httpd
471 apache 2 0 9152K 6952K sbwait 0 0:01 0.00% 0.00% httpd
455 apache 2 0 8964K 6768K sbwait 0 0:01 0.00% 0.00% httpd
631 apache 2 0 8972K 6780K sbwait 0 0:01 0.00% 0.00% httpd
594 apache 18 0 8972K 6772K lockf 1 0:01 0.00% 0.00% httpd
481 apache 18 0 8972K 6768K lockf 1 0:01 0.00% 0.00% httpd
668 apache 2 0 10024K 6836K sbwait 1 0:01 0.00% 0.00% httpd
560 apache 2 0 9072K 6872K sbwait 1 0:01 0.00% 0.00% httpd
632 apache 18 0 8972K 6772K lockf 1 0:01 0.00% 0.00% httpd
461 apache 2 0 8964K 6768K sbwait 0 0:01 0.00% 0.00% httpd
653 apache 2 0 8964K 6764K sbwait 1 0:01 0.00% 0.00% httpd
536 apache 2 0 8972K 6772K sbwait 1 0:01 0.00% 0.00% httpd
564 apache 18 0 8972K 6768K lockf 1 0:01 0.00% 0.00% httpd
467 apache 2 0 8964K 6764K sbwait 0 0:01 0.00% 0.00% httpd
644 apache 2 0 9124K 6924K sbwait 0 0:01 0.00% 0.00% httpd
486 apache 2 0 8964K 6764K sbwait 0 0:01 0.00% 0.00% httpd
658 apache 2 0 8964K 6764K sbwait 0 0:01 0.00% 0.00% httpd
589 apache 18 0 8964K 6772K lockf 1 0:01 0.00% 0.00% httpd
451 apache 18 0 8964K 6764K lockf 1 0:00 0.00% 0.00% httpd
597 apache 2 0 8972K 6772K sbwait 1 0:00 0.00% 0.00% httpd
646 apache 2 0 9072K 6880K sbwait 0 0:00 0.00% 0.00% httpd
660 apache 2 0 9488K 7028K sbwait 0 0:00 0.00% 0.00% httpd
475 apache 2 0 8964K 6764K sbwait 1 0:00 0.00% 0.00% httpd
554 apache 2 0 9064K 6864K sbwait 1 0:00 0.00% 0.00% httpd
490 apache 18 0 8972K 6772K lockf 1 0:00 0.00% 0.00% httpd
457 apache 18 0 8964K 6764K lockf 1 0:00 0.00% 0.00% httpd
527 apache 2 0 9168K 6976K sbwait 0 0:00 0.00% 0.00% httpd
581 apache 18 0 8972K 6776K lockf 1 0:00 0.00% 0.00% httpd
663 apache 2 0 8992K 6792K sbwait 0 0:00 0.00% 0.00% httpd
567 apache 2 0 8972K 6772K sbwait 1 0:00 0.00% 0.00% httpd
562 apache 2 0 9020K 6812K sbwait 0 0:00 0.00% 0.00% httpd
535 apache 2 0 8972K 6772K sbwait 1 0:00 0.00% 0.00% httpd

NyteOwl
04-06-2002, 03:54 AM
There is a PCI watchdog card at this site: http://www.berkprod.com/pci_pc_watchdog.htm
There is a driver for the ISA version of this card for BSD at this site: http://www.cs.ndsu.nodak.edu/~tinguely/
It might be worth dropping the author of the driver a note about possible compatibility with the PCI card. I've used B&B Electronics' watchdog cards in the past, but they don't seem to have a driver (or driver data to write your own) for *nix. Good luck!

Starhost
04-06-2002, 04:51 AM
I also think it is bad RAM. My FreeBSD server with bad RAM also started to crash for no known reason. After running memtest (which is in the FreeBSD ports collection) I knew it was bad RAM. I replaced the RAM and the server has been running great ever since.

ClusterMania
04-06-2002, 05:32 AM
Originally posted by NyteOwl
There is a PCI watchdog card at this site: http://www.berkprod.com/pci_pc_watchdog.htm
There is a driver for the ISA version of this card for BSD at this site: http://www.cs.ndsu.nodak.edu/~tinguely/
It might be worth dropping the author of the driver a note about possible compatibility with the PCI card. I've used B&B Electronics' watchdog cards in the past, but they don't seem to have a driver (or driver data to write your own) for *nix. Good luck!

I tried contacting the guy long ago, but he only has it working for ISA, which only really old machines have. No luck finding one for PCI. I guess I should have the memory tested.

Starhost
04-06-2002, 10:06 AM
Could you please post the output of memtest when you run it? Don't forget to shut it down, because memtest will keep running!!

Mike the newbie
04-06-2002, 10:12 AM
Originally posted by ClusterMania
I want my server to auto reboot or auto power cycle if my server crashes and stops responding.

It may be a better use of your energy to track down and fix the cause of your server not responding, rather than to accommodate it.

taz0
04-06-2002, 03:04 PM
You may find this article very interesting: http://www.onlamp.com/pub/a/bsd/2002/03/21/Big_Scary_Daemons.html

ClusterMania
04-07-2002, 12:26 AM
I changed my httpd.conf settings as shown below, and my processes doubled.
I think my load is fine, but my server still keeps crashing.

Timeout 8
KeepAliveTimeout 15
MinSpareServers 200
MaxSpareServers 300
StartServers 200
MaxClients 2048

last pid: 1084; load averages: 1.63, 2.61, 1.47 up 0+00:31:27 14:10:46
447 processes: 2 running, 445 sleeping
CPU states: 1.0% user, 0.0% nice, 3.7% system, 0.2% interrupt, 95.1% idle
Mem: 157M Active, 60M Inact, 101M Wired, 20M Cache, 114M Buf, 1673M Free
Swap: 2048M Total, 2048M Free

PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU CPU COMMAND
678 apache 18 0 8964K 6764K lockf 1 0:00 0.05% 0.05% httpd
987 apache 2 0 8964K 6760K sbwait 0 0:00 0.05% 0.05% httpd
1069 root 28 0 2428K 1616K CPU0 0 0:01 0.00% 0.00% top
342 root 2 0 8900K 6684K select 0 0:01 0.00% 0.00% httpd
480 apache 2 0 10012K 7600K sbwait 0 0:00 0.00% 0.00% httpd
517 apache 18 0 8956K 6760K lockf 1 0:00 0.00% 0.00% httpd
643 apache 18 0 8956K 6756K lockf 1 0:00 0.00% 0.00% httpd
653 apache 18 0 8956K 6756K lockf 1 0:00 0.00% 0.00% httpd
740 apache 2 0 9092K 6892K sbwait 0 0:00 0.00% 0.00% httpd
471 apache 18 0 8964K 6764K lockf 1 0:00 0.00% 0.00% httpd
608 apache 2 0 8956K 6756K sbwait 0 0:00 0.00% 0.00% httpd
672 apache 2 0 9116K 6916K sbwait 1 0:00 0.00% 0.00% httpd
500 apache 2 0 8956K 6760K sbwait 0 0:00 0.00% 0.00% httpd
556 apache 2 0 9084K 6884K sbwait 0 0:00 0.00% 0.00% httpd
561 apache 2 0 10012K 6812K sbwait 0 0:00 0.00% 0.00% httpd
518 apache 18 0 8956K 6756K lockf 1 0:00 0.00% 0.00% httpd
664 apache 18 0 8956K 6756K lockf 1 0:00 0.00% 0.00% httpd
711 apache 18 0 8964K 6772K lockf 1 0:00 0.00% 0.00% httpd
523 apache 18 0 8956K 6756K lockf 1 0:00 0.00% 0.00% httpd
473 apache 2 0 9024K 6816K select 1 0:00 0.00% 0.00% httpd
551 apache 18 0 8956K 6756K lockf 1 0:00 0.00% 0.00% httpd
677 apache 2 0 9476K 6924K sbwait 1 0:00 0.00% 0.00% httpd
590 apache 18 0 8956K 6756K lockf 1 0:00 0.00% 0.00% httpd
467 apache 2 0 8956K 6756K sbwait 1 0:00 0.00% 0.00% httpd
706 apache 2 0 9092K 6888K sbwait 0 0:00 0.00% 0.00% httpd
739 apache 18 0 8964K 6764K lockf 1 0:00 0.00% 0.00% httpd
760 apache 18 0 8964K 6764K lockf 1 0:00 0.00% 0.00% httpd
534 apache 2 0 9132K 6936K RUN 1 0:00 0.00% 0.00% httpd
624 apache 18 0 8956K 6756K lockf 1 0:00 0.00% 0.00% httpd
543 apache 2 0 9084K 6884K select 0 0:00 0.00% 0.00% httpd
559 apache 2 0 8956K 6756K sbwait 1 0:00 0.00% 0.00% httpd
535 apache 18 0 8964K 6764K lockf 1 0:00 0.00% 0.00% httpd
646 apache 18 0 8956K 6756K lockf 1 0:00 0.00% 0.00% httpd
491 apache 2 0 9096K 6896K sbwait 0 0:00 0.00% 0.00% httpd
558 apache 2 0 8956K 6756K sbwait 0 0:00 0.00% 0.00% httpd
589 apache 18 0 8956K 6756K lockf 1 0:00 0.00% 0.00% httpd
699 apache 2 0 8964K 6764K sbwait 0 0:00 0.00% 0.00% httpd
571 apache 2 0 9480K 7228K sbwait 1 0:00 0.00% 0.00% httpd
510 apache 18 0 8956K 6756K lockf 1 0:00 0.00% 0.00% httpd
693 apache 18 0 8964K 6764K lockf 1 0:00 0.00% 0.00% httpd
591 apache 2 0 9476K 7008K sbwait 0 0:00 0.00% 0.00% httpd
726 apache 2 0 8972K 6772K sbwait 0 0:00 0.00% 0.00% httpd
753 apache 2 0 9492K 7260K sbwait 1 0:00 0.00% 0.00% httpd
573 apache 2 0 8956K 6756K sbwait 0 0:00 0.00% 0.00% httpd
636 apache 18 0 8956K 6760K lockf 1 0:00 0.00% 0.00% httpd
578 apache 2 0 9108K 6912K sbwait 0 0:00 0.00% 0.00% httpd
485 apache 18 0 8964K 6760K lockf 1 0:00 0.00% 0.00% httpd
470 apache 18 0 8956K 6760K lockf 1 0:00 0.00% 0.00% httpd
640 apache 18 0 8964K 6764K lockf 1 0:00 0.00% 0.00% httpd
770 apache 2 0 8964K 6764K sbwait 1 0:00 0.00% 0.00% httpd

Starhost
04-07-2002, 06:01 AM
Originally posted by ClusterMania
I changed my httpd.conf settings and my processes doubled. I think my load is fine but my server still keeps crashing.

As I and others told you, it is almost certainly a hardware problem. So run memtest to see whether it's the RAM, because you haven't done that yet. Here are the instructions on how to run it:
- Log in as root.
cd /usr/ports/sysutils/memtest/
make install clean
memtest 100M
(100M means that memtest tests a 100 MB block of memory at a time.)
Now just wait and see if everything goes well. After a while, press ^C to stop the program. I'm almost certain that you will get errors. If so, replace the RAM. Good luck!

ClusterMania
04-07-2002, 08:28 AM
Thanks for the instructions. It installed OK, I think, but when I run memtest 100 I get "command not found".

Starhost
04-07-2002, 08:36 AM
No problem :) If there are any other problems, please post them. And if it was bad RAM, post that too :)

ClusterMania
04-07-2002, 08:48 AM
=/ I wish everything was GUI. I am never good at command-line stuff.

===> Extracting for memtest-2.93.1
>> Checksum OK for memtester-2.93.1.tar.bz2.
===> memtest-2.93.1 depends on executable: gmake - found
===> Patching for memtest-2.93.1
===> Applying FreeBSD patches for memtest-2.93.1
===> Configuring for memtest-2.93.1
===> Building for memtest-2.93.1
gcc -O -pipe -Wall -g -c -o memtest.o memtest.c
gcc -O -pipe -Wall -g -c -o memtest-tests.o memtest-tests.c
gcc -g -o memtest memtest.o memtest-tests.o
bsd2# make install
===> Installing for memtest-2.93.1
===> memtest-2.93.1 is already installed - perhaps an older version?
If so, you may wish to ``make deinstall'' and install this port again by ``make reinstall'' to upgrade it properly.
If you really wish to overwrite the old port of memtest-2.93.1 without deleting it first, set the variable "FORCE_PKG_REGISTER" in your environment or the "make install" command line.
*** Error code 1

Stop in /usr/ports/sysutils/memtest.
*** Error code 1

Stop in /usr/ports/sysutils/memtest.
bsd2# memtest
memtest: Command not found.
bsd2# memtest 100
memtest: Command not found.

ClusterMania
04-07-2002, 09:21 AM
BYTEmark* Native Mode Benchmark ver. 2 (10/95)
Index-split by Andrew D. Balsa (11/97)
Linux/Unix* port by Uwe F. Mayer (12/96,11/97)

TEST                : Iterations/sec.  : Old Index   : New Index
                    :                  : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT        :          573.85  :      14.72  :       4.83
STRING SORT         :          42.422  :      18.96  :       2.93
BITFIELD            :      1.7243e+08  :      29.58  :       6.18
FP EMULATION        :          37.124  :      17.81  :       4.11
FOURIER             :          9404.1  :      10.70  :       6.01
Segmentation fault (core dumped)

I actually got something to work.

Starhost
04-07-2002, 09:27 AM
OK, it is already installed. Then do this:
cd /usr/ports/sysutils/memtest/
make deinstall
And then the things I posted before.

ClusterMania
04-07-2002, 09:57 AM
Originally posted by Starhost
OK, it is already installed. Then do this: cd /usr/ports/sysutils/memtest/ make deinstall And then the things I posted before.

bsd2# cd /usr/ports/sysutils/memtest/
bsd2# make deinstall
===> Deinstalling for memtest-2.93.1
bsd2# mtest 100m
multicast membership test program; enter ? for list of commands
?
j g.g.g.g i.i.i.i - join IP multicast group
l g.g.g.g i.i.i.i - leave IP multicast group
a ifname e.e.e.e.e.e - add ether multicast address
d ifname e.e.e.e.e.e - del ether multicast address
m ifname 1/0 - set/clear ether allmulti flag
p ifname 1/0 - set/clear ether promisc flag
q - quit

Starhost
04-07-2002, 10:10 AM
Mmm. What happens when you just type: memtest 100M? Does it start running?

ClusterMania
04-07-2002, 10:15 AM
OK, I got it running, but only on my second server, since my first one keeps crashing every 10 minutes. What other ports are good for testing my system? I am thinking about swapping hard drives between the two servers if this one is fine, and having someone troubleshoot the other server.

bsd2# make install
===> Extracting for crashme-2.4
>> Checksum mismatch for crashme.tgz.
Make sure the Makefile and distinfo file (/usr/ports/sysutils/crashme/distinfo) are up to date. If you are absolutely sure you want to override this check, type "make NO_CHECKSUM=yes [other args]".
*** Error code 1

How do I update it? Is there a command that will automatically download the updates?
Hmm, thanks to you I am learning a lot today. Oh, and yeah, it keeps running. Is it supposed to do 100 tests?

Run 3:
Test 1: Stuck Address: Testing...Passed.
Test 2: Random value: Setting...Testing...Passed.
Test 3: XOR comparison: Setting...Testing...Passed.
Test 4: SUB comparison: Setting...Testing...Passed.
Test 5: MUL comparison: Setting...Testing...Passed.
Test 6: DIV comparison: Setting...Testing...Passed.
Test 7: OR comparison: Setting...Testing...Passed.
Test 8: AND comparison: Setting...Testing...Passed.
Test 9: Sequential Increment: Setting...Testing...Passed.
Test 10: Solid Bits: Testing...Passed.
Test 11: Block Sequential: Testing...Passed.
Test 12: Checkerboard: Testing...Passed.
Test 13: Bit Spread: Testing...Passed.
Test 14: Bit Flip: Testing...Passed.
Test 15: Walking Ones: Testing...Passed.
Test 16: Walking Zeroes: Testing...Passed.
Run 3 completed in 386 seconds (0 tests showed errors).

Run 4:
Test 1: Stuck Address: Testing...Passed.
Test 2: Random value: Setting...Testing...Passed.
Test 3: XOR comparison: Setting...Testing...Passed.
Test 4: SUB comparison: Setting...Testing...Passed.
Test 5: MUL comparison: Setting...Testing...Passed.
Test 6: DIV comparison: Setting...Testing...Passed.
Test 7: OR comparison: Setting...Testing...Passed.
Test 8: AND comparison: Setting...Testing...Passed.
Test 9: Sequential Increment: Setting...Testing...Passed.
Test 10: Solid Bits: Testing...Passed.
Test 11: Block Sequential: Testing...Passed.
Test 12: Checkerboard: Testing...Passed.
Test 13: Bit Spread: Testing...Passed.
Test 14: Bit Flip: Testing...Passed.
Test 15: Walking Ones: Testing...Passed.
Test 16: Walking Zeroes: Testing...Passed.
Run 4 completed in 387 seconds (0 tests showed errors).

allera
04-07-2002, 10:59 AM
On the machine that keeps crashing, do this:
cd /usr/ports/sysutils/memtest
make deinstall
make install clean
When it's done, log out and log back in, then run memtest 100M and it should kick in. Hopefully it won't crash while the test is running.
Also, have you _tried_ replacing the RAM? You might want to do that instead of sitting there waiting for all 2 GB to be tested. If it stays up with the new RAM (128MB? whatever), then run memtest on one 1 GB stick and then on the other to see which one is faulty. Hope that helps at least a little. :)

ClusterMania
04-07-2002, 11:15 AM
Is each run for 1 MB of RAM or 100 MB? I can let it run while I sleep. I also want to stress test my system, but it won't let me get crashme.

ClusterMania
04-07-2002, 11:36 AM
I got my server power cycled and it crashed in 5 minutes. Here's everything it got through before it crashed:

bash-2.05# memtest 100M
memtest v. 2.93.1 (C) 2000 Charles Cazabon <memtest@discworld.dyndns.org>
Original v.1 (C) 1999 Simon Kirby <sim@stormix.com> <sim@neato.org>
Current limits:
RLIMIT_RSS 0xffffffff
RLIMIT_VMEM 0xffffffff
Raising limits...
Allocated 104857600 bytes...trying mlock...success. Starting tests...
Testing 104857600 bytes at 0x0804e000 (0 bytes lost to page alignment).

Run 1:
Test 1: Stuck Address: Testing...Passed.
Test 2: Random value: Setting...Testing...Passed.
Test 3: XOR comparison: Setting...Testing...Passed.
Test 4: SUB comparison: Setting...Testing...Passed.
Test 5: MUL comparison: Setting...Testing...Passed.
Test 6: DIV comparison: Setting...Testing...Passed.
Test 7: OR comparison: Setting...Testing...Passed.
Test 8: AND comparison: Setting...Testing...Passed.
Test 9: Sequential Increment: Setting...Testing...Passed.
Test 10: Solid Bits: Testing...Passed.
Test 11: Block Sequential: Setting... 35
Apr 8 01:16:06 bsd1 /kernel: pmap_collect: collecting pv entries -- suggest increasing PMAP_SHPGPERPROC
Apr 8 01:16:06 bsd1 /kernel: pmap_collect: collecting pv entries -- suggest increasing PMAP_SHPGPERPROC
Testing...Passed.
Test 12: Checkerboard: Testing...Passed.
Test 13: Bit Spread: Testing...Passed.
Test 14: Bit Flip: Testing...Passed.
Test 15: Walking Ones: Testing... 3

taz0
04-07-2002, 11:49 AM
Do you read your log files??
This may be the cause of your crashes:
Apr 8 01:16:06 bsd1 /kernel: pmap_collect: collecting pv entries -- suggest increasing PMAP_SHPGPERPROC

allera
04-07-2002, 11:51 AM
Try replacing the RAM to see whether that's even the problem. However, your log shows an entry that I don't like. :) Search Google and see what you come up with. It might be something to do with your NIC. What else is hiding in your logs?

Starhost
04-07-2002, 12:50 PM
Run another memtest, but first shut down all programs you don't need, such as httpd, mysql etc. Then run memtest again and see what happens. Also check your /var/log/messages to see if there are things that go wrong just before the shutdown.

ClusterMania
04-07-2002, 12:55 PM
How do I shut them all down?

Starhost
04-07-2002, 01:08 PM
As root, type:
ps aux
Then start killing the processes. Or just go to /usr/local/etc/rc.d/ and run every script with a stop argument, for example:
./proftpd.sh stop

ho247
04-07-2002, 01:16 PM
Originally posted by ClusterMania
I changed my httpd.conf settings and my processes doubled. I think my load is fine but my server still keeps crashing.
Timeout 8
KeepAliveTimeout 15
MinSpareServers 200
MaxSpareServers 300
StartServers 200
MaxClients 2048

ClusterMania, have you tried scaling down the number of httpd processes? Try changing your httpd.conf file to the following settings:
Timeout 150
KeepAliveTimeout 15
MinSpareServers 8
MaxSpareServers 18
StartServers 8
Try that and see if it still crashes. Keeping your MaxClients at 2048 should be okay with those server specs.
Alan

ClusterMania
04-07-2002, 01:26 PM
I did su -l to log in as root and go into /usr/local/etc/rc.d/, but I don't have access. Do I just do ps, then kill #####? How do I shut down Apache?

taz0
04-07-2002, 01:28 PM
Originally posted by ClusterMania
I did su -l to log in as root and go into /usr/local/etc/rc.d/, but I don't have access. Do I just do ps, then kill #####? How do I shut down Apache?
/usr/local/etc/rc.d/apache.sh stop
or
killall -9 httpd

ClusterMania
04-07-2002, 01:29 PM
Originally posted by ho247
ClusterMania, have you tried scaling down the number of httpd processes? Try changing your httpd.conf file to the following settings:
Timeout 150
KeepAliveTimeout 15
MinSpareServers 8
MaxSpareServers 18
StartServers 8
Try that and see if it still crashes. Keeping your MaxClients at 2048 should be okay with those server specs.
Alan

Hmm, I think dilhole has the same settings as me and he's doing fine. I thought my server should be able to handle it with such hardware.

ho247
04-07-2002, 01:34 PM
Your server will be able to handle the amount of traffic, but given the number of processes that have started, scaling down would be a way to see whether it actually stops the crashes, since that was a problem I had previously with a server similar to yours (not as much RAM, though).
Alan

ClusterMania
04-07-2002, 01:45 PM
Oh man, I had the people power cycle my server and it crashed in 15 seconds. Not even enough time to do anything =/ sigh....
OK, I found a copy of crashme on another site: http://wuarchive.wustl.edu/systems/unix/NetBSD/packages/1.5.2/vax/sysutils/
Is there a command to download and install it? I want to test my second server to make sure everything is OK. I want to swap hard drives and get the buggy server tested.

ClusterMania
04-07-2002, 06:25 PM
This is how far it got before my buggy server crashed again. Am I supposed to let it run until it stops by itself?

memtest v. 2.93.1 (C) 2000 Charles Cazabon <memtest@discworld.dyndns.org>
Original v.1 (C) 1999 Simon Kirby <sim@stormix.com> <sim@neato.org>
Current limits:
RLIMIT_RSS 0xffffffff
RLIMIT_VMEM 0xffffffff
Raising limits...
Allocated 104857600 bytes...trying mlock...success. Starting tests...
Testing 104857600 bytes at 0x0804e000 (0 bytes lost to page alignment).

Run 1:
Test 1: Stuck Address: Testing...Passed.
Test 2: Random value: Setting...Testing...Passed.
Test 3: XOR comparison: Setting...Testing...Passed.
Test 4: SUB comparison: Setting...Testing...Passed.
Test 5: MUL comparison: Setting...Testing...Passed.
Test 6: DIV comparison: Setting...Testing...Passed.
Test 7: OR comparison: Setting...Testing...Passed.
Test 8: AND comparison: Setting...Testing...Passed.
Test 9: Sequential Increment: Setting...Testing...Passed.
Test 10: Solid Bits: Testing...Passed.
Test 11: Block Sequential: Testing...Passed.
Test 12: Checkerboard: Testing...Passed.
Test 13: Bit Spread: Testing...Passed.
Test 14: Bit Flip: Testing...Passed.
Test 15: Walking Ones: Testing...Passed.
Test 16: Walking Zeroes: Testing...Passed.
Run 1 completed in 406 seconds (0 tests showed errors).

Run 2:
Test 1: Stuck Address: Testing...Passed.
Test 2: Random value: Setting...Testing...Passed.
Test 3: XOR comparison: Setting...Testing...Passed.
Test 4: SUB comparison: Setting...Testing...Passed.
Test 5: MUL comparison: Setting...Testing...Passed.
Test 6: DIV comparison: Setting...Testing...Passed.
Test 7: OR comparison: Setting...Testing...Passed.
Test 8: AND comparison: Setting...Testing...Passed.
Test 9: Sequential Increment: Setting...Testing...Passed.
Test 10: Solid Bits: Testing...Passed.
Test 11: Block Sequential: Testing...Passed.
Test 12: Checkerboard: Testing...Passed.
Test 13: Bit Spread: Testing...Passed.
Test 14: Bit Flip: Testing...Passed.
Test 15: Walking Ones: Testing...Passed.
Test 16: Walking Zeroes: Testing...Passed.
Run 2 completed in 407 seconds (0 tests showed errors).

Run 3:
Test 1: Stuck Address: Testing...Passed.
Test 2: Random value: Setting...Testing...Passed.
Test 3: XOR comparison: Setting...Testing...Passed.
Test 4: SUB comparison: Setting...Testing...Passed.
Test 5: MUL comparison: Setting...Testing...Passed.
Test 6: DIV comparison: Setting...Testing...Passed.
Test 7: OR comparison: Setting...Testing...Passed.
Test 8: AND comparison: Setting...Testing...Passed.
Test 9: Sequential Increment: Setting...Testing...Passed.
Test 10: Solid Bits: Testing...Passed.
Test 11: Block Sequential: Testing...Passed.
Test 12: Checkerboard: Testing...Passed.
Test 13: Bit Spread: Testing...Passed.
Test 14: Bit Flip: Testing...Passed.
Test 15: Walking Ones: Testing...Passed.
Test 16: Walking Zeroes: Testing...Passed.
Run 3 completed in 408 seconds (0 tests showed errors).

Run 4:
Test 1: Stuck Address: Testing...Passed.
Test 2: Random value: Setting...Testing...Passed.
Test 3: XOR comparison: Setting...Testing...Passed.
Test 4: SUB comparison: Setting...Testing...Passed.
Test 5: MUL comparison: Setting...Testing...Passed.
Test 6: DIV comparison: Setting...Testing...Passed.
Test 7: OR comparison: Setting...Testing...Passed.
Test 8: AND comparison: Setting...Testing...Passed.
Test 9: Sequential Increment: Setting...Testing...Passed.
Test 10: Solid Bits: Testing...Passed.
Test 11: Block Sequential: Testing...Passed.
Test 12: Checkerboard: Testing...Passed.
Test 13: Bit Spread: Testing...Passed.
Test 14: Bit Flip: Testing...Passed.
Test 15: Walking Ones: Testing...Passed.
Test 16: Walking Zeroes: Testing...Passed.
Run 4 completed in 408 seconds (0 tests showed errors).

Run 5:
Test 1: Stuck Address: Testing...Passed.
Test 2: Random value: Setting...Testing...Passed.
Test 3: XOR comparison: Setting...Testing...Passed.
Test 4: SUB comparison: Setting...Testing...Passed.
Test 5: MUL comparison: Setting...Testing...Passed.
Test 6: DIV comparison: Setting...Testing...Passed.
Test 7: OR comparison: Setting...Testing...Passed.
Test 8: AND comparison: Setting...Testing...Passed.
Test 9: Sequential Increment: Setting...Testing...Passed.
Test 10: Solid Bits: Testing...Passed.
Test 11: Block Sequential: Testing...Passed.
Test 12: Checkerboard: Testing...Passed.
Test 13: Bit Spread: Testing...Passed.
Test 14: Bit Flip: Testing...Passed.
Test 15: Walking Ones: Testing...Passed.
Test 16: Walking Zeroes: Testing...Passed.
Run 5 completed in 408 seconds (0 tests showed errors).
Run 6:
Test 1: Stuck Address: Testing...Passed.
Test 2: Random value: Setting...Testing...Passed.
Test 3: XOR comparison: Setting...Testing...Passed.
Test 4: SUB comparison: Setting...Testing...Passed.
Test 5: MUL comparison: Setting...Testing...Passed.
Test 6: DIV comparison: Setting...Testing...Passed.
Test 7: OR comparison: Setting...Testing...Passed.
Test 8: AND comparison: Setting...Testing...Passed.
Test 9: Sequential Increment: Setting...Testing...Passed.
Test 10: Solid Bits: Testing...Passed.
Test 11: Block Sequential: Testing... 54

Starhost
04-07-2002, 06:30 PM
And in /var/log/messages? Any sign of kernel panics or whatever?

ClusterMania
04-08-2002, 03:29 PM
They finally did the hard drive swap. I tried doing memtest 2G but it won't let me =) Anyway, it automatically decreased itself to:
Allocated 535822336 bytes...trying mlock...success. Starting tests...
I hope I am putting stress on the server, because I want to see if it will crash.

bacid
04-08-2002, 05:40 PM
Have you tried checking your BIOS settings? I had a similar problem a few months back, and it turned out my motherboard's BIOS was set to "optimal" settings by default. I changed this to "stable", which meant that it wasn't overclocking certain things and used very conservative settings. The problem was resolved.

ClusterMania
04-09-2002, 01:07 AM
How do I check and edit my BIOS through ssh?

last pid: 2795; load averages: 7.74, 6.54, 3.80 up 0+09:54:25 14:48:51
443 processes: 2 running, 441 sleeping
CPU states: 0.4% user, 0.0% nice, 5.3% system, 0.0% interrupt, 94.4% idle
Mem: 260M Active, 241M Inact, 159M Wired, 28M Cache, 214M Buf, 1324M Free
Swap: 2048M Total, 2048M Free

http://datahawk.verifast.net/mrtg/66.28.252.1_26.html
My graph looks funky, but my system says it's been up for almost 10 hours.
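Several replies in this thread point at /var/log/messages, and the one concrete clue that surfaced was the pmap_collect kernel warning echoed to the console during memtest. One quick way to see which kernel messages recur before each crash is to strip the syslog prefix and count identical messages. The sketch below assumes the syslog line layout shown in the thread (`Apr 8 01:16:06 bsd1 /kernel: ...`); since later FreeBSD releases may tag these lines `kernel:` instead, the pattern accepts both. The function name kernel_summary is hypothetical, not an existing command.

```shell
#!/bin/sh
# Summarise kernel messages in a syslog-format file (default /var/log/messages).
# Lines are assumed to look like:
#   Apr  8 01:16:06 bsd1 /kernel: <message>
# kernel_summary is a hypothetical helper name, not an existing command.

kernel_summary() {
    log=${1:-/var/log/messages}
    # 1. keep only kernel lines, stripping everything up to "/kernel: " or "kernel: "
    # 2. count identical messages, most frequent first
    sed -n 's|^.* /\{0,1\}kernel: ||p' "$log" | sort | uniq -c | sort -rn
}

# Summarise the file given as the first argument, if any.
if [ -n "${1:-}" ]; then
    kernel_summary "$1"
fi
```

Run against the crashing machine's log (for example `sh kernel_summary.sh /var/log/messages`, file name hypothetical), this would show at a glance whether the pmap_collect warning is a one-off or fires repeatedly before the box goes down.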