
TP RAID 5 (Megaraid) slow as hell ;(
freeflight2 05-25-2004, 12:21 PM oh boy... I thought it would be a good idea to speed up our sites by switching from ordinary SCSI to TP's RAID 5 4x73 GB SCSI boxes - these servers are slow as hell!
- a simple tar cf usr.tgz /usr takes almost 9 minutes (compared to 2min 55 secs on an ordinary SCSI box with an even bigger /usr, measured after a fresh reboot with no files cached)
- what's worse: machine LOCKS, ls -al /file takes sometimes 20-30 seconds during a tar or rsync... mysqld is becoming unresponsive.
- a tar cf makes the machine 0% idle and 97% iowait (compared to 30% iowait on a SCSI box)
- playing back a mysql-bin-log was slower than on a p4 SCSI box with a heavy load of 20
- it 'feels' worse than a celeron 1GHZ with 7200RPM IDE
can anybody give me some quick advice what to do? (firmware update?) I need to get these servers fast before June 1... upgrade to a 2.6.6 kernel didn't help. TP's support is even worse than SM's - I used to be a huge fan of this company but the performance of their raid5 boxes is not acceptable and will force me to move if they don't get their act together.
Note: TP's ordinary SCSI boxes are great, no complaints... great bang for the $ - It's the raid5 servers I am talking about.
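The iowait percentages quoted above are the kind of figure top derives from the counters in /proc/stat. A minimal sketch of that calculation, assuming a 2.6-or-later kernel where the fifth value on the aggregate cpu line is iowait (the function name and the sample numbers in the comment are illustrative, not taken from the thread):

```python
def iowait_percent(before: str, after: str) -> float:
    """Share of CPU time spent in iowait between two 'cpu' lines from /proc/stat."""
    def fields(line: str) -> list:
        parts = line.split()
        if not parts or parts[0] != "cpu":
            raise ValueError("expected the aggregate 'cpu' line")
        values = [int(x) for x in parts[1:]]
        if len(values) < 5:
            # 2.4 kernels print only user/nice/system/idle, so iowait is not available.
            raise ValueError("no iowait column on this kernel")
        return values

    a, b = fields(before), fields(after)
    deltas = [y - x for x, y in zip(a, b)]
    # Only the first eight columns are summed: user, nice, system, idle,
    # iowait, irq, softirq, steal. Later guest columns are already part of user.
    total = sum(deltas[:8])
    if total == 0:
        return 0.0
    return 100.0 * deltas[4] / total

# Example with made-up counters:
# iowait_percent("cpu 100 0 50 800 50 0 0 0", "cpu 110 0 70 800 1020 0 0 0")
```

To use it on a live box, read the first line of /proc/stat, wait a few seconds while the tar is running, read it again and pass both lines in.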
RossH 05-25-2004, 12:24 PM I think they are using really cheap hardware in their higher up servers.
freeflight2 05-25-2004, 12:27 PM what makes you think that? their 'photo' on the order page shows dell boxes.
It also says "Powered by Dell": http://theplanet.com/control/pro/p2800sr5x_details.html
How can I check if it's a dell?
freeflight2 05-25-2004, 02:15 PM more tests: maximum transfer on a 100mbit is 5.6MBytes/sec with this raid... did I spend $500/mo to be transferred back to the 80s? every celeron packs this (idle) network with 11.1Mbytes/sec
let's see what bonnie++ will tell me in an hour or so...
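For scale, the two transfer figures above can be compared with the raw capacity of a 100 Mbit link. A small sketch, assuming decimal units (1 Mbit = 10^6 bits, 1 MByte = 10^6 bytes) and ignoring protocol overhead; the function name is made up for illustration:

```python
def link_utilization(mbytes_per_sec: float, link_mbit: float = 100.0) -> float:
    """Fraction of the raw link capacity that a given transfer rate uses."""
    capacity_mbytes = link_mbit / 8.0  # 100 Mbit/s is 12.5 MByte/s before overhead
    return mbytes_per_sec / capacity_mbytes

raid_box = link_utilization(5.6)    # the RAID 5 server from the post
celeron = link_utilization(11.1)    # the Celeron comparison figure
```

Under those assumptions the RAID box fills a bit under half of the link, while the Celeron figure is close to what a 100 Mbit link can carry in practice.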
System 05-25-2004, 04:58 PM My guess is for some reason the array is rebuilding, this would cause it to be slow. Anyone who knows RAID knows that RAID5 is not known or used for speed. It is used so if a drive fails, you don't lose all your data.
RAID5 is best for a backup server, that doesn't really need to be fast, it just needs to be able to hold a lot of data, and be reliable.
freeflight2 05-25-2004, 05:21 PM yeah was guessing that too - megamgr shows 4 green drives though...
Read/seek performance is actually pretty good, same as a single drive, seek even a little bit better - but write IO is factor 3-4 slower and I really don't think that's normal. Also machine LOCKS and runs with 97% iowait on operations such as tar, rsync etc. compared to 25-35% on machines with single SCSI drives.
Sounds like the level of raid you're using might not be the best option (Maybe 0?)
DEFINITION Disk striping with parity
COMMENTS Parity data is distributed across all drives in the volume. Normal data and parity data are written to drives in the stripe set in a round-robin algorithm, similar to RAID 4.
RAID 5 is multithreaded for both reads and writes because both normal data and parity data are distributed round-robin. This is one reason why RAID 5 offers better overall performance in server applications than either RAID 3 or 4. Random I/O benefits more from RAID 5 than does sequential I/O, and writes take a performance hit because of the parity calculations. RAID 5 is ideal for database applications.
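The "parity calculations" in that definition are a byte-wise XOR across the blocks of a stripe. A minimal sketch for one stripe on a 4-drive array (three data blocks plus parity); the block contents and names are made up, and a real controller does this in firmware on much larger blocks:

```python
from functools import reduce

def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("blocks must all have the same length")
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))

# One stripe: three data blocks, one parity block.
d0, d1, d2 = b"AAAA", b"BBBB", b"CCCC"
parity = xor_blocks(d0, d1, d2)

# If the drive holding d1 dies, XOR of the survivors gives it back.
rebuilt_d1 = xor_blocks(d0, d2, parity)

# Small write: changing only d1 needs the old data and old parity (two reads),
# then the new data and new parity go out (two writes).
new_d1 = b"ZZZZ"
new_parity = xor_blocks(parity, d1, new_d1)
```

The last three lines are why small writes cost more than reads on RAID 5: one logical write turns into four disk operations unless a write cache can collect a whole stripe first.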
porcupine 05-25-2004, 05:54 PM Might want to look for raid 0+1
Hardware locking is a problem I've run across, when building a 4x73.4gb scsi u160 10k rpm server, using an adaptec card (adaptec 2100S card). Never could really fix it. Adaptec offered no support (OEM, didn't need the box, only needed a *functional card*)
Move over to a LSI card, and everything went fine, still not blazing fast, but RAID5 rarely is. If you want more speed, try 0+1 is all I can say, RAID 5 with 3-4 drives will rarely outrun a single SCSI server (unless you have an extremely expensive card, eg. a 4 channel one).
KarlZimmer 05-25-2004, 05:54 PM Sounds like they may be using a cheap RAID card. Reads and seeks should be faster in RAID 5 than on a single drive while writes will be slightly slower, but nowhere near 3-4 times slower. RAID 5 is normally pretty good for web servers because of the improved read performance as well as the data safety.
bqinternet 05-25-2004, 07:47 PM For high performance RAID, we've found that using software RAID 1 mirroring (vinum on FreeBSD) with 2 SCSI drives on separate LSI SCSI cards provides a good deal of performance and redundancy.
Steven 05-25-2004, 09:54 PM we are having the same problem, we have a server and the raid is really slow
freeflight2 05-25-2004, 10:11 PM linuxguy: I can feel your pain... I'll call them tomorrow and will ask them to give me my prorated fee back - server is not useable in this form... it'd make more sense to get a celeron with 2x200GB software raid.... might be even faster.
Also did you recognize that the controller only has 32MB?
mainarea 05-25-2004, 10:15 PM Have you contacted them about it? It may be a simple configuration problem. What does your "top" reading look like?
- Matt
Steven 05-25-2004, 10:21 PM Originally posted by freeflight2
linuxguy: I can feel your pain... I'll call them tomorrow and will ask them to give me my prorated fee back - server is not useable in this form... it'd make more sense to get a celeron with 2x200GB software raid.... might be even faster.
Also did you recognize that the controller only has 32MB?
what controller card do you have?
porcupine 05-25-2004, 10:28 PM Originally posted by freeflight2
linuxguy: I can feel your pain... I'll call them tomorrow and will ask them to give me my prorated fee back - server is not useable in this form... it'd make more sense to get a celeron with 2x200GB software raid.... might be even faster.
Also did you recognize that the controller only has 32MB?
The popular Mylex/LSI Raid controller we last used only has 16mb on it, whats wrong with that? Not every controller is going to come overly decked out with RAM, its a controller card, not a motherboard :).
freeflight2 05-25-2004, 10:32 PM root# cat /proc/megaraid/hba0/config
v2.00.3 (Release Date: Wed Feb 19 08:51:30 EST 2003)
PERC 3/SC
Controller Type: 438/466/467/471/493/518/520/531/532
Controller Supports 40 Logical Drives
Controller capable of 64-bit memory addressing
Controller is not using 64-bit memory addressing
Base = f8828000, Irq = 17, Logical Drives = 1, Channels = 1
Version =196T:3.33, DRAM = 32Mb
Controller Queue Depth = 254, Driver Queue Depth = 126
support_ext_cdb = 1
support_random_del = 1
boot_ldrv_enabled = 1
boot_ldrv = 0
boot_pdrv_enabled = 0
boot_pdrv_ch = 0
boot_pdrv_tgt = 0
quiescent = 0
has_cluster = 0
Module Parameters:
max_cmd_per_lun = 63
max_sectors_per_io = 128
("Controller is not using 64-bit memory addressing" => could that be the issue?)
"top" says 97% iowait and 0% idle while doing e.g. a "tar cf usr.tgz /usr" - which takes about 9 minutes (compared to 2.5 minutes on a TP/TC box with SCSI without raid and only 25-30% iowait, after a fresh reboot, both are new boxes from TP/TC)... pretty shocking...
linux 2.6 seems to be 8 to 10% faster than 2.4 with that setup, 4GB RAM box.
I sent them a support ticket this morning, 12h ago and nobody has even touched it.. I thought the support would be better for the "Professional Series", guess not.
Also, for some reason they are mixing different HD brands:
root# cat /proc/megaraid/hba0/diskdrives-ch0
Channel: 0 Id: 0 State: Online.
Vendor: FUJITSU Model: MAP3735NC Rev: 5608
Type: Direct-Access ANSI SCSI revision: 03
Channel: 0 Id: 1 State: Online.
Vendor: FUJITSU Model: MAP3735NC Rev: 5608
Type: Direct-Access ANSI SCSI revision: 03
Channel: 0 Id: 2 State: Online.
Vendor: SEAGATE Model: ST373307LC Rev: DS09
Type: Direct-Access ANSI SCSI revision: 03
Channel: 0 Id: 3 State: Online.
Vendor: SEAGATE Model: ST373307LC Rev: DS09
Type: Direct-Access ANSI SCSI revision: 03
(btw. the last 'cat /proc' took about 10 seconds to complete while a mysqlcheck --repair was running)
Steven 05-25-2004, 11:58 PM downgraded to 2.4.26 and it sped up alot
freeflight2 05-26-2004, 12:30 AM linuxguy:interesting...so u say 2.4.26 is faster than their stock 2.4.21 and 2.6.6? how much did it speed up? any hard stats?
how does your system compare right now to a dual xeon box with 10kSCSI?
Steven 05-26-2004, 12:45 AM well its not the greatest speed up but still:
before:
hdparm -Tt /dev/sda
/dev/sda:
Timing buffer-cache reads: 1752 MB in 2.00 seconds = 876.00 MB/sec
Timing buffered disk reads: 56 MB in 3.05 seconds = 18.36 MB/sec
after:
hdparm -Tt /dev/sda
/dev/sda:
Timing buffer-cache reads: 1672 MB in 2.00 seconds = 836.00 MB/sec
Timing buffered disk reads: 120 MB in 3.00 seconds = 40.00 MB/sec
the before is with the stock rhe kernel, the after is with 2.4.26
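To compare runs like the two above without eyeballing them, the rates can be pulled out of the hdparm text. A rough sketch, assuming the output wording shown in this thread (newer hdparm versions label the first line "cached reads", which this pattern does not match); the function name is illustrative:

```python
import re

# Matches the two timing lines as printed in this thread.
_RATE = re.compile(r"Timing (buffer-cache|buffered disk) reads:.*?=\s*([\d.]+) MB/sec")

def parse_hdparm(output: str) -> dict:
    """Return {'buffer-cache': MB/s, 'buffered disk': MB/s} from hdparm -Tt output."""
    return {kind: float(rate) for kind, rate in _RATE.findall(output)}

before = """/dev/sda:
Timing buffer-cache reads: 1752 MB in 2.00 seconds = 876.00 MB/sec
Timing buffered disk reads: 56 MB in 3.05 seconds = 18.36 MB/sec"""

after = """/dev/sda:
Timing buffer-cache reads: 1672 MB in 2.00 seconds = 836.00 MB/sec
Timing buffered disk reads: 120 MB in 3.00 seconds = 40.00 MB/sec"""

speedup = parse_hdparm(after)["buffered disk"] / parse_hdparm(before)["buffered disk"]
```

With the numbers posted here the buffered disk read rate a bit more than doubles, while the buffer-cache figure (which mostly measures memory) barely moves.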
Steven 05-26-2004, 12:47 AM and to also note the iowait is way down. and loads are 1/2 what they were with stock rhe kernel.
Steven 05-26-2004, 12:49 AM not being able to edit posts is really killing me:
on the server there is a forum:
Currently Active Users: 399 (82 members and 317 guests)
and the loads are staying under 1.30 this is with the forums and various other smaller sites. the hd test was taken under full load.
freeflight2 05-26-2004, 01:24 AM I am getting 74-76MB/sec avg. buffered disk reads with a 2.6.6 kernel and 74-78MB/sec with TP's 2.4.21smp...a machine without any raid but with the same disks reports ~67MB/secs - I then rebooted with mem=128M and the results were similar (around 75MB/sec).
I guess I could live with its read+seek performance but write performance remains horrible, especially since it locks up the (mysql)-machine for up to 15 to 20secs which will result in: a.) locked reads as well and b.) users refreshing the pages etc...
Now here is also another strange thing: I tested the box multiple times, 3 times each with hdparm -tT - with the 2.4.21 kernel I got two times 75MB/secs and one time 26MB/sec... I repeated it multiple times and it stayed at 26MB/secs until I rebooted then it was back to 75MB again - although the box was always 100% idle (and with 4GB RAM)... strange...
Just read your update: does your forum appear to be "frozen" sometimes? that's my main concern (right now SM servers are serving all the traffic but I'll switch 70M pageviews/month to this set of TP servers on june 1st and I don't wanna ruin it)
Steven 05-26-2004, 01:35 AM when you test how much traffic do you have?
freeflight2 05-26-2004, 01:41 AM Traffic=0... like the holy virgin.
did you encounter any mysql locks so far (mysqladmin processlist)?
Steven 05-26-2004, 01:42 AM nope. i tested with the box fully loaded.
KDAWebServices 05-26-2004, 04:40 AM We run LSI MegaRAID cards in most of our virtual hosting boxes, most with RAID1, but our mail and mysql servers run RAID5 and we don't have any significant problems. Sure, the write speed is poor compared to a single drive, but that's always going to be the case, as it has to calculate the distribution of the data and the parity etc., but we're not getting any major problems like it seems you're getting.
freeflight2 05-27-2004, 06:09 PM I gave up.... I just ordered a TC-box with 3 single HDs... I will have the site use both as master/slave DBs and compare iowait etc.
I also tried the 2.4.26 kernel but I am pretty sure that 2.6 performs better (megaraid2)... at least with the tests I performed. If you find anything to tune the megaraid let me know and I'll do the same - good luck! ;)
IRCCo Jeff 05-27-2004, 09:35 PM Do an offsite backup and have them switch your RAID to "0." It will offer no redundancy, but should speed things up nicely.
RayWomack 06-04-2004, 09:10 PM Dell has crappy Raid solutions. Part of the problem is they monkey with the OEM bios, and secondly they disable write back caching in advanced performance (Windows, do not know Linux) unless the controller card comes with a backup battery.
How do I know this? I have $5,000 worth of their hardware in house that is taking a one way ticket to Round Rock via "Brown" first thing Monday morning.
I have benched their stuff using IO Meter, and have developed a P4 SATA Raid 5 package on a 64-bit PCI that will do 180 Mbps read and 60 Mbps write.
I can copy a 540 meg file (Windows Enterprise file) to the same location in 1:21.
freeflight2 06-04-2004, 10:27 PM random writes (DB) are actually "OK", comparable to using 2 disks and spreading, putting *.MYD on the first and *.MYI on the second... so I ordered 2 more of these boxes - IMHO best ratio in terms of redundancy/RAM/disk space+2TB transfer @ TP - greetings to "Mr. Brown" ;)
RayWomack 06-04-2004, 10:58 PM Yeah, that's too bad,
I'm just finishing most of the benchmarking on SATA machines, and I have gone through the Dell's pretty well. It wasn't very hard to find solutions that were better than their stock 90 mbps read and 8 mbps write.
I am enclosing a live photo concluding nearly 3 weeks of benchmarking different cards/mobo's/etc.
http://psf.biz/images/raid.jpg
Only thing missing is the beer can on top of the chassis!
I would be interested in having you run an IO Meter on your machine. I would like to see how the SATA Raptors compare.
Steven 06-08-2004, 02:40 AM We resolved this issue by loading up a server with 2.4.26 and disabling HT in the bios.
TLott 06-08-2004, 02:44 AM :)
Went from crippling loads of 15 with a light rsync process to being able to tar my entire home directory with a load of less than 1.
Output of hdparm -Tt /dev/sda1 jumped from 9 MB/s on the second result to 60. :)
Poweredge 1600SC with an LSI Logic RAID card under RHE.
mhalligan 06-10-2004, 11:27 PM Originally posted by freeflight2
what makes you think that? their 'photo' on the order page shows dell boxes.
It also says "Powered by Dell": http://theplanet.com/control/pro/p2800sr5x_details.html
How can I check if it's a dell?
Dells are extremely cheap servers.
Their raid cards are by far the worst I've ever used... The only thing as bad is their broadcom NICs.. horrible.
Raid 5 on 4 drives, expect to get what, a 90% increase in speed, something like that.. Unless they're using the LSI cards. Megaraids are bad.. Check the drivers.. Use at least 1.18j (I think 1.18h is out now). Run megamgr and see what the read and write settings are (cached is better than writethrough for the read-types). Also see how much cache is on it.
Plain and simple, Dell sucks.
porcupine 06-10-2004, 11:43 PM Originally posted by mhalligan
Dells are extremely cheap servers.
Their raid cards are by far the worst I've ever used... The only thing as bad is their broadcom NICs.. horrible.
Raid 5 on 4 drives, expect to get what, a 90% increase in speed, something like that.. Unless they're using the LSI cards. Megaraids are bad.. Check the drivers.. Use at least 1.18j (I think 1.18h is out now). Run megamgr and see what the read and write settings are (cached is better than writethrough for the read-types). Also see how much cache is on it.
Plain and simple, Dell sucks.
Man do you ever need to get your facts straight. You typically will *not* get a 90% "increase in speed" using raid 5, it's completely depending on file sizes, os, caching, etc. RAID 5 is designed (as discussed in this thread) for redundancy, not speed. LSI's are some of the better RAID cards out there, and by no means the cheapest either. The 16MB LSI RAID card (their smallest one I believe) outruns the Adaptec 128MB 2100S to say the least.
SMachiz 06-11-2004, 01:05 AM If the original poster is still having this issue, have them span the raid across multiple channels. Looking at your specifics, all 4 drives are on channel 0. This is bad because it means all 4 drives are sharing either 160 or 320 MB/sec, depending on what type of raid you have installed. Ask them to move 2 of them to the second channel if your raid card supports it (and it should...)
Dell machines come with CERC (cost effective raid solutions) or the ???????????????
They Suck, Suck, Suck, Suck.
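The shared-channel argument can be put into numbers. A back-of-the-envelope sketch, assuming an Ultra160 or Ultra320 SCSI bus (160 or 320 MB/sec per channel), every drive streaming at once, and the roughly 67 MB/sec single-disk read figure reported earlier in the thread; the function names are made up. Note that the controller dump posted earlier reports "Channels = 1", so that particular card may have no second channel to move drives to.

```python
def per_drive_ceiling(bus_mb_s: float, drives_on_channel: int) -> float:
    """Even share of one SCSI channel when every drive on it streams at once."""
    if drives_on_channel < 1:
        raise ValueError("need at least one drive on the channel")
    return bus_mb_s / drives_on_channel

def bus_is_bottleneck(bus_mb_s: float, drives_on_channel: int, drive_mb_s: float) -> bool:
    """True if the channel share is smaller than what one drive can stream."""
    return per_drive_ceiling(bus_mb_s, drives_on_channel) < drive_mb_s

DRIVE_MB_S = 67.0  # single-disk buffered read rate quoted in the thread

u160_four = bus_is_bottleneck(160.0, 4, DRIVE_MB_S)  # 40 MB/s share per drive
u160_two = bus_is_bottleneck(160.0, 2, DRIVE_MB_S)   # 80 MB/s share per drive
u320_four = bus_is_bottleneck(320.0, 4, DRIVE_MB_S)  # 80 MB/s share per drive
```

By this estimate the bus only caps sequential streaming on a 160 MB/sec channel with all four drives attached. It would not by itself explain writes of 6.5-7 MB/sec.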
mhalligan 06-11-2004, 02:57 PM Originally posted by porcupine
Man do you ever need to get your facts straight. You typically will *not* get a 90% "increase in speed" using raid 5, it's completely depending on file sizes, os, caching, etc. RAID 5 is designed (as discussed in this thread) for redundancy, not speed. LSI's are some of the better RAID cards out there, and by no means the cheapest either. The 16MB LSI RAID card (their smallest one I believe) outruns the Adaptec 128MB 2100S to say the least.
Raid5 over 4 drives should get you roughly 90% write increase, when you factor in the metadata overhead. That's just a normal rule of thumb, and one that usually holds true when using quality components.
And the idea of LSI being some of the better raid cards out there is a joke. I've deployed hundreds of them (against my will) and thousands of vortex cards, as well as hundreds of adaptec and mylex cards. LSI is junk. High failure rate, dumbed-down interface, and their one "benefit" is that they're supposedly really forgiving when swapping failures.. Supposedly. Runbook procedure at my last full-time gig was to backup daily, and if you lost a raid card, put a new one in, and rebuild, because we were about 5/50 in terms of the lsi actually recovering the volume from disk.
Lsi is typical of hardware to come with a dell. It's a low-end piece of junk touted as an upper-midrange piece of hardware.
freeflight2 06-11-2004, 06:11 PM Raid5 over 4 drives should get you roughly 90% write increase
that's what I expected as well...at least not a decrease and I don't think I am a complete idiot, especially since the concept of raid5 is pretty straightforward.
one of these single drive gives me about 25MB/sec write performance (out of my head... somewhere between 20-30MB/sec) - I got about 6.5-7MB write performance last time I benchmarked the raid5 (copy from cached file in ram to disk)
You don't have to be a genius to see that there is something wrong... what's even worse: tarring (tar -c) huge files (10GB+) locks the machine pretty badly (mysql slave gets very very slow replicating) - can a competent person from ThePlanet please comment on that?
(orbit support was not helpful)
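To see how far off those numbers are, here is a first-order textbook model of RAID 5 writes. It is only a sketch: it ignores controller cache, stripe size and the bus, and the 100 IOPS per drive used in the example is a hypothetical figure, not something measured in this thread.

```python
def raid5_full_stripe_write_mb_s(drive_mb_s: float, n_drives: int) -> float:
    """Best case: whole stripes are written, so n-1 drives carry data in parallel."""
    return drive_mb_s * (n_drives - 1)

def raid5_small_write_iops(drive_iops: float, n_drives: int) -> float:
    """Worst case: each small write costs 2 reads + 2 writes (read-modify-write)."""
    return drive_iops * n_drives / 4

single_drive = 25.0          # MB/s, from the post (20-30 MB/s)
observed = (6.5, 7.0)        # MB/s, the RAID 5 write figures from the post

best_case = raid5_full_stripe_write_mb_s(single_drive, 4)
observed_vs_single = tuple(round(x / single_drive, 2) for x in observed)
random_iops = raid5_small_write_iops(100.0, 4)   # 100 IOPS/drive is hypothetical
```

Even the pessimistic read-modify-write case lands near one drive's worth of small writes for a 4-drive array, so sequential writes at about a quarter of a single drive suggest a configuration problem rather than an inherent RAID 5 limit.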
Steven 06-11-2004, 06:19 PM Originally posted by freeflight2
that's what I expected as well...at least not a decrease and I don't think I am a complete idiot, especially since the concept of raid5 is pretty straightforward.
one of these single drive gives me about 25MB/sec write performance (out of my head... somewhere between 20-30MB/sec) - I got about 6.5-7MB write performance last time I benchmarked the raid5 (copy from cached file in ram to disk)
You don't have to be a genius to see that there is something wrong... what's even worse: tarring (tar -c) huge files (10GB+) locks the machine pretty badly (mysql slave gets very very slow replicating) - can a competent person from ThePlanet please comment on that?
(orbit support was not helpful)
We are seeing 45+ on a raid5 setup now. We tar'ed an entire /home directory and the load never got past 1.0
freeflight2 06-11-2004, 08:29 PM lg: u mentioned that you disabled ht in the bios - do you think that might have caused the slow writes?
Steven 06-11-2004, 08:32 PM Originally posted by freeflight2
lg: u mentioned that you disabled ht in the bios - do you think that might have caused the slow writes?
Well HT creates a lot of IO overhead most of the time in my experience. Once we disabled HT the server got a lot more stable, and the writes became real nice.
Nossie 06-25-2004, 05:26 AM Hi,
I'm having the same problems with my megaraid card.
I have a LSI MegaRaid 150-6 SATA card with six Maxtor 160GB SATA drives attached in a raid10 config.
I see performance differences between different kernels / drivers.
With kernel 2.6.7-mm1 i get a throughput of ~60 Mb/sec for reads and ~14 Mb/sec for writes. System load (iowaits) is high during writes.
With kernel 2.4.25 and the megaraid2 driver i get a throughput of ~85 Mb/sec for reads and 20 Mb/sec for writes. iowaits are a lot better with this kernel/driver.
With both kernels I experience lockups during heavy file writes (file transfers from a client, or from another ide disk).
I can live with the read/write speeds, but I can't live with the system lockups :(
I hope someone will find a fix for this, because it drives me crazy!
system specs:
cpu : Athlon 2600+ (1900Mhz)
mobo : MSI-delta (nForce2)
ram : 1 Gb
*raid card is in a 32bit pci slot
Nossie 06-25-2004, 10:25 AM I was looking for clues and found the following on the Linux Kernel Mailinglist.
alan pearson writes:
> On 2.6 the iowait jumps to around 70%, while 2.4 on
> both tests it is firmly zero.
The 2.4 kernel lumps iowait into idle, so you
won't see iowait on a 2.4 kernel.
> On disk read, I'm loosing 30 Mb/sec of bandwidth PER
> DISK, compared to 2.4.20.
> I've tried using both the deadline and as ioschedulers
> but no difference.
>
>
> Under real conditions (ie our application running
> which reads from all the disks simultaneously) on
> 2.6.4, the system performance is around 1/3 of 2.4.20)
>
> Summary MB/Sec :
>
> dd if=x dd if=/dev/zero
> 2.4 64 35.6
> 2.6 30.34 35.9
Well, that looks serious, but unfortunately you
can't tell what the iowait was on the 2.4 kernel.
Only the 2.6 kernel provides this information.
So that's why the 2.4.x kernel seems to have a much lower iowait.
freeflight2 06-27-2004, 09:39 PM thanx Nossie, I have 3 of these boxes running as mysql masters/slaves and they actually didn't give me any problems so far since 1 month... 'DB IO' is good/fair under 2.6.6 and 300+ sql requests/second . I am doing DB backups nightly with rsync --bwlimit=8192 which prevents the machine from locking up.
I hope/I am confident the kernelguys will get the 2.6 up to speed.
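A note on the --bwlimit=8192 trick: without a suffix, rsync reads the value in units of 1024 bytes per second, so 8192 caps the backup at about 8 MB/sec. A small sketch of what that means for a nightly run; the -a flag and both paths are assumptions, since the post only gives the --bwlimit option:

```python
def rsync_backup_argv(src: str, dest: str, bwlimit_kib: int = 8192) -> list:
    """Argument list for a throttled rsync; pass it to subprocess.run() to execute."""
    return ["rsync", "-a", f"--bwlimit={bwlimit_kib}", src, dest]

def transfer_hours(size_gib: float, bwlimit_kib: int = 8192) -> float:
    """Minimum wall-clock hours to move size_gib at the throttled rate."""
    seconds = size_gib * 1024 * 1024 / bwlimit_kib
    return seconds / 3600

# Hypothetical source and destination, for illustration only.
argv = rsync_backup_argv("/var/lib/mysql/", "backuphost:/backup/mysql/")
hours_for_65_gib = transfer_hours(65)
```

The trade-off is time: at this limit a 65 GiB data set needs well over two hours, so the limit has to be chosen against the size of the nightly window.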
Steven 06-27-2004, 09:58 PM 2.6 Helps the boxes alot.
Nossie 06-28-2004, 09:56 AM I have done some more testing/tweaking.
If i change the write policy from 'write through' to 'write back' the system is a bit more responsive during writes.
The max. throughput speed also increased from 20 to 27 for writes. Read throughput is the same (as expected).
I tried kernel 2.6.7-mm2 and to my surprise it gave the same results as kernel 2.4.26.
The only problem with kernel 2.6.7-mm2 is that my Ethernet card (intel Gbit, e1000 driver) has a maximum throughput of 40kb/sec. so I can’t really use 2.6.7-mm2 :(
Greetz,
Nossie
lostpacket 06-28-2004, 11:05 AM Run while you can, I just got 3 SATA port Mega Raid cards and they SUCK beyond belief.
Took over 15 hours to format an 80GB SATA drive.
freeflight2 06-28-2004, 01:18 PM Nossie: congrats to the 40kb/sec for the gigabit card ;) how did you switch the write policy from write through to write back?
freeflight2 06-28-2004, 01:23 PM megaraid.h says (in the 2.6 kernels and megaraid2.h in the 2.4s):
#define WRMODE_WRITE_THRU 0
#define WRMODE_WRITE_BACK 1
so write back seems to be the default
freeflight2 08-14-2004, 10:35 PM WRITE_THRU is actually the default... I was ready to cancel all these raid5 servers again but then used dell's MegaMGR to set the adapter to WRITE_BACK and CACHED_IO => everything is fast now, the RAID5 with 4 disks feels like 3 single disks when it comes to heavy mysql inserts+selects, a myisamchk of a 65G file took 1.5h instead of 3h+
cat /proc/megaraid/hba0/raiddrives-0-9 says now:
Span depth: 1, RAID level: 5, Stripe size: 64, Row size: 4
Read Policy: Adaptive, Write Policy: Write back, Cache Policy: Cached IO
(yes, I am aware of the dangers with write back but accept it...)
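Since the write policy turned out to be the deciding setting, it may be worth checking it from a script after every reboot or controller swap. A minimal sketch that parses the policy line exactly as printed above; the function names are made up, and the /proc path is the one from this post and may differ with other driver versions:

```python
def parse_policies(line: str) -> dict:
    """Split 'Read Policy: X, Write Policy: Y, Cache Policy: Z' into a dict."""
    policies = {}
    for part in line.split(","):
        key, _, value = part.partition(":")
        policies[key.strip()] = value.strip()
    return policies

def is_write_back(line: str) -> bool:
    """True if the logical drive reports write-back caching."""
    return parse_policies(line).get("Write Policy", "").lower() == "write back"

sample = "Read Policy: Adaptive, Write Policy: Write back, Cache Policy: Cached IO"
policies = parse_policies(sample)

# On the box itself, the line would come from the file shown above, e.g.:
#   for line in open("/proc/megaraid/hba0/raiddrives-0-9"):
#       if "Write Policy" in line:
#           print(is_write_back(line))
```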
freeflight2 09-22-2004, 05:18 PM Update: tuning the IO scheduler also helps, In particular setting antic_expire to 0 if your kernel is using AS (2.6 defaults to AS)... playing with the *_expire values also might help.
right now I have cat /sys/block/sda/queue/iosched/*
0
0 % exit probability
12 ms new thinktime
80488780 sectors new seek distance
1000
500
250
5000
Note: this is a pure mysql server - big thanks to TheLinuxGuy for giving me the hint to check io scheduler settings (under linux 2.4 you have to use elvtune; under 2.6 you have to mount sysfs first: mount none /sys -t sysfs). TPS went up almost 50%!
I'll reboot the box tonite with 'elevator=deadline' as kernel parameter to check if deadline is even better for such a RAID5+random read combination as some are saying.
Thx again to steve, he just saved me tons of $$ and big headaches!
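The scheduler tuning described above comes down to reading and writing small files under /sys. A hedged sketch: the iosched/antic_expire path follows the directory listed in the post, while the queue/scheduler file and the sample string are assumptions that only apply to later 2.6 kernels (the anticipatory scheduler was removed from the kernel in 2.6.33). The helper names are made up, and writing under /sys needs root.

```python
import re
from pathlib import Path

def active_scheduler(text: str) -> str:
    """Pick the bracketed entry out of a queue/scheduler line."""
    match = re.search(r"\[([^\]]+)\]", text)
    if match is None:
        raise ValueError("no active scheduler marked with [...]")
    return match.group(1)

def write_tunable(device: str, name: str, value: int, sysfs_root: str = "/sys") -> Path:
    """Write one I/O scheduler tunable, e.g. antic_expire, for a block device."""
    path = Path(sysfs_root) / "block" / device / "queue" / "iosched" / name
    path.write_text(f"{value}\n")
    return path

# Example (as root, on a kernel that still ships the AS scheduler):
#   write_tunable("sda", "antic_expire", 0)
```

The sysfs_root parameter is there only so the function can be tried against a scratch directory before pointing it at the real /sys.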
Steven 09-22-2004, 05:24 PM I am glad i could help out :)