
|
View Full Version : Too many MySQL queries through PHP?
HavokDH 04-22-2006, 01:34 AM Hi there,
I've been having issues with some shared hosts where my webcomic keeps getting bumped off for "excessive use of resources," which only started to happen when I started advertising the comic on other sites. I don't know of the issue is the number of images & page hits, poor coding, or both.
Assuming it's poor coding on my part, I'm trying to pinpoint what exactly I'm doing wrong. One question I have is if having multiple MySQL select queries on a given php page would be the culprit.
On my main page, for example, I have queries to pull latest updates, a banner ad, the last five blog posts, upcoming schedule, and a list of links to other webcomics. The data is pulled through my DAO layer, organized into arrays through my ADO layer, and formatted through my UI layer with templates.
Writing that all out, I'm realizing that's a lot to ask to display the info. Would that be the cause of the resource issue I'm currently running into?
Thanks for any help you can provide. Please be gentle on the noob. : )
Doug Hills
PS: The website is (temporarily) at pnhcomics.dreamhosters.com if you want to see what I'm talking about.
Vdevelopers 04-22-2006, 02:13 AM I would make sure to unset variables right after they're done being used. Or, reuse variables. As well, make sure to close all mysql connections at the end of their use even though PHP is supposed to close them itself (it doesn't sometimes.)
Edit: It sounds like your website is very database driven - try using the LIMIT clause when you can and also do as much data processing on the MySQL server as you can as opposed to in your PHP script.
HavokDH 04-22-2006, 02:30 AM Thanks for the reply.
I did make sure to mysql_close at the end of each page. And I do try to reuse variables as much as possible.
Most of the queries I do make have a limit on them, except (now that I think about it) on the comic pages themselves.
On the Chibi Cheerleaders page (pnhcomics.dreamhosters.com/chibi), for example, I called a query to pull all the pages from the db that correspond to that comic. I then formatted the data into the TOC you see on the right hand side.
It's probably not the most efficient means of doing it, but it was the best I could come up with. I'm game for other suggestions. : )
Azavia 04-22-2006, 11:26 AM All sites come to a point where they're just too active for a shared environment. If you code pretty well and all your queries are about as optimized as they can be, I'd say that's just the fact of a growing web site :)
RH Swaroop 04-22-2006, 11:28 AM I would also recommend caching of some queries which don't change too often. Or something like this. Have updates every 5 minutes. Make a script which queries every 5 minutes and include that on your webpage. Imagine if you get 201 hits every 5 minutes, you will save 200 queries every five minutes.
Again, that's just an example. Do it the way it suits. If you are not sure, feel free to post here in detail, we'll help :)
Azavia 04-22-2006, 11:37 AM I wonder if mysql on that server is set up for caching, as that might help. not that it is in your control but that can often improve how an intensive script runs
HavokDH 04-22-2006, 11:42 AM How would I go about caching a query? I didn't think that could be done. You're right, that would certainly reduce the load.
Edit: Oh, if this helps, I'm trying out both HostGator and DreamHost. HostGator suspended me last night for "excessive resource use", so currently the site is hosted on Dreamhost.
I'm pretty sure I dotted my i's and crossed my t's programming-wise. Do you think I put in too much overhead using a third layer (ADO) and/or formatting?
Azavia 04-22-2006, 11:58 AM How would I go about caching a query? I didn't think that could be done. You're right, that would certainly reduce the load.
Edit: Oh, if this helps, I'm trying out both HostGator and DreamHost. HostGator suspended me last night for "excessive resource use", so currently the site is hosted on Dreamhost.
I'm pretty sure I dotted my i's and crossed my t's programming-wise. Do you think I put in too much overhead using a third layer (ADO) and/or formatting?
Well, it's up to the host, not you. Perhaps go with a host who caters to programmers. With a properly-optimized mysql your script may run a bit better.
Otherwise, perhaps look into a vps or low-end dedicated where you can control everything.
I'm sure there's some overhead, but I'm pretty sure it runs a lot better than if it were made by a novice PHP 'programmer' with no layers whatsoever and no optimization whatsoever. :)
HavokDH 04-22-2006, 12:23 PM I probably will look into a VPS. I was wary because I only started to have this issue when I advertised the comic, and had a few thousand new visitors. I think it's safe to say the numbers will go down once the advertising stops.
But, if I want to have any kind of success with this thing, it'd probbaly be more prudent to switch to a VPS, as opposed to, say, stop advertising. :) I'll do some research on the VPS forum to see which host would be the best fit for me.
Thanks for the help, everyone!
brianoz 04-22-2006, 01:19 PM If your site gets hit a lot you may want to cache your home page and only regenerate it (ie run the mysql queries etc) every 5-15 minutes or so. The code for this is pretty trivial, just use the ob_ functions to rerun the queries and save the buffered output into the cache if the cache is older than 5-15 mins, or otherwise just display the cache. If your homepage gets hit a lot this will save you a lot.
brianoz 04-22-2006, 01:58 PM ------- double post, please ignore.
HavokDH 04-22-2006, 02:27 PM Hm...okay, that's intriguing. Even if I do go to a VPS, that could still be useful.
Any suggestions for tutorials? I've seen the basic definitions for the ob_functions on the PHP website, and a couple of articles from about five years ago, but that's about it.
And how would I go about checking the cache every 5-15 minutes? Would that be considered a cron job (which, admittedly, I've also never used)?
sasha 04-22-2006, 02:54 PM It would help if you could post table structures and some queries so we can get better idea if you are doing something wrong.
Mysql indexes can do miracles for query speed and efficiency. Do you use them?
Someone mentioned using LIMIT, that is good suggestion.
Avoid making query, then processing that query results in php and then making some more queries. Often you can fetch everything you need in single query.
What is your daily BW usage? For an example if it is less then 500MB and your site still has major impact on server performance, chances are you are doing something wrong.
HavokDH 04-22-2006, 09:21 PM I'll post some code examples hopefully a little later tonight. But as far as traffic goes, I was averaging about 2000-4000 unique users a day with about 300,000 hits and about 4-5GB of bandwidth. Prior to the advertising, I was averaging about 400-500 unique visitors with about .15GB of bandwidth without any problems/warnings at all.
Out of curiosity, does anyone have a suggestion for simulating heavy traffic on a local windows machine using apache?
kensplace 04-22-2006, 11:00 PM Easiest way short term is just to cache the pages, and update them (ie delete the cached page) when the page is updated with new content.
Simply show the cached version if it exists, if not show the 'real' page and re-create the cached page at the same time.
When updating the page, delete the cached version.
Meanwhile, study up on optimising your queries, tables and code.
brianoz 04-23-2006, 03:40 AM Any suggestions for tutorials? I've seen the basic definitions for the ob_functions on the PHP website, and a couple of articles from about five years ago, but that's about it.
Check around on sitepoint.com, I know I've seen something. It may have been in Harry Fueck's PHP Anthology from sitepoint.com (books, recommended).
And how would I go about checking the cache every 5-15 minutes? Would that be considered a cron job (which, admittedly, I've also never used)?Probably the simplest way is just to check the age of the cache file via stat - if it exceeds 15 minutes, just regenerate. Another way would be to remove the cache file from cron and regenerate if it doesn't exist, but that's inferior to the age test method. Regenerating if it's aged will require you lock the file before regenerating - don't miss that step, or your cache writes could compete/obliterate each other. Another good solution is to call the homepage via wget from cron with a special "rebuild" option, eg "wget www.domain.com/index.php?rebuild > /dev/null".
I still like the first (stat-based) solution, just don't forget to lock.
HavokDH 04-23-2006, 03:09 PM I wrote down as detailed an example of my coding as possible to get an idea of what I may be doing wrong. Looking at it, I probably made things a LOT more compliated than I needed to, but this was the best I could come up with.
It's fairly long, so I put it in a text file, which can be seen at http://pnhcomics.com/method.txt
I used what would be the most frequently hit page (and possibly the most possible cause of the problem), which would be the comic page itself.
I had a script for the banner ad on the page, but I'll be dropping it in favor of phpAds, once I learn how to use it.
Be gentle. :)
sasha 04-23-2006, 03:38 PM I would try time that query and see how long it takes. Then I would add indexes on tCategories.sCatDir and tImages.bActive and see if that makes any difference in speed.
Another thing I would check out is template parse() function. I saw some parsing functions in the past that would reparse template file over and over for each key, value pair.
If none of that helps, that menu portion of the page, is not something that changes often and it should not be too much of problem to simply use parsed html rather then generating it all the time.
HavokDH 04-23-2006, 04:40 PM Query time from the dreamhost server using phpMyAdmin after indexing tImages.bActive (first 30 records): 0.0051 mean average (12 tries, removing highest and lowest values)
I tried to add an index to sCatDir, and I got this error:
#1170 - BLOB/TEXT column 'sCatDir' used in key specification without a key length
Oh, and if I do make a parsed html TOC, what would be the best method to fetch the image, next, and previous records?
timdorr 04-23-2006, 04:43 PM Learn how to use and read EXPLAIN. It is your #1 tool for MySQL optimizations. Based on your query, I'm guessing your causing a lot of filesorts (which are very very bad). Look into adding appropriate multi-column indexes for your queries, in particular the ORDER BY columns used.
timdorr 04-23-2006, 04:45 PM Query time from the dreamhost server using phpMyAdmin after indexing tImages.bActive: 0.0051 mean average (12 tries, removing highest and lowest values)
They're using query caching, which means the query isn't getting run, the data is just being pulled straight from memory each time. As a result, it's super-fast, no matter what you do. You can add SQL_NO_CACHE to your SELECT statement to ensure it's running fresh each time for testing speeds. Like so:
SELECT SQL_NO_CACHE id, name FROM customer;
HavokDH 04-23-2006, 05:06 PM Adding the SQL_NO_CACHE to the query gave me an average of 0.00654 (same deal...twelve tries, removed highest & lowest value. I now fetched the first 100 records instead of the first 30). What I noticed was the first two queries were slow (0.0107 & 0.0193). The rest were about the same as before (minus one that was a negative value).
I added indices to the ORDER BY fields (iPageNo & iChapNo), and calculated an average of 0.00517.
I'll look into the EXPLAIN function. Thanks for the heads up!
HavokDH 04-23-2006, 05:14 PM running the SQL query with EXPLAIN, I have this data:
id = 1
select_type = SIMPLE
table = tImages
type = ALL
possible_keys = bActive
key = NULL
key_len = NULL ref = NULL
rows = 143
Extra = Using where; Using filesort
id = 1
select_type = SIMPLE
table = tCategories
possible_keys = eq_ref
key = PRIMARY
key_len = PRIMARY
ref = db_site.tImages.iCatID
rows = 1
Extra = Using where
HavokDH 04-23-2006, 06:33 PM Oh, and I found out what I did wrong with the sCatDir field. I changed it to varchar(25) and it indexed no problem. :)
Saeven 04-23-2006, 07:54 PM I would make sure to unset variables right after they're done being used. Or, reuse variables. As well, make sure to close all mysql connections at the end of their use even though PHP is supposed to close them itself (it doesn't sometimes.)Irrelevant really, you would have to generate QUITE a few variables to attain such a level.
I did make sure to mysql_close at the end of each page.This is done automatically for you at the end of a page's execution either way.
I think your culprit would likely be your queries themselves. Please post the output of an EXPLAIN on one of your queries, and the table structure of the tables your queries are using. Also include the amount of rows in the tables being used.
HavokDH 04-23-2006, 08:01 PM Hi Saeven,
The tables in question can be seen here: http://pnhcomics.com/method.txt (though the fields that were originally tinytext have recently been changed to varchars)
The results from the EXPLAIN query in question can be seen a few comments up. :)
Saeven 04-23-2006, 08:12 PM Please show me the result of:
SHOW CREATE TABLE tablename;
And SELECT COUNT(*) FROM table;
for each table.
HavokDH 04-23-2006, 08:23 PM Table tImages
SQL query: SHOW CREATE TABLE tImages;;
Rows: 1
CREATE TABLE `tImages`
(`iID` int(11) NOT NULL auto_increment,
`iCatID` tinyint(4) NOT NULL default '0',
`iChapNo` float NOT NULL default '0',
`iPageNo` tinyint(4) NOT NULL default '0',
`sImgTitle` varchar(100) NOT NULL,
`sImgFile` varchar(100) NOT NULL,
`sImgComments` text NOT NULL,
`bActive` enum('y','n') NOT NULL default 'y',
PRIMARY KEY (`iID`),
KEY `bActive` (`bActive`),
KEY `iChapNo` (`iChapNo`,`iPageNo`))
ENGINE=MyISAM
DEFAULT CHARSET=latin1
COUNT(*) : 190
-------------------------
Table tCategories
SQL query: SHOW CREATE TABLE tCategories;;
Rows: 1
CREATE TABLE `tCategories`
(`iID` tinyint(4) NOT NULL auto_increment,
`sCatName` varchar(100) NOT NULL,
`sCatDir` varchar(25) NOT NULL,
`sCatAvatar` varchar(25) NOT NULL,
`sCatLogo` varchar(25) NOT NULL,
`sCatDesc` text NOT NULL,
`sCatCSS` text NOT NULL,
`bHasChapters` enum('y','n') NOT NULL default 'y',
`bHasImages` enum('y','n') NOT NULL default 'y',
`bActive` enum('y','n') NOT NULL default 'y',
PRIMARY KEY (`iID`),
KEY `sCatDir` (`sCatDir`))
ENGINE=MyISAM
DEFAULT CHARSET=latin1
Count(*): 7
Saeven 04-23-2006, 08:34 PM I don't have any data to run against, but try this query, please show explain:
SELECT * FROM tImages
WHERE tImages.bActive = 'y'
AND iCatID IN (
SELECT iID FROM tCategories WHERE sCatDir = ?
)
ORDER BY iChapNo, iPageNo
You are using mySQL 4.1?
HavokDH 04-23-2006, 08:36 PM Looks like dreamhost is using 5.0.18, but I don't know for certain how long I'll be sticking with them.
Results: (sCatDir = 'chibi')
id = 1
select_type = PRIMARY
table = tImages
type= ALL
possible_keys = bActive
key = NULL
key_len = NULL
ref = NULL
rows = 143
extra = Using where; Using filesort
id = 2
select_type = DEPENDENT SUBQUERY
table = tCategories
type = unique_subquery
possible_keys = PRIMARY,sCatDir
key = PRIMARY
key_len = 1
ref = func
rows = 1
extra = Using index; Using where
Execution time seems to be averaging 0.0050 (rough estimate)
Saeven 04-23-2006, 08:51 PM You need to add an INDEX on tImages.iCatID, then run
ANALYZE TABLE tImages
Please post explain after this is done.
Saeven 04-23-2006, 09:25 PM HavokDH - got that explain result ? Won't be online for much longer..
HavokDH 04-23-2006, 09:32 PM Yep, sorry...just got back from outside.
Here's the results:
id = 1
select_type = PRIMARY
table =tImages
type = index
possible keys = bActive
key = iChapNo
key_len = 5
ref = NULL
rows = 143
extra = Using where
id = 2
select_type = DEPENDENT SUBQUERY
table = tCategories
type = unique_subquery
possible keys = PRIMARY,sCatDir
key = PRIMARY
key_len = 1
ref = func
rows = 1
extra = Using index; Using where
Didn't see any change in execution time. But just to show how out of the loop I am, I didn't know that MySQL finally allowed nested queries. :)
Saeven 04-23-2006, 09:43 PM That's as optimized as it'll get. You wouldn't see the results just now because your dataset is so small, but I posted these messages to perhaps help you onto the right path where optimization is concerned. You should do this with EVERY one of your queries always - I spent about a good 3 weeks of non-stop optimization with Auracle (one of our projects) for example (it is a large scale system, 65+ tables in BCNF).
Look at a comparison of key elements that were scary in your initial EXPLAIN output:
id=1
type = ALL vs. type = index: All means that mySQL is scanning ALL of your table rows to find the answer set. This is ridiculously bad, as your table size increases, your query time will increase linearly!
key = NULL vs. key = iChapNo : Using no key most always results in an ALL scan. Having a key allows SQL to read a key instead of running through all records one by one.
Extra = Using where; Using filesort; vs. extra = Using where: Filesort is probably the ugliest one around. It actually sorts from a flatfile which can really increase your latency as your set increases. Now that it doesn't write to file anymore, it won't ask for file resources from the OS. A+
The subquery makes use of an index instead of just joining two tables and pulling records that are often duplicated.
Never judge a query by execution time. Always look at what mySQL is making the OS do. Always avoid ALL reads, indexless queries, and filesort and temporary in the WHERE clause.
With this, go toward the mySQL optimization chapter (7.2 or 7.3, I forget) and good luck with future queries/projects.
Cheers.
Alex
HavokDH 04-23-2006, 10:19 PM Thanks very much for the help! Do you believe this would have been a cause for the site bringing the shared server to its knees?
Saeven 04-23-2006, 10:20 PM That single query? no. If all of your queries are using filewrites and temp tables and you are getting massive amounts of hits, perhaps.
HavokDH 04-23-2006, 10:28 PM ah, ok. Because I'm still trying to pinpoint what the problem may have been (SQL coding, PHP coding, or both).
I do need to look through the other tables, make sure there are indicies are in place, and remove the left/inner joins in any other queries.
HavokDH 04-23-2006, 11:32 PM Okay...that's wierd.
I re-ran the explain statement, and the first part is filesorting again.
Edit: I noticed why. There was a Limit tacked on the end. When I removed the limit, it went back to filesorting.
|