Web Hosting Talk







View Full Version : help please. super complicated query maxing the server.


mconnors
09-13-2005, 02:47 PM
This is a gallery modification to a phpbb forums and everytime I run it it causes the server to go to 95%. This page is an extended version of the memberlist page and shows how many photos, views, last upload, ect.

the query...

SELECT
phpbb_users.username,
phpbb_users.user_id,
phpbb_users.user_from,
phpbb_users.user_website,
phpbb_users.user_email,
COUNT(archive_imageLib.unique_id) AS photos,
SUM(archive_imageLib.views) AS views,
SUM(archive_imageLib.downloads) AS down, MAX(UNIX_TIMESTAMP(archive_imageLib.date_added)) AS upload
FROM phpbb_users
LEFT JOIN archive_imageLib ON phpbb_users.username = archive_imageLib.author
WHERE phpbb_users.username = archive_imageLib.author
GROUP BY archive_imageLib.author
ORDER BY photos
DESC LIMIT 0, 50


Explain:
1 - SIMPLE - phpbb_users - index - username - PRIMARY - 3 - NULL -17077 - Using temporary; Using filesort

1 - SIMPLE - archive_imageLib - ref - author - author - 25 - DB.phpbb_users.username - 1 - Using where

Anybody have any ideas? Also it seems any query I use uses filesort, this particular one is Using temporary. I've been trying to write a search script to NOT use fulltext because of the mysql cluster and that seems to be a pretty massive query too.




-- --------------------------------------------------------

--
-- Table structure for table `archive_imageLib`
--

CREATE TABLE `archive_imageLib` (
`unique_id` varchar(25) NOT NULL default '',
`parent_id` varchar(25) NOT NULL default '',
`visible` char(1) NOT NULL default '',
`type` varchar(8) NOT NULL default '',
`downloads` int(11) NOT NULL default '0',
`views` int(11) NOT NULL default '0',
`title` varchar(36) NOT NULL default '',
`author` varchar(25) NOT NULL default '',
`model_release` int(1) NOT NULL default '0',
`property_release` int(1) NOT NULL default '0',
`size` int(11) NOT NULL default '0',
`width` varchar(11) NOT NULL default '',
`height` varchar(11) NOT NULL default '',
`date_added` datetime NOT NULL default '0000-00-00 00:00:00',
`exif_ae` varchar(15) NOT NULL default '',
`exif_aperture_stop` varchar(5) NOT NULL default '',
`exif_comment` text NOT NULL,
`exif_date` varchar(20) NOT NULL default '',
`exif_description` text NOT NULL,
`exif_exposure_comp` varchar(20) NOT NULL default '',
`exif_flash` varchar(20) NOT NULL default '',
`exif_focal_length` varchar(5) NOT NULL default '',
`exif_focal_length_equiv` varchar(5) NOT NULL default '',
`exif_light_metering` varchar(10) NOT NULL default '',
`exif_make` varchar(25) NOT NULL default '',
`exif_model` varchar(25) NOT NULL default '',
`exif_resolution` varchar(25) NOT NULL default '',
`exif_shutter_speed` varchar(5) NOT NULL default '',
`exif_zoom` varchar(10) NOT NULL default '',
`comments` text NOT NULL,
`keywords` text NOT NULL,
`lastViewJPG` datetime NOT NULL default '0000-00-00 00:00:00',
`lastViewZIP` datetime NOT NULL default '0000-00-00 00:00:00',
`last_view_ip` varchar(16) NOT NULL default '',
`last_zip_ip` varchar(16) NOT NULL default '',
`commentsCount` int(12) NOT NULL default '0',
`commentsDate` datetime NOT NULL default '0000-00-00 00:00:00',
`rating` int(3) NOT NULL default '0',
`backup` int(2) NOT NULL default '0',
`license` char(3) NOT NULL default '0',
PRIMARY KEY (`unique_id`),
KEY `parent_id` (`parent_id`),
KEY `author` (`author`),
KEY `title` (`title`),
KEY `date_added` (`date_added`),
KEY `size` (`size`),
KEY `views` (`views`),
KEY `downloads` (`downloads`),
KEY `lastViewJPG` (`lastViewJPG`),
KEY `lastViewZIP` (`lastViewZIP`),
KEY `commentsDate` (`commentsDate`),
KEY `commentsCount` (`commentsCount`),
KEY `type` (`type`),
KEY `index` (`unique_id`,`parent_id`,`type`,`downloads`,`views`,`title`,`author`,`lastViewJPG`,`lastViewZIP`,`date_added`)
) ENGINE=ndbcluster DEFAULT CHARSET=latin1;


this table has about 80,000 records and grows a few hundred a day. I would greatly appreciate any help.

innova
09-13-2005, 04:50 PM
LIMIT 0, 50
Is it necessary to show 50 at a time?

LEFT JOIN archive_imageLib ON phpbb_users.username = archive_imageLib.author
WHERE phpbb_users.username = archive_imageLib.author

Also.. do you need to have a LEFT JOIN here? You realize this would show all the image authors whether or not they are phpBB users.. which doesnt make sense at least to me (since presumably they would have to be forum members to post pics?).

Can you use this instead:

INNER JOIN archive_imageLib ON phpbb_users.username = archive_imageLib.author AND phpbb_users.username = archive_imageLib.author

In that last example you are using a mixture of theta and ANSI-style joins.. I would suggest using ANSI style all the way through to keep things cleaner and to not use the WHERE clause like that in this query.

Finally, I also think that this is severe index abuse:

KEY `parent_id` (`parent_id`),
KEY `author` (`author`),
KEY `title` (`title`),
KEY `date_added` (`date_added`),
KEY `size` (`size`),
KEY `views` (`views`),
KEY `downloads` (`downloads`),
KEY `lastViewJPG` (`lastViewJPG`),
KEY `lastViewZIP` (`lastViewZIP`),
KEY `commentsDate` (`commentsDate`),
KEY `commentsCount` (`commentsCount`),
KEY `type` (`type`),
KEY `index` (`unique_id`,`parent_id`,`type`,`downloads`,`views`,`title`,`author`,`lastViewJPG`,`lastViewZIP`,`date_added`)


You have duplicate index fields in the index named 'index', and you might really sit down and think about what the indexes are actually FOR: for searching against. Therefore, I would remove any of these that are unlikely to be used in a direct query as they are probably just wasting space.

For example: Would you ever search against: size, commentsDate, parent_id?

let me know if that helps.

mconnors
09-13-2005, 06:03 PM
Thanks for the reply, I did change the LEFT JOIN to INNER, it still seems to be using temporary file. But before I was watching top and refreshing now it seems to have gone from 90% to 15% what a difference. 50 records isn't required I can lower that to 20, I was doing the join because I was comparing the DB of images to the DB of users. You are correct that only registered users can upload photos, although not all registered users have photos in imageLib. The index index was dumb, I just added that today hoping it would somehow aleviate some problems but I have removed it. I thought I would need all of them because I am using them for sorting (select all with parent_id sort by lastview,title). I'm not worried about space, speed seems to be the real issue.

Thanks very much-

innova
09-13-2005, 06:07 PM
Awesome!

I think that the indexes can probably be further optimized but that would take somewhat more time and experimentation. I didnt really think that having 50 rows returned was the problem, but it was the first thought I had after reading your query.

My guess from my limited knowledge of your exact situation is that having additional join criteria in the WHERE clause in your original query was really slowing things down.

What does EXPLAIN say now? I bet it doesnt say using 'where' :)

I dont remember the specifics at the moment, but if you are using the WHERE clause I believe mysql will have to scan the whole table anyway even though you JOIN'ed it. By moving that into the main JOIN argument what you are really doing is making a much smaller dataset for mysql to parse through.

One more thing:
I'm not worried about space, speed seems to be the real issue.
Remember, they are related more often than not :)

How are the load times now?

gogocode
09-14-2005, 05:51 AM
Limiting the returned rows will make 0 difference, nil, nada. MySQL still has to run the full query, find all the matching rows and sort them to be able to give you a certain number of them in sorted order. So no difference.



LEFT JOIN archive_imageLib ON phpbb_users.username = archive_imageLib.author
WHERE phpbb_users.username = archive_imageLib.author


You realize this would show all the image authors whether or not they are phpBB users.. which doesnt make sense at least to me (since presumably they would have to be forum members to post pics?).


No, this will show all phpbb users whether or not they are image authors. It's a LEFT join and phpbb_user is on the LEFT and thus all records from the LEFT table are returned whether or not there is a matching table in the right hand table. Still, if you don't need to see phpbb users who havn't posted pictures, an inner join here will speed things up nicely.



KEY `index` (`unique_id`,`parent_id`,`type`,`downloads`,`views`,`title`,`author`,`lastViewJPG`,`lastViewZIP`,`date_added`)

You have duplicate index fields in the index named 'index'


Which fields in that are duplicated in that index? I don't see any.

[edit: that said, it's still not likely an overly useful index to have - the design of MySQL indices should not be undertaken unless you have read up on how they work, or you're just wasting CPU time - http://dev.mysql.com/doc/mysql/en/mysql-indexes.html - read and make sure you understand that entire page before continuing ]

mconnors
09-14-2005, 11:25 AM
explain now says...

1 - SIMPLE - phpbb_users - ALL - username - NULL - NULL - NULL - 17077 - Using temporary; Using filesort
1 - SIMPLE - archive_imageLib - ref - author - author - 25 - DB.phpbb_users.username - 1 -

And this is the current structure of imageLib its the same minus the duplicate index...

CREATE TABLE `archive_imageLib` (
`unique_id` varchar(25) NOT NULL default '',
`parent_id` varchar(25) NOT NULL default '',
`visible` char(1) NOT NULL default '',
`type` varchar(8) NOT NULL default '',
`downloads` int(11) NOT NULL default '0',
`views` int(11) NOT NULL default '0',
`title` varchar(36) NOT NULL default '',
`author` varchar(25) NOT NULL default '',
`model_release` int(1) NOT NULL default '0',
`property_release` int(1) NOT NULL default '0',
`size` int(11) NOT NULL default '0',
`width` varchar(11) NOT NULL default '',
`height` varchar(11) NOT NULL default '',
`date_added` datetime NOT NULL default '0000-00-00 00:00:00',
`exif_ae` varchar(15) NOT NULL default '',
`exif_aperture_stop` varchar(5) NOT NULL default '',
`exif_comment` text NOT NULL,
`exif_date` varchar(20) NOT NULL default '',
`exif_description` text NOT NULL,
`exif_exposure_comp` varchar(20) NOT NULL default '',
`exif_flash` varchar(20) NOT NULL default '',
`exif_focal_length` varchar(5) NOT NULL default '',
`exif_focal_length_equiv` varchar(5) NOT NULL default '',
`exif_light_metering` varchar(10) NOT NULL default '',
`exif_make` varchar(25) NOT NULL default '',
`exif_model` varchar(25) NOT NULL default '',
`exif_resolution` varchar(25) NOT NULL default '',
`exif_shutter_speed` varchar(5) NOT NULL default '',
`exif_zoom` varchar(10) NOT NULL default '',
`comments` text NOT NULL,
`keywords` text NOT NULL,
`lastViewJPG` datetime NOT NULL default '0000-00-00 00:00:00',
`lastViewZIP` datetime NOT NULL default '0000-00-00 00:00:00',
`last_view_ip` varchar(16) NOT NULL default '',
`last_zip_ip` varchar(16) NOT NULL default '',
`commentsCount` int(12) NOT NULL default '0',
`commentsDate` datetime NOT NULL default '0000-00-00 00:00:00',
`rating` int(3) NOT NULL default '0',
`backup` int(2) NOT NULL default '0',
`license` char(3) NOT NULL default '0',
PRIMARY KEY (`unique_id`),
KEY `parent_id` (`parent_id`),
KEY `author` (`author`),
KEY `title` (`title`),
KEY `date_added` (`date_added`),
KEY `size` (`size`),
KEY `views` (`views`),
KEY `downloads` (`downloads`),
KEY `lastViewJPG` (`lastViewJPG`),
KEY `lastViewZIP` (`lastViewZIP`),
KEY `commentsDate` (`commentsDate`),
KEY `commentsCount` (`commentsCount`),
KEY `type` (`type`)
) ENGINE=ndbcluster DEFAULT CHARSET=latin1;

Thanks again for your help. I will read up on indexes again.

innova
09-14-2005, 11:27 AM
Limiting the returned rows will make 0 difference, nil, nada. MySQL still has to run the full query, find all the matching rows and sort them to be able to give you a certain number of them in sorted order. So no difference.

Dont be insane.

Agreed, mysql has to do all the work. The difference in perceived loading time however may have more to do with the fact that instead of 20 rows in your php array, which you then have to loop through and print out a table, you instead have 10,000 rows.

It does indeed make a difference > nil, as far as the user experience goes.

No, this will show all phpbb users whether or not they are image authors. It's a LEFT join and phpbb_user is on the LEFT and thus all records from the LEFT table are returned whether or not there is a matching table in the right hand table.

Actually, its not really valid join syntax, so it doesnt matter if its left or inner. He is trying to use ANSI and theta at the same time there. According to the first part, archive_imagelib IS the left table, but then in the WHERE clause the order is reversed. I cant honestly say what mysql would do, but the fact that it was wrong and my suggestion fixes it shows that :)

As for the index...

Look at his original SQL statement. That is index abuse, having multiple indexes on single columns and then having that last index which covers multiple columns already indexed by single indexes. Whew.

mconnors
09-14-2005, 11:28 AM
PS- this is the structure for phpbb users, NOTE: the engine isn't nbcluster, that actually is a mistake and I have to go back and change it. By default its installed as myisam

CREATE TABLE `phpbb_users` (
`user_id` mediumint(8) NOT NULL default '0',
`user_active` tinyint(1) default '1',
`username` varchar(25) NOT NULL default '',
`user_password` tinytext NOT NULL,
`user_session_time` int(11) NOT NULL default '0',
`user_session_page` smallint(5) NOT NULL default '0',
`user_session_topic` int(11) NOT NULL default '0',
`user_lastvisit` int(11) NOT NULL default '0',
`user_regdate` int(11) NOT NULL default '0',
`user_level` tinyint(4) default '0',
`user_posts` mediumint(8) unsigned NOT NULL default '0',
`user_timezone` decimal(5,2) NOT NULL default '0.00',
`user_style` tinyint(4) default NULL,
`user_lang` varchar(255) default NULL,
`user_dateformat` varchar(14) NOT NULL default 'd M Y H:i',
`user_new_privmsg` smallint(5) unsigned NOT NULL default '0',
`user_unread_privmsg` smallint(5) unsigned NOT NULL default '0',
`user_last_privmsg` int(11) NOT NULL default '0',
`user_emailtime` int(11) default NULL,
`user_viewemail` tinyint(1) default NULL,
`user_attachsig` tinyint(1) default NULL,
`user_allowhtml` tinyint(1) default '1',
`user_allowbbcode` tinyint(1) default '1',
`user_allowsmile` tinyint(1) default '1',
`user_allowswearywords` tinyint(1) NOT NULL default '0',
`user_allowavatar` tinyint(1) NOT NULL default '1',
`user_allow_pm` tinyint(1) NOT NULL default '1',
`user_allow_mass_pm` tinyint(1) default '2',
`user_allow_viewonline` tinyint(1) NOT NULL default '1',
`user_notify` tinyint(1) NOT NULL default '1',
`user_notify_pm` tinyint(1) NOT NULL default '0',
`user_notify_donation` tinyint(1) NOT NULL default '0',
`user_popup_pm` tinyint(1) NOT NULL default '0',
`user_sound_pm` tinyint(1) NOT NULL default '1',
`user_rank` int(11) default '0',
`user_avatar` varchar(100) default NULL,
`user_avatar_type` tinyint(4) NOT NULL default '0',
`user_email` varchar(255) default NULL,
`user_icq` varchar(15) default NULL,
`user_website` varchar(100) default NULL,
`user_from` varchar(100) default NULL,
`user_from_flag` varchar(25) default NULL,
`user_sig` text,
`user_sig_bbcode_uid` varchar(10) default NULL,
`user_aim` varchar(255) default NULL,
`user_yim` varchar(255) default NULL,
`user_msnm` varchar(255) default NULL,
`user_occ` varchar(100) default NULL,
`user_interests` varchar(255) default NULL,
`user_actkey` varchar(32) default NULL,
`user_newpasswd` varchar(32) default NULL,
`user_lastlogon` int(11) default '0',
`user_totaltime` int(11) default '0',
`user_totallogon` smallint(5) default '0',
`user_totalpages` int(11) NOT NULL default '0',
`user_birthday` mediumint(6) NOT NULL default '999999',
`user_next_birthday_greeting` int(11) NOT NULL default '0',
`user_gender` tinyint(4) NOT NULL default '0',
`user_photo` varchar(100) default NULL,
`user_photo_type` tinyint(4) NOT NULL default '0',
`user_zipcode` varchar(10) default NULL,
`user_points` int(11) NOT NULL default '0',
`admin_allow_points` tinyint(1) NOT NULL default '1',
`user_items` text,
`user_effects` varchar(255) default NULL,
`user_privs` varchar(255) default NULL,
`user_custitle` text,
`user_specmsg` text,
`user_trade` text,
`user_warnings` smallint(5) default '0',
`rating_status` tinyint(3) unsigned default NULL,
`avatar_sticky` tinyint(4) NOT NULL default '0',
`user_profile_view` smallint(5) NOT NULL default '0',
`user_last_profile_view` int(11) NOT NULL default '0',
`user_profile_view_popup` tinyint(1) default '1',
`user_view_log` tinyint(4) NOT NULL default '0',
`user_votewarnings` smallint(5) default '0',
`user_allow_profile` tinyint(1) NOT NULL default '1',
`user_clockformat` varchar(10) NOT NULL default 'Clock 1',
`user_inactive_emls` tinyint(1) NOT NULL default '0',
`user_inactive_last_eml` int(11) NOT NULL default '0',
`user_popup_notes` tinyint(1) default '0',
`user_showsigs` tinyint(1) default '1',
`irc_commands` varchar(255) NOT NULL default '',
`user_showavatars` tinyint(1) default '1',
`karma_plus` mediumint(9) NOT NULL default '0',
`karma_minus` mediumint(9) NOT NULL default '0',
`karma_time` bigint(20) NOT NULL default '0',
`user_realname` varchar(50) default NULL,
`user_trophies` int(10) NOT NULL default '0',
`ina_cheat_fix` int(100) NOT NULL default '0',
`ina_game_playing` int(10) NOT NULL default '0',
`ina_last_visit_page` varchar(255) NOT NULL default '',
`ina_games_today` int(10) NOT NULL default '0',
`ina_last_playtype` varchar(255) NOT NULL default 'parent',
`ina_games_played` int(10) NOT NULL default '0',
`user_address_1` tinytext NOT NULL,
`user_address_2` tinytext NOT NULL,
`user_country` tinytext NOT NULL,
`user_city` tinytext NOT NULL,
`user_state` varchar(25) NOT NULL default '',
`user_zip` varchar(15) NOT NULL default '',
`user_phone` varchar(20) NOT NULL default '',
`user_cell_phone` varchar(20) NOT NULL default '',
`user_equipment` text NOT NULL,
`user_comment` text NOT NULL,
`user_generic_comment` text NOT NULL,
`user_profile_tone` varchar(5) NOT NULL default '',
`user_birthdate` datetime NOT NULL default '0000-00-00 00:00:00',
`user_weight` varchar(20) NOT NULL default '',
`user_chest` varchar(20) NOT NULL default '',
`user_waist` varchar(20) NOT NULL default '',
`user_height` varchar(20) NOT NULL default '',
`user_hair` varchar(35) NOT NULL default '',
`user_eyes` varchar(35) NOT NULL default '',
`user_dress` varchar(35) NOT NULL default '',
`user_shoe` varchar(35) NOT NULL default '',
`user_agency` text NOT NULL,
`user_nationality` varchar(100) NOT NULL default '',
`user_ethnicity` varchar(100) NOT NULL default '',
`user_agree` varchar(5) NOT NULL default '',
`user_author` int(1) default '0',
`user_model` int(1) NOT NULL default '0',
`user_live_model` int(1) NOT NULL default '0',
PRIMARY KEY (`user_id`),
KEY `user_session_time` (`user_session_time`),
KEY `username` (`username`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

gogocode
09-14-2005, 02:38 PM
Originally posted by innova
Dont be insane.
Agreed, mysql has to do all the work. The difference in perceived loading time however may have more to do with the fact that instead of 20 rows in your php array, which you then have to loop through and print out a table, you instead have 10,000 rows.


Why the hell would you read ALL the rows into an array?! Sure if you want to needlessly read 10,000 rows into an array before you use the data then it's going to be a dog, but in that case, you deserve everything you get and then some!


Actually, its not really valid join syntax, so it doesnt matter if its left or inner. He


Perfectly valid, or it wouldn't work :) ANSI Joins happen first, where operates on the result of the joined tables, no problem there, it's still a left join.

Look at his original SQL statement. That is index abuse, having multiple indexes on single columns and then having that last index which covers multiple columns already indexed by single indexes. Whew.

KEY (a,b,c)
and
KEY (b)

are in no way equivalent or redundant. If you are wanting to utilise an index on b only, to utilise the first index you would need to operate over a minimum of a and b as for an index to be used you must operate over a continuous subset from the start of the index of the fields specified in the index.

Further more, mysql will only use one index per table per query, this is why you need to take care in crafting your indexes so that they may cover the most situations, with the least number of indexes. Having a column appear in multiple indexes is perfectly acceptable if your queries demand it.

gogocode
09-14-2005, 02:42 PM
And also, having many indexes is not this guys problem.

Indexes will only negativly impact performace (aside from if mysql picks an index which isn't as good as one you already have) when inserting, updating or deleting data from the table.

gogocode
09-14-2005, 02:48 PM
The only time a LIMIT is going to improve MySQL (server)'s performance is in the peculiar case where an index is available but without the limit mysql would have decided it is quicker to run without the index, with the limit mysql will always use the index, even if it means the query portion will take longer to run, because the process of limiting the results before making the results cursor available will be quicker with the index than without.

But that's a real edge case.

innova
09-14-2005, 03:06 PM
mysql will only use one index per table per query, this is why you need to take care in crafting your indexes so that they may cover the most situations, with the least number of indexes. Having a column appear in multiple indexes is perfectly acceptable if your queries demand it.

This was actually my point, IF your queries demand it. I just thought it unlikely that a search would be performed against either all of the individual indexes that had been defined (some but not all) OR the last index consisting of multiple columns.

Anyway, thats a rather small point in the scheme of things :)

Why the hell would you read ALL the rows into an array?! Sure if you want to needlessly read 10,000 rows into an array before you use the data then it's going to be a dog, but in that case, you deserve everything you get and then some!

Ok, so 10,000 was a silly example. What about this.. lets say you use smarty, and need to pass the data to the template as an array? Granted again, when I write queries I only ask mysql for the dataset that I will use and nothing more.. I figure I am missing something here....

ANSI Joins happen first, where operates on the result of the joined tables, no problem there, it's still a left join.

Sorry. What I meant by invalid (my bad) is 'not working as expected and/or intended'. It is a left join, its just not what he wanted.

gogocode
09-14-2005, 03:13 PM
Originally posted by gogocode
The only time a LIMIT is going to improve MySQL (server)'s performance is in
...
But that's a real edge case.

Actually, I was too quick off the blocks with this, thinking about it more I'm wrong.

mysql_query() is buffered, that is, once mysql has performed the selection it will copy the results and provide the cursor back to php. copying the results is the key here, because of course the more results it must copy, the longer (and more memory) that will take.

I imagine however you'd have to have a big resultset, and a small limit to really notice it at all.

mysql_unbuffered_query() is also available from memory, but it is limited by the fact that numrows won't work, and you won't be able to make another query until you're done with the first, and it will hold table locks until you're done.