In followup to http://www.webhostingtalk.com/showthread.php?t=483024 I stumbled over another question.
The table contains an integer column which I use for probability calculations (the higher a value the more often it is chosen).
Now this query should actually incorporate these probabilities, however it seems to prefer values from the middle range
SELECT * FROM table WHERE field>=(SELECT RAND()*MAX(field) FROM table) ORDER BY field LIMIT 1
If I split it up it still seems to prefer mid-range values over values closer to the higher end, however it doesnt "completely" ignore the highest value anymore.
random_value = SELECT RAND()*MAX(field) FROM table;
SELECT * FROM table WHERE field>=random_value ORDER BY field LIMIT 1
Does anyone have an explanation for that or a better solution? Thanks
Korvan
02-20-2006, 12:19 PM
isnt it supposed to be (as per the last thread)
SELECT * FROM table WHERE field>=(SELECT RAND()*MAX(field) FROM table) ORDER BY field, RAND() LIMIT 1
Well in order to get to the root cause of this, have mysql run 1,000 queries of
SELECT RAND()*MAX(field) FROM table
Index it into an array then print_r the array then paste it in a reply.
like
$array[$num]++; //where $num is the result.
isnt it supposed to be (as per the last thread)
SELECT * FROM table WHERE field>=(SELECT RAND()*MAX(field) FROM table) ORDER BY field, RAND() LIMIT 1
Correct, I cut down the query to the basic meaning for simplification (the ordering RAND() is only for having randomised results of the entries with the same value).
Well in order to get to the root cause of this, have mysql run 1,000 queries of
SELECT RAND()*MAX(field) FROM table
Index it into an array then print_r the array then paste it in a reply.
like
$array[$num]++; //where $num is the result.
I have to admit, I didnt test it in a longterm run as with 1000 queries, but observed the behaviour in smaller tests constantly.
Here is an example with the subquery version
value, chosen
5, 1
15, 3
40, 8
73, 15
135, 37
231, 37
403, 55
576, 23
826, 6
and here with the query split up
value, chosen
5, 3
15, 1
40, 3
73, 3
135, 8
231, 16
403, 26
576, 18
826, 55
However I will run your suggested version as well. Thanks Korvan
Korvan
02-20-2006, 12:52 PM
I think i know whats going on. look at the gaps between your values. If you notice that the first few have small gaps. So the rand is picking a random number correctly then picking the next largest number. So what happens is that if it picks 300 as its rand number it returns the result of 403. The gaps between 5 15 40 and 73 are MUCH smaller than between 231, 403, 576...
So the rand is picking a random number correctly then picking the next largest number. So what happens is that if it picks 300 as its rand number it returns the result of 403.
Yes, but thats what it should do.
I think i know whats going on. look at the gaps between your values. If you notice that the first few have small gaps.
The gaps between 5 15 40 and 73 are MUCH smaller than between 231, 403, 576...
I am sorry, I cant follow you really. Where do you think is the difference between the two queries?
Korvan
02-20-2006, 02:37 PM
The only difference is that they are random. You might not see a consistant result or results you expect. Its just how it is.
The only thing you can do is check over a massive amount of queries to make sure your random selector query is producting a sorta even result. Normally I would say it isnt anything to worry about.
Sure they are random, but higher values should be chosen more often than lower values. But as you can see in the example, the query with the subquery does not do that.
Burhan
02-21-2006, 03:01 AM
Sure they are random, but higher values should be chosen more often than lower values.
Sorry, then this is not random anymore. What you want are weighted results.
Sorry, then this is not random anymore. What you want are weighted results.
Well, yes, you can call it that way.
Does nobody have an idea why the queries behave differently?
Korvan
02-22-2006, 11:54 AM
nope.
Another option you could do is have one of php's random number generators make the rand. It would require splitting the query though.