Web Hosting Talk







View Full Version : PHP,MySql: How to search in database?


wayiran
08-13-2006, 08:24 PM
How can I make search in database, which for example in one of rows contains:

"Hello dog cat boy cry help"

and the words for search are:

"helping dogs"

Any idea can be helpfull, even for basic searching ("help dog")

Have you ever had similar project?

Thanks

hehachris
08-13-2006, 10:33 PM
how about using LIKE in your query?
SELECT cols FROM table 1 WHERE col1 LIKE "helping dog%"

ochiba
08-14-2006, 09:48 AM
Don't forget to put a percentage mark in front too:
LIKE "%helping dog%"

tiamak
08-14-2006, 10:18 AM
actually this is not so easy as other guys wrote.

first of all you should split sentence into single words array.
then you should filter out some short words like "in at a the" etc ...
then if you have some predictable suffixes/prefixes in your language you can create a function that will go through your words array and cut off that suf/pre-fixes (i/e in helping you have ing suffix and it would be better to cut if off)
to get rid of lower/upper chars lets say create another function and go through your array and lowercase all words
ok so now you have array with clear nice looking words :)

so you need to make a query now like SELECT * FROM table WHERE LOWER(field) LIKE "%word1%" AND LOWER(field) LIKE "%word2%" and so on ...
ofcourse you can use OR instead of AND

if you want to search in few fileds you can use
SELECT * FROM table WHERE CONCAT(LOWER(field), ' ', LOWER(field)) LIKE "%word1%" ... and so on :)

it is up to u how u want to search :)

anyway next step could be to rate results and output it in some relevancy order.
u can use query like
SELECT something, CONCAT(LOWER(field), ' ', LOWER(field)) AS searchtext FROM table WHERE CONCAT(LOWER(field), ' ', LOWER(field)) LIKE "%word1%"

then after you receive results you place it in another array (or better multiarray)
and check each row for certain words.
lets say
you get such array
a[1][id]="1"
a[1][searchtext]="Hello dog cat boy cry help"
a[1][score]="0"

a[2][id]="1"
a[2][searchtext]="Helping dog cat boy cry help"
a[2][score]="0"

you check each searchtext if certain word exists and add a score
lets say if "help" exists u add 1 to score
if "dog" exists you add 1
if "help dog" phrase exists you add 2
if "help dogs" exists you add 2
if "helping dog" exists you add 2
if "helping dogs" exists you add 3
something like that :)
then you sort your multidimensional array by "score"
and output results for your users :D

ah and this is nice solution for relatively small databases :)
with some huge db and with huge amount of results it could kill your server :)

if i made a mistake somewhere then sorry :) thats just an idea (actually with little modifications i use it on my site :) )

zupanm
08-14-2006, 11:01 AM
actually using LIKE is very slow.. you should look into fulltext searches.. you set that column to fulltext and then its something like

select bla from table where match(col1,col2) against('string');

you can even do relevence against it too to pull out the most relevent strings

tiamak
08-14-2006, 11:24 AM
actually using LIKE is very slow..

as i said it is far to be perfect :)


you should look into fulltext searches.. you set that column to fulltext and then its something like

select bla from table where match(col1,col2) against('string');

you can even do relevence against it too to pull out the most relevent strings

very very nice :) never heard about it
anyway im reading about it right now and i see it also has some disadvantages
however boolean fulltext search looks pretty impressive :)
man u destroyed my day plan :buck: now instead of working i will read about this new kewl thing :D

---
aaaa i just tried it :) works a little faster than with LIKE and query is 1000000000 times simplier :D
zupanm you are my god today :D

ochiba
08-14-2006, 11:52 AM
how's the speed for multiple queries for simple searches like the above example "help dog"?

BTW: I read about this a while ago and didn't try it because my db was small and LIKE sufficed. However, I'm rewriting my site and need to optimize the monster it has become, so thanks zupanm for the remind!

tiamak
08-14-2006, 12:17 PM
how's the speed for multiple queries for simple searches like the above example "help dog"?


on my db with about 15 000 rows query with "... match ... against..." took 0,037 sec
and query with "... like ..." 0,092 sec :) so it is a big difference but still unnoticeable

however query with match agains looks really simple and easy to understand
and that old one with lots of LIKEs is huuuuuge and both do almost the same :)

wayiran
08-14-2006, 09:42 PM
Tiamak
Zupanm
Ochiba
Hehachris

Thanks a lot for your usefull information
But my database will have 2 billion rows!!!!!!!! Really!!!!!
What should I do? should I use Oracle instead of mysql, or its of no difference?
If I use "match... against...", It would be enough fast to show the results in less than 2 seconds? or it takes hours to do!?
Whats your suggestion?

wayiran
08-14-2006, 09:49 PM
AS I calculated if it takes 0.03 sec for 15000 then It will take 1.1 hour for 2,000,000,000 If we dont consider the decrease in speed as times goes!!!!

Then How google can search in 0.15 seconds for 21,610,000,000 websites(result of searching "a" in google)? Do you know which database they use?

Czaries
08-15-2006, 05:43 PM
It's not like you can download some Google software and get really fast searches. Both their operating systems and databases are custom built and fine-tuned for super fast searching. You're better off sticking with MySQL and PHP as they are both free and widely distributed.

And I am not sure that time scaling would be accurate either. Fulltext searching works by creating a list of keywords from the columns you pick as fulltext, and then just searching those, so it's MUCH more efficient for very large-scale searches (where as LIKE goes through each row in the table, which turns out to take a bunch more time).

maxymizer
08-16-2006, 01:51 AM
AS I calculated if it takes 0.03 sec for 15000 then It will take 1.1 hour for 2,000,000,000 If we dont consider the decrease in speed as times goes!!!!

Then How google can search in 0.15 seconds for 21,610,000,000 websites(result of searching "a" in google)? Do you know which database they use?

Your maths is way wrong, you cannot calculate it in such manner. Database searches are heavily optimised and if indexed - databases do not search EACH row of the table (that's why indexes exist). You can obtain more info at Wikipedia if you're interrested about the subject.

As Czaries pointed out, Google use their own db and OS and their entire architecture is different to a common computer / web server so you cannot compare them really.

Also, I must say I'm a little bit surprised - a database with 2 billion rows is quite a large database..I don't often see people with limited knowledge on databases to work with such large databases (don't take my critic personally).

Oracle is a bit expensive, and MySQL 5.x is pretty much stable and lightning-fast database. If I had limited resources (money), I'd go with MySQL hands down (or PostgreSQL) but there are other speed-wise tweaks you'll have to take into account (choosing proper hardware as hdd's, cpu, RAM etc) for such a db not to be slow while searching.

wayiran
08-16-2006, 03:13 AM
Your maths is way wrong, you cannot calculate it in such manner.

AS I calculated if it takes 0.03 sec for 15000 then It will take 1.1 hour for 2,000,000,000 If we dont consider the decrease in speed as times goes!!!!

As Czaries pointed out, Google use their own db and OS and their entire architecture is different to a common computer / web server so you cannot compare them really.


thanks

Also, I must say I'm a little bit surprised - a database with 2 billion rows is quite a large database..I don't often see people with limited knowledge on databases to work with such large databases (don't take my critic personally).


its an image search engine, which finds images from websites by robot...
Probably it may find 2 billion images...
So, you think is it ok to use mysql?

maxymizer
08-16-2006, 05:36 AM
AS I calculated if it takes 0.03 sec for 15000 then It will take 1.1 hour for 2,000,000,000 If we dont consider the decrease in speed as times goes!!!!


When it comes to databases and indexed searching, maths used to calculate number of traverses trough its records is not linear but logarithmic as they use binary search algorithm.

I'll use the easiest method possbile and most basic example to illustrate what I mean.

Problem: you have a deck with 32 cards (8 hearts, 8 diamonds, 8 spades and 8 clubs). I pick a card, you have to guess which card I picked. Also, you get to ask questions so you can eliminate cards I don't have.

Using your logic, the maximum number of questions you would have to ask is 31. Well not quite - it's actually 5 questions to ask to obtain the card I hold.

Solution: there are 32 cards, 16 red, 16 black cards. First question: is it black? Depending on answer, you successfully eliminated 16 cards.
If I say "no", you immedeately know I hold a red card. Then you can ask if it's hearts.
Again, depending on my answer you successfully cut number of options to 8 cards and so on until you reach the final card I hold.

More info on binary search and examples can be found at: http://en.wikipedia.org/wiki/Binary_search

That's the illustration of binary search alghorithm that's been put to good use in modern computer systems.
Using your logic - "if it took 0.03sec for 15K rows, it will take 1.1hour for the entire db" - that's plainly wrong as the system doesn't work that way as it won't scan the entire data file of the database, but only the index file which is alot smaller and then the binary search algorithm is used to scan that file.

So to give the final answer - if you have enough storage space, fast and reliable hdds - you're pretty much safe with MySQL.

wayiran
08-16-2006, 09:53 AM
maxymizer (http://www.webhostingtalk.com/member.php?u=112641)

thanks... I must read more about it...

discobean
08-17-2006, 08:06 AM
If you are using Oracle you can't use match() and against() functions, there is no such thing.

I suggest to stick to a database that you understand, because if Oracle is tuned by somebody who does not know Oracle, it will work just as well as if it were MySQL tuned by somebody who dosn't know MySQL. (or most other Databases)

Also, indexing 2 billion images, have you thought about how _much_ time that will take?+