Web Hosting Talk







View Full Version : My SQL search is dumb. Any way to smarten it up?


monkey junkie
06-15-2007, 09:00 AM
Hello,

One of my websites http://www.dole.ie has very dumb SQL.

All the SQL searches are '%keyword%' based.

This means if you search for "care assistants" you get things like "Personal assistant career" etc.

Changing the SQL to ' %keyword% ' isn't really good enough.

Can any of you think of any ways to make "care assitant", "care other words assistant", "assistant other words care" be found in a large bit of text, while at the same time ignoring things like "career assistant"?

Any help greatly appreciated.

Thank you.

mwatkins
06-15-2007, 10:45 AM
SQL isn't itself optimized for full text searching, although some SQL engines do have full text search capabilities - Postgres for one. MySQL has one too, but I am unfamiliar with its capabilities - whether it supports "exact" word matches or not is unknown to me.

If you are searching for the words (as opposed to substring matches) "care" and / or "assistant" then the query needs to understand that you only want *words* not substring matches and that generally implies using a regex pattern match for word_delimeter + word + word_delimeter i.e. " care " or " care," or " assistant."

Your SQL engine may provide that capability (regex searches - Postgresql see the *substring* expression), it may not. Such capabilities are not part of the standard.

Alternatively, depending on the size of the information corpus being queried, you might want to do your string matching elsewhere - either in code, or via a real full text index. Searching for " care " within a text field means that your query is (likely) going to examine every row in the table (unless you have a where clause limiting the scope of the query). If your database is small, who cares. But doing a full table scan of a very large table containing significant text data means a huge impact on performance and load on the machine, particularly if it'll be done frequently. If the latter is true, you are best off integrating a capable full text engine with your application.

MysticServer
06-15-2007, 02:06 PM
assuming mySQL (it is the most popular) if you want to search for occurrences of care and assitant you could do

sName LIKE "% care %" AND sName LIKE "% assistants %" would do exactly what you wanted.

If you need something fancier.. mySQL supports "regex" (regexp command).. but RegEx is a whole 'nother nightmare.

-JasonR

mwatkins
06-15-2007, 02:15 PM
Regex shouldn't be a nightmare but a useful tool.

The example you've given won't match:

"care. "
"care, "
"care! "
"assistant. "
..
and so on. Regex is the SOLUTION not the problem.

Edit:

PostgreSQL Pattern Matching: http://www.postgresql.org/docs/8.1/static/functions-matching.html

There are three separate approaches to pattern matching provided by PostgreSQL: the traditional SQL LIKE operator, the more recent SIMILAR TO operator (added in SQL:1999), and POSIX-style regular expressions. Additionally, a pattern matching function, substring, is available, using either SIMILAR TO-style or POSIX-style regular expressions.

Tip: If you have pattern matching needs that go beyond this, consider writing a user-defined function in Perl or Tcl.

mwatkins
06-15-2007, 02:29 PM
Regex example. Python, but the same patterns are generally supported by most regular expression implementations. \b means word boundary.


--> # Python code:
--> import re
->> string = '''Care assistant. Part time job available for a Care-Assistant
Level II. Must be familiar with all types of assistive care.'''
->> word_pattern = r'\b%s\b'
->> re.findall(word_pattern % 'care', flags=re.IGNORECASE|re.MULTILINE)
['Care', 'Care', 'care']

Note how nearby punctuation and case don't matter... this is what you want to get back from your database. If your database can't do it, get another database.

Xeentech
06-15-2007, 03:09 PM
Heres your basic Fulltext search for a MySQL db

SELECT id, title, name, uid, dt, MATCH (title,body) AGAINST (?) AS score FROM h_pages2 WHERE MATCH (title,body) AGAINST (?)


fulltext search also has a number of different options to make it match in different ways.

Some forums and CMSs take a different approach and make a table/database of words and link the words to posts/pages they were spotted in.