
|
View Full Version : My SQL search is dumb. Any way to smarten it up?
monkey junkie 06-15-2007, 09:00 AM Hello,
One of my websites http://www.dole.ie has very dumb SQL.
All the SQL searches are '%keyword%' based.
This means if you search for "care assistants" you get things like "Personal assistant career" etc.
Changing the SQL to ' %keyword% ' isn't really good enough.
Can any of you think of any ways to make "care assitant", "care other words assistant", "assistant other words care" be found in a large bit of text, while at the same time ignoring things like "career assistant"?
Any help greatly appreciated.
Thank you.
mwatkins 06-15-2007, 10:45 AM SQL isn't itself optimized for full text searching, although some SQL engines do have full text search capabilities - Postgres for one. MySQL has one too, but I am unfamiliar with its capabilities - whether it supports "exact" word matches or not is unknown to me.
If you are searching for the words (as opposed to substring matches) "care" and / or "assistant" then the query needs to understand that you only want *words* not substring matches and that generally implies using a regex pattern match for word_delimeter + word + word_delimeter i.e. " care " or " care," or " assistant."
Your SQL engine may provide that capability (regex searches - Postgresql see the *substring* expression), it may not. Such capabilities are not part of the standard.
Alternatively, depending on the size of the information corpus being queried, you might want to do your string matching elsewhere - either in code, or via a real full text index. Searching for " care " within a text field means that your query is (likely) going to examine every row in the table (unless you have a where clause limiting the scope of the query). If your database is small, who cares. But doing a full table scan of a very large table containing significant text data means a huge impact on performance and load on the machine, particularly if it'll be done frequently. If the latter is true, you are best off integrating a capable full text engine with your application.
MysticServer 06-15-2007, 02:06 PM assuming mySQL (it is the most popular) if you want to search for occurrences of care and assitant you could do
sName LIKE "% care %" AND sName LIKE "% assistants %" would do exactly what you wanted.
If you need something fancier.. mySQL supports "regex" (regexp command).. but RegEx is a whole 'nother nightmare.
-JasonR
mwatkins 06-15-2007, 02:15 PM Regex shouldn't be a nightmare but a useful tool.
The example you've given won't match:
"care. "
"care, "
"care! "
"assistant. "
..
and so on. Regex is the SOLUTION not the problem.
Edit:
PostgreSQL Pattern Matching: http://www.postgresql.org/docs/8.1/static/functions-matching.html
There are three separate approaches to pattern matching provided by PostgreSQL: the traditional SQL LIKE operator, the more recent SIMILAR TO operator (added in SQL:1999), and POSIX-style regular expressions. Additionally, a pattern matching function, substring, is available, using either SIMILAR TO-style or POSIX-style regular expressions.
Tip: If you have pattern matching needs that go beyond this, consider writing a user-defined function in Perl or Tcl.
mwatkins 06-15-2007, 02:29 PM Regex example. Python, but the same patterns are generally supported by most regular expression implementations. \b means word boundary.
--> # Python code:
--> import re
->> string = '''Care assistant. Part time job available for a Care-Assistant
Level II. Must be familiar with all types of assistive care.'''
->> word_pattern = r'\b%s\b'
->> re.findall(word_pattern % 'care', flags=re.IGNORECASE|re.MULTILINE)
['Care', 'Care', 'care']
Note how nearby punctuation and case don't matter... this is what you want to get back from your database. If your database can't do it, get another database.
Xeentech 06-15-2007, 03:09 PM Heres your basic Fulltext search for a MySQL db
SELECT id, title, name, uid, dt, MATCH (title,body) AGAINST (?) AS score FROM h_pages2 WHERE MATCH (title,body) AGAINST (?)
fulltext search also has a number of different options to make it match in different ways.
Some forums and CMSs take a different approach and make a table/database of words and link the words to posts/pages they were spotted in.
|