Web Hosting Talk







View Full Version : Determine the Most Frequently Appeared data in Database


nickia
09-09-2009, 04:58 PM
Is there anyway to determine the most frequently entered data in Mysql?

For example, there is a form that users can submit their favorite hobby. What can I do to automatically fetch out the most frequently entered hobby? I also have to filter out unrelated words such as "is, there, going, to, and etc"

Example:

user_id | Hobby
Uesr 1: Hobby Submition: Going to skydive
User 2: Going to the gym
User 3: Skydiving
User 4: playing basketball
User 5: skydive tomorrow


What can I do to automatically fetch out "skydive" as the most frequently appeared data?

mattle
09-09-2009, 11:10 PM
I can't think of a specific query to accomplish this, but if you break it up into a couple of steps you might be able to get some usable results. I might do something like this:

SELECT hobby FROM table

Try to capture unique words into an array. You can think of some common language rules to help with verb tenses, for example:

if one word matches /(.*)ing/ and another matches /$1?[e]/ then you have the same word. Ie, (skydiv)ing would match (skydiv)e and (eat)ing would match (eat).

Once you've accumulated a good list of keywords, you could then run them against the table in a FULL TEXT search and track the result counts. You might want to play around with different search types to see how you like the search results. An advantage here is that common words searched on their own will return no results, so they are automatically filtered out of the results.

ref: http://dev.mysql.com/doc/refman/5.0/en/fulltext-search.html, http://dev.mysql.com/doc/refman/5.0/en/fulltext-stopwords.html

nickia
09-10-2009, 01:46 AM
thanks Mattle!

eticaret
09-14-2009, 11:58 PM
Well in this case, you will have problem, because it is not good idea to create stats from text data. Misspellings, plurals, -ing etc will harden your job and you will get incorrect results.

I advise you to create select-multiple drop list and/or checkboxes.

FreeKill
09-16-2009, 12:10 AM
I would suggest modifying your form where they submit the data to remove the information you don't want before you put it in the database. If you do that, then it would be pretty easy to write an SQL query to get what you want...

mwatkins
09-16-2009, 12:47 AM
Constraining the data might the preferred option for some application requirements, but perhaps the OP wants to leave the choices up to users for a good reason. After all, is a constrained list likely to include "Competitive Rockclimbing" or "Arial Crocheting" ?

Face it, a lot of "data" is free form text. Don't try to fight that all the time... that's what search is for.

Filtering out "stop words (http://www.ranks.nl/resources/stopwords.html)" is a good start.

DevGuy
09-16-2009, 07:44 AM
Well it is possible, you can use select from users where dateupdated is closest to today, and if you need the actual code I can send it over