Results 1 to 6 of 6
  1. #1

    Database design question: Indexing

    From what I've read, indexing columns helps optimize DB performance by speeding up WHERE clauses. However, I had a question about speeding up a WHERE clause in which a calculation is involved.
    Currently I am using MS SQL 2005, but plan on migrating my app to a MySQL DB.

    The purpose of this query is to find all users with an empire_size between half and double size of the user. However, empire_size is not currently a column in the DB but rather a computation that equals planets * 2.5 + moons.

    Here is a brief summary of my query:

    strSQL = "Select * from table where ". $userSize/2 . " >= Planets*2.5 + Moons AND ". $userSize*2. " <= Planets*2.5 + Moons"

    This query is one of the most executed queries in my application, and there will be anywhere from 50,000 to 200,000 records in the table, so optimizing this query is very important.
    My question is, is the query efficient enough as it is? Or would it be more efficient to add a column called "empire_size" and index it, and have it re-compute every time a record is updated (MS SQL already has computed columns, but I dont think mySQL does)? Or is there a another faster option that I haven't thought of?

    Any help would be appreciated.

  2. #2
    Join Date
    Apr 2007
    Location
    Bath, UK
    Posts
    132
    Well it'd definately be more efficient for the select if you have an empire_size column that is indexed.

    The question is how often will that column get updated? Maybe it would be worth updating empire_size for the whole table periodically, say every 15 minutes to reduce server load?

  3. #3
    Quote Originally Posted by Dolbz View Post
    Well it'd definately be more efficient for the select if you have an empire_size column that is indexed.

    The question is how often will that column get updated? Maybe it would be worth updating empire_size for the whole table periodically, say every 15 minutes to reduce server load?
    Its a column that would get updated a lot. On average (from the beta test) moons/planets gets updated about every 10 minutes per record, but at random intervals / bursts. On a DB of 50,000 rows there will be 300,000 updates to empire_size per hour.

    Keeping empire size 100% up do date will reduce server load in other ways, however, by preventing the User from clicking on an empire's profile that has recently moved out of their range.

    Without 15 minute script, each update statement will look like:
    "Update Planets=$P, Moons=$M, EmpireSize=$P*2.5+$M WHERE UserID=$UserID"

    Would the 15 minute script, which removes the need for the bolded clause in the update statement, help in this scenario?

  4. #4
    Also I don't know if this affects indexing or not, but the Query will only return 10 records maximum, and there is one other field looked at in the query.

    The full query is summarized to be something like this:
    strSQL = "Select TOP 10 * from table where ". $userSize/2 . " >= Planets*2.5 + Moons AND ". $userSize*2. " <= Planets*2.5 + Moons and Alert<=100 order by checksum(new_identity())"

    where checksum(new_identity()) is used to randomize the 10 records retrieved. Will indexing empire_size matter at all if Alert isn't indexed? Will indexing even help at all for very dynamic content?

  5. #5
    Join Date
    Aug 2001
    Location
    Central USA
    Posts
    200
    Just out of curiosity, why are you switching from MS-SQL to MySQL? MS-SQL is a more robust database with many more advanced features that may be very useful in your situation like computed columns, stored procedures, views, etc. Stored procedures and views are available in MySQL 5, but are still a little clumsy to use in practice.

    On a side note, you will have to change this query:
    Code:
    strSQL = "Select TOP 10 * from table where ". $userSize/2 . " >= Planets*2.5 + Moons AND ". $userSize*2. " <= Planets*2.5 + Moons and Alert<=100 order by checksum(new_identity())"
    ... to THIS for MySQL:
    Code:
    strSQL = "Select * from table where ". $userSize/2 . " >= Planets*2.5 + Moons AND ". $userSize*2. " <= Planets*2.5 + Moons and Alert<=100 LIMIT 10"
    Randomization is also a bit clumsy in MySQL currently... You generally have to use two queries to do it right (since ORDER BY RAND() is *very* slow for large datasets).
    InvoiceMore - Online Billing & Invoicing
    phpDataMapper - Object-Oriented PHP5 Data Mapper ORM

  6. #6
    Quote Originally Posted by Czaries View Post
    Just out of curiosity, why are you switching from MS-SQL to MySQL? MS-SQL is a more robust database with many more advanced features that may be very useful in your situation like computed columns, stored procedures, views, etc. Stored procedures and views are available in MySQL 5, but are still a little clumsy to use in practice.
    I'm moving my application onto a LAMP server due to budget constraints. This is a facebook app but I am paying for the server with my own $$$ and decided to go the LAMP route. MS SQL was available when I used a shared host, but a shared host wasn't sufficient for my needs. Stored procedures are important to my application, but I've already learned the stupid mySQL syntax . But I agree - computed columns and SPs made life very easy on MS SQL.

    Quote Originally Posted by Czaries View Post
    On a side note, you will have to change this query:
    Code:
    strSQL = "Select TOP 10 * from table where ". $userSize/2 . " >= Planets*2.5 + Moons AND ". $userSize*2. " <= Planets*2.5 + Moons and Alert<=100 order by checksum(new_identity())"
    ... to THIS for MySQL:
    Code:
    strSQL = "Select * from table where ". $userSize/2 . " >= Planets*2.5 + Moons AND ". $userSize*2. " <= Planets*2.5 + Moons and Alert<=100 LIMIT 10"
    Randomization is also a bit clumsy in MySQL currently... You generally have to use two queries to do it right (since ORDER BY RAND() is *very* slow for large datasets).
    Yea I changed that - and I've done some research, I'll probably do the select a random index and then LIMIT from that. I've also read that LIMIT will perform an indexed search anyway, and since I have another more static column in the WHERE clause that I can index (Lvl), my hope is that it will avoid a full table scan and will only have to do ~50-100 computations (Planets *2.5 + Moons) per query within the LIMIT 10 and Lvl constraints.

    Of course if my assumption is wrong please let me know.
    Last edited by phoqoo; 01-16-2008 at 12:23 AM.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •