I'm running a server that handles logging and reporting. It's pretty processor intensive. I have a list of numbers that i'd like to update the server with, approximately 100,000-1,000,000 numbers that have a URL attached to them in a seperate field. I'm being told that it is taking too long for the database to be query'ed and spit out an answer...what can I implement to make the query go faster?
So far, my thoughts are:
#1) Remove the keyword part of the schematics to a seperate server and run the querys there and report back the answer.
#2) Update and improve the technical aspects of the server (more RAM, processors, etc)
Yep, an index will slow things down a little bit on insert. If the insert is done as a batch type operation then you can disable indexes during insert and re-enable it after.
A million rows, especially given the small amount of data you're looking at (an int and a varchar() of some length) is not alot of data. In my "day job" I deal with 10's of millions of rows with a select time on the order of 20ms for a particular row. This is on a pretty lowly machine - a 450Mhz UltraSparc II. But our data is primarily read only so we've got indexes tuned pretty well.
Is your data heavily updated or is it pretty static (say < 10% changed in an hour)? That should help you figure out the way to improve database performance. If you have no indexes on the table then, for every read, you will average reading half the table to find you data. This is called a full table scan and it is the worst possible thing you can do for performance - especially if the database has to read and ignore 500,000 rows.
Originally posted by dherman76 I'm the strategy guy - and the technology guys dont want to add the keyword parts because they say it will load too slow. How many rows can a normal db hold? Thanks for all your responses.
Not necessary - a particular query, say
"select id, url from my_table where id = 12345" (which is kinda what you said you were doing) can run in a few ms. on a well tuned database or in tens of seconds on a poorly tuned db. In this case it all depends on the indexes. In my query above if there is an index/primary key constraint on the id column then the query should be very fast. If, instead, the table is not created with something like that the dataserver (MySQL, PostgreSQL, etc.) must read each row from the table, comparing the value in the id column until it finds a match. Depending on the distribution of your data you would likely average reading half of the table on a set of queries.
So in this case the query could be exactly the same and it is the table in the database that needs to be tuned.
Throwing more hardware at a bad software design, while popular, is not something I would recommend. If your developers cannot tell you what first normal form is then it may be time to look for new developers. Sorry to be so harsh but the problem you're trying to solve is really not that tough for anybody who has done real database design.
Originally posted by dherman76 So, the inefficiencies come from either:
1) poorly coded query script
2) slow hardware
3) seperate out different functions to different servers
My biggest question would be how is the data indexed? Secondly, what tuning has been done with the queries? If I were to help you here I'd need:
[list=1][*]The DDL (data definition language) for the table - i.e. the create table script, including any indexes that have been added.[*]The queries that are slow. That is, regardless of the programming environment you are using (PHP, Java, etc.), at the database interface layer there are queries running that are independent from the PHP/Java/etc. We would need to see the query or queries that are slow to get a better feel for why they are slow.[*]One row of sample data with the assumption that the rest of the data is conceptually the same.[/list=1]
If this cannot be provided because of IP type issues, I'd encourage you to bring in a consultant under an NDA to help. Based on what you've described it does not sound like alot of work though there certainly may be something I'm missing.
Originally posted by dherman76 Gotcha - so what is the questions should I be asking?