Web Hosting Talk







View Full Version : Advice: new tables or differenciated rows


hq12
02-05-2007, 09:50 AM
Dear Folks at Webhostingtalk:

Currently I am writing a php script that allow users to insert up to 1000 rows in a mysql database. The script operates in two tables. Now I am wondering, should I write the script, such that each user (I expect to have 4500 users) has a pair of tables or I should only used two tables and insert a new column "username" for differenciating the entries by user?

I will appreciate the advice.

Sincerely,

Hq12

PD: How many rows a mysql table can have?

karlkatzke
02-05-2007, 09:55 AM
Put a column called username in.

I have a MySQL table that has close to 10m rows in it, and it's fast as long as the indexing is set up properly. (And the server has enough RAM...)

The best advice: Stay normalized. You do know what normalization is, right? It means that you use joins to bridge places where you can have one-to-many relationships.

Example: I'm working on an app at work right now that keeps track of kids that go to the university I work for that are in a particular program. We need to collect medical and emergency contact info, because it's possible for kids to get hurt. The current version of the application has columns in the table for kid's mom, kid's dad, and kid's doctor. Well... this is America in the 21st century, OK? We have kids with two moms, kids with two dads, kids with a mom, dad, stepdad, and who isn't talking to any of them, they only deal with their legal custodian, grandpa... it's a ONE kid to MANY caretaker relatives relationship. We just label what kind of caretaker the kid has in all of the different records.

karlkatzke
02-05-2007, 09:57 AM
Oh, and how much information you can pack into a table depends greatly on your operating system: http://dev.mysql.com/doc/refman/5.0/en/full-table.html

Saeven
02-05-2007, 01:34 PM
Oh, and how much information you can pack into a table depends greatly on your operating system:Into a table depends more on your filesystem, not your OS. MySQL will automatically segment values that supercede OS capabilities (file descriptors, etc). Where filesystems are nearly infinitely scaled these days, these are less and less of a concern. Drives are cheap, and easily federated.

Put a column called username in. Almost agree here. Textual identifiers are never recommended, and text indices are more expensive than an integer index. In the spirit of normalization as writ, consider a schema where your table contains:

table: user
========
id, int(32), unsigned, auto-increment, PRIMARY_KEY
username, varchar(64), UNIQUE

table: data
=========
id, int(32), unsigned, auto-increment, PRIMARY_KEY
user_id, int(32), unsigned, INDEX, FOREIGN_KEY( user.id )
other field1
other field2


This way, it's easy to retrieve data rows by id:

SELECT COUNT(*) AS total FROM data WHERE user_id=24

Or by username:

SELECT COUNT(*) AS total FROM data WHERE user_id=(
SELECT id FROM user WHERE username='john'
);


Good luck.
Alex

hq12
02-11-2007, 03:03 PM
Thank you very much Saeven and Karlkatzke, I really appreciate your advice.
Have a good day,
Hq12

karlkatzke
02-11-2007, 03:38 PM
I disagree with saeven that you should never use text fields as a primary key. I don't think it matters so much with modern DBMSes unless you have millions of rows. For just a few thousand, it won't be any slower.

Saeven
02-11-2007, 04:35 PM
I disagree with saeven that you should never use text fields as a primary keyIt's a basic concept that you would learn in any introductory university course. Index hashing of text, is more expensive than that of integers, further, text has special considerations that integers don't have (storage collation, charset, etc).

Chances are if you are using a textual index, it's only for human representation's sake. Integers are a better index in 99% of cases, help you build your databases with tighter schema, reduce the element of textual redundancy (what if your index is a foreign key in 50 tables? weak...). I think in the long run, you might tend to agree. Til then, we can agree to disagree!

Regards.
Alex

karlkatzke
02-11-2007, 09:51 PM
It's a basic concept that you would learn in any introductory university course. Index hashing of text, is more expensive than that of integers, further, text has special considerations that integers don't have (storage collation, charset, etc).

Chances are if you are using a textual index, it's only for human representation's sake. Integers are a better index in 99% of cases, help you build your databases with tighter schema, reduce the element of textual redundancy (what if your index is a foreign key in 50 tables? weak...). I think in the long run, you might tend to agree. Til then, we can agree to disagree!

Regards.
Alex

Alex,

I completely agree with you that numbers work better as indexes from a computer science standpoint, but the type of programming I do is almost completely oriented towards information working with humans. Our general policy with the little webapps we write is that the database needs to be easily human readable in the database without writing a join, so we use a lot of 'enum' fields and use text as keys where it makes sense and where information won't be automatically incrementing.

Ex:
Since we use the conversion 'userid -> user fullname' a lot, and we need to be able to tell who a user is both in the program and in the database, it makes more sense for us to have the userid field be a text string. The text string is converted to the full name in the application, which keeps an array of 'user full name's in an array with 'userid' as the key... but as we've got a max of 150 usernames, it's perfectly OK. Also keeps us from having to run a separate query for doing a user name dropdown.

OTOH, I would never use text as a key for a logging table, a table of trouble tickets, a table of comments on blog posts, etc. That's just asking for trouble. But a table of statuses and their definitions? Text key is perfectly fine. We have very fast hardware and our code is pretty darned efficient anyway -- it's cheaper for us to buy another blade for the server than it is for us to spend another two or three minutes writing a join every time we're troubleshooting something and need to know what a status of '26' for user '2354' means...

hq12
02-12-2007, 09:58 AM
Very good arguments guys, I will take them in consideration.

Saeven
02-12-2007, 10:34 PM
Hi Karl,

I can understand this may be the seemingly easier way to do things, but if your schema is organized like this, it might be of interest, perhaps one day, to revise it...especially if scalability is important to you. It again, has never made sense in my years of experience as an SQL consultant (this is my job) to use a textual or enumeration index. If you have usernames, chances are that you have structures associated to these usernames? If the answer is yes, you should be using an id. There's always an inherent reorganization of a parent entity that can be made to optimize around proper index structure. Where databases are concerned, small quickly becomes big!

Our general policy with the little webapps we write is that the database needs to be easily human readable in the database without writing a join, so we use a lot of 'enum' fields and use text as keys where it makes sense and where information won't be automatically incrementing.With a script application though, this is a recipe for disaster. Your heap size increases in linear proportion to your users. One day, you will bust your heap if your users and active data exceed your allocated memory. Also, startup time will increase linearly with users. Your operational contraints, should never be a function of your data - especially if username queries are cheap.

But a table of statuses and their definitions? Text key is perfectly fine.Until you work in a multilingual environment...also, it's better again to repeat an integer index about than potentially misspell your status! This is a huge source of bugs in my experience.. someone writes OFFLINE instead of offline in a table; and an expected lookup fails. Note also, that by repeating a redundant identifier about, you are introducing a de-normalization. This is redundancy (won't make 1NF).

- it's cheaper for us to buy another blade for the server than it is for us to spend another two or three minutes writing a join every time we're troubleshooting something and need to know what a status of '26' for user '2354' means...

There's no need to use JOIN queries, subqueries are ideal for this, and provide a maximal optimization:

SELECT X.*, ( SELECT username FROM users WHERE id=X.user_id ) FROM data_table X

You'll always get 10000x more performance from SQL optimization, than hardware or code patches/upgrades where data applications are concerned. In any case, I hope this is some food for thought - good luck with your development!

Cheers.
Alex

karlkatzke
02-13-2007, 01:30 AM
Alex -- Thanks. You are perfectly right across the board. We don't deal with any of the issues you pointed out -- i.e. the department I work in has had less than 240 users for the past 20 years and we're monolingual, but they're things I would need to take into account working on other projects. The one point that I will give in my favor is that we *never* allow users to put textual input into key fields; it's almost uniformly done with dropdowns, radios, or some other 'protected' form of interface that keeps things to offline insteaed of OffLine.

Oh, and I swear -- Working with MySQL <= 4 rots the brain. No subqueries -- blech. In 3.23 it was faster to run ten simple queries than it was to run one join, and I still deal with a lot of code from that era. I need to start getting into the habit of using them more often.

Any suggestions on where I can go to update my database practices that won't require me to read a 500 page book? ;)

Saeven
02-13-2007, 01:51 AM
Checking your site, you are already on the right track :) Your first step should be to get rid of anything that is sub MySQL 4.1 however.. Other RDBMS have supported subqueries for quite awhile otherwise. Subqueries are definitely the surgical tool of choice for optimizations, especially in cases where the query optimizer does not give you an expected EXPLAIN result for joins or unions.

Get 4.1, get cracking on Chapter 7 at MySQL.com, and just learn through your creations. If you are really interested, check your local university for an introductory course on schema design and normalization.

If you do find the time to read, a good DBA starter is a book by Gehrke called "Database Management Systems" - it is thick, but is good reading :)

Cheers!
Alex

karlkatzke
02-13-2007, 02:08 AM
Alex, don't count my code by my site. It's all stock wordpress on a shared-hosting server. But I'm working on the structure for what'll become a very large application, so you gave me a lot to think about.