Web Hosting Talk







View Full Version : Mysql ENUM and normalization


innova
07-18-2005, 12:09 PM
I am designing a small to moderate size database app (couple thousand records, a handful of users) and I have come across a bit of a problem with my table design and structure.

It seems to me after analyzing my initial structure that I chose the ENUM type for certain data for pure convenience, but it actually appears to break some of the rules of proper DB design. I am torn between keeping it this way for ease versus looking at the advantages of breaking this table into more tables.

Here is an example:


CREATE TABLE reversal (
reversal_id MEDIUMINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
agent ENUM('Drug1','Drug2','Drug3') NOT NULL,
dose VARCHAR(40) NOT NULL,
freq VARCHAR(40) NOT NULL,
date_given DATE NOT NULL,
patient_id INT UNSIGNED UNIQUE NOT NULL,
FOREIGN KEY (patient_id) REFERENCES patients (patient_id)
) TYPE = INNODB;


This sql creates a table intended to record use of reversal agents for certain drug overdoses. I have changed the names to make the example shorter. My entire rationale for initially doing it this way was to make it easy to select from radio buttons on a web-form as to which agent was used.

Doesnt this break the rules of proper table design? Would it not be better to have two tables, one to track when a reversal agent was given and another to setup the table of agents? On the plus side, this would make it easier to migrate to another database backend. Additionally, it makes the admin interface simpler, as it is easier to add another record for another drug than to do an ALTER TABLE on the ENUM record.

On the other hand, it seems silly to define a whole new lookup table for what will probably only be about 10 items.

While on the ENUM subject, what is a better solution:

ENUM('Yes','No') NOT NULL

OR

TINYINT(1) NOT NULL #where 1=true, 0=false

hiryuu
07-18-2005, 06:28 PM
If you need to track additional information about the agents (brand, product codes, side effects), then you should definitely make a separate table for them. If you're only tracking the name, then it's perfectly valid design. You may still want a table for valid inputs, since I'd imagine you maintain a 'valid values' list at the application level now.

Note that you can still use the agent name in that table either way. Nothing in SQL design dictates that foreign keys must be numbers (agent ID or whatever).

As for Yes/No, an enumeration is easier at the form level, but 1/0 works better at the code level. From a pure design point, Yes/No or True/False is a more correct expression of that data. (Note that MySQL's access tables use Y/N.)

error404
07-18-2005, 06:58 PM
It's not valid dsign to use enums, and MySQL are really morons for including it. What it does is put actual data into the DB schema, forcing you to modify the table structure when your data changes. This is a fundamental thing you should never do. The schema shouldn't contain any data-specific content whatsoever.

MySQL doesn't have a boolean type, and that's about the only place I can think of that I'd consider an ENUM acceptable design. IIRC enums are stored as VARCHAR in MySQL, so you'd do better with the integer representation of a boolean.

Or you could switch to a real database that does things properly ;).

innova
07-18-2005, 08:59 PM
Thanks for the replies. After thinking about it further, I am going to break this thing out into several tables and do this correctly. I dont think I want to be doing ALTER TABLE's to accomadate future additions, and there is a remote possibility that this thing might have to run on MSSQL as a backend.

With that in mind, I think I will use TINYINT(1) for true/false instead. Thank you for the perspectives.

hiryuu
07-19-2005, 04:58 AM
error404: I generally agree with what you said, but ENUM and SET fields are stored as bit values and bit maps. That makes them much smaller to store and index than a VARCHAR, which is the compelling reason MySQL included them. They're terrible from a design standpoint, but sometimes performance is the more pressing issue.

error404
07-19-2005, 05:08 AM
If performance is really that great an issue, it could be decomposed to a relation (and to be properly normalized, it probably should be anyway). Anyway, there's no excuse to using them.

ENUMs but no boolean type just blows my mind...

innova
07-19-2005, 12:48 PM
Seems more than just mysql do not have a 'true' boolean datatype. I don't think its THAT big of a deal.

At the other extreme, you have postgresql which has 'geometric' data types among other things. Mysql is not the best database product in the world, but its free, its fairly featureful (and getting better), ubiquitous, and easy to work with.

error404
07-20-2005, 04:45 AM
It's not a big deal, using a tinyint is just fine. I'm just really puzzled as to why they'd provide a 'small set of valid values' data type without providing a boolean type. For some reason they seem to go together, and from the way MySQL's development has proceeded, I would've thought it was something they would add. Anyway, I suspect that the inclusion of ENUMs goes back to the lack of referential integrity constraints in prior versions, but we're way off the mark now.

The 'proper' way to implement an ENUM is to create another relation and have a foreign key in your table. This is both correct and fast, but it only works if foreign keys are properly supported, otherwise I really can't think of a better solution than ENUMs, which I really don't like the idea of at all.

Just to get things off-topic again, innova do you know if MySQL is planning on making InnoDB the default table type anytime soon?

innova
07-20-2005, 11:05 AM
I am not entirely sure on the direction of mysql with regard to its 'default' table type.

It seems that InnoDB would not be default as mysql has always been regarded for speed, and InnoDB can be significantly slower with some operations.

In a book I have (Mysql Database Design and Optimization, great resource BTW) the authors mention MyISAM tables getting foreign key support sometime in the 5.x series.

Having multiple table types as mysql does has created both opportunities and some confusion. Perhaps in the future they will just borrow the better features from each into MyISAM. Ever since InnoDB was available and usable, I always wondered why or who uses BDB tables anymore, for example.

I agree with you on the Boolean issue. Its the simplest datatype, and does *really* need to be there. It would seem that implementing that would be probably the simplest one to do, and I am left wondering why it hasnt been done already. Tinyint(1) is workable, but you could always insert a 2 instead of a 1, or something else bone-headed that may cause trouble later.

The 'proper' way to implement an ENUM is to create another relation and have a foreign key in your table.

For another table in my project that is exactly what I did, moving those values to another table and linking them in. After much thought about the matter I think ENUM's in general really are not THAT useful due to MySQL's erratic behavior with inserting incorrect values.

For example:

Create an ENUM table ('yes','no') NOT NULL
Insert the value 'Dog'
Watch it get truncated to a NULL value.

Doh!