Okay, so here’s a question I would normally pose to a mailing list, but since my email setups are so jacked up at the moment, there really isn’t a good way I could subscribe and ask one, so I’ll just post this to my blog instead and hope for some input. Not that I don’t have anything against mailing lists, mind you, it’s just that I don’t like setting up an email account and subscribing when I post maybe three times a year, and lurk and read the rest of the time.
Anyway. At work, I was looking at cleaning up one database table of a project I’m working on, and I noticed that we have three ways that we are storing boolean values in the table:
- unsigned tinyint, which presumably would only be set to 0 or 1
- char(1), which also should be set to 0 or 1
- enum(‘y’,n’)
I, personally, always prefer the tinyint route. Not really for a technical reason so much as a historical one … it’s just kind of the first one I picked. What I would really like is if MySQL had a *real* boolean type field similar to postgresql, where the values can be TRUE, FALSE, ‘t’ or ‘f’. It makes things so much easier.
MySQL will accept BOOL as a column type when creating a column, but it’s implementation is a bit jacked in my opinion. It creates an unsigned tinyint column, with null attributes. That just gives a huge range of possible options, and doesn’t really come that close to a binary option set at all.
mysql> create table test (steve bool);
Query OK, 0 rows affected (0.00 sec)mysql> desc test;
+——-+————+——+—–+———+——-+
| Field | Type | Null | Key | Default | Extra |
+——-+————+——+—–+———+——-+
| steve | tinyint(1) | YES | | NULL | |
+——-+————+——+—–+———+——-+
1 row in set (0.00 sec)
I did a bit of research, since enum seems now like the most reasonable option — it limits you to a strict sub set of options. The only question I have is, how well would that index? Would it be faster scanning the table for enums or integers? That’s where I’m not sure. It turns out that an enum that stores up to 255 values will only use 1 byte (assuming I’m intrepreting this MySQL reference book correctly). A tiny integer uses the same size. So it seems like they should both be pretty optimal, but I dunno.
Any thoughts?
I’m not expert on the intricacies of mysql (or any sql, for that matter), but intuition leads me to believe that the tinyint(1) unsigned would be the quickest. No idea if that’s the case or not, but it stands to reason, IMO.
there is a forth way, CHAR(0) DEFAULT NULL, using only one bit instead of one byte.
Generally I do use tinyint 0 == False.
Indexed are used or not based on distribution of it’s values. A primary index which has exactly one value for each row is a very specific one, and will be allways preferred and used.
On the other side, an index on a boolean value may be valid for 50% of rows in the table and will be ignored (full table scan preferred) to avoid a double acces to disk, table and indexes.
As a general rule, if indexes cover more columns, put the columns which are more specific first, provided the distribution of 50% of boolean values they should be last (if in index at all).
Only exception to the last rule is when you need to match only for part of the index i.e. the index is on column (`a`,`b`,`c`) and you often query for `b` and `c`, in this case it always should be composed as (`b`,`c`,`a`).
indexing is a complex matter if I’ve been unclear MySQL has good manuals 😉
I don’t know mysql and every time I read about it, I realise I don’t want to 🙂
y,n,NULL instead of y,n is hardly a huge range of possibilities.
Can’t you define a BOOL NOT NULL column ?
Or a BOOL DEFAULT true ?
About indexing, I guess you should not expect too much from mysql, but bitmap indexes have been implemented for such cases and avoid full table scans when querying on such an index.