Category Archives: Databases

standardizing on booleans in mysql

Okay, so here's a question I would normally pose to a mailing list, but since my email setups are so jacked up at the moment, there isn't a good way for me to subscribe and ask one, so I'll just post this to my blog instead and hope for some input. Not that I have anything against mailing lists, mind you; it's just that I don't like setting up an email account and subscribing when I post maybe three times a year and lurk the rest of the time.

Anyway.  At work, I was looking at cleaning up one database table of a project I’m working on, and I noticed that we have three ways that we are storing boolean values in the table:

  • unsigned tinyint, which presumably would only be set to 0 or 1
  • char(1), which also should be set to 0 or 1
  • enum('y','n')

I, personally, always prefer the tinyint route. Not really for a technical reason so much as a historical one … it's just kind of the first one I picked. What I would really like is if MySQL had a *real* boolean column type like PostgreSQL's, where the values can be TRUE, FALSE, 't' or 'f'. It makes things so much easier.
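For comparison, here's roughly what that looks like in PostgreSQL (a quick illustration; the table name is made up):

-- A real boolean column: only true, false, or NULL can ever be stored.
CREATE TABLE bool_demo (flag BOOLEAN NOT NULL DEFAULT FALSE);
-- All of these are accepted spellings of the two boolean values.
INSERT INTO bool_demo (flag) VALUES (TRUE), (FALSE), ('t'), ('f');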

MySQL will accept BOOL as a column type when creating a column, but its implementation is a bit jacked in my opinion. It just creates a nullable tinyint(1) column. That allows a huge range of possible values, and doesn't really come close to a binary option set at all.

mysql> create table test (steve bool);
Query OK, 0 rows affected (0.00 sec)

mysql> desc test;
+-------+------------+------+-----+---------+-------+
| Field | Type       | Null | Key | Default | Extra |
+-------+------------+------+-----+---------+-------+
| steve | tinyint(1) | YES  |     | NULL    |       |
+-------+------------+------+-----+---------+-------+
1 row in set (0.00 sec)

I did a bit of research, since enum now seems like the most reasonable option: it limits you to a strict subset of values. The only question I have is, how well would that index? Would it be faster scanning the table for enums or integers? That's where I'm not sure. It turns out that an enum with up to 255 members only uses 1 byte of storage (assuming I'm interpreting this MySQL reference book correctly). A tinyint uses the same size. So it seems like they should both be pretty optimal, but I dunno.
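For reference, here's a minimal sketch of the two one-byte approaches side by side (the table and column names are just made up for illustration):

-- Both columns occupy one byte per row in MySQL.
CREATE TABLE bool_test (
    flag_int  TINYINT UNSIGNED NOT NULL DEFAULT 0,   -- 0 or 1 by convention only; nothing stops a 7
    flag_enum ENUM('y','n')    NOT NULL DEFAULT 'n'  -- restricted to the two listed members
);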

Any thoughts?

3 Comments

Filed under Databases

mysql ordering by string with possible blank entries

I just found a workaround for something I've always wanted to do in SQL. I'm using MySQL 5 at work, and I had a query that ordered the entries by a varchar column. Since that column could be blank, it would display all rows with a blank entry first, and then go alphabetically from there.

So, the order would be something like: '', '', '1', '2', '3', 'A', 'B', 'C'.

What I really wanted was to display the blank ones last, since I wasn't interested in those. I poked through the available string functions to see if I could conjure up a hack, and ASCII() works great: it returns the ASCII code of the first character in the string, and if the string is empty, it returns zero. That's all I needed: a binary flag to order by first.

Here’s a sample query then:

SELECT string FROM table ORDER BY ! ASCII(string), string;

And the result would be: '1', '2', '3', 'A', 'B', 'C', '', ''.

Perfect. :)
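For anyone who wants to try it, here's a self-contained version of the same trick (the table and column names are made up for the demo):

-- Hypothetical demo table; 'items' and 'label' are illustrative names.
CREATE TABLE items (label VARCHAR(10) NOT NULL);
INSERT INTO items (label) VALUES ('B'), (''), ('1'), ('A'), (''), ('3'), ('2'), ('C');

-- !ASCII(label) evaluates to 1 for empty strings and 0 for everything else,
-- so the empty labels sort after all the rest.
SELECT label FROM items ORDER BY !ASCII(label), label;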

2 Comments

Filed under Databases

postgres and mysql comparison paper

I've been job hunting, and while my dream job would be somewhere that uses PostgreSQL, I am having an extremely hard time finding anyone that uses it. So, I think my chances might be better at actually getting a company to convert to it instead. To that end, I've started outlining a draft of a paper that I can present to lead programmers, database administrators, and management on the pros of using PostgreSQL over MySQL. If anyone has ideas I could add, I would appreciate it.

Here are the general topics I already plan on covering: foreign key support, data types, transactions, the shell interface, ANSI SQL support, table types, general features, history, licensing, and abstraction layers (using PHP).

Also, and I don’t mean to sound like I’m spreading FUD, but it occurred to me this morning that I’ve never heard anyone say that MySQL is better than PostgreSQL.

Anyway, ideas welcome. I’ll post my progress as I get the paper put together. This is something I’ve been meaning to do for a long time.

6 Comments

Filed under Databases

prepared statements and stored procedures

I’m still working on cleaning up the import scripts for GPNL, and I’m going to have to start using PHP’s PDO database layer to connect to an SQLite3 database at one point.

I haven't used it yet, though I'd heard for a while that it was coming in PHP 5. Personally, I've always used PEAR::DB and was quite happy with that.

I'm still not sold on the new layer anyway, but I figured I'd do some reading while getting ready to use it in this very small instance I'm implementing.

On the docs page, I found a great summary of why prepared statements and stored procedures are handy and helpful. In short: they save time on queries you have to repeat a lot by pre-compiling the part that is common to every run, so the database only has to process the new data, and thus uses fewer resources.

I hadn't played with prepared statements much until a few weeks ago, but I've slowly started using them in my import scripts. Performance-wise, I've only seen about a 15 to 20 percent speed increase. The thing I like most about them, though, is that I don't have to escape my strings anymore. That's a nice little advantage I can live with.

Anyway, php.net’s PDO documentation page has a nice writeup as well, and instead of trying to summarize it myself any more, I’ll just quote it verbatim:

Many of the more mature databases support the concept of prepared statements. What are they? You can think of them as a kind of compiled template for the SQL that you want to run, that can be customized using variable parameters. Prepared statements offer two major benefits:

  • The query only needs to be parsed (or prepared) once, but can be executed multiple times with the same or different parameters. When the query is prepared, the database will analyze, compile and optimize its plan for executing the query. For complex queries this process can take up enough time that it will noticeably slow down your application if you need to repeat the same query many times with different parameters. By using a prepared statement you avoid repeating the analyze/compile/optimize cycle. In short, prepared statements use fewer resources and thus run faster.
  • The parameters to prepared statements don’t need to be quoted; the driver handles it for you. If your application exclusively uses prepared statements, you can be sure that no SQL injection will occur. (However, if you’re still building up other parts of the query based on untrusted input, you’re still at risk).
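Incidentally, the same mechanism is exposed directly in MySQL's SQL interface, which is a handy way to see what a layer like PDO is doing under the hood. A minimal sketch (the table name is made up):

-- Parse and plan the statement once.
PREPARE ins FROM 'INSERT INTO imports (name, value) VALUES (?, ?)';

-- Execute it repeatedly; only the parameter data changes.
SET @name = 'first', @value = 1;
EXECUTE ins USING @name, @value;
SET @name = 'second', @value = 2;
EXECUTE ins USING @name, @value;

DEALLOCATE PREPARE ins;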

Leave a comment

Filed under Databases

nice mysql vs postgres summary

I was googling for a postgresql image I could use when I found this page, a nice short summary of the differences between MySQL and PostgreSQL with an emphasis on development policy.

I should mention that I’m linking to it because I agree with the author and also because I’m biased towards PostgreSQL. I prefer postgres not because of fanboyism, but because of experience and years of using both databases.

I was actually lucky enough to be trained on PostgreSQL as the first database I ever used, and nothing since has been able to duplicate its feature set. Since my first tech job, I've worked with Access, MySQL, SQL Server 2000 and SQLite.

Anyway, I love postgres. If you’ve never given it a chance, and you are looking for more advanced features, check it out. It’s all that and a box of girl scout cookies. I tell you what.

Leave a comment

Filed under Databases

potential postgres schema for lds-scriptures 3.0

Well, that was fast. I looked at the schema last night for the MDP Scriptures project, and started cleaning it up, and it went really quickly. I’ve got a postgres dump all ready for review, and this is probably the configuration I’ll use for the next release.

The major change was that I added a new table for the chapters. It seems a little odd having the chapter number in a table all its own, but for a normalized database schema it makes perfect sense. The downside is that you now have to INNER JOIN across four tables just to get all the information. Most of the time you won't need anything but book + chapter + verse, which is only three tables. I did create a sample view called view_verses which pulls them all together, so you can easily run a select on some reference like 'Gen 1:1'. What I don't like is that even that view is CPU-intensive, so I may have to look at changing some stuff around.
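To give an idea of the shape of it, here's a rough sketch of what such a view could look like; the actual table and column names in the dump may differ:

-- Hypothetical normalized layout: volumes -> books -> chapters -> verses.
CREATE VIEW view_verses AS
SELECT v.id AS verse_id,
       vl.volume_title,
       b.book_title,
       c.chapter_number,
       v.verse_number,
       v.verse_text
  FROM verses v
       INNER JOIN chapters c ON v.chapter_id = c.id
       INNER JOIN books b    ON c.book_id = b.id
       INNER JOIN volumes vl ON b.volume_id = vl.id;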

Aside from that basic view, I’ve decided I’m not going to put all my fun ideas for functions and views in the packaged release. Instead, I’ll just have them either as a separate release, or just post them on the website since I’m sure they will evolve.

One really cool thing about postgres that I love is that you can have overloaded functions. I started playing with them a while back on this database, and came up with some cool concepts. One idea I want to implement is being able to run a select statement using a BETWEEN on two verses. An example query would be: "SELECT * FROM view_verses WHERE verse_id BETWEEN verse('Gen.', 1, 5) AND verse('Genesis', 12);" where the verse() function would be overloaded to take between one and three arguments: book, chapter and verse.
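Here's a hedged sketch of what those overloads could look like, reusing the hypothetical view_verses layout above (abbreviation matching like 'Gen.' vs 'Genesis' is left out, and a real version would have to decide whether a partial reference means the first or last verse of its range; this sketch uses the first):

-- Same function name, three different argument lists.
CREATE FUNCTION verse(p_book TEXT) RETURNS INTEGER AS $$
    SELECT min(verse_id) FROM view_verses WHERE book_title = p_book;
$$ LANGUAGE sql;

CREATE FUNCTION verse(p_book TEXT, p_chapter INTEGER) RETURNS INTEGER AS $$
    SELECT min(verse_id) FROM view_verses
     WHERE book_title = p_book AND chapter_number = p_chapter;
$$ LANGUAGE sql;

CREATE FUNCTION verse(p_book TEXT, p_chapter INTEGER, p_verse INTEGER) RETURNS INTEGER AS $$
    SELECT verse_id FROM view_verses
     WHERE book_title = p_book AND chapter_number = p_chapter
       AND verse_number = p_verse;
$$ LANGUAGE sql;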

It's pretty cool all the stuff you can do with postgres, and that's definitely where I'll be focusing my attention in getting the goodies done.

Anyway, if you want to download this test schema, it's available here. As always, feedback is welcome.

1 Comment

Filed under Databases, Religion

gentoo packages that need lovin'

I mentioned not too long ago that I was working on getting portage details crammed into postgresql, and here is the end result.

GPNL is meant to be a QA tool for treecleaners to use, making it easier to find packages and ebuilds that … well, need some lovin'.

Though it’s primarily intended for quality assurance, I’ve written the frontend to be hopefully pretty generic so anyone can browse the portage tree and just see some interesting statistics all around. There’s still a lot more to be done on the website, but I think it’s to a point right now where it’s at least ready for some public consumption.

One thing I'm excited about is setting up the advanced search page, where you'll be able to run all kinds of funky queries. I'm going to be adding some more QA checks as well, once I get some time. Getting this much done was quite a lot of work, though, and I'm probably going to take a break and focus on other things for a while. However, if anyone has some reasonable feature requests, I'm all ears.

Oh, also the source code for the database schema and the import scripts is available online. I'll set up SVN access and some documentation on the db layout sometime soon, not to mention how to get it working (short howto: emerge php, pkgcore, postgresql and portage-utils).

Also, a huge shout-out to marienz and ferringb, who put together pkgcore and my little python scripts that made importing the data incredibly simple. Thanks, guys. :)

Leave a comment

Filed under Databases, Gentoo