Archive for the ‘Databases’ Category

mysql ordering by string with possible blank entries

Saturday, November 15th, 2008

I just found a workaround to something I’ve always preferred to do in SQL. I’m using MySQL 5 at work, and I had a query where I would order the entries by a column that is a varchar. Since there was the possibility for this column to be blank, it would display all rows with a blank entry first, and then alphabetically from there.

So, the order would be something like: ‘ ‘, ‘ ‘, ‘1′, ‘2′, ‘3′, ‘A’, ‘B’, ‘C’.

What I really wanted was to display the blank ones last, since I wasn’t interested in those. I poked through the string functions available to see if I could conjure up a hack, and ASCII works great, as it fetches the ASCII numeral of the first character in the string. And, if the string is empty, it will return a zero. And that’s all I needed, was a binary flag to order by first.

Here’s a sample query then:

SELECT string FROM table ORDER BY ! ASCII(string), string;

And the result would be: ‘1′, ‘2′, ‘3′, ‘A’, ‘B’, ‘C’, ‘ ‘, ‘ ‘.

Perfect. :)

postgres and mysql comparison paper

Wednesday, August 6th, 2008

I’ve been job hunting, and while my dream job would be somewhere that uses PostgreSQL, I am having an extremely hard time finding anyone that uses it. So, I think my chances might be better actually getting a company to convert to using it instead. In doing that, I’ve started outlining a draft of a paper that I can present to both lead programmers, database administrators, and management on the pros of using PostgreSQL over MySQL. If anyone has some ideas that I could add in, I would appreciate it.

Here’s the general principles I already plan on covering: foreign key support, data types, transactions, shell interface, ANSI SQL support, table types, general features, history, licensing, abstraction layers (using PHP).

Also, and I don’t mean to sound like I’m spreading FUD, but it occurred to me this morning that I’ve never heard anyone say that MySQL is better than PostgreSQL.

Anyway, ideas welcome. I’ll post my progress as I get the paper put together. This is something I’ve been meaning to do for a long time.

prepared statements and stored procedures

Sunday, March 23rd, 2008

I’m still working on cleaning up the import scripts for GPNL, and I’m going to have to start using PHP’s PDO database layer to connect to an SQLite3 database at one point.

I haven’t used it yet, but I had heard it was coming in PHP 5 for a while. Personally, I’ve always used PEAR::DB and was quite happy with that.

I’m still not sold on using the new layer anyway, but I figured I’d do some reading while I am getting ready to use it in this very small instance that I’m implementing.

On the docs page, I found a great summary of why prepared statements and stored procedures are handy and helpful. In short: they save you time for queries you have to repeat a lot, by pre-compiling the preparation that is common to all the queries, so that the database is really only processing the new data, and thus using less resources.

Prepared statements I haven’t played with much before until a few weeks ago, but I’ve slowly started using them in my import scripts. Performance-wise, I’ve only seen about a 15 to 20 percent speed increase. The thing I like the most about them, though, is that I don’t have to escape my strings anymore. That’s a nice little advantage I can live with.

Anyway, php.net’s PDO documentation page has a nice writeup as well, and instead of trying to summarize it myself any more, I’ll just quote it verbatim:

Many of the more mature databases support the concept of prepared statements. What are they? You can think of them as a kind of compiled template for the SQL that you want to run, that can be customized using variable parameters. Prepared statements offer two major benefits:

  • The query only needs to be parsed (or prepared) once, but can be executed multiple times with the same or different parameters. When the query is prepared, the database will analyze, compile and optimize it’s plan for executing the query. For complex queries this process can take up enough time that it will noticeably slow down your application if you need to repeat the same query many times with different parameters. By using a prepared statement you avoid repeating the analyze/compile/optimize cycle. In short, prepared statements use fewer resources and thus run faster.
  • The parameters to prepared statements don’t need to be quoted; the driver handles it for you. If your application exclusively uses prepared statements, you can be sure that no SQL injection will occur. (However, if you’re still building up other parts of the query based on untrusted input, you’re still at risk).

nice mysql vs postgres summary

Sunday, March 23rd, 2008

I was googling for a postgresql image I could use when I found this page, a nice short summary on the differences between MySQL and PostgreSQL with an emphasis on development policy.

I should mention that I’m linking to it because I agree with the author and also because I’m biased towards PostgreSQL. I prefer postgres not because of fanboyism, but because of experience and years of using both databases.

I was actually lucky enough to be trained to use PostgreSQL as the first database I ever used, and everything after that has never been able to duplicate its feature set. Since my first tech job, I’ve worked with Access, MySQL, SQL Server 2000 and SQLite.

Anyway, I love postgres. If you’ve never given it a chance, and you are looking for more advanced features, check it out. It’s all that and a box of girl scout cookies. I tell you what.

potential postgres schema for lds-scriptures 3.0

Wednesday, April 25th, 2007

Well, that was fast. I looked at the schema last night for the MDP Scriptures project, and started cleaning it up, and it went really quickly. I’ve got a postgres dump all ready for review, and this is probably the configuration I’ll use for the next release.

The major change was that I added a new table for the chapters. It seems a little odd having the chapter number in a table all its own, but for a normalized database schema it makes perfect sense. The only thing I don’t like is now you have to INNER JOIN across four tables just to get all the information. Most of the time you won’t need anything but book + chapter + verse, which is only three tables. I did create a sample view called view_verses which pulls them all together so you can easily run a select on some format like ‘Gen 1:1′. The thing I don’t like is that even that view is CPU intensive, so I may have to look at changing some stuff around.

Aside from that basic view, I’ve decided I’m not going to put all my fun ideas for functions and views in the packaged release. Instead, I’ll just have them either as a separate release, or just post them on the website since I’m sure they will evolve.

One really cool thing about postgres that I love is that you can have overloaded functions. I started playing with them a while back on this database, and came up with some cool concepts. One idea I want to implement is being able to run a select statment using a between on two verses. An example query would be: “SELECT * FROM view_verses WHERE verse_id BETWEEN verse(’Gen.’, 1, 5) AND verse(’Genesis’, 12); where the verse() function would be overloaded to take between one and three arguments: book, chapter and verse.

It’s pretty cool all the stuff you can do with postgres, and that’s definately where I’ll be focusing my attention in getting the goodies done.

Anyway, if you want to download this test schema, its available here. As always, feedback is welcome.

gentoo packages that need lovin’

Monday, November 27th, 2006

I mentioned not too long ago that I was working on getting portage details crammed into postgresql, and here is the end result.

GPNL is meant to be a QA tool for treecleaners to use, making it easier to find packages and ebuilds that … well, need some lovin.

Though it’s primarily intended for quality assurance, I’ve written the frontend to be hopefully pretty generic so anyone can browse the portage tree and just see some interesting statistics all around. There’s still a lot more to be done on the website, but I think it’s to a point right now where it’s at least ready for some public consumption.

One thing I’m excited about is setting up the advanced search page, where you’ll be able to run all kinds of funky queries. I’m going to be adding some more QA checks as well, once I get some time. Getting this much done though was quite a lot of work though, and I’m probably going to take a break and focus more on other things for a while. However, if anyone has some reasonable feature requests, I’m all ears.

Oh, also the source code for the database schema and the import scripts is available online. I’ll setup SVN access and some documentation on the db layout sometime soon, not to mention how to get it working (short howto: emerge php, pkgcore, postgresql and portage-utils).

Also, a huge shout out to marienz and ferringb who put together pkgcore and my little python scripts that made importing the data incredibly simple. Thanks, guys. :)

death to sql server (part 4)

Friday, November 10th, 2006

I’ve written about this before, but to rehash … the date functions inside SQL server suck. What’s really weird is that there’s an undocumented way to retrieve out certain datetime formats, and even that is inconsistent in its numbering scheme.

The way to pull them out is by running “SELECT CONVERT(VARCHAR, GETDATE(), @x);” where @x is a positive integer. If you can find the right integer, you can save time and pull out something directly like ‘11-10-2006′ as your variable.

One of the problems you’ll run into though is that you can’t just do 1 through $integer. Only some of them return something, and the ones that don’t just throw an SQL error, so you get to hunt down which integers return something.

Well, digging for them manually once is something I don’t want to repeat, so I wrote a query statement to pull out some of them. This could be a handy reference inside your database somewhere.

DECLARE @x int;
SET @x = 1;
WHILE @x < 132 BEGIN
IF @x > 0 AND (@x < 14 OR (@x > 20 AND @x <25) OR (@x > 99 AND @x < 115) OR @x IN(126,130,131)) BEGIN
SELECT @x, CONVERT(VARCHAR, GETDATE(), @x);
END
SELECT @x = (@x +1);

END

Here’s another online reference you can use.

postgresql functions

Wednesday, November 8th, 2006

One thing I love about (advanced) databases is that you can write functions. They speed up the query time quite a bit, and you can do fun stuff like IF … ELSE … THEN statements.

In working on getting portage into postgres, part of the problem I’m trying to solve is find out where QA issues are so they can be fixed. Unfortunately, in the early stages of my little script, it always assumes that the everything is working correctly across the board, so I’ll write my queries assuming the foreign keys won’t break. In reality, that doesn’t happen, and I end up killing a transaction with hundreds of thousands of statements because there’s 25 queries that break.

So, I had to write a postgres function to do the checking for me. This is going to be absolutely boring to those db gurus out there, but this is still slightly new to me, so I’m really enjoying it.

DECLARE
use_id integer;
BEGIN
SELECT id FROM use WHERE name = $2 AND
(package IS NULL OR package = $3) LIMIT 1 INTO use_id;
IF use_id IS NOT NULL THEN
INSERT INTO ebuild_use (ebuild, use) VALUES ($1, use_id);
RETURN TRUE;
ELSE
RETURN FALSE;
END IF;
END;

A really simple function, I know. All it does is run a SELECT statement to see if my foreign key is going to break *before* running the INSERT statement. That way, I can continue on my happy way with my transaction, and at the same time, if I want to turn on ‘qa mode’ when inserting the data, I can check for a false return on the recordset, and know which ebuilds need attention.

Pretty cool stuff, I think. Also, for the record, pgaccess is a *great* little GUI tool to quickly and easily edit your functions.

portage in postgresql

Wednesday, November 8th, 2006

Well, I’m bored, so I figured I’d spill the beans on a project I’ve been keeping under wraps for a while.

I’ve been working on getting everything about the portage tree into postgresql so you can run all kinds of queries. What kinds of queries? How many ebuilds use eclass ‘eutils’ and USE flag ‘alsa’ and are in ‘video’ herd and amd64 is masked but x86 isn’t. That kind. Funky ones. :)

I must say, I really love postgresql even though I haven’t been using it regularly for a long time, I’m quickly getting back into it. The simplicity, the standards, the power, the tools … postgres has it all. Ahh, fanboyism.

Anyway, getting the details of the ebuilds was made incredibly easy thanks to marienz and ferringb and their work on pkgcore (and a custom python script). After that, it was just a matter of parsing the information and setting up the schemas. My importer is written in PHP and the class to import / read the data is still in its slightly butt-ugly stage. It can use some cleaning up, for sure. The database layout is going to be where the real optimizations are though. I’m going to work on setting up some good views so it will be easy to query. Right now, here’s the list of tables I have setup: arch, category, ebuild, ebuild_arch, ebuild_eclass, ebuild_homepage, ebuild_license, ebuild_use, eclass, eclass_use, herd, license, package and use. All of them can already be populated by the scripts except for eclass_use and herd. I haven’t setup the dependency ones yet, though that’ll be pretty simple too.

So there’s my big announcement. Woots. I’m working on creating the SQL to import everything right now (which takes a long time), and once that’s done, I’ll throw up a db dump somewhere. There’s still lots to be done, like finishing the import scripts and setting up some webpages to browse the tree, but it shouldn’t be too hard. I’m definately over the worst of it.

sql vs. sql

Tuesday, August 29th, 2006

Once again, UPHPU has had a minor stir about which database is faster / better / stronger, PostgreSQL or MySQL. All fanboyism aside, who really cares?

You want to know the way to *really* speed up your database? Normalization is probably going to be the largest factor. After that, use views, stored procedures, indexes, transactions and well-written queries, and your database is going to fly amazingly fast.

At work we have a large server we call “Zeus” because it is incredibly large. I won’t even go into specs because you wouldn’t believe me even if I told you. When I first started working here, the database running on it was incredibly slow. At first I blamed it all on the database software we are using (you can search my blog if you really wanna know which one it is. Hint: it’s neither of the two mentioned above), but as we cleaned up the databases and tables by removing columns that were complete cruft and then doing everything I mentioned above, this puppy flies. In fact, our “dev” database, which is running on nothing more than an Athlon XP 1800+ runs just as fast as our beast-monster does.

That’s how you get a fast database — doing things the right way. Who would have thought?

I have to apologize for the elitist feel of this post, but my point is this … the only magic bullet in improving performance is going to be quality code and design. Just replacing your database with something else isn’t going to make the speed fairy sprinkle your application with love.