gpnl updates

I haven’t written anything about the Gentoo Packages That Need Lovin’ (GPNL) project in a long time, so I thought I’d post an update. There are no major releases on the horizon, but I’ve been working on revamping the entire backend in my spare time.

The thing that spawned the change is that I realized you can have portage store its metadata in sqlite. I took a peek at the database and it’s got a lot of good stuff in there. So much, in fact, that I’m pretty sure I can use it in place of my old data collection and do most of the work in a mix of scripting, SQL and regular expressions, three of my great loves.

More specifically, I’m basically taking an sqlite dump, porting it into my postgres database, comparing the data, deleting the old, adding the new, normalizing everything and then cleaning it up for display. It’s actually a fun, challenging project. With the data that comes with the sqlite dump, I’ll be able to do more stuff as well, like finally display eclasses and proper dependencies (thank goodness). I’ve been wanting that for a long time.

The best part, though, is that almost all of the heavy lifting is now done in SQL. That makes things much faster for me. Importing the data into the database used to take anywhere from 45 minutes to 2 hours. Now that I’ve gone back and cleaned up all the crap, it takes maybe 2 to 5 minutes to do a completely clean import, from arches to ebuilds.

One thing I’m glad for is that this project has pushed me to clean up my SQL skills a little bit. The main reason the import process was so slow before was that I would delete all the data from all the tables and then repopulate them. That took incredibly long because of all the cascading deletes. Now, though, I just do incremental updates and deletes, so only what has changed gets touched. It’s much faster, since you’re touching just a small percentage of the data.
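Here’s a rough sketch of what I mean by incremental updates. It’s not the actual GPNL code: the table names (ebuild, ebuild_import), the columns (atom, mtime) and the function names are all made up for illustration, and it uses Python’s built-in sqlite3 module as a stand-in for postgres so it runs anywhere. The idea is to load the fresh data into a staging table, then delete, update and insert only the rows that differ.

```python
import sqlite3


def sync_ebuilds(conn):
    """Bring `ebuild` in line with `ebuild_import`, touching only changed rows.

    Both table names and their columns are hypothetical.
    Returns a (deleted, updated, inserted) tuple of row counts.
    """
    cur = conn.cursor()

    # 1. Delete rows that no longer appear in the fresh import.
    cur.execute("""
        DELETE FROM ebuild
        WHERE atom NOT IN (SELECT atom FROM ebuild_import)
    """)
    deleted = cur.rowcount

    # 2. Update rows that exist on both sides but whose data changed.
    cur.execute("""
        UPDATE ebuild
        SET mtime = (SELECT i.mtime FROM ebuild_import i
                     WHERE i.atom = ebuild.atom)
        WHERE EXISTS (SELECT 1 FROM ebuild_import i
                      WHERE i.atom = ebuild.atom
                        AND i.mtime <> ebuild.mtime)
    """)
    updated = cur.rowcount

    # 3. Insert rows that are new in the import.
    cur.execute("""
        INSERT INTO ebuild (atom, mtime)
        SELECT i.atom, i.mtime FROM ebuild_import i
        WHERE i.atom NOT IN (SELECT atom FROM ebuild)
    """)
    inserted = cur.rowcount

    conn.commit()
    return deleted, updated, inserted


def demo():
    """Build two tiny tables in memory and sync them."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE ebuild (atom TEXT PRIMARY KEY, mtime INTEGER NOT NULL);
        CREATE TABLE ebuild_import (atom TEXT PRIMARY KEY, mtime INTEGER NOT NULL);
        INSERT INTO ebuild VALUES
            ('app-editors/vim-7.0', 100),
            ('dev-lang/python-2.4.3', 100),
            ('sys-apps/old-1.0', 100);
        INSERT INTO ebuild_import VALUES
            ('app-editors/vim-7.0', 100),
            ('dev-lang/python-2.4.3', 200),
            ('dev-lang/ruby-1.8.5', 150);
    """)
    counts = sync_ebuilds(conn)
    rows = conn.execute("SELECT atom, mtime FROM ebuild ORDER BY atom").fetchall()
    conn.close()
    return counts, rows
```

In the demo, the unchanged vim row is never written to at all, which is where the time savings come from once other tables hang off each row with cascading deletes.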

Another cool thing is that I got to upgrade to Postgres 8.2, which has some nicer string functions, like regexp_replace. One challenge I ran into was stripping the portage versioning suffix off of an atom, but I eventually figured it out.
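To give an idea of the version-stripping problem, here’s a small sketch in Python. The pattern is my approximation of the portage version syntax (dotted numbers, an optional trailing letter, optional _alpha/_beta/_pre/_rc/_p suffixes and an optional -rN revision), not the exact expression used in the import, and strip_version is just an illustrative name. The same kind of pattern could be handed to regexp_replace with an empty replacement string on the postgres side.

```python
import re

# Approximate portage version suffix, anchored at the end of the atom:
#   -1.2.3        dotted numeric version
#   a             optional single trailing letter
#   _rc1, _p2 ... zero or more release suffixes
#   -r1           optional ebuild revision
VERSION_SUFFIX = re.compile(
    r"-\d+(?:\.\d+)*[a-z]?"
    r"(?:_(?:alpha|beta|pre|rc|p)\d*)*"
    r"(?:-r\d+)?$"
)


def strip_version(atom):
    """Return the atom with any trailing version suffix removed."""
    return VERSION_SUFFIX.sub("", atom)


if __name__ == "__main__":
    for atom in (
        "app-editors/vim-7.0.235-r1",
        "x11-libs/gtk+-2.10.6",
        "sys-libs/glibc-2.5_pre20061101",
        "media-fonts/font-adobe-100dpi-1.0.0",
        "dev-lang/python",
    ):
        print(atom, "->", strip_version(atom))
```

The tricky part is that package names can contain dashes and digits themselves (font-adobe-100dpi, for example), so the pattern has to be anchored to the end of the string rather than cutting at the first dash followed by a number.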

Anyway, I’m getting really close to having the import process cleaned up completely, so then I can get back to fixing up the frontend and making it actually worthwhile. Good times.
