Well, I’m bored, so I figured I’d spill the beans on a project I’ve been keeping under wraps for a while.
I’ve been working on getting everything about the portage tree into postgresql so you can run all kinds of queries. What kinds of queries? How many ebuilds use eclass ‘eutils’ and USE flag ‘alsa’, are in the ‘video’ herd, and are masked on amd64 but not on x86? That kind. Funky ones.
I must say, I really love postgresql. Even though I haven’t used it regularly for a long time, I’m quickly getting back into it. The simplicity, the standards, the power, the tools … postgres has it all. Ahh, fanboyism.
Anyway, getting the details of the ebuilds was made incredibly easy thanks to marienz and ferringb and their work on pkgcore (and a custom python script). After that, it was just a matter of parsing the information and setting up the schemas. My importer is written in PHP, and the class to import / read the data is still in its slightly butt-ugly stage. It could use some cleaning up, for sure. The database layout is going to be where the real optimizations are, though. I’m going to work on setting up some good views so it will be easy to query. Right now, here’s the list of tables I have set up: arch, category, ebuild, ebuild_arch, ebuild_eclass, ebuild_homepage, ebuild_license, ebuild_use, eclass, eclass_use, herd, license, package and use. All of them can already be populated by the scripts except for eclass_use and herd. I haven’t set up the dependency ones yet, though that’ll be pretty simple too.
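To give a feel for the kind of funky query this layout allows, here’s a rough sketch. It is not the real schema: the table names come from the list above, but every column (id, name, status, the join columns) is a guess for illustration, the sample rows are made up, and it runs against an in-memory sqlite3 database as a stand-in for postgresql so it works without a server. The herd part is left out since that table isn’t populated yet.

```python
import sqlite3

# Table names are from the post; all columns are hypothetical.
SCHEMA = """
CREATE TABLE ebuild (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE eclass (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE "use"  (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE arch   (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE ebuild_eclass (ebuild INTEGER, eclass INTEGER);
CREATE TABLE ebuild_use    (ebuild INTEGER, "use" INTEGER);
CREATE TABLE ebuild_arch   (ebuild INTEGER, arch INTEGER, status TEXT);
"""

# Ebuilds that inherit eutils, have the alsa USE flag,
# and are masked on amd64 but not masked on x86.
QUERY = """
SELECT e.name
FROM ebuild e
JOIN ebuild_eclass ee ON ee.ebuild = e.id
JOIN eclass ec        ON ec.id = ee.eclass AND ec.name = 'eutils'
JOIN ebuild_use eu    ON eu.ebuild = e.id
JOIN "use" u          ON u.id = eu."use" AND u.name = 'alsa'
JOIN ebuild_arch ma   ON ma.ebuild = e.id AND ma.status = 'masked'
JOIN arch a1          ON a1.id = ma.arch AND a1.name = 'amd64'
WHERE NOT EXISTS (
    SELECT 1
    FROM ebuild_arch xa
    JOIN arch a2 ON a2.id = xa.arch
    WHERE xa.ebuild = e.id AND a2.name = 'x86' AND xa.status = 'masked'
)
ORDER BY e.name;
"""


def build_demo_db():
    """Create the sketch schema and load a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO ebuild VALUES (?, ?)",
                     [(1, "player-1.0"), (2, "player-2.0"), (3, "mixer-0.5")])
    conn.execute("INSERT INTO eclass VALUES (1, 'eutils')")
    conn.execute("INSERT INTO \"use\" VALUES (1, 'alsa')")
    conn.executemany("INSERT INTO arch VALUES (?, ?)",
                     [(1, "amd64"), (2, "x86")])
    # All three sample ebuilds inherit eutils.
    conn.executemany("INSERT INTO ebuild_eclass VALUES (?, 1)",
                     [(1,), (2,), (3,)])
    # Only the two player ebuilds have the alsa flag.
    conn.executemany("INSERT INTO ebuild_use VALUES (?, 1)", [(1,), (2,)])
    conn.executemany("INSERT INTO ebuild_arch VALUES (?, ?, ?)", [
        (1, 1, "masked"), (1, 2, "stable"),   # masked on amd64 only
        (2, 1, "masked"), (2, 2, "masked"),   # masked on both
        (3, 1, "masked"), (3, 2, "stable"),   # no alsa flag
    ])
    return conn


def funky_query(conn):
    """Return the names of ebuilds matching QUERY."""
    return [row[0] for row in conn.execute(QUERY)]


if __name__ == "__main__":
    print(funky_query(build_demo_db()))
```

With the sample rows above, only player-1.0 matches. The SQL is plain enough that it should carry over to postgresql more or less unchanged, but I haven’t tried this exact sketch there, and the real columns will almost certainly differ.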
So there’s my big announcement. Woots. I’m working on creating the SQL to import everything right now (which takes a long time), and once that’s done, I’ll throw up a db dump somewhere. There’s still lots to be done, like finishing the import scripts and setting up some webpages to browse the tree, but it shouldn’t be too hard. I’m definitely over the worst of it.