So, I was working on Znurt this morning (I woke up unusually early and didn’t wanna go back to sleep). I’m getting close to opening the codebase, but before I do, there are some glaring deficiencies that I want to clean up first.
The big thing I’ve been working on with the packages site now is making it more efficient. The first step in that has been gathering some data on how often certain things are being called to see where optimizations are most needed. So, the other day, I added a counter to the constructor of each class that would just tally each time the class was instantiated, and then I’d dump out the counter at the end of an import run.
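The counting trick itself is simple. Below is a minimal Python sketch of it, not the actual PHP. The `Counted` base class, the module-level `INSTANCE_COUNTS` tally, and `dump_counts()` are hypothetical names. The `PortageTree` and `PortageCategory` bodies are placeholders, not Znurt’s real classes.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical global tally: class name -> number of times it was instantiated.
INSTANCE_COUNTS: Counter = Counter()


class Counted:
    """Base class whose constructor records each instantiation of the concrete class."""

    def __init__(self) -> None:
        INSTANCE_COUNTS[type(self).__name__] += 1


class PortageTree(Counted):
    def __init__(self) -> None:
        super().__init__()
        self.tree = "/usr/portage"  # assumed path, for illustration only


class PortageCategory(Counted):
    def __init__(self, name: str) -> None:
        super().__init__()
        self.name = name


def dump_counts() -> List[Tuple[str, int]]:
    """Return (class name, count) pairs, most-instantiated first."""
    return INSTANCE_COUNTS.most_common()
```

Calling `dump_counts()` at the end of an import run lists classes from most to least instantiated. A class that should only ever need one instance would stand out near the top.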
One thing that surprised me was how often one particular class was being instantiated: PortageTree. It’s a really simple class. All it does is set a few variables that never change once they’re declared, such as the location of the portage tree and its metadata cache on the filesystem. It’s used across the board by other classes that need to know where files live on the filesystem (PortageCategory, PortagePackage, etc.).
Well, still being pretty new to OOP and fuzzy on it, I thought it made sense to just have PortageCategory extend the PortageTree class and call the parent constructor to get the variables set. That resulted in the PortageTree constructor running a huge number of times, all to set the same, essentially unchanging variables.
So, I switched it this morning to use a singleton instance instead, so the class is only being created once and referenced thousands of times each import. Much nicer already.
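Here is a rough Python sketch of that switch from inheritance to a single shared instance. It is illustrative only, since the real Znurt code is PHP and may look quite different. The `instance()` accessor, the `constructed` counter, the `path` property, and the `/usr/portage` paths are all assumptions.

```python
from typing import Optional


class PortageTree:
    """Holds filesystem locations that never change during an import run."""

    _instance: Optional["PortageTree"] = None
    constructed = 0  # how many times __init__ actually ran

    def __init__(self, tree: str = "/usr/portage") -> None:  # assumed default path
        PortageTree.constructed += 1
        self.tree = tree
        self.cache = f"{tree}/metadata/cache"  # assumed cache location

    @classmethod
    def instance(cls) -> "PortageTree":
        """Create the shared instance on first use, then keep returning it."""
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance


class PortageCategory:
    """Uses the shared tree instead of inheriting from PortageTree."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.tree = PortageTree.instance()

    @property
    def path(self) -> str:
        return f"{self.tree.tree}/{self.name}"
```

Because `PortageCategory` now holds a reference instead of inheriting, creating thousands of categories runs the `PortageTree` constructor only once.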
It’s stuff like that that makes me wish I knew more about OOP. I am studying it on and off, but there are still some concepts that I just can’t wrap my brain around at times, like exceptions. In my procedurally attuned frame of mind, every time someone explains them to me, I think: “Well, if something *breaks*, why don’t you just check the return codes and work around that?” So, yah. Some stuff is still lost on me. I’m trying to figure it out, though. Maybe it’s one of those things that doesn’t make as much sense when you apply it to PHP and its typical use for websites. With a lot of what I read about, I think it would make much more sense in an actual application that keeps running.
On a totally different note, one thing I want to look at getting into the packages website is a changelog of each package’s keywording history. Right now, the import process is pretty simple: if the content of an ebuild has changed, the old one is marked for removal and an entirely new ebuild record is created in the database. That’s far easier than examining the myriad of data associated with one ebuild, tracking the changes, and then flagging those. Instead, I just dump the old one and treat the new one as a completely new record.
There’s a tradeoff, though. Instead of tracking ebuild modifications, I have to do all this coding to flag packages and ebuilds as changed, so they’re treated as updates instead of new entries. That was tricky to set up right, and getting it in place was in fact one of the main things that pushed the initial launch back. It was one of those things where I couldn’t hit the bugs until I actually ran a sequence of imports, since they wouldn’t show up until then anyway.
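Here is a hedged Python sketch of the replace-on-change import and the update flag it requires, using an in-memory list in place of the database. `EbuildRecord`, `import_ebuild`, `is_update`, and the SHA-1 content digest are hypothetical. The post only says that a changed ebuild gets its old record marked for removal and a new one created.

```python
import hashlib
from dataclasses import dataclass
from typing import List


@dataclass
class EbuildRecord:
    record_id: int
    atom: str                # e.g. "dev-lang/php-5.2.12" (hypothetical key)
    digest: str              # hash of the ebuild file's contents
    is_update: bool = False  # True if this record replaced an older one
    remove: bool = False     # old record flagged for removal after the run


def content_digest(content: str) -> str:
    """Fingerprint the ebuild text so changes can be detected cheaply."""
    return hashlib.sha1(content.encode("utf-8")).hexdigest()


def import_ebuild(db: List[EbuildRecord], atom: str, content: str) -> EbuildRecord:
    """Replace-on-change: no field-by-field diff, just swap the whole record."""
    new_digest = content_digest(content)
    # The live (not yet flagged) record for this ebuild, if any.
    old = next((r for r in db if r.atom == atom and not r.remove), None)
    if old is not None and old.digest == new_digest:
        return old  # unchanged: keep the existing record
    if old is not None:
        old.remove = True  # changed: retire the old record
    record = EbuildRecord(
        record_id=len(db) + 1,
        atom=atom,
        digest=new_digest,
        is_update=old is not None,  # flag it as an update, not a new ebuild
    )
    db.append(record)
    return record
```

The `is_update` field stands in for the extra bookkeeping. Without it, a replaced ebuild would be indistinguishable from a brand-new one.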
But I’d like to at least start tracking changes to ebuild keyword status. That data is really valuable and can feed an excellent set of reports. For instance, we could see which categories, packages, and herds have historically been ignored when it comes to stabilization. Plus you can do cool stuff like import results from a statistics tracker of what people have installed, and start to see where the tree could use a little more love. It would also help contributors who want to help out but are overwhelmed by the sheer number of bugs, packages, and issues that need to be addressed. I could see it being helpful to say, “here’s an area that is suffering from neglect *and* is popular.” That would be cool. And that’s my goal. In fact, that’s *been* my goal for years. I’m just now getting to the point where it’s becoming possible, though.
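Tracking keyword history could start as a diff of the KEYWORDS value between the old and new versions of an ebuild. Below is one possible approach in Python. It assumes Gentoo’s usual keyword syntax: `arch` is stable, `~arch` is testing, and `-arch` is broken. `KeywordChange`, `diff_keywords`, and `stabilizations_by_category` are hypothetical names. A real report would read the changes from the database rather than from a list.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


def parse_keywords(keywords: str) -> Dict[str, str]:
    """Map each arch in a KEYWORDS string to 'stable', 'testing' or 'broken'."""
    status: Dict[str, str] = {}
    for kw in keywords.split():
        if kw.startswith("~"):
            status[kw[1:]] = "testing"
        elif kw.startswith("-"):
            status[kw[1:]] = "broken"
        else:
            status[kw] = "stable"
    return status


@dataclass(frozen=True)
class KeywordChange:
    atom: str            # "category/package-version"
    arch: str
    old: Optional[str]   # None: arch wasn't keyworded before
    new: Optional[str]   # None: keyword was dropped
    seen: date           # import run that noticed the change


def diff_keywords(atom: str, old_kw: str, new_kw: str, seen: date) -> List[KeywordChange]:
    """Return one KeywordChange per arch whose status differs."""
    old, new = parse_keywords(old_kw), parse_keywords(new_kw)
    return [
        KeywordChange(atom, arch, old.get(arch), new.get(arch), seen)
        for arch in sorted(old.keys() | new.keys())
        if old.get(arch) != new.get(arch)
    ]


def stabilizations_by_category(changes: List[KeywordChange]) -> Counter:
    """Count testing -> stable transitions per category."""
    return Counter(
        c.atom.split("/")[0]
        for c in changes
        if c.old == "testing" and c.new == "stable"
    )
```

Counting testing-to-stable transitions per category over time is one crude way to spot areas where stabilization lags. Joining that against install statistics would give the “neglected *and* popular” view.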
Fun stuff. I gotta hone my coding skills as I go, though.