So, I was working on Znurt this morning (I woke up unusually early, and didn’t wanna go back to sleep). I’m getting close to opening the codebase, but before doing that, there are some glaring deficiencies that I want to clean up first.
The big thing I’ve been working on with the packages site lately is making it more efficient. The first step has been gathering some data on how often certain things are being called, to see where optimizations are most needed. So, the other day, I added a counter to the constructor of each class that would tally each time the class was instantiated, and then I’d dump out the counters at the end of an import run.
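A constructor tally like that can be sketched in Python for illustration (the site itself is PHP). The class names come from this post, but the counter, the `dump_counts()` helper, and the paths are hypothetical placeholders, not the actual Znurt code. It also shows why inheritance inflates the count: every subclass constructor that calls its parent bumps the parent's counter too.

```python
from collections import Counter

# Tally of constructor calls, keyed by class name.
instantiation_counts = Counter()


class PortageTree:
    def __init__(self):
        instantiation_counts["PortageTree"] += 1
        # Hypothetical placeholder paths; they never change during a run.
        self.tree_dir = "/usr/portage"
        self.cache_dir = "/usr/portage/metadata/cache"


class PortageCategory(PortageTree):
    def __init__(self, name):
        # Calling the parent constructor re-runs PortageTree's setup
        # (and bumps its counter) for every single category.
        super().__init__()
        instantiation_counts["PortageCategory"] += 1
        self.name = name


def dump_counts():
    # Print the tally at the end of an import run, busiest classes first.
    for name, count in instantiation_counts.most_common():
        print(f"{name}: {count}")


if __name__ == "__main__":
    for category in ["app-editors", "dev-lang", "sys-apps"]:
        PortageCategory(category)
    dump_counts()
```

Running it reports a count of 3 for both classes: PortageTree's constructor fires once per category even though its values never change.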
One thing that surprised me was how often one particular class was being instantiated: PortageTree. It’s a really simple class, and all it does is set some variables that aren’t going to change at all once they are declared, such as the location of the portage tree and its metadata cache on the filesystem. It’s used across the board by a lot of other classes that need to know where files live on the filesystem (PortageCategory, PortagePackage, etc.).
Well, still being pretty new to (and fuzzy on) the OOP approach, I thought it made sense to have PortageCategory extend the PortageTree class and call the parent constructor to get the variables set. That ended up with PortageTree being constructed a huge number of times, all for the same, pretty much unchanging, variables.
So, I switched it this morning to use a singleton instance instead, so the class is only being created once and referenced thousands of times each import. Much nicer already.
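Here is a minimal singleton sketch, in Python for illustration. The `instance()` accessor, the `path()` method, and the default path are hypothetical; in PHP the same idea is usually a private static property plus a static accessor method, with a private constructor so nobody can call `new` directly (Python has no private constructors, so this sketch relies on convention).

```python
class PortageTree:
    """Filesystem locations that stay fixed for a whole import run."""

    _instance = None  # the one shared instance, created on first use

    def __init__(self, tree_dir="/usr/portage"):  # hypothetical default path
        self.tree_dir = tree_dir
        self.cache_dir = f"{tree_dir}/metadata/cache"

    @classmethod
    def instance(cls):
        # Construct once; every later call hands back the same object.
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance


class PortageCategory:
    def __init__(self, name):
        self.name = name
        # Reference the shared tree instead of inheriting from it.
        self.tree = PortageTree.instance()

    def path(self):
        return f"{self.tree.tree_dir}/{self.name}"


if __name__ == "__main__":
    a = PortageCategory("dev-lang")
    b = PortageCategory("sys-apps")
    print(a.tree is b.tree)  # True: both categories share one PortageTree
    print(a.path())          # /usr/portage/dev-lang
```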
It’s stuff like that that makes me wish I knew more about OOP. I am studying it on and off, but there are still some concepts that I just can’t wrap my brain around at times, like exceptions. In my procedurally-attuned programming frame of mind, every time someone explains them to me, I think … “Well, if something *breaks*, why don’t you just work with the return codes and work around that?” So, yah. Some stuff is still lost on me. I’m trying to figure it out, though. Maybe it’s one of those things that doesn’t make as much sense when you apply it to PHP and its general use for websites. A lot of the stuff I read about, I think it would make much more sense in an actual application that stays running.
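The difference between the two approaches is easiest to see across several layers of calls. The Python sketch below does the same three-level import both ways; all function names are hypothetical, and a dict stands in for the filesystem. With return codes, every middle layer has to check and forward the error by hand; with an exception, the middle layer contains no error handling at all, and the failure jumps straight up to the one place that cares.

```python
# Return-code style: every layer must check and forward the error itself.
def read_ebuild_rc(path, files):
    if path not in files:
        return None, "missing ebuild"
    return files[path], None


def import_ebuild_rc(path, files):
    content, err = read_ebuild_rc(path, files)
    if err is not None:
        return None, err  # forward the error up one level by hand
    return len(content), None


def import_run_rc(paths, files):
    for path in paths:
        _, err = import_ebuild_rc(path, files)
        if err is not None:
            return f"failed on {path}: {err}"
    return "ok"


# Exception style: only the top level deals with the failure.
class EbuildError(Exception):
    pass


def read_ebuild(path, files):
    if path not in files:
        raise EbuildError("missing ebuild")
    return files[path]


def import_ebuild(path, files):
    # No check here: an EbuildError from read_ebuild propagates upward.
    return len(read_ebuild(path, files))


def import_run(paths, files):
    try:
        for path in paths:
            import_ebuild(path, files)
    except EbuildError as exc:
        return f"failed on {path}: {exc}"
    return "ok"
```

Both versions return `"failed on b.ebuild: missing ebuild"` for `import_run(["a.ebuild", "b.ebuild"], {"a.ebuild": "x"})`; the exception version just needs fewer lines in the layers in between.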
Anyway.
On a totally different note, one thing I want to look at getting into the packages website is a changelog of each package’s keywording history. Right now, the import process is pretty simple: if the content of an ebuild has changed, the old one is marked for removal and an entirely new ebuild record is created in the database. The reason is that this is far easier than examining the myriad data associated with one ebuild, tracking the changes, and then flagging those. Instead, I just dump the old one and treat the new one as a completely new record.
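The replace-on-change import described above can be sketched roughly like this in Python. Detecting "content has changed" by hashing the file is an assumption, as is the list of dicts standing in for the database table; the real import may compare content differently. This sketch also ignores ebuilds that have disappeared from disk.

```python
import hashlib


def content_hash(content: str) -> str:
    # Any stable digest works; SHA-1 is just a convenient choice here.
    return hashlib.sha1(content.encode("utf-8")).hexdigest()


def sync_ebuilds(records, on_disk):
    """Replace-on-change import.

    records: list of dicts with "name", "hash" and "obsolete" keys
             (a stand-in for the database table).
    on_disk: dict mapping ebuild name -> current file content.
    Returns the names of ebuilds that got a brand-new record.
    """
    live = {r["name"]: r for r in records if not r["obsolete"]}
    created = []
    for name, content in on_disk.items():
        new_hash = content_hash(content)
        old = live.get(name)
        if old is not None and old["hash"] == new_hash:
            continue  # unchanged: keep the existing record as-is
        if old is not None:
            old["obsolete"] = True  # mark the stale record for removal
        # No field-by-field diff: the changed ebuild becomes a fresh record.
        records.append({"name": name, "hash": new_hash, "obsolete": False})
        created.append(name)
    return created
```

The simplicity is the point: nothing inspects what changed inside the ebuild, which is also why keyword history is lost unless it gets tracked separately.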
There’s a tradeoff in that compromise, though: instead of tracking ebuild modifications, I have to do all this coding to flag packages and ebuilds as changed, and to treat them as updates instead of new entries. That was tricky to set up right, and getting it in there was in fact one of the main things that pushed the initial launch back. It was one of those cases where I couldn’t run into the bugs until I started actually doing a sequence of import runs, since they wouldn’t show up before then anyway.
But I’d like to at least start tracking ebuild keyword status changes, because that is really valuable data that can provide an excellent set of reports. For instance, we can see which categories / packages / herds have historically been ignored when it comes to stabilization. Plus you can do cool stuff like import results from a statistics tracker showing what people have installed, and start to see where the tree could use a little more love. It would also help contributors who want to help out but are overwhelmed by the enormity of bugs and packages and issues that need to be addressed. I could see it being helpful to say, “here’s an area that is suffering from neglect *and* is popular.” That would be cool. And that’s my goal. In fact, that’s *been* my goal for years. I’m just now getting to the point where it’s becoming possible, though.
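One way to capture keyword status changes is to diff the KEYWORDS string of the old and new ebuild before the old record is dropped. The Python sketch below uses the standard KEYWORDS notation (`arch` stable, `~arch` testing, `-arch` known not to work); the status labels and function names are hypothetical, and `-*` is simply treated as an arch named `*`. Each resulting tuple could be stored with a timestamp as one changelog row.

```python
def parse_keywords(keywords):
    """Map arch -> status for a KEYWORDS string like "amd64 ~x86 -sparc"."""
    status = {}
    for kw in keywords.split():
        if kw.startswith("~"):
            status[kw[1:]] = "testing"   # ~arch: testing keyword
        elif kw.startswith("-"):
            status[kw[1:]] = "broken"    # -arch: known not to work
        else:
            status[kw] = "stable"        # bare arch: stable keyword
    return status


def keyword_changes(old_keywords, new_keywords):
    """List (arch, before, after) for every arch whose status changed.

    None means the arch had no keyword on that side.
    """
    old = parse_keywords(old_keywords)
    new = parse_keywords(new_keywords)
    changes = []
    for arch in sorted(old.keys() | new.keys()):
        before, after = old.get(arch), new.get(arch)
        if before != after:
            changes.append((arch, before, after))
    return changes


if __name__ == "__main__":
    for change in keyword_changes("~amd64 ~x86", "amd64 ~x86 ~arm"):
        print(change)
    # ('amd64', 'testing', 'stable')
    # ('arm', None, 'testing')
```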
Fun stuff. I gotta hone my coding skills as I go, though.
There are two schools of thought with regard to exceptions. The first is that exceptions should only be for, well, *exceptional* cases. The second is that exceptions are very handy for flow control (specifically, for jumping back up the stack).
I recall trying to explain this to someone at work a while back. The analogy I used was that a person driving a car into a wall would be a normal error code, but a unicorn driving a car at all would be an exception.
I’ve been on both sides of the fence with this. Lately, especially since having learned more Python, I find that it is really up to the language and culture around it to decide what an exception is used for. For Python, it seems Pythonic to attempt an operation first, whilst catching exceptions, rather than to lay down a number of state checks before attempting an operation, for example.
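The two styles are often called LBYL ("look before you leap") and EAFP ("easier to ask forgiveness than permission"). A small Python sketch of both, reading a hypothetical cache file: the LBYL version checks state first and can still fail if the file disappears between the check and the `open()`, while the EAFP version just tries and handles the failure in one place.

```python
import os


def read_cache_lbyl(path):
    """Look before you leap: check the filesystem state, then act."""
    if os.path.isfile(path) and os.access(path, os.R_OK):
        # The file could still vanish between the check and the open().
        with open(path, encoding="utf-8") as f:
            return f.read()
    return None


def read_cache_eafp(path):
    """Easier to ask forgiveness than permission: try it, handle failure."""
    try:
        with open(path, encoding="utf-8") as f:
            return f.read()
    except OSError:  # missing file, permission denied, a directory, ...
        return None
```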
2¢
For learning good OOP practice, Java is an excellent language even if you don’t plan on using it in practice. “Big Java” by Cay Horstmann is a decent book w/o being too dry, and if you have previous programming experience you will breeze through it.
Bjarne Stroustrup also has a book out that is similar in scope but uses C++. I haven’t purchased it yet, but it probably goes into more depth, albeit in a denser and more difficult way due to C++.
You might want to recheck the Object Oriented Design rules. A Category is-not-a Tree, unless you are trying to say that a Category is interchangeable with the root node of the portage tree, which I didn’t think was the case. You’re looking for Composition: a Category has-a owning tree, but it is not actually a tree itself. The singleton approach is good enough, though if you want to include overlays (sunrise?) then you may want to allow multiple PortageTree instances, one for each overlay.
The main OOD/OOP class design rules I remember are:
Classes represent things. A File, a Person, a Button.
A class must do one thing, and only one thing. All its functions should be aligned towards one purpose (the rule of cohesion). E.g., a Database class should only operate on the contents of a database; it should not include public functions to open or delete files, manage network connections, calculate taxes, etc.
A class can be a more specific type of something else (Inheritance, the is-a relationship). For example, an IceCreamTruck is-a Truck is-a Vehicle.
A class can be made from “parts”, that is, member variables which reference other objects (Composition, the has-a relationship). Example: A Truck has-a Wheel, but it is not a Wheel itself.
Be careful not to use inheritance as a shortcut, even if avoiding it means rewriting similar code; you’ll usually just end up making the code more tightly coupled (inter-dependent), and that makes it harder to fix or improve later on.
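The is-a and has-a rules above, applied to the PortageTree case, might look like the Python sketch below. The overlay name comes from the comment, but the paths and the `path()` method are hypothetical. Because a category holds a reference to its tree instead of inheriting from it, the same category class works unchanged for the main tree and for any number of overlays, which a single global instance would not allow.

```python
class PortageTree:
    """One repository on disk: the main tree or an overlay."""

    def __init__(self, name, location):
        self.name = name
        self.location = location


class PortageCategory:
    """A category has-a owning tree; it is not a tree itself."""

    def __init__(self, name, tree):
        self.name = name
        self.tree = tree  # composition: hold a reference, don't inherit

    def path(self):
        return f"{self.tree.location}/{self.name}"


class Wheel:
    pass


class Vehicle:
    pass


class Truck(Vehicle):  # is-a: a Truck is-a Vehicle
    def __init__(self):
        # has-a: a Truck is built from Wheel parts but is not a Wheel.
        self.wheels = [Wheel() for _ in range(4)]


class IceCreamTruck(Truck):  # is-a: an IceCreamTruck is-a Truck is-a Vehicle
    pass


if __name__ == "__main__":
    gentoo = PortageTree("gentoo", "/usr/portage")  # hypothetical paths
    sunrise = PortageTree("sunrise", "/var/lib/layman/sunrise")
    for cat in (PortageCategory("dev-lang", gentoo),
                PortageCategory("dev-lang", sunrise)):
        print(cat.path())
    print(isinstance(IceCreamTruck(), Vehicle))  # True
    print(isinstance(Truck(), Wheel))            # False
```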
—
For exceptions, the jury is still out. The usual argument is that exceptions make code visually leaner, since you don’t need lots of “if (errCode != 0) return errCode;” lines; however, the try/catch structure is often uglier. Some people claim that exceptions stop error recovery from obscuring the normal behaviour when reading code, but others claim that they make it too easy to ignore error cases, allowing them to propagate and cause a crash [the same is said of forgotten error code tests, though]. Exceptions can also let you call multiple functions that may fail in a single statement without intermediate variables and error testing, and can give a minor performance benefit in some cases. Generally, the best you can do is follow the convention of the language; if the language doesn’t have much of a convention, use the same style everyone else on the project is using; if it’s your own project, you can do it whichever way you want. Most of it is just opinion.
Have you tried xdebug? It can be used together with, for example, kcachegrind to profile your PHP application.
+1 for xdebug.
There’s also gprof2dot as an alternative to kcachegrind that gives you its main functionality (pretty tree graphs of where all your page load time is coming from).
You might want to read this post by Ned Batchelder where he argues that exceptions are better. It assumes the reader already understands the mechanics of exceptions and wants to understand why they are useful.
http://nedbatchelder.com/text/exceptions-vs-status.html