stats and direction

I briefly glanced over Donnie’s post on how to improve stuff in Gentoo, and while I can see how some ideas might stir up a bit of controversy, I kind of agree, though I see things from a slightly different angle.

I think that the real issue at hand is that we have a lot of people working on a project (for whatever reasons of their own), but we have no real idea of which areas are being heavily focused on, or which areas are of real importance. I see one simple solution: keep things the way they are (things work relatively fine), but lay onto the existing framework some tools to help us find out specifically which areas are getting neglected, which are getting showered with attention, and which ones could use some tender lovin’.

Sadly, two of my own projects which I’ve neglected due to time constraints and passing interest would help to quantify a lot of that stuff.

The reason I think statistics will help is that we can see what is popular and what direction users are going, instead of developers assuming they know the trends.

Some things are pretty obvious — people will migrate to the most popular tools, for example, and everyone will rely on the basic system and desktop tools. It’s easy to place importance on those affected packages because, well, everyone uses them. The next question to be raised could be — what’s the second most important or popular area?

The problem is that there are a lot of packages out there, and they don’t carry the same weight. You can make arbitrary goals like “let’s get 90% of all bugs fixed in a timely manner,” but then you are putting every ebuild and package on the same playing field, as if they were all of equal importance and priority. It just doesn’t work out that way. It’s applying a statistical approach (low percentage of open bugs) to a practical problem (most important fixes first).
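To make that difference concrete, here is a rough sketch of what weighting by usage could look like. Everything in it is an assumption for illustration: the package names, the numbers, the `installs` field (some count of users reporting the package installed) and the scoring rule itself are made up, and no such data source exists yet.

```python
from dataclasses import dataclass


@dataclass
class PackageStats:
    name: str
    open_bugs: int
    installs: int  # hypothetical: number of users reporting the package installed


def rank_by_impact(packages):
    """Sort packages so that open bugs touching the most users come first.

    The score (open bugs times installs) is an arbitrary choice for this
    sketch, not an established metric.
    """
    return sorted(packages, key=lambda p: p.open_bugs * p.installs, reverse=True)


# Made-up numbers, purely for illustration.
sample = [
    PackageStats("app-misc/tiny-tool", open_bugs=9, installs=40),
    PackageStats("sys-apps/core-thing", open_bugs=2, installs=50000),
    PackageStats("media-sound/niche-player", open_bugs=4, installs=900),
]

by_bug_count = [p.name for p in sorted(sample, key=lambda p: p.open_bugs, reverse=True)]
by_impact = [p.name for p in rank_by_impact(sample)]

print("By raw bug count:", by_bug_count)
print("By usage-weighted impact:", by_impact)
```

Sorting by raw bug count puts the tiny tool on top, while the weighted ranking puts the widely installed package first even though it has the fewest open bugs. That is the gap between the statistical goal and the practical one.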

I could be really optimistic here, but again I just think we need some kind of data based on what users are installing, using, keywording, masking and ignoring to help shed light on what the second tier of problem areas are. By second tier I mean, “not the big packages.”

As a developer, I know it’s easy for me to shrug off fixing “small” packages because they may seem irrelevant, useless or unpopular, but that’s just looking at things through my point of view and experience. What tends to happen sometimes is I’ll see the application mentioned in other places, blogs, forums, IRC, whatever, and its status changes in my mind: perhaps there are people who are really using this thing and have an interest in its maintenance. That’s a pretty poor source of input, yet it actually does decide what I’ll be working on next.

Anyway, I really don’t have any revolutionary ideas. In fact, it’s the same ones I’ve had before, and I just need a kick in the pants to get working on gathering data and compiling reports.

Also, I don’t want to toot my own horn or anything, but I just realized something interesting the other day. When I was first working on GPNL (Gentoo Packages that Need Lovin’), one of the first problem areas I highlighted was finding ebuilds that didn’t have a metadata.xml file alongside them. And there were quite a lot. I remember going through about 400 packages or so and adding the missing files. What’s interesting is that the problem is completely non-existent now, and was actually eradicated rather quickly even then.
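For the curious, that kind of check is simple to sketch. This is not the code GPNL used, just a minimal version of the idea, and it assumes the usual category/package directory layout with the ebuilds and metadata.xml sitting side by side. The function name and the tree path you pass in are placeholders of mine.

```python
from pathlib import Path


def packages_missing_metadata(tree_root):
    """Return 'category/package' names that hold ebuilds but no metadata.xml.

    Assumes a <tree_root>/<category>/<package>/ layout; directories without
    any *.ebuild file are skipped, since they are not packages.
    """
    root = Path(tree_root)
    missing = []
    for pkg_dir in sorted(root.glob("*/*")):
        if not pkg_dir.is_dir():
            continue
        has_ebuild = any(pkg_dir.glob("*.ebuild"))
        if has_ebuild and not (pkg_dir / "metadata.xml").is_file():
            missing.append(f"{pkg_dir.parent.name}/{pkg_dir.name}")
    return missing
```

Point it at wherever your copy of the tree lives and you get a list that can be tracked over time, which is really all that “exposed, tracked and indexed” needs to mean for a problem this small.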

It’s small anecdotal evidence, I know, but I really do believe that small, easily fixed issues can be quickly snuffed out if exposed, tracked and indexed properly. There’s a lot of low hanging fruit out there to be picked.
