multimedia archives: lessons learned over twenty years

My media server died this week. It was built to do all the heavy lifting, mostly ripping and encoding. There’s a lot of experimental stuff on there as well, from trying different codecs and projects over the years. Its loss is tragic. Technically only the power supply died, so it is easy and cheap to replace, but I’m looking at the idea of getting newer hardware. I think the server is around ten years old. I remember getting it when eight-core processors were just barely starting to be a thing, however long ago that was.

I’ve had more free time than I thought I would lately, so I’ve been playing around with multimedia a lot. I have learned *a lot* of lessons over the years, not just about codecs and media, but also about tackling problems and managing the large tasks I want to do. So I thought I’d do a write-up of some of them.

The first lesson: read the documentation

Tonight I got the idea to go back and duplicate the problems I was having about fifteen years ago when I first started ripping my DVD collection and getting them ready for storage and streaming locally on my systems, and see if I could solve those problems now.

I remember clearly trying to encode some CHiPs episodes and I kept having problems with horizontal lines on the video. That, and when playing the video back, it would toggle between two framerates: film and video, 23.976 and 29.97 frames per second. This was very confusing to me, and I couldn’t figure out what was happening. The story is that the video was telecined (I won’t go into details here, but it’s what happens when film is converted to broadcast television framerates). So, all you need to do is de-telecine it so it has one constant framerate all the way through.
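For anyone who wants the short version of what telecine does, here is a minimal Python sketch. It assumes the common 2:3 pulldown cadence, which is an assumption on my part; real discs can mix cadences. The function name `pulldown_2_3` is made up for illustration. It shows how four film frames become five video frames, two of which mix fields from different film frames (the usual source of those horizontal lines), and why the two framerates relate as 5 to 4.

```python
from fractions import Fraction


def pulldown_2_3(film_frames):
    """Spread film frames over video fields in a repeating 2, 3 cadence,
    then pair consecutive fields into interlaced video frames."""
    fields = []
    for index, frame in enumerate(film_frames):
        copies = 2 if index % 2 == 0 else 3  # 2 fields, then 3, then 2, ...
        fields.extend([frame] * copies)
    # Each video frame is built from two consecutive fields.
    return [(fields[i], fields[i + 1]) for i in range(0, len(fields) - 1, 2)]


video = pulldown_2_3(["A", "B", "C", "D"])
mixed = [frame for frame in video if frame[0] != frame[1]]

film_rate = Fraction(24000, 1001)        # about 23.976 fps
video_rate = film_rate * Fraction(5, 4)  # 5 video frames per 4 film frames

print(video)              # [('A', 'A'), ('B', 'B'), ('B', 'C'), ('C', 'D'), ('D', 'D')]
print(mixed)              # [('B', 'C'), ('C', 'D')]
print(float(video_rate))  # about 29.97
```

De-telecining is the reverse trip: drop the duplicated fields so the four original film frames come back at one constant framerate.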

I wasn’t doing so well at reading the documentation either, much less understanding it (which will lead to my second lesson in a minute here). I was using MPlayer, which was the most advanced and best media player at the time, and which shipped with a program called ‘mencoder’ to reencode video. One thing that made it great is that it could read DVDs directly (mencoder dvd://), something ffmpeg couldn’t do, even though ffmpeg is what’s used to encode video almost everywhere now, including at YouTube. A slight tangent: I’ve been working on getting DVD support into ffmpeg on and off, but haven’t really committed to it.

Googling around, the best result for ‘mplayer telecine’ is their documentation, literally called ‘How to deal with telecine and interlacing within NTSC DVDs.’ It was exactly what I needed, and it had all the options in there. I started using that on my CHiPs episode, and the video filters worked perfectly. I didn’t know how it worked or why, but it did, and if I had read *and understood* the documentation, I would have had far fewer problems so many years ago. Which leads to the next lesson.

The second lesson: ask questions

I have an anxiety disorder, specifically generalized anxiety and OCD, and one of the side effects of it for me is that I am extremely embarrassed to ask questions unless I already know as much about the problem as possible. And when I do ask questions, I phrase them in a way that leads to my guesses as possible conclusions. Essentially, I don’t like looking dumb, which I translate as any time when I don’t know something. So I hate coming off as a total beginner, which is exactly what I was. I was learning all the stuff that’s involved in simply building a media collection from my DVDs. But I hated the idea of looking stupid, so I would lurk and look at what other people do, and I’d do that for years, never getting where I wanted to because I didn’t want to be embarrassed. It is a horrible way to live, never getting answers.

Here’s what I should have said: “I read the documentation. I don’t know why these lines are showing up. Can someone help me?” A final inquiry could also be “Could you elaborate, I don’t understand.” It’s easy to do. You show that you’ve done your due diligence by researching the problem and haven’t come to a conclusion that makes sense for you. If I had only done that years ago, it would have saved a lot of problems. Alas.

I’m better at asking questions now, but I still cringe a lot and get embarrassed when I don’t know the answers and I’m asking for help. Now you may say that being embarrassed isn’t that bad, but it causes such discomfort for me, that I begin to break down and fall to pieces. I need to get over that part, too, which is a topic I bring up once in a while in therapy, so I’m working on that. Onto the next lesson.

(Photo: That’s a lot of DVDs, and this was about 18 years ago.)

The third lesson: research, decision, execution

I don’t know where I came up with this three-stage approach to making a decision and moving forward, but it has helped me a lot over the years. The principle is simple. A decision can be broken down into three stages: research, decision, and execution.

I always say “research, research, research.” Research is free. You have to ask a lot of questions, not only of external sources but of yourself as well, such as why you want to achieve or acquire something. It can be quite soul searching and you may not get the answers you like. Some of the research I’d run into a lot was which audio codecs, video codecs, and containers I liked the most.

Ultimately, they all serve the same purpose: getting from one media source (the DVD) to another (a smaller file) efficiently, as fast as possible, and at the quality you’d like to have. There are a *lot* of options when it comes to multimedia, which is great. It translated for me into learning lots and lots and lots of stuff. Research for me is really fun, because you can do so much of it and I absolutely love learning new things.

The next stage, once you’ve done enough research to feel confident moving forward, is decision. It’s very important to see that decision is also free. Once you’ve done all your research, you can pick the option that you like the most, narrow it down to a few, or simply eliminate other options.

Think of it like shopping for something, say a new laptop. There’s a lot of options out there, which is great. If you want to find the best fit for you, research it until you find out as much as you can handle. Specifications, customer reviews, prices, longevity, features, etc. Once you’ve finished doing that, you can pick from the ones you like. Some things that drive the decision could be compatibility with your needs, price, vendor, and availability. One thing I’ve learned is that if you do a lot of research, the decision usually makes itself.

The final stage is to actually execute. It doesn’t apply to just choosing multimedia codecs. It can be done for finding a job, picking a car, and choosing a water bottle. It can help with a lot of stuff. Once you’ve made the decision, though, you’re ready to execute your plan of attack. Make the purchase. Move out east. Get married and have kids. Or whatever. Some decisions obviously carry more weight. One thing I’ve learned though, is you’ll never regret having done a lot of research and taking time to think about something.

The fourth lesson: don’t use a shotgun approach

I’m very much a kinesthetic learner. I learn by doing. In the case of my multimedia library, it means trying out all my different options. There’s a lot of audio codecs and video codecs out there, which ones *work* the best? Let’s time the encodes, let’s do visual comparisons, let’s monitor the filesizes.

Early on, though, I’d spin out of control because I’d try many things at once. I call it the shotgun approach because you’re just firing wildly and will hit a lot of things at once, and never really come to a good conclusion that way. The problem is that I’d try lots of options all at once, and instead of doing enough research, I’d switch between all of them simply because I didn’t like the end result immediately. Xvid is too large. Fine, try x264. That’s too slow. Fine, let’s do Theora. I don’t like how this audio codec is developed, switch to something else. The shotgun approach is when you pivot quickly, and judge things swiftly. It is damaging because the option that may have worked the best went right past you, but you’d never know it because you were trying too many things at once.

The thing that I’ve noticed works really well for me is to change *one variable* at a time, and then do more research on that. In the past, I’d change multiple things at once and see how that worked. This container with this audio has audio/video sync issues. Let’s try a different computer, a different codec and a different source. That looks weird, too, move on to the next. And so on and so on. If I had just *stopped*, looked at all my options, and changed one thing, I could have tracked down the problem much more easily. So don’t use the shotgun approach.
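Changing one variable at a time can be sketched in a few lines of Python. This is only an illustration, not code from my own tools: the function name `one_at_a_time`, the baseline settings, and the lists of alternatives are all hypothetical. Each trial it produces differs from the baseline in exactly one setting, so any change in the result can be pinned on that setting.

```python
def one_at_a_time(baseline, alternatives):
    """Yield (changed_key, settings) pairs that differ from the baseline
    in exactly one setting."""
    for key, values in alternatives.items():
        for value in values:
            if value == baseline[key]:
                continue  # identical to the baseline, nothing to learn
            trial = dict(baseline)  # copy so the baseline stays untouched
            trial[key] = value
            yield key, trial


# Hypothetical starting point and the options to test against it.
baseline = {"video": "x264", "audio": "ac3", "container": "mkv"}
alternatives = {
    "video": ["xvid", "theora"],
    "audio": ["flac"],
    "container": ["mp4"],
}

trials = list(one_at_a_time(baseline, alternatives))
for changed, settings in trials:
    print(f"changed {changed}: {settings}")
```

With three settings and four alternatives that is four test encodes, each one directly comparable to the baseline, instead of a pile of encodes where everything changed at once.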

Final lesson: you’ll feel better after making the best decision

Not only will you never regret doing more research, you’ll probably never be too upset that you took more time and did things a bit slowly when possible. By stepping through everything at a reasonable pace, you learn a lot more, and you gain experience.

So, with tonight’s multimedia adventure, reproducing the original CHiPs encode issues, I was able to figure out the proper way to do it in about an hour, starting from scratch. I figured out all the problems I had earlier, and learned how to fix them. I don’t use mencoder any more these days. I use software called HandBrake instead, but that took lots of research too.

The great thing is, when you enjoy what you’re looking into, the whole process becomes fun because you’re learning without realizing it. And it really helps to have this long list of lessons learned you can fall back on. It helps you feel good about your decisions.

I wish I had picked up on these ideas fifteen years ago, but that’s okay, I learned a lot on the way anyway.

Some final notes. If you’re curious what encoding settings I finally settled on, here’s what I have and why:

  • Video: x264 codec because it’s fast and produces very high quality, with differences that are easy to compare visually
  • Audio: Dolby Digital or DTS so I can pass the audio through to my receiver. Sometimes I use FLAC instead.
  • Container: Matroska because of its feature set and compatibility
  • Media (ripping) server: Gentoo, because it runs so much faster than anything else out there, and it has a great library of multimedia applications that are easy to install and use
  • Media (playback) server: Plex Media Server because it pulls in so much metadata and does a lot of heavy lifting, making it easy to organize and play my content, plus it has my favorite feature, which is that it resumes playback on everything
  • Encoding: ffmpeg and HandBrake
  • QA: mpv
  • Storage: Instead of worrying about how big encodes are, I spent a few hundred dollars and now have 16 TB of space for everything
  • Software: I use dvd_info and bluray_info, both of which I wrote myself; I also use MakeMKV for the hard / stubborn sources. The backend and frontend are written in PHP since that’s what I know the best, and have been coding in it since about 2003.
  • Database: PostgreSQL because it’s simply the best / most advanced open-source database software out there

That’s it! Now go play with some multimedia and learn something. 🙂

Super details

Video Quality

Me comparing x264 CRF values side by side, with the original source on the left. I ended up using CRF of 12 (right side, first picture) which as you can see in the first picture is much closer to the original. I had to ditch both common convention and lots of opinions online to use such an extreme level of detail, and I ended up doing visual comparisons over all other methods. (This might be a bad example since Warner Bros released the entire series on Blu-ray, but I used this as a general standard)

Here’s how I compared the two:

mpv "${1}" --external-file="${2}" --lavfi-complex='[vid1] [vid2] hstack [vo]' --fullscreen --screenshot-format=png --screenshot-template="%F.mpv-compare-${2}-%ws.%wT" --pause

(Screenshots: CRF 12 and CRF 22)

Archiving

My filenames follow a naming scheme built from database IDs: collection ID, series ID, DVD ID, episode ID, and short title. For one of the CHiPs episodes, the name would be 2.076.0751.10641.CHIPS.mkv.
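As a minimal sketch of that scheme in Python: the function name `archive_name` is hypothetical, and the zero-padding widths (3, 4, and 5 digits) are an assumption inferred from the single example filename above, not taken from the real code.

```python
def archive_name(collection_id, series_id, dvd_id, episode_id, short_title, ext="mkv"):
    """Build a filename from database IDs.

    Padding widths are guessed from the one example filename, so treat
    them as an assumption rather than the actual rule.
    """
    return (
        f"{collection_id}.{series_id:03d}.{dvd_id:04d}."
        f"{episode_id:05d}.{short_title}.{ext}"
    )


print(archive_name(2, 76, 751, 10641, "CHIPS"))  # 2.076.0751.10641.CHIPS.mkv
```

Fixed-width IDs keep files sorted in database order in a plain directory listing, and every filename can be traced back to its rows without opening the file.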

I wrote my own massive tool to archive data, called “dart” (DVD archiving tool). It’s on GitHub if you want to look at the code. It’s there mostly so someone can see it if they want, plus having it in git lets me track changes.

Here’s what ‘dart --info’ prints for the same DVD:

[Access Device]
  • Reading /dev/sr0
[DVD]
  • Title: CHIPS_S2_D1A
  • dvdread id: 792f3e760f8fe937d1cbe1446287c9fd
[Database]
  • DVD ID: 751
  • Series: CHiPs
  • Imported: Yes
[Info]
  • CHiPs: Peaks and Valleys
  • CHiPs: The Volunteers
  • CHiPs: Family Crisis
  • CHiPs: Disaster Squad

The web frontend is written using CodeIgniter.

Encoding

I use HandBrake to rip all my content. For the same episode as above, here’s the command:

HandBrakeCLI --title '7' --encoder 'x264' --quality '18' --encoder-preset 'medium' --encoder-tune 'film' --audio '1' --aencoder 'copy' --subtitle '1' '--markers' '--detelecine' '--no-dvdnav' --input '/dev/sr0' --output '2.076.0751.10641.CHIPS.mkv'
