Saturday, November 29, 2008

Video4vimeo

Everyone uploads videos nowadays. Specialists use Vimeo because YouTube quality sucks. So the project goal was to create a video file for upload to Vimeo with Gmerlin-transcoder, optimize the whole process and fix all bugs encountered along the way.

The footage

I had to make sure that I own the copyright of the example clip and that nobody's privacy is violated. So I decided to make a short video featuring a toilet toy I bought in Tokyo in 2003. A friend of mine went to the shop a few years later, but it was already sold out.

The equipment

My camera is a simple mini-DV one. It's only SD, but since the gmerlin architecture is nicely scalable, the same encoder settings (except the picture size) should apply for HD as well. I connected the camera via firewire and recorded directly to the PC (no tape involved) with Kino.



Capture format

The camera sends DV frames (with encapsulated audio) via firewire to the PC. This format is called raw DV (extension .dv). The Kino user can choose whether to wrap the DV frames into AVI or Quicktime or to export them raw. Since the raw DV format is completely self-contained, it was chosen as the input format for Gmerlin-transcoder. Wrapping DV into another container only makes sense for toolchains which cannot handle raw DV.

Quality considerations

My theory is that the crappy quality of many web-video services is partly due to financial considerations of the service providers (crappy files need less space on the server and less bandwidth for transmission), but partly also due to people making mistakes when preparing their videos. Here are some things to keep in mind:

1. You never do the final compression
In forums you often see people asking: How can I convert to flv for upload to youtube? The answer is: Don't do it. Even if you do, it's unlikely that the server will take your video as it is. Many video services are known to use ffmpeg for importing the uploaded files, and ffmpeg can read much more than just flv. Install ffmpeg to check if it can read your files.

Compression parameters should be optimized so that the file you upload has no visible artifacts. That's because the final compression (out of your control) will add more artifacts, and 2nd-generation artifacts look even uglier; the results can be seen in many places on the web.

2. Minimize additional conversions on the server
If you scale your video to the same size it will have on the server, chances are good that the server won't rescale it. The advantage is that scaling then happens on the raw material, resulting in minimal quality loss. Scaled video looks ugly if the original has compression artifacts, which would be the case if you let the server do the scaling.

3. Don't forget to deinterlace
Interlaced video compressed in progressive mode looks extraordinarily ugly. Even more disappointing is that many people apparently forget to deinterlace. Even the crappiest deinterlacer is better than nothing.

4. Minimize artifacts by prefiltering
If, for whatever reason, artifacts are unavoidable, you can minimize them by slightly blurring the source material. Usually this shouldn't be necessary.

Format conversion

All video format conversions can be done in a single pass by the Crop & Scale filter. This gives maximum speed, smallest rounding errors and smallest blurring.

Deinterlacing

Sophisticated deinterlacing algorithms are only meaningful if the vertical resolution is to be preserved. In our case, where the image is scaled down anyway, it's better to let the scaler deinterlace. Doing scaling and deinterlacing in one step also decreases the overall blurring of the image.


Scaling
Image size for Vimeo in SD seems to be 504x380. It's the size of their flash widget and also the size of the .flv video. Square pixels are assumed.



Cropping
The aspect ratio of PAL DV is a bit larger than 4:3. Also, 504x380 with square pixels is not exactly 4:3. Experiments showed that cropping 10 pixels each from the left and right borders removes the black borders at the top and bottom. If your source material has a different size, these values will be different as well.


Chroma placement
Chroma placement for PAL DV is different from that of H.264 (which has the same chroma placement as MPEG-2). Depending on the gavl quality settings, this fact is either ignored or another video scaler is used later on to shift the chroma locations. I thought that could be done smarter.

Since the gavl video scaler can do many things at the same time (it already does deinterlacing, cropping and scaling) it can also do chroma placement correction. For this, I made the chroma output format of the Crop & scale filter configurable. If you set this to the format of the final output, subsequent scaling operations are avoided.


Since ffmpeg doesn't care about chroma placement, it's probably unnecessary that we do. On the other hand, our method has zero overhead and does practically no harm.

Audio
Vimeo wants audio sampled at 44.1 kHz, while most cameras record at 48 kHz. The following settings take care of that:


Encoding

The codecs are H.264 for video and AAC for audio. Not only are they recommended by Vimeo, they indeed give the best results for a given bitrate.

For some reason, Vimeo doesn't accept the AAC streams in Quicktime files created by libquicktime. Apple Quicktime, mplayer and ffmpeg accept them, and I found lots of forum posts describing exactly the same problem. So I believe this is a Vimeo problem.

The solution I found is simple: use mp4 instead of mov. People think mp4 and mov are identical, but that's not true. At least in this case it makes a difference. The compressed streams are, however, the same for both formats.

Format

The make streamable option is probably unnecessary, but I allow people to download the original .mp4 file and maybe they want to watch it while downloading.

Audio codec


The default quality is 100; I increased that to 200. Hopefully this isn't the reason Vimeo rejects the audio when it's in mov. The Object type should be Low (low complexity), because some decoders cannot decode anything else.

Video codec


I decreased the maximum GOP size to 30 as recommended by Vimeo. B-frames still screw up some decoders, so I didn't enable them. All other settings are default.


I encode with constant quality. In Quicktime, there is no difference between CBR and VBR video, so the decoder won't notice. Constant quality also has the advantage that this setting is independent of the image size. The quantizer parameter was decreased from 26 to 16 to increase quality. It could be decreased further.

Bugs

The following bugs were fixed during that process:
  • Reading raw DV files was completely broken. I broke it when I implemented DVCPROHD support last summer.
  • Chroma placement for H.264 is the same as for MPEG-2. This is now handled correctly by libquicktime and gmerlin-avdecoder.
  • Blending of text subtitles onto video frames in the transcoder was broken as well. It's needed for the advertisement banner at the end.
  • Gmerlin-avdecoder always signalled the existence of timecodes for raw DV. This is ok if the footage comes from a tape, but when recording on the fly my camera produces no timecodes. This resulted in a Quicktime file with a timecode track, but without timecodes. Gmerlin-avdecoder was modified to report timecodes only if the first frame actually contains one.
  • For making the screenshots, I called
    LANG="C" gmerlin_transcoder
    This switched the GUI to English, except for the items belonging to libquicktime. I found that libquicktime translated the strings way too early (the German strings were saved in the gmerlin plugin registry). I made a change to libquicktime so that the strings are only translated for the GUI widget. Internally they are always English.



The result


Wednesday, November 19, 2008

Make your webcam suck less

Every webcam sucks. Not because of the webcam itself, but because of the way it's handled by the software. Some programs support only RGB formats, others work only in YUV. Supporting all pixelformats directly in the hardware would increase the price of these low-end articles. Supporting all pixelformats in the drivers would mean having something similar to gavl in the Linux kernel. The Linux kernel developers don't want this because it belongs in userspace. They are right IMO. And since not all programs have proper pixelformat support, you can always find an application which doesn't support your cam.

Other problems are that some webcams flip the image horizontally (for reasons I don't want to research), and that some programs aren't really smart when detecting webcams: they stop at the first device they can't handle (which can be a TV-card instead of a webcam).

So the project was to make a webcam device at /dev/video0, which supports as many pixelformats as possible and allows image manipulation (like horizontal flipping).

The solution involved the following:
  • Wrote a V4L2 input module for the real webcam (not directly necessary for this project though).
  • Fixed my old webcam tool camelot. Incredible how software breaks if you don't maintain it for some time.
  • Added support for gmerlin filters in camelot: These can not only correct the image flipping, they provide tons of manipulation options. Serious ones and funny ones.
  • Added an output module for vloopback. It's built into camelot and provides the webcam stream through a video4linux (1, not 2) device. It supports most video4linux pixelformats because it has the conversion power of gavl behind it. Vloopback is not in the standard kernel. I got it from svn with

    svn co http://www.lavrsen.dk/svn/vloopback/trunk/ vloopback
A tiny initialization script (to be called as root) initializes the kernel modules:
#!/bin/sh
# Remove modules if they were already loaded
rmmod pwc
rmmod vloopback

# Load the pwc module, real cam will be /dev/video3
modprobe pwc dev_hint=3

# Load the vloopback module, makes /dev/video1 and /dev/video2
modprobe vloopback dev_offset=1

# Link /dev/video2 to /dev/video0 so even stupid programs find it
ln -sf /dev/video2 /dev/video0
Instead of the pwc module, you must load the one appropriate for your webcam. I'm not sure if all webcam drivers support the dev_hint option.

My new webcam works with all the applications I need for now. Not working are kopete, flash and Xawtv.

Thursday, November 13, 2008

Gmerlin pipelines explained

Building multimedia software on top of gavl saves a lot of time others spend on writing optimized conversion routines (gavl already has more than 2000 of them) and bullet-proof housekeeping functions.

On the other hand, gavl is a low-level library which leaves lots of architectural decisions to the application level. This means that gavl will not provide you with fully featured A/V pipelines. Instead, you have to write them yourself (or use libgmerlin and take a look at include/gmerlin/filters.h and include/gmerlin/converters.h).

I'm not claiming to have found the perfect solution for the gmerlin player and transcoder, but nevertheless here is how it works:

Building blocks
The pipelines are composed of
  • A source plugin, which gets A/V frames from a media file, URL or a hardware device
  • Zero or more filters, which somehow change the A/V frames
  • A destination plugin. In the player it displays video or sends audio to the soundcard. For the transcoder, it encodes into media files.
  • Format converters: These are inserted on demand between any two of the above elements
Asynchronous pull approach
The whole pipeline is pull-based, meaning that each component requests data from the preceding component. Asynchronous means that (in contrast to plain gavl) we make no assumption about how many frames/samples a component needs at the input for producing one output frame/sample. This makes it possible to do things like framerate conversion or framerate-doubling deinterlacing. As a consequence, filters and converters which remember previous frames need a reset function to forget about them (the player e.g. calls it after seeking).

Unified callbacks
In modular applications it's always important that modules know as little as possible about each other. For A/V pipelines this means that each component gets data from the preceding component through a unified callback, no matter whether that component is a filter, converter or source. There are prototypes in gmerlin/plugin.h:
typedef int (*bg_read_audio_func_t)(void * priv, gavl_audio_frame_t * frame,
                                    int stream, int num_samples);

typedef int (*bg_read_video_func_t)(void * priv, gavl_video_frame_t * frame,
                                    int stream);
These are provided by input plugins, converters and filters. The stream argument is only meaningful for media files which have more than one audio or video stream. How the pipeline is exactly constructed (e.g. if intermediate converters are needed) matters only during initialization, not in the time critical processing loop.

Asynchronous vs synchronous
As noted above, some filter types are only realizable if the architecture is asynchronous. Another advantage is that for a filter, the input and output frame can be the same (in-place conversion). E.g. the timecode tweak filter of gmerlin looks like this:
typedef struct
{
  bg_read_video_func_t read_func;
  void * read_data;
  int read_stream;

  /* Other stuff */
  /* ... */
} tc_priv_t;

static int read_video_tctweak(void * priv, gavl_video_frame_t * frame,
                              int stream)
{
  tc_priv_t * vp;
  vp = (tc_priv_t *)priv;

  /* Let the preceding element fill the frame, return 0 on EOF */
  if(!vp->read_func(vp->read_data, frame, vp->read_stream))
    return 0;

  /* Change frame->timecode */
  /* ... */

  /* Return success */
  return 1;
}
A one-in-one-out API would need to memcpy the video data only for changing the timecode.

Of course, in some situations outside the scope of gmerlin, asynchronous pipelines can cause problems. This is especially the case in editing applications, where frames might be processed out of order (e.g. when playing backwards). How to solve backwards playback for filters which use previous frames is left to the NLE developers. But it would make sense to mark gmerlin filters which behave synchronously (as most of them actually do), so we know they can always be used.

Sunday, November 9, 2008

Release plans

Time to make releases of the packages. The current status is the following:


  • Gmerlin-mozilla is practically ready, though some well-hidden bugs cause it to crash sometimes. Also, the scripting interface could be further extended.
  • gavl is ready. New features since the last version are timecode support, image transformation and a contributed varispeed capable audio resampler.
  • gmerlin-avdecoder got lots of fixes, support for newer ffmpegs, a demuxer for redcode files and RTP/RTSP support. The latter was most difficult to implement. It still needs some work for better recovering after packet loss in UDP mode. Since all important features are implemented now, gmerlin-avdecoder will get the version 1.0.0.
  • The GUI player can now import directories with the option "watch directory". This causes the album to be synchronized with the directory each time it is opened. The plan is to further extend this such that even an open album is regularly synchronized via inotify. Apart from this, the gmerlin package is ready.

Thursday, November 6, 2008

Introducing gmerlin-mozilla

What's the best method to check if your multimedia architecture is really completely generic and reusable? One way is to write a firefox plugin for video playback and beat on everything until it no longer crashes. The preliminary result is here:



And here are some things I think are worth noting:

How a plugin gets its data
There are 2 methods:
  • firefox handles the TCP connection and passes data via callbacks (NPP_WriteReady, NPP_Write). The gmerlin plugin API got a callback based read interface for this.
  • the plugin opens the URL itself
The first method has the advantage that protocols not supported by gmerlin but by firefox (e.g. https) will work. The disadvantage is that passing the data from firefox to the input thread of the player will almost lock up the GUI, because firefox spends most of its time waiting until the player can accept more data. I found no elegant way to prevent this. Thus, such streams are written to a temporary file and read by the input thread. Local files are recognized as such and opened by the plugin directly.

Emulating other plugins
Commercial plugins (like Realplayer or Quicktime) have lots of gimmicks. One of these lets you embed multiple instances of the plugin, where one will show up as the video window, another one as the volume slider etc. Gmerlin-mozilla handles these pragmatically: The video window always has a toolbar (which can hide after the mouse was idle), so additional control widgets are not initialized at all. They will appear as grey boxes.

Of course not all oddities are handled correctly yet, but the infrastructure for doing this is there.

Scriptability
Another gimmick is controlling the plugin from a JavaScript GUI. While the older scripting API (XPCOM) was kind of bloated and forced the programmer into C++, the new method (npruntime) is practically as versatile, but much easier to support (even in plain C). Basically, the plugin exports an object (an NPObject), which has (among others) functions for querying the supported methods and properties. Other functions exist for invoking methods, or setting and getting properties. Of course not all scripting commands are supported yet.

GUI
A GUI for a web-plugin must look sexy, that's clear. Gmerlin-mozilla has a GUI similar to the GUI player (which might look completely unsexy for some). But in contrast to other free web-plugins it's skinnable, so there is at least a chance to change this.

Some GUI widgets had to be updated and fixed before they could be used in the plugin. Most importantly, timeouts (like the one for the scrolltext) have to be removed from the event loop before the plugin is destroyed, otherwise a crash happens afterwards.

The fine thing is that firefox also uses gtk-2 for its GUI, so having Gtk widgets works perfectly. If the browser isn't gtk-2 based, the plugin won't load.

Embedding technique
Gmerlin-mozilla needs XEmbed. Some people hate XEmbed, but I think it's pretty well designed as long as you don't expect too much from it. The Gmerlin X11 display plugin already supports XEmbed because it always opens its own X11 connection. After I fixed some things, it embeds nicely into firefox.

Configuration
The GUI should not be bloated by exotic buttons, which are rarely used. Therefore most of the configuration options are available via the right-click menu. Here, you can also select fullscreen mode.