FFmpeg: A beginner’s guide – part 2
Continuing from the last post on FFmpeg, this post discusses the fundamental tasks you can accomplish with an audio stream in FFmpeg.
2.1.1 Introduction to Transcoding
One of the basic tasks you can perform on an audio track in FFmpeg is converting it into another format. This process, known as transcoding, is the direct digital-to-digital conversion of a stream from one encoding to another, whether video or audio. Transcoding is usually done when a target device (a media player such as an iPod, iPad or DVD player) or a software application does not support the format, or has limited storage capacity that calls for a smaller file. Transcoding can also be used to convert an incompatible or obsolete format to a better-supported one.
Transcoding is generally a “lossy” process: lossy compression is a data encoding method that discards (loses) some of the data to reduce the amount that needs to be stored in a file. Transcoding can also be “lossless” if the input is losslessly compressed and the output is either losslessly compressed or stored uncompressed. Although compression can reduce file size considerably, repeatedly transcoding a single file with lossy compression causes ‘generation loss’: each lossy pass discards a little more information, so a copy of a copy sounds worse than the first copy. Keep this in mind when repeatedly transcoding between formats.
Although I cannot show you here the difference between an original and a lossily compressed audio file (due to the limitation of the medium, of course), the following shows an example of lossy compression in an image. The original JPG image is on the left, and the same image after repeated lossy compression is on the right. Because information is discarded for good during lossy compression, we cannot get the original image back from the compressed one.
2.1.2 Audio compression
Audio compression is a form of data compression designed to reduce the transmission bandwidth and storage requirements of a digital audio stream. Audio compression algorithms are implemented in software as audio codecs: programs or libraries capable of encoding and decoding a digital audio stream.
Audio compression is either lossy or lossless as discussed earlier. Lossless audio compression produces a version of digital audio that can be decoded to an exact digital duplicate of the original audio stream. This is in contrast to the irreversible changes upon playback from lossy compression techniques such as Vorbis and MP3.
The basic idea behind lossy audio compression in FFmpeg is to lower the audio bitrate (96 kbps, 128 kbps, 192 kbps, etc.), which also reduces the fidelity of the audio. For a given codec, a higher bitrate generally means better sound quality, so by lowering the bitrate you are degrading the quality.
How much bitrate is enough depends on the codec and on the listener. As a rough guide, 128 kbps is often considered acceptable for casual listening on a computer, while MP3 files are commonly encoded at 192 to 256 kbps when the aim is to get close to CD quality.
With that short introduction to compression behind us, let us move on to transcoding audio files.
To run the example commands in this section, you will need an audio file in .wav or .mp3 format. You can get a wav file by ripping an audio track from a music CD, or you can download an mp3 file from the Internet. Call the resulting file ‘myaudio.mp3’. For this section I used the ‘Solo Piano 7’ Opening file from http://www.archive.org/details/solo-piano-7.
Next, we will get ffmpeg to identify the file, which will tell us the various details of the audio. The simplest way to get this information is to tell ffmpeg to use the file as input, which is done with the -i option. Enter the following command at your prompt.
ffmpeg -i myaudio.mp3
The exact output on my PC is shown below; yours may differ depending on the version of ffmpeg you are using.
D:\ffmpeg>ffmpeg -i myaudio.mp3
ffmpeg version N-31100-g9251942, Copyright (c) 2000-2011 the FFmpeg developers
Input #0, mp3, from ‘myaudio.mp3′:
Metadata:
album : solo piano 7
artist : Torley
album_artist : Torley
composer : Torley
genre : Piano
track : 001/176
title : 001 – Openings
date : 2008
Duration: 00:01:39.50, start: 0.000000, bitrate: 193 kb/s
Stream #0.0: Audio: mp3, 44100 Hz, stereo, s16, 192 kb/s
At least one output file must be specified
There is a lot of information we can gather from the output: the track is 1 minute 39.50 seconds long, the overall bitrate is 193 kb/s, and the audio is encoded in mp3 format at 44100 Hz (44.1 kHz) with two channels (stereo). All this information will come in handy when transcoding.
Let us now convert the downloaded file to a simple wav format. Notice that the command below does not specify any format option or flag, just the complete output filename. FFmpeg guesses which format and encoder to use from the input and output files, which is a big help if you keep forgetting the option names or are just being lazy. If you are not going to specify the format explicitly, make sure you give the full filename, including the appropriate extension.
ffmpeg -i myaudio.mp3 myaudio.wav
The output of the command is shown below.
ffmpeg version N-31100-g9251942, Copyright (c) 2000-2011 the FFmpeg developers
Input #0, mp3, from ‘myaudio.mp3′:
Metadata:
album : solo piano 7
artist : Torley
album_artist : Torley
composer : Torley
genre : Piano
track : 001/176
title : 001 – Openings
date : 2008
Duration: 00:01:39.50, start: 0.000000, bitrate: 193 kb/s
Stream #0.0: Audio: mp3, 44100 Hz, stereo, s16, 192 kb/s
File ‘myaudio.wav’ already exists. Overwrite ? [y/N] y
Output #0, wav, to ‘myaudio.wav’:
Metadata:
album : solo piano 7
artist : Torley
album_artist : Torley
composer : Torley
genre : Piano
track : 001/176
title : 001 – Openings
date : 2008
encoder : Lavf53.4.0
Stream #0.0: Audio: pcm_s16le, 44100 Hz, stereo, s16, 1411 kb/s
Stream mapping:
Stream #0.0 -> #0.0
Press [q] to stop, [?] for help
size= 17141kB time=00:01:39.50 bitrate=1411.2kbits/s
video:0kB audio:17141kB global headers:0kB muxing overhead 0.000251%
Notice how large the resulting wav file is (about 17 MB) compared to the original mp3 (about 2.1 MB). This is because the wav file is not compressed like its mp3 counterpart. Incidentally, the audio in the wav file is pulse-code modulation (PCM); technically, signed 16-bit little-endian PCM (shown as pcm_s16le in the output above).
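The size of the wav file can be checked with a little arithmetic. The following Python sketch (Python is only my choice for illustration; FFmpeg does not need it) takes the stream parameters printed in the output above (44100 Hz, 16-bit samples, 2 channels, 99.5 seconds) and estimates the size of the uncompressed PCM data. The function name is made up for this example.

```python
def pcm_size_bytes(sample_rate_hz, bits_per_sample, channels, duration_s):
    """Estimate the size of raw PCM audio in bytes (file header not included)."""
    bits_per_second = sample_rate_hz * bits_per_sample * channels
    return bits_per_second * duration_s / 8

# Values read from the ffmpeg output for myaudio.wav
size = pcm_size_bytes(44100, 16, 2, 99.5)
print(f"PCM bitrate: {44100 * 16 * 2 / 1000} kb/s")
print(f"Estimated size: {size / 1024:.0f} KiB")
```

The estimate comes to roughly 17,140 KiB, which is in line with the size=17141kB that ffmpeg reported; the small difference is plausibly the wav header and metadata.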
As you can see from the listing above, the output of an ffmpeg command is quite long, so from here on I’ll just give the command and leave out the output unless it is needed for the explanation.
2.1.5 Changing the bitrate of the audio
As we learned in Chapter 1, the bitrate controls the file size and the quality of an audio or video stream. Lowering the bitrate reduces the file size but also diminishes the quality of the output. This can be useful if you have a high-quality audio recording and need a smaller file to stream over the Web. For example, the following command sets the bitrate of the mp3 file to 64 kb/s, using the -ab option to do the job.
-ab
ffmpeg -i myaudio.mp3 -ab 64k out.mp3
The higher the value, the better the audio quality; the bitrate is one of the most important factors in how the audio sounds. But that doesn’t mean you can make a poor audio file sound better by increasing its bitrate. The resulting file will just be bigger.
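To get a feel for how the bitrate drives the file size, here is a rough Python sketch. It assumes a constant bitrate and ignores container overhead and metadata, so treat the numbers as estimates only; the function name is hypothetical.

```python
def encoded_size_bytes(bitrate_kbps, duration_s):
    """Approximate size of a constant-bitrate stream: kilobits/s * seconds / 8."""
    return bitrate_kbps * 1000 * duration_s / 8

duration = 99.5  # seconds, the length of the example track
for kbps in (64, 128, 192):
    size_mb = encoded_size_bytes(kbps, duration) / 1e6
    print(f"{kbps} kb/s -> about {size_mb:.2f} MB")
```

For the example track, 64 kb/s works out to about 0.8 MB, a third of what the same track needs at 192 kb/s.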
Another example: to transcode an mp3 file to AAC at a bitrate of 128 kb/s, we can use the following.
ffmpeg -i myaudio.mp3 -ab 128k myaudio.aac
As we saw earlier, the original audio track has 2 channels (stereo). Often two channels are not necessary, as in a speech recording, where it really doesn’t matter. In such cases you can further reduce the file size by setting the number of audio channels to 1 (mono) with the -ac option. By default, the output stream gets the same number of audio channels as the input.
-ac
ffmpeg -i myaudio.mp3 -ac 1 out.mp3
Note that once you convert a stereo track to mono, you cannot get the original stereo back; that information is lost forever. The same thing happens with bitrates: once you reduce the bitrate of an audio file, you cannot just increase it again to get the original quality, because that information is already gone. So as a precaution, never work with your original media files. Make a copy of the original and work with the copy.
The other important audio option is -acodec, which lets you choose the audio codec to use. By default, FFmpeg picks an encoder for each audio stream by guessing from the output file extension; -acodec lets you force a specific encoder instead. For example, to encode MP3 audio with the LAME library you can specify -acodec libmp3lame. The following command extracts the audio stream from a .flv video and saves it as an .mp3 file using the libmp3lame encoder.
-acodec
ffmpeg -i myvideo.flv -acodec libmp3lame myaudio.mp3
Sometimes FFmpeg may be unable to decode the input file correctly, giving an error something like the following.
Error while decoding stream #0.0
In such cases you can force FFmpeg to use a particular decoder for the input file. The following example forces FFmpeg to use the mp3 decoder to decode the input audio. Note that the decoder is named mp3; libmp3lame is an encoder only and cannot be used for decoding.
ffmpeg -acodec mp3 -i myvideo.flv myaudio.mp3
Note that the -acodec option comes before -i when the codec should apply to the input stream, and after -i when it should apply to the output stream. To see which codecs are available on your system, issue the following command.
ffmpeg -codecs
Sometimes you may want to drop the audio completely, for which we can use the -an option. This can be used to strip the audio stream out of a video file. When you use this option, all other audio-related options are ignored, which is fine, as they would not matter without the audio. So, for example, if you want to remove the audio from a video file and keep only the video stream, you can use the following.
ffmpeg -i myvideo.flv -an out.flv
Another important option is -ar, the audio sampling frequency, which sets the sampling frequency of the audio stream. Audio sampling was discussed in Chapter 1. You can use this option to lower the sampling frequency in order to reduce storage or bandwidth requirements. The value is given in Hz; our example file is sampled at 44100 Hz. The following resamples the input audio to 11025 Hz with a single channel (mono).
-ar
ffmpeg -i myaudio.mp3 -ar 11025 -ac 1 out.mp3
Note that once you have reduced the sampling frequency, some of the audio data is lost. You cannot resample it back to a higher value and expect the audio quality to improve.
2.1.6 Audio grabbing
Until now we have looked at how to transform an existing audio stream into other formats. FFmpeg can also grab audio from external devices such as a microphone. This can be useful if you need to record from your desktop microphone or create a screencast. Note that the following command will not work on a Windows machine; you need a Linux machine to grab the mic audio this way. Enter the following command at your Linux prompt.
ffmpeg -f oss -i /dev/dsp ./audio.wav
This will start recording the input audio from the mic to the ‘audio.wav’ file. Once started you will need to press ‘q’ to stop the recording. We will now look into the various options given above.
The -f option denotes the format to be used for the input stream. FFmpeg supports many formats; you can find the complete list by issuing the following command.
ffmpeg -formats
Here we are using the ‘oss’ format, which stands for Open Sound System input device. The Open Sound System (OSS) is an interface for making and capturing sound in Unix or Unix-like operating systems. In the Linux kernel, there have historically been two uniform sound APIs. One is OSS; the other is ALSA (Advanced Linux Sound Architecture). ALSA is available for Linux only.
The device ‘/dev/dsp’ is the default audio input device in the Linux system. It’s connected to the main speakers and the primary recording source such as a microphone. The system administrator can set /dev/dsp to be a symbolic link to the desired default device.
The ‘audio.wav’ file is where the recorded audio will be saved.
Another example: the following records the mic audio through ALSA to the file ‘rec.flac’ in the current directory, in FLAC format.
ffmpeg -f alsa -ar 48000 -i front ./rec.flac
2.2 Some popular audio formats
.AAC Advanced Audio Coding File – declared an international standard in 1997 and designed as the successor to MP3. It generally provides better quality than MP3 at the same bit rate, and it’s Apple’s standard iTunes and iPod audio format.
.AIF(F) Audio Interchange File Format – developed by Electronic Arts and Apple back in the ’80s. AIFF files contain uncompressed audio, resulting in large file sizes.
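.m4a Apple Lossless – This format uses lossless compression for digital music. Note that the .m4a extension is also used for lossy AAC audio, so the extension alone does not tell you which of the two a file contains.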
.MP3 MPEG Layer 3 – the most popular digital-audio music format, designed by a team of European engineers in 1991 to conserve the quality of a song while storing it in a small, compact file.
.OGG Ogg Vorbis – one of the most popular license-free, open-source audio-compression formats. It’s efficient for streaming and compression because it creates smaller files than MP3 while maintaining audio quality.
.RA(M) Real Audio Media – developed by RealNetworks in 1995. It has a wide variety of uses, from videos to music, but is mainly used for streaming audio such as that from Internet radio stations.
.WAV Windows WAVE – a format developed by IBM and Microsoft and popular among PC users; it can hold both compressed and uncompressed audio.
.WMA Windows Media Audio - designed by Microsoft to be an MP3 competitor, but with the introduction of iTunes and iPods, it’s fallen far behind MP3 in popularity.
2.3 Audio processing recipes
MP3 to AAC High Quality Stereo
ffmpeg -i in.mp3 -acodec aac -ac 2 -ar 48000 -ab 192k out.aac
MP3 to AAC High Quality 5.1
ffmpeg -i in.mp3 -acodec aac -ac 6 -ar 48000 -ab 448k out.aac
Convert to low quality mp3 to preserve storage
ffmpeg -i in.mp3 -ab 64K out.mp3
MP3 to Vorbis OGG (can be played in HTML 5)
ffmpeg -i in.mp3 -acodec vorbis -aq 50 out.ogg
In the next post in the series we will look into various video processing tasks.
FFmpeg: A beginner’s guide – part 1
The idea for these posts arose from my frustration at not finding any organized documentation for learning FFmpeg. My aim in writing this series is therefore to help newcomers get up and running with FFmpeg quickly.
FFmpeg is a command-line tool for *nix and Windows systems that, in its simplest form, provides a facility to decode and encode media files. With the proliferation of video on the Internet and in our daily lives, users need the ability to transcode (convert) audio and video files from one format to another. For example, a user might have downloaded a video from YouTube and need to convert it to a format playable on an iPod or other media device.
Besides this obvious use, FFmpeg is also capable of a few other fundamental manipulations of audio and video data. These include changing the sample rate of the audio, advancing or delaying it with respect to the video, and reducing the size of the media file. They also include changing the frame rate of the resulting video, cropping it, resizing it, padding it with bars left and right and/or top and bottom when necessary, or changing the aspect ratio of the picture. Furthermore, ffmpeg can import audio and video from different sources such as a microphone.
The main components of FFmpeg are libavcodec, an audio/video codec library, libavformat, an audio/video container mux/demux library, and the ffmpeg command line program for passing various transcoding options to the main program.
The FFmpeg project was started by Fabrice Bellard, and has been maintained by Michael Niedermayer since 2004. The name of the project comes from the MPEG video standards group, together with “FF” for “fast forward”. On March 13, 2011 a group of FFmpeg developers decided to fork the project under the name Libav (http://libav.org/) due to some project management related issues.
FFmpeg is used by many open source and proprietary projects, including ffmpeg2theora, VLC, MPlayer, HandBrake, Blender, Google Chrome, and various others.
1.1.2 Components of FFmpeg
FFmpeg is made of the following main components.
Programs
ffmpeg – a command line tool to convert multimedia files between formats.
ffserver – a multimedia streaming server for live broadcasts.
ffplay – a simple media player based on SDL and the FFmpeg libraries.
ffprobe – a simple multimedia stream analyzer.
Libraries
libavutil – a library containing functions for simplifying programming, including random number generators, data structures, mathematics routines, core multimedia utilities, and much more.
libavcodec – a library containing decoders and encoders for audio/video codecs.
libavformat – a library containing demuxers and muxers for multimedia container formats.
libavdevice – a library containing input and output devices for grabbing from and rendering to many common multimedia input/output software frameworks, including Video4Linux, Video4Linux2, VfW, and ALSA.
libavfilter – a library containing media filters.
libswscale – a library performing highly optimized image scaling and color space/pixel format conversion operations.
In these posts we will primarily focus on the ffmpeg program; the other programs, such as ffserver, which is used for video broadcasts, are outside the scope of these posts. Among the libraries, the most notable are libavcodec, an audio/video codec library, and libavformat, an audio/video container mux and demux library.
1.1.3 Compression
To be honest, trying to shoehorn the complete details of audio and video into a paragraph or two is plainly ridiculous, as the topic is rather complex. But since this is a beginner’s guide, a few basic overviews will be enough to get you started using ffmpeg properly.
If you work with audio and video, you are well aware that these files take up an inordinate amount of storage space. You could not easily work with them if they were not compressed. Assume an NTSC standard video format: raw (uncompressed) video at 720×480 pixels, 30 frames per second and 24-bit RGB color takes 1,036,800 bytes (about 1 MB) per frame. That’s almost 30 MB per second, or over 200 GB for a 2-hour movie. And that’s just the video; the audio stream takes additional storage. Something needs to be done so that the movie can be stored on a consumer-grade medium such as a DVD: the data needs to be compressed.
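The figures above are easy to reproduce. The short Python sketch below (the function name is made up for illustration) multiplies out the frame size, frame rate and running time for the same 720×480, 24-bit, 30 fps, 2-hour example.

```python
def raw_video_bytes(width, height, bytes_per_pixel, fps, duration_s):
    """Return (bytes per frame, bytes per second, total bytes) for uncompressed video."""
    frame = width * height * bytes_per_pixel
    per_second = frame * fps
    return frame, per_second, per_second * duration_s

# 24-bit RGB is 3 bytes per pixel; 2 hours is 7200 seconds
frame, per_second, total = raw_video_bytes(720, 480, 3, 30, 2 * 60 * 60)
print(f"Per frame:  {frame:,} bytes")
print(f"Per second: {per_second / 2**20:.1f} MiB")
print(f"Two hours:  {total / 2**30:.1f} GiB")
```

Counting in units of 1024, this gives just under 30 MiB per second and a little over 200 GiB for the whole movie, which matches the rough numbers quoted above.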
Conventional lossless compression algorithms such as ZIP, which everyone uses on a regular basis, don’t reduce the size of the data enough, so we need to look at lossy compression for further size reduction. Lossy compression works by discarding some of the data in the media, which results in smaller files. So which data does the compression algorithm discard? It does not discard data at random, which would be a disaster; it discards data only if it judges that data to be redundant. For example, in a movie often not much changes between successive frames; if the compression software discards some of this repeated data, the viewer will hardly notice any difference, but the storage it would have required is saved.
Lossy compression is commonly used to compress multimedia data such as audio, video and still images. Its main drawback is that some data is removed during compression, which can reduce the fidelity of the output.
The algorithms that allow us to encode and decode the data, whether with a lossy or a lossless technique, are called codecs. Many codecs are included in the libavcodec library supplied with ffmpeg, which enables you to work with a wide variety of video and audio formats.
Once the audio and video streams have been encoded by their respective codecs, this encoded data needs to be put together into a single file. This file is called the ‘container’. A graphic of the process is shown below.
1.1.4 Bitrates
A movie is made up of two main components, audio and video. Each component produces a separate stream of data that must be decoded by your DVD player or some program so we can see and hear the movie properly.
The bitrate of a movie is the key to the quality of its audio and video. Bitrate is a measurement of the number of bits transmitted over a set length of time; for a media file, it denotes the average number of bits that one second of audio or video data takes up in the compressed bit stream. The overall bitrate of your movie is the sum of the bitrates of the video stream and the audio stream in the file, with the majority coming from the video stream. Also, particular formats specify the bitrate or the maximum bitrate to be used.
A bitrate is usually measured in some multiple of bits per second, for example kilobits per second (Kbps), that is, thousands of bits per second.
Bitrates come in two versions – VBR (Variable Bit Rate encoding) or CBR (Constant Bit Rate encoding). VBR allows a higher bitrate (and therefore more storage space) to be allocated to the more complex segments of media files while less space is allocated to less complex segments. The average of these rates can be calculated to produce an average bitrate for the file. VBR allows you to set a maximum and minimum bitrate. The compression algorithm then tries to efficiently compress the data reducing to the minimum bitrate when there is little or no motion on screen and increasing to the maximum defined rate when the motion is prevalent. This helps to give you a smaller overall file size without compromising the quality of the video.
CBR is used when a predictable, flat bitrate is needed. The flat bitrate throughout the entire file comes at the price of codec efficiency, usually resulting in a larger file but smoother playback. CBR is useful for streaming multimedia content over limited-capacity channels, since there it is the maximum bitrate that matters, not the average, and CBR can take advantage of all of the capacity. CBR is not the optimal choice for storage, as it may not allocate enough data to complex sections (resulting in degraded quality) while wasting data on simple sections.
Depending on your video you might want to use a VBR for a streaming playback if the sudden spikes do not exceed your target user’s connection speed. For example if there is only one high motion scene in a video, you will be wasting considerable bandwidth on a CBR throughout the entire file and may better serve your user’s need by using a VBR. Either way try experimenting with the two settings to find what works best for your video.
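To make the trade-off concrete, here is a small Python sketch that compares a VBR encode with a CBR encode running at the VBR peak rate. The segment durations and bitrates are invented purely for illustration; real encoders vary the rate far more finely than this.

```python
def vbr_summary(segments):
    """segments: list of (duration_s, bitrate_kbps) pairs, one per stretch of the file."""
    total_s = sum(d for d, _ in segments)
    total_kbit = sum(d * kbps for d, kbps in segments)
    average_kbps = total_kbit / total_s
    peak_kbps = max(kbps for _, kbps in segments)
    return average_kbps, peak_kbps, total_kbit

# Hypothetical video: calm scene, one high-motion scene, calm scene
segments = [(60, 800), (20, 2400), (40, 900)]
average, peak, vbr_kbit = vbr_summary(segments)
cbr_at_peak_kbit = peak * sum(d for d, _ in segments)

print(f"VBR average: {average:.0f} kbps, peak: {peak} kbps")
print(f"VBR size: {vbr_kbit / 8 / 1000:.1f} MB")
print(f"CBR at the peak rate: {cbr_at_peak_kbit / 8 / 1000:.1f} MB")
```

With these made-up numbers the VBR file is less than half the size of a CBR file that holds the peak rate throughout, but the viewer’s connection still has to cope with the 2400 kbps spike.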
Briefly, a bitrate specifies how many kilobits the file may use per second of audio. The following shows the quality for various standard audio bitrates.
64 Kbps – Audio encoded at 64 Kbps has about a 22:1 compression ratio. This bitrate is not recommended for digital music but is acceptable for voice-only recordings.
96 Kbps – Audio encoded at 96 Kbps has a 15:1 compression ratio. One minute of music takes about 700KB of disk space.
128 Kbps – Audio encoded at 128 Kbps has an 11:1 compression ratio. One minute of music takes around 1MB of disk space.
160 Kbps – Audio encoded at 160 Kbps has a 9:1 compression ratio. One minute of music takes about 1.2MB of disk space.
192 Kbps and above – MP3s encoded at this setting take up the most space but have near-CD-quality sound, and can take up to 2MB of space per 60 seconds of music. Online music stores or music download services will usually have at least this high a bitrate.
1.1.5 Audio Sampling Frequency
The audio sampling frequency is the number of times per second the audio is sampled and stored. CD audio is sampled at 44.1 kHz, which means that when the sound is converted from analog to digital, 44100 samples per second are taken of the audio signal. The higher the sampling rate, the wider the frequency range the audio can represent: a higher sampling rate raises the highest frequency that can be captured, which generally means better quality. For example, the following image shows an analog signal on the left converted to a digital representation using two different sampling rates. As you can see, the higher sampling rate leads to a more exact reproduction of the original signal.
The sample rate can be thought of as how often the sound is measured. CD-quality audio has 44,100 of these measurements a second. That’s why it’s called 44.1 kilohertz (kHz).
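One way to make the link between sampling rate and frequency range concrete is the Nyquist limit, a standard result from sampling theory that this guide does not otherwise cover: a given sampling rate can only represent frequencies up to half its value. The Python sketch below applies that rule to a few common rates; the function name is hypothetical.

```python
def max_frequency_hz(sample_rate_hz):
    """Highest frequency a sampling rate can represent (the Nyquist limit)."""
    return sample_rate_hz / 2

for rate in (8000, 11025, 22050, 44100, 48000):
    print(f"{rate} Hz sampling -> up to {max_frequency_hz(rate):.0f} Hz")
```

At 44100 Hz the limit is 22050 Hz, slightly above the roughly 20 kHz usually quoted as the upper end of human hearing, while 8000 Hz sampling tops out at 4000 Hz, which is why it is normally reserved for speech.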
So what is the relationship between bitrate and sampling frequency? Bitrate simply specifies the number of bits per second that are used to encode the audio stream. The uncompressed bitrate for CD audio is 16 bits x 44100 samples x 2 channels = 1411200bps, or approximately 1411kbps. When audio is stored in an uncompressed format, the bitrate is a linear function of the sample rate; i.e. doubling the sample rate doubles the bitrate.
For example, a 44.1 kHz 16-bit stereo signal takes 1411.2 kbps, or approximately 10.6 MB (about 10.1 MiB) per minute of recording. A 44.1 kHz 16-bit mono file would take half of this, as would a 44.1 kHz 8-bit stereo file or a 22.05 kHz 16-bit stereo file.
Formats like Ogg Vorbis and MP3, however, compress audio by making calculated guesses about the sounds humans aren’t likely to hear and then discarding that information. As part of this process, such formats let us decide how much to throw away or, to put it more simply, how much data to use to represent the original sound. So, using our 44.1 kHz stereo sample, we can choose to use as little as 48 kbps or as much as approximately 500 kbps to store the sound. At 500 kbps, more of the original sound fidelity is preserved than at 48 kbps.
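The gap between the uncompressed rate and these compressed rates can be expressed as a compression ratio. The following Python sketch divides the CD-quality rate of 1411.2 kbps by a few target bitrates; the constant and function names are only illustrative.

```python
# Uncompressed CD audio: 44100 samples/s * 16 bits * 2 channels, in kbps
CD_BITRATE_KBPS = 44100 * 16 * 2 / 1000

def compression_ratio(target_kbps, source_kbps=CD_BITRATE_KBPS):
    """How many times smaller the compressed stream is than the source."""
    return source_kbps / target_kbps

for kbps in (48, 128, 500):
    print(f"{kbps} kbps -> about {compression_ratio(kbps):.1f}:1")
```

At 48 kbps the encoder keeps roughly one bit in every 29 of the uncompressed stream, whereas at 500 kbps it keeps better than one in three, which is why the higher rate preserves more of the original.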
Calculating values
An audio file’s bit rate can be easily calculated when given sufficient information.
Bit rate = (sampling rate) x (bit depth) x (number of channels)
e.g., a recording with a 44.1 kHz sampling rate, a 16 bit depth, and 2 channels:
44100 x 16 x 2 = 1411200 bits per second, or 1411.2 kbit/s
The file size of an audio recording can also be calculated using a similar formula:
File Size (Bytes) = (sampling rate) x (bit depth) x (total channels) x (seconds) / 8
e.g. a 70-minute CD-quality recording (4200 seconds) will take up about 740MB:
44100 x 16 x 2 x 4200 / 8 = 740880000 Bytes
Some standard sampling frequencies and their applications are given below.
Sampling Rate – Use
8,000 Hz – Telephone, walkie-talkie, wireless intercom and wireless microphone transmission; adequate for human speech.
11,025 Hz – Used for lower-quality PCM, MPEG.
22,050 Hz – One half the sampling rate of audio CDs; used for lower-quality PCM and MPEG.
32,000 Hz – miniDV digital video camcorder, video tapes with extra channels of audio, DAT, high-quality digital wireless microphones, digitizing FM radio.
44,100 Hz – Audio CD, also most commonly used with MPEG-1 audio (VCD, SVCD, MP3). Most professional audio equipment uses 44.1 kHz sampling and above.
48,000 Hz – The standard audio sampling rate used by professional digital video equipment such as tape recorders, video servers, vision mixers and so on. Also used for sound with consumer video formats like DV, digital TV, DVD, and films.
96,000 Hz – DVD-Audio, some LPCM DVD tracks, Blu-ray Disc audio tracks, HD DVD (High-Definition DVD) audio tracks.
1.1.6 Frame rate
The frame rate is the number of unique consecutive images displayed per second in the video to give the illusion of movement; each image is called a ‘frame’. The human brain perceives smooth, continuous motion at around 24 frames per second. If the frame rate is well below this number, you will see jerky motion rather than smooth motion. Most video creators use this frame rate.
This is not a fixed standard, of course; if your video is a screencast, you can go to frame rates as low as 5fps. Television standards differ too: PAL (common in Europe and some parts of Asia) uses 25fps, while NTSC (used in the US and Japan) uses 29.97fps. Generally you should never exceed the frame rate of the source video. The best results are achieved when the frame rate is kept the same as that of the original source.
1.1.7 Containers
A container file is used to identify and combine different data types. Simpler container formats can hold only audio, in one or more formats, while more advanced container formats can support multiple audio and video streams, subtitles and metadata, along with the synchronization information needed to play back the various streams together. In most cases, the file header and most of the metadata are specified by the container format. Different containers suit different purposes; for example, container formats optimized for low-quality Internet video streaming differ from those designed for high-quality DVD playback.
The video file formats we’re familiar with, such as QuickTime movies (.mov) and .avi, are media container formats. Some container formats contain only audio, like WAV files for Windows, MP3 music files or AIFF files for Macs. Others contain audio and video, such as ASF files for Windows, which typically contain audio compressed with the WMA codec and video compressed with the WMV codec. There are dozens of these container formats. If you’re uploading a video to an online site, check which formats the site supports. Sometimes this can be confusing, because the list of accepted formats may include both compression formats like MPEG-4 and container formats like .mov.
1.2 Installing FFmpeg
FFmpeg is developed under GNU/Linux, but it can be compiled under most operating systems, including Mac OS X, Microsoft Windows and AmigaOS. On most Linux distros you can install ffmpeg directly using the distribution’s package manager. If you want the latest version or want to customize the installation, you may need to build it from source, but as that is an involved and tricky procedure, I’m not discussing it here.
Installing FFmpeg on Ubuntu
Run the following command in the terminal to install FFmpeg.
$ sudo apt-get install ffmpeg
Installing FFmpeg on Fedora
FFmpeg can be directly installed from the repos using the following command.
$ su -c 'yum install ffmpeg'
Installing FFmpeg on CentOS
FFmpeg can be directly installed from the repos using the following command.
$ yum install ffmpeg ffmpeg-devel
Installing FFmpeg on Windows
By far the easiest way to start using FFmpeg is to get a precompiled binary. Zeranoe.com has pre-built binaries for Windows, which make ffmpeg easier to install, so if you are using Windows you can get FFmpeg up and running in no time. Go ahead and grab the binaries from the link below.
http://ffmpeg.zeranoe.com/builds/
Once it is installed, use the following command to get the ffmpeg version and the versions of the libraries installed.
C:\ffmpeg>ffmpeg -version
On my Windows machine it returns the following; of course this may be different on your system, depending on the version of FFmpeg installed:
ffmpeg version N-31100-g9251942, Copyright (c) 2000-2011 the FFmpeg developers
built on Jun 30 2011 21:17:59 with gcc 4.5.3
libavutil 51. 11. 0 / 51. 11. 0
libavcodec 53. 7. 0 / 53. 7. 0
libavformat 53. 4. 0 / 53. 4. 0
libavdevice 53. 2. 0 / 53. 2. 0
libavfilter 2. 24. 0 / 2. 24. 0
libswscale 2. 0. 0 / 2. 0. 0
libpostproc 51. 2. 0 / 51. 2. 0
ffmpeg N-31100-g9251942
libavutil 51. 11. 0 / 51. 11. 0
libavcodec 53. 7. 0 / 53. 7. 0
libavformat 53. 4. 0 / 53. 4. 0
libavdevice 53. 2. 0 / 53. 2. 0
libavfilter 2. 24. 0 / 2. 24. 0
libswscale 2. 0. 0 / 2. 0. 0
libpostproc 51. 2. 0 / 51. 2. 0
Adhering to the UNIX culture, FFmpeg relies on a plethora of command-line options to do its work. The generic syntax of an FFmpeg command is shown below.
ffmpeg [[infile options]['-i' infile]]...{[outfile options] outfile}...
Each section of the command is explained below.
ffmpeg – The name of the FFmpeg executable.
infile options – This is where you put options for your input video or audio file. FFmpeg applies any options given here to the input file before processing starts. This section is not as widely used as the ‘outfile options’.
-i infile – This is the actual video or audio file you want to process, including the path to the directory where it is located,
e.g. /home/george/media/myvideo.flv. You always need to include the `-i` option before your input file name.
outfile options – This is where you put the options you want applied to the video or audio file you are creating.
outfile – The name of the output file you want to create, including the directory path if it should not be written to the current working directory,
e.g. /home/george/media/out.flv
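If you ever drive FFmpeg from a script rather than typing the command by hand, the ordering above still applies. The following is only a rough sketch in JavaScript: `buildFfmpegArgs` and the shape of the `job` object are names I made up for illustration, not part of FFmpeg. The only real FFmpeg options used are `-i` and `-an` (which drops the audio stream).

```javascript
// Hypothetical helper (not part of FFmpeg): assembles the arguments in the
// order described above:
//   [infile options] -i infile [outfile options] outfile
function buildFfmpegArgs(job) {
  return [].concat(
    job.inputOptions || [],   // options applied to the input file
    ['-i', job.input],        // the input file always follows -i
    job.outputOptions || [],  // options applied to the output file
    [job.output]              // the output file comes last
  );
}

var args = buildFfmpegArgs({
  inputOptions: [],
  input: '/home/george/media/myvideo.flv',
  outputOptions: ['-an'],     // -an: write the output without audio
  output: '/home/george/media/out.flv'
});

// Print the command line that the argument list corresponds to.
console.log(['ffmpeg'].concat(args).join(' '));
// ffmpeg -i /home/george/media/myvideo.flv -an /home/george/media/out.flv
```

Keeping the arguments as an array rather than one long string avoids quoting problems when a file path contains spaces.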
Now that we have FFmpeg installed, in the next post we will learn about audio processing.
Using the new Page visibility API in your apps
One of the features lacking in the current browser API is a way to determine whether the web page is currently visible to the user or hidden (either in another tab or window).
The new Page Visibility API allows you to do just that: determine whether your web page is visible to the user, is hidden in a background tab or window, or is prerendering. It lets the developer use the page visibility state in JavaScript logic to make the user experience friendlier; for example, by stopping video, animation or slideshow playback whenever the user switches to another browser tab or window, and resuming whenever the user switches back. Also, if your page periodically does some Ajax processing, which consumes precious system resources, you can pause it while the page is not visible. Another use is in analytics: measuring how long the page was actually visible to the user, rather than sitting in a hidden tab or window.
Check the demo page below to see how this works. The demo was tested in Safari, Opera 11.10, Chrome and Firefox.
Notice how the video pauses whenever the page is hidden. Also check the title of the page as it switches state.
The API is quite simple in its design: whenever a web page’s visibility changes, the ‘visibilitychange’ event is fired. You can listen for this event using the addEventListener method.
Currently the Page Visibility API supports three visibility states:
‘visible’ : user has opened the page and is working within it.
‘hidden’ : user has switched to another tab or minimized browser window.
‘prerender’ : browser is just prerendering a page.
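To make the event handling concrete, here is a minimal sketch of listening for the event directly, without any wrapper. It assumes the browser exposes the unprefixed `visibilitychange` event and `document.visibilityState` property; `watchVisibility` is just a name I picked for this example, and it treats every state other than ‘visible’ as hidden.

```javascript
// Sketch: run callbacks when the page becomes visible or stops being visible.
// `doc` is expected to behave like the browser's `document` object.
function watchVisibility(doc, onVisible, onHidden) {
  function handleChange() {
    if (doc.visibilityState === 'visible') {
      onVisible();
    } else {
      // 'hidden' and 'prerender' both end up here
      onHidden();
    }
  }
  doc.addEventListener('visibilitychange', handleChange);

  // Return a function that removes the listener again.
  return function stopWatching() {
    doc.removeEventListener('visibilitychange', handleChange);
  };
}

// Usage in a browser, where play and pause are functions your page defines:
// var stop = watchVisibility(document, play, pause);
```

Passing the document in as a parameter is a design choice for this sketch: it keeps the function easy to try out with a stand-in object outside the browser.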
However, the problem with the current API implementation is that different browsers have implemented the specification a little differently, which can make working with them a little tedious. One option is to use a wrapper which hides the various browser differences. Visibility.js is one such wrapper; it eases the usage of the API by hiding vendor-specific property prefixes and adding some high-level functions.
Below is the complete source for the above demo page. I won’t go into further detail, as the respective project pages have all the detailed documentation.
<!DOCTYPE html>
<html>
<head>
  <meta charset='UTF-8'>
  <title></title>
  <script src="lib/visibility.fallback.js"></script>
  <script src="lib/visibility.js"></script>
  <script src="swfobject.js"></script>
</head>
<body>
  <h1>Visibility.js test</h1>
  <p>Page Visibility API <strong id="support"></strong>.</p>
  <p>Switch browser tabs or window.</p>
  <div id="ytapiplayer">
    You need Flash player 8+ and JavaScript enabled to view this video.
  </div>
  <script type="text/javascript">
    /* Declared up front so play()/pause() are safe before the player loads */
    var ytplayer = null;

    /* Called by the YouTube player once it is ready */
    function onYouTubePlayerReady(playerId) {
      ytplayer = document.getElementById("myytplayer");
    }
    function play() {
      if (ytplayer) ytplayer.playVideo();
    }
    function pause() {
      if (ytplayer) ytplayer.pauseVideo();
    }

    /* Embed the YouTube player with its JavaScript API enabled */
    var params = { allowScriptAccess: "always" };
    var atts = { id: "myytplayer" };
    swfobject.embedSWF("http://www.youtube.com/v/ylLzyHk54Z0?" +
                       "enablejsapi=1&playerapiid=ytplayer&version=3",
                       "ytapiplayer", "425", "356", "8",
                       null, null, params, atts);

    /* Make sure page visibility api is supported */
    var support = document.getElementById('support');
    if ( Visibility.isSupported() ) {
      support.innerHTML = 'is supported';
    } else {
      support.innerHTML = 'isn’t supported';
    }
    document.title = Visibility.state();

    /* Pause/Play video when the page changes state */
    Visibility.change(function (e, state) {
      /* Also change the page title on state change */
      document.title = state;
      if (state == "visible")
        play();
      else
        pause();
    });
  </script>
</body>
</html>
Generating clean URLs with javascript
In a recent project I needed to generate clean search URLs on a form submit. There are basically two ways to do that. One is to post the search variables to the same page, generate a clean URL using PHP and then redirect to the new URL. The other is to generate the clean URL using JavaScript and immediately direct the browser to it. This saves some processing on the server and one redirection, and also lets us build the URL on the client without first submitting the form to the server. I decided to go with the JavaScript solution.
A rough idea of the first option is given below.
if(isset($_GET['search'])) {
$clean_url = '';
// code to create a clean url.
// After that $clean_url will contain the new redirect url
header("Location: $clean_url");
exit();
}
<form class="myForm" action="#" method="get">
<input type="text" id="search" name="search" maxlength="100">
<input id="submit" name="submit" type="submit" value="Search">
</form>
The problem with the first option is that you cannot generate a clean URL without first submitting the form to the server, which costs an extra page load before the redirect. This can be problematic if your application needs to build clean URLs on the client.
So, for example, on form submission you might end up with an ugly URL like the following.
http://mysite.com/search.php?keyword=blue+boxed+buttons
Instead, you can generate a clean URL like the following on the client and then send that to the server.
http://mysite.com/search/blue-boxed-buttons
The basic search form is shown below.
<form class="myForm" action="#" method="get">
<input type="text" id="search" name="search" maxlength="100">
<input id="submit" name="submit" type="submit" value="Search">
</form>
The following code handles the clean URL generation when the above form is submitted. It intercepts the form submission, generates a clean URL, and directs the browser straight to the new URL, without the intermediate form post and redirect.
/* Friendly URL rewrite */
$("form").submit(function() {
    /* Remove unwanted characters, only accept alphanumeric and space */
    var keyword = $('#search').val().replace(/[^A-Za-z0-9 ]/g, '');
    /* Trim the ends so the URL never starts or ends with a '-' */
    keyword = $.trim(keyword);
    /* Replace multi spaces with a single space */
    keyword = keyword.replace(/\s{2,}/g, ' ');
    /* Replace space with a '-' symbol */
    keyword = keyword.replace(/\s/g, "-");
    var cleanUrl = 'http://mysite.com/search/' + keyword.toLowerCase();
    window.location = cleanUrl;
    return false; // Prevent the default form submission
});
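The same cleaning steps can also be pulled out into a plain function that does not depend on jQuery or the DOM, which makes them easier to test on their own. This is only a sketch: `buildCleanSearchUrl` is a name I made up, and trimming the ends and returning `null` for an empty keyword are my own assumptions about what you would want.

```javascript
// Sketch: turn a raw search string into a clean URL, or null if nothing
// usable is left after cleaning.
function buildCleanSearchUrl(rawKeyword, baseUrl) {
  var slug = rawKeyword
    .replace(/[^A-Za-z0-9 ]/g, '') // keep only letters, digits and spaces
    .trim()                        // drop leading and trailing spaces
    .replace(/ +/g, '-')           // each run of spaces becomes one hyphen
    .toLowerCase();
  return slug === '' ? null : baseUrl + slug;
}

console.log(buildCleanSearchUrl('  Blue  boxed, buttons! ', 'http://mysite.com/search/'));
// http://mysite.com/search/blue-boxed-buttons
```

In a submit handler you would call this with the field value and only redirect when the result is not `null`.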
Below is the Apache htaccess rule to go along with the above redirection. Our main search page is called ‘search.php’.
RewriteEngine on
RewriteBase /
RewriteRule ^search/([A-Za-z0-9-]+)/?$ search.php?keyword=$1 [NC,L]
Rejecting unwanted characters from input
It seems that some common elements of programming stump us from time to time. Take the task of filtering an input search string in PHP to remove unwanted characters. Many developers find it easy to use a regular expression to search for a substring, but find it harder to use one to reject particular characters from a string. A simple solution is shown below, which removes every character from the input except alphanumerics and spaces.
$search = "the great /%&&world ,fair of 1964";
$cleaned = preg_replace("/[^A-Za-z0-9 ]/", "", $search);
Returns:
the great world fair of 1964
The important part of the regular expression is the caret ^ inside the character class [...]. A normal character class matches the characters listed in the class. For example, the following matches the alphanumeric characters and spaces and replaces them with an empty string, effectively removing them, because we have passed an empty string as the second parameter to preg_replace.
$search = "the great /%&&world ,fair of 1964";
$cleaned = preg_replace("/[A-Za-z0-9 ]/", "", $search);
Returns:
/%&&,
However, if we use a negated character class, which is a character class starting with a caret ^ sign, we invert the meaning of the class. Now it matches every character that is not listed in the class, so replacing the matches with an empty string removes all the unwanted characters and keeps the rest.
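The same two character classes behave identically in JavaScript, which is handy if you want to try them out quickly in a browser console. This is just a sketch for comparison. One difference worth noting: JavaScript’s `replace` needs the `g` flag to replace every match, whereas preg_replace replaces all matches by default.

```javascript
var search = 'the great /%&&world ,fair of 1964';

// Negated class: matches everything EXCEPT letters, digits and spaces,
// so the unwanted characters are the ones that get removed.
var cleaned = search.replace(/[^A-Za-z0-9 ]/g, '');

// Normal class: matches letters, digits and spaces,
// so only the unwanted characters are left behind.
var leftovers = search.replace(/[A-Za-z0-9 ]/g, '');

console.log(cleaned);   // the great world fair of 1964
console.log(leftovers); // /%&&,
```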
If you are using ereg_replace, you need to use the following pattern, which does not take delimiters. Note that this function has been deprecated as of PHP 5.3.0, where calling it raises an E_DEPRECATED notice, and it was removed entirely in PHP 7.0.0, so prefer preg_replace in new code.
$cleaned = ereg_replace("[^A-Za-z0-9 ]", "", $search);


