Video for Linux Two - Image Data Formats

Bill Dirks - June 26, 2003

In order to exchange images between drivers and applications, it is necessary to have standard image data formats which both sides will interpret the same way. V4L2 includes several such formats, and this document is intended to be an unambiguous specification for the standard image data formats in V4L2.

V4L2 drivers are not limited to these formats, however. Driver-specific formats are possible. In that case the application may depend on a codec driver to convert images to one of the standard formats when needed. But the data can still be stored and retrieved in the proprietary format. For example, a device may support a proprietary compressed format. Applications can still capture and save the data in the compressed format, saving much disk space, and later use a codec device driver to convert the images to the X Windows screen format when the video is to be displayed.

Even so, ultimately, some standard formats are needed, so the V4L2 specification would not be complete without well-defined standard formats.
 
 

About the V4L2 Standard Formats

The V4L2 standard formats are all uncompressed formats.

The pixels are always arranged in memory from left to right, and from top to bottom. The first byte of data in the image buffer is always for the leftmost pixel of the topmost row. Following that is the pixel immediately to its right, and so on until the end of the top row of pixels. Following the rightmost pixel of the row there may be zero or more bytes of padding to guarantee that each row of pixel data has a certain alignment. Following the pad bytes, if any, is data for the leftmost pixel of the second row from the top, and so on. The last row has just as many pad bytes after it as the other rows.
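
As a rough illustration of the row layout described above, the following C sketch computes the byte offset of a pixel in a packed-pixel buffer. The function names and the `bytes_per_line` parameter (row length in bytes, pad bytes included) are hypothetical names chosen for this example, and the caller is assumed to know the row length already.

```c
#include <stddef.h>

/* Byte offset of pixel (x, y) in a packed-pixel image buffer.
 * bytes_per_line is the full length of one row in bytes, including
 * any pad bytes that follow the rightmost pixel (assumed known). */
size_t pixel_offset(size_t x, size_t y,
                    size_t bytes_per_pixel, size_t bytes_per_line)
{
    return y * bytes_per_line + x * bytes_per_pixel;
}

/* Total buffer size: the last row carries the same padding as the others. */
size_t image_size(size_t height, size_t bytes_per_line)
{
    return height * bytes_per_line;
}
```

For example, a 4-pixel-wide 24 bpp image whose rows are padded from 12 to 16 bytes would use `bytes_per_line = 16`.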

The formats fall into two broad categories, the RGB formats and YUV formats. The YUV formats all use the YCbCr color space used in the ITU-R601 and ITU-R656 digital video standards. There is more information about the YCbCr color space later in this document.

In V4L2, each format has an identifier which looks like V4L2_PIX_FMT_XXX, defined in videodev.h.

The rest of this document describes each standard format.

In order to make the specifications endianness independent, the following diagrams show the order of the data in memory on a byte by byte basis. Each cell of the diagrams is one byte. The bytes are arranged in memory from left to right, top to bottom. Possible pad bytes after each row are not shown.
 
 

The RGB Formats

These formats are designed to match the pixel formats of typical PC graphics frame buffers. Nine formats are defined: four at 16 bits per pixel (bpp), two at 24 bpp, two at 32 bpp, and one at 8 bpp. These are all packed-pixel formats, meaning all the data for a pixel are next to each other in memory.

V4L2_PIX_FMT_RGB555,
V4L2_PIX_FMT_RGB565
 

A 4x4 image. Each cell is one byte.
p00 q00 p01 q01 p02 q02 p03 q03
p10 q10 p11 q11 p12 q12 p13 q13
p20 q20 p21 q21 p22 q22 p23 q23
p30 q30 p31 q31 p32 q32 p33 q33

Each pixel is two bytes, denoted here as p and q. For RGB 5-5-5, each pair of bytes contains five bits of red, five bits of green, five bits of blue, and one extra bit. The value of the extra bit is undefined. For RGB 5-6-5 there are six green bits and no extra bits. The RGB bits are arranged in p and q like this:
 

RGB 5-5-5
bit (MSB) 7 6 5 4 3 2 1 0 (LSB)
p G2 G1 G0 R4 R3 R2 R1 R0
q ? B4 B3 B2 B1 B0 G4 G3

 
RGB 5-6-5
bit 7 6 5 4 3 2 1 0
p G2 G1 G0 R4 R3 R2 R1 R0
q B4 B3 B2 B1 B0 G5 G4 G3
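
The bit tables above can be turned into shifts and masks. The following is a minimal C sketch, assuming the layout exactly as tabulated here; `struct rgb`, `unpack_rgb565` and `unpack_rgb555` are hypothetical names, and the components are returned at their native 5- or 6-bit width without scaling to 8 bits.

```c
#include <stdint.h>

struct rgb { uint8_t r, g, b; };

/* Unpack one RGB 5-6-5 pixel from its two bytes: p is first in memory, q second. */
struct rgb unpack_rgb565(uint8_t p, uint8_t q)
{
    struct rgb c;
    c.r = p & 0x1F;                                 /* R4..R0 are p bits 4..0 */
    c.g = (uint8_t)(((q & 0x07) << 3) | (p >> 5));  /* G5..G3 from q, G2..G0 from p */
    c.b = q >> 3;                                   /* B4..B0 are q bits 7..3 */
    return c;
}

/* Unpack one RGB 5-5-5 pixel; bit 7 of q is undefined and is masked off. */
struct rgb unpack_rgb555(uint8_t p, uint8_t q)
{
    struct rgb c;
    c.r = p & 0x1F;                                 /* R4..R0 are p bits 4..0 */
    c.g = (uint8_t)(((q & 0x03) << 3) | (p >> 5));  /* G4..G3 from q, G2..G0 from p */
    c.b = (q >> 2) & 0x1F;                          /* B4..B0 are q bits 6..2 */
    return c;
}
```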



V4L2_PIX_FMT_RGB555X,
V4L2_PIX_FMT_RGB565X
 

A 4x4 image. Each cell is one byte.
p00 q00 p01 q01 p02 q02 p03 q03
p10 q10 p11 q11 p12 q12 p13 q13
p20 q20 p21 q21 p22 q22 p23 q23
p30 q30 p31 q31 p32 q32 p33 q33

Each pixel is two bytes, denoted here as p and q. For RGB 5-5-5, each pair of bytes contains five bits of red, five bits of green, five bits of blue, and one extra bit. The value of the extra bit is undefined. For RGB 5-6-5 there are six green bits and no extra bits. These RGB555X and RGB565X are the same as RGB555 and RGB565, except the bytes are swapped in each pixel. The RGB bits are arranged in p and q like this:
 

RGB 5-5-5
bit (MSB) 7 6 5 4 3 2 1 0 (LSB)
p ? B4 B3 B2 B1 B0 G4 G3
q G2 G1 G0 R4 R3 R2 R1 R0

 
RGB 5-6-5
bit 7 6 5 4 3 2 1 0
p B4 B3 B2 B1 B0 G5 G4 G3
q G2 G1 G0 R4 R3 R2 R1 R0
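
Because the X variants differ from RGB555 and RGB565 only by the order of the two bytes in each pixel, converting between them amounts to a byte swap. A small C sketch of that idea follows; `swap_pixel_bytes` is a hypothetical helper name, and it operates on one row of `width` pixels in place.

```c
#include <stddef.h>
#include <stdint.h>

/* Swap the two bytes of every 16-bit pixel in one row, converting
 * RGB555 <-> RGB555X or RGB565 <-> RGB565X in place. */
void swap_pixel_bytes(uint8_t *row, size_t width)
{
    for (size_t x = 0; x < width; x++) {
        uint8_t p = row[2 * x];
        row[2 * x] = row[2 * x + 1];
        row[2 * x + 1] = p;
    }
}
```

Applying the function twice restores the original row, and pad bytes after the row are left untouched as long as `width` counts pixels only.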


 

V4L2_PIX_FMT_BGR24
 

A 4x4 image. Each cell is one byte.
B00 G00 R00 B01 G01 R01 B02 G02 R02 B03 G03 R03
B10 G10 R10 B11 G11 R11 B12 G12 R12 B13 G13 R13
B20 G20 R20 B21 G21 R21 B22 G22 R22 B23 G23 R23
B30 G30 R30 B31 G31 R31 B32 G32 R32 B33 G33 R33

Each pixel is three bytes. B is first, then G then R.
 
 

V4L2_PIX_FMT_RGB24
 

A 4x4 image. Each cell is one byte.
R00 G00 B00 R01 G01 B01 R02 G02 B02 R03 G03 B03
R10 G10 B10 R11 G11 B11 R12 G12 B12 R13 G13 B13
R20 G20 B20 R21 G21 B21 R22 G22 B22 R23 G23 B23
R30 G30 B30 R31 G31 B31 R32 G32 B32 R33 G33 B33

Each pixel is three bytes. R is first, then G then B.
 
 

V4L2_PIX_FMT_BGR32
 

A 4x4 image. Each cell is one byte.
B00 G00 R00 ? B01 G01 R01 ? B02 G02 R02 ? B03 G03 R03 ?
B10 G10 R10 ? B11 G11 R11 ? B12 G12 R12 ? B13 G13 R13 ?
B20 G20 R20 ? B21 G21 R21 ? B22 G22 R22 ? B23 G23 R23 ?
B30 G30 R30 ? B31 G31 R31 ? B32 G32 R32 ? B33 G33 R33 ?

Each pixel is four bytes. B is first, then G then R, then an extra byte. The value of the extra byte is undefined.
 
 

V4L2_PIX_FMT_RGB32
 

A 4x4 image. Each cell is one byte.
R00 G00 B00 ? R01 G01 B01 ? R02 G02 B02 ? R03 G03 B03 ?
R10 G10 B10 ? R11 G11 B11 ? R12 G12 B12 ? R13 G13 B13 ?
R20 G20 B20 ? R21 G21 B21 ? R22 G22 B22 ? R23 G23 B23 ?
R30 G30 B30 ? R31 G31 B31 ? R32 G32 B32 ? R33 G33 B33 ?

Each pixel is four bytes. R is first, then G then B, then an extra byte. The value of the extra byte is undefined.
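
Converting between the 24 bpp and 32 bpp layouts is a matter of reordering bytes. As one possible illustration, the C sketch below converts a row of V4L2_PIX_FMT_BGR24 pixels to V4L2_PIX_FMT_RGB32. The function name is hypothetical, and since the value of the extra byte is undefined, writing 0 there is an arbitrary choice made for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert one row of BGR24 pixels (B, G, R) to RGB32 pixels (R, G, B, extra). */
void bgr24_to_rgb32_row(const uint8_t *src, uint8_t *dst, size_t width)
{
    for (size_t x = 0; x < width; x++) {
        dst[4 * x + 0] = src[3 * x + 2];  /* R */
        dst[4 * x + 1] = src[3 * x + 1];  /* G */
        dst[4 * x + 2] = src[3 * x + 0];  /* B */
        dst[4 * x + 3] = 0;               /* extra byte: undefined, 0 chosen here */
    }
}
```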
 
 

V4L2_PIX_FMT_RGB332
 

A 4x4 image. Each cell is one byte.
p00 p01 p02 p03
p10 p11 p12 p13
p20 p21 p22 p23
p30 p31 p32 p33

Each pixel is one byte. This format is intended for use with 8-bit colormap displays. Each byte contains three bits of red, three bits of green, and two bits of blue. The RGB bits are arranged in the bytes like this:
 

bit 7 6 5 4 3 2 1 0
p B1 B0 G2 G1 G0 R2 R1 R0
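
A short C sketch of packing and unpacking this layout follows. The function names are hypothetical, and reducing 8-bit components by keeping only their most significant bits is just one simple choice; a real application might dither or use a colormap lookup instead.

```c
#include <stdint.h>

/* Pack 8-bit R, G, B into one RGB 3-3-2 byte by keeping the top bits:
 * blue in bits 7..6, green in bits 5..3, red in bits 2..0. */
uint8_t pack_rgb332(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint8_t)(((b >> 6) << 6) | ((g >> 5) << 3) | (r >> 5));
}

/* Extract the raw 3-, 3- and 2-bit fields from one RGB 3-3-2 byte. */
void unpack_rgb332(uint8_t p, uint8_t *r3, uint8_t *g3, uint8_t *b2)
{
    *r3 = p & 0x07;
    *g3 = (p >> 3) & 0x07;
    *b2 = p >> 6;
}
```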

 

YUV Formats

These formats are designed to be compatible with devices that use ITU-R601 or ITU-R656 digital video internally. They use the YCbCr color space. YCbCr is a modified YUV format. In YCbCr, Y ranges from 16 (corresponding to 0.0) to 235 (corresponding to 1.0, or full brightness). Cb and Cr range from 16 (corresponding to -0.5) to 240 (corresponding to +0.5), with 128 corresponding to 0.0. To convert from YCbCr to RGB, where R, G and B should range from 0 to 255, use the following transforms:
 
Y = (255/219)(Y - 16)
U = (127/112)(Cb - 128)
V = (127/112)(Cr - 128)

That gives a Y as 0...255, and U and V as -127...+127. Convert to RGB:
 

R = Y + 1.402V
G = Y - 0.344U - 0.714V
B = Y + 1.772U

If you are writing a color space conversion routine take note: Due to image filtering, brightness controls, and other common video operations, it is normal that YCbCr values can go out of range. It is also normal for the computed R, G, or B values to be below 0 or above 255, even if YCbCr were in their legal range. It is necessary for a conversion algorithm to clamp all the result values to their legal range.
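
The two-step conversion and the clamping requirement can be sketched in C as follows. This is a straightforward floating-point version written for clarity rather than speed; the function names are hypothetical, and rounding to the nearest integer is an assumption of this example, not something the text above prescribes.

```c
#include <stdint.h>

/* Clamp to 0..255 and round to the nearest integer. */
static uint8_t clamp_u8(double v)
{
    if (v < 0.0)
        return 0;
    if (v > 255.0)
        return 255;
    return (uint8_t)(v + 0.5);
}

/* Convert one YCbCr sample triple to 8-bit RGB. */
void ycbcr_to_rgb(uint8_t y8, uint8_t cb, uint8_t cr,
                  uint8_t *r, uint8_t *g, uint8_t *b)
{
    /* Step 1: rescale to Y in 0..255 and U, V in -127..+127. */
    double y = (255.0 / 219.0) * (y8 - 16);
    double u = (127.0 / 112.0) * (cb - 128);
    double v = (127.0 / 112.0) * (cr - 128);

    /* Step 2: convert to RGB, clamping every result. */
    *r = clamp_u8(y + 1.402 * v);
    *g = clamp_u8(y - 0.344 * u - 0.714 * v);
    *b = clamp_u8(y + 1.772 * u);
}
```

Out-of-range inputs such as Y = 0 or Y = 255 are handled by the same clamp, so no separate input check is needed in this sketch.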

The inverse transform to convert RGB into YCbCr can be derived (how are your linear algebra skills?) and is as follows:
 

Y = 0.2990R + 0.5870G + 0.1140B
U = -0.1687R - 0.3313G + 0.5000B
V = 0.5000R - 0.4187G - 0.0813B

Which gives Y as 0...255, and U and V as -127 to 127. Then convert to YCbCr ranges:
 

Y = (219/255)Y + 16
Cb = (112/127)U + 128
Cr = (112/127)V + 128
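
The inverse direction can be sketched the same way. The C example below uses luma weights of 0.299, 0.587 and 0.114, which sum to 1 so that white maps to full brightness; the function names and the round-to-nearest behaviour are assumptions of this example.

```c
#include <stdint.h>

/* Clamp to 0..255 and round to the nearest integer. */
static uint8_t clamp_u8(double v)
{
    if (v < 0.0)
        return 0;
    if (v > 255.0)
        return 255;
    return (uint8_t)(v + 0.5);
}

/* Convert one 8-bit RGB pixel to YCbCr. */
void rgb_to_ycbcr(uint8_t r, uint8_t g, uint8_t b,
                  uint8_t *y8, uint8_t *cb, uint8_t *cr)
{
    /* Step 1: Y in 0..255, U and V in roughly -127..+127. */
    double y =  0.2990 * r + 0.5870 * g + 0.1140 * b;
    double u = -0.1687 * r - 0.3313 * g + 0.5000 * b;
    double v =  0.5000 * r - 0.4187 * g - 0.0813 * b;

    /* Step 2: rescale to the YCbCr ranges. */
    *y8 = clamp_u8((219.0 / 255.0) * y + 16.0);
    *cb = clamp_u8((112.0 / 127.0) * u + 128.0);
    *cr = clamp_u8((112.0 / 127.0) * v + 128.0);
}
```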

The purpose of using this color space is to separate the brightness information (Y) from the color information (U and V or Cb and Cr). It is a property of the human visual system that brightness information is more important, and color information can be partially discarded with little loss of perceptual quality. Therefore the YUV formats always use fewer Cb's and Cr's than Y's. There is always one Y per pixel. The YUV formats differ by how much color information is discarded, and by how the Y's, Cb's and Cr's are arranged in memory.
 
 

V4L2_PIX_FMT_YUYV,
V4L2_PIX_FMT_UYVY,
V4L2_PIX_FMT_VYUY,
V4L2_PIX_FMT_YVYU
 

A 4x4 YUYV image. Each cell is one byte.
Y00 Cb00 Y01 Cr00 Y02 Cb02 Y03 Cr02
Y10 Cb10 Y11 Cr10 Y12 Cb12 Y13 Cr12
Y20 Cb20 Y21 Cr20 Y22 Cb22 Y23 Cr22
Y30 Cb30 Y31 Cr30 Y32 Cb32 Y33 Cr32

In these formats each group of four bytes describes two pixels: it holds two Y's, one Cb and one Cr. Each Y belongs to one of the pixels, and the Cb and Cr are shared by both pixels, so the Cb and Cr components have half the horizontal resolution of the Y component. The four formats differ only in byte order: V4L2_PIX_FMT_YUYV is Y-Cb-Y-Cr (as in the diagram above), V4L2_PIX_FMT_UYVY is Cb-Y-Cr-Y, V4L2_PIX_FMT_VYUY is Cr-Y-Cb-Y, and V4L2_PIX_FMT_YVYU is Y-Cr-Y-Cb. V4L2_PIX_FMT_YUYV is known in the Windows environment as YUY2.
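
To make the sharing of chroma between the two pixels concrete, here is a minimal C sketch that fetches the Y, Cb and Cr values for a given pixel of a V4L2_PIX_FMT_YUYV row. `struct ycbcr` and `yuyv_get_pixel` are hypothetical names; the other three byte orders would need only different index constants.

```c
#include <stddef.h>
#include <stdint.h>

struct ycbcr { uint8_t y, cb, cr; };

/* Fetch the components for pixel x of one YUYV row (byte order Y-Cb-Y-Cr). */
struct ycbcr yuyv_get_pixel(const uint8_t *row, size_t x)
{
    const uint8_t *grp = row + 4 * (x / 2);  /* four-byte group for this pixel pair */
    struct ycbcr px;

    px.y  = grp[2 * (x % 2)];  /* byte 0 for the even pixel, byte 2 for the odd one */
    px.cb = grp[1];            /* shared by both pixels */
    px.cr = grp[3];            /* shared by both pixels */
    return px;
}
```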
 

V4L2_PIX_FMT_Y41P
 

An 8x4 image. Each cell is one byte.
Cb00 Y00 Cr00 Y01 Cb04 Y02 Cr04 Y03 Y04 Y05 Y06 Y07
Cb10 Y10 Cr10 Y11 Cb14 Y12 Cr14 Y13 Y14 Y15 Y16 Y17
Cb20 Y20 Cr20 Y21 Cb24 Y22 Cr24 Y23 Y24 Y25 Y26 Y27
Cb30 Y30 Cr30 Y31 Cb34 Y32 Cr34 Y33 Y34 Y35 Y36 Y37

In this format each 12 bytes is eight pixels. In the twelve bytes are two CbCr pairs and eight Y's. The first CbCr pair goes with the first four Y's, and the second CbCr pair goes with the other four Y's. The Cb and Cr components have one fourth the horizontal resolution of the Y component.
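
The irregular byte order inside each 12-byte group is easy to get wrong, so a lookup sketch in C may help. It follows the diagram above directly; `struct ycbcr` and `y41p_get_pixel` are hypothetical names, and the image width is assumed to be a multiple of eight pixels.

```c
#include <stddef.h>
#include <stdint.h>

struct ycbcr { uint8_t y, cb, cr; };

/* Fetch the components for pixel x of one Y41P row. */
struct ycbcr y41p_get_pixel(const uint8_t *row, size_t x)
{
    const uint8_t *grp = row + 12 * (x / 8);  /* 12-byte group holding 8 pixels */
    size_t i = x % 8;                         /* position within the group */
    struct ycbcr px;

    if (i < 4) {
        px.y  = grp[1 + 2 * i];  /* Y0..Y3 sit at bytes 1, 3, 5, 7 */
        px.cb = grp[0];          /* first CbCr pair */
        px.cr = grp[2];
    } else {
        px.y  = grp[4 + i];      /* Y4..Y7 sit at bytes 8..11 */
        px.cb = grp[4];          /* second CbCr pair */
        px.cr = grp[6];
    }
    return px;
}
```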
 

V4L2_PIX_FMT_YVU420,
V4L2_PIX_FMT_YUV420
 

A 4x4 YVU420 image. Each cell is one byte.
Y00 Y01 Y02 Y03
Y10 Y11 Y12 Y13
Y20 Y21 Y22 Y23
Y30 Y31 Y32 Y33
Cr00 Cr02
Cr20 Cr22
Cb00 Cb02
Cb20 Cb22

These are planar formats, as opposed to a packed format. The three components are separated into three sub-images or planes. The Y plane is first. The Y plane has one byte per pixel. For V4L2_PIX_FMT_YVU420, the Cr plane immediately follows the Y plane in memory. The Cr plane is half the width and half the height of the Y plane (and of the image). Each Cr belongs to four pixels, a two-by-two square of the image. For example, Cr00 belongs to Y00, Y01, Y10, and Y11. Following the Cr plane is the Cb plane, just like the Cr plane. V4L2_PIX_FMT_YUV420 is the same except the Cb plane comes first, then the Cr plane.

If the Y plane has pad bytes after each row, then the Cr and Cb planes have half as many pad bytes after their rows. In other words, two Cx rows (including padding) are exactly as long as one Y row (including padding).
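
The plane arithmetic described above can be sketched in C as follows. `struct planes420`, `layout_420` and `chroma_index_420` are hypothetical names, `y_stride` stands for the length of one Y row in bytes including padding, and both `y_stride` and the image height are assumed to be even.

```c
#include <stddef.h>

/* Byte offsets of the three planes from the start of the buffer. */
struct planes420 {
    size_t y, cb, cr;
    size_t chroma_stride;  /* bytes per chroma row, padding included */
};

/* cb_first is 0 for V4L2_PIX_FMT_YVU420 and 1 for V4L2_PIX_FMT_YUV420. */
struct planes420 layout_420(size_t y_stride, size_t height, int cb_first)
{
    struct planes420 p;
    size_t y_size = y_stride * height;
    size_t chroma_size;

    p.chroma_stride = y_stride / 2;               /* two chroma rows per Y row length */
    chroma_size = p.chroma_stride * (height / 2); /* half as many rows as the Y plane */
    p.y = 0;
    if (cb_first) {
        p.cb = y_size;
        p.cr = y_size + chroma_size;
    } else {
        p.cr = y_size;
        p.cb = y_size + chroma_size;
    }
    return p;
}

/* Index, within a chroma plane, of the sample shared by pixel (x, y). */
size_t chroma_index_420(size_t x, size_t y, size_t chroma_stride)
{
    return (y / 2) * chroma_stride + x / 2;
}
```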
 
 

V4L2_PIX_FMT_YVU410,
V4L2_PIX_FMT_YUV410

 

A 4x4 YVU410 image. Each cell is one byte.
Y00 Y01 Y02 Y03
Y10 Y11 Y12 Y13
Y20 Y21 Y22 Y23
Y30 Y31 Y32 Y33
Cr00
Cb00

These are planar formats, as opposed to packed formats. The three components are separated into three sub-images or planes. The Y plane is first. The Y plane has one byte per pixel. For V4L2_PIX_FMT_YVU410, the Cr plane immediately follows the Y plane in memory. The Cr plane is ¼ the width and ¼ the height of the Y plane (and of the image). Each Cr belongs to 16 pixels, a four-by-four square of the image. Following the Cr plane is the Cb plane, just like the Cr plane. V4L2_PIX_FMT_YUV410 is the same, except the Cb plane comes first, then the Cr plane.

If the Y plane has pad bytes after each row, then the Cr and Cb planes have ¼ as many pad bytes after their rows. In other words, four Cx rows (including padding) are exactly as long as one Y row (including padding).
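
As a quick check on the sizes involved, the C sketch below computes the total buffer size and the chroma sample index for these formats. The function names are hypothetical, `y_stride` is the Y row length in bytes including padding, and both `y_stride` and the height are assumed to be multiples of four.

```c
#include <stddef.h>

/* Total size in bytes of a YVU410 or YUV410 image buffer. */
size_t yuv410_buffer_size(size_t y_stride, size_t height)
{
    /* Each chroma plane is a quarter as wide and a quarter as tall as Y. */
    size_t chroma_plane = (y_stride / 4) * (height / 4);
    return y_stride * height + 2 * chroma_plane;
}

/* Index, within a chroma plane, of the sample shared by pixel (x, y). */
size_t yuv410_chroma_index(size_t x, size_t y, size_t y_stride)
{
    return (y / 4) * (y_stride / 4) + x / 4;
}
```

For the 4x4 image in the diagram this gives 18 bytes: 16 for Y plus one each for Cr and Cb.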
 
 

V4L2_PIX_FMT_YUV422P

 

A 4x4 image. Each cell is one byte.
Y00 Y01 Y02 Y03
Y10 Y11 Y12 Y13
Y20 Y21 Y22 Y23
Y30 Y31 Y32 Y33
Cb00 Cb02
Cb10 Cb12
Cb20 Cb22
Cb30 Cb32
Cr00 Cr02
Cr10 Cr12
Cr20 Cr22
Cr30 Cr32

This format is not commonly used. This is a planar version of the YUYV format. The three components are separated into three sub-images or planes. The Y plane is first. The Y plane has one byte per pixel. The Cb plane immediately follows the Y plane in memory. The Cb plane is half the width of the Y plane (and of the image). Each Cb belongs to two pixels. For example, Cb00 belongs to Y00, Y01. Following the Cb plane is the Cr plane, just like the Cb plane.

If the Y plane has pad bytes after each row, then the Cr and Cb planes have half as many pad bytes after their rows. In other words, two Cx rows (including padding) are exactly as long as one Y row (including padding).
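
Since this format carries the same samples as YUYV in a different arrangement, one way to picture it is as a row-by-row repacking. The following C sketch interleaves one row from each plane into a YUYV row; the function name is hypothetical, and `width` is assumed to be even and to count pixels, not bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Interleave one row of the Y, Cb and Cr planes into packed YUYV order. */
void yuv422p_row_to_yuyv(const uint8_t *y, const uint8_t *cb,
                         const uint8_t *cr, uint8_t *dst, size_t width)
{
    for (size_t x = 0; x + 1 < width; x += 2) {
        dst[2 * x + 0] = y[x];       /* Y of the left pixel */
        dst[2 * x + 1] = cb[x / 2];  /* Cb shared by the pair */
        dst[2 * x + 2] = y[x + 1];   /* Y of the right pixel */
        dst[2 * x + 3] = cr[x / 2];  /* Cr shared by the pair */
    }
}
```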
 
 

V4L2_PIX_FMT_YUV411P

 

A 4x4 image. Each cell is one byte.
Y00 Y01 Y02 Y03
Y10 Y11 Y12 Y13
Y20 Y21 Y22 Y23
Y30 Y31 Y32 Y33
Cb00
Cb10
Cb20
Cb30
Cr00
Cr10
Cr20
Cr30

This format is not commonly used. This is a planar format similar to the 422 planar format except with half as many chroma samples. The three components are separated into three sub-images or planes. The Y plane is first. The Y plane has one byte per pixel. The Cb plane immediately follows the Y plane in memory. The Cb plane is ¼ the width of the Y plane (and of the image). Each Cb belongs to 4 pixels all on the same row. For example, Cb00 belongs to Y00, Y01, Y02 and Y03. Following the Cb plane is the Cr plane, just like the Cb plane.

If the Y plane has pad bytes after each row, then the Cr and Cb planes have ¼ as many pad bytes after their rows. In other words, four Cx rows (including padding) are exactly as long as one Y row (including padding).
 
 


V4L2_PIX_FMT_NV12


A 4x4 image. Each cell is one byte.
Y00 Y01 Y02 Y03
Y10 Y11 Y12 Y13
Y20 Y21 Y22 Y23
Y30 Y31 Y32 Y33
Cb00 Cr00 Cb02 Cr02
Cb20 Cr20 Cb22 Cr22

This is a two-plane version of the YUV420 format. The three components are separated into two sub-images or planes. The Y plane is first. The Y plane has one byte per pixel. Immediately following that in memory is a combined CbCr plane. The CbCr plane is the same width, in bytes, as the Y plane (and of the image), but is half as tall. Each CbCr pair belongs to four pixels. For example, Cb00/Cr00 belongs to Y00, Y01, Y10, Y11. 

If the Y plane has pad bytes after each row, then the CbCr plane has the same number of pad bytes after its rows.
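
A small C sketch of locating the chroma pair for a pixel in this layout follows. The function name is hypothetical, `y_stride` is the Y row length in bytes including padding (the CbCr rows are taken to have the same length, as stated above), and the image height is assumed to be even.

```c
#include <stddef.h>
#include <stdint.h>

/* Fetch the Cb and Cr values that belong to pixel (x, y) of an NV12 image. */
void nv12_get_chroma(const uint8_t *buf, size_t y_stride, size_t height,
                     size_t x, size_t y, uint8_t *cb, uint8_t *cr)
{
    const uint8_t *cbcr = buf + y_stride * height;  /* CbCr plane follows Y */
    const uint8_t *pair = cbcr + (y / 2) * y_stride + 2 * (x / 2);

    *cb = pair[0];
    *cr = pair[1];
}
```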
 


V4L2_PIX_FMT_GREY
 

A 4x4 image. Each cell is one byte.
Y00 Y01 Y02 Y03
Y10 Y11 Y12 Y13
Y20 Y21 Y22 Y23
Y30 Y31 Y32 Y33

This is a greyscale (black and white) image. It is really a degenerate YCbCr format which simply contains no Cr or Cb data. Y ranges from 16 (darkest) to 235 (lightest).
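
For display on a full-range grey ramp, the 16 to 235 range can be stretched to 0 to 255 with the same scale factor used for Y in the YUV section. The C sketch below does this in integer arithmetic; the function name is hypothetical, and clamping values outside 16 to 235 is an assumption consistent with the clamping advice given for color conversion.

```c
#include <stdint.h>

/* Stretch a Y sample from the 16..235 range to 0..255, clamping the result. */
uint8_t grey_to_full_range(uint8_t y)
{
    int v = ((int)y - 16) * 255;

    if (v < 0)
        return 0;
    v = (v + 109) / 219;  /* divide by 219, rounding to nearest */
    return v > 255 ? 255 : (uint8_t)v;
}
```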