VLD: variable length decoding

Introduction

The VLD is the first stage of the VP2 decoding pipeline. It is part of PBSP and deals with decoding the H.264 bitstream into syntax elements.

The input to the VLD is the raw H.264 bitstream. The output of VLD is MBRING, a ring buffer structure storing the decoded syntax elements in the form of word-aligned packets.

The VLD only deals with parsing the NALs containing the slice data - the remaining NAL types are supposed to be parsed by the host. Further, the hardware can only parse pred_weight_table and slice_data elements efficiently - the remaining parts of the slice NAL are supposed to be parsed by the firmware controlling the VLD in a semi-manual manner: the VLD provides commands that parse single syntax elements.

The following H.264 profiles are supported:

  • Constrained Baseline
  • Baseline [only in single-macroblock mode if FMO used - see below]
  • Main
  • Progressive High
  • High
  • Multiview High
  • Stereo High

The limitations are:

  • max picture width and height: 128 macroblocks
  • max macroblocks in picture: 8192

Todo

width/height max may be 255?

There are two modes of operation that VLD can be used with: single-macroblock mode and whole-slice mode. In the single-macroblock mode, parsing for each macroblock has to be manually triggered by the firmware. In whole-slice mode, the firmware triggers processing of a whole slice, and the hardware automatically iterates over all macroblocks in the slice. However, whole-slice mode doesn’t support Flexible Macroblock Ordering aka. slice groups. Thus, single-macroblock mode has to be used for sequences with non-zero value of num_slice_groups_minus1.

The VLD keeps extensive hidden internal state, including:

  • pred_weight_table data, to be prepended to the next emitted macroblock
  • bitstream position, zero byte count [for escaping], and lookahead buffer
  • CABAC valMPS, pStateIdx, codIOFfset, codIRange state
  • previously decoded parts of macroblock data, used for CABAC and CAVLC context selection algorithms
  • already queued but not yet written MBRING output data

The registers

The VLD registers are located in PBSP XLMI space at addresses 0x00000:0x08000 [BAR0 addresses 0x103000:0x103200]. They are:

XLMI MMIO Name Description
0x00000 0x103000 PARM_0 parameters from sequence/picture parameter structs and the slice header
0x00100 0x103004 PARM_1 parameters from sequence/picture parameter structs and the slice header
0x00200 0x103008 MB_POS position of the current macroblock
0x00300 0x10300c COMMAND writing executes a VLD command
0x00400 0x103010 STATUS shows busy status of various parts of the VLD
0x00500 0x103014 RESULT result of a command
0x00700 0x10301c INTR_EN interrupt enable mask
0x00800 0x103020 ??? ???
0x00900 0x103024 INTR interrupt status
0x00a00 0x103028 RESET resets the VLD and its registers to initial state
0x01000+i*0x100 0x103040+i*4 CONF[0:8] length and enable bit of stream buffer i
0x01100+i*0x100 0x103044+i*4 OFFSET[0:8] offset of stream buffer i
0x02000 0x103080 BITPOS the bit position in the stream
0x04000 0x103100 OFFSET the MBRING offset
0x04100 0x103104 HALT_POS the MBRING halt position
0x04200 0x103108 WRITE_POS the MBRING write position
0x04300 0x10310c SIZE the MBRING size
0x04400 0x103110 TRIGGER writing executes MBRING commands

Todo

reg 0x00800

Reset

The engine may be reset at any time by poking the RESET register.

BAR0 0x103028 / XLMI 0x00a00: RESET
Any write will cause the VLD to be reset. All internal state is reset to default values. All user-writable registers are set to 0, except UNK8 which is set to 0xffffffff.

Parameter and position registers

The values of these registers are used by some of the VLD commands. PARM registers should be initialised with values derived from sequence parameters, picture parameters, and slice header. MB_POS should be set to the address of currently processed macroblock [for single-macroblock operation] or the first macroblock of the slice [for whole-slice operation]. In whole-slice operation, MB_POS is updated by the hardware to the position of the last macroblock in the parsed slice.

For details on use of this information by specific commands, see their documentation.

BAR0 0x103000 / XLMI 0x00000: PARM_0
  • bit 0: entropy_coding_mode_flag - set as in picture parameters
  • bits 1-8: width_mbs - set to pic_width_in_mbs_minus1 + 1
  • bit 9: mbaff_frame_flag - set to (mb_adaptive_frame_field_flag && !field_pic_flag)
  • bits 10-11: picture_structure - one of:
    • 0: frame - set if !field_pic_flag
    • 1: top field - set if field_pic_flag && !bottom_field_flag
    • 2: bottom field - set if field_pic_flag && bottom_field_flag
  • bits 12-16: nal_unit_type - set as in the slice NAL header [XXX: check]
  • bit 17: constrained_intra_pred - set as in picture parameters [XXX: check]
  • bits 18-19: cabac_init_idc - set as in slice header, for P and B slices
  • bits 20-21: chroma_format_idc - if parsing auxiliary coded picture, set to 0, otherwise set as in sequence parameters
  • bit 22: direct_8x8_inference_flag - set as in sequence parameters
  • bit 23: transform_8x8_mode_flag - set as in picture parameters
BAR0 0x103004 / XLMI 0x00100: PARM_1
  • bits 0-1: slice_type - set as in slice header
  • bits 2-14: slice_tag - used to tag macroblocks in internal state with their slices, for determining availability status in CABAC/CAVLC context selection algorithms. See command description.
  • bits 15-19: num_ref_idx_l0_active_minus1 - set as in slice header, for P and B slices
  • bits 20-24: num_ref_idx_l1_active_minus1 - set as in slice header, for B slices
  • bits 25-30: sliceqpy - set to (pic_init_qp_minus26 + 26 + slice_qp_delta)
BAR0 0x103008 / XLMI 0x00200: MB_POS
  • bits 0-12: addr - address of the macroblock
  • bits 13-20: x - x coordinate of the macroblock in macroblock units
  • bits 21-28: y - y coordinate of the macroblock in macroblock units
  • bit 29: first - 1 if the described macroblock is the first macroblock in its slice, 0 otherwise

Internal state for context selection

Both CAVLC and CABAC sometimes use decoded data of previous macroblocks in the slice to determine the decoding algorithm for syntax elements of the current macroblock. The VLD thus stores this data in its internal hidden memory.

Todo

what macroblocks are stored, indexing, tagging, reset state

For each macroblock, the following data is stored:

  • slice_tag
  • mb_field_decoding_flag
  • mb_skip_flag
  • mb_type
  • coded_block_pattern
  • transform_size_8x8_flag
  • intra_chroma_pred_mode
  • ref_idx_lX[i]
  • mvd_lX[i][j]
  • coded_block_flag for each block
  • total_coeffs for each luma 4x4 / luma AC block

Todo

and availability status?

Additionally, the following data of the previous decoded macroblock [not indexed by macroblock address] is stored:

  • mb_qp_delta

Interrupts

Todo

write me

BAR0 0x10301c / XLMI 0x00700: INTR_EN
  • bit 0: UNK_INPUT_1
  • bit 1: END_OF_STREAM
  • bit 2: UNK_INPUT_3
  • bit 3: MBRING_HALT
  • bit 4: SLICE_DATA_DONE
BAR0 0x103024 / XLMI 0x00900: INTR
  • bits 0-3: INPUT - 0: no interrupt pending - 1: UNK_INPUT_1 - 2: END_OF_STREAM - 3: UNK_INPUT_3 - 4: SLICE_DATA_DONE
  • bit 4: MBRING_FULL

Stream input

Todo

RE and write me

MBRING output

Todo

write me

Command and status registers

Todo

write me

Command 0: GET_UE

Parameter:
none
Result:
the decoded value of parsed bitfield, or 0xffffffff if out of range

Parses one ue(v) element as defined in H.264 spec. Only elements in range 0..0xfffe [up to 31 bits in the bitstream] are supported by this command. If the next bits of the bitstream are a valid ue(v) element in supported range, the element is parsed, the bitstream pointer advances past it, and its parsed value is returned as the result. Otherwise, bitstream pointer is not modified and 0xffffffff is returned.

Operation:

if (nextbits(16) != 0) {
    int bitcnt = 0;
    while (getbits(1) == 0)
        bitcnt++;
    return (1 << bitcnt) - 1 + getbits(bitcnt);
} else {
    return 0xffffffff;
}

Command 1: GET_SE

Parameter:
none
Result:
the decoded value of parsed bitfield, or 0x80000000 if out of range

Parses one se(v) element as defined in H.264 spec. Only elements in range -0x7fff..0x7fff [up to 31 bits in the bitstream] are supported by this command. If the next bits of the bitstream are a valid se(v) element in supported range, the element is parsed, the bitstream pointer advances past it, and its parsed value is returned as the result. Otherwise, bitstream pointer is not modified and 0x80000000 is returned.

Operation:

if (nextbits(16) != 0) {
    int bitcnt = 0;
    while (getbits(1) == 0)
        bitcnt++;
    int tmp = (1 << bitcnt) - 1 + getbits(bitcnt);
    if (tmp & 1)
        return (tmp+1) >> 1;
    else
        return -(tmp >> 1);
} else {
    return 0x80000000;
}

Command 2: GETBITS

Parameter:
number of bits to read, or 0 to read 32 bits [5 bits]
Result:
the bits from the bitstream

Given parameter n, returns the next (n?n:32) bits from the bitstream as an unsigned integer.

Operation:

return getbits(n?n:32);

Command 3: NEXT_START_CODE

Parameter:
none
Result:
the next start code found

Skips bytes in the raw bitstream until the start code [00 00 01] is found. Then, read the byte after the start code and return it as the result. The bitstream pointer is advanced to point after the returned byte.

Operation:

byte_align();
while (nextbytes_raw(3) != 1)
    getbits_raw(8);
getbits_raw(24);
return getbits_raw(8);

Command 4: CABAC_START

Parameter:
none
Result:
none

Skips bits in the bitstream until the current bit position is byte-aligned, then initialises the arithmetic decoding engine registers codIRange and codIOffset, as per H.264.9.3.1.2.

Oprtation:

byte_align();
cabac_init_engine();

Command 5: MORE_RBSP_DATA

Parameter:
none
Result:
1 if there’s more data in RBSP, 0 otherwise

Returns 0 if there’s a valid RBSP trailing bits element at the current bit position, 1 otherwise. Does not modify the bitstream pointer.

Operation:

return more_rbsp_data();

Command 6: MB_SKIP_FLAG

Parameter:
none
Result:
value of parsed mb_skip_flag

Parses the CABAC mb_skip_flag element. The SLICE_POS has to be set to the address of the macroblock to which this element applies.

Operation:

return cabac_mb_skip_flag();

Command 7: END_OF_SLICE_FLAG

Parameter:
none
Result:
value of parsed end_of_slice_flag

Parses the CABAC end_of_slice_flag element.

Operation:

return cabac_terminate();

Command 8: CABAC_INIT_CTX

Parameter:
none
Result:
none

Initialises the CABAC context variables, as per H.264.9.3.1.1. slice_type, cabac_init_idc [for P/B slices], and sliceqpy have to be set in the PARM registers for this command to work properly.

Operation:

cabac_init_ctx();

Command 9: MACROBLOCK_SKIP_MBFDF

Parameter:
mb_field_decoding_flag presence [1 bit]
Result
none

If parameter is 1, mb_field_decoding_flag syntax element is parsed. Otherwise, the value of mb_field_decoding_flag is inferred from preceding macroblocks. A skipped macroblock with thus determined value of mb_field_decoding_flag is emitted into the MBRING, and its data stored into internal state. SLICE_POS has to be set to the address of this macroblock.

Operation:

if (param) {
    if (entropy_coding_mode_flag)
        this_mb.mb_field_decoding_flag = cabac_mb_field_decoding_flag();
    else
        this_mb.mb_field_decoding_flag = getbits(1);
} else {
    this_mb.mb_field_decoding_flag = mb_field_decoding_flag_infer();
}
this_mb.mb_skip_flag = 1;
this_mb.slice_tag = slice_tag;
mbring_emit_macroblock();

Todo

more inferred crap

Command 0xa: MACROBLOCK_LAYER_MBFDF

Parameter:
mb_field_decoding_flag presence [1 bit]
Result:
none

If parameter is 1, mb_field_decoding_flag syntax element is parsed. Otherwise, the value of mb_field_decoding_flag is inferred from preceding macroblocks. A macroblock_layer syntax structure is parsed from the bitstream, data for the decoded macroblock is emitted into the MBRING, and stored into internal state. SLICE_POS has to be set to the address of this macroblock.

Operation:

if (param) {
    if (entropy_coding_mode_flag)
        this_mb.mb_field_decoding_flag = cabac_mb_field_decoding_flag();
    else
        this_mb.mb_field_decoding_flag = getbits(1);
} else {
    this_mb.mb_field_decoding_flag = mb_field_decoding_flag_infer();
}
this_mb.mb_skip_flag = 0;
this_mb.slice_tag = slice_tag;
macroblock_layer();

Command 0xb: PRED_WEIGHT_TABLE

Parameter:
none
Result:
none

Parses the pred_weight_table element, stores its contents in internal memory, and advances the bitstream to the end of the element.

Operation:

Todo

write me

Command 0xc: SLICE_DATA

Parameter:
none
Result:
none

Writes the stored pred_weight_table data to MBRING, parses the slice_data element, storing decoded data into MBRING, halting when the RBSP trailing bit sequence is encountered. When done, raises the MACROBLOCKS_DONE interrupt. Bitstream pointer is updated to point to the RBSP traling bits. SLICE_POS has to be set to the address of the first macroblock on slice before this command is called. When this command finishes, SLICE_POS is updated to the address of the last macroblock in the parsed slice.

Operation:

if (entropy_coding_mode_flag) {
    cabac_init_ctx();
    byte_align();
    cabac_init_engine();
}
mb_pos.first = 1;
first = 1;
skip_pending = 0;
end = 0;
bottom = 0;
while (1) {
    if (slice_type == P || slice_type == B) {
        if (entropy_coding_mode_flag) {
            while (1) {
                tmp = cabac_mb_skip_flag();
                if (!tmp)
                    break;
                skip_pending++;
                if (!mbaff_frame_flag || bottom) {
                    end = cabac_terminate();
                    if (end)
                        break;
                }
                bottom = !bottom;
            }
        } else {
            skip_pending = get_ue();
            end = !more_rbsp_data();
            bottom ^= skip_pending & 1;
        }
    } else {
        skip_pending = 0;
    }
    while (1) {
        if (!skip_pending)
            break;
        if (mbaff_frame_flag && bottom && skip_pending < 2)
            break;
        if (first) {
            first = 0;
        } else {
            mb_pos_advance();
        }
        macroblock_skip_mbfdf(0);
        skip_pending--;
    }
    if (end)
        break;
    if (first) {
        first = 0;
    } else {
        mb_pos_advance();
    }
    if (mbaff_frame_flag) {
        if (skip_pending) {
            macroblock_skip_mbfdf(1);
            mb_pos_advance();
            macroblock_layer_mbfdf(0);
            skip_pending = 0;
        } else {
            if (bottom) {
                macroblock_layer_mbfdf(0);
            } else {
                macroblock_layer_mbfdf(1);
            }
        }
        bottom = !bottom;
    } else {
        macroblock_layer_mbfdf(0);
    }
    if (entropy_coding_mode) {
        if (mbaff_frame_flag && bottom)) {
            end = 0;
        } else {
            end = cabac_terminate();
        }
    } else {
        end = !more_rbsp_data();
    }
    if (end) break;
}
trigger_intr(SLICE_DATA_DONE);