VLD: variable length decoding¶
Contents
- VLD: variable length decoding
- Introduction
- The registers
- Reset
- Parameter and position registers
- Internal state for context selection
- Interrupts
- Stream input
- MBRING output
- Command and status registers
- Command 0: GET_UE
- Command 1: GET_SE
- Command 2: GETBITS
- Command 3: NEXT_START_CODE
- Command 4: CABAC_START
- Command 5: MORE_RBSP_DATA
- Command 6: MB_SKIP_FLAG
- Command 7: END_OF_SLICE_FLAG
- Command 8: CABAC_INIT_CTX
- Command 9: MACROBLOCK_SKIP_MBFDF
- Command 0xa: MACROBLOCK_LAYER_MBFDF
- Command 0xb: PRED_WEIGHT_TABLE
- Command 0xc: SLICE_DATA
Introduction¶
The VLD is the first stage of the VP2 decoding pipeline. It is part of PBSP and deals with decoding the H.264 bitstream into syntax elements.
The input to the VLD is the raw H.264 bitstream. The output of VLD is MBRING, a ring buffer structure storing the decoded syntax elements in the form of word-aligned packets.
The VLD only deals with parsing the NALs containing the slice data - the remaining NAL types are supposed to be parsed by the host. Further, the hardware can only parse pred_weight_table and slice_data elements efficiently - the remaining parts of the slice NAL are supposed to be parsed by the firmware controlling the VLD in a semi-manual manner: the VLD provides commands that parse single syntax elements.
The following H.264 profiles are supported:
- Constrained Baseline
- Baseline [only in single-macroblock mode if FMO used - see below]
- Main
- Progressive High
- High
- Multiview High
- Stereo High
The limitations are:
- max picture width and height: 128 macroblocks
- max macroblocks in picture: 8192
Todo
width/height max may be 255?
There are two modes of operation that VLD can be used with: single-macroblock mode and whole-slice mode. In the single-macroblock mode, parsing for each macroblock has to be manually triggered by the firmware. In whole-slice mode, the firmware triggers processing of a whole slice, and the hardware automatically iterates over all macroblocks in the slice. However, whole-slice mode doesn’t support Flexible Macroblock Ordering aka. slice groups. Thus, single-macroblock mode has to be used for sequences with non-zero value of num_slice_groups_minus1.
The VLD keeps extensive hidden internal state, including:
- pred_weight_table data, to be prepended to the next emitted macroblock
- bitstream position, zero byte count [for escaping], and lookahead buffer
- CABAC valMPS, pStateIdx, codIOFfset, codIRange state
- previously decoded parts of macroblock data, used for CABAC and CAVLC context selection algorithms
- already queued but not yet written MBRING output data
The registers¶
The VLD registers are located in PBSP XLMI space at addresses 0x00000:0x08000 [BAR0 addresses 0x103000:0x103200]. They are:
Todo
reg 0x00800
Reset¶
The engine may be reset at any time by poking the RESET register.
- BAR0 0x103028 / XLMI 0x00a00: RESET
- Any write will cause the VLD to be reset. All internal state is reset to default values. All user-writable registers are set to 0, except UNK8 which is set to 0xffffffff.
Parameter and position registers¶
The values of these registers are used by some of the VLD commands. PARM registers should be initialised with values derived from sequence parameters, picture parameters, and slice header. MB_POS should be set to the address of currently processed macroblock [for single-macroblock operation] or the first macroblock of the slice [for whole-slice operation]. In whole-slice operation, MB_POS is updated by the hardware to the position of the last macroblock in the parsed slice.
For details on use of this information by specific commands, see their documentation.
- BAR0 0x103000 / XLMI 0x00000: PARM_0
- bit 0: entropy_coding_mode_flag - set as in picture parameters
- bits 1-8: width_mbs - set to pic_width_in_mbs_minus1 + 1
- bit 9: mbaff_frame_flag - set to (mb_adaptive_frame_field_flag && !field_pic_flag)
- bits 10-11: picture_structure - one of:
- 0: frame - set if !field_pic_flag
- 1: top field - set if field_pic_flag && !bottom_field_flag
- 2: bottom field - set if field_pic_flag && bottom_field_flag
- bits 12-16: nal_unit_type - set as in the slice NAL header [XXX: check]
- bit 17: constrained_intra_pred - set as in picture parameters [XXX: check]
- bits 18-19: cabac_init_idc - set as in slice header, for P and B slices
- bits 20-21: chroma_format_idc - if parsing auxiliary coded picture, set to 0, otherwise set as in sequence parameters
- bit 22: direct_8x8_inference_flag - set as in sequence parameters
- bit 23: transform_8x8_mode_flag - set as in picture parameters
- BAR0 0x103004 / XLMI 0x00100: PARM_1
- bits 0-1: slice_type - set as in slice header
- bits 2-14: slice_tag - used to tag macroblocks in internal state with their slices, for determining availability status in CABAC/CAVLC context selection algorithms. See command description.
- bits 15-19: num_ref_idx_l0_active_minus1 - set as in slice header, for P and B slices
- bits 20-24: num_ref_idx_l1_active_minus1 - set as in slice header, for B slices
- bits 25-30: sliceqpy - set to (pic_init_qp_minus26 + 26 + slice_qp_delta)
- BAR0 0x103008 / XLMI 0x00200: MB_POS
- bits 0-12: addr - address of the macroblock
- bits 13-20: x - x coordinate of the macroblock in macroblock units
- bits 21-28: y - y coordinate of the macroblock in macroblock units
- bit 29: first - 1 if the described macroblock is the first macroblock in its slice, 0 otherwise
Internal state for context selection¶
Both CAVLC and CABAC sometimes use decoded data of previous macroblocks in the slice to determine the decoding algorithm for syntax elements of the current macroblock. The VLD thus stores this data in its internal hidden memory.
Todo
what macroblocks are stored, indexing, tagging, reset state
For each macroblock, the following data is stored:
- slice_tag
- mb_field_decoding_flag
- mb_skip_flag
- mb_type
- coded_block_pattern
- transform_size_8x8_flag
- intra_chroma_pred_mode
- ref_idx_lX[i]
- mvd_lX[i][j]
- coded_block_flag for each block
- total_coeffs for each luma 4x4 / luma AC block
Todo
and availability status?
Additionally, the following data of the previous decoded macroblock [not indexed by macroblock address] is stored:
- mb_qp_delta
Interrupts¶
Todo
write me
- BAR0 0x10301c / XLMI 0x00700: INTR_EN
- bit 0: UNK_INPUT_1
- bit 1: END_OF_STREAM
- bit 2: UNK_INPUT_3
- bit 3: MBRING_HALT
- bit 4: SLICE_DATA_DONE
- BAR0 0x103024 / XLMI 0x00900: INTR
- bits 0-3: INPUT - 0: no interrupt pending - 1: UNK_INPUT_1 - 2: END_OF_STREAM - 3: UNK_INPUT_3 - 4: SLICE_DATA_DONE
- bit 4: MBRING_FULL
Stream input¶
Todo
RE and write me
MBRING output¶
Todo
write me
Command and status registers¶
Todo
write me
Command 0: GET_UE¶
- Parameter:
- none
- Result:
- the decoded value of parsed bitfield, or 0xffffffff if out of range
Parses one ue(v) element as defined in H.264 spec. Only elements in range 0..0xfffe [up to 31 bits in the bitstream] are supported by this command. If the next bits of the bitstream are a valid ue(v) element in supported range, the element is parsed, the bitstream pointer advances past it, and its parsed value is returned as the result. Otherwise, bitstream pointer is not modified and 0xffffffff is returned.
Operation:
if (nextbits(16) != 0) {
int bitcnt = 0;
while (getbits(1) == 0)
bitcnt++;
return (1 << bitcnt) - 1 + getbits(bitcnt);
} else {
return 0xffffffff;
}
Command 1: GET_SE¶
- Parameter:
- none
- Result:
- the decoded value of parsed bitfield, or 0x80000000 if out of range
Parses one se(v) element as defined in H.264 spec. Only elements in range -0x7fff..0x7fff [up to 31 bits in the bitstream] are supported by this command. If the next bits of the bitstream are a valid se(v) element in supported range, the element is parsed, the bitstream pointer advances past it, and its parsed value is returned as the result. Otherwise, bitstream pointer is not modified and 0x80000000 is returned.
Operation:
if (nextbits(16) != 0) {
int bitcnt = 0;
while (getbits(1) == 0)
bitcnt++;
int tmp = (1 << bitcnt) - 1 + getbits(bitcnt);
if (tmp & 1)
return (tmp+1) >> 1;
else
return -(tmp >> 1);
} else {
return 0x80000000;
}
Command 2: GETBITS¶
- Parameter:
- number of bits to read, or 0 to read 32 bits [5 bits]
- Result:
- the bits from the bitstream
Given parameter n, returns the next (n?n:32) bits from the bitstream as an unsigned integer.
Operation:
return getbits(n?n:32);
Command 3: NEXT_START_CODE¶
- Parameter:
- none
- Result:
- the next start code found
Skips bytes in the raw bitstream until the start code [00 00 01] is found. Then, read the byte after the start code and return it as the result. The bitstream pointer is advanced to point after the returned byte.
Operation:
byte_align();
while (nextbytes_raw(3) != 1)
getbits_raw(8);
getbits_raw(24);
return getbits_raw(8);
Command 4: CABAC_START¶
- Parameter:
- none
- Result:
- none
Skips bits in the bitstream until the current bit position is byte-aligned, then initialises the arithmetic decoding engine registers codIRange and codIOffset, as per H.264.9.3.1.2.
Oprtation:
byte_align();
cabac_init_engine();
Command 5: MORE_RBSP_DATA¶
- Parameter:
- none
- Result:
- 1 if there’s more data in RBSP, 0 otherwise
Returns 0 if there’s a valid RBSP trailing bits element at the current bit position, 1 otherwise. Does not modify the bitstream pointer.
Operation:
return more_rbsp_data();
Command 6: MB_SKIP_FLAG¶
- Parameter:
- none
- Result:
- value of parsed mb_skip_flag
Parses the CABAC mb_skip_flag element. The SLICE_POS has to be set to the address of the macroblock to which this element applies.
Operation:
return cabac_mb_skip_flag();
Command 7: END_OF_SLICE_FLAG¶
- Parameter:
- none
- Result:
- value of parsed end_of_slice_flag
Parses the CABAC end_of_slice_flag element.
Operation:
return cabac_terminate();
Command 8: CABAC_INIT_CTX¶
- Parameter:
- none
- Result:
- none
Initialises the CABAC context variables, as per H.264.9.3.1.1. slice_type, cabac_init_idc [for P/B slices], and sliceqpy have to be set in the PARM registers for this command to work properly.
Operation:
cabac_init_ctx();
Command 9: MACROBLOCK_SKIP_MBFDF¶
- Parameter:
- mb_field_decoding_flag presence [1 bit]
- Result
- none
If parameter is 1, mb_field_decoding_flag syntax element is parsed. Otherwise, the value of mb_field_decoding_flag is inferred from preceding macroblocks. A skipped macroblock with thus determined value of mb_field_decoding_flag is emitted into the MBRING, and its data stored into internal state. SLICE_POS has to be set to the address of this macroblock.
Operation:
if (param) {
if (entropy_coding_mode_flag)
this_mb.mb_field_decoding_flag = cabac_mb_field_decoding_flag();
else
this_mb.mb_field_decoding_flag = getbits(1);
} else {
this_mb.mb_field_decoding_flag = mb_field_decoding_flag_infer();
}
this_mb.mb_skip_flag = 1;
this_mb.slice_tag = slice_tag;
mbring_emit_macroblock();
Todo
more inferred crap
Command 0xa: MACROBLOCK_LAYER_MBFDF¶
- Parameter:
- mb_field_decoding_flag presence [1 bit]
- Result:
- none
If parameter is 1, mb_field_decoding_flag syntax element is parsed. Otherwise, the value of mb_field_decoding_flag is inferred from preceding macroblocks. A macroblock_layer syntax structure is parsed from the bitstream, data for the decoded macroblock is emitted into the MBRING, and stored into internal state. SLICE_POS has to be set to the address of this macroblock.
Operation:
if (param) {
if (entropy_coding_mode_flag)
this_mb.mb_field_decoding_flag = cabac_mb_field_decoding_flag();
else
this_mb.mb_field_decoding_flag = getbits(1);
} else {
this_mb.mb_field_decoding_flag = mb_field_decoding_flag_infer();
}
this_mb.mb_skip_flag = 0;
this_mb.slice_tag = slice_tag;
macroblock_layer();
Command 0xb: PRED_WEIGHT_TABLE¶
- Parameter:
- none
- Result:
- none
Parses the pred_weight_table element, stores its contents in internal memory, and advances the bitstream to the end of the element.
Operation:
Todo
write me
Command 0xc: SLICE_DATA¶
- Parameter:
- none
- Result:
- none
Writes the stored pred_weight_table data to MBRING, parses the slice_data element, storing decoded data into MBRING, halting when the RBSP trailing bit sequence is encountered. When done, raises the MACROBLOCKS_DONE interrupt. Bitstream pointer is updated to point to the RBSP traling bits. SLICE_POS has to be set to the address of the first macroblock on slice before this command is called. When this command finishes, SLICE_POS is updated to the address of the last macroblock in the parsed slice.
Operation:
if (entropy_coding_mode_flag) {
cabac_init_ctx();
byte_align();
cabac_init_engine();
}
mb_pos.first = 1;
first = 1;
skip_pending = 0;
end = 0;
bottom = 0;
while (1) {
if (slice_type == P || slice_type == B) {
if (entropy_coding_mode_flag) {
while (1) {
tmp = cabac_mb_skip_flag();
if (!tmp)
break;
skip_pending++;
if (!mbaff_frame_flag || bottom) {
end = cabac_terminate();
if (end)
break;
}
bottom = !bottom;
}
} else {
skip_pending = get_ue();
end = !more_rbsp_data();
bottom ^= skip_pending & 1;
}
} else {
skip_pending = 0;
}
while (1) {
if (!skip_pending)
break;
if (mbaff_frame_flag && bottom && skip_pending < 2)
break;
if (first) {
first = 0;
} else {
mb_pos_advance();
}
macroblock_skip_mbfdf(0);
skip_pending--;
}
if (end)
break;
if (first) {
first = 0;
} else {
mb_pos_advance();
}
if (mbaff_frame_flag) {
if (skip_pending) {
macroblock_skip_mbfdf(1);
mb_pos_advance();
macroblock_layer_mbfdf(0);
skip_pending = 0;
} else {
if (bottom) {
macroblock_layer_mbfdf(0);
} else {
macroblock_layer_mbfdf(1);
}
}
bottom = !bottom;
} else {
macroblock_layer_mbfdf(0);
}
if (entropy_coding_mode) {
if (mbaff_frame_flag && bottom)) {
end = 0;
} else {
end = cabac_terminate();
}
} else {
end = !more_rbsp_data();
}
if (end) break;
}
trigger_intr(SLICE_DATA_DONE);