Overview of VP1 video processor
Introduction
VP1 is a vector processor with a PFIFO interface and a custom ISA designed for video decoding. It is made of the following units:
- the processor proper:
- 8kB of dedicated data RAM
- instruction cache
- memory interface
- DMA engine
- FIFO interface
There are 3 variants of VP1:
- the original NV41 variant [NV41:NV44]
- the NV44 variant, with improved context switching and method processing capabilities [NV44:G80]
- the G80 variant, with changes to match G80 memory model [G80:G84]
Interrupts
Todo
write me
Instruction fetching
Todo
write me
Instruction format
On VP1, all instructions are stored as 32-bit little-endian words. The top 8 bits of the instruction word determine the opcode, and the highest bits of the opcode determine its execution unit:
- 0x00-0x7f: scalar instructions
- 0x80-0xbf: vector instructions
- 0xc0-0xdf: address instructions
- 0xe0-0xff: branch instructions
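A minimal Python sketch of this decoding step is shown below. It encodes only what is stated above: the little-endian word order, the opcode in the top 8 bits, and the four opcode ranges. The function names (`opcode`, `unit_of`, `load_words`) are hypothetical and not taken from any existing tool.

```python
def opcode(insn: int) -> int:
    """The opcode is the top 8 bits of a 32-bit instruction word."""
    return (insn >> 24) & 0xFF


def unit_of(insn: int) -> str:
    """Map an instruction word to its execution unit by opcode range."""
    op = opcode(insn)
    if op < 0x80:
        return "scalar"   # 0x00-0x7f
    if op < 0xC0:
        return "vector"   # 0x80-0xbf
    if op < 0xE0:
        return "address"  # 0xc0-0xdf
    return "branch"       # 0xe0-0xff


def load_words(code: bytes) -> list[int]:
    """Split a byte buffer into 32-bit little-endian instruction words."""
    return [int.from_bytes(code[i:i + 4], "little") for i in range(0, len(code) - 3, 4)]


if __name__ == "__main__":
    # Hypothetical bytes chosen only to hit each opcode range once.
    code = bytes([0, 0, 0, 0x45, 0, 0, 0, 0x9A, 0, 0, 0, 0xC3, 0, 0, 0, 0xE8])
    for w in load_words(code):
        print(f"opcode {opcode(w):#04x}: {unit_of(w)}")
```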
The executed instruction stream is divided into so-called instruction bundles, which are groups of up to 4 instructions targeting distinct execution units. The instructions of a bundle are executed in parallel: they don’t see each other’s changes to register state. In some cases, however, the scalar instruction of a bundle computes data used by the vector instruction of the same bundle (the so-called s2v path).
The instructions are grouped into bundles as follows:
- bundles cannot cross aligned 4-word (16-byte) boundaries
- a bundle can contain an arbitrary nonempty subset of (address, scalar, vector, branch) instructions, in that order
- bundles are as long as possible, subject to the previous restrictions
In other words, an instruction word starts a new bundle iff:
- the instruction starts on an aligned 4-word boundary, or
- the current bundle already contains an instruction of this kind, or a higher kind (where branch > vector > scalar > address)
For example, the following splits happen:
- |A|A|A|A|A|A|A|A|: if only instructions of a single kind are fetched, each executes as its own bundle
- |A S V B|A S V B|: the perfect case, 4-instruction bundles
- |A V|S B|S|A V B|: if instructions are in the wrong order, they won’t make a bundle
- |A|A|A S|V B|B|B|: the 4 middle instructions can’t be in a single bundle, because there is a 4-word boundary in the middle
- |B|V|S|A|B|V|S|A|: worst case, all instructions in the wrong order
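The grouping rules can be checked against these examples with a small Python sketch. It assumes the stream starts on an aligned 16-byte boundary and represents each instruction only by its kind (A, S, V, B); `split_bundles` and `render` are hypothetical helper names. Because the instructions of a bundle must appear in address, scalar, vector, branch order, comparing against the last instruction of the current bundle is enough to detect "this kind or a higher kind".

```python
# Rank of each instruction kind; a bundle lists its instructions in this order.
RANK = {"A": 0, "S": 1, "V": 2, "B": 3}


def split_bundles(kinds: list[str]) -> list[list[str]]:
    """Group instruction kinds into bundles (word index 0 is 16-byte aligned)."""
    bundles: list[list[str]] = []
    for i, kind in enumerate(kinds):
        # A word starts a new bundle iff it is on an aligned 4-word boundary,
        # or the current bundle already holds this kind or a higher one.
        # Bundle members are in increasing rank, so checking the last one suffices.
        if i % 4 == 0 or RANK[bundles[-1][-1]] >= RANK[kind]:
            bundles.append([kind])
        else:
            bundles[-1].append(kind)
    return bundles


def render(kinds: str) -> str:
    """Render e.g. "AVSBSAVB" in the |A V|S B|S|A V B| notation used above."""
    return "|" + "|".join(" ".join(b) for b in split_bundles(list(kinds))) + "|"


if __name__ == "__main__":
    for stream in ("AAAAAAAA", "ASVBASVB", "AVSBSAVB", "AAASVBBB", "BVSABVSA"):
        print(render(stream))
```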
Registers
VP1 has 15 distinct register files:
- $r0-$r30 (with $r31 hardwired to 0): 32-bit scalar registers, sometimes treated as groups of 4 bytes for SIMD instructions
- $v0-$v31: 128-bit vector registers, treated as groups of 16 bytes for SIMD instructions
- $a0-$a31: 32-bit address registers; they have funny bitfields used for memory addressing, looping, and mode selection
- $c0-$c3: 16-bit condition code registers, split into individual bits belonging to one of the four execution units, used for branching and conditionally selecting inputs
- $vc0-$vc3: 32-bit vector condition code registers, like $c, but with different fields, and each bit is duplicated 16 times (one for each vector component)
- $va: 448-bit vector accumulator, split into 16 components, each with 12 integer and 16 fractional bits
- $vx: 128-bit vector extra register, split into 16 components like normal $v
- $l0-$l3: 16-bit loop registers, split into 8-bit loop counter and 8-bit loop total count
- $m0-$m63: 32-bit method registers
- $x0-$x15: 32-bit extra registers (G80 only)
- $d0-$d7: 17-bit DMA object registers (G80 only)
- $f0-$f1: FIFO special registers
- $sr0-$sr31: misc special registers
- $mi0-$mi31: memory interface special registers
- $uc0-$uc31: processor control special registers
Todo
incomplete for <G80
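The 448-bit width of $va follows from 16 components of 12 + 16 = 28 bits each. The Python sketch below converts one raw 28-bit component to a real number. It assumes, without confirmation from this document, that a component is a two's complement signed value; all names are hypothetical.

```python
INT_BITS = 12
FRAC_BITS = 16
COMP_BITS = INT_BITS + FRAC_BITS  # 28 bits per component
COMPONENTS = 16
VA_BITS = COMP_BITS * COMPONENTS  # 448, the width of $va


def va_component_to_float(raw: int) -> float:
    """Interpret a raw $va component as 12.16 fixed point.

    ASSUMPTION: two's complement signed; the register description only
    gives the integer and fractional bit counts.
    """
    raw &= (1 << COMP_BITS) - 1          # keep the low 28 bits
    if raw & (1 << (COMP_BITS - 1)):     # sign bit set (assumed signed)
        raw -= 1 << COMP_BITS
    return raw / (1 << FRAC_BITS)        # scale by 2**-16
```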
Condition code registers
There are 4 condition code registers, $c0-$c3. Each of them has the following bitfields:
- bits 0-7: scalar flags
- bits 8-10: address flags
- bits 11-12: unused, always 0
- bit 13: branch flag
- bit 14: always 0
- bit 15: always 1
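A small Python sketch that unpacks a $c value into these fields and checks the fixed bits is shown below. `CondCode`, `decode_c`, `encode_c` and `is_consistent` are hypothetical names; the meaning of the individual scalar and address flag bits is not modeled.

```python
from dataclasses import dataclass


@dataclass
class CondCode:
    scalar: int   # bits 0-7: scalar flags
    address: int  # bits 8-10: address flags
    branch: int   # bit 13: branch flag


def decode_c(value: int) -> CondCode:
    """Split a 16-bit $c register value into its per-unit fields."""
    return CondCode(
        scalar=value & 0xFF,
        address=(value >> 8) & 0x7,
        branch=(value >> 13) & 0x1,
    )


def encode_c(cc: CondCode) -> int:
    """Build a $c value: bits 11, 12 and 14 are 0, bit 15 is always 1."""
    return (0x8000
            | ((cc.branch & 0x1) << 13)
            | ((cc.address & 0x7) << 8)
            | (cc.scalar & 0xFF))


def is_consistent(value: int) -> bool:
    """Check the fixed bits of a $c value: 11, 12, 14 clear and 15 set."""
    return (value & 0x5800) == 0 and (value & 0x8000) != 0
```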
Special registers
Todo
write me
Extra registers
The G80 variant of VP1 introduced 16 extra registers, $x0-$x15, each of them 32 bits long. They have no special semantics, and the only way to access them is by using the mov to/from alternate register file scalar instruction.
Instruction conflicts
Sometimes, instructions within a single bundle interact in funny ways:
- one instruction of a bundle writes a register, another reads it: in this case, the old value is read (every instruction reads all its inputs as they were before the whole bundle started executing).
- more than one instruction of a bundle writes a single register:
  - for a multiply-written $r register, the first of the following wins:
    - scalar instruction, except mov from $c, $v, $a, $l (but including mov from $m and $x)
    - address instruction (load to $r)
    - scalar mov from $c, $v, $a, $l
  - for $v:
    - vector instruction
    - address instruction (load to $v)
    - scalar mov to $v
  - for $a:
    - scalar mov to $a
    - address instruction
  - for $l:
    - branch instruction
    - scalar mov to $l
  Relying on any of that is probably a very bad idea.
- two instructions of a bundle fight for a shared read port, and one of them wins:
  - if both the scalar (mov from $v) and address (store $v or ldr) instructions read from $v registers, the $v register read by the address instruction is forced to be the one used by the scalar instruction
  - if the scalar instruction is a mov to another register file, and the address instruction reads from a $r register (store $r), the $r register read by the scalar instruction is forced to be the one used by the address instruction
  Relying on that is also a very bad idea. Avoid issuing such bundles.
- if the scalar instruction is a mov from $l, and the branch instruction is exit, the $r register won’t be written. Needless to say, don’t do that.
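For an emulator, or for a tool that flags such bundles, the write-priority lists above can be captured in a lookup table. This Python sketch is an illustration under assumptions: the writer labels are hypothetical names for the listed cases, and neither the read-port conflicts nor the mov from $l / exit case is modeled.

```python
from typing import Optional

# Writers of one register, highest priority first, per register file.
# The labels are hypothetical names for the cases listed above.
WRITE_PRIORITY = {
    "r": ("scalar", "address_load", "scalar_mov_from_c_v_a_l"),
    "v": ("vector", "address_load", "scalar_mov_to_v"),
    "a": ("scalar_mov_to_a", "address"),
    "l": ("branch", "scalar_mov_to_l"),
}


def scalar_r_writer(mov_source: Optional[str]) -> str:
    """Classify a scalar instruction that writes $r.

    mov_source is None for an ordinary scalar instruction, or the register
    file a scalar mov reads from ("c", "v", "a", "l", "m", "x", ...).
    Only mov from $c, $v, $a and $l drops to the lowest priority; mov from
    $m and $x keeps the normal scalar priority.
    """
    if mov_source in ("c", "v", "a", "l"):
        return "scalar_mov_from_c_v_a_l"
    return "scalar"


def conflict_winner(regfile: str, writers: set[str]) -> Optional[str]:
    """Return the writer whose value ends up in a register that more than
    one instruction of the same bundle writes (None if nobody writes it)."""
    for writer in WRITE_PRIORITY[regfile]:
        if writer in writers:
            return writer
    return None
```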
Todo
mov from $sr, $uc, $mi, $f, $d