FIFO overview

Introduction

Commands to most of the engines are sent through a special engine called PFIFO. PFIFO maintains multiple fully independent command queues, known as “channels” or “FIFO”s. Each channel is controlled through a “channel control area”, which is a region of MMIO [pre-GF100] or VRAM [GF100+]. PFIFO intercepts all accesses to that area and acts upon them.

PFIFO internally does time-sharing between the channels, but this is transparent to the user applications. The engines that PFIFO controls are also aware of channels, and maintain separate context for each channel.

The context-switching ability of PFIFO depends on card generation. Since NV40, PFIFO is able to switch between channels at essentially any moment. On older cards, due to lack of backing storage for the CACHE, a switch is only possible when the CACHE is empty. The PFIFO-controlled engines are, however, much worse at switching: they can only switch between commands. While this wasn’t a big problem on old cards, since the commands were guaranteed to execute in finite time, the introduction of programmable shaders with looping capabilities made it possible to effectively hang the whole GPU by launching a long-running shader.
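The PFIFO-side switching rule above can be condensed into a small predicate. This is a hypothetical sketch: the enum values and function name are invented for illustration, and it models only PFIFO itself, not the engines, which can still only switch between commands.

```c
#include <stdbool.h>

/* Hypothetical coarse generation buckets, for illustration only. */
enum gen { GEN_PRE_NV40, GEN_NV40_PLUS };

/*
 * Can PFIFO itself switch away from the current channel right now?
 * NV40+: essentially at any moment.
 * Older cards: only once the CACHE has drained, since it has no backing storage.
 * Engines are not modeled: they can only switch between commands.
 */
static bool pfifo_can_switch(enum gen g, bool cache_empty)
{
    if (g == GEN_NV40_PLUS)
        return true;
    return cache_empty;
}
```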

Todo

check if it still holds on GF100

On NV1:NV4, the only engine that PFIFO controls is PGRAPH, the main 2d/3d engine of the card. In addition, PFIFO can submit commands to the SOFTWARE pseudo-engine, which will trigger an interrupt for every submitted method.

The engines that PFIFO controls on NV4:GF100 are:

Id  Present on            Name      Description
0   all                   SOFTWARE  Not really an engine: causes an interrupt for each command; can be used to execute driver functions in sync with other commands.
1   all                   PGRAPH    Main engine of the card: 2d, 3d, compute.
2   NV31:G98, G200:MCP77  PMPEG     The PFIFO interface to the VPE MPEG2 decoding engine.
3   NV40:G84              PME       VPE motion estimation engine.
4   NV41:G84              PVP1      VPE microcoded vector processor.
4   VP2                   PVP2      xtensa-microcoded vector processor.
5   VP2                   PCIPHER   AES cryptography and copy engine.
6   VP2                   PBSP      xtensa-microcoded bitstream processor.
2   VP3-                  PPPP      falcon-based video post-processor.
4   VP3-                  PPDEC     falcon-based microcoded video decoder.
5   VP3                   PSEC      falcon-based AES crypto engine. On VP4, merged into PVLD.
6   VP3-                  PVLD      falcon-based variable length decoder.
3   GT215-                PCOPY     falcon-based memory copy engine.
5   MCP89:GF100           PVCOMP    falcon-based video compositing engine.

The engines that PFIFO controls on GF100- are listed below. The first five columns give the engine id on GF100, GK104, GK208, GK20A and GM107 respectively:

GF100  GK104  GK208  GK20A  GM107  Present on   Name      Description
1f     1f     1f     1f     1f     all          SOFTWARE  Not really an engine: causes an interrupt for each command; can be used to execute driver functions in sync with other commands.
0      0      0      0      0      all          PGRAPH    Main engine of the card: 2d, 3d, compute.
1      1      1      ?      -      GF100:GM107  PPDEC     falcon-based microcoded picture decoder.
2      2      2      ?      -      GF100:GM107  PPPP      falcon-based video post-processor.
3      3      3      ?      -      GF100:GM107  PVLD      falcon-based variable length decoder.
4,5    -      -      -      -      GF100:GK104  PCOPY     falcon-based memory copy engines.
-      6      5      ?      2      GK104:       PVENC     falcon-based H.264 encoding engine.
-      4,5,7  4,-,6  ?      4,-,5  GK104:       PCOPY     Memory copy engines.
-      -      -      ?      1      GM107:       PVDEC     falcon-based unified video decoding engine.
-      -      -      ?      3      GM107:       PSEC      falcon-based AES crypto engine, recycled.

This file deals only with the user-visible side of the PFIFO. For kernel-side programming, see NV1:NV4 PFIFO engine, NV4:G80 PFIFO engine, Tesla PFIFO engine, or GF100+ PFIFO engine.

Note

GF100 information may still be incomplete or not entirely accurate.

Overall operation

The PFIFO can be split into roughly 4 pieces:

  • PFIFO pusher: collects the user’s commands and injects them into the CACHE.
  • PFIFO CACHE: a big queue of commands waiting for execution by the puller.
  • PFIFO puller: executes the commands, passes them to the proper engine, or to the driver.
  • PFIFO switcher: ticks out the time slices for the channels and saves / restores the state of the channels between PFIFO registers and RAMFC memory.
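The pusher/CACHE/puller split is essentially a producer/consumer queue. The following is a minimal software model under stated assumptions: `CACHE_SIZE`, the struct layouts and the function names are hypothetical, and the real CACHE depth and entry format differ between generations. The switcher and RAMFC are not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

#define CACHE_SIZE 8  /* hypothetical depth; the real CACHE size varies by card */

/* One queued command (see the CACHE tuple description below). */
struct cmd { uint8_t subc; uint16_t mthd; uint32_t param; };

struct cache {
    struct cmd slot[CACHE_SIZE];
    unsigned get;   /* next entry the puller will execute */
    unsigned put;   /* next free entry the pusher will fill */
};

/* Pusher side: accept a command if there is room, otherwise stall. */
static bool cache_push(struct cache *c, struct cmd cmd)
{
    if (c->put - c->get == CACHE_SIZE)
        return false;               /* full: pusher must wait */
    c->slot[c->put % CACHE_SIZE] = cmd;
    c->put++;
    return true;
}

/* Puller side: take the oldest command, if any, for dispatch. */
static bool cache_pull(struct cache *c, struct cmd *out)
{
    if (c->get == c->put)
        return false;               /* empty: nothing to execute */
    *out = c->slot[c->get % CACHE_SIZE];
    c->get++;
    return true;
}
```

Commands leave the queue in the order they were accepted, which is the property the rest of the pipeline relies on.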

A channel consists of the following:

  • channel mode: PIO [NV1:GF100], DMA [NV4:GF100], or IB [G80-]
  • PFIFO DMA pusher state [DMA and IB channels only]
  • PFIFO CACHE state: the commands already accepted but not yet executed
  • PFIFO puller state
  • RAMFC: area of VRAM storing the above when channel is not currently active on PFIFO [not user-visible]
  • RAMHT [pre-GF100 only]: a table of “objects” that the channel can use. The objects are identified by arbitrary 32-bit handles, and can be DMA objects [see NV3 DMA objects, NV4:G80 DMA objects, DMA objects] or engine objects [see Puller - handling of submitted commands by FIFO and engine documentation]. On pre-G80 cards, individual objects can be shared between channels.
  • vspace [G80+ only]: A hierarchy of page tables that describes the virtual memory space visible to engines while executing commands for the channel. Multiple channels can share a vspace. [see Tesla virtual memory, GF100 virtual memory]
  • engine-specific state

Channel mode determines the way of submitting commands to the channel. PIO mode is available on pre-GF100 cards and involves poking the methods directly into the channel control area. It’s slow and fragile: everything breaks down easily when more than one channel is used simultaneously. Not recommended. See PIO submission to FIFOs for details. On NV1:NV40, all channels support PIO mode. On NV40:G80, only the first 32 channels support PIO mode. On G80:GF100, only channel 0 supports PIO mode.
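The per-generation PIO rules above can be expressed as a lookup. This is an illustrative sketch: the `fifo_gen` buckets and the function name are hypothetical, and the NV40:G80 limit is taken from the text while it remains unverified (see the todo below).

```c
#include <stdbool.h>

/* Hypothetical generation buckets matching the ranges in the text. */
enum fifo_gen { FIFO_NV1_NV40, FIFO_NV40_G80, FIFO_G80_GF100, FIFO_GF100_PLUS };

/* Does channel `chid` accept PIO submission on this generation? */
static bool channel_supports_pio(enum fifo_gen gen, unsigned chid)
{
    switch (gen) {
    case FIFO_NV1_NV40:   return true;        /* every channel */
    case FIFO_NV40_G80:   return chid < 32;   /* first 32 channels (unverified) */
    case FIFO_G80_GF100:  return chid == 0;   /* channel 0 only */
    case FIFO_GF100_PLUS: return false;       /* no PIO mode at all */
    }
    return false;
}
```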

Todo

check PIO channels support on NV40:G80

NV1 PFIFO doesn’t support any DMA mode.

NV3 PFIFO introduced a hacky DMA mode that requires kernel assistance for every submitted batch of commands and prevents channel switching while commands are being submitted. See DMA submission for details.

NV4 PFIFO greatly enhanced the DMA mode and made it controllable directly through the channel control area. Thus, commands can now be submitted by multiple applications simultaneously, without coordination with each other and without the kernel’s help. DMA mode is described in DMA submission to FIFOs on NV4.

G80 introduced IB mode. IB mode is a modified version of DMA mode that, instead of following a single stream of commands from memory, has the ability to stitch together parts of multiple memory areas into a single command stream, allowing constructs that submit commands with parameters pulled directly from memory written by earlier commands. IB mode is described along with DMA mode in DMA submission to FIFOs on NV4.
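The stitching idea can be illustrated with a toy model in which each IB entry names one segment of memory and the pusher consumes the segments back to back. This sketch deliberately ignores the real IB entry encoding: `struct ib_entry` as a CPU pointer plus word count and the `ib_stitch` function are hypothetical, and the actual format is described in DMA submission to FIFOs on NV4.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical IB entry: a segment start plus its length in 32-bit words.
 * Real hardware entries encode a GPU address and length instead. */
struct ib_entry { const uint32_t *start; size_t nwords; };

/* Concatenate the segments named by the IB entries, in order, into one
 * command stream. Stops early if `out` is full; returns words written. */
static size_t ib_stitch(const struct ib_entry *ib, size_t nentries,
                        uint32_t *out, size_t out_cap)
{
    size_t n = 0;
    for (size_t i = 0; i < nentries; i++)
        for (size_t j = 0; j < ib[i].nwords && n < out_cap; j++)
            out[n++] = ib[i].start[j];
    return n;
}
```

Because entries may point anywhere, an entry can reference a buffer that earlier commands filled in, which is what makes the "parameters pulled from memory" constructs possible.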

GF100 re-architected the whole PFIFO, made it possible to have up to 3 channels executing simultaneously, and introduced a new DMA packet format.

The commands, as stored in CACHE, are tuples of:

  • subchannel: 0-7
  • method: 0-0x1ffc [really 0-0x7ff] pre-GF100, 0-0x3ffc [really 0-0xfff] GF100+
  • parameter: 0-0xffffffff
  • submission mode [NV10+]: I or NI
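The tuple layout above can be captured as a struct with a range check. This is a sketch: the field and function names are hypothetical, and the method is stored in the conventional multiplied-by-4 form, so valid methods are multiples of 4 up to 0x1ffc (pre-GF100) or 0x3ffc (GF100+).

```c
#include <stdbool.h>
#include <stdint.h>

/* One CACHE entry as described above; field names are illustrative. */
struct fifo_cmd {
    uint8_t  subc;      /* subchannel, 0-7 */
    uint16_t mthd;      /* method, written multiplied by 4 */
    uint32_t param;     /* any 32-bit value */
    bool     nonincr;   /* NV10+: true for NI, false for I */
};

/* Check that a command fits the ranges listed above. */
static bool fifo_cmd_valid(const struct fifo_cmd *c, bool gf100_plus)
{
    uint16_t max_mthd = gf100_plus ? 0x3ffc : 0x1ffc;
    return c->subc <= 7 && (c->mthd & 3) == 0 && c->mthd <= max_mthd;
}
```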

Subchannel identifies the engine and object that the command will be sent to. The subchannels have no fixed assignments to engines/objects, and can be freely bound/rebound to them by using method 0. The “objects” are individual pieces of functionality of a PFIFO-controlled engine. A single engine can expose any number of object types, though most engines only expose one.
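Subchannel binding can be pictured as a small per-channel table indexed by subchannel number, which method 0 overwrites. In this hypothetical sketch the method 0 parameter is kept as an opaque object identifier; how the hardware resolves it to an engine object (e.g. through RAMHT on pre-GF100 cards) is not modeled, and the names are invented.

```c
#include <stdint.h>

#define NUM_SUBCHANNELS 8

/* Hypothetical per-channel table: which object each subchannel targets. */
struct subc_bindings { uint32_t obj[NUM_SUBCHANNELS]; };

/* Model of method 0: (re)bind subchannel `subc` to the object named by
 * `param`. The index is masked to the 3-bit subchannel range. */
static void bind_subchannel(struct subc_bindings *b, unsigned subc, uint32_t param)
{
    b->obj[subc & 7] = param;
}
```

Rebinding simply replaces the previous entry, so later commands on that subchannel go to the new object.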

The method selects an individual command of the object bound to the selected subchannel, except for methods 0-0xfc, which are special and are executed directly by the puller, ignoring the bound object. Note that, traditionally, methods are treated as 4-byte addressable locations, and hence their numbers are written down multiplied by 4: method 0x3f is thus written as 0xfc. This is a leftover from PIO channels. In the documentation, whenever a specific method number is mentioned, it’ll be written pre-multiplied by 4 unless specified otherwise.
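The numbering convention and the puller's special range can be shown with a few helpers. The function names are hypothetical; the arithmetic follows the text: the written form is the raw index shifted left by 2, and written methods 0-0xfc are handled by the puller itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Raw method index -> conventional written form (multiplied by 4). */
static uint32_t mthd_to_written(uint32_t index) { return index << 2; }

/* Conventional written form -> raw method index. */
static uint32_t written_to_mthd(uint32_t written) { return written >> 2; }

/* Written methods 0-0xfc are executed by the puller itself; anything
 * higher goes to the object bound to the subchannel. */
static bool mthd_is_puller_special(uint32_t written) { return written <= 0xfc; }
```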

The parameter is an arbitrary 32-bit value that accompanies the method.

The submission mode is I if the command was submitted through an increasing DMA packet, or NI if the command was submitted through a non-increasing packet. This information isn’t actually used for anything by the card, but it’s stored in the CACHE for a certain optimisation when submitting PGRAPH commands.
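To show where the I/NI flag comes from, here is a sketch of expanding one packet into CACHE entries: in an increasing packet each successive parameter goes to the next method (method, method+4, ...), while in a non-increasing packet all parameters go to the same method. Packet header decoding is not modeled, and `struct cache_entry` and `expand_packet` are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One CACHE entry; names are illustrative. */
struct cache_entry { uint8_t subc; uint16_t mthd; uint32_t param; bool nonincr; };

/* Expand a packet of `count` parameters starting at written method `mthd`.
 * I packets advance the method by 4 per parameter; NI packets do not. */
static size_t expand_packet(uint8_t subc, uint16_t mthd, bool nonincr,
                            const uint32_t *params, size_t count,
                            struct cache_entry *out)
{
    for (size_t i = 0; i < count; i++) {
        out[i].subc = subc;
        out[i].mthd = nonincr ? mthd : (uint16_t)(mthd + 4 * i);
        out[i].param = params[i];
        out[i].nonincr = nonincr;   /* recorded, but not acted on by the card */
    }
    return count;
}
```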

Method execution is described in detail in DMA puller and engine-specific documentation.

Pre-NV1A, PFIFO treats everything as little-endian. NV1A introduced big-endian mode, which affects pushbuffer/IB reads and semaphores. On NV1A:G80 cards, the endianness can be selected per channel via the big_endian flag. On G80+ cards, PFIFO endianness is a global switch.
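The effect of the endianness setting on pushbuffer/IB reads can be sketched as assembling a 32-bit word from four bytes in memory. The function name is hypothetical, and the `big_endian` argument stands for either the per-channel flag (NV1A:G80) or the global switch (G80+). The channel control area is not covered, since it follows the PMC MMIO switch instead.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assemble a 32-bit pushbuffer word from four consecutive bytes in memory,
 * honouring the applicable PFIFO endianness setting. */
static uint32_t read_word(const uint8_t *p, bool big_endian)
{
    if (big_endian)
        return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
               (uint32_t)p[2] << 8  | (uint32_t)p[3];
    return (uint32_t)p[3] << 24 | (uint32_t)p[2] << 16 |
           (uint32_t)p[1] << 8  | (uint32_t)p[0];
}
```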

Todo

look for GF100 PFIFO endian switch

The channel control area endianness is not affected by the big_endian flag or G80+ PFIFO endianness switch. Instead, it follows the PMC MMIO endianness switch.

Todo

is it still true for GF100, with VRAM-backed channel control area?