FIFO overview ¶

Contents

FIFO overview
- Introduction
- Overall operation

Introduction ¶

Commands to most of the engines are sent through a special engine called PFIFO. PFIFO maintains multiple fully independent command queues, known as “channels” or “FIFO”s. Each channel is controlled through a “channel control area”, which is a region of MMIO [pre-GF100] or VRAM [GF100+]. PFIFO intercepts all accesses to that area and acts upon them.

PFIFO internally does time-sharing between the channels, but this is transparent to the user applications. The engines that PFIFO controls are also aware of channels, and maintain separate context for each channel.

The context-switching ability of PFIFO depends on card generation. Since NV40, PFIFO is able to switch between channels at essentially any moment. On older cards, due to lack of backing storage for the CACHE, a switch is only possible when the CACHE is empty. The PFIFO-controlled engines are, however, much worse at switching: they can only switch between commands. While this wasn’t a big problem on old cards, since the commands were guaranteed to execute in finite time, introduction of programmable shaders with looping capabilities made it possible to effectively hang the whole GPU by launching a long-running shader.

Todo

check if it still holds on GF100

On NV1:NV4, the only engine that PFIFO controls is PGRAPH, the main 2d/3d engine of the card. In addition, PFIFO can submit commands to the SOFTWARE pseudo-engine, which will trigger an interrupt for every submitted method.

The engines that PFIFO controls on NV4:GF100 are:

Id	Present on	Name	Description
0	all	SOFTWARE	Not really an engine, causes interrupt for each command, can be used to execute driver functions in sync with other commands.
1	all	PGRAPH	Main engine of the card: 2d, 3d, compute.
2	NV31:G98 G200:MCP77	PMPEG	The PFIFO interface to VPE MPEG2 decoding engine.
3	NV40:G84	PME	VPE motion estimation engine.
4	NV41:G84	PVP1	VPE microcoded vector processor.
4	VP2	PVP2	xtensa-microcoded vector processor.
5	VP2	PCIPHER	AES cryptography and copy engine.
6	VP2	PBSP	xtensa-microcoded bitstream processor.
2	VP3-	PPPP	falcon-based video post-processor.
4	VP3-	PPDEC	falcon-based microcoded video decoder.
5	VP3	PSEC	falcon-based AES crypto engine. On VP4, merged into PVLD.
6	VP3-	PVLD	falcon-based variable length decoder.
3	GT215-	PCOPY	falcon-based memory copy engine.
5	MCP89:GF100	PVCOMP	falcon-based video compositing engine.

The engines that PFIFO controls on GF100- are:

Id	Id	Id	Id	Id	Present on	Name	Description
GF100	GK104	GK208	GK20A	GM107
1f	1f	1f	1f	1f	all	SOFTWARE	Not really an engine, causes interrupt for each command, can be used to execute driver functions in sync with other commands.
0	0	0	0	0	all	PGRAPH	Main engine of the card: 2d, 3d, compute.
1	1	1	?	-	GF100:GM107	PPDEC	falcon-based microcoded picture decoder.
2	2	2	?	-	GF100:GM107	PPPP	falcon-based video post-processor.
3	3	3	?	-	GF100:GM107	PVLD	falcon-based variable length decoder.
4,5	-	-	-	-	GF100:GK104	PCOPY	falcon-based memory copy engines.
-	6	5	?	2	GK104:	PVENC	falcon-based H.264 encoding engine.
-	4,5.7	4,-.6	?	4,-.5	GK104:	PCOPY	Memory copy engines.
-	-	-	?	1	GM107:	PVDEC	falcon-based unified video decoding engine
-	-	-	?	3	GM107:	PSEC	falcon-based AES crypto engine, recycled

This file deals only with the user-visible side of the PFIFO. For kernel-side programming, see NV1:NV4 PFIFO engine, NV4:G80 PFIFO engine, Tesla PFIFO engine, or GF100+ PFIFO engine.

Note

GF100 information can still be very incomplete / not exactly true.

Overall operation ¶

The PFIFO can be split into roughly 4 pieces:

PFIFO pusher: collects user’s commands and injects them to
PFIFO CACHE: a big queue of commands waiting for execution by
PFIFO puller: executes the commands, passes them to the proper engine, or to the driver.
PFIFO switcher: ticks out the time slices for the channels and saves / restores the state of the channels between PFIFO registers and RAMFC memory.

A channel consists of the following:

channel mode: PIO [NV1:GF100], DMA [NV4:GF100], or IB [G80-]
PFIFO DMA pusher state [DMA and IB channels only]
PFIFO CACHE state: the commands already accepted but not yet executed
PFIFO puller state
RAMFC: area of VRAM storing the above when channel is not currently active on PFIFO [not user-visible]
RAMHT [pre-GF100 only]: a table of “objects” that the channel can use. The objects are identified by arbitrary 32-bit handles, and can be DMA objects [see NV3 DMA objects, NV4:G80 DMA objects, DMA objects] or engine objects [see Puller - handling of submitted commands by FIFO and engine documentation]. On pre-G80 cards, individual objects can be shared between channels.
vspace [G80+ only]: A hierarchy of page tables that describes the virtual memory space visible to engines while executing commands for the channel. Multiple channels can share a vspace. [see Tesla virtual memory, GF100 virtual memory]
engine-specific state

Channel mode determines the way of submitting commands to the channel. PIO mode is available on pre-GF100 cards, and involves poking the methods directly to the channel control area. It’s slow and fragile - everything breaks down easily when more than one channel is used simultanously. Not recommended. See PIO submission to FIFOs for details. On NV1:NV40, all channels support PIO mode. On NV40:G80, only first 32 channels support PIO mode. On G80:GF100 only channel 0 supports PIO mode.

Todo

check PIO channels support on NV40:G80

NV1 PFIFO doesn’t support any DMA mode.

NV3 PFIFO introduced a hacky DMA mode that requires kernel assistance for every submitted batch of commands and prevents channel switching while stuff is being submitted. See DMA submission for details.

NV4 PFIFO greatly enhanced the DMA mode and made it controllable directly through the channel control area. Thus, commands can now be submitted by multiple applications simultaneously, without coordination with each other and without kernel’s help. DMA mode is described in DMA submission to FIFOs on NV4.

G80 introduced IB mode. IB mode is a modified version of DMA mode that, instead of following a single stream of commands from memory, has the ability to stitch together parts of multiple memory areas into a single command stream - allowing constructs that submit commands with parameters pulled directly from memory written by earlier commands. IB mode is described along with DMA mode in DMA submission to FIFOs on NV4.

GF100 rearchitectured the whole PFIFO, made it possible to have up to 3 channels executing simultaneously, and introduced a new DMA packet format.

The commands, as stored in CACHE, are tuples of:

subchannel: 0-7
method: 0-0x1ffc [really 0-0x7ff] pre-GF100, 0-0x3ffc [really 0-0xfff] GF100+
parameter: 0-0xffffffff
submission mode [NV10+]: I or NI

Subchannel identifies the engine and object that the command will be sent to. The subchannels have no fixed assignments to engines/objects, and can be freely bound/rebound to them by using method 0. The “objects” are individual pieces of functionality of PFIFO-controlled engine. A single engine can expose any number of object types, though most engines only expose one.

The method selects an individual command of the object bound to the selected subchannel, except methods 0-0xfc which are special and are executed directly by the puller, ignoring the bound object. Note that, traditionally, methods are treated as 4-byte addressable locations, and hence their numbers are written down multiplied by 4: method 0x3f thus is written as 0xfc. This is a leftover from PIO channels. In the documentation, whenever a specific method number is mentioned, it’ll be written pre-multiplied by 4 unless specified otherwise.

The parameter is an arbitrary 32-bit value that accompanies the method.

The submission mode is I if the command was submitted through increasing DMA packet, or NI if the command was submitted through non-increasing packet. This information isn’t actually used for anything by the card, but it’s stored in the CACHE for certain optimisation when submitting PGRAPH commands.

Method execution is described in detail in DMA puller and engine-specific documentation.

Pre-NV1A, PFIFO treats everything as little-endian. NV1A introduced big-endian mode, which affects pushbuffer/IB reads and semaphores. On NV1A:G80 cards, the endianness can be selected per channel via the big_endian flag. On G80+ cards, PFIFO endianness is a global switch.

Todo

look for GF100 PFIFO endian switch

The channel control area endianness is not affected by the big_endian flag or G80+ PFIFO endianness switch. Instead, it follows the PMC MMIO endianness switch.

Todo

is it still true for GF100, with VRAM-backed channel control area?

FIFO overview ¶

Introduction ¶

Overall operation ¶

Table of Contents

Previous topic

Next topic

This Page

FIFO overview¶

Introduction¶

Overall operation¶

FIFO overview ¶

Introduction ¶

Overall operation ¶