NV1 ROP - per-pixel operations¶
Contents
Introduction¶
Once the rasterizer determines what pixels are to be drawn, it is ROP’s task to actually perform the drawing - that is, read current pixel data from memory (if needed), perform per-pixel operations, then write the result to memory, or discard it.
The per-pixel operations are as follows:
- Determine the working color format. This is based on source color format, framebuffer format, and some configuration bits - it does not depend on the individual pixel.
- If double buffer mode is active, determine which buffer(s) the pixel should be written to, if any. This is selected by the current object.
- Get pixel coordinates from raster. Both X and Y are 12-bit unsigned integers. If BLIT is performed, also get source X and Y.
- Compare the coordinates against the cliprects and the clipping rectangle from XY logic. This may result in discarding the pixel, or writing it to only one of two buffers.
- Get the source color:
- for solids, just take it from (the low bits of)
SRC_COLOR
. - for IFC, IFM, and textured quads, take it from
SRC_COLOR
, selecting the right pixel for < 32bpp input. - for BITMAP, take the right bit of
SRC_COLOR
, then perform bitmap expansion. - for BLIT, read the source pixel from the framebuffer. If the source pixel is outside the clipping rectangle from XY logic, or rejected by the cliprects, discard the current pixel.
- for solids, just take it from (the low bits of)
- If alpha is enabled, extract the source alpha component according to the
source color format. Otherwise, source alpha is assumed to be
0xff
. - If the source alpha component is 0, discard the pixel.
- Convert the source color to working color format, if necessary.
- If the operation selected by the current object requires that, read the current value of the destination pixel, and (if needed) upconvert it to the working color format.
- If the operation selected by the current object requires it, compute the pattern color at the destination coordinates, and (if needed) downconvert it to the working color format. If the pattern alpha component is 0, discard the current pixel.
- If the operation selected by the current object is
BLEND_*
, calculate the blend factor, then perform the blending. - If the operation is not
BLEND_*
:- If the operation is not
SRCCOPY
: perform the bitwise operation. - If color key is enabled on current object: downconvert the color key to the working color format (if necessary), compare against the color computed so far, discard the pixel if they are equal.
- If plane masking is enabled on current object: downconvert the plane mask to the working color format (if necessary), merge the color computed so far with the current destination color using the plane mask.
- If the operation is not
- If necessary, downconvert the color from the working format to framebuffer format, possibly with dithering.
- Write the final color to the framebuffer(s).
Todo
figure out selecting the right part of SRC_COLOR for IFC/IFM/BITMAP
Todo
BLIT and source pixel discards
Todo
pseudocode, please
The framebuffer(s)¶
On NV1, handling framebuffer addressing is PFB’s job - see The PGRAPH framebuffer. PFB exposes 1 or 2 buffers to PGRAPH and handles converting the X, Y coords to VRAM addresses. Both X and Y coordinates are 12-bit unsigned integers once they reach ROP stage.
The pixel size is selected by PFB and exposed to PGRAPH. It can be:
- 8 bpp: each pixel is a single byte, in Y8 format (single component, color index).
- 16 bpp: each pixel is a 16-bit little-endian word. Depending on
configuration, it can be in one of two formats:
- indexed (D1X7Y8):
- bits 0-7: color index
- bits 8-14: unused, written as 0
- bit 15: CLUT bypass - whenever a pixel is written, this will be set
to the current value of
CANVAS_CONFIG.CLUT_BYPASS
. In turn, PDAC will use it to select pixel mode.
- direct (D1R5G5B5):
- bits 0-4: blue component
- bits 5-9: green component
- bits 10-14: red component
- bit 15: CLUT bypass (see above)
- indexed (D1X7Y8):
- 32 bpp: each pixel is a 32-bit little-endian word. Depending on
configuration, it can be in one of two formats:
- indexed (D1X23Y8):
- bits 0-7: color index
- bits 8-30: unused, written as 0
- bit 31: CLUT bypass - whenever a pixel is written, this will be set
to the current value of
CANVAS_CONFIG.CLUT_BYPASS
. In turn, PDAC will use it to select pixel mode.
- direct (D1X1R10G10B10):
- bits 0-9: blue component
- bits 10-19: green component
- bits 20-29: red component
- bit 30: unused, written as 0
- bit 31: CLUT bypass (see above)
- indexed (D1X23Y8):
Indexed vs direct color is chosen as follows:
def is_indexed():
if CTX_SWITCH.OP in BLEND_*:
# Always direct if blending is involved.
return False
if PFB.CONFIG.BPP <= 1:
# If framebuffer is 8bpp, always indexed.
return True
if ACCESS.CLASS == BLIT:
# If doing blit, treat pixels as direct color.
return False
if CTX_SWITCH.COLOR_FORMAT_DST.COLOR_FORMAT != A8Y8:
# Also, treat as direct color if source color format is anything
# other than Y8.
return False
if CANVAS_CONFIG.Y8_EXPAND:
# If Y8 expansion is performed, treat as direct color.
return False
# Otherwise (not a blit, Y8 source format, and no Y8 expansion),
# treat as indexed.
return True
Todo
weird shit happens if blending is enabled and framebuffer is 8bpp.
If single buffer mode is selected on PFB, rendered pixels will always be
written to buffer 0, unless they are discarded to some reason. If double
buffer mode is selected, the COLOR_FORMAT_DST
field of the current
object determines which buffer(s) are written to -
BUF0_*
will write to buffer 0, BUF1_*
to buffer 1, BUF01_*
will write to both buffers, and BUF_NONE_*
will discard all pixels.
Note
If both buffers are enabled, each pixel will be written independently to both of them - if the selected operation involves the current value of destination pixel (for blending or bitwise operation), they may get written with different final colors.
Canvas configuration¶
There is a register that controls assorted aspects of per-pixel operations:
-
reg32
nv1-pgraph-canvas-config
¶ -
nv1-pgraph
0x634: CANVAS_CONFIG
- bit 0: CLUT_BYPASS - for 16bpp and 32bpp framebuffer formats, the value of this bit will be copied to the highest bit of the written pixels, ie. the “CLUT bypass” bit.
- bit 4: BUF1_IGNORE_CLIPRECT - if set, cliprects will only affect buffer 0 in dual-buffer configuration - they will be ignored when writing to buffer 1. If not set, cliprects will apply to both buffers.
- bit 12: Y8_EXPAND - controls color format in use when source format is Y8 and framebuffer is 16bpp or 32bpp. If set, Y8 will be expanded to R5G5B5 or R10G10B10, by broadcasting the single value into all 3 color components. Otherwise, it will remain as Y8, and written thus to the framebuffer.
- bit 16: DITHER - controls color downconversion to R5G5B5 format when writing to the framebuffer. If set, colors will be dithered. Otherwise, a simple truncation will be used.
- bit 20: REPLICATE - controls color upconversion from source format to R10G10B10. If set, R5G5B5 source components will be multiplied by 0x21 to get R10G10B10 components (effectively duplicating the 5-bit values to get 10-bit values), and R8G8B8/Y8 source components will be multiplied by 0x101 and shifted right by 6 bits (effectively duplicating the high 2 bits as extra 2 low bits). If not set, components will be converted by a simple shift left.
- bit 24: SOFTWARE - if set, the desired framebuffer configuration is considered too complex for NV1’s little mind, and all drawing operations will trigger CANVAS_SOFTWARE interrupts instead of performing their usual function.
This register cannot be changed by any class method, and must be modified manually by software, if so desired.
Cliprects¶
NV1 supports, as part of per-pixel operations, discarding pixels based on their relation with up to two clipping rectangles. This is distinct from the clipping to canvas and user clip rectangle done by the XY logic, and also less efficient (since the pixels will be produced by the rasterizer and then discarded).
Cliprect state cannot be changed by any class method, and must be modified manually by software, if so desired. The registers involved are:
-
reg32
nv1-pgraph-cliprect-config
¶ -
nv1-pgraph
0x6a0: CLIPRECT_CONFIG
- bits 0-1: COUNT - selects how many cliprects are enabled. Valid values are 0-2. If this is 0, cliprects are disabled, and will pass all pixels.
- bit 4: MODE - selects which pixels will be rendered, if COUNT is not 0:
- 0: INCLUDED - pixels that are covered by at least one of the cliprects will be rendered, pixels not covered will be discarded.
- 1: OCCLUDED - pixels that are not covered by any cliprect will be rendered, pixels covered by at least one cliprect will be discarded.
- bit 8: SOFTWARE - if set, the desired cliprects are too complex for NV1’s little mind, and all drawing operations will trigger CLIP_SOFTWARE interrupts instead of performing their usual function.
-
reg32
nv1-pgraph-cliprect-min
¶ -
nv1-pgraph
0x690+i*0x8: CLIPRECT_MIN[i] (i<2)
- bits 0-11: X - the X coordinate of the left edge of the cliprect
- bits 16-27: Y - the Y coordinate of the top edge of the cliprect
-
reg32
nv1-pgraph-cliprect-max
¶ -
nv1-pgraph
0x694+i*0x8: CLIPRECT_MAX[i] (i<2)
Since rectangles on NV1 are represented in right-exclusive fashion, these coordinates are actually 1 pixel to the right and 1 pixel down from the actual corner of the clipping rectangle.
- bits 0-11: X - the X coordinate of the right edge of the cliprect plus 1
- bits 16-27: Y - the Y coordinate of the bottom edge of the cliprect plus 1
If dual-buffer configuration is enabled in PFB, a bit in CANVAS_CONFIG
selects whether cliprects apply to both buffers,
or just to buffer 0.
The exact operation performed is:
def cliprect_covered(i, x, y) -> bool:
if x < CLIPRECT_MIN[i].X:
return False
if y < CLIPRECT_MIN[i].Y:
return False
if x >= CLIPRECT_MAX[i].X:
return False
if y >= CLIPRECT_MAX[i].Y:
return False
return True
def cliprect_pass(buf, x, y) -> bool:
if buf == 1 and CANVAS_CONFIG.BUF1_IGNORE_CLIPRECT:
return True
if CLIPRECT_CONFIG.COUNT == 0:
return True
covered = cliprect_covered(0, x, y)
# COUNT == 3 is treated as if it was 2.
if CLIPRECT_CONFIG.COUNT >= 2:
covered |= cliprect_covered(1, x, y)
if CLIPRECT_CONFIG.MODE == INCLUDED:
return covered
else
return not covered
# Note: CLIPRECT_CONFIG.SOFTWARE is checked by XY logic
# before rasterization even starts.
Color formats¶
Working format selection¶
Working format can be Y8
, R5G5B5
, or R10G10B10
. It is selected
as follows:
def working_format():
if is_indexed():
# If framebuffer is indexed, always work on Y8.
return Y8
if PFB.CONFIG.BPP == 2 and CTX_SWITCH.COLOR_FORMAT_DST.COLOR_FORMAT == A1R5G5B5:
# Both formats are R5G5B5, so let's use that.
return R5G5B5
if PFB.CONFIG.BPP == 2 and CTX_SWITCH.OP in BLEND_* and not CANVAS_CONFIG.DITHER:
# Special case: if blending to R5G5B5 destination with no
# dithering, convert inputs to R5G5B5.
return R5G5B5
# All other cases use R10G10B10.
return R10G10B10
Source color format conversion and extracting alpha¶
Color format conversion is performed on the incoming source pixel data (if needed), on the current destination pixel data (if needed), and on colors submitted as parameters to some ROP state-setting methods. If such conversion is needed at all, it’s always done to (A8)R10G10B10 or (A8)R5G5B5 format.
Color upconversion is affected by the CANVAS_CONFIG.REPLICATE
bit: if it’s set, color components are multiplied
by the correct factors to cover the 0-0x3ff
range uniformly. Otherwise,
they are simply shifted left.
For Y8
and Y16
formats, the singular component is simply broadcast
to all three components, resulting in grayscale. Since the destination
format has only 10 bits per component, the low 6 bits of Y16
are simply
discarded.
If the working format is R5G5B5, the conversion is performed as for R10G10B10, then the low 5 bits of each component are discarded.
The exact operation is:
def convert_src(val, mode):
if CTX_SWITCH.COLOR_FORMAT_DST.COLOR_FORMAT == A1R5G5B5:
b = val & 0x1f
g = val >> 5 & 0x1f
r = val >> 10 & 0x1f
a = val >> 15 & 1
if CANVAS_CONFIG.REPLICATE:
# R, G, B are 5 bits - duplicate to get 10 bits.
b *= 0x21
g *= 0x21
r *= 0x21
else:
b <<= 5
g <<= 5
r <<= 5
# A is always either 0 or 0xff.
a *= 0xff
elif CTX_SWITCH.COLOR_FORMAT_DST.COLOR_FORMAT == A8R8G8B8:
b = val & 0xff
g = val >> 8 & 0xff
r = val >> 16 & 0xff
a = val >> 24 & 0xff
if CANVAS_CONFIG.REPLICATE:
# R, G, B are 8-bit: duplicate to get 16 bits, then truncate
# to 10.
b = (b * 0x101) >> 6
g = (g * 0x101) >> 6
r = (r * 0x101) >> 6
else:
b <<= 2
g <<= 2
r <<= 2
# A is already 8-bit.
elif CTX_SWITCH.COLOR_FORMAT_DST.COLOR_FORMAT == A2R10G10B10:
b = val & 0x3ff
g = val >> 10 & 0x3ff
r = val >> 20 & 0x3ff
a = val >> 30 & 3
# R, G, B are already 10-bit: nothing to do.
# A is 2-bit - repeat 4 times to get 8 bits (this is not affected
# by REPLICATE!).
a *= 0x55
elif CTX_SWITCH.COLOR_FORMAT_DST.COLOR_FORMAT == A8Y8:
y = val & 0xff
a = val >> 8 & 0xff
if CANVAS_CONFIG.REPLICATE:
# Y is 8-bit: duplicate to get 16 bits, then truncate to 10.
y = (y * 0x101) >> 6
else:
y <<= 2
# Broadcast it.
r = g = b = y
# A is already 8-bit.
elif CTX_SWITCH.COLOR_FORMAT_DST.COLOR_FORMAT == A16Y16:
y = val & 0xffff
a = val >> 16 & 0xffff
# Truncate 16 to 10 and broadcast.
r = g = b = y >> 6
# Truncate 16 to 8.
a >>= 8
if not CTX_SWITCH.ALPHA:
# Whatever we determined for alpha, it's invalid if not enabled.
a = 0xff
if mode == Y8:
return val & 0xff, a
elif mode == R5G5B5:
r >>= 5
g >>= 5
b >>= 5
return r << 10 | g << 5 | b, a
elif mode == R10G10B10:
return r << 20 | g << 10 | b, a
def convert_fb(val):
if mode == Y8:
return val & 0xff
elif mode == R5G5B5:
return val & 0x7fff
elif mode == R10G10B10:
if PFB.CONFIG.BPP == 2:
b = val & 0x1f
g = val >> 5 & 0x1f
r = val >> 10 & 0x1f
if CANVAS_CONFIG.REPLICATE:
# R, G, B are 5 bits - duplicate to get 10 bits.
b *= 0x21
g *= 0x21
r *= 0x21
else:
b <<= 5
g <<= 5
r <<= 5
return r << 20 | g << 10 | b
else:
return val & 0x3fffffff
State color downconversion¶
Since the colors stored as part of ROP state are always stored in R10G10B10
format, they need to be downconverted to the working format when needed.
This downconversion is done by simple truncation - it is assumed that they
were originally submitted in the working format, but were upconverted for
storage:
def state_downconvert(val):
if mode == Y8:
return val >> 2 & 0xff
elif mode == R5G5G5:
r = val >> 25 & 0x1f
g = val >> 15 & 0x1f
b = val >> 5 & 0x1f
return r << 10 | g << 5 | b
elif mode == R10G10B10:
return val & 0x3fffffff
Final color downconversion and dithering¶
Todo
write me
Bitmap expansion¶
Todo
write me
Bitmap colors¶
-
reg32
nv1-pgraph-bitmap-color
¶ -
nv1-pgraph
0x61c+i*0x4: BITMAP_COLOR[i] (i<2)
Todo
write me
-
reg32
nv1-mthd-bitmap-color
¶ Todo
write me
Pattern¶
The pattern is an endlessly repeating 8×8, 64×1, or 1×64 2-color bitmap that is stored as part of ROP state. It can be used as an input to blending and bitwise operations, along with source and destination.
Pattern shape¶
-
reg32
nv1-pgraph-pattern-shape
¶ -
nv1-pgraph
0x618: PATTERN_SHAPE
The pattern shape, one of:
- 0: 8×8 pixels
- 1: 64×1 pixels
- 2: 1×64 pixels
Only bits 0-1 of this register are usable, rest are always 0. Setting this register to 3 results in a rather weird pattern.
-
reg32
nv1-mthd-pattern-shape
¶ -
nv1-upattern
0x308: PATTERN_SHAPE
Sets the pattern shape. Only values 0-2 are valid.
PATTERN_SHAPE = val & 3 if (val > 2) trigger_intr(ILLEGAL_DATA);
Pattern bitmap¶
-
reg32
nv1-pgraph-pattern-bitmap
¶ -
nv1-pgraph
0x610+i*0x4: PATTERN_BITMAP[i] (i<2)
These registers contain the pattern bitmap, with one bit per pixel. Register 0 contains bits 0-31, and register 1 contains bits 32-63. These registers always store the bitmap in LE format - it is converted at the time of submission. The bits of the bitmap are indexed X-first, with dimensions determined by PATTERN_SHAPE.
-
reg32
nv1-mthd-pattern-bitmap
¶ -
nv1-upattern
0x318+i*0x4: PATTERN_BITMAP[i] (i<2)
Sets the corresponding half of pattern bitmap. The bitmap is converted from the current object’s bitmap format to LE format.
- ::
# NOTE: on NV3, if a context switch was triggered # by this method, the format is taken from the old # method. This is a hardware bug.
- if CTX_SWITCH.BITMAP_FORMAT == LE:
- PATTERN_BITMAP[idx] = val
- else:
# BITMAP_FORMAT == CGA6 rval = 0 for bit in range(0x20):
- if val & 1 << bit:
- rval |= 1 << (bit ^ 7)
PATTERN_BITMAP[idx] = rval
Pattern colors¶
There are two pattern colors, corresponding to 0 and 1 bits in the pattern bitmap. They are stored as A8R10G10B10, with separate registers for RGB and alpha:
-
reg32
nv1-pgraph-pattern-bitmap-color
¶ -
nv1-pgraph
0x600+i*0x8: PATTERN_BITMAP_COLOR[i] (i<2)
These registers store the pattern colors’ RGB values.
- bits 0-9: B
- bits 10-19: G
- bits 20-29: R
-
reg32
nv1-pgraph-pattern-bitmap-alpha
¶ -
nv1-pgraph
0x604+i*0x8: PATTERN_BITMAP_ALPHA[i] (i<2)
These registers store the pattern colors’ alpha values.
- bits 0-7: A
-
reg32
nv1-mthd-pattern-bitmap-color
¶ -
nv1-upattern
0x310+i*0x4: PATTERN_BITMAP_COLOR[i] (i<2)
These methods set the corresponding pattern color. The value is given in the source format of the current object, and converted to A8R10G10B10 for storage:
rgb, a = convert_src(val, R10G10B10) PATTERN_COLOR[idx] = rgb PATTERN_ALPHA[idx] = a
Pattern operation¶
- def pattern_pixel(x, y, mode):
- if PATTERN_SHAPE == 0:
- bit = x & 7 | (y & 7) << 3
- elif PATTERN_SHAPE == 1:
- bit = x & 0x3f
- elif PATTERN_SHAPE == 2:
- bit = y & 0x3f
cidx = PATTERN_BITMAP[bit >> 5] >> (bit & 0x1f) & 1 return state_downconvert(PATTERN_COLOR[idx], mode), PATTERN_ALPHA[idx]
Color key¶
If enabled by the current object, the ROP will perform color key test on all pixels to be written to the framebuffer: if they match the current color key, they will be discarded.
Color key conflicts with blending - if both are selected, the color key will be effectively disabled.
The current color key is stored in A1R10G10B10
format in a PGRAPH register:
-
reg32
nv1-pgraph-chroma
¶ -
nv1-pgraph
0x62c: CHROMA
- bits 0-9: B - the blue component
- bits 10-19: G - the green component
- bits 20-29: R - the red component
- bit 30: A - the alpha component
Even though it’s stored as A1R10G10B10
, the color key will be converted
to the working color for the color key test.
The 1-bit alpha component can be used to effectively enable or disable the color key operation - if alpha is 0, the color key is considered to never match, passing all pixels.
The current color key can be set by the following method:
-
reg32
nv1-mthd-chroma
¶ -
nv1-uchroma
0x304: CHROMA
Sets the color key. The value is interpreted according to the current object’s color format, and upconverted to
A1R10G10B10
for storage. The alpha component is converted to 0 if the source alpha is 0, to 1 if it’s any other value:rgb, a = convert_src(val, R10G10B10) CHROMA.A = 1 if a != 0 else 0 CHROMA.R = rgb >> 20 & 0x3ff CHROMA.G = rgb >> 10 & 0x3ff CHROMA.B = rgb & 0x3ff
The color key test works as follows:
def chroma_pass(val):
if not CTX_SWITCH.CHROMA:
# Color key disabled - always pass.
return True
if not CHROMA.A:
# Color key alpha is 0 - always pass.
return True
chr = state_downconvert(CHROMA)
if chr == val:
# Color key matched - kill the pixel.
return False
# Otherwise, pass the pixel.
return True
Note
Color key test is performed in the working format, not in the source or destination format - if they are different, color key may fail to match if a different REPLICATE setting is in effect when pixel is rendered vs when color key was submitted, even though the submitted values themselves were actually the same.
Bitwise operations¶
Todo
write me
ROP selection¶
-
reg32
nv1-pgraph-rop
¶ -
nv1-pgraph
0x624: ROP
Todo
write me
Plane mask¶
If enabled by the current object, the ROP will perform plane masking on all pixels going to the framebuffer - bits that are 0 in the plane mask will be set to the current contents of the destination pixel, instead of whatever color was computed by previous per-pixel operations, while bits that are 1 in the plane mask will be unaffected.
Plane masking conflicts with blending - if both are selected, the plane mask will be effectively disabled.
The current plane mask is stored in A1R10G10B10
format in a PGRAPH
register:
-
reg32
nv1-pgraph-plane
¶ -
nv1-pgraph
0x628: PLANE
- bits 0-9: B - the blue component
- bits 10-19: G - the green component
- bits 20-29: R - the red component
- bit 30: A - the alpha component
Even though it’s stored as A1R10G10B10
, the plane mask will be converted
to the working color for the plane masking operation.
The 1-bit alpha component is used in a weird manner. If the alpha component
is 0, and DEBUG_A.PLANE_ALPHA_ENABLE
is set,
all incoming pixels will be discarded. Otherwise, it does nothing.
The current plane mask can be set by the following method:
-
reg32
nv1-mthd-plane
¶ -
nv1-uplane
0x304: PLANE
Sets the plane mask. The value is interpreted according to the current object’s color format, and upconverted to
A1R10G10B10
for storage. The alpha component is converted to 0 if the source alpha is 0, to 1 if it’s any other value:rgb, a = convert_src(val, R10G10B10) PLANE.A = 1 if a != 0 else 0 PLANE.R = rgb >> 20 & 0x3ff PLANE.G = rgb >> 10 & 0x3ff PLANE.B = rgb & 0x3ff
The plane masking operation works as follows:
def plane_mask(val, dst):
if not CTX_SWITCH.PLANE:
# Disabled - passthru.
return val
if not PLANE.A and DEBUG_A.PLANE_ALPHA_ENABLE:
raise PixelDiscarded
pma = state_downconvert(PLANE)
return (val & pma) | (dst & ~pma)
return y
Note
Plane masking is performed in the working format, not in the destination format - if they are different, and dithering is enabled, effects will be interesting.