.. _pcounter: ==================================== PCOUNTER: performance counter engine ==================================== .. contents:: .. todo:: crossrefs Introduction ============ PCOUNTER is the card unit that contains performance monitoring counters. It is present on NV10+ GPUs, with the exception of NV11, NV1A, NV17, NV18 for unknown reasons. .. todo:: why? any others excluded? NV25, NV2A, NV30, NV36 pending a check PCOUNTER is actually made of several identical hardware counter units, one for each so-called domain. Each PCOUNTER domain can potentially run on a different source clock, allowing one to monitor events in various clock domains. The PCOUNTER domains are mostly independent, but there's some limited communication and shared circuitry among them. There are two major revisions of PCOUNTER hardware, and some minor subrevisions: - NV10:GF100 major revision: - NV10:NV15 - first version, one domain, only single-event mode available - NV15:NV20 - added one period / all periods event counter mode switch - NV20:NV30 - added second domain for events associated with memory clock - NV30:NV40 - removed separate clrflag/setflag input selection, changed from 40-bit to 32-bit counters, added quad event mode, added logic op chaining through SETFLAG. - NV40:G84 - rearranged register space to make space for 8 domains, added 3 new special counter modes - G84:G92 - added record mode, swap input selection, and PERIODIC signals - G92:GT215 - added slightly more flexible logic op delayed source selection and a register to set high 8 bits of address for record mode - GT215:GF100 - added USER signals - GF100+ major revision: - GF100+ - split PCOUNTER into hub, per-gpc and per-partition domain sets, ??? .. todo:: figure out what else happened on GF100 .. note:: the information in this document is at the moment not fully verified for GF100+. .. todo:: make it so The inputs to PCOUNTER are various activity monitoring signals from all over the card. 
The PCOUNTER hardware selects a few of them, performs programmable logic operations on them, and aggregates them into a handful of actual counter inputs. Some of the inputs are special and control counting start/stop, while others are the events to be counted. PCOUNTER can be used in three modes: - single event mode - a single event is being counted, with fine-grained control of counting periods via pre-start/start/stop signals. Several counting periods per run may be configured, and a threshold counter may be used. The input signals used are: - PRE - a programmable number of pulses on this input must happen before START is recognised - START - a pulse on this input starts a counting period - EVENT - the pulses on this input are counted - STOP - a pulse on this input stops a counting period - quad event mode [NV30-] - 4 events are being counted, with a simple "swap counter sets" trigger to delimit counting periods. The inputs used are: - PRE, START, EVENT, STOP - the pulses on these inputs are counted [in 4 separate counters] - SWAP - a pulse on this input swaps counter sets, ie. copies the internal counters to the MMIO registers and resets internal counters to 0. - record mode [G84-] - 12 simple events are being counted, and the counters written to a "record buffer" in memory on every pulse of STOP input. The inputs used are: - PRE_SRC[0..3], START_SRC[0..3], EVENT_SRC[0..3] - 12 events to be counted - STOP - a pulse on this input writes current counter values to memory and clears the counters to 0 The PCOUNTER uses MMIO area 0x00a000:0x00b000 on NV10:NV40 and NV40:GF100. On GF100+, it uses 0x180000:0x1c0000. NV10:GF100 PCOUNTER is unaffected by all PMC.ENABLE bits and has no interrupt lines. GF100+ PCOUNTER is enabled by PMC.ENABLE bit 28. .. todo:: figure out interrupt business MMIO registers ============== The MMIO registers are similar among PCOUNTER revisions, but their placement is very different. NV10 ---- .. 
space:: 8 nv10-pcounter 0x1000 performance monitoring counters 0x400[dom:2/0x100] PRE_SRC pcounter-pre-src 0x404[dom:2/0x100] PRE_OP pcounter-pre-op 0x408[dom:2/0x100] START_SRC pcounter-start-src 0x40c[dom:2/0x100] START_OP pcounter-start-op 0x410[dom:2/0x100] EVENT_SRC pcounter-event-src 0x414[dom:2/0x100] EVENT_OP pcounter-event-op 0x418[dom:2/0x100] STOP_SRC pcounter-stop-src 0x41c[dom:2/0x100] STOP_OP pcounter-stop-op 0x420[dom:2/0x100] SETFLAG_SRC pcounter-setflag-src NV10:NV30 0x424[dom:2/0x100] SETFLAG_OP pcounter-setflag-op 0x428[dom:2/0x100] CLRFLAG_SRC pcounter-clrflag-src NV10:NV30 0x42c[dom:2/0x100] CLRFLAG_OP pcounter-clrflag-op 0x430[dom:2/0x100][hi:2/0x200][lo:4] SIG_STATUS pcounter-sig-status 0x600[dom:2/0x100] CTR_CYCLES pcounter-ctr-cycles 0x604[dom:2/0x100] CTR_CYCLES_HI pcounter-ctr-cycles-hi NV10:NV30 0x608[dom:2/0x100] CTR_CYCLES_ALT pcounter-ctr-cycles-alt 0x60c[dom:2/0x100] CTR_CYCLES_ALT_HI pcounter-ctr-cycles-alt-hi NV10:NV30 0x610[dom:2/0x100] CTR_EVENT pcounter-ctr-event 0x614[dom:2/0x100] CTR_EVENT_HI pcounter-ctr-event-hi NV10:NV30 0x618[dom:2/0x100] CTR_START pcounter-ctr-start 0x61c[dom:2/0x100] CTR_START_HI pcounter-ctr-start-hi NV10:NV30 0x620[dom:2/0x100] CTR_PRE pcounter-ctr-pre 0x624[dom:2/0x100] CTR_STOP pcounter-ctr-stop 0x628[dom:2/0x100] THRESHOLD pcounter-threshold 0x62c[dom:2/0x100] THRESHOLD_HI pcounter-threshold-hi NV10:NV30 0x738 QUAD_ACK_TRIGGER pcounter-quad-ack-trigger-nv30 NV30:NV40 0x73c CTRL pcounter-ctrl-nv10 .. todo:: wtf is CYCLES_ALT for? NV40 ---- .. 
space:: 8 nv40-pcounter 0x1000 performance monitoring counters 0x400[dom:8] PRE_SRC pcounter-pre-src 0x420[dom:8] PRE_OP pcounter-pre-op 0x440[dom:8] START_SRC pcounter-start-src 0x460[dom:8] START_OP pcounter-start-op 0x480[dom:8] EVENT_SRC pcounter-event-src 0x4a0[dom:8] EVENT_OP pcounter-event-op 0x4c0[dom:8] STOP_SRC pcounter-stop-src 0x4e0[dom:8] STOP_OP pcounter-stop-op 0x500[dom:8] SETFLAG_OP pcounter-setflag-op 0x520[dom:8] CLRFLAG_OP pcounter-clrflag-op 0x540[dom:8] SRC_STATUS pcounter-src-status 0x560[dom:8] SPEC_SRC pcounter-spec-src 0x580[dom:8] USER_TRIGGER pcounter-user-trigger-tesla GT215- 0x600[dom:8] CTR_CYCLES pcounter-ctr-cycles 0x640[dom:8] CTR_CYCLES_ALT pcounter-ctr-cycles-alt 0x680[dom:8] CTR_EVENT pcounter-ctr-event 0x6a0[dom:8] RECORD_ADDRESS_HIGH pcounter-record-address-high G92- 0x6c0[dom:8] CTR_START pcounter-ctr-start 0x6e0[dom:8] RECORD_STATUS pcounter-record-status G84- 0x700[dom:8] CTR_PRE pcounter-ctr-pre 0x720[dom:8] RECORD_LIMIT pcounter-record-limit G84- 0x740[dom:8] CTR_STOP pcounter-ctr-stop 0x760[dom:8] RECORD_START pcounter-record-start G84- 0x780[dom:8] THRESHOLD pcounter-threshold 0x7a0 RECORD_CHAN pcounter-record-chan G84- 0x7a4 RECORD_DMA pcounter-record-dma G84- 0x7a8 GCTRL pcounter-gctrl-g84 G84- 0x7c0[dom:8] CTRL pcounter-ctrl-nv40 0x7e0[dom:8] QUAD_ACK_TRIGGER pcounter-quad-ack-trigger-nv40 0x800[dom:8][i:8] SIG_STATUS pcounter-sig-status .. todo:: C51 has no PCOUNTER, but has a7f4/a7f8 registers .. todo:: MCP73 also has a7f4/a7f8 but also has normal PCOUNTER GF100 ----- .. space:: 8 gf100-pcounter 0x40000 performance monitoring counters .. todo:: write me .. 
space:: 8 gf100-pcounter-domain 0x200 a perf counter domain 0x000[i:16] SIG_STATUS pcounter-sig-status 0x040 PRE_SRC pcounter-pre-src 0x044 PRE_OP pcounter-pre-op 0x048 START_SRC pcounter-start-src 0x04c START_OP pcounter-start-op 0x050 EVENT_SRC pcounter-event-src 0x054 EVENT_OP pcounter-event-op 0x058 STOP_SRC pcounter-stop-src 0x05c STOP_OP pcounter-stop-op 0x060 SETFLAG_OP pcounter-setflag-op 0x064 CLRFLAG_OP pcounter-clrflag-op 0x068 SRC_STATUS pcounter-src-status 0x06c SWAP_SRC pcounter-swap-src 0x0a0 QUAD_ACK_TRIGGER pcounter-quad-ack-trigger-nv40 0x0ec USER_TRIGGER pcounter-user-trigger-fermi .. todo:: complete me .. _pcounter-signal: The PCOUNTER signals ==================== The raw inputs that PCOUNTER operates on are called "signals". A signal is a single 0/1 wire sampled on every clock. The signals come from many different areas of the card and represent various state information. Example signals may be: - is unit X busy? - counting 1s on this signal together with elapsed clock cycles will give activity percentage for given unit - did microcontroller X execute an instruction this cycle? - counting 1s will give the number of executed instructions The signals are grouped into so-called domains. A domain has a single base clock and its own counting circuitry - the counting process and counter registers are per-domain. Domains are further grouped into domain sets. Domains within a domain set can communicate to a limited extent. NV10:GF100 GPUs have a single domain set, while on GF100+ there's one domain set for each GPC, one for each partition, and one for all domains not associated with a GPC/partition. On NV10:NV20, there's only one domain. On NV20:NV40 there are 2 domains. On NV40+ there can be up to 8 domains per domain set. On all GPUs, there can be up to 256 signals per domain. The available signals and domains depend heavily on the GPU. The signals are packed tightly, so even a signal common to two GPUs may be at different positions between them. 
The lists of known domains and signals may be found in :ref:`pcounter-signal-nv10`, :ref:`pcounter-signal-nv40`, :ref:`pcounter-signal-g80`, :ref:`pcounter-signal-gf100`. .. _pcounter-signal-status: The STATUS registers -------------------- The STATUS registers may be used to peek at the current value of each signal. .. reg:: 32 pcounter-sig-status Signal status Reading register #i gives current value of signals i*32..i*32+31 as bits 0..31 of the read value. These registers are per-domain and read-only. Only indices corresponding to actually present domains and signals are valid. On NV10:NV40, this array is split into two parts - the full index is computed like this:: i == hi * 4 + lo .. _pcounter-signal-trailer: Trailer signals --------------- The so-called "trailer signals" are a special kind of signal. These signals are common to all domains in a domain set. The position of these signals is not exactly constant between the domains, but their position modulo 0x20 is [ie. they're at the same position inside a STATUS reg for all domains, but not necessarily in the same STATUS reg]. Therefore, the position of each trailer signal here is given as an offset from "trailer base". 
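The STATUS register layout described above can be sketched in a few lines. This is only an illustration: the helper names (``locate_signal``, ``split_index_nv10``, ``read_signal``) are made up for this example, and ``status_words`` stands for a list of values already read from one domain's STATUS registers by whatever means is available.

```python
def locate_signal(signal):
    """Map a per-domain signal number (0..255) to (STATUS register index, bit)."""
    if not 0 <= signal < 256:
        raise ValueError("a domain has at most 256 signals")
    # Register #i holds signals i*32 .. i*32+31 as bits 0..31.
    return signal // 32, signal % 32


def split_index_nv10(i):
    """NV10:NV40 only: split register index i into (hi, lo), where i == hi * 4 + lo."""
    return i // 4, i % 4


def read_signal(status_words, signal):
    """Extract one signal's current value from already-read STATUS register values."""
    i, bit = locate_signal(signal)
    return (status_words[i] >> bit) & 1
```

For a trailer signal, the signal number passed in would be the domain's trailer base plus the listed offset; the trailer base itself depends on the GPU and domain and is not computed here.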
The trailer signals for NV10:NV20 are: - base+0x1f: PCOUNTER.FLAG - the flag For NV20:NV40: - base+0x1d: PGRAPH.PM_TRIGGER - the PM_TRIGGER pulse from PGRAPH - base+0x1e: PCOUNTER.DOM[1].FLAG - the flag from domain 1 - base+0x1f: PCOUNTER.DOM[0].FLAG - the flag from domain 0 For NV40:GF100: - base+0x0c: ZERO - always 0 [G84:GF100] - base+0x0d: PCOUNTER.PERIODIC - the PERIODIC signal from current domain [G84:GF100] - base+0x0e: PGRAPH.WRCACHE_FLUSH - the WRCACHE_FLUSH pulse from PGRAPH [G84:GF100] - base+0x0e: ZERO - always 0 [NV40:G84] - base+0x0f: PGRAPH.PM_TRIGGER - the PM_TRIGGER pulse from PGRAPH - base+0x10: PCOUNTER.DOM[7].EVENT - the EVENT input from domain 7 - base+0x11: PCOUNTER.DOM[6].EVENT - the EVENT input from domain 6 - base+0x12: PCOUNTER.DOM[5].EVENT - the EVENT input from domain 5 - base+0x13: PCOUNTER.DOM[4].EVENT - the EVENT input from domain 4 - base+0x14: PCOUNTER.DOM[3].EVENT - the EVENT input from domain 3 - base+0x15: PCOUNTER.DOM[2].EVENT - the EVENT input from domain 2 - base+0x16: PCOUNTER.DOM[1].EVENT - the EVENT input from domain 1 - base+0x17: PCOUNTER.DOM[0].EVENT - the EVENT input from domain 0 - base+0x18: PCOUNTER.DOM[7].FLAG - the FLAG from domain 7 - base+0x19: PCOUNTER.DOM[6].FLAG - the FLAG from domain 6 - base+0x1a: PCOUNTER.DOM[5].FLAG - the FLAG from domain 5 - base+0x1b: PCOUNTER.DOM[4].FLAG - the FLAG from domain 4 - base+0x1c: PCOUNTER.DOM[3].FLAG - the FLAG from domain 3 - base+0x1d: PCOUNTER.DOM[2].FLAG - the FLAG from domain 2 - base+0x1e: PCOUNTER.DOM[1].FLAG - the FLAG from domain 1 - base+0x1f: PCOUNTER.DOM[0].FLAG - the FLAG from domain 0 For GF100+: - base+0x1f..0x22: PCOUNTER.MAIN.??? - base+0x23..0x26: PCOUNTER.MAIN.??? - base+0x27: PCOUNTER.USER_0 - the USER_0 signal from current domain - base+0x28: PCOUNTER.USER_1 - base+0x29: PCOUNTER.USER_2 - base+0x2a: PCOUNTER.USER_3 - base+0x2b: PGRAPH.CTXCTL.UNK86C.UNK4 - base+0x2c: PCOUNTER.PAUSED - 1 if this domain is in the PAUSED state - base+0x2d: ??? 
- base+0x2e: PCOUNTER.PERIODIC - the PERIODIC signal from current domain - base+0x2f: ??? - base+0x30: PCOUNTER.DOM[7].EVENT - the EVENT input from domain 7 - base+0x31: PCOUNTER.DOM[6].EVENT - the EVENT input from domain 6 - base+0x32: PCOUNTER.DOM[5].EVENT - the EVENT input from domain 5 - base+0x33: PCOUNTER.DOM[4].EVENT - the EVENT input from domain 4 - base+0x34: PCOUNTER.DOM[3].EVENT - the EVENT input from domain 3 - base+0x35: PCOUNTER.DOM[2].EVENT - the EVENT input from domain 2 - base+0x36: PCOUNTER.DOM[1].EVENT - the EVENT input from domain 1 - base+0x37: PCOUNTER.DOM[0].EVENT - the EVENT input from domain 0 - base+0x38: PCOUNTER.DOM[7].FLAG - the FLAG from domain 7 - base+0x39: PCOUNTER.DOM[6].FLAG - the FLAG from domain 6 - base+0x3a: PCOUNTER.DOM[5].FLAG - the FLAG from domain 5 - base+0x3b: PCOUNTER.DOM[4].FLAG - the FLAG from domain 4 - base+0x3c: PCOUNTER.DOM[3].FLAG - the FLAG from domain 3 - base+0x3d: PCOUNTER.DOM[2].FLAG - the FLAG from domain 2 - base+0x3e: PCOUNTER.DOM[1].FLAG - the FLAG from domain 1 - base+0x3f: PCOUNTER.DOM[0].FLAG - the FLAG from domain 0 .. todo:: PAUSED? .. todo:: unk bits .. _pcounter-signal-event: .. _pcounter-signal-flag: The EVENT and FLAG signals -------------------------- The trailer signals include EVENT and FLAG signals from all domains in the same domain set, allowing limited inter-domain communication. The EVENT signal is simply the output of the EVENT logic operation in a given domain. The FLAG signal is the status of the FLAG in a given domain. In a given domain, its own FLAG and EVENT signals are connected directly to the relevant sources. However, other domains' signals need to be first converted to the right clock domain. On NV20:NV40, this is done by a simple synchronizer - the state of DOM[x].FLAG signal in domain y will be the same as the state of FLAG in domain x as of two domain y clocks ago. 
While this is appropriate for many purposes, this means that, if the two domains don't share the same clock, single-clock pulses in domain x may appear as multi-clock pulses in domain y [if it has faster clock], or be lost entirely [if it has slower clock]. On NV40+, one of two synchronization modes can be selected for signals coming from other domains: - CONTINUOUS: behaves like NV20:NV40 - PULSE: converts all 0-to-1 transitions in source domain into single-clock pulses in destination domain There are two synchronization mode switches per domain. One applies to all incoming EVENT signals from other domains, while the other applies to all incoming FLAG signals. Note that the synchronization applies even between domains that do share a clock. However, the domain's own EVENT and FLAG signals aren't subject to synchronization when used inside it. .. _pcounter-signal-pm-trigger: .. _pcounter-signal-wrcache-flush: The PM_TRIGGER and WRCACHE_FLUSH signals ---------------------------------------- .. todo:: write me .. _pcounter-signal-user: The USER signals ---------------- On GT215:GF100, each domain has two "user" signals controllable directly by PCOUNTER's MMIO register. The signals are called USER_0 and USER_1. .. reg:: 32 pcounter-user-trigger-tesla triggers user-controllable signals - bit 0: value for USER_0 - bit 1: value for USER_1 - bit 2: pulse mode for USER_0 - if set, will reset USER_0 to 0 one cycle after setting it to the value of bit 0. - bit 3: pulse mode for USER_1 Whenever this register is written, USER_0 signal is set to the value of bit 0, and USER_1 is set to the value of bit 1. On the next cycle after the signal change, the USER signals for which the pulse mode bit is set are reset to 0. This register is write-only. On GF100+, this number is bumped to 4, the USER_TRIGGER register is read/write, and the signals are now located in the trailer area. .. 
reg:: 32 pcounter-user-trigger-fermi triggers user-controllable signals - bits 0-3: value for USER_0..USER_3 - bits 4-7: pulse mode for USER_0..USER_3 Works like the GT215 USER_TRIGGER register, except it's also readable. Note that bits 0-3 will be auto-cleared by bits 4-7 after one cycle - bits 0-3 of the read value correspond directly to the signals' current values. In effect: - write value = 0, pulse = any to set signal to 0 indefinitely - write value = 1, pulse = 0 to set signal to 1 indefinitely - write value = 1, pulse = 1 to set signal to 1 for one pulse only [and then set to 0 indefinitely] .. _pcounter-signal-periodic: The PERIODIC signal ------------------- On G84+, each domain has a single PERIODIC signal connected to a simple periodic pulse generator. The pulse generator will generate a single-clock '1' pulse every X clocks, with X selectable via the CTRL register from powers of two between 0x400 and 0x10000 clocks. The PERIODIC signal can also be disabled - it'll output a constant '0' signal in this case. The GCTRL register has a global PERIODIC_RESET bit that keeps the periodic generator in a reset state while it's set to 1. This bit can be used to start the PERIODIC signal generators synchronously for all domains. .. _pcounter-input: Input selection =============== Each domain has up to 256 signals, but only a handful of inputs are used for the counting process. They are: - PRE, START, EVENT, STOP: created from 4 individually selected signals through an arbitrary 4-input logic operation, used by the counting process - CLRFLAG, SETFLAG: likewise created through an arbitrary 4-input logic operation, but on NV30+ the logic operation input signal selections are shared with PRE/START/EVENT/STOP inputs [NV10:NV30 have separate selections like the other inputs]. Used to control the FLAG. - SWAP [NV30-]: hardwired to PGRAPH.PM_TRIGGER on NV30:G84, can be assigned to an arbitrary signal [without logic operation] on G84+. Used by the quad event mode. 
- UNK8 [G84:GF100]: can be assigned to an arbitrary signal, also without logic operation. Purpose unknown .. todo:: UNK8 Starting with NV30, the SETFLAG input may also be used as an argument to the EVENT and STOP logic operations, allowing one to construct 7-input logic operations. The registers used to select the signals going into the logic operations are: .. reg:: 32 pcounter-pre-src PRE input selection Selects the 4 signals used as inputs to PRE's logic operation. - bits 0-7: signal 0 - bits 8-15: signal 1 - bits 16-23: signal 2 - bits 24-31: signal 3 On NV30+, these signals are also used as inputs to CLRFLAG and SETFLAG logic operations. .. reg:: 32 pcounter-start-src START input selection Like PRE_SRC, but for START. On NV30+, these signals are also used as inputs to CLRFLAG and SETFLAG logic operations, and are used as a 4-bit integer or low 4 bits of 6-bit integer in special counter modes. .. reg:: 32 pcounter-event-src EVENT input selection Like PRE_SRC, but for EVENT. On NV40+, signals 2 and 3 are also used as high 2 bits of a 6-bit integer in special counter modes, and signals 0 and 1 are used as a 2-bit integer. .. reg:: 32 pcounter-stop-src STOP input selection Like PRE_SRC, but for STOP. .. reg:: 32 pcounter-setflag-src SETFLAG input selection Like PRE_SRC, but for SETFLAG. .. reg:: 32 pcounter-clrflag-src CLRFLAG input selection Like PRE_SRC, but for CLRFLAG. For convenience, the status of all 16 source signals can be checked by reading the SRC_STATUS register on NV40+: .. reg:: 32 pcounter-src-status Selected inputs status - bits 0-3: current state of PRE_SRC signals 0-3 - bits 4-7: current state of START_SRC signals 0-3 - bits 8-11: current state of EVENT_SRC signals 0-3 - bits 12-15: current state of STOP_SRC signals 0-3 The PRE/START/EVENT/STOP/SETFLAG/CLRFLAG input calculation goes as follows: 1. Start with the 4 signals selected by corresponding SRC register, call them SRC[0..3]. 
If on NV30+ and the input being calculated is SETFLAG/CLRFLAG, the SRC register doesn't exist, and SRC[0..3] are instead set to: - SETFLAG: START_SRC[2], START_SRC[3], PRE_SRC[0], PRE_SRC[1] - CLRFLAG: PRE_SRC[2], PRE_SRC[3], START_SRC[0], START_SRC[1] 2. Initially, set ARG[0..3] to SRC[0..3] 3. If argument 0 delay bit is set, set ARG[0] to SRC[0] as of previous clock cycle instead. 4. If argument 1 delay bit is set, set ARG[1] to SRC[1] as of previous clock cycle instead. 5. If on G92+ and argument 2 SRC[0] delay replace bit is set, set ARG[2] to SRC[0] as of previous clock cycle instead. 6. If on G92+ and argument 3 SRC[1] delay replace bit is set, set ARG[3] to SRC[1] as of previous clock cycle instead. 7. If on NV30+, the input being calculated is EVENT or STOP, and argument 3 SETFLAG replace bit is set, set ARG[3] to the value of SETFLAG input [computed in the same clock cycle - *not* delayed] 8. Perform the logic operation on ARG[0..3] to get the final value of the input. This is done as follows: - construct a 4-bit index i, with bit 0 set to ARG[0], bit 1 set to ARG[1], and so on - the value of the input is set to bit #i of the logic operation selector The logic operation selector thus effectively functions as a truth table for the logic operation. The registers selecting the actual logic operation are: .. 
reg:: 32 pcounter-pre-op PRE logic operation - bits 0-15: the logic operation to perform on the signals selected by PRE_SRC - bit 16: if set, argument 0 of the logic operation is delayed by 1 clock cycle - bit 17: if set, argument 1 of the logic operation is delayed by 1 clock cycle - bit 18: selects argument 2 of the logic operation [G92-] - 0: PRE_SRC[2] - 1: PRE_SRC[0] delayed by 1 clock cycle - bit 19: selects argument 3 of the logic operation [G92-] - 0: PRE_SRC[3] - 1: PRE_SRC[1] delayed by 1 clock cycle This register is special - writing it will cause a swap in quad event mode on G84:GF100, and start the single event mode counting process on NV10:GF100. .. reg:: 32 pcounter-start-op START logic operation - bits 0-15: the logic operation to perform on the signals selected by START_SRC - bit 16: if set, argument 0 of the logic operation is delayed by 1 clock cycle - bit 17: if set, argument 1 of the logic operation is delayed by 1 clock cycle - bit 18: selects argument 2 of the logic operation [G92-] - 0: START_SRC[2] - 1: START_SRC[0] delayed by 1 clock cycle - bit 19: selects argument 3 of the logic operation [G92-] - 0: START_SRC[3] - 1: START_SRC[1] delayed by 1 clock cycle .. reg:: 32 pcounter-event-op EVENT logic operation - bits 0-15: the logic operation to perform on the signals selected by EVENT_SRC - bit 16: if set, argument 0 of the logic operation is delayed by 1 clock cycle - bit 17: if set, argument 1 of the logic operation is delayed by 1 clock cycle - bit 18: selects argument 3 of the logic operation [NV30-]: - 0: EVENT_SRC[3] [NV30:G92] or as selected by bit 20 [G92-] - 1: SETFLAG - bit 19: selects argument 2 of the logic operation [G92-] - 0: EVENT_SRC[2] - 1: EVENT_SRC[0] delayed by 1 clock cycle - bit 20: selects argument 3 of the logic operation, if not set to SETFLAG by bit 18 [G92-] - 0: EVENT_SRC[3] - 1: EVENT_SRC[1] delayed by 1 clock cycle .. 
reg:: 32 pcounter-stop-op STOP logic operation - bits 0-15: the logic operation to perform on the signals selected by STOP_SRC - bit 16: if set, argument 0 of the logic operation is delayed by 1 clock cycle - bit 17: if set, argument 1 of the logic operation is delayed by 1 clock cycle - bit 18: selects argument 3 of the logic operation [NV30-]: - 0: STOP_SRC[3] [NV30:G92] or as selected by bit 20 [G92-] - 1: SETFLAG - bit 19: selects argument 2 of the logic operation [G92-] - 0: STOP_SRC[2] - 1: STOP_SRC[0] delayed by 1 clock cycle - bit 20: selects argument 3 of the logic operation, if not set to SETFLAG by bit 18 [G92-] - 0: STOP_SRC[3] - 1: STOP_SRC[1] delayed by 1 clock cycle .. reg:: 32 pcounter-setflag-op SETFLAG logic operation - bits 0-15: the logic operation to perform. - bit 16: if set, argument 0 of the logic operation is delayed by 1 clock cycle - bit 17: if set, argument 1 of the logic operation is delayed by 1 clock cycle - bit 18: selects argument 2 of the logic operation [G92-] - 0: PRE_SRC[0] - 1: START_SRC[2] delayed by 1 clock cycle - bit 19: selects argument 3 of the logic operation [G92-] - 0: PRE_SRC[1] - 1: START_SRC[3] delayed by 1 clock cycle .. reg:: 32 pcounter-clrflag-op CLRFLAG logic operation - bits 0-15: the logic operation to perform. On NV10:NV30, the arguments are selected by CLRFLAG_SRC. On NV30+, the arguments are: PRE_SRC[2], PRE_SRC[3], START_SRC[0], START_SRC[1]. - bit 16: if set, argument 0 of the logic operation is delayed by 1 clock cycle - bit 17: if set, argument 1 of the logic operation is delayed by 1 clock cycle - bit 18: selects argument 2 of the logic operation [G92-] - 0: START_SRC[0] - 1: PRE_SRC[2] delayed by 1 clock cycle - bit 19: selects argument 3 of the logic operation [G92-] - 0: START_SRC[1] - 1: PRE_SRC[3] delayed by 1 clock cycle .. todo:: check bits 16-20 on GF100 The register used to select the SWAP and UNK8 inputs on G84:GF100 cards is: .. 
reg:: 32 pcounter-spec-src SWAP and UNK8 input selection - bits 0-7: the SWAP signal - bits 8-15: the UNK8 signal And on GF100+: .. reg:: 32 pcounter-swap-src SWAP input selection - bits 0-7: the SWAP signal On NV10:GF100, writing any of the _SRC and _OP registers except PRE_OP in single event mode will result in the state being reset to INACTIVE. Writing PRE_OP will start the counting process, setting the state to WAIT_PRE. On G84:GF100 in quad event mode, writing PRE_OP will cause a swap, as if the SWAP input was asserted for one cycle. .. todo:: figure out how single event mode is supposed to be used on GF100+ .. _pcounter-counter: Counters ======== The single event mode and quad event mode use MMIO-visible counter registers. They are: - CTR_CYCLES: counts all clock cycles in a counting period - CTR_CYCLES_ALT: a copy of CTR_CYCLES? - CTR_EVENT: counts 1s on EVENT input, or sums integers in EVENT_* special counter modes - CTR_START: in quad event mode, counts 1s on START input, or sums integers in EXTRA_* special counter modes; in single event mode counts measurement periods in which CTR_EVENT reached value >= THRESHOLD - CTR_PRE: in quad event mode, counts 1s on PRE input; in single event mode, counts down PRE assertions until WAIT_FOR_PRE state is left, then sums integers in EXTRA_* special counter modes and is unused otherwise. - CTR_STOP: in quad event mode, counts 1s on STOP input; in single event mode, counts down counting periods until the counting process ends. .. todo:: wtf is CYCLES_ALT? On NV10:NV30, the CTR_CYCLES, CTR_CYCLES_ALT, CTR_EVENT and CTR_START counters are 40-bit, while CTR_PRE and CTR_STOP are 32-bit. On NV30+, all counters are 32-bit. On NV30+, the counters are saturated - once they reach the largest possible value [0xffffffff], they stop incrementing. On NV10:NV30, the low 39 bits will wrap normally, but bit 39 is sticky: that is, 0xffffffffff increments to 0x8000000000, while other values increment normally. 
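The two overflow behaviours described above can be illustrated with a small model. The function names are hypothetical and only single-step increments of the 32-bit NV30+ counters and the 40-bit NV10:NV30 counters are modelled; this is a sketch of the documented behaviour, not a description of the hardware implementation.

```python
MAX32 = 0xFFFFFFFF      # largest value of a 32-bit counter
MAX40 = 0xFFFFFFFFFF    # largest value of a 40-bit counter
BIT39 = 1 << 39         # the sticky top bit of a 40-bit counter


def increment_nv30(value):
    """NV30+: 32-bit counter that saturates at 0xffffffff."""
    return value if value >= MAX32 else value + 1


def increment_nv10_40bit(value):
    """NV10:NV30: the low 39 bits wrap, but bit 39 stays set once reached."""
    if value == MAX40:
        return BIT39
    return value + 1
```

A reader seeing bit 39 set on an NV10:NV30 counter therefore only knows that at least 0x8000000000 increments happened, since the low 39 bits may have wrapped any number of times.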
The registers used to access the counters are: .. reg:: 32 pcounter-ctr-cycles Elapsed cycles counter Read-only, gives the current value of CTR_CYCLES. Returns low 32 bits on NV10:NV30. .. reg:: 32 pcounter-ctr-cycles-hi Elapsed cycles counter - high part Read-only, gives the high 8 bits of the current value of CTR_CYCLES. .. reg:: 32 pcounter-ctr-cycles-alt Elapsed cycles counter copy Read-only, gives the current value of CTR_CYCLES_ALT. Returns low 32 bits on NV10:NV30. .. reg:: 32 pcounter-ctr-cycles-alt-hi Elapsed cycles counter copy - high part Read-only, gives the high 8 bits of the current value of CTR_CYCLES_ALT. .. reg:: 32 pcounter-ctr-event EVENT counter Read-only, gives the current value of CTR_EVENT. Returns low 32 bits on NV10:NV30. .. reg:: 32 pcounter-ctr-event-hi EVENT counter - high part Read-only, gives the high 8 bits of the current value of CTR_EVENT. .. reg:: 32 pcounter-ctr-start START counter Read-only, gives the current value of CTR_START. Returns low 32 bits on NV10:NV30. .. reg:: 32 pcounter-ctr-start-hi START counter - high part Read-only, gives the high 8 bits of the current value of CTR_START. .. reg:: 32 pcounter-ctr-pre PRE counter When read, gives the current value of CTR_PRE. When written, sets the initial CTR_PRE value for single-event mode. .. reg:: 32 pcounter-ctr-stop STOP counter When read, gives the current value of CTR_STOP. When written, sets the initial CTR_STOP value for single-event mode. The CTR_PRE and CTR_STOP counters have two values: the visible "current" value, and the hidden "initial" value. Reading the corresponding register reads the "current" value, while writing sets the "initial" value. The "initial" values are used when starting counting process in single event mode. Note that, in quad event mode, these registers access the copies of the counters from previous counting period, and the currently active counters are not visible. 
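As a rough illustration of the register semantics above, the sketch below joins the low and high parts of a 40-bit NV10:NV30 counter and models the "current"/"initial" split of CTR_PRE and CTR_STOP. The names ``join_40bit`` and ``ShadowedCounter`` are invented for this example, and the model covers single event mode only.

```python
def join_40bit(lo, hi):
    """Combine a 32-bit low register value and the 8-bit high part into one 40-bit value."""
    return ((hi & 0xFF) << 32) | (lo & 0xFFFFFFFF)


class ShadowedCounter:
    """Model of CTR_PRE / CTR_STOP: writes set a hidden initial value, reads see the current one."""

    def __init__(self):
        self.initial = 0  # hidden value, set by MMIO writes
        self.current = 0  # visible value, returned by MMIO reads

    def mmio_write(self, value):
        # A write only changes the hidden initial value.
        self.initial = value & 0xFFFFFFFF

    def mmio_read(self):
        # A read only ever returns the current value.
        return self.current

    def start_counting_process(self):
        # Starting the single event mode counting process loads the initial value.
        self.current = self.initial
```

One consequence worth noting: reading back CTR_PRE right after writing it does not return the value just written until a counting process has been started.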
The record mode uses a different counting algorithm, and the counters are written to memory instead of being accessed directly via MMIO. The same underlying storage is used internally, so parts of the counter state may be visible via MMIO registers. This isn't particularly useful. .. todo:: figure out what's the deal with GF100 counters .. _pcounter-counter-mode: Special counter modes --------------------- While the simplest way to use the counters is to have them increment by 1 every clock cycle when a given input is set, PCOUNTER supports a few more complex modes where a 4-bit, 6-bit, or 2-bit integer made of several signals is added to a counter on every cycle. This is used to count events which can happen multiple times in a single cycle - the relevant unit then exports a multi-bit event count, instead of simple event strobe. The integers used in special counter modes are: - B4: 4-bit integer, made of the following signals, in low-to-high bit order: - START_SRC[0] - START_SRC[1] - START_SRC[2] - START_SRC[3] - B6: 6-bit integer, made of: - START_SRC[0] - START_SRC[1] - START_SRC[2] - START_SRC[3] - EVENT_SRC[2] - EVENT_SRC[3] - B2: 2-bit integer, made of: - EVENT_SRC[0] - EVENT_SRC[1] The modes are: - SIMPLE: CTR_EVENT is increased by 1 on every cycle when EVENT input is 1 [ie. 
nothing interesting happens] - EVENT_B4: CTR_EVENT is increased by B4 on every cycle when EVENT input is 1 - EVENT_B6 [NV40-]: CTR_EVENT is increased by B6 on every cycle when EVENT input is 1 - EXTRA_B4 [NV40-]: CTR_EVENT behaves as in SIMPLE mode, but: - single event mode: CTR_PRE, instead of staying at 0 after leaving WAIT_FOR_PRE state, is used as a counter, and is increased by B4 on every clock cycle - quad event mode: CTR_START, instead of being controlled by START input, is increased by B4 on every clock cycle - EXTRA_B6_EVENT_B2 [NV40-]: CTR_EVENT is increased by B2 on every clock cycle, and: - single event mode: CTR_PRE behaves like in EXTRA_B4 mode, but is increased by B6 instead of B4 every cycle - quad event mode: CTR_START behaves like in EXTRA_B4 mode, but is increased by B6 instead of B4 every cycle .. todo:: figure out if there's anything new on GF100 .. _pcounter-control: Control registers ================= The operation of PCOUNTER is controlled by the CTRL registers. NV10:NV40 have a single CTRL register, shared between both domains: .. reg:: 32 pcounter-ctrl-nv10 PCOUNTER control - bit 0: TVOUT_DEBUG_SEL - selects the signals that go to TV-out debug port, if enabled. - bit 1: TVOUT_DEBUG_ENABLE - if 0, external TV encoder pins behave normally; if 1, the display circuitry signals are disconnected, and internal PCOUNTER debug pins are exposed via these pins. 
- bit 2: CTR_MODE - selects counter mode [see above], affects both domains - 0: SIMPLE - 1: EVENT_B4 - bits 3-4: DOM0_SINGLE_STATE - read-only, reads as the current single event mode state for domain #0: - 0: INACTIVE - 1: WAIT_PRE - 2: WAIT_START - 3: COUNTING - bits 5-6: DOM1_SINGLE_STATE [NV20:NV40] - like bits 3-4, but for domain #1 - bit 8: DOM0_EVENT_CTR_PERIOD [NV15:NV40] - EVENT_CTR_PERIOD for domain #0: - 0: ONE - 1: ALL - bit 9: DOM1_EVENT_CTR_PERIOD [NV20:NV40] - like bit 8, but for domain #1 - bit 16: DOM0_MODE [NV30:NV40] - selects counting mode for domain #0: - 0: SINGLE - single event mode - 1: QUAD - quad event mode - bit 18: DOM1_MODE [NV30:NV40] - like bit 16, but for domain #1 - bits 24-25: DOM0_QUAD_STATE [NV30:NV40] - read-only, reads as the current quad event mode state for domain #0: - 0: EMPTY - 1: VALID - 3: OVERFLOW - bits 26-27: DOM1_QUAD_STATE [NV30:NV40] - like bits 24-25, but for domain #1 NV40:GF100 instead have per-domain CTRL registers: .. reg:: 32 pcounter-ctrl-nv40 PCOUNTER domain control - bits 0-1: MODE - selects counting mode - 0: SINGLE - single event mode - 1: QUAD - quad event mode - 2: RECORD - record mode - bits 4-6: CTR_MODE - selects counter mode - 0: SIMPLE - 1: EVENT_B4 - 2: EVENT_B6 - 3: EXTRA_B4 - 4: EXTRA_B6_EVENT_B2 - bit 8: EVENT_CTR_PERIOD - like on NV15 - bit 11: EVENT_IMPORT_MODE - selects synchronization mode for EVENT signals imported from other domains - 0: CONTINUOUS - 1: PULSE - bit 13: FLAG_IMPORT_MODE - like bit 11, but for FLAG signals - bit 16: ??? 
- bit 20: RECORD_FORMAT - selects packet format for record mode [G84:GF100] - 0: LONG - 32-byte packets with 12 usable event counters - 1: SHORT - 16-byte packets with 4 usable event counters - bits 21-23: PERIODIC_PERIOD [G84:GF100] - selects PERIODIC signal period: - 0: disabled, PERIODIC signal is always 0 - 1: 0x400 clocks - 2: 0x800 clocks - 3: 0x1000 clocks - 4: 0x2000 clocks - 5: 0x4000 clocks - 6: 0x8000 clocks - 7: 0x10000 clocks - bits 24-25: QUAD_STATE - like on NV30 - bit 27: FAULT_CLEAR - write-only, when written as 1 clears the FAULT bit in RECORD_STATUS. Note, however, that the domain will still be in a wedged state due to [probably] a hardware bug. This bit is thus useless. - bits 28-29: SINGLE_STATE - like on NV10 - bit 30: ??? [G92:GF100] .. todo:: unk bits In addition, G84:GF100 have a global GCTRL register used for a few bits shared by all domains: .. reg:: 32 pcounter-gctrl-g84 PCOUNTER global control - bit 0: RECORD_RESET - when set to 0, record counters increment normally; when set, forces all record counters to 0 value - bit 4: PERIODIC_RESET - when set to 0, PERIODIC signals operate normally; when set, PERIODIC signals are forced to 0 and will continue from the beginning of the cycle upon reenabling .. todo:: more bits .. todo:: GF100 .. _pcounter-mode-single: Single event mode ================= In single event mode, one event input is being monitored and counted, with quite complex counting period management. The inputs used by single event mode counting process are PRE, START, EVENT, STOP. 
The counting process may be in one of 4 states:

- INACTIVE: nothing is happening, PCOUNTER needs to be set up
- WAIT_FOR_PRE: counting process has started, but PRE pulses are required
  before it's actually possible to start a counting period
- WAIT_FOR_START: counting process has started, a counting period is not
  currently active, but will be started on a START pulse
- COUNTING: a counting period is currently active, and the counters are in use

Counting process works like this:

On every cycle::

    if (PCOUNTER config register other than PRE_OP written this cycle) {
        SINGLE_STATE = INACTIVE;
    }
    switch (SINGLE_STATE) {
        case INACTIVE:
            if (PRE_OP written this cycle) {
                /* start counting process, init counters */
                CTR_EVENT = 0;
                CTR_START = 0;
                CTR_CYCLES = CTR_CYCLES_ALT = 0;
                CTR_PRE = CTR_PRE_init;
                CTR_STOP = CTR_STOP_init;
                FLAG = 0;
                SINGLE_STATE = WAIT_FOR_PRE;
            }
            break;
        case WAIT_FOR_PRE:
            if (SETFLAG)
                FLAG = 1;
            if (CLRFLAG)
                FLAG = 0;
            if (PRE) {
                if (CTR_PRE != 0) {
                    CTR_PRE--;
                } else {
                    SINGLE_STATE = WAIT_FOR_START;
                }
            }
            break;
        case WAIT_FOR_START:
            if (SETFLAG)
                FLAG = 1;
            if (CLRFLAG)
                FLAG = 0;
            if (START) {
                CTR_CYCLES = CTR_CYCLES_ALT = 0;
                if (gpu < NV15 || EVENT_CTR_PERIOD == ONE)
                    CTR_EVENT = 0;
                SINGLE_STATE = COUNTING;
            }
            break;
        case COUNTING:
            if (SETFLAG)
                FLAG = 1;
            if (CLRFLAG)
                FLAG = 0;
            /* increase CTR_EVENT and maybe CTR_PRE according to the counter mode */
            if (STOP) {
                if (CTR_EVENT >= THRESHOLD)
                    CTR_START++;
                if (CTR_STOP != 0) {
                    CTR_STOP--;
                    SINGLE_STATE = WAIT_FOR_START;
                } else {
                    SINGLE_STATE = INACTIVE;
                }
            }
            break;
    }

Or, in summary:

- before actual counting, (CTR_PRE+1) 1s must happen on PRE input
- a counting process consists of (CTR_STOP+1) counting periods
- a counting period is started by 1 on START input and stopped by 1 on STOP
  input
- events outside of a counting period don't count
- if EVENT_CTR_PERIOD is ONE, CTR_EVENT effectively applies to a counting
  period, if it's ALL, it contains a sum over all counting periods.
  CTR_PRE, when EXTRA_* counter mode is in use, always contains a sum over
  all counting periods. NV10:NV15 cards don't have this submode bit and always
  behave as if it was ONE.
- CTR_CYCLES always contains length of current [COUNTING] or last
  [WAIT_FOR_START] counting period
- CTR_START will contain the number of counting periods that ended with
  CTR_EVENT >= THRESHOLD - probably only useful with EVENT_CTR_PERIOD = ONE.
- writing any \*_OP register except PRE_OP, any \*_SRC register, any CTR
  register, THRESHOLD register, and CTRL register will abort the counting
  process
- flag is frozen when in INACTIVE state, cleared to 0 when entering
  WAIT_FOR_PRE

Single event mode doesn't use shadow counters - the values of all counters
are immediately visible through MMIO registers.

The threshold value for CTR_START counter can be set and read via the
following registers:

.. reg:: 32 pcounter-threshold EVENT counter threshold

   The THRESHOLD value, or low 32 bits of THRESHOLD value on NV10:NV30.

.. reg:: 32 pcounter-threshold-hi EVENT counter threshold - high part

   The high 8 bits of THRESHOLD value.

.. todo:: threshold on GF100


.. _pcounter-mode-quad:

Quad event mode
===============

In quad event mode, 4 different event inputs are counted, each in a dedicated
counter. The events are counted in invisible "shadow" registers, while the
visible registers contain the final values of counters from previous counting
period. Counting periods are controlled by the special SWAP input, which
copies the "shadow" counters to visible registers, and clears the shadow
counters to 0. In addition, the SWAP signal marks the counter values as
available in the CTRL register.
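
As a rough illustration of the shadow/visible split described above, the
following Python sketch models a single domain in quad event mode. It is not
taken from any driver: the class and method names are hypothetical, SIMPLE
counter mode is assumed [every counter increases by 1 per active cycle], and
CTR_CYCLES is left out. The EMPTY/VALID/OVERFLOW bookkeeping and the ack
follow the description given in the rest of this section.

```python
from dataclasses import dataclass, field

# Hypothetical symbolic names for the three quad event mode states.
EMPTY, VALID, OVERFLOW = "EMPTY", "VALID", "OVERFLOW"
COUNTERS = ("PRE", "START", "EVENT", "STOP")


@dataclass
class QuadDomain:
    """Toy model of one PCOUNTER domain in quad event mode (SIMPLE mode assumed)."""
    shadow: dict = field(default_factory=lambda: dict.fromkeys(COUNTERS, 0))
    visible: dict = field(default_factory=lambda: dict.fromkeys(COUNTERS, 0))
    state: str = EMPTY

    def cycle(self, inputs, swap=False):
        """Advance one clock cycle; `inputs` is the set of inputs that are 1."""
        if swap:
            # The swap happens before this cycle's events are counted.
            self.visible = dict(self.shadow)
            self.shadow = dict.fromkeys(COUNTERS, 0)
            self.state = {EMPTY: VALID, VALID: OVERFLOW, OVERFLOW: OVERFLOW}[self.state]
        for name in inputs:
            self.shadow[name] += 1

    def ack(self):
        """Model a write of 1 to the ack trigger: bump the state down one unit."""
        self.state = {EMPTY: EMPTY, VALID: EMPTY, OVERFLOW: VALID}[self.state]
```

For example, two cycles with EVENT set followed by a cycle with both EVENT and
SWAP set leave 2 in the visible EVENT counter and 1 in the shadow one, since
the event on the swap cycle is counted for the next period.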
The counters used in quad event mode are:

- CTR_CYCLES and CTR_CYCLES_ALT: increases by 1 for every cycle
- CTR_EVENT: increases as per the counter mode, usually by 1 for every cycle
  when EVENT input is set
- CTR_START: increases as per the counter mode, usually by 1 for every cycle
  when START input is set
- CTR_PRE: increases by 1 for every cycle when PRE input is set
- CTR_STOP: increases by 1 for every cycle when STOP input is set

When in quad event mode, the counters are always active - there's no INACTIVE
state like in single event mode.

The counter swap is triggered on every cycle when SWAP input is set. On
G84:GF100, the counter swap is also triggered on every write to the PRE_OP
register.

The PCOUNTER keeps track of how many counter value sets have been swapped and
how many have been read. It can thus be in one of the three states:

- EMPTY - no new counter values to read
- VALID - swap has happened and counter values are available for reading
- OVERFLOW - another swap has happened while in VALID state, and counter
  values were lost

A swap bumps the state up one unit - EMPTY goes to VALID, VALID goes to
OVERFLOW, and OVERFLOW is unchanged. Note that the swap is performed before
updating the counters for a given cycle - thus if SWAP and one of the event
inputs are active on the same cycle, the events will be counted for the
*next* period.

The software may inform the PCOUNTER of read completion by poking the
write-only QUAD_ACK_TRIGGER register. The register is shared for all domains
on NV30:NV40, and per-domain for NV40+:

.. reg:: 32 pcounter-quad-ack-trigger-nv30 Acks counter data in quad event mode

   - bit 0: DOM0 - when written as 0, nothing happens. When written as 1, the
     status of domain #0 is bumped down one unit - VALID goes to EMPTY,
     OVERFLOW goes to VALID, and EMPTY is unchanged.
   - bit 8: DOM1 - like DOM0, but affects domain #1

.. reg:: 32 pcounter-quad-ack-trigger-nv40 Acks counter data in quad event mode

   - bit 0: Like NV30's DOM0/DOM1 bits, affects the domain the register is in.


.. _pcounter-mode-record:

Record mode
===========

In record mode, counter values are written to memory for later analysis
instead of being read via MMIO - this enables much more frequent sampling and
simplifies software.

The counter values are written to a given virtual memory buffer in 16-byte or
32-byte packets, consisting of 14 counters. A new packet is written whenever
one of the 12 event counters is close to overflowing, or when the STOP input
is asserted.

The counters are:

- 48-bit cycles counter, incremented by 1 on every cycle, cleared only when
  record mode operation is started by writing the RECORD_START register or
  GCTRL.RECORD_RESET is set to 1. This counter wraps on overflow.
- 12 16-bit event counters, corresponding to 12 monitored signals selected by
  PRE_SRC[0..3], START_SRC[0..3], EVENT_SRC[0..3]. Incremented by 1 on every
  cycle when corresponding signal is 1. Cleared after writing a packet.
  A packet write is triggered whenever any of these counters reaches 0xf000.
  If a counter reaches 0xffff, it stops incrementing further.
- 12-bit STOP counter, incremented by 1 whenever the STOP input is 1. Cleared
  after writing a packet. A packet write is triggered whenever this counter is
  non-0. If this counter reaches 0xfff, it stops incrementing further.

There are two packet formats available: long and short. Long format packets
are 32 bytes long and include all counters, while short format packets are
16 bytes long and have only 4 of the 12 event counters.
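
The saturation and trigger rules for the counters listed above can be
sketched in Python as follows. This is only a behavioural model under stated
assumptions, not a description of the actual circuitry: the names are
hypothetical, the order of updates within a cycle is a guess, and the memory
interface is assumed to accept every packet immediately.

```python
CYCLE_MASK = (1 << 48) - 1   # 48-bit cycles counter, wraps on overflow
EVENT_MAX = 0xffff           # event counters stop incrementing here
EVENT_TRIGGER = 0xf000       # an event counter at this value triggers a packet write
STOP_MAX = 0xfff             # 12-bit STOP counter stops incrementing here


class RecordCounters:
    """Toy model of the record mode counters of one domain (hypothetical API)."""

    def __init__(self):
        self.cycles = 0
        self.events = [0] * 12
        self.stop = 0

    def tick(self, signals, stop):
        """One clock cycle; `signals` holds the 12 monitored signal values."""
        self.cycles = (self.cycles + 1) & CYCLE_MASK
        for i, active in enumerate(signals):
            if active and self.events[i] < EVENT_MAX:
                self.events[i] += 1
        if stop and self.stop < STOP_MAX:
            self.stop += 1

    def write_triggered(self):
        """True when a packet write is requested."""
        return self.stop != 0 or any(e >= EVENT_TRIGGER for e in self.events)

    def take_packet(self):
        """Snapshot the counters for a packet and clear the per-packet ones."""
        snapshot = (self.cycles, self.stop, list(self.events))
        self.events = [0] * 12   # event counters are cleared after a packet
        self.stop = 0            # so is the STOP counter
        return snapshot          # the cycles counter keeps running
```

In this model a signal that stays at 1 with STOP never asserted requests a
packet after 0xf000 cycles, and the resulting snapshot has a STOP counter
of 0.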
A packet in long format is made of 16 16-bit little endian words:

- 0x00: low 16 bits of cycle counter
- 0x02: middle 16 bits of cycle counter
- 0x04: high 16 bits of cycle counter
- 0x06:

  - bits 0-11: the STOP counter
  - bits 12-15: always 0

- 0x08: PRE_SRC[0] event counter
- 0x0a: PRE_SRC[1] event counter
- 0x0c: PRE_SRC[2] event counter
- 0x0e: PRE_SRC[3] event counter
- 0x10: START_SRC[0] event counter
- 0x12: START_SRC[1] event counter
- 0x14: START_SRC[2] event counter
- 0x16: START_SRC[3] event counter
- 0x18: EVENT_SRC[0] event counter
- 0x1a: EVENT_SRC[1] event counter
- 0x1c: EVENT_SRC[2] event counter
- 0x1e: EVENT_SRC[3] event counter

A packet in short format is simply the first 16 bytes of a packet in long
format.

Packets are normally written to memory when STOP input is asserted. For this
reason, packets in memory will usually have the STOP counter equal to 1 [for
the one pulse that triggered them]. However, to avoid saturating the event
counters, a packet write will also be triggered whenever any event counter
is >= 0xf000. The STOP counter in the memory packet will be equal to 0 in this
case. STOP counter values greater than 1 are possible when STOP input is
asserted too often for the memory interface to keep up - each domain has place
for one outgoing packet. Whenever a packet write is triggered and there isn't
an outgoing packet yet, the packet will be sent, and the counters reset. When
a packet write is triggered and there already is an outgoing packet, nothing
will happen - the counters will just keep incrementing until the current
packet write is finished.

.. todo:: check if still valid on GF100

Record mode setup
-----------------

Before record mode is started, a few registers need to be set up. First, the
channel and DMA object for the record buffer need to be bound. The PCOUNTER
will access virtual memory as engine 0xb, client 0xf, DMA slot 0. The channel
and DMA object are global for all domains.
Note that the channel register has to be written *after* the DMA object
register for a successful bind.

.. reg:: 32 pcounter-record-dma DMA object for record mode

   - bits 0-15: the DMA object to be used by PCOUNTER.

   Writing this register only stores the DMA object, it doesn't actually bind
   it - the bind is done by RECORD_CHAN write.

.. reg:: 32 pcounter-record-chan VM channel for record mode

   - bits 0-29: CHAN - the channel to bind to PCOUNTER engine
   - bit 31: VALID - if set, a channel bind and DMA object bind will be done
     when writing this register. If unset, the register will be written, but
     no binds will be done.

The address of the record buffer is settable per-domain:

.. reg:: 32 pcounter-record-start The starting address of the record buffer

   The start address of the record buffer. Only bits 4-31 are valid - the
   buffer has to be aligned to a 16-byte boundary. When this register is
   written, the address is copied to RECORD_STATUS position field, the
   "buffer valid" internal flag will be set, and all counters are reset if
   the domain is in record mode.

   Note that setting this register will not properly clear the counter state
   if the domain is not in record mode - in fact, a bogus packet will likely
   be written immediately after transitioning to the record mode if
   RECORD_START is written in another mode. To avoid that, write RECORD_START
   after entering record mode [and make sure the "buffer valid" flag is not
   set], or use the GCTRL.RECORD_RESET bit.

.. reg:: 32 pcounter-record-limit The highest valid address in the record buffer

   The last valid address in the record buffer. Only bits 4-31 are valid.
   After a packet is written with address >= the value of this register, the
   internal "buffer valid" flag will be cleared, and all further writes will
   be ignored until RECORD_START is written. Note that one packet write will
   always succeed before the limit hit flag is set and further writes are
   disabled - even if the position is set far beyond the limit.

.. reg:: 32 pcounter-record-status Current status and position of record buffer

   This register is read-only.

   - bit 0: if set, a VM FAULT happened when writing the record buffer
   - bits 4-31: bits 4-31 of the current record buffer position, ie. address
     of the next packet to be written

The PCOUNTER internally operates on 32-bit addresses. On G84:G92, the high
8 bits of 40-bit virtual address are always forced to 0, limiting the record
buffer to low 4GB of the VM space. On G92+, the high 8 bits of the address are
instead taken from a register:

.. reg:: 32 pcounter-record-address-high High 8 bits of record buffer address

   Sets the high 8 bits of the record buffer virtual address.

Note, however, that the internal address size is still 32-bit: the position
will thus wrap at a 4GB boundary, instead of incrementing bit 32 of address.
For this reason, record buffers that cross a 4GB block boundary in virtual
space cannot be used.

Note that VM faults on the record buffer will permanently hang the faulting
domain until the GPU is reset - while there's a "clear VM FAULT status" bit in
the control register, it only clears the status bit, while hardware is still
in a wedged state. This is likely a hardware bug.

.. todo:: figure out record mode setup for GF100


.. _pcounter-flag:

The flag
========

The FLAG is a single per-domain bit that can be set and cleared via the
SETFLAG and CLRFLAG inputs. On every clock cycle:

- if CLRFLAG is 1, the FLAG is set to 0
- if SETFLAG is 1 and CLRFLAG is 0, the FLAG is set to 1
- if both CLRFLAG and SETFLAG are 0, the FLAG is unchanged

In addition, when in single-event mode, the FLAG is frozen [will not respond
to CLRFLAG/SETFLAG] when in INACTIVE state, and will be cleared to 0 when going
to WAIT_FOR_PRE state.

The current value of the FLAG is available as a common trailer signal to all
domains in the same domain set, allowing complex operations to be performed.
Note however that the effect of CLRFLAG/SETFLAG on the FLAG signal is delayed by 2 clock cycles - if the SETFLAG input becomes 1 on cycle X, the FLAG signal will become 1 on cycle X+2.
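
The update rule and the 2-cycle delay can be illustrated with a small Python
model. It is only a sketch: the class name is hypothetical, the delay is
assumed to behave like a plain 2-stage pipeline, and the single event mode
freeze/clear behaviour is deliberately left out.

```python
from collections import deque


class FlagModel:
    """Toy model of the FLAG bit and its delayed FLAG signal (hypothetical API)."""

    DELAY = 2  # cycles between an input change and the FLAG signal change

    def __init__(self):
        self.flag = 0                          # internal FLAG bit
        self._pipe = deque([0] * self.DELAY)   # values still in flight

    def cycle(self, setflag=False, clrflag=False):
        """Advance one clock cycle and return the FLAG signal seen this cycle."""
        visible = self._pipe.popleft()  # value computed DELAY cycles ago
        if clrflag:
            self.flag = 0               # CLRFLAG wins when both inputs are 1
        elif setflag:
            self.flag = 1
        self._pipe.append(self.flag)
        return visible
```

With SETFLAG asserted on cycle 0 only, the model returns 0 on cycles 0 and 1
and 1 from cycle 2 onwards, matching the X+2 behaviour described above.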