# Arithmetic instructions

## Introduction

The arithmetic/logical instructions operate on the \$r0-\$r15 GPRs, sometimes setting bits in the \$flags register according to the result. The instructions can be “sized” or “unsized”. Sized instructions have 8-bit, 16-bit, and 32-bit variants. Unsized instructions don’t have variants, and always operate on full 32-bit registers. For 8-bit and 16-bit sized instructions, the high 24 or 16 bits of the destination register, respectively, are left unmodified.

## \$flags result bits

The \$flags bits often affected by ALU instructions are:

• bit 8: c, carry flag. Set by addition instructions iff a carry out of the high bit (or, equivalently, unsigned overflow) has occurred. Likewise set by subtraction instructions iff a borrow into the high bit (or unsigned overflow) has occurred. Also used by shift instructions to store the last shifted out bit. Used as the less-than condition in old comparisons.
• bit 9: o, signed overflow flag - set by addition, subtraction, comparison, and negation instructions if a signed overflow occurred. Set to 0 by some other instructions.
• bit 10: s, sign flag - set according to the high bit of the result by most arithmetic instructions.
• bit 11: z, zero flag - set iff the result was equal to 0 by most arithmetic instructions.

Also, a few ALU instructions operate on \$flags register as a whole.

## Pseudocode conventions

`sz`, for sized instructions, is the selected size of operation: 8, 16, or 32.

`S(x)` evaluates to `(x >> (sz - 1) & 1)`, i.e. the sign bit of `x`. If the instruction is unsized, assume `sz == 32`.

`C(a, b, c)`, where `a, b, c` are booleans, is the carry flag for an addition where the two inputs have high bits of `a` and `b`, and the result has a high bit of `c`. It is computed as follows:

```
bool C(bool a, bool b, bool c) {
    // a and b both set - there is always carry out.
    if (a && b)
        return 1;
    // One of a and b is set - there is carry out iff result has high
    // bit 0.
    if ((a || b) && !c)
        return 1;
    // Otherwise (a and b both clear, or exactly one set and the result
    // has high bit 1), there is no carry out.
    return 0;
}
```

Also, `!C(a, !b, c)` is the borrow flag for a subtraction where the two inputs have high bits of `a` and `b`, and the result has a high bit of `c`.

Likewise, `O(a, b, c)` is similarly defined as the signed overflow flag for an addition:

```
bool O(bool a, bool b, bool c) {
    return a == b && a != c;
    // equivalent definition (check it yourself):
    // return a ^ b ^ c ^ C(a, b, c);
}
```

Similarly, `O(a, !b, c)` is the signed overflow flag for subtraction.
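The helpers above can be checked against ordinary arithmetic. The following is a minimal sketch in C that walks all pairs of 8-bit operands and compares the flags derived from `C` and `O` with flags computed from wider integers. The explicit `sz` argument to `S`, and the functions `as_signed8` and `count_mismatches`, are assumptions made for this illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Carry and overflow helpers, transcribed from the pseudocode above. */
static bool C(bool a, bool b, bool c) { return (a && b) || ((a || b) && !c); }
static bool O(bool a, bool b, bool c) { return a == b && a != c; }

/* Sign bit of an sz-bit value; sz is passed explicitly in this sketch. */
static bool S(uint32_t x, unsigned sz) {
    assert(sz == 8 || sz == 16 || sz == 32);
    return (x >> (sz - 1)) & 1;
}

/* Interpret the low 8 bits of x as a two's complement number. */
static int as_signed8(unsigned x) { return x < 128 ? (int)x : (int)x - 256; }

/* Hypothetical checker: counts 8-bit operand pairs for which the helpers
 * disagree with flags computed from wider arithmetic. */
static unsigned count_mismatches(void) {
    unsigned bad = 0;
    for (unsigned a = 0; a < 256; a++) {
        for (unsigned b = 0; b < 256; b++) {
            uint8_t sum = (uint8_t)(a + b), diff = (uint8_t)(a - b);
            bool sa = S(a, 8), sb = S(b, 8);
            int ssum = as_signed8(a) + as_signed8(b);
            int sdiff = as_signed8(a) - as_signed8(b);
            bad += C(sa, sb, S(sum, 8)) != (a + b > 255);      /* carry */
            bad += !C(sa, !sb, S(diff, 8)) != (a < b);         /* borrow */
            bad += O(sa, sb, S(sum, 8)) != (ssum < -128 || ssum > 127);
            bad += O(sa, !sb, S(diff, 8)) != (sdiff < -128 || sdiff > 127);
        }
    }
    return bad;
}
```

Calling `count_mismatches()` is expected to return 0, which is a quick way to convince oneself that only the three sign bits are needed to recover both flags.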

## Comparison: cmpu, cmps, cmp

Compare two values, setting flags according to results of comparison. `cmp` sets the usual set of 4 flags, and behaves identically to a subtraction instruction that doesn’t write its destination register. `cmpu` sets only `c` and `z`, but otherwise behaves like `cmp` - thus it is only useful for unsigned comparisons. `cmps` sets `z` normally, but sets `c` iff `SRC1` is less than `SRC2` when both are treated as signed numbers (thus using unsigned condition codes to store the result of a signed comparison instead).

`cmpu`/`cmps` are the only comparison instructions available on Falcon v0. Both of them set only the `c` and `z` flags, with `cmps` setting `c` flag in an unusual way to enable signed comparisons while using unsigned flags and condition codes. To do an unsigned comparison, use `cmpu` and the unsigned branch conditions [`b/a/e`]. To do a signed comparison, use `cmps`, also with unsigned branch conditions.

The `cmp` instruction, new on Falcon v3+, sets the full set of flags. To do an unsigned comparison on v3+, use `cmp` and the unsigned branch conditions. To do a signed comparison, use `cmp` and the signed branch conditions [`l/g/e`].

Instructions:
Name Description Present on Subopcode
cmpu compare unsigned all units 4
cmps compare signed all units 5
cmp compare v3+ units 6
Instruction class:
sized
Execution time:
1 cycle
Operands:
SRC1, SRC2
Forms:
Form Opcode
R2, I8 30
R2, I16 31
R2, R1 38
Immediates:
cmpu:
zero-extended
cmps:
sign-extended
cmp:
sign-extended
Operation:
```
uint<sz>_t diff = SRC1 - SRC2;
$flags.z = (diff == 0);
if (op == cmps)
    $flags.c = O(S(SRC1), !S(SRC2), S(diff)) ^ S(diff);
else if (op == cmpu)
    $flags.c = !C(S(SRC1), !S(SRC2), S(diff));
else if (op == cmp) {
    $flags.c = !C(S(SRC1), !S(SRC2), S(diff));
    $flags.o = O(S(SRC1), !S(SRC2), S(diff));
    $flags.s = S(diff);
}
```
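To make the difference between `cmpu` and `cmps` concrete, here is a small C model of the two instructions, written as a sketch that follows the Operation pseudocode above. The struct `cmp_flags` and the function `falcon_cmp` are hypothetical names introduced for this example; they are not part of any Falcon tooling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Result of a comparison: only the two flags that cmpu/cmps write. */
struct cmp_flags { bool c, z; };

static bool C(bool a, bool b, bool c) { return (a && b) || ((a || b) && !c); }
static bool O(bool a, bool b, bool c) { return a == b && a != c; }

/* Hypothetical model of cmpu (is_signed == false) and cmps (is_signed == true)
 * for an operation size of 8, 16 or 32 bits. */
static struct cmp_flags falcon_cmp(uint32_t src1, uint32_t src2,
                                   unsigned sz, bool is_signed) {
    assert(sz == 8 || sz == 16 || sz == 32);
    uint32_t mask = 0xffffffffu >> (32 - sz);
    uint32_t diff = (src1 - src2) & mask;
    bool s1 = (src1 >> (sz - 1)) & 1;
    bool s2 = (src2 >> (sz - 1)) & 1;
    bool sd = (diff >> (sz - 1)) & 1;
    struct cmp_flags f;
    f.z = (diff == 0);
    if (is_signed)
        f.c = O(s1, !s2, sd) != sd;   /* o ^ s: signed less-than */
    else
        f.c = !C(s1, !s2, sd);        /* borrow: unsigned less-than */
    return f;
}
```

With this model, comparing 1 against `0xffffffff` yields `c` set for the unsigned variant (1 is below 4294967295) and `c` clear for the signed variant (1 is not below -1), so the same unsigned branch condition serves both cases.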

## Addition/subtraction: add, adc, sub, sbb

Add or subtract two values, possibly with carry/borrow. The full set of arithmetic flags is always written.

Instructions:
Name Description Subopcode
sub subtract 2
sbb subtract with borrow 3
Instruction class:
sized
Execution time:
1 cycle
Operands:
DST, SRC1, SRC2
Forms:
Form Opcode
R1, R2, I8 10
R1, R2, I16 20
R2, R2, I8 36
R2, R2, I16 37
R2, R2, R1 3b
R3, R2, R1 3c
Immediates:
zero-extended
Operation:
```
uint<sz>_t res;
if (op == add)
    res = SRC1 + SRC2;
else if (op == adc)
    res = SRC1 + SRC2 + $flags.c;
else if (op == sub)
    res = SRC1 - SRC2;
else if (op == sbb)
    res = SRC1 - SRC2 - $flags.c;

if (op == add || op == adc) {
    $flags.c = C(S(SRC1), S(SRC2), S(res));
    $flags.o = O(S(SRC1), S(SRC2), S(res));
} else {
    $flags.c = !C(S(SRC1), !S(SRC2), S(res));
    $flags.o = O(S(SRC1), !S(SRC2), S(res));
}
DST = res;
$flags.s = S(res);
$flags.z = (res == 0);
```
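A typical use of the carry flag is multi-word arithmetic. The sketch below models a 32-bit `add`/`adc` in C and chains two of them into a 64-bit addition. The functions `add32` and `add64` are hypothetical names for this illustration, and only the `c` flag is modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

static bool C(bool a, bool b, bool c) { return (a && b) || ((a || b) && !c); }

/* Hypothetical model of 32-bit add (carry_in == false) and adc
 * (carry_in == $flags.c). Returns the result and stores the new carry flag. */
static uint32_t add32(uint32_t src1, uint32_t src2, bool carry_in, bool *carry_out) {
    assert(carry_out != NULL);
    uint32_t res = src1 + src2 + carry_in;
    *carry_out = C(src1 >> 31, src2 >> 31, res >> 31);
    return res;
}

/* 64-bit addition: add on the low words, then adc on the high words. */
static uint64_t add64(uint64_t a, uint64_t b) {
    bool c;
    uint32_t lo = add32((uint32_t)a, (uint32_t)b, false, &c);
    uint32_t hi = add32((uint32_t)(a >> 32), (uint32_t)(b >> 32), c, &c);
    return ((uint64_t)hi << 32) | lo;
}
```

The same pattern extends to subtraction with `sub` followed by `sbb`, where `c` carries the borrow instead.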

## Shifts: shl, shr, sar, shlc, shrc

Shift a value. For `shl/shr`, the extra bits “shifted in” are 0. For `sar`, they’re equal to sign bit of source. For `shlc/shrc`, the first such bit is taken from carry flag, the rest are 0. On Falcon v3+, these instructions set all 4 arithmetic flags - `s` and `z` are set as usual, `o` is always set to 0, and `c` is set to the value of the last shifted out bit, or 0 if the shift count was 0. On Falcon v0, only `c` is set.

The shift count is always masked to 3 bits in case of 8-bit shift instructions, 4 bits in case of 16-bit shift instructions, and 5 bits in case of 32-bit shift instructions.

Instructions:
Name Description Subopcode
shl shift left 4
shr shift right 5
sar shift right with sign bit 6
shlc shift left with carry in c
shrc shift right with carry in d
Instruction class:
sized
Execution time:
1 cycle
Operands:
DST, SRC1, SRC2
Forms:
Form Opcode
R1, R2, I8 10
R2, R2, I8 36
R2, R2, R1 3b
R3, R2, R1 3c
Immediates:
truncated
Operation:
```
unsigned shcnt;
if (sz == 8)
    shcnt = SRC2 & 7;
else if (sz == 16)
    shcnt = SRC2 & 0xf;
else // sz == 32
    shcnt = SRC2 & 0x1f;
uint<sz>_t res;
if (op == shl || op == shlc) {
    res = SRC1 << shcnt;
    if (op == shlc && shcnt != 0)
        res |= $flags.c << (shcnt - 1);
    if (shcnt == 0)
        $flags.c = 0;
    else
        $flags.c = SRC1 >> (sz - shcnt) & 1;
} else { // shr, sar, shrc
    res = SRC1 >> shcnt;
    if (op == shrc && shcnt != 0)
        res |= $flags.c << (sz - shcnt);
    // a zero shift count shifts in no bits, so there is nothing to fill
    if (op == sar && S(SRC1) && shcnt != 0)
        res |= ~0 << (sz - shcnt);
    if (shcnt == 0)
        $flags.c = 0;
    else
        $flags.c = SRC1 >> (shcnt - 1) & 1;
}
DST = res;
if (falcon_version != 0) {
    $flags.o = 0;
    $flags.s = S(DST);
    $flags.z = (DST == 0);
}
```
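The shift count masking and the carry-out rule are easy to get wrong, so here is a minimal C sketch of `shl` and `shr` for the three operation sizes. The names `shift_result` and `falcon_shift` are assumptions for this example. The model returns only the low `sz` bits of the result and does not reproduce the preserved high bits of the destination register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct shift_result { uint32_t value; bool c; };

/* Hypothetical model of shl (left == true) and shr (left == false). */
static struct shift_result falcon_shift(uint32_t src1, uint32_t src2,
                                        unsigned sz, bool left) {
    assert(sz == 8 || sz == 16 || sz == 32);
    uint32_t mask = 0xffffffffu >> (32 - sz);
    unsigned shcnt = src2 & (sz - 1);   /* 3, 4 or 5 bits of shift count */
    struct shift_result r = { src1 & mask, false };
    if (shcnt == 0)
        return r;                       /* value unchanged, c = 0 */
    /* Index of the last bit to be shifted out of the sz-bit value. */
    unsigned last_out = left ? sz - shcnt : shcnt - 1;
    r.c = (r.value >> last_out) & 1;
    r.value = left ? (r.value << shcnt) & mask : r.value >> shcnt;
    return r;
}
```

Note that with this masking, an 8-bit shift by 9 behaves like a shift by 1, and a 32-bit shift by 32 leaves the value untouched with `c` cleared.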

## Unary operations: not, neg, mov, movf, hswap

`not` flips all bits in a value. `neg` negates a value. `mov` and `movf` move a value from one register to another. `mov` is the v3+ variant, which just does the move. `movf` is the v0 variant, which additionally sets flags according to the moved value. `hswap` rotates a value by half its size. All instructions except `mov` set 3 flags: `s` and `z` (which are set as usual), as well as `o` (which is set iff signed overflow occurred for `neg`, and always set to 0 for other instructions).

Instructions:
Name Description Present on Subopcode
not bitwise complement all units 0
neg negate a value all units 1
movf move a value and set flags v0 units 2
mov move a value v3+ units 2
hswap Swap halves all units 3
Instruction class:
sized
Execution time:
1 cycle
Operands:
DST, SRC
Forms:
Form Opcode
R1, R2 39
R2, R2 3d
Operation:
```
if (op == not) {
    DST = ~SRC;
    $flags.o = 0;
} else if (op == neg) {
    DST = -SRC;
    $flags.o = (DST == 1 << (sz - 1));
} else if (op == movf) {
    DST = SRC;
    $flags.o = 0;
} else if (op == mov) {
    DST = SRC;
} else if (op == hswap) {
    DST = SRC >> (sz / 2) | SRC << (sz / 2);
    $flags.o = 0;
}
if (op != mov) {
    $flags.s = S(DST);
    $flags.z = (DST == 0);
}
```
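Two of these operations have slightly surprising corner cases: `hswap` depends on the operation size, and `neg` overflows only for the most negative value. A minimal C sketch follows; `size_mask`, `falcon_hswap`, `falcon_neg` and `neg_result` are hypothetical names, and only the `o` flag is modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct neg_result { uint32_t value; bool o; };

/* Mask covering the low sz bits; sz must be a valid operation size. */
static uint32_t size_mask(unsigned sz) {
    assert(sz == 8 || sz == 16 || sz == 32);
    return 0xffffffffu >> (32 - sz);
}

/* hswap: rotate the sz-bit value by half its size. */
static uint32_t falcon_hswap(uint32_t src, unsigned sz) {
    uint32_t mask = size_mask(sz);
    src &= mask;
    return ((src >> (sz / 2)) | (src << (sz / 2))) & mask;
}

/* neg: two's complement negation; o is set iff the result is the most
 * negative sz-bit value, which is the only input that cannot be negated. */
static struct neg_result falcon_neg(uint32_t src, unsigned sz) {
    uint32_t mask = size_mask(sz);
    struct neg_result r;
    r.value = (0u - src) & mask;
    r.o = (r.value == (1u << (sz - 1)));
    return r;
}
```

For instance, a 32-bit `hswap` of `0x12345678` gives `0x56781234`, while an 8-bit `neg` of `0x80` gives `0x80` again with `o` set.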

## Loading immediates: mov, sethi

`mov` sets a register to an immediate. `sethi` sets the high 16 bits of a register to an immediate, leaving the low bits untouched. `mov` can thus be used to load small [16-bit signed] immediates, while `mov`+`sethi` can be used to load any 32-bit immediate.

Instructions:
Name Description Subopcode
mov Load an immediate 7
sethi Set high bits 3
Instruction class:
unsized
Execution time:
1 cycle
Operands:
DST, SRC
Forms:
Form Opcode
R2, I8 f0
R2, I16 f1
Immediates:
mov:
sign-extended
sethi:
zero-extended
Operation:
```
if (op == mov)
    DST = SRC;
else if (op == sethi)
    DST = DST & 0xffff | SRC << 16;
```
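Because `mov` sign-extends its immediate while `sethi` overwrites the whole high half, the pair can load any constant even when bit 15 of the low half is set. The C sketch below illustrates this; `falcon_mov_i16`, `falcon_sethi` and `load_imm32` are hypothetical helper names, and the 16-bit immediate form is assumed for both instructions.

```c
#include <assert.h>
#include <stdint.h>

/* mov with a 16-bit immediate: sign-extended into the full register. */
static uint32_t falcon_mov_i16(uint16_t imm) {
    return (imm & 0x8000) ? (imm | 0xffff0000u) : imm;
}

/* sethi with a 16-bit immediate: replaces the high half, keeps the low half. */
static uint32_t falcon_sethi(uint32_t dst, uint16_t imm) {
    return (dst & 0xffff) | ((uint32_t)imm << 16);
}

/* Hypothetical helper: load an arbitrary 32-bit constant the mov+sethi way. */
static uint32_t load_imm32(uint32_t value) {
    uint32_t reg = falcon_mov_i16((uint16_t)(value & 0xffff));
    reg = falcon_sethi(reg, (uint16_t)(value >> 16));
    assert(reg == value);   /* the two-instruction sequence reproduces the constant */
    return reg;
}
```

Loading `0x12348765` this way first produces `0xffff8765` after `mov`, and `sethi` then replaces the sign-extension bits with `0x1234`.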

## Clearing registers: clear

Sets a register to 0.

Instructions:
Name Description Subopcode
clear Clear a register 4
Instruction class:
sized
Operands:
DST
Forms:
Form Opcode
R2 3d
Operation:
```
DST = 0;
```

## Setting flags from a value: setf

Sets `z` and `s` flags according to a value, sets `o` flag to 0.

Instructions:
Name Description Present on Subopcode
setf Set flags according to a value v3+ units 5
Instruction class:
sized
Execution time:
1 cycle
Operands:
SRC
Forms:
Form Opcode
R2 3d
Operation:
```
$flags.o = 0;
$flags.s = S(SRC);
$flags.z = (SRC == 0);
```

## Multiplication: mulu, muls

Does a 16x16 -> 32 multiplication. The inputs are unsigned for `mulu`, signed for `muls`. Sets no flags.

Instructions:
Name Description Subopcode
mulu Multiply unsigned 0
muls Multiply signed 1
Instruction class:
unsized
Operands:
DST, SRC1, SRC2
Forms:
Form Opcode
R1, R2, I8 c0
R1, R2, I16 e0
R2, R2, I8 f0
R2, R2, I16 f1
R2, R2, R1 fd
R3, R2, R1 ff
Immediates:
mulu:
zero-extended
muls:
sign-extended
Operation:
```
s1 = SRC1 & 0xffff;
s2 = SRC2 & 0xffff;
if (op == muls) {
    if (s1 & 0x8000)
        s1 |= 0xffff0000;
    if (s2 & 0x8000)
        s2 |= 0xffff0000;
}
DST = s1 * s2;
```

## Sign extension: sext

Sign-extends the low (X+1) bits of a value, where X is the second operand masked to 5 bits. Sets `s` and `z` flags according to the result. X is the bit index (counting from LSB) which contains the new sign bit - the result will be equal to the source with all bits higher than that replaced with a copy of the sign bit.

Instructions:
Name Description Subopcode
sext Sign-extend 2
Instruction class:
unsized
Execution time:
1 cycle
Operands:
DST, SRC1, SRC2
Forms:
Form Opcode
R1, R2, I8 c0
R2, R2, I8 f0
R2, R2, R1 fd
R3, R2, R1 ff
Immediates:
truncated
Operation:
```
bit = SRC2 & 0x1f;
if (SRC1 & 1 << bit) {
    DST = SRC1 & ((1 << bit) - 1) | -(1 << bit);
} else {
    DST = SRC1 & ((1 << bit) - 1);
}
$flags.s = S(DST);
$flags.z = (DST == 0);
```

## Bitfield extraction: extr, extrs

Extracts a bitfield. The bitfield to extract is given as a pair of (low bit index, size in bits - 1) packed in a single 10-bit source, with each part taking 5 bits. The value of the bitfield is returned in the low bits of the destination register. `extr` extracts an unsigned bitfield, setting the remaining destination bits to 0, while `extrs` extracts a signed bitfield, setting the remaining bits to a copy of the sign bit (i.e. the highest bit of the bitfield).

Both instructions set `s` and `z` flags. While `z` is set as usual, `s` is set to the “fill” bit used for high bits of the destination - thus it is always `0` for `extr`.

Instructions:
Name Description Present on Subopcode
extrs Extract signed bitfield v3+ units 3
extr Extract unsigned bitfield v3+ units 7
Instruction class:
unsized
Execution time:
1 cycle
Operands:
DST, SRC1, SRC2
Forms:
Form Opcode
R1, R2, I8 c0
R1, R2, I16 e0
R3, R2, R1 ff
Immediates:
zero-extended
Operation:
```
int low = SRC2 & 0x1f;
int sizem1 = (SRC2 >> 5 & 0x1f);
uint32_t bf = (SRC1 >> low) & ((2 << sizem1) - 1);
bool fill_bit;
if (op == extr) {
    fill_bit = 0;
} else if (op == extrs) {
    // depending on the mask is probably a bad idea.
    int signbit = (low + sizem1) & 0x1f;
    fill_bit = SRC1 >> signbit & 1;
}
if (fill_bit)
    bf |= -(2 << sizem1);
DST = bf;
$flags.s = fill_bit;
$flags.z = (DST == 0);
```
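The packed (low bit index, size - 1) operand is the least obvious part of these instructions. The following C sketch shows one way to build that operand and models the extraction itself; `bitfield_desc` and `falcon_extr` are hypothetical names introduced here, and the flags are not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: pack a bitfield descriptor as described above,
 * low bit index in bits 0-4 and (size - 1) in bits 5-9. */
static uint32_t bitfield_desc(unsigned low, unsigned size) {
    assert(low < 32 && size >= 1 && size <= 32);
    return low | ((size - 1) << 5);
}

/* Model of extr (is_signed == false) and extrs (is_signed == true). */
static uint32_t falcon_extr(uint32_t src1, uint32_t src2, bool is_signed) {
    unsigned low = src2 & 0x1f;
    unsigned sizem1 = (src2 >> 5) & 0x1f;
    /* 64-bit intermediate avoids overflow when the field is 32 bits wide. */
    uint32_t field_mask = (uint32_t)((UINT64_C(2) << sizem1) - 1);
    uint32_t bf = (src1 >> low) & field_mask;
    /* The sign bit index wraps to 5 bits, as in the pseudocode. */
    bool fill = is_signed && ((src1 >> ((low + sizem1) & 0x1f)) & 1);
    return fill ? (bf | ~field_mask) : bf;
}
```

As a usage example, `falcon_extr(0x12345678, bitfield_desc(8, 8), false)` picks out the byte `0x56`, and extracting the 4-bit field at bit 12 of `0x0000f000` gives `0xf` unsigned but `0xffffffff` signed.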

## Bitfield insertion: ins

Inserts a bitfield, which is specified like for `extr/extrs`. Sets no flags.

Instructions:
Name Description Present on Subopcode
ins Insert a bitfield v3+ units b
Instruction class:
unsized
Execution time:
1 cycle
Operands:
DST, SRC1, SRC2
Forms:
Form Opcode
R1, R2, I8 c0
R1, R2, I16 e0
Immediates:
zero-extended
Operation:
```
low = SRC2 & 0x1f;
size = (SRC2 >> 5 & 0x1f) + 1;
if (low + size <= 32) { // nop if bitfield out of bounds - I wouldn't depend on it, though...
    DST &= ~(((1 << size) - 1) << low); // clear the current contents of the bitfield
    bf = SRC1 & ((1 << size) - 1);
    DST |= bf << low;
}
```
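As an illustration of the insertion rule, including the out-of-bounds case, here is a short C sketch. `bitfield_desc` and `falcon_ins` are hypothetical names for this example, and the no-op behavior for out-of-bounds fields simply mirrors the pseudocode above rather than any independently verified hardware guarantee.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: low bit index in bits 0-4, (size - 1) in bits 5-9. */
static uint32_t bitfield_desc(unsigned low, unsigned size) {
    assert(low < 32 && size >= 1 && size <= 32);
    return low | ((size - 1) << 5);
}

/* Model of ins: returns the new value of DST. */
static uint32_t falcon_ins(uint32_t dst, uint32_t src1, uint32_t src2) {
    unsigned low = src2 & 0x1f;
    unsigned size = ((src2 >> 5) & 0x1f) + 1;
    if (low + size > 32)
        return dst;   /* out-of-bounds field: modeled as a no-op */
    /* 64-bit intermediate keeps the mask well defined for size == 32. */
    uint32_t field_mask = (uint32_t)((UINT64_C(1) << size) - 1);
    return (dst & ~(field_mask << low)) | ((src1 & field_mask) << low);
}
```

For example, inserting 0 into the 8-bit field at bit 8 of `0xffffffff` yields `0xffff00ff`, while an 8-bit field starting at bit 28 does not fit and leaves the destination unchanged.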

## Bitwise operations: and, or, xor

Ands, ors, or xors two operands. On Falcon v0, sets no flags. On Falcon v3+, sets all flags - `s` and `z` are set as usual, `c` and `o` are always set to 0.

Instructions:
Name Description Subopcode
and Bitwise and 4
or Bitwise or 5
xor Bitwise xor 6
Instruction class:
unsized
Execution time:
1 cycle
Operands:
DST, SRC1, SRC2
Forms:
Form Opcode
R1, R2, I8 c0
R1, R2, I16 e0
R2, R2, I8 f0
R2, R2, I16 f1
R2, R2, R1 fd
R3, R2, R1 ff
Immediates:
zero-extended
Operation:
```
if (op == and)
    DST = SRC1 & SRC2;
if (op == or)
    DST = SRC1 | SRC2;
if (op == xor)
    DST = SRC1 ^ SRC2;
if (falcon_version != 0) {
    $flags.c = 0;
    $flags.o = 0;
    $flags.s = S(DST);
    $flags.z = (DST == 0);
}
```

## Bit extraction: xbit

Extracts a single bit of a specified register. On Falcon v0, the bit is stored to bit 0 of DST, while other destination bits are unmodified, and no flags are set. On Falcon v3+, the bit is stored to bit 0 of DST, all other bits of DST are set to 0, `s` flag is set to 0, and `z` flag is set iff the extracted bit was 0 (behaving exactly like an `extr` instruction with size 1). In both cases, the bit index is masked off to 5 bits.

Instructions:
Name Description Subopcode - opcodes c0, ff Subopcode - opcodes f0, fe
xbit Extract a bit 8 c
Instruction class:
unsized
Execution time:
1 cycle
Operands:
DST, SRC1, SRC2
Forms:
Form Opcode
R1, R2, I8 c0
R3, R2, R1 ff
R2, \$flags, I8 f0
R1, \$flags, R2 fe
Immediates:
truncated
Operation:
```
bit = SRC2 & 0x1f;
if (falcon_version == 0) {
    DST = DST & ~1 | (SRC1 >> bit & 1);
} else {
    DST = SRC1 >> bit & 1;
    $flags.s = 0;
    $flags.z = (DST == 0);
}
```

## Bit manipulation: bset, bclr, btgl

Set, clear, or flip a specified bit of a register. The requested bit index is masked off to 5 bits. No flags are set.

Instructions:
Name Description Subopcode - opcodes f0, fd, f9 Subopcode - opcode f4
bset Set a bit 9 31
bclr Clear a bit a 32
btgl Flip a bit b 33
Instruction class:
unsized
Execution time:
1 cycle
Operands:
DST, SRC
Forms:
Form Opcode
R2, I8 f0
R2, R1 fd
\$flags, I8 f4
\$flags, R2 f9
Immediates:
truncated
Operation:
```
bit = SRC & 0x1f;
if (op == bset)
    DST |= 1 << bit;
else if (op == bclr)
    DST &= ~(1 << bit);
else // op == btgl
    DST ^= 1 << bit;
```

## Division and remainder: div, mod

Does unsigned 32-bit division / modulus. Sets no flags. If a division by 0 is requested, no exception happens - the division result is always `0xffffffff` in this case, and the modulus result is equal to the first source.

Instructions:
Name Description Present on Subopcode
div Divide v3+ units c
mod Take modulus v3+ units d
Instruction class:
unsized
Execution time:
30-33 cycles
Operands:
DST, SRC1, SRC2
Forms:
Form Opcode
R1, R2, I8 c0
R1, R2, I16 e0
R3, R2, R1 ff
Immediates:
zero-extended
Operation:
```
if (SRC2 == 0) {
    dres = 0xffffffff;
} else {
    dres = SRC1 / SRC2;
}
if (op == div)
    DST = dres;
else // op == mod
    DST = SRC1 - dres * SRC2;
```

## Setting predicates: setp

Sets bit #SRC2 in \$flags to bit 0 of SRC1. The bit index is masked off to 5 bits.

Instructions:
Name Description Subopcode
setp Set predicate 8
Instruction class:
unsized
Execution time:
1 cycle
Operands:
SRC1, SRC2
Forms:
Form Opcode
R2, I8 f2
R2, R1 fa
Immediates:
truncated
Operation:
```
bit = SRC2 & 0x1f;
$flags = ($flags & ~(1 << bit)) | (SRC1 & 1) << bit;
```