VC4ASM - Assembler directives


.const .set .lconst .lset - define a constant or single line function

.const identifier, expression
.set identifier, expression
.lconst identifier, expression
.lset identifier, expression
.const identifier(argument1, argument2 ...) expression
.set identifier(argument1, argument2 ...) expression
.lconst identifier(argument1, argument2 ...) expression
.lset identifier(argument1, argument2 ...) expression
identifier
The identifier to assign. The assignment takes effect from this line on. With .lset and .lconst the assignment is only preserved in the current context, e.g. the current include file or macro.
argument1, argument2 ...
Function arguments. Function arguments must be expressions of any type including registers, but they cannot be incomplete expressions like operators or identifier fragments.
These are identifiers that can be used in the expression body like constants wherever an expression is allowed.
expression
This expression is assigned to identifier. The expression can be of any type including registers, but it must be evaluable at the time .set is parsed unless there is an argument list, which delays evaluation. You cannot assign incomplete expressions like bare operators.

Constants can be used wherever an expression is allowed.

There is an important difference between constants with arguments and those without. Constants without arguments are evaluated at the time of definition, i.e. if they use other constants in the expression body, the values are taken from the context of the definition.
As soon as an argument list in parentheses is used, this behavior changes. Such definitions are evaluated at the time and from the context of invocation.
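
The difference can be seen in a minimal sketch. The names scale, twice and times are made up for illustration, and the asserted values follow from the evaluation rules described above rather than from a tested build:

```asm
.set scale, 2
.set twice, scale * 8       # no arguments: evaluated right here, yields 16
.set times(x) scale * x     # with arguments: evaluated at each invocation

.set scale, 3               # reassign scale; .set is not final

.assert twice == 16         # still the value from the time of definition
.assert times(8) == 24      # uses the value of scale at the invocation
```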

The assignment of .const and .lconst is final, i.e. a second assignment to the same identifier is an error. However, a final assignment prevents neither shadowing in nested contexts nor .unset.

Further aliases for .set are: .define, .equ.

Example

.const ra_link_0, ra0

.set vpm_setup(num, stride, dma) (num & 0xf) << 20 | (stride & 0x3f) << 12 | (dma & 0xfff)
.set v32(y, x) 0x200 | (y & 0x30) | (x & 0xf)

mov vw_setup, vpm_setup(1, 1, v32(0,0))

.unset .lunset - revoke a constant definition

.unset identifier
.lunset identifier
identifier
The assignment of this identifier is undone. This normally reverts the identifier to the undefined state. However, if a local version of the identifier currently shadows another value of the same identifier, the state from before the shadowing is restored.

.lunset only removes the identifier from the current local context.
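
A sketch of how shadowing and .lunset might interact, assuming a .local block as described in the following section. The name count is hypothetical, and the asserted values are derived from the description above, not from a tested build:

```asm
.set count, 4
.local
.lset count, 8          # local definition shadows the outer value
.assert count == 8
.lunset count           # remove the local definition only
.assert count == 4      # the shadowed outer value is visible again
.endloc
```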

.local - enter a local block

.local
#...
.endloc

.local/.endloc has no direct effect on the generated code but it creates a local context for symbols that can be set by .lset. All local symbols go out of scope at the end of the block. The behavior is similar to { } in C and similar languages.

Example

.local
.lset count, ra2
mov count, unif
:.1
# some loop body here

sub.setf count, count, 1
brr.allnz -, r:.1
nop
nop
nop
.endloc

.func - define a multi line user function

.func identifier(argument1, argument2 ...)
 body
.endf
identifier
Name of the function.
argument1, argument2 ...
Function arguments. These are identifiers that can be used in the function body like constants wherever an expression is allowed.
body
The function body may contain only a single expression, as in the function form of .set. But you may use .if, .assert and also .set etc. to decide which expression is evaluated.

Functions are similar to constants with parameters, but their body spans multiple lines. This allows you to use .if or .lset to calculate the result.

Example

.func vpm_setup(num, stride, dma)
.assert num <= 16 && num > 0
.assert stride <= 64 && stride > 0
.assert (dma & ~0xfff) == 0
(num & 0xf) << 20 | (stride & 0x3f) << 12 | dma
.endf
.func v32(y, x)
.assert (y & ~0x30) == 0
.assert (x & ~0xf) == 0
 0x200 | y | x
.endf

mov vw_setup, vpm_setup(1, 1, v32(0,0))

The example above provides a checked version of the .set example.
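
Because the body may contain .if, a function can also select between several expressions. The following sketch is hypothetical (min2 is not part of vc4asm) and assumes that each branch may hold the result expression on a line of its own:

```asm
.func min2(a, b)
.if a < b
 a
.else
 b
.endif
.endf

.assert min2(3, 7) == 3     # expected under the assumption stated above
```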

.macro - define a macro

.macro identifier, argument1, argument2 ...
your code
...
.endm
identifier
Name of the macro.
argument1, argument2 ...
Macro arguments. These are identifiers that can be used in the macro body like constants wherever an expression is allowed.

A macro inserts a block of code at the point where it is invoked. The code may depend on the arguments. In contrast to functions, macros may emit code. They may also contain other directives.

The macro arguments must be expressions of any type including registers, but they cannot be incomplete expressions like operators or unresolved identifiers. The arguments are evaluated at the time of macro invocation rather than at the time they are used in the macro body. So they cannot depend on code in the macro body.

Example

Header of a subroutine. The entry point address is assigned to a register:

.macro proc, rx_ptr, label
    brr rx_ptr, label
    nop
    nop
    nop
.endm

proc ra23, r:1f
subroutine body
...
:1
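
Since macros may contain other directives, a macro can also wrap a .rep loop. This is a sketch only; the macro name nops, the loop identifier i and the label target are assumptions made for illustration:

```asm
.macro nops, n
.rep i, n
    nop
.endr
.endm

    brr -, r:target     # target is a hypothetical label defined elsewhere
    nops 3              # emits three nop instructions after the branch
```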

.rep - repeat a code block multiple times

.rep identifier, count
your_code

...
.endr
identifier
This identifier receives the loop count starting at 0 and up to count-1 for the last loop cycle.
At the end of .endr the identifier returns to its previous value.
count
Number of loop cycles. count must be ≥ 0 and evaluate at the time .rep is parsed.

Example

Acquire all 15 QPU semaphores.

.rep i, 15
sacq i
.endr
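
The loop identifier can be used in any expression inside the block. A small sketch, assuming a data table is wanted; the choice of .int and the values are illustrative only:

```asm
.rep i, 4
.int 1 << i         # i runs from 0 to 3, so this emits 1, 2, 4, 8
.endr
```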

.foreach - repeat code for a set of expressions

.foreach identifier, expr1, expr2, ...
your_code

...
.endfor
identifier
This identifier receives the expressions.
At the end of .endfor the identifier returns to its previous value.
expr...
Expressions to assign to identifier. The loop is assembled once for each expression.

Example

Clear a bunch of registers.

.foreach reg, ra0, rb1, ra2, rb2, ra4, rb4
 ;mov reg, 0;
.endfor

.back - emit code before the last few instructions

.back count
your_code

...
.endb
count
Number of instructions to go back. Must be less than or equal to 5.

The code between .back and .endb is inserted before the last count instructions rather than at the current location. This can be quite useful when dealing with macros and branch instructions. But be aware that there might be dependencies, e.g. the inserted code might modify registers or flags that are used by the last instructions. vc4asm will not check for that.

Example

some_macro
.back 3
brr -, r:loop
.endb

The code above inserts the branch instruction before the last three instructions, which may have been emitted by the macro some_macro or, if the macro emitted fewer than three, by code before it.

.clone - copy instructions

.clone label, count
label
Copy instructions starting at this label.
count
Number of instructions to copy.

.clone copies count instructions starting at label and inserts them at the current location. It is intended to optimize branch instructions that cannot be placed earlier. The idea is to copy the first few instructions of a branch target after the branch instead of using nop.

You should not clone branch instructions or immediate values from label differences. This is not reliable with a 2 pass assembler.

Example

brr -, r:target + 3*8
.clone :target, 3

This copies the first 3 instructions of :target after the branch and branches after the 3rd instruction of :target instead.

.if - conditional compile

.if condition
your code

...
.elseif condition
another code

...
.else
alternate code
...
.endif
condition
Expression to check. The expression must be constant and of type integer or float and is checked to be non-zero.
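
A minimal sketch of conditional assembly. The constant use_tmu1 is a hypothetical build-time switch; t0s and t1s are the address registers of TMU0 and TMU1:

```asm
.set use_tmu1, 0        # hypothetical switch, set to 1 to use TMU1

.if use_tmu1
    mov t1s, r0         # request the load through TMU1
.else
    mov t0s, r0         # request the load through TMU0
.endif
```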

.ifset - check whether a constant is defined

.ifset identifier
your code

...
.else
alternate code
...
.endif
identifier
Check whether an identifier is defined within the current context. This will also check for macro arguments.
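
One possible use is to supply a default for a constant that may or may not have been defined earlier. A sketch, with stride as a hypothetical name and 16 as an arbitrary default:

```asm
.ifset stride
.assert stride > 0 && stride <= 64      # validate the value given by the user
.else
.set stride, 16                         # hypothetical default value
.endif
```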

.assert - check for static condition

.assert condition
condition
Expression to check. The expression must be constant and of type integer or float and is checked to be non-zero.

.include - include another file

.include "filename"
.include <filename>
filename
Name of another assembler file to include. The path may be relative or absolute.
If filename is enclosed in angle brackets <...> the list of include paths is searched first for the file. See option -I.

An included file denotes a local context. Definitions that are local like .lset are only valid within the included file and sub includes.

.byte .short .int .long .bit .float .half .double - emit constants inside code blocks

.byte constant, constant ...   # 8 bit integers
.short constant, constant ... # 16 bit integers
.int constant, constant ... # 32 bit integers
.long constant, constant ... # 64 bit integers
.bit constant, constant ... # 1 bit boolean
.float constant, constant ... # 32 bit single precision float
.half constant, constant ... # 16 bit half precision float
.double constant, constant ... # 64 bit double precision float (not directly supported by Videocore IV)
constant
Expression that evaluates to an integer or floating point constant.

The above directives place constants directly in the code. This might be used to load constants by using the return value of a branch instruction as address. Or you may emit opcodes that are unsupported by vc4asm.

The result is always stored in big endian format.

You should ensure that the constants will not accidentally be executed unless they contain valid Videocore IV instructions.

In general you should prefer uniforms over constants in the code because they are easier to access. In most cases it is also more efficient to use ldi. But as soon as the address gets an offset that depends on the QPU element number or the like, uniforms and ldi are no longer an option.

Example

Load two float constants into r1, r2...

    brr r0, r:1f
    mov t0s, r0
    add r0, r0, 4; ldtmu0   # load first floating point value after :0
    mov r1, r4; mov t0s, r0
:0 .float 3.14159, 2.71828, ...
:1 add r0, r0, 4; ldtmu0    # load second floating point value after :0
    mov r2, r4
    mov t0s, r0

.align - ensure memory alignment

.align bytes
.align bytes, base

Force the current instruction pointer to be aligned to a byte boundary that is a specified power of 2.

bytes
Expression that evaluates to an integer constant with the number of bytes to align. The value must be a power of two and ≤ 64. .align 0 is a no-op.
base
Optional offset. If specified, the alignment does not use 0, i.e. the start of the current binary output, as base offset but the given base instead.
base can be either an integer constant or a label. If it is an integer constant, the location after .align will be aligned to bytes with residual base % bytes.
If it is a label, the alignment is relative to the label location, i.e. (location after .align - label) % bytes == 0.
Note that the label must not be a forward reference, nor should the integer constant depend on a forward reference, because .align cannot do its job in this case.

.align simply uses zeros for padding rather than nop instructions. So do not use it inside executable code. It is intended for data or sub function alignment only.

.global - export symbol

::label
.global :label
.global symbol, value

Export a label or some other constant as global linker symbol in ELF format. .global has no effect on other output formats.
It is an error to assign different values to the same global symbol.

label
Name of a label to export. This will create a global symbol whose address points to the code at label after linkage.
.global :label is the short form of .global label, :label, and ::label is the short form of .global :label including the label definition.
symbol
Name of the global symbol to define. The symbol receives the following value.
value
Value of the symbol. This can be any expression that evaluates to a label or a 32 bit integer constant.
If it is a label the exported symbol will point to the code at the label after linkage.
If it is an integer constant the symbol receives the value of the expression as absolute value, i.e. SHN_ABS.
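
The short forms described above could be used as in the following sketch. The label names entry and helper are hypothetical:

```asm
::entry                 # defines the label entry and exports it in one step
    nop

:helper                 # ordinary label definition
    nop
.global :helper         # exports helper; short for .global helper, :helper
```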

Example

.global code_start, :start
.global code_size, :end - :start
:start
# some code
:end

Use the ELF output to create an object file (.o).
And in the C language write:

extern char code_start[];
extern char code_size[];
#define code_length ((size_t)&code_size)

memcpy(qpu_memory_buffer, code_start, code_length);

Note that this is just an example. The start, the size and the end of the generated code block are exported as symbols automatically.

.code .text .rodata - segment definitions

.code
.text

.rodata

Declares the following instructions or data directives as executable code (.code and .text) or as data section (.rodata) respectively. While this has no effect on the generated output, it tells the validator not to validate immediate data embedded in the code.

If you specify neither of these directives, vc4asm will automatically detect code or data, i.e. everything emitted by any GPU instruction will be marked as code and everything emitted by a data directive like .int will be treated as data. Normally this hits the nail on the head and you need not worry about it.