.const identifier, expression
.set identifier, expression
.lconst identifier, expression
.lset identifier, expression
.const identifier(argument1, argument2 ...) expression
.set identifier(argument1, argument2 ...) expression
.lconst identifier(argument1, argument2 ...) expression
.lset identifier(argument1, argument2 ...) expression
Constants can be used wherever an expression is allowed.
There is an important difference between constants with arguments and
those without. Constants without arguments are evaluated at the time of
the definition, i.e. if they use other constants in the expression body
the values are taken at the context of definition.
As soon as the argument braces are used this behavior changes. These
definitions are evaluated at the time and from the context of invocation.
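The difference can be modeled in a few lines of Python. This is only an illustrative sketch, not how vc4asm is implemented: the dictionary `symbols` and the names `base`, `plain` and `with_arg` are hypothetical. A constant without arguments stores a value computed at definition; a constant with arguments stores the expression and reads other symbols when it is invoked.

```python
# Hypothetical model of a symbol table (not vc4asm internals).
symbols = {}

# .set base, 10
symbols["base"] = 10

# .set plain, base + 1         -> evaluated now, stores the value 11
symbols["plain"] = symbols["base"] + 1

# .set with_arg(n) base + n    -> stores the expression, evaluated on invocation
symbols["with_arg"] = lambda n: symbols["base"] + n

# .set base, 20                -> base is redefined afterwards
symbols["base"] = 20

plain_value = symbols["plain"]           # still based on base = 10
invoked_value = symbols["with_arg"](1)   # sees base = 20 at invocation
print(plain_value, invoked_value)        # prints: 11 21
```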
The assignment of .const and .lconst is final, i.e. a second assignment to the same identifier is an error. However, a final assignment prevents neither shadowing in nested contexts nor removal by .unset.
Further aliases for .set are: .define, .equ.
.const ra_link_0, ra0
.set vpm_setup(num, stride, dma) (num & 0xf) << 20 | (stride & 0x3f) << 12 | (dma & 0xfff)
.set v32(y, x) 0x200 | (y & 0x30) | (x & 0xf)
mov vw_setup, vpm_setup(1, 1, v32(0,0))
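To see which value ends up in vw_setup, the two definitions can be transcribed to Python and evaluated by hand. This is a sketch that assumes vc4asm applies the same operator precedence as C and Python (shifts bind tighter than the bitwise or); the Python function names simply mirror the constants above.

```python
def v32(y, x):
    # .set v32(y, x) 0x200 | (y & 0x30) | (x & 0xf)
    return 0x200 | (y & 0x30) | (x & 0xf)

def vpm_setup(num, stride, dma):
    # .set vpm_setup(num, stride, dma) (num & 0xf) << 20 | (stride & 0x3f) << 12 | (dma & 0xfff)
    return (num & 0xf) << 20 | (stride & 0x3f) << 12 | (dma & 0xfff)

inner = v32(0, 0)                 # 0x200
value = vpm_setup(1, 1, inner)    # 0x100000 | 0x1000 | 0x200
print(hex(value))                 # prints: 0x101200
```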
.unset identifier
.lunset identifier
.lunset only removes the identifier from the current local context.
.local
#...
.endloc
.local/.endloc has no direct effect on the generated code but it creates a local context for symbols that can be set by .lset. All local symbols go out of scope at the end of the block. The behavior is similar to { } in C and similar languages.
.local
.lset count, ra2
mov count, unif
:.1
# some loop body here
sub.setf count, count, 1
brr.allnz -, r:.1
nop
nop
nop
.endloc
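The scoping rules can be pictured as a stack of symbol tables. The following Python sketch is a simplified model under that assumption, not the vc4asm implementation; the class `Scopes` and its method names are hypothetical, and what vc4asm does when .lunset names an unknown identifier is not modeled.

```python
class Scopes:
    """Simplified model: one dictionary per nested context."""

    def __init__(self):
        self.stack = [{}]              # outermost context

    def local(self):                   # .local
        self.stack.append({})

    def endloc(self):                  # .endloc: local symbols go out of scope
        self.stack.pop()

    def lset(self, name, value):       # .lset: define in the current context
        self.stack[-1][name] = value

    def lunset(self, name):            # .lunset: current context only
        self.stack[-1].pop(name, None)

    def lookup(self, name):            # innermost definition wins (shadowing)
        for context in reversed(self.stack):
            if name in context:
                return context[name]
        raise KeyError(name)

scopes = Scopes()
scopes.lset("count", "ra1")
scopes.local()
scopes.lset("count", "ra2")            # shadows the outer definition
inner = scopes.lookup("count")
scopes.endloc()
outer = scopes.lookup("count")
print(inner, outer)                    # prints: ra2 ra1
```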
.func identifier(argument1, argument2 ...)
body
.endf
Functions are similar to constants with parameters, but their body spans multiple lines. This has the side effect that you can use .if or .lset to do the calculation of the result.
.func vpm_setup(num, stride, dma)
.assert num <= 16 && num > 0
.assert stride <= 64 && stride > 0
.assert (dma & ~0xfff) == 0
(num & 0xf) << 20 | (stride & 0x3f) << 12 | dma
.endf
.func v32(y, x)
.assert (y & ~0x30) == 0
.assert (x & ~0xf) == 0
0x200 | y | x
.endf
mov vw_setup, vpm_setup(1, 1, v32(0,0))
The example above provides a checked version of the .set example.
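Why the assertions matter can be shown with a Python transcription of both variants. This is a sketch only; the function names are hypothetical and a ValueError stands in for a failed .assert. Without the checks, the masks silently truncate an out-of-range argument, so a typo yields a valid-looking but wrong setup value.

```python
def vpm_setup_unchecked(num, stride, dma):
    # Masked version: out-of-range arguments are silently truncated.
    return (num & 0xf) << 20 | (stride & 0x3f) << 12 | (dma & 0xfff)

def vpm_setup_checked(num, stride, dma):
    # Mirrors the three .assert lines of the .func version.
    if not (0 < num <= 16):
        raise ValueError("num out of range")
    if not (0 < stride <= 64):
        raise ValueError("stride out of range")
    if (dma & ~0xfff) != 0:
        raise ValueError("dma out of range")
    return (num & 0xf) << 20 | (stride & 0x3f) << 12 | dma

def is_rejected(num, stride, dma):
    try:
        vpm_setup_checked(num, stride, dma)
    except ValueError:
        return True
    return False

# num = 17 is masked to 1, so the unchecked form cannot tell them apart.
silently_wrong = vpm_setup_unchecked(17, 1, 0x200) == vpm_setup_unchecked(1, 1, 0x200)
print(silently_wrong, is_rejected(17, 1, 0x200), is_rejected(1, 1, 0x200))
# prints: True True False
```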
.macro identifier, argument1, argument2 ...
your code
...
.endm
A macro inserts a block of code at the point where it is invoked. The code may depend on arguments. In contrast to functions, macros may emit code, and they may also contain other directives.
Macro arguments may be expressions of any type, including registers, but they cannot be incomplete expressions like operators or unresolved identifiers. The arguments are evaluated at the time of macro invocation rather than when they are used in the macro body, so they cannot depend on code in the macro body.
Header of a subroutine. The entry point address is assigned to a register.
.macro proc, rx_ptr, label
brr rx_ptr, label
nop
nop
nop
.endm
proc ra23, r:1f
subroutine body
...
:1
.rep identifier, count
your_code
...
.endr
Acquire all 15 QPU semaphores.
.rep i, 15
sacq i
.endr
.foreach identifier, expr1, expr2, ...
your_code
...
.endfor
Clear a bunch of registers.
.foreach reg, ra0, rb1, ra2, rb2, ra4, rb4
;mov reg, 0;
.endfor
.back count
your_code
...
.endb
The code between .back and .endb is inserted before the last count instructions rather than at the current location. This can be quite useful when dealing with macros and branch instructions. But be aware that there might be dependencies, e.g. the inserted code might modify registers or flags that are used by the last instructions. vc4asm will not check for that.
some_macro
.back 3
brr -, r:loop
.endb
The code above inserts the branch instruction before the last three instructions emitted by the macro some_macro, or before earlier code if the macro emitted fewer than three instructions.
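The effect of .back can be sketched as a list insertion. The Python model below is an illustration only: `emit_back` and the instructions attributed to some_macro are hypothetical, and raising an error when fewer than count instructions exist is an assumption of the model rather than documented vc4asm behavior.

```python
def emit_back(code, count, block):
    """Insert block before the last `count` instructions of code."""
    if count > len(code):
        raise ValueError("fewer than count instructions emitted so far")
    position = len(code) - count
    return code[:position] + block + code[position:]

# Hypothetical output of some_macro: four instructions.
emitted = ["add r0, r0, r1", "mov ra1, r0", "mov ra2, r1", "mov ra3, r2"]

result = emit_back(emitted, 3, ["brr -, r:loop"])
for instruction in result:
    print(instruction)
# The last three instructions of the macro now follow the branch
# and fill its three delay slots instead of nop.
```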
.clone label, count
.clone copies count instructions starting at label and inserts them at the current location. It is intended to optimize branch instructions that cannot be placed earlier. The idea is to copy the first few instructions of the branch target instead of using nop.
You should not clone branch instructions or immediate values computed from label differences. This is not reliable with a two-pass assembler.
brr -, r:target + 3*8
.clone :target, 3
This copies the first 3 instructions of :target after the branch and branches to the code that follows those 3 instructions at :target.
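The offset 3*8 in the branch follows from the instruction size: each QPU instruction is 64 bits, i.e. 8 bytes. The Python sketch below models the example with placeholder instruction names (`t0` to `t4` are hypothetical) and checks that the cloned instructions plus the remainder of the target give the same execution order as the target itself.

```python
INSTRUCTION_SIZE = 8   # bytes; one QPU instruction is 64 bits wide

def clone(code, start, count):
    """Model of .clone: copy count instructions beginning at index start."""
    return code[start:start + count]

# Placeholder instructions at :target.
target = ["t0", "t1", "t2", "t3", "t4"]

# .clone :target, 3  -> copies placed right after the branch
delay_slots = clone(target, 0, 3)

# brr -, r:target + 3*8  -> skip the instructions that were already executed
branch_offset = 3 * INSTRUCTION_SIZE
resume_index = branch_offset // INSTRUCTION_SIZE

executed = delay_slots + target[resume_index:]
print(branch_offset, executed)
# prints: 24 ['t0', 't1', 't2', 't3', 't4']
```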
.if condition
your code
...
.elseif condition
another code
...
.else
alternate code
...
.endif
.ifset identifier
your code
...
.else
alternate code
...
.endif
.assert condition
.include "filename"
.include <filename>
An included file denotes a local context. Definitions that are local like .lset are only valid within the included file and sub includes.
.byte constant, constant ... # 8 bit integers
.short constant, constant ... # 16 bit integers
.int constant, constant ... # 32 bit integers
.long constant, constant ... # 64 bit integers
.bit constant, constant ... # 1 bit boolean
.float constant, constant ... # 32 bit single precision float
.half constant, constant ... # 16 bit half precision float
.double constant, constant ... # 64 bit double precision float (not directly supported by Videocore IV)
The directives above place constants directly in the code. This might be used to load constants by using the return value of a branch instruction as address. Or you may emit opcodes that are unsupported by vc4asm.
The result is always stored in big endian format.
You should ensure that the constants will not accidentally be executed unless they contain valid Videocore IV instructions.
In general you should prefer uniforms over constants in the code because they are easier to access. In most cases it is also more efficient to use ldi. But as soon as you add offsets to the address that depend on the QPU element number or something similar, the latter two are no longer an option.
Load two float constants into r1, r2...
brr r0, r:1f
mov t0s, r0
add r0, r0, 4; ldtmu0 # load first floating point value after :0
mov r1, r4; mov t0s, r0
:0 .float 3.14159, 2.71828, ...
:1 add r0, r0, 4; ldtmu0 # load second floating point value after :0
mov r2, r4
mov t0s, r0
.align bytes
.align bytes, base
Force the current instruction pointer to be aligned to a multiple of bytes, which must be a power of 2.
.align simply uses zeros for padding rather than nop instructions. So do not use it inside executable code. It is intended for data or sub function alignment only.
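How many zero bytes get inserted can be worked out with a short calculation. The Python sketch below is an illustration, not vc4asm code: `align_padding` is a hypothetical helper, it covers only the one-argument form, and the optional base argument is not modeled.

```python
def align_padding(position, alignment):
    """Number of zero bytes needed to reach the next multiple of alignment."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a power of 2")
    return -position % alignment

position = 24                              # current byte offset
padding = align_padding(position, 16)      # .align 16
aligned_position = position + padding
print(padding, aligned_position)           # prints: 8 32

padding_bytes = bytes(padding)             # zeros, not valid nop instructions
```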
::label
.global :label
.global symbol, value
Export a label or some other constant as a global linker symbol in ELF
format. .global has no effect on other output formats.
It is an error to assign different values to the same global symbol.
.global code_start
.global code_size, :end - :start
:start
# some code
:end
Use the ELF output to create an object file (.o).
And in the C language write:
extern char code_start[];
extern char code_size[];
#define code_length ((size_t)&code_size)
memcpy(qpu_memory_buffer, code_start, code_length);
Note that this is just an example. The start, the size and the end of the generated code block are automatically exported as symbols as mentioned here.
.code
.text
.rodata
Declares the following instructions or data directives as executable code (.code and .text) or as a data section (.rodata), respectively. While this has no effect on the generated output, it tells the validator not to validate immediate data embedded in the code.
If you specify none of these directives, vc4asm will automatically detect code or data, i.e. everything emitted by any GPU instruction will be marked as code and everything emitted by a data directive like .int will be treated as data. Normally this should hit the nail on the head and you need not worry about it.