VC4ASM - `vc4.qinc` include

This file contains several useful macros for VC4 programming.
To use this file, add `.include <vc4.qinc>` to the source or pass the command line option `-i vc4.qinc`.

Expression predicates

Constants

isConstant(x): Evaluates true if and only if x is a constant expression, i.e. no register, no semaphore and no label.
isLdPE(x) isLdPES(x) isLdPEU(x): Evaluates true if x is a per QPU element constant expression. isLdPES (signed) is false when the constant contains values greater than 1, isLdPEU (unsigned) is false when the constant contains negative values.
isSmallImmd(x): Evaluates true if x is a constant expression that fits into a small immediate field.
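
As a rough host-side illustration of what "fits into a small immediate field" and the per-element limits mean, the following Python sketch models the value ranges only. It assumes the usual VideoCore IV encodings: small immediates cover the integers -16..15 and the floats 2^-8..2^7, and per-element loads carry 2 bits per element, i.e. -2..1 signed or 0..3 unsigned. All function names are hypothetical, and this is not how vc4.qinc implements the predicates.

```python
# Hypothetical host-side model of value ranges; NOT the vc4.qinc implementation.

# Assumption: the float small immediates are the powers of two 2**-8 .. 2**7.
SMALL_IMMD_FLOATS = {2.0 ** n for n in range(-8, 8)}


def fits_small_immediate(x):
    """True if x is one of the values assumed to be directly encodable."""
    if isinstance(x, bool):          # bool is an int subclass in Python; exclude it
        return False
    if isinstance(x, int):
        return -16 <= x <= 15        # assumed integer range of the field
    if isinstance(x, float):
        return x in SMALL_IMMD_FLOATS
    return False


def fits_per_element_signed(elems):
    """True if 16 element values all fit into 2 signed bits (-2..1)."""
    return len(elems) == 16 and all(-2 <= e <= 1 for e in elems)


def fits_per_element_unsigned(elems):
    """True if 16 element values all fit into 2 unsigned bits (0..3)."""
    return len(elems) == 16 and all(0 <= e <= 3 for e in elems)
```

Under these assumptions a vector that contains only 0 and 1 passes both per-element checks, while a single -1 or a single 2 rules out one of them.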

Registers

isRegister(x): Evaluates true if x is a register expression, including semaphore registers.
isRegfileA(x) isRegfileB(x): Evaluates true if x is a register from register file A/B, including peripheral registers.
isAccu(x): Evaluates true if x is an accumulator, i.e. r0..r5.
isReadable(x) isWritable(x): Evaluates true if x is a readable/writable register.
isRotate(x): Evaluates true if x is a register with a vector rotation attribute, e.g. r5<<2.
isSemaphore(x): Evaluates true if x is a semaphore register.

VPM/VCD setup helpers

See the Broadcom VideoCore IV 3D Architecture Reference Guide for further details.

VPM read/write

vpm_setup(num,stride,dma): Setup VPM read/write for num items starting at dma and incrementing dma by stride after each item.
h_32(y): Macro for dma component of vpm_setup. Start a horizontal 32 bit transfer at position 0,y in VPM, i.e. QPU elements access 16 consecutive horizontal values in the VPM.
h16p(y,h): Macro for dma component of vpm_setup. Start a horizontal 16 bit packed transfer at position 0,y and half word h in VPM.
h16l(y,h): Macro for dma component of vpm_setup. Start a horizontal 16 bit laned transfer at position 0,y and half word h in VPM.
h8p(y,b): Macro for dma component of vpm_setup. Start a horizontal 8 bit packed transfer at position 0,y and byte b in VPM.
h8l(y,b): Macro for dma component of vpm_setup. Start a horizontal 8 bit laned transfer at position 0,y and byte b in VPM.
v_32(y,x): Macro for dma component of vpm_setup. Start a vertical 32 bit transfer at position x,y in VPM, i.e. QPU elements access 16 consecutive vertical values in the VPM. y must be a multiple of 16 in this mode.
v16p(y,x,h): Macro for dma component of vpm_setup. Start a vertical 16 bit packed transfer at position x,y and half word h in VPM.
v16l(y,x,h): Macro for dma component of vpm_setup. Start a vertical 16 bit laned transfer at position x,y and half word h in VPM.
v8p(y,x,b): Macro for dma component of vpm_setup. Start a vertical 8 bit packed transfer at position x,y and byte b in VPM.
v8l(y,x,b): Macro for dma component of vpm_setup. Start a vertical 8 bit laned transfer at position x,y and byte b in VPM.
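
To make the difference between horizontal and vertical 32 bit access concrete, the following Python sketch lists which VPM cells the 16 QPU elements would touch. It is a host-side model under stated assumptions, not assembler code: the function names are hypothetical, cells are written as (x, y) pairs, and the stride is assumed to advance the row y for horizontal access.

```python
# Hypothetical model of the VPM cells touched by 32 bit accesses.
QPU_ELEMENTS = 16  # one 32 bit value per QPU element


def horizontal_32_cells(y, num=1, stride=1):
    """Cells for `num` horizontal accesses starting at position 0,y.

    Each access covers 16 consecutive x positions in one row; the row is
    assumed to advance by `stride` after each access.
    """
    return [[(x, y + i * stride) for x in range(QPU_ELEMENTS)]
            for i in range(num)]


def vertical_32_cells(y, x):
    """Cells for one vertical access starting at position x,y.

    The access covers 16 consecutive y positions in one column.
    """
    if y % 16 != 0:
        raise ValueError("y must be a multiple of 16 for vertical 32 bit access")
    return [(x, y + j) for j in range(QPU_ELEMENTS)]
```

For example, two horizontal accesses starting at row 2 with stride 4 touch rows 2 and 6, whereas a vertical access at x = 3, y = 16 touches column 3 in rows 16 to 31.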

VCD DMA write access (VDW)

vdw_setup_0(units,depth,dma): Setup VPM DMA write for units rows of length depth starting at dma.
vdw_setup_1(stride): Setup VPM DMA write memory stride from the last item of a row to the first item of the next row. This function activates block mode, i.e. one row in memory is one row or column in VPM.
vdw_setup_1p(stride): Setup VPM DMA write memory stride from the last item of a row to the first item of the next row. This function activates packed mode.
dma_h32(y,x): Macro for dma component of vdw_setup_0. Start a horizontal 32 bit transfer at position x,y in VPM, i.e. elements in a row are horizontally aligned in VPM. The y component is incremented after each row.
dma_h16p(y,x,h): Macro for dma component of vdw_setup_0. Start a horizontal, packed 16 bit transfer at position x,y half word h in VPM, i.e. elements in a row are horizontally aligned in VPM.
dma_h8p(y,x,b): Macro for dma component of vdw_setup_0. Start a horizontal, packed 8 bit transfer at position x,y byte b in VPM, i.e. elements in a row are horizontally aligned in VPM.
dma_v32(y,x): Macro for dma component of vdw_setup_0. Start a vertical 32 bit transfer at position x,y in VPM, i.e. elements in a row are vertically aligned in VPM. The x component is incremented after each row.
dma_v16p(y,x,h): Macro for dma component of vdw_setup_0. Start a vertical, packed 16 bit transfer at position x,y half word h in VPM, i.e. elements in a row are vertically aligned in VPM.
dma_v8p(y,x,b): Macro for dma component of vdw_setup_0. Start a vertical, packed 8 bit transfer at position x,y byte b in VPM, i.e. elements in a row are vertically aligned in VPM.
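
The interplay of units, depth, the VPM start position and the memory stride can be sketched with a small Python model that maps every transferred element to its memory address. This is only an illustration under assumptions that are not taken from the text above: the function and parameter names are hypothetical, `base` stands for the destination address programmed separately, elements are 32 bit, and the stride is interpreted as the gap in bytes between the end of one row and the start of the next.

```python
# Hypothetical model of a 32 bit VPM DMA write; not part of vc4.qinc.

def vdw_transfer(units, depth, y, x, stride, base=0, vertical=False, elem_size=4):
    """Return a list of ((vpm_x, vpm_y), memory_address) pairs.

    units    -- number of rows to transfer
    depth    -- number of elements per row
    stride   -- assumed gap in bytes between the end of a row and the next row
    vertical -- False: a row runs along x and y advances per row
                True:  a row runs along y and x advances per row
    """
    result = []
    row_bytes = depth * elem_size
    for r in range(units):
        row_start = base + r * (row_bytes + stride)
        for i in range(depth):
            cell = (x + r, y + i) if vertical else (x + i, y + r)
            result.append((cell, row_start + i * elem_size))
    return result
```

With a stride of 0 the rows end up contiguous in memory; a non-zero stride leaves a gap after each row, which is what makes writing a sub-block of a larger array possible.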

VCD DMA read access (VDR)

vdr_setup_0(mpitch,rowlen,nrows,dma): Setup VPM DMA read for nrows rows of length rowlen starting at dma. mpitch is the distance between two rows in memory. It must either be a power of 2 or 0 to indicate that vdr_setup_1 has to be used.
vdr_setup_1(stride): Setup VPM DMA read memory stride from the last item of a row to the first item of the next row.
vdr_h32(vpitch,y,x): Macro for dma component of vdr_setup_0. Start a horizontal 32 bit transfer at position x,y in VPM, i.e. elements in a row are horizontally aligned in VPM. The y component is incremented by vpitch after each row.
vdr_v32(vpitch,y,x): Macro for dma component of vdr_setup_0. Start a vertical 32 bit transfer at position x,y in VPM, i.e. elements in a row are vertically aligned in VPM. The y component is incremented by vpitch after each row.
further modes need to be defined...
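
The constraint on mpitch and the role of vpitch for horizontal reads can be illustrated with a short Python model. It is a sketch with hypothetical names: `base` stands for the source address programmed separately, elements are assumed to be 32 bit, and only the case of a non-zero power-of-2 mpitch is modelled.

```python
# Hypothetical model of a horizontal 32 bit VPM DMA read; not part of vc4.qinc.

def mpitch_mode(mpitch):
    """Classify mpitch: 'inline' for a power of 2, 'extended' for 0."""
    if mpitch == 0:
        return "extended"   # pitch has to come from the separate stride setup
    if mpitch > 0 and (mpitch & (mpitch - 1)) == 0:
        return "inline"     # pitch can be given directly in the basic setup
    raise ValueError("mpitch must be a power of 2 or 0")


def vdr_h32_rows(mpitch, rowlen, nrows, vpitch, y, x, base=0):
    """Return per row the memory start address and the VPM cells filled."""
    if mpitch_mode(mpitch) != "inline":
        raise ValueError("this sketch only models a power-of-2 mpitch")
    return [
        {
            "memory": base + r * mpitch,  # rows are mpitch bytes apart
            "vpm": [(x + i, y + r * vpitch) for i in range(rowlen)],
        }
        for r in range(nrows)
    ]
```

For instance, reading two rows of four elements with mpitch = 64 and vpitch = 1 from address 0x1000 fetches the second row from 0x1040 and places it one VPM row below the first.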

Bit operations

countBits(x): Number of set bits in x.
reverseBits(x,n): Rightmost n bits of x in reversed order.
reverseBits4(x) reverseBits8(x) reverseBits16(x) reverseBits32(x) reverseBits64(x): Shorthand for reverseBits(x,4/8/16/32/64), also faster.
ilog2(x): Index of the highest set bit in x, i.e. 0 → -1, 1 → 0, 2 → 1, 3 → 1, 4 → 2 ...
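
The semantics of these functions can be checked against a small Python reference model. The names below are hypothetical Python equivalents, the model only covers non-negative integers, and it says nothing about how the macros are implemented in vc4.qinc.

```python
# Reference model of the bit operations for non-negative integers.

def count_bits(x):
    """Number of set bits in x."""
    return bin(x).count("1")


def reverse_bits(x, n):
    """Rightmost n bits of x in reversed order; higher bits are dropped."""
    result = 0
    for i in range(n):
        if x & (1 << i):
            result |= 1 << (n - 1 - i)
    return result


def ilog2(x):
    """Index of the highest set bit in x, or -1 for x == 0."""
    return x.bit_length() - 1
```

For example, reversing the rightmost 4 bits of 0b0011 gives 0b1100, and the model reproduces the ilog2 sequence listed above for the inputs 0 to 4.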

Numerical constants

Name	Value	Description
`M_E`	2.7182818284590452354	e (Euler number)
`M_LOG2E`	1.4426950408889634074	log₂ e
`M_LOG10E`	0.43429448190325182765	log₁₀ e
`M_PI`	3.14159265358979323846	π
`M_2PI`	6.28318530717958647693	2π
`M_PI_2`	1.57079632679489661923	π/2
`M_PI_4`	0.78539816339744830962	π/4
`M_1_PI`	0.31830988618379067154	1/π
`M_2_PI`	0.63661977236758134308	2/π
`M_2_SQRTPI`	1.12837916709551257390	2/√π
`M_SQRT2`	1.41421356237309504880	√2
`M_SQRT1_2`	0.70710678118654752440	1/√2
`M_NAN`	NaN	not a number
`M_INF`	Inf	infinity

Broadcom compatibility helpers

`sacq(i) srel(i)`

Alternative way to specify the semaphore registers sacq0..15 and srel0..15.
