int vcio2 = open("/dev/vcio2", O_RDWR);
Open the vcio2 device for further usage.
Note that all resources acquired through this device are tied to this device handle, so do not close it until you no longer need those resources.
vcio2 device handles cannot reasonably be inherited by or passed to forked process instances. The resources are always tied to the PID that opened the device.
All calls to vcio2 handles are thread-safe.
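Since everything else depends on the device handle, it is worth checking the result of open before going on. A minimal sketch, assuming a hypothetical helper named open_vcio2 that is not part of the driver's header:

```c
#include <fcntl.h>
#include <stdio.h>

/* Hypothetical helper (not part of vcio2): open the device node read/write
 * and print the reason to stderr if that fails.
 * Returns the file descriptor, or a negative value on error. */
static int open_vcio2(const char *path)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        perror(path);
    return fd;
}
```

Typical use would be `int vcio2 = open_vcio2("/dev/vcio2");` followed by an early exit if the result is negative.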
close(vcio2);
Close the vcio2 device and release all resources.
If QPU code is still executing when close is called, close will wait until the execution has completed or timed out, then discard all memory results and close the handle.
Return the vcio2 API version. The high word is the major version, the low word the minor version. Currently 0x00000003, i.e. 0.3.
int version = ioctl(vcio2, IOCTL_GET_VCIO_VERSION, 0);
It is good practice to check the compatibility of the driver version before further use to avoid unexpected results. The inline helper function vcio2_version_is_compatible will do the check:
if (!vcio2_version_is_compatible(version))
// show appropriate error
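For error messages it can help to print the version in readable form. The following sketch splits the returned value into the high and low word as described above; the helper names version_major and version_minor are made up for this example and do not come from the driver's header.

```c
#include <stdint.h>

/* Hypothetical helpers: split the value returned by IOCTL_GET_VCIO_VERSION.
 * The high 16 bits hold the major version, the low 16 bits the minor version. */
static inline unsigned version_major(uint32_t version)
{
    return version >> 16;
}

static inline unsigned version_minor(uint32_t version)
{
    return version & 0xFFFFu;
}
```

With the value 0x00000003 quoted above, these helpers yield major 0 and minor 3.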
Allocate GPU memory. The memory is continuous in physical address and taken from the reserved GPU memory pool.
typedef struct {
  union {
    struct {
      unsigned int size;
      unsigned int alignment;
      unsigned int flags;
    } in;
    struct {
      unsigned int handle;
    } out;
  };
} vcio_mem_allocate;
vcio_mem_allocate buf;
...
int retval = ioctl(vcio2, IOCTL_MEM_ALLOCATE, &buf);
vcio2 keeps track of the allocated memory chunks. As soon as the vcio2 device is closed or the application terminates, the memory is given back to the GPU memory pool. So remember to keep the device open!
Besides performing all memory allocation steps manually, you may also allocate the memory directly by calling mmap with a NULL pointer.
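Because in and out share one union, the handle written by the driver occupies the same storage as the size you passed in. The sketch below prepares a request and makes that overlap visible. The struct is repeated only so the example compiles on its own (real code would take it from the driver's header), make_alloc_request is a hypothetical helper, and the meaning of flags is not assumed here.

```c
#include <string.h>

/* Repeated from the definition above so the sketch is self-contained. */
typedef struct {
    union {
        struct { unsigned int size; unsigned int alignment; unsigned int flags; } in;
        struct { unsigned int handle; } out;
    };
} vcio_mem_allocate;

/* Hypothetical helper: build a zeroed request and fill the input fields.
 * The caller decides what to pass as flags. */
static vcio_mem_allocate make_alloc_request(unsigned int size,
                                            unsigned int alignment,
                                            unsigned int flags)
{
    vcio_mem_allocate buf;
    memset(&buf, 0, sizeof buf);
    buf.in.size = size;
    buf.in.alignment = alignment;
    buf.in.flags = flags;
    return buf;
}
```

After a successful IOCTL_MEM_ALLOCATE, buf.out.handle replaces what was in buf.in.size, so copy the handle into its own variable before reusing buf for another request.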
int retval = ioctl(vcio2, IOCTL_MEM_RELEASE, handle);
Release GPU memory. This also unlocks the memory segment if still locked.
uint32_t addr = handle;
int retval = ioctl(vcio2, IOCTL_MEM_LOCK, &addr);
Lock the memory segment at a physical address.
int retval = ioctl(vcio2, IOCTL_MEM_UNLOCK, handle);
Unlock memory segment and release the binding to a physical address.
Note that unlocking memory has the side effect of invalidating all memory mappings that refer to this segment. vcio2 removes the corresponding PTEs from your process, so you will get a bus error when you try to access a virtual address formerly mapped to this memory block.
Query information about a memory allocation.
typedef struct {
  uint32_t handle;
  uint32_t bus_addr;
  void* virt_addr;
  uint32_t size;
} vcio_mem_query;
vcio_mem_query buf;
...
int retval = ioctl(vcio2, IOCTL_MEM_QUERY, &buf);
All fields in vcio_mem_query are optional on input. Simply leave the unneeded fields zero. The driver will fill in all missing values on successful return. At least one of handle, bus_addr or virt_addr should be filled or you will get EINVAL. EINVAL is also returned when the supplied address or handle does not belong to a memory allocation made via the same device file handle.
You may also pass a memory address from within an allocated area. In this case the driver will change the address to the
beginning of the area. This applies to bus_addr and virt_addr as well.
I.e. the driver will never return partial memory segments. But it depends on the kind of the query what is considered a memory
segment. If you ask for a virtual address you may get smaller chunks because virtual address mappings could cover only a part
of an allocated memory segment. In this case the returned bus_addr may not match the start of the returned handle
but will match the returned start of virt_addr instead.
If you specify size on input the entire range from the start address must be within the same memory segment,
otherwise the driver returns EINVAL. This could be used to verify if an address range is valid.
The same applies if you supply multiple fields, e.g. handle and bus_addr. If they do not match you'll get
EINVAL.
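The range check mentioned above could be wrapped as follows. This is only a sketch under a few assumptions: the struct is repeated so the example compiles without the driver's header, the request code is passed in as a parameter for the same reason (real code would use IOCTL_MEM_QUERY directly), and any failure, including EINVAL, is treated as "not valid". The name bus_range_is_valid is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/* Repeated from the definition above so the sketch is self-contained. */
typedef struct {
    uint32_t handle;
    uint32_t bus_addr;
    void    *virt_addr;
    uint32_t size;
} vcio_mem_query;

/* Hypothetical helper: ask the driver whether [bus_addr, bus_addr + size)
 * lies within a single allocation made through fd.
 * mem_query_request stands in for IOCTL_MEM_QUERY. */
static bool bus_range_is_valid(int fd, unsigned long mem_query_request,
                               uint32_t bus_addr, uint32_t size)
{
    vcio_mem_query q;
    memset(&q, 0, sizeof q);   /* unused fields must stay zero */
    q.bus_addr = bus_addr;
    q.size = size;
    return ioctl(fd, mem_query_request, &q) != -1;
}
```

If the caller needs to tell EINVAL apart from other errors, it can inspect errno after a false result.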
To access the GPU memory from the ARM core you need to map the memory into your virtual address space. Simply use mmap with the vcio2 device handle for this purpose.
uint32_t *mem = mmap(0, size, PROT_READ|PROT_WRITE, MAP_SHARED, vcio2, addr);
vcio2 validates the memory mappings. I.e. you can only map memory that has been previously allocated with the same device handle. Otherwise you get an EACCES error.
Memory mappings cannot be inherited by forked or child processes. vcio2 simply does not support that.
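Note that mmap signals failure with MAP_FAILED rather than a NULL pointer, so the result should be checked before it is dereferenced. A minimal sketch with a hypothetical wrapper named map_gpu_memory:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>

/* Hypothetical helper: map size bytes at offset addr of the device fd.
 * Returns NULL on failure (for example EACCES for memory that was not
 * allocated through the same handle) after printing the reason. */
static uint32_t *map_gpu_memory(int fd, size_t size, off_t addr)
{
    void *mem = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, addr);
    if (mem == MAP_FAILED) {
        perror("mmap");
        return NULL;
    }
    return mem;
}
```

The mapping is removed again with `munmap(mem, size)` using the same size.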
Allocate physical GPU memory with mmap. This will allocate memory, lock it to a physical address and map it into the virtual address space of the current process in one step. You will always get page aligned memory without VC4 L2 cache.
uint32_t* mem = mmap(0, size, PROT_READ|PROT_WRITE, MAP_SHARED, vcio2, 0);
uint32_t bus_address = *mem;
or simply
uint32_t* mem = vcio2_malloc(vcio2, size);
uint32_t bus_address = *mem;
The memory allocated this way is released as soon as you call munmap.
Power on/off the GPU.
int retval = ioctl(vcio2, IOCTL_ENABLE_GPU, flag);
The QPU is automatically powered on at IOCTL_EXEC_QPU and automatically turned off when the last process closes the vcio2 device. So there is normally no need to call this IOCTL explicitly.
Execute QPU code.
typedef struct {
  unsigned int uniforms;
  unsigned int code;
} vcio_exec_qpu_entry;
typedef struct {
  unsigned int num_qpus;
  unsigned int control;
  unsigned int noflush;
  unsigned int timeout;
} vcio_exec_qpu;
vcio_exec_qpu buf;
...
int retval = ioctl(vcio2, IOCTL_EXEC_QPU, &buf);
Although vcio2 does some basic checks to prevent accidental access to invalid memory, it cannot check memory accesses made by the QPU code. So you have to take care to execute only valid QPU code, otherwise the Raspberry Pi might crash. However, in most cases the Raspberry Pi will recover from faults after the timeout and no resources will be lost, which makes GPU development considerably more forgiving.
While QPU code is executing, the Raspbian kernel can no longer access the property channel, which is used for several other purposes, e.g. power management or several firmware calls. Every attempt to call such a function blocks until the QPU code raises a host interrupt or the timeout has elapsed. This is a restriction of the firmware rather than of vcio2.
If the QPU is not yet powered on, the power will be turned on automatically before this request. The power is not turned off afterwards until the device is closed or you explicitly request it with IOCTL_ENABLE_QPU 0, and then only if no other process needs QPU power.
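How the entries are laid out is not spelled out here, so the following is only a sketch under assumptions: one vcio_exec_qpu_entry per QPU (as in the firmware mailbox interface), all QPUs running the same code, and each QPU reading its own block of uniforms at a fixed stride. The helper fill_qpu_entries and its parameters are hypothetical; the struct is repeated so the example compiles on its own.

```c
#include <stddef.h>
#include <stdint.h>

/* Repeated from the definition above so the sketch is self-contained. */
typedef struct {
    unsigned int uniforms;
    unsigned int code;
} vcio_exec_qpu_entry;

/* Hypothetical helper: fill one entry per QPU.
 * code_addr        bus address of the shared QPU code (assumption)
 * uniforms_addr    bus address of the first QPU's uniforms (assumption)
 * uniforms_stride  distance in bytes between per-QPU uniform blocks */
static void fill_qpu_entries(vcio_exec_qpu_entry *entries, size_t num_qpus,
                             uint32_t code_addr, uint32_t uniforms_addr,
                             uint32_t uniforms_stride)
{
    for (size_t i = 0; i < num_qpus; ++i) {
        entries[i].uniforms = uniforms_addr + (uint32_t)i * uniforms_stride;
        entries[i].code = code_addr;
    }
}
```

Both addresses must refer to memory allocated through the same vcio2 handle, since the driver checks the addresses it is given but cannot check what the QPU code itself accesses.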
Enable or disable V3D performance counters for this instance.
int retval = ioctl(vcio2, IOCTL_SET_V3D_PERF_COUNT, enabled);
V3D_PERF_COUNT_QPU_CYCLES_IDLE
V3D_PERF_COUNT_QPU_CYCLES_VERTEX_SHADING
V3D_PERF_COUNT_QPU_CYCLES_FRAGMENT_SHADING
V3D_PERF_COUNT_QPU_CYCLES_VALID_INSTRUCTIONS
V3D_PERF_COUNT_QPU_CYCLES_STALLED_TMU
V3D_PERF_COUNT_QPU_CYCLES_STALLED_SCOREBOARD
V3D_PERF_COUNT_QPU_CYCLES_STALLED_VARYINGS
V3D_PERF_COUNT_QPU_INSTRUCTION_CACHE_HITS
V3D_PERF_COUNT_QPU_INSTRUCTION_CACHE_MISSES
V3D_PERF_COUNT_QPU_UNIFORMS_CACHE_HITS
V3D_PERF_COUNT_QPU_UNIFORMS_CACHE_MISSES
V3D_PERF_COUNT_TMU_TEXTURE_QUADS_PROCESSED
V3D_PERF_COUNT_TMU_TEXTURE_CACHE_MISSES
V3D_PERF_COUNT_VPM_CYCLES_STALLED_VDW
V3D_PERF_COUNT_VPM_CYCLES_STALLED_VCD
V3D_PERF_COUNT_L2C_L2_CACHE_HITS
V3D_PERF_COUNT_L2C_L2_CACHE_MISSES
Performance counters are a limited resource of VideoCore IV. No more than 16 counters can be activated at the same time.
Furthermore, vcio2 currently does not support switching enabled counters for individual QPU executions of different open driver
instances. I.e. no more than 16 counters can be activated at the same time across all vcio2 users. However, if two instances
request the same counter, it will be physically shared. But every instance has its own set of counter values. They are only
active while an execution of that instance is running. In fact, this makes the counter V3D_PERF_COUNT_QPU_CYCLES_IDLE
somewhat useless since it will not count the time between executions.
Get currently activated performance counters of this instance.
uint32_t enabled;
int retval = ioctl(vcio2, IOCTL_GET_V3D_PERF_COUNT, &enabled);
Read all enabled performance counters.
uint32_t counters[16];
int retval = ioctl(vcio2, IOCTL_GET_V3D_PERF_COUNT, &counters);
The counter values are returned in ascending order and disabled counters will not have an empty slot. E.g. if you enabled V3D_PERF_COUNT_QPU_INSTRUCTION_CACHE_HITS|V3D_PERF_COUNT_L2C_L2_CACHE_HITS|V3D_PERF_COUNT_VPM_CYCLES_STALLED_VDW then you will receive exactly 3 values: V3D_PERF_COUNT_QPU_INSTRUCTION_CACHE_HITS in counters[0], V3D_PERF_COUNT_VPM_CYCLES_STALLED_VDW in counters[1] and V3D_PERF_COUNT_L2C_L2_CACHE_HITS in counters[2]. Due to restrictions of VideoCore IV the call will never return more than 16 values.
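Since disabled counters leave no gap, the slot of a given counter equals the number of enabled counters with a lower bit. The sketch below computes that index. It assumes that each V3D_PERF_COUNT_* constant is a single-bit mask (which the use of | above suggests) and that "ascending order" means ascending bit position; perf_counter_slot is a hypothetical helper name.

```c
#include <stdint.h>

/* Hypothetical helper: index into the counters array for one counter bit.
 * enabled  mask passed to IOCTL_SET_V3D_PERF_COUNT
 * counter  a single V3D_PERF_COUNT_* bit (assumed to be a one-bit mask)
 * Returns -1 if counter is not exactly one bit or is not enabled. */
static int perf_counter_slot(uint32_t enabled, uint32_t counter)
{
    if (counter == 0 || (counter & (counter - 1)) != 0)
        return -1;                      /* not a single bit */
    if ((enabled & counter) == 0)
        return -1;                      /* counter is not enabled */

    uint32_t lower = enabled & (counter - 1);   /* enabled bits below counter */
    int slot = 0;
    while (lower != 0) {
        lower &= lower - 1;             /* clear the lowest set bit */
        ++slot;
    }
    return slot;
}
```

A value would then be read as `counters[perf_counter_slot(enabled, V3D_PERF_COUNT_L2C_L2_CACHE_HITS)]` after checking that the returned slot is not negative.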
Reset performance counters of this instance.
int retval = ioctl(vcio2, IOCTL_RESET_V3D_PERF_COUNT, 0);