ATTILA Programming Reference Public


Programming Model

The ATTILA GPU is programmed using a slightly different model than the one used by modern GPUs. The difference is both historical (it is how we implemented the first versions of the simulator) and a consequence of its nature as a simulated GPU.

The simulator is trace driven, and the communication between the software layer (API + driver) and the GPU uses a push model: what we call AGP Transactions are pushed from the API library and GPU driver to the ATTILA simulator or emulator.

We call an AGP Transaction a block of data that is transmitted from host memory to the ATTILA Command Processor or to ATTILA memory. The name AGP is historical; it could be changed now to PCIe Transaction, but it signals that the block of data comes from outside the GPU. There is no attempt to model the actual protocols of the old AGP or current PCIe bus in the PC computer architecture, and our AGP Transactions are not bus protocol transactions in any form. Their closest equivalent in a real GPU is the data blocks stored in ring buffers (command buffers) in the graphics memory aperture, which are used to communicate between the host and the GPU through DMA and/or push/pop models.

ATTILA implements three basic types of AGP Transactions: memory, commands and register updates.

The memory transactions are used to simulate transfers from host to GPU memory. In a real architecture those transfers may use different methods and protocols, usually involving a DMA engine, but ATTILA just accounts, in some cases, for the cycles required by the limited bandwidth between host and GPU. This simple bandwidth model is implemented by the ATTILA Command Processor (which can be considered to model the DMA engine and the AGP/PCIe interface for host-GPU transfers). A more detailed model has not been implemented and is not planned.

The command and register update transactions are used to program the ATTILA GPU through its Command Processor. The ATTILA Command Processor is a simple state machine that tracks the current state of the rendering pipeline and coordinates its different stages. Programming ATTILA is divided in two phases: first the proper state is defined by updating the required registers, then a command is issued to perform an operation. In this programming model commands are just triggers with no associated data; the state required for a given command or operation must already be defined in the corresponding GPU registers.

There is a fourth type of AGP Transaction used for simulation control and debug purposes: events. Events signal different transitions and states; they are used, for example, to mark the start and end of a frame so the number of cycles spent rendering it can be counted, disregarding initialization. In a sense events can be considered the equivalent of hardware performance counters in a real GPU.
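The two-phase model (state first, then a trigger command, plus events for instrumentation) can be sketched as follows. This is an illustrative C++ sketch only: the type and helper names (AGPTransaction, writeRegister, issueCommand, signalEvent) are assumptions and do not match the real ATTILA driver API.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical sketch of the push model described above: register updates
// define state, a command with no payload triggers the operation, and an
// event marks a simulation milestone.
enum class AGPType { RegWrite, Command, Event };

struct AGPTransaction {
    AGPType type;
    uint32_t id;          // register id or command/event id
    uint32_t data;        // register payload (RegWrite only)
    std::string message;  // event message (Event only)
};

std::vector<AGPTransaction> stream;  // transactions pushed in program order

void writeRegister(uint32_t reg, uint32_t value) {
    stream.push_back({AGPType::RegWrite, reg, value, ""});
}

void issueCommand(uint32_t cmd) {
    // Commands carry no data: all required state must already be set.
    stream.push_back({AGPType::Command, cmd, 0, ""});
}

void signalEvent(uint32_t event, const std::string& msg) {
    stream.push_back({AGPType::Event, event, 0, msg});
}
```

A typical sequence would push a few register writes, then the command that consumes that state, then an event marking the milestone.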

AGP Transactions

AGP_WRITE

Name
AGP_WRITE
Description
Write data into GPU memory or into the GPU system memory region.
Parameters
Name Type Description
Address uint32 Address in GPU memory or the system memory region memory mapped to the GPU where to write the data
Data data array The data to write
Size uint32 Size in bytes of the data to write
Sync bool Wait for all previous commands to finish before processing this command
Notes

AGP_READ

Name
AGP_READ
Description
Read data from GPU memory or the GPU system memory region.
Parameters
Name Type Description
Address uint32 Address in GPU memory or the system memory region memory mapped to the GPU from where to read the data
Data data array A buffer where to store the read data
Size uint32 Size in bytes of the data to read
Notes
Not implemented by the simulator.

AGP_PRELOAD

Name
AGP_PRELOAD
Description
Write data into GPU memory or the system memory region mapped for the GPU with no bandwidth cost.
Parameters
Name Type Description
Address uint32 Address in GPU memory or the system memory region memory mapped to the GPU where to write the data
Data data array A buffer with the data to write
Size uint32 Size in bytes of the data to write
Notes
This transaction is used to load data before starting the actual simulation of the trace, for example when skipping the first N frames of a trace. It has zero bandwidth cost and a very small cycle cost (the transaction requires a single simulation cycle in the Command Processor).

AGP_REG_WRITE

Name
AGP_REG_WRITE
Description
Write a GPU register
Parameters
Name Type Description
Register GPURegId GPU register to write
Index uint32 For array GPU registers the index of the GPU register element to write
Data GPURegData Data to write into the register. Depends on the register, can be up to 16 bytes
Notes

AGP_REG_READ

Name
AGP_REG_READ
Description
Read GPU register
Parameters
Name Type Description
Register GPURegId GPU register to read
Index uint32 For array GPU registers the index of the GPU register element to read
Data GPURegData Data read from the register. Type depends on the register, can be up to 16 bytes
Notes
Not implemented in the simulator.

AGP_COMMAND

Name
AGP_COMMAND
Description
Issue command to the GPU
Parameters
Name Type Description
Command CommandId The GPU command to issue
Notes

AGP_INIT_END

Name
AGP_INIT_END
Description
Marks the end of the initialization phase
Parameters
Notes
Used as a hint to the simulator to mark the start of the actual frame simulation.

AGP_EVENT

Name
AGP_EVENT
Description
Signal an event to the GPU Command processor
Parameters
Name Type Description
Event EventID The event signaled to the GPU command processor
Message String A message associated with the event signaled.
Notes
Used to give hints and trigger special actions in the simulator. For example, it is used to mark the start and end of a frame and report the number of simulated cycles spent rendering the frame.

Commands

GPU Commands
Name Description Notes
GPU_RESET Reset the GPU state
GPU_DRAW Draw call. Render a primitive batch using the current state
GPU_SWAPBUFFERS Swap the color buffer front and back buffers The simulator outputs the rendered frame as an image file
GPU_BLIT Start a blit operation using the current state
GPU_CLEARBUFFERS Clear the current color, z and stencil buffers NOT IMPLEMENTED in the simulator
GPU_CLEARZBUFFER Clear the current z buffer NOT IMPLEMENTED in the simulator
GPU_CLEARZSTENCILBUFFER Clear the current z stencil buffer Fast clear: no data is actually written, so it doesn't work for z stencil buffers without compression support
GPU_CLEARCOLORBUFFER Clear the current color back buffer Fast clear: no data is actually written, so it doesn't work for color buffers without compression support
GPU_LOAD_VERTEX_PROGRAM Load a vertex program to the shader instruction cache/memory
GPU_LOAD_FRAGMENT_PROGRAM Load a fragment program to the shader instruction cache/memory
GPU_LOAD_SHADER_PROGRAM Load a shader program to the shader instruction cache/memory
GPU_FLUSHZSTENCIL Flush the z stencil caches
GPU_FLUSHCOLOR Flush the color caches
GPU_RESET_COLOR_STATE Resets the state of the color blocks stored in the color caches to uncompressed state
GPU_SAVE_COLOR_STATE Saves the state of the color blocks stored in the color caches into a memory buffer
GPU_RESTORE_COLOR_STATE Restores the state of the color blocks stored in the color caches from a memory buffer
GPU_RESET_ZSTENCIL_STATE Resets the state of the z stencil blocks stored in the z stencil caches to uncompressed state Clears the Hierarchical Z buffer
GPU_SAVE_ZSTENCIL_STATE Saves the state of the z stencil blocks stored in the z stencil caches into a memory buffer
GPU_RESTORE_ZSTENCIL_STATE Restores the state of the z stencil blocks stored in the z stencil caches from a memory buffer

Registers

General Registers

GPU_STATUS

Name
GPU_STATUS
Type
GPUStatus
Description
Stores the current GPU state.
Stages Affected
Command Processor
Notes
Read only. NOT IMPLEMENTED in the simulator.

GPU_MEMORY

Name
GPU_MEMORY
Type
uint32
Description
Size of the GPU local memory in megabytes.
Stages Affected
Command Processor
Notes
Read only. NOT IMPLEMENTED in the simulator.

GPU_TEXTURE_MEM_ADDR

Name
GPU_TEXTURE_MEM_ADDR
Type
uint32
Description
Base address for texture data in the GPU 32-bit memory space.
Stages Affected
Notes
Deprecated, should be set to default value (0).

GPU_PROGRAM_MEM_ADDR

Name
GPU_PROGRAM_MEM_ADDR
Type
uint32
Description
Base address for shader program data in the GPU 32-bit memory space.
Stages Affected
Notes
Deprecated, should be set to default value (0).

GPU_MCV2_2ND_INTERLEAVING_START_ADDR

Name
GPU_MCV2_2ND_INTERLEAVING_START_ADDR
Type
uint32
Description
Base address in the GPU 32-bit memory space for the memory region that uses the second memory interleaving configured in the Memory Controller.
Stages Affected
Memory Controller
Notes
This register is used by MemoryControllerV2 and not used by the legacy memory controller model.


GPU_SHADER_PROGRAM_ADDRESS

Name
GPU_SHADER_PROGRAM_ADDRESS
Type
uint32[SHADER_TARGETS]
Description
Address in the GPU 32-bit memory space from where to load the shader program
Stages Affected
Command Processor
Notes
This register is part of the new interface for a unified shader instruction memory and is not yet fully used by the software stack.

This register is used by the GPU_LOAD_SHADER_PROGRAM command as the source address for the shader program to load.

Shader Targets
Name Code
VERTEX_TARGET 0
FRAGMENT_TARGET 1
TRIANGLE_TARGET 2
MICROTRIANGLE_TARGET 3
MAX_SHADER_TARGETS 4
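A possible load sequence through this unified interface can be sketched as below. The recording helpers (writeRegisterIndexed and issueCommand) are hypothetical stand-ins for the AGP_REG_WRITE and AGP_COMMAND transaction layer; the order shown follows the state-then-trigger model described in the Programming Model section.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Shader target codes from the table above.
enum ShaderTarget : uint32_t {
    VERTEX_TARGET = 0, FRAGMENT_TARGET = 1,
    TRIANGLE_TARGET = 2, MICROTRIANGLE_TARGET = 3
};

// Recording stand-ins for the AGP transaction layer (hypothetical names).
struct RegWrite { std::string reg; uint32_t index; uint32_t value; };
std::vector<RegWrite> regWrites;
std::vector<std::string> commands;

void writeRegisterIndexed(const std::string& reg, uint32_t index,
                          uint32_t value) {
    regWrites.push_back({reg, index, value});   // would emit AGP_REG_WRITE
}
void issueCommand(const std::string& cmd) { commands.push_back(cmd); }

// Load a fragment program through the unified interface: state, then trigger.
void loadFragmentProgram(uint32_t srcAddr, uint32_t sizeBytes,
                         uint32_t loadPC) {
    writeRegisterIndexed("GPU_SHADER_PROGRAM_ADDRESS", FRAGMENT_TARGET, srcAddr);
    writeRegisterIndexed("GPU_SHADER_PROGRAM_SIZE",    FRAGMENT_TARGET, sizeBytes);
    writeRegisterIndexed("GPU_SHADER_PROGRAM_LOAD_PC", FRAGMENT_TARGET, loadPC);
    issueCommand("GPU_LOAD_SHADER_PROGRAM");    // command carries no data
}
```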

GPU_SHADER_PROGRAM_SIZE

Name
GPU_SHADER_PROGRAM_SIZE
Type
uint32[SHADER_TARGETS]
Description
Size in bytes of the shader program to load
Stages Affected
Command Processor, Shader
Notes
This register is part of the new interface for a unified shader instruction memory and is not yet fully used by the software stack.

This register is used by the GPU_LOAD_SHADER_PROGRAM command as the size in bytes of the shader program to load. See GPU_SHADER_PROGRAM_ADDRESS for the available shader targets.

GPU_SHADER_PROGRAM_LOAD_PC

Name
GPU_SHADER_PROGRAM_LOAD_PC
Type
uint32[SHADER_TARGETS]
Description
Address in the shader instruction memory where to load the shader program
Stages Affected
Command Processor, Shader
Notes

This register is part of the new interface for a unified shader instruction memory and is not yet fully used by the software stack. The valid values for this register are in the range [0, MAX_SHADER_INSTRUCTIONS-1]. This register is used by the GPU_LOAD_SHADER_PROGRAM command as the address in the shader instruction memory where to load the shader program. See GPU_SHADER_PROGRAM_ADDRESS for the available shader targets.

GPU_SHADER_PROGRAM_PC

Name
GPU_SHADER_PROGRAM_PC
Type
uint32[SHADER_TARGETS]
Description
Start address for the shader program used in the next draw call
Stages Affected
Command Processor, Shader
Notes

This register is part of the new interface for a unified shader instruction memory and is not yet fully used by the software stack. See GPU_SHADER_PROGRAM_ADDRESS for the available shader targets.

GPU_SHADER_THREAD_RESOURCES

Name
GPU_SHADER_THREAD_RESOURCES
Type
uint32
Description
Number of thread resources (registers) required per shader thread
Stages Affected
Shader
Notes

This register is part of the new interface for a unified shader instruction memory and is not yet fully used by the software stack. The value assigned to this register limits the maximum number of threads in flight. See GPU_SHADER_PROGRAM_ADDRESS for the available shader targets.

Streamer Registers

GPU_VERTEX_ATTRIBUTE_MAP

Name
GPU_VERTEX_ATTRIBUTE_MAP
Type
uint32[MAX_VERTEX_ATTRIBUTES]
Description
Maps vertex attributes to streams.
Stages Affected
Streamer, Texture Unit
Notes
Valid values for the elements of this register are [0, MAX_STREAM_BUFFERS-1] and ST_INACTIVE_ATTRIBUTE (255). The value ST_INACTIVE_ATTRIBUTE disables the vertex attribute. When an attribute is disabled in the Streamer, the value loaded into the vertex shader is the default value defined for the attribute.
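The selection between stream data and the default value can be sketched as a small predicate. This is an assumed reading of the note above; the helper name and Vec4 type are illustrative, not the simulator's.

```cpp
#include <cassert>
#include <cstdint>

const uint32_t ST_INACTIVE_ATTRIBUTE = 255;  // disables the attribute

struct Vec4 { float x, y, z, w; };

// Sketch: an attribute mapped to a stream reads the stream value; a disabled
// attribute receives the value from GPU_VERTEX_ATTRIBUTE_DEFAULT_VALUE.
Vec4 resolveVertexAttribute(uint32_t mapping, const Vec4& streamValue,
                            const Vec4& defaultValue) {
    return (mapping == ST_INACTIVE_ATTRIBUTE) ? defaultValue : streamValue;
}
```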

GPU_VERTEX_ATTRIBUTE_DEFAULT_VALUE

Name
GPU_VERTEX_ATTRIBUTE_DEFAULT_VALUE
Type
quadfloat[MAX_VERTEX_ATTRIBUTES]
Description
Default value for the vertex attribute
Stages Affected
Streamer, Texture Unit
Notes
The vertex attribute default value is used when the attribute is defined as inactive.

GPU_STREAM_ADDRESS

Name
GPU_STREAM_ADDRESS
Type
uint32[MAX_STREAM_BUFFERS]
Description
Base address of the data stream
Stages Affected
Streamer, Texture Unit
Notes

GPU_STREAM_STRIDE

Name
GPU_STREAM_STRIDE
Type
uint32[MAX_STREAM_BUFFERS]
Description
Stride in bytes for elements in the data stream
Stages Affected
Streamer, Texture Unit
Notes

GPU_STREAM_DATA

Name
GPU_STREAM_DATA
Type
StreamData[MAX_STREAM_BUFFERS]
Description
Format of the elements in the data stream
Stages Affected
Streamer, Texture Unit
Notes
Stream Data Formats
Format Size (Bytes) Description
SD_UNORM8 1 8-bit unsigned normalized [0, 255] -> [0.0, 1.0]
SD_SNORM8 1 8-bit signed normalized [-128, 127] -> [-1.0, 1.0]
SD_UNORM16 2 16-bit unsigned normalized [0, 65535] -> [0.0, 1.0]
SD_SNORM16 2 16-bit signed normalized [-32768, 32767] -> [-1.0, 1.0]
SD_UNORM32 4 32-bit unsigned normalized [0, 4294967295] -> [0.0, 1.0]
SD_SNORM32 4 32-bit signed normalized [-2147483648, 2147483647] -> [-1.0, 1.0]
SD_FLOAT16 2 16-bit floating point
SD_FLOAT32 4 32-bit floating point
SD_UINT8 1 8-bit unsigned integer
SD_SINT8 1 8-bit signed integer
SD_UINT16 2 16-bit unsigned integer
SD_SINT16 2 16-bit signed integer
SD_UINT32 4 32-bit unsigned integer
SD_SINT32 4 32-bit signed integer
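The normalized conversions can be sketched as below. These follow the usual unorm/snorm conventions (with the most negative snorm value clamped to -1.0); the simulator's exact rounding and clamping behavior may differ, so treat this as an assumption to verify against the source.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Sketch of the normalized-format conversions listed in the table above.
float decodeUNORM8(uint8_t v)   { return v / 255.0f; }
float decodeSNORM8(int8_t v)    { return std::max(v / 127.0f, -1.0f); }
float decodeUNORM16(uint16_t v) { return v / 65535.0f; }
float decodeSNORM16(int16_t v)  { return std::max(v / 32767.0f, -1.0f); }
```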

GPU_STREAM_ELEMENTS

Name
GPU_STREAM_ELEMENTS
Type
uint32[MAX_STREAM_BUFFERS]
Description
Number of elements per attribute
Stages Affected
Streamer, Texture Unit
Notes
Valid values are 1 to 4.

GPU_STREAM_FREQUENCY

Name
GPU_STREAM_FREQUENCY
Type
uint32[MAX_STREAM_BUFFERS]
Description
Defines the frequency for sampling the stream
Stages Affected
Streamer
Notes
A value of 0 means the stream is sampled per index/vertex. A value N greater than 0 means the stream is sampled once every N instances.
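The element selection implied by the frequency can be sketched as a one-line function. The semantics are assumed from the note above (per-vertex fetch when the frequency is 0, one element shared by N consecutive instances otherwise); the function name is illustrative.

```cpp
#include <cassert>
#include <cstdint>

// Sketch of how GPU_STREAM_FREQUENCY selects the stream element to fetch.
uint32_t streamElement(uint32_t freq, uint32_t vertexIndex,
                       uint32_t instanceIndex) {
    // freq == 0: the stream is sampled per index/vertex.
    // freq == N > 0: the same element is reused for N consecutive instances.
    return (freq == 0) ? vertexIndex : (instanceIndex / freq);
}
```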


GPU_D3D9_COLOR_STREAM

Name
GPU_D3D9_COLOR_STREAM
Type
bool[MAX_STREAM_BUFFERS]
Description
Defines D3D9 color ordering for the attribute elements
Stages Affected
Streamer, Texture Unit
Notes
By default this register is set to FALSE and components are read in little-endian order: elements 0 to 3 from low to high memory address. When this register is set to TRUE the order in memory changes to element 3 (alpha) first, followed by elements 2 to 0 (BGR).


GPU_STREAM_START

Name
GPU_STREAM_START
Type
uint32
Description
Defines the first index for the next draw command
Stages Affected
Streamer
Notes
For indexed draw calls this register is the offset to the first index to read from the index stream.

For non-indexed draw calls this is the value of the first index.
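The two cases can be sketched as a small selector. The semantics are assumed from the two notes above (offset into the index stream for indexed draws, literal first index otherwise); the function name is illustrative.

```cpp
#include <cassert>
#include <cstdint>

// Sketch of how GPU_STREAM_START selects the first index depending on
// GPU_INDEX_MODE.
uint32_t firstFetchedIndex(bool indexedMode, uint32_t streamStart,
                           const uint32_t* indexStream) {
    // Indexed draw: streamStart is an offset into the index stream.
    // Non-indexed draw: streamStart itself is the first index value.
    return indexedMode ? indexStream[streamStart] : streamStart;
}
```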

GPU_STREAM_COUNT

Name
GPU_STREAM_COUNT
Type
uint32
Description
Number of indices/vertices to process in the next draw command
Stages Affected
Streamer, Primitive Assembly
Notes

GPU_STREAM_INSTANCES

Name
GPU_STREAM_INSTANCES
Type
uint32
Description
Defines the number of instances of the geometry batch to render in the next draw call
Stages Affected
Streamer, Primitive Assembly
Notes

GPU_INDEX_MODE

Name
GPU_INDEX_MODE
Type
bool
Description
Defines if the next draw call will be indexed or non-indexed
Stages Affected
Streamer
Notes

GPU_INDEX_STREAM

Name
GPU_INDEX_STREAM
Type
int32
Description
Defines the stream from which indices are read
Stages Affected
Streamer
Notes
The valid values for this register are [0, MAX_STREAM_BUFFERS-1].

Implementation detail: The value of this register must be set before setting the GPU_STREAM_ADDRESS and GPU_STREAM_DATA registers for the stream, otherwise the stream information isn't sent to the stage that fetches indices.

GPU_ATTRIBUTE_LOAD_BYPASS

Name
GPU_ATTRIBUTE_LOAD_BYPASS
Type
bool
Description
Sets the attribute bypass mode in the Streamer
Stages Affected
Streamer
Notes
When this register is set to TRUE the Streamer doesn't read the vertex attributes from the data streams; it just creates or fetches the vertex indices and sends them to the shader processors. Vertex attributes are then read from the shaders through the Texture Unit, using the LDA instruction and the index received from the Streamer.

Vertex Shading Registers

GPU_VERTEX_PROGRAM

Name
GPU_VERTEX_PROGRAM
Type
uint32
Description
Address in the GPU 32-bit memory space from where to load the vertex shader
Stages Affected
Command Processor
Notes
This register is used by the GPU_LOAD_VERTEX_PROGRAM command as the source address for the vertex shader program to load.

GPU_VERTEX_PROGRAM_SIZE

Name
GPU_VERTEX_PROGRAM_SIZE
Type
uint32
Description
Size in bytes of the vertex shader program to load
Stages Affected
Command Processor, Shader
Notes
This register is used by the GPU_LOAD_VERTEX_PROGRAM command as the size in bytes of the vertex shader program to load.

GPU_VERTEX_PROGRAM_PC

Name
GPU_VERTEX_PROGRAM_PC
Type
uint32
Description
Address in shader instruction memory of the vertex shader program
Stages Affected
Command Processor, Shader
Notes
The valid values for this register are in the range [0, MAX_SHADER_INSTRUCTIONS-1].

This register is used by the GPU_LOAD_VERTEX_PROGRAM command as the address in the shader instruction memory where to load the vertex shader program. This register is used as the pointer to the vertex shader program to use for the next draw call command.

GPU_VERTEX_THREAD_RESOURCES

Name
GPU_VERTEX_THREAD_RESOURCES
Type
uint32
Description
Number of thread resources (registers) required per vertex shader thread
Stages Affected
Shader
Notes
The value assigned to this register limits the maximum number of threads on the fly.

GPU_VERTEX_CONSTANT

Name
GPU_VERTEX_CONSTANT
Type
quadfloat[MAX_SHADER_CONSTANTS]
Description
Vertex shader constant bank
Stages Affected
Shader
Notes


GPU_VERTEX_OUTPUT_ATTRIBUTE

Name
GPU_VERTEX_OUTPUT_ATTRIBUTE
Type
bool[MAX_VERTEX_ATTRIBUTES]
Description
Defines if a vertex attribute is generated by the current vertex shader program.
Stages Affected
Primitive Assembly, Shader
Notes
Limits vertex bandwidth between the Shader, Streamer and Primitive Assembly stages.

Primitive Assembly Registers

GPU_PRIMITIVE

Name
GPU_PRIMITIVE
Type
PrimitiveType
Description
Defines the primitive type to be used in the next draw call.
Stages Affected
Primitive Assembly
Notes
Primitive Types
Type Description Graphic
TRIANGLE Three new indices/vertices are grouped to form a new triangle File:Primitive-triangle.png
TRIANGLE_STRIP After the first two indices/vertices every new index/vertex forms a new triangle with the two previous indices/vertices File:Primitive-triangle-strip.png
TRIANGLE_FAN After the first two indices/vertices every new index/vertex forms a new triangle with the first index/vertex and the previous index/vertex File:Primitive-triangle-fan.png
QUAD Four new indices/vertices are grouped to form two new triangles File:Primitive-quad.png
QUAD_STRIP After the first two indices/vertices every two new indices/vertices form a new quad (two triangles) with the two previous indices/vertices File:Primitive-quad-strip.png
LINE Two new indices/vertices are grouped to form a new line segment. NOT IMPLEMENTED File:Primitive-line.png
LINE_STRIP After the first index/vertex every new index/vertex forms a new line segment with the previous index/vertex. NOT IMPLEMENTED File:Primitive-line-strip.png
LINE_FAN After the first index/vertex every new index/vertex forms a new line segment with the first index/vertex. NOT IMPLEMENTED File:Primitive-line-fan.png
POINT Every index/vertex defines a point or point sprite. NOT IMPLEMENTED File:Primitive-point.png

Clipper Registers

GPU_FRUSTUM_CLIPPING

Name
GPU_FRUSTUM_CLIPPING
Type
bool
Description
Defines if frustum clipping is enabled
Stages Affected
Clipper
Notes
This register is used to enable trivial reject of triangles completely outside the frustum/clipping volume.

GPU_USER_CLIP

Name
GPU_USER_CLIP
Type
quadfloat[MAX_USER_CLIP_PLANES]
Description
Defines a user clip plane
Stages Affected
Clipper
Notes
NOT IMPLEMENTED

GPU_USER_CLIP_PLANE

Name
GPU_USER_CLIP_PLANE
Type
bool[MAX_USER_CLIP_PLANES]
Description
Defines if a user clip plane is enabled
Stages Affected
Clipper
Notes
NOT IMPLEMENTED

Rasterization Registers

GPU_FACEMODE

Name
GPU_FACEMODE
Type
FaceMode
Description
Defines the vertex order for front facing triangles
Stages Affected
Triangle Setup
Notes
Use GPU_CW for clockwise order and GPU_CCW for counter-clockwise order.

GPU_CULLING

Name
GPU_CULLING
Type
CullMode
Description
Defines which triangles are culled based on the direction the triangle is facing
Stages Affected
Triangle Setup
Notes
Triangle Cull Modes
Mode Description
NONE No triangle is culled
FRONT Front facing triangles are culled
BACK Back facing triangles are culled
FRONT_AND_BACK All triangles are culled

GPU_HIERARCHICALZ

Name
GPU_HIERARCHICALZ
Type
bool
Description
Used to enable or disable the Hierarchical Z test on blocks of fragments
Stages Affected
Hierarchical Z
Notes
Hierarchical Z can only be enabled when certain conditions are fulfilled: the Hierarchical Z buffer and the Z buffer were built using the GPU_LEQUAL function, depth test is enabled, the current depth comparison function is GPU_LEQUAL or GPU_EQUAL, and the stencil z-fail operation is set to STENCIL_KEEP.

IMPLEMENTATION DEPENDENT. EXPAND AND VERIFY.
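The conditions above can be collected into a single predicate. Since the original text marks them implementation dependent, treat this as a sketch to verify against the simulator source; the function and parameter names are illustrative.

```cpp
#include <cassert>

enum CompareFn { GPU_LESS, GPU_LEQUAL, GPU_EQUAL };
enum StencilOp { STENCIL_KEEP, STENCIL_ZERO };

// Sketch of the conditions listed above for enabling the Hierarchical Z test.
bool hierarchicalZAllowed(bool buffersBuiltWithLEqual, bool depthTestEnabled,
                          CompareFn depthFn, StencilOp stencilZFailOp) {
    return buffersBuiltWithLEqual && depthTestEnabled &&
           (depthFn == GPU_LEQUAL || depthFn == GPU_EQUAL) &&
           stencilZFailOp == STENCIL_KEEP;
}
```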

GPU_EARLYZ

Name
GPU_EARLYZ
Type
bool
Description
Used to enable or disable the Early Z Read/Test/Write
Stages Affected
Fragment FIFO
Notes

Conditions for enabling Early Z Read/Test/Write:

  • The fragment shader program doesn't kill the fragment (KIL and CMPKIL instructions)
  • The fragment shader doesn't write the fragment depth.

IMPLEMENTATION DEPENDENT. EXPAND AND VERIFY.

GPU_DISPLAY_X_RES

Name
GPU_DISPLAY_X_RES
Type
uint32
Description
Defines the render target and z/stencil buffer width in pixels
Stages Affected
Triangle Setup, Triangle Traversal, Z Stencil Test, Color Write, DAC, Blitter
Notes
Valid values for this register are in the range [0, MAX_DISPLAY_RES_X-1].


GPU_DISPLAY_Y_RES

Name
GPU_DISPLAY_Y_RES
Type
uint32
Description
Defines the render target and z/stencil buffer height in pixels
Stages Affected
Triangle Setup, Triangle Traversal, Z Stencil Test, Color Write, DAC, Blitter
Notes
Valid values for this register are in the range [0, MAX_DISPLAY_RES_Y-1].


GPU_D3D9_PIXEL_COORDINATES

Name
GPU_D3D9_PIXEL_COORDINATES
Type
bool
Description
Defines the pixel at (0,0) as the top left texel in the viewport.
Stages Affected
Triangle Setup, DAC
Notes
In D3D9 the pixel at (0, 0) corresponds with the top left pixel of the viewport and this register must be set to TRUE.

In OpenGL the pixel at (0, 0) corresponds with the bottom left pixel of the viewport and this register must be set to FALSE.


GPU_VIEWPORT_INI_X

Name
GPU_VIEWPORT_INI_X
Type
sint32
Description
Defines the horizontal start point of the viewport relative to the render target
Stages Affected
Triangle Setup, Triangle Traversal, Hierarchical Z
Notes

GPU_VIEWPORT_INI_Y

Name
GPU_VIEWPORT_INI_Y
Type
sint32
Description
Defines the vertical start point of the viewport relative to the render target
Stages Affected
Triangle Setup, Triangle Traversal, Hierarchical Z
Notes

GPU_VIEWPORT_WIDTH

Name
GPU_VIEWPORT_WIDTH
Type
uint32
Description
Defines the width of the viewport in pixels
Stages Affected
Triangle Setup, Triangle Traversal, Hierarchical Z
Notes

GPU_VIEWPORT_HEIGHT

Name
GPU_VIEWPORT_HEIGHT
Type
uint32
Description
Defines the height of the viewport in pixels
Stages Affected
Triangle Setup, Triangle Traversal, Hierarchical Z
Notes

GPU_SCISSOR_TEST

Name
GPU_SCISSOR_TEST
Type
bool
Description
Used to enable the scissor test
Stages Affected
Hierarchical Z
Notes

GPU_SCISSOR_INI_X

Name
GPU_SCISSOR_INI_X
Type
sint32
Description
Defines the horizontal start point for the scissor window relative to the render target
Stages Affected
Hierarchical Z
Notes

GPU_SCISSOR_INI_Y

Name
GPU_SCISSOR_INI_Y
Type
sint32
Description
Defines the vertical start point for the scissor window relative to the render target
Stages Affected
Hierarchical Z
Notes

GPU_SCISSOR_WIDTH

Name
GPU_SCISSOR_WIDTH
Type
uint32
Description
Defines the width in pixels of the scissor window
Stages Affected
Hierarchical Z
Notes

GPU_SCISSOR_HEIGHT

Name
GPU_SCISSOR_HEIGHT
Type
sint32
Description
Defines the height in pixels of the scissor window
Stages Affected
Hierarchical Z
Notes

GPU_DEPTH_RANGE_NEAR

Name
GPU_DEPTH_RANGE_NEAR
Type
float32
Description
Defines position of the near clip plane relative to the z axis
Stages Affected
Triangle Setup, Triangle Traversal
Notes
The valid values for this register are in the range [0.0, 1.0].

Together with GPU_DEPTH_RANGE_FAR, this register defines the range of the interpolated z values for fragments inside the clip volume.

GPU_DEPTH_RANGE_FAR

Name
GPU_DEPTH_RANGE_FAR
Type
float32
Description
Defines position of the far plane relative to the z axis
Stages Affected
Triangle Setup, Triangle Traversal
Notes
The valid values for this register are in the range [0.0, 1.0].

Together with GPU_DEPTH_RANGE_NEAR, this register defines the range of the interpolated z values for fragments inside the clip volume.

GPU_DEPTH_SLOPE_FACTOR

Name
GPU_DEPTH_SLOPE_FACTOR
Type
float32
Description
Defines the depth slope factor
Stages Affected
Triangle Setup, Triangle Traversal
Notes
From OpenGL spec. Used to bias the triangle z.

NOT IMPLEMENTED.

GPU_DEPTH_UNIT_OFFSET

Name
GPU_DEPTH_UNIT_OFFSET
Type
float32
Description
Defines the depth unit offset
Stages Affected
Triangle Setup, Triangle Traversal
Notes
From OpenGL spec. Used to bias the triangle z.

NOT IMPLEMENTED.

GPU_Z_BUFFER_BIT_PRECISSION

Name
GPU_Z_BUFFER_BIT_PRECISSION
Type
uint32
Description
Defines the bit precision of depth values in the z buffer
Stages Affected
Triangle Setup, Triangle Traversal, Hierarchical Z, Z Stencil Test
Notes
The only valid value for this register is 24.

This register will likely be removed in future implementations and replaced by a register defining the depth buffer format.

GPU_D3D9_DEPTH_RANGE

Name
GPU_D3D9_DEPTH_RANGE
Type
bool
Description
Use the D3D9 depth range for the clip volume
Stages Affected
Clipper, Triangle Setup, Triangle Traversal
Notes
In D3D9 the depth range for the clip volume is [0.0, 1.0] and this register must be set to TRUE.

In OpenGL the depth range for the clip volume is [-1.0, 1.0] and this register must be set to FALSE.

GPU_D3D9_RASTERIZATION_RULES

Name
GPU_D3D9_RASTERIZATION_RULES
Type
bool
Description
Use the D3D9 rasterization rules
Stages Affected
Triangle Setup, Triangle Traversal
Notes
The rasterization rules for D3D9 and OpenGL differ in a number of ways. Set the register to TRUE for D3D9 API and to FALSE for the OpenGL API.

In the current implementation this register affects when a triangle is considered front facing and how the viewport transformation is performed.


GPU_TWOSIDED_LIGHTING

Name
GPU_TWOSIDED_LIGHTING
Type
bool
Description
Enable two sided lighting
Stages Affected
Triangle Setup
Notes
When two sided lighting is enabled the vertex shader computes two different colors, one for the triangle front face and one for the triangle back face. When this register is set the fragment color input attribute is selected between these two colors based on the triangle facing.

GPU_MULTISAMPLING

Name
GPU_MULTISAMPLING
Type
bool
Description
Enable multisampling antialiasing (MSAA) for the current render target and z stencil buffer
Stages Affected
Triangle Traversal, Z Stencil Test, Color Write, DAC, Blitter
Notes

GPU_MSAA_SAMPLES

Name
GPU_MSAA_SAMPLES
Type
uint32
Description
Defines the number of samples to generate for multisampling antialiasing (MSAA)
Stages Affected
Triangle Traversal, Z Stencil Test, Color Write, DAC, Blitter
Notes
The valid values for this register are: 0, 2, 4, 8.

GPU_MODIFY_FRAGMENT_DEPTH

Name
GPU_MODIFY_FRAGMENT_DEPTH
Type
bool
Description
The fragment shader program modifies the fragment depth
Stages Affected
Hierarchical Z, Z Stencil Test
Notes
When the fragment shader program computes or modifies the fragment depth this register must be set to TRUE.

When this register is set to TRUE the Hierarchical Z early test is disabled.


GPU_INTERPOLATION

Name
GPU_INTERPOLATION
Type
bool[MAX_FRAGMENT_ATTRIBUTES]
Description
Used to enable fragment attribute interpolation
Stages Affected
Interpolator
Notes
When an element of this register is set to TRUE the corresponding fragment attribute is interpolated from the triangle vertex attributes with perspective correction.

When an element is set to FALSE the fragment attribute is copied from the corresponding attribute of the first vertex of the triangle.

Fragment Shading Registers

GPU_FRAGMENT_INPUT_ATTRIBUTES

Name
GPU_FRAGMENT_INPUT_ATTRIBUTES
Type
bool[MAX_FRAGMENT_ATTRIBUTES]
Description
Used to enable a fragment input attribute
Stages Affected
FragmentFIFO, Interpolator, Shader
Notes

This register controls how many fragment attributes are interpolated or copied by the Interpolator, and therefore how many fragment input registers have defined values in fragment shader threads. Its value affects the fragment throughput of the Interpolator stage, as well as how many resources are consumed per fragment shader thread, which limits the number of threads in execution.

GPU_FRAGMENT_PROGRAM

Name
GPU_FRAGMENT_PROGRAM
Type
uint32
Description
Address in the GPU 32-bit memory space from where to load the fragment shader
Stages Affected
Command Processor
Notes
This register is used by the GPU_LOAD_FRAGMENT_PROGRAM command as the source address for the fragment shader program to load.

GPU_FRAGMENT_PROGRAM_SIZE

Name
GPU_FRAGMENT_PROGRAM_SIZE
Type
uint32
Description
Size in bytes of the fragment shader program to load
Stages Affected
Command Processor, Shader
Notes
This register is used by the GPU_LOAD_FRAGMENT_PROGRAM command as the size in bytes of the fragment shader program to load.

GPU_FRAGMENT_PROGRAM_PC

Name
GPU_FRAGMENT_PROGRAM_PC
Type
uint32
Description
Address in shader instruction memory of the fragment shader program
Stages Affected
Command Processor, Shader
Notes
The valid values for this register are in the range [0, MAX_SHADER_INSTRUCTIONS-1].

This register is used by the GPU_LOAD_FRAGMENT_PROGRAM command as the address in the shader instruction memory where to load the fragment shader program. This register is used as the pointer to the fragment shader program to use for the next draw call command.

GPU_FRAGMENT_THREAD_RESOURCES

Name
GPU_FRAGMENT_THREAD_RESOURCES
Type
uint32
Description
Number of thread resources (registers) required per fragment shader thread
Stages Affected
Shader
Notes
The value assigned to this register limits the maximum number of threads on the fly.

GPU_FRAGMENT_CONSTANT

Name
GPU_FRAGMENT_CONSTANT
Type
quadfloat[MAX_SHADER_CONSTANTS]
Description
Fragment shader constant bank
Stages Affected
Shader
Notes

Texture Registers

GPU_TEXTURE_ENABLE

Name
GPU_TEXTURE_ENABLE
Type
bool[MAX_TEXTURE_UNITS]
Description
Defines whether a texture is attached to the texture unit/stage
Stages Affected
Texture Unit
Notes
The texture unit/stage registers combine two different state sets in a graphic API: texture state and texture sampler state.


GPU_TEXTURE_MODE

Name
GPU_TEXTURE_MODE
Type
TextureMode[MAX_TEXTURE_UNITS]
Description
Defines the type (or mode) for the texture attached to the texture unit/stage
Stages Affected
Texture Unit
Notes
The texture unit/stage registers combine two different state sets in a graphic API: texture state and texture sampler state.


Texture Types
Name Description
GPU_TEXTURE1D One dimensional texture (array)
GPU_TEXTURE2D Two dimensional texture (table)
GPU_TEXTURE3D Three dimensional texture (volume)
GPU_TEXTURECUBEMAP Cubemap, six two dimensional textures of equal dimensions arranged as the six faces of a cube.

GPU_TEXTURE_ADDRESS

Name
GPU_TEXTURE_ADDRESS
Type
uint32[MAX_TEXTURES * MAX_TEXTURE_SIZE * CUBEMAP_IMAGES]
Description
Base address in the GPU 32-bit memory space for the data corresponding to the texture attached to the texture unit/stage
Stages Affected
Texture Unit
Notes
The texture unit/stage registers combine two different state sets in a graphic API: texture state and texture sampler state.

This register defines the base address for each texture attached to the texture units/stages. The base address is defined per mip level for 1D, 2D and 3D surfaces. For cubemaps it's defined per mip level and face. For 3D surfaces slices from the same mip level are stored sequentially, as an array, from the same base address. See how this register is indexed for the different texture types:

Texture Address Register Indexing
Texture Type Indexing
1D
index = textureUnit * MAX_TEXTURE_SIZE * CUBEMAP_IMAGES + mipLevel * CUBEMAP_IMAGES
2D
index = textureUnit * MAX_TEXTURE_SIZE * CUBEMAP_IMAGES + mipLevel * CUBEMAP_IMAGES
cubemap
index = textureUnit * MAX_TEXTURE_SIZE * CUBEMAP_IMAGES + mipLevel * CUBEMAP_IMAGES + face
3D
index = textureUnit * MAX_TEXTURE_SIZE * CUBEMAP_IMAGES + mipLevel * CUBEMAP_IMAGES

MAX_TEXTURE_SIZE is the maximum number of mip levels supported (which also limits the maximum texture dimensions). CUBEMAP_IMAGES is 6, the number of cubemap faces.
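The indexing formulas above can be sketched in plain C. The constant values below are placeholders chosen for illustration (the actual values depend on the simulator configuration); face is 0 for non-cubemap textures.

```c
#include <assert.h>

/* Placeholder values for the architectural constants (assumptions;
   the real values come from the simulator configuration). */
#define MAX_TEXTURE_SIZE 13   /* maximum number of mip levels */
#define CUBEMAP_IMAGES   6    /* number of cubemap faces */

/* Compute the index into the GPU_TEXTURE_ADDRESS register array for a
   texture unit, mip level and cubemap face (face = 0 for 1D/2D/3D). */
unsigned texAddressIndex(unsigned textureUnit, unsigned mipLevel, unsigned face)
{
    return textureUnit * MAX_TEXTURE_SIZE * CUBEMAP_IMAGES
         + mipLevel * CUBEMAP_IMAGES
         + face;
}
```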

GPU_TEXTURE_WIDTH

Name
GPU_TEXTURE_WIDTH
Type
uint32[MAX_TEXTURE_UNITS]
Description
Width in texels of the texture attached to the texture unit/stage
Stages Affected
Texture Unit
Notes
The texture unit/stage registers combine two different state sets in a graphic API: texture state and texture sampler state.

The valid values for this register are in the range [0, (2^MAX_TEXTURE_SIZE)-1].

GPU_TEXTURE_HEIGHT

Name
GPU_TEXTURE_HEIGHT
Type
uint32[MAX_TEXTURE_UNITS]
Description
Height in texels of the texture attached to the texture unit/stage
Stages Affected
Texture Unit
Notes
The texture unit/stage registers combine two different state sets in a graphic API: texture state and texture sampler state.

The valid values for this register are in the range [0, (2^MAX_TEXTURE_SIZE)-1]. The value of this register is ignored for one dimensional textures.

GPU_TEXTURE_DEPTH

Name
GPU_TEXTURE_DEPTH
Type
uint32[MAX_TEXTURE_UNITS]
Description
Depth in texels/slices of the texture attached to the texture unit/stage
Stages Affected
Texture Unit
Notes
The texture unit/stage registers combine two different state sets in a graphic API: texture state and texture sampler state.

The valid values for this register are in the range [0, (2^MAX_TEXTURE_SIZE)-1]. The value of this register is ignored for one and two dimensional textures.

GPU_TEXTURE_WIDTH2

Name
GPU_TEXTURE_WIDTH2
Type
uint32[MAX_TEXTURE_UNITS]
Description
Two's logarithm (rounded up) of the width in texels of the texture attached to the texture unit/stage
Stages Affected
Texture Unit
Notes
The texture unit/stage registers combine two different state sets in a graphic API: texture state and texture sampler state.

The valid values for this register are in the range [0, (2^MAX_TEXTURE_SIZE)-1].
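As a sketch, the value stored in this register can be derived from the texture width as the smallest n with 2^n >= width:

```c
#include <assert.h>

/* Two's logarithm rounded up: smallest n such that (1 << n) >= dim.
   Sketch only; assumes dim > 0 and dim fits in 32 bits. */
unsigned ceilLog2(unsigned dim)
{
    unsigned n = 0;
    while ((1u << n) < dim)
        n++;
    return n;
}
```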

GPU_TEXTURE_HEIGHT2

Name
GPU_TEXTURE_HEIGHT2
Type
uint32[MAX_TEXTURE_UNITS]
Description
Two's logarithm (rounded up) of the height in texels of the texture attached to the texture unit/stage
Stages Affected
Texture Unit
Notes
The texture unit/stage registers combine two different state sets in a graphic API: texture state and texture sampler state.

The valid values for this register are in the range [0, (2^MAX_TEXTURE_SIZE)-1]. The value of this register is ignored for one dimensional textures.

GPU_TEXTURE_DEPTH2

Name
GPU_TEXTURE_DEPTH2
Type
uint32[MAX_TEXTURE_UNITS]
Description
Two's logarithm (rounded up) of the depth in texels/slices of the texture attached to the texture unit/stage
Stages Affected
Texture Unit
Notes
The texture unit/stage registers combine two different state sets in a graphic API: texture state and texture sampler state.

The valid values for this register are in the range [0, (2^MAX_TEXTURE_SIZE)-1]. The value of this register is ignored for one and two dimensional textures.

GPU_TEXTURE_BORDER

Name
GPU_TEXTURE_BORDER
Type
uint32[MAX_TEXTURE_UNITS]
Description
Defines the width in texels of the border for the texture attached to the texture unit/stage
Stages Affected
Texture Unit
Notes
The texture unit/stage registers combine two different state sets in a graphic API: texture state and texture sampler state.

NOT IMPLEMENTED.

GPU_TEXTURE_FORMAT

Name
GPU_TEXTURE_FORMAT
Type
TextureFormat[MAX_TEXTURE_UNITS]
Description
Defines the format of the texture attached to the texture unit/stage
Stages Affected
Texture Unit
Notes
The texture unit/stage registers combine two different state sets in a graphic API: texture state and texture sampler state.

See the Texture Format table for supported formats.

GPU_TEXTURE_REVERSE

Name
GPU_TEXTURE_REVERSE
Type
bool[MAX_TEXTURE_UNITS]
Description
Defines the order in memory of the color components for the texture attached to the texture unit/stage
Stages Affected
Texture Unit
Notes
The texture unit/stage registers combine two different state sets in a graphic API: texture state and texture sampler state.

When the register is set to FALSE the order from lower memory addresses to higher memory addresses is: RGBA, RGB, RG. When the register is set to TRUE the order from lower memory addresses to higher memory addresses is: ABGR, BGR, GR.

if (textureReverse)
{
  newR = A
  newG = B
  newB = G
  newA = R
}

GPU_TEXTURE_D3D9_COLOR_CONV

Name
GPU_TEXTURE_D3D9_COLOR_CONV
Type
bool[MAX_TEXTURE_UNITS]
Description
Use the D3D9 color order for the components of the texture attached to the texture unit/stage.
Stages Affected
Texture Unit
Notes
The texture unit/stage registers combine two different state sets in a graphic API: texture state and texture sampler state.

When the register is set to FALSE the order is as defined by GPU_TEXTURE_REVERSE. When the register is set to TRUE a new swizzling is applied to the order defined by GPU_TEXTURE_REVERSE: RGBA is reinterpreted as ABGR.

if (d3d9ColorConv)
{
  newR = A;
  newG = B;
  newB = G;
  newA = R;
}

GPU_TEXTURE_D3D9_V_INV

Name
GPU_TEXTURE_D3D9_V_INV
Type
bool[MAX_TEXTURE_UNITS]
Description
Flip the vertical texture coordinate for the texture unit/stage.
Stages Affected
Texture Unit
Notes
The texture unit/stage registers combine two different state sets in a graphic API: texture state and texture sampler state.

When this register is set to TRUE the vertical texture coordinate is flipped:

vFlipped = 1 - v

The flip happens after applying the texture wrapping mode and reducing the range of the texture coordinate to [0, 1]. When texture coordinates are provided in texel space (see the GPU_TEXTURE_NON_NORMALIZED register) the value assigned to this register has no effect.

GPU_TEXTURE_COMPRESSION

Name
GPU_TEXTURE_COMPRESSION
Type
TextureCompression[MAX_TEXTURE_UNITS]
Description
Defines the texture compression mode for the texture attached to the texture unit/stage
Stages Affected
Texture Unit
Notes
The texture unit/stage registers combine two different state sets in a graphic API: texture state and texture sampler state.
Compression Modes
Mode Description Compatible Texture Formats
GPU_NO_TEXTURE_COMPRESSION Texture is not compressed All
GPU_S3TC_DXT1_RGB Texture is compressed with the DXT1/BC1 algorithm (see OpenGL S3TC extension or D3D10 spec for details) GPU_RGBA8888
GPU_S3TC_DXT1_RGBA Texture is compressed with the DXT1/BC1 algorithm (with 1-bit alpha) (see OpenGL S3TC extension or D3D10 spec for details) GPU_RGBA8888
GPU_S3TC_DXT3_RGBA Texture is compressed with the DXT3/BC2 algorithm (see OpenGL S3TC extension or D3D10 spec for details) GPU_RGBA8888
GPU_S3TC_DXT5_RGBA Texture is compressed with the DXT5/BC3 algorithm (see OpenGL S3TC extension or D3D10 spec for details) GPU_RGBA8888
GPU_LATC1 Texture is compressed with the LATC1 algorithm (see OpenGL LATC extension for details) GPU_LUMINANCE8
GPU_LATC1_SIGNED Texture is compressed with the LATC1 algorithm (see OpenGL LATC extension for details) GPU_LUMINANCE8_SIGNED
GPU_LATC2 Texture is compressed with the LATC2 algorithm (see OpenGL LATC extension for details) GPU_LUMINANCE8_ALPHA8
GPU_LATC2_SIGNED Texture is compressed with the LATC2 algorithm (see OpenGL LATC extension for details) GPU_LUMINANCE8_ALPHA8_SIGNED

GPU_TEXTURE_BLOCKING

Name
GPU_TEXTURE_BLOCKING
Type
TextureBlocking[MAX_TEXTURE_UNITS]
Description
Defines how the texture is stored in memory for the texture attached to the texture unit/stage
Stages Affected
Texture Unit
Notes
The texture unit/stage registers combine two different state sets in a graphic API: texture state and texture sampler state.
Texture Blocking Modes
Mode Description
GPU_TXBLOCK_TEXTURE Texture blocking format, for normal off-line textures
GPU_TXBLOCK_FRAMEBUFFER Render target blocking format, for textures rendered on-line

GPU_TEXTURE_BORDER_COLOR

Name
GPU_TEXTURE_BORDER_COLOR
Type
QuadFloat[MAX_TEXTURE_UNITS]
Description
Defines the texture border color for the texture attached to the texture unit/stage
Stages Affected
Texture Unit
Notes
The texture unit/stage registers combine two different state sets in a graphic API: texture state and texture sampler state.

The border color is used when sampling outside the texture. NOT IMPLEMENTED.

GPU_TEXTURE_WRAP_S

Name
GPU_TEXTURE_WRAP_S
Type
ClampMode[MAX_TEXTURE_UNITS]
Description
Defines how the horizontal texture coordinate is wrapped/clamped for the texture unit/stage
Stages Affected
Texture Unit
Notes
The texture unit/stage registers combine two different state sets in a graphic API: texture state and texture sampler state.
Texture Coordinate Clamp/Wrap Modes
Mode Description
GPU_TEXT_CLAMP Clamp the texture coordinate to the [0, 1] range.
GPU_TEXT_CLAMP_EDGE Clamp the texture coordinate to the [0, 1] range. Avoid sampling outside the texture
GPU_TEXT_CLAMP_TO_BORDER Clamp the texture coordinate to the texture border
GPU_TEXT_REPEAT Use the fractional part of the texture coordinate to compute the texel addresses
GPU_TEXT_MIRRORED_REPEAT Use the fractional part of the texture coordinate to compute the texel address and flip the fractional part (v = 1 - v) when the integer part of the texture coordinate is odd

GPU_TEXTURE_WRAP_T

Name
GPU_TEXTURE_WRAP_T
Type
ClampMode[MAX_TEXTURE_UNITS]
Description
Defines how the vertical texture coordinate is wrapped/clamped for the texture unit/stage
Stages Affected
Texture Unit
Notes
The texture unit/stage registers combine two different state sets in a graphic API: texture state and texture sampler state.

See GPU_TEXTURE_WRAP_S for the supported texture coordinate clamp/wrap modes.

GPU_TEXTURE_WRAP_R

Name
GPU_TEXTURE_WRAP_R
Type
ClampMode[MAX_TEXTURE_UNITS]
Description
Defines how the depth texture coordinate is wrapped/clamped for the texture unit/stage
Stages Affected
Texture Unit
Notes
The texture unit/stage registers combine two different state sets in a graphic API: texture state and texture sampler state.

See GPU_TEXTURE_WRAP_S for the supported texture coordinate clamp/wrap modes.

GPU_TEXTURE_NON_NORMALIZED

Name
GPU_TEXTURE_NON_NORMALIZED
Type
bool[MAX_TEXTURE_UNITS]
Description
Defines if the texture coordinates are in texel space or normalized space for the texture unit/stage
Stages Affected
Texture Unit
Notes
The texture unit/stage registers combine two different state sets in a graphic API: texture state and texture sampler state.

When this register is set to TRUE the texture coordinates for the texture are provided in texel space. In this mode the texture coordinate clamp/wrap mode has no effect. When this register is set to FALSE the texture coordinates for the texture are provided in normalized space.

GPU_TEXTURE_MIN_FILTER

Name
GPU_TEXTURE_MIN_FILTER
Type
FilterMode[MAX_TEXTURE_UNITS]
Description
Defines the filter method to apply when minifying the texture for the texture unit/stage
Stages Affected
Texture Unit
Notes
The texture unit/stage registers combine two different state sets in a graphic API: texture state and texture sampler state.

Filter Modes
Mode Description
GPU_NEAREST Point sampling, a single texel is sampled from the first mipmap level
GPU_LINEAR Bilinear, four texels are sampled from the first mipmap level
GPU_NEAREST_MIPMAP_NEAREST Point sampling, a single texel is sampled from the nearest mipmap level to the scale factor computed from the 4 neighbour pixel derivatives
GPU_NEAREST_MIPMAP_LINEAR Point sampling, a single texel is sampled from both mipmap levels around the scale factor computed from the 4 neighbour pixel derivatives
GPU_LINEAR_MIPMAP_NEAREST Bilinear, four texels are sampled from the nearest mipmap level to the scale factor computed from the 4 neighbour pixel derivatives
GPU_LINEAR_MIPMAP_LINEAR Bilinear, four texels are sampled from both mipmap levels around the scale factor computed from the 4 neighbour pixel derivatives

GPU_TEXTURE_MAG_FILTER

Name
GPU_TEXTURE_MAG_FILTER
Type
FilterMode[MAX_TEXTURE_UNITS]
Description
Defines the filter method to apply when magnifying the texture for the texture unit/stage
Stages Affected
Texture Unit
Notes
The texture unit/stage registers combine two different state sets in a graphic API: texture state and texture sampler state.

Filter Modes
Mode Description
GPU_NEAREST Point sampling, a single texel is sampled from the first mipmap level
GPU_LINEAR Bilinear, four texels are sampled from the first mipmap level

GPU_TEXTURE_ENABLE_COMPARISON

Name
GPU_TEXTURE_ENABLE_COMPARISON
Type
bool[MAX_TEXTURE_UNITS]
Description
Defines if Percentage-Closer Filtering is enabled for the texture unit/stage
Stages Affected
Texture Unit
Notes
The texture unit/stage registers combine two different state sets in a graphic API: texture state and texture sampler state.

When the register is set to TRUE the sampled texels are first compared against the reference value, provided as a third texture coordinate/parameter, using the comparison function defined in the GPU_TEXTURE_COMPARISON_FUNCTION register. The result of each comparison is a 0.0f (false) or 1.0f (true) value. The comparison results are then filtered using the defined filter method. This register is used to enable PCF (Percentage-Closer Filtering), a filtering algorithm used to sample from shadow maps (depth textures) to smooth shadow borders.
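The comparison-then-filter sequence described above can be sketched as follows; a 2x2 footprint with equal weights and the GPU_LESS comparison function are assumptions made for illustration:

```c
#include <assert.h>

/* Compare each sampled texel against the reference value and filter
   (average) the boolean results, as done for PCF shadow map sampling. */
float pcfSample(const float texels[4], float reference)
{
    float sum = 0.0f;
    for (int i = 0; i < 4; i++)
        sum += (texels[i] < reference) ? 1.0f : 0.0f;  /* GPU_LESS */
    return sum * 0.25f;  /* equal-weight filtering of the results */
}
```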

GPU_TEXTURE_COMPARISON_FUNCTION

Name
GPU_TEXTURE_COMPARISON_FUNCTION
Type
ComparisonMode[MAX_TEXTURE_UNITS]
Description
Defines the comparison mode to use for Percentage-Closer Filtering for the texture unit/stage
Stages Affected
Texture Unit
Notes
The texture unit/stage registers combine two different state sets in a graphic API: texture state and texture sampler state.

This register is used to define PCF (Percentage-Closer Filtering), a filtering algorithm used to sample from shadow maps (depth textures) to smooth shadow borders.

Comparison Functions
Function Description
GPU_NEVER The result of the comparison is FALSE (0.0f)
GPU_ALWAYS The result of the comparison is TRUE (1.0f)
GPU_LESS The result is TRUE (1.0f) if the texel value is less than the reference value, FALSE (0.0f) otherwise
GPU_LEQUAL The result is TRUE (1.0f) if the texel value is less or equal than the reference value, FALSE (0.0f) otherwise
GPU_EQUAL The result is TRUE (1.0f) if the texel value is equal to the reference value, FALSE (0.0f) otherwise
GPU_GEQUAL The result is TRUE (1.0f) if the texel value is greater or equal than the reference value, FALSE (0.0f) otherwise
GPU_GREATER The result is TRUE (1.0f) if the texel value is greater than the reference value, FALSE (0.0f) otherwise
GPU_NOTEQUAL The result is TRUE (1.0f) if the texel value is not equal to the reference value, FALSE (0.0f) otherwise

GPU_TEXTURE_SRGB

Name
GPU_TEXTURE_SRGB
Type
bool[MAX_TEXTURE_UNITS]
Description
Enables color conversion from sRGB to linear space for the texture unit/stage
Stages Affected
Texture Unit
Notes
The texture unit/stage registers combine two different state sets in a graphic API: texture state and texture sampler state.

When this register is set to TRUE the texel values sampled from the texture are converted from sRGB (gamma) color space to linear color space. The conversion function is:

out = in ^ 2.2f

GPU_TEXTURE_MIN_LOD

Name
GPU_TEXTURE_MIN_LOD
Type
float32[MAX_TEXTURE_UNITS]
Description
Defines the minimum LOD to which the computed texture LODs are clamped for the texture unit/stage
Stages Affected
Texture Unit
Notes
The texture unit/stage registers combine two different state sets in a graphic API: texture state and texture sampler state. See the LOD equation:

lod = LOG2(scaleFactor) + CLAMP(textureLODBias + textureUnitLODBias + fragmentBias, -MAX_TEXTURE_LOD_BIAS, MAX_TEXTURE_LOD_BIAS);
lod = CLAMP(lod, textureMinLOD, textureMaxLOD);

GPU_TEXTURE_MAX_LOD

Name
GPU_TEXTURE_MAX_LOD
Type
float32[MAX_TEXTURE_UNITS]
Description
Defines the maximum LOD to which the computed texture LODs are clamped for the texture unit/stage
Stages Affected
Texture Unit
Notes
The texture unit/stage registers combine two different state sets in a graphic API: texture state and texture sampler state. See the LOD equation:

lod = LOG2(scaleFactor) + CLAMP(textureLODBias + textureUnitLODBias + fragmentBias, -MAX_TEXTURE_LOD_BIAS, MAX_TEXTURE_LOD_BIAS);
lod = CLAMP(lod, textureMinLOD, textureMaxLOD);

GPU_TEXTURE_LOD_BIAS

Name
GPU_TEXTURE_LOD_BIAS
Type
float32[MAX_TEXTURE_UNITS]
Description
Defines a bias to be added to the computed texture LODs for the texture attached to the texture unit/stage
Stages Affected
Texture Unit
Notes
The texture unit/stage registers combine two different state sets in a graphic API: texture state and texture sampler state. See the LOD equation:

lod = LOG2(scaleFactor) + CLAMP(textureLODBias + textureUnitLODBias + fragmentBias, -MAX_TEXTURE_LOD_BIAS, MAX_TEXTURE_LOD_BIAS);
lod = CLAMP(lod, textureMinLOD, textureMaxLOD);

The bias assigned to this register should be the bias associated with the texture attached with the texture unit/stage. In a future implementation this register and GPU_TEXT_UNIT_LOD_BIAS may be combined as a single register.

GPU_TEXT_UNIT_LOD_BIAS

Name
GPU_TEXT_UNIT_LOD_BIAS
Type
float32[MAX_TEXTURE_UNITS]
Description
Defines a bias to be added to the computed texture LODs for the texture unit/stage
Stages Affected
Texture Unit
Notes
The texture unit/stage registers combine two different state sets in a graphic API: texture state and texture sampler state. See the LOD equation:

lod = LOG2(scaleFactor) + CLAMP(textureLODBias + textureUnitLODBias + fragmentBias, -MAX_TEXTURE_LOD_BIAS, MAX_TEXTURE_LOD_BIAS);
lod = CLAMP(lod, textureMinLOD, textureMaxLOD);

The bias assigned to this register should be the bias associated with the texture unit/stage. In a future implementation this register and GPU_TEXTURE_LOD_BIAS may be combined as a single register.

GPU_TEXTURE_MIN_LEVEL

Name
GPU_TEXTURE_MIN_LEVEL
Type
uint32[MAX_TEXTURE_UNITS]
Description
Defines the minimum/base mipmap level for the texture attached to the texture unit/stage
Stages Affected
Texture Unit
Notes
The texture unit/stage registers combine two different state sets in a graphic API: texture state and texture sampler state.

GPU_TEXTURE_MAX_LEVEL

Name
GPU_TEXTURE_MAX_LEVEL
Type
uint32[MAX_TEXTURE_UNITS]
Description
Defines the maximum/top mipmap level for the texture attached to the texture unit/stage
Stages Affected
Texture Unit
Notes
The texture unit/stage registers combine two different state sets in a graphic API: texture state and texture sampler state.

GPU_TEXTURE_MAX_ANISOTROPY

Name
GPU_TEXTURE_MAX_ANISOTROPY
Type
uint32[MAX_TEXTURE_UNITS]
Description
Defines the maximum anisotropic filtering level for the texture unit/stage
Stages Affected
Texture Unit
Notes
The texture unit/stage registers combine two different state sets in a graphic API: texture state and texture sampler state. The valid values for this register are:

Anisotropic Filtering Levels
Level Description
0 Anisotropic filtering is disabled
1 Anisotropic filtering is enabled and a maximum of one bilinear sample is taken per mipmap level. Same as bilinear but the mipmap level(s) sampled may change.
[2, 16] Anisotropic filtering is enabled and a maximum of 2 to 16 bilinear samples are taken per mipmap level. Same as bilinear but the mipmap level(s) sampled may change.
> 16 Not supported

Depending on the configuration/implementation of the simulated GPU only even values may be valid. Odd values will be rounded down to the nearest even value.

Z/Stencil Test Registers

GPU_Z_BUFFER_CLEAR

Name
GPU_Z_BUFFER_CLEAR
Type
uint32
Description
Defines the clear value for the compressed z buffer
Stages Affected
Hierarchical Z, Z Stencil Test, DAC (required to dump the depth buffer as an image file)
Notes
When the z stencil buffer is compressed the value of this register defines the depth value assigned to those z stencil buffer blocks marked as cleared by a fast z stencil clear command (GPU_CLEARZSTENCILBUFFER).

Changing the value of this register changes the depth value returned for depth buffer blocks marked as cleared. However, blocks that transitioned from the clear state to a compressed or uncompressed state store, for those pixels in the block that were not written, the clear value in effect when the block left the clear state. To avoid any inconsistency between cleared pixels and fast clears, the value of this register should only be updated just before a fast clear is performed.

The current implementation only supports the 24-bit normalized z buffer format so only the lower 24 bits of this register are used to define the depth clear value.

GPU_STENCIL_BUFFER_CLEAR

Name
GPU_STENCIL_BUFFER_CLEAR
Type
uint32
Description
Defines the clear value for the compressed stencil buffer
Stages Affected
Z Stencil Test, DAC (required to dump the stencil buffer as an image file)
Notes
When the z stencil buffer is compressed the value of this register defines the stencil value assigned to those z stencil buffer blocks marked as cleared by a fast z stencil clear command (GPU_CLEARZSTENCILBUFFER).

Changing the value of this register changes the stencil value returned for stencil buffer blocks marked as cleared. However, blocks that transitioned from the clear state to a compressed or uncompressed state store, for those pixels in the block that were not written, the clear value in effect when the block left the clear state. To avoid any inconsistency between cleared pixels and fast clears, the value of this register should only be updated just before a fast clear is performed.

The current implementation only supports the 8-bit unsigned integer stencil format so only the lower 8-bits of this register are used to define the stencil clear value.

GPU_ZSTENCIL_STATE_BUFFER_MEM_ADDR

Name
GPU_ZSTENCIL_STATE_BUFFER_MEM_ADDR
Type
uint32
Description
Defines the address in the GPU memory address space where state information for blocks in the compressed z stencil buffer are saved or restored
Stages Affected
Z Stencil Test
Notes
The address written in this register is used as the base address where to write the current block state for the compressed z stencil buffer by the GPU_SAVE_ZSTENCIL_STATE command.

The address written in this register is used as the base address from where the block state is read for the compressed z stencil buffer by the GPU_RESTORE_ZSTENCIL_STATE command.

GPU_STENCIL_TEST

Name
GPU_STENCIL_TEST
Type
bool
Description
Enables the stencil test
Stages Affected
Z Stencil Test, Fragment FIFO (not used in the current implementation)
Notes
Set this register to TRUE to enable the stencil test.

The stencil test compares the fragment stencil value stored in the stencil buffer with a defined reference stencil value using a defined comparison function. Depending on the result of the comparison the fragment is culled and the stencil value in the stencil buffer may be updated. The stencil value in the stencil buffer may also be updated based on the result of the depth test. The stencil test happens before the depth test in the algorithmic 3D rendering pipeline, so fragments culled by the stencil test won't be processed by the depth and color stages.

The stencil test follows the algorithm defined below:

bool cullFragment
u8bit fragmentStencil = readStencilBuffer(x, y)
cullFragment = !((referenceStencil & stencilCompareMask) compFunction (fragmentStencil & stencilCompareMask))
if (cullFragment)
  updateStencil(stencilFailFunction, fragmentStencil, referenceStencil, stencilUpdateMask)
else
  bool depthTestPass
  depthTestPass = depthTest(x, y)
  if (depthTestPass)
    updateStencil(depthPassFunction, fragmentStencil, referenceStencil, stencilUpdateMask)
  else
    updateStencil(depthFailFunction, fragmentStencil, referenceStencil, stencilUpdateMask)

GPU_STENCIL_FUNCTION

Name
GPU_STENCIL_FUNCTION
Type
ComparisonMode
Description
Defines the comparison function used to compute the result of the stencil test
Stages Affected
Z Stencil Test
Notes
The value in this register defines what comparison function is used to compare the current value stored in the stencil buffer for the fragment being processed with the defined reference stencil value.
Comparison Functions
Function Description
GPU_NEVER The result of the comparison is FALSE
GPU_ALWAYS The result of the comparison is TRUE
GPU_LESS The result is TRUE if the reference stencil value is less than the fragment stencil value, FALSE otherwise
GPU_LEQUAL The result is TRUE if the reference stencil value is less or equal than the fragment stencil value, FALSE otherwise
GPU_EQUAL The result is TRUE if the reference stencil value is equal to the fragment stencil value, FALSE otherwise
GPU_GEQUAL The result is TRUE if the reference stencil value is greater or equal than the fragment stencil value, FALSE otherwise
GPU_GREATER The result is TRUE if the reference stencil value is greater than the fragment stencil value, FALSE otherwise
GPU_NOTEQUAL The result is TRUE if the reference stencil value is not equal to the fragment stencil value, FALSE otherwise

GPU_STENCIL_COMPARE_MASK

Name
GPU_STENCIL_COMPARE_MASK
Type
uint32
Description
Defines a bit-mask to apply to both the reference stencil and fragment stencil values before performing the stencil comparison
Stages Affected
Z Stencil Test
Notes
The current implementation only supports the 8-bit unsigned integer stencil format so only the lower 8-bits of the value written in the register are used.

GPU_STENCIL_UPDATE_MASK

Name
GPU_STENCIL_UPDATE_MASK
Type
uint32
Description
Defines a bit-mask to apply to the result of the stencil update function before writing back the stencil value in the stencil buffer.
Stages Affected
Z Stencil Test
Notes
The current implementation only supports the 8-bit unsigned integer stencil format so only the lower 8-bits of the value written in the register are used.

GPU_STENCIL_FAIL_UPDATE

Name
GPU_STENCIL_FAIL_UPDATE
Type
StencilUpdateFunction
Description
Defines the function used to update the stencil value in the stencil buffer when the fragment stencil test fails
Stages Affected
Z Stencil Test
Notes
The stencil update functions supported are:
Stencil Update Functions
Function Description
STENCIL_KEEP Keeps the value stored in the stencil buffer
writeStencil = fragmentStencil & stencilUpdateMask
STENCIL_ZERO Writes a zero in the stencil buffer
writeStencil = 0 & stencilUpdateMask
STENCIL_REPLACE Writes the reference stencil value in the stencil buffer
writeStencil = referenceStencil & stencilUpdateMask
STENCIL_INCR Saturated increment of the value in the stencil buffer
writeStencil = ((fragmentStencil == 255) ? 255 : (fragmentStencil + 1)) & stencilUpdateMask
STENCIL_DECR Saturated decrement of the value in the stencil buffer
writeStencil = ((fragmentStencil == 0) ? 0 : (fragmentStencil - 1)) & stencilUpdateMask
STENCIL_INVERT Invert (bitwise NOT) the value in the stencil buffer
writeStencil = (~fragmentStencil) & stencilUpdateMask
STENCIL_INCR_WRAP Increment with wrap of the value in the stencil buffer
writeStencil = ((fragmentStencil == 255) ? 0 : (fragmentStencil + 1)) & stencilUpdateMask
STENCIL_DECR_WRAP Decrement with wrap of the value in the stencil buffer
writeStencil = ((fragmentStencil == 0) ? 255 : (fragmentStencil - 1)) & stencilUpdateMask
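The update functions in the table can be sketched as one C function over an 8-bit stencil value (a bitwise NOT is assumed for STENCIL_INVERT; the update mask is applied as shown in the table):

```c
#include <assert.h>

typedef enum {
    STENCIL_KEEP, STENCIL_ZERO, STENCIL_REPLACE, STENCIL_INCR,
    STENCIL_DECR, STENCIL_INVERT, STENCIL_INCR_WRAP, STENCIL_DECR_WRAP
} StencilUpdateFunction;

/* Apply a stencil update function to an 8-bit stencil value and mask
   the result with the update mask. */
unsigned char updateStencil(StencilUpdateFunction f,
                            unsigned char fragmentStencil,
                            unsigned char referenceStencil,
                            unsigned char stencilUpdateMask)
{
    unsigned char v;
    switch (f)
    {
        case STENCIL_KEEP:      v = fragmentStencil; break;
        case STENCIL_ZERO:      v = 0; break;
        case STENCIL_REPLACE:   v = referenceStencil; break;
        case STENCIL_INCR:      v = (fragmentStencil == 255) ? 255 : fragmentStencil + 1; break;  /* saturate */
        case STENCIL_DECR:      v = (fragmentStencil == 0) ? 0 : fragmentStencil - 1; break;      /* saturate */
        case STENCIL_INVERT:    v = (unsigned char) ~fragmentStencil; break;
        case STENCIL_INCR_WRAP: v = (unsigned char) (fragmentStencil + 1); break;  /* wraps 255 -> 0 */
        case STENCIL_DECR_WRAP: v = (unsigned char) (fragmentStencil - 1); break;  /* wraps 0 -> 255 */
        default:                v = fragmentStencil; break;
    }
    return v & stencilUpdateMask;
}
```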

GPU_DEPTH_FAIL_UPDATE

Name
GPU_DEPTH_FAIL_UPDATE
Type
StencilUpdateFunction
Description
Defines the function used to update the stencil value in the stencil buffer when the fragment depth test fails
Stages Affected
Z Stencil Test
Notes
See GPU_STENCIL_FAIL_UPDATE for the supported stencil update functions

GPU_DEPTH_PASS_UPDATE

Name
GPU_DEPTH_PASS_UPDATE
Type
StencilUpdateFunction
Description
Defines the function used to update the stencil value in the stencil buffer when the fragment depth test passes
Stages Affected
Z Stencil Test
Notes
See GPU_STENCIL_FAIL_UPDATE for the supported stencil update functions

GPU_DEPTH_TEST

Name
GPU_DEPTH_TEST
Type
bool
Description
Enables the depth test
Stages Affected
Z Stencil Test, Fragment FIFO (not used in the current implementation)
Notes
Set this register to TRUE to enable the depth test.

The depth test compares the depth value computed for the fragment with the depth value stored in the depth buffer. Depending on the result of the comparison the fragment is culled and the depth value in the depth buffer may be updated.

The depth test follows the algorithm defined below:

bool cullFragment
u32bit bufferDepth = readDepthBuffer(x, y)
cullFragment = !(fragmentDepth compFunction bufferDepth)
if (!cullFragment)
  updateDepth(fragmentDepth, depthWriteMask)

GPU_DEPTH_FUNCTION

Name
GPU_DEPTH_FUNCTION
Type
ComparisonMode
Description
Defines the comparison function used to compute the result of the depth test
Stages Affected
Z Stencil Test, Hierarchical Z
Notes
The value in this register defines what comparison function is used to compare the current value stored in the depth buffer for the fragment being processed with the depth value computed for the fragment.
Comparison Functions
Function Description
GPU_NEVER The result of the comparison is FALSE
GPU_ALWAYS The result of the comparison is TRUE
GPU_LESS The result is TRUE if the fragment depth value is less than the buffer depth value, FALSE otherwise
GPU_LEQUAL The result is TRUE if the fragment depth value is less or equal than the buffer depth value, FALSE otherwise
GPU_EQUAL The result is TRUE if the fragment depth value is equal to the buffer depth value, FALSE otherwise
GPU_GEQUAL The result is TRUE if the fragment depth value is greater or equal than the buffer depth value, FALSE otherwise
GPU_GREATER The result is TRUE if the fragment depth value is greater than the buffer depth value, FALSE otherwise
GPU_NOTEQUAL The result is TRUE if the fragment depth value is not equal to the buffer depth value, FALSE otherwise

GPU_DEPTH_MASK

Name
GPU_DEPTH_MASK
Type
bool
Description
Defines if the depth buffer is updated after a fragment passes the depth test
Stages Affected
Z Stencil Test
Notes

GPU_ZSTENCIL_COMPRESSION

Name
GPU_ZSTENCIL_COMPRESSION
Type
bool
Description
Defines if the current depth stencil buffer is compressed
Stages Affected
Z Stencil Test, DAC (required to dump the stencil buffer as an image file)
Notes
When this register is set to TRUE the z stencil buffer is automatically compressed and decompressed when writing and reading from memory. Fast clears can be performed using the GPU_CLEARZSTENCILBUFFER command.

When this register is set to FALSE the z stencil buffer is not compressed. Fast clears can not be performed. The current implementation doesn't support reading compressed z stencil buffers through the Texture Unit so depth stencil buffers used as shadow maps must be rendered without compression enabled.

Color/Blend Registers

GPU_COLOR_BUFFER_FORMAT

Name
GPU_COLOR_BUFFER_FORMAT
Type
TextureFormat
Description
Defines the format of the default color buffer (back buffer)
Stages Affected
Color Write, DAC
Notes

The default color buffer (or back buffer) is aliased to render target 0. The texture formats supported by the ATTILA architecture are:

Render Target Formats
Format BPT Description
GPU_RGBA8888 32 4-channel 8-bit unsigned normalized
GPU_RG16F 32 2-channel 16-bit floating point
GPU_R32F 32 1-channel 32-bit floating point
GPU_RGBA16 64 4-channel 16-bit unsigned normalized
GPU_RGBA16F 64 4-channel 16-bit floating point

GPU_COLOR_COMPRESSION

Name
GPU_COLOR_COMPRESSION
Type
bool
Description
Defines if the color buffer is compressed
Stages Affected
Color Write, DAC
Notes
When this register is set to TRUE the color buffer is automatically compressed and decompressed when writing and reading from memory. Fast clears can be performed using the GPU_CLEARCOLORBUFFER command. Compression only applies to the default color buffer (back buffer) which is aliased to render target 0.

When this register is set to FALSE the color buffer is not compressed. Fast clears cannot be performed. The current implementation doesn't support reading compressed color buffers through the Texture Unit, so render targets that are read as textures must not be compressed. A compressed color buffer can be decompressed using the Blitter.

GPU_COLOR_SRGB_WRITE

Name
GPU_COLOR_SRGB_WRITE
Type
bool
Description
Enable conversion from linear to sRGB color space on writing to the color buffer or render target
Stages Affected
Color Write, DAC
Notes
When this register is set to TRUE the color value written into the color buffer or render target is converted from linear to sRGB (gamma) color space. If blending is enabled the color read from the color buffer or render target is converted from sRGB (gamma) to linear color space, the blend operation is performed (with the fragment color in linear color space) and the result converted back to sRGB (gamma) space before being written.

Color conversions:

Linear to sRGB : sRGBColor = linearColor ^ (1 / 2.2f)
sRGB to linear : linearColor = sRGBColor ^ 2.2f
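These conversions can be sketched per channel, assuming values in [0, 1]. Note that real sRGB uses a piecewise curve with a linear toe; the formulas above (and the sketch below) are the pure gamma 2.2 approximation the document states.

```python
# Sketch of the linear <-> sRGB conversions above, using the gamma 2.2
# approximation stated in the document (not the piecewise sRGB curve).
# Values are per-channel and assumed to lie in [0, 1].
GAMMA = 2.2

def linear_to_srgb(c):
    return c ** (1.0 / GAMMA)

def srgb_to_linear(c):
    return c ** GAMMA
```

With blending enabled, the buffer color is first passed through srgb_to_linear, the blend is performed in linear space, and the result goes through linear_to_srgb before being written.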

GPU_RENDER_TARGET_ENABLE

Name
GPU_RENDER_TARGET_ENABLE
Type
bool[MAX_RENDER_TARGETS]
Description
Enable a render target
Stages Affected
Color Write, DAC
Notes
When an element of this register is set to TRUE the corresponding render target is updated, using as input the matching fragment color output computed in the shader.

The default color buffer (back buffer) is aliased to render target 0 and enabled by default. All other render targets are disabled by default.

GPU_RENDER_TARGET_FORMAT

Name
GPU_RENDER_TARGET_FORMAT
Type
TextureFormat[MAX_RENDER_TARGETS]
Description
Defines the format of the render target
Stages Affected
Color Write, DAC
Notes
See GPU_COLOR_BUFFER_FORMAT for the supported render target formats.

The first render target (index 0) is aliased to the default color buffer (back buffer).

GPU_RENDER_TARGET_ADDRESS

Name
GPU_RENDER_TARGET_ADDRESS
Type
u32bit[MAX_RENDER_TARGETS]
Description
Defines the base address in GPU memory space of the render target
Stages Affected
Color Write, DAC
Notes
The first render target (index 0) is aliased to the default color buffer (back buffer).

GPU_COLOR_BUFFER_CLEAR

Name
GPU_COLOR_BUFFER_CLEAR
Type
QuadFloat
Description
Defines the clear color value for compressed color buffers
Stages Affected
Color Write, DAC
Notes
When the color buffer is compressed the value of this register defines the color value assigned to those color buffer blocks marked as cleared by a fast color clear command (GPU_CLEARCOLORBUFFER).

Changing the value of this register changes the color value returned for color buffer blocks marked as cleared. However, blocks where color data was written, and which therefore changed from the clear state to a compressed or uncompressed state, store for the unwritten pixels in the block the clear value that was in effect when the block left the clear state. To avoid inconsistencies between cleared pixels and fast clears, this register should only be updated immediately before a fast clear is performed.

GPU_COLOR_STATE_BUFFER_MEM_ADDR

Name
GPU_COLOR_STATE_BUFFER_MEM_ADDR
Type
uint32
Description
Defines the address in the GPU memory address space where the state information for the blocks of the compressed color buffer is saved or restored
Stages Affected
Color Write
Notes
The address written in this register is used as the base address where to write the current block state for the compressed color buffer by the GPU_SAVE_COLOR_STATE command.

The address written in this register is used as the base address from where the block state is read for the compressed color buffer by the GPU_RESTORE_COLOR_STATE command.

GPU_COLOR_BLEND

Name
GPU_COLOR_BLEND
Type
bool[MAX_RENDER_TARGETS]
Description
Defines if color blending is enabled for the render target.
Stages Affected
Color Write
Notes
When this register is set to true the fragment color, computed in the Fragment Shader, corresponding with the render target is combined using a blending equation with the color value stored in the render target.

The blending operation is defined as an equation between the fragment and buffer colors, each of which may be scaled by a factor. Both the equation and the factors are configurable.

blendedColor = equation(fragmentColor, sourceFactor, bufferColor, destinationFactor)

The first render target (index 0) is aliased to the default color buffer (back buffer).

GPU_BLEND_EQUATION

Name
GPU_BLEND_EQUATION
Type
BlendEquation[MAX_RENDER_TARGETS]
Description
Defines the blend equation used to combine the fragment color with the render target color if color blending is enabled for the render target.
Stages Affected
Color Write
Notes
The supported blending equations are:
Blending Equation Modes
Mode Equation Description
BLEND_FUNC_ADD
blendedColor = fragmentColor * fragmentFactor + bufferColor * bufferFactor
Add fragment and buffer colors scaled by their respective factors
BLEND_FUNC_SUBTRACT
blendedColor = fragmentColor * fragmentFactor - bufferColor * bufferFactor
Subtract the buffer color from the fragment color, each scaled by its respective factor
BLEND_FUNC_REVERSE_SUBTRACT
blendedColor = bufferColor * bufferFactor - fragmentColor * fragmentFactor
Subtract the fragment color from the buffer color, each scaled by its respective factor
BLEND_MIN
blendedColor = min(fragmentColor, bufferColor)
Select the minimum between the fragment and buffer colors
BLEND_MAX
blendedColor = max(fragmentColor, bufferColor)
Select the maximum between the fragment and buffer colors

The first render target (index 0) is aliased to the default color buffer (back buffer).
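The five equations can be sketched as a per-channel combine step. In this illustrative Python sketch the factors are assumed to be already resolved from the GPU_BLEND_SRC_*/GPU_BLEND_DST_* registers (see the following sections); the function name is hypothetical.

```python
# Per-channel sketch of the blend equations in the table above.
# frag/buf are channel lists; frag_factor/buf_factor are already-resolved
# factor lists (ignored by BLEND_MIN and BLEND_MAX).
def blend(mode, frag, frag_factor, buf, buf_factor):
    if mode == "BLEND_FUNC_ADD":
        return [f * sf + b * df
                for f, sf, b, df in zip(frag, frag_factor, buf, buf_factor)]
    if mode == "BLEND_FUNC_SUBTRACT":
        return [f * sf - b * df
                for f, sf, b, df in zip(frag, frag_factor, buf, buf_factor)]
    if mode == "BLEND_FUNC_REVERSE_SUBTRACT":
        return [b * df - f * sf
                for f, sf, b, df in zip(frag, frag_factor, buf, buf_factor)]
    if mode == "BLEND_MIN":
        return [min(f, b) for f, b in zip(frag, buf)]
    if mode == "BLEND_MAX":
        return [max(f, b) for f, b in zip(frag, buf)]
    raise ValueError(mode)
```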

GPU_BLEND_SRC_RGB

Name
GPU_BLEND_SRC_RGB
Type
BlendFactor[MAX_RENDER_TARGETS]
Description
Defines the blend factor used for the fragment color red, green and blue color channels for the render target
Stages Affected
Color Write
Notes
The supported blend factor modes for the red, green and blue color channels are:
Blend Factor Modes (RGB)
Mode Description Equation
BLEND_ZERO Set factor to zero
(0, 0, 0)
BLEND_ONE Set factor to one
(1, 1, 1)
BLEND_SRC_COLOR Set factor to the fragment color value
(fragmentColor[R], fragmentColor[G], fragmentColor[B])
BLEND_ONE_MINUS_SRC_COLOR Set factor to the result of subtracting the fragment color value from one
(1, 1, 1) - (fragmentColor[R], fragmentColor[G], fragmentColor[B])
BLEND_DST_COLOR Set factor to the buffer color value
(bufferColor[R], bufferColor[G], bufferColor[B])
BLEND_ONE_MINUS_DST_COLOR Set factor to the result of subtracting the buffer color value from one
(1, 1, 1) - (bufferColor[R], bufferColor[G], bufferColor[B])
BLEND_SRC_ALPHA Set factor to the fragment color alpha channel value
(fragmentColor[A], fragmentColor[A], fragmentColor[A])
BLEND_ONE_MINUS_SRC_ALPHA Set factor to the result of subtracting the fragment color alpha channel value from one
(1, 1, 1) - (fragmentColor[A], fragmentColor[A], fragmentColor[A])
BLEND_DST_ALPHA Set factor to the buffer color alpha channel value
(bufferColor[A], bufferColor[A], bufferColor[A])
BLEND_ONE_MINUS_DST_ALPHA Set factor to the result of subtracting the buffer color alpha channel value from one
(1, 1, 1) - (bufferColor[A], bufferColor[A], bufferColor[A])
BLEND_CONSTANT_COLOR Set factor to the constant color value
(constantColor[R], constantColor[G], constantColor[B])
BLEND_ONE_MINUS_CONSTANT_COLOR Set factor to the result of subtracting the constant color value from one
(1, 1, 1) - (constantColor[R], constantColor[G], constantColor[B])
BLEND_CONSTANT_ALPHA Set factor to the constant color alpha channel value
(constantColor[A], constantColor[A], constantColor[A])
BLEND_ONE_MINUS_CONSTANT_ALPHA Set factor to the result of subtracting the constant color alpha channel value from one
(1, 1, 1) - (constantColor[A], constantColor[A], constantColor[A])
BLEND_SRC_ALPHA_SATURATE Set factor to the square of the minimum of the fragment color alpha channel value and the result of subtracting the buffer color alpha channel value from one
(f, f, f)^2  where f = min(fragmentColor[A], 1 - bufferColor[A])

The first render target (index 0) is aliased to the default color buffer (back buffer).
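The factor modes above can be sketched as a resolver that maps a mode to a per-channel (r, g, b) factor. The function and tuple conventions below are illustrative, not ATTILA API; note that BLEND_SRC_ALPHA_SATURATE follows the squared definition given in the table, which differs from standard OpenGL (where the factor is (f, f, f) without the square).

```python
# Sketch resolving a BLEND_* mode to a per-channel (r, g, b) blend factor.
# src, dst, const are (r, g, b, a) tuples in [0, 1]. Illustrative only.
def rgb_blend_factor(mode, src, dst, const):
    R, G, B, A = 0, 1, 2, 3
    if mode == "BLEND_ZERO":                     return (0.0, 0.0, 0.0)
    if mode == "BLEND_ONE":                      return (1.0, 1.0, 1.0)
    if mode == "BLEND_SRC_COLOR":                return (src[R], src[G], src[B])
    if mode == "BLEND_ONE_MINUS_SRC_COLOR":      return tuple(1.0 - c for c in src[:3])
    if mode == "BLEND_DST_COLOR":                return (dst[R], dst[G], dst[B])
    if mode == "BLEND_ONE_MINUS_DST_COLOR":      return tuple(1.0 - c for c in dst[:3])
    if mode == "BLEND_SRC_ALPHA":                return (src[A],) * 3
    if mode == "BLEND_ONE_MINUS_SRC_ALPHA":      return (1.0 - src[A],) * 3
    if mode == "BLEND_DST_ALPHA":                return (dst[A],) * 3
    if mode == "BLEND_ONE_MINUS_DST_ALPHA":      return (1.0 - dst[A],) * 3
    if mode == "BLEND_CONSTANT_COLOR":           return (const[R], const[G], const[B])
    if mode == "BLEND_ONE_MINUS_CONSTANT_COLOR": return tuple(1.0 - c for c in const[:3])
    if mode == "BLEND_CONSTANT_ALPHA":           return (const[A],) * 3
    if mode == "BLEND_ONE_MINUS_CONSTANT_ALPHA": return (1.0 - const[A],) * 3
    if mode == "BLEND_SRC_ALPHA_SATURATE":
        f = min(src[A], 1.0 - dst[A])
        return (f * f,) * 3  # squared, per the table above (OpenGL uses f unsquared)
    raise ValueError(mode)
```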

GPU_BLEND_DST_RGB

Name
GPU_BLEND_DST_RGB
Type
BlendFactor[MAX_RENDER_TARGETS]
Description
Defines the blend factor used for the buffer color red, green and blue color channels for the render target
Stages Affected
Color Write
Notes
See GPU_BLEND_SRC_RGB for the supported blend factor modes for the red, green and blue color channels.

The first render target (index 0) is aliased to the default color buffer (back buffer).

GPU_BLEND_SRC_ALPHA

Name
GPU_BLEND_SRC_ALPHA
Type
BlendFactor[MAX_RENDER_TARGETS]
Description
Defines the blend factor used for the fragment color alpha channel for the render target
Stages Affected
Color Write
Notes
The supported blend factor modes for the alpha color channel are:
Blend Factor Modes (Alpha)
Mode Description Equation
BLEND_ZERO Set factor to zero
 0
BLEND_ONE Set factor to one
1
BLEND_SRC_COLOR Set factor to the fragment color alpha channel value
fragmentColor[A]
BLEND_ONE_MINUS_SRC_COLOR Set factor to the result of subtracting the fragment color alpha channel value from one
1 - fragmentColor[A]
BLEND_DST_COLOR Set factor to the buffer color alpha channel value
bufferColor[A]
BLEND_ONE_MINUS_DST_COLOR Set factor to the result of subtracting the buffer color alpha channel value from one
1 - bufferColor[A]
BLEND_SRC_ALPHA Set factor to the fragment color alpha channel value
fragmentColor[A]
BLEND_ONE_MINUS_SRC_ALPHA Set factor to the result of subtracting the fragment color alpha channel value from one
1 - fragmentColor[A]
BLEND_DST_ALPHA Set factor to the buffer color alpha channel value
bufferColor[A]
BLEND_ONE_MINUS_DST_ALPHA Set factor to the result of subtracting the buffer color alpha channel value from one
1 - bufferColor[A]
BLEND_CONSTANT_COLOR Set factor to the constant color alpha channel value
constantColor[A]
BLEND_ONE_MINUS_CONSTANT_COLOR Set factor to the result of subtracting the constant color alpha channel value from one
1 - constantColor[A]
BLEND_CONSTANT_ALPHA Set factor to the constant color alpha channel value
constantColor[A]
BLEND_ONE_MINUS_CONSTANT_ALPHA Set factor to the result of subtracting the constant color alpha channel value from one
1 - constantColor[A]
BLEND_SRC_ALPHA_SATURATE Force to one
1

The first render target (index 0) is aliased to the default color buffer (back buffer).

GPU_BLEND_DST_ALPHA

Name
GPU_BLEND_DST_ALPHA
Type
BlendFactor[MAX_RENDER_TARGETS]
Description
Defines the blend factor used for the buffer color alpha channel for the render target
Stages Affected
Color Write
Notes
See GPU_BLEND_SRC_ALPHA for the supported blend factor modes for the alpha color channel.

The first render target (index 0) is aliased to the default color buffer (back buffer).

GPU_BLEND_COLOR

Name
GPU_BLEND_COLOR
Type
QuadFloat[MAX_RENDER_TARGETS]
Description
Defines a constant color used for the blending operation for the render target
Stages Affected
Color Write
Notes
The first render target (index 0) is aliased to the default color buffer (back buffer).

GPU_COLOR_MASK_R

Name
GPU_COLOR_MASK_R
Type
bool[MAX_RENDER_TARGETS]
Description
Enables writing the red color channel for the render target
Stages Affected
Color Write
Notes
When this register is set to TRUE the red channel from the fragment color or the result of the blending operation is written into the red channel of the corresponding render target pixel.

The first render target (index 0) is aliased to the default color buffer (back buffer).

GPU_COLOR_MASK_G

Name
GPU_COLOR_MASK_G
Type
bool[MAX_RENDER_TARGETS]
Description
Enables writing the green color channel for the render target
Stages Affected
Color Write
Notes
When this register is set to TRUE the green channel from the fragment color or the result of the blending operation is written into the green channel of the corresponding render target pixel.

The first render target (index 0) is aliased to the default color buffer (back buffer).

GPU_COLOR_MASK_B

Name
GPU_COLOR_MASK_B
Type
bool[MAX_RENDER_TARGETS]
Description
Enables writing the blue color channel for the render target
Stages Affected
Color Write
Notes
When this register is set to TRUE the blue channel from the fragment color or the result of the blending operation is written into the blue channel of the corresponding render target pixel.

The first render target (index 0) is aliased to the default color buffer (back buffer).

GPU_COLOR_MASK_A

Name
GPU_COLOR_MASK_A
Type
bool[MAX_RENDER_TARGETS]
Description
Enables writing the alpha color channel for the render target
Stages Affected
Color Write
Notes
When this register is set to TRUE the alpha channel from the fragment color or the result of the blending operation is written into the alpha channel of the corresponding render target pixel.

The first render target (index 0) is aliased to the default color buffer (back buffer).

GPU_LOGICAL_OPERATION

Name
GPU_LOGICAL_OPERATION
Type
bool
Description
Enables a bit-wise logical operation between the fragment color value, or the result of the blending operation, and the buffer color value
Stages Affected
Color Write
Notes
When this register is set to true a bit-wise logical operation is performed between the fragment color value, or the result of the blending operation, and the buffer color value.

Logic operation only applies to the default color buffer (back buffer), which is aliased to the first render target (index 0). Logic operation is only supported for the GPU_RGBA8888 format. The color is treated as a bit array of 32 elements. THE CURRENT IMPLEMENTATION HAS NOT BEEN TESTED AND THERE IS A BUG.

GPU_LOGICOP_FUNCTION

Name
GPU_LOGICOP_FUNCTION
Type
LogicOperation
Description
Defines the bit-wise logical operation to perform between the fragment color value, or the result of the blending operation, and the buffer color value
Stages Affected
Color Write
Notes
The supported bit-wise logic operations are:
Logic Operations
Operation Description Equation
LOGICOP_CLEAR Set to zero
 logicOpColor = 0
LOGICOP_AND Bit-wise logical and between the fragment or blended color value and the buffer color value
logicOpColor = blendedColor & bufferColor
LOGICOP_AND_REVERSE Bit-wise logical and between the fragment or blended color value and the negated buffer color value
logicOpColor = blendedColor & ~bufferColor
LOGICOP_COPY Set to the fragment or blended color value
logicOpColor = blendedColor
LOGICOP_AND_INVERTED Bit-wise logical and between the negated fragment or blended color value and the buffer color value
logicOpColor = ~blendedColor & bufferColor
LOGICOP_NOOP Set to the buffer color value
logicOpColor = bufferColor
LOGICOP_XOR Bit-wise logical xor between the fragment or blended color value and the buffer color value
logicOpColor = blendedColor XOR bufferColor
LOGICOP_OR Bit-wise logical or between the fragment or blended color value and the buffer color value
logicOpColor = blendedColor | bufferColor
LOGICOP_NOR Bit-wise negated logical or between the fragment or blended color value and the buffer color value
logicOpColor = ~(blendedColor | bufferColor)
LOGICOP_EQUIV Bit-wise negated logical xor between the fragment or blended color value and the buffer color value
logicOpColor = ~(blendedColor XOR bufferColor)
LOGICOP_INVERT Set to the negated buffer color value
logicOpColor = ~bufferColor
LOGICOP_OR_REVERSE Bit-wise logical or between the fragment or blended color value and the negated buffer color value
logicOpColor = blendedColor | ~bufferColor
LOGICOP_COPY_INVERTED Set to the negated fragment or blended color value
logicOpColor = ~blendedColor
LOGICOP_OR_INVERTED Bit-wise logical or between the negated fragment or blended color value and the buffer color value
logicOpColor = ~blendedColor | bufferColor
LOGICOP_NAND Bit-wise negated logical and between the fragment or blended color value and the buffer color value
logicOpColor = ~(blendedColor & bufferColor)
LOGICOP_SET Set to one
logicOpColor = 1
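Treating the RGBA8888 color as a 32-bit word, the operations above can be sketched directly with bit-wise operators. This Python sketch is illustrative; the mask keeps results in 32 bits since Python integers are unbounded, and LOGICOP_SET sets every bit (0xFFFFFFFF).

```python
# Sketch of the bit-wise logic operations on 32-bit RGBA8888 color words.
# s = fragment/blended color, d = buffer color; MASK truncates to 32 bits.
MASK = 0xFFFFFFFF

LOGIC_OPS = {
    "LOGICOP_CLEAR":         lambda s, d: 0,
    "LOGICOP_AND":           lambda s, d: s & d,
    "LOGICOP_AND_REVERSE":   lambda s, d: s & ~d & MASK,
    "LOGICOP_COPY":          lambda s, d: s,
    "LOGICOP_AND_INVERTED":  lambda s, d: ~s & d & MASK,
    "LOGICOP_NOOP":          lambda s, d: d,
    "LOGICOP_XOR":           lambda s, d: s ^ d,
    "LOGICOP_OR":            lambda s, d: s | d,
    "LOGICOP_NOR":           lambda s, d: ~(s | d) & MASK,
    "LOGICOP_EQUIV":         lambda s, d: ~(s ^ d) & MASK,
    "LOGICOP_INVERT":        lambda s, d: ~d & MASK,
    "LOGICOP_OR_REVERSE":    lambda s, d: (s | ~d) & MASK,
    "LOGICOP_COPY_INVERTED": lambda s, d: ~s & MASK,
    "LOGICOP_OR_INVERTED":   lambda s, d: (~s | d) & MASK,
    "LOGICOP_NAND":          lambda s, d: ~(s & d) & MASK,
    "LOGICOP_SET":           lambda s, d: MASK,  # all 32 bits set to one
}
```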

Blitter

GPU_BLIT_INI_X

Name
GPU_BLIT_INI_X
Type
uint32
Description
Defines the start horizontal position of the backbuffer rectangle to be blitted into a texture
Stages Affected
DAC (Blitter)
Notes
The value defined in this register is used as a parameter for the GPU_BLIT command. The valid range for this register is [0, MAX_DISPLAY_RES_X-1].

GPU_BLIT_INI_Y

Name
GPU_BLIT_INI_Y
Type
uint32
Description
Defines the start vertical position of the backbuffer rectangle to be blitted into a texture
Stages Affected
DAC (Blitter)
Notes
The value defined in this register is used as a parameter for the GPU_BLIT command. The valid range for this register is [0, MAX_DISPLAY_RES_Y-1].

GPU_BLIT_X_OFFSET

Name
GPU_BLIT_X_OFFSET
Type
uint32
Description
Defines the start horizontal position into the texture where to blit the defined backbuffer rectangle
Stages Affected
DAC (Blitter)
Notes
The value defined in this register is used as a parameter for the GPU_BLIT command. The valid range for this register is [0, (2^MAX_TEXTURE_SIZE)-1].

GPU_BLIT_Y_OFFSET

Name
GPU_BLIT_Y_OFFSET
Type
uint32
Description
Defines the start vertical position into the texture where to blit the defined backbuffer rectangle
Stages Affected
DAC (Blitter)
Notes
The value defined in this register is used as a parameter for the GPU_BLIT command. The valid range for this register is [0, (2^MAX_TEXTURE_SIZE)-1].

GPU_BLIT_WIDTH

Name
GPU_BLIT_WIDTH
Type
uint32
Description
Defines the width of the backbuffer rectangle to be blitted into a texture
Stages Affected
DAC (Blitter)
Notes
The value defined in this register is used as a parameter for the GPU_BLIT command. The valid range for this register is [0, MAX_DISPLAY_RES_X-1].


GPU_BLIT_HEIGHT

Name
GPU_BLIT_HEIGHT
Type
uint32
Description
Defines the height of the backbuffer rectangle to be blitted into a texture
Stages Affected
DAC (Blitter)
Notes
The value defined in this register is used as a parameter for the GPU_BLIT command. The valid range for this register is [0, MAX_DISPLAY_RES_Y-1].


GPU_BLIT_DST_ADDRESS

Name
GPU_BLIT_DST_ADDRESS
Type
uint32
Description
Defines the base address in GPU memory space of the texture where to blit the defined backbuffer rectangle
Stages Affected
DAC (Blitter)
Notes
The value defined in this register is used as a parameter for the GPU_BLIT command.


GPU_BLIT_DST_TX_WIDTH2

Name
GPU_BLIT_DST_TX_WIDTH2
Type
uint32
Description
Defines the base-2 logarithm of the width of the texture where to blit the backbuffer rectangle
Stages Affected
DAC (Blitter)
Notes
The value defined in this register is used as a parameter for the GPU_BLIT command.
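Since the register holds the base-2 logarithm, a destination texture 256 texels wide is programmed as 8. A minimal sketch, assuming a power-of-two width as implied by the register definition (the helper name is illustrative):

```python
# Convert a power-of-two texture width to the log2 value programmed into
# GPU_BLIT_DST_TX_WIDTH2. Illustrative helper, not ATTILA API.
def width_to_width2(width):
    assert width > 0 and (width & (width - 1)) == 0, "width must be a power of two"
    return width.bit_length() - 1
```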

GPU_BLIT_DST_TX_FORMAT

Name
GPU_BLIT_DST_TX_FORMAT
Type
TextureFormat
Description
Defines the format of the texture where to blit the backbuffer rectangle
Stages Affected
DAC (Blitter)
Notes
The value defined in this register is used as a parameter for the GPU_BLIT command.

CHECK SUPPORTED FORMATS.

GPU_BLIT_DST_TX_BLOCK

Name
GPU_BLIT_DST_TX_BLOCK
Type
TextureBlocking
Description
Defines how the texture where to blit the backbuffer rectangle is stored
Stages Affected
DAC (Blitter)
Notes
The value defined in this register is used as a parameter for the GPU_BLIT command. See GPU_TEXTURE_BLOCKING for the supported texture blocking modes.

Events

Name Description
GPU_END_OF_FRAME_EVENT Signals the end of a frame; the simulator uses it to report the cycles spent on the frame
GPU_UNNAMED_EVENT Un-named/generic event

Texture Formats

Texture Formats
Format BPT Description Conversion
GPU_ALPHA8 8 1-channel 8-bit unsigned normalized alpha
alpha = float32(uint8(texelValue)) / 255.0f
sampleValue = (0.0f, 0.0f, 0.0f, alpha)
GPU_ALPHA12 16 1-channel 12-bit unsigned normalized alpha, stored as a 16-bit value, higher 4 bits are ignored
alpha = float32(uint16(texelValue) & 0x0FFF) / 4095.0f
sampleValue = (0.0f, 0.0f, 0.0f, alpha)
GPU_ALPHA16 16 1-channel 16-bit unsigned normalized alpha
alpha = float32(uint16(texelValue)) / 65535.0f
sampleValue = (0.0f, 0.0f, 0.0f, alpha)
GPU_DEPTH_COMPONENT16 16 1-channel 16-bit depth, NOT IMPLEMENTED
depth = float32(uint16(texelValue)) / 65535.0f
sampleValue = (depth, depth, depth, 1.0f)
GPU_DEPTH_COMPONENT24 32 1-channel 24-bit depth, stored as a 32-bit value, higher 8 bits are ignored
depth = float32(uint32(texelValue) & 0x00FFFFFF) / 16777215.0f
sampleValue = (depth, depth, depth, 1.0f)
GPU_DEPTH_COMPONENT32 32 1-channel 32-bit depth, NOT IMPLEMENTED
depth = float32(uint32(texelValue)) / 4294967295.0f
sampleValue = (depth, depth, depth, 1.0f)
GPU_LUMINANCE8 8 1-channel 8-bit unsigned normalized luminance
luminance = float32(uint8(texelValue)) / 255.0f
sampleValue = (luminance, luminance, luminance, 1.0f)
GPU_LUMINANCE8_SIGNED 8 1-channel 8-bit signed normalized luminance
luminance = float32(sint8(texelValue)) / 127.0f
sampleValue = (luminance, luminance, luminance, 1.0f)
GPU_LUMINANCE12 16 1-channel 12-bit unsigned normalized luminance, stored as a 16-bit value, higher 4 bits are ignored
luminance = float32(uint16(texelValue) & 0x0FFF) / 4095.0f
sampleValue = (luminance, luminance, luminance, 1.0f)
GPU_LUMINANCE16 16 1-channel 16-bit unsigned normalized luminance
luminance = float32(uint16(texelValue)) / 65535.0f
sampleValue = (luminance, luminance, luminance, 1.0f)
GPU_LUMINANCE4_ALPHA4 8 2-channel 4-bit unsigned normalized luminance and alpha
luminance = float32(uint8(texelValue) >> 4)   / 15.0f
alpha     = float32(uint8(texelValue) & 0x0F) / 15.0f
sampleValue = (luminance, luminance, luminance, alpha)
GPU_LUMINANCE6_ALPHA2 8 2-channel 6-bit unsigned normalized luminance and 2-bit unsigned normalized alpha
luminance = float32(uint8(texelValue) >> 2)   / 63.0f
alpha     = float32(uint8(texelValue) & 0x03) /  3.0f
sampleValue = (luminance, luminance, luminance, alpha)
GPU_LUMINANCE8_ALPHA8 16 2-channel 8-bit unsigned normalized luminance and alpha
luminance = float32(uint8(texelValue[0])) / 255.0f
alpha     = float32(uint8(texelValue[1])) / 255.0f
sampleValue = (luminance, luminance, luminance, alpha)
GPU_LUMINANCE8_ALPHA8_SIGNED 16 2-channel 8-bit signed normalized luminance and alpha
luminance = float32(sint8(texelValue[0])) / 127.0f
alpha     = float32(sint8(texelValue[1])) / 127.0f
sampleValue = (luminance, luminance, luminance, alpha)
GPU_LUMINANCE12_ALPHA4 16 2-channel 12-bit unsigned normalized luminance and 4-bit unsigned normalized alpha
luminance = float32(uint16(texelValue) >> 4)   / 4095.0f
alpha     = float32(uint16(texelValue) & 0x0F) /   15.0f
sampleValue = (luminance, luminance, luminance, alpha)
GPU_LUMINANCE12_ALPHA12 32 2-channel 12-bit unsigned normalized luminance and alpha, stored as a 32-bit value, higher 8 bits are ignored
luminance = float32(uint32(texelValue >> 12) & 0x0FFF) / 4095.0f
alpha     = float32(uint32(texelValue        & 0x0FFF)) / 4095.0f
sampleValue = (luminance, luminance, luminance, alpha)
GPU_LUMINANCE16_ALPHA16 32 2-channel 16-bit unsigned normalized luminance and alpha
luminance = float32(uint16(texelValue[0])) / 65535.0f
alpha     = float32(uint16(texelValue[1])) / 65535.0f
sampleValue = (luminance, luminance, luminance, alpha)
GPU_INTENSITY8 8 1-channel 8-bit unsigned normalized intensity
intensity = float32(uint8(texelValue)) / 255.0f
sampleValue = (intensity, intensity, intensity, intensity)
GPU_INTENSITY12 16 1-channel 12-bit unsigned normalized intensity, stored as a 16-bit value, higher 4 bits are ignored
intensity = float32(uint16(texelValue & 0x0FFF)) / 4095.0f
sampleValue = (intensity, intensity, intensity, intensity)
GPU_INTENSITY16 16 1-channel 16-bit unsigned normalized intensity
intensity = float32(uint16(texelValue)) / 65535.0f
sampleValue = (intensity, intensity, intensity, intensity)
GPU_RGB332 8 3-channel 3-bit unsigned normalized red and green, 2-bit unsigned normalized blue
red   = float32( uint8(texelValue) >> 5        ) / 7.0f
green = float32((uint8(texelValue) >> 2) & 0x07) / 7.0f
blue  = float32( uint8(texelValue)       & 0x03) / 3.0f
sampleValue = (red, green, blue, 1.0f)
GPU_RGB444 16 3-channel 4-bit unsigned normalized, stored as a 16-bit value, higher 4 bits are ignored
red   = float32((uint16(texelValue) >> 8) & 0x0F) / 15.0f
green = float32((uint16(texelValue) >> 4) & 0x0F) / 15.0f
blue  = float32( uint16(texelValue)       & 0x0F) / 15.0f
sampleValue = (red, green, blue, 1.0f)
GPU_RGB555 16 3-channel 5-bit unsigned normalized, stored as a 16-bit value, higher bit is ignored
red   = float32((uint16(texelValue) >> 10) & 0x1F) / 31.0f
green = float32((uint16(texelValue) >>  5) & 0x1F) / 31.0f
blue  = float32( uint16(texelValue)        & 0x1F) / 31.0f
sampleValue = (red, green, blue, 1.0f)
GPU_RGB565 16 3-channel 5-bit unsigned normalized red and blue, 6-bit unsigned normalized green
red   = float32( uint16(texelValue) >> 11        ) / 31.0f
green = float32((uint16(texelValue) >>  5) & 0x3F) / 63.0f
blue  = float32( uint16(texelValue)        & 0x1F) / 31.0f
sampleValue = (red, green, blue, 1.0f)
GPU_RGB888 32 3-channel 8-bit unsigned normalized, stored as a 32-bit value, higher 8 bits are ignored
red   = float32(uint8(texelValue[0])) / 255.0f
green = float32(uint8(texelValue[1])) / 255.0f
blue  = float32(uint8(texelValue[2])) / 255.0f
sampleValue = (red, green, blue, 1.0f)
GPU_RGB101010 32 3-channel 10-bit unsigned normalized, stored as a 32-bit value, higher 2 bits are ignored
red   = float32((uint32(texelValue) >> 20) & 0x03FF) / 1023.0f
green = float32((uint32(texelValue) >> 10) & 0x03FF) / 1023.0f
blue  = float32( uint32(texelValue)        & 0x03FF) / 1023.0f
sampleValue = (red, green, blue, 1.0f)
GPU_RGB121212 64 3-channel 12-bit unsigned normalized, stored as a 64-bit value, higher 28 bits are ignored, ERROR IN IMPLEMENTATION
red   = float32((uint64(texelValue) >> 24) & 0x0FFF) / 4095.0f
green = float32((uint64(texelValue) >> 12) & 0x0FFF) / 4095.0f
blue  = float32( uint64(texelValue)        & 0x0FFF) / 4095.0f
sampleValue = (red, green, blue, 1.0f)
GPU_RGBA2222 8 4-channel 2-bit unsigned normalized
red   = float32( uint8(texelValue) >> 6        ) / 3.0f
green = float32((uint8(texelValue) >> 4) & 0x03) / 3.0f
blue  = float32((uint8(texelValue) >> 2) & 0x03) / 3.0f
alpha = float32( uint8(texelValue)       & 0x03) / 3.0f
sampleValue = (red, green, blue, alpha)
GPU_RGBA4444 16 4-channel 4-bit unsigned normalized
red   = float32( uint16(texelValue) >> 12        ) / 15.0f
green = float32((uint16(texelValue) >>  8) & 0x0F) / 15.0f
blue  = float32((uint16(texelValue) >>  4) & 0x0F) / 15.0f
alpha = float32( uint16(texelValue)        & 0x0F) / 15.0f
sampleValue = (red, green, blue, alpha)
GPU_RGBA5551 16 4-channel 5-bit unsigned normalized red, green and blue, 1-bit alpha
red   = float32( uint16(texelValue) >> 11        ) / 31.0f
green = float32((uint16(texelValue) >>  6) & 0x1F) / 31.0f
blue  = float32((uint16(texelValue) >>  1) & 0x1F) / 31.0f
alpha = float32( uint16(texelValue)        & 0x01)
sampleValue = (red, green, blue, alpha)
GPU_RGBA8888 32 4-channel 8-bit unsigned normalized
red   = float32(uint8(texelValue[0])) / 255.0f
green = float32(uint8(texelValue[1])) / 255.0f
blue  = float32(uint8(texelValue[2])) / 255.0f
alpha = float32(uint8(texelValue[3])) / 255.0f
sampleValue = (red, green, blue, alpha)
GPU_RGBA1010102 32 4-channel 10-bit unsigned normalized red, green and blue, 2-bit unsigned normalized alpha
red   = float32( uint32(texelValue) >> 22          ) / 1023.0f
green = float32((uint32(texelValue) >> 12) & 0x03FF) / 1023.0f
blue  = float32((uint32(texelValue) >>  2) & 0x03FF) / 1023.0f
alpha = float32( uint32(texelValue)        & 0x0003) /    3.0f
sampleValue = (red, green, blue, alpha)
GPU_R16 16 1-channel 16-bit unsigned normalized
red   = float32(uint16(texelValue)) / 65535.0f
sampleValue = (red, 0.0f, 0.0f, 1.0f)
GPU_RG16 32 2-channel 16-bit unsigned normalized
red   = float32(uint16(texelValue[0])) / 65535.0f
green = float32(uint16(texelValue[1])) / 65535.0f
sampleValue = (red, green, 0.0f, 1.0f)
GPU_RGBA16 64 4-channel 16-bit unsigned normalized
red   = float32(uint16(texelValue[0])) / 65535.0f
green = float32(uint16(texelValue[1])) / 65535.0f
blue  = float32(uint16(texelValue[2])) / 65535.0f
alpha = float32(uint16(texelValue[3])) / 65535.0f
sampleValue = (red, green, blue, alpha)
GPU_R16F 16 1-channel 16-bit floating point
red   = float32(float16(texelValue))
sampleValue = (red, 0.0f, 0.0f, 1.0f)
GPU_RG16F 32 2-channel 16-bit floating point
red   = float32(float16(texelValue[0]))
green = float32(float16(texelValue[1]))
sampleValue = (red, green, 0.0f, 1.0f)
GPU_RGBA16F 64 4-channel 16-bit floating point
red   = float32(float16(texelValue[0]))
green = float32(float16(texelValue[1]))
blue  = float32(float16(texelValue[2]))
alpha = float32(float16(texelValue[3]))
sampleValue = (red, green, blue, alpha)
GPU_R32F 32 1-channel 32-bit floating point
red   = float32(texelValue)
sampleValue = (red, 0.0f, 0.0f, 1.0f)
GPU_RG32F 64 2-channel 32-bit floating point
red   = float32(texelValue[0])
green = float32(texelValue[1])
sampleValue = (red, green, 0.0f, 1.0f)
GPU_RGBA32F 128 4-channel 32-bit floating point
red   = float32(texelValue[0])
green = float32(texelValue[1])
blue  = float32(texelValue[2])
alpha = float32(texelValue[3])
sampleValue = (red, green, blue, alpha)
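As a concrete example of the packed-format conversions in the table, this Python sketch decodes one GPU_RGB565 texel into a normalized (r, g, b, a) sample. The function name is illustrative, not part of the ATTILA API.

```python
# Decode one GPU_RGB565 texel (16-bit word: 5-bit red, 6-bit green,
# 5-bit blue) to a normalized (r, g, b, a) sample, following the
# conversion given in the texture format table.
def decode_rgb565(texel):
    red   = ((texel >> 11) & 0x1F) / 31.0
    green = ((texel >> 5)  & 0x3F) / 63.0
    blue  = ( texel        & 0x1F) / 31.0
    return (red, green, blue, 1.0)
```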

Render Buffer Formats

Render Target Formats
Format BPT Description Color Depth Compression
GPU_RGBA8888 32 4-channel 8-bit unsigned normalized X X
GPU_RG16F 32 2-channel 16-bit floating point X
GPU_R32F 32 1-channel 32-bit floating point X
GPU_RGBA16 64 4-channel 16-bit unsigned normalized X
GPU_RGBA16F 64 4-channel 16-bit floating point X
GPU_DEPTH_COMPONENT24 32 1-channel 24-bit depth (8-bit stencil implied) X X

Texture Tiling

Texture tiling uses two levels of blocking with Morton order inside both levels. The first blocking level is defined as a square of 2^n x 2^n texels. The second blocking level (superblock) is defined as a square of 2^m x 2^m first level blocks. Second level blocks (superblocks) are stored in row order.

The block and superblock sizes are defined in the ATTILA architecture configuration file (see TextureBlockDimension and TextureSuperBlockDimension in the ATTILA architecture configuration file specification).
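The two-level scheme can be sketched as an address computation: Morton (Z-order) interleaving inside blocks and inside superblocks, with superblocks laid out in row order. This Python sketch is illustrative; the function and parameter names are hypothetical, and the texture width is assumed to be a multiple of the superblock edge.

```python
# Sketch of the two-level tiled texel index described above.
# n and m are the exponents (TextureBlockDimension and
# TextureSuperBlockDimension in the configuration file).
def morton(x, y, bits):
    """Interleave the low `bits` bits of x (even positions) and y (odd)."""
    index = 0
    for i in range(bits):
        index |= ((x >> i) & 1) << (2 * i)
        index |= ((y >> i) & 1) << (2 * i + 1)
    return index

def tiled_texel_index(x, y, width, n, m):
    """Linear texel index for texel (x, y) in a width-texel-wide texture,
    with 2^n x 2^n texel blocks and 2^m x 2^m block superblocks."""
    sb_per_row = width >> (n + m)                    # superblocks per texture row
    sx, sy = x >> (n + m), y >> (n + m)              # superblock coords (row order)
    bx = (x >> n) & ((1 << m) - 1)                   # block coords in superblock
    by = (y >> n) & ((1 << m) - 1)
    tx, ty = x & ((1 << n) - 1), y & ((1 << n) - 1)  # texel coords in block
    index  = (sy * sb_per_row + sx) << (2 * (n + m)) # superblocks: row order
    index |= morton(bx, by, m) << (2 * n)            # blocks: Morton order
    index |= morton(tx, ty, n)                       # texels: Morton order
    return index
```

For instance, morton(2, 0, 2) is 4, matching block B04 sitting at block coordinates (2, 0) in the diagram below.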

Example for a 256x128 texture, block size of 2^2 x 2^2 and a superblock size of 2^4 x 2^4:

Txx : A texel
Bxx : A first level block (2^2 x 2^2 texels)
Sxx : A second level block (2^4 x 2^4 first level blocks)

Texels in the texture image for a first level block

T00  T01  T04  T05
T02  T03  T06  T07
T08  T09  T0C  T0D
T0A  T0B  T0E  T0F

Texels as stored in memory for a first level block from lower to higher memory addresses

T00 T01 T02 T03 T04 T05 T06 T07 T08 T09 T0A T0B T0C T0D T0E T0F

Blocks in the texture image for a second level block

B00  B01  B04  B05  B10  B11  B14  B15
B02  B03  B06  B07  B12  B13  B16  B17
B08  B09  B0C  B0D  B18  B19  B1C  B1D
B0A  B0B  B0E  B0F  B1A  B1B  B1E  B1F
B20  B21  B24  B25  B30  B31  B34  B35
B22  B23  B26  B27  B32  B33  B36  B37
B28  B29  B2C  B2D  B38  B39  B3C  B3D
B2A  B2B  B2E  B2F  B3A  B3B  B3E  B3F

Blocks as stored in memory for a second level block from lower to higher memory addresses

B00 B01 B02 B03 B04 B05 B06 B07 B08 B09 B0A B0B B0C B0D B0E B0F  
B10 B11 B12 B13 B14 B15 B16 B17 B18 B19 B1A B1B B1C B1D B1E B1F  
B20 B21 B22 B23 B24 B25 B26 B27 B28 B29 B2A B2B B2C B2D B2E B2F  
B30 B31 B32 B33 B34 B35 B36 B37 B38 B39 B3A B3B B3C B3D B3E B3F

Superblock in the texture image

S0 S1 S2 S3
S4 S5 S6 S7

Superblocks as stored in memory for the whole texture from lower to higher memory addresses

S0 S1 S2 S3 S4 S5 S6 S7
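
The two-level layout above can be summarized as an address computation. The following C++ sketch is illustrative only (the function and parameter names are assumptions, not taken from the simulator sources): texel and block coordinates are Morton-interleaved inside their level, superblocks are placed in row order, and the three indices are concatenated into a linear texel offset. With blockDim = 2 and sBlockDim = 4 it reproduces the worked example, e.g. texel (2, 0) lands at offset 4 (T04) and block (1, 0) starts at offset 16 (B01).

```cpp
#include <cassert>
#include <cstdint>

// Interleave the low `bits` bits of x and y (Morton order, with x in
// the least-significant bit position), e.g. (x=2, y=0, bits=2) -> 4.
static uint32_t mortonInterleave(uint32_t x, uint32_t y, uint32_t bits)
{
    uint32_t index = 0;
    for (uint32_t b = 0; b < bits; b++)
    {
        index |= ((x >> b) & 1u) << (2 * b);
        index |= ((y >> b) & 1u) << (2 * b + 1);
    }
    return index;
}

// Offset (in texels) of texel (x, y) inside a tiled texture that is
// texW texels wide.  blockDim and sBlockDim are the log2 dimensions of
// the first level block (in texels) and of the superblock (in first
// level blocks), matching the TextureBlockDimension and
// TextureSuperBlockDimension configuration parameters.
uint32_t tiledTexelOffset(uint32_t x, uint32_t y, uint32_t texW,
                          uint32_t blockDim, uint32_t sBlockDim)
{
    const uint32_t sShift = blockDim + sBlockDim;   // superblock side (log2 texels)

    // Superblocks are stored in row order across the whole texture.
    uint32_t sBlock = (y >> sShift) * (texW >> sShift) + (x >> sShift);

    // First level blocks inside a superblock: Morton order.
    uint32_t block = mortonInterleave((x >> blockDim) & ((1u << sBlockDim) - 1),
                                      (y >> blockDim) & ((1u << sBlockDim) - 1),
                                      sBlockDim);

    // Texels inside a first level block: Morton order.
    uint32_t texel = mortonInterleave(x & ((1u << blockDim) - 1),
                                      y & ((1u << blockDim) - 1), blockDim);

    // Concatenate superblock, block and texel indices.
    return (((sBlock << (2 * sBlockDim)) + block) << (2 * blockDim)) + texel;
}
```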

Render Buffer Tiling

Render buffer tiling defines four levels of blocking: over scan tiles, scan tiles, generation tiles and stamps (with pixels, samples and bytes laid out inside each stamp).

Memory layout:

Over scan tiles:

    device

    <--- xRes ---->
    O0  O1  O2  O3  ^
    O4  O5  O6  O7  | yRes
    O8  O9  O10 O11 |
    O12 O13 O14 O15 v

    memory (rows)

    O0 O1 O2 O3 O4 O5 O6 O7 O8 O9 O10 O11 O12 O13 O14 O15

Scan tiles (inside over scan tile):

For 1 sample:
      
    device

    <--- overW ---->
    T0  T1  T2  T3  ^
    T4  T5  T6  T7  | overH
    T8  T9  T10 T11 |
    T12 T13 T14 T15 v

    memory (morton)

    T0 T1 T4 T5  T2 T3 T6 T7  T8 T9 T12 T13  T10 T11 T14 T15

When the number of samples is > 1 the scan tile is subdivided into as many subtiles as
there are samples per pixel.  Subtiles from the different scan tiles are then
interleaved.

For 2 samples:

    <--------------------- overW * 2 -------------------->
    T0S0   T0S1  T1S0   T1S1   T2S0   T2S1   T3S0   T3S1  ^
    T4S0   T4S1  T5S0   T5S1   T6S0   T6S1   T7S0   T7S1  | overH
    T8S0   T8S1  T9S0   T9S1   T10S0  T10S1  T11S0  T11S1 |
    T12S0  T12S1 T13S0  T13S1  T14S0  T14S1  T15S0  T15S1 v

    memory (morton)
               
    T0S0 T1S0 T4S0 T5S0  T2S0 T3S0 T6S0 T7S0  T8S0 T9S0 T12S0 T13S0  T10S0 T11S0 T14S0 T15S0
    T0S1 T1S1 T4S1 T5S1  T2S1 T3S1 T6S1 T7S1  T8S1 T9S1 T12S1 T13S1  T10S1 T11S1 T14S1 T15S1

For 4 samples:

    <--------------------- overW * 2 -------------------->
    T0S0   T0S1  T1S0   T1S1   T2S0   T2S1   T3S0   T3S1  ^
    T0S2   T0S3  T1S2   T1S3   T2S2   T2S3   T3S2   T3S3  |
    T4S0   T4S1  T5S0   T5S1   T6S0   T6S1   T7S0   T7S1  |
    T4S2   T4S3  T5S2   T5S3   T6S2   T6S3   T7S2   T7S3  | overH * 2
    T8S0   T8S1  T9S0   T9S1   T10S0  T10S1  T11S0  T11S1 |
    T8S2   T8S3  T9S2   T9S3   T10S2  T10S3  T11S2  T11S3 |
    T12S0  T12S1 T13S0  T13S1  T14S0  T14S1  T15S0  T15S1 |
    T12S2  T12S3 T13S2  T13S3  T14S2  T14S3  T15S2  T15S3 v

    memory (morton)
               
    T0S0 T1S0 T4S0 T5S0  T2S0 T3S0 T6S0 T7S0  T8S0 T9S0 T12S0 T13S0  T10S0 T11S0 T14S0 T15S0
    T0S1 T1S1 T4S1 T5S1  T2S1 T3S1 T6S1 T7S1  T8S1 T9S1 T12S1 T13S1  T10S1 T11S1 T14S1 T15S1
    T0S2 T1S2 T4S2 T5S2  T2S2 T3S2 T6S2 T7S2  T8S2 T9S2 T12S2 T13S2  T10S2 T11S2 T14S2 T15S2
    T0S3 T1S3 T4S3 T5S3  T2S3 T3S3 T6S3 T7S3  T8S3 T9S3 T12S3 T13S3  T10S3 T11S3 T14S3 T15S3

For 8 samples:

    <------------------------------------------------- overW * 4 ----------------------------------------------->
    T0S0   T0S1  T0S2   T0S3   T1S0   T1S1   T1S2   T1S3   T2S0   T2S1   T2S2   T2S3   T3S0   T3S1   T3S2   T3S3  ^
    T0S4   T0S5  T0S6   T0S7   T1S4   T1S5   T1S6   T1S7   T2S4   T2S5   T2S6   T2S7   T3S4   T3S5   T3S6   T3S7  |
    T4S0   T4S1  T4S2   T4S3   T5S0   T5S1   T5S2   T5S3   T6S0   T6S1   T6S2   T6S3   T7S0   T7S1   T7S2   T7S3  |
    T4S4   T4S5  T4S6   T4S7   T5S4   T5S5   T5S6   T5S7   T6S4   T6S5   T6S6   T6S7   T7S4   T7S5   T7S6   T7S7  |
    T8S0   T8S1  T8S2   T8S3   T9S0   T9S1   T9S2   T9S3   T10S0  T10S1  T10S2  T10S3  T11S0  T11S1  T11S2  T11S3 |  overH * 2
    T8S4   T8S5  T8S6   T8S7   T9S4   T9S5   T9S6   T9S7   T10S4  T10S5  T10S6  T10S7  T11S4  T11S5  T11S6  T11S7 |
    T12S0  T12S1 T12S2  T12S3  T13S0  T13S1  T13S2  T13S3  T14S0  T14S1  T14S2  T14S3  T15S0  T15S1  T15S2  T15S3 |
    T12S4  T12S5 T12S6  T12S7  T13S4  T13S5  T13S6  T13S7  T14S4  T14S5  T14S6  T14S7  T15S4  T15S5  T15S6  T15S7 v

    memory (morton)
               
    T0S0 T1S0 T4S0 T5S0  T2S0 T3S0 T6S0 T7S0  T8S0 T9S0 T12S0 T13S0  T10S0 T11S0 T14S0 T15S0
    T0S1 T1S1 T4S1 T5S1  T2S1 T3S1 T6S1 T7S1  T8S1 T9S1 T12S1 T13S1  T10S1 T11S1 T14S1 T15S1
    T0S2 T1S2 T4S2 T5S2  T2S2 T3S2 T6S2 T7S2  T8S2 T9S2 T12S2 T13S2  T10S2 T11S2 T14S2 T15S2
    T0S3 T1S3 T4S3 T5S3  T2S3 T3S3 T6S3 T7S3  T8S3 T9S3 T12S3 T13S3  T10S3 T11S3 T14S3 T15S3
    T0S4 T1S4 T4S4 T5S4  T2S4 T3S4 T6S4 T7S4  T8S4 T9S4 T12S4 T13S4  T10S4 T11S4 T14S4 T15S4
    T0S5 T1S5 T4S5 T5S5  T2S5 T3S5 T6S5 T7S5  T8S5 T9S5 T12S5 T13S5  T10S5 T11S5 T14S5 T15S5
    T0S6 T1S6 T4S6 T5S6  T2S6 T3S6 T6S6 T7S6  T8S6 T9S6 T12S6 T13S6  T10S6 T11S6 T14S6 T15S6
    T0S7 T1S7 T4S7 T5S7  T2S7 T3S7 T6S7 T7S7  T8S7 T9S7 T12S7 T13S7  T10S7 T11S7 T14S7 T15S7

Generation tiles (inside scan tile):

    device

    <--- scanW ---->
    G0  G1  G2  G3  ^
    G4  G5  G6  G7  | scanH
    G8  G9  G10 G11 |
    G12 G13 G14 G15 v

    memory (rows)

    G0 G1 G2 G3 G4 G5 G6 G7 G8 G9 G10 G11 G12 G13 G14 G15             

Stamps (inside generation tile):       

    <---- genW ---->
    S0  S1  S2  S3  ^
    S4  S5  S6  S7  | genH
    S8  S9  S10 S11 |
    S12 S13 S14 S15 v

    memory

    S0 S1 S2 S3 S4 S5 S6 S7 S8 S9 S10 S11 S12 S13 S14 S15

Pixels (inside a stamp):

    <--- stampW ---->
    P0  P1  P2  P3  ^
    P4  P5  P6  P7  | stampH
    P8  P9  P10 P11 |
    P12 P13 P14 P15 v

    memory

    P0 P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 P11 P12 P13 P14 P15

Samples (inside a pixel):

    memory
   
    S0 S1 S2 S3             

Bytes (inside a pixel):

    memory

    R G B A
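
For the single-sample case the four tiling levels compose into one pixel offset. The C++ sketch below is a hypothetical illustration (the function name, the parameter names and the log2-sized signature are assumptions, not the simulator's actual code): over scan tiles, generation tiles, stamps and pixels use row order, scan tiles use Morton order, and each level contributes its index to the final linear offset. Every tile dimension is given as a log2 value counting units of the next lower level, so a stamp is 2^stampBits x 2^stampBits pixels, a generation tile 2^genBits x 2^genBits stamps, and so on.

```cpp
#include <cassert>
#include <cstdint>

// Interleave the low `bits` bits of x and y (Morton order, with x in
// the least-significant bit position).
static uint32_t mortonInterleave(uint32_t x, uint32_t y, uint32_t bits)
{
    uint32_t index = 0;
    for (uint32_t b = 0; b < bits; b++)
    {
        index |= ((x >> b) & 1u) << (2 * b);
        index |= ((y >> b) & 1u) << (2 * b + 1);
    }
    return index;
}

// Offset (in pixels) of pixel (x, y) in a single-sample render buffer
// that is xRes pixels wide.  All names are illustrative only.
uint32_t renderBufferPixelOffset(uint32_t x, uint32_t y, uint32_t xRes,
                                 uint32_t stampBits, uint32_t genBits,
                                 uint32_t scanBits, uint32_t overBits)
{
    const uint32_t overShift = stampBits + genBits + scanBits + overBits;
    const uint32_t scanShift = stampBits + genBits + scanBits;
    const uint32_t genShift  = stampBits + genBits;

    // Over scan tiles: row order across the whole device.
    uint32_t over = (y >> overShift) * (xRes >> overShift) + (x >> overShift);

    // Scan tiles inside the over scan tile: Morton order.
    uint32_t scan = mortonInterleave((x >> scanShift) & ((1u << overBits) - 1),
                                     (y >> scanShift) & ((1u << overBits) - 1),
                                     overBits);

    // Generation tiles inside the scan tile: row order.
    uint32_t gen = (((y >> genShift) & ((1u << scanBits) - 1)) << scanBits)
                 | ((x >> genShift) & ((1u << scanBits) - 1));

    // Stamps inside the generation tile: row order.
    uint32_t stamp = (((y >> stampBits) & ((1u << genBits) - 1)) << genBits)
                   | ((x >> stampBits) & ((1u << genBits) - 1));

    // Pixels inside the stamp: row order.
    uint32_t pixel = ((y & ((1u << stampBits) - 1)) << stampBits)
                   | (x & ((1u << stampBits) - 1));

    // Concatenate the per-level indices into a linear pixel offset.
    uint32_t offset = (over << (2 * overBits)) | scan;
    offset = (offset << (2 * scanBits)) | gen;
    offset = (offset << (2 * genBits))  | stamp;
    return   (offset << (2 * stampBits)) | pixel;
}
```

With samples > 1 the scan tile contents would additionally be split and interleaved per sample as shown in the multisampling layouts above; that case is omitted here.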