ATTILA configuration parameters



The ATTILA configuration file is named bGPU.ini. The same file is used by both the non-unified and unified simulator binaries. It contains parameters that control the simulation process, the gathering of statistics, the generation of images and other outputs, and the configuration of the simulated GPU architecture. The configuration file must be present in the working directory where the simulator binary is started.

The ATTILA configuration file is divided into sections. Each section starts with the section name enclosed in brackets ('[' and ']') and is followed by a list of parameter names and their associated values. A parameter can be of one of three types: natural number (0 to N), boolean (using the TRUE and FALSE keywords) or string (between double quotes '"'). The character '#' introduces a comment: all the characters after a '#' character, up to the end of the line, are ignored by the configuration parser.

Example:

[SECTION NAME]
parameter1 = 1235
parameter2 = TRUE
parameter3 = "output.txt"

Because of the limited parsing capabilities of the ConfigurationLoader class, each section can appear only once in the file, and the parameters of a section must appear between the start of that section and the start of the next section (or the end of the configuration file).
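
As a sketch of these layout rules, the fragment below keeps all the parameters of a section together under a single section header and uses '#' comment lines. The parameter names and values are taken from the baseline configuration on this page; the selection shown is only illustrative and is not a complete file.

```ini
# Each section appears exactly once in the file.
[SIMULATOR]
# Natural number
SimCycles = 10000
# Boolean
Statistics = FALSE
# String
StatsFile = "stats.csv"

[GPU]
NumStampPipes = 2

# Do not reopen [SIMULATOR] at this point to add more parameters:
# every SIMULATOR parameter belongs in the block above.
```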

Most parameters in the configuration file have no predefined default value. A parameter that is missing from the file takes whatever value is already in the memory that holds it when the parameters are first read at start-up (likely 0).
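
Since a missing parameter may pick up an arbitrary value, one defensive habit (a suggestion, not something the simulator is known to enforce) is to set every parameter explicitly, including features that should stay disabled. The names and values below are copied from the baseline configuration on this page; that an uninitialised boolean would read as FALSE is an assumption based on the "likely 0" remark.

```ini
[COMMANDPROCESSOR]
# TRUE in the baseline file; if this line were missing, the value
# would be whatever happens to be in memory (likely 0).
PipelinedBatchRendering = TRUE

[RASTERIZER]
# Stated explicitly even though the feature stays enabled.
DisableHZ = FALSE

[ZSTENCILTEST]
DisableCompression = FALSE
```

This is a fragment: the remaining parameters of each section still have to be present in a real file.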

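The current version (ATTILA rei) of the simulator supports the following sections in the configuration file: SIMULATOR, GPU, COMMANDPROCESSOR, MEMORYCONTROLLER, STREAMER, VERTEXSHADER, PRIMITIVEASSEMBLY, CLIPPER, RASTERIZER, FRAGMENTSHADER, ZSTENCILTEST, COLORWRITE and DAC.

A sample reference baseline configuration for the ATTILA architecture is also included.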


SIMULATOR Section

The SIMULATOR section configures the simulation process and the different outputs of the simulator, for example the generation of statistics or the signal traffic dump trace.
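
For illustration only, the fragment below turns on statistics gathering using parameter names that appear in the baseline configuration on this page. The comments reflect what the parameter names suggest, which is an assumption that should be checked against the simulator.

```ini
[SIMULATOR]
# Enable statistics gathering (FALSE in the baseline file).
Statistics = TRUE
# Output file for the statistics, as named in the baseline file.
StatsFile = "stats.csv"
# Baseline value; the unit of this rate is assumed, not documented here.
StatisticsRate = 1000
```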

The parameters that can be used in the SIMULATOR section are:

GPU Section

The GPU section is used to configure global parameters for the simulated ATTILA GPU architecture.

The parameters that can be used in the GPU section are:

COMMANDPROCESSOR Section

The COMMANDPROCESSOR section is used to configure the Command Processor stage of the simulated ATTILA GPU architecture.

The parameter that can be used in the COMMANDPROCESSOR section is:

MEMORYCONTROLLER Section

The MEMORYCONTROLLER section configures the Memory Controller unit/stage of the ATTILA GPU architecture.

The MEMORYCONTROLLER section also allows selecting a newer memory controller model, called Memory Controller V2 (MCv2): set MemoryControllerV2 = TRUE to select it. Most of the parameters described here are ignored when MCv2 is used. MCv2 uses its own parameters, described in: Memory Controller V2 parameters description. The only parameters shared by both memory controllers are those that define the width of the buses between the memory controller and the GPU units.
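
A minimal sketch of selecting Memory Controller V2, using only parameter names and values that appear in the baseline configuration on this page. Treating the per-unit BusWidth parameters as the shared ones is a reading of the paragraph above, not a specification.

```ini
[MEMORYCONTROLLER]
# Select Memory Controller V2 (FALSE in the baseline file).
MemoryControllerV2 = TRUE

# Bus widths between the memory controller and the GPU units;
# assumed to be the parameters shared by both controller models.
CommandProcessorBusWidth = 8
StreamerFetchBusWidth = 64
StreamerLoaderBusWidth = 64
ZStencilBusWidth = 64
ColorWriteBusWidth = 64
DACBusWidth = 64
TextureUnitBusWidth = 64

# Parameters only for Memory Controller V2 (baseline values).
V2MemoryChannels = 4
V2BanksPerMemoryChannel = 8
V2MemoryRowSize = 4096
V2BurstElementsPerCycle = 4
V2ChannelInterleaving = 1024
V2BankInterleaving = 1024
# 0 = fifo
V2ChannelScheduler = 0
# 0 = close, 1 = open
V2PagePolicy = 1
V2PerfectMemory = FALSE
```

This is a fragment: it is safer to keep the remaining MEMORYCONTROLLER parameters from the baseline file in place, since this page does not state which of them the V2 model still reads.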

The parameters that can be used in the MEMORYCONTROLLER section are:

The following parameters are supported only by the new Memory Controller model written by Carlos (not in the public release):

STREAMER Section

The STREAMER section configures the Streamer Fetch, Streamer Loader, Streamer Output Cache and Streamer Commit stages of the ATTILA GPU architecture.

The parameters that can be used in the STREAMER section are:

VERTEXSHADER Section

The VERTEXSHADER section configures the Vertex Shader stage of the non-unified version of the ATTILA GPU architecture. This section is ignored by the unified version of the ATTILA GPU architecture.

The parameters that can be used in the VERTEXSHADER section are:

PRIMITIVEASSEMBLY Section

The PRIMITIVEASSEMBLY section configures the Primitive Assembly unit/stage of the ATTILA GPU architecture.

The parameters that can be set in the PRIMITIVEASSEMBLY section are:

CLIPPER Section

The CLIPPER section configures the Clipper unit/stage of the ATTILA GPU architecture.

The parameters that can be set in the CLIPPER section are:

RASTERIZER Section

The RASTERIZER section configures the Triangle Setup, Fragment Generation (Triangle Traversal box in the simulator), Hierarchical Z and Fragment FIFO stages of the ATTILA GPU architecture.

The parameters that can be set in the RASTERIZER section are:


FRAGMENTSHADER Section

The FRAGMENTSHADER section configures the Fragment Shader processors in the non-unified version of the ATTILA GPU architecture or the Unified Shader processors in the unified version of the ATTILA GPU architecture.
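
To make the split between the two simulator binaries concrete, the fragment below annotates a few baseline values with the binary that reads them, following the section descriptions on this page. The link between NumVertexShaders / NumFragmentShaders and the shader unit counts in the baseline table is inferred from the matching values, not stated explicitly.

```ini
[GPU]
# Vertex shader units: non-unified version only.
NumVertexShaders = 4
# Fragment shader units (non-unified) or unified shader units (unified).
NumFragmentShaders = 2

[VERTEXSHADER]
# Read by the non-unified binary; ignored by the unified binary.
ExecutableThreads = 12

[FRAGMENTSHADER]
# Fragment shader processors (non-unified) or unified shader
# processors (unified).
ExecutableThreads = 240
```

Only one parameter per shader section is shown; a real file needs the full sections from the baseline configuration.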

The parameters that can be used in the FRAGMENTSHADER section are:


ZSTENCILTEST Section

The ZSTENCILTEST section configures the Z and Stencil Test unit/stage of the ATTILA GPU architecture.

The parameters that can be set in the ZSTENCILTEST section are:

Queue parameter for the public release version of the simulator:

Queue parameters for the internal version of the simulator:

COLORWRITE Section

The COLORWRITE section configures the Color Write unit/stage of the ATTILA GPU architecture.

The parameters that can be set in the COLORWRITE section are:

Queue parameter for the public version of the simulator:

Queue parameters for the internal version of the simulator:

DAC Section

The DAC section configures the DAC unit/stage of the ATTILA GPU architecture.

The parameters that can be set in the DAC section are:

ATTILA baseline configuration

The sample bGPU.ini configuration file at the end of this section corresponds to the baseline configuration of the ATTILA architecture described here:

Overall Pipeline Configuration

Parameter                                                               Value
Vertex Shader Units (non-unified version only)                          4
Fragment Shader Units (vertex shader units also for unified version)    2
ROP Pipelines                                                           2
Texture Rate per Fragment Shader                                        4


GPU Units configuration

GPU Unit                   Input BW        Output BW       Input Queue Size  Input Queue Element Width  Latency
Streamer                   1 index         1 vertex        48                16x4x32 bits               Mem cycles
Primitive Assembly         1 vertex        1 triangle      8                 3x16x4x32 bits             1 cycle
Clipping                   1 triangle      1 triangle      4                 3x4x32 bits                6 cycles
Triangle Setup             1 triangle      1 triangle      12                3x4x32 bits                10 cycles
Fragment Generation        1 triangle      2x64 fragments  16                3x4x32 bits                1 cycle
Hierarchical Z             2x64 fragments  2x64 fragments  64                (2x16+4x32)x4 bits         1 cycle
Z Test                     4 fragments     4 fragments     64                (2x16+4x32)x4 bits         2 + Mem cycles
Interpolator               2x4 fragments   2x4 fragments   -                 -                          2 to 8 cycles
Color Write                4 fragments     -               64                (2x16+4x32)x4 bits         2 + Mem cycles
Shader (vertex)            1 vertex        1 vertex        12+4              16x4x32 bits               variable
Shader (fragment/unified)  4 fragments     4 fragments     112+16            10x4x32 bits               variable


Memory Configuration

Parameter                         Value
Memory Size                       64 MB
Memory Bus Width (Per Channel)    64 bits
Memory Channels                   4
System memory region size         16 MB

Sample bGPU.ini file:

#
# bGPU Simulator
#
# Configuration File
#
# 30/11/2004
#

[SIMULATOR]

InputFile = "gltrace-sphere"
SimCycles = 10000
SimFrames = 1
SignalDumpFile = "signaltrace.txt"
StatsFile = "stats.csv"
StatsFilePerFrame = "stats.frame.csv"
StatsFilePerBatch = "stats.batch.csv"
StartFrame = 0
StartSignalDump = 0
SignalDumpCycles = 10000
StatisticsRate = 1000
DumpSignalTrace = FALSE
Statistics = FALSE
PerFrameStatistics = FALSE
PerBatchStatistics = FALSE
GenerateFragmentMap = FALSE
#
#  Latency map modes
#
#  0 : latency of the fragment since it was generated until it was written into the
#      color buffer.
#
FragmentMapMode = 0
DoubleBuffer = FALSE
ForceMSAA = FALSE
MSAASamples = 4
ForceFP16ColorBuffer = FALSE
ObjectSize0 = 512
BucketSize0 = 32768
ObjectSize1 = 4096
BucketSize1 = 2048
ObjectSize2 = 64
BucketSize2 = 32768

[GPU]

NumVertexShaders = 4
NumFragmentShaders = 2
NumStampPipes = 2

[COMMANDPROCESSOR]

PipelinedBatchRendering = TRUE



[MEMORYCONTROLLER]

MemorySize = 67108864
MemoryClockMultiplier = 1
MemoryFrequency = 1
MemoryBusWidth = 64
MemoryBuses = 4
SharedBanks = FALSE
BankGranurality = 1024
BurstLength = 8
ReadLatency = 10
WriteLatency = 6
WriteToReadLatency = 6
MemoryPageSize = 4096
OpenPages = 1
PageOpenLatency = 20
MaxConsecutiveReads = 16
MaxConsecutiveWrites = 16
CommandProcessorBusWidth = 8
StreamerFetchBusWidth = 64
StreamerLoaderBusWidth = 64
ZStencilBusWidth = 64
ColorWriteBusWidth = 64
DACBusWidth = 64
TextureUnitBusWidth = 64
MappedMemorySize = 16777216
ReadBufferLines = 32
WriteBufferLines = 32
RequestQueueSize = 64
ServiceQueueSize = 32

MemoryControllerV2 = FALSE

# Parameters only for Memory Controller V2
V2MemoryChannels = 4
V2BanksPerMemoryChannel = 8
V2MemoryRowSize = 4096 
V2BurstElementsPerCycle = 4
V2ChannelInterleaving = 1024
V2BankInterleaving = 1024
# 0 = fifo
V2ChannelScheduler = 0
# 0 = close, 1 = open
V2PagePolicy = 1
# flag that allows to use a memory model without timing constraints (only signal latency overhead)
V2PerfectMemory = FALSE


[STREAMER]

IndicesCycle = 1
IndexBufferSize = 1024
InputRequestQueueSize = 8
AttributesCycle = 4
InputCacheLines = 32
InputCacheLineSize = 64
InputCachePortWidth = 16
InputCacheRequestQueueSize = 4
InputCacheInputQueueSize = 4
OutputFIFOSize = 64
OutputMemorySize = 48
VerticesCycle = 1
AttributesSentCycle = 4


[VERTEXSHADER]

ExecutableThreads = 12
InputBuffers = 4
ThreadResources = 128
ThreadRate = 1
FetchRate = 1
ThreadGroup = 1
LockedExecutionMode = FALSE
#
#  Enabling the scalar ALU requires FetchRate to be 2.
#
ScalarALU = FALSE
ThreadWindow = TRUE
FetchDelay = 0
SwapOnBlock = FALSE
InputsPerCycle = 1
OutputsPerCycle = 1
OutputLatency = 11


[PRIMITIVEASSEMBLY]

VerticesCycle = 1
TrianglesCycle = 1
InputBusLatency = 10
AssemblyQueueSize = 8


[CLIPPER]

TrianglesCycle = 1
ClipperUnits = 1
StartLatency = 1
ExecLatency = 6
ClipBufferSize = 4


[RASTERIZER]

TrianglesCycle = 1
SetupFIFOSize = 12
SetupUnits = 1
SetupLatency = 10
SetupStartLatency = 4
TriangleInputLatency = 2
TriangleOutputLatency = 2
TriangleSetupOnShader = FALSE
TriangleShaderQueueSize = 8
StampsPerCycle = 2
MSAASamplesCycle = 2
OverScanWidth = 4
OverScanHeight = 4
ScanWidth = 16
ScanHeight = 16
GenWidth = 8
GenHeight = 8
RasterizationBatchSize = 4
BatchQueueSize = 16
RecursiveMode = TRUE
DisableHZ = FALSE
StampsPerHZBlock = 16
HierarchicalZBufferSize = 262144
HZCacheLines = 8
HZCacheLineSize = 16
EarlyZQueueSize = 128
HZAccessLatency = 5
HZUpdateLatency = 4
HZBlocksClearedPerCycle = 256
NumInterpolators = 4
ShaderInputQueueSize = 16
ShaderOutputQueueSize = 16
ShaderInputBatchSize = 256
TiledShaderDistribution = TRUE
#
#  These two parameters are only for the unified shader version.
#
VertexInputQueueSize = 16
ShadedVertexQueueSize = 48
TriangleInputQueueSize = 8
TriangleOutputQueueSize = 8

GeneratedStampQueueSize = 128
EarlyZTestedStampQueueSize = 32
InterpolatedStampQueueSize = 16
ShadedStampQueueSize = 128
EmulatorStoredTriangles = 32

[FRAGMENTSHADER]

ExecutableThreads = 240
InputBuffers = 16
ThreadResources = 240
ThreadRate = 4
FetchRate = 1
ThreadGroup = 4
LockedExecutionMode = TRUE
#
#  Enabling the scalar ALU requires FetchRate to be 2.
#
ScalarALU = FALSE
ThreadWindow = TRUE
FetchDelay = 0
SwapOnBlock = FALSE
InputsPerCycle = 4
OutputsPerCycle = 4
OutputLatency = 11
TextureUnits = 1
TextureRequestRate = 1
TextureRequestGroup = 64
AddressALULatency = 6
FilterALULatency = 4

AnisotropyAlgorithm = 1
TextureBlockDimension = 3
TextureSuperBlockDimension = 3
TextureRequestQueueSize = 4
TextureAccessQueue = 64
TextureResultQueue = 4
TextureWaitReadWindow = 32
TwoLevelTextureCache = FALSE
TextureCacheLineSize = 256
TextureCacheWays = 4
TextureCacheLines = 16
TextureCachePortWidth = 4
TextureCacheRequestQueueSize = 4
TextureCacheInputQueue = 4
TextureCacheLineSizeL1 = 256
TextureCacheWaysL1 = 4
TextureCacheLinesL1 = 16
TextureCacheInputQueueL1 = 4
TextureCacheMissesPerCycle = 1
TextureCacheDecompressLatency = 4

[ZSTENCILTEST]

StampsPerCycle = 1
BytesPerPixel = 4
DisableCompression = FALSE
ZCacheWays = 4
ZCacheLines = 16
ZCacheStampsPerLine = 16
ZCachePortWidth = 32
ZCacheExtraReadPort = TRUE
ZCacheExtraWritePort = TRUE
ZCacheRequestQueueSize = 8
ZCacheInputQueueSize = 8
ZCacheOutputQueueSize = 8
BlockStateMemorySize = 262144
BlocksClearedPerCycle = 1024
CompressionUnitLatency = 8
DecompressionUnitLatency = 8
#ZQueueSize = 64
InputQueueSize = 8
FetchQueueSize = 64
ReadQueueSize = 16
OpQueueSize = 4
WriteQueueSize = 8
ZALUTestRate = 1
ZALULatency = 2

[COLORWRITE]

StampsPerCycle = 1
BytesPerPixel = 4
DisableCompression = FALSE
ColorCacheWays = 4
ColorCacheLines = 16
ColorCacheStampsPerLine = 16
ColorCachePortWidth = 32
ColorCacheExtraReadPort = TRUE
ColorCacheExtraWritePort = TRUE
ColorCacheRequestQueueSize = 8
ColorCacheInputQueueSize = 8
ColorCacheOutputQueueSize = 8
BlockStateMemorySize = 262144
BlocksClearedPerCycle = 1024
CompressionUnitLatency = 8
DecompressionUnitLatency = 8
#ColorQueueSize = 64
InputQueueSize = 8
FetchQueueSize = 64
ReadQueueSize = 16
OpQueueSize = 4
WriteQueueSize = 8
BlendALURate = 1
BlendALULatency = 2

[DAC]

BytesPerPixel = 4
BlockSize = 256
BlockUpdateLatency = 1
BlocksUpdatedPerCycle = 1024
BlockRequestQueueSize = 32
#
# While we use the DAC just to dump the frame after each swap
# we can dismiss the real decompression latency to speed up the
# dumping.
#
#DecompressionUnitLatency = 8
DecompressionUnitLatency = 1
RefreshRate = 5000000
SynchedRefresh = TRUE
RefreshFrame = TRUE
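
As a cross-check between the Memory Configuration table and the file above, the fragment below repeats the four matching MEMORYCONTROLLER values with the arithmetic spelled out. That the sizes are expressed in bytes, and that MemoryBuses and MappedMemorySize correspond to the "Memory Channels" and "System memory region size" rows, is inferred from the matching numbers rather than stated on this page.

```ini
[MEMORYCONTROLLER]
# Memory Size: 64 MB = 64 * 1024 * 1024 = 67108864 bytes
MemorySize = 67108864
# Memory Bus Width (Per Channel): 64 bits
MemoryBusWidth = 64
# Memory Channels: 4
MemoryBuses = 4
# System memory region size: 16 MB = 16 * 1024 * 1024 = 16777216 bytes
MappedMemorySize = 16777216
```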