'transforms' Submodule
- Introduction
- apply_bufferization
- apply_dataflow_optimization_transforms
- apply_greedy_kernel_fusion
- apply_hls_codegen
- apply_intensity_aware_linalg_tiling
- apply_linalg_optimization_transforms
- apply_linear_programming_resource_allocation
- apply_runtime_codegen
- apply_tensor_to_dataflow_conversion
- construct_kernel_fusion_transform_sequence
- construct_linalg_tiling_transform_sequence
Introduction

from streamtensor import transforms

Transformations for the StreamTensor compiler.

This module provides a set of transformations for the StreamTensor compiler. These transformations optimize the input MLIR module for spatial accelerator targets; the module is transformed in place.
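The snippet below is a minimal sketch of how these passes might be chained. The pass ordering and the origin of `module` (assumed here to be an MLIR module produced by the StreamTensor frontend) are illustrative assumptions rather than the canonical pipeline.

```python
from streamtensor import transforms

# `module` is an MLIR module produced earlier in the flow (assumption);
# every call below mutates it in place.
transforms.apply_linalg_optimization_transforms(module)
transforms.apply_tensor_to_dataflow_conversion(module)
transforms.apply_dataflow_optimization_transforms(module)
transforms.apply_bufferization(module)
transforms.apply_hls_codegen(module)
transforms.apply_runtime_codegen(module, host_func_name="forward")  # "forward" is a hypothetical host function name
```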
apply_bufferization(module, task_partition_merging=True, task_chain_merging=False, data_driven_task_merging=False, max_stream_depth=None)

Apply bufferization.

Parameters:

Name | Type | Description | Default
---|---|---|---
module | Module | The module to be transformed. | required
task_partition_merging | bool | Whether to merge task partitions. | True
task_chain_merging | bool | Whether to merge task chains. | False
data_driven_task_merging | bool | Whether to merge data-driven tasks. | False
max_stream_depth | Optional[int] | The maximum stream depth. | None
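A hedged usage sketch; all non-default values below are arbitrary illustrations of the parameters documented above.

```python
# `module` is assumed to exist; only the flag values are illustrative.
transforms.apply_bufferization(
    module,
    task_chain_merging=True,   # also merge task chains
    max_stream_depth=4,        # cap stream depth (arbitrary value)
)
```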
apply_dataflow_optimization_transforms(module, max_widen_bitwidth=512, constant_folding=False)

Apply dataflow optimization transforms.

Parameters:

Name | Type | Description | Default
---|---|---|---
module | Module | The module to be transformed. | required
max_widen_bitwidth | int | The maximum bitwidth to widen. | 512
constant_folding | bool | Whether to perform constant folding. | False
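For illustration, a sketch that narrows the widening limit and enables constant folding (both values are arbitrary):

```python
transforms.apply_dataflow_optimization_transforms(
    module, max_widen_bitwidth=256, constant_folding=True
)
```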
apply_greedy_kernel_fusion(module, entry_name, max_fusion_cost, dot_file=None, fused_dot_file=None, delete_sequence=True)

Apply greedy kernel fusion.

Parameters:

Name | Type | Description | Default
---|---|---|---
module | Module | The module to be transformed. | required
entry_name | str | The entry function name. | required
max_fusion_cost | int | The maximum fusion cost. | required
dot_file | Optional[str] | The DOT file to which the kernel graph is printed. | None
fused_dot_file | Optional[str] | The DOT file to which the post-fusion kernel graph is printed. | None
delete_sequence | bool | Whether to delete the transform sequence after the transform is applied. | True
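A hedged sketch; the entry name, cost budget, and DOT file paths are hypothetical:

```python
transforms.apply_greedy_kernel_fusion(
    module,
    entry_name="forward",                # hypothetical entry function name
    max_fusion_cost=8,                   # arbitrary fusion-cost budget
    dot_file="kernels.dot",              # kernel graph before fusion
    fused_dot_file="kernels_fused.dot",  # kernel graph after fusion
)
```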
apply_hls_codegen(module, num_hbm_port=32, create_kernel_wrapper=True, assign_slr=False)

Apply HLS C++ code generation transforms.

Parameters:

Name | Type | Description | Default
---|---|---|---
module | Module | The module to be transformed. | required
num_hbm_port | int | The number of HBM ports. | 32
create_kernel_wrapper | bool | Whether to create a kernel wrapper function. | True
assign_slr | bool | Whether to assign an SLR to each task. | False
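An illustrative call; the port count and SLR assignment choice are arbitrary:

```python
transforms.apply_hls_codegen(module, num_hbm_port=16, assign_slr=True)
```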
apply_intensity_aware_linalg_tiling(module, entry_name, default_tile_size, overall_unroll_size, max_vec_bitwidth, convert_linalg_to_dataflow, op_ii_map=None, dot_file=None, delete_sequence=True)

Apply intensity-aware linalg tiling.

Parameters:

Name | Type | Description | Default
---|---|---|---
module | Module | The module to be transformed. | required
entry_name | str | The entry function name. | required
default_tile_size | int | The default tile size. | required
overall_unroll_size | int | The overall unroll size. | required
max_vec_bitwidth | int | The maximum vectorization bitwidth. | required
convert_linalg_to_dataflow | bool | Whether to convert tiled linalg ops to dataflow kernels. | required
op_ii_map | Optional[Dict[str, int]] | The map from op name to its initiation interval on hardware. | None
dot_file | Optional[str] | The DOT file to which the linalg graph is printed. | None
delete_sequence | bool | Whether to delete the transform sequence after the transform is applied. | True
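A hedged sketch; the entry name, sizes, and the op name in `op_ii_map` are assumptions for illustration:

```python
transforms.apply_intensity_aware_linalg_tiling(
    module,
    entry_name="forward",            # hypothetical entry function name
    default_tile_size=32,
    overall_unroll_size=64,
    max_vec_bitwidth=512,
    convert_linalg_to_dataflow=True,
    op_ii_map={"linalg.matmul": 1},  # override the initiation interval of matmuls (illustrative)
    dot_file="linalg_graph.dot",
)
```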
apply_linalg_optimization_transforms(module, fake_quantize=False, quantize_bitwidth=8, constant_quantize_bitwidth=8)

Apply linalg optimization transforms.

Parameters:

Name | Type | Description | Default
---|---|---|---
module | Module | The module to be transformed. | required
fake_quantize | bool | Whether to apply fake quantization. | False
quantize_bitwidth | int | The bitwidth for fake quantization. | 8
constant_quantize_bitwidth | int | The bitwidth of constants for fake quantization. | 8
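For example, fake quantization can be enabled while keeping the default 8-bit widths (a sketch; the choice of values is illustrative):

```python
transforms.apply_linalg_optimization_transforms(
    module, fake_quantize=True, quantize_bitwidth=8, constant_quantize_bitwidth=8
)
```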
apply_linear_programming_resource_allocation(module, entry_name, pipeline_rewinding=True, stream_size_map=None, dot_file=None, partition_dot_file=None, partition=3, num_dsp=[2664, 2784, 2928], num_uram=[320, 320, 320], num_bram=[1200, 1152, 1200], num_lut=[386880, 364320, 395040], max_widen_bitwidth=512, imbalance_threshold=0.3)

Apply linear programming resource allocation.

Parameters:

Name | Type | Description | Default
---|---|---|---
module | Module | The module to be transformed. | required
entry_name | str | The entry function name. | required
pipeline_rewinding | bool | Whether to enable pipeline rewinding. | True
stream_size_map | Optional[Dict[Tuple[str, str], int]] | The overriding map from edge to stream size. | None
dot_file | Optional[str] | The DOT file to which the stream-sized graph is printed. | None
partition_dot_file | Optional[str] | The DOT file to which the partitioned graph is printed. | None
partition | int | The number of partitions on the target FPGA. | 3
num_dsp | List[int] | The number of DSPs on the target FPGA for each partition. | [2664, 2784, 2928]
num_uram | List[int] | The number of URAMs on the target FPGA for each partition. | [320, 320, 320]
num_bram | List[int] | The number of BRAMs on the target FPGA for each partition. | [1200, 1152, 1200]
num_lut | List[int] | The number of LUTs on the target FPGA for each partition. | [386880, 364320, 395040]
max_widen_bitwidth | int | The maximum bitwidth to widen. | 512
imbalance_threshold | float | The imbalance threshold (>= 0) of graph partitioning; the larger the value, the more imbalance is allowed. | 0.3
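A hedged sketch; the entry name, the edge names in `stream_size_map` (assumed to identify a producer/consumer task pair), and the DOT file paths are hypothetical:

```python
transforms.apply_linear_programming_resource_allocation(
    module,
    entry_name="forward",                        # hypothetical entry function name
    stream_size_map={("task_0", "task_1"): 64},  # force a 64-deep stream on one edge (hypothetical names)
    dot_file="streams.dot",
    partition_dot_file="partitions.dot",
)
```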
apply_runtime_codegen(module, host_func_name, timeout_in_ms=1000)

Apply runtime C++ code generation transforms.

Parameters:

Name | Type | Description | Default
---|---|---|---
module | Module | The module to be transformed. | required
host_func_name | str | The name of the host function. | required
timeout_in_ms | int | The timeout in milliseconds. | 1000
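An illustrative call with a longer timeout (the host function name and timeout value are assumptions):

```python
transforms.apply_runtime_codegen(module, host_func_name="forward", timeout_in_ms=5000)
```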
apply_tensor_to_dataflow_conversion(module)

Apply tensor to dataflow conversion.

Parameters:

Name | Type | Description | Default
---|---|---|---
module | Module | The module to be transformed. | required
construct_kernel_fusion_transform_sequence(target, design_space)

Construct a transform sequence for applying kernel fusion.

Parameters:

Name | Type | Description | Default
---|---|---|---
target | BlockArgument | The handle of the target op. | required
design_space | KernelFusionDesignSpace | The design space of kernel fusion. | required

Returns:

Type | Description
---|---
List[Value] | A list of transformed values, which is empty in this case.

Raises:

Type | Description
---|---
ValueError | If the kernel is not a KernelOp.
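A minimal sketch, assuming `target` is the block-argument handle of an enclosing transform sequence and `space` is a KernelFusionDesignSpace constructed elsewhere:

```python
# Both `target` and `space` are assumed to be supplied by the surrounding
# transform-sequence construction code.
results = transforms.construct_kernel_fusion_transform_sequence(target, space)
assert results == []  # the returned list of transformed values is empty
```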
construct_linalg_tiling_transform_sequence(target, design_space, max_vec_bitwidth, convert_linalg_to_dataflow=True)

Construct a transform sequence for applying linalg tiling.

The following parameters are required in the input design space (see the sketch after the tables below):

- parallel_tile_sizes: The parallel tile sizes.
- reduction_tile_sizes: The reduction tile sizes.
- unroll_sizes: The unroll sizes.
- inputs_vec_sizes: The input vector sizes.
- outputs_vec_sizes: The output vector sizes.
- permutation: The permutation of the loop order.

Parameters:

Name | Type | Description | Default
---|---|---|---
target | BlockArgument | The handle of the target op. | required
design_space | LinalgTilingDesignSpace | The design space of linalg tiling. | required
max_vec_bitwidth | int | The maximum vectorization bitwidth. | required
convert_linalg_to_dataflow | bool | Whether to convert tiled linalg ops to dataflow kernels. | True

Returns:

Type | Description
---|---
List[Value] | A list of transformed values, which is empty in this case.

Raises:

Type | Description
---|---
ValueError | If the required attributes are not found.
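A hedged sketch of populating the required design-space parameters listed above; the LinalgTilingDesignSpace construction and every size below are assumptions for illustration only:

```python
# Hypothetical design-space construction; the actual LinalgTilingDesignSpace
# API may differ, and all sizes below are arbitrary.
space = LinalgTilingDesignSpace(
    parallel_tile_sizes=[16, 16],
    reduction_tile_sizes=[8],
    unroll_sizes=[2, 2],
    inputs_vec_sizes=[4, 4],
    outputs_vec_sizes=[4],
    permutation=[0, 1, 2],
)
# `target` is assumed to be the block-argument handle of an enclosing transform sequence.
handles = transforms.construct_linalg_tiling_transform_sequence(
    target, space, max_vec_bitwidth=512, convert_linalg_to_dataflow=True
)
assert handles == []  # the returned list of transformed values is empty
```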