
`transforms` Submodule

Introduction

```python
from streamtensor import transforms
```

Transformations for the StreamTensor compiler.

This module provides a set of transformations for the StreamTensor compiler. These transformations optimize the input MLIR module for spatial-accelerator targets; the module is transformed in place.
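As a rough illustration, the transforms documented below are applied in sequence to a parsed MLIR module. The sketch assumes a pre-parsed `module` and an entry/host function name; the ordering and all argument values are assumptions, not a prescribed pipeline.

```python
# Hypothetical compilation flow built from the functions on this page.
# The pass ordering and every argument value are illustrative assumptions.
def compile_for_fpga(module, entry_name: str) -> None:
    from streamtensor import transforms  # imported lazily; requires StreamTensor

    # Linalg-level cleanup and optimization.
    transforms.apply_linalg_optimization_transforms(module)
    # Lower the tensor representation into dataflow form.
    transforms.apply_tensor_to_dataflow_conversion(module)
    # Materialize buffers and streams (in place, like all transforms here).
    transforms.apply_bufferization(module)
    # Dataflow-level optimization.
    transforms.apply_dataflow_optimization_transforms(module)
    # Emit HLS C++ and the host-side runtime.
    transforms.apply_hls_codegen(module)
    transforms.apply_runtime_codegen(module, host_func_name=entry_name)
```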

apply_bufferization(module, task_partition_merging=True, task_chain_merging=False, data_driven_task_merging=False, max_stream_depth=None)

Apply bufferization.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `module` | `Module` | The module to be transformed. | *required* |
| `task_partition_merging` | `bool` | Whether to merge task partitions. | `True` |
| `task_chain_merging` | `bool` | Whether to merge task chains. | `False` |
| `data_driven_task_merging` | `bool` | Whether to merge data-driven tasks. | `False` |
| `max_stream_depth` | `Optional[int]` | The maximum stream depth. | `None` |

apply_dataflow_optimization_transforms(module, max_widen_bitwidth=512, constant_folding=False)

Apply dataflow optimization transforms.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `module` | `Module` | The module to be transformed. | *required* |
| `max_widen_bitwidth` | `int` | The maximum bitwidth to widen. | `512` |
| `constant_folding` | `bool` | Whether to perform constant folding. | `False` |

apply_greedy_kernel_fusion(module, entry_name, max_fusion_cost, dot_file=None, fused_dot_file=None, delete_sequence=True)

Apply greedy kernel fusion.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `module` | `Module` | The module to be transformed. | *required* |
| `entry_name` | `str` | The entry function name. | *required* |
| `max_fusion_cost` | `int` | The maximum fusion cost. | *required* |
| `dot_file` | `Optional[str]` | The DOT file to which the kernel graph is printed. | `None` |
| `fused_dot_file` | `Optional[str]` | The DOT file to which the post-fusion kernel graph is printed. | `None` |
| `delete_sequence` | `bool` | Whether to delete the transform sequence after it is applied. | `True` |
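A minimal call sketch, assuming a parsed `module` whose entry function is named `"forward"`; the entry name, cost bound, and DOT paths below are placeholders, not values taken from StreamTensor.

```python
# Illustrative invocation; "forward" and the cost/path values are assumptions.
def fuse_kernels(module) -> None:
    from streamtensor import transforms  # imported lazily; requires StreamTensor

    transforms.apply_greedy_kernel_fusion(
        module,
        entry_name="forward",
        max_fusion_cost=16,
        dot_file="kernels.dot",              # pre-fusion kernel graph
        fused_dot_file="kernels_fused.dot",  # post-fusion kernel graph
    )
```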

apply_hls_codegen(module, num_hbm_port=32, create_kernel_wrapper=True, assign_slr=False)

Apply HLS C++ code generation transforms.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `module` | `Module` | The module to be transformed. | *required* |
| `num_hbm_port` | `int` | The number of HBM ports. | `32` |
| `create_kernel_wrapper` | `bool` | Whether to create a kernel wrapper function. | `True` |
| `assign_slr` | `bool` | Whether to assign an SLR to each task. | `False` |

apply_intensity_aware_linalg_tiling(module, entry_name, default_tile_size, overall_unroll_size, max_vec_bitwidth, convert_linalg_to_dataflow, op_ii_map=None, dot_file=None, delete_sequence=True)

Apply intensity-aware linalg tiling.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `module` | `Module` | The module to be transformed. | *required* |
| `entry_name` | `str` | The entry function name. | *required* |
| `default_tile_size` | `int` | The default tile size. | *required* |
| `overall_unroll_size` | `int` | The overall unroll size. | *required* |
| `max_vec_bitwidth` | `int` | The maximum vectorization bitwidth. | *required* |
| `convert_linalg_to_dataflow` | `bool` | Whether to convert the tiled linalg op into a dataflow kernel. | *required* |
| `op_ii_map` | `Optional[Dict[str, int]]` | The map from op name to its initiation interval on hardware. | `None` |
| `dot_file` | `Optional[str]` | The DOT file to which the linalg graph is printed. | `None` |
| `delete_sequence` | `bool` | Whether to delete the transform sequence after it is applied. | `True` |
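`op_ii_map` is a plain `Dict[str, int]` from op name to initiation interval. The sketch below builds one with illustrative MLIR op names and II values; both the names and the numbers are assumptions, not measured hardware figures.

```python
from typing import Dict, Optional

# Hypothetical per-op initiation intervals; op names and values are illustrative.
op_ii_map: Optional[Dict[str, int]] = {
    "linalg.matmul": 1,   # assumed fully pipelined (II = 1)
    "linalg.generic": 2,  # assumed limited, e.g. by a memory port
}

# An initiation interval is at least 1 cycle.
assert all(ii >= 1 for ii in op_ii_map.values())
```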

apply_linalg_optimization_transforms(module, fake_quantize=False, quantize_bitwidth=8, constant_quantize_bitwidth=8)

Apply linalg optimization transforms.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `module` | `Module` | The module to be transformed. | *required* |
| `fake_quantize` | `bool` | Whether to apply fake quantization. | `False` |
| `quantize_bitwidth` | `int` | The bitwidth for fake quantization. | `8` |
| `constant_quantize_bitwidth` | `int` | The bitwidth of constants for fake quantization. | `8` |

apply_linear_programming_resource_allocation(module, entry_name, pipeline_rewinding=True, stream_size_map=None, dot_file=None, partition_dot_file=None, partition=3, num_dsp=[2664, 2784, 2928], num_uram=[320, 320, 320], num_bram=[1200, 1152, 1200], num_lut=[386880, 364320, 395040], max_widen_bitwidth=512, imbalance_threshold=0.3)

Apply linear programming resource allocation.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `module` | `Module` | The module to be transformed. | *required* |
| `entry_name` | `str` | The entry function name. | *required* |
| `pipeline_rewinding` | `bool` | Whether to enable pipeline rewinding. | `True` |
| `stream_size_map` | `Optional[Dict[Tuple[str, str], int]]` | An overriding map from edge to stream size. | `None` |
| `dot_file` | `Optional[str]` | The DOT file to which the stream-sized graph is printed. | `None` |
| `partition_dot_file` | `Optional[str]` | The DOT file to which the partitioned graph is printed. | `None` |
| `partition` | `int` | The number of partitions on the target FPGA. | `3` |
| `num_dsp` | `List[int]` | The number of DSPs on the target FPGA for each partition. | `[2664, 2784, 2928]` |
| `num_uram` | `List[int]` | The number of URAMs on the target FPGA for each partition. | `[320, 320, 320]` |
| `num_bram` | `List[int]` | The number of BRAMs on the target FPGA for each partition. | `[1200, 1152, 1200]` |
| `num_lut` | `List[int]` | The number of LUTs on the target FPGA for each partition. | `[386880, 364320, 395040]` |
| `max_widen_bitwidth` | `int` | The maximum bitwidth to widen. | `512` |
| `imbalance_threshold` | `float` | The imbalance threshold (>= 0) for graph partitioning; larger values allow more imbalance. | `0.3` |
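Per the signature above, `stream_size_map` is keyed by a `(producer, consumer)` name pair, and each `num_*` list carries one budget entry per partition. A pure-Python sketch of well-formed arguments (the task names and the stream size are illustrative; the resource numbers are the documented defaults):

```python
from typing import Dict, Tuple

partition = 3  # number of partitions on the target FPGA (default shown above)

# One resource budget per partition; list lengths must match `partition`.
num_dsp = [2664, 2784, 2928]
num_uram = [320, 320, 320]

# Hypothetical override: give the edge between two task names a deeper stream.
stream_size_map: Dict[Tuple[str, str], int] = {
    ("task_a", "task_b"): 64,
}

assert all(len(v) == partition for v in (num_dsp, num_uram))
```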

apply_runtime_codegen(module, host_func_name, timeout_in_ms=1000)

Apply runtime C++ code generation transforms.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `module` | `Module` | The module to be transformed. | *required* |
| `host_func_name` | `str` | The name of the host function. | *required* |
| `timeout_in_ms` | `int` | The timeout in milliseconds. | `1000` |

apply_tensor_to_dataflow_conversion(module)

Apply tensor to dataflow conversion.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `module` | `Module` | The module to be transformed. | *required* |

construct_kernel_fusion_transform_sequence(target, design_space)

Construct a transform sequence for applying kernel fusion.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `target` | `BlockArgument` | The handle of the target op. | *required* |
| `design_space` | `KernelFusionDesignSpace` | The design space of kernel fusion. | *required* |

Returns:

| Type | Description |
| --- | --- |
| `List[Value]` | A list of transformed values, which is empty in this case. |

Raises:

| Type | Description |
| --- | --- |
| `ValueError` | If the kernel is not a `KernelOp`. |

construct_linalg_tiling_transform_sequence(target, design_space, max_vec_bitwidth, convert_linalg_to_dataflow=True)

Construct a transform sequence for applying linalg tiling.

The following parameters are required in the input design space:

- `parallel_tile_sizes`: The parallel tile sizes.
- `reduction_tile_sizes`: The reduction tile sizes.
- `unroll_sizes`: The unroll sizes.
- `inputs_vec_sizes`: The input vector sizes.
- `outputs_vec_sizes`: The output vector sizes.
- `permutation`: The permutation of the loop order.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `target` | `BlockArgument` | The handle of the target op. | *required* |
| `design_space` | `LinalgTilingDesignSpace` | The design space of linalg tiling. | *required* |
| `max_vec_bitwidth` | `int` | The maximum vectorization bitwidth. | *required* |
| `convert_linalg_to_dataflow` | `bool` | Whether to convert the tiled linalg op into a dataflow kernel. | `True` |

Returns:

| Type | Description |
| --- | --- |
| `List[Value]` | A list of transformed values, which is empty in this case. |

Raises:

| Type | Description |
| --- | --- |
| `ValueError` | If the required attributes are not found. |
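The required design-space attributes map naturally onto a small record. The sketch below is a stand-in for illustration only: the attribute names are taken from the list above, but the concrete sizes and the actual shape of `LinalgTilingDesignSpace` in StreamTensor are assumptions.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical stand-in for LinalgTilingDesignSpace; only the attribute
# names below come from this page, everything else is illustrative.
@dataclass
class TilingDesignSpaceSketch:
    parallel_tile_sizes: List[int]
    reduction_tile_sizes: List[int]
    unroll_sizes: List[int]
    inputs_vec_sizes: List[List[int]]
    outputs_vec_sizes: List[List[int]]
    permutation: List[int]

space = TilingDesignSpaceSketch(
    parallel_tile_sizes=[32, 32],
    reduction_tile_sizes=[16],
    unroll_sizes=[4, 4],
    inputs_vec_sizes=[[8], [8]],
    outputs_vec_sizes=[[8]],
    permutation=[0, 1, 2],  # loop-order permutation
)
```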