'transforms' Submodule
- Introduction
- apply_bufferization
- apply_dataflow_optimization_transforms
- apply_greedy_kernel_fusion
- apply_hls_codegen
- apply_intensity_aware_linalg_tiling
- apply_linalg_optimization_transforms
- apply_linear_programming_resource_allocation
- apply_runtime_codegen
- apply_tensor_to_dataflow_conversion
- construct_kernel_fusion_transform_sequence
- construct_linalg_tiling_transform_sequence
Introduction

from streamtensor import transforms

Transformations for the StreamTensor compiler.

This module provides a set of transformations for the StreamTensor compiler. These transformations optimize the input MLIR module for spatial accelerator targets; the module is transformed in place.
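The snippet below is a minimal sketch of how these passes might be chained. The pass ordering and the origin of `module` (assumed here to be an MLIR module produced by the StreamTensor frontend) are illustrative assumptions rather than the canonical pipeline.

```python
from streamtensor import transforms

# `module` is an MLIR module produced earlier in the flow (assumption);
# every call below mutates it in place.
transforms.apply_linalg_optimization_transforms(module)
transforms.apply_tensor_to_dataflow_conversion(module)
transforms.apply_dataflow_optimization_transforms(module)
transforms.apply_bufferization(module)
transforms.apply_hls_codegen(module)
transforms.apply_runtime_codegen(module, host_func_name="forward")  # "forward" is a hypothetical host function name
```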
apply_bufferization(module, task_partition_merging=True, task_chain_merging=False, data_driven_task_merging=False, max_stream_depth=None)

Apply bufferization.

Parameters:

Name | Type | Description | Default
---|---|---|---
module | Module | The module to be transformed. | required
task_partition_merging | bool | Whether to merge task partitions. | True
task_chain_merging | bool | Whether to merge task chains. | False
data_driven_task_merging | bool | Whether to merge data-driven tasks. | False
max_stream_depth | Optional[int] | The maximum stream depth. | None
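A hedged usage sketch; all non-default values below are arbitrary illustrations of the parameters documented above.

```python
# `module` is assumed to exist; only the flag values are illustrative.
transforms.apply_bufferization(
    module,
    task_chain_merging=True,   # also merge task chains
    max_stream_depth=4,        # cap stream depth (arbitrary value)
)
```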
apply_dataflow_optimization_transforms(module, max_widen_bitwidth=512, constant_folding=False)

Apply dataflow optimization transforms.

Parameters:

Name | Type | Description | Default
---|---|---|---
module | Module | The module to be transformed. | required
max_widen_bitwidth | int | The maximum bitwidth to widen. | 512
constant_folding | bool | Whether to perform constant folding. | False
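For illustration, a sketch that narrows the widening limit and enables constant folding (both values are arbitrary):

```python
transforms.apply_dataflow_optimization_transforms(
    module, max_widen_bitwidth=256, constant_folding=True
)
```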
apply_greedy_kernel_fusion(module, entry_name, max_fusion_cost, dot_file=None, fused_dot_file=None, delete_sequence=True)

Apply greedy kernel fusion.

Parameters:

Name | Type | Description | Default
---|---|---|---
module | Module | The module to be transformed. | required
entry_name | str | The entry function name. | required
max_fusion_cost | int | The maximum fusion cost. | required
dot_file | Optional[str] | The DOT file to which the kernel graph is printed. | None
fused_dot_file | Optional[str] | The DOT file to which the post-fusion kernel graph is printed. | None
delete_sequence | bool | Whether to delete the transform sequence after the transform is applied. | True
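A hedged sketch; the entry name, cost budget, and DOT file paths are hypothetical:

```python
transforms.apply_greedy_kernel_fusion(
    module,
    entry_name="forward",                # hypothetical entry function name
    max_fusion_cost=8,                   # arbitrary fusion-cost budget
    dot_file="kernels.dot",              # kernel graph before fusion
    fused_dot_file="kernels_fused.dot",  # kernel graph after fusion
)
```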
apply_hls_codegen(module, num_hbm_port=32, create_kernel_wrapper=True, assign_slr=False)

Apply HLS C++ code generation transforms.

Parameters:

Name | Type | Description | Default
---|---|---|---
module | Module | The module to be transformed. | required
num_hbm_port | int | The number of HBM ports. | 32
create_kernel_wrapper | bool | Whether to create a kernel wrapper function. | True
assign_slr | bool | Whether to assign an SLR to each task. | False
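An illustrative call; the port count and SLR assignment choice are arbitrary:

```python
transforms.apply_hls_codegen(module, num_hbm_port=16, assign_slr=True)
```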
apply_intensity_aware_linalg_tiling(module, entry_name, default_tile_size, overall_unroll_size, max_vec_bitwidth, convert_linalg_to_dataflow, op_ii_map=None, dot_file=None, delete_sequence=True)

Apply intensity-aware linalg tiling.

Parameters:

Name | Type | Description | Default
---|---|---|---
module | Module | The module to be transformed. | required
entry_name | str | The entry function name. | required
default_tile_size | int | The default tile size. | required
overall_unroll_size | int | The overall unroll size. | required
max_vec_bitwidth | int | The maximum vectorization bitwidth. | required
convert_linalg_to_dataflow | bool | Whether to convert tiled linalg ops to dataflow kernels. | required
op_ii_map | Optional[Dict[str, int]] | The map from op name to its initiation interval on hardware. | None
dot_file | Optional[str] | The DOT file to which the linalg graph is printed. | None
delete_sequence | bool | Whether to delete the transform sequence after the transform is applied. | True
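A hedged sketch; the entry name, sizes, and the op name in `op_ii_map` are assumptions for illustration:

```python
transforms.apply_intensity_aware_linalg_tiling(
    module,
    entry_name="forward",            # hypothetical entry function name
    default_tile_size=32,
    overall_unroll_size=64,
    max_vec_bitwidth=512,
    convert_linalg_to_dataflow=True,
    op_ii_map={"linalg.matmul": 1},  # override the initiation interval of matmuls (illustrative)
    dot_file="linalg_graph.dot",
)
```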
apply_linalg_optimization_transforms(module, fake_quantize=False, quantize_bitwidth=8, constant_quantize_bitwidth=8)

Apply linalg optimization transforms.

Parameters:

Name | Type | Description | Default
---|---|---|---
module | Module | The module to be transformed. | required
fake_quantize | bool | Whether to apply fake quantization. | False
quantize_bitwidth | int | The bitwidth for fake quantization. | 8
constant_quantize_bitwidth | int | The bitwidth of constants for fake quantization. | 8
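For example, fake quantization can be enabled while keeping the default 8-bit widths (a sketch; the choice of values is illustrative):

```python
transforms.apply_linalg_optimization_transforms(
    module, fake_quantize=True, quantize_bitwidth=8, constant_quantize_bitwidth=8
)
```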
apply_linear_programming_resource_allocation(module, entry_name, pipeline_rewinding=True, stream_size_map=None, dot_file=None, partition_dot_file=None, partition=3, num_dsp=[2664, 2784, 2928], num_uram=[320, 320, 320], num_bram=[1200, 1152, 1200], num_lut=[386880, 364320, 395040], max_widen_bitwidth=512, imbalance_threshold=0.3)

Apply linear programming resource allocation.

Parameters:

Name | Type | Description | Default
---|---|---|---
module | Module | The module to be transformed. | required
entry_name | str | The entry function name. | required
pipeline_rewinding | bool | Whether to enable pipeline rewinding. | True
stream_size_map | Optional[Dict[Tuple[str, str], int]] | The overriding map from edge to stream size. | None
dot_file | Optional[str] | The DOT file to which the stream-sized graph is printed. | None
partition_dot_file | Optional[str] | The DOT file to which the partitioned graph is printed. | None
partition | int | The number of partitions on the target FPGA. | 3
num_dsp | List[int] | The number of DSPs on the target FPGA for each partition. | [2664, 2784, 2928]
num_uram | List[int] | The number of URAMs on the target FPGA for each partition. | [320, 320, 320]
num_bram | List[int] | The number of BRAMs on the target FPGA for each partition. | [1200, 1152, 1200]
num_lut | List[int] | The number of LUTs on the target FPGA for each partition. | [386880, 364320, 395040]
max_widen_bitwidth | int | The maximum bitwidth to widen. | 512
imbalance_threshold | float | The imbalance threshold (>= 0) of graph partitioning; the larger the value, the more imbalance is allowed. | 0.3
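A hedged sketch; the entry name, the edge names in `stream_size_map` (assumed to identify a producer/consumer task pair), and the DOT file paths are hypothetical:

```python
transforms.apply_linear_programming_resource_allocation(
    module,
    entry_name="forward",                        # hypothetical entry function name
    stream_size_map={("task_0", "task_1"): 64},  # force a 64-deep stream on one edge (hypothetical names)
    dot_file="streams.dot",
    partition_dot_file="partitions.dot",
)
```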
apply_runtime_codegen(module, host_func_name, timeout_in_ms=1000)

Apply runtime C++ code generation transforms.

Parameters:

Name | Type | Description | Default
---|---|---|---
module | Module | The module to be transformed. | required
host_func_name | str | The name of the host function. | required
timeout_in_ms | int | The timeout in milliseconds. | 1000
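An illustrative call with a longer timeout (the host function name and timeout value are assumptions):

```python
transforms.apply_runtime_codegen(module, host_func_name="forward", timeout_in_ms=5000)
```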
apply_tensor_to_dataflow_conversion(module)

Apply tensor to dataflow conversion.

Parameters:

Name | Type | Description | Default
---|---|---|---
module | Module | The module to be transformed. | required
construct_kernel_fusion_transform_sequence(target, design_space)

Construct a transform sequence for applying kernel fusion.

Parameters:

Name | Type | Description | Default
---|---|---|---
target | BlockArgument | The handle of the target op. | required
design_space | KernelFusionDesignSpace | The design space of kernel fusion. | required

Returns:

Type | Description
---|---
List[Value] | A list of transformed values, which is empty in this case.

Raises:

Type | Description
---|---
ValueError | If the kernel is not a KernelOp.
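A minimal sketch, assuming `target` is the block-argument handle of an enclosing transform sequence and `space` is a KernelFusionDesignSpace constructed elsewhere:

```python
# Both `target` and `space` are assumed to be supplied by the surrounding
# transform-sequence construction code.
results = transforms.construct_kernel_fusion_transform_sequence(target, space)
assert results == []  # the returned list of transformed values is empty
```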
construct_linalg_tiling_transform_sequence(target, design_space, max_vec_bitwidth, convert_linalg_to_dataflow=True)

Construct a transform sequence for applying linalg tiling.

The following parameters are required in the input design space (see the sketch after the tables below):

- parallel_tile_sizes: The parallel tile sizes.
- reduction_tile_sizes: The reduction tile sizes.
- unroll_sizes: The unroll sizes.
- inputs_vec_sizes: The input vector sizes.
- outputs_vec_sizes: The output vector sizes.
- permutation: The permutation of the loop order.

Parameters:

Name | Type | Description | Default
---|---|---|---
target | BlockArgument | The handle of the target op. | required
design_space | LinalgTilingDesignSpace | The design space of linalg tiling. | required
max_vec_bitwidth | int | The maximum vectorization bitwidth. | required
convert_linalg_to_dataflow | bool | Whether to convert tiled linalg ops to dataflow kernels. | True

Returns:

Type | Description
---|---
List[Value] | A list of transformed values, which is empty in this case.

Raises:

Type | Description
---|---
ValueError | If the required attributes are not found.
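A hedged sketch of populating the required design-space parameters listed above; the LinalgTilingDesignSpace construction and every size below are assumptions for illustration only:

```python
# Hypothetical design-space construction; the actual LinalgTilingDesignSpace
# API may differ, and all sizes below are arbitrary.
space = LinalgTilingDesignSpace(
    parallel_tile_sizes=[16, 16],
    reduction_tile_sizes=[8],
    unroll_sizes=[2, 2],
    inputs_vec_sizes=[4, 4],
    outputs_vec_sizes=[4],
    permutation=[0, 1, 2],
)
# `target` is assumed to be the block-argument handle of an enclosing transform sequence.
handles = transforms.construct_linalg_tiling_transform_sequence(
    target, space, max_vec_bitwidth=512, convert_linalg_to_dataflow=True
)
assert handles == []  # the returned list of transformed values is empty
```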