Triton Optimization
Optimize causal multi-head self-attention using Triton
This example demonstrates using Weco to optimize a causal multi-head self-attention mechanism, a core component of Transformer models, implemented in PyTorch. The goal is to accelerate the operation by having Weco rewrite it with Triton kernels, which run as efficient GPU code.
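For readers who want the operation spelled out, here is a minimal, dependency-free sketch of the math for a single attention head. It is not the PyTorch module in optimize.py; the function name and the list-of-lists layout are assumptions chosen for illustration. Multi-head attention applies the same computation independently to each head's slice of the embedding.

```python
import math


def causal_attention(q, k, v):
    """Single-head causal attention; q, k, v hold one vector per position."""
    seq_len, head_dim = len(q), len(q[0])
    scale = 1.0 / math.sqrt(head_dim)
    out = []
    for i in range(seq_len):
        # Causal mask: position i may only attend to positions 0..i.
        scores = [
            scale * sum(a * b for a, b in zip(q[i], k[j]))
            for j in range(i + 1)
        ]
        # Numerically stable softmax over the visible positions.
        peak = max(scores)
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Weighted sum of the value vectors of the visible positions.
        out.append([
            sum(w * v[j][d] for j, w in enumerate(weights))
            for d in range(len(v[0]))
        ])
    return out


q = k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0]]
result = causal_attention(q, k, v)
print(result[0])  # the first position sees only itself, so it returns v[0]
```

The quadratic loop over positions is what a fused GPU kernel would compute in tiles, which is why this operation is a natural candidate for Triton.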
Setup
If you haven't already, follow the Installation guide to install the Weco CLI, or install it directly with pip (pip install weco).
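Choose your LLM provider and create an API key for it. This example uses OpenAI (the o4-mini model), so create an OpenAI API key and make it available to the CLI, typically through the OPENAI_API_KEY environment variable.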
Install the dependencies of the scripts shown in the following sections, PyTorch and Triton (for example, pip install torch triton).
(Note: Triton installation might require specific CUDA versions. Refer to the official Triton documentation if you encounter issues.)
Run Weco
Now run Weco to optimize your code using Triton:
Explanation
- --source optimize.py: Specifies the PyTorch self-attention implementation (optimize.py) that Weco will optimize.
- --eval-command "python evaluate.py --solution-path optimize.py": Defines the command that runs the evaluation script. This script benchmarks the generated solution in optimize.py against a baseline and outputs the speedup.
- --metric speedup: Sets the metric Weco should focus on improving during optimization.
- --goal maximize: Instructs Weco to aim for the highest possible speedup value.
- --steps 30: Determines the number of optimization iterations Weco will perform.
- --model o4-mini: Specifies the large language model that drives the optimization process.
- --additional-instructions "...": Provides specific guidance to the LLM. In this case, it directs the model to use Triton for optimization, keep the numerical difference ("max float diff") between the original and optimized code small, and keep the overall code structure consistent.
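The evaluation step described for --eval-command can be pictured with a small, dependency-free sketch. It times a baseline and a candidate function, checks that their outputs agree within a tolerance, and prints a speedup line. The function names, the tolerance, and the `speedup: <value>` output format are assumptions for illustration; the real evaluate.py benchmarks the PyTorch and Triton implementations on a GPU.

```python
import time

TOLERANCE = 1e-5  # hypothetical bound on the "max float diff"


def benchmark(fn, args, warmup=3, repeats=20):
    """Return the average wall-clock seconds per call of fn(*args)."""
    for _ in range(warmup):
        fn(*args)
    start = time.perf_counter()
    for _ in range(repeats):
        fn(*args)
    return (time.perf_counter() - start) / repeats


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two sequences."""
    return max(abs(x - y) for x, y in zip(a, b))


def baseline(values):
    # Stand-in for the original implementation.
    out = []
    for value in values:
        out.append(value * 0.5 + 1.0)
    return out


def candidate(values):
    # Stand-in for the optimized implementation.
    return [value * 0.5 + 1.0 for value in values]


data = [float(i) for i in range(10_000)]

# Correctness first: a fast but wrong solution should not count as a win.
max_diff = max_abs_diff(baseline(data), candidate(data))
if max_diff > TOLERANCE:
    raise ValueError(f"outputs diverge: max float diff {max_diff}")

speedup = benchmark(baseline, (data,)) / benchmark(candidate, (data,))
print(f"max float diff: {max_diff}")
print(f"speedup: {speedup:.2f}")
```

When timing GPU code, kernel launches are asynchronous, so a real script would call torch.cuda.synchronize() before reading the clock on each side of the timed region.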
Weco will iteratively modify optimize.py, incorporating Triton kernels, guided by the performance feedback (speedup) from the evaluation script and the instructions provided.
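Putting the flags above together, the invocation might be assembled as in the sketch below. The flag names and values come from the explanation; the `run` subcommand is an assumption, and the instruction text stays as "..." because it is elided in the explanation. The script only prints the shell command and does not execute Weco.

```python
import shlex

# Flag values are taken from the explanation above.
# Assumptions: the "run" subcommand, and "..." standing in for the
# elided --additional-instructions text.
weco_args = [
    "weco", "run",
    "--source", "optimize.py",
    "--eval-command", "python evaluate.py --solution-path optimize.py",
    "--metric", "speedup",
    "--goal", "maximize",
    "--steps", "30",
    "--model", "o4-mini",
    "--additional-instructions", "...",
]

# shlex.join quotes arguments containing spaces, so the printed line
# can be pasted into a shell as-is.
command_line = shlex.join(weco_args)
print(command_line)
```

Keeping the arguments in a list makes it easy to vary a single value, such as the number of steps, and to pass the same list to subprocess.run if the command should be launched from Python.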
What's Next?
- Lower-level GPU programming: Try CUDA Optimization for maximum performance control
- Different optimization types: Explore Model Development or Prompt Engineering
- Simpler GPU optimization: Start with PyTorch Optimization
- Better evaluation scripts: Learn Writing Good Evaluation Scripts
- All command options: Check the CLI Reference