# Triton Optimization
Optimize a PyTorch activation function using Triton
This example demonstrates using Weco to optimize a simple activation function implemented in PyTorch. In this example, we'll ask Weco to leverage Triton to accelerate our code.
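For reference, Swish (also called SiLU) is defined as swish(x) = x · σ(x), where σ is the logistic sigmoid. A minimal scalar sketch of the math (the actual `module.py` implements this with PyTorch tensors; the function names here are illustrative):

```python
import math

def sigmoid(x: float) -> float:
    # Logistic sigmoid: 1 / (1 + e^-x)
    return 1.0 / (1.0 + math.exp(-x))

def swish(x: float) -> float:
    # Swish / SiLU activation: x * sigmoid(x)
    return x * sigmoid(x)

print(swish(1.0))  # ~0.7311
```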
## Setup
If you haven't already, install the Weco CLI by following the Installation guide, or install it directly with pip:
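Assuming the CLI is published on PyPI under the package name `weco` (as in the Installation guide):

```shell
pip install weco
```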
Install the required dependencies:
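For this example, the evaluation code needs PyTorch and Triton. Recent PyTorch wheels bundle a compatible Triton on Linux, but both can be installed explicitly:

```shell
pip install torch triton
```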
This example requires an NVIDIA GPU.
## Run Weco
Now run Weco to optimize your code using Triton:
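Putting together the flags described in the Explanation section, the invocation looks like this (the `--additional-instructions` text is elided; supply your own guidance, e.g. asking Weco to use Triton kernels — and note that depending on your CLI version the entry point may be `weco run` or bare `weco`):

```shell
weco run --source module.py \
     --eval-command "python evaluate.py --path module.py" \
     --metric speedup \
     --goal maximize \
     --steps 15 \
     --model o4-mini \
     --additional-instructions "..." \
     --eval-timeout 120
```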
## Explanation
- `--source module.py`: Specifies the PyTorch Swish activation implementation (`module.py`) that Weco will optimize.
- `--eval-command "python evaluate.py --path module.py"`: Defines the command that runs the evaluation script, which benchmarks the generated solution in `module.py` against a baseline and outputs the `speedup`.
- `--metric speedup`: Sets the metric Weco should focus on improving during optimization.
- `--goal maximize`: Instructs Weco to aim for the highest possible speedup value.
- `--steps 15`: Determines the number of optimization iterations Weco will perform.
- `--model o4-mini`: Specifies the large language model that drives the optimization process.
- `--additional-instructions "..."`: Provides specific guidance to the LLM.
- `--eval-timeout 120`: Stops the evaluation run if it does not complete within 120 seconds.
Weco will iteratively modify module.py, incorporating Triton kernels, guided by the performance feedback (speedup) from the evaluation script and the instructions provided.
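The evaluation script's contract is simple: time the candidate against a baseline and print a `speedup` value Weco can parse from stdout. A minimal, framework-free sketch of that pattern (the real `evaluate.py` benchmarks the PyTorch/Triton code on the GPU; the function names here are illustrative placeholders):

```python
import time

def benchmark(fn, arg, repeats=100):
    # Return the best wall-clock time over several runs.
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(arg)
        best = min(best, time.perf_counter() - start)
    return best

def baseline(xs):
    # Reference implementation to compare against.
    return [x * x for x in xs]

def candidate(xs):
    # Stands in for the optimized module under test.
    return [x * x for x in xs]

data = list(range(10_000))
t_base = benchmark(baseline, data)
t_cand = benchmark(candidate, data)
speedup = t_base / t_cand
# Weco reads the metric from stdout, so print it under its metric name.
print(f"speedup: {speedup:.3f}")
```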
## What's Next?
- Lower-level GPU programming: Try CUDA Optimization for maximum performance control
- Different optimization types: Explore Model Development or Prompt Engineering
- Simpler GPU optimization: Start with PyTorch Optimization
- Better evaluation scripts: Learn Writing Good Evaluation Scripts
- All command options: Check the CLI Reference