# Model Development (Kaggle Spaceship Titanic)

Iteratively improve a model's accuracy on Kaggle's Spaceship Titanic competition.
This example demonstrates using Weco to optimize a Python script designed for the Spaceship Titanic Kaggle competition. The goal is to improve the model's `accuracy` metric by directly optimizing the `evaluate.py` script. You can find the complete files for this example here.
## Setup

- Ensure you are in the `examples/spaceship-titanic` directory.
- Install dependencies: install the required Python packages:
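For instance, assuming the example ships a `requirements.txt` (the exact dependency list lives in the example's files, so check there first):

```shell
# Install the example's Python dependencies into the current environment
pip install -r requirements.txt
```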
## The Evaluation Script (`evaluate.py`)

The evaluation script is also the optimization target: it contains the model training, prediction logic, and evaluation, and Weco will directly modify this file to optimize the `accuracy` metric. The script reads data from the `./data/` directory, trains a model, makes predictions, saves them to `submission.csv`, and prints the accuracy score.
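The real script in the example repository implements actual feature engineering and model training; the skeleton below is a hypothetical, stdlib-only stand-in (a majority-class baseline) that illustrates the contract Weco relies on: read data from `--data-dir`, write `submission.csv`, and print `accuracy: <value>` to stdout.

```python
# Hypothetical sketch of evaluate.py's structure (the real script differs).
import argparse
import csv
from collections import Counter
from pathlib import Path


def majority_class(labels):
    """Baseline 'model': predict the single most frequent training label."""
    return Counter(labels).most_common(1)[0][0]


def evaluate(data_dir: str) -> float:
    data_dir = Path(data_dir)
    with open(data_dir / "train.csv", newline="") as f:
        train = list(csv.DictReader(f))

    # Hold out the last 20% of rows as a simple validation split.
    split = int(len(train) * 0.8)
    fit, val = train[:split], train[split:]

    prediction = majority_class([row["Transported"] for row in fit])
    correct = sum(row["Transported"] == prediction for row in val)
    accuracy = correct / len(val)

    # Write test-set predictions in Kaggle's submission format.
    with open(data_dir / "test.csv", newline="") as f:
        test = list(csv.DictReader(f))
    with open("submission.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["PassengerId", "Transported"])
        for row in test:
            writer.writerow([row["PassengerId"], prediction])

    return accuracy


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--data-dir", default="./data")
    args, _ = parser.parse_known_args()
    # Guarded so the sketch can be imported without the data present.
    if Path(args.data_dir, "train.csv").exists():
        # Weco parses this line from stdout to read the metric value.
        print(f"accuracy: {evaluate(args.data_dir):.4f}")
```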
## Run Weco

Run the following command to start optimizing the model:
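The invocation below is assembled from the flags described under Explanation; the `weco run` subcommand form is an assumption here and may differ across CLI versions:

```shell
weco run --source evaluate.py \
  --eval-command "python evaluate.py --data-dir ./data" \
  --metric accuracy \
  --goal maximize \
  --steps 10 \
  --model gemini-2.5-pro-exp-03-25 \
  --additional-instructions "Improve feature engineering, model choice and hyper-parameters." \
  --log-dir .runs/spaceship-titanic
```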
## Explanation

- `--source evaluate.py`: The script that provides the baseline and is the target of Weco's optimization.
- `--eval-command "python evaluate.py --data-dir ./data"`: The command Weco runs at each step. It executes the (potentially modified) `evaluate.py` script, which should output the metric (e.g., `accuracy: 0.80`).
- `--data-dir ./data`: Specifies the location of the training and testing data for the `evaluate.py` script.
- `--metric accuracy`: The target metric Weco should optimize, read from the output of the `--eval-command`.
- `--goal maximize`: Weco aims to increase the accuracy.
- `--steps 10`: The number of optimization iterations.
- `--model gemini-2.5-pro-exp-03-25`: The LLM driving the optimization.
- `--additional-instructions "Improve feature engineering, model choice and hyper-parameters."`: High-level guidance for the LLM on which aspects of the code to improve. Alternatively, you can provide a path to a file with more detailed instructions.
- `--log-dir .runs/spaceship-titanic`: The directory where Weco should save logs and results for this run.
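At each step, Weco reads the metric value from the eval command's stdout. The snippet below is an illustrative sketch of that parsing contract, not Weco's actual implementation:

```python
import re


def extract_metric(stdout: str, metric: str) -> float:
    """Find a line like 'accuracy: 0.8123' and return its numeric value."""
    match = re.search(rf"{re.escape(metric)}\s*[:=]\s*([0-9.]+)", stdout)
    if match is None:
        raise ValueError(f"metric {metric!r} not found in output")
    return float(match.group(1))
```

This is why `evaluate.py` must print the metric in a stable, machine-readable form such as `accuracy: 0.80`.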
Weco will iteratively modify the feature engineering, model selection, and hyperparameter tuning code within `evaluate.py`, run the evaluation command, and use the resulting `accuracy` to guide further improvements.
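That loop amounts to a hill climb over candidate scripts. The toy sketch below is illustrative only: a random perturbation of one parameter stands in for the LLM's code edits, and the quadratic `run_eval` stands in for the evaluation command.

```python
import random


def run_eval(params):
    """Stand-in for the eval command: higher is better, peak at x = 3."""
    return -(params["x"] - 3) ** 2


def propose(params):
    """Stand-in for the LLM's edit: perturb the current best candidate."""
    return {"x": params["x"] + random.uniform(-1, 1)}


def optimize(steps=10):
    best = {"x": 0.0}
    best_score = run_eval(best)
    for _ in range(steps):
        candidate = propose(best)
        score = run_eval(candidate)
        if score > best_score:  # keep improvements, discard regressions
            best, best_score = candidate, score
    return best, best_score
```

Weco's `--steps` flag bounds the number of loop iterations, and `--goal maximize` corresponds to the `score > best_score` comparison here.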