
Model Development (Kaggle Spaceship Titanic)

Iteratively improve a model's accuracy on Kaggle's Spaceship Titanic

This example demonstrates using Weco to optimize a Python script for Kaggle's Spaceship Titanic competition. The goal is to improve the model's accuracy by having Weco directly modify the evaluate.py script.

You can find the complete files for this example here.

Setup

  1. Ensure you are in the examples/spaceship-titanic directory.
  2. Install Dependencies: Install the required Python packages:
    pip install weco pandas numpy scikit-learn torch xgboost lightgbm catboost
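The evaluation script expects the competition's train.csv and test.csv files in a ./data/ directory; you can download them from the Spaceship Titanic competition page on Kaggle. A quick pre-flight check (a hypothetical helper, not part of the example's files):

import sys
from pathlib import Path

# Hypothetical pre-flight check: confirm the Kaggle data files are in place.
data_dir = Path("./data")
missing = [n for n in ("train.csv", "test.csv") if not (data_dir / n).exists()]
if missing:
    sys.exit(f"Missing {missing} in {data_dir}; download the files from the Spaceship Titanic competition on Kaggle.")
print("Data files found.")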

The Evaluation Script (evaluate.py)

This script is both the baseline and the optimization target: it contains the model training, prediction, and evaluation logic, and Weco will modify it directly to improve the accuracy metric. It reads data from the ./data/ directory, trains a model, makes predictions, saves them to submission.csv, and prints the accuracy score.

import argparse
from pathlib import Path
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score
import joblib
import warnings
 
warnings.filterwarnings("ignore", category=UserWarning)  # keep Weco's panel tidy
 
 
def train(df: pd.DataFrame, test_df: pd.DataFrame, random_state: int = 0) -> float:
    train_df, val_df = train_test_split(
        df, test_size=0.10, random_state=random_state, stratify=df["Transported"]
    )
 
    y_train = train_df.pop("Transported")
    y_val = val_df.pop("Transported")
 
    model = DummyClassifier(strategy="most_frequent", random_state=random_state)
    model.fit(train_df, y_train)
    preds = model.predict(val_df)
    acc = accuracy_score(y_val, preds)
 
    # **Important**: Keep this step!!!
    # Save the model and generate a submission file on test
    joblib.dump(model, "model.joblib")
    test_preds = model.predict(test_df)
    submission_df = pd.DataFrame(
        {"PassengerId": test_df["PassengerId"], "Transported": test_preds.astype(bool)}
    )
    submission_df.to_csv("submission.csv", index=False)
 
    return acc
 
 
if __name__ == "__main__":
    p = argparse.ArgumentParser()
    p.add_argument("--data-dir", type=Path, default=Path("./data/"))
    p.add_argument("--seed", type=int, default=0)
    args = p.parse_args()
 
    train_df = pd.read_csv(args.data_dir / "train.csv")
    test_df = pd.read_csv(args.data_dir / "test.csv")
    acc = train(train_df, test_df, random_state=args.seed)
    print(f"accuracy: {acc:.6f}")

Run Weco

Run the following command to start optimizing the model:

weco run --source evaluate.py \
         --eval-command "python evaluate.py --data-dir ./data" \
         --metric accuracy \
         --goal maximize \
         --steps 10 \
         --model gemini-2.5-pro-exp-03-25 \
         --additional-instructions "Improve feature engineering, model choice and hyper-parameters." \
         --log-dir .runs/spaceship-titanic

Explanation

  • --source evaluate.py: The script that provides the baseline and that Weco will modify during optimization.
  • --eval-command "python evaluate.py --data-dir ./data": The command Weco runs at each step. It executes the (potentially modified) evaluate.py script, which must print the metric in a parseable form (e.g., accuracy: 0.80).
    • --data-dir ./data: Specifies the location of the training and testing data for the evaluate.py script.
  • --metric accuracy: The target metric Weco should optimize based on the output of the --eval-command.
  • --goal maximize: Weco aims to increase the accuracy.
  • --steps 10: The number of optimization iterations.
  • --model gemini-2.5-pro-exp-03-25: The LLM driving the optimization.
  • --additional-instructions "Improve feature engineering, model choice and hyper-parameters.": Provides high-level guidance to the LLM on which aspects of the code to focus on. Alternatively, you can pass a path to a file with more detailed instructions (see the sample after this list).
  • --log-dir .runs/spaceship-titanic: Specifies the directory where Weco should save logs and results for this run.
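For example, a more detailed instructions file (a hypothetical instructions.md, not one shipped with this example) might look like this:

Focus on the following, in rough order of expected impact:
1. Feature engineering: split Cabin into deck/number/side, aggregate the
   spending columns (RoomService, FoodCourt, ShoppingMall, Spa, VRDeck),
   and impute missing values sensibly.
2. Model choice: prefer gradient-boosted trees (XGBoost, LightGBM, CatBoost)
   over deep models for this tabular data.
3. Hyper-parameters: tune learning rate, tree depth, and number of estimators.
Constraints: keep the CLI, the printed "accuracy: <value>" line, and the
model.joblib / submission.csv outputs unchanged.

You would then pass it with --additional-instructions instructions.md.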

Weco will iteratively modify the feature engineering, model selection, and hyperparameter tuning code within evaluate.py, run the evaluation command, and use the resulting accuracy to guide further improvements.
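To make that concrete, here is a hand-written sketch of the kind of rewrite Weco might converge toward inside evaluate.py: simple feature engineering plus a gradient-boosted model in place of the DummyClassifier. This is illustrative only, not an actual Weco output; it assumes the standard Spaceship Titanic columns and a reasonably recent scikit-learn:

import joblib
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OrdinalEncoder

SPEND_COLS = ["RoomService", "FoodCourt", "ShoppingMall", "Spa", "VRDeck"]
CAT_COLS = ["HomePlanet", "CryoSleep", "Destination", "VIP", "Deck", "Side"]


def engineer(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()
    # Cabin is formatted "deck/num/side"; split it into separate features.
    cabin = df["Cabin"].str.split("/", expand=True)
    df["Deck"], df["Side"] = cabin[0], cabin[2]
    # Treat missing spend as zero and add an aggregate spend feature.
    df[SPEND_COLS] = df[SPEND_COLS].fillna(0.0)
    df["TotalSpend"] = df[SPEND_COLS].sum(axis=1)
    return df.drop(columns=["PassengerId", "Name", "Cabin"])


def train(df: pd.DataFrame, test_df: pd.DataFrame, random_state: int = 0) -> float:
    train_df, val_df = train_test_split(
        df, test_size=0.10, random_state=random_state, stratify=df["Transported"]
    )
    y_train = train_df.pop("Transported")
    y_val = val_df.pop("Transported")

    # Ordinal-encode categoricals; the gradient-boosted trees handle the
    # remaining NaNs (e.g., in Age) natively.
    pre = ColumnTransformer(
        [("cat", OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1), CAT_COLS)],
        remainder="passthrough",
    )
    model = Pipeline([
        ("prep", pre),
        ("gbm", HistGradientBoostingClassifier(random_state=random_state)),
    ])
    model.fit(engineer(train_df), y_train)
    acc = accuracy_score(y_val, model.predict(engineer(val_df)))

    # Keep the required artifacts, as the original script's comment insists.
    joblib.dump(model, "model.joblib")
    test_preds = model.predict(engineer(test_df))
    pd.DataFrame(
        {"PassengerId": test_df["PassengerId"], "Transported": test_preds.astype(bool)}
    ).to_csv("submission.csv", index=False)
    return acc

Note that the sketch preserves the model.joblib and submission.csv steps, which the script's own comment marks as required.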
