# Model Development (Kaggle Spaceship Titanic)

Iteratively improve a model's accuracy on Kaggle's Spaceship Titanic competition.
This example demonstrates using Weco to optimize a Python script designed for the Spaceship Titanic Kaggle competition. The goal is to improve the model's `accuracy` metric by directly optimizing the `evaluate.py` script. You can find the complete files for this example here.
## Setup

- Ensure you are in the `examples/spaceship-titanic` directory.
- Install dependencies: install the required Python packages:
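For instance, assuming the example ships a `requirements.txt` (the exact dependency list lives in the example's files, so check there first):

```shell
# Install the example's Python dependencies into the current environment
pip install -r requirements.txt
```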
## The Evaluation Script (`evaluate.py`)

The evaluation script is also the optimization target: it contains the model training, prediction logic, and evaluation, and Weco will directly modify this file to optimize the `accuracy` metric. The script reads data from the `./data/` directory, trains a model, makes predictions, saves them to `submission.csv`, and prints the accuracy score.
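The real script in the example repository implements actual feature engineering and model training; the skeleton below is a hypothetical, stdlib-only stand-in (a majority-class baseline) that illustrates the contract Weco relies on: read data from `--data-dir`, write `submission.csv`, and print `accuracy: <value>` to stdout.

```python
# Hypothetical sketch of evaluate.py's structure (the real script differs).
import argparse
import csv
from collections import Counter
from pathlib import Path


def majority_class(labels):
    """Baseline 'model': predict the single most frequent training label."""
    return Counter(labels).most_common(1)[0][0]


def evaluate(data_dir: str) -> float:
    data_dir = Path(data_dir)
    with open(data_dir / "train.csv", newline="") as f:
        train = list(csv.DictReader(f))

    # Hold out the last 20% of rows as a simple validation split.
    split = int(len(train) * 0.8)
    fit, val = train[:split], train[split:]

    prediction = majority_class([row["Transported"] for row in fit])
    correct = sum(row["Transported"] == prediction for row in val)
    accuracy = correct / len(val)

    # Write test-set predictions in Kaggle's submission format.
    with open(data_dir / "test.csv", newline="") as f:
        test = list(csv.DictReader(f))
    with open("submission.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["PassengerId", "Transported"])
        for row in test:
            writer.writerow([row["PassengerId"], prediction])

    return accuracy


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--data-dir", default="./data")
    args, _ = parser.parse_known_args()
    # Guarded so the sketch can be imported without the data present.
    if Path(args.data_dir, "train.csv").exists():
        # Weco parses this line from stdout to read the metric value.
        print(f"accuracy: {evaluate(args.data_dir):.4f}")
```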
## Run Weco

Run the following command to start optimizing the model:
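The invocation below is assembled from the flags described under Explanation; the `weco run` subcommand form is an assumption here and may differ across CLI versions:

```shell
weco run --source evaluate.py \
  --eval-command "python evaluate.py --data-dir ./data" \
  --metric accuracy \
  --goal maximize \
  --steps 10 \
  --model gemini-2.5-pro-exp-03-25 \
  --additional-instructions "Improve feature engineering, model choice and hyper-parameters." \
  --log-dir .runs/spaceship-titanic
```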
## Explanation

- `--source evaluate.py`: The script that provides the baseline and is the target of Weco's optimization.
- `--eval-command "python evaluate.py --data-dir ./data"`: The command Weco runs at each step. It executes the (potentially modified) `evaluate.py` script, which should output the metric (e.g., `accuracy: 0.80`).
- `--data-dir ./data`: Specifies the location of the training and testing data for the `evaluate.py` script.
- `--metric accuracy`: The target metric Weco should optimize, read from the output of the `--eval-command`.
- `--goal maximize`: Weco aims to increase the accuracy.
- `--steps 10`: The number of optimization iterations.
- `--model gemini-2.5-pro-exp-03-25`: The LLM driving the optimization.
- `--additional-instructions "Improve feature engineering, model choice and hyper-parameters."`: High-level guidance for the LLM on which aspects of the code to improve. Alternatively, you can provide a path to a file with more detailed instructions.
- `--log-dir .runs/spaceship-titanic`: The directory where Weco should save logs and results for this run.
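At each step, Weco reads the metric value from the eval command's stdout. The snippet below is an illustrative sketch of that parsing contract, not Weco's actual implementation:

```python
import re


def extract_metric(stdout: str, metric: str) -> float:
    """Find a line like 'accuracy: 0.8123' and return its numeric value."""
    match = re.search(rf"{re.escape(metric)}\s*[:=]\s*([0-9.]+)", stdout)
    if match is None:
        raise ValueError(f"metric {metric!r} not found in output")
    return float(match.group(1))
```

This is why `evaluate.py` must print the metric in a stable, machine-readable form such as `accuracy: 0.80`.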
Weco will iteratively modify the feature engineering, model selection, and hyperparameter tuning code within `evaluate.py`, run the evaluation command, and use the resulting `accuracy` to guide further improvements.
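That loop amounts to a hill climb over candidate scripts. The toy sketch below is illustrative only: a random perturbation of one parameter stands in for the LLM's code edits, and the quadratic `run_eval` stands in for the evaluation command.

```python
import random


def run_eval(params):
    """Stand-in for the eval command: higher is better, peak at x = 3."""
    return -(params["x"] - 3) ** 2


def propose(params):
    """Stand-in for the LLM's edit: perturb the current best candidate."""
    return {"x": params["x"] + random.uniform(-1, 1)}


def optimize(steps=10):
    best = {"x": 0.0}
    best_score = run_eval(best)
    for _ in range(steps):
        candidate = propose(best)
        score = run_eval(candidate)
        if score > best_score:  # keep improvements, discard regressions
            best, best_score = candidate, score
    return best, best_score
```

Weco's `--steps` flag bounds the number of loop iterations, and `--goal maximize` corresponds to the `score > best_score` comparison here.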