Model Development (Kaggle Spaceship Titanic)
Iteratively improve a model's accuracy on Kaggle's Spaceship Titanic
This example demonstrates using Weco to optimize a Python script designed for the Spaceship Titanic Kaggle competition.
The goal is to improve the model's accuracy metric by optimizing train.py.
You can find the complete files for this example here.
Setup
Clone the repository and change into the examples/spaceship-titanic directory:
git clone https://github.com/WecoAI/weco-cli.git
cd weco-cli/examples/spaceship-titanic

Install the Weco CLI using one of the following methods.
macOS/Linux:
curl -fsSL https://weco.ai/install.sh | sh
Windows (Command Prompt):
powershell -ExecutionPolicy ByPass -c "irm https://weco.ai/install.ps1 | iex"
Windows (PowerShell):
irm https://weco.ai/install.ps1 | iex
With pip:
pip install weco
From source:
git clone https://github.com/wecoai/weco-cli.git
cd weco-cli
pip install -e .

Install the required Python packages:
pip install pandas numpy scikit-learn torch xgboost lightgbm catboost

Create the Optimization Target
This script contains the train_model and predict_with_model functions. Weco only edits this file during the run.
# train.py
# ============================================================================
# CRITICAL: DO NOT CHANGE FUNCTION NAMES OR SIGNATURES
# ============================================================================
# The following functions are part of the stable API contract:
# - train_model(train_df: pd.DataFrame, random_state: int = 0)
# - predict_with_model(model, test_df: pd.DataFrame) -> pd.DataFrame
#
# These function names and signatures MUST remain unchanged as they are
# imported and called by evaluate.py. Only modify the internal implementation.
# ============================================================================
import pandas as pd
from sklearn.dummy import DummyClassifier


def train_model(train_df: pd.DataFrame, random_state: int = 0):
    """
    Train a model on the training data and return it.
    This function will be optimized by Weco.

    IMPORTANT: This function name and signature must NOT be changed.
    Only the internal implementation should be modified.

    Args:
        train_df: Training DataFrame with features and target
        random_state: Random seed for reproducibility

    Returns:
        Trained model object
    """
    # Make local copy to prevent modifying DataFrame in the caller's scope
    train_df_local = train_df.copy()

    # --- Stage 1: Prepare training data ---
    # Separate target variable from training data
    y_train_series = train_df_local["Transported"]
    # Features for training: drop target and PassengerId
    X_train_df = train_df_local.drop(columns=["Transported", "PassengerId"])

    # --- Stage 2: Preprocessing and Model Training (THIS BLOCK WILL BE OPTIMIZED BY WECO) ---
    # WECO will insert/modify code here for:
    # - Imputation (e.g., SimpleImputer)
    # - Scaling (e.g., StandardScaler)
    # - Encoding categorical features (e.g., OneHotEncoder, LabelEncoder)
    # - Feature Engineering (creating new features)
    # - Model Selection (e.g., RandomForestClassifier, GradientBoostingClassifier, LogisticRegression)
    # - Hyperparameter Tuning (e.g., GridSearchCV, RandomizedSearchCV, or direct parameter changes)
    # - Potentially using sklearn.pipeline.Pipeline for robust preprocessing and modeling

    # --- Example: Current simple logic (Weco will replace/enhance this) ---
    model = DummyClassifier(strategy="most_frequent", random_state=random_state)
    model.fit(X_train_df, y_train_series)
    # --- End of WECO Optimizable Block ---

    return model


def predict_with_model(model, test_df: pd.DataFrame) -> pd.DataFrame:
    """
    Make predictions on test data using a trained model.
    This function should remain relatively stable.

    IMPORTANT: This function name and signature must NOT be changed.
    Only the internal implementation should be modified.

    Args:
        model: Trained model object
        test_df: Test DataFrame with features (and possibly target for validation)

    Returns:
        DataFrame with PassengerId and predictions
    """
    # Make local copy to prevent modifying DataFrame in the caller's scope
    test_df_local = test_df.copy()

    # Preserve PassengerId for the submission file
    passenger_ids = test_df_local["PassengerId"].copy()

    # Features for prediction: drop target (if present) and PassengerId
    X_test_df = test_df_local.drop(columns=["Transported", "PassengerId"], errors="ignore")

    # Make predictions
    predictions = model.predict(X_test_df)

    # Create the submission DataFrame
    submission_df = pd.DataFrame({"PassengerId": passenger_ids, "Transported": predictions.astype(bool)})
    return submission_df

Create the Evaluation Script
This script evaluates the predictions produced by train.py. It reads train.csv and test.csv from the data folder and holds out 10% of the training data as a validation set. The accuracy on that validation set is the optimization target. At the end, it writes the test-set predictions to submission.csv.
# evaluate.py
import argparse
from pathlib import Path

import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


class IncorrectSubmissionError(Exception):
    pass


def evaluate_for_accuracy(
    submission_df: pd.DataFrame, answers_df: pd.DataFrame, target_column: str = "Transported", id_column: str = "PassengerId"
) -> float:
    # Answers checks
    assert target_column in answers_df.columns, f"Answers must have a `{target_column}` column"
    assert id_column in answers_df.columns, f"Answers must have a `{id_column}` column"

    # Submission checks
    if len(submission_df) != len(answers_df):
        raise IncorrectSubmissionError("Submission must have the same length as the answers.")
    if target_column not in submission_df.columns:
        raise IncorrectSubmissionError(f"Submission must have a `{target_column}` column")
    if id_column not in submission_df.columns:
        raise IncorrectSubmissionError(f"Submission must have a `{id_column}` column")

    # Sort on id to ensure correct ordering
    submission_df = submission_df.sort_values(by=id_column)
    answers_df = answers_df.sort_values(by=id_column)

    if (submission_df[id_column].values != answers_df[id_column].values).any():
        raise IncorrectSubmissionError(f"Submission and Answers `{id_column}`'s do not match")

    return accuracy_score(submission_df[target_column], answers_df[target_column])


def read_data(data_dir: Path) -> tuple[pd.DataFrame, pd.DataFrame, pd.DataFrame]:
    train_df = pd.read_csv(data_dir / "train.csv")
    train_df, validation_df = train_test_split(train_df, test_size=0.1, random_state=0)
    test_df = pd.read_csv(data_dir / "test.csv")
    return train_df, validation_df, test_df


if __name__ == "__main__":
    p = argparse.ArgumentParser()
    p.add_argument("--data-dir", type=Path, default=Path("./data/"))
    p.add_argument("--seed", type=int, default=0)
    args = p.parse_args()

    train_df, validation_df, test_df = read_data(args.data_dir)

    # Import training and prediction functions
    from train import train_model, predict_with_model

    # Validate that required functions exist and are callable
    assert callable(train_model), "train_model function must exist and be callable"
    assert callable(predict_with_model), "predict_with_model function must exist and be callable"

    # Step 1: Train the model (this will be optimized by Weco)
    print("Training model...")
    model = train_model(train_df, args.seed)

    # Step 2: Generate predictions on validation set (no retraining)
    print("Generating validation predictions...")
    validation_submission_df = predict_with_model(model, validation_df)

    # Step 3: Evaluate accuracy on validation set
    acc = evaluate_for_accuracy(validation_submission_df, validation_df)
    print(f"accuracy: {acc:.6f}")

    # Step 4: Generate predictions on test set (optional, for final submission)
    print("Generating test predictions...")
    test_submission_df = predict_with_model(model, test_df)
    test_submission_df.to_csv("submission.csv", index=False)
    print("Test predictions saved to submission.csv")

Run Weco
Run the following command to start optimizing the model:
weco run --source train.py \
    --eval-command "python evaluate.py --data-dir ./data --seed 0" \
    --metric accuracy \
    --goal maximize \
    --steps 10 \
    --model o4-mini \
    --additional-instructions "Improve feature engineering, model choice and hyper-parameters." \
    --log-dir .runs/spaceship-titanic

Or in Windows Command Prompt:
weco run --source train.py ^
    --eval-command "python evaluate.py --data-dir ./data" ^
    --metric accuracy ^
    --goal maximize ^
    --steps 10 ^
    --model o4-mini ^
    --additional-instructions "Improve feature engineering, model choice and hyper-parameters." ^
    --log-dir .runs/spaceship-titanic

Or in PowerShell:
weco run --source train.py `
    --eval-command "python evaluate.py --data-dir ./data" `
    --metric accuracy `
    --goal maximize `
    --steps 10 `
    --model o4-mini `
    --additional-instructions "Improve feature engineering, model choice and hyper-parameters." `
    --log-dir .runs/spaceship-titanic

Explanation
- --source train.py: train.py provides the baseline (the root node) and is the file Weco optimizes directly.
- --eval-command "python evaluate.py --data-dir ./data/": The Weco agent runs evaluate.py with this command.
  - [optional] --data-dir: Path to the train and test data.
  - [optional] --seed: Seed for reproducing the experiment.
- --metric accuracy: The target metric Weco should optimize.
- --goal maximize: Weco aims to increase the accuracy.
- --steps 10: The number of optimization iterations.
- --model o4-mini: The LLM driving the optimization.
- --additional-instructions "Improve feature engineering, model choice and hyper-parameters.": A simple instruction for model improvement. Alternatively, pass the path to competition_description.md within the repo to give the agent more detailed information.
- --log-dir .runs/spaceship-titanic: The directory where Weco saves logs and results for this run.
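The --metric name matches the label that evaluate.py prints (accuracy: followed by the value). How Weco reads that line is not shown in this example. Assuming it looks for a line of the form name: value in the command's output, the sketch below is one way to check locally that your evaluation output contains such a line. The helper extract_metric and the sample output are hypothetical and are not part of Weco.

```python
import re
from typing import Optional


def extract_metric(output: str, metric_name: str) -> Optional[float]:
    """Return the last value printed as '<metric_name>: <number>', or None if absent.

    Hypothetical helper for checking your own evaluation output; not part of Weco.
    """
    pattern = re.compile(
        rf"^{re.escape(metric_name)}:\s*([-+]?\d+(?:\.\d+)?)\s*$",
        re.MULTILINE,
    )
    matches = pattern.findall(output)
    return float(matches[-1]) if matches else None


# Placeholder output in the format evaluate.py prints; 0.5 is not a real result.
sample_output = "\n".join(
    [
        "Training model...",
        "Generating validation predictions...",
        f"accuracy: {0.5:.6f}",
        "Generating test predictions...",
    ]
)

value = extract_metric(sample_output, "accuracy")
missing = extract_metric("no metric line here", "accuracy")
print(value, missing)
```

If the helper returns None for your own script's output, the metric label and the --metric argument probably do not match.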
Weco will iteratively update the feature engineering and modeling code within train.py, guided by the evaluation method defined in evaluate.py.
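To illustrate the kind of change such a run might make, here is a minimal sketch of a train_model that replaces the DummyClassifier baseline with a scikit-learn Pipeline. It is not output from Weco, and the specific choices (median and most-frequent imputation, one-hot encoding, a 50-tree random forest) are assumptions made for illustration. The demo DataFrame is synthetic and only mimics a few Spaceship Titanic column names. The function name and signature are kept unchanged, as evaluate.py requires.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder


def train_model(train_df: pd.DataFrame, random_state: int = 0):
    """Illustrative replacement for the baseline; same name and signature."""
    train_df_local = train_df.copy()
    y_train = train_df_local["Transported"].astype(bool)
    X_train = train_df_local.drop(columns=["Transported", "PassengerId"])

    # Assumption: numeric dtypes are numeric features, all other columns are categorical.
    numeric_cols = X_train.select_dtypes(include="number").columns.tolist()
    categorical_cols = [c for c in X_train.columns if c not in numeric_cols]

    categorical_steps = Pipeline(
        steps=[
            ("impute", SimpleImputer(strategy="most_frequent")),
            ("encode", OneHotEncoder(handle_unknown="ignore")),
        ]
    )
    preprocess = ColumnTransformer(
        transformers=[
            ("num", SimpleImputer(strategy="median"), numeric_cols),
            ("cat", categorical_steps, categorical_cols),
        ]
    )
    model = Pipeline(
        steps=[
            ("preprocess", preprocess),
            ("clf", RandomForestClassifier(n_estimators=50, random_state=random_state)),
        ]
    )
    model.fit(X_train, y_train)
    return model


# Synthetic demo data (not the Kaggle dataset), with a few missing values.
rng = np.random.default_rng(0)
n = 200
demo_df = pd.DataFrame(
    {
        "PassengerId": [f"{i:04d}_01" for i in range(n)],
        "HomePlanet": pd.Series(rng.choice(["Earth", "Europa", "Mars"], size=n), dtype=object),
        "Age": rng.integers(1, 80, size=n).astype(float),
        "RoomService": rng.exponential(100.0, size=n),
    }
)
demo_df["Transported"] = demo_df["RoomService"] < 70.0
demo_df.loc[::25, "Age"] = np.nan
demo_df.loc[::40, "HomePlanet"] = np.nan

demo_model = train_model(demo_df, random_state=0)
demo_features = demo_df.drop(columns=["Transported", "PassengerId"])
demo_predictions = demo_model.predict(demo_features)
# Accuracy on the training rows themselves; a sanity check, not a validation score.
demo_accuracy = float((demo_predictions == demo_df["Transported"].to_numpy()).mean())
print(f"training-set accuracy on synthetic data: {demo_accuracy:.3f}")
```

Because the preprocessing lives inside the returned Pipeline, the existing predict_with_model can call model.predict on raw feature columns without any changes. On the real dataset, high-cardinality text columns would likely need dedicated handling rather than plain one-hot encoding.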
What's Next?
- GPU optimization: Try CUDA or Triton to make models run faster
- Different optimization types: Explore PyTorch Optimization or Prompt Engineering
- Better evaluation scripts: Learn Writing Good Evaluation Scripts
- All command options: Check the CLI Reference