Agentic Scaffold
Optimize a VLM function that extracts tabular data from chart images
This example shows how Weco can optimize an AI workflow that extracts tabular data from chart images using a Vision Language Model (VLM).
You can follow along here or check out the files directly from here.
Prerequisites
If you haven't already, follow the Installation guide to install the Weco CLI, or install it with one of the following:

macOS/Linux:

```
curl -fsSL https://weco.ai/install.sh | sh
```

Windows (PowerShell):

```
powershell -ExecutionPolicy ByPass -c "irm https://weco.ai/install.ps1 | iex"
```

or, from within a PowerShell session:

```
irm https://weco.ai/install.ps1 | iex
```

Via pip:

```
pip install weco
```

From source:

```
git clone https://github.com/wecoai/weco-cli.git
cd weco-cli
pip install -e .
```

You'll also need:

- Python 3.9+
- uv installed (see https://docs.astral.sh/uv/)
- An OpenAI API key in your environment:

```
export OPENAI_API_KEY=your_key_here
```

Prepare the Data
The example uses a subset of line charts from the ChartQA dataset. First, prepare the data:
```
cd examples/extract-line-plot
uv run --with huggingface_hub python prepare_data.py
```

This script:

- Downloads the ChartQA dataset snapshot
- Produces a 100-sample subset of line charts in subset_line_100/ with:
  - index.csv: mapping of example IDs to image and ground truth table paths
  - images/: chart images (PNG/JPEG)
  - tables/: ground truth CSV tables
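Once the data is prepared, the index can be loaded with a few lines of standard-library Python. This is a sketch: the exact column names in index.csv are assumptions, not taken from prepare_data.py.

```python
import csv
from pathlib import Path


def load_index(root):
    """Yield (example_id, image_path, table_path) tuples from index.csv.

    The column names ("id", "image_path", "table_path") are assumptions
    about the prepared dataset layout, not verified against prepare_data.py.
    """
    root = Path(root)
    with open(root / "index.csv", newline="") as f:
        for row in csv.DictReader(f):
            yield row["id"], root / row["image_path"], root / row["table_path"]
```

This gives the evaluation loop everything it needs: the ID for naming predictions, the chart image to send to the VLM, and the ground-truth table to score against.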
Specify the file to be optimized
Point Weco to optimize.py, which contains the baseline VLM function that Weco will optimize. This file includes:
- VLMExtractor.image_to_csv(): main function that takes an image path and returns CSV text
- build_prompt(): prompt template that instructs the VLM to extract data
- clean_to_csv(): post-processing function to clean the output
```python
def build_prompt() -> str:
    return (
        "You are a precise data extraction model. Given a chart image, extract the underlying data table.\n"
        "Return ONLY the CSV text with a header row and no markdown code fences.\n"
        "Rules:\n"
        "- The first column must be the x-axis values with its exact axis label as the header.\n"
        "- Include one column per data series using the legend labels as headers.\n"
        "- Preserve the original order of x-axis ticks as they appear.\n"
        "- Use plain CSV (comma-separated), no explanations, no extra text.\n"
    )


class VLMExtractor:
    def image_to_csv(self, image_path: Path) -> str:
        prompt = build_prompt()
        image_uri = image_to_data_uri(image_path)
        response = self.client.chat.completions.create(
            model=self.model,
            messages=[
                {
                    "role": "user",
                    "content": [
                        {"type": "text", "text": prompt},
                        {"type": "image_url", "image_url": {"url": image_uri}},
                    ],
                }
            ],
        )
        text = response.choices[0].message.content or ""
        return clean_to_csv(text)
```

Weco will edit this file during optimization, focusing on improving the prompt and extraction logic.
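The post-processing step matters because VLMs often wrap output in markdown fences despite instructions not to. A minimal clean_to_csv() might look like the sketch below; this is an illustration of the idea, not the actual implementation in optimize.py.

```python
def clean_to_csv(text: str) -> str:
    """Strip markdown code fences and surrounding whitespace from model output.

    A minimal sketch of the post-processing step; the real clean_to_csv in
    optimize.py may apply additional normalization.
    """
    lines = [
        line for line in text.strip().splitlines()
        if not line.strip().startswith("```")
    ]
    return "\n".join(lines).strip()
```

Even with a strict prompt, this kind of defensive cleanup keeps a stray ```` ```csv ```` fence from tanking the similarity score.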
Create the Evaluation Script
The eval.py script evaluates the VLM extraction performance:
- Loads the prepared dataset from subset_line_100/
- Calls VLMExtractor.image_to_csv() for each chart image
- Compares predicted CSV tables to ground truth using a similarity metric
- Writes predictions to the predictions/ directory
- Prints progress and a final accuracy: line that Weco reads
The evaluation metric combines:
- Header match (20% weight): Exact match of column headers
- Content similarity (80% weight): Jaccard-based similarity of data rows using SMAPE (Symmetric Mean Absolute Percentage Error) for numeric values
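To make the weighting concrete, the combined score can be sketched as follows. The exact row-matching, Jaccard, and SMAPE details in eval.py may differ; this only illustrates the 20/80 split described above, with hypothetical helper names.

```python
def smape_similarity(pred: float, true: float) -> float:
    """1 - SMAPE, so identical values score 1.0 (illustrative helper)."""
    denom = abs(pred) + abs(true)
    if denom == 0:
        return 1.0
    return 1.0 - abs(pred - true) / denom


def combined_score(pred_header, true_header, cell_similarities) -> float:
    """20% exact header match + 80% mean per-cell content similarity.

    A sketch of the weighting described above, not eval.py's exact code.
    cell_similarities: per-cell scores in [0, 1], e.g. from smape_similarity.
    """
    header = 1.0 if pred_header == true_header else 0.0
    if cell_similarities:
        content = sum(cell_similarities) / len(cell_similarities)
    else:
        content = 0.0
    return 0.2 * header + 0.8 * content
```

The heavy content weight means the optimizer is rewarded mostly for getting the numbers right, while the header term still penalizes sloppy column labels.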
Key configuration options:
- --max-samples: number of samples to evaluate (default: 100)
- --num-workers: parallel workers for concurrent VLM calls (default: 4)
- --visualize-dir: optional directory to save comparison plots
Run a baseline evaluation:
```
uv run --with openai python eval.py --max-samples 10 --num-workers 4
```

This writes predicted CSVs to predictions/ and prints a final line like accuracy: 0.32.
Run Weco
Now run Weco to iteratively improve the extraction function:
```
weco run --source optimize.py \
  --eval-command 'uv run --with openai python eval.py --max-samples 100 --num-workers 50' \
  --metric accuracy \
  --goal maximize \
  --steps 20 \
  --model gpt-5
```

On Windows (Command Prompt):

```
weco run --source optimize.py ^
  --eval-command "uv run --with openai python eval.py --max-samples 100 --num-workers 50" ^
  --metric accuracy ^
  --goal maximize ^
  --steps 20 ^
  --model gpt-5
```

Or in PowerShell:

```
weco run --source optimize.py `
  --eval-command "uv run --with openai python eval.py --max-samples 100 --num-workers 50" `
  --metric accuracy `
  --goal maximize `
  --steps 20 `
  --model gpt-5
```

Arguments:

- --source optimize.py: file that Weco will edit to improve results
- --eval-command '…': command Weco executes to measure the metric
- --metric accuracy: Weco parses accuracy: <value> from eval.py output
- --goal maximize: higher accuracy is better
- --steps 20: number of optimization iterations
- --model gpt-5: model used by Weco to propose edits
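On the eval side, this contract is just a line on stdout. A sketch of both halves of the pattern (Weco's internal parser is not public; the regex here is only illustrative):

```python
import re


def emit_metric(score: float) -> str:
    """Format the final line that the eval script prints for Weco to read."""
    return f"accuracy: {score:.4f}"


def parse_metric(output: str):
    """Recover the last 'accuracy: <value>' line from captured output.

    Illustrative only; Weco's actual parsing logic may differ.
    """
    matches = re.findall(r"accuracy:\s*([0-9.]+)", output)
    return float(matches[-1]) if matches else None
```

Taking the last match is the natural choice here, since progress logs may mention intermediate scores before the final one.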
During each evaluation round, you will see log lines similar to:
```
[setup] evaluating 100 samples using gpt-4o-mini …
[progress] 5/100 done, avg score: 0.3120, elapsed 12.3s
[progress] 10/100 done, avg score: 0.3280, elapsed 24.1s
...
accuracy: 0.3420
```

Weco then mutates the prompt and extraction logic in optimize.py, tries again, and gradually pushes the accuracy higher.
How it works
- The evaluation script loads images and ground truth tables from the prepared dataset
- It sends VLM calls in parallel via ThreadPoolExecutor, hiding network latency
- Every 5 completed items, the script logs progress with the current average score and elapsed time
- The final accuracy: <value> line is parsed by Weco for guidance
- The metric includes a cost cap: if the average cost per query exceeds $0.02, accuracy is set to 0.0
Tips
- Adjust --num-workers to balance throughput and rate limits (50 workers works well for larger datasets)
- You can tweak baseline behavior in optimize.py (prompt, temperature, model); Weco will explore modifications automatically
- Use --visualize-dir to generate comparison plots showing ground truth vs predictions
- For faster iteration during development, reduce --max-samples to 10-20 samples
What's Next?
- Different optimization types: Try Model Development for ML workflows or GPU optimization with CUDA and Triton
- Better evaluation scripts: Learn Writing Good Evaluation Scripts
- All command options: Check the CLI Reference
- More examples: Browse all Examples