Optimization Loop#

Warning

Before running any code, ensure you are logged in to the Afnio backend (afnio login). See Logging in to Afnio Backend for details.

Tip

Afnio lets you build custom training, validation, and test loops for full control over your agent’s optimization process. However, if you prefer a ready-made solution, Afnio provides the Trainer class, which handles standard training, validation, and testing routines out of the box. If you want to get started quickly, see Trainer for details and usage examples.

Training an AI agent or workflow is an iterative process: in each iteration, the agent makes a guess about the output, calculates the error in its guess (loss), collects feedback with respect to its parameters (see Automatic Differentiation), and optimizes these parameters to improve future predictions. In Afnio, this means iteratively refining parameters—such as prompts, templates, or logic—based on feedback from evaluators or loss functions to better achieve your desired outcomes.

A typical optimization loop in Afnio consists of:

  • Forward Pass: The agent or workflow processes input data and generates outputs.

  • Evaluation: Outputs are compared to ground truth or assessed by evaluators, producing scores and semantic feedback (gradients).

  • Backward Pass: Semantic feedback is backpropagated through the computational graph to accumulate gradients for learnable parameters.

  • Parameter Update: The optimizer uses accumulated gradients to update parameters, improving the agent’s performance.
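The four phases above can be sketched with plain Python stand-ins. This is only an illustrative toy: `forward`, `evaluate`, and `run_loop` are hypothetical names, not Afnio APIs, and the "update" is a hard-coded prompt edit rather than a language-model rewrite.

```python
# Toy stand-ins only: none of these names are Afnio APIs.
def forward(prompt, message):
    """Pretend agent: answers 'negative' only if the prompt mentions
    complaints and the message mentions something broken."""
    if "complaint" in prompt and "broken" in message:
        return "negative"
    return "neutral"


def evaluate(pred, gold):
    """Return a numeric score and a textual explanation (semantic feedback)."""
    if pred == gold:
        return 1, "Prediction matches the label."
    return 0, f"Predicted '{pred}' but expected '{gold}'."


def run_loop(prompt, data, steps=2):
    history = []
    for _ in range(steps):
        feedback = []  # textual "gradients" collected in this iteration
        score = 0
        for message, gold in data:
            pred = forward(prompt, message)        # 1. forward pass
            s, explanation = evaluate(pred, gold)  # 2. evaluation
            score += s
            if s == 0:
                feedback.append(explanation)       # 3. backward pass
        history.append(score)
        if feedback:                               # 4. parameter update
            prompt = prompt + " Treat any complaint as negative."
    return prompt, history


data = [
    ("The elevator is broken again", "negative"),
    ("Thanks for the update", "neutral"),
]
final_prompt, history = run_loop("Classify the sentiment.", data)
print(history)  # [1, 2]
```

The score rises from 1 to 2 because the feedback collected in the first iteration changes the prompt used in the second.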


Prerequisite Code#

Before running the optimization loop, you should define your agent, dataset, and data loaders. See Datasets and DataLoaders and Build the Agent or Workflow for details.

import os

import afnio
import afnio.cognitive as cog
import afnio.tellurio as te
from afnio.models.openai import AsyncOpenAI
from afnio.utils.data import DataLoader, WeightedRandomSampler
from afnio.utils.datasets import FacilitySupport

os.environ["OPENAI_API_KEY"] = "sk-..."  # Replace with your actual key

def compute_sample_weights(data):
    with te.suppress_variable_notifications():
        labels = [y.data for _, (_, y, _) in data]
        counts = {label: labels.count(label) for label in set(labels)}
        total = len(data)
    return [total / counts[label] for label in labels]

training_data = FacilitySupport(split="train", root="data")
validation_data = FacilitySupport(split="val", root="data")
test_data = FacilitySupport(split="test", root="data")

weights = compute_sample_weights(training_data)
sampler = WeightedRandomSampler(
    weights, num_samples=len(training_data), replacement=True
)

BATCH_SIZE = 33
train_dataloader = DataLoader(training_data, sampler=sampler, batch_size=BATCH_SIZE)
val_dataloader = DataLoader(validation_data, batch_size=BATCH_SIZE, seed=42)
test_dataloader = DataLoader(test_data, batch_size=BATCH_SIZE, seed=42)

SENTIMENT_RESPONSE_FORMAT = {
    "type": "json_schema",
    "json_schema": {
        "strict": True,
        "name": "sentiment_response_schema",
        "schema": {
            "type": "object",
            "properties": {
                "sentiment": {
                    "type": "string",
                    "enum": ["positive", "neutral", "negative"],
                },
            },
            "additionalProperties": False,
            "required": ["sentiment"],
        },
    },
}

afnio.set_backward_model_client(
    "openai/gpt-5",
    completion_args={
        "temperature": 1.0,
        "max_completion_tokens": 32000,
        "reasoning_effort": "low",
    },
)
fw_model_client = AsyncOpenAI()
optim_model_client = AsyncOpenAI()

class FacilitySupportAnalyzer(cog.Module):

    def __init__(self):
        super().__init__()
        self.sentiment_task = cog.Parameter(
            data="Read the provided message and determine the sentiment.",
            role="system prompt for sentiment classification",
            requires_grad=True,
        )
        self.sentiment_user = afnio.Variable(
            data="**Message:**\n\n{message}\n\n",
            role="input template to sentiment classifier",
        )
        self.sentiment_classifier = cog.ChatCompletion()

    def forward(self, fwd_model, inputs, **completion_args):
        sentiment_messages = [
            {"role": "system", "content": [self.sentiment_task]},
            {"role": "user", "content": [self.sentiment_user]},
        ]
        return self.sentiment_classifier(
            fwd_model,
            sentiment_messages,
            inputs=inputs,
            response_format=SENTIMENT_RESPONSE_FORMAT,
            **completion_args,
        )

agent = FacilitySupportAnalyzer()

Output:

INFO     : API key provided and stored securely in local keyring.
INFO     : Currently logged in as 'username' to 'http://localhost'. Use `afnio login --relogin` to force relogin.
INFO     : Project with slug 'my-project' already exists in namespace 'username'.
Downloading https://raw.githubusercontent.com/meta-llama/llama-prompt-ops/refs/heads/main/use-cases/facility-support-analyzer/dataset.json to data/FacilitySupport/raw/dataset.json
Downloading ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 383.7/383.7 kB 1.1 MB/s 0:00:00

Using downloaded and verified file: data/FacilitySupport/raw/dataset.json

Using downloaded and verified file: data/FacilitySupport/raw/dataset.json
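The `compute_sample_weights` helper in the prerequisite code weights each sample by `total / count_of_its_label`. The sketch below reproduces that arithmetic on a small hand-made label list (the labels and the function name `inverse_frequency_weights` are illustrative assumptions) to show why it balances classes: the weights summed per class come out equal, so a weighted sampler draws each class about equally often.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each sample by total / (number of samples sharing its label)."""
    counts = Counter(labels)
    total = len(labels)
    return [total / counts[label] for label in labels]


# Hypothetical, imbalanced toy labels: three negatives and one positive.
labels = ["negative", "negative", "negative", "positive"]
weights = inverse_frequency_weights(labels)

# Sum the weights per class to check that the classes are balanced.
per_class = Counter()
for label, weight in zip(labels, weights):
    per_class[label] += weight

print(weights)
print(dict(per_class))
```

The single positive sample receives three times the weight of each negative one, which offsets the 3:1 imbalance.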

Hyperparameters#

Hyperparameters are adjustable parameters that let you control the agent optimization process. Different hyperparameter values can impact agent training and convergence rates.

Common hyperparameters include:

  • Number of Epochs: How many times the agent iterates over the entire dataset.

  • Batch Size: Number of samples processed together before updating parameters.

Example: Basic hyperparameter settings

MAX_EPOCHS = 5
BATCH_SIZE = 33
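Together, these two values determine how many parameter updates a run performs. A rough sketch of the arithmetic, assuming one optimizer step per batch and that the data loader keeps the final partial batch (`updates_per_run` is a hypothetical helper, not part of Afnio):

```python
import math


def updates_per_run(num_samples, batch_size, max_epochs):
    """Number of optimizer steps when one step is taken per batch."""
    batches_per_epoch = math.ceil(num_samples / batch_size)
    return batches_per_epoch * max_epochs


# 66 samples split evenly into batches of 33: 2 steps per epoch.
print(updates_per_run(num_samples=66, batch_size=33, max_epochs=5))  # 10
# One sample fewer per batch leaves a small third batch each epoch.
print(updates_per_run(num_samples=66, batch_size=32, max_epochs=5))  # 15
```

Because each update involves language-model calls, a small change in batch size can noticeably change the cost of a run.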

Other important hyperparameters in Afnio include:

  • Backward Engine Settings: The language model (LM) and its parameters (such as temperature, max tokens, reasoning effort) passed to set_backward_model_client.

  • Optimizer Settings: Parameters used by optimizers like afnio.optim.TGD, including constraints, momentum, and model selection.

Example: Setting backward engine hyperparameters

afnio.set_backward_model_client(
    "openai/gpt-5",
    completion_args={
        "temperature": 1.0,
        "max_completion_tokens": 32000,
        "reasoning_effort": "low",
    },
)

Example: Setting optimizer hyperparameters

optimizer = afnio.optim.TGD(
    agent.parameters(),
    model_client=AsyncOpenAI(),
    momentum=3,
    model="gpt-5",
    temperature=1.0,
    max_completion_tokens=32000,
    reasoning_effort="low",
)

You can quickly adjust these hyperparameters to experiment with and improve agent performance.


Optimization Loop#

Once your hyperparameters are set, you can train and optimize your agent using an optimization loop. Each cycle through the loop is called an epoch.

Every epoch typically includes two main phases:

  1. Training Loop – Iterate over the training dataset to update and improve the agent’s parameters.

  2. Validation/Test Loop – Evaluate the agent on validation or test data to monitor performance and generalization on unseen data.

Below, we’ll introduce key concepts used in the training loop.
If you prefer to see the complete workflow, you can jump ahead to the End-to-End Training & Evaluation section.

Loss Functions and Evaluators#

In Afnio, evaluators serve as both loss functions and metrics for assessing your agent’s predictions. When you present training data to an untrained agent, its outputs may not match the desired targets. Evaluators measure how close the agent’s output is to the ground truth, providing both a numeric score and a semantic explanation (used as a gradient for optimization).

To compute the loss, you make a prediction using your agent and compare it to the true label using an evaluator. During training, you typically aim to maximize this score or minimize the error.

This guide uses cog.ExactMatchEvaluator as its loss function; it checks each prediction for an exact match against the ground-truth label.

Example: Initializing an evaluator for loss calculation

# Initialize the evaluator (used as a loss function)
loss_fn = cog.ExactMatchEvaluator()
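To make the "score plus explanation" idea concrete, here is a minimal pure-Python sketch of an exact-match evaluator over a batch. It is a hypothetical stand-in, not Afnio's `ExactMatchEvaluator`: it assumes the batch score is the count of matching predictions, which is consistent with the sample output later on this page (a loss of 22 alongside an accuracy of 0.666667 on a batch of 33).

```python
def exact_match(preds, golds):
    """Toy evaluator returning (score, explanation) for a batch.

    Hypothetical stand-in, not Afnio's ExactMatchEvaluator.
    """
    if len(preds) != len(golds):
        raise ValueError("preds and golds must have the same length")
    misses = [(p, g) for p, g in zip(preds, golds) if p != g]
    score = len(preds) - len(misses)
    if misses:
        details = "; ".join(f"got '{p}', expected '{g}'" for p, g in misses)
        explanation = (
            f"{len(misses)} of {len(preds)} predictions are wrong: {details}"
        )
    else:
        explanation = "All predictions match the ground truth."
    return score, explanation


score, explanation = exact_match(
    ["positive", "neutral", "negative"],
    ["positive", "negative", "negative"],
)
print(score)        # 2
print(explanation)  # 1 of 3 predictions are wrong: got 'neutral', expected 'negative'
```

The numeric score is what gets reported as a metric, while the explanation is the text that would be propagated backward as feedback.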

Optimizer#

Optimization is the process of updating agent parameters to minimize error and improve performance during training. In Afnio, optimization logic is encapsulated in the optimizer object, which manages how parameters are adjusted based on feedback.
For example, afnio.optim.TGD uses Textual Gradient Descent to rewrite prompts using language model feedback.

To initialize the optimizer, you register the agent’s parameters to be trained and specify relevant hyperparameters and constraints.

Example: Initializing an optimizer

# Initialize optimizer constraints
constraints = [
    afnio.Variable(
        data="The improved variable must never include or reference the characters `{` or `}`. Do not output them, mention them, or describe them in any way.",
        role="optimizer constraint",
    )
]

# Initialize the optimizer
optimizer = afnio.optim.TGD(
    agent.parameters(),
    model_client=optim_model_client,
    constraints=constraints,
    momentum=3,
    model="gpt-5",
    temperature=1.0,
    max_completion_tokens=32000,
    reasoning_effort="low",
)

During each training iteration, optimization typically involves:

  1. Call optimizer.clear_grad() to reset accumulated textual gradients for agent parameters. This prevents old feedback from biasing the next training iteration.

  2. Backpropagate the evaluator's explanation with loss_explanation.backward(), which computes the gradients of the loss with respect to each parameter.

  3. Call optimizer.step() to update parameters using the newly collected gradients.
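The three steps above can be illustrated with a toy model of textual gradient accumulation. `ToyParameter`, `ToyOptimizer`, and `backward` are hypothetical stand-ins written for this sketch; they only mimic the bookkeeping and do not call a language model or reflect Afnio's internals.

```python
class ToyParameter:
    """Hypothetical stand-in for a learnable prompt; not an Afnio class."""

    def __init__(self, data):
        self.data = data
        self.grad = []  # accumulated textual feedback


class ToyOptimizer:
    def __init__(self, params):
        self.params = list(params)

    def clear_grad(self):
        for p in self.params:
            p.grad = []

    def step(self):
        # A textual optimizer would ask an LM to rewrite the prompt; this
        # toy just appends the feedback it would have been shown.
        for p in self.params:
            if p.grad:
                p.data += " [" + " | ".join(p.grad) + "]"


def backward(param, feedback):
    """Accumulate one piece of feedback on the parameter."""
    param.grad.append(feedback)


prompt = ToyParameter("Classify the sentiment.")
opt = ToyOptimizer([prompt])

for feedback in ["be stricter about complaints", "answer in lowercase"]:
    opt.clear_grad()            # 1. drop feedback from the previous iteration
    backward(prompt, feedback)  # 2. collect new feedback
    opt.step()                  # 3. update using only the new feedback

print(prompt.data)
# Classify the sentiment. [be stricter about complaints] [answer in lowercase]
```

If the `clear_grad()` call were removed, the second update would see both pieces of feedback at once, so the first one would influence the prompt twice.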


End-to-End Training Workflow#

We define train_loop, which runs the optimization code over the training data, and test_loop, which evaluates the agent’s performance on held-out data.

Training Loop#

A typical training loop in Afnio looks like this:

import json
import re

def train_loop(dataloader, agent, loss_fn, optimizer):
    size = len(dataloader.dataset)

    # Set the agent to training mode - important for some operations
    # Unnecessary in this situation but added for best practices
    agent.train()

    for batch, (X, y) in enumerate(dataloader):
        _, gold_sentiment, _ = y

        # Forward pass: agent processes input and generates output
        pred = agent(
            fw_model_client,
            inputs={"message": X},
            model="gpt-4.1-nano",
            temperature=0.0,
        )
        pred.data = [
            json.loads(re.sub(r"^```json\n|\n```$", "", item))["sentiment"].lower()
            for item in pred.data
        ]

        # Evaluation: compare prediction to ground truth
        loss_score, loss_explanation = loss_fn(pred, gold_sentiment)

        # Backward pass: propagate feedback
        loss_explanation.backward()

        # Update parameters using optimizer
        optimizer.step()

        # Reset gradients for next iteration
        optimizer.clear_grad()

        # Print loss and accuracy
        batch_len = len(X.data)
        current = batch * BATCH_SIZE + batch_len
        accuracy = loss_score.data / batch_len
        print(
            f"loss: {loss_score.data:>7f} - "
            f"accuracy: {accuracy:>7f}  [{current:>5d}/{size:>5d}]"
        )
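The post-processing step in the loop strips an optional Markdown JSON fence from each raw completion before parsing it. The standalone sketch below isolates that step. `parse_sentiment` is a hypothetical helper name, and the pattern is assembled from a `TICKS` constant purely so the example does not contain literal triple backticks; it matches the same text as the regular expression in train_loop.

```python
import json
import re

TICKS = "`" * 3  # three backticks, built programmatically

# Matches a leading fence with a "json" tag or a trailing closing fence.
FENCE = re.compile("^" + TICKS + r"json\n|\n" + TICKS + "$")


def parse_sentiment(raw):
    """Strip an optional JSON fence, parse the JSON, and lowercase the label."""
    return json.loads(FENCE.sub("", raw))["sentiment"].lower()


plain = '{"sentiment": "Positive"}'
fenced = TICKS + 'json\n{"sentiment": "negative"}\n' + TICKS

print(parse_sentiment(plain))   # positive
print(parse_sentiment(fenced))  # negative
```

Lowercasing the label matters because the evaluator compares strings exactly, so "Positive" and "positive" would otherwise count as a mismatch.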

Validation and Testing#

After each epoch, you can validate and test your agent to monitor performance:

def test_loop(dataloader, agent, loss_fn):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    tot_loss, correct = 0, 0

    # Set the agent to evaluation mode - important for some operations
    # Unnecessary in this situation but added for best practices
    agent.eval()

    # Disable gradient computation during evaluation with afnio.no_grad()
    # to save memory and speed up inference
    with afnio.no_grad():
        for X, y in dataloader:
            _, gold_sentiment, _ = y

            # Forward pass: agent generates predictions for the test set
            pred = agent(
                fw_model_client,
                inputs={"message": X},
                model="gpt-4.1-nano",
                temperature=0.0,
            )
            pred.data = [
                json.loads(re.sub(r"^```json\n|\n```$", "", item))[
                    "sentiment"
                ].lower()
                for item in pred.data
            ]

            # Evaluate predictions against ground truth labels
            loss_score, _ = loss_fn(pred, gold_sentiment)

            # Accumulate the batch score; with exact match it equals the
            # number of correct predictions in the batch
            tot_loss += loss_score.data
            correct += loss_score.data

    # Print average loss and accuracy
    tot_loss /= num_batches
    accuracy = (correct / size) * 100
    print(
        f"Test Error: \n Accuracy: {(accuracy):>0.1f}%, "
        f"Avg loss: {tot_loss:>8f} \n"
    )

End-to-End Training & Evaluation#

Below is a full example showing how to combine the training and testing loops for agent optimization in Afnio:

Tip

For a simpler way to run training and testing loops, track more metrics, and monitor granular LM costs, see the Trainer page. The Trainer class automates these routines and provides additional features for experiment tracking.

loss_fn = cog.ExactMatchEvaluator()
constraints = [
    afnio.Variable(
        data="The improved variable must never include or reference the characters `{` or `}`. Do not output them, mention them, or describe them in any way.",
        role="optimizer constraint",
    )
]
optimizer = afnio.optim.TGD(
    agent.parameters(),
    model_client=optim_model_client,
    constraints=constraints,
    momentum=3,
    model="gpt-5",
    temperature=1.0,
    max_completion_tokens=32000,
    reasoning_effort="low",
)

epochs = 5
with te.init("username", "my-project"):  # replace "username" with your Tellurio Studio username (slug format)
    for t in range(epochs):
        print(f"Epoch {t+1}\n-------------------------------")
        train_loop(train_dataloader, agent, loss_fn, optimizer)
        test_loop(test_dataloader, agent, loss_fn)
    print("Done!")

Output:

Epoch 1
-------------------------------
loss: 22.000000 - accuracy: 0.666667  [   33/   66]
loss: 23.000000 - accuracy: 0.696970  [   66/   66]
Test Error:
 Accuracy: 67.6%, Avg loss: 15.333333

Epoch 2
-------------------------------
loss: 16.000000 - accuracy: 0.484848  [   33/   66]
loss: 21.000000 - accuracy: 0.636364  [   66/   66]
Test Error:
 Accuracy: 79.4%, Avg loss: 18.000000

Epoch 3
-------------------------------
loss: 22.000000 - accuracy: 0.666667  [   33/   66]
loss: 23.000000 - accuracy: 0.696970  [   66/   66]
Test Error:
 Accuracy: 69.1%, Avg loss: 15.666667

Epoch 4
-------------------------------
loss: 25.000000 - accuracy: 0.757576  [   33/   66]
loss: 23.000000 - accuracy: 0.696970  [   66/   66]
Test Error:
 Accuracy: 76.5%, Avg loss: 17.333333

Epoch 5
-------------------------------
loss: 26.000000 - accuracy: 0.787879  [   33/   66]
loss: 21.000000 - accuracy: 0.636364  [   66/   66]
Test Error:
 Accuracy: 72.1%, Avg loss: 16.333333

Done!
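The two numbers on each "Test Error" line are related: test_loop divides the summed batch scores by the number of batches for the average loss, and by the dataset size for the accuracy. The sketch below inverts that calculation as a sanity check. The test split size (68 samples in 3 batches) is an assumption inferred from the printed values, not something stated on this page, and `accuracy_from_avg` is a hypothetical helper.

```python
def accuracy_from_avg(avg_score, num_batches, num_samples):
    """Recover the correct count and percent accuracy from the average batch score."""
    correct = round(avg_score * num_batches)
    return correct, 100 * correct / num_samples


# Average losses copied from the five epochs in the output above.
avg_losses = (15.333333, 18.0, 15.666667, 17.333333, 16.333333)

# Assumed, not stated in the text: 68 test samples in 3 batches.
results = [
    accuracy_from_avg(avg, num_batches=3, num_samples=68) for avg in avg_losses
]
for correct, acc in results:
    print(f"{correct} correct -> {acc:.1f}%")
```

Under that assumption the recomputed accuracies are 67.6%, 79.4%, 69.1%, 76.5%, and 72.1%, matching the printed output. This also shows that the "loss" reported here is a count of correct predictions, so higher values are better.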

Further Reading#