# Optimization Loop

```{warning}
Before running any code, ensure you are logged in to the Afnio backend (`afnio login`). See [Logging in to Afnio Backend](login) for details.
```

```{tip}
Afnio lets you build custom training, validation, and test loops for full control over your agent’s optimization process. However, if you prefer a ready-made solution, Afnio provides the [`Trainer`](../../generated/afnio.trainer) class, which handles standard training, validation, and testing routines out of the box. If you want to get started quickly, see [Trainer](trainer) for details and usage examples.
```

Training an AI agent or workflow is an iterative process: in each iteration, the agent makes a guess about the output, calculates the error in its guess (loss), collects feedback with respect to its parameters (see [Automatic Differentiation](automatic_differentiation)), and optimizes these parameters to improve future predictions. In Afnio, this means iteratively refining parameters (such as prompts, templates, or logic) based on feedback from evaluators or loss functions to better achieve your desired outcomes.

A typical optimization loop in Afnio consists of:

- **Forward Pass:** The agent or workflow processes input data and generates outputs.
- **Evaluation:** Outputs are compared to ground truth or assessed by evaluators, producing scores and semantic feedback (gradients).
- **Backward Pass:** Semantic feedback is backpropagated through the computational graph to accumulate gradients for learnable parameters.
- **Parameter Update:** The optimizer uses accumulated gradients to update parameters, improving the agent’s performance.

---

## Prerequisite Code

Before running the optimization loop, you should define your agent, dataset, and data loaders. See [Datasets and DataLoaders](datasets_and_dataloaders) and [Build the Agent or Workflow](build_agent_workflow) for details.
```python
import os

import afnio
import afnio.cognitive as cog
import afnio.tellurio as te
from afnio.models.openai import AsyncOpenAI
from afnio.utils.data import DataLoader, WeightedRandomSampler
from afnio.utils.datasets import FacilitySupport

os.environ["OPENAI_API_KEY"] = "sk-..."  # Replace with your actual key


def compute_sample_weights(data):
    # Inverse-frequency weights: rarer sentiment labels get larger weights
    with te.suppress_variable_notifications():
        labels = [y.data for _, (_, y, _) in data]
        counts = {label: labels.count(label) for label in set(labels)}
        total = len(data)
    return [total / counts[label] for label in labels]


training_data = FacilitySupport(split="train", root="data")
validation_data = FacilitySupport(split="val", root="data")
test_data = FacilitySupport(split="test", root="data")

# Sample training examples with replacement, weighted by label rarity
weights = compute_sample_weights(training_data)
sampler = WeightedRandomSampler(
    weights, num_samples=len(training_data), replacement=True
)

BATCH_SIZE = 33

train_dataloader = DataLoader(training_data, sampler=sampler, batch_size=BATCH_SIZE)
val_dataloader = DataLoader(validation_data, batch_size=BATCH_SIZE, seed=42)
test_dataloader = DataLoader(test_data, batch_size=BATCH_SIZE, seed=42)

# Structured-output schema: the model must return one of three sentiment labels
SENTIMENT_RESPONSE_FORMAT = {
    "type": "json_schema",
    "json_schema": {
        "strict": True,
        "name": "sentiment_response_schema",
        "schema": {
            "type": "object",
            "properties": {
                "sentiment": {
                    "type": "string",
                    "enum": ["positive", "neutral", "negative"],
                },
            },
            "additionalProperties": False,
            "required": ["sentiment"],
        },
    },
}

# LM used by the backward pass to generate textual gradients
afnio.set_backward_model_client(
    "openai/gpt-5",
    completion_args={
        "temperature": 1.0,
        "max_completion_tokens": 32000,
        "reasoning_effort": "low",
    },
)

fw_model_client = AsyncOpenAI()
optim_model_client = AsyncOpenAI()


class FacilitySupportAnalyzer(cog.Module):
    def __init__(self):
        super().__init__()
        # Learnable system prompt (the parameter being optimized)
        self.sentiment_task = cog.Parameter(
            data="Read the provided message and determine the sentiment.",
            role="system prompt for sentiment classification",
            requires_grad=True,
        )
        # Fixed user-message template; `{message}` is filled from `inputs`
        self.sentiment_user = afnio.Variable(
            data="**Message:**\n\n{message}\n\n",
            role="input template to sentiment classifier",
        )
        self.sentiment_classifier = cog.ChatCompletion()

    def forward(self, fwd_model, inputs, **completion_args):
        sentiment_messages = [
            {"role": "system", "content": [self.sentiment_task]},
            {"role": "user", "content": [self.sentiment_user]},
        ]
        return self.sentiment_classifier(
            fwd_model,
            sentiment_messages,
            inputs=inputs,
            response_format=SENTIMENT_RESPONSE_FORMAT,
            **completion_args,
        )


agent = FacilitySupportAnalyzer()
```

_Output:_

```output
INFO : API key provided and stored securely in local keyring.
INFO : Currently logged in as 'username' to 'http://localhost'. Use `afnio login --relogin` to force relogin.
INFO : Project with slug 'my-project' already exists in namespace 'username'.
Downloading https://raw.githubusercontent.com/meta-llama/llama-prompt-ops/refs/heads/main/use-cases/facility-support-analyzer/dataset.json to data/FacilitySupport/raw/dataset.json
Downloading ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 383.7/383.7 kB 1.1 MB/s 0:00:00
Using downloaded and verified file: data/FacilitySupport/raw/dataset.json
Using downloaded and verified file: data/FacilitySupport/raw/dataset.json
```

---

## Hyperparameters

Hyperparameters are adjustable parameters that let you control the agent optimization process. Different hyperparameter values can impact agent training and convergence rates.

Common hyperparameters include:

- **Number of Epochs:** How many times the agent iterates over the entire dataset.
- **Batch Size:** Number of samples processed together before updating parameters.

**Example: Basic hyperparameter settings**

```python
MAX_EPOCHS = 5
BATCH_SIZE = 33
```

Other important hyperparameters in Afnio include:

- **Backward Engine Settings:** The language model (LM) and its parameters (such as temperature, max tokens, reasoning effort) passed to `set_backward_model_client`.
- **Optimizer Settings:** Parameters used by optimizers like `afnio.optim.TGD`, including constraints, momentum, and model selection.

**Example: Setting backward engine hyperparameters**

```python
afnio.set_backward_model_client(
    "openai/gpt-5",
    completion_args={
        "temperature": 1.0,
        "max_completion_tokens": 32000,
        "reasoning_effort": "low",
    },
)
```

**Example: Setting optimizer hyperparameters**

```python
optimizer = afnio.optim.TGD(
    agent.parameters(),
    model_client=AsyncOpenAI(),
    momentum=3,
    model="gpt-5",
    temperature=1.0,
    max_completion_tokens=32000,
    reasoning_effort="low",
)
```

You can quickly adjust these hyperparameters to experiment with and improve agent performance.

---

## Optimization Loop

Once your hyperparameters are set, you can train and optimize your agent using an optimization loop. Each cycle through the loop is called an epoch.

Every epoch typically includes two main phases:

1. **Training Loop** – Iterate over the training dataset to update and improve the agent’s parameters.
2. **Validation/Test Loop** – Evaluate the agent on validation or test data to monitor performance and generalization on unseen data.

Below, we’ll introduce key concepts used in the training loop. If you prefer to see the complete workflow, you can jump ahead to the [End-to-End Training Workflow](#end-to-end-training-workflow) section.

### Loss Functions and Evaluators

In Afnio, **evaluators** serve as both loss functions and metrics for assessing your agent’s predictions. When you present training data to an untrained agent, its outputs may not match the desired targets. Evaluators measure how close the agent’s output is to the ground truth, providing both a numeric score and a semantic explanation (used as a gradient for optimization).

To compute the loss, you make a prediction using your agent and compare it to the true label using an evaluator. During training, you typically aim to maximize this score or minimize the error.

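
To make the "numeric score plus semantic explanation" idea concrete, here is a minimal plain-Python sketch. The function `exact_match_score` and its return format are hypothetical stand-ins for illustration only and are not Afnio's API. It assumes the score is the number of exact matches in a batch, which is how the training loop later on this page turns the score into an accuracy.

```python
def exact_match_score(predictions, targets):
    """Toy stand-in for an exact-match evaluator (NOT Afnio's implementation).

    Returns a (score, explanation) pair: the score counts exact matches,
    and the explanation is a plain-text summary of what went wrong.
    """
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")

    # Numeric part: one point per prediction that equals its target
    score = sum(p == t for p, t in zip(predictions, targets))

    # Textual part: describe each mismatch so it can guide a later rewrite
    misses = [
        f"sample {i}: predicted {p!r}, expected {t!r}"
        for i, (p, t) in enumerate(zip(predictions, targets))
        if p != t
    ]
    if misses:
        explanation = "Mismatches: " + "; ".join(misses)
    else:
        explanation = "All predictions match."
    return score, explanation


preds = ["positive", "neutral", "negative", "neutral"]
gold = ["positive", "negative", "negative", "neutral"]

score, explanation = exact_match_score(preds, gold)
accuracy = score / len(gold)

print(score, accuracy)
print(explanation)
```

In this toy version the explanation is a fixed template. In Afnio the explanation is what gets backpropagated as a textual gradient, so it plays the role that a numeric gradient plays in a conventional training loop.
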
Common evaluators (used as loss functions) include:

- [`cog.ExactMatchEvaluator`](../../generated/afnio.cognitive.modules.exact_match_evaluator): for classification tasks.
- [`cog.DeterministicEvaluator`](../../generated/afnio.cognitive.modules.deterministic_evaluator): for custom deterministic criteria.
- [`cog.LMJudgeEvaluator`](../../generated/afnio.cognitive.modules.lm_judge_evaluator): for semantic or qualitative evaluation.

**Example: Initializing an evaluator for loss calculation**

```python
# Initialize the evaluator (used as a loss function)
loss_fn = cog.ExactMatchEvaluator()
```

### Optimizer

Optimization is the process of updating agent parameters to minimize error and improve performance during training. In Afnio, optimization logic is encapsulated in the `optimizer` object, which manages how parameters are adjusted based on feedback. For example, [`afnio.optim.TGD`](../../generated/afnio.optim) uses [Textual Gradient Descent](https://arxiv.org/abs/2406.07496) to rewrite prompts using language model feedback.

To initialize the optimizer, you register the agent’s parameters to be trained and specify relevant hyperparameters and constraints.

**Example: Initializing an optimizer**

```python
# Initialize optimizer constraints
constraints = [
    afnio.Variable(
        data="The improved variable must never include or reference the characters `{` or `}`. Do not output them, mention them, or describe them in any way.",
        role="optimizer constraint",
    )
]

# Initialize the optimizer
optimizer = afnio.optim.TGD(
    agent.parameters(),
    model_client=optim_model_client,
    constraints=constraints,
    momentum=3,
    model="gpt-5",
    temperature=1.0,
    max_completion_tokens=32000,
    reasoning_effort="low",
)
```

During each training iteration, optimization typically involves:

1. Call `optimizer.clear_grad()` to reset accumulated textual gradients for agent parameters. This prevents old feedback from biasing the next training iteration.
2. Backpropagate the loss explanation with `loss_explanation.backward()`, which computes the gradients of the loss w.r.t. each parameter.
3. Call `optimizer.step()` to update parameters using the newly collected gradients.

---

## End-to-End Training Workflow

We define a `train_loop` function that loops over our optimization code, and a `test_loop` function that evaluates the agent’s performance against our test data.

### Training Loop

A typical training loop in Afnio looks like this:

````python
import json
import re


def train_loop(dataloader, agent, loss_fn, optimizer):
    size = len(dataloader.dataset)
    # Set the agent to training mode - important for some operations
    # Unnecessary in this situation but added for best practices
    agent.train()
    for batch, (X, y) in enumerate(dataloader):
        _, gold_sentiment, _ = y

        # Forward pass: agent processes input and generates output
        pred = agent(
            fw_model_client,
            inputs={"message": X},
            model="gpt-4.1-nano",
            temperature=0.0,
        )
        # Strip any Markdown JSON fence, then extract the sentiment label
        pred.data = [
            json.loads(re.sub(r"^```json\n|\n```$", "", item))["sentiment"].lower()
            for item in pred.data
        ]

        # Evaluation: compare prediction to ground truth
        loss_score, loss_explanation = loss_fn(pred, gold_sentiment)

        # Backward pass: propagate feedback
        loss_explanation.backward()

        # Update parameters using optimizer
        optimizer.step()

        # Reset gradients for next iteration
        optimizer.clear_grad()

        # Print loss and accuracy
        batch_len = len(X.data)
        current = batch * BATCH_SIZE + batch_len
        accuracy = loss_score.data / batch_len
        print(
            f"loss: {loss_score.data:>7f} - "
            f"accuracy: {accuracy:>7f} [{current:>5d}/{size:>5d}]"
        )
````

---

### Validation and Testing

After each epoch, you can validate and test your agent to monitor performance:

````python
def test_loop(dataloader, agent, loss_fn):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    tot_loss, correct = 0, 0

    # Set the agent to evaluation mode - important for some operations
    # Unnecessary in this situation but added for best practices
    agent.eval()

    # Disable gradient computation during evaluation with afnio.no_grad()
    # to save memory and speed up inference
    with afnio.no_grad():
        for X, y in dataloader:
            _, gold_sentiment, _ = y

            # Forward pass: agent generates predictions for the test set
            pred = agent(
                fw_model_client,
                inputs={"message": X},
                model="gpt-4.1-nano",
                temperature=0.0,
            )
            pred.data = [
                json.loads(re.sub(r"^```json\n|\n```$", "", item))[
                    "sentiment"
                ].lower()
                for item in pred.data
            ]

            # Evaluate predictions against ground truth labels
            loss_score, _ = loss_fn(pred, gold_sentiment)

            # Accumulate loss and correct predictions
            tot_loss += loss_score.data
            correct = tot_loss

    # Print average loss and accuracy
    tot_loss /= num_batches
    accuracy = (correct / size) * 100
    print(
        f"Test Error: \n Accuracy: {(accuracy):>0.1f}%, "
        f"Avg loss: {tot_loss:>8f} \n"
    )
````

### End-to-End Training & Evaluation

Below is a full example showing how to combine the training and testing loops for agent optimization in Afnio:

```{tip}
For a simpler way to run training and testing loops, track more metrics, and monitor granular LM costs, see the [Trainer](trainer) page. The Trainer class automates these routines and provides additional features for experiment tracking.
```

```python
loss_fn = cog.ExactMatchEvaluator()

constraints = [
    afnio.Variable(
        data="The improved variable must never include or reference the characters `{` or `}`. Do not output them, mention them, or describe them in any way.",
        role="optimizer constraint",
    )
]
optimizer = afnio.optim.TGD(
    agent.parameters(),
    model_client=optim_model_client,
    constraints=constraints,
    momentum=3,
    model="gpt-5",
    temperature=1.0,
    max_completion_tokens=32000,
    reasoning_effort="low",
)

epochs = 5
# Replace "username" with your Tellurio Studio username (slug format)
with te.init("username", "my-project"):
    for t in range(epochs):
        print(f"Epoch {t+1}\n-------------------------------")
        train_loop(train_dataloader, agent, loss_fn, optimizer)
        test_loop(test_dataloader, agent, loss_fn)
    print("Done!")
```

_Output:_

```output
Epoch 1
-------------------------------
loss: 22.000000 - accuracy: 0.666667 [   33/   66]
loss: 23.000000 - accuracy: 0.696970 [   66/   66]
Test Error:
 Accuracy: 67.6%, Avg loss: 15.333333

Epoch 2
-------------------------------
loss: 16.000000 - accuracy: 0.484848 [   33/   66]
loss: 21.000000 - accuracy: 0.636364 [   66/   66]
Test Error:
 Accuracy: 79.4%, Avg loss: 18.000000

Epoch 3
-------------------------------
loss: 22.000000 - accuracy: 0.666667 [   33/   66]
loss: 23.000000 - accuracy: 0.696970 [   66/   66]
Test Error:
 Accuracy: 69.1%, Avg loss: 15.666667

Epoch 4
-------------------------------
loss: 25.000000 - accuracy: 0.757576 [   33/   66]
loss: 23.000000 - accuracy: 0.696970 [   66/   66]
Test Error:
 Accuracy: 76.5%, Avg loss: 17.333333

Epoch 5
-------------------------------
loss: 26.000000 - accuracy: 0.787879 [   33/   66]
loss: 21.000000 - accuracy: 0.636364 [   66/   66]
Test Error:
 Accuracy: 72.1%, Avg loss: 16.333333

Done!
```

---

## Further Reading

- [Trainer](trainer)
- [Save, Load and Use Agent](save_load_use_agent)
- [Runs and Experiments](runs_and_experiments)