{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "4c59c5bf",
   "metadata": {},
   "source": [
    "# QuickStart\n",
    "\n",
    "## Boosting Sentiment Detection for Enterprise Support Emails\n",
    "\n",
    "### 🏢 Overview\n",
    "A major **enterprise support team** manages thousands of facility maintenance requests via email every week. Each message can be:\n",
    "1. 😊 **Positive** — expressing satisfaction or thanks  \n",
    "2. 😐 **Neutral** — routine updates or requests  \n",
    "3. 😞 **Negative** — reporting issues or dissatisfaction  \n",
    "\n",
    "But **manual triage** is slow and inconsistent, and the team’s first AI solution struggled with accuracy — especially distinguishing between neutral and negative feedback.\n",
    "\n",
    "**Goal:** Rapidly improve sentiment classification **Accuracy** so every support request is routed and prioritized correctly, using real-world data from [Meta's Facility Support Analyzer](https://github.com/meta-llama/llama-prompt-ops/tree/main/use-cases/facility-support-analyzer) dataset.\n",
    "\n",
    "---\n",
    "\n",
    "<table width=\"100%\">\n",
    "  <tr>\n",
    "    <td style=\"vertical-align:top; width:50%\">\n",
    "      <h3>⚠️ Challenge</h3>\n",
    "      <ul>\n",
    "        <li>Imbalanced dataset: most emails are neutral or positive.</li>\n",
    "        <li>Initial AI agent accuracy: <b>66.4% ±1.5%</b> — too low for business needs.</li>\n",
    "        <li>Confusion between neutral and negative messages led to misrouted urgent issues.</li>\n",
    "      </ul>\n",
    "    </td>\n",
    "    <td style=\"vertical-align:top; width:50%\">\n",
    "      <h3>🚀 Results</h3>\n",
    "      <ul>\n",
    "        <li><b>Accuracy jumped to 80.8% ±12.5%</b> — a 14.5% absolute gain.</li>\n",
    "        <li>Neutral and negative messages are now reliably detected.</li>\n",
    "        <li>Support tickets are routed faster and more fairly.</li>\n",
    "      </ul>\n",
    "    </td>\n",
    "  </tr>\n",
    "</table>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e386426b",
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install afnio"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ff46b3c9",
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import json\n",
    "import re\n",
    "from getpass import getpass\n",
    "\n",
    "import afnio\n",
    "import afnio.cognitive as cog\n",
    "import afnio.cognitive.functional as F\n",
    "import afnio.tellurio as te\n",
    "from afnio.models.openai import AsyncOpenAI\n",
    "from afnio.trainer import Trainer\n",
    "from afnio.utils.data import DataLoader, WeightedRandomSampler\n",
    "from afnio.utils.datasets import FacilitySupport"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "38f707a2",
   "metadata": {},
   "source": [
    "### 🔑 Setup: API Keys and Project Initialization\n",
    "\n",
    "Set your OpenAI and Tellurio API keys, then initialize your project and experiment run."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "f56e3ec2",
   "metadata": {},
   "outputs": [],
   "source": [
    "if not (openai_api_key := os.getenv(\"OPENAI_API_KEY\")):\n",
    "    openai_api_key = getpass(\"🔑 Enter your OpenAI API key: \")\n",
    "os.environ[\"OPENAI_API_KEY\"] = openai_api_key"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "48e712dc",
   "metadata": {},
   "outputs": [],
   "source": [
    "if not (tellurio_api_key := os.getenv(\"TELLURIO_API_KEY\")):\n",
    "    tellurio_api_key = getpass(\"🔑 Enter your Tellurio API key: \")  # This is automatically generated at signup and visible on the Tellurio Studio overview page (or you can create a new one under `https://platform.tellurio.ai/settings/api-keys`)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "e2551ab1",
   "metadata": {},
   "outputs": [],
   "source": [
    "if not (username := os.getenv(\"TELLURIO_USERNAME\")):\n",
    "    tellurio_username = input(\"🔑 Enter your Tellurio username: \")  # Replace with your Tellurio username (in slug format). You can find this in the Tellurio Studio header bar or in the URL when logged in (e.g., `https://platform.tellurio.ai/your-username-slug`)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "a75d4aab",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[94m[afnio]\u001b[0m API key provided and stored securely in local keyring.\n",
      "\u001b[94m[afnio]\u001b[0m Currently logged in as \u001b[93m'dmpiergiacomo'\u001b[0m to \u001b[92m'https://platform.tellurio.ai'\u001b[0m. Use `afnio login --relogin` to force relogin.\n",
      "\u001b[94m[afnio]\u001b[0m Project with slug 'facility-support' already exists in namespace 'dmpiergiacomo'.\n",
      "\u001b[94m[afnio]\u001b[0m Run 'compassionate_sambar_231' created successfully at: https://platform.tellurio.ai/dmpiergiacomo/projects/facility-support/runs/compassionate-sambar-231/\n"
     ]
    }
   ],
   "source": [
    "te.configure_logging(\"INFO\")\n",
    "te.login(api_key=tellurio_api_key)\n",
    "run = te.init(tellurio_username, \"Facility Support\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "751013a9",
   "metadata": {},
   "source": [
    "### 📊 Data Preparation\n",
    "\n",
    "Balance the training set, prepare your data loaders, and get the dataset ready for training and evaluation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "0a175190",
   "metadata": {},
   "outputs": [],
   "source": [
    "# The training set is inbalanced, so we assign weights to each sample to ensure fair learning across all classes\n",
    "def compute_sample_weights(data):\n",
    "    with te.suppress_variable_notifications():\n",
    "        labels = [y.data for _, (_, y, _) in data]\n",
    "        counts = {label: labels.count(label) for label in set(labels)}\n",
    "        total = len(data)\n",
    "    return [total / counts[label] for label in labels]"
   ]
  },
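  {
   "cell_type": "markdown",
   "id": "9b1e47a2",
   "metadata": {},
   "source": [
    "The weighting above is inverse class frequency: each sample gets `total / count(label)`, so rarer classes receive larger weights. Here is a minimal, self-contained sketch on a made-up label list. The `toy_labels` values are illustrative and not taken from the dataset, and the balancing claim assumes the sampler draws samples in proportion to their weights, as PyTorch's sampler of the same name does."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c27d08f4",
   "metadata": {},
   "outputs": [],
   "source": [
    "from collections import Counter\n",
    "\n",
    "# Hypothetical labels, for illustration only (not from FacilitySupport)\n",
    "toy_labels = [\"neutral\", \"neutral\", \"neutral\", \"positive\", \"positive\", \"negative\"]\n",
    "\n",
    "toy_counts = Counter(toy_labels)\n",
    "toy_total = len(toy_labels)\n",
    "\n",
    "# Inverse-frequency weight per sample: rarer labels get larger weights\n",
    "toy_weights = [toy_total / toy_counts[label] for label in toy_labels]\n",
    "\n",
    "# Sum the weights per class; equal totals mean each class is equally likely to be drawn\n",
    "per_class_weight = Counter()\n",
    "for label, weight in zip(toy_labels, toy_weights):\n",
    "    per_class_weight[label] += weight\n",
    "\n",
    "print(toy_weights)\n",
    "print(dict(per_class_weight))"
   ]
  },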
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "946f87ae",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Using downloaded and verified file: data/FacilitySupport/raw/dataset.json\n",
      "\n",
      "Using downloaded and verified file: data/FacilitySupport/raw/dataset.json\n",
      "\n",
      "Using downloaded and verified file: data/FacilitySupport/raw/dataset.json\n",
      "\n"
     ]
    }
   ],
   "source": [
    "BATCH_SIZE = 33\n",
    "\n",
    "training_data = FacilitySupport(split=\"train\", root=\"data\")\n",
    "validation_data = FacilitySupport(split=\"val\", root=\"data\")\n",
    "test_data = FacilitySupport(split=\"test\", root=\"data\")\n",
    "\n",
    "weights = compute_sample_weights(training_data)\n",
    "sampler = WeightedRandomSampler(weights, num_samples=len(training_data), replacement=True)\n",
    "\n",
    "train_dataloader = DataLoader(training_data, sampler=sampler, batch_size=BATCH_SIZE)\n",
    "val_dataloader = DataLoader(validation_data, batch_size=BATCH_SIZE, seed=42)\n",
    "test_dataloader = DataLoader(test_data, batch_size=BATCH_SIZE, seed=42)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ddba8c61",
   "metadata": {},
   "source": [
    "### 🧠 AI Agent Configuration\n",
    "\n",
    "Define the initial prompt, response format, LM model clients used for inference and optimization, and the sentiment classification agent."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "3ba30c76",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Start with a simple prompt. The optimizer will refine it, but it can't guess your intent—so clearly state what you want the model to do\n",
    "sentiment_task = \"Read the provided message and determine the sentiment.\"\n",
    "sentiment_user = \"Read the provided message and determine the sentiment.\\n\\n**Message:**\\n\\n{message}\\n\\n\"\n",
    "SENTIMENT_RESPONSE_FORMAT = {\n",
    "    \"type\": \"json_schema\",\n",
    "    \"json_schema\": {\n",
    "        \"strict\": True,\n",
    "        \"name\": \"sentiment_response_schema\",\n",
    "        \"schema\": {\n",
    "            \"type\": \"object\",\n",
    "            \"properties\": {\n",
    "                \"sentiment\": {\"type\": \"string\", \"enum\": [\"positive\", \"neutral\", \"negative\"]},\n",
    "            },\n",
    "            \"additionalProperties\": False,\n",
    "            \"required\": [\"sentiment\"],\n",
    "        },\n",
    "    },\n",
    "}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "2d5ca80c",
   "metadata": {},
   "outputs": [],
   "source": [
    "# We use gpt-4.1-nano for the forward pass (inference), gpt-5 for the backward pass (feeedback generation), and gpt-5 for the optimization step (prompt rewriting)\n",
    "afnio.set_backward_model_client(\"openai/gpt-5\", completion_args={\"temperature\": 1.0, \"max_completion_tokens\": 32000, \"reasoning_effort\": \"low\"})\n",
    "fw_model_client = AsyncOpenAI()\n",
    "optim_model_client = AsyncOpenAI()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "ed55d934",
   "metadata": {},
   "outputs": [],
   "source": [
    "class FacilitySupportAnalyzer(cog.Module):\n",
    "\n",
    "  def __init__(self):\n",
    "    super().__init__()\n",
    "    self.sentiment_task = cog.Parameter(data=sentiment_task, role=\"system prompt for sentiment classification\", requires_grad=True)\n",
    "    self.sentiment_user = afnio.Variable(data=sentiment_user, role=\"input template to sentiment classifier\")\n",
    "    self.sentiment_classifier = cog.ChatCompletion()\n",
    "\n",
    "  def forward(self, fwd_model, inputs, **completion_args):\n",
    "    sentiment_messages = [\n",
    "      {\"role\": \"system\", \"content\": [self.sentiment_task]},\n",
    "      {\"role\": \"user\", \"content\": [self.sentiment_user]},\n",
    "    ]\n",
    "    return self.sentiment_classifier(fwd_model, sentiment_messages, inputs=inputs, response_format=SENTIMENT_RESPONSE_FORMAT, **completion_args)\n",
    "\n",
    "  def training_step(self, batch, batch_idx):\n",
    "    X, y = batch\n",
    "    _, gold_sentiment, _ = y\n",
    "    pred_sentiment = self(fw_model_client, inputs={\"message\": X}, model=\"gpt-4.1-nano\", temperature=0.0)\n",
    "    pred_sentiment.data = [json.loads(re.sub(r\"^```json\\n|\\n```$\", \"\", item))[\"sentiment\"].lower() for item in pred_sentiment.data]\n",
    "    loss = F.exact_match_evaluator(pred_sentiment, gold_sentiment)\n",
    "    return {\"loss\": loss, \"accuracy\": loss[0].data / len(gold_sentiment.data)}\n",
    "\n",
    "  def validation_step(self, batch, batch_idx):\n",
    "    return self.training_step(batch, batch_idx)\n",
    "\n",
    "  def test_step(self, batch, batch_idx):\n",
    "    return self.validation_step(batch, batch_idx)\n",
    "\n",
    "  def configure_optimizers(self):\n",
    "    constraints = [\n",
    "      afnio.Variable(\n",
    "        data=\"The improved variable must never include or reference the characters `{` or `}`. Do not output them, mention them, or describe them in any way.\",\n",
    "        role=\"optimizer constraint\"\n",
    "      )\n",
    "    ]\n",
    "    optimizer = afnio.optim.TGD(self.parameters(), model_client=optim_model_client, constraints=constraints, momentum=3, model=\"gpt-5\", temperature=1.0, max_completion_tokens=32000, reasoning_effort=\"low\")\n",
    "    return optimizer\n"
   ]
  },
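  {
   "cell_type": "markdown",
   "id": "5e8a3d16",
   "metadata": {},
   "source": [
    "The parsing and scoring in `training_step` can be tried without any API call. The sketch below uses hypothetical raw replies and gold labels. It reuses the regular expression from the agent and assumes accuracy is the fraction of exact matches, which is how the step's return value reads. The helper `parse_sentiment` is introduced here for illustration and is not part of `afnio`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7f04b9ce",
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "import re\n",
    "\n",
    "# Hypothetical raw model replies: one bare JSON object, one wrapped in a Markdown fence\n",
    "raw_replies = [\n",
    "    '{\"sentiment\": \"neutral\"}',\n",
    "    '```json\\n{\"sentiment\": \"Negative\"}\\n```',\n",
    "]\n",
    "gold_labels = [\"neutral\", \"positive\"]  # hypothetical gold labels\n",
    "\n",
    "\n",
    "def parse_sentiment(reply):\n",
    "    # Drop an optional leading json fence and trailing fence, then read the label\n",
    "    cleaned = re.sub(r\"^```json\\n|\\n```$\", \"\", reply)\n",
    "    return json.loads(cleaned)[\"sentiment\"].lower()\n",
    "\n",
    "\n",
    "predictions = [parse_sentiment(reply) for reply in raw_replies]\n",
    "\n",
    "# Exact-match accuracy: fraction of predictions equal to their gold label\n",
    "matches = sum(pred == gold for pred, gold in zip(predictions, gold_labels))\n",
    "print(predictions, matches / len(gold_labels))"
   ]
  },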
  {
   "cell_type": "markdown",
   "id": "d2cc1dbf",
   "metadata": {},
   "source": [
    "### 🚀 Training and Evaluation\n",
    "\n",
    "Instantiate the agent and trainer, establish baseline performance, train the agent, and validate results."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "bb4fb829",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "FacilitySupportAnalyzer(\n",
      "  (sentiment_classifier): ChatCompletion()\n",
      ")\n"
     ]
    }
   ],
   "source": [
    "agent = FacilitySupportAnalyzer()\n",
    "trainer = Trainer(max_epochs=5, enable_agent_summary=False)\n",
    "print(agent)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "76d075bd",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">Testing\n",
       "<span style=\"color: #008000; text-decoration-color: #008000; font-weight: bold\">[Test] 68/68</span> <span style=\"color: #729c1f; text-decoration-color: #729c1f\">━━━━━━━━━━━━━━━━━━━━</span> <span style=\"color: #808000; text-decoration-color: #808000\">0:00:07</span> tot_cost: $0.0024  - test_loss: 17.3333 - test_accuracy: 0.6818\n",
       "</pre>\n"
      ],
      "text/plain": [
       "Testing\n",
       "\u001b[1;32m[Test] 68/68\u001b[0m \u001b[38;2;114;156;31m━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[33m0:00:07\u001b[0m tot_cost: $0.0024  - test_loss: 17.3333 - test_accuracy: 0.6818\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">\n",
       "</pre>\n"
      ],
      "text/plain": [
       "\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"></pre>\n"
      ],
      "text/plain": []
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "{'loss': 17.333333333333332, 'accuracy': 0.6818181818181818}"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Establish baseline performance by testing the untrained agent on the test set\n",
    "llm_clients=[fw_model_client, afnio.get_backward_model_client(), optim_model_client]\n",
    "trainer.test(agent=agent, test_dataloader=test_dataloader, llm_clients=llm_clients)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "62e992c6",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">Epoch 1/5\n",
       "  <span style=\"color: #000080; text-decoration-color: #000080; font-weight: bold\">[Training] 66/66</span> <span style=\"color: #729c1f; text-decoration-color: #729c1f\">━━━━━━━━━━━━━━━━━━━━</span> <span style=\"color: #808000; text-decoration-color: #808000\">0:01:41</span> <span style=\"color: #808000; text-decoration-color: #808000\">1.2m/step</span> tot_cost: $0.0104 train_loss: 24.5000 - train_accuracy:  \n",
       "                                                          0.7424 - val_loss: 22.0000 - val_accuracy: 0.6667        \n",
       "<span style=\"color: #800080; text-decoration-color: #800080; font-weight: bold\">[Validation] 66/66</span> <span style=\"color: #729c1f; text-decoration-color: #729c1f\">━━━━━━━━━━━━━━━━━━━━</span> <span style=\"color: #808000; text-decoration-color: #808000\">0:01:45</span>                                                                    \n",
       "\n",
       "Epoch 2/5\n",
       "  <span style=\"color: #000080; text-decoration-color: #000080; font-weight: bold\">[Training] 66/66</span> <span style=\"color: #729c1f; text-decoration-color: #729c1f\">━━━━━━━━━━━━━━━━━━━━</span> <span style=\"color: #808000; text-decoration-color: #808000\">0:01:18</span> <span style=\"color: #808000; text-decoration-color: #808000\">0.7m/step</span> tot_cost: $0.0223 train_loss: 31.5000 - train_accuracy:  \n",
       "                                                          0.9545 - val_loss: 25.5000 - val_accuracy: 0.7727        \n",
       "<span style=\"color: #800080; text-decoration-color: #800080; font-weight: bold\">[Validation] 66/66</span> <span style=\"color: #729c1f; text-decoration-color: #729c1f\">━━━━━━━━━━━━━━━━━━━━</span> <span style=\"color: #808000; text-decoration-color: #808000\">0:01:25</span>                                                                    \n",
       "\n",
       "Epoch 3/5\n",
       "  <span style=\"color: #000080; text-decoration-color: #000080; font-weight: bold\">[Training] 66/66</span> <span style=\"color: #729c1f; text-decoration-color: #729c1f\">━━━━━━━━━━━━━━━━━━━━</span> <span style=\"color: #808000; text-decoration-color: #808000\">0:03:01</span> <span style=\"color: #808000; text-decoration-color: #808000\">2.3m/step</span> tot_cost: $0.0353 train_loss: 27.0000 - train_accuracy:  \n",
       "                                                          0.8182 - val_loss: 21.0000 - val_accuracy: 0.6364        \n",
       "<span style=\"color: #800080; text-decoration-color: #800080; font-weight: bold\">[Validation] 66/66</span> <span style=\"color: #729c1f; text-decoration-color: #729c1f\">━━━━━━━━━━━━━━━━━━━━</span> <span style=\"color: #808000; text-decoration-color: #808000\">0:03:05</span>                                                                    \n",
       "\n",
       "Epoch 4/5\n",
       "  <span style=\"color: #000080; text-decoration-color: #000080; font-weight: bold\">[Training] 66/66</span> <span style=\"color: #729c1f; text-decoration-color: #729c1f\">━━━━━━━━━━━━━━━━━━━━</span> <span style=\"color: #808000; text-decoration-color: #808000\">0:03:12</span> <span style=\"color: #808000; text-decoration-color: #808000\">2.4m/step</span> tot_cost: $0.0479 train_loss: 23.5000 - train_accuracy:  \n",
       "                                                          0.7121 - val_loss: 20.0000 - val_accuracy: 0.6061        \n",
       "<span style=\"color: #800080; text-decoration-color: #800080; font-weight: bold\">[Validation] 66/66</span> <span style=\"color: #729c1f; text-decoration-color: #729c1f\">━━━━━━━━━━━━━━━━━━━━</span> <span style=\"color: #808000; text-decoration-color: #808000\">0:03:15</span>                                                                    \n",
       "\n",
       "Epoch 5/5\n",
       "  <span style=\"color: #000080; text-decoration-color: #000080; font-weight: bold\">[Training] 66/66</span> <span style=\"color: #729c1f; text-decoration-color: #729c1f\">━━━━━━━━━━━━━━━━━━━━</span> <span style=\"color: #808000; text-decoration-color: #808000\">0:03:00</span> <span style=\"color: #808000; text-decoration-color: #808000\">2.2m/step</span> tot_cost: $0.0628 train_loss: 23.0000 - train_accuracy:  \n",
       "                                                          0.6970 - val_loss: 22.5000 - val_accuracy: 0.6818        \n",
       "<span style=\"color: #800080; text-decoration-color: #800080; font-weight: bold\">[Validation] 66/66</span> <span style=\"color: #729c1f; text-decoration-color: #729c1f\">━━━━━━━━━━━━━━━━━━━━</span> <span style=\"color: #808000; text-decoration-color: #808000\">0:03:04</span>                                                                    \n",
       "</pre>\n"
      ],
      "text/plain": [
       "Epoch 1/5\n",
       "  \u001b[1;34m[Training] 66/66\u001b[0m \u001b[38;2;114;156;31m━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[33m0:01:41\u001b[0m \u001b[33m1.2m/step\u001b[0m tot_cost: $0.0104 train_loss: 24.5000 - train_accuracy:  \n",
       "                                                          0.7424 - val_loss: 22.0000 - val_accuracy: 0.6667        \n",
       "\u001b[1;35m[Validation] 66/66\u001b[0m \u001b[38;2;114;156;31m━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[33m0:01:45\u001b[0m                                                                    \n",
       "\n",
       "Epoch 2/5\n",
       "  \u001b[1;34m[Training] 66/66\u001b[0m \u001b[38;2;114;156;31m━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[33m0:01:18\u001b[0m \u001b[33m0.7m/step\u001b[0m tot_cost: $0.0223 train_loss: 31.5000 - train_accuracy:  \n",
       "                                                          0.9545 - val_loss: 25.5000 - val_accuracy: 0.7727        \n",
       "\u001b[1;35m[Validation] 66/66\u001b[0m \u001b[38;2;114;156;31m━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[33m0:01:25\u001b[0m                                                                    \n",
       "\n",
       "Epoch 3/5\n",
       "  \u001b[1;34m[Training] 66/66\u001b[0m \u001b[38;2;114;156;31m━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[33m0:03:01\u001b[0m \u001b[33m2.3m/step\u001b[0m tot_cost: $0.0353 train_loss: 27.0000 - train_accuracy:  \n",
       "                                                          0.8182 - val_loss: 21.0000 - val_accuracy: 0.6364        \n",
       "\u001b[1;35m[Validation] 66/66\u001b[0m \u001b[38;2;114;156;31m━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[33m0:03:05\u001b[0m                                                                    \n",
       "\n",
       "Epoch 4/5\n",
       "  \u001b[1;34m[Training] 66/66\u001b[0m \u001b[38;2;114;156;31m━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[33m0:03:12\u001b[0m \u001b[33m2.4m/step\u001b[0m tot_cost: $0.0479 train_loss: 23.5000 - train_accuracy:  \n",
       "                                                          0.7121 - val_loss: 20.0000 - val_accuracy: 0.6061        \n",
       "\u001b[1;35m[Validation] 66/66\u001b[0m \u001b[38;2;114;156;31m━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[33m0:03:15\u001b[0m                                                                    \n",
       "\n",
       "Epoch 5/5\n",
       "  \u001b[1;34m[Training] 66/66\u001b[0m \u001b[38;2;114;156;31m━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[33m0:03:00\u001b[0m \u001b[33m2.2m/step\u001b[0m tot_cost: $0.0628 train_loss: 23.0000 - train_accuracy:  \n",
       "                                                          0.6970 - val_loss: 22.5000 - val_accuracy: 0.6818        \n",
       "\u001b[1;35m[Validation] 66/66\u001b[0m \u001b[38;2;114;156;31m━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[33m0:03:04\u001b[0m                                                                    \n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">\n",
       "</pre>\n"
      ],
      "text/plain": [
       "\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"></pre>\n"
      ],
      "text/plain": []
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Train the agent and validate results\n",
    "trainer.fit(agent=agent, train_dataloader=train_dataloader, val_dataloader=val_dataloader, llm_clients=llm_clients)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "de73ca18",
   "metadata": {},
   "source": [
    "### 🏅 Loading and Testing the Optimized AI Agent\n",
    "\n",
    "<div style=\"background-color:#ffe066; color:#000; border-left:4px solid #ffd700; padding:0.75em 1em; margin-bottom:1em;\">\n",
    "<b>Tip:</b> The <b>best</b> checkpoint is the one with the highest <code>val_accuracy</code> (accuracy on validation set) during training. You can find its filename in the automatically created <code>checkpoints/</code> directory.\n",
    "</div>\n",
    "\n",
    "Load the best agent checkpoint, evaluate on the test set, and display the final results."
   ]
  },
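  {
   "cell_type": "markdown",
   "id": "2ad96c0b",
   "metadata": {},
   "source": [
    "Picking the best epoch by hand is straightforward once the per-epoch validation accuracies are collected. The sketch below copies the `val_accuracy` values from the training log above into a plain dictionary (the name `val_accuracy_by_epoch` is illustrative). It does not read anything from the `checkpoints/` directory."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d3b6f51e",
   "metadata": {},
   "outputs": [],
   "source": [
    "# val_accuracy per epoch, copied from the training log above\n",
    "val_accuracy_by_epoch = {1: 0.6667, 2: 0.7727, 3: 0.6364, 4: 0.6061, 5: 0.6818}\n",
    "\n",
    "# The best epoch is the one with the highest validation accuracy\n",
    "best_epoch = max(val_accuracy_by_epoch, key=val_accuracy_by_epoch.get)\n",
    "print(f\"Best epoch: {best_epoch} (val_accuracy={val_accuracy_by_epoch[best_epoch]:.4f})\")"
   ]
  },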
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "477aabfc",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Only run this if you want to download our reference checkpoint\n",
    "checkpoint_path = 'checkpoints/checkpoint_epoch2_20250912-190039.hf'\n",
    "if not os.path.exists(checkpoint_path):\n",
    "  !mkdir -p checkpoints\n",
    "  !wget https://github.com/Tellurio-AI/tutorials/raw/main/facility_support/checkpoints/checkpoint_epoch2_20250912-190039.hf -P checkpoints/"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "df116212",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<All keys matched successfully>"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "checkpoint = afnio.load(\"checkpoints/checkpoint_epoch2_20250912-190039.hf\")  # Replace with your best checkpoint path, or use our reference checkpoint (downloaded with the previous cell)\n",
    "best_agent = FacilitySupportAnalyzer()\n",
    "best_agent.load_state_dict(\n",
    "    checkpoint['agent_state_dict'],\n",
    "    model_clients={\n",
    "        \"sentiment_classifier.forward_model_client\": fw_model_client,\n",
    "    }\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "3d0708c2",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">Testing\n",
       "<span style=\"color: #008000; text-decoration-color: #008000; font-weight: bold\">[Test] 68/68</span> <span style=\"color: #729c1f; text-decoration-color: #729c1f\">━━━━━━━━━━━━━━━━━━━━</span> <span style=\"color: #808000; text-decoration-color: #808000\">0:00:04</span> tot_cost: $0.0697  - test_loss: 19.3333 - test_accuracy: 0.8990\n",
       "</pre>\n"
      ],
      "text/plain": [
       "Testing\n",
       "\u001b[1;32m[Test] 68/68\u001b[0m \u001b[38;2;114;156;31m━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[33m0:00:04\u001b[0m tot_cost: $0.0697  - test_loss: 19.3333 - test_accuracy: 0.8990\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">\n",
       "</pre>\n"
      ],
      "text/plain": [
       "\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"></pre>\n"
      ],
      "text/plain": []
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "{'loss': 19.333333333333332, 'accuracy': 0.8989898989898991}"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Test the best agent checkpoint on the test set\n",
    "trainer.test(agent=best_agent, test_dataloader=test_dataloader, llm_clients=llm_clients)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b1fb8e5c",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "\n",
       "<table style=\"width:100%;border-collapse:collapse;\">\n",
       "  <tr>\n",
       "    <th style=\"text-align:left;background-color:#e0e0e0; color:#222;font-weight:bold;\">BEFORE OPTIMIZATION</th>\n",
       "    <th style=\"text-align:left;background-color:#e0e0e0; color:#222;font-weight:bold;\">AFTER OPTIMIZATION</th>\n",
       "  </tr>\n",
       "  <tr>\n",
       "    <td style=\"text-align:left;vertical-align:top;word-break:break-word;\">\n",
       "      <pre style=\"margin:0;white-space:pre-wrap;word-break:break-word;\">Read the provided message and determine the sentiment.</pre>\n",
       "    </td>\n",
       "    <td style=\"text-align:left;vertical-align:top;word-break:break-word;\">\n",
       "      <pre style=\"margin:0;white-space:pre-wrap;word-break:break-word;\">You are a sentiment classifier. Read the provided message and output exactly one of: positive, negative, neutral — all lowercase, no punctuation, no extra text or spaces.\n",
       "\n",
       "Scope: Judge the author’s expressed sentiment toward the subject of the message (e.g., the company, product, service, or the issue described), not the topic content itself, roles/titles, greetings, or urgency alone.\n",
       "\n",
       "Decision rules:\n",
       "- If polarity evidence is weak, mixed, contradictory, or evenly balanced, output neutral.\n",
       "- Only output positive or negative when one clearly outweighs the other by intensity or count.\n",
       "\n",
       "Positive vs neutral boundary:\n",
       "- Positive only when there is clear, unambiguous, and sufficiently strong praise directed at the provider/service outcome (e.g., explicit evaluatives such as love, thrilled, amazing, excellent, fantastic, flawless, top-notch) and there are no concurrent concerns.\n",
       "- Default to neutral for inquiries, status updates, logistics, generic politeness or thanks without evaluative content, hedged or weak praise (okay, fine, pretty good, satisfied client, pleased), expressions of uncertainty, or mixed messages where positives do not clearly dominate by intensity or count.\n",
       "\n",
       "Cue handling:\n",
       "- Aggregate polarity cues across the entire message; account for intensifiers and negations.\n",
       "- Treat factual status updates, informational messages, inquiries, or logistical requests as neutral unless explicit sentiment is expressed, even if they include politeness or generic praise (e.g., thanks, appreciate your support, top-notch service).\n",
       "\n",
       "Negation and modifier guidance:\n",
       "- Negative: not good, not impressed, frustrated, unacceptable, skeptical.\n",
       "- Usually neutral unless accompanied by strong positive cues: not bad, mild or weak praise such as satisfied client or pleased, okay, fine, pretty good.\n",
       "\n",
       "Mixed or multi-issue messages:\n",
       "- If praise co-occurs with requests or concerns and neither side clearly dominates, choose neutral.\n",
       "- Choose neutral unless multiple strong positive indicators outweigh any negatives and there are no explicit negative cues.\n",
       "- If different parts convey opposing sentiments and there is no clear majority by intensity or count, choose neutral.\n",
       "\n",
       "Operational rule:\n",
       "- Aggregate cues with negation and intensifiers; label positive only if net positive clearly exceeds negative by a high margin or there is at least one strong positive indicator (superlatives, emphatic adverbs, exclamatory emphasis) directed at the subject; otherwise neutral.\n",
       "\n",
       "Examples (message → label):\n",
       "- Thanks for the quick reply; can you update the ticket by tomorrow? → neutral\n",
       "- Appreciate your support. Please fix the recurring billing error. → neutral\n",
       "- Top-notch service on the last order, but this one arrived damaged. → neutral\n",
       "- I’m pleased with the app overall, just a few minor issues to resolve. → neutral\n",
       "- Pretty good overall. → neutral\n",
       "- Not good — the installer keeps crashing. → negative\n",
       "- I’m not impressed with your response times. → negative\n",
       "- This delay is unacceptable and very frustrating. → negative\n",
       "- It’s not bad. → neutral\n",
       "- Absolutely love the new update; everything works flawlessly. → positive\n",
       "- Amazing job! → positive\n",
       "\n",
       "Output format reminder:\n",
       "- Emit exactly one of the following labels: positive or negative or neutral — all lowercase, no punctuation, no extra text or spaces.\n",
       "- Trim any leading/trailing whitespace or newlines before finalizing the single-word output.</pre>\n",
       "    </td>\n",
       "  </tr>\n",
       "</table>\n"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Compare the agent's prompt before and after training\n",
    "from IPython.display import display, HTML\n",
    "\n",
    "display(HTML(f\"\"\"\n",
    "<table style=\"width:100%;border-collapse:collapse;\">\n",
    "  <tr>\n",
    "    <th style=\"text-align:left;background-color:#e0e0e0; color:#222;font-weight:bold;\">BEFORE OPTIMIZATION</th>\n",
    "    <th style=\"text-align:left;background-color:#e0e0e0; color:#222;font-weight:bold;\">AFTER OPTIMIZATION</th>\n",
    "  </tr>\n",
    "  <tr>\n",
    "    <td style=\"text-align:left;vertical-align:top;word-break:break-word;\">\n",
    "      <pre style=\"margin:0;white-space:pre-wrap;word-break:break-word;\">{sentiment_task}</pre>\n",
    "    </td>\n",
    "    <td style=\"text-align:left;vertical-align:top;word-break:break-word;\">\n",
    "      <pre style=\"margin:0;white-space:pre-wrap;word-break:break-word;\">{best_agent.sentiment_task.data}</pre>\n",
    "    </td>\n",
    "  </tr>\n",
    "</table>\n",
    "\"\"\"))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "b645d44b",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[94m[afnio]\u001b[0m Run 'compassionate_sambar_231' marked as COMPLETED.\n"
     ]
    }
   ],
   "source": [
    "run.finish()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": ".venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}