{ "cells": [ { "cell_type": "markdown", "id": "4c59c5bf", "metadata": {}, "source": [ "# QuickStart\n", "\n", "## Boosting Sentiment Detection for Enterprise Support Emails\n", "\n", "### Overview\n", "A major **enterprise support team** manages thousands of facility maintenance requests via email every week. Each message can be:\n", "1. **Positive**: expressing satisfaction or thanks \n", "2. **Neutral**: routine updates or requests \n", "3. **Negative**: reporting issues or dissatisfaction \n", "\n", "But **manual triage** is slow and inconsistent, and the team's first AI solution struggled with accuracy, especially when distinguishing between neutral and negative feedback.\n", "\n", "**Goal:** Rapidly improve sentiment classification **accuracy** so that every support request is routed and prioritized correctly, using real-world data from the [Meta's Facility Support Analyzer](https://github.com/meta-llama/llama-prompt-ops/tree/main/use-cases/facility-support-analyzer) dataset.\n", "\n", "---\n", "\n", "
\n",
" ⚠️ Challenge\n", "
| \n",
" \n",
" Results\n", "
| \n",
"
Testing\n",
"[Test] 68/68 ━━━━━━━━━━━━━━━━━━━━ 0:00:07 tot_cost: $0.0024 - test_loss: 17.3333 - test_accuracy: 0.6818\n",
"\n"
],
"text/plain": [
"Testing\n",
"\u001b[1;32m[Test] 68/68\u001b[0m \u001b[38;2;114;156;31m━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[33m0:00:07\u001b[0m tot_cost: $0.0024 - test_loss: 17.3333 - test_accuracy: 0.6818\n"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"\n",
"\n"
],
"text/plain": [
"\n"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"\n"
],
"text/plain": []
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
"{'loss': 17.333333333333332, 'accuracy': 0.6818181818181818}"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Establish baseline performance by testing the untrained agent on the test set\n",
"llm_clients = [fw_model_client, afnio.get_backward_model_client(), optim_model_client]\n",
"trainer.test(agent=agent, test_dataloader=test_dataloader, llm_clients=llm_clients)"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "62e992c6",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"Epoch 1/5\n",
" [Training] 66/66 ━━━━━━━━━━━━━━━━━━━━ 0:01:41 1.2m/step tot_cost: $0.0104 train_loss: 24.5000 - train_accuracy: \n",
" 0.7424 - val_loss: 22.0000 - val_accuracy: 0.6667 \n",
"[Validation] 66/66 ━━━━━━━━━━━━━━━━━━━━ 0:01:45 \n",
"\n",
"Epoch 2/5\n",
" [Training] 66/66 ━━━━━━━━━━━━━━━━━━━━ 0:01:18 0.7m/step tot_cost: $0.0223 train_loss: 31.5000 - train_accuracy: \n",
" 0.9545 - val_loss: 25.5000 - val_accuracy: 0.7727 \n",
"[Validation] 66/66 ━━━━━━━━━━━━━━━━━━━━ 0:01:25 \n",
"\n",
"Epoch 3/5\n",
" [Training] 66/66 ━━━━━━━━━━━━━━━━━━━━ 0:03:01 2.3m/step tot_cost: $0.0353 train_loss: 27.0000 - train_accuracy: \n",
" 0.8182 - val_loss: 21.0000 - val_accuracy: 0.6364 \n",
"[Validation] 66/66 ━━━━━━━━━━━━━━━━━━━━ 0:03:05 \n",
"\n",
"Epoch 4/5\n",
" [Training] 66/66 ━━━━━━━━━━━━━━━━━━━━ 0:03:12 2.4m/step tot_cost: $0.0479 train_loss: 23.5000 - train_accuracy: \n",
" 0.7121 - val_loss: 20.0000 - val_accuracy: 0.6061 \n",
"[Validation] 66/66 ━━━━━━━━━━━━━━━━━━━━ 0:03:15 \n",
"\n",
"Epoch 5/5\n",
" [Training] 66/66 ━━━━━━━━━━━━━━━━━━━━ 0:03:00 2.2m/step tot_cost: $0.0628 train_loss: 23.0000 - train_accuracy: \n",
" 0.6970 - val_loss: 22.5000 - val_accuracy: 0.6818 \n",
"[Validation] 66/66 ━━━━━━━━━━━━━━━━━━━━ 0:03:04 \n",
"\n"
],
"text/plain": [
"Epoch 1/5\n",
" \u001b[1;34m[Training] 66/66\u001b[0m \u001b[38;2;114;156;31m━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[33m0:01:41\u001b[0m \u001b[33m1.2m/step\u001b[0m tot_cost: $0.0104 train_loss: 24.5000 - train_accuracy: \n",
" 0.7424 - val_loss: 22.0000 - val_accuracy: 0.6667 \n",
"\u001b[1;35m[Validation] 66/66\u001b[0m \u001b[38;2;114;156;31m━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[33m0:01:45\u001b[0m \n",
"\n",
"Epoch 2/5\n",
" \u001b[1;34m[Training] 66/66\u001b[0m \u001b[38;2;114;156;31m━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[33m0:01:18\u001b[0m \u001b[33m0.7m/step\u001b[0m tot_cost: $0.0223 train_loss: 31.5000 - train_accuracy: \n",
" 0.9545 - val_loss: 25.5000 - val_accuracy: 0.7727 \n",
"\u001b[1;35m[Validation] 66/66\u001b[0m \u001b[38;2;114;156;31m━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[33m0:01:25\u001b[0m \n",
"\n",
"Epoch 3/5\n",
" \u001b[1;34m[Training] 66/66\u001b[0m \u001b[38;2;114;156;31m━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[33m0:03:01\u001b[0m \u001b[33m2.3m/step\u001b[0m tot_cost: $0.0353 train_loss: 27.0000 - train_accuracy: \n",
" 0.8182 - val_loss: 21.0000 - val_accuracy: 0.6364 \n",
"\u001b[1;35m[Validation] 66/66\u001b[0m \u001b[38;2;114;156;31m━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[33m0:03:05\u001b[0m \n",
"\n",
"Epoch 4/5\n",
" \u001b[1;34m[Training] 66/66\u001b[0m \u001b[38;2;114;156;31m━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[33m0:03:12\u001b[0m \u001b[33m2.4m/step\u001b[0m tot_cost: $0.0479 train_loss: 23.5000 - train_accuracy: \n",
" 0.7121 - val_loss: 20.0000 - val_accuracy: 0.6061 \n",
"\u001b[1;35m[Validation] 66/66\u001b[0m \u001b[38;2;114;156;31m━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[33m0:03:15\u001b[0m \n",
"\n",
"Epoch 5/5\n",
" \u001b[1;34m[Training] 66/66\u001b[0m \u001b[38;2;114;156;31m━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[33m0:03:00\u001b[0m \u001b[33m2.2m/step\u001b[0m tot_cost: $0.0628 train_loss: 23.0000 - train_accuracy: \n",
" 0.6970 - val_loss: 22.5000 - val_accuracy: 0.6818 \n",
"\u001b[1;35m[Validation] 66/66\u001b[0m \u001b[38;2;114;156;31m━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[33m0:03:04\u001b[0m \n"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"\n",
"\n"
],
"text/plain": [
"\n"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"\n"
],
"text/plain": []
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Train the agent and validate results\n",
"trainer.fit(agent=agent, train_dataloader=train_dataloader, val_dataloader=val_dataloader, llm_clients=llm_clients)"
]
},
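Validation accuracy in the log above fluctuates from epoch to epoch, so the last epoch is not necessarily the best one. A minimal sketch of picking the best epoch, using the `val_accuracy` values copied by hand from that log; the names `val_accuracy_by_epoch` and `best_epoch` are illustrative and not part of the afnio API.

```python
# Validation accuracy per epoch, copied by hand from the training log above.
val_accuracy_by_epoch = {1: 0.6667, 2: 0.7727, 3: 0.6364, 4: 0.6061, 5: 0.6818}

# Select the epoch whose validation accuracy is highest.
best_epoch = max(val_accuracy_by_epoch, key=val_accuracy_by_epoch.get)
best_val_accuracy = val_accuracy_by_epoch[best_epoch]

print(f"best epoch: {best_epoch}, val_accuracy: {best_val_accuracy:.4f}")
```

For this run the selection lands on epoch 2, the only epoch whose validation accuracy exceeds 0.70.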
{
"cell_type": "markdown",
"id": "de73ca18",
"metadata": {},
"source": [
"### Loading and Testing the Optimized AI Agent\n",
"\n",
"Load the checkpoint that achieved the highest `val_accuracy` (accuracy on the validation set) during training. You can find its filename in the automatically created `checkpoints/` directory.\n",
"Testing\n",
"[Test] 68/68 ━━━━━━━━━━━━━━━━━━━━ 0:00:04 tot_cost: $0.0697 - test_loss: 19.3333 - test_accuracy: 0.8990\n",
"\n"
],
"text/plain": [
"Testing\n",
"\u001b[1;32m[Test] 68/68\u001b[0m \u001b[38;2;114;156;31m━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[33m0:00:04\u001b[0m tot_cost: $0.0697 - test_loss: 19.3333 - test_accuracy: 0.8990\n"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"\n",
"\n"
],
"text/plain": [
"\n"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"\n"
],
"text/plain": []
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
"{'loss': 19.333333333333332, 'accuracy': 0.8989898989898991}"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Test the best agent checkpoint on the test set\n",
"trainer.test(agent=best_agent, test_dataloader=test_dataloader, llm_clients=llm_clients)"
]
},
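The two test runs can be compared directly to quantify the effect of optimization. The sketch below uses the accuracy values reported by the baseline test run and by the best-checkpoint test run above; the variable names are illustrative and the values are copied by hand rather than read from the trainer.

```python
# Test accuracy reported for the untrained agent (baseline run).
baseline_accuracy = 0.6818181818181818
# Test accuracy reported for the best checkpoint (run above).
optimized_accuracy = 0.8989898989898991

# Absolute gain in accuracy, and gain relative to the baseline.
absolute_gain = optimized_accuracy - baseline_accuracy
relative_gain = absolute_gain / baseline_accuracy

print(f"absolute gain: {absolute_gain * 100:.1f} percentage points")
print(f"relative gain: {relative_gain * 100:.1f}%")
```

The absolute gain is the difference in accuracy between the two runs, while the relative gain expresses that difference as a fraction of the baseline.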
{
"cell_type": "code",
"execution_count": null,
"id": "b1fb8e5c",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"| BEFORE OPTIMIZATION | \n", "AFTER OPTIMIZATION | \n", "
|---|---|
\n",
" Read the provided message and determine the sentiment.\n", " | \n",
" \n",
" You are a sentiment classifier. Read the provided message and output exactly one of: positive, negative, neutral - all lowercase, no punctuation, no extra text or spaces.\n",
"\n",
"Scope: Judge the author's expressed sentiment toward the subject of the message (e.g., the company, product, service, or the issue described), not the topic content itself, roles/titles, greetings, or urgency alone.\n",
"\n",
"Decision rules:\n",
"- If polarity evidence is weak, mixed, contradictory, or evenly balanced, output neutral.\n",
"- Only output positive or negative when one clearly outweighs the other by intensity or count.\n",
"\n",
"Positive vs neutral boundary:\n",
"- Positive only when there is clear, unambiguous, and sufficiently strong praise directed at the provider/service outcome (e.g., explicit evaluatives such as love, thrilled, amazing, excellent, fantastic, flawless, top-notch) and there are no concurrent concerns.\n",
"- Default to neutral for inquiries, status updates, logistics, generic politeness or thanks without evaluative content, hedged or weak praise (okay, fine, pretty good, satisfied client, pleased), expressions of uncertainty, or mixed messages where positives do not clearly dominate by intensity or count.\n",
"\n",
"Cue handling:\n",
"- Aggregate polarity cues across the entire message; account for intensifiers and negations.\n",
"- Treat factual status updates, informational messages, inquiries, or logistical requests as neutral unless explicit sentiment is expressed, even if they include politeness or generic praise (e.g., thanks, appreciate your support, top-notch service).\n",
"\n",
"Negation and modifier guidance:\n",
"- Negative: not good, not impressed, frustrated, unacceptable, skeptical.\n",
"- Usually neutral unless accompanied by strong positive cues: not bad, mild or weak praise such as satisfied client or pleased, okay, fine, pretty good.\n",
"\n",
"Mixed or multi-issue messages:\n",
"- If praise co-occurs with requests or concerns and neither side clearly dominates, choose neutral.\n",
"- Choose neutral unless multiple strong positive indicators outweigh any negatives and there are no explicit negative cues.\n",
"- If different parts convey opposing sentiments and there is no clear majority by intensity or count, choose neutral.\n",
"\n",
"Operational rule:\n",
"- Aggregate cues with negation and intensifiers; label positive only if net positive clearly exceeds negative by a high margin or there is at least one strong positive indicator (superlatives, emphatic adverbs, exclamatory emphasis) directed at the subject; otherwise neutral.\n",
"\n",
"Examples (message → label):\n",
"- Thanks for the quick reply; can you update the ticket by tomorrow? → neutral\n",
"- Appreciate your support. Please fix the recurring billing error. → neutral\n",
"- Top-notch service on the last order, but this one arrived damaged. → neutral\n",
"- I'm pleased with the app overall, just a few minor issues to resolve. → neutral\n",
"- Pretty good overall. → neutral\n",
"- Not good - the installer keeps crashing. → negative\n",
"- I'm not impressed with your response times. → negative\n",
"- This delay is unacceptable and very frustrating. → negative\n",
"- It's not bad. → neutral\n",
"- Absolutely love the new update; everything works flawlessly. → positive\n",
"- Amazing job! → positive\n",
"\n",
"Output format reminder:\n",
"- Emit exactly one of the following labels: positive or negative or neutral - all lowercase, no punctuation, no extra text or spaces.\n",
"- Trim any leading/trailing whitespace or newlines before finalizing the single-word output.\n",
" | \n",
"
| BEFORE OPTIMIZATION | \n", "AFTER OPTIMIZATION | \n", "
|---|---|
\n",
" {sentiment_task}\n",
" | \n",
" \n",
" {best_agent.sentiment_task.data}\n",
" | \n",
"