Before any real customer sees an AI-generated reply, run the agent end to end against scripted scenarios. Most launch problems show up in 20 minutes of structured testing.
Set up a test surface
You have two options:
- Use a dedicated test inbox. Create a sandbox inbox, add the agent as a member, and message it from your phone.
- Use your own number on the live inbox. Send messages from a personal phone to the production inbox. This is useful for catching real-world quirks such as carrier delays and MMS handling.
Either way, run tests in Supervised Mode so you can read each suggestion before it sends.
Scenarios to run
We recommend building test prompts from real questions your team has answered before, common help-center searches, and mock conversations that leave out key details. Use these prompts to test three categories:
Happy path. Five to ten of the most common questions you actually get. The agent should respond accurately, in the right tone, within your length limit.
Edge cases. Vague questions ("can you help me?"), compound questions ("what's pricing and how do I cancel?"), and off-topic questions. Watch for hallucinations: invented details, made-up features, and overly confident answers.
Emotional triggers. Frustration, urgency, profanity. The agent should hand off cleanly via your unassignment rules.
Verify each unassignment rule
For every rule you wrote, send a message designed to trigger it. Confirm:
- The agent unassigns itself promptly
- If you configured a re-route template, the contact receives it
- If you configured a specific team member, the chat lands with them
- If that user is not in the inbox, the chat is unassigned (expected behavior)
Spot-check knowledge accuracy
For three or four replies, verify the facts against your source. Check that:
- Pricing and plan details match the source
- Feature descriptions are accurate, not invented
- Replies do not reference articles or features that do not exist
[SCREENSHOT: Side-by-side view: an AI Agent reply on the left, the corresponding KB source page on the right. Use this approach to spot-check accuracy.]
If you catch an error, note it. Article 7 walks through how to feed it back into the knowledge base.
Pre-launch checklist
Before flipping the inbox toggle:
- All happy path scenarios produce acceptable suggestions
- Each unassignment rule has been triggered and behaves correctly
- Re-route template (if configured) reads correctly to a customer
- Knowledge sources show "Complete" sync status
- A reviewer is named and aware of the launch
- Daily review time is on someone's calendar for the first week
Next steps
Continue to Article 7: Train and Improve with Supervised Mode and Agent Enhancements.