CodePay · AI Customer Support · 0 → 1

Finding the friction, then building the AI that fixes it.

A 0 → 1 AI support initiative at a B2B in-person-payments company. I reframed an open-ended “use AI to cut support load” brief into a staged, de-risked system: scope what the AI may answer, give humans a knowledge base the AI cites, and grow from an internal tool toward a partner-facing agent.

AI agent · RAG · Knowledge base · Question taxonomy · Human-in-the-loop · Red-teaming · Claude Code · Next.js

Role: Product Builder + PM
Scope: Problem reframing · Workflow tooling · AI product judgment
Built with: Claude · Claude Code · Markdown · Lark Bitable
Status: Shipped assets · pipeline live · red-teaming before partner rollout

CodePay AI support agent, live preview: it types a partner question, streams retrieval from the knowledge base, then returns a cited answer.

Job 1 · for customers

Answer partners

Cited answers in the chat partners already use, or a clean handoff.

Answers

cited, or cleanly handed off

Job 2 · for product

Capture product signal

Bugs, repeat questions, and feature asks, captured as structured input.

Signal

structured, not noise

Job 3 · for ops

Make support quality visible

Who answered, at what confidence, and whether it’s actually resolved.

Oversight

a management view, not a chat log

0 → 1, led end-to-end · piloting, with first steps already shipped · no invented metrics.

A couple of small shipped steps already cut load and seed the knowledge base.

Section 1

Four decisions that shaped it

The brief was “use AI to cut support load.” These four calls made it safe enough to ship.

01 · Answer only what’s safe.

Every question gets typed: A config/how-to, B bug, C account-specific, D feedback. The AI answers Type A alone. B routes to an owner, C collects info and hands off, D becomes product signal. Type-A-only is the biggest risk-control decision in the system.

question triage · one front door, four lanes

WeChat · Lark · WhatsApp: partner questions, all day

every question gets typed: config · bug · account · feedback

A · Config / how-to · “How do I add a service charge?” → AI answers: cites its source, every reply

B · Bug / error · “P5 prints blank receipts” → routed to an owner: AI hands off, doesn't fix

C · Account-specific · “Where's my payout for 5/12?” → collects info, then human: AI never guesses balances

D · Feedback / request · “Can receipts show a QR menu?” → product signal board: intelligence, not noise
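The four lanes can be sketched as a small lookup. This is a minimal illustration, not the production router: the lane names and the `route` / `ai_may_answer` helpers are hypothetical, and it assumes the question has already been typed A–D upstream.

```python
from enum import Enum


class QuestionType(Enum):
    A = "config / how-to"
    B = "bug / error"
    C = "account-specific"
    D = "feedback / request"


# One lane per type, mirroring the triage above. Lane names are illustrative.
LANES = {
    QuestionType.A: "ai_answers_with_citation",
    QuestionType.B: "route_to_owner",
    QuestionType.C: "collect_info_then_human",
    QuestionType.D: "product_signal_board",
}


def route(question_type: QuestionType) -> str:
    """Return the lane for an already-typed question."""
    return LANES[question_type]


def ai_may_answer(question_type: QuestionType) -> bool:
    """The Type-A-only rule: every other type leaves the AI's hands."""
    return question_type is QuestionType.A
```

Keeping the rule in one predicate means the risk boundary is a single line to audit rather than a behavior spread across prompts.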

02 · Deflection over accuracy.

The target isn’t “answers well”; it’s questions resolved with no human needed, paired with wrong-answer rate. A confidently wrong answer about payment setup costs a partner real money. Knowing when not to answer is the product.
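One way to pin the two numbers down, as a sketch: the text names the metrics but not their denominators, so the definitions below are assumptions (deflection over all questions, wrong-answer rate over AI-answered questions only), and the function names are hypothetical.

```python
def deflection_rate(resolved_without_human: int, total_questions: int) -> float:
    """Share of all questions resolved with no human involved."""
    if total_questions == 0:
        return 0.0
    return resolved_without_human / total_questions


def wrong_answer_rate(wrong_ai_answers: int, ai_answers: int) -> float:
    """Share of AI-given answers that were graded wrong."""
    if ai_answers == 0:
        return 0.0
    return wrong_ai_answers / ai_answers
```

Reading the two together is the point: deflection can be pushed up by answering more, but only at the cost of the second number.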

03 · The 20% that stays human.

Before automating knowledge intake, I ran the loop by hand on a real FAQ merge, then gave the AI the 80%: capture every conversation into the log, cluster and dedupe, draft the knowledge-base entries. The 20% that stays human: a person verifies every entry before the AI may cite it. Human-in-the-loop isn’t a limitation here. It’s the design.

intake pipeline · the 80% and the gate

80% · AI · AI does the intake: chats → capture log → cluster & dedupe → drafts the KB entries

20% · human · A human verifies every entry: row by row, right, wrong, or missing

Verified knowledge base: the AI cites nothing else

AI accelerates the loop. It never skips the gate.
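The gate can be sketched as a status change that only a human verdict can trigger. The `DraftEntry` type, the status values, and both helpers are hypothetical; only the three verdicts (right, wrong, missing) come from the pipeline above.

```python
from dataclasses import dataclass
from typing import Literal

Verdict = Literal["right", "wrong", "missing"]


@dataclass
class DraftEntry:
    entry_id: str
    answer: str
    status: str = "draft"  # assumed values: draft / verified / rejected / needs_info


def apply_verdict(entry: DraftEntry, verdict: Verdict) -> DraftEntry:
    """Record the human reviewer's row-level verdict on an AI-drafted entry."""
    new_status = {"right": "verified", "wrong": "rejected", "missing": "needs_info"}[verdict]
    return DraftEntry(entry.entry_id, entry.answer, new_status)


def citable(entries: list[DraftEntry]) -> list[DraftEntry]:
    """Only human-verified entries reach the set the AI may cite."""
    return [e for e in entries if e.status == "verified"]
```

Nothing the AI drafts can promote itself: a draft stays out of `citable` until a person marks it right.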

04 · Capture where partners talk. Answer where context lives.

Partners live in WeChat groups, so that’s where questions get captured. Answering is staged: internal first, then in-platform, where the system knows the partner, the device, and the page, so the same question gets a precise answer instead of a generic one.

Section 2

The knowledge architecture

Raw conversations → a human-reviewed knowledge table → the AI’s working knowledge. Humans maintain only the middle layer: when an answer is wrong, I ask “which entry did you use?”, fix that row, reload. Every answer cites its source: a closed correction loop. (I designed this before I knew it was called RAG.)

The schema carries trust, not just content: every entry holds its source thread, confidence, owner, last-verified date, applicability conditions, and common follow-ups.

three layers · one correction loop

01 · Raw conversations

chats · screenshots · machine-screen photos; nothing thrown away

02 · Human-reviewed knowledge table: the only layer humans maintain

every entry carries its trust metadata:

source_threads · confidence · owner · last_verified · when_to_use · follow_up_qas

03 · AI working knowledge

reads from the table and cites the source entry on every answer

the loop always lands on layer 2: fix the table, not the model
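As a sketch of what "fix the table, not the model" means in data terms: the field names below are the trust-metadata columns listed above, while `entry_id`, `answer`, the value types, and the `correct_entry` helper are assumptions added for illustration.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class KBEntry:
    entry_id: str               # hypothetical key, e.g. "KB-041"
    answer: str                 # hypothetical content column
    source_threads: list[str]
    confidence: str             # value set is an assumption, e.g. "high" / "low"
    owner: str
    last_verified: date
    when_to_use: str
    follow_up_qas: list[str] = field(default_factory=list)


def correct_entry(table: dict[str, KBEntry], cited_id: str,
                  fixed_answer: str, today: date) -> None:
    """Correction loop: fix the row the answer cited, then re-stamp it."""
    entry = table[cited_id]
    entry.answer = fixed_answer
    entry.last_verified = today
```

Because every answer names the entry it used, a wrong answer points at exactly one row to fix.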

KB template v1.1 → v1.4 · what changed, and why

v1.1 · One sheet, trust built in

13 columns; every answer already carries its owner and a last-verified date.

owner · last_verified · when_to_use · question_patterns · +9 more

1 sheet · Knowledge Base

v1.2 · Capture speaks first

A turn-level Capture Log lands beside the KB; every answer now traces back to a real thread.

+ source_threads · + confidence · + follow_up_qas · + media_desc

+ Capture Log · + Insight (auto)

v1.3 · Log the exchange, not the turn

One row = question + our answer + who answered. Turn-level rows were noise.

− turn · − raw_text · + question · + our_answer · + answered_by

Capture Log restructured

v1.4 · “Answered” ≠ “resolved”

Every exchange now tracks resolution; the ops view runs on that difference.

+ resolution

the schema the pilot runs on

column names verbatim from the working xlsx templates

And the table is a database, not a doc. Every column is a filter, so the same rows serve all three jobs: verified entries gate what the AI may cite, status × resolution runs the ops view, and type-D slices surface product signal. Local sheet first, now mirrored in Notion with the schema intact.

one table · three lenses

Every column is a filter: question_type A–D · status · resolution · confidence · owner · topic · device_model (cross-filter · group · slice)

Job 1 · for customers · What may the AI cite? Verified entries only: confidence and last_verified gate the answer set.

Job 3 · for ops · What's actually resolved? Group by status × resolution: who answered, what's still open.

Job 2 · for product · What keeps coming back? Slice type D and recurring topics: feature asks and bugs, ranked.

the same table, sliced three ways: local sheet first, now mirrored in Notion, schema intact
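The three lenses can be sketched as three queries over the same rows. The column names follow the filter list above; the row values, the status vocabulary, and the function names are made up for illustration, and the citing gate is simplified to a single status check.

```python
from collections import Counter

# Hypothetical rows: same table, read three ways.
rows = [
    {"question_type": "A", "status": "verified", "resolution": "resolved", "topic": "service charge"},
    {"question_type": "A", "status": "draft", "resolution": "open", "topic": "tip adjust"},
    {"question_type": "D", "status": "draft", "resolution": "open", "topic": "QR menu on receipts"},
    {"question_type": "D", "status": "draft", "resolution": "open", "topic": "QR menu on receipts"},
]


def citable(table: list[dict]) -> list[dict]:
    """Job 1: only verified rows may be cited."""
    return [r for r in table if r["status"] == "verified"]


def ops_view(table: list[dict]) -> Counter:
    """Job 3: count rows by (status, resolution)."""
    return Counter((r["status"], r["resolution"]) for r in table)


def product_signal(table: list[dict]) -> list[tuple[str, int]]:
    """Job 2: type-D topics ranked by how often they recur."""
    return Counter(r["topic"] for r in table if r["question_type"] == "D").most_common()
```

No lens needs its own copy of the data, which is what keeps the three jobs from drifting apart.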

Section 3

From plan to running pipeline

A three-week POC with an engineering partner turned the design into a running system: WeChat groups → encrypted ingestion → OCR turns screenshots into searchable error codes and menu paths → daily distillation pairs threads, attributes merchants, classifies A–D, masks PII → a version-controlled source of truth syncs two-way with Notion, where ops reviews and every edit writes back.

one question's journey · group chat → knowledge base

01 · WeChat group · a partner posts a photo · photo: “the screen shows an error”

02 · Encrypted ingestion · ingestion with minute-level alerts · thread #388 · captured · encrypted

03 · OCR · photos → searchable error codes · E03 · Settings → Printer → Feed

04 · Daily distillation · pair · attribute · classify · mask PII · type A · merchant ▓▓▓ masked · → KB-041

05 · Source of truth · version-controlled ⇄ Notion write-back · KB-041 v3 · reviewed in Notion · synced
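Step 03 can be illustrated with a small post-OCR pass that pulls searchable fields out of the recognized text. This is a sketch only: the OCR engine itself is out of scope, and both patterns are assumptions inferred from the single example above ("E03", "Settings → Printer → Feed"), not the pipeline's actual rules.

```python
import re

# Assumed code format: capital E followed by two digits, as in "E03".
ERROR_CODE = re.compile(r"\bE\d{2}\b")
# Assumed menu-path format: words joined by arrows, as in "Settings → Printer → Feed".
MENU_PATH = re.compile(r"\w+(?:\s*→\s*\w+)+")


def extract_fields(ocr_text: str) -> dict[str, list[str]]:
    """Pull error codes and menu paths out of already-OCR'd text."""
    return {
        "error_codes": ERROR_CODE.findall(ocr_text),
        "menu_paths": MENU_PATH.findall(ocr_text),
    }
```

Once these fields exist, a photo of a device screen becomes something the knowledge table can be searched by.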

The hardest mile: group-chat capture is the fragile part, so reliability is engineered around it, with minute-level alerts and gaps logged honestly instead of papered over.

Accepted against five metrics defined before the build: thread pairing · merchant attribution · capture completeness · account stability · OCR accuracy.

acceptance scorecard · agreed pre-build

Thread pairing: question ↔ answer joined correctly

Merchant attribution: every thread tied to the right store

Capture completeness: no silent gaps; misses get logged

Account stability: capture that stays up, day after day

OCR accuracy: photos become searchable text

defined before the build, not after: acceptance was a checklist, not a vibe
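"A checklist, not a vibe" can be sketched as an all-or-nothing check over the five agreed metrics. The metric keys mirror the scorecard above; the pass/fail booleans stand in for whatever thresholds were actually agreed, which are not given here, and the helper names are hypothetical.

```python
ACCEPTANCE_METRICS = (
    "thread_pairing",
    "merchant_attribution",
    "capture_completeness",
    "account_stability",
    "ocr_accuracy",
)


def failing(results: dict[str, bool]) -> list[str]:
    """Metrics that are missing or not met, in scorecard order."""
    return [name for name in ACCEPTANCE_METRICS if not results.get(name, False)]


def accepted(results: dict[str, bool]) -> bool:
    """Pass only if every pre-agreed metric is present and met."""
    return not failing(results)
```

A metric that was never measured counts as failing, so acceptance cannot be reached by leaving a row blank.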

Section 4

Red-teaming it, right now

The bot answers real merchant questions in an internal group; the team grades every answer before a partner ever sees one. Where it has no answer, it says so and routes to a human, and the gap lands in an “open questions” view that becomes the wiki roadmap. The system tells us what to write next.

Masked WeChat screenshot: a partner posts a long product report with an embedded screenshot. The group title, names, avatars, and the report text are all blurred; only routine shipping coordination stays readable.

Masked WeChat screenshot: routine partner questions about a service-charge setting and a collection-method option. The group title, names, avatars, and quoted threads are blurred.

behavior 1 · answer, with the source

Service charge shows on the receipt but not in the sales report. Why?

Service charge is booked under fees, not sales. Reports → fee breakdown shows the per-order split next to the receipt total.

⌁ cites · KB entry service_charge · last verified + owner attached

behavior 2 · no verified answer? say so

Can the P5 print duplicate receipts for delivery orders?

I don't have a verified answer for this yet. Routing you to the ops owner on duty.

→ human · logged → open questions

Left · real partner questions in the internal group, masked. Right · the two graded answer behaviors, shown as system mocks in the product’s chat language, not screenshots.
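The two graded behaviors reduce to one branch: cite a verified entry, or say so and hand off. The sketch below assumes an exact-match lookup as a stand-in for real retrieval; the `Reply` type, `respond`, and the shape of `verified_kb` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

NO_ANSWER = ("I don't have a verified answer for this yet. "
             "Routing you to the ops owner on duty.")


@dataclass
class Reply:
    text: str
    cited_entry: Optional[str]  # KB entry key; None on handoff
    handed_off: bool


def respond(question: str,
            verified_kb: dict[str, tuple[str, str]],
            open_questions: list[str]) -> Reply:
    """Answer with a citation, or hand off and log the gap."""
    hit = verified_kb.get(question.strip().lower())  # stand-in for retrieval
    if hit is None:
        open_questions.append(question)  # the gap feeds the open-questions view
        return Reply(NO_ANSWER, None, True)
    entry_id, answer = hit
    return Reply(answer, entry_id, False)
```

There is no third path in which the bot answers without a citation, which is the property the grading checks.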

open questions → the wiki roadmap

open · awaiting answer

Duplicate receipts on delivery orders

EBT on the newer processor

Refund window for partial voids

answered · pending verify

Tip adjust after batch close

Receipt logo upload fails

verified → wiki

Service charge in reports

P5 paper: tear direction

→ next wiki sprint

an illustrative view of the live board; the system tells us what to write next
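The board's three columns can be read as a one-way progression. In this sketch the stage names paraphrase the column titles above and the topics are the ones shown on the board; the `advance` and `next_wiki_sprint` helpers are hypothetical.

```python
STAGES = ["open", "answered_pending_verify", "verified"]


def advance(stage: str) -> str:
    """Move a question one column to the right; verified is the last stop."""
    i = STAGES.index(stage)
    if i == len(STAGES) - 1:
        raise ValueError("already verified")
    return STAGES[i + 1]


def next_wiki_sprint(board: dict[str, str]) -> list[str]:
    """Verified topics are what the next wiki sprint writes up."""
    return [topic for topic, stage in board.items() if stage == "verified"]


board = {
    "Duplicate receipts on delivery orders": "open",
    "Tip adjust after batch close": "answered_pending_verify",
    "Service charge in reports": "verified",
    "P5 paper: tear direction": "verified",
}
```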

Live pipeline, no invented metrics. When partner rollout ships, deflection and wrong-answer numbers replace the targets here; the story gets stronger without embellishment.