CodePay · AI Customer Support · 0→1

I find workflow friction
and ship AI tools that remove it.

A 0→1 AI support initiative at a B2B in-person-payments company. I reframed an open-ended “use AI to cut support load” brief into a staged, de-risked system: scope what the AI may answer, give humans a knowledge base the AI cites, and grow from an internal tool toward a partner-facing agent.

Led end-to-end

Piloting · first steps shipped

Role

Product Builder + PM

Scope

Problem reframing · Workflow tooling · AI product judgment

Built with

Claude · Claude Code · Markdown · Lark Bitable

Status

Shipped assets · system piloting

Piloting, with a couple of small shipped steps that already cut load and seed the knowledge base. No invented metrics.

01

I reframed ‘build a support bot’ into a question-intelligence system.

The structured record of what partners struggle with was as valuable as the deflection itself.

The natural framing was “build an AI that answers customer questions.” My first useful move was noticing I’d bundled three different products into one — an internal tool (now), a partner-facing product (later), and a fully-automated agent (someday) — with completely different users, risks, and success criteria. I focused everything on the internal tool and treated the rest as extensions, not requirements. Then I flipped the approach from knowledge-first (dump every doc into the AI) to problem-first: build the knowledge base from the few dozen questions partners actually ask.

Layer 1 · Now

Internal AI support tool

Our own ops team. Low risk, shippable now. The deliberate focus.

Layer 2 · Later

Productized, partner-facing

Sell it to other companies. A natural extension — not a requirement.

Layer 3 · Someday

Action-taking agent

Opens & checks configs inside the platform. Vision, not scope.

Leverage · Scope to Layer 1 (the internal tool) only; treat the rest as extensions, not requirements.

Trade-off · Less ambitious on day one — but Layer 1 builds the knowledge asset every later stage depends on, so it's never throwaway.

02

Automate 80%. Never automate the one step where a wrong answer costs money.

I ran the loop by hand once to learn its shape.

About 80% of it is automatable: collect (Lark + WeChat), cluster, dedupe, diff against the current FAQ, flag, and route to the right SME. The remaining ~20%, the authoritative answer (e.g. “Does EBT work on this processor?”), is a human knowledge gate the AI routes to but never invents.

~80% automatable

the loop
  1. Collect (Lark + WeChat)
  2. Cluster & dedupe
  3. Diff vs. current FAQ
  4. Flag new / changed / conflicting
  5. Route each flag to the right SME
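
A minimal sketch of the automatable steps above, assuming plain-text questions. The normalization rule, the function names (`dedupe`, `flag_new_questions`, `route_to_sme`), and the keyword-to-owner map are all hypothetical illustrations, not the pipeline actually in use. Clustering and conflict detection are left out to keep it short.

```python
import re


def normalize(question: str) -> str:
    """Lowercase and drop punctuation so near-identical phrasings collide."""
    return re.sub(r"[^a-z0-9 ]", "", question.lower()).strip()


def dedupe(questions: list[str]) -> list[str]:
    """Keep the first occurrence of each normalized question, in order."""
    seen: set[str] = set()
    unique: list[str] = []
    for question in questions:
        key = normalize(question)
        if key not in seen:
            seen.add(key)
            unique.append(question)
    return unique


def flag_new_questions(collected: list[str], faq: list[str]) -> list[str]:
    """Diff against the current FAQ: return deduped questions it does not cover."""
    known = {normalize(question) for question in faq}
    return [q for q in dedupe(collected) if normalize(q) not in known]


def route_to_sme(question: str, owners: dict[str, str],
                 default: str = "support-lead") -> str:
    """Pick an owner by keyword (hypothetical map); fall back to a default."""
    text = normalize(question)
    for keyword, owner in owners.items():
        if keyword in text:
            return owner
    return default
```

The code stops at routing on purpose: it decides who should answer a flagged question, and never produces the answer itself.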

~20% human gate

by design

The authoritative answer.

Does EBT work on this processor? When does the warranty clock start? A confidently wrong answer here costs a partner real money — so the AI routes to a human, and never invents.

regenerate

Verified answer flows back into the deliverable. Re-run, ship.

Leverage · Keep the human gate by design — automate the 80%, make the human step faster and unskippable.

Trade-off · The tool can't dissolve the gate; accepting that is exactly what stops the system shipping confident, wrong answers.

Human-in-the-loop here isn’t a limitation to engineer away. It’s the design.

03

The system wins by knowing when not to answer.

A confidently wrong answer about payment setup costs a partner real money.

I classified every question into four types and let the AI confidently answer only Type-A (safe, high-volume how-to); bugs route to an owner, account-specific questions get collected and routed, and feedback becomes product signal. So I stopped optimizing for “accuracy” and started optimizing for deflection + wrong-answer rate. Underneath sits a three-layer model: raw material → a structured, human-maintained table → AI working knowledge that cites its source entry on every reply (I reasoned my way to RAG before I knew the term). When an answer is wrong, I ask which entry it cited, fix that row, and reload.

A

MVP scope

Config / How-to

AI answers

The safe, high-volume sweet spot.

B

Bug / error

Route to owner

AI doesn't fix — it hands off.

C

Account-specific

Collect & route to human

AI never guesses; gathers info.

D

Feedback / request

Capture as product signal

Becomes intelligence, not noise.
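
The four types above amount to a fixed routing table. The sketch below shows one way to encode it; the enum, the action strings, and the function names are assumptions for illustration. How a question gets its type in the first place is not specified here and is left out.

```python
from enum import Enum


class QuestionType(Enum):
    A = "config / how-to"
    B = "bug / error"
    C = "account-specific"
    D = "feedback / request"


# One action per type; only Type-A is ever answered by the AI.
ACTIONS = {
    QuestionType.A: "ai_answers",
    QuestionType.B: "route_to_owner",
    QuestionType.C: "collect_and_route_to_human",
    QuestionType.D: "capture_as_product_signal",
}


def action_for(question_type: QuestionType) -> str:
    """Look up the handling rule for a classified question."""
    return ACTIONS[question_type]


def ai_may_answer(question_type: QuestionType) -> bool:
    """The boundary is explicit: anything that is not Type-A goes elsewhere."""
    return question_type is QuestionType.A
```

Keeping the boundary in a lookup table rather than in a prompt means widening the scope later is a visible, reviewable change.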

Leverage · Type-A only, with citations as the maintenance handle.

Trade-off · Narrower coverage day one, but a wrong answer about payment setup costs real money — so the boundary is the point.

Raw material

Chats, screenshots, machine-screen photos — multimodal input.

Structured, human-maintained table

The only layer humans maintain. Version, last-verified date, owner. Fix the table, not the AI.

AI working knowledge

Reads from the table to answer — and cites the source entry on every reply.

Closed correction loop

Wrong answer? → Which entry did it cite? → Fix that row → Reload
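
A rough sketch of the table row and the correction loop, under stated assumptions. The `KBEntry` fields mirror the version, last-verified date, and owner mentioned above; the field names, the citation format, and the `fix_row` helper are hypothetical, and the sample values in any usage are placeholders rather than real entries.

```python
from dataclasses import dataclass, replace
from datetime import date


@dataclass(frozen=True)
class KBEntry:
    entry_id: str
    question: str
    answer: str
    version: int
    last_verified: date
    owner: str


def cited_reply(entry: KBEntry) -> str:
    """Every reply carries the id and version of the row it came from."""
    return f"{entry.answer} [source: {entry.entry_id} v{entry.version}]"


def fix_row(table: dict[str, KBEntry], entry_id: str,
            new_answer: str, verified_on: date) -> None:
    """Correct the cited row in place: new answer, bumped version, fresh date."""
    old = table[entry_id]
    table[entry_id] = replace(
        old,
        answer=new_answer,
        version=old.version + 1,
        last_verified=verified_on,
    )
```

Because the reply names its source row, a wrong answer points straight at the one entry to fix; the next reply built from the table picks up the corrected version.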

Deliberate constraints

Build vs buy

Start far smaller than the market says

Intercom Fin, Sierra, Decagon, and Ada are built for thousands of customers across industries. We serve a few hundred in one vertical — most of that complexity would be a burden, not an asset.

Channel = context

Answer on our own surface

In-platform, the AI already knows the partner, their device, and recent activity — the biggest accuracy multiplier. The same question gets a precise answer in-platform and a vague one in a group chat.

Tooling

Workflow before agents

A deterministic retrieve → answer → escalate flow is enough for Type-A. Full autonomous agents are for later, where genuine open-ended tool use is actually needed.

04

I can’t build the whole service in a day — but I can ship something that cuts load now.

Each step relieves a little support pressure today, and feeds the knowledge asset every later stage depends on.

Two deliberately small first steps. (1) A client-facing FAQ: I consolidated scattered internal Q&A into one clean, re-runnable FAQ and got it to partners; it deflects repetitive questions immediately, and its structured content becomes the first rows of the knowledge base. (2) A KB-template draft I’m actively using to collect raw support material into structured rows, the layer the architecture reads from. The wider system is piloting; the defensible output so far is the thinking made concrete, with no invented metrics.

Before · manual

  • Two source FAQs, maintained separately
  • Merge & dedupe by hand, every time
  • Conflicts found by chance, if at all
  • One-off output — stale the moment a doc changes

After · the FAQ accelerator

  • One structured content model as the source
  • Re-run on demand to regenerate the deliverable
  • Conflicts surfaced & flagged automatically
  • Client-ready HTML + PDF, re-runnable on every change

The consolidation, before and after: a re-runnable model that seeds the knowledge base, not a one-off doc.
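
A hedged sketch of what "re-runnable" can look like: flag conflicts between two source FAQs, then regenerate collapsible HTML from one structured model. The data shapes and the function names (`find_conflicts`, `render_faq`) are assumptions, not the actual generator; search and the PDF export are not covered.

```python
from html import escape


def find_conflicts(source_a: dict[str, str], source_b: dict[str, str]) -> list[str]:
    """Return questions present in both sources whose answers differ."""
    shared = source_a.keys() & source_b.keys()
    return sorted(q for q in shared if source_a[q] != source_b[q])


def render_faq(model: dict[str, list[tuple[str, str]]]) -> str:
    """Render category -> [(question, answer)] as collapsible HTML sections."""
    parts: list[str] = []
    for category, items in model.items():
        parts.append(f"<h2>{escape(category)}</h2>")
        for question, answer in items:
            parts.append(
                f"<details><summary>{escape(question)}</summary>"
                f"<p>{escape(answer)}</p></details>"
            )
    return "\n".join(parts)
```

When a source answer changes, the fix goes into the model and the page is rendered again, so the deliverable never drifts from its source.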

The shipped output

A client-ready FAQ — search, categories, collapsible Q&A

Generated from the structured model, not hand-built. This is the actual artifact — open it.

Open the live FAQ

The raw partner questions these steps organize — and the first-draft table that structures them.

/images/codepay-support/wechat-questions.png

Raw partner questions in WeChat — the source material the FAQ consolidates.

/images/codepay-support/lark-questions.png

Lark group threads — repetitive how-to questions, clustering by topic.

/images/codepay-support/kb-template.png

The KB-template draft (CodePay_KB_Template_v1.3.xlsx): structured rows the AI reads from.

05

The call I’m deciding now: meet partners in chat, or centralize in the platform?

I’d rather reduce the friction of the right architecture than accept the wrong one because it’s frictionless.

Where should support actually live? Meeting partners in the chat groups they already use is the lowest-friction path — but it’s costly to operate, hard to scale, and the data is messy and ToS-risky to capture. Centralizing in our ops platform gives clean, owned, structured data, real assignment and accountability, and the knowledge base every later stage depends on — at the cost of some login friction. My team leans platform; leadership leans chat. I’m holding it open on purpose — because it should be settled by users, not by whoever argues hardest.

Option A · meet them in chat

Answer inside the chat groups partners already use

  • + Lowest friction — partners are already there
  • − High ops cost; hard to scale across groups
  • − Messy, ToS-risky capture (no clean WeChat API)

vs

Option B · centralize in the platform

my lean

One front door inside our ops platform

  • + Owned, structured, accurate data
  • + Assignment, accountability — and it feeds the KB
  • − Login friction, mitigated by mobile-first, one-tap auth

My lean · Make the platform the system of record — the compounding, owned asset — and kill the friction objection with mobile-first, one-tap auth, rather than concede to scattered chats.

What resolves it · Deliberately deferred — I’d settle the front-door question with customer interviews on the friction hypothesis and real adoption + ops-cost data, not a meeting-room opinion.

Takeaways

Scope discipline is the skill

Type-A only, no enterprise-platform copying, a no-engineering first version. Shrinking the MVP is what keeps AI projects from dying in over-design.

Ship the tool, then defend the gate

I built a re-runnable accelerator and then refused to automate the one step where a wrong answer costs money.

The byproduct can be the product

Reframing ‘support bot’ into ‘question-intelligence system’ surfaced a stream of product signal that may be the more valuable half early on.


How this was built. The knowledge docs, the FAQ generator, and this case-study page were all built with Claude Code and Markdown-based workflows — the same AI-augmented method the support tool itself is designed around.

Claude Code · Next.js

Thanks for reading
