Vibe-coded WhatsApp bot — why it answers wrong in production (and the hardening checklist)

The short answer, before the long story: a WhatsApp bot generated by Lovable, Bolt, or v0 breaks in 8 predictable spots once it's in production — webhook idempotency, media URL expiry, button race conditions, the 4,096-character body limit, sandbox-vs-prod env split, reply context, push-token rotation, and observability. This post is the hardening checklist I run before any WhatsApp bot ships to a paying client. Each item has bitten me; I'll tell you which one and what it cost.

I'm Ulisses, founder of Hens. I built OverAir — a WhatsApp-first memory SaaS with AI, in production since early 2026. Zero paying customers today, I'll be honest about that throughout. But the bot processes messages from a dozen beta testers daily, and the 8 bugs below are the same ones that showed up in three WhatsApp bots I shipped for clients in 2025 — the kind of thing the agent never writes on its own.

If your vibe-coded bot is about to enter production, read to the end. If it's already in production, open git blame in parallel — odds are 5 of these 8 are already on your bug list and you didn't notice.

Why the agent misses exactly these 8

Lovable, Bolt, and v0 are optimized to show a happy-path flow in 2 minutes. The bot answers "hello" in the preview and it feels finished. It isn't. Everything I describe below is what Meta documents as normal Cloud API behavior — duplicate deliveries, ephemeral URLs, automatic retries — and the agent just doesn't have that documentation in its context when it generates your handler.

I tested it. I asked three different agents (Lovable, Bolt, Cursor with Claude) to write a WhatsApp Cloud API webhook handler. All three returned code that assumed exactly-once delivery. The Cloud API works the opposite way; as the Hookdeck WhatsApp guide puts it, "Webhooks are delivered at-least-once, which means duplicates are a normal operating condition, not an edge case". That's not trivia: it's the first bug that hits production.

The checklist — 8 items, in the order they bit me

1. Idempotency by `message_id` (the one that breaks billing)

The bug: Meta fires the same webhook twice in ~0.5% of cases. The bot charges twice. Customer sees the duplicate, files a dispute.

I hit this live on OverAir during an internal load test in February: two beta users got duplicate confirmations because Stripe's checkout.session.completed webhook arrived twice within 800ms (Stripe can deliver the same event more than once too, so the same dedupe applies). Cost in production: USD 15 per Stripe chargeback (Stripe dispute fees, 2026). At 1,000 monthly transactions × 0.5% × USD 15, that's USD 75/month evaporating into dispute fees, plus a churned customer per duplicate.

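The hardening: dedupe at the DB layer using messages[].id (inbound) as a primary key. For status updates, key on statuses[].id plus statuses[].status, because the sent, delivered, and read updates for one message all carry the same id. Lovable won't write this. The pattern I use on OverAir: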

// Before processing anything
const eventRef = db.collection('whatsapp_events').doc(message.id);
const snapshot = await eventRef.get();
if (snapshot.exists) {
  // Already processed, return 200 without acting
  return res.status(200).send();
}
// Mark as seen BEFORE processing
// (get-then-set is not atomic by itself; wrap both calls in a transaction)
await eventRef.set({ receivedAt: FieldValue.serverTimestamp() });
// Now process
await handleMessage(message);

The detail the agent skips: mark-as-seen has to happen before processing, not after. If you mark after, two webhooks arriving in parallel both pass the exists check before either of them marks. Even when you mark first, a separate get and set leave a small window for the same race, so the check and the write must be one atomic step. In Firestore, you do that with a transaction. In Postgres, with INSERT … ON CONFLICT DO NOTHING and a rowCount check.
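A minimal sketch of that claim-then-process shape, assuming a storage call that reports whether this caller was the first to record the ID (in Postgres, that would be the INSERT … ON CONFLICT DO NOTHING with rowCount === 1). The names ClaimFn, createMemoryClaim, and handleOnce are hypothetical, and the in-memory store is only a stand-in for a shared database:

```typescript
// Hypothetical storage contract: resolves to true only for the first caller
// that claims a given event ID.
type ClaimFn = (eventId: string) => Promise<boolean>;

// In-memory stand-in for local testing only. A real deployment needs a shared
// store, because several server instances can receive the same webhook.
function createMemoryClaim(): ClaimFn {
  const seen = new Set<string>();
  return async (eventId) => {
    if (seen.has(eventId)) return false;
    seen.add(eventId); // no await between check and add, so atomic within one process
    return true;
  };
}

async function handleOnce(
  eventId: string,
  claim: ClaimFn,
  work: () => Promise<void>,
): Promise<"processed" | "duplicate"> {
  // Claim first, process second. A duplicate still gets a 200 from the caller.
  if (!(await claim(eventId))) return "duplicate";
  await work();
  return "processed";
}
```

The handler's only decision is "did I win the claim?". Everything that makes the claim atomic lives behind ClaimFn, which is where the transaction or the unique-key insert would go.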

2. Media URLs expire in 5 minutes (silently)

The bug: User sends voice note. Bot receives webhook with media.id. Bot queues for processing. Worker picks it up 7 minutes later, GETs the URL, gets 404. Audio lost. Customer never knows.

This isn't speculation — Meta documents it: "All media URLs expire within 5 minutes" (WhatsApp Cloud API media docs). The URL also needs a bearer token in the header, so leaks aren't a concern — but caching for retry doesn't work either.

In a bot I shipped for a US client in 2025 (anonymizing — appointment-booking system over WhatsApp), I burned 6 hours debugging "missing audio" before it clicked. The dashboard showed the webhook arriving, but the file wasn't in S3. Cause: the queue had 3-minute lag at peak and the URL expired.

The hardening: download the blob inside the same webhook handler, before queueing. Don't queue the URL: store the bytes (S3, GCS, R2) and queue the storage location. If your serverless infra has a short timeout, the handler should:

1. Receive webhook
2. Download media immediately (within 60s of receipt)
3. Upload to storage (S3/GCS/R2)
4. Queue with the STORAGE_URL, not the media.id
5. Return 200 to Meta

The vibe-coded version writes "queue the media.id, download later". It works in dev. In production, it loses ~15% of audio messages at peak — the number I measured on OverAir before the fix.
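The five steps can be sketched roughly like this. It assumes the Graph API media lookup (GET on the media ID returns a JSON body with url and mime_type) and an API version of v19.0. MediaDeps and its four helpers are hypothetical and injected, so the sketch stays independent of any particular HTTP client, bucket, or queue:

```typescript
// Hypothetical helpers, injected so the flow can be exercised without a network.
interface MediaDeps {
  getJson: (url: string, token: string) => Promise<{ url: string; mime_type: string }>;
  getBytes: (url: string, token: string) => Promise<Uint8Array>;
  upload: (key: string, bytes: Uint8Array, mimeType: string) => Promise<string>; // returns storage URL
  enqueue: (job: { storageUrl: string; mimeType: string }) => Promise<void>;
}

const GRAPH_BASE = "https://graph.facebook.com/v19.0"; // assumed API version

async function persistMediaThenQueue(
  mediaId: string,
  token: string,
  deps: MediaDeps,
): Promise<string> {
  // Resolve the short-lived download URL from the media ID.
  const meta = await deps.getJson(`${GRAPH_BASE}/${mediaId}`, token);
  // Download right away, while the URL is still valid (needs the bearer token).
  const bytes = await deps.getBytes(meta.url, token);
  // Store the bytes somewhere durable.
  const storageUrl = await deps.upload(`whatsapp-media/${mediaId}`, bytes, meta.mime_type);
  // Queue the durable location, never the expiring URL.
  await deps.enqueue({ storageUrl, mimeType: meta.mime_type });
  return storageUrl;
}
```

Call it inside the webhook handler and return 200 only after it resolves. The worker then reads from storage and never touches a Meta URL.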

3. Race conditions on approval buttons

The bug: User receives an "Approve booking" button. Taps it. Network stalls for 2 seconds. Taps again. Bot receives two callbacks. Without dedupe, books twice. User sees the duplicate and cancels the whole flow.

WhatsApp Cloud API delivers interactive.button_reply as a normal webhook event with a unique messages[].id. If you don't dedupe via item 1, you'll duplicate the action. But there's a subtlety: that messages[].id identifies the user's reply, not the button message you sent. To correlate "which button was tapped from which message" reliably, save the payload → wamid mapping when you send the button instead of reconstructing it from the incoming event.

In a system I shipped recently (an order-approval bot), the correct flow is:

1. Bot sends a message with 3 buttons (approve/reject/ask-info)
2. Bot persists {wamid_of_message, conversation_id, current_state}
3. User taps
4. Webhook arrives with button_reply.id (the ID you set as the button's payload)
5. Bot looks up {conversation_id} via that ID
6. Bot CHECKS current state INSIDE A TRANSACTION — if already approved, ignore
7. Updates state, sends confirmation as a reply (item 6 below)

The transaction in step 6 is what the agent never writes. Without it, two taps become two "approved". I saw that race land twice in 10 manual tests on a vibe-coded bot — not rare, it's the normal case on flaky mobile data.
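A small sketch of the step 6 guard, assuming the state lives in one row per conversation. ApprovalStore is a hypothetical in-memory stand-in. In a real database the same check-and-update would run inside a transaction, or as a single conditional UPDATE that only matches rows still in the pending state:

```typescript
type ApprovalState = "pending" | "approved" | "rejected";

// In-memory stand-in for a row that is read and updated in one transaction.
class ApprovalStore {
  private states = new Map<string, ApprovalState>();

  // Called when the bot sends the button message.
  open(conversationId: string): void {
    this.states.set(conversationId, "pending");
  }

  // Returns true only if this call moved the conversation out of "pending".
  // A second tap, or a tap on an unknown conversation, returns false.
  decide(conversationId: string, next: "approved" | "rejected"): boolean {
    if (this.states.get(conversationId) !== "pending") return false;
    this.states.set(conversationId, next);
    return true;
  }

  get(conversationId: string): ApprovalState | undefined {
    return this.states.get(conversationId);
  }
}
```

The bot sends the confirmation only when decide returns true. The second tap then becomes a no-op instead of a second "approved".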

4. The 4,096-character body limit (truncate or crash)

The bug: Bot has GPT/Gemini generating a long answer. Response is 5,200 chars. Cloud API rejects with (#100) Invalid parameter and the user gets nothing. In the log it looks like a success because the vibe-coded handler didn't check the response.

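Official limit: 4,096 characters in a text message body (Meta Cloud API reference, Symphony WhatsApp limits). And it's characters, not bytes. Emoji make the count slippery: a single emoji looks like 1 char but is usually 2 in JavaScript's text.length, and a ZWJ sequence (family 👨‍👩‍👧) is 5 code points, or 8 in text.length.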

The hardening:

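function splitForWhatsApp(text: string, maxLen = 4000): string[] {
  if (text.length <= maxLen) return [text];
  const chunks: string[] = [];
  let buffer = '';
  for (let p of text.split('\n\n')) {
    // A single paragraph longer than maxLen is hard-split, so no chunk exceeds the limit
    while (p.length > maxLen) {
      if (buffer) { chunks.push(buffer); buffer = ''; }
      chunks.push(p.slice(0, maxLen));
      p = p.slice(maxLen);
    }
    const candidate = buffer ? buffer + '\n\n' + p : p;
    if (candidate.length > maxLen) {
      chunks.push(buffer); // buffer is non-empty here, because p alone fits in maxLen
      buffer = p;
    } else {
      buffer = candidate;
    }
  }
  if (buffer) chunks.push(buffer);
  return chunks;
}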

I work to 4,000 (not 4,096) for a safety buffer against emoji counting. Chunks go out sequentially with an 800ms gap — fire them all at once and you'll hit WhatsApp's "messages too fast" throttle.
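The sequential send with a gap could look roughly like this. SendFn is a hypothetical wrapper around your Cloud API call that reports the HTTP outcome, and sleep is injectable so the pacing can be tested without real waiting. The sketch also refuses to treat a non-2xx response as success, which is the check the vibe-coded handler skipped:

```typescript
// Hypothetical wrapper around the Cloud API send call.
type SendFn = (body: string) => Promise<{ ok: boolean; status: number }>;
type SleepFn = (ms: number) => Promise<void>;

const realSleep: SleepFn = (ms) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function sendChunksInOrder(
  chunks: string[],
  send: SendFn,
  gapMs = 800,
  sleep: SleepFn = realSleep,
): Promise<number> {
  let sent = 0;
  for (const chunk of chunks) {
    if (sent > 0) await sleep(gapMs); // pause between chunks, not before the first
    const res = await send(chunk);
    if (!res.ok) {
      // Stop here: later chunks would arrive without their context.
      throw new Error(`WhatsApp send failed with HTTP ${res.status} after ${sent} chunk(s)`);
    }
    sent += 1;
  }
  return sent;
}
```

Stopping at the first failure is a design choice. A retry on the failed chunk would also work, as long as the order is preserved.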

5. Sandbox vs production — the env var is bug source #1 on deploy

The bug: Deploy to production. Bot keeps replying from the test number. Real customer gets nothing. It takes 2 days for anyone to notice because "it worked in dev".

Cause: Cloud API requires a specific phone_number_id per number. Sandbox and production have different IDs, and the WABA token has different scopes. Lovable persists these in the project's env, and when you "publish", the deploy can inherit the wrong .env: it did in 1 of 3 deploys I measured on a vibe-coded client bot.

The hardening: add a smoke check on boot:

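// Placeholders: replace with the real IDs, hard-coded so the check
// does not depend on the same env it is validating
const expected = process.env.NODE_ENV === 'production'
  ? 'PROD_PHONE_NUMBER_ID'
  : 'SANDBOX_PHONE_NUMBER_ID';
if (process.env.WHATSAPP_PHONE_NUMBER_ID !== expected) {
  throw new Error('Wrong WhatsApp number ID for environment');
}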

Drop that in the entrypoint. Fail early, loud. The agent never writes that guard — it assumes env is set-and-forget.

6. Reply with `context.message_id` (or the user gets lost)

The bug: Bot gets a question. Bot calls an external API for 4 seconds. Bot replies. But the user already sent another message in between. The reply lands loose in the chat, with no visual link to the question. User thinks the bot is broken.

Meta documents the fix: you can quote the original message by passing context.message_id in the send payload (Meta Cloud API send-messages). Shape:

{
  "messaging_product": "whatsapp",
  "to": "971...",
  "context": { "message_id": "wamid.HBgM..." },
  "type": "text",
  "text": { "body": "Reply to your question from 4s ago" }
}

Caveat: only works for messages aged under 30 days — past that Meta drops the context silently. Not relevant for live bot flows (you reply in seconds), worth noting anyway.

Why the agent forgets: it treats each send_message as independent. It has no mental model of "this reply continues that specific question". You have to spell it out: "always include context.message_id of the original user message when responding".
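One way to make the agent (and yourself) stop forgetting is to route every text send through a single builder that takes the original message ID as a parameter. A minimal sketch, following the payload shape above. buildTextReply and TextReply are hypothetical names:

```typescript
interface TextReply {
  messaging_product: "whatsapp";
  to: string;
  context?: { message_id: string };
  type: "text";
  text: { body: string };
}

// Builds a text payload. When originalMessageId is given, the reply is
// rendered as a quote of that message in the chat.
function buildTextReply(to: string, body: string, originalMessageId?: string): TextReply {
  const reply: TextReply = {
    messaging_product: "whatsapp",
    to,
    type: "text",
    text: { body },
  };
  if (originalMessageId) {
    reply.context = { message_id: originalMessageId };
  }
  return reply;
}
```

The handler passes the inbound messages[].id straight through, so a slow external API call no longer detaches the answer from its question.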

7. FCM token rotation (if your bot pushes to a companion app)

The bug: WhatsApp bot fires a push notification to a Flutter app via FCM. App stays inactive for 60 days, Firebase rotates the token. Bot keeps pushing to the old token. Gets messaging/registration-token-not-registered. Vibe-coded handler ignores it. Push goes silent.

This is specific to WhatsApp-first architectures with a companion app — exactly OverAir's case. Firebase documents the expiry: FCM tokens can rotate at any time (Firebase Cloud Messaging best practices). Lovable doesn't write the rotation loop — it assumes the token in the DB is the current one forever.

The hardening:

Save fcm_token with updatedAt.
On app open, always call messaging.getToken() and upsert it backend-side.
On send: catch messaging/registration-token-not-registered, mark the token invalid.
Daily job purges tokens marked invalid >7 days.

Without this, in 6 months 40% of your FCM tokens are dead and you don't know.
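The send-side half of that list, sketched under a few assumptions: send is a hypothetical wrapper around your FCM client that rejects with an error carrying a code string, TokenRecord is a hypothetical row shape, and timestamps are plain epoch milliseconds:

```typescript
interface TokenRecord {
  userId: string;
  token: string;
  updatedAt: number;      // epoch ms of the last upsert from the app
  invalidSince?: number;  // epoch ms when FCM first reported the token as dead
}

const NOT_REGISTERED = "messaging/registration-token-not-registered";
const SEVEN_DAYS_MS = 7 * 24 * 60 * 60 * 1000;

// Returns true if the push went out, false if the token was marked invalid.
async function pushOrMarkInvalid(
  record: TokenRecord,
  send: (token: string) => Promise<void>,
  now: number,
): Promise<boolean> {
  try {
    await send(record.token);
    return true;
  } catch (err) {
    const code = (err as { code?: string } | null)?.code;
    if (code === NOT_REGISTERED) {
      record.invalidSince = record.invalidSince ?? now; // keep the first failure time
      return false;
    }
    throw err; // anything else is a real failure and should surface
  }
}

// Daily job: drop tokens that have been invalid for more than 7 days.
function purgeInvalid(records: TokenRecord[], now: number): TokenRecord[] {
  return records.filter(
    (r) => r.invalidSince === undefined || now - r.invalidSince <= SEVEN_DAYS_MS,
  );
}
```

The app-open upsert would clear invalidSince when a fresh token arrives, which is why the purge waits a few days instead of deleting on the first error.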

8. Observability — or: you only find out via the customer

The bug: Vibe-coded bot logs with console.log to stdout. Lovable doesn't plug it into anything. When production breaks, you only know because the customer complains 6 hours later.

Not an exaggeration — I lived it: a client bot broke at 11pm on a Friday because Meta rotated an internal endpoint. I only found out Monday at 9am, via the client's own WhatsApp message complaining "the bot has been silent since Friday". 48 hours of downtime in production, zero alerts.

The hardening: minimum viable observability for a WhatsApp bot:

Structured JSON logs, not free-form console.log. Each entry has event, message_id, wa_user, latency_ms, error_code.
Sentry in the error handler — free tier covers 5k events/month (Sentry pricing), enough for any bot under 500 customers.
Health-check endpoint UptimeRobot pings every 5min (free). If it drops, SMS to your phone.
Custom counters for webhook_received, webhook_processed, whatsapp_send_failed. A gap between received and processed = symptom. Without those counters, you're flying blind.

Most vibe-coded bots I've audited in production run with none of the 4 above. It's not lack of time — it's lack of knowing it's needed.
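For the first and last items of that list, a bare-bones sketch. The field names come from the list above. BotLogEntry, formatLog, and WebhookCounters are hypothetical names, and the counters here live in process memory, so a real setup would export them to whatever metrics backend you use:

```typescript
interface BotLogEntry {
  event: string;
  message_id: string;
  wa_user: string;
  latency_ms: number;
  error_code: string | null;
}

// One JSON object per line, so a log aggregator can parse and filter it.
function formatLog(entry: BotLogEntry): string {
  return JSON.stringify(entry);
}

class WebhookCounters {
  received = 0;
  processed = 0;
  sendFailed = 0;

  // Webhooks that arrived but never finished processing.
  gap(): number {
    return this.received - this.processed;
  }
}
```

A handler would increment received on entry and processed on success, then print formatLog(...) once per webhook. Alerting on a gap that keeps growing catches the "silent since Friday" case without waiting for the customer.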

The summary table (print and pin)

#	Item	How vibe-coded breaks	Minimum hardening	Cost of the bug
1	Idempotency by `message_id`	Processes 2× on ~0.5% of webhooks	Transactional dedupe with wamid as PK	USD 15/Stripe chargeback
2	Media URLs expire in 5min	Audio/images lost at peak	Download within 60s, queue storage URL	~15% media loss at peak
3	Button race conditions	Two taps = two actions	State transaction + item 1 dedupe	Duplicate approvals
4	4,096-char body limit	Long message crashes, fails silently	Chunk at 4,000 + 800ms inter-chunk delay	Reply just disappears
5	Sandbox vs production	Wrong `phone_number_id` shipped	Smoke check in entrypoint, validate env	Real customer gets nothing
6	`context.message_id` on reply	Loose reply confuses user	Always include original question's context	UX broken, user abandons flow
7	FCM token rotation	Push goes silent after 60 days inactive	Upsert on `getToken()` + cleanup of invalids	40% dead tokens in 6 months
8	Observability	You find out from the customer	JSON logs + Sentry + healthcheck + counters	48h of silent downtime

What the hardening costs: 3–5 days or USD 1,500–4,000

I'm specific on purpose. For a vibe-coded bot already in production, hardening all 8 items takes 3 to 5 days of senior dev work if the codebase isn't a mess. Senior freelance dev in the US/UAE bracket runs USD 500–800 per day in 2026 (lower in LatAm — at Hens we run R$ 800–1,200/day in São Paulo, roughly USD 160–240). Total: between USD 1,500 and USD 4,000 depending on shop.

Compare with the cost of not doing it:

5 chargebacks/month × USD 15 = USD 75/month gone.
1 frustrated customer per week from a silent bot = ~USD 200 LTV lost each.
48h of quarterly downtime = average refund of USD 400 over a 200-customer base.

The hardening pays for itself in 3 months. Over 12 months without it, you'll spend 3× more in reactive remediation.
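The break-even arithmetic, taking the three figures above at face value. The conversion of "one customer per week" to a monthly number (52 weeks over 12 months) and of the quarterly refund to a monthly number are my assumptions:

```typescript
// Monthly cost of NOT hardening, in USD.
const chargebacksPerMonth = 5 * 15;     // 5 chargebacks at USD 15 each
const churnPerMonth = (200 * 52) / 12;  // one lost customer per week at USD 200 LTV
const downtimePerMonth = 400 / 3;       // USD 400 refund per quarter
const monthlyLoss = chargebacksPerMonth + churnPerMonth + downtimePerMonth;

// Months until a one-off hardening cost is recovered.
function breakEvenMonths(hardeningCost: number): number {
  return hardeningCost / monthlyLoss;
}

const yearlyLoss = monthlyLoss * 12;
```

That works out to about USD 1,075 per month. A USD 1,500 engagement breaks even in roughly 1.4 months and a USD 4,000 one in roughly 3.7, and twelve months of losses (about USD 12,900) is a little over 3× the upper estimate.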

What I'd avoid, with conviction

I would not ship a vibe-coded WhatsApp bot to paying customers in production without this checklist. For an MVP, for a demo, for a 5-friend beta? Fine: Lovable gives you something talking in 2 days. For charging a monthly subscription to a customer who depends on it for their patient bookings, their delivery orders, their payment confirmations? Expensive mistake.

The coding agent doesn't have a production mental model. It has a demo mental model. They're different things. If your bot is going to production, either you walk through these 8 items by hand or you hire someone who has done it three times, which is what I do for Hens clients and what I did on OverAir before opening it to beta.

If you want an audit of a WhatsApp bot in production, reach out via Hens WhatsApp. Flat fee USD 1,200 for the 8-item audit plus an actionable PDF report. No upsell.

Sources

Meta — WhatsApp Cloud API Webhooks — at-least-once delivery, payload structure
Meta — Media Download API — 5-minute URL expiry
Meta — Send Messages (context.message_id) — contextual replies
Meta — Cloud API Messages Reference — 4,096-char body limit
Hookdeck — Guide to WhatsApp Webhooks — idempotency as requirement, not edge case
Chatarmin — WhatsApp Messaging Limits 2026 — tiers, 80 mps throughput, 1,000 mps upgrade path
Symphony — WhatsApp character limitations — 4,096 confirmation
Firebase — FCM token management — token rotation and invalid-token handling
Stripe — Disputes & chargebacks — USD 15 dispute fee
Sentry — Pricing — 5k events/month free tier
TheNextWeb — Lovable security crisis 2026 — 91.5% of vibe-coded apps had at least one AI-hallucination flaw in Q1/2026