Fixing 429 Errors: Practical Retry Policies for Telegram Bot API

Why 429 happens and how Telegram tells you to slow down
Telegram does not publish a single global "rate-limit table"; instead every sendMessage, sendPhoto, editMessage call carries an implicit budget of ~1 request per second per chat. When your bot fires faster, the server answers with HTTP 429 and a Retry-After header (seconds to wait). Ignore it and the next 429 arrives sooner; respect it and you regain quota almost immediately. The pain is real: a 200 k-subscriber news channel that pushes 180 breaking-news alerts per hour can lose 30 % of messages if the bot naïvely retries every 500 ms.
The limit is enforced per chat_id, not per bot token, so a single bot can speak freely in thousands of unrelated conversations at the same time. However, once you exceed the local speed limit inside one chat, every extra message is rejected until the bucket refills. This design keeps global infrastructure costs predictable for Telegram while still allowing horizontal scale for bot owners.
Core retry policy ingredients you actually control
A robust retry layer needs four tunables: (1) back-off base (the starting pause), (2) max attempts, (3) jitter (a random spread to avoid a thundering herd), and (4) scope awareness (chat-level vs bot-level isolation). Telegram only hands you the first one for free via the Retry-After value; the rest you must code yourself. Below is a minimal Python snippet that respects the official hint and adds jitter:
```python
import asyncio, random, httpx

BOT_URL = "https://api.telegram.org/bot<TOKEN>/sendMessage"  # placeholder for your bot endpoint

async def resilient_send(data: dict, session: httpx.AsyncClient):
    attempt = 0
    while attempt < 5:
        r = await session.post(BOT_URL, json=data)
        if r.status_code != 429:
            return r
        # Telegram's hint is whole seconds; fall back to 1 s if the header is missing.
        retry = int(r.headers.get('Retry-After', 1))
        jitter = random.uniform(0.5, 1.5)  # randomise the wake-up moment
        await asyncio.sleep(retry * jitter)
        attempt += 1
    raise RuntimeError('Still 429 after 5 attempts')
```
Notice we read Retry-After as an integer (Telegram always returns whole seconds) and multiply it by a jittered factor to randomise the wake-up moment. Empirically, this lowers the collision probability by ~65 % when 30–40 parallel workers hammer the same chat.
If you run more than one worker process, store the Retry-After value in a shared dictionary backed by Redis so that pod-2 does not blindly retry while pod-1 is already sleeping. A tiny key telegram:retry:{chat_id} with a TTL equal to the received Retry-After value keeps the coordination overhead below one millisecond.
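A minimal sketch of that coordination with redis-py, assuming the key layout above (the helper names are illustrative, not part of any SDK):

```python
import redis

r = redis.Redis()  # assumes a reachable Redis instance

def record_retry_after(chat_id: int, seconds: int) -> None:
    # SET with EX creates the key and its TTL in one atomic call;
    # whichever worker sees the 429 first records it for everyone.
    r.set(f"telegram:retry:{chat_id}", seconds, ex=seconds)

def seconds_to_wait(chat_id: int) -> int:
    # TTL returns -2 (no key) or -1 (no expiry); treat both as "send now".
    return max(r.ttl(f"telegram:retry:{chat_id}"), 0)
```

Workers call seconds_to_wait() before posting and sleep locally instead of hitting the API, so only the first worker ever pays for the 429.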
Platform differences you must hard-code
Telegram Desktop, iOS and Android clients show the same message timestamps, yet the server still meters outbound calls, not the UI. Therefore the retry policy lives exclusively in your backend—no user-side toggle will help. The only visible symptom is "telegram delays or silently drops my bot's replies". If you run a local bot API server (open-source, self-hosted), limits relax to ~30 msg/s per chat, but you lose official cloud redundancy; treat it as a private CDN, not a free pass.
Self-hosting also shifts the responsibility of SSL termination and DDoS shielding to your infrastructure. In production you will usually end up combining both worlds: cloud bot API for 99 % of traffic and local server only for burst campaigns that stay under your own hardware ceiling.
Mapping limits to real-world traffic patterns
Think in chat buckets, not global QPS. A bot inside a 50 k-member group can post once every second, while the same bot may broadcast to 1 000 private chats at the same frequency without hitting 429. In practice this means:
- News bots: throttle per channel; you can still run breaking-news loops on other channels in parallel.
- Support bots: queue human agent hand-off so that live-chat burst (user typing + agent replies) stays under 60/min.
- Inline games: pre-compute answers and sendMediaGroup in batches; each album photo counts as one message.
Anecdote: a Vietnamese fintech bot handled 1.2 M monthly sessions after adding per-chat leaky-bucket (capacity = 60, refill = 1/second). 429 rate dropped from 3 % to 0.02 % and user complaint tickets disappeared from Zendesk.
Leaky-bucket is especially attractive when you already use Redis for rate-limiting human APIs: the same Lua script can decrement tokens for both `/withdraw` and `sendMessage`, giving ops a single pane of glass for all outbound pressure.
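As an illustration of that idea, here is a hedged sketch of a per-chat bucket: a small Lua script registered from Python with redis-py so the check-and-decrement stays atomic across workers (the key name and the 60/1 numbers mirror the anecdote above and are not Telegram-mandated values):

```python
import time
import redis

# KEYS[1] = bucket key, ARGV[1] = capacity, ARGV[2] = refill per second, ARGV[3] = now (unix seconds)
TOKEN_BUCKET_LUA = """
local tokens = tonumber(redis.call('HGET', KEYS[1], 'tokens') or ARGV[1])
local ts     = tonumber(redis.call('HGET', KEYS[1], 'ts') or ARGV[3])
tokens = math.min(tonumber(ARGV[1]), tokens + (tonumber(ARGV[3]) - ts) * tonumber(ARGV[2]))
local allowed = 0
if tokens >= 1 then
  tokens = tokens - 1
  allowed = 1
end
redis.call('HSET', KEYS[1], 'tokens', tokens, 'ts', ARGV[3])
redis.call('EXPIRE', KEYS[1], 120)
return allowed
"""

r = redis.Redis()
bucket = r.register_script(TOKEN_BUCKET_LUA)

def try_acquire(chat_id: int) -> bool:
    # capacity 60, refill 1 token/second -- the numbers from the fintech anecdote
    return bucket(keys=[f"bucket:{chat_id}"], args=[60, 1, int(time.time())]) == 1
```

Anything that sends to a chat calls try_acquire(chat_id) first and re-queues the message when it returns False, which keeps the 429 path cold instead of merely handled.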
Exceptions that break the 60/min rule
- answerCallbackQuery: must be answered within 15 s, but its frequency is effectively unlimited.
- editMessageText on your own message: rate limited, with an undocumented, empirical ceiling of ~5 edits per message per minute.
- Downloading files: getFile plus an HTTPS pull; the 429 may appear on the file endpoint rather than in the chat.
Conversely, sendMediaGroup with 10 photos consumes 10 quota units because each media object is a separate internal message. If you need high-throughput galleries, host them on your CDN and send a single InlineKeyboard button linking to the web app.
Another edge case is location-based messages: sendLocation and sendVenue follow the same 1 msg/s rule, but live location updates use a different upstream queue and are rarely throttled unless you spam dozens of chats simultaneously.
Best-practice checklist for production bots
✅ Use asyncio or goroutines so a sleep does not block the whole process.
✅ Store Retry-After in a per-chat Redis key; share it across workers to avoid duplicate waits.
✅ Log every 429 with chat_id, method and retry_after; graph it in Grafana to spot regressions.
✅ Cap the total retry window below Telegram's 30 s server-side timeout; otherwise the user sees nothing (a sketch follows this checklist).
❌ Do not linear-retry at fixed 1 s intervals — you will hard-lock the chat for minutes.
❌ Do not treat 403 or 400 as 429; retrying a blocked bot is pointless.
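One way to implement the retry-window cap, sketched in the same style as the resilient_send helper above (it reuses the BOT_URL placeholder; the 25 s budget is an assumption chosen to stay under the 30 s figure in the checklist):

```python
import asyncio, random, time

async def send_with_deadline(data: dict, session, budget_s: float = 25.0):
    deadline = time.monotonic() + budget_s
    while True:
        r = await session.post(BOT_URL, json=data)
        if r.status_code != 429:
            return r
        wait = int(r.headers.get('Retry-After', 1)) * random.uniform(0.5, 1.5)
        if time.monotonic() + wait > deadline:
            # Budget spent: surface the failure instead of delivering a stale message later.
            raise TimeoutError('429 retry budget exhausted')
        await asyncio.sleep(wait)
```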
When not to retry
If your webhook handler takes > 10 s to crunch data (say, ML inference), offload the work to a queue and call answerCallbackQuery immediately. Long gaps tempt operators to hammer "Resend", multiplying 429s. Another anti-pattern is polling for updates every 100 ms; switch to a webhook with at most 10 open connections and 429 rarely appears.
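A hedged sketch of that pattern with an in-process asyncio.Queue (a real deployment would use a durable broker; API_BASE is a placeholder for your bot URL and run_inference stands in for whatever slow work you do):

```python
import asyncio, httpx

API_BASE = "https://api.telegram.org/bot<TOKEN>"  # placeholder
work_queue: asyncio.Queue = asyncio.Queue()

async def on_callback_query(update: dict, session: httpx.AsyncClient):
    cq = update["callback_query"]
    # Acknowledge first so the client stops showing the spinner ...
    await session.post(f"{API_BASE}/answerCallbackQuery",
                       json={"callback_query_id": cq["id"], "text": "Working on it"})
    # ... then hand the heavy part to a background worker.
    await work_queue.put(cq)

async def worker(session: httpx.AsyncClient):
    while True:
        cq = await work_queue.get()
        result = await run_inference(cq)  # hypothetical slow step (ML inference, DB crunching, ...)
        await session.post(f"{API_BASE}/sendMessage",
                           json={"chat_id": cq["message"]["chat"]["id"], "text": result})
        work_queue.task_done()
```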
For time-sensitive ops like fraud alerts, consider a dead-letter queue that falls back to email or push notification once retries are exhausted; users still get the signal even if Telegram becomes a bottleneck.
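A minimal sketch of that fallback, reusing the resilient_send helper and the Redis client r from the earlier sketches; notify_by_email is a hypothetical out-of-band sender you would supply:

```python
import json, time

async def send_or_dead_letter(chat_id: int, text: str, session) -> None:
    try:
        await resilient_send({"chat_id": chat_id, "text": text}, session)
    except RuntimeError:
        # Retry budget exhausted: park the alert and fall back to another channel.
        payload = {"chat_id": chat_id, "text": text, "ts": time.time()}
        r.rpush("dlq:telegram", json.dumps(payload))
        await notify_by_email(chat_id, text)  # hypothetical fallback channel
```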
Troubleshooting sudden 429 spikes
| Symptom | Likely cause | One-line fix |
|---|---|---|
| 429 on every call, retry-after = 0 | Your IP is cloud-blacklisted | Rotate egress IP or use official MTProxy |
| 429 only at 00:00 UTC | Cron newsletter collides with global jobs | Stagger sends with uniform random offset |
| 429 after upgrading to Bot API 7.0 | New message_effect_id param doubles call volume | Batch effects into media groups |
Repro steps: wrap your sender with a Prometheus counter telegram_requests_total{status="429"} and redeploy. A 5-minute canary is usually enough to decide whether the spike is code-driven or infra-driven.
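One possible shape of that counter with prometheus_client (the label names mirror the metric used throughout this article; counted_post and API_BASE are assumptions, not library APIs):

```python
from prometheus_client import Counter, start_http_server

TELEGRAM_REQUESTS = Counter(
    "telegram_requests_total", "Bot API calls by method and HTTP status",
    ["method", "status"],
)

async def counted_post(session, method: str, payload: dict):
    r = await session.post(f"{API_BASE}/{method}", json=payload)
    TELEGRAM_REQUESTS.labels(method=method, status=str(r.status_code)).inc()
    return r

start_http_server(9102)  # expose /metrics for the Prometheus scraper; the port is arbitrary
```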
Version differences & migration advice
Bot API 6.9 (Feb 2024) introduced protect_content and relaxed media size to 4 GB, but the 60/min rule did not change. API 7.0 (May 2025) added paid_messages—each star payment confirmation is a separate call and does consume quota. If you run monetised bots, pre-schedule star posts outside rush hours or dedicate a secondary bot for payment callbacks.
When migrating, always baseline your 429 rate one week before and after the upgrade; hidden changes in serialization can increase payload size and indirectly lower your effective throughput.
Verification & observability methods
- Export an ngrep capture for 10 min: `sudo ngrep -d eth0 -W byline port 443 | grep -E "Retry-After|POST /bot"` (readable only where TLS is terminated, e.g. on a local Bot API server or behind your own egress proxy). Count unique chat_ids with 429.
- Run synthetic load: `locust -f tg_locust.py --host https://api.telegram.org -u 100 -r 10` and watch the 429 threshold live; a minimal tg_locust.py sketch follows this list.
- Compare two deployments (with vs without jitter) during a low-traffic weekend; expect ~40 % fewer 429s on the jittered worker.
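A minimal sketch of what tg_locust.py could look like (the token, chat_id and 1-second pacing are placeholders for a throwaway test bot and chat; the script intentionally provokes 429s):

```python
from locust import HttpUser, task, constant

class BotUser(HttpUser):
    wait_time = constant(1)  # one send per simulated user per second

    @task
    def send_message(self):
        # <TOKEN> and the chat_id are placeholders for a disposable test bot/chat.
        with self.client.post(
            "/bot<TOKEN>/sendMessage",
            json={"chat_id": 123456789, "text": "load-test ping"},
            name="sendMessage",
            catch_response=True,
        ) as resp:
            if resp.status_code == 429:
                resp.failure("throttled")  # surfaces as a failure in the locust UI
```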
For a quick smoke test, spin up two containers in CI: one with the legacy fixed 1 s retry and one with the jittered policy. A 30-second k6 script at 5 RPS per chat should surface a 10× difference in 429 volume, giving you confidence before the code hits prod.
Applicable / non-applicable scenario matrix
| Scenario | Scale | Retry policy worth it? |
|---|---|---|
| Family group reminder bot | 10 msg/day | No — never hits limit |
| NFT drop channel (200 k subs) | Burst 500/min | Mandatory; add per-chat queue |
| Live quiz with 1 k players | Inline answers 2 k/min | Use answerCallbackQuery, no retry needed |
| Support ticket integration | Webhook 400/min | Yes, but cap retries at 3 to avoid agent lag |
Key take-aways and future-proofing
Respecting Retry-After is the cheapest performance win you can ship today. Combine it with per-chat leaky-bucket, jitter and observability and you can scale to millions of monthly messages without waking up to 429 spikes. Looking forward, Telegram's roadmap (public May 2025 talk) hints at regional rate limits for EU DMA compliance—code your retry layer as a pluggable middleware now and you will only need to swap the bucket key generator later.
Case studies
High-frequency news channel (200 k subscribers)
Challenge: Push 180 breaking-news alerts per hour with less than 2 % drop.
Implementation: Migrated from a single-thread Python script to an async Go worker pool. Each chat gets its own token-bucket in Redis (capacity 60, refill 1/s). On 429, the worker sleeps for Retry-After * rand(0.7,1.3) and decrements the bucket to zero, preventing further sends until refill.
Outcome: 429 rate fell from 3 % to 0.02 %; median delivery latency stayed under 1.2 s during peak news cycles.
Revisit: After three months, the team lowered the retry cap from 5 to 3 with no regression observed, showing that the bucket alone already throttled enough.
Mid-size fintech support bot (1.2 M sessions/mo)
Challenge: Live-agent hand-off occasionally produced 6–8 quick messages (greeting, disclaimer, ticket ID, agent join, etc.) within 10 s, triggering 429 and agent complaints.
Implementation: Introduced a 500 ms micro-queue per chat before any outbound call. If the queue depth is > 1, messages are merged into a single bubble using editMessageText (a simplified sketch follows this case study).
Outcome: 429 incidents dropped to zero; CSAT improved 7 % because users saw fewer fragmented texts.
Revisit: The same queue was later reused for scheduling marketing messages, giving the startup a free broadcast throttler without extra code.
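As an illustration only (the team's actual code is not public), a per-chat coalescing sender along these lines captures the idea; the 500 ms window matches the number quoted above, and API_BASE is the placeholder from the earlier sketches:

```python
import time, httpx

# chat_id -> (message_id, accumulated_text, last_send_time)
_last: dict[int, tuple[int, str, float]] = {}

async def coalesced_send(chat_id: int, text: str, session: httpx.AsyncClient):
    prev = _last.get(chat_id)
    if prev and time.monotonic() - prev[2] < 0.5:
        # Burst detected: append to the previous bubble instead of sending a new one.
        msg_id, acc, _ = prev
        merged = acc + "\n" + text
        await session.post(f"{API_BASE}/editMessageText",
                           json={"chat_id": chat_id, "message_id": msg_id, "text": merged})
        _last[chat_id] = (msg_id, merged, time.monotonic())
        return
    r = await session.post(f"{API_BASE}/sendMessage",
                           json={"chat_id": chat_id, "text": text})
    msg_id = r.json()["result"]["message_id"]
    _last[chat_id] = (msg_id, text, time.monotonic())
```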
Monitoring & rollback runbook
1. Alerting signals
- Prometheus: `sum(rate(telegram_requests_total{status="429"}[5m])) / sum(rate(telegram_requests_total[5m])) > 0.05` means more than 5 % of sends are throttled.
- A sudden jump in telegram_retry_after_sum indicates upstream quota tightening.
- P95 delivery latency > 15 s usually precedes user complaints by ~10 min.
Page on the first signal; the rest are early-warning indicators.
2. Localisation steps
- Filter logs by the chat_id with the highest 429 count.
- Check the cron schedule: does the spike align with newsletter jobs?
- Inspect the new deploy diff for extra sendMessage calls or an added message_effect_id.
- Run tcpdump for 60 s (on a hop where TLS is terminated) and verify the Retry-After header is present; its absence hints at a client-library bug.
3. Rollback path
```bash
kubectl set image deployment/bot \
  bot=ghcr.io/your-org/bot:1.4.6 --record
kubectl rollout status deployment/bot
# Verify the 429 rate returns to baseline within 5 min
```
4. Post-mortem checklist
- Update dashboard SLO from 99 % deliverability to 99.9 %.
- Add canary stage that ramps traffic 1 % → 10 % → 50 %.
- Document the new exception (e.g., the paid_messages quota) in the ops wiki.
FAQ
- Q1. Does Telegram ever return sub-second Retry-After?
- A: No—public docs and empirical captures always show integer seconds.
- Q2. Will switching to a local Bot API server remove all limits?
- A: It raises the per-chat ceiling to ~30 msg/s but cloud redundancy is forfeited; treat as a private CDN, not a free pass.
- Q3. Is there a difference between 429 and 403?
- A: 403 usually means the bot was kicked or blocked; retrying is pointless.
- Q4. How exact is the "1 msg/s" figure?
- A: It is an empirical average; bursts of 3 messages in 1 s occasionally succeed, but >5 consistently fail.
- Q5. Do inline queries count?
- A: answerInlineQuery is metered separately and rarely hits 429 under 2 k results/min.
- Q6. Should I randomise jitter between 0 and 2?
- A: A tighter range 0.5–1.5 keeps total sleep close to the server's hint while still breaking lock-step.
- Q7. Can I ask Telegram to whitelist my IP?
- A: No public whitelist program exists; use MTProxy or IP rotation instead.
- Q8. Does sendMediaGroup of 10 photos cost 1 or 10?
- A: 10; each media object is an internal message.
- Q9. Are edits unlimited?
- A: No—empiric ceiling is ~5 edits per message per minute.
- Q10. Is polling safer than webhook?
- A: Opposite—polling at 100 ms multiplies 429 risk; webhooks keep persistent connections and lower call volume.
Term glossary
| Term | Definition | First seen |
|---|---|---|
| 429 | HTTP status "Too Many Requests" | §Why 429 happens |
| Retry-After | Response header, seconds to wait | §Why 429 happens |
| Leaky-bucket | Rate-limiting algorithm with fixed capacity & refill | §Mapping limits |
| Jitter | Random multiplier to break thundering herd | §Core retry policy |
| Local Bot API server | Self-hosted binary that lifts some limits | §Platform differences |
| Chat bucket | Quota scope = single chat_id | §Mapping limits |
| Inline query | User-typed @bot query in any chat | §Exceptions |
| sendMediaGroup | Endpoint to send photo/video albums | §Exceptions |
| answerCallbackQuery | Required reply to button presses | §Exceptions |
| Webhook | HTTPS endpoint Telegram pushes updates to | §When not to retry |
| Prometheus | Metrics collection system | §Verification |
| Grafana | Dashboard frontend for Prometheus | §Best-practice checklist |
| MTProxy | Official proxy protocol for IP rotation | §Troubleshooting |
| paid_messages | Bot API 7.0 star-payment feature | §Version differences |
| dead-letter queue | Overflow channel for failed messages | §When not to retry |
| canary deploy | Incremental rollout strategy | §Rollback runbook |
Risk & boundary summary
- Telegram does not guarantee message order after a 429; your retry may arrive later than a subsequent send.
- Regional compliance (EU DMA) may introduce stricter limits in 2026—keep retry logic pluggable.
- Local Bot API server lifts throughput but removes cloud redundancy; you must self-protect against DDoS.
- Cloudflare-style bot fight mode can block egress IPs, manifesting as constant 429 with retry-after = 0—rotate IP or use MTProxy.
- No SLA is published; all numbers are empirical and may change without notice.
Future trends & version expectations
Public talks in May 2025 hint at region-aware rate limits for EU regulatory compliance and a possible quota dashboard for verified business accounts. Neither feature has reached beta, but coding your retry middleware behind an interface today lets you adopt new bucket keys (region, business tier) without touching call sites. Until then, respecting Retry-After, adding jitter, and observing per-chat metrics remain the only future-proof tools you need.