"Automating Instagram publishing with AI: the technical pipeline from $0 to $40/month"

Short answer, before the long tutorial: you can automate Instagram publishing with AI for $0/month up to ~30 posts/month (Cloud Functions + Firestore + Gemini free tier), for $10–30/month up to ~500 posts (n8n self-hosted + Gemini + generated imagery), or for $60+/month once you stack Buffer on top of a custom pipeline just for the AI part. The cost isn't the scheduling — that's free on Meta's Graph API. The cost is image generation and human review, and that's where almost everyone gets the math wrong.

I'm Ulisses, founder of Hens. I built OverAir and Studio Kallos, and I've shipped a content-automation pipeline for a client who needed to feed Instagram without a social media hire. This post is the end-to-end technical pipeline: the 4 stages, the 3 stacks with pricing, and the one mistake that cut organic reach by roughly 70% on a network's secondary accounts before I figured out what was happening.

If you're a creator, an agency, or an e-commerce brand posting by hand every day, this is the playbook I'd follow today.

The pipeline is always 4 stages

Doesn't matter which stack. Instagram auto-publishing that doesn't turn into spam has four stages, always in this order:

Source: where the raw content comes from (a feed scrape, a spreadsheet, or manual input into a queue).
AI generates: caption + image. Gemini for text; Gemini 2.5 Flash Image (or its successor, gemini-3.1-flash-image-preview) for art.
Human review: a queue where someone approves before it ships. Skipping this stage is what turns automation into a brand disaster.
Publish: schedule and push via the Instagram Graph API.

The most common mistake is treating this as a 3-stage pipeline and cutting the human review. Don't cut it. I'll explain why below, with the invented-discount caption that went live on a client's account.

The 3 stacks, side by side

	$0 stack	$10–30 stack	$60+ stack
Volume	up to ~30 posts/mo	100–500 posts/mo	larger / multi-account
Orchestration	Cloud Functions + Firestore	n8n self-hosted (VPS)	Buffer/Later + custom pipeline
AI text	Gemini free tier	Gemini paid Tier 1	Gemini paid
AI image	Gemini free / Canva by hand	Gemini 2.5 Flash Image	Gemini Flash Image
Scheduling	Graph API direct	Graph API via n8n	Buffer does it
Real cost/mo	$0	$10–30	$60–120
When it makes sense	1 account, low volume	agency, several accounts	wants ready UI + analytics

The column that fools people is the last one. Buffer charges $6 per channel/month on Essentials (monthly billing; $5 annual) and $12/channel on Team (Buffer pricing, 2026). Five accounts on Team is already $60/month for scheduling alone — and Buffer doesn't generate decent images or AI captions. You still pay Gemini on the side. That's why the $60+ stack is Buffer for the boring part (UI, calendar, analytics) + a Hens pipeline just for the AI.

Why the Graph API, not a bot that clicks

Before any code: forget Selenium, Puppeteer hitting instagram.com, or any tool that "logs in and clicks" your account. Meta detects unofficial automation and bans the account. The supported path is the Instagram Graph API with Content Publishing.

Prerequisites nobody warns you about until you've lost an afternoon:

A Professional Instagram account (Business or Creator), not personal.
For Facebook login: a linked Facebook Page plus the instagram_basic, instagram_content_publish, and pages_read_engagement permissions.
For direct Instagram login: the instagram_business_basic and instagram_business_content_publish permissions (Meta Content Publishing docs, 2026).

The rate limit is the thing that shapes your architecture: 100 API-published posts per rolling 24h window per account (Meta docs, 2026). A carousel counts as 1 post, not 10. You check current usage at GET /<IG_ID>/content_publishing_limit. For 99% of creators, 100/day is a ceiling you'll never touch — but if you run a multi-account agency, the limit is per account, not in aggregate.
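A minimal sketch of a pre-publish quota check. It assumes you query the endpoint with fields=quota_usage,config and that the usage comes back under data[0]; that response shape, the helper name remaining_publish_quota, and the fallback total of 100 are my assumptions, so verify them against the current Meta docs before relying on this.

```python
from typing import Any


def remaining_publish_quota(payload: dict[str, Any], default_total: int = 100) -> int:
    """Return how many API-published posts are left in the rolling 24h window."""
    rows = payload.get("data") or [{}]
    row = rows[0]
    used = int(row.get("quota_usage", 0))
    # Fall back to the documented ceiling if the config block is missing.
    total = int((row.get("config") or {}).get("quota_total", default_total))
    return max(total - used, 0)


# Payload shaped like the ASSUMED response of
# GET /<IG_ID>/content_publishing_limit?fields=quota_usage,config
sample = {"data": [{"quota_usage": 97, "config": {"quota_total": 100, "quota_duration": 86400}}]}
left = remaining_publish_quota(sample)
can_publish_carousel = left >= 1  # a carousel counts as one post
```

Run the check once per account right before Stage 4, since the limit is tracked per account.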

Stage 1 — the source

The source is the dumbest part and the one people overcomplicate. Three options, simplest to most fragile:

Manual input into a queue (Firestore, Airtable, a Google Sheet). You drop the raw idea, the pipeline does the rest. This is the one I'd start with.
Scraping your own RSS feed or blog — turn a blog post into an Instagram post.
Scraping other accounts — don't. Reposting third-party content is the fastest route to a shadowban and a copyright strike.

In the $0 stack, the queue is a content_queue collection in Firestore. Each doc is { status: 'draft' | 'approved' | 'published', rawIdea, generatedCaption, imageUrl }. That simple.
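Here is one way to pin that document shape down in code, as a sketch. The field names mirror the ones above (camelCase kept on purpose so the dict matches the Firestore doc); the QueueItem class and to_doc helper are hypothetical names, not part of any SDK.

```python
from dataclasses import asdict, dataclass
from typing import Literal, Optional

Status = Literal["draft", "approved", "published"]


@dataclass
class QueueItem:
    """One document in the content_queue collection."""
    rawIdea: str
    status: Status = "draft"  # every new item starts as a draft
    generatedCaption: Optional[str] = None  # filled in by Stage 2
    imageUrl: Optional[str] = None  # filled in by Stage 2

    def to_doc(self) -> dict:
        """Plain dict, ready to be written as a content_queue document."""
        return asdict(self)


item = QueueItem(rawIdea="Behind the scenes of the new roast")
doc = item.to_doc()
```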

Stage 2 — AI generates caption + image

This is where the real cost lives. Let's split text from image, because the prices are orders of magnitude apart.

Caption (text). Gemini 2.5 Flash / Gemini 3 Flash on the free tier gives you 10 requests per minute, 250k tokens per minute, and 1,500 requests per day (Gemini API rate limits, 2026). Limits are per project, not per key — spinning up 5 keys multiplies nothing, and the daily quota resets at midnight Pacific. To generate 30 captions/month, the free tier is overkill. You never pay for text at low volume.

The caption prompt I use (trimmed):

You're the social media manager for a [niche] brand.
Tone: [direct / playful / technical]. No emoji overload.
From this raw idea: "{rawIdea}"
Produce: (1) a caption under 125 chars before the "more" fold,
(2) 5 niche hashtags, nothing generic like #love,
(3) a 1-line CTA. Return JSON.
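A sketch of how that prompt can be filled in and how the reply can be checked before it lands in the queue. The prompt only says "Return JSON", so the key names caption, hashtags, and cta are my assumption, and the generic-hashtag list is illustrative. No model call is made here; you pass in whatever text the model returned.

```python
import json

PROMPT_TEMPLATE = (
    "You're the social media manager for a {niche} brand.\n"
    "Tone: {tone}. No emoji overload.\n"
    'From this raw idea: "{raw_idea}"\n'
    'Produce: (1) a caption under 125 chars before the "more" fold,\n'
    "(2) 5 niche hashtags, nothing generic like #love,\n"
    "(3) a 1-line CTA. Return JSON."
)

GENERIC_HASHTAGS = {"#love", "#instagood", "#photooftheday"}  # illustrative list


def build_prompt(niche: str, tone: str, raw_idea: str) -> str:
    return PROMPT_TEMPLATE.format(niche=niche, tone=tone, raw_idea=raw_idea)


def parse_caption_reply(reply_text: str) -> dict:
    """Parse the model's JSON and reject replies that break the prompt's rules."""
    data = json.loads(reply_text)
    caption, hashtags, cta = data["caption"], data["hashtags"], data["cta"]
    if len(caption) >= 125:
        raise ValueError("caption must stay under 125 characters")
    if len(hashtags) != 5 or GENERIC_HASHTAGS & {tag.lower() for tag in hashtags}:
        raise ValueError("expected 5 non-generic hashtags")
    if "\n" in cta.strip():
        raise ValueError("CTA must be a single line")
    return {"caption": caption, "hashtags": hashtags, "cta": cta}
```

A reply that fails validation can simply be regenerated; anything that passes still goes to the queue as a draft.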

Image. Here's where the bill climbs. Gemini 2.5 Flash Image ("Nano Banana") costs $0.039 per 1024×1024 image, or $0.0195 via the Batch API (Gemini image pricing, 2026). Important detail: Google is shutting down 2.5 Flash Image on October 2, 2026, and the recommended replacement, gemini-3.1-flash-image-preview, costs $0.067 per image, about 72% more. If you're building a pipeline meant to last, price the new model in now.

For 100 posts/month with generated imagery: 100 × $0.067 = $6.70/month. It's the priciest line in the mid-volume stack, and that's exactly why the stack exists. At low volume, make the art in Canva by hand and stay at $0.
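The same arithmetic as a small sketch, so you can plug in your own volume. The per-image prices are the ones quoted above; the dictionary labels and the function name are mine.

```python
# Per-image prices as quoted in this post (USD).
PRICE_PER_IMAGE = {
    "gemini-2.5-flash-image": 0.039,
    "gemini-2.5-flash-image (batch)": 0.0195,
    "gemini-3.1-flash-image-preview": 0.067,
}


def monthly_image_cost(posts_per_month: int, model: str) -> float:
    """Image spend for one generated image per post, rounded to cents."""
    return round(posts_per_month * PRICE_PER_IMAGE[model], 2)


costs_100 = {model: monthly_image_cost(100, model) for model in PRICE_PER_IMAGE}

# How much more the replacement model costs per image than 2.5 Flash Image.
ratio = PRICE_PER_IMAGE["gemini-3.1-flash-image-preview"] / PRICE_PER_IMAGE["gemini-2.5-flash-image"]
```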

Stage 3 — the human review queue (don't skip)

This is the stage that separates professional automation from an embarrassment factory. Sooner or later the AI will generate a caption with a wrong fact, an invented number, or an image with mangled text. In production, that becomes a screenshot on X.

The pattern is simple: Stage 2 writes the post with status: 'draft'. A human opens a screen (or gets pinged on WhatsApp/Slack), approves or edits, and only then does the status flip to approved. Stage 4 only touches approved.
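A minimal sketch of that gate, using plain dicts in place of Firestore documents. The function names and the transition table are hypothetical; the point is that the only road to published runs through approved.

```python
# Allowed status moves: a draft must be approved before it can be published.
ALLOWED_TRANSITIONS = {
    "draft": {"approved"},
    "approved": {"published"},
    "published": set(),
}


def transition(post: dict, new_status: str) -> dict:
    """Return a copy of the post with the new status, or raise if the move is not allowed."""
    current = post["status"]
    if new_status not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"cannot move post from {current!r} to {new_status!r}")
    return {**post, "status": new_status}


def publishable(queue: list[dict]) -> list[dict]:
    """Stage 4 only ever sees posts a human has approved."""
    return [post for post in queue if post["status"] == "approved"]
```

With this in place, a bug in Stage 2 can at worst produce a bad draft, never a bad post.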

On a content pipeline I built for a client, the first version didn't have that queue: the AI published directly. It lasted a week. The AI generated a caption claiming a discount that didn't exist, it went live at 7am, and by 9am people in the DMs were demanding that price. I had to add the review queue on a Sunday. Cost of the shortcut: a Sunday spent rebuilding architecture and an uncomfortable call with the client. I've never shipped a pipeline without human review in the middle since.

Stage 4 — schedule and publish via Graph API

Publishing a feed image on the Graph API is two steps:

# 1. Create the container
POST /<IG_ID>/media
  ?image_url=https://your-cdn.com/art.jpg
  &caption=...
  &access_token=<TOKEN>
→ returns { "id": "<creation_id>" }

# 2. Publish the container, passing the id from step 1
POST /<IG_ID>/media_publish
  ?creation_id=<creation_id>
  &access_token=<TOKEN>
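The two calls above, sketched in Python. The HTTP transport is passed in so the logic stays testable; the Graph API version in GRAPH_BASE is an assumption (pin whichever version your app uses), and publish_image and PostFn are hypothetical names.

```python
from typing import Callable

# Assumption: pin whichever Graph API version your app is registered for.
GRAPH_BASE = "https://graph.facebook.com/v21.0"

# A transport takes (url, params) and returns the decoded JSON response.
PostFn = Callable[[str, dict], dict]


def publish_image(ig_id: str, image_url: str, caption: str,
                  access_token: str, post: PostFn) -> str:
    """Create a media container, then publish it. Returns the published media ID."""
    # Step 1: create the container; its "id" is the creation_id for step 2.
    container = post(
        f"{GRAPH_BASE}/{ig_id}/media",
        {"image_url": image_url, "caption": caption, "access_token": access_token},
    )
    # Step 2: publish the container.
    published = post(
        f"{GRAPH_BASE}/{ig_id}/media_publish",
        {"creation_id": container["id"], "access_token": access_token},
    )
    return published["id"]
```

With the requests library, the transport could be as small as `lambda url, params: requests.post(url, params=params, timeout=30).json()`, plus whatever error handling you want around it.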

A carousel is the same, except you create one container per image (up to 10 children), then a parent container with media_type=CAROUSEL listing the child IDs, and publish the parent. Heads up: the image_url has to be a public URL Meta can download. A private bucket won't work — Meta pulls the image from your server, it doesn't accept an upload.
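A sketch of the request parameters for a carousel. To my knowledge each child container is flagged with is_carousel_item=true and the parent lists the child IDs in a comma-separated children parameter; check both names against the current Content Publishing docs. The scheme check is only a cheap sanity test: it catches things like gs:// bucket paths, but it cannot prove the URL is publicly reachable.

```python
from urllib.parse import urlparse

MAX_CHILDREN = 10


def child_params(image_urls: list[str]) -> list[dict]:
    """One parameter set per child container (each goes to POST /<IG_ID>/media)."""
    if not image_urls or len(image_urls) > MAX_CHILDREN:
        raise ValueError(f"expected between 1 and {MAX_CHILDREN} image URLs")
    for url in image_urls:
        if urlparse(url).scheme not in ("http", "https"):
            raise ValueError(f"not a URL Meta can download: {url}")
    return [{"image_url": url, "is_carousel_item": "true"} for url in image_urls]


def parent_params(child_ids: list[str], caption: str) -> dict:
    """Parameters for the parent container, created after all children exist."""
    return {
        "media_type": "CAROUSEL",
        "children": ",".join(child_ids),
        "caption": caption,
    }
```

You create the children first, collect the IDs they return, create the parent, and publish only the parent.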

The "scheduling" in the $0 stack is just a Cloud Scheduler job firing the publish function at the set time and reading the queue. You don't need Buffer for that.
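What the scheduled function does on each run can be sketched like this. The scheduledAt field is hypothetical (it is not part of the queue document described earlier), and plain dicts stand in for Firestore documents.

```python
from datetime import datetime, timezone


def due_posts(queue: list[dict], now: datetime) -> list[dict]:
    """Approved posts whose scheduled time has passed, oldest first."""
    due = [
        post for post in queue
        if post["status"] == "approved" and post["scheduledAt"] <= now
    ]
    return sorted(due, key=lambda post: post["scheduledAt"])


now = datetime(2026, 3, 1, 12, 0, tzinfo=timezone.utc)
queue = [
    {"id": "a", "status": "approved", "scheduledAt": datetime(2026, 3, 1, 11, 0, tzinfo=timezone.utc)},
    {"id": "b", "status": "draft", "scheduledAt": datetime(2026, 3, 1, 10, 0, tzinfo=timezone.utc)},
    {"id": "c", "status": "approved", "scheduledAt": datetime(2026, 3, 1, 13, 0, tzinfo=timezone.utc)},
]
ready = [post["id"] for post in due_posts(queue, now)]
```

Only "a" is picked up here: "b" is past due but still a draft, and "c" is approved but not due yet.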

Reels: the async case that catches everyone

Reels don't publish instantly. The flow is:

Create a container with media_type=REELS and video_url. For large video, use upload_type=resumable and push the file to the rupload.facebook.com endpoint.
Meta processes the video asynchronously — it takes 60 to 180 seconds.
You poll GET /<IG_CONTAINER_ID>?fields=status_code and wait. The states are IN_PROGRESS, FINISHED, ERROR, EXPIRED (Meta Content Publishing docs, 2026). Meta itself recommends polling once per minute, for up to 5 minutes.
Only after FINISHED do you call media_publish.

If your code calls media_publish right after creating the Reels container, it fails. I burned a couple of hours on this because the image handler worked instantly and I assumed Reels would be the same. It's not. Reels needs a state machine with polling.
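A minimal sketch of that polling loop. The status strings are the ones listed above, and the defaults follow the once-per-minute, up-to-5-minutes recommendation; the function name is hypothetical, and get_status stands for whatever code of yours performs the status_code request.

```python
import time
from typing import Callable


def wait_for_container(get_status: Callable[[], str],
                       interval_s: float = 60.0,
                       max_wait_s: float = 300.0,
                       sleep: Callable[[float], None] = time.sleep) -> None:
    """Block until the container is FINISHED; raise on failure or timeout."""
    waited = 0.0
    while True:
        status = get_status()
        if status == "FINISHED":
            return  # safe to call media_publish now
        if status in ("ERROR", "EXPIRED"):
            raise RuntimeError(f"container cannot be published: {status}")
        if waited >= max_wait_s:
            raise TimeoutError(f"container still {status} after {waited:.0f}s")
        sleep(interval_s)
        waited += interval_s
```

In a Cloud Function you may prefer to store the container ID and re-check it on the next scheduled run instead of sleeping inside one invocation; the state handling stays the same.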

The mistake that kills you: posting the same art across accounts

This is the expensive mistake, and the one an AI assistant won't warn you about, because nobody documents it in the happy-path tutorials.

When I built the multi-account pipeline, the first version generated one piece of art and posted it identically across 4 accounts in the same network. Within about 3 weeks, organic reach cratered — it looked like a ~70% drop on the secondary accounts. It was a silent shadowban. Instagram's algorithm detects the digital fingerprint of the file and suppresses identical content reposted across accounts — it's named the #1 cause of shadowban in 2026 (Socialync, 2026). Recovery takes 7 to 14 days after you fix the cause (SocialzAI, 2026).

The technical fix turns the AI against the problem: instead of posting the same file, the pipeline generates small variations per account. Gemini Vision takes the base art and produces a variation — a different crop, a slightly shifted background color, a repositioned element. The fingerprint changes, the brand stays consistent, the shadowban doesn't fire. Captions have to vary across accounts too; an identical caption is the second trigger.

Let me be blunt: if you only have one account, this problem doesn't exist and you don't need to generate any variation. Premature optimization here just burns image tokens. The variation only pays off once you go multi-account.
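One way to plan those variations, as a sketch. It only builds a distinct edit instruction per account to send along with the base art; the instruction wording, the CROPS and TINTS lists, and the function name are all hypothetical, and whether a given variation is enough to avoid suppression is something you would have to confirm on your own accounts.

```python
from itertools import product
from typing import Optional

# Illustrative edit options; every (crop, tint) pair is a distinct variation.
CROPS = ["tighter crop on the subject", "wider crop with more margin",
         "subject shifted left", "subject shifted right"]
TINTS = ["background slightly warmer", "background slightly cooler",
         "background slightly lighter"]


def variation_briefs(account_ids: list[str]) -> dict[str, Optional[str]]:
    """Map each account to an edit instruction, or None when no variation is needed."""
    if len(account_ids) <= 1:
        # Single account: post the base art as-is and spend nothing on variations.
        return {account: None for account in account_ids}
    combos = list(product(CROPS, TINTS))
    if len(account_ids) > len(combos):
        raise ValueError("more accounts than distinct variations; extend CROPS or TINTS")
    return {
        account: f"Keep the brand look. Apply: {crop}; {tint}."
        for account, (crop, tint) in zip(account_ids, combos)
    }
```

The same idea applies to captions: generate one per account from the same raw idea rather than copying a single caption everywhere.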

What it actually costs — the itemized bill

$0 stack (1 account, 30 posts/month):

Item	Cost/mo
Cloud Functions (invocations)	$0 (free tier)
Firestore (queue)	$0 (free tier)
Gemini text (free tier)	$0
Image (Canva by hand)	$0
Total	$0

Mid stack (agency, 300 posts/month, generated imagery):

Item	Cost/mo
VPS for n8n (2GB)	~$6
Gemini text Tier 1	~$2
Image: 300 × $0.067	~$20
Total	~$28

Notice: in the mid stack, 71% of the cost is image generation. If you can reuse Canva templates for half the posts, you cut ~$10/month. AI imagery is a luxury that only pays off when volume makes by-hand impossible.
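The mid-stack bill as a small sketch you can rerun with your own numbers. The defaults are the line items from the table above; ai_image_share is a hypothetical knob for the fraction of posts that use generated imagery instead of Canva templates. Without the table's rounding, the image share comes out a little above 71%.

```python
def stack_bill(posts: int, ai_image_share: float = 1.0, vps: float = 6.0,
               text: float = 2.0, price_per_image: float = 0.067) -> dict:
    """Monthly cost in USD for the n8n stack, with the image line broken out."""
    images = posts * ai_image_share * price_per_image
    total = vps + text + images
    return {
        "images": round(images, 2),
        "total": round(total, 2),
        "image_share": images / total,  # fraction of the bill spent on images
    }


full = stack_bill(300)                      # every post gets a generated image
half = stack_bill(300, ai_image_share=0.5)  # Canva templates for half the posts
saving = round(full["total"] - half["total"], 2)
```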

What I'd avoid

I would not start with n8n. Every tutorial pushes n8n because it renders nicely in a video, but for 30 posts/month it's a VPS, a security update, and a point of failure you don't need. Cloud Functions + Firestore solves it on the free tier, no server to babysit. Move up to n8n when volume passes ~100 posts/month and visual orchestration starts to earn the server's upkeep.

And I would not pay for Buffer just to schedule. If the AI part is already custom, scheduling via the Graph API directly is about 40 lines of code and $0. Buffer only makes sense when you want the ready-made calendar UI and analytics for a client — then you're paying for the interface, not the automation.

If you want one of these pipelines running — the $0 starter or the multi-account one with AI variation — that's exactly what Hens ships. For US and UAE clients, reach me here.