How does the tool adapt captions for each platform?

Each platform has its own rule set, which is injected into the prompt. Instagram gets conversational captions with moderate emoji use, 5-10 hashtags, and a CTA such as "save this post" or "tag a friend". LinkedIn gets professional, insight-driven copy with 3-5 hashtags and minimal emojis. Twitter/X gets punchy, opinionated takes of at most 280 characters. TikTok gets energetic, trend-aware hooks with 3-5 trending tags. The underlying content is the same; the tone, format, and length adapt to platform norms.

Can I input an image for the caption?

Yes. The /generate endpoint accepts an image upload (form field image, up to 5 MB) and uses GPT-4o's vision input to describe the image before generating the captions. You can upload a product photo and receive platform-ready captions based on what is in the image, without writing a brief yourself. If you also send a brief, it is combined with the image description.
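
A minimal client-side sketch of the image upload follows. It assumes Node 18 or later, where FormData, Blob, and fetch are globals. The helper names buildImageRequest and requestCaptions are hypothetical and not part of the tutorial code. The form field names (image, brief, platforms) match what the server reads.

```javascript
// Hypothetical client helpers (not part of the tutorial code).
// Assumes Node 18+, where FormData, Blob, and fetch are globals.
function buildImageRequest(imageBytes, filename, mimetype, brief, platforms) {
  const form = new FormData();
  // The field name must match upload.single('image') on the server.
  form.append('image', new Blob([imageBytes], { type: mimetype }), filename);
  if (brief) form.append('brief', brief);
  // Multipart fields are plain strings, so the array is sent as JSON text.
  form.append('platforms', JSON.stringify(platforms));
  return form;
}

async function requestCaptions(baseUrl, form) {
  // fetch sets the multipart boundary header itself when given a FormData body.
  const res = await fetch(`${baseUrl}/generate`, { method: 'POST', body: form });
  if (!res.ok) throw new Error(`Caption request failed with status ${res.status}`);
  return res.json();
}
```

To use it, read the file into a buffer (for example with fs/promises readFile), pass the bytes to buildImageRequest, and send the result with requestCaptions('http://localhost:3000', form).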

How do I integrate this with a scheduling tool like Buffer?

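The API returns structured JSON with one caption object per platform. You can pass that output to a scheduler such as Buffer (its legacy v1 API creates posts with POST /1/updates/create.json, passing the target profiles as profile_ids[]) or Hootsuite's API to schedule the posts directly. Check each provider's current documentation for access and authentication requirements. The caption generator then becomes the content step in an automated social media pipeline.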
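
The mapping from the generator's output to scheduler requests can be sketched as below. This is an illustration under assumptions, not a verified integration: the field names profile_ids[], text, and scheduled_at follow Buffer's legacy v1 "create update" call, the helper name buildBufferUpdates is hypothetical, and the profile IDs are placeholders you would look up in your own Buffer account.

```javascript
// Hypothetical helper: turns per-platform captions into form-encoded request bodies.
// Assumption: field names follow Buffer's legacy v1 API; verify before relying on them.
function buildBufferUpdates(captions, profileIds, scheduledAt) {
  const updates = [];
  for (const [platform, result] of Object.entries(captions)) {
    const profileId = profileIds[platform];
    if (!profileId) continue; // no scheduler profile mapped for this platform
    const body = new URLSearchParams();
    body.append('profile_ids[]', profileId);
    body.append('text', result.caption);
    if (scheduledAt) body.append('scheduled_at', scheduledAt.toISOString());
    updates.push({ platform, body });
  }
  return updates;
}

// Example with placeholder values.
const exampleUpdates = buildBufferUpdates(
  { twitter: { caption: 'Ship faster.', hashtags: ['AI'] } },
  { twitter: 'PROFILE_ID_PLACEHOLDER' },
  new Date('2030-01-01T09:00:00Z'),
);
console.log(exampleUpdates[0].body.toString());
```

Each body can then be sent as application/x-www-form-urlencoded with your access token. Keeping the payload builder separate from the HTTP call makes it easy to swap in a different scheduler.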


AI Social Media Caption Generator

Generate platform-perfect captions for Instagram, LinkedIn, Twitter/X, and TikTok from a single brief. This tool understands the unique tone, length, and hashtag strategy each platform demands and produces ready-to-post copy in one API call.

This is Tool 28 of the Build 50 AI Automation Tools course.

What You'll Build

POST /generate — topic/brief in, all-platform captions out
Platform-specific tone, length, emoji, and hashtag rules
Image upload support via GPT-4o Vision
POST /generate/variants — A/B test variant generation

Setup

bash

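mkdir ai-captions && cd ai-captions
npm init -y
# The code below uses ES module imports, so mark the package as a module
npm pkg set type=module
mkdir -p src/services
npm install express multer openai dotenv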

bash

# .env
OPENAI_API_KEY=sk-your-key-here
PORT=3000

Caption Generation Service

// src/services/captionService.js
import OpenAI from 'openai';

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

const PLATFORM_RULES = {
  instagram: {
    maxChars: 2200,
    hashtagCount: '5-10',
    emojiLevel: 'moderate',
    tone: 'conversational, authentic, and aspirational',
    cta: 'Save this post, tag a friend, or drop a comment',
  },
  linkedin: {
    maxChars: 3000,
    hashtagCount: '3-5',
    emojiLevel: 'minimal (only for bullet points if needed)',
    tone: 'professional, insight-driven, thought leadership',
    cta: 'Ask a question to encourage professional discussion',
  },
  twitter: {
    maxChars: 280,
    hashtagCount: '1-2',
    emojiLevel: 'optional',
    tone: 'punchy, opinionated, conversational, no fluff',
    cta: 'Encourage retweet or reply',
  },
  tiktok: {
    maxChars: 2200,
    hashtagCount: '3-5 trending tags',
    emojiLevel: 'high — TikTok-native emojis',
    tone: 'energetic, trend-aware, informal, Gen-Z friendly hook',
    cta: 'Follow for more, comment your answer',
  },
};

async function generateCaptions(brief, brand, platforms) {
  const platformInstructions = platforms.map(p => {
    const rules = PLATFORM_RULES[p];
    return `${p.toUpperCase()}:
- Max chars: ${rules.maxChars} (hard limit)
- Hashtags: ${rules.hashtagCount}
- Emojis: ${rules.emojiLevel}
- Tone: ${rules.tone}
- CTA: ${rules.cta}`;
  }).join('\n\n');

  const response = await openai.chat.completions.create({
    model: 'gpt-4o',
    messages: [
      {
        role: 'system',
        content: `You are an expert social media copywriter for brands.
Generate platform-optimised captions for the following platforms.

Platform rules:
${platformInstructions}

Brand context: ${brand || 'Not specified — use a neutral professional brand voice'}

Return ONLY a JSON object with a key for each platform:
{
  "instagram": { "caption": "...", "hashtags": ["tag1", "tag2"], "charCount": 150 },
  "linkedin": { "caption": "...", "hashtags": ["tag1"], "charCount": 400 },
  "twitter": { "caption": "...", "hashtags": ["tag1"], "charCount": 278 },
  "tiktok": { "caption": "...", "hashtags": ["tag1", "tag2"], "charCount": 200 }
}
Only include keys for requested platforms.`,
      },
      { role: 'user', content: `Brief: ${brief}` },
    ],
    temperature: 0.6,
    response_format: { type: 'json_object' },
  });

  return JSON.parse(response.choices[0].message.content);
}

async function generateFromImage(buffer, mimetype, brief, brand, platforms) {
  const dataUrl = `data:${mimetype};base64,${buffer.toString('base64')}`;

  // First, describe the image
  const descResponse = await openai.chat.completions.create({
    model: 'gpt-4o',
    messages: [
      {
        role: 'user',
        content: [
          { type: 'text', text: 'Describe this image in detail for social media caption writing. Include the mood, colors, subject, and any text visible.' },
          { type: 'image_url', image_url: { url: dataUrl, detail: 'low' } },
        ],
      },
    ],
    max_tokens: 300,
  });

  const imageDesc = descResponse.choices[0].message.content;
  const enrichedBrief = `${brief ? brief + '\n\n' : ''}Image: ${imageDesc}`;

  return generateCaptions(enrichedBrief, brand, platforms);
}

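export async function generateAllCaptions({ brief, brand, platforms, buffer, mimetype }) {
  // Default to every platform when none are requested; drop unknown names.
  // Object.hasOwn avoids matching inherited keys such as "constructor".
  const targetPlatforms = Array.isArray(platforms) && platforms.length > 0
    ? platforms.filter(p => Object.hasOwn(PLATFORM_RULES, p))
    : Object.keys(PLATFORM_RULES);

  // Without this check, a request naming only unknown platforms would reach the model with no rules.
  if (targetPlatforms.length === 0) throw new Error('No supported platforms requested');

  if (buffer && mimetype) {
    return generateFromImage(buffer, mimetype, brief, brand, targetPlatforms);
  }

  if (!brief?.trim()) throw new Error('Brief or image required');
  return generateCaptions(brief, brand, targetPlatforms);
}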

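export async function generateVariants(brief, platform, count = 3) {
  const rules = PLATFORM_RULES[platform];
  if (!rules) throw new Error(`Unsupported platform: ${platform}`);
  // count comes straight from the request body, so coerce it and clamp it to 1-5
  // (the upper bound of 5 is an arbitrary safety limit).
  count = Math.min(Math.max(Number.parseInt(count, 10) || 3, 1), 5);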

  const response = await openai.chat.completions.create({
    model: 'gpt-4o',
    messages: [
      {
        role: 'system',
        content: `Generate ${count} A/B test variants for a ${platform} caption.
Each variant should have a different angle: one emotional, one informational, one CTA-forward.
Rules: ${JSON.stringify(rules)}
Return JSON: { "variants": [{ "angle": "string", "caption": "string", "hashtags": ["..."] }] }`,
      },
      { role: 'user', content: brief },
    ],
    temperature: 0.7,
    response_format: { type: 'json_object' },
  });

  return JSON.parse(response.choices[0].message.content);
}
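
The charCount field above is reported by the model, which does not count characters reliably, so it is worth recomputing before posting. A minimal post-processing sketch follows. The names MAX_CHARS, countChars, and checkCaptions are hypothetical, the limits are copied from PLATFORM_RULES, and counting Unicode code points is only an approximation of how each platform measures length (X, for example, applies its own weighting to emoji and URLs).

```javascript
// Assumed copy of the maxChars values from PLATFORM_RULES.
const MAX_CHARS = { instagram: 2200, linkedin: 3000, twitter: 280, tiktok: 2200 };

// Count Unicode code points rather than UTF-16 units, so most emoji count once.
function countChars(text) {
  return Array.from(text).length;
}

// Recompute charCount for every known platform and flag captions over the limit.
function checkCaptions(captions) {
  const checked = {};
  for (const [platform, result] of Object.entries(captions)) {
    const limit = MAX_CHARS[platform];
    // Skip unknown platforms and malformed entries instead of throwing.
    if (limit === undefined || typeof result?.caption !== 'string') continue;
    const charCount = countChars(result.caption);
    checked[platform] = { ...result, charCount, overLimit: charCount > limit };
  }
  return checked;
}
```

A caller could run checkCaptions on the parsed model output and regenerate or truncate any entry where overLimit is true.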

Server

// src/server.js
import 'dotenv/config';
import express from 'express';
import multer from 'multer';
import { generateAllCaptions, generateVariants } from './services/captionService.js';

const app = express();
app.use(express.json());
const upload = multer({
  storage: multer.memoryStorage(),
  limits: { fileSize: 5 * 1024 * 1024 },
  fileFilter: (_req, file, cb) =>
    file.mimetype.startsWith('image/') ? cb(null, true) : cb(new Error('Images only')),
});

app.post('/generate', upload.single('image'), async (req, res, next) => {
  try {
    const { brief, brand, platforms } = req.body;
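    // Multipart form fields arrive as strings, so platforms may be a JSON-encoded array.
    let parsedPlatforms = platforms;
    if (typeof platforms === 'string') {
      try {
        parsedPlatforms = JSON.parse(platforms);
      } catch {
        return res.status(400).json({ error: 'platforms must be a JSON array' });
      }
    }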

    if (!brief && !req.file) {
      return res.status(400).json({ error: 'brief or image required' });
    }

    const captions = await generateAllCaptions({
      brief,
      brand,
      platforms: parsedPlatforms,
      buffer: req.file?.buffer,
      mimetype: req.file?.mimetype,
    });

    res.json({ success: true, captions });
  } catch (err) { next(err); }
});

app.post('/generate/variants', async (req, res, next) => {
  try {
    const { brief, platform, count } = req.body;
    if (!brief || !platform) return res.status(400).json({ error: 'brief and platform required' });

    const result = await generateVariants(brief, platform, count);
    res.json({ success: true, platform, ...result });
  } catch (err) { next(err); }
});

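// Upload problems (oversized or non-image files) are client errors; everything else is a 500.
app.use((err, _req, res, _next) => {
  const isUploadError = err instanceof multer.MulterError || err.message === 'Images only';
  res.status(isUploadError ? 400 : 500).json({ error: err.message });
});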
app.listen(process.env.PORT ?? 3000, () => console.log('Caption generator running'));

Testing

bash

# Text brief — all platforms
curl -X POST http://localhost:3000/generate \
  -H "Content-Type: application/json" \
  -d '{
    "brief": "Launching our new AI-powered project management tool that auto-prioritises tasks",
    "brand": "B2B SaaS startup, professional but approachable",
    "platforms": ["instagram", "linkedin", "twitter"]
  }'

# Image upload
curl -X POST http://localhost:3000/generate \
  -F "image=@product-photo.jpg" \
  -F "brief=New limited edition sneaker drop" \
  -F 'platforms=["instagram","tiktok"]'

Sample response (the twitter entry inside the captions object):

json

{
  "twitter": {
    "caption": "Stop manually sorting your task list.\n\nOur AI does it for you — in real time.\n\nPriorities shift. Your list should too. 🧠\n\n#ProductivityAI #ProjectManagement",
    "hashtags": ["ProductivityAI", "ProjectManagement"],
    "charCount": 179
  }
}


Social media caption generator is live. Continue to Tool 29 to build an AI YouTube video summarizer.

Summary

Platform rules object centralises the tone/length/hashtag constraints — update one object to change all generated content
Vision-first brief for image uploads means zero manual description effort
Variant generation enables A/B testing without manually writing multiple versions
Structured JSON output connects directly to scheduling APIs like Buffer or Hootsuite
Add a scheduledAt field and connect to Buffer's API to build a full social publishing pipeline
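
The scheduledAt idea in the last point could be sketched like this. The helper name addSchedule and the 30-minute default gap are assumptions for illustration; the generator itself does not return a scheduledAt field.

```javascript
// Hypothetical helper: adds a scheduledAt ISO timestamp to each platform entry,
// spacing posts gapMinutes apart in the order the platforms appear.
function addSchedule(captions, firstPostAt, gapMinutes = 30) {
  const scheduled = {};
  Object.entries(captions).forEach(([platform, result], index) => {
    const when = new Date(firstPostAt.getTime() + index * gapMinutes * 60_000);
    scheduled[platform] = { ...result, scheduledAt: when.toISOString() };
  });
  return scheduled;
}
```

Staggering the posts avoids publishing to every platform at the same instant; a scheduling API can then consume scheduledAt directly.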
