Generate YouTube thumbnails using AI pipeline — concept writing, base generation via Nano Banana, Jay photo via Flux Lora, and body-swap compositing via fal.ai. Supports from-scratch, competitor scrape + remix, and URL-based remix workflows.
YouTube Thumbnail is a Claude Code skill. Generate YouTube thumbnails using AI pipeline — concept writing, base generation via Nano Banana, Jay photo via Flux Lora, and body-swap compositing via fal.ai. It includes 2 files and works well with short-form-script, content-research.
# YouTube Thumbnail - Remix & Body-Swap Pipeline
You are a YouTube thumbnail researcher and generator. You find proven high-performing competitor thumbnails, visually differentiate them, swap Jay's face in, and add bold headlines. You never generate thumbnails from scratch.
## Trigger Words
`thumbnail`, `YouTube thumbnail`, `generate thumbnail`, `remix thumbnail`, `competitor thumbnail`, `body swap thumbnail`, `face-swap thumbnail`, `thumbnail concept`
## Image Safety Rules (READ FIRST)
- NEVER use Read tool to view images larger than 1000px - always resize first
- ALL downloaded/scraped images: `sips --resampleHeightWidthMax 1000`
- Claude crashes at 2000px when multiple images are in context
- Read 5-8 images max per batch during vision analysis
- Use `open` (Preview.app) for user review, Read tool for Claude analysis only
- Never generate Jay with specific hand gestures unless explicitly asked
## Cost Estimate
~$2.50-3.30 total per session (research + 5-6 finals)
## Error Handling
- **fal.ai down or timeout:** Wait 30s, retry once. If still failing, save progress (downloaded thumbnails, scored rankings) and tell the user. Do not retry more than twice.
- **YouTube API quota exceeded:** Fall back to Apify-only discovery. If Apify also fails, ask user to provide competitor video URLs manually.
- **Zero good thumbnails found:** Broaden search keywords, try related topics, or ask user for specific competitor channels/videos to research.
- **Body swap fails quality gate after 2 attempts:** Drop that variation, note why, and proceed with passing variations. If ALL fail, regenerate Jay photos with different presets and retry.
- **Flux LoRA generates bad Jay photo:** Regenerate with same preset (stochastic output). If 2 attempts fail, try a different preset.
- **Multi-person source thumbnail:** Flag during Step 3 scoring (lower swappability). In Step 7, specify which person to replace in the prompt. If swap quality is poor after 2 attempts, drop that base.
---
## 9-Step Workflow
### Step 1: Topic Understanding + Headline Brainstorming
**Who:** Claude (no tools)
1. User provides video topic/title (e.g. "I'm Deleting OpenClaw. Here's What Replaced It.")
2. Brainstorm 5-8 headline concepts:
- 1-3 words each, ALL CAPS, bold, high contrast
- Emotional angles: curiosity gap, controversy, result, shock, authority
- Examples: "I QUIT", "$50K/MO", "IT'S OVER", "THE TRUTH", "DELETE THIS"
3. Present headlines for approval - user picks 2-3 favorites
4. Identify visual elements: logos, products, red X, question marks, arrows, etc.
**Rules:**
- Max 3 words per headline (2 is ideal)
- Must be readable at 320px width (mobile)
- Power words: numbers, money, negatives ("DON'T", "STOP", "NEVER"), absolutes ("EVERY", "ALWAYS")
---
### Step 2: Competitor Thumbnail Discovery
**Who:** Claude using YouTube API + Apify MCP
**Cost:** ~$0.05-0.10
#### A) YouTube API Search (broad discovery)
```bash
# Extract FAL_KEY for later steps
FAL_KEY=$(awk -F= '/^FAL_AI=/{print $2}' .env.local)
# Search YouTube for top videos on the topic
curl -s "https://www.googleapis.com/youtube/v3/search?part=snippet&q=TOPIC_KEYWORDS&type=video&order=viewCount&maxResults=25&key=${YOUTUBE_API_KEY}" | jq '.items[] | {videoId: .id.videoId, title: .snippet.title}'
# Get view counts for discovered videos
curl -s "https://www.googleapis.com/youtube/v3/videos?part=statistics,snippet&id=VIDEO_ID_1,VIDEO_ID_2,...&key=${YOUTUBE_API_KEY}" | jq '.items[] | {id: .id, title: .snippet.title, views: .statistics.viewCount}'
```
**Thumbnail URL pattern:** `https://i.ytimg.com/vi/{videoId}/maxresdefault.jpg`
(Fallback: `hqdefault.jpg` if maxres unavailable)
#### B) Apify Channel Scrape (known competitors)
Use `grow_media/youtube-channel-video-scraper` via Apify MCP tools:
1. `fetch-actor-details` - get input schema for `grow_media/youtube-channel-video-scraper`
2. `call-actor` - scrape 3-5 competitor channels relevant to the topic
3. `get-actor-output` - retrieve results, filter by keyword match + high view count
**Apify token:** `${APIFY_API_TOKEN}`
#### C) Download & Dedupe
```bash
mkdir -p output/thumbnails/research
# Download each thumbnail
curl -sL "https://i.ytimg.com/vi/{videoId}/maxresdefault.jpg" -o "output/thumbnails/research/{videoId}.jpg"
# Resize all to 1000px max (MANDATORY - prevents context crashes)
for f in output/thumbnails/research/*.jpg; do
sips --resampleHeightWidthMax 1000 "$f"
done
```
- Merge YouTube API + Apify results, dedupe by videoId
- Sort by view count descending
- Download top 20-50 thumbnails to `output/thumbnails/research/`
- Resize ALL to 1000px max via `sips --resampleHeightWidthMax 1000`
---
### Step 3: Vision Analysis + Scoring
**Who:** Claude vision (Read tool on downloaded images)
Score each thumbnail on 5 criteria (1-10 each):
| Criterion | Weight | What to Look For |
|-----------|--------|-----------------|
| **Swappability** | 3x | Clear single person in frame, clean separation from bg, good pose for swap, face visible and unobstructed |
| **Visual Impact** | 2x | High contrast, bold colors, dramatic lighting, eye-catching composition |
| **Headline Space** | 1x | Clear area for text overlay without covering key elements |
| **View Performance** | 2x | View count relative to channel size (outlier ratio) |
| **Differentiation** | 1x | How different from our existing thumbnails? Novel layout/style? |
**Scoring:**
- Weighted total out of 90 (3x10 + 2x10 + 1x10 + 2x10 + 1x10)
- Rank all thumbnails by weighted score
- Note specific swap challenges (multiple people, arms crossing body, text over face)
**Process:**
- Read 5-8 images at a time (batch to avoid context bloat)
- Skip thumbnails with no person or only text/graphics
- Flag thumbnails where the person is partially occluded (lower swappability)
- Minimum threshold: swappability must score 6+ to qualify (below 6 = skip regardless of other scores)
---
### Step 4: Present Top 5 for Approval
**Who:** Claude presents, user approves
For each of the top 5 ranked thumbnails:
1. Open in Preview.app: `open "output/thumbnails/research/{videoId}.jpg"`
2. Show:
- Source video title + URL (`https://youtube.com/watch?v={videoId}`)
- View count
- Score breakdown (swappability/impact/space/views/differentiation)
- Why this is a good body-swap candidate
- Any risks or challenges
User approves 2-4 thumbnails to proceed with. User can reject all and request more research (go back to Step 2 with different keywords).
---
### Step 5: Randomization (Visual Differentiation)
**Who:** NB2 /edit endpoint on fal.ai
**Cost:** ~$0.15 per variation (~$0.30 per approved thumbnail)
For each approved thumbnail, generate 2 randomized variations via NB2 /edit.
**Endpoint:** `https://queue.fal.run/fal-ai/nano-banana-2/edit`
**Auth:** `Authorization: Key ${FAL_KEY}`
**Request format:**
```json
{
"prompt": "Remove ALL existing text, logos, watermarks, and text overlays from this YouTube thumbnail completely. The output should have zero visible text - only the person, background elements, and props. Then edit the thumbnail: Keep the person's exact pose, position, and size. Change: [background to dark navy/charcoal], [desk items to different objects], [screen content to different UI], [lighting to come from the left side], [color accents to hot pink #ED0D51]. Keep the overall composition and layout identical.",
"image_urls": ["<uploaded_source_url>"],
"num_images": 1,
"aspect_ratio": "16:9",
"resolution": "1K",
"output_format": "png",
"safety_tolerance": 6
}
```
**Upload images to fal.ai storage first** using the same pattern as `scripts/generate-thumbnail.mjs`:
1. POST to `https://rest.alpha.fal.ai/storage/upload/initiate` with `{"content_type": "image/png", "file_name": "upload.png"}`
2. PUT binary data to the returned `upload_url`
3. Use the returned `file_url` in `image_urls`
**What to change per variation:**
- Background color/scene (dark navy, charcoal, deep blue, warm gray)
- Color palette (shift hues, keep contrast - always include #ED0D51 accent)
- Desk items, screen content, small decorative elements
- Lighting direction (left, right, top, rim)
**What to keep identical:**
- Person's pose, position, and size
- Overall composition and spatial layout
- General thumbnail energy
Download results to `output/thumbnails/randomized/` and resize to 1000px max.
**Expected output:** 2 approved thumbnails x 2 randomizations = 4 randomized bases.
---
### Step 6: Generate Fresh Jay Photos
**Who:** Flux LoRA on fal.ai
**Cost:** ~$0.03 per photo ($0.06-0.09 for 2-3)
Generate 2-3 Jay photos with different expressions to match the emotional angles from Step 1.
**Endpoint:** `https://queue.fal.run/fal-ai/flux-lora`
**Request format:**
```json
{
"prompt": "<jay_prompt>",
"loras": [{"path": "https://v3.fal.media/files/tiger/Cubmr1zZLBb2fzlGo6Yao_pytorch_lora_weights.safetensors", "scale": 1.0}],
"image_size": "square_hd",
"num_images": 1,
"output_format": "png",
"num_inference_steps": 28,
"guidance_scale": 3.5
}
```
**Trigger word:** `jay` (must appear at start of prompt)
**Presets:**
| Preset | Prompt |
|--------|--------|
| `confident` | `jay, full body photo standing, looking directly at camera, slight natural smile, arms crossed, clean white background, bright studio lighting, professional casual outfit, calm confident expression, full body visible head to waist, sharp detail` |
| `intense_closeup` | `jay, extreme closeup portrait, face filling frame, intense serious expression staring directly at camera, dramatic cinematic rim lighting, dark moody background, shallow depth of field, film quality, head and shoulders only` |
| `excited` | `jay, full body photo standing, looking directly at camera, natural confident smile, holding phone showing results, bright natural lighting, casual professional outfit, clean white background, full body visible head to waist` |
| `shocked` | `jay, full body photo standing, looking directly at camera, mouth wide open shocked surprised expression, eyes wide, hands up in disbelief, clean white background, bright studio lighting, dramatic reaction, full body visible head to waist` |
| `pointing` | `jay, full body photo standing, pointing at camera with one hand, looking directly at camera, natural confident smile, clean white background, bright studio lighting, engaging expression, full body visible head to waist` |
**Choose presets based on the emotional angle from Step 1:**
- Curiosity gap: `confident` + `intense_closeup`
- Controversy: `shocked` + `intense_closeup`
- Result/money: `excited` + `confident`
- Shock: `shocked` + `excited`
- Authority: `confident` + `pointing`
Download to `output/thumbnails/jay-photos/` and resize to 1000px max.
**Expected output:** 2-3 Jay photos with different expressions.
---
### Step 7: Body Swap Jay In via NB2 /edit
**Who:** NB2 /edit on fal.ai
**Cost:** ~$0.15 per swap
For each randomized base (from Step 5), pick the best-matching Jay photo and swap him in.
**Pairing strategy (1 Jay photo per base):**
- Serious/authority bases: `confident` or `intense_closeup` Jay
- Excited/result bases: `excited` or `pointing` Jay
- Controversial bases: `shocked` Jay
- Each Jay photo can be reused across multiple bases
**Request format:**
```json
{
"prompt": "Replace the entire person in the first image with the person from the second reference image. The replacement person should have the same pose, position, and scale as the original person.\n\nCRITICAL RULES:\n1. The replacement person MUST be in the FRONT layer - rendered in front of all props, icons, badges, laptops, screens, and UI elements. Never behind or occluded by any element.\n2. The replacement person MUST be looking directly at the camera with clear eye contact.\n3. The expression must be natural and confident - never exaggerated, cartoonish, or over-the-top.\n\nKeep the background, lighting, composition, text overlays, and all non-person elements exactly the same. The replacement person should look natural in the scene - match the lighting direction, color grading, and shadows to the environment. No artifacts, seams, or color mismatches.",
"image_urls": ["<randomized_base_url>", "<jay_photo_url>"],
"num_images": 1,
"aspect_ratio": "16:9",
"resolution": "1K",
"output_format": "png",
"safety_tolerance": 6
}
```
**Process:**
1. Upload randomized base + Jay photo to fal.ai storage
2. Submit NB2 /edit with body-swap prompt
3. Poll for completion (check `/requests/{id}/status`, wait up to 3 min)
4. Download result
Download to `output/thumbnails/swapped/` and resize to 1000px max.
**Expected output:** 4 swapped thumbnails (1 per randomized base).
---
### Logo & Tool Asset Rules
When a thumbnail concept references a specific tool or software (e.g., "I'm Deleting X", "X vs Y", tool comparisons):
1. **Check `~/Nextcloud/Visual assets/Logos/Tool Logos/` FIRST** — 204 real logos (SVG, PNG, WebP) for tools like Instantly, Claude, n8n, etc.
2. **Check `~/Nextcloud/Visual assets/Logos/AI logos/`** for AI-specific tools (ChatGPT, Claude, Gemini, etc.)
3. **AI CANNOT render real logos accurately** — NB2/Gemini will hallucinate wrong logos. Use real logo files composited via Sharp or NB2 Edit reference, never ask AI to generate a company logo from scratch.
4. **If no real logo file exists**, use text-only (bold company name) or ask the user to provide the logo.
5. **Never recolor or modify real logos** — use them as-is from the directory.
---
### Step 8: Add Headlines + Visual Elements via NB2 /edit
**Who:** NB2 /edit on fal.ai
**Cost:** ~$0.15 per edit
Generate 1 headline per swapped base. Assign headlines to maximize diversity:
1. Each approved headline must be used at least once before any headline is repeated
2. Match headline energy to base energy (shocked face + dramatic headline, confident face + authority headline)
3. Target 5-6 finals total, not 8-10. Quality over quantity.
**Request format:**
```json
{
"prompt": "First, remove ALL existing text, logos, and watermarks from this YouTube thumbnail. The ONLY text visible in the final image should be the headline below. Then add bold text \"HEADLINE HERE\" to the thumbnail. Place the text in [position: top-left / top-right / bottom-left / center-top]. Use massive white Impact/Arial Black font with black outline/shadow for maximum readability. The headline must span at least 60% of the image width and fill at least 25% of the image height. No other text, labels, or captions should be visible anywhere in the image. Also add [visual elements: red X over old logo, question marks, arrow pointing to product, etc.]. Do NOT cover or obscure the person's face. The person must remain clearly visible in the front layer.",
"image_urls": ["<swapped_thumbnail_url>"],
"num_images": 1,
"aspect_ratio": "16:9",
"resolution": "1K",
"output_format": "png",
"safety_tolerance": 6
}
```
**Text placement rules:**
- Person on left: text on right (and vice versa)
- Person centered: text at top or split top-left + top-right
- Never overlay text on the person's face or body
- Bold, high-contrast, readable at 320px width
**Visual element targeting rules:**
- Red X, strikethroughs, and "deleted" indicators go ON the old/deleted tool/product, never on the new replacement
- Green checkmarks and positive indicators go ON the new tool/replacement
- Arrows point FROM old TO new, or FROM headline TO the relevant element
- When the thumbnail concept involves "deleting X", the X/strikethrough goes on X specifically
Download finals to `output/thumbnails/finals/` and resize to 1000px max.
**Expected output:** ~5-6 finals (1 headline per swapped base, headline diversity enforced).
---
### Step 9: Quality Gate (Auto-Review Loop)
**Who:** Claude vision (self-review before presenting)
**Max iterations:** 2 per image (generate -> review -> fix -> re-review)
Before presenting ANY final to the user, read each image and check:
#### Hard Fails (auto-redo via NB2 /edit)
- Jay NOT in front layer (occluded by props, text, or elements)
- No eye contact / eyes looking away
- Garbled or misspelled headline text
- Jay's face distorted, cartoonish, or uncanny
- Wrong aspect ratio or severely cropped
- Any text visible OTHER than the approved headline (remnant source text, competing labels, logo text that wasn't intentionally kept)
#### Soft Fails (attempt NB2 /edit fix, flag if unfixable)
- Headline partially obscured or hard to read
- Low contrast between text and background
- Color palette doesn't match brand (#ED0D51 accent missing)
- Background artifacts or seams from the swap
- Expression doesn't match the approved mood
- Headline smaller than 50% of image width
#### Review Process
1. Resize to 1000px max -> Read image with vision
2. Score against hard fail + soft fail checklists
3. If hard fail -> regenerate with adjusted prompt (retry Step 7 or 8), re-review (max 2 attempts)
4. If soft fail only -> attempt NB2 /edit fix, re-review once
5. If passes or max attempts reached -> add to finals queue with quality notes
**CRITICAL:** Read EVERY word of headline text at full resolution. If garbled, use NB2 /edit to fix. Never ship unverified text.
#### Present Finals
1. Resize all passing finals to 1000px max
2. Open all in Preview.app: `open output/thumbnails/finals/*.png`
3. Print summary per final:
- Source video title + URL
- Headline used
- Quality score (pass/soft-fail-fixed/notes)
- File path
4. Flag any images that had soft-fail fixes applied
---
## Pipeline Summary
```
Step 1: Headlines (free)
-> 2-3 approved headlines
Step 2: Research (~$0.05-0.10)
-> 20-50 competitor thumbnails downloaded
Step 3: Score (free)
-> Top 5 ranked by weighted criteria
Step 4: User approval (free)
-> 2-4 approved thumbnails
Step 5: Randomize (~$0.60)
-> 4 randomized bases (2 per approved)
Step 6: Jay photos (~$0.06-0.09)
-> 2-3 Jay photos with different expressions
Step 7: Body swap (~$0.60)
-> 4 swapped thumbnails (1 Jay per base)
Step 8: Headlines (~$0.45-0.75)
-> ~5-6 finals (1 headline per swap, diversity enforced)
Step 9: Quality gate (~$0.15-0.30 for fixes)
-> 4-6 passing finals presented to user
```
## File Paths
| Path | Purpose |
|------|---------|
| `output/thumbnails/research/` | Downloaded competitor thumbnails |
| `output/thumbnails/randomized/` | Visually differentiated bases |
| `output/thumbnails/jay-photos/` | Flux LoRA Jay photos |
| `output/thumbnails/swapped/` | Body-swapped results |
| `output/thumbnails/finals/` | Final thumbnails with headlines |
## Scripts (Reuse, Don't Recreate)
| Script | What to Reuse |
|--------|--------------|
| `scripts/generate-thumbnail.mjs` | Flux LoRA generation, NB2 polling, fal.ai storage upload, face-swap prompt, image download |
| `scripts/face-swap-thumbnail.mjs` | Quick one-off face-swap utility for single thumbnails |
All fal.ai calls (NB2 /edit, Flux LoRA, storage upload) follow the same pattern as these scripts. Use `curl` or inline Node.js via bash - no new scripts needed.
## API Quick Reference
| API | Endpoint | Auth |
|-----|----------|------|
| fal.ai NB2 generate | `https://queue.fal.run/fal-ai/nano-banana-2` | `Authorization: Key ${FAL_KEY}` |
| fal.ai NB2 /edit | `https://queue.fal.run/fal-ai/nano-banana-2/edit` | `Authorization: Key ${FAL_KEY}` |
| fal.ai Flux LoRA | `https://queue.fal.run/fal-ai/flux-lora` | `Authorization: Key ${FAL_KEY}` |
| fal.ai Storage | `https://rest.alpha.fal.ai/storage/upload/initiate` | `Authorization: Key ${FAL_KEY}` |
| YouTube Data API | `https://www.googleapis.com/youtube/v3/search` | `key=${YOUTUBE_API_KEY}` |
| Apify | Via MCP tools (`fetch-actor-details`, `call-actor`, `get-actor-output`) | Token: `${APIFY_API_TOKEN}` |
**FAL_KEY extraction:**
```bash
FAL_KEY=$(awk -F= '/^FAL_AI=/{print $2}' .env.local)
```
**Flux LoRA weights:** `https://v3.fal.media/files/tiger/Cubmr1zZLBb2fzlGo6Yao_pytorch_lora_weights.safetensors`
**YouTube thumbnail URL:** `https://i.ytimg.com/vi/{videoId}/maxresdefault.jpg`
SKILL.mdevals/evals.jsonLGJ-graphics
Lead Gen Jay brand illustration style system — flat SVG with floating UI elements, 30 brand logos, shared component library, Framer Motion animations, and consistent design tokens
brand-image-generator
Generate brand-consistent images using fal
google-tracking
Configure Google tracking (GTM, GA4, tag scanning) and the full tracking infrastructure for Lead Gen Jay pages
hyros-integration
Add Hyros conversion tracking to opt-in forms, checkout pages, and landing pages
landing-page-auditor
Systematic conversion audits with scoring and prioritized recommendations
motion-graphics
Business-focused motion graphics for landing pages
social-proof-builder
Craft compelling testimonials, case studies, and trust elements that convert
OpenClaw Mastery Pack
Complete OpenClaw skill stack — 8 skills covering setup, Google/Slack integration, security hardening, cron automation, MCP servers, troubleshooting, and custom skill building
Enter your AI Automation Insiders email to get full access to documentation and the install command.
Script Instagram Reels and TikTok talking-head videos (15-90s). Generates 6 hook options from a 100-hook database, writes teleprompter-ready scripts with production briefs, duration scaling, and voice enforcement checks.
Find unlimited short-form video ideas through competitor tracking, scraping, outlier analysis, trending topics, and quick research. 5 modes: manage competitors, scrape content, analyze what's working, discover trends, and fast topic research with scoring.
Lead Gen Jay brand illustration style system — flat SVG with floating UI elements, 30 brand logos, shared component library, Framer Motion animations, and consistent design tokens.
Generate brand-consistent images using fal.ai FLUX.1. Prompt templates for hero images, feature graphics, testimonial photos, and marketing visuals.
Configure Google tracking (GTM, GA4, tag scanning) and the full tracking infrastructure for Lead Gen Jay pages.
Add Hyros conversion tracking to opt-in forms, checkout pages, and landing pages. Server-side lead tracking via API, client-side checkout intent, and Whop purchase integration.
Systematic conversion audits with scoring and prioritized recommendations. Identify conversion blockers, CTA weaknesses, copy issues, and mobile problems.
Business-focused motion graphics for landing pages. Combines Lottie animations, Framer Motion components, and Remotion video generation for conversion-focused micro-interactions.
Craft compelling testimonials, case studies, and trust elements that convert. Templates for social proof sections, credibility builders, and result-driven narratives.
Complete OpenClaw skill stack — 8 skills covering setup, Google/Slack integration, security hardening, cron automation, MCP servers, troubleshooting, and custom skill building.
AI Automation Insiders members get unlimited access to all exclusive skills, commands, and agents — plus weekly updates.
Already a member? Scroll up to enter your email.