A Complete Guide to Using AI Agents for Product Launch Videos, Growth Marketing, and Business Plans

Introduction: Launch Is Just the Beginning

After a product goes live, the real challenges are just getting started — you need to create launch videos, promote across multiple platforms, prepare BP (Business Plan) materials for fundraising, and continuously iterate on the product. For indie developers, this workload alone can be overwhelming.

In Episode 10 of the StoryCam series, the author demonstrates a highly inspiring workflow: letting an AI Agent handle nearly all the work from video production to growth marketing, compressing what would normally require team collaboration into something achievable by one person plus one Agent. This isn't just an efficiency improvement — it's a fundamental shift in how indie developers work.

Making Product Launch Videos with AI

From Remotion to HTML-Based Video Generation

In recent months, two notable tool directions have emerged in AI video generation:

Remotion: A video generation framework based on React frontend components that gained popularity about three months ago, capable of generating PPTs, presentation videos, and other content
HTML-based video generation: A newer direction that pushes the abstraction layer down to the HTML level

Remotion's Technical Background: Remotion was created by Jonny Burger in 2021, with the core philosophy of "writing videos the same way you write React components." Developers use JSX to describe the visual state of each frame, drive animations from the current frame number, and render to MP4 frame by frame in a headless browser (Headless Chrome). The advantage is that video content is fully programmable, version-controllable, and integrated with the frontend ecosystem. "HTML-based video generation" pushes the abstraction layer further down: instead of relying on a React component tree, it works directly with HTML/CSS/Canvas, letting AI models describe visual content in a language they know well. The essential difference: Remotion suits engineers who want fine-grained control, while the HTML approach suits AI Agents generating content autonomously. The argument is that LLMs have likely seen far more plain HTML/CSS than Remotion-specific React code during training, which tends to make their HTML output more stable.
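
To make the "describe each frame, then render frame by frame" idea concrete, here is a minimal Python sketch. It is not Remotion's API or the tool used in the episode; the names interpolate and frame_html, the 30 fps rate, and the one-second fade are all assumptions chosen for illustration. The point is only that each frame is a standalone HTML document whose styling is a pure function of the frame number.

```python
from html import escape

FPS = 30  # assumed frame rate


def interpolate(frame, in_range, out_range):
    """Linearly map frame from in_range to out_range, clamped at both ends."""
    (f0, f1), (v0, v1) = in_range, out_range
    if frame <= f0:
        return v0
    if frame >= f1:
        return v1
    t = (frame - f0) / (f1 - f0)
    return v0 + t * (v1 - v0)


def frame_html(frame, title):
    """Return the HTML for one frame: the title fades in and slides up over 1 second."""
    opacity = interpolate(frame, (0, FPS), (0.0, 1.0))
    offset = interpolate(frame, (0, FPS), (40.0, 0.0))
    return (
        "<!doctype html><html><body style='margin:0;background:#111'>"
        f"<h1 style='color:#fff;opacity:{opacity:.2f};"
        f"transform:translateY({offset:.1f}px)'>{escape(title)}</h1>"
        "</body></html>"
    )


# A 2-second clip is just a list of 60 HTML documents.
frames = [frame_html(i, "StoryCam") for i in range(2 * FPS)]
```

A headless browser could then screenshot each document and an encoder could stitch the images into a video; that part is omitted here.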

An industry consensus is forming: The two languages that LLMs express best are Markdown and HTML. Using HTML for video or PPT presentations naturally fits within AI Agents' capability boundaries.

AI Agent mimicking user input to create video

Letting the Agent Complete Video Production Autonomously

The approach is remarkably straightforward: tell the Agent "I want to publish a video" and let it figure out what additional information it needs. The Agent proactively asks these key questions:

Target audience: General users, AI enthusiasts, or early-stage investors?
Story examples: What story should the video tell?
Video specifications: Style, length, aspect ratio, etc.

With simple instructions — "general users, Chinese, 16:9, 45 seconds" — the Agent automatically completes the following:

Writing a 45-second video script
Automatically opening the product website to capture UI screenshots
Assembling everything into a complete launch video
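
The short instruction above already pins down most of the rendering parameters. As a rough sketch (the VideoBrief class, parse_brief, the 1080-pixel height, and the 30 fps rate are all hypothetical, not the Agent's actual internals), it could be turned into a structured brief like this:

```python
from dataclasses import dataclass


@dataclass
class VideoBrief:
    audience: str
    language: str
    aspect_ratio: str      # e.g. "16:9"
    duration_seconds: int

    def resolution(self, height=1080):
        """Derive pixel dimensions from the aspect ratio at a given height."""
        w, h = (int(part) for part in self.aspect_ratio.split(":"))
        return (height * w // h, height)

    def total_frames(self, fps=30):
        """Number of frames to render at the given frame rate."""
        return self.duration_seconds * fps


def parse_brief(text):
    """Parse 'audience, language, ratio, N seconds' into a VideoBrief."""
    audience, language, ratio, duration = (p.strip() for p in text.split(","))
    return VideoBrief(audience, language, ratio, int(duration.split()[0]))


brief = parse_brief("general users, Chinese, 16:9, 45 seconds")
print(brief.resolution(), brief.total_frames())
```

Under these assumptions, a 45-second 16:9 video means 1,350 frames at 1920x1080, which is the amount of content the script and screenshots have to fill.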

The result is not perfect: some of the selected screen regions are imprecise, and it still lacks voiceover and sound effects. Even so, the basic product demonstration comes across clearly. The Agent also provides a simple editor interface where you can adjust individual frames, font sizes, and other details, similar to video editing tools like CapCut.

From Video to Full-Platform Growth Marketing

One-Click Multi-Platform Content Distribution

After the video is done, you can directly have the Agent post the launch video to Twitter (first to a preview environment for verification). This concept extends to a complete growth pipeline:

Have the Agent write Twitter threads, posts, and other format-specific content
Generate corresponding materials for different platforms like Xiaohongshu (RED), WeChat Video Channels, etc.
Batch distribute to 10+ platforms
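
Format-specific content mostly comes down to fitting one message into each platform's constraints. The sketch below shows only the Twitter thread case, assuming the standard 280-character post limit; to_thread is a hypothetical helper, and real platform counting rules (for example, weighted characters) are more involved.

```python
import textwrap

TWEET_LIMIT = 280  # standard per-post character limit, used here as an assumption


def to_thread(text, limit=TWEET_LIMIT):
    """Split text into numbered posts, each within the limit (up to 99 posts)."""
    # Reserve room for a suffix such as " (12/12)".
    reserve = len(" (99/99)")
    chunks = textwrap.wrap(text, width=limit - reserve)
    total = len(chunks)
    return [f"{chunk} ({i}/{total})" for i, chunk in enumerate(chunks, start=1)]


announcement = "StoryCam is live. " * 40
for post in to_thread(announcement):
    print(len(post), post[:40])
```

Other platforms would need their own adapters (title length, cover image, hashtags), which is the kind of repetitive work the Agent takes over.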

This is exactly the growth capability indie developers need most — pushing the product out, pulling users back in, and continuously iterating. The closed loop from prototype to product relies on exactly this kind of automated promotion mechanism.

Complete product workflow

Even the Business Plan Goes to the AI Agent

Even BP (Business Plan) creation can be handed off to an AI Agent. The method is simple: turn the questions investors care about into prompts, then generate draft answers inside the Agent's conversation. In the author's experience, this makes fundraising preparation dramatically faster, though the drafts still need the founder's review.
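
A minimal sketch of "investor questions as prompts" might look like the following. The question list, the build_bp_prompts helper, and the prompt wording are hypothetical examples; the two facts are taken from the product ideas discussed later in this article.

```python
# Hypothetical investor questions, keyed by business plan section.
INVESTOR_QUESTIONS = {
    "problem": "What problem does the product solve, and for whom?",
    "market": "How large is the addressable market?",
    "business_model": "How does the product make money?",
}


def build_bp_prompts(product, facts, questions=INVESTOR_QUESTIONS):
    """Return one prompt per business plan section, grounded in known facts."""
    fact_lines = "\n".join(f"- {fact}" for fact in facts)
    prompts = {}
    for section, question in questions.items():
        prompts[section] = (
            f"You are drafting the '{section}' section of a business plan for {product}.\n"
            f"Investor question: {question}\n"
            f"Known facts:\n{fact_lines}\n"
            "Answer using only these facts and mark anything unknown as TBD."
        )
    return prompts


prompts = build_bp_prompts(
    "StoryCam",
    ["Pricing idea: charge per video generation", "Output: 10-15 second short videos"],
)
print(prompts["business_model"])
```

Asking the model to mark unknowns as TBD keeps it from inventing market figures, which matters in a document investors will scrutinize.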

Deployment and Technology Stack Selection Logic

Full-Stack Deployment Automation

At the deployment level, several key technical choices are worth noting:

Model integration: Using Volcengine (ByteDance), whose documentation pages can be copied as Markdown and pasted straight into the Agent, a better experience than some other cloud providers offer
Full-stack deployment: Bound to the GitHub main branch, automatically hot-updating the production environment whenever a new version is released, with old versions gracefully retiring
Environment variables: Production configuration is managed automatically through the CLI
Monitoring: Built-in logs and metrics tracking CPU, memory, and other performance indicators

Component-Based Strategy for Frontend Efficiency

Frontend component library

A critical selection principle: Frontend components must simultaneously support CLI, SDK, and MCP, so that AI Agents can directly read and use these components. Common UI elements like buttons, calendars, and login cards don't need to be rewritten — just let the Agent call existing components directly.

Understanding the Importance of the MCP Protocol: MCP (Model Context Protocol) is a standard protocol proposed and open-sourced by Anthropic in late 2024, designed to solve the "last mile" connection problem between AI models and external tools/data sources. Before MCP, every AI application needed to implement its own tool-calling adaptation layer, leading to severe ecosystem fragmentation. MCP defines a unified Server-Client architecture: tool providers (such as component libraries, databases, file systems) implement an MCP Server, while AI Agents act as MCP Clients, discovering and calling these tools through a standardized JSON-RPC protocol. For frontend component libraries, supporting MCP means AI Agents can directly "understand" a component's props definitions, usage examples, and design specifications, enabling precise component calls when generating code rather than fabricating potentially non-existent APIs. CLI is for human developers, SDK is for code-level calls, and MCP is the interface layer specifically designed for AI Agents — all three must coexist to cover every human-machine collaboration scenario.
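
The discover-then-call flow described above can be sketched as two JSON-RPC 2.0 messages. The method names tools/list and tools/call come from the MCP specification; the tool name get_component and its arguments are hypothetical, and the initialization handshake and transport are omitted.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC requests carry a unique id so responses can be matched


def jsonrpc_request(method, params=None):
    """Build a JSON-RPC 2.0 request object."""
    message = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        message["params"] = params
    return message


# Step 1: the client asks the server which tools it exposes.
list_tools = jsonrpc_request("tools/list")

# Step 2: the client calls one tool by name. "get_component" and its
# arguments are hypothetical; a real server advertises its own tool names.
call_tool = jsonrpc_request(
    "tools/call",
    {"name": "get_component", "arguments": {"component": "button"}},
)

print(json.dumps(list_tools))
print(json.dumps(call_tool, indent=2))
```

Because the server's answer to the first request includes each tool's name and input schema, the Agent can call real components instead of guessing at an API.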

To avoid the product looking too "AI-generated," some detail polishing is still needed:

Replacing default icons with custom designs
Extracting elements from UI design mockups (slicing) and converting them to SVG vectors
Adjusting small details to give the interface more polish

Next Steps in Product Iteration: Compressing Three Steps into One

Trend Insights from Twitter

Popular Storyboard generation methods on Twitter

A significant trend worth noting: GPT's image generation capability can now complete scripts, storyboard text, and storyboard images all in one step.

The Film Industry Background of Storyboards: Storyboards originated in Disney's animation studios in the 1930s, initially created to preview animation sequences before full production and save expensive hand-drawing costs. The standard format includes three layers: scene description text, shot composition sketches, and transition annotations between shots. In Hollywood's industrial workflow, going from script to storyboard to animatic typically requires professional storyboard artists spending weeks. In the short-video era, this process has been greatly simplified, but the core logic remains unchanged: first determine narrative rhythm and visual language, then proceed with actual shooting or generation. The breakthrough of multimodal models like GPT-4o is their ability to simultaneously understand textual narrative logic and visual composition rules, merging "write script → draw storyboard → generate images" — three steps that originally required different professional skills — into a single inference process. Work that previously required directors, screenwriters, and storyboard artists collaborating can now be completed by one person plus one model for the entire pre-production phase.

Specifically, the current popular approach is:

The lower part of each post shows the storyboard: the storyboard script text plus numbered shot sequences
The upper part shows the video generated directly from that storyboard
A single text-to-image model handles the script, storyboard text, and storyboard images all at once

This means the original three- or four-step process can be compressed to two steps or even one: users no longer need to write a script, write storyboard text, or generate storyboard images separately.
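
The difference between the two flows can be sketched as prompt construction. Both functions and their wording are hypothetical illustrations, not the prompts used in the posts on Twitter.

```python
def three_step_prompts(premise, shot_count):
    """The older flow: one prompt per stage, each fed the previous stage's output."""
    return [
        f"Write a short video script for: {premise}",
        f"Break the script into {shot_count} numbered storyboard shots with captions.",
        "Draw one storyboard image for each numbered shot.",
    ]


def one_step_prompt(premise, shot_count, aspect_ratio="16:9"):
    """The compressed flow: a single request to an image-capable model."""
    return (
        f"Create a single storyboard sheet for a short video about: {premise}. "
        f"Write the script yourself, split it into {shot_count} numbered shots, "
        f"and draw each shot as a {aspect_ratio} panel with its caption underneath."
    )


print(len(three_step_prompts("a day with StoryCam", 6)))
print(one_step_prompt("a day with StoryCam", 6))
```

In the one-step version the user supplies only a premise and a shot count; everything else is delegated to the model.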

Product Vision for Native Mobile Apps

Based on this insight, the next product roadmap looks like this:

Template system: Use AI Agents to continuously crawl high-quality Storyboard examples from Twitter, categorize and organize them into templates
Image-to-image generation: Generate new storyboards based on reference templates
Touch interaction: Support intuitive editing on mobile with circling, dragging, and other gestures
One-click generation + publishing: Modified storyboards are sent directly to a video generation model, producing 10-15 second short videos with direct publishing to Douyin, WeChat Video Channels, and Xiaohongshu

This direction is particularly well-suited for a native iOS App, with a clear business model — charging per video generation.

Conclusion: AI Agents Are Redefining Indie Development

The most valuable aspect of this case study isn't any specific tool usage technique, but rather the entirely new product development and operations paradigm it demonstrates:

Development phase: AI Agent writes code, builds architecture, integrates APIs
Launch phase: AI Agent manages deployment, configures environment variables
Marketing phase: AI Agent creates launch videos, generates multi-platform content, auto-distributes
Fundraising phase: AI Agent assists in writing business plans
Iteration phase: AI Agent monitors data, tracks bugs, continuously optimizes

The History and Evolution of the "One-Person Company" Model: The "One-Person Company" (Solo Founder) isn't a new concept born from the AI era. As early as 2019, Paul Jarvis's book Company of One made a systematic case for this business model: through extreme focus, outsourcing non-core work, and using SaaS tools in place of employees, a single founder can run a company with substantial revenue. Pieter Levels (founder of Nomad List) is the best-known practitioner, independently operating multiple products with combined annual revenue reportedly exceeding $3 million. AI Agents raise the ceiling of this model considerably. Previously, the bottleneck for one-person companies was the founder's time and skill boundaries; AI Agents act as a "capability amplifier" that doesn't replace the founder's judgment and product intuition but can automate execution-layer work. The core constraint has shifted from "how much can you do" to "how many Agents can you orchestrate."

Of course, current costs still need consideration — each video generation costs roughly a few yuan, mainly consumed in the text-to-video step. But as model costs continue to decline, this "one-person company" model will become increasingly viable.

The real moat isn't whether you can use AI, but whether you can orchestrate AI Agents into a complete productivity closed loop.

Key Takeaways

Using HTML-based video generation tools, AI Agents can autonomously complete script writing, screen capture, and assembly for a 45-second product launch video
AI Agents can handle multi-platform content generation and distribution, covering growth marketing across 10+ platforms including Twitter and Xiaohongshu
The core principle for frontend component selection: must simultaneously support CLI, SDK, and MCP so Agents can call them directly
GPT's image generation capability can now complete scripts, storyboard text, and storyboard images in one step, compressing what was previously a multi-step process
The next product direction is a native mobile app that achieves a minimalist short-video creation experience through template systems + touch interaction + one-click generation and publishing