
Technical FAQ

BanaToon Technical Guide

In-depth answers about BanaToon's technical background and architecture, explaining how Cloudflare Workers, R2, D1, and Google Gemini are applied.

Q. How is BanaToon's server architecture configured?

BanaToon runs on a fully serverless architecture based on Cloudflare Workers. Instead of traditional servers (such as EC2 instances), code deployed to over 300 edge locations worldwide handles each user request at the nearest location. This eliminates infrastructure management overhead and lets capacity scale automatically during traffic spikes.
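As a minimal sketch of this model (the route names and responses below are illustrative, not BanaToon's actual code), a Workers entry point is just a module with a `fetch` handler that runs at whichever edge location receives the request:

```typescript
// Minimal Cloudflare Workers-style module. There is no server process:
// the runtime invokes fetch() per request at the nearest edge location.
type Handler = (req: Request) => Promise<Response> | Response;

// Illustrative route table; BanaToon's real routes are not public.
const routes: Record<string, Handler> = {
  "/api/health": () => new Response("ok"),
  "/api/convert": async () =>
    new Response(JSON.stringify({ queued: true }), {
      headers: { "content-type": "application/json" },
    }),
};

// Resolve a pathname to its handler, or undefined for a 404.
export function resolve(pathname: string): Handler | undefined {
  return routes[pathname];
}

export default {
  async fetch(request: Request): Promise<Response> {
    const handler = resolve(new URL(request.url).pathname);
    return handler ? handler(request) : new Response("not found", { status: 404 });
  },
};
```

Because the module is stateless, the same code can run identically at every edge location with no coordination.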

Q. What AI model is used for image analysis?

We utilize Google's multimodal AI model Gemini Pro Vision to accurately grasp the context and content of images. Rather than simply transforming pixels, Gemini describes the expressions, actions, and situations of the people in a photo as text (captioning), and an optimized webtoon-style generation prompt is built from that description.
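A hedged sketch of that captioning step, assuming the public Generative Language REST API shape (the prompt text and helper name are illustrative, not BanaToon's actual pipeline):

```typescript
// Build the JSON body for a Gemini generateContent request that asks the
// model to caption an uploaded image. Prompt wording is illustrative.
export function buildCaptionRequest(base64Image: string, mimeType: string) {
  return {
    contents: [
      {
        parts: [
          { text: "Describe the people, expressions, and actions in this photo." },
          { inline_data: { mime_type: mimeType, data: base64Image } },
        ],
      },
    ],
  };
}

// The body is then POSTed to the Generative Language endpoint, e.g.:
//   POST https://generativelanguage.googleapis.com/v1beta/models/
//        gemini-pro-vision:generateContent?key=API_KEY
```

The returned caption text then seeds the webtoon-style generation prompt.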

Q. Are uploaded images stored safely?

Yes, security and privacy are top priorities. Uploaded images are encrypted at rest in Cloudflare R2 object storage. R2 exposes an S3-compatible API, and strict access-control policies restrict who can read the data, preventing unnecessary exposure.
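A minimal sketch of the upload path, assuming an R2 bucket binding named `UPLOADS` and a per-user key scheme (both assumptions, not BanaToon's actual configuration):

```typescript
// Namespacing object keys per user keeps access-control rules simple.
// The key scheme is illustrative.
export function uploadKey(userId: string, fileName: string): string {
  return `uploads/${userId}/${Date.now()}-${fileName}`;
}

// Minimal shape of the R2 binding this sketch relies on.
interface Env {
  UPLOADS: { put(key: string, value: ArrayBuffer): Promise<unknown> };
}

export async function storeUpload(
  env: Env,
  userId: string,
  fileName: string,
  data: ArrayBuffer,
): Promise<string> {
  const key = uploadKey(userId, fileName);
  // R2 encrypts objects at rest; put() writes through the Workers binding.
  await env.UPLOADS.put(key, data);
  return key;
}
```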

Q. When is uploaded data deleted?

BanaToon follows an ephemeral storage policy. After conversion completes and the user downloads the result or saves it to the gallery, the original and any temporary data are permanently deleted within 24 hours by an R2 lifecycle rule.
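The retention window itself is trivial to express; this illustrative helper mirrors the 24-hour policy described above (the actual deletion is performed by R2's lifecycle rules, configured per bucket, not by application code):

```typescript
// 24-hour retention window from the ephemeral storage policy above.
const RETENTION_MS = 24 * 60 * 60 * 1000;

// Given an upload timestamp, the earliest moment the lifecycle rule
// may delete the object.
export function deleteAfter(uploadedAt: Date): Date {
  return new Date(uploadedAt.getTime() + RETENTION_MS);
}
```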

Q. What technology is used for the database?

We use Cloudflare D1, a SQLite-based SQL database integrated with the global edge network. User metadata and session information are served close to the Workers that query them rather than from a single distant region, keeping data retrieval fast and low-latency from any country.
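A hedged sketch of a D1 lookup from a Worker, assuming a binding named `DB` and an illustrative `sessions` table (neither is BanaToon's actual schema):

```typescript
// Minimal shape of the D1 binding this sketch relies on.
interface D1Like {
  prepare(sql: string): {
    bind(...args: unknown[]): { first(): Promise<unknown> };
  };
}

// D1 statements are prepared and bound, never string-interpolated.
export const SESSION_SQL =
  "SELECT user_id, expires_at FROM sessions WHERE token = ?1";

export async function lookupSession(db: D1Like, token: string) {
  return db.prepare(SESSION_SQL).bind(token).first();
}
```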

Q. Is there any speed degradation when processing large images?

We utilize Cloudflare Workers' edge computing capabilities to perform image resizing and format optimization (WebP conversion) at the edge server closest to the client. This minimizes data transfer to the origin server and dramatically reduces network bottlenecks.
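As a sketch of that edge-side optimization, assuming Cloudflare Image Resizing is enabled (the dimensions and quality values are illustrative defaults, not production settings):

```typescript
// Build the cf.image options that ask the Cloudflare edge to resize and
// re-encode an image on a subrequest, before the bytes cross the network.
export function resizeOptions(width: number) {
  return {
    cf: {
      image: {
        width,                    // downscale at the edge
        format: "webp" as const,  // smaller than PNG/JPEG for most photos
        quality: 85,
      },
    },
  };
}

// Usage inside a Worker (only meaningful in the Workers runtime):
//   const resized = await fetch(originUrl, resizeOptions(1024));
```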

Q. Why choose Cloudflare ecosystem over AWS or Azure?

The biggest reasons are R2's zero egress fees and Workers' edge computing without cold starts. Image and video services are bandwidth-heavy, and avoiding egress charges through Cloudflare R2 lets us keep the service free for users. Workers' 0ms cold starts also maximize responsiveness.

Q. How does video webtoon conversion work?

Video processing goes through a pipeline of frame extraction -> Gemini analysis -> style transfer. Key frames are extracted in the client browser (via WebAssembly) or in an edge Worker and processed in parallel to reconstruct the video as a sequence of webtoon cuts. Algorithms that maintain temporal consistency are applied throughout.
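The frame-extraction step can be sketched as evenly spaced sampling (a simplifying assumption: real keyframe detection would use scene-change scoring rather than a fixed stride):

```typescript
// Pick evenly spaced frame indices from a clip so each becomes one
// webtoon cut. Illustrative stand-in for scene-change-based detection.
export function keyframeIndices(totalFrames: number, cuts: number): number[] {
  if (cuts <= 0 || totalFrames <= 0) return [];
  const stride = totalFrames / cuts;
  // Sample the middle of each segment to avoid transition frames at edges.
  return Array.from({ length: cuts }, (_, i) =>
    Math.floor(i * stride + stride / 2),
  );
}

// keyframeIndices(100, 4) → [12, 37, 62, 87]
```

Each selected frame is then captioned by Gemini and restyled independently, which is what makes parallel processing possible.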

Q. What network optimizations support the global service?

Through Cloudflare's Anycast Network, users are automatically connected to the nearest data center geographically. Also, static assets (JS, CSS, images) are cached in CDNs worldwide, and dynamic API requests are routed via the shortest path through Workers.

Q. How is the consistency of AI generated results maintained?

We utilize seed fixing and ControlNet technology to keep the same character and style. The pipeline preserves the feature map of the character initially analyzed by Gemini while changing only the style, so even when multiple photos are transformed, the same person remains recognizable.
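The seed-fixing half can be sketched as deriving a stable seed from a character identifier; the FNV-1a hash below is an illustrative choice, not BanaToon's actual method:

```typescript
// Derive a stable 32-bit generation seed from a character identifier
// (FNV-1a hash), so repeated generations of the same character reuse
// the same seed.
export function seedFor(characterId: string): number {
  let h = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < characterId.length; i++) {
    h ^= characterId.charCodeAt(i);
    h = Math.imul(h, 0x01000193); // FNV prime
  }
  return h >>> 0; // unsigned 32-bit seed
}
```

Fixing the seed pins the stochastic part of generation; ControlNet-style conditioning then constrains structure so only the style varies between outputs.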

Q. Are there API call limits or rate limits?

Currently, dynamic throttling is applied within Cloudflare Workers' rate limiting and the Gemini API's quotas. If traffic surges, a queue system absorbs the load to prevent server overload and keep processing stable.
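A minimal token-bucket sketch of that throttling behavior (the capacity and refill rate are illustrative numbers, not production values):

```typescript
// Token bucket: requests consume tokens, tokens refill over time.
// When empty, the caller should enqueue the request instead.
export class TokenBucket {
  private tokens: number;
  constructor(
    private capacity: number,
    private refillPerSec: number,
    private lastRefill = 0,
  ) {
    this.tokens = capacity;
  }

  // Returns true if the request may proceed at time `nowSec`.
  tryAcquire(nowSec: number): boolean {
    const elapsed = nowSec - this.lastRefill;
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillPerSec);
    this.lastRefill = nowSec;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false; // over the limit: queue rather than reject
  }
}
```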

Q. What technical features will be added in the future?

We are preparing long-context analysis of long videos through the introduction of the next-generation Gemini 1.5 Pro model. This will let BanaToon understand the storyline of an entire video rather than converting individual cuts, and automatically generate matching speech bubbles and sound effects.

Need a deeper technical discussion? Contact the Dev Team.