A mobile PWA that turns a spoken idea into characters, illustrations, and narration
StoryMine is a mobile-first storybook PWA: it takes a story a child tells out loud, creates a main character that resembles the child, draws an illustration for every page, and reads the story aloud. It supports Korean and English, and it's designed as a small but complete service (bookshelf, series, download, billing), not a one-shot toy.
Speak/Type     Design             Write        Art + Narration (parallel)   Reader
(STT·photo) ─▶ story structure ─▶ page text ─▶ gpt-image-1 ┐                auto-advance
name·age       (gpt-5.4)          (gpt-5.4)    Neural TTS  ┘─▶ store ─▶     ◀ prev/next ▶
                    ┌──────────────────────────── Azure ────────────────────────────┐
mobile browser      │                                                               │
(PWA, single HTML)  │  Container Apps (FastAPI)                                     │
      │ HTTPS       │      │  keyless (Managed Identity, AAD token)                 │
      └─────────────┼─────▶│──▶ Azure OpenAI gpt-5.4 (story design·writing)         │
                    │      │──▶ Azure OpenAI gpt-image-1 (page illustrations)       │
                    │      │──▶ Azure AI Speech Neural TTS (page audio)             │
                    │      │──▶ (vision) multimodal: photo→description, NOT stored  │
                    │      │                                                        │
                    │      └──▶ PostgreSQL Flexible Server (keyless/MI)             │
                    │           · users·sessions·books·pages·usage(billing)         │
                    │           · assets table (BYTEA): cover·art·audio bytes       │
                    └───────────────────────────────────────────────────────────────┘
Asset bytes are stored in the Postgres assets table (BYTEA) and served back via an app proxy (/api/assets/...), so the browser never touches storage directly.

| Component | Role | Note |
|---|---|---|
| Azure OpenAI gpt-5.4 | story design + page writing + photo→description (vision) | keyless (AAD) |
| Azure OpenAI gpt-image-1 | page illustration 1024² | text anchor keeps the character consistent |
| Azure AI Speech (Neural TTS) | page narration (SSML) | ko/en preset voices, keyless (resource_id) |
| Container Apps | FastAPI hosting (scale 0–N) | image from ACR |
| PostgreSQL Flexible | books·pages·usage + asset bytes | keyless (MI), assets in one place |
Image models have no seed to guarantee the same person across pages. So the English character description (anchor) produced by vision is prepended to every page prompt to keep the same hero. Meanwhile each page is forced to differ in action/setting/camera angle, so the art actively reflects the story (no near-duplicate pages).
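The anchor idea amounts to plain string assembly, which the sketch below illustrates. It is not the project's actual prompt code: `PageDirection`, `build_page_prompt`, `check_variety`, and the prompt wording are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageDirection:
    """What must differ per page; field names are illustrative assumptions."""
    action: str
    setting: str
    camera: str


def build_page_prompt(anchor: str, page: PageDirection) -> str:
    """Put the fixed character anchor first, then the page-specific direction."""
    return (
        f"Main character (keep identical on every page): {anchor.strip()}\n"
        f"Action: {page.action}\n"
        f"Setting: {page.setting}\n"
        f"Camera: {page.camera}"
    )


def check_variety(pages: list[PageDirection]) -> None:
    """Raise if two pages share the same action, setting, and camera angle."""
    seen = set()
    for p in pages:
        key = (p.action, p.setting, p.camera)
        if key in seen:
            raise ValueError(f"near-duplicate page direction: {key}")
        seen.add(key)
```

Every prompt built this way starts with the same first line, so the character description reaching the image model is identical on each page while the rest of the prompt varies.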
Art and audio are independent per page, so they're generated concurrently via a ThreadPool (tunable). Drawing 8–16 pages sequentially takes minutes; parallelism cuts the perceived wait dramatically, paired with an async UX that lets you read or start another book while one is generating.
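A minimal sketch of the per-page fan-out, assuming the standard library's `ThreadPoolExecutor`. The two `generate_*` functions are placeholders that return fake bytes so the example runs offline; the real service would call the image and speech endpoints there, and `max_workers=8` is an arbitrary illustrative default.

```python
from concurrent.futures import ThreadPoolExecutor


def generate_art(page_no: int, text: str) -> bytes:
    """Placeholder for the image call (hypothetical)."""
    return f"art:{page_no}:{text}".encode()


def generate_audio(page_no: int, text: str) -> bytes:
    """Placeholder for the TTS call (hypothetical)."""
    return f"audio:{page_no}:{text}".encode()


def render_book(pages: list[str], max_workers: int = 8) -> list[dict[str, bytes]]:
    """Submit art and audio for every page up front, then collect in page order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        jobs = [
            (pool.submit(generate_art, i, text), pool.submit(generate_audio, i, text))
            for i, text in enumerate(pages, start=1)
        ]
        # .result() blocks until that job is done and re-raises worker exceptions.
        return [{"art": art.result(), "audio": audio.result()} for art, audio in jobs]
```

Threads fit here because the work is I/O-bound (waiting on remote APIs), so total time approaches that of the slowest batch rather than the sum of all calls.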
OpenAI, Speech, and the DB are all accessed keyless via AAD token / Managed Identity, so there are no keys to leak or rotate and no secrets in environment variables.
Central governance periodically locks storage publicNetworkAccess. Since all assets are app-proxied (the browser never hits storage directly), the image/audio bytes were moved into a Postgres assets (BYTEA) table, making them governance-proof while keeping URLs unchanged.
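The bytes-in-a-table approach can be sketched as follows. To keep the example runnable anywhere it uses the standard library's `sqlite3` as a stand-in for Postgres, with `BLOB` playing the role of `BYTEA`; the column names and key layout are assumptions, not the project's actual schema.

```python
import sqlite3
from typing import Optional, Tuple


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Create the assets table (SQLite BLOB stands in for Postgres BYTEA)."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS assets (
            key          TEXT PRIMARY KEY,
            content_type TEXT NOT NULL,
            data         BLOB NOT NULL
        )
        """
    )
    return conn


def put_asset(conn: sqlite3.Connection, key: str, content_type: str, data: bytes) -> None:
    """Insert or overwrite one asset."""
    conn.execute(
        "INSERT OR REPLACE INTO assets (key, content_type, data) VALUES (?, ?, ?)",
        (key, content_type, data),
    )
    conn.commit()


def get_asset(conn: sqlite3.Connection, key: str) -> Optional[Tuple[str, bytes]]:
    """Return (content_type, bytes), or None when the key is unknown."""
    row = conn.execute(
        "SELECT content_type, data FROM assets WHERE key = ?", (key,)
    ).fetchone()
    return None if row is None else (row[0], bytes(row[1]))
```

A proxy route under /api/assets/... would call something like `get_asset`, respond with 404 on `None`, and otherwise return the bytes with the stored content type, which is why the URLs seen by the browser do not depend on where the bytes live.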
The photo is used only at the instant a character description is created, then discarded. What's kept is just the login account + the child's nickname + a text character description; the thumbnail is an anonymous name-based avatar (SVG), not a photo. On startup, any previously-stored photo assets are purged.
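A name-based avatar needs nothing but the nickname. The sketch below is one possible way to do it, not the project's implementation: the function name `avatar_svg`, the hash-to-hue mapping, and the styling are all assumptions.

```python
import hashlib
from xml.sax.saxutils import escape


def avatar_svg(nickname: str, size: int = 64) -> str:
    """Build a deterministic initial-on-a-circle SVG from a nickname only."""
    name = nickname.strip() or "?"
    # Derive a stable hue (0-359) from the name so the same child keeps the same color.
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    hue = int.from_bytes(digest[:2], "big") % 360
    initial = escape(name[0].upper())  # escape so '<' or '&' cannot break the markup
    half = size // 2
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{size}" height="{size}" '
        f'viewBox="0 0 {size} {size}">'
        f'<circle cx="{half}" cy="{half}" r="{half}" fill="hsl({hue}, 60%, 70%)"/>'
        f'<text x="50%" y="50%" dy=".35em" text-anchor="middle" '
        f'font-family="sans-serif" font-size="{half}">{initial}</text>'
        "</svg>"
    )
```

Because the output depends only on the nickname, nothing derived from the photo is ever needed to render a thumbnail.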
Books, child profiles, and usage are all isolated by login user_id (no mixing of shelves). Billing is free (1 lifetime) / $9 (10/mo) / $19 (30/mo), enforced via a usage table (the limit is opened up wide during development).
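The plan limits can be checked against usage rows with a small helper like the one below. The limits and prices come from the text above; the plan keys (`free`, `basic`, `plus`), the function name, and the choice of a calendar-month window in UTC are assumptions made for this sketch.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Plan:
    price_usd: int
    limit: int
    monthly: bool  # False means the limit is a lifetime total


# Plan keys are made-up labels; prices and limits follow the tiers described above.
PLANS = {
    "free": Plan(0, 1, monthly=False),
    "basic": Plan(9, 10, monthly=True),
    "plus": Plan(19, 30, monthly=True),
}


def books_remaining(
    plan_key: str, created_at: list[datetime], now: Optional[datetime] = None
) -> int:
    """Count one user's usage rows inside the plan window and subtract from the limit."""
    plan = PLANS[plan_key]
    now = now or datetime.now(timezone.utc)
    if plan.monthly:
        # Assumption: the window is the current calendar month.
        used = sum(1 for t in created_at if (t.year, t.month) == (now.year, now.month))
    else:
        used = len(created_at)
    return max(plan.limit - used, 0)
```

Here `created_at` stands for the timestamps of one user's rows in the usage table, already filtered by user_id, so one user's count can never draw on another user's books.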
Because it's a children's service, data protection comes first.
Regulatory specifics change over time, so this assumes verification against official sources (law.go.kr, pipc.go.kr).
A private, kid-as-hero storybook service on keyless Azure AI (story·art·voice) + Container Apps + Postgres, with data minimization (no photo storage) and parallel generation for a fast feel as its core values.