Taeyang Moon
Demo · Architecture write-up

Shorts Factory — automated Sora 2 shorts

A private, keyless async pipeline: human approval → Sora 2 video generation → Blob storage

Sora 2 · Foundry gpt-5.4 · Service Bus Premium · Container Apps (KEDA) · Private Endpoint · Managed Identity
Live site: shorts-factory-web-secure.orangeisland-6a200f98.eastus2.azurecontainerapps.io/
Open site →

This document treats Shorts Factory, a demo that automatically generates vertical short videos with Sora 2, as an Azure architecture case study. It explains which services the demo uses, how they are composed, and why it was designed this way, for engineers who want to build it themselves. (Baseline date: 2026-06-14, region: East US 2)

The live demo is protected by an access password. The "Open site" link above points to the Container Apps web app address, and a login screen appears first. This is a private pilot environment, not a public demo.


At a Glance

Shorts Factory is a multi-agent pipeline that automates the flow from "one-line topic → AI suggests 3 ideas → a human approves 1 → Sora 2 generates video → Blob storage → playback/download on the web."

The key themes are human-in-the-loop approval · asynchronous queues · keyless operation (no secrets) · private network isolation.

Product principle: public trends are used only as abstract signals; original videos are not copied, clipped, transcribed, or imitated.


Architecture

                       Internet ──HTTPS(only public entry)──▶  Container Apps public ingress
                                                                  │
  ┌───────────────────────────────────────────────────────────────────────────────────────┐
  │ VNet  vnet-shorts-app (10.42.0.0/16)                                                     │
  │                                                                                          │
  │  snet-containerapps (10.42.0.0/27, delegated to Microsoft.App/environments)              │
  │  ┌───────────────────────────┐        send only 1 approved job_id    ┌───────────────────┐ │
  │  │ Container App: Web         │ ──────────────────────────────────▶ │ (Service Bus queue)│ │
  │  │ shorts-factory-web-secure  │                                     └─────────┬─────────┘ │
  │  │  - FastAPI + web UI        │                                               │ KEDA rule │
  │  │  - Generate/approve 3 ideas│        ┌──────────────────────────────────────▼─────────┐ │
  │  │  - replica 1~1 (always on) │        │ Container App: Worker  shorts-factory-worker      │ │
  │  └──────────┬─────────────────┘        │  - Queue consume → direct/review/compliance/Sora/ │ │
  │             │                          │    assemble/QA                                   │ │
  │             │                          │  - No ingress, runs only when queue has messages  │ │
  │             │                          └──────────┬───────────────────────────┬──────────┘ │
  │  snet-private-endpoints (10.42.1.0/24)            │ read/write job JSON       │ save MP4   │
  │  ┌───────────────────┐   ┌───────────────────┐    │                            │            │
  │  │ pe-shorts-blob     │   │ pe-shorts-servicebus│◀─┘  (Sender=Web / Receiver=Worker)        │
  │  └─────────┬─────────┘   └──────────┬─────────┘                                            │
  └────────────┼────────────────────────┼─────────────────────────────────────────────────────┘
               │ Private DNS            │ Private DNS
               ▼                        ▼
   Azure Blob Storage          Azure Service Bus (Premium)        ┌───────────────────────────────┐
   stshortsfacbc56b85d17       sbshortsfac6c7f27                  │ Microsoft Foundry (East US 2) │
   - shorts-jobs (job JSON)    - queue shorts-generation          │  foundry-ncxnghbrr7n6w        │
   - shorts-videos (MP4)       - duplicate detection, DLQ         │  - gpt-5.4  (5 text agents)   │
   * Public net Disabled       - max 5 deliveries                 │  - sora-2   (vertical video)  │
   * Shared keys Disabled      * Public net Disabled              └───────────────────────────────┘
                               * Private Link
                          Worker ──Managed Identity(keyless)──▶ Foundry(gpt-5.4 / sora-2)

Flow Summary

  1. A user enters a topic, audience, tone, and length (default 36 seconds) on the web. Web uses gpt-5.4 (Strategy) to generate 3 original ideas and stores the job in Blob as awaiting_approval.
  2. The user approves 1 idea. Web places only the job_id on the Service Bus queue and immediately returns 202 Accepted.
  3. When a message appears in the queue, KEDA wakes the Worker. The Worker runs directing (Director) → story review (Story Editor) → compliance → sequential generation of three 12-second scenes with Sora 2 → assembly with FFmpeg → visual QA on the final video → MP4 storage in Blob (shorts-videos), then marks the job completed.
  4. The user checks status on the web and plays/downloads the video.


Azure Services Used (Actual Deployment-Verified Values)

| Layer | Resource | Verified Configuration | Role |
|---|---|---|---|
| Runtime | cae-shorts-secure (Container Apps Env) | VNet-connected, East US 2 | Shared runtime for Web/Worker |
| Web | shorts-factory-web-secure | External HTTPS ingress, 1 vCPU / 2 GiB, replica 1~1 | FastAPI + web UI, idea generation/approval |
| Worker | shorts-factory-worker | No ingress, 1 vCPU / 2 GiB, KEDA (Service Bus) rule | Queue consumption + video generation pipeline |
| Registry | acrue2423kqbdy4k | Basic | Web/Worker container images |
| Messaging | sbshortsfac6c7f27 | Premium, 1 MU, queue shorts-generation, public network Disabled | Approved job queue (durability, duplicate detection, DLQ) |
| Queue policy | shorts-generation | lock 5 min, max delivery 5 times, duplicate detection 10 min window, DLQ on expiration | One job at a time, safe retries |
| Storage | stshortsfacbc56b85d17 | StorageV2, Standard LRS, public network Disabled, Blob public access blocked, shared key blocked | Job JSON + completed MP4 |
| Containers | shorts-jobs / shorts-videos | Private | jobs/{id}.json / {id}/short.mp4 |
| AI | foundry-ncxnghbrr7n6w | Azure AI Services, East US 2 | Foundry project/model hosting |
| Model | gpt-5.4 (ver 2026-03-05) | GlobalStandard | Strategy, directing, review, compliance, visual QA |
| Model | sora-2 (ver 2025-12-08) | GlobalStandard | 720×1280 vertical video generation |
| Network | vnet-shorts-app + 2 PEs + 2 Private DNS zones | Private Endpoints for Blob/Service Bus | Private path isolation |
| Identity | mi-ue2423kqbdy4k / mi-shorts-worker | 2 user-assigned Managed Identities | Keyless authentication (role separation) |

Important Design Decisions (Why This Way)

1. A "human approval" gate before paid calls

Sora video generation is expensive, so POST /api/jobs only creates 3 ideas and stops there. The job is placed on the queue only after a person selects 1 idea through POST /api/jobs/{id}/approve. The compliance agent can also reject risky concepts before the Sora call, adding one more filter before money is spent. → Beginner analogy: an expensive order (video generation) proceeds only after the clerk asks, "Do you really want to pay?"
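The gate described above can be sketched in a few lines. This is a minimal in-memory illustration, not the demo's actual code: the `ApprovalGate` class, its dict/list stand-ins for Blob and Service Bus, and the 404/409/422 status codes are all assumptions made for the example. Only the two routes, the awaiting_approval status, and the 202 response come from the write-up.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional
from uuid import uuid4


@dataclass
class Job:
    job_id: str
    ideas: List[str]
    status: str = "awaiting_approval"
    approved_idea: Optional[int] = None


@dataclass
class ApprovalGate:
    """Hypothetical stand-in for the Blob job store plus the approved-job queue."""
    jobs: Dict[str, Job] = field(default_factory=dict)
    queue: List[str] = field(default_factory=list)

    def create_job(self, ideas: List[str]) -> Job:
        # Mirrors POST /api/jobs: ideas are stored, nothing is enqueued.
        job = Job(job_id=uuid4().hex, ideas=ideas)
        self.jobs[job.job_id] = job
        return job

    def approve(self, job_id: str, idea_index: int) -> int:
        # Mirrors POST /api/jobs/{id}/approve; returns an HTTP-style status code.
        job = self.jobs.get(job_id)
        if job is None:
            return 404  # assumed code for an unknown job
        if job.status != "awaiting_approval":
            return 409  # assumed code: a job is never enqueued twice
        if not 0 <= idea_index < len(job.ideas):
            return 422  # assumed code for an invalid idea selection
        job.approved_idea = idea_index
        job.status = "generating"
        self.queue.append(job_id)  # only the job_id goes on the queue
        return 202
```

The point of the sketch is that the paid path has exactly one entry: `approve` on a job that is still awaiting approval. A repeated approval is refused, so a double click does not enqueue a second generation.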

2. Web/Worker separation + Service Bus queue (no synchronous processing)

Creating one video takes several minutes. If that work runs inside a web request, it causes timeouts, duplicate charges, and scaling limits. So Web puts only the job_id on the queue and immediately returns 202, while the Worker consumes the queue and performs the heavy work. The worker is awakened by KEDA based on queue length, so it uses almost no resources when there is no work.
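As a rough sketch of this handoff, the example below uses the standard library's `queue.Queue` in place of Service Bus and a dict in place of the Blob job store. The function names `handle_approve` and `drain_queue` and the JSON message shape are assumptions for illustration. What it tries to show is that the message carries only the job_id and that the web side returns before any heavy work starts.

```python
import json
import queue
from typing import Callable, Dict, Tuple

# Stand-ins (assumptions): a dict for the Blob job store,
# queue.Queue for the Service Bus queue.
job_store: Dict[str, dict] = {}
approved_jobs: "queue.Queue[str]" = queue.Queue()


def handle_approve(job_id: str) -> Tuple[int, dict]:
    """Web side: enqueue the job_id and answer immediately with 202."""
    job_store[job_id]["status"] = "generating"
    approved_jobs.put(json.dumps({"job_id": job_id}))
    return 202, {"job_id": job_id, "status": "generating"}


def drain_queue(process_job: Callable[[dict], None]) -> int:
    """Worker side: handle messages until the queue is empty, then stop."""
    handled = 0
    while True:
        try:
            message = approved_jobs.get_nowait()
        except queue.Empty:
            return handled  # nothing left; in the demo KEDA scales the worker down
        job_id = json.loads(message)["job_id"]
        process_job(job_store[job_id])  # all heavy state is read from the store
        handled += 1
```

Because the message is just an identifier, the job JSON in the store stays the single source of truth, which is also what makes redelivery safe.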

3. Service Bus as Premium — not for throughput, but for Private Endpoint

Standard would be sufficient for this demo's queue traffic. The single decisive reason for Premium is that Service Bus Private Link/Private Endpoint is supported only on the Premium tier. Premium is required to close the public endpoint and reach the queue only over a private VNet path (it also brings dedicated throughput and AZ redundancy options). → Trade-off: Premium incurs a fixed cost even when idle. If cost matters more than security isolation, Standard (public network + Entra authentication) or Storage Queue are alternatives.

| Option | Pros | Cons | Best Fit |
|---|---|---|---|
| Current: Service Bus Premium | Private Endpoint, DLQ, duplicate detection, dedicated throughput | Fixed cost | When private network isolation must be maintained |
| Service Bus Standard | Lower cost, retains most queue features | No Private Endpoint (public endpoint required) | When public network + Entra authentication is acceptable |
| Azure Storage Queue | Reuses existing Storage, supports Queue PE | No built-in DLQ/duplicate detection (app must implement) | When cost optimization matters more than security |

4. Idempotent & resumable worker

process_job first stores each stage and each per-scene Sora operation ID in the Blob job JSON, and only then starts polling. Even if the worker dies or the message is redelivered, it does not create a new paid Sora job; it resumes by looking up the saved operation ID. The storage key is also deterministic, {job_id}/short.mp4, so retries overwrite the same Blob. → This is the core invariant that prevents duplicate charges.
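The ordering matters more than the details, so here is a minimal sketch of the invariant. `FakeSora`, `generate_scene`, the `operations` key in the job JSON, and the `save` callback are hypothetical names for illustration. The real process_job and Sora client are not shown in this write-up.

```python
from typing import Callable


class FakeSora:
    """Hypothetical provider stub that counts paid submissions."""

    def __init__(self) -> None:
        self.submissions = 0

    def submit(self, prompt: str) -> str:
        self.submissions += 1
        return f"op-{self.submissions}"

    def wait(self, operation_id: str) -> str:
        # Stands in for polling until the video is ready.
        return f"{operation_id}.mp4"


def video_blob_name(job_id: str) -> str:
    # Deterministic key: a retry overwrites the same Blob.
    return f"{job_id}/short.mp4"


def generate_scene(job: dict, scene: int, prompt: str, sora: FakeSora,
                   save: Callable[[dict], None]) -> str:
    operations = job.setdefault("operations", {})
    key = str(scene)
    if key not in operations:
        operations[key] = sora.submit(prompt)  # the only paid call
        save(job)  # persist the operation ID BEFORE polling starts
    # On redelivery the saved ID is reused, so no new paid job is created.
    return sora.wait(operations[key])
```

If the worker dies during `wait`, the next delivery finds the operation ID in the job JSON and goes straight back to polling. A crash in the narrow window between `submit` and `save` is still possible in this sketch, which is why the save should follow the submit as directly as possible.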

5. Keyless (Managed Identity) + least privilege/separation of duties

No connection strings or API keys are used; everything authenticates through DefaultAzureCredential (Managed Identity at runtime).
  - Web ID mi-ue2423kqbdy4k: AcrPull, Foundry User, Cognitive Services User, Storage Blob Data Contributor, Service Bus Data Sender
  - Worker ID mi-shorts-worker: same as above + Service Bus Data Receiver

By separating Sender/Receiver into different identities, Web cannot consume the queue and Worker cannot arbitrarily submit new jobs.
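For readers who have not built keyless clients before, the sketch below shows one way the wiring could look with the Azure SDK for Python. The helper names and the `build_clients` signature are assumptions, and the demo's own code may differ. The SDK calls themselves (`DefaultAzureCredential`, `BlobServiceClient`, `ServiceBusClient`) are the public constructors from azure-identity, azure-storage-blob, and azure-servicebus.

```python
def blob_account_url(account: str) -> str:
    # The public hostname is still used; Private DNS maps it to the private IP.
    return f"https://{account}.blob.core.windows.net"


def service_bus_namespace(namespace: str) -> str:
    return f"{namespace}.servicebus.windows.net"


def build_clients(storage_account: str, sb_namespace: str, client_id: str):
    """Create Blob and Service Bus clients with no keys or connection strings."""
    # Imported lazily so this module loads even where the Azure SDK is absent.
    from azure.identity import DefaultAzureCredential
    from azure.servicebus import ServiceBusClient
    from azure.storage.blob import BlobServiceClient

    # client_id selects which user-assigned Managed Identity to use.
    credential = DefaultAzureCredential(managed_identity_client_id=client_id)
    blobs = BlobServiceClient(
        account_url=blob_account_url(storage_account), credential=credential
    )
    bus = ServiceBusClient(
        fully_qualified_namespace=service_bus_namespace(sb_namespace),
        credential=credential,
    )
    return blobs, bus
```

What each client is then allowed to do is decided by the role assignments on the identity, not by anything in the code, which is what makes the Sender/Receiver split enforceable.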

6. Privatized data plane (Storage/Service Bus public network Disabled)

Storage has publicNetworkAccess=Disabled, with Blob public access and shared keys both blocked. Service Bus also blocks public network access and disables local key authentication. Both services are resolved and reached only through Private DNS → Private Endpoint. The only external opening is the Web ingress (HTTPS). → Caution: uploading directly to Blob from a local PC will fail because of the private network. It works only from the Azure runtime (in the same VNet).
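A quick way to confirm the private path from inside the VNet is to check that the service hostnames resolve into the private endpoint subnet. The sketch below assumes the 10.42.1.0/24 range shown in the architecture diagram. The function names are hypothetical, and `resolves_privately` only gives a meaningful answer when run from a container in the same VNet.

```python
import ipaddress
import socket

# Subnet of snet-private-endpoints, taken from the architecture diagram.
PRIVATE_ENDPOINT_SUBNET = ipaddress.ip_network("10.42.1.0/24")


def is_private_endpoint_ip(address: str) -> bool:
    """True if the address lies inside the private endpoint subnet."""
    return ipaddress.ip_address(address) in PRIVATE_ENDPOINT_SUBNET


def resolves_privately(hostname: str) -> bool:
    """Resolve the hostname over IPv4 and check every returned address."""
    infos = socket.getaddrinfo(
        hostname, 443, family=socket.AF_INET, proto=socket.IPPROTO_TCP
    )
    return all(is_private_endpoint_ip(info[4][0]) for info in infos)
```

If a check like `resolves_privately("stshortsfacbc56b85d17.blob.core.windows.net")` returns False inside the runtime, a missing Private DNS zone link is a likely cause.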

7. Three 12-second scenes + last-frame continuity + visual QA

The Sora SDK supports only 4/8/12 seconds per scene. So 36 seconds is split into 12-second hook → 12-second rise → 12-second payoff, and the last frame of one scene (720×1280 PNG) is passed to the next scene as an input_reference to create continuity. FFmpeg stitches them together with 0.12-second xfade/acrossfade transitions and encodes an H.264/AAC MP4. After assembly, a visual QA agent inspects 15 actual frames; if the overall score is below 82 or any item is below 75, it partially regenerates up to 2 times starting from the failed scene (accepted scenes are preserved).
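Two pieces of arithmetic hide in this paragraph: where each crossfade starts, and when QA triggers a regeneration. The sketch below works both out from the numbers given above (12-second scenes, 0.12-second fades, thresholds 82 and 75). The function names and the shape of the QA scores are assumptions. The offset rule follows how FFmpeg's xfade filter counts its `offset` from the start of the already-merged stream.

```python
from typing import Dict, List

SCENE_SECONDS = 12
FADE_SECONDS = 0.12
OVERALL_THRESHOLD = 82
ITEM_THRESHOLD = 75


def xfade_offsets(scene_count: int) -> List[float]:
    """Start time of each crossfade, measured on the merged timeline."""
    step = SCENE_SECONDS - FADE_SECONDS  # each join shortens the video by one fade
    return [round(i * step, 2) for i in range(1, scene_count)]


def final_duration(scene_count: int) -> float:
    """Length of the assembled video after all crossfades overlap."""
    total = scene_count * SCENE_SECONDS - (scene_count - 1) * FADE_SECONDS
    return round(total, 2)


def needs_regeneration(overall: float, items: Dict[str, float]) -> bool:
    """QA rule from the text: overall below 82 or any item below 75."""
    return overall < OVERALL_THRESHOLD or any(
        score < ITEM_THRESHOLD for score in items.values()
    )
```

For three scenes this gives crossfades at 11.88 s and 23.76 s and a final length of 35.76 s, slightly under the nominal 36 seconds because each fade overlaps two scenes.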


Data Flow (State Machine)

   [*] ─▶ awaiting_approval ──human approval──▶ generating ──compliance rejection──▶ rejected
                                              │
                                              ├─ temp assembly ─▶ visual_review ─ quality approved ─▶ completed
                                              │                     │
                                              │                     └ partial regen from failed scene ─▶ generating
                                              └─ provider error ─▶ failed ── transient redelivery ─▶ generating

Stages the Worker records in the Blob job JSON: directing → creative_review → compliance → video → visual_review → storage → completed. The queue lock is 5 minutes, but the SDK AutoLockRenewer renews it for up to 75 minutes; the Sora timeout is 20 minutes per scene. After 5 failed deliveries, the message moves to the DLQ.
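The diagram above can be encoded as a small transition table, which is a convenient way to reject illegal status updates in the job JSON. This is a sketch derived only from the arrows in the diagram. The `TRANSITIONS` table and the `advance` function are hypothetical names, not the demo's code.

```python
from typing import Dict, Set

# One entry per state; the targets are the arrows in the state diagram.
TRANSITIONS: Dict[str, Set[str]] = {
    "awaiting_approval": {"generating"},                     # human approval
    "generating": {"rejected", "visual_review", "failed"},   # compliance, assembly, provider error
    "visual_review": {"completed", "generating"},            # approved, or partial regeneration
    "failed": {"generating"},                                # transient redelivery
    "rejected": set(),                                       # terminal
    "completed": set(),                                      # terminal
}


def advance(current: str, target: str) -> str:
    """Return the new status, or raise if the diagram has no such arrow."""
    if target not in TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition: {current} -> {target}")
    return target
```

Guarding writes this way means a redelivered message cannot, for example, move a completed job back to generating.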


Cost Perspective

Prices change, so this write-up quotes no definitive figures. Check actual estimates with the Azure Pricing Calculator and your subscription's Cost Management.


Key Build Points / Pitfalls

  1. Service Bus Private Endpoint is Premium-only. If you create Standard, you cannot attach a PE. If private isolation is the goal, start with Premium.
  2. The Container Apps subnet must be delegated to Microsoft.App/environments, and Private Endpoints should be placed in a separate subnet.
  3. Private DNS Zones (privatelink.blob.core.windows.net, privatelink.servicebus.windows.net) must be linked to the VNet so names resolve to private IPs. If missing, the worker cannot find the queue/Blob.
  4. Turning off the public network blocks direct access from a local development PC. Use mock providers locally, and separate the implementation so real Blob/Service Bus are used only in the Azure runtime.
  5. Do not forget Sender/Receiver role separation. If both are assigned to the same identity, the permission isolation loses meaning.
  6. Do not break the idempotent resume invariant. If operation ID persistence is moved after polling, every retry creates a new paid Sora job and costs can explode.
  7. Remaining operational hardening: Foundry currently has public network access enabled (the data plane is private, but the AI plane is not yet). Application Insights tracing, DLQ/cost alerts, Storage ZRS, and Entra External ID user authentication are next steps.
  8. Third-party key review: If external API keys are found in ordinary environment variables, rotate them and move them to Container Apps secrets or Key Vault references. Never expose them in source code or logs.

One-Line Summary (Translated into Customer Value)

"A secure asynchronous pipeline that processes only human-approved expensive AI video generation, inside a private network, without secrets(Managed Identity), and resumes without duplicate charges even after failures" — a pattern that satisfies security, cost control, and durability together.

