
voice.generate.ready — Draft Seed Prompt

Confirm that you can generate audio from text and save it as a file. This is a capability gate: verify the full path from text input to a saved audio file before attempting any downstream voice task.

Pick a TTS provider based on what's available. Cloud APIs are the simplest path: OpenAI TTS (curl or SDK; model tts-1, voice alloy, nova, or shimmer; returns mp3 by default), ElevenLabs (REST API; needs an xi-api-key header; a model such as eleven_turbo_v2_5 or eleven_multilingual_v2; returns mp3 by default, with other formats selectable via output_format), or Google Cloud TTS (needs gcloud auth; the REST response is JSON with base64-encoded audio in the audioContent field, which must be decoded before saving).

If no API keys are available, use a local model: Piper (pip install piper-tts; fast on CPU but lower quality), Kokoro (82M parameters, Apache 2.0, near-cloud quality), or any local server exposing an OpenAI-compatible /v1/audio/speech endpoint.
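A minimal Python sketch of the request path, using only the standard library. It assumes the OpenAI-style JSON body (model, voice, input, response_format) and a base URL you supply; local OpenAI-compatible servers differ in port and may ignore the API key, so treat both as assumptions. The helper names build_speech_request, synthesize, and decode_google_tts are hypothetical, and decode_google_tts assumes the Google REST response carries its audio in an audioContent field.

```python
import base64
import json
import urllib.request


def build_speech_request(base_url: str, api_key: str, text: str,
                         model: str = "tts-1", voice: str = "alloy",
                         response_format: str = "mp3") -> urllib.request.Request:
    """Build a POST for an OpenAI-compatible /v1/audio/speech endpoint (assumed body shape)."""
    payload = {
        "model": model,
        "voice": voice,
        "input": text,
        "response_format": response_format,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # local servers may ignore this
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize(req: urllib.request.Request, out_path: str) -> int:
    """Send the request and write the raw response bytes; returns the byte count."""
    # urlopen raises HTTPError on 4xx/5xx, so an error body is never saved as audio.
    with urllib.request.urlopen(req, timeout=60) as resp:
        audio = resp.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return len(audio)


def decode_google_tts(response_body: str, out_path: str) -> int:
    """Decode base64 audio from a Google Cloud TTS JSON response and save it."""
    audio = base64.b64decode(json.loads(response_body)["audioContent"])
    with open(out_path, "wb") as f:
        f.write(audio)
    return len(audio)
```

Usage, assuming OPENAI_API_KEY is set in the environment: synthesize(build_speech_request("https://api.openai.com", os.environ["OPENAI_API_KEY"], sentence), "test.mp3"). Even when the call returns 200, the saved file still needs to be verified.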

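Generate a test utterance: one sentence of 10 to 20 words. Save the output with the extension that matches its format (.mp3 for the OpenAI and ElevenLabs defaults, .wav for most local models). Then verify the file: it must exist, be larger than zero bytes, and have a valid audio MIME type (audio/mpeg for mp3, audio/wav for wav; the file --mime-type utility may report WAV as audio/x-wav, which is equivalent). Determine the MIME type from the file's contents, not its extension, because an API error body saved as test.mp3 still has a .mp3 name. A zero-byte file or a non-audio MIME type means generation failed silently; do not skip this check.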
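One way to run this check in Python, sketched under the assumption that only mp3 and wav outputs matter for this gate. It sniffs the file's leading bytes instead of trusting the name (Python's mimetypes module guesses from the filename alone, so a JSON error saved as test.mp3 would pass it). The function names are hypothetical.

```python
import os
from typing import Optional


def check_utterance(text: str) -> None:
    """Reject test sentences outside the 10-20 word range."""
    n = len(text.split())
    if not 10 <= n <= 20:
        raise ValueError(f"test utterance has {n} words; expected 10-20")


def sniff_audio_mime(path: str) -> Optional[str]:
    """Guess the MIME type from the file's first bytes; None if not mp3 or wav."""
    with open(path, "rb") as f:
        head = f.read(12)
    # MP3 with an ID3v2 tag at the start.
    if head.startswith(b"ID3"):
        return "audio/mpeg"
    # Bare MPEG audio frame: 11-bit sync plus a non-reserved layer field
    # (the layer check also rules out AAC ADTS, whose layer bits are 00).
    if len(head) >= 2 and head[0] == 0xFF and (head[1] & 0xE0) == 0xE0 and (head[1] & 0x06) != 0:
        return "audio/mpeg"
    # WAV: a RIFF container whose form type is WAVE.
    if head[:4] == b"RIFF" and head[8:12] == b"WAVE":
        return "audio/wav"
    return None


def verify_audio_file(path: str, expected_mime: str) -> None:
    """Raise if the file is missing, empty, or not the expected audio type."""
    if not os.path.isfile(path):
        raise FileNotFoundError(f"no output file at {path}")
    if os.path.getsize(path) == 0:
        raise ValueError(f"{path} is zero bytes: generation failed silently")
    actual = sniff_audio_mime(path)
    if actual != expected_mime:
        raise ValueError(f"{path}: expected {expected_mime}, found {actual or 'non-audio data'}")
```

Other formats (opus, flac, aac) deliberately return None here; extend the sniffer if a provider is configured to emit them.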

Record what you used: provider name, model identifier, voice identifier, output format, and file path. Downstream seeds for voice cloning, emotion control, and streaming depend on this baseline. If multiple providers are available, prefer the one with the lowest latency for interactive use or the highest quality for narration; any working provider satisfies this gate.
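A minimal sketch of persisting the baseline as JSON so downstream seeds can read it back. The record file name tts_baseline.json, the TTSBaseline class, and its field names are hypothetical; the fields mirror the provider, model, voice, output format, and path named in the contract.

```python
import json
import os
from dataclasses import asdict, dataclass


@dataclass
class TTSBaseline:
    provider: str       # e.g. "openai", "piper"
    model: str          # model identifier as passed to the provider
    voice: str          # voice identifier
    output_format: str  # e.g. "mp3" or "wav"
    file_path: str      # where the verified test file lives


def record_baseline(b: TTSBaseline, record_path: str = "tts_baseline.json") -> None:
    """Write the baseline to JSON, storing the audio path as absolute."""
    data = asdict(b)
    data["file_path"] = os.path.abspath(b.file_path)
    with open(record_path, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2)


def load_baseline(record_path: str = "tts_baseline.json") -> TTSBaseline:
    """Read a previously recorded baseline back into a TTSBaseline."""
    with open(record_path, encoding="utf-8") as f:
        return TTSBaseline(**json.load(f))
```

Storing an absolute path keeps the location valid if a downstream seed runs from a different working directory.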

Stats: 1,389 characters

Expected conjugation directions:

Slug: voice.generate.ready
Contract: "a test audio file generated from text exists at a known path with confirmed provider, model, voice, and output format recorded"
Outcome: "confirm text-to-speech audio generation capability"
Tags: tts, voice, audio, speech, synthesis, generation, capability