{"ok":true,"host":"xona.x402.fi","status":"ready","manifest":{"positioning":"Xona is a small-footprint AI media generation host offering two independent capabilities: converting text to spoken audio and generating images from text or image prompts. It serves agents that need on-demand media asset creation without subscriptions, using Solana micropayments per request. It does not offer streaming, voice cloning, image analysis, or model fine-tuning.","host_overview":"xona.x402.fi provides two AI media generation endpoints: text-to-speech audio synthesis and text-to-image generation via the Qwen multimodal model. Both are single-shot, pay-per-request services billed at $0.05 per call on Solana via x402 payment gating.","routing_guidance":"Use xona.x402.fi when an agent needs single-shot text-to-speech audio or text-to-image generation with per-request Solana micropayments and no subscription overhead. Do not use it for real-time or streaming audio synthesis, voice cloning, image classification, object detection, or image editing; those capabilities are absent. For image analysis or vision tasks, route to a multimodal vision API. For high-volume or streaming TTS, consider dedicated TTS platforms with streaming support. Both skills are independent and low-cost, making this host suitable for lightweight, infrequent media generation tasks.","capability_clusters":[{"skill_names":["synthesize-text-to-speech"],"cluster_name":"Audio Synthesis","cluster_summary":"Converts a text string into a natural-sounding audio file in a single API call, suitable for narration, accessibility output, or spoken responses."},{"skill_names":["generate-qwen-image"],"cluster_name":"Image Generation","cluster_summary":"Generates images from text prompts or combined text-and-image inputs using the Qwen multimodal model, producing a single image per request."}],"cross_skill_workflows":[{"steps":[{"skill_name":"generate-qwen-image","description":"Generate an image from a text prompt describing the subject or scene."},{"skill_name":"synthesize-text-to-speech","description":"Convert a descriptive or narration text about the same subject into spoken audio to accompany the generated image."}],"when_to_use":"Use when an agent needs to produce both a visual image and a spoken audio description or narration for the same subject in one pipeline.","workflow_name":"Multimedia Asset Creation"}]},"model":"claude-sonnet-4-6","version_no":2,"generated_at":"2026-05-20T04:39:27.685Z","provenance":"ai_authored_unreviewed","ai_authored":true,"merchant_reviewed":false,"merchant_edited":false,"merchant_reviewed_at":null,"merchant_edited_at":null,"skill_md_url":"https://x402gle.com/servers/xona.x402.fi/SKILL.md","skills_url":"https://x402gle.com/servers/xona.x402.fi/skills.json"}