Turn any image into speech, instantly
Upload an image or paste a URL. Our AI describes what it sees, then speaks it aloud in any of 60+ languages. One API call returns the description and the MP3 together. Pay with Bitcoin, Ethereum, or USDT. No credit card, no KYC.
Why developers choose img2voice
No SDKs to install, no vision pipeline to build, no surprise bills. Just image in and MP3 out.
Any image, instant speech
Send a JPEG, PNG, GIF, or WebP — by URL, file upload, or base64. Our AI describes what it sees and speaks the description aloud. One endpoint, three input methods.
Output in 60+ languages
Get the spoken description in any of 60+ languages, including English, Japanese, Arabic, and Portuguese. Auto-translation is built in, so no separate translation API is needed.
Fast enough to feel instant
Image in, MP3 URL back typically in under 3 seconds. No polling, no queuing — the response contains everything you need.
One endpoint, no SDK needed
POST your image, get back an MP3 URL and the AI description. That's it. A single curl command gets you started in seconds. No client libraries and no OAuth flows, just one API key header.
Built for crypto and Web3
Pay with Bitcoin, Ethereum, USDT and more — no bank account, no credit card, no KYC. Credits never expire, so there's no pressure to use them before a monthly deadline.
Usage you can actually see
Credits used, requests made, and full history — all in one dashboard. Low-credit alerts by email so your app never goes silent unexpectedly.
How it works
From zero to spoken image descriptions in four steps.
Get your API key
Sign up with your email — your free API key arrives instantly. No credit card, no waiting. The free tier includes 25 image credits.
Send your image
POST to https://api.img2voice.com/v1 with your IMG2VOICE-API-KEY header. Send a URL, a base64 string, or upload a file directly.
AI describes it
Our vision model analyses the image and generates a natural-language description. You choose brief, standard, or detailed. The description is also returned in the response.
Receive your MP3
The description is spoken aloud in your chosen voice and language. You get back a signed audio URL valid for 24 hours — serve it directly or download it for permanent storage.
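Step 4 can be sketched in Python as a check on whether a signed audio URL is still safe to serve. This is a rough sketch, not official client code. It assumes the expires query parameter on the signed URL (as it appears in the sample response further down this page) is a Unix timestamp in milliseconds. The helper names and the five-minute safety margin are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlparse

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def audio_url_expiry(audio_url: str) -> datetime:
    """Read the expiry time from a signed audio URL.

    Assumption: `expires` is a Unix timestamp in milliseconds.
    """
    query = parse_qs(urlparse(audio_url).query)
    expires_ms = int(query["expires"][0])
    # Integer arithmetic avoids float rounding on the millisecond part.
    return EPOCH + timedelta(milliseconds=expires_ms)


def needs_permanent_copy(
    audio_url: str,
    now: datetime,
    margin: timedelta = timedelta(minutes=5),
) -> bool:
    """True when the link has expired or will expire within `margin`."""
    return now >= audio_url_expiry(audio_url) - margin


# Hypothetical URL: the path and signature are placeholders, not real values.
url = "https://api.img2voice.com/audio/example.mp3?expires=1775834406103&sig=placeholder"
created = datetime(2026, 4, 9, 15, 20, 6, tzinfo=timezone.utc)

print(audio_url_expiry(url).isoformat())
print(needs_permanent_copy(url, now=created))
```

If you plan to show the audio for longer than a day, download the MP3 while the link is valid and serve it from your own storage.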
Straightforward pricing
Start free. Top up with crypto when you need more. No card, no KYC, no monthly deadlines — credits never expire.
- 25 image credits
- 5 voices
- 60+ languages
- MP3 output
- Auto-translation
- Usage dashboard
- WAV output
- Batch processing
- Webhooks
- CSV export

- 100 image credits
- 5 voices
- 60+ languages
- MP3 + WAV output
- Auto-translation
- Usage dashboard
- Webhooks
- CSV export
- Batch processing

- 500 image credits
- 5 voices
- 60+ languages
- MP3 + WAV output
- Auto-translation
- Usage dashboard
- Batch processing (20 images)
- Webhooks
- CSV export

- 2,000 image credits
- 5 voices
- 60+ languages
- MP3 + WAV output
- Auto-translation
- Usage dashboard
- Batch processing (20 images)
- Webhooks
- CSV export
What happens when you run out of credits?
API requests return a quota_exceeded error — no silent failures, no overage charges. We'll email you when you're down to 5 credits and again at 0. Top up any time from your dashboard.
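On the client side, you might watch your balance like this. It is a minimal sketch: image_credits_remaining comes from the sample response on this page, but the shape of the error body is not documented here, so the top-level "error" field below is an assumption. The function and exception names are hypothetical.

```python
LOW_CREDIT_THRESHOLD = 5  # the page says a warning email is sent at 5 credits


class QuotaExceeded(Exception):
    """Raised when the API reports that no image credits are left."""


def check_credits(body: dict) -> str:
    """Classify a decoded JSON response body as "ok" or "low".

    Raises QuotaExceeded when the body carries a quota_exceeded error.
    """
    # Assumption: the error code is in a top-level "error" field.
    if body.get("error") == "quota_exceeded":
        raise QuotaExceeded("Out of image credits; top up from the dashboard.")

    remaining = body.get("image_credits_remaining")
    if remaining is not None and remaining <= LOW_CREDIT_THRESHOLD:
        return "low"
    return "ok"


print(check_credits({"image_credits_remaining": 24}))
print(check_credits({"image_credits_remaining": 5}))
```

Surfacing the "low" state in your own logs or alerts gives you a second warning channel alongside the emails.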
Need 10,000+ images? Max pack: $200
Or contact us for volume discounts and custom SLAs — hello@img2voice.com.
Pay with Bitcoin, Ethereum, USDT, and 100+ other cryptocurrencies via NowPayments. No bank account required. 1 credit = 1 image, always. Credits never expire.
Built for developers
A clean REST API. POST an image, get back a spoken description as an MP3. No SDKs required. No vision pipeline to wrangle.
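If you prefer Python to curl, the same request could be built with nothing but the standard library. Treat this as a sketch under assumptions: the endpoint, header name, and JSON fields are taken from this page, while the helper names are hypothetical and the exact format expected for image_base64 (plain base64 with no data: prefix) is a guess.

```python
import base64
import json
import urllib.request
from typing import Optional

API_URL = "https://api.img2voice.com/v1"
DETAIL_LEVELS = ("brief", "standard", "detailed")


def build_request(
    api_key: str,
    *,
    image_url: Optional[str] = None,
    image_bytes: Optional[bytes] = None,
    voice_id: str = "eve",
    language: str = "en",
    detail: str = "standard",
) -> urllib.request.Request:
    """Build (but do not send) a POST request for one image."""
    if detail not in DETAIL_LEVELS:
        raise ValueError(f"detail must be one of {DETAIL_LEVELS}")
    if (image_url is None) == (image_bytes is None):
        raise ValueError("pass exactly one of image_url or image_bytes")

    payload = {"voice_id": voice_id, "language": language, "detail": detail}
    if image_url is not None:
        payload["image_url"] = image_url
    else:
        # Assumption: the API accepts plain base64 without a data: prefix.
        payload["image_base64"] = base64.b64encode(image_bytes).decode("ascii")

    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"IMG2VOICE-API-KEY": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call would then look like send(build_request("sk_live_your_api_key_here", image_url="https://example.com/photo.jpg")). Keeping the build step separate from the send step makes the payload easy to inspect without spending a credit.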
Send image_url, image_base64, or upload a file via multipart/form-data.
Set detail to brief, standard, or detailed to control description verbosity.

curl -X POST https://api.img2voice.com/v1 \
-H "IMG2VOICE-API-KEY: sk_live_your_api_key_here" \
-H "Content-Type: application/json" \
-d '{
"image_url": "https://example.com/photo.jpg",
"voice_id": "eve",
"language": "en",
"detail": "standard"
}'

{
"id": "img_a640c7d5a74f45d3a599df20",
"status": "completed",
"description": "A golden retriever puppy playing in autumn leaves, tail mid-wag, mouth open in a joyful expression.",
"audio_url": "https://api.img2voice.com/audio/img_a640...mp3?expires=1775834406103&sig=857...",
"duration_seconds": 5.2,
"voice_id": "eve",
"image_credits_used": 1,
"image_credits_remaining": 24,
"created_at": "2026-04-09T15:20:06Z"
}

Built for
Wherever images meet your users, img2voice fits without friction.
Accessibility tools
Automatically narrate images for visually impaired users. Turn photos, charts, and diagrams into clear spoken descriptions.
Photo & media apps
Let users hear what's in their photos. Add audio captions to galleries, social feeds, or camera rolls without any manual work.
AI agents & pipelines
Feed images into your agent and get audio output. Ideal for multimodal workflows where you need to go image → description → speech in one step.
DeFi & crypto apps
Narrate charts, NFT artwork, and on-chain activity. Pay with crypto, no KYC, no bank account needed.
Multilingual content
Describe images and deliver the narration in any of 60+ languages. Great for global apps that need localised audio without a localisation team.
Indie hackers
Add image narration to your product over a weekend. No vendor lock-in, no complex setup, predictable one-time credit costs.
Ready to give your images a voice?
Get your free API key and start converting images to speech in minutes. Upload a file, paste a URL, or POST base64 — we handle the rest. No credit card. No sales call. Just image in, MP3 out.
25 free image credits • No credit card • Credits never expire