Turn any image
into speech, instantly

Upload an image or paste a URL. Our AI describes what it sees, then speaks it aloud in any of 60+ languages. One API call returns the description and an MP3 URL together. Pay with Bitcoin, Ethereum, or USDT. No credit card, no KYC.

Try live demo
No credit card required
Pay with crypto
Live in < 5 minutes

Why developers choose img2voice

No SDKs to install, no vision pipeline to build, no surprise bills. Just image in and MP3 out.

Any image, instant speech

Send a JPEG, PNG, GIF, or WebP — by URL, file upload, or base64. Our AI describes what it sees and speaks the description aloud. One endpoint, three input methods.

Output in 60+ languages

Get the spoken description in any of 60+ languages, including English, Japanese, Arabic, and Portuguese. Auto-translation is built in, so no separate translation API is needed.

Fast enough to feel instant

Image in, MP3 URL back, typically in under 3 seconds. No polling and no queuing: the response contains everything you need.

One endpoint, no SDK needed

POST your image, get back an MP3 URL and the AI description. That's it. A single curl command gets you started in seconds. No client libraries and no multi-step auth flows; an API key header is all you need.

Built for crypto and Web3

Pay with Bitcoin, Ethereum, USDT and more — no bank account, no credit card, no KYC. Credits never expire, so there's no pressure to use them before a monthly deadline.

Usage you can actually see

Credits used, requests made, and full history — all in one dashboard. Low-credit alerts by email so your app never goes silent unexpectedly.

How it works

From zero to spoken image descriptions in four steps.

01

Get your API key

Sign up with your email — your free API key arrives instantly. No credit card, no waiting. The free tier includes 25 image credits.

02

Send your image

POST to https://api.img2voice.com/v1 with your IMG2VOICE-API-KEY header. Send a URL, a base64 string, or upload a file directly.

03

AI describes it

Our vision model analyses the image and generates a natural-language description. You choose brief, standard, or detailed. The description is also returned in the response.

04

Receive your MP3

The description is spoken aloud in your chosen voice and language. You get back a signed audio URL valid for 24 hours — serve it directly or download it for permanent storage.
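
The four steps above can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: the endpoint, the header name, and the image_url, image_base64, voice_id, language, and detail fields come from this page, while the helper names build_request and describe_and_speak, the default values, and the assumption that base64 input is sent as a plain string are illustrative.

```python
import base64
import json
import urllib.request

API_URL = "https://api.img2voice.com/v1"  # endpoint named on this page


def build_request(api_key, image_url=None, image_bytes=None,
                  voice_id="eve", language="en", detail="standard"):
    """Build (but do not send) a POST request for one image.

    Hypothetical helper: pass exactly one of image_url or image_bytes.
    """
    if (image_url is None) == (image_bytes is None):
        raise ValueError("pass exactly one of image_url or image_bytes")
    payload = {"voice_id": voice_id, "language": language, "detail": detail}
    if image_url is not None:
        payload["image_url"] = image_url
    else:
        # Assumption: the API accepts a plain base64 string in image_base64.
        payload["image_base64"] = base64.b64encode(image_bytes).decode("ascii")
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "IMG2VOICE-API-KEY": api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def describe_and_speak(request, timeout=30):
    """Send the request and return the parsed JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling describe_and_speak(build_request("sk_live_your_api_key_here", image_url="https://example.com/photo.jpg")) should then give a dictionary whose "description" and "audio_url" keys hold the text and the signed MP3 link.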

Straightforward pricing

Start free. Top up with crypto when you need more. No card, no KYC, no monthly deadlines — credits never expire.

Free
Try it, no card needed
$0 to start
  • 25 image credits
  • 5 voices
  • 60+ languages
  • MP3 output
  • Auto-translation
  • Usage dashboard
  • WAV output
  • Batch processing
  • Webhooks
  • CSV export
Starter
Side projects & quick integrations
$5 one-time
  • 100 image credits
  • 5 voices
  • 60+ languages
  • MP3 + WAV output
  • Auto-translation
  • Usage dashboard
  • Webhooks
  • CSV export
  • Batch processing
Standard (best value)
Apps, bots & content pipelines
$20 one-time
  • 500 image credits
  • 5 voices
  • 60+ languages
  • MP3 + WAV output
  • Auto-translation
  • Usage dashboard
  • Batch processing (20 images)
  • Webhooks
  • CSV export
Pro
High-volume apps & agencies
$60 one-time
  • 2,000 image credits
  • 5 voices
  • 60+ languages
  • MP3 + WAV output
  • Auto-translation
  • Usage dashboard
  • Batch processing (20 images)
  • Webhooks
  • CSV export

What happens when you run out of credits?

API requests return a quota_exceeded error — no silent failures, no overage charges. We'll email you when you're down to 5 credits and again at 0. Top up any time from your dashboard.
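
Client code can branch on this error instead of treating it as a generic failure. The sketch below assumes, hypothetically, that the error arrives as a JSON body with an "error" field set to the string quota_exceeded. This page names the error but does not show the response shape or HTTP status, so treat the field name as a placeholder until you have checked a real error response.

```python
import json


class QuotaExceeded(Exception):
    """Raised when the account has no image credits left."""


def check_for_quota_error(body_text):
    """Parse a response body and raise QuotaExceeded if it reports that error.

    Assumption: error bodies look like {"error": "quota_exceeded", ...}.
    Returns the parsed body otherwise.
    """
    body = json.loads(body_text)
    if isinstance(body, dict) and body.get("error") == "quota_exceeded":
        raise QuotaExceeded("out of image credits; top up from the dashboard")
    return body
```

Catching QuotaExceeded separately lets an app pause its queue or notify an operator rather than retrying requests that cannot succeed.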

Need 10,000+ images? Max pack: $200

Or contact us for volume discounts and custom SLAs — hello@img2voice.com.

Get in touch

Pay with Bitcoin, Ethereum, USDT, and 100+ other cryptocurrencies via NOWPayments. No bank account required. 1 credit = 1 image, always. Credits never expire.
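
Because one credit always buys one image, the effective price per image follows directly from the pack prices listed above. A small worked calculation in Python, using only figures from this page (the Max pack is left out because its exact credit count is not stated):

```python
# Pack name -> (price in USD, image credits), as listed on this page.
PACKS = {
    "Starter": (5, 100),
    "Standard": (20, 500),
    "Pro": (60, 2000),
}


def price_per_image(pack):
    """Return the USD cost of one image for the given pack."""
    price_usd, credits = PACKS[pack]
    return price_usd / credits


for name in PACKS:
    print(f"{name}: ${price_per_image(name):.2f} per image")
# Starter: $0.05 per image
# Standard: $0.04 per image
# Pro: $0.03 per image
```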

One endpoint. Three input methods.

Built for developers

A clean REST API. POST an image, get back a spoken description as an MP3. No SDKs required. No vision pipeline to wrangle.

1
Three input methods
Send image_url, image_base64, or upload a file via multipart/form-data.
2
Detail control
Set detail to brief, standard, or detailed to control description verbosity.
3
Audio URL + description
The response includes both the AI description text and a signed audio URL valid for 24 hours.
bash — request
curl -X POST https://api.img2voice.com/v1 \
  -H "IMG2VOICE-API-KEY: sk_live_your_api_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "image_url": "https://example.com/photo.jpg",
    "voice_id": "eve",
    "language": "en",
    "detail": "standard"
  }'
json — response
{
  "id": "img_a640c7d5a74f45d3a599df20",
  "status": "completed",
  "description": "A golden retriever puppy playing in autumn leaves, tail mid-wag, mouth open in a joyful expression.",
  "audio_url": "https://api.img2voice.com/audio/img_a640...mp3?expires=1775834406103&sig=857...",
  "duration_seconds": 5.2,
  "voice_id": "eve",
  "image_credits_used": 1,
  "image_credits_remaining": 24,
  "created_at": "2026-04-09T15:20:06Z"
}
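
Since the signed audio URL stops working after 24 hours, a client may want to know when a stored URL goes stale. The sketch below reads the expires query parameter from an audio_url like the one in the sample response. It assumes that value is a Unix timestamp in milliseconds, which matches created_at plus 24 hours in the sample but is not documented on this page; audio_expiry and is_expired are hypothetical helper names.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlparse


def audio_expiry(audio_url):
    """Return the expiry time encoded in a signed audio URL as a UTC datetime.

    Assumption: `expires` is a Unix timestamp in milliseconds.
    """
    query = parse_qs(urlparse(audio_url).query)
    expires_ms = int(query["expires"][0])
    return datetime.fromtimestamp(expires_ms / 1000, tz=timezone.utc)


def is_expired(audio_url, now=None):
    """True if the URL's expiry time has passed (now defaults to current UTC)."""
    now = now or datetime.now(timezone.utc)
    return now >= audio_expiry(audio_url)
```

If the audio needs to outlive the signed URL, download the MP3 before is_expired turns true and serve it from your own storage.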

Built for

Wherever images meet your users, img2voice fits without friction.

Accessibility tools

Automatically narrate images for visually impaired users. Turn photos, charts, and diagrams into clear spoken descriptions.

Photo & media apps

Let users hear what's in their photos. Add audio captions to galleries, social feeds, or camera rolls without any manual work.

AI agents & pipelines

Feed images into your agent and get audio output. Ideal for multimodal workflows where you need to go image → description → speech in one step.

DeFi & crypto apps

Narrate charts, NFT artwork, and on-chain activity. Pay with crypto, no KYC, no bank account needed.

Multilingual content

Describe images and deliver the narration in any of 60+ languages. Great for global apps that need localised audio without a localisation team.

Indie hackers

Add image narration to your product over a weekend. No vendor lock-in, no complex setup, predictable one-time credit costs.

Ready to give your
images a voice?

Get your free API key and start converting images to speech in minutes. Upload a file, paste a URL, or POST base64 — we handle the rest. No credit card. No sales call. Just image in, MP3 out.

25 free image credits • No credit card • Credits never expire