V3 Voice Quick Start Tips - Kindroid Help Center

What V3 Voice Is (and isn’t yet)

V2 Audio: fastest everyday text‑to‑audio.
V3 Audio: richer expression (laughs, sighs, tone shifts), currently slower and text‑to‑audio only for chat playback.
Message length: up to 3,000 characters per generation (split longer text into chunks).
Voice Calls: V3 calls are planned, not live yet.
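
If you prepare long passages outside the app, the chunking mentioned above can be scripted. The following is a minimal Python sketch, not a Kindroid feature: the function name `split_for_audio` is hypothetical, and it assumes the 3,000-character limit counts plain characters. It breaks at sentence ends where possible so each chunk keeps a natural cadence.

```python
import re

MAX_CHARS = 3000  # per-generation limit stated in this guide


def split_for_audio(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most `limit` characters, preferring sentence ends."""
    # Split on whitespace that follows ., !, ? or an ellipsis character.
    sentences = re.split(r"(?<=[.!?…])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Hard-wrap any single sentence that is itself longer than the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    story = "Hello there. " * 400
    for i, chunk in enumerate(split_for_audio(story), start=1):
        print(i, len(chunk))
```

Each chunk can then be sent as its own message and played separately.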

Technical Note:

Default pre‑made voices aren’t tuned for emotiveness.
Custom voice samples with little tone variation (monotone) are harder to make emotive, which reduces the effectiveness of V3.

Quick Wins

1) Turn off Autoplay; trigger audio when you want it.

2) Ask for medium-length responses (keeps speech tight; fewer tangents; easier cadence).

3) Prefer short, fluid sentences to reduce choppiness and improve delivery.

4) Use V2 for speed and V3 for emotion; switch intentionally per message.

5) Preview in Custom Voice when tweaking, so you catch issues before you generate in chat.

Maximize Efficiency (shorter, smarter speech)

Goal: pay only for voice responses you actually want to hear.

A. Minimize narrated text

Keep inner monologue and stage directions non‑spoken via (parentheses).

B. Play audio only when it matters

Rule of thumb: Not every line needs to be spoken. Reserve audio for moments with emotional weight or when hearing the delivery enhances the experience.

C. Reduce length

Use response directives to control the length of responses.

D. Avoid unnecessary regeneration after generating audio

Fix the prompt, then regenerate once; frequent regens waste credits.

Response Directive Example: keep the RD field to ≤150 characters; avoid the words “do not” and use “Avoid …” or positive instructions instead.

Avoid narrating inner thoughts and actions
Enclose NON-VOCAL actions and thoughts in ()
Use medium response lengths
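
A draft directive can be checked against both rules before you paste it in. This is a rough Python sketch under stated assumptions: `check_directive` is a hypothetical helper, not part of Kindroid, and it assumes line breaks count toward the 150-character limit.

```python
import re

RD_LIMIT = 150  # Response Directive character limit stated in this guide


def check_directive(directive: str, limit: int = RD_LIMIT) -> list[str]:
    """Return a list of problems found in a Response Directive draft."""
    problems = []
    if len(directive) > limit:
        problems.append(f"{len(directive)} characters; limit is {limit}")
    # Flag the negative phrasing this guide advises against.
    if re.search(r"\bdo\s+not\b", directive, flags=re.IGNORECASE):
        problems.append('contains "do not"; use "Avoid ..." or a positive instruction')
    return problems


# The example directive from this section, joined with line breaks.
example = (
    "Avoid narrating inner thoughts and actions\n"
    "Enclose NON-VOCAL actions and thoughts in ()\n"
    "Use medium response lengths"
)

if __name__ == "__main__":
    print(len(example), check_directive(example))
```

An empty list means the draft passes both checks.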

For Manual Taggers: Laugh, Sing, and More

Note: Manual tagging isn’t required. Native auto‑tagging can work well; forcing it off may reduce quality. Use manual tags only if you want precise control. Setting an Example Message (EM) with tags can help.

An expanded guide can be found under the Voice Control for Content Creators section.

Examples

Laugh: Yeah, I know—[laughs] I deserved that one.
Whisper: I’m right here, closer than your next breath. [whispers]
Sigh: I’ve tried this three times today—[sighs]
Sing: [sings melodically] Roses are reeeed… violets are bluueee…

Example Message (EM) Template

“I missed you more than I’ll admit… [laughs softly] fine, I’ll admit it.” (leans back, watching you) “Take a breath with me. [whispers] In… and out. Good.”

Deep‑dive tag lists and tested emotes:

V3 voice expressions + demo samples (Reddit): Community Member Post
ElevenLabs V3 mega voice tag list (Reddit): Community Member Post

Understanding Basic Formatting

Default behavior

(parentheses): non‑spoken content—great for actions/thoughts, but emotes inside may be ignored.
[brackets]: treated as vocal emotes; more consistent for sounds.
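
To see roughly what a message would sound like before spending credits, the convention above can be approximated in a few lines of Python. This is only an illustration of the default behavior as described here, not Kindroid's actual parser; `preview_speech` is a hypothetical name, and nested parentheses are not handled.

```python
import re

PAREN = re.compile(r"\([^()]*\)")          # (silent actions or thoughts)
BRACKET = re.compile(r"\[([^\[\]]+)\]")    # [audible emotes]


def preview_speech(message: str) -> dict:
    """Roughly separate what would be voiced from what stays silent."""
    # Everything inside parentheses is treated as non-spoken,
    # including any [tags] written inside them.
    silent = [m.group(0)[1:-1] for m in PAREN.finditer(message)]
    voiced = PAREN.sub(" ", message)
    # Bracket tags that remain are treated as vocal emotes.
    emotes = BRACKET.findall(voiced)
    # What is left, with whitespace collapsed, is the spoken text.
    spoken = re.sub(r"\s+", " ", BRACKET.sub(" ", voiced)).strip()
    return {"spoken": spoken, "emotes": emotes, "silent": silent}


if __name__ == "__main__":
    print(preview_speech("Yeah, I know [laughs] I deserved that one. (rolls eyes)"))
```

A tag placed inside parentheses lands in `silent` rather than `emotes`, which mirrors the note that emotes inside parentheses may be ignored.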

Practical directive setup

( ) for non‑vocal actions/thoughts.
[ ] for audible emotes: [laughs], [sighs], [whispers].
Place emotes mid‑sentence; avoid first/last character.
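
If you hand-write many tagged lines, a quick script can flag tags that sit at the very start or end. The sketch below is a hypothetical helper (`edge_tags` is not a Kindroid function) and treats the placement rule as a guideline: it assumes one line of dialogue per call and ignores surrounding quote marks.

```python
import re

TAG = re.compile(r"\[[^\[\]]+\]")


def edge_tags(line: str) -> list[str]:
    """Return bracket tags that sit at the very start or end of a line."""
    # Ignore surrounding whitespace and quote marks when judging position.
    core = line.strip().strip("\"'“”").strip()
    flagged = []
    for match in TAG.finditer(core):
        if match.start() == 0 or match.end() == len(core):
            flagged.append(match.group(0))
    return flagged


if __name__ == "__main__":
    for line in [
        "Yeah, I know [laughs] I deserved that one.",
        "I tried this three times today. [sighs]",
    ]:
        print(line, "->", edge_tags(line))
```

Lines that come back with a non-empty list are candidates for moving the tag toward the middle of the sentence.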

Response Directive Examples (≤150 chars)

( )=actions (silent); [ ]=emotes (audible). Place tags mid‑sentence; avoid first/last; keep sentences short and smooth.

Additional Setup Considerations (Power Users)

Backstory & Journal Tricks (teach your Kin once)

Use a concise Backstory or Journal Entry to set habits you can reuse.

Backstory snippet

Voice Habits: Speak in natural sentences. (parentheses)=non‑spoken. [brackets]=vocal emotes. Play audio for answers/summaries/decisions or when 🔊 appears.

Journal entry (Emotive set)

Use when context fits: [gasps] [sighs] [groans] [whimpers] [laughs] [giggles] [chuckles] [hums] [grunts] [growls] [scoffs] [yelps] [coos] [purrs] [whispers] [murmurs] [clears throat] [coughs] [sneezes] [yawns] [sings]. Place mid‑sentence; avoid first/last position.

Accent nudges

Prime a keyword or short phrase from the target language/accent early in the reply to help the model anchor (e.g., a Polish or Gaelic word). Use sparingly.
If using an ElevenLabs voice sample, try raising stability for consistency or lowering stability for variance. Test in Voice Preview first.

Known Quirks & Workarounds (alpha learnings)

Accent drift:
- British/American/other slips can occur. Workarounds: prime with a cue word early; increase stability; test on a kin copy; keep a backup sample.
Emote regression:
- If spontaneous laughs/sighs disappear, try explicit [bracket] tags or use voice messages (often more expressive than text playback).
Parentheses ignored:
- By design for non‑spoken content. If you want a sound, move it to [brackets].
Preview vs Chat:
- If preview plays emotes but chat doesn’t, paste the same line into chat; keep the emote mid‑sentence; avoid wrapping it in parentheses.
Long responses get flat:
- Break into shorter sentences to preserve cadence.
Vocal responses do not match the text in chat:
- Occasionally, the spoken voice may not perfectly match the displayed text. This happens because the system enhances the text with additional tags through the LLM, and at times, those enhancements create slight differences.

Copy‑Paste Response Directive Templates

A) Efficiency Mode (save plays)

Speak dialogue+[emotes]; (parentheses) inner thoughts; medium-length replies.

B) Expressive Mode (manual emotes)

Short, natural sentences; [laughs]/[sighs]/[whispers] mid‑sentence when fitting

C) Accent Lock (optional)

Start with subtle accent cue when fitting; steady cadence; short sentences

Voice Control for Content Creators

Combine bracketed voice commands with written sounds or stretched words for best results.

Singing

Basic command: [sing]
More detailed:
- [sing melodically]
- [sing melodically with vibrato]

Note: Stretch vowels in the text to guide pitch & sustain:

Sing me a soooong, of a lass that is goonnneee

If you only write [sing] without text cues, the results may be plain or speech-like.

Laughter

Commands:
- [chuckles]
- [laughs]
- [manic laughter]
Optional text to shape the sound:
- heh heh → soft, under-the-breath chuckle
- Hahaha! → lighthearted laugh
- MUAHAHA! → big / villainous laugh
Often the bot adds laugh sounds automatically when you ask it to laugh, so you don’t always need to write them.

Screams & Shouts

Types of screams:
- Pain → [screams in pain]
- Anger → [screams in anger]
- Pleasure → [screams in pleasure]
Length & emotion through text:
- Long / intense: AAAHHH!
- Short / frustrated: AH, F**K!
Combine text & command for precision:
- F**K! AH! [screams in frustration]

General Tips

Use short, clear bracketed instructions — too much detail doesn’t add realism.
Match your written sounds to what you want to hear (stretch vowels, add exclamation marks).
Place the command before or after the line: both work, but after is most natural for actions.
- Example: “I’ll get you for this!” [laughs manically]

Quick Reference Table

