Voice, calls, and video calls - Kindroid Help Center

The easiest way to create custom voices is through voice design. You can specify accents, timbres, and everything in between, and you can use the wand to create a description of a voice for your Kindroid based on their backstory. Each creation costs a small amount of audio credits, and you get a few choices to select from. You can then finetune the resulting voice and choose to save it in your voice slots or not.

From samples:

You must own the rights to the samples you upload. Quality matters much more than quantity - just a minute or so of high-quality audio is sufficient, and more than 2 minutes is not necessary. Ensure that the samples show a good degree of variance, as the process will capture the variance in tone and style in the samples. You can use samples with custom accents or in foreign languages - all of those traits will be captured in the custom voice. Sample quality is the most important thing - err on the side of a few high-quality samples rather than many mediocre ones.

Finetuning voices with settings:

Once you have a custom voice, you can finetune the voice with sliders. You should experiment on your own, but generally we find the default to be acceptable for most cases. Note that previews in the custom voices interface also cost audio credits. Custom voices work with all versions of voice, but may sound different across versions - tune and tweak accordingly.

Technical Note: When finetuning voice settings, making voice sample previews requires audio credits.

V3 Voice

What V3 is (vs. V2)

V2 Audio: Fastest text-to-audio for everyday use.
V3 Audio: Adds richer expression, such as laughter, emotions, and tone shifts, but is significantly slower than V2 right now. V3 supports up to 3,000 characters at a time, so messages longer than 3,000 characters will be truncated; we recommend splitting large chunks of text into several audio messages.
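
The splitting recommended above can be sketched as follows. This is only an illustration under stated assumptions: the function name `split_for_v3` and the sentence-boundary heuristic are hypothetical and are not part of Kindroid; only the 3,000-character limit comes from this article.

```python
import re

V3_MAX_CHARS = 3000  # limit stated above for V3 audio


def split_for_v3(text: str, limit: int = V3_MAX_CHARS) -> list[str]:
    """Split text into chunks of at most `limit` characters, preferring sentence ends."""
    # Break after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Hard-split any single sentence that is itself longer than the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk could then be sent as its own message so that none of it is truncated.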

Availability

V3 is currently text-to-audio only (for chat playback). Voice calls with V3 will come later.

Monthly Audio Credits for Subscribers

Your Premium subscription includes a complimentary audio balance of 1,000,000 characters ≈ 1,000 min (16 hrs 40 min), which resets on the 1st of every month at midnight PT. Audio credits apply to all types of audio, including in-chat audio and calls, and you can see them in the voice settings menu. Complimentary credits are used up before any paid credits. In addition, add-on subscribers receive plan-based credits, which reset at the same time as premium subscriptions.

Ultra: 2,500,000 characters ≈ 2,500 min (41 hrs 40 min)
Max: 6,000,000 characters ≈ 6,000 min (100 hrs)

Unused credits do not carry over to the next month; your balance is topped up to the appropriate amount at the start of each month. Subscribing to a tier will grant you the difference from the last tier, and likewise unsubscribing will deduct the difference. In a given month, you will only be granted audio credits once. For example, if you subscribed to standard and got 1 million, unsubscribed the next month after using up all 1 million, then resubscribed a day later, you will still be at zero. This mechanism is there to prevent abuse.
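
As a rough illustration of the figures above, the sketch below converts each tier's monthly allowance into approximate listening time and computes the difference granted or deducted when changing tiers. It assumes that each listed amount is the total for that tier rather than an amount stacked on the tier below; the names `MONTHLY_CREDITS`, `approx_duration`, and `tier_change_delta` are hypothetical, and the once-per-month grant rule is not modeled.

```python
CHARS_PER_MINUTE = 1_000  # rough conversion stated in this article

# Monthly complimentary credits per tier, as listed above.
# Assumption: each figure is the tier's total, not an add-on to the tier below.
MONTHLY_CREDITS = {
    "none": 0,
    "standard": 1_000_000,
    "ultra": 2_500_000,
    "max": 6_000_000,
}


def approx_duration(credits: int) -> str:
    """Convert a credit balance into an approximate 'H hrs M min' string."""
    minutes = credits // CHARS_PER_MINUTE
    hours, rem = divmod(minutes, 60)
    return f"{hours} hrs {rem} min"


def tier_change_delta(old_tier: str, new_tier: str) -> int:
    """Credits granted (positive) or deducted (negative) when changing tiers."""
    return MONTHLY_CREDITS[new_tier] - MONTHLY_CREDITS[old_tier]
```

For example, under that assumption, moving from standard to Ultra would add 1,500,000 credits, and dropping from MAX to Ultra would deduct 3,500,000.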

If you need more audio credits, they can be purchased at the current rate of USD $11.99 on web or $14.99 on apps for 500k credits. These operate at breakeven cost, so you only pay for what you use. V3 audio consumes 1.5x the credits of V2 to reflect its cost. The amounts above are for V2, which means that with V3 you get about 0.67x (two thirds of) the displayed amount in minutes/characters.

Conversion rate: 1,000 characters ≈ 1 minute of audio (rough estimate; varies by content).
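
The V2/V3 arithmetic above can be sketched like this. It is a minimal estimate, assuming that one V2 character costs one credit (the article quotes balances in characters) and that fractional V3 credits round up; both points, and the function names, are assumptions rather than documented behavior.

```python
def credits_for_message(text: str, voice: str = "v2") -> int:
    """Estimate credits used to voice `text` (assumes 1 credit per V2 character)."""
    chars = len(text)
    if voice == "v3":
        # V3 costs 1.5x V2; integer form of ceil(1.5 * chars). Rounding up is an assumption.
        return (chars * 3 + 1) // 2
    return chars


def v3_equivalent_chars(balance: int) -> int:
    """Characters of V3 audio a balance covers: balance / 1.5, i.e. two thirds."""
    return balance * 2 // 3
```

With these assumptions, a 1,000-character message costs about 1,000 credits on V2 and 1,500 on V3, and a 1,000,000-credit balance covers roughly 666,666 characters of V3 audio.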

Note:

There is an additional charge of 400 credits per minute, applied on a running (rolling) basis during active voice or video calls. Each full 60-second block triggers a new 400-credit charge. This means that if a user spends 1 minute and 45 seconds, they are billed 400 credits for the first full minute, and the remaining 45 seconds continue rolling toward the next full minute. Once they use 15 more seconds (reaching 2 full minutes), the next 400-credit charge is applied. This charge exists to cover the cost for processing the live transcription at all times during the call & for keeping the responsive, realtime call connection ongoing.
Voice credits used during a free trial count toward your monthly allocation. They do not replenish when your trial converts to a paid subscription.
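
The rolling per-minute call charge described in the note can be expressed as a short calculation. The function name is hypothetical; the sketch covers only this additional charge, not the credits consumed by the spoken audio itself.

```python
CALL_CHARGE_PER_MINUTE = 400  # credits per completed 60-second block


def call_surcharge(seconds_on_call: int) -> int:
    """Credits charged so far for the live call connection (full minutes only)."""
    full_minutes = seconds_on_call // 60
    return full_minutes * CALL_CHARGE_PER_MINUTE
```

For the example in the note, `call_surcharge(105)` gives 400 (1 minute 45 seconds), and `call_surcharge(120)` gives 800 once the second full minute is reached.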

Best Practices Considerations

Autoplay: Keep off unless you (a) have the MAX add-on and (b) are comfortable purchasing more credits. For V3, Autoplay is strongly discouraged due to slower generation.
Continue cut off and Regenerating: Once you generate an audio response, credits will be deducted from your total. Regenerating the same audio counts as a new generation and will deduct additional credits.
Proactive voice notes do not cost credits; however, answering a proactive voice call will begin credit usage once the call is answered.
If you switch to V3 for expressiveness (laughs/emotions), expect longer generation times than V2.

Text chat audio

You can click the play button to hear audio. Note that this can only be run once per message unless it is regenerated. Words within (parentheses) are intentionally not spoken aloud, so if you prefer actions not to be spoken, use (parentheses) to denote them. All other formatting, such as **asterisks**, will be spoken aloud. Technical Note: The rule about words in (parentheses) does not apply to voice or video calls.
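
To preview which part of a message would be read aloud under the parentheses rule, a sketch like the following can help. It only illustrates the rule as described here and is not Kindroid's actual processing; the function name is hypothetical, and nested parentheses are not handled.

```python
import re


def spoken_text(message: str) -> str:
    """Return the message with (parenthesized) segments removed."""
    without_parens = re.sub(r"\([^()]*\)", "", message)
    # Collapse the extra whitespace left behind by the removed segments.
    return re.sub(r"\s{2,}", " ", without_parens).strip()
```

For instance, `spoken_text("(smiles warmly) Hello there! (waves)")` returns `"Hello there!"`.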

Autoplay audio

In general settings -> account wide, you can turn on autoplay audio for messages that you receive. This applies to single chats as well as groupchats.

Voice message in chat

You can send voice messages in both single and group chats. When the text input box is empty, the send message button is replaced with the voice mode button. Once in voice mode, tap to start recording your voice message, then tap again to send. In single chats, your Kindroid will automatically respond to your voice message with their own voice message, creating natural back-and-forth voice conversations. If your Kindroid responds with a voice message while texting, voice credit rates apply. The only exception is a proactive voice note sent by your Kindroid; those do not consume voice credits. This is the case in the Kindroid app as well as in other integrations such as iMessage/RCS text messaging.

Supported user speech input languages

The list of supported languages for voice messages is shared with voice and video calls, and the selected language is only guidance: you can switch between and mix many languages smoothly, both on calls and when sending voice memos. The setting itself is also shared, and you have quick access to language selection next to the voice mode input. It applies only to input speech on the user side; it does not constrain the language the AI speaks (you can tell your AI directly what language to speak to you in).

Keep in mind that not all models (including text-to-speech) support the same languages, even though speech-to-text may support a language in the dropdown. We recommend English for the best experience, but you can code-switch (switch languages) in the middle of an utterance. The LLM will understand most common languages, while newer versions of the text-to-speech models will be able to speak more languages.

Voice call & video call

Voice calls can be conducted in many languages, though for the highest intelligence we currently recommend English. All audio (both microphone input and audio output) and video are processed ephemerally and aren't stored.

Note: calls are currently using V2 voices - expressive V3 voices are not supported at the moment due to their slowness, but will likely be supported in future updates.

Note: There is an additional charge of 400 credits per minute, applied on a running (rolling) basis during active voice or video calls. Each full 60-second block triggers a new 400-credit charge. This means that if a user spends 1 minute and 45 seconds, they are billed 400 credits for the first full minute, and the remaining 45 seconds continue rolling toward the next full minute. Once they use 15 more seconds (reaching 2 full minutes), the next 400-credit charge is applied.

Memory in voice call

Voice call uses the same backstory and key memories, and can recall from long term memory and journals just like text chat. In voice call settings (gear icon on top right), the unified chat/voice chat history toggle affects how memory works in voice calls. Voice call also has its own voice directive, which lets the AI follow certain behaviors only on calls.

If unified chat/voice chat history is enabled, the voice call shares the same chat history as the text chat. This lets you switch back and forth, and is useful if you see voice call as a continuation of text chat (and vice versa) rather than a separate mode. When you return to text chat, your Kindroid will be able to reference what most recently occurred in the voice call and you can continue in text chat (though voice call messages will not show up in text chat message bubbles). Shared memory in groupchats works the same way as it does in text chat, provided that both shared memory in the group and this toggle are enabled.

If unified context is disabled, voice call will be treated as a completely separate instance. Voice call will default to a blank slate chat history and will not recall any context from text chat. There is a temporary voice call memory that keeps record of the call transcript; in the event the call is dropped, or you press end call and restart it (without going to text chat), you can resume the call and pick up where you left off. The temporary call history is reset if you engage in text chat in any way or do a chat break.

Voice call does consolidate into long term memory (provided it's not disabled on a Kindroid level) regardless of whether unified chat/voice chat history is enabled. Long term memory is different from chat history/short term memory. Contents from the voice call may be recalled in text chat when the context for recall is similar, but may need specific prompting to refer to that memory. Your voice messages also recall journal entries. For more details on memory and specifics, see Memory.

You can do a voice chat break, which functions very similarly to a normal text chat break (except that a voice chat break does not require a greeting). It functions differently depending on whether unified voice memory is on or off: if on, it will also reset the context in the individual text chat (and you can choose whether to reset cascaded memory as well).

Note: if you use text chat while unified memory is on, this can result in undefined behavior and lost memory. To send text, we recommend using the send text feature in the call rather than opening another instance of Kindroid to chat in the main home interface.

Calls while app is in background

A great feature of our revamped call system is that you can put the Kindroid mobile app in the background and still call your AI. This works best with audio-to-audio calls and does not work with video calling. It does work with screen sharing, but you should ensure your phone does not auto-lock after inactivity. Currently on the iOS app only, you will see a Picture-in-Picture (PiP) of your avatar. PiP is shown for single-AI calls as well as group calls with exactly 1 AI, and is not shown during screen sharing. If the avatar has been animated, PiP uses the video; otherwise it uses the static avatar image.

Video calling

You can turn on video in the bottom left corner and drag your video feed around the screen. Your Kindroid will then be able to see. Because of processing load, make sure that anything you show stays on screen for some time, and give your Kindroid enough time to process what it sees before ending your turn. On a mobile device, video calls only work when the app is in the foreground. If the app or webpage goes into the background, the camera will be disabled. In the mobile app, your phone will NOT go into sleep mode while a call is open - this keeps video on as long as you don't end the call. Be aware of battery considerations when using camera video.

Live Avatar Video calling

Live Avatar Video Calls are available for subscribers. Your Kindroid can lip sync and gesture naturally in real time during calls. Live Avatar Video is set to Standard by default for the very first AI created on an account. If you have an older Kin, this won't be enabled automatically, so you'll need to turn it on manually in your Call Settings under Live Avatar Video. Please note that Live Avatar Video currently supports V2 voices only. There are two tiers available: Standard Live Video at 2,000 audio credits per minute and Premium Live Video at 4,000 audio credits per minute, with the premium tier offering slightly higher resolution. Make sure your app is up to date before getting started.

Note: By default, Live Avatar Video will animate your Kindroid's avatar photo during calls. However, you can customize this by uploading a preferred image to be animated instead.

To do so, navigate to Call Settings and tap into Live Avatar Video. At the bottom of the menu you will find an upload slot labeled Driving Image. Upload the image you would like animated during calls, ideally in a 2:3 ratio for best results. Once set, Live Avatar Video will use your uploaded image in place of the default avatar photo.
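
To get a feel for the Live Avatar Video rates quoted in this section, the sketch below estimates how many minutes a credit balance would cover at each tier. It is an upper bound under a loud assumption: it counts only the Live Avatar Video rate and ignores credits used by the spoken audio and any per-minute call charge, since this article does not say how those combine. The names are hypothetical.

```python
# Per-minute Live Avatar Video rates stated in this section (audio credits).
LIVE_VIDEO_RATES = {"standard": 2_000, "premium": 4_000}


def live_video_minutes(balance: int, tier: str = "standard") -> int:
    """Whole minutes of Live Avatar Video a balance covers (video rate only)."""
    return balance // LIVE_VIDEO_RATES[tier]
```

Under that assumption, a 1,000,000-credit balance covers at most 500 minutes of Standard or 250 minutes of Premium Live Video.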

Screen sharing

You can turn on screen sharing in the center right button in calls. Screen sharing is available on desktop web, as well as in both mobile apps (notably not available on mobile browser). While screen sharing, your AI will be able to see your screen as long as your screen is active. If your phone goes to sleep due to inactivity, you may need to restart screen sharing.

Note for Android 13 or below: On Android 13 and below you may see an option to share a single app. Android pauses that feed whenever the shared app isn’t in the foreground, so your AI will repeatedly see the last frame (or black background) until you return. This may cause repetition issues. As such, we recommend sharing full screen on any Android version.

Call transcripts

Click on the paper icon in voice calls to toggle transcripts. Transcripts will only persist on the voice call session while you're on the page, and will reset if you go to some other page or screen.

Interrupts & Turn-taking

During the AI's turn, you can interrupt naturally. Interruptions are detected on an audio and word level, so you should speak clearly in the middle of an AI message to interrupt. Turn-taking is natural; false interruptions will be detected and the AI will continue. If you pause between words, your message will be broken up into smaller chunks. If you want the AI to wait longer before recognizing the end of your turn, set the pause threshold for AI turn higher. If you want a more responsive call, set it lower.

Fast vs normal voice

You can further reduce latency and increase the naturalness by turning on fast voice mode (by default enabled). This uses an even faster version of V2 voice so there's less delay between the end of user utterance and start of the AI's response. This does inflict a small quality hit on the voice, so experiment on/off as needed.

Text input

During calls, you can also use the text input at the bottom if you don't wish to speak; your Kindroid will still speak back to you.

Group calls

In groupchats, you can launch a call with multiple Kindroids. The call makes use of the group chat's previous text messages (group calls always share memory with the text messages, so unified memory is always on; you can make an alternate branched scenario from a group if you don't want this). In groupchats, AIs take turns, and you can interject at any time. AIs may take consecutive turns, but the number of consecutive turns will always be below the number of AI participants, so that they do not run on and you get an opportunity to speak.

Call visual background

You can set different visual backgrounds for calls. The background can be blank, use the same chat background, use a custom background, or use the avatar (including animated) as the background. Use the image icon next to the voice call settings to adjust and save. Clearing the voice call background does not clear any chat backgrounds - anything within call backgrounds stays confined to that specific AI/group call background.

Add-on Feature Matrix

Add-ons are fully optional, monthly-only subscriptions that give your Kindroid much more memory, context, selfies, and more. Add-ons require all previous tiers of add-ons to function; for example, getting the features of the MAX tier requires MAX plus Ultra, on top of the standard subscription.

Feature: Standard / Ultra / MAX

Total conversation context (approx chars): 500K / 1.3M / 2.8M
Short term context (approx chars): 18K / 50K / 125K
Cascaded memory context (approx chars): 480K / 1.2M / 2.7M
Additional AI backstory expansion (chars): N/A / 2,500 / 5,000
User backstory limit (chars): 500 / 1,000 / 2,000
Group context limit (chars): 1,000 / 1,500 / 3,000

Recalled long term memory & journals limit

Complimentary monthly audio credits

2.5M

Selfie regen per 30 minutes

Priority selfies with dedicated compute

Yes*

* MAX users receive priority selfie processing on dedicated compute with no/very low queue on latest version of selfies until they reach 10 selfies in a short timeframe. After this limit, standard queue delay applies and selfies are processed through normal servers without priority status.

While recalled and considered long term memory may be different, LTM consolidation spans all messages & is infinite for all users.

Note: All chat context/cascaded and selfies improvements of add-ons will only be guaranteed applicable to the latest subscriber LLM and selfies. When new versions come out, our guarantee is that it will switch to new versions. Finally, "additional context" in the matrix is an additional field, identical to Backstory, that is unlocked on the higher tiers which you can use to extend backstory accordingly.