Junwoo
2024-08-27 12:38:59

Behind
UX Details to Make All TTS Engines Properly Read Korean

VideoStew offers a variety of AI voices, from Google Wavenet, Amazon Polly, KT AI Voice, Naver Clova, and Azure to ElevenLabs.

As a result, a natural issue arises: each model is trained differently, so even the same sentence is read slightly differently. This difference is particularly noticeable when reading units.

In this post, we’ll discuss the considerations we took into account to ensure proper reading of units in Korean when generating TTS.

Getting Measurement Units Right

When using various TTS engines, we encountered an issue where even the expression "100kg" was read differently by each engine. Some would read it as "hundred K-G," while others would trail off awkwardly at the end with "hundred krr..." (Yes, even AI can get flustered...)

Of course, there were engines that read it accurately as "hundred kilograms."

To address this, we developed a preprocessing library to standardize how these units are read across all engines.


Getting Numbers Right

While running this service, we encountered another unexpected issue. (Korean is truly a fascinating language...)

In Korean, there are two ways to read numbers, something we instinctively use but might not notice.

For instance, when telling time, "10시 10분" is read as "yeol si sip bun." Why...?

And while general numbers are read in Sino-Korean ("il, i, sam"), when a quantity unit is added, they are read in native Korean ("han gae, du gae, se gae"). Official measurement units use Sino-Korean. (e.g., 10cm = "sip sen-ti-mi-teo")
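The rule above can be sketched in Python as follows. This is only an illustration under stated assumptions: `read_sino`, `read_quantity`, and the set `NATIVE_COUNTERS` are hypothetical names, the native table covers 1 to 12 only (enough for hours), and a real system would need a much longer list of counters.

```python
SINO_DIGITS = ["", "일", "이", "삼", "사", "오", "육", "칠", "팔", "구"]

# Native Korean forms used in front of a counter, 1 to 12 only (enough for hours).
NATIVE_SMALL = ["", "한", "두", "세", "네", "다섯", "여섯",
                "일곱", "여덟", "아홉", "열", "열한", "열두"]

# Hypothetical list of counters that take native Korean numbers.
NATIVE_COUNTERS = {"개", "시"}


def read_sino(n: int) -> str:
    """Sino-Korean reading for 1..9999, e.g. 10 -> 십, 25 -> 이십오."""
    if not 1 <= n <= 9999:
        raise ValueError("this sketch only covers 1..9999")
    parts = []
    for value, name in ((1000, "천"), (100, "백"), (10, "십"), (1, "")):
        digit, n = divmod(n, value)
        if digit == 0:
            continue
        # A leading 1 is dropped before 십/백/천: 10 is 십, not 일십.
        parts.append(("" if digit == 1 and name else SINO_DIGITS[digit]) + name)
    return "".join(parts)


def read_quantity(n: int, unit: str) -> str:
    """Pick the native or Sino-Korean reading depending on the unit."""
    if unit in NATIVE_COUNTERS and 1 <= n <= 12:
        number = NATIVE_SMALL[n]
    else:
        # Simplification: anything outside the small native table is read in Sino-Korean.
        number = read_sino(n)
    return f"{number} {unit}"


print(read_quantity(10, "시"), read_quantity(10, "분"))  # 열 시 십 분
print(read_quantity(3, "개"))                            # 세 개
print(read_quantity(10, "센티미터"))                      # 십 센티미터
```

The time example shows why the reading cannot be decided from the number alone: the same "10" becomes 열 before 시 (hours) and 십 before 분 (minutes).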

Alright, let me give you a number. “90”. How would you read it?

Technically, up to 90 should be read as “ninety.” Some of you might have read it as “nine zero.” As numbers get larger, we tend to adopt a more convenient way to read them.

Let’s think about smaller numbers. “9.” Yes, everyone would read this as “nine.”

Following this principle, we developed a preprocessing system that reads numbers from 1 to 99 in their unique word forms. However, this is just a guideline, and we’re open to changing it based on customer feedback. For instance, “forty” might feel more natural than “four zero.”

Anyway, to ensure a consistent user experience, we created a unit library corresponding to the unique word forms like “one, two, three.”
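A minimal sketch of the 1-to-99 guideline might look like this, assuming the "unique word forms" are the native Korean numerals as used in front of a counter. The names `read_native_count` and `read_counted` are hypothetical, and leaving larger numbers as digits is an assumption of this example, since the post does not say how numbers above 99 are handled.

```python
from typing import Optional

NATIVE_TENS = ["", "열", "스물", "서른", "마흔", "쉰", "예순", "일흔", "여든", "아흔"]
# Forms used in front of a counter: 한 개, 두 개, 세 개, ...
NATIVE_ONES = ["", "한", "두", "세", "네", "다섯", "여섯", "일곱", "여덟", "아홉"]


def read_native_count(n: int) -> Optional[str]:
    """Native Korean reading in front of a counter for 1..99, or None outside that range."""
    if not 1 <= n <= 99:
        return None
    if n == 20:
        # Exactly 20 shortens to 스무 before a counter (스무 개), while 21 is 스물한.
        return "스무"
    tens, ones = divmod(n, 10)
    return NATIVE_TENS[tens] + NATIVE_ONES[ones]


def read_counted(n: int, counter: str) -> str:
    """Spell out 1..99 in native Korean; leave other numbers as digits (assumption)."""
    native = read_native_count(n)
    number = native if native is not None else str(n)
    return f"{number} {counter}"


print(read_counted(9, "개"))    # 아홉 개
print(read_counted(21, "개"))   # 스물한 개
print(read_counted(90, "개"))   # 아흔 개
print(read_counted(100, "개"))  # 100 개
```

Keeping the range check in one function makes the guideline easy to adjust: if feedback shows that a smaller cutoff sounds more natural, only the bounds in `read_native_count` need to change.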

If you're looking to implement a TTS service, this library could be useful. However, keep in mind that it's not a complete version as we continuously discover unexpected units through customer feedback.

Detail in UX = Success in SaaS

We faced many challenges while providing a video editing SaaS solution in Korea, including encountering the term “quantifier” for the first time. It was yet another reminder of how dynamic and remarkable the Korean language is…

While we sometimes envy English-speaking services, they surely have their own set of challenges.

Of course, VideoStew offers a feature called [Manual Text Designation]. This allows you to generate TTS regardless of what's displayed on the screen.

< Setting the sound to be read by TTS regardless of the subtitles displayed on the screen >

With this method, you can write the text exactly as it should sound, so that the TTS pronounces it more naturally, and users can correct any misread units directly.

The reason we focus on such details is our conviction that these small differences determine the growth of the service.

Users have a preconceived notion of how a sentence with numerical expressions should be read the moment they input it. If it doesn't generate as expected, that's an initial problem. Our ultimate UX goal is to produce the expected result as quickly as possible without additional editing.

We will continue to post our UX considerations like this in the future. As mentioned earlier, in the current situation where all services are leveling up, we believe that pondering over such details determines the success or failure of a service.
