Junwoo

2024-08-27 12:38:59

Behind
UX Details to Make All TTS Engines Properly Read Korean

VideoStew offers a variety of AI voices, from Google WaveNet, Amazon Polly, KT AI Voice, Naver Clova, and Azure to ElevenLabs.

Supporting this many engines naturally creates a problem: each model is trained differently, so the same sentence can be read slightly differently from one engine to the next. The difference is most noticeable when reading units.

In this post, we’ll discuss the considerations we took into account to ensure proper reading of units in Korean when generating TTS.

Getting Measurement Units Right

When using various TTS engines, we encountered an issue where even the expression "100kg" was read differently by each engine. Some would read it as "hundred K-G," while others would trail off awkwardly at the end with "hundred krr..." (~~Yes, even AI can get flustered...~~)

Of course, there were engines that read it accurately as "hundred kilograms."

To address this, we developed a preprocessing library to standardize how these units are read across all engines.
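As a rough illustration of what such a preprocessing step can look like, here is a minimal Python sketch. It is not VideoStew's actual library: the unit table, the `spell_out_units` name, and the matching rules are all assumptions made for this example. The idea is simply to replace a unit symbol with its spelled-out Korean name before the text reaches any engine, so every engine receives the same unambiguous input.

```python
import re

# Hypothetical symbol-to-name table for illustration only;
# a real library would need a far longer list.
UNIT_NAMES = {
    "kg": "킬로그램",
    "g": "그램",
    "km": "킬로미터",
    "cm": "센티미터",
    "mm": "밀리미터",
    "m": "미터",
    "ml": "밀리리터",
}

# Try longer symbols first so "kg" is not consumed as "k" + "g" or "mm" as "m".
_alternatives = "|".join(sorted(map(re.escape, UNIT_NAMES), key=len, reverse=True))

# A number, optional whitespace, a known unit symbol, and no letter right after it
# (so the "m" in "5 minutes" is not treated as meters).
UNIT_PATTERN = re.compile(rf"(\d+(?:\.\d+)?)\s*({_alternatives})(?![A-Za-z])")


def spell_out_units(text: str) -> str:
    """Replace unit symbols that follow a number with their Korean names."""
    return UNIT_PATTERN.sub(lambda m: m.group(1) + UNIT_NAMES[m.group(2)], text)


print(spell_out_units("100kg"))  # 100킬로그램
```

The number itself is left as digits here; how the digits are read is a separate question, covered in the next section.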

Getting Numbers Right

While running this service, we encountered another unexpected issue. (Korean is truly a fascinating language...)

Korean has two ways to read numbers, Sino-Korean and native Korean, and speakers switch between them instinctively without noticing.

For instance, when telling time, "10시 10분" is read as "yeol si sip bun": the same "10" is native Korean "yeol" for the hour but Sino-Korean "sip" for the minutes. Why...?

And while general numbers are read in Sino-Korean ("il, i, sam"), when a counting unit is attached, they are read in native Korean ("han gae, du gae, se gae"). Official measurement units use Sino-Korean (e.g., 10cm = "sip sen-ti-mi-teo").
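The time example can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the production preprocessor: the function names and the regular expression are hypothetical, and only 12-hour values are handled. Hours use native Korean forms, minutes use Sino-Korean forms.

```python
import re

# Native Korean forms used before 시 (hours 1-12).
NATIVE_HOURS = ["", "한", "두", "세", "네", "다섯", "여섯",
                "일곱", "여덟", "아홉", "열", "열한", "열두"]
SINO_DIGITS = ["", "일", "이", "삼", "사", "오", "육", "칠", "팔", "구"]


def sino_under_100(n: int) -> str:
    """Sino-Korean reading for 1..99, e.g. 10 -> 십, 25 -> 이십오."""
    tens, ones = divmod(n, 10)
    if tens == 0:
        tens_part = ""
    elif tens == 1:
        tens_part = "십"  # 10 is read 십, not 일십
    else:
        tens_part = SINO_DIGITS[tens] + "십"
    return tens_part + SINO_DIGITS[ones]


# "<hour>시", optionally followed by "<minute>분"; not preceded by another digit.
TIME_PATTERN = re.compile(r"(?<!\d)(\d{1,2})시(?:\s*(\d{1,2})분)?")


def spell_out_time(text: str) -> str:
    """Spell out clock times: native Korean hours, Sino-Korean minutes."""
    def replace(m: re.Match) -> str:
        hour = int(m.group(1))
        if not 1 <= hour <= 12:
            return m.group(0)  # leave anything outside 1-12 untouched
        result = NATIVE_HOURS[hour] + " 시"
        if m.group(2) is not None:
            minute = int(m.group(2))
            if not 1 <= minute <= 59:
                return m.group(0)
            result += " " + sino_under_100(minute) + " 분"
        return result

    return TIME_PATTERN.sub(replace, text)


print(spell_out_time("10시 10분"))  # 열 시 십 분
```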

Alright, let me give you a number. “90”. How would you read it?

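Technically, 90 should still be read with its own word form, “ninety” (the native Korean reading, "aheun"). Some of you might have read it as “nine zero” instead (the Sino-Korean reading, "gusip"). As numbers get larger, we tend to adopt whichever reading is more convenient.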

Let’s think about smaller numbers. “9.” Yes, everyone would read this as “nine.”

Following this principle, we developed a preprocessing system that reads numbers from 1 to 99 in their unique word forms. However, this is just a guideline, and we’re open to changing it based on customer feedback. For instance, “forty” might feel more natural than “four zero.”
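To make the 1 to 99 rule concrete, here is a hedged Python sketch. The counter list, the function names, and the choice to leave larger numbers untouched are assumptions for illustration; the post does not describe how the real system handles those cases. Native Korean numerals shorten before a counter (하나 becomes 한, 스물 becomes 스무), so the sketch handles that as well.

```python
import re

NATIVE_ONES = ["", "하나", "둘", "셋", "넷", "다섯", "여섯", "일곱", "여덟", "아홉"]
NATIVE_TENS = ["", "열", "스물", "서른", "마흔", "쉰", "예순", "일흔", "여든", "아흔"]

# Shortened forms used directly before a counter.
ATTRIBUTIVE = {"하나": "한", "둘": "두", "셋": "세", "넷": "네", "스물": "스무"}

# Hypothetical sample of counters that take native Korean numbers.
COUNTERS = ("마리", "개", "명", "살")


def native_korean(n: int) -> str:
    """Native Korean reading for 1..99, e.g. 90 -> 아흔."""
    if not 1 <= n <= 99:
        raise ValueError("native Korean word forms only cover 1-99")
    tens, ones = divmod(n, 10)
    return NATIVE_TENS[tens] + NATIVE_ONES[ones]


def native_before_counter(n: int) -> str:
    """Native reading in the form used before a counter, e.g. 21 -> 스물한."""
    word = native_korean(n)
    for full, short in ATTRIBUTIVE.items():
        if word.endswith(full):
            return word[: -len(full)] + short
    return word


COUNT_PATTERN = re.compile(r"(?<!\d)(\d+)\s*(" + "|".join(COUNTERS) + ")")


def spell_out_counts(text: str) -> str:
    """Spell out 1-99 before a known counter; leave other numbers as digits."""
    def replace(m: re.Match) -> str:
        n = int(m.group(1))
        if not 1 <= n <= 99:
            return m.group(0)  # assumption: larger numbers are left untouched
        return native_before_counter(n) + " " + m.group(2)

    return COUNT_PATTERN.sub(replace, text)


print(spell_out_counts("3개"))   # 세 개
print(spell_out_counts("90개"))  # 아흔 개
```

Keeping the 1 to 99 boundary in one place makes it easy to adjust later if feedback suggests a different reading for some range.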

Anyway, to ensure a consistent user experience, we created a library of the units that take these unique word forms, the ones counted “one, two, three.”

If you're looking to implement a TTS service, this library could be useful. Keep in mind, however, that it is not complete: we keep discovering unexpected units through customer feedback.

Detail in UX = Success in SaaS

We faced many challenges while providing a video editing SaaS solution in Korea, including running into the term “quantifier” for the first time. It was yet another reminder of how dynamic and remarkable the Korean language is...

While we sometimes envy English-speaking services, they surely have their own set of challenges.

Of course, VideoStew offers a feature called [Manual Text Designation]. This allows you to generate TTS regardless of what's displayed on the screen.

< Setting the sound to be read by TTS regardless of the subtitles displayed on the screen >

With this method, you can write the text exactly as it should sound, so the TTS pronounces it more naturally and users can correct any misread units themselves.

The reason we focus on such details is our conviction that these small differences determine the growth of the service.

The moment users enter a sentence containing numerical expressions, they already expect it to be read a certain way. If the generated audio doesn't match that expectation, the experience has failed at the very first step. Our ultimate UX goal is to produce the expected result as quickly as possible, without additional editing.

We will continue to post UX considerations like this in the future. As mentioned earlier, now that every service is improving to a similar level, we believe that attention to such details determines whether a service succeeds or fails.


