VideoStew offers a variety of AI voices, from Google WaveNet, Amazon Polly, KT AI Voice, Naver Clova, and Azure to ElevenLabs.
As a result, a natural issue arises: each model is trained differently, so even the same sentence is read slightly differently. This difference is particularly noticeable when reading units.
In this post, we’ll discuss the considerations we took into account to ensure proper reading of units in Korean when generating TTS.
Getting Measurement Units Right
When using various TTS engines, we found that even a simple expression like "100kg" was read differently by each engine. Some read it as "hundred K-G," while others trailed off awkwardly at the end with "hundred krr..." (Yes, even AI can get flustered.) Of course, some engines did read it accurately as "hundred kilograms."
To address this, we developed a preprocessing library to standardize how these units are read across all engines.
Getting Numbers Right
While running this service, we ran into another unexpected issue. (Korean is truly a fascinating language.)
In Korean, there are two ways to read numbers, something we instinctively use but might not notice.
For instance, when telling time, "10시 10분" (10:10) is read as "yeol si sip bun": the hour uses the native Korean "yeol," while the minutes use the Sino-Korean "sip." Why...?
And while general numbers are read in Sino-Korean ("il, i, sam"), when a quantity unit is added, they are read in native Korean ("han gae, du gae, se gae"). Official measurement units use Sino-Korean. (e.g., 10cm = "sip sen-ti-mi-teo")
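The rule above can be sketched in Python as follows. This is only an illustration under stated assumptions: the `NATIVE_COUNTERS` set and the function names are hypothetical, the counter list is far from complete, and the sketch handles only 1 to 99, the range in which native Korean number words exist.

```python
# Standard Korean numerals. NATIVE_ONES holds the attributive forms used
# before a counter (han, du, se, ne instead of hana, dul, set, net).
SINO_DIGITS = ["", "일", "이", "삼", "사", "오", "육", "칠", "팔", "구"]
NATIVE_ONES = ["", "한", "두", "세", "네", "다섯", "여섯", "일곱", "여덟", "아홉"]
NATIVE_TENS = ["", "열", "스물", "서른", "마흔", "쉰", "예순", "일흔", "여든", "아흔"]

# Hypothetical, deliberately small set of counters read with native numerals.
NATIVE_COUNTERS = {"개", "시", "명", "살"}


def sino(n: int) -> str:
    """Sino-Korean reading for 1..99 (e.g. 10 -> 십, 90 -> 구십)."""
    tens, ones = divmod(n, 10)
    if tens == 0:
        tens_part = ""
    elif tens == 1:
        tens_part = "십"  # 10 is "sip", not "il-sip"
    else:
        tens_part = SINO_DIGITS[tens] + "십"
    return tens_part + SINO_DIGITS[ones]


def native(n: int) -> str:
    """Native Korean attributive reading for 1..99 (e.g. 10 -> 열, 90 -> 아흔)."""
    if n == 20:
        return "스무"  # 20 directly before a counter is "seumu", not "seumul"
    tens, ones = divmod(n, 10)
    return NATIVE_TENS[tens] + NATIVE_ONES[ones]


def read_quantity(n: int, counter: str) -> str:
    """Pick the numeral system from the counter that follows the number."""
    if not 1 <= n <= 99:
        raise ValueError("this sketch only covers 1 to 99")
    word = native(n) if counter in NATIVE_COUNTERS else sino(n)
    return f"{word} {counter}"


print(read_quantity(10, "시"), read_quantity(10, "분"))
```

The counter, not the number itself, decides the reading, which is why the same "10" comes out differently for hours and minutes.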
Alright, let me give you a number: “90”. How would you read it?
Technically, numbers up to 90 should be read with their own word form, “ninety.” Some of you might have read it digit by digit, as “nine zero.” As numbers get larger, we tend to adopt a more convenient way to read them.
Let’s think about smaller numbers: “9”. Yes, everyone would read this as “nine.”
Following this principle, we developed a preprocessing system that reads numbers from 1 to 99 in their unique word forms. However, this is just a guideline, and we’re open to changing it based on customer feedback. For instance, “forty” might feel more natural than “four zero.”
Anyway, to ensure a consistent user experience, we created a unit library corresponding to the unique word forms like “one, two, three.”
If you're looking to implement a TTS service, this library could be useful. However, keep in mind that it's not a complete version as we continuously discover unexpected units through customer feedback.
Detail in UX = Success in SaaS
Encountering the term “quantifier” for the first time, we faced many challenges while providing a video editing SaaS solution in Korea. It was yet another reminder of how dynamic and remarkable the Korean language is…
While we sometimes envy English-speaking services, they surely have their own set of challenges.
Of course, VideoStew offers a feature called [Manual Text Designation]. This allows you to generate TTS regardless of what's displayed on the screen.
< Setting the sound to be read by TTS regardless of the subtitles displayed on the screen >
With this method, you can write the text exactly as it should sound, so the TTS pronounces it more naturally and you can correct any misread units yourself.
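One way to model this kind of override is shown below. It is a minimal sketch, not VideoStew's actual data model: the `Slide` class and its `display_text` and `spoken_text` fields are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Slide:
    """A slide whose narration can differ from its on-screen subtitle."""

    display_text: str                   # what the viewer sees on screen
    spoken_text: Optional[str] = None   # optional manual override for TTS

    def tts_input(self) -> str:
        """Return the text to send to the TTS engine."""
        # A non-empty manual override wins; otherwise fall back to the subtitle.
        if self.spoken_text is not None and self.spoken_text.strip():
            return self.spoken_text
        return self.display_text


slide = Slide(display_text="무게 100kg", spoken_text="무게 백 킬로그램")
print(slide.tts_input())
```

Keeping the override separate from the subtitle means a user can fix a misread unit without changing what appears on screen.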
The reason we focus on such details is our conviction that these small differences determine the growth of the service.
The moment users type a sentence containing numbers, they already have an expectation of how it should be read. If the generated audio does not match that expectation, the experience has failed at the very first step. Our ultimate UX goal is to produce the expected result as quickly as possible, without additional editing.
We will continue to post our UX considerations like this in the future. As mentioned earlier, in the current situation where all services are leveling up, we believe that pondering over such details determines the success or failure of a service.