Sung-Feng Huang

Sung-Feng Huang is a Research Scientist at NVIDIA Research Taiwan. His research focuses on generative AI for speech and audio, with expertise in speech recognition, synthesis, separation, and machine learning techniques such as self-supervised and meta learning.

He received his Ph.D. from National Taiwan University, where he was co-advised by Prof. Lin-shan Lee and Prof. Hung-yi Lee. His doctoral research spanned speech processing and machine learning methods, including speech recognition, separation, and text-to-speech synthesis.

Before joining NVIDIA as a full-time Research Scientist, Sung-Feng interned with the same research team, gaining hands-on experience in AI research. He now focuses on advancing generative AI for speech and audio, with applications in human-computer interaction and audio-based AI systems.

Publications

Generative Speech Foundation Model Pretraining for High-Quality Speech Extraction and Restoration

Pin-Jui Ku, Alexander H. Liu, Roman Korostik, Sung-Feng Huang, Szu-Wei Fu, Ante Jukić

IEEE International Conference on Acoustics, Speech, and Signal Processing 2025

Detecting the Undetectable: Assessing the Efficacy of Current Spoof Detection Methods Against Seamless Speech Edits

Sung-Feng Huang, Heng-Cheng Kuo, Zhehuai Chen, Xuesong Yang, Chao-Han Huck Yang, Yu Tsao, Yu-Chiang Frank Wang, Hung-yi Lee, Szu-Wei Fu

arXiv.org 2025

Detecting the Undetectable: Assessing the Efficacy of Current Spoof Detection Methods Against Seamless Speech Edits

Sung-Feng Huang, Heng-Cheng Kuo, Zhehuai Chen, Xuesong Yang, Chao-Han Huck Yang, Yu Tsao, Yu-Chiang Frank Wang, Hung-yi Lee, Szu-Wei Fu

Spoken Language Technology Workshop 2024

Generative Speech Foundation Model Pretraining for High-Quality Speech Extraction and Restoration

Pin-Jui Ku, Alexander H. Liu, Roman Korostik, Sung-Feng Huang, Szu-Wei Fu, Ante Jukić

arXiv.org 2024

Maximizing Data Efficiency for Cross-Lingual TTS Adaptation by Self-Supervised Representation Mixing and Embedding Initialization

Wei-Ping Huang, Sung-Feng Huang, Hung-yi Lee

IEEE Automatic Speech Recognition and Understanding Workshop 2023

Personalized Lightweight Text-to-Speech: Voice Cloning with Adaptive Structured Pruning

Sung-Feng Huang, Chia-Ping Chen, Zhi-Sheng Chen, Yu-Pao Tsai, Hung-yi Lee

IEEE International Conference on Acoustics, Speech, and Signal Processing 2023

Learning Phone Recognition From Unpaired Audio and Phone Sequences Based on Generative Adversarial Network

Da-Rong Liu, Po-chun Hsu, Da-Yi Wu, Shun-Po Chuang, Yi-Chen Chen, Sung-Feng Huang, Hung-yi Lee

IEEE/ACM Transactions on Audio Speech and Language Processing 2022

Few-Shot Cross-Lingual TTS Using Transferable Phoneme Embedding

Wei-Ping Huang, Po-Chun Chen, Sung-Feng Huang, Hung-yi Lee

Interspeech 2022

Meta-TTS: Meta-Learning for Few-Shot Speaker Adaptive Text-to-Speech

Sung-Feng Huang, Chyi-Jiunn Lin, Da-Rong Liu, Yi-Chen Chen, Hung-yi Lee

IEEE/ACM Transactions on Audio Speech and Language Processing 2021

SpeechNet: A Universal Modularized Model for Speech Processing Tasks

Yi-Chen Chen, Po-Han Chi, Shu-Wen Yang, Kai-Wei Chang, Jheng-hao Lin, Sung-Feng Huang, Da-Rong Liu, Chi-Liang Liu, Cheng-Kuang Lee, Hung-yi Lee

arXiv.org 2021

Non-Autoregressive Mandarin-English Code-Switching Speech Recognition

Shun-Po Chuang, Heng-Jui Chang, Sung-Feng Huang, Hung-yi Lee

IEEE Automatic Speech Recognition and Understanding Workshop 2021

Self-supervised Pre-training Reduces Label Permutation Instability of Speech Separation

Sung-Feng Huang, Shun-Po Chuang, Da-Rong Liu, Yi-Chen Chen, Gene-Ping Yang, Hung-yi Lee

arXiv.org 2020

Stabilizing Label Assignment for Speech Separation by Self-Supervised Pre-Training

Sung-Feng Huang, Shun-Po Chuang, Da-Rong Liu, Yi-Chen Chen, Gene-Ping Yang, Hung-yi Lee

Interspeech 2020

Pretrained Language Model Embryology: The Birth of ALBERT

Cheng-Han Chiang, Sung-Feng Huang, Hung-yi Lee

Conference on Empirical Methods in Natural Language Processing 2020

Audio Word2vec: Sequence-to-Sequence Autoencoding for Unsupervised Learning of Audio Segmentation and Representation

Yi-Chen Chen, Sung-Feng Huang, Hung-yi Lee, Yu-Hsuan Wang, Chia-Hao Shen

IEEE/ACM Transactions on Audio Speech and Language Processing 2019

From Semi-supervised to Almost-unsupervised Speech Recognition with Very-low Resource by Jointly Learning Phonetic Structures from Audio and Text Embeddings

Yi-Chen Chen, Sung-Feng Huang, Hung-yi Lee, Lin-Shan Lee

arXiv.org 2019

Improved Audio Embeddings by Adjacency-Based Clustering with Applications in Spoken Term Detection

Sung-Feng Huang, Yi-Chen Chen, Hung-yi Lee, Lin-Shan Lee

arXiv.org 2018

Almost-unsupervised Speech Recognition with Close-to-zero Resource Based on Phonetic Structures Learned from Very Small Unpaired Speech and Text Data

Yi-Chen Chen, Chia-Hao Shen, Sung-Feng Huang, Hung-yi Lee, Lin-Shan Lee

arXiv.org 2018

Phonetic-and-Semantic Embedding of Spoken words with Applications in Spoken Content Retrieval

Yi-Chen Chen, Sung-Feng Huang, Chia-Hao Shen, Hung-yi Lee, Lin-Shan Lee

Spoken Language Technology Workshop 2018

Towards Unsupervised Automatic Speech Recognition Trained by Unaligned Speech and Text only

Yi-Chen Chen, Chia-Hao Shen, Sung-Feng Huang, Hung-yi Lee

arXiv.org 2018