Research

As a Software Engineer in the GenAI organization at Google DeepMind, I specialize in developing and refining large multimodal AI models, leveraging Large Language Models (LLMs) alongside techniques like Parameter-Efficient Fine-Tuning (PEFT), Supervised Fine-Tuning (SFT), and Reinforcement Learning from Human Feedback (RLHF). Previously, as part of Google’s Speech Research Team, I contributed to building large-scale Automatic Speech Recognition (ASR) models, with a focus on domain adaptation, data minimization through unsupervised learning, parameter-efficient fine-tuning, speech personalization, contextualization, and bias mitigation.

Strongest Areas: GenAI, AI, LLM, ASR, Deep Learning, Natural Language Processing, Speech, Machine Learning, Data Structures and Algorithms

Publications

Improving Speech Recognition for African American English with Audio Classification
Authors: Shefali Garg, Zhouyuan Huo, Khe Chai Sim, Suzan Schwartz, Mason Chua, Alëna Aksënova, Tsendsuren Munkhdalai, Levi King, Darryl Wright, Zion Mengesha, Dongseong Hwang, Tara Sainath, Françoise Beaufays, Pedro Moreno Mengibar
ICASSP 2024 - IEEE International Conference on Acoustics, Speech and Signal Processing, 2024
Link to Paper
Large-scale ASR Domain Adaptation Using Self- and Semi-supervised Learning
Authors: Dongseong Hwang, Ananya Misra, Zhouyuan Huo, Nikhil Siddhartha, Shefali Garg, David Qiu, Khe Chai Sim, Trevor Strohman, Françoise Beaufays, Yanzhang He
ICASSP 2022 - IEEE International Conference on Acoustics, Speech and Signal Processing, 2022
Link to Paper
A Comparison of Supervised and Unsupervised Pre-Training of End-to-End Models
Authors: A. Misra, D. Hwang, Z. Huo, S. Garg, N. Siddhartha, A. Narayanan, K.C. Sim
Interspeech, 731-735, 2021
Link to Paper
UserLibri: A Dataset for ASR Personalization Using Only Text
Authors: Theresa Breiner, Swaroop Ramaswamy, Ehsan Variani, Shefali Garg, Rajiv Mathews, Khe Chai Sim, Kilol Gupta, Mingqing Chen, Lara McConnaughey
Interspeech 2022
Link to Paper
Pentagon at MEDIQA 2019: Multi-task Learning for Filtering and Re-ranking Answers Using Language Inference and Question Entailment
Authors: H. Pugaliya, K. Saxena, S. Garg, S. Shalini, P. Gupta, E. Nyberg, T. Mitamura
ACL-BioNLP Workshop 2019; arXiv preprint arXiv:1907.01643, 2019
Link to Paper
Incremental Layer-wise Self-supervised Learning for Efficient Speech Domain Adaptation on Device
Authors: Zhouyuan Huo, Dongseong Hwang, Khe Chai Sim, Shefali Garg, Ananya Misra, Nikhil Siddhartha, Trevor Strohman, Françoise Beaufays
arXiv preprint arXiv:2110.00155, 2021
Link to Paper

For a detailed list of publications, see my Google Scholar or ResearchGate profile.

Press & media

Intern developing facial recognition app for Google Glass [link]

Memberships & academic services

Reviewer for International Conference on Speech and Computer (SPECOM) 2024
Reviewer for Conference on Neural Information Processing Systems (NeurIPS) 2024
Reviewer for IEEE Spoken Language Technology Workshop (SLT) 2024
Member, IEEE Signal Processing Society (SPS)

Scholarships & Awards

Mitacs Globalink Research Internship: Issued by Mitacs, Canada, in Mar 2015
INSPIRE Scholarship for Higher Education (SHE): Issued by the Department of Science and Technology (DST), Ministry of Science and Technology, Government of India, in Aug 2011
National Talent Search Examination (NTSE) Scholar: Issued by the National Council of Educational Research and Training in Mar 2009