Research

As a Software Engineer in the GenAI organization at Google DeepMind, I specialize in developing and refining large multimodal AI models, leveraging Large Language Models (LLMs) alongside techniques such as Parameter-Efficient Fine-Tuning (PEFT), Supervised Fine-Tuning (SFT), and Reinforcement Learning from Human Feedback (RLHF). Previously, as part of Google’s Speech Research Team, I contributed to building large-scale Automatic Speech Recognition (ASR) models, with a focus on domain adaptation, data minimization through unsupervised learning, parameter-efficient fine-tuning, speech personalization, contextualization, and bias mitigation.

Strongest Areas: GenAI, AI, LLM, ASR, Deep Learning, Natural Language Processing, Speech, Machine Learning, Data Structures and Algorithms

Publications

  • Improving Speech Recognition for African American English with Audio Classification
    Authors: Shefali Garg, Zhouyuan Huo, Khe Chai Sim, Suzan Schwartz, Mason Chua, Alëna Aksënova, Tsendsuren Munkhdalai, Levi King, Darryl Wright, Zion Mengesha, Dongseong Hwang, Tara Sainath, Françoise Beaufays, Pedro Moreno Mengibar
    ICASSP 2024 - IEEE International Conference on Acoustics, Speech and Signal Processing, 2024
    Link to Paper

  • Large-scale ASR Domain Adaptation Using Self- and Semi-supervised Learning
    Authors: Dongseong Hwang, Ananya Misra, Zhouyuan Huo, Nikhil Siddhartha, Shefali Garg, David Qiu, Khe Chai Sim, Trevor Strohman, Françoise Beaufays, Yanzhang He
    ICASSP 2022 - IEEE International Conference on Acoustics, Speech and Signal Processing, 2022
    Link to Paper

  • A Comparison of Supervised and Unsupervised Pre-Training of End-to-End Models
    Authors: A. Misra, D. Hwang, Z. Huo, S. Garg, N. Siddhartha, A. Narayanan, K.C. Sim
    Interspeech, 731-735, 2021
    Link to Paper

  • UserLibri: A Dataset for ASR Personalization Using Only Text
    Authors: Theresa Breiner, Swaroop Ramaswamy, Ehsan Variani, Shefali Garg, Rajiv Mathews, Khe Chai Sim, Kilol Gupta, Mingqing Chen, Lara McConnaughey
    Interspeech 2022
    Link to Paper

  • Pentagon at MEDIQA 2019: Multi-task Learning for Filtering and Re-ranking Answers Using Language Inference and Question Entailment
    Authors: H. Pugaliya, K. Saxena, S. Garg, S. Shalini, P. Gupta, E. Nyberg, T. Mitamura
    ACL-BioNLP Workshop 2019; arXiv preprint arXiv:1907.01643, 2019
    Link to Paper

  • Incremental Layer-wise Self-supervised Learning for Efficient Speech Domain Adaptation on Device
    Authors: Zhouyuan Huo, Dongseong Hwang, Khe Chai Sim, Shefali Garg, Ananya Misra, Nikhil Siddhartha, Trevor Strohman, Françoise Beaufays
    arXiv preprint arXiv:2110.00155, 2021
    Link to Paper

For a detailed list of publications, see my Google Scholar or ResearchGate profile.

Press & media

  • Intern developing facial recognition app for Google Glass [link]

Memberships & academic services

  • Reviewer for International Conference on Speech and Computer (SPECOM) 2024
  • Reviewer for Conference on Neural Information Processing Systems (NeurIPS) 2024
  • Reviewer for IEEE Spoken Language Technology Workshop (SLT) 2024
  • Member, IEEE Signal Processing Society (SPS)

Scholarships & Awards

  • Mitacs Globalink Research Internship: Issued by Mitacs, Canada in Mar 2015
  • Inspire Scholarship for Higher Education (S.H.E): Issued by the Department of Science and Technology (DST), Ministry of Science and Technology, Government of India in Aug 2011
  • National Talent Search Examination (NTSE) Scholar: Issued by the National Council of Educational Research and Training in Mar 2009