This paper presents NEST-V1, a proof-of-concept multimodal framework that generates emotion-conditioned Nepali Sign Language avatars from spoken input. The framework achieves 81.1% ASR accuracy and 79.21% emotion recognition accuracy on a dataset of 600 audio samples from 50 speakers.