VOICE-ACTIVATED SMART ENVIRONMENTS: DEEP LEARNING APPROACH FOR PASHTO SPEECH COMMAND PROCESSING
Keywords:
Artificial Intelligence, Automatic Speech Recognition, Natural Language Processing, Pashto Language, Machine Translation

Abstract
Modern Automatic Speech Recognition (ASR) systems leverage deep learning architectures, such as transformer-based models, to convert spoken language into text with near human-level accuracy. The integration of a command extraction controller enables real-time parsing of semantic intent, transforming raw audio signals into executable instructions for IoT devices, robotics, or assistive technologies. This dual-stage pipeline combines acoustic modeling with context-aware natural language understanding (NLU). This article describes the design and deployment of an ASR system for the Pashto language integrated with a Speech Recognition and Command Extraction Controller. By applying NLP techniques, the Speech Recognition Controller was incorporated into the system, improving the comprehension of user commands. The appliance-control application demonstrated effective command execution and security features, and the command list provided users with clear, precise instructions. In total, 150 different participants took part in the dataset-gathering process to ensure that the system would work with a range of voices, accents, and speech patterns. With an overall performance score of 92%, comprising accuracy of 94%, precision of 92%, recall of 90%, and an F1-score of 92%, the system proved reliable and effective. The system's speed, dependability, and efficiency in interpreting and understanding Pashto instructions make it a workable option for a range of applications using Pashto voice recognition.
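As a rough illustration of the dual-stage design outlined above, the sketch below pairs a transformer-based ASR front end with a simple command-extraction step. The model checkpoint, Pashto keywords, device names, and actions are illustrative placeholders under stated assumptions, not the implementation described in this article.

```python
# Minimal two-stage sketch: (1) transformer ASR produces a Pashto transcript,
# (2) a command-extraction controller maps the transcript to a device action.
# All names below (checkpoint, keywords, devices) are illustrative placeholders.
from dataclasses import dataclass
from typing import Optional

from transformers import pipeline

# Stage 1: acoustic/ASR model. A Pashto fine-tuned checkpoint would be used
# in practice; "openai/whisper-small" is only a stand-in here.
asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")


@dataclass
class Command:
    device: str
    action: str


# Stage 2: command-extraction controller. A keyword lookup stands in for the
# article's NLU component; glosses are approximate and purely illustrative.
COMMAND_TABLE = {
    ("څراغ", "بل"): Command(device="light", action="on"),   # "light" + "turn on"
    ("څراغ", "مړ"): Command(device="light", action="off"),  # "light" + "turn off"
}


def extract_command(transcript: str) -> Optional[Command]:
    """Return the first command whose keywords all appear in the transcript."""
    for keywords, command in COMMAND_TABLE.items():
        if all(word in transcript for word in keywords):
            return command
    return None  # no executable instruction recognised


def process_utterance(audio_path: str) -> Optional[Command]:
    """Run the full pipeline: audio file -> transcript -> executable command."""
    transcript = asr(audio_path)["text"]
    return extract_command(transcript)


if __name__ == "__main__":
    # Example usage with a hypothetical recording of a Pashto voice command.
    print(process_utterance("sample_command.wav"))
```

In a deployed system, the keyword table would typically be replaced by a trained intent classifier, and the returned Command object would be dispatched to the appliance-control layer.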