This paper explores the possibility of using visual object detection techniques for word localization in speech data. Object detection has been thoroughly studied in the recent literature for visual data. Noting that an audio signal can be interpreted as a 1-dimensional image, object localization techniques can be fundamentally useful for word localization. Building upon this idea, we propose a lightweight solution for word detection and localization. We use bounding box regression for word localization, which enables our model to detect the occurrence, offset, and duration of keywords in a given audio stream. We experiment with LibriSpeech and train a model to localize 1000 words. Compared to existing work (SpeechYolo), our method reduces model size by 94% and improves the F1 score by 6.5%.
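To make the 1-dimensional analogy concrete, the following is a minimal sketch, not the model described here, of how a word occurrence can be represented as a 1-D "bounding box" given by its offset and duration, and how a predicted box can be compared with a reference box using intersection-over-union (IoU), the overlap measure commonly used in visual object detection. The names `Segment` and `iou_1d` and the example timings are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """A 1-D 'bounding box': one word occurrence in an audio stream (hypothetical type)."""
    offset: float    # start time in seconds
    duration: float  # length in seconds

    @property
    def end(self) -> float:
        return self.offset + self.duration


def iou_1d(a: Segment, b: Segment) -> float:
    """Intersection-over-union of two time intervals (illustrative helper)."""
    # Overlap length is zero when the intervals are disjoint.
    inter = max(0.0, min(a.end, b.end) - max(a.offset, b.offset))
    union = a.duration + b.duration - inter
    return inter / union if union > 0 else 0.0


# Assumed example values: a predicted word box and a reference annotation.
predicted = Segment(offset=1.0, duration=1.0)
reference = Segment(offset=1.5, duration=1.0)
overlap = iou_1d(predicted, reference)
```

In a detection setting, a prediction would typically count as correct when its label matches and this overlap exceeds a chosen threshold; the threshold and the matching rule are assumptions here, not details taken from the text above.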