Published on Apr 02, 2024
The Objective: Why do voice recognition systems (VRSs) often misinterpret the words of foreign-born English speakers, even those without an accent? Is there a physical distinction between the sounds that native-born and foreign-born English speakers produce, perhaps a distinctive frequency signature? Hypothesis: People of different ethnicities speaking the same English words produce physically different output waves, with distinct frequency signatures that depend on their geographical location.
1. Get sound clips of foreign-language speakers (FLS) speaking an English paragraph. Mine were from George Mason University's linguistics database.
2. Organize these sound files into a database categorized by the location (continent) of the FLS and by the time the speaker has lived in an English-speaking country (less than half of life, more than half of life). Choose a specific phoneme to focus the study on (as VRSs break down words similarly).
3. Use an FFT algorithm to analyze the frequency content of the data. Using SpectraPLUS software, run each sound sample through the spectrogram plot and trim the spectrogram output until it includes only the chosen phoneme. Repeat step 3 for all sound files.
4. Organize the outputs into a display (placing them side by side) in order to:
a. Phase 1: Compare speakers with others from their own region.
b. Phase 2: Compare Phase 1 speakers with long-term emigrants to English-speaking countries.
c. Phase 3: Compare long-term emigrants from Phase 2 to native English speakers.
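The frequency analysis in step 3 can be sketched in a few lines of Python. This is only an illustration of what an FFT-based analysis does, not the SpectraPLUS workflow used in the project: the function names (fft, peak_frequency), the Hann window, the 8 kHz sample rate, and the synthetic 500 Hz tone standing in for a phoneme clip are all assumptions made for the example.

```python
import cmath
import math


def fft(samples):
    """Recursive radix-2 FFT. len(samples) must be a power of two."""
    n = len(samples)
    if n == 1:
        return [complex(samples[0])]
    even = fft(samples[0::2])  # transform of even-indexed samples
    odd = fft(samples[1::2])   # transform of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def peak_frequency(samples, sample_rate):
    """Return the frequency in Hz of the strongest bin below Nyquist."""
    n = len(samples)
    # A Hann window reduces leakage from the clip's abrupt edges.
    windowed = [
        s * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
        for i, s in enumerate(samples)
    ]
    magnitudes = [abs(c) for c in fft(windowed)[: n // 2]]
    # Skip bin 0 (the DC offset) when searching for the peak.
    peak_bin = max(range(1, n // 2), key=lambda k: magnitudes[k])
    return peak_bin * sample_rate / n


# Hypothetical stand-in for a trimmed phoneme clip: a pure 500 Hz tone.
SAMPLE_RATE = 8000  # samples per second (assumed)
N = 1024            # frame length, a power of two
tone = [math.sin(2 * math.pi * 500 * t / SAMPLE_RATE) for t in range(N)]

print(peak_frequency(tone, SAMPLE_RATE))
```

With these values each FFT bin is 8000 / 1024 = 7.8125 Hz wide, so the tone falls exactly on bin 64. A spectrogram repeats this computation over successive short frames of the recording, which is what makes it possible to isolate a single phoneme in time.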
- There is indeed a distinct physical difference in the English words of speakers from around the world, with similarities seen in clusters dependent on geographical location.
- With more exposure to native-born English speakers in English-speaking countries, the frequency signature of most foreign-language speakers shifted toward the frequency signature of English spoken by native speakers (ESNS).
- Lower frequencies shifted down by as much as 60 Hz to the standard ESNS threshold; higher frequencies shifted up by as much as 800 Hz to the standard ESNS threshold.
The implications of my study are that current VRSs are ineffective because the range of frequencies they analyze may not be the same as their users'. I surmised that a possible solution to the issue of a VRS programmed only to detect certain frequency intervals could be to create a dial that could be turned to tell the VRS by how much to bias the frequencies.
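The dial idea can be illustrated with a minimal sketch. It assumes, purely for the example, that a VRS accepts a sound only when its measured frequency falls inside a fixed band; the function names (in_band, biased_match), the band limits, and the measured value are hypothetical and are not taken from any real VRS or from the project data.

```python
def in_band(freq_hz, band):
    """True if freq_hz lies inside the (low, high) band, limits included."""
    low, high = band
    return low <= freq_hz <= high


def biased_match(measured_hz, band, bias_hz=0.0):
    """Shift the measured frequency by the dial setting, then test the band."""
    return in_band(measured_hz + bias_hz, band)


# Hypothetical numbers for illustration only.
expected_band = (300.0, 3000.0)  # band the imagined VRS listens to, in Hz
measured = 260.0                 # a speaker's measured frequency, in Hz

print(biased_match(measured, expected_band))                # dial at zero
print(biased_match(measured, expected_band, bias_hz=60.0))  # dial turned up
```

With the dial at zero, the 260 Hz sound falls outside the band and is rejected; a 60 Hz bias moves it to 320 Hz, inside the band. A real system would likely need separate biases for low and high frequencies, since the shifts observed in the project differed in size and direction.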
This project maps the geographical locations of English speakers (both first- and second-language) to the physical characteristics of the sounds they speak, specifically frequency.