Nowadays, voice has become one of the easiest ways to interact, compared to other mediums of communication. Since Jeff Bezos founded Amazon in 1994, the company has kept inventing, from Prime to Web Services to the Kindle, and the latest additions: the Echo, Echo Dot and Echo Show. The Echo series connects to Alexa, Amazon's voice-controlled intelligent personal assistant service and one of the best to date. Alexa is named after the ancient Library of Alexandria. Using Alexa, you can call out your wishes and see them fulfilled, at least the simple ones: for example, checking the weather of any place, playing music, doing a web search and so on.
Alexa-enabled devices available in the market include the Amazon Echo, Echo Dot, Echo Show and the newly announced Echo Look. You can also try Alexa in the browser at https://echosim.io by logging in with an Amazon account.
The Alexa Voice Service is currently only available to customers with an Amazon account in the US, UK and Germany.
The architecture works like this: when the user asks something like "Alexa, tell me the weather in San Francisco", the audio request goes to the Alexa Voice Service (AVS). AVS converts speech to text, picks out the keywords ("weather" and "San Francisco"), processes the request and returns a voice response to the user. An Alexa Skill has two parts: the configuration (data in the Developer Portal) and a hosted service that responds to user requests. The hosted service can be an AWS Lambda function or an internet-accessible HTTPS endpoint with a trusted certificate. You build skills using the Alexa Skills Kit (ASK). The supported skill types are Custom Skills, Flash Briefing Skills and Smart Home Skills.
As for the architecture of the Alexa Skills Kit (ASK): when the user speaks a phrase beginning with "Alexa" and the Echo hears it, the audio is sent to AVS for processing. An Alexa skill request is then sent to your server (e.g. a Lambda function) for business-logic processing. The server responds with a JSON payload that includes the text to speak. Finally, AVS sends the synthesized speech back to the device for voice output.
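To make the Lambda side of this flow concrete, here is a minimal sketch of a custom-skill handler in Python. The ASK request/response JSON envelope (version, outputSpeech, shouldEndSession) follows the documented format, but the `GetWeatherIntent` intent name and `City` slot are made-up examples for illustration, not part of any real skill:

```python
def build_response(speech_text, end_session=True):
    """Wrap plain text in a minimal ASK response envelope."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech_text},
            "shouldEndSession": end_session,
        },
    }

def lambda_handler(event, context):
    """Entry point AWS Lambda invokes with the skill request JSON."""
    request_type = event["request"]["type"]
    if request_type == "LaunchRequest":
        # User opened the skill without a specific question.
        return build_response("Welcome. Which city's weather would you like?",
                              end_session=False)
    if request_type == "IntentRequest":
        intent = event["request"]["intent"]
        if intent["name"] == "GetWeatherIntent":
            city = intent["slots"]["City"]["value"]
            # A real skill would call a weather API here.
            return build_response(f"Looking up the weather in {city}.")
    return build_response("Goodbye.")
```

AVS takes the `outputSpeech` text from this JSON payload and converts it to the voice the user hears on the device.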
The specialty of these devices is the far-field microphone array; there is no activation button to press. You simply say a trigger word, "Alexa" (the default), "Echo" or "Computer", and the device responds to voice commands from almost anywhere within earshot. Microsoft's Cortana, Google Assistant and Apple's Siri provide similar services. However, once you get used to Alexa, it feels much more natural and responsive than speaking to a phone-based voice assistant, and voice control frees you from being constantly tethered to your smartphone.
Manufacturers of automobiles, kitchen appliances, door locks, sprinklers, garage-door openers and many other newly connected products are working to bring Alexa or a similar voice-driven service to their devices.
Alexa is particularly useful for smart homes because it lets you control your connected devices without having to take out your phone and launch an app.
Despite the success and growing interest in Alexa products and services, Amazon still faces scrutiny over the potential privacy implications of having an always-on, always-listening device in people's homes, cars and other personal spaces.
I was excited to learn about the Echo, so I tried my part by adding a custom skill to Alexa. I built a sample quiz in which Alexa acts as the quiz master. It was fun, but more importantly, I am keen to see how effectively this can benefit connected homes.
Ultimately, Alexa uses natural language processing to interact by voice, so there is no need for users to change their accent. Be you, and enjoy Alexa!
Controlling devices at home and in the workplace with voice was something we saw only in sci-fi movies a few years ago. However, with advances in AI, NLP and related fields, this has become a reality. The number of devices we can interact with via voice at home and in the office has grown over the past 3-4 years.
Amazon has undisputed leadership here, and its ecosystem is way ahead of its competitors in market penetration. Many more companies are in the fray, including Google with the Google Assistant and Microsoft with Cortana. There are also smaller companies like Cyberon and Conexant, all working towards enabling voice-controlled devices.
As per Strategy Analytics, voice could capture up to 12% of industrial IoT applications by 2022, and in the consumer segment, voice has the potential to capture up to 18% of applications in the 2020-2022 timeframe.
Using voice to control devices will dominate and become the preferred way to interact with IoT devices in the coming years, because voice is the most natural way to communicate. Amazon and Google have made it very easy to voice-enable smart devices. It is also very affordable today, as devices don't need much extra processing power: they leverage cloud infrastructure to do the heavy lifting. Many companies will adopt this model, but companies like Sensory are trying to push voice enablement to the edge. Its TrulyNatural product is an embedded large-vocabulary continuous speech recognition system for devices that may not be cloud-connected, like a toaster, a space heater or a coffee maker.
Device manufacturers will have to place bets on which technologies to integrate into their products for voice control, as the landscape is still evolving. Most likely, companies will end up integrating two or three of the leading voice-enablement APIs into their product lines.
The key factors to consider while making the technology choice are: (a) Signal-to-noise ratio. These smart devices will be used in a variety of environments, and the ability to capture voice while suppressing background sounds is very important. (b) Identification and isolation. The system should be able to separate the command from other ambient sounds. (c) Capture. Most of the time the speaker may be moving or at a distance from the mic (1 meter to 10 meters), depending on the environment in which the device is installed, and the device should be able to accommodate this.
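To make the first factor concrete: signal-to-noise ratio is conventionally expressed in decibels as ten times the log of the power ratio. A small sketch (the power values are made up for illustration):

```python
import math

def snr_db(signal_power, noise_power):
    """Signal-to-noise ratio in decibels: 10 * log10(P_signal / P_noise)."""
    return 10 * math.log10(signal_power / noise_power)

# A spoken command carrying 100x the power of the ambient noise gives 20 dB.
# Far-field microphone arrays use beamforming and noise suppression to keep
# this ratio usable as the speaker moves farther from the device.
print(snr_db(100.0, 1.0))  # 20.0
```

Every doubling of the speaker's distance roughly quarters the received signal power, which is why the 1-to-10-meter capture range in point (c) puts real pressure on the microphone design.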
However, there are privacy and security issues that will have to be addressed as popularity grows, because voice-enabled devices are always listening for their wake words. If someone hacks into the system, they will be able to listen to your private conversations. Ethical use of the collected information also becomes important, because companies trying to improve their NLP and AI algorithms may decide to listen to all our conversations to strengthen their capabilities.
Interacting with devices via voice removes multiple steps from our daily routines. (For example, to control a thermostat at home, in the past we had to get up, walk to the thermostat and press buttons several times to set the desired temperature; today we can do it with a simple statement: "Set temperature to 72 degrees.") So, irrespective of the challenges it may bring, we will continue to expand the boundaries of voice control for devices. Just as speech changed humans forever, enabling voice-based commands for everyday devices will change the world forever.
Link to article on LinkedIn