It's not surprising, then, that I built my commentary for New Marketing around this thought when I was asked to comment on an interesting strategic move by Facebook.
edrone was well represented this time. Apart from me, Arek Flinik, Senior Artificial Intelligence Specialist at edrone and Co-Founder / CTO at Lekta, also gave a comment.
Our comments concerned Facebook's recent move: the announcement that the 'Hey Facebook' wake word would be added to two devices, Oculus and Portal. While the former probably needs no introduction, Portal may require a brief one.
What is Portal?
Portal looks like a tablet, and its primary purpose is to let you make hands-free video calls (via Messenger or WhatsApp), with a few fancy enhancements such as automatic framing and zooming.
It also offers:
- music playback (via Spotify)
- Amazon Alexa services (the same way as the Amazon Echo speaker)
- photo display, so that in standby mode it can act as a digital picture frame.
The last function isn't really a game-changer. Still, the average Portal user will undoubtedly appreciate it, as the device is aimed mainly at seniors, or more generally at older generations, rather than at geeks.
Portal was initially activated with the wake word 'Hey Portal'. Oculus, on the other hand, had no wake word at all: the voice interface was enabled with a button on the device. Now both can be activated with 'Hey Facebook'.
First, the marketing
At first glance, all of this seems aimed at raising brand awareness, since this is what Facebook badly needs in the voice field. We've described this issue before, so let me quote our previous commentary here (originally published in Polish on New Marketing):
Commentary for New Marketing
Facebook is far behind the rest of the big four in voice solutions (in this area, it would be more accurate to speak of the big three). This is ironic, because it is Facebook that is most often accused of wiretapping our conversations. Adding the "Hey Facebook" command to Oculus and Portal is an explicit declaration of its presence in this area.
Siri and Alexa are strongly associated with Apple and Amazon. Google Assistant carries the brand in its name. But Portal and Oculus... to the average user, they are simply Portal and Oculus, not clearly connected to any brand.
From now on, they will be Facebook, and what you get with them (mostly with Oculus) is a complete voice experience combined with virtual reality. This will help the brand promote itself as a company that is changing the world for the better, making life easier for people with disabilities that have previously excluded them technologically and socially.
Will we receive even more personalized ads because we authorize "listening in"? I doubt it. This is hype spread by people who don't realize how difficult and power-hungry NLP is.
If ad targeting improves, it will be because of the new behavioral data we send to the tech giant, and only if its devices become as popular as its competitors' products. Besides, the wake word feature is disabled by default, and some users don't even know they can use it. Given the eavesdropping suspicions, I doubt Facebook will change that any time soon.
Commentary for New Marketing
The feature is not very impressive yet: only a dozen or so commands can be used to control the device by voice, so it is hard to speak of a real alternative to the solutions of the rest of the big four (Amazon, Apple, Google). The real breakthrough would be to open this functionality to third parties and create a marketplace for voice applications. Facebook could then compete with Amazon or Google, who conquered this area a few years earlier and can now boast hundreds of thousands of available bots. Meanwhile, Facebook has entered, at best, the second league, alongside Samsung's Bixby and Microsoft's Cortana.
This is not the first announcement of this kind made by Facebook, so we cannot be sure that the features will be developed further. In 2015, the company bought Wit.ai, a natural language processing startup, to accelerate the development of its own virtual assistant, named simply "M". The project was later cancelled. In 2018, reports emerged of a new project called "Aloha". This time it concerned a voice assistant and devices able to compete with Amazon Echo or Google Home, but again, there was no official launch.
One more thought from my end. While rumors that all devices listen to us non-stop are heavily exaggerated (the battery of a typical smartphone wouldn't last more than an hour of continuous speech processing), the wake word change is noteworthy.
Saying words that sound like "Hey Facebook" in a regular conversation is much more likely than saying "Hey Siri" or "OK Alexa," so we can expect Facebook devices to be much more likely to "accidentally" forward recorded conversations to Menlo Park.
Invocation phrases do indeed serve a marketing function. They are simply branding, and because these devices are used hands-free, without physical interaction, the brand has to be marked in a different way. The competition never sleeps.
Portal and Oculus are used almost exclusively at home, so it seems the point is to make users aware of the brand's presence in their lives and to familiarize them with it in the context of voice interfaces.
But not only that...
From the very beginning, however, it seemed to me that the change also had something to do with UX, and the more I dug into the topic, the more obvious it became. The breakthrough came unexpectedly, in a spontaneous discussion on edrone's internal messenger joined by Hubert Karbowy, a Software Engineer on the edrone AVA project.
Stop/plosive and spirant/fricative
Hubert Karbowy: From a technical point of view, it is recommended that wake words contain at least one cluster of a stop/plosive and a spirant/fricative, e.g. "xa" in "Alexa" or "xb" in "Bixby" (where "x" stands for /ks/). According to research, this sequence of phonetic features is the easiest to detect accurately.
Arek Flinik: This would also explain why "Hey Facebook" is better than "Hey Portal". There is "sb" in the former, and at most "rt" in the latter.
How is Siri doing in this case?
Hubert: According to another version I've heard, the presence of a spirant/fricative alone is enough.
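To make the two rules of thumb above concrete, here is a minimal Python sketch that scans a phoneme sequence for an adjacent stop + fricative pair (in either order) and, for the weaker version, for any fricative at all. The ARPAbet-style transcriptions are our own rough approximations, not taken from a pronunciation dictionary, and the phoneme classes are simplified (/h/ is deliberately left out of the fricative set so that "Hey" does not satisfy the rule by itself). It illustrates the heuristic Hubert describes; it is not a wake-word quality test.

```python
# Simplified ARPAbet phoneme classes (assumption: /h/ "HH" is excluded on purpose).
STOPS = {"P", "B", "T", "D", "K", "G"}
FRICATIVES = {"F", "V", "TH", "DH", "S", "Z", "SH", "ZH"}

# Hypothetical, hand-written approximate transcriptions.
TRANSCRIPTIONS = {
    "hey portal": ["HH", "EY", "P", "AO", "R", "T", "AH", "L"],
    "hey facebook": ["HH", "EY", "F", "EY", "S", "B", "UH", "K"],
    "alexa": ["AH", "L", "EH", "K", "S", "AH"],
    "bixby": ["B", "IH", "K", "S", "B", "IY"],
    "hey siri": ["HH", "EY", "S", "IH", "R", "IY"],
}


def stop_fricative_clusters(phones):
    """Return every adjacent (stop, fricative) or (fricative, stop) pair."""
    clusters = []
    for a, b in zip(phones, phones[1:]):
        if (a in STOPS and b in FRICATIVES) or (a in FRICATIVES and b in STOPS):
            clusters.append((a, b))
    return clusters


def has_fricative(phones):
    """Weaker rule: the phrase contains at least one fricative."""
    return any(p in FRICATIVES for p in phones)


for phrase, phones in TRANSCRIPTIONS.items():
    print(phrase, stop_fricative_clusters(phones), has_fricative(phones))
```

With these transcriptions, "hey facebook" yields the S+B cluster Arek pointed out, "alexa" and "bixby" yield K+S, "hey portal" yields nothing, and "hey siri" passes only the weaker fricative-only rule.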
AVA obviously came up in the discussion
Hubert: If we wanted to copy the solution directly and implement AVA as an assistant on a smart speaker, the wake word "Hey Ava!" would generate lots of errors with a shallow network.
In theory, /v/ is a spirant/fricative, but acoustically it is lower in frequency, so in context it is prone to assimilation. And there's more to it than that.
We also need the sounds around it to be reasonably "rare". We would need to check how often ordinary speech contains sequences like the left context of "x" in "Alexa" ("ex", "alex", etc.), because in Polish "ava" overlaps quite a bit with common words: "kawa", "prawa", "trawa"... (coffee, rights/laws, grass), where "w" is pronounced /v/.
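The overlap check Hubert mentions can be sketched in a few lines of Python. This is a toy, under loud assumptions: the "spelling-to-sound" step only rewrites Polish "w" as /v/ (a real check would use proper phoneme transcriptions), and the word list is a hypothetical handful of words rather than a corpus. A realistic version would also weight each hit by how frequently the word occurs in everyday speech.

```python
# Toy Polish spelling-to-sound rule (assumption): only "w" -> /v/ is handled.
TOY_POLISH_RULES = {"w": "v"}


def toy_phonetic(word):
    """Very rough sound-based spelling of a Polish word."""
    return "".join(TOY_POLISH_RULES.get(ch, ch) for ch in word.lower())


def overlapping_words(core, vocabulary):
    """Words whose toy phonetic form contains the wake word's core sound sequence."""
    return [w for w in vocabulary if core in toy_phonetic(w)]


# Hypothetical mini-vocabulary; the first three words come from the discussion.
vocabulary = ["kawa", "prawa", "trawa", "dom", "las", "sklep"]
hits = overlapping_words("ava", vocabulary)
print(hits, f"{len(hits)}/{len(vocabulary)} words overlap")
```

Note that a plain string search for "ava" in the spelled words would find nothing; the overlap only shows up once spelling is mapped to sound, which is why this kind of check has to work on pronunciation rather than text.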
Other good practices
However, there is no single right way to choose a keyword phrase. As Aliaksei Kolesau and Dmitry Šešok state in their article "Voice Activation Systems for Embedded Devices: Systematic Literature Review":
We noticed that the acoustic features and the length of the keyword have a significant impact on the quality of activation. For example, in Jansen and Niyogi [paper] it is shown that there is a strong correlation between the quality of work and the length of the keyword. However, the open question as to what other properties of the key phrase are important for the good operation of the system remains. Also, it would be interesting to investigate whether there are any general rules for choosing a good keyword.
Google Assistant, for example, was initially invoked with the command "OK Google Now." It turned out to be simply too long. On the other hand, invocations that are too short are not desirable either, but more about that in a moment.
Another "good practice" is to use wake words that are very likely to be pronounced correctly in other countries.
Hubert: The x/gz clusters reflect a Western perspective. They're difficult for speakers of languages that don't allow consonant clusters within a syllable, e.g. Koreans, who struggle with "bic-xsee-be". In such languages, it is probably better to invent something else that is unique and also easily detectable.
Compared to 'Portal' and 'Oculus', 'Facebook' is a much easier word to pronounce correctly regardless of your nationality, and therefore a command that is easier to catch. This applies both to differences in accent and to sheer familiarity with the word.
Let's face it, "Ow-key-Goo-gl" is a pronunciation killer for most Poles. I know several people who nearly choked trying to invoke the assistant. By the way, I personally prefer "Hey Google". It is much easier to pronounce.
So is it also UX?
Every advantage over voice competitors is worth its weight in gold. When it comes to usability and whether the solution will be accepted, it is ultimately the average end user who has the final say.
The wake word "Hey Facebook" seems like an excellent choice. It follows all the tips mentioned above. Additionally, thanks to many years of familiarity and the ease of pronouncing 'Facebook', there is a good chance that it will be pronounced correctly and similarly across languages (though there will probably be exceptions here too).
On the other hand, all good practices advise choosing the wake word in a way that avoids both false negatives and false positives.
Arek: As I suggested in New Marketing, it seems to me that higher recall works to Facebook's advantage, because "accidentally", with clean hands, they will record more.
Saying words that sound like "Hey Facebook" in an ordinary conversation is much more likely than saying "Hey Siri" or "OK Alexa," so we can expect Facebook devices to be much more likely to forward "accidentally" recorded conversations to Menlo Park.
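The trade-off between false negatives and false positives, and why higher recall means more "accidental" recordings, can be shown with a minimal Python sketch. It assumes a simplified detector that outputs one score per audio clip and fires when the score reaches a threshold; the scores below are made up for illustration and do not come from any real system.

```python
def rates_at_threshold(pos_scores, neg_scores, threshold):
    """Return (recall, false_positive_rate) for a detector firing at score >= threshold."""
    true_pos = sum(s >= threshold for s in pos_scores)
    false_pos = sum(s >= threshold for s in neg_scores)
    return true_pos / len(pos_scores), false_pos / len(neg_scores)


# Hypothetical detector scores, for illustration only.
wake_word_scores = [0.9, 0.8, 0.4]          # clips where the wake word was said
other_speech_scores = [0.1, 0.6, 0.3, 0.2]  # ordinary conversation

for t in (0.5, 0.35, 0.25):
    recall, fpr = rates_at_threshold(wake_word_scores, other_speech_scores, t)
    print(f"threshold={t}: recall={recall:.2f}, false positive rate={fpr:.2f}")
```

Lowering the threshold from 0.5 to 0.35 catches the missed wake word (recall goes from 0.67 to 1.00), but going further down to 0.25 adds nothing on the recall side while doubling the share of ordinary speech that triggers the device (from 0.25 to 0.50). A wake word that is easy to detect shifts the scores apart; one that resembles everyday speech pushes the two groups together.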
While Facebook declared from the outset that it would not use recorded conversations for smarter targeting, it later clarified that it could use, for example, the frequency and length of conversations, and undoubtedly other metadata related to voice assistant use.
If ads become more tailored to users, it will be because of the new behavioral data we send to the giant, and only if its devices become as popular as its competitors' products.
How will Facebook fare in the voice race? Time will tell!