Q&A with Affectiva: Answers to Your Most Emotional Questions

Professor Rosalind Picard and I spun Affectiva out of the MIT Media Lab in 2009 with the vision of bringing emotional intelligence to the digital world. Our interactions with technology are becoming more conversational and relational, and we believe that the next frontier of AI is artificial emotional intelligence - what we have begun to call Emotion AI. We use computer vision and machine learning to build SDKs and an API that can read your emotions from the face (and speech, coming soon!).

I recently did an AMA Slack Chat with Udacity, and ahead of my speaking opportunity at the Udacity Intersect 2017 Conference next week, I thought I would share some of the great questions I received as well as my responses. You can also reference this earlier post on emotion AI that tackles the more technical questions we received around our technology.

Have you given any thought to working with autistic subjects, who might benefit from either better recognition of their own emotional state or some form of explicit training in the area?

Yes! We got our start in autism. At MIT we developed a Google Glass-like device that helped individuals on the autism spectrum read the social and emotional cues of the people they are interacting with. We deployed that in a school in Providence, RI - the device helped kids make more eye contact and elicit and understand more facial expressions of emotion. Today, we are partnered with a company called BrainPower that uses Google Glass and our technology to commercialize this application.


See what our partners at BrainPower are doing with autism and emotion tech.

Are you at all concerned about the extent to which patent wars could frustrate progress in AI research or prove an obstacle to democratizing access to the technology?

At Affectiva, we have a strong patent portfolio but we also make a point of publishing our research. In my experience, the algorithm (which is what a patent protects) is only one piece of the puzzle. Combining it with a large amount of meaningful data is critical, as is doing the hard work necessary to take it from an algorithm to a product that provides value for a customer.

Is sound important for your analysis, or is it entirely vision-based?

People express emotions in a number of ways - face, voice, gestures, posture, and even physiology. Affectiva's current suite of products in the market is face-based, but we know that multi-modal is key and are working on adding speech signals, such as tone of voice.

How did you get around poorly visible faces, darker faces, or faces with glasses or facial hair, to still detect facial expressions?

The short answer is we have TONS of data that provides examples of that - in fact, most of our data comes from people spontaneously expressing emotions in their homes. Low light is very common, and we have many examples with glasses and different types of facial hair. The same goes for faces with darker skin tones.

When we train our algorithms, we make sure that the data is balanced in terms of ethnic and racial representation. This is actually quite critical to guard against any biases that the system may learn. This way it is equally accurate on a British AND a Chinese face.

To date, we have analyzed 5 million face videos (which if you do the math translates to over 10 billion facial frames). This data has been gathered in 75 countries - this is the largest emotion data repository in the world, which helps with machine learning.

How would emotionally intelligent machines adapt to different cultures and special needs?

Data again - while facial expressions are largely universal, there are cultural norms (called Display Rules) that dictate when people choose to portray, mask, or amplify an emotion - so at Affectiva we have built a norms database by country. China, for example, has a different norm than Brazil.
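To make the idea of a per-country norm concrete, here is a minimal sketch of how a raw expression score could be read relative to a country baseline. It is an illustration only, not a description of Affectiva's norms database: the function names, the z-score approach, the placeholder countries "A" and "B", and all numbers are assumptions made up for the example.

```python
from statistics import mean, pstdev

def build_norms(samples):
    """Build {country: (mean, std)} from (country, score) pairs.

    Hypothetical helper: a real norms database would hold far more data.
    """
    by_country = {}
    for country, score in samples:
        by_country.setdefault(country, []).append(score)
    return {c: (mean(v), pstdev(v)) for c, v in by_country.items()}

def relative_to_norm(score, country, norms):
    """Express a raw score as standard deviations from the country's mean."""
    mu, sigma = norms[country]
    return (score - mu) / sigma if sigma > 0 else 0.0

# Made-up scores for two placeholder countries, "A" and "B".
samples = [("A", 0.2), ("A", 0.4), ("B", 0.6), ("B", 0.8)]
norms = build_norms(samples)

# The same raw score of 0.5 sits above the norm in A and below it in B.
print(relative_to_norm(0.5, "A", norms))
print(relative_to_norm(0.5, "B", norms))
```

With these made-up numbers, a raw score of 0.5 lands roughly two standard deviations above the baseline for "A" and two below it for "B", which is the sense in which the same expression can mean different things against different norms.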

What's the next step after emotion recognition?

Emotion sensing and recognition is one side of the coin; the other is how you ADAPT an interface based on that emotion. The interface could be embodied, like a social robot, a self-driving car, or Alexa; non-embodied, like Siri; or an avatar. The idea is that these interfaces take the emotion data and use it to adapt their behavior, becoming more engaging, building more rapport, and possibly being more persuasive too!


With this amazing technology within our grasp, do you have any security concerns?

Yes! We recognize that people's emotions are very personal - so we take privacy very seriously. Everything we do is opt-in. We ask for people's permission to turn the camera on. Our technology also does NOT save any video or send it to the cloud - all the processing happens on device. We feel this is very important from a privacy standpoint. Generally speaking, as a team we focus on applications where people get value from sharing their emotion data. And there are a lot of those - such as social robotics, automotive, health, and education.

How does one store an "emotion" in a MySQL table?

We have 7 emotion detectors and 20 facial expression "classifiers", as well as classifiers for gender, age, and ethnicity. Each of those classifiers provides a probability score, and that is what we store. Some of our algorithms are temporal, in that they consider previous probability scores to estimate the current emotion, so they are stateful in that sense.
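As a rough illustration of what storing probability scores could look like, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs anywhere, and the table name, column names, and classifier names are all hypothetical. The exponential moving average is only a simple example of a stateful, temporal estimate, not the actual algorithm.

```python
import sqlite3

# In-memory SQLite database as a stand-in for a MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE emotion_scores (
        video_id   TEXT    NOT NULL,
        frame_idx  INTEGER NOT NULL,
        classifier TEXT    NOT NULL,  -- e.g. 'joy', 'smile' (illustrative names)
        score      REAL    NOT NULL CHECK (score BETWEEN 0.0 AND 1.0),
        PRIMARY KEY (video_id, frame_idx, classifier)
    )
""")

def store_scores(video_id, frame_idx, scores):
    """Insert one probability score per classifier for a single frame."""
    rows = [(video_id, frame_idx, name, p) for name, p in scores.items()]
    conn.executemany("INSERT INTO emotion_scores VALUES (?, ?, ?, ?)", rows)

def smoothed_scores(video_id, classifier, alpha=0.5):
    """Exponential moving average over frames, in frame order.

    Each output depends on the previous one, which is what makes it stateful.
    """
    cursor = conn.execute(
        "SELECT score FROM emotion_scores "
        "WHERE video_id = ? AND classifier = ? ORDER BY frame_idx",
        (video_id, classifier),
    )
    state = None
    smoothed = []
    for (p,) in cursor:
        state = p if state is None else alpha * p + (1 - alpha) * state
        smoothed.append(state)
    return smoothed

# Made-up scores for three consecutive frames of one video.
store_scores("v1", 0, {"joy": 0.0, "smile": 0.2})
store_scores("v1", 1, {"joy": 1.0, "smile": 0.4})
store_scores("v1", 2, {"joy": 1.0, "smile": 0.6})

print(smoothed_scores("v1", "joy"))  # [0.0, 0.5, 0.75]
```

One row per frame and classifier keeps the table narrow and makes it easy to add a classifier later without changing the schema. A wide table with one column per classifier would be an equally reasonable choice.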

If you are interested in a more technical explanation of our technology, check out this blog post our tech team did on Affectiva's emotion technology architecture.

Ever run into the situation where your recognizers are just plain wrong (or slightly wrong) after already having been trained with copious amounts of data?

Yes! Like any machine learning system, it is not 100% accurate. So the system sometimes misses an expression of emotion, or misidentifies it. When we are able to get labels for those examples, we feed them back into our models and retrain.

I should add that we have a team of labelers, and we use active learning (so a human-machine combo) to home in on the data that we know would help improve the accuracy.
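For readers unfamiliar with active learning, one common flavor is uncertainty sampling: send human labelers the examples the model is least sure about. The sketch below shows that general technique under stated assumptions. The frame ids, scores, and function names are invented for illustration, and the post does not say which selection strategy is actually used.

```python
def uncertainty(p):
    """Uncertainty of a binary probability: highest at p = 0.5, zero at 0 or 1."""
    return 1.0 - abs(2.0 * p - 1.0)

def pick_for_labeling(scored_frames, budget):
    """Return the ids of the `budget` frames the model is least sure about."""
    ranked = sorted(
        scored_frames, key=lambda item: uncertainty(item[1]), reverse=True
    )
    return [frame_id for frame_id, _ in ranked[:budget]]

# Made-up (frame id, classifier probability) pairs.
frames = [("a", 0.98), ("b", 0.52), ("c", 0.10), ("d", 0.40), ("e", 0.75)]

# Frames b and d are closest to 0.5, so they go to the human labelers first.
print(pick_for_labeling(frames, budget=2))  # ['b', 'd']
```

The newly labeled frames would then be added to the training set before retraining, which closes the human-machine loop.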

Have you thought of implementing this software in a car with a driver facing camera to detect if the driver is drunk before they even start the vehicle?

Emotion recognition for automotive is a HUGE application area - not just drunk driving, but also distracted driving, drowsiness, and anger. And that's not even thinking about self-driving cars (anxiety, etc.). We believe that Emotion AI will play a key role in helping transform the automotive experience. The automotive manufacturers and their suppliers are keenly aware of this as well. At Affectiva we have seen a notable increase in interest from the automotive industry.


You can read more here about Emotion AI for cars, including where we see emotion tech shaping the automotive industry. We also did this guest blog on AI for cars for VentureBeat that has some interesting information around emotion use cases for automotive as well.

Given that cultures evolve, how do you manage the updates to the contextual filters?

For cultures, we are continuously re-training our classifiers using more data. Also - you are right that it is not just cultural but also contextual. We started with data that was primarily of people watching online videos; now our database includes people playing online video games, interacting with other people (video conversation), as well as driving. The broader the contexts, the more accurate the results.

We are currently experimenting with different deep learning (DL) models - some end to end, some not. Eventually we would like to explore model compression to get the DL models onto the device.

Has your software uncovered any new insights into how humans express emotion that you were surprised to see?

Yes! I'll share two:

First, we initially weren't sure if people "emoted" when they were alone with a device (phone or laptop) - we were surprised by the number of emotional expressions that people made even when there were no other people around - I guess we're social animals after all.

Another example: we discovered that there are many flavors of a smile. Some smiles - like a punch smile - are very fast and huge; others, like a social smile, are subtle (we see a lot of that in Asia, btw). From mining half a million smiles, we uncovered at least 5 smile signatures, so that was cool!

How important do you think having an advanced degree was to your success? Would you encourage others to seek an advanced degree or go into practice ASAP?

That's a good question! I think it depends on the exact circumstances, of course - but in my own personal story, doing a PhD allowed me to dive deep into a topic I was curious about. My research question was "Can I build an emotionally intelligent machine?" and that was in the year 2000, over 17 years ago - and I am still trying to answer that ;)

I like to say pick something you're super passionate about, and become the world expert in it, and then persist (because you will run into a lot of naysayers who will think you're nuts and what you're doing is not possible). So you just have to persevere!

For further information, you can learn more about our work in this TED Talk I did.
