Affectiva Behind the Curtain: Q&A with our Tech Team

Affectiva is the pioneer of Emotion AI: we’ve coined the phrase that has started the conversation around emotion-enabling the technology that surrounds us. From the app on your phone to the tech in your car, the science fueling our SDK and APIs allows for many possibilities that developers are exploring every day.

We would never be able to accomplish what we do without our science, engineering and product teams here at Affectiva. So we figured we would give you a peek behind the curtain and answer some of the more technical questions we’ve been asked by developers who are curious about our technology and its architecture. Check out some of the answers below:

What's your web architecture? Monolithic or microservices?

Somewhere in between. It's closer to a classic service-oriented architecture, where the services are more coarse-grained than what you'd see in a microservice architecture.

What's your front end stack?

It varies by application, and we don't really have a single front-end application. We have some AngularJS, some ad-hoc JS, and some server-side rendering. Our dashboard is built with D3, jQuery, and much more.

What's your backend stack?

We have services in Rails (Puma/MRI) and some that run on the JVM, and we use relational databases, object stores, and in-memory caches. Our preference is Python for number-crunching, and we're big fans of Hadoop. We do almost everything in the cloud.

Does the technology impose a big CPU footprint in real-time applications, or is it purely server-based processing? In the latter case, does it require a huge bandwidth to transmit video, or does it work from sparser "snapshots"?

We have both on-device and cloud solutions because there's no "one size fits all" solution. Over time, we found that different applications have different processing needs - some work well with periodic single-frame snapshots, and others need continuous processing. Still others need to do the processing on-device and others are better suited to store-and-forward with processing in the cloud. Cloud and on-device solutions both let you manage resource consumption by controlling, e.g., the on-device frame rate or the upload bit rate.
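As a rough illustration of controlling the on-device frame rate, the sketch below drops frames so that at most a chosen number per second reach the emotion processing step. It is not part of the Affectiva SDK: `FrameThrottle`, `throttle_timestamps`, and the millisecond timestamps are hypothetical names and conventions chosen for this example.

```python
class FrameThrottle:
    """Let through at most `max_fps` frames per second and drop the rest.

    Hypothetical helper for illustration; not an Affectiva SDK class.
    """

    def __init__(self, max_fps: float) -> None:
        if max_fps <= 0:
            raise ValueError("max_fps must be positive")
        # Minimum spacing, in milliseconds, between two processed frames.
        self.min_interval_ms = 1000 / max_fps
        self._last_accepted_ms = None

    def accept(self, timestamp_ms: int) -> bool:
        """Return True if the frame captured at `timestamp_ms` should be processed."""
        if (
            self._last_accepted_ms is None
            or timestamp_ms - self._last_accepted_ms >= self.min_interval_ms
        ):
            self._last_accepted_ms = timestamp_ms
            return True
        return False


def throttle_timestamps(timestamps_ms, max_fps):
    """Return the subset of capture timestamps that would be processed."""
    throttle = FrameThrottle(max_fps)
    return [t for t in timestamps_ms if throttle.accept(t)]
```

For example, a camera delivering 25 frames per second (one frame every 40 ms) throttled to 5 frames per second would have only every fifth frame processed, which trades temporal resolution for lower CPU use.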

What tools/libraries do you use for the business logic of crunching numbers on face data?

Much of our post-hoc number crunching is done in Python using numeric processing libraries like NumPy and SciPy. We use a number of visualization libraries, e.g., D3, or sometimes even Tableau. The raw moment-by-moment metrics can be overwhelming for some of our SDK users and developers, so we've also built some "event detectors" on top of them; this means that you can just "listen" for a certain emotion. For databases, we use MySQL + S3, though we are starting to look into distributed query systems like Presto.
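To give a feel for what an "event detector" over moment-by-moment metrics might look like, here is a minimal sketch. It is an assumption-laden illustration, not Affectiva's implementation: the function name `detect_events`, the 0 to 100 score scale, the threshold, and the minimum run length are all hypothetical.

```python
def detect_events(scores, threshold=50.0, min_frames=3):
    """Find sustained runs in a per-frame metric.

    Returns a list of (start, end) index pairs, with `end` exclusive, for
    each run of at least `min_frames` consecutive scores at or above
    `threshold`. All parameters are illustrative assumptions.
    """
    events = []
    start = None   # index where the current run began, if any
    count = 0      # number of frames seen so far
    for i, score in enumerate(scores):
        count = i + 1
        if score >= threshold:
            if start is None:
                start = i
        else:
            # The run (if any) ended just before frame i.
            if start is not None and i - start >= min_frames:
                events.append((start, i))
            start = None
    # Close a run that lasts until the final frame.
    if start is not None and count - start >= min_frames:
        events.append((start, count))
    return events
```

A caller would then react to the returned events rather than to every frame's raw score, which is the "listen for a certain emotion" idea described above.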

What classic AI/ML techniques are you using for the magic to happen? Could you also give us an overview of the tech tools (programming languages, databases, frameworks) used in your product?

Over the years, we’ve made use of a number of ML techniques for performing classification, specifically Random Forests and Support Vector Machines (SVM). Now our primary research is focused on deep learning, with most of the research using Convolutional Neural Nets (CNN).

Our core runtime is written in C++ and makes use of OpenCV; however, most of our researchers work with a mixture of C++ and Python. Our deep learning experiments are still being conducted on a few frameworks, with Caffe and TensorFlow being the most commonly used.

What is an interesting / surprising problem you needed to solve to get the performance you have now?

There's always a tradeoff between accuracy and the cost of computation so we've implemented different approaches - some are more accurate but at a greater cost, and others trade some accuracy for greatly reduced resource consumption. Implementing algorithms that strike the right balance for different use cases is always a challenge.

What kind of compute power is necessary to run these operations? Are you self-hosting hardware or using a cloud provider?

Different operations require different resources to run. Our on-device SDK is highly optimized to deliver great performance in a resource-constrained environment while cloud applications throw more resources (including time) at calculating results. Other operations such as building training sets might require dozens to hundreds of machine-hours to compute. Still others, such as reprocessing our video corpus with new algorithms, might require hundreds to thousands of machine-hours. As you can imagine, the elasticity that the cloud provides is a boon to us - we probably couldn't do what we do if we had to buy physical machines.
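The value of elasticity is easy to see with a little arithmetic. The sketch below converts a machine-hour budget into elapsed time for a given fleet size, under the simplifying assumption that the job parallelizes perfectly across machines. The function name and the fleet sizes in the note are hypothetical, not figures from our infrastructure.

```python
def wall_clock_hours(machine_hours: float, machines: int) -> float:
    """Elapsed hours for a job, assuming it splits evenly across machines.

    Real jobs rarely parallelize perfectly, so treat this as a lower bound.
    """
    if machines < 1:
        raise ValueError("machines must be at least 1")
    return machine_hours / machines
```

Under that assumption, a 1,000 machine-hour reprocessing job would take about 100 hours on 10 machines but about 5 hours on 200 machines rented for the duration of the job.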

Since all processing of the neural network is done on-device and not in the cloud, what is the processing footprint of that for a third-party app? I'm afraid this could be a bit heavy for games, for instance, which require as much CPU/GPU time and RAM as possible while running.

The SDKs are designed to perform the emotion processing on device without needing cloud support. We test the SDK performance on a range of mobile devices available in the market. On the iPhone 5s, with its dual core 1.3GHz CPU, the SDK uses on average 50% of the CPU capacity. In contrast, on the iPad Air, which has a quad core 1.5GHz CPU, the SDK uses on average 8% of the CPU capacity. In our tests we observe similar performance on recent Android devices like the Samsung Galaxy S5 and Galaxy Note 3. Also note the "no one size fits all" answer above.

Do Affectiva's systems have the capacity to respond in real time to rapidly changing environments?

Yes, our SDKs process data on device in real time.

Have you guys faced any scaling issues, especially with MySQL?

Sure, but the key is to use the appropriate data store for the task at hand. We use a variety of data stores such as in-memory caches, object stores, and relational databases because we have different data sets with radically different access patterns.

What's your database? MongoDB/NoSQL or Postgres/MySQL or something crazy like Datomic or other?

We use a variety of data stores such as in-memory caches, object stores, and relational databases because we have different data sets with radically different access patterns. We have millions of videos, for example, and it would be difficult to manage them in MySQL (or any relational store), but object stores like S3 were designed for that purpose and do it well. We use MySQL as a back-end store for our web applications because it's great for managing data like accounts, users, permissions, etc. We use in-memory caches where speed is more important than durability, such as online session management.
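The reasoning above can be summarized as a small decision rule. This is only a sketch of the trade-offs described in the answer, not code from our systems; the `Store` enum, the `choose_store` function, and its two boolean inputs are hypothetical simplifications.

```python
from enum import Enum


class Store(Enum):
    OBJECT_STORE = "object store (e.g. S3)"
    RELATIONAL = "relational database (e.g. MySQL)"
    IN_MEMORY_CACHE = "in-memory cache"


def choose_store(is_large_blob: bool, needs_durability: bool) -> Store:
    """Pick a data store from two coarse properties of the data set."""
    if is_large_blob:
        # Videos and other large binary objects.
        return Store.OBJECT_STORE
    if needs_durability:
        # Accounts, users, permissions, and similar structured records.
        return Store.RELATIONAL
    # Data where speed matters more than durability, e.g. online sessions.
    return Store.IN_MEMORY_CACHE
```

Real access patterns have more dimensions than these two flags (query shape, consistency needs, cost), so the rule is a starting point rather than a complete policy.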

Visit our developer portal to learn more, download the SDK, or get in touch with us!
