For all the news coverage of facial recognition technology, it's currently just not that good.
“Traditionally, face recognition software has worked very well on what I would call highly controlled photos,” said Chris Boehnen, a program manager with the Intelligence Advanced Research Projects Activity. Those would include pictures where the subject is looking at the camera and where there is good lighting, like a driver's license, visa or mugshot picture.
When the lighting isn’t optimal, or the person is looking to the side or making a different facial expression, results are frequently unreliable.
Recognizing a person from the side is easy for people but very hard for computers, Boehnen said. Technology can detect minor differences if the images are very similar, but understanding the 3-D structure of the face and how it changes across different poses has “long been a huge challenge.”
However, advances in both hardware and software could be brought to bear on the complex relationships between all the curves and crevices that make up an individual’s face. So IARPA, in partnership with the National Institute of Standards and Technology, sponsored a challenge for software that takes advantage of deep neural network technologies to recognize facial commonalities in what Boehnen called “unconstrained photos” -- those where the subjects were unaware of the camera or did not pose for the picture.
For its facial recognition challenge, IARPA wanted to see improvements in three specific areas: verification accuracy, identification accuracy and speed.
Verification accuracy requires the algorithm to match two faces of the same person while correctly rejecting faces of different persons, the way the iPhone's face unlock feature works. Identification accuracy measures the software's ability to do a one-to-many search in which a face in a given photo can be matched with one from a gallery of pictures.
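The two tasks can be sketched with toy face embeddings -- the numeric feature vectors such systems compare. The vectors, the cosine-similarity metric and the 0.8 threshold below are illustrative assumptions, not details from IARPA's benchmark:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two face embeddings (feature vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def verify(probe, claimed, threshold=0.8):
    """1:1 verification: are these two embeddings the same person?"""
    return cosine_similarity(probe, claimed) >= threshold

def identify(probe, gallery, threshold=0.8):
    """1:N identification: return the best-scoring gallery match, or None."""
    best_name, best_score = None, threshold
    for name, embedding in gallery.items():
        score = cosine_similarity(probe, embedding)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name
```

Verification makes one comparison and answers yes or no; identification must compare the probe against every gallery entry, which is why gallery size matters for the speed goal discussed below.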
IARPA also wanted to see improvements in processing speed because facial recognition applications rely on large galleries of images to work. With traditional technology, the time to make a match increases as more photos are added to a gallery. The goal is constant performance, making the search time independent of gallery size.
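The article doesn't say how vendors pursue that constant-time goal, but one common family of techniques is approximate nearest-neighbor indexing. The sketch below uses random-hyperplane locality-sensitive hashing as a hypothetical illustration: embeddings are hashed into buckets, and at query time only the probe's bucket is scanned, so the work done tracks bucket size rather than the whole gallery:

```python
import random

def random_hyperplanes(dim, n_planes, seed=0):
    """Fixed random hyperplanes used to hash embeddings into buckets."""
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_planes)]

def lsh_key(embedding, planes):
    """Bucket key: which side of each hyperplane the embedding falls on."""
    return tuple(int(sum(p * x for p, x in zip(plane, embedding)) > 0)
                 for plane in planes)

class GalleryIndex:
    def __init__(self, dim, n_planes=8):
        self.planes = random_hyperplanes(dim, n_planes)
        self.buckets = {}

    def add(self, name, embedding):
        key = lsh_key(embedding, self.planes)
        self.buckets.setdefault(key, []).append((name, embedding))

    def candidates(self, probe):
        """Only the probe's bucket is scanned, not the full gallery."""
        return self.buckets.get(lsh_key(probe, self.planes), [])
```

This is a toy: real systems layer multiple hash tables and re-rank candidates with exact comparisons, but the principle -- prune before comparing -- is how search time is decoupled from gallery size.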
NtechLab, a Russian company, had the best results for both identification speed and verification accuracy. Yitu, a Chinese company, had the highest-performing software for identification accuracy.
NtechLab's CEO, Mikhail Ivanov, told GCN in an email that his company's software is currently used by law enforcement in Russia to identify criminals in CCTV footage.
“We have created a specific neural network architecture and compact and informative feature vector,” Ivanov said. “We also have training algorithms for specific ethnic, gender and age groups.”
Yitu’s technology is also used in public safety along with applications in financial services, healthcare, customs and integrated marketing, according to a release announcing its win.
IARPA is not just evaluating facial recognition technology, however. It's also developing its own.
The Janus program has provided funding to three prime contractors – the University of Maryland, Systems & Technology Research and the University of Southern California – that are also researching problems related to facial recognition within unconstrained galleries. The teams have been using Creative Commons photos of public figures pulled from the internet for the research, and also have been looking at video.
The four-year program ends next September, Boehnen said, but it already has produced multiple peer-reviewed publications, and some datasets and models have been opened for use by other researchers.
The goals of the program have changed over the last three years, he told GCN. The original goal for the end of phase three was a false positive rate of .01 percent. The new goal is to drive that false positive rate down to .00001 percent. That improvement is due in large part to breakthrough research published in 2012 that introduced the idea of using graphics processing units to create neural networks that have 50 or 100 layers rather than one or two.
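To see why that thousandfold change matters, assume -- illustratively -- that false positives occur independently on each comparison, and take a hypothetical gallery of one million photos. The expected number of false matches per search differs dramatically under the two goals:

```python
def expected_false_matches(fpr_percent, gallery_size):
    """Expected false matches per probe search, assuming independent comparisons."""
    return (fpr_percent / 100.0) * gallery_size

# Original Janus phase-three goal vs. the revised goal,
# against a hypothetical one-million-photo gallery.
old = expected_false_matches(0.01, 1_000_000)     # roughly 100 false matches per search
new = expected_false_matches(0.00001, 1_000_000)  # roughly one false match per ten searches
```

Under the original goal an analyst would wade through scores of wrong candidates on every search of a large gallery; under the new goal most searches would return no false matches at all.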
“Thanks to deep learning and GPUs, really this class of technology is chugging along at breakneck speeds,” Boehnen said.