Confidence Values in the Teenie Harris Archive

January 19th, 2017 by Zaria Howard

As I search through the photographs of Teenie Harris, two questions inevitably come up about every result I collect. The first is, "How confident are you that this is the right statistic?" and the second is, "How much error is present when using these statistics for prediction?" The answer to both questions lies in error estimation. Because the Teenie Harris Archive is so large, it is nearly impossible to check the accuracy of every single feature of a face or photo by hand, so it is important to know how much error is in our results.
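Since not every face can be checked by hand, one standard way to put a number on the detector's error (not necessarily the method used in this project) is to hand-check a random sample of detections and report an interval for the true fraction of real faces. The sketch below uses the Wilson score interval. The function name and the sample counts in the usage note are hypothetical, not archive results.

```python
import math


def wilson_interval(correct, checked, z=1.96):
    """Wilson score interval for the fraction of detections that are real faces.

    correct: number of hand-checked detections that really were faces
    checked: size of the random sample that was hand-checked
    z: normal quantile (1.96 gives roughly a 95% interval)
    """
    if checked <= 0:
        raise ValueError("checked must be positive")
    p = correct / checked
    z2 = z * z
    denom = 1 + z2 / checked
    center = (p + z2 / (2 * checked)) / denom
    half = z * math.sqrt(p * (1 - p) / checked + z2 / (4 * checked * checked)) / denom
    # Clamp to [0, 1] to guard against tiny floating-point overshoot.
    return max(0.0, center - half), min(1.0, center + half)
```

For example, if 90 of 100 randomly sampled detections turned out to be real faces, `wilson_interval(90, 100)` gives roughly 0.83 to 0.94, so the true precision is plausibly anywhere in that range. The Wilson interval is used here instead of the simpler p ± z * sqrt(p(1 - p)/n) because it stays inside [0, 1] and behaves better when nearly every sampled detection is correct.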

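To be more specific, in this post I'm looking at the confidence and accuracy of the face detector provided by the dlib library. For each face it detects, the detector reports a confidence score, and in this archive the scores range from roughly 0 to 1.2. A higher score means the detector is more certain that the detected pixels actually contain a human face, although the score is a raw detector output rather than a calibrated probability.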
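A minimal sketch of how such scores can be collected, assuming dlib's Python bindings and its HOG-based `get_frontal_face_detector()` (the post doesn't say which dlib detector was used). The detector's `run()` method returns a score alongside each detected rectangle. Treating the rectangle's width in pixels as the "face size", the `upsample=1` default, and the helper names `detect_faces` and `face_records` are all assumptions for illustration.

```python
def face_records(dets, scores):
    """Pair each detection's width in pixels (assumed to be the 'face size')
    with its detector score."""
    return [(d.width(), float(s)) for d, s in zip(dets, scores)]


def detect_faces(image_path, upsample=1, detector=None):
    """Run dlib's HOG frontal-face detector on one image and return
    a list of (size, score) pairs."""
    import dlib  # imported here so face_records works without dlib installed

    if detector is None:
        detector = dlib.get_frontal_face_detector()
    image = dlib.load_rgb_image(image_path)
    # run() returns the detected rectangles, one score per rectangle,
    # and the index of the sub-detector that fired for each one.
    dets, scores, _ = detector.run(image, upsample, 0.0)
    return face_records(dets, scores)
```

Passing the detector in lets a batch over the whole archive build it once rather than once per photo. The third argument to `run()` is dlib's `adjust_threshold`; lowering it below 0.0 makes the detector also return weaker detections.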

Intuitively, the factor that should most affect the confidence for a face is its size. If a face is small and far away, the detector should have more difficulty than if the face is large and close to the camera. The histogram below gives an idea of the range of face sizes in the photos. Note that the face sizes returned by the library are fixed, discrete values.

The majority of faces in the photos are small; this is most likely an artifact of the fact that most of Teenie's photos are group pictures and crowd shots. Conversely, the faces in his portrait shots are larger. For reference, the photo on the left has a face size of 630, and the faces in the picture on the right all have sizes between 30 and 40.
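Because the sizes come back as a fixed set of discrete values, a histogram of face sizes can be built by counting each size exactly rather than binning. A minimal sketch, assuming each detection has already been reduced to a hypothetical `(size, score)` pair:

```python
from collections import Counter


def size_histogram(records):
    """Count how many detected faces have each (discrete) size.

    records: iterable of (size, score) pairs.
    Returns a list of (size, count) pairs sorted by size.
    """
    counts = Counter(size for size, _ in records)
    return sorted(counts.items())
```

For example, `size_histogram([(36, 0.4), (36, 0.9), (630, 1.1)])` returns `[(36, 2), (630, 1)]`; the input values are made up for illustration.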

Faces that are bigger in the photo almost always have a higher confidence associated with them. Smaller faces can have confidences anywhere from 0 to 1.2, so the variance is greater for small faces.
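One way to check that small faces have more spread-out confidences is to compare the mean and variance of the scores on either side of a size cutoff. In this sketch the cutoff `small_max=100` and the function name are hypothetical; the post doesn't define where "small" ends.

```python
from statistics import mean, pvariance


def confidence_spread(records, small_max=100):
    """Compare confidence mean and variance for small vs. large faces.

    records: iterable of (size, score) pairs.
    small_max: hypothetical cutoff; faces with size <= small_max count as small.
    Returns {"small": (mean, variance), "large": (mean, variance)},
    with None for a group that has no faces.
    """
    groups = {"small": [], "large": []}
    for size, score in records:
        groups["small" if size <= small_max else "large"].append(score)
    return {name: (mean(s), pvariance(s)) if s else None for name, s in groups.items()}
```

A larger variance in the `"small"` entry than in the `"large"` entry is the pattern described above; a single cutoff is the simplest split, and grouping by each discrete size would give a finer picture.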

As you can see in the chart above, it's rare for a large face to have a low confidence. Even detections with low confidences, such as 0.2, are still faces. Below are examples of the highest-confidence faces (> 1.2) on the left and the lowest-confidence faces (< 0.02) on the right:

The most frequent confidence level was 1.05, which means most of the photos have higher clarity, like the example above on the left. From the histogram of confidences (the purple chart) and from visually inspecting the photos, we can be confident that the faces found and analyzed by the detector aren't false positives that would have to be filtered out by hand. This will come in handy when looking at how faces affect the rest of the composition.
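The most frequent confidence level and the two example groups can both be pulled from the same `(size, score)` pairs. This sketch bins scores into 0.05-wide bins (a hypothetical bin width; the post doesn't state the one used for the purple chart) and reuses the 1.2 and 0.02 cutoffs used for the example faces. The function names are illustrative.

```python
from collections import Counter


def confidence_mode(scores, bins_per_unit=20):
    """Return the center of the most frequent confidence bin.

    bins_per_unit=20 gives bins 0.05 wide, each centered on a multiple of 0.05.
    """
    if not scores:
        raise ValueError("need at least one score")
    counts = Counter(round(s * bins_per_unit) for s in scores)
    best_bin, _ = counts.most_common(1)[0]
    return best_bin / bins_per_unit


def extremes(records, high=1.2, low=0.02):
    """Split (size, score) records into very confident and barely confident faces."""
    top = [r for r in records if r[1] > high]
    bottom = [r for r in records if r[1] < low]
    return top, bottom
```

For example, `confidence_mode([1.04, 1.05, 1.06, 0.3])` returns `1.05`, since the first three scores all fall in the bin centered on 1.05. The inputs here are toy values, not archive data.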