In a recent project, we identified which of roughly 700,000 property parcels in Adelaide, Australia, contain swimming pools. We combined crowdsourcing with supervised machine learning to harness both the human ability to recognize objects in imagery and the speed of machines, which, once sufficiently trained, perform this task much faster than humans. Our initial approach was to train a random forest classifier on a set of crowdsourced labels, then use its classifications to show the crowd only the parcels likely to contain swimming pools. Since only a small percentage of the parcels actually contain pools, this is far more efficient than a pure crowdsourcing campaign.
Deep Learning with Convolutional Neural Networks
At first glance, identifying a pool in a high-resolution satellite image might appear to be a simple task for a human and a machine alike. A human can easily detect bright blue water; similarly, a machine can easily pick up the spectral signature of pool water. Yet our initial approach revealed the problem to be much more complicated! Pools vary in shape, color, size and location within the property; many are covered with tarps and tents or are completely empty. It turned out that both humans and machines can have a hard time detecting pools! Could we come up with more sophisticated machine learning algorithms to help us identify pools at scale?
Each of the above images contains a pool inside the pink parcel polygon. Note the variability of color, location, size and visibility.
Enter Deep Learning and Convolutional Neural Networks (CNNs). CNNs are a promising approach to object detection due to their inherent flexibility and large number of configurable parameters. These qualities allow CNNs to learn common abstract properties of pools independently of their location within the property bounds. These are properties that other machine learning algorithms, and even the human eye, may overlook.
We used the results of our previous approach to train and test a 16-layer CNN whose architecture is based on VGGNet, which placed first in the localization task and second in the classification task of the 2014 ImageNet challenge [1]. We baptized our pool detector PoolNet.
VGGNet architecture. The input layer (blue) consists of the pixels within the original property polygon. Each green layer represents a convolution operation on the previous layer, producing a set of features that are fed to the subsequent layer. Max-pooling (MP) layers downsample the features in order to reduce the total number of parameters of the model. The two yellow nodes at the end contain the probabilities that the polygon belongs to the class 'pool' vs. 'no pool'.
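To make the downsampling step concrete, here is a minimal pure-Python sketch of max-pooling over non-overlapping 2x2 blocks. The 2x2 window with stride 2 is an assumption about the configuration (it is the setting commonly used in VGG-style networks), and `max_pool_2x2` and the toy feature map are made up for illustration; this is not the PoolNet code.

```python
def max_pool_2x2(feature_map):
    """Downsample a 2-D feature map by keeping the max of each 2x2 block.

    Assumes the map has an even number of rows and columns.
    """
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows, 2):
        pooled_row = []
        for c in range(0, cols, 2):
            block = (
                feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1],
            )
            pooled_row.append(max(block))
        pooled.append(pooled_row)
    return pooled


# Toy 4x4 feature map (made-up values, not taken from PoolNet).
toy_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 8],
]
pooled_map = max_pool_2x2(toy_map)
print(pooled_map)  # [[4, 2], [2, 8]]
```

Each pooling step halves the height and width of the feature map, so the layers that follow have far fewer values to process.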
Given our prior knowledge that only a small percentage of the polygons contain pools (~6%), selecting training data from the original distribution would cause the net to be biased in favor of ‘no pool’ polygons. To avoid this phenomenon, we train the net in two phases.
Two-phase training schema for PoolNet. The model is first trained on a dataset containing equal amounts of 'pool' and 'no pool' polygons, followed by phase two, during which only the output layers are re-trained on data with the original class distribution.
In the first training phase, we construct a training dataset of 10,000 polygons in which both classes are equally represented, and train the net until the validation loss stops decreasing. In the second phase, we retrain only the output layer (keeping all other layers fixed) on 5,000 polygons drawn from the original distribution. This gives us the combined benefit of having the network learn the features that define a pool in phase 1 and, in phase 2, calibrating how readily it produces a ‘pool’ classification, because the output layer is retrained on the natural frequencies of the two classes.
Using this two-phase procedure, the model took about two hours to train on a total of 15,000 polygons.
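The selection of the two training sets described above can be sketched as follows. This covers only how the polygons might be drawn, not the network training itself. The function names, the synthetic labels and the 100,000-polygon population are hypothetical; only the set sizes and the ~6% pool rate come from the text.

```python
import random


def balanced_sample(labels, n_total, rng):
    """Pick n_total polygon ids, half 'pool' and half 'no pool' (phase 1)."""
    pool_ids = [pid for pid, has_pool in labels.items() if has_pool]
    no_pool_ids = [pid for pid, has_pool in labels.items() if not has_pool]
    half = n_total // 2
    return rng.sample(pool_ids, half) + rng.sample(no_pool_ids, half)


def natural_sample(labels, n_total, rng):
    """Pick n_total polygon ids uniformly at random, so the class mix
    follows the original distribution (phase 2)."""
    return rng.sample(list(labels), n_total)


# Synthetic labels: 100,000 hypothetical polygons, about 6% with pools.
rng = random.Random(0)
labels = {pid: rng.random() < 0.06 for pid in range(100_000)}

phase1_ids = balanced_sample(labels, 10_000, rng)
phase2_ids = natural_sample(labels, 5_000, rng)

# Fraction of 'pool' polygons in each training set.
phase1_pool_share = sum(labels[p] for p in phase1_ids) / len(phase1_ids)
phase2_pool_share = sum(labels[p] for p in phase2_ids) / len(phase2_ids)
print(f"phase 1 pool share: {phase1_pool_share:.2f}")
print(f"phase 2 pool share: {phase2_pool_share:.2f}")
```

For simplicity, the sketch does not keep the two samples disjoint; a real pipeline would likely also hold out separate validation and test polygons.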
We tested PoolNet on 5000 polygons and confirmed that the fully trained model classifies over 3750 polygons per minute. We visually inspected the results to gain insight into the strengths and weaknesses of the model. A few causes of misclassification became apparent: properties were often falsely classified as ‘no pool’ when the pool was partially covered by trees or a tarp, was empty or small, or was located at the property boundary.
Samples of pools that the net missed. Notice that many are difficult to see, covered by trees, unusually dark or at the edge of the property.
The model was also confused by bright blue objects such as tarps or backyard trampolines, falsely classifying the corresponding properties as ‘pool’.
Polygons falsely classified as having pools. Notice that most contain a bright blue region resembling a pool.
Test Data Errors
Our visual inspection revealed that the test data included a number of mislabeled polygons, from both classes, which the model was actually classifying correctly. This noise is present because the labels produced by our previous approach are imperfect! Some examples are shown below.
Test data labeled as 'no pool' but classified correctly by PoolNet as 'pool'.
Test data labeled as 'pool' but classified correctly by PoolNet as 'no pool'.
Because the noise in the test data prevents an accurate assessment of PoolNet's performance, we created a small ground-truth data set by manually classifying 1650 polygons: 150 in the ‘pool’ class and 1500 in the ‘no pool’ class. Where the answer was not obvious from our own imagery, we used other data sources to confirm whether a pool was present on the property. The results of our sampling are summarized in the table below. On this ground-truth set, our trained model achieved an indicative recall of 93% and precision of 88%.
PoolNet accuracy metrics.
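For readers unfamiliar with these metrics, the sketch below shows how precision and recall follow from a confusion matrix. The counts are hypothetical: they were chosen only so that the rounded results match the reported 88% and 93% on a set with 150 'pool' polygons, and they are not the actual values from our evaluation.

```python
def precision_recall(true_pos, false_pos, false_neg):
    """Return (precision, recall) for the 'pool' class."""
    precision = true_pos / (true_pos + false_pos)  # share of predicted pools that are real
    recall = true_pos / (true_pos + false_neg)     # share of real pools that were found
    return precision, recall


# Hypothetical counts, for illustration only (not the real confusion matrix).
true_pos = 140   # pools correctly detected
false_neg = 10   # pools missed (140 + 10 = 150 'pool' polygons)
false_pos = 19   # 'no pool' polygons wrongly flagged as 'pool'

precision, recall = precision_recall(true_pos, false_pos, false_neg)
print(f"precision = {precision:.0%}, recall = {recall:.0%}")
```

Precision answers "when PoolNet says 'pool', how often is it right?", while recall answers "of all the real pools, how many did PoolNet find?".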
Our goal is to classify properties in the whole Australian continent!
We want to find pools everywhere in Australia! Given the vast area that must be inspected, an accurate and scalable classification solution is necessary.
The initial results of PoolNet are very promising. Recall and precision are satisfactory at moderate computational cost, as PoolNet can classify more than 60 polygons/sec. One potential improvement is to find examples of underrepresented pool types (e.g., empty or covered pools) and construct an artificial training data set containing a higher percentage of them, so that PoolNet learns to identify these pool types.
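To put the throughput figure quoted above in perspective, here is a back-of-the-envelope estimate of how long classification alone would take for a city the size of Adelaide. It assumes the reported rate holds throughout on the same hardware and ignores imagery retrieval and any other overhead, so treat it as a rough lower bound rather than a measured result.

```python
polygons_per_minute = 3750   # throughput reported for the trained model
adelaide_parcels = 700_000   # approximate parcel count quoted for Adelaide

polygons_per_second = polygons_per_minute / 60
minutes_for_adelaide = adelaide_parcels / polygons_per_minute
hours_for_adelaide = minutes_for_adelaide / 60

print(f"{polygons_per_second:.1f} polygons/sec")
print(f"{hours_for_adelaide:.1f} hours to classify all Adelaide parcels")
```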
Our plan is to deploy PoolNet at a continental scale by integrating it with the GBDX platform. You can also play with the PoolNet code here; email GBDX-Support@digitalglobe.com to get GBDX credentials.
Stay tuned for more results!
1. K. Simonyan and A. Zisserman, “Very Deep Convolutional Networks for Large-Scale Image Recognition”, arXiv:1409.1556, 2014.