In Computer Vision (or Image Processing), a common task is to compute a segmentation mask. A segmentation mask is a binary image (every pixel is either white or black) that marks a potential object of interest with one of those two pixel values.
The methods and criteria used to compute the segmentation mask depend on the application. If the object of interest has a particular color, that color can be described as a range of values in a specific color space (one where it remains reasonably constant), and that range can be used to get a segmentation mask. One common application of this idea is HSV-based segmentation, which is somewhat robust to illumination changes.
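As a rough illustration of the color-range idea, here is a minimal plain C++ sketch (no OpenCV) that converts an RGB pixel to HSV and checks whether it falls inside a "red" range. The `Hsv` struct, the `rgbToHsv` and `isReddish` helpers, and the specific thresholds are all assumptions made up for this example; real ranges have to be tuned per image. Hue is expressed here in degrees (0 to 360), whereas OpenCV’s 8-bit HSV images store hue halved (0 to 179).

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// h in degrees [0, 360), s and v in [0, 1].
struct Hsv { float h; float s; float v; };

// Standard RGB -> HSV conversion for one 8-bit pixel.
Hsv rgbToHsv(std::uint8_t r8, std::uint8_t g8, std::uint8_t b8) {
    const float r = r8 / 255.0f, g = g8 / 255.0f, b = b8 / 255.0f;
    const float maxC = std::max({r, g, b});
    const float minC = std::min({r, g, b});
    const float delta = maxC - minC;

    float h = 0.0f;  // hue is undefined for grays; use 0 by convention
    if (delta > 0.0f) {
        if (maxC == r)      h = 60.0f * std::fmod((g - b) / delta, 6.0f);
        else if (maxC == g) h = 60.0f * ((b - r) / delta + 2.0f);
        else                h = 60.0f * ((r - g) / delta + 4.0f);
        if (h < 0.0f) h += 360.0f;
    }
    const float s = (maxC > 0.0f) ? delta / maxC : 0.0f;
    return Hsv{h, s, maxC};
}

// Hypothetical "red" range: hue near 0 degrees (it wraps around 360),
// reasonably saturated, and not too dark. Thresholds are illustrative.
bool isReddish(std::uint8_t r, std::uint8_t g, std::uint8_t b) {
    const Hsv p = rgbToHsv(r, g, b);
    const bool hueIsRed = (p.h <= 15.0f) || (p.h >= 345.0f);
    return hueIsRed && p.s >= 0.5f && p.v >= 0.2f;
}
```

Because the test looks at hue and saturation rather than raw channel values, a bright red and a darker red both pass, while a gray pixel (zero saturation) does not. That is the sense in which HSV ranges tolerate some change in illumination.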
In this post we will explore another, less common, method to compute a segmentation mask. This approach uses K-Means to group similar colors into a fixed number of clusters. The output image can then be thresholded to get a final binary image of the object of interest. Here, I’m setting K-Means to find 2 color clusters: the background color and the foreground color.
This is the input image:
Our goal is to create a segmentation mask containing only the spoon. The image has a noisy (high-frequency) background, so it can be blurred a bit to get a smoother gradient and improve segmentation. Let’s apply a Gaussian Blur with a small 3 x 3 kernel. Check out the difference between the input and the smoothed image:
//read the input image:
std::string imageName = "C://opencvImages/segmentationTest.png";
cv::Mat testImage = cv::imread( imageName );
//apply Gaussian Blur to smooth out the input:
cv::GaussianBlur( testImage, testImage, cv::Size(3,3), 0, 0 );
Now, let’s pass this image to K-means. imageQuantization is a function that implements segmentation based on K-means (more about this function in a little bit). As I mentioned, it can group colors of similar value into clusters. That’s very handy! Let’s cluster the colors in 2 groups: foreground object and background.
//total number of clusters in which the input will be segmented:
int segmentationClusters = 2;
//k-means attempts (number of runs with different initializations):
int iterations = 5;
//get the segmented image:
cv::Mat segmentedImage = imageQuantization( testImage, segmentationClusters, iterations );
This is the result:
That’s very nice. Check out how the color gradients on the original image are clustered in one group. Different shades of red, for example, are grouped together in a “red super cluster” that uses just one, very specific, red color value.
It will be difficult, however, to get a 100% accurate segmented image. In our example, there’s a little gap/hole in the spoon because the background color appears to be reflected in that region. We can further improve the result by applying a very basic pipeline of morphological operations.
Let’s convert the segmented image to grayscale, apply Otsu’s thresholding and then perform a morphological closing. It is common practice to render all the pixels belonging to the object of interest in white. This is accomplished by inverting the output of Otsu’s thresholding.
//compute grayscale image of the segmented output:
cv::Mat grayImage;
cv::cvtColor( segmentedImage, grayImage, cv::COLOR_BGR2GRAY ); //cv::imread loads pixels in BGR order
//get binary image via Otsu:
cv::Mat binImage;
cv::threshold( grayImage, binImage, 0, 255, cv::THRESH_BINARY | cv::THRESH_OTSU );
//invert the image:
binImage = 255 - binImage;
//perform a morphological closing to close up holes in the target blob:
cv::Mat SE = cv::getStructuringElement( cv::MORPH_RECT, cv::Size(3, 3) );
int opIterations = 5;
cv::morphologyEx( binImage, binImage, cv::MORPH_CLOSE, SE, cv::Point(-1,-1),
opIterations );
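To make the two building blocks of this pipeline less of a black box, here is a minimal plain C++ sketch (no OpenCV) of Otsu’s threshold selection and of a 3 x 3 morphological closing on a single-channel image. The `Gray` alias and the `otsuThreshold`, `morph3x3` and `close3x3` helpers are hypothetical names for this example, and skipping out-of-image pixels at the borders is an assumption. This is an illustration of the ideas, not OpenCV’s implementation.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>
#include <vector>

using Gray = std::vector<std::uint8_t>;  // single-channel image, row by row

// Otsu: try every threshold and keep the one that maximizes the
// between-class variance of the two resulting pixel groups.
int otsuThreshold(const Gray& image) {
    std::array<double, 256> histogram{};
    for (std::uint8_t value : image) histogram[value] += 1.0;

    const double total = static_cast<double>(image.size());
    double sumAll = 0.0;
    for (int i = 0; i < 256; ++i) sumAll += i * histogram[i];

    double weightLow = 0.0, sumLow = 0.0, bestVariance = -1.0;
    int bestThreshold = 0;
    for (int t = 0; t < 256; ++t) {
        weightLow += histogram[t];  // number of pixels with value <= t
        sumLow += t * histogram[t];
        const double weightHigh = total - weightLow;
        if (weightLow == 0.0 || weightHigh == 0.0) continue;
        const double meanDiff = sumLow / weightLow - (sumAll - sumLow) / weightHigh;
        const double variance = weightLow * weightHigh * meanDiff * meanDiff;
        if (variance > bestVariance) { bestVariance = variance; bestThreshold = t; }
    }
    return bestThreshold;
}

// One pass of a 3 x 3 max filter (dilation) or min filter (erosion).
// Pixels outside the image are ignored.
Gray morph3x3(const Gray& image, int rows, int cols, bool dilate) {
    Gray output(image.size());
    for (int row = 0; row < rows; ++row) {
        for (int col = 0; col < cols; ++col) {
            std::uint8_t result = dilate ? 0 : 255;
            for (int dr = -1; dr <= 1; ++dr) {
                for (int dc = -1; dc <= 1; ++dc) {
                    const int r = row + dr, c = col + dc;
                    if (r < 0 || r >= rows || c < 0 || c >= cols) continue;
                    const std::uint8_t v = image[r * cols + c];
                    result = dilate ? std::max(result, v) : std::min(result, v);
                }
            }
            output[row * cols + col] = result;
        }
    }
    return output;
}

// Closing = N dilations followed by N erosions.
Gray close3x3(Gray image, int rows, int cols, int iterations) {
    for (int i = 0; i < iterations; ++i) image = morph3x3(image, rows, cols, true);
    for (int i = 0; i < iterations; ++i) image = morph3x3(image, rows, cols, false);
    return image;
}
```

The dilation grows the white blob until small holes are swallowed, and the erosion then shrinks it back to roughly its original outline. More iterations close larger holes, at the cost of also merging nearby blobs.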
I use a rectangular structuring element of size 3 x 3 and 5 iterations of the closing operation. This is the result:
This is pretty cool. From here, you can do whatever is needed with the segmentation mask. For example, let’s compute the spoon’s bounding box:
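The idea behind a bounding box is simple enough to sketch in plain C++: scan the mask and track the extreme coordinates of the white pixels. The `Box` struct and `maskBoundingBox` helper below are hypothetical names for this example. Within OpenCV, `cv::boundingRect` is one option that can be applied to a mask such as `binImage`, though this is not necessarily how the result shown here was produced.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct Box { int x; int y; int width; int height; };

// Returns the tightest axis-aligned box around all non-zero mask pixels.
// An empty mask yields a box with zero width and height.
Box maskBoundingBox(const std::vector<std::uint8_t>& mask, int rows, int cols) {
    int minX = cols, minY = rows, maxX = -1, maxY = -1;
    for (int row = 0; row < rows; ++row) {
        for (int col = 0; col < cols; ++col) {
            if (mask[row * cols + col] == 0) continue;  // background pixel
            minX = std::min(minX, col);
            maxX = std::max(maxX, col);
            minY = std::min(minY, row);
            maxY = std::max(maxY, row);
        }
    }
    if (maxX < 0) return Box{0, 0, 0, 0};  // no white pixels found
    return Box{minX, minY, maxX - minX + 1, maxY - minY + 1};
}
```

This treats every white pixel as part of the object, so any leftover speckles in the mask will enlarge the box. Cleaning the mask first, as done above with the closing, keeps the box tight.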
Now, let’s go back to that imageQuantization function. The function prototype looks like this:
cv::Mat imageQuantization( cv::Mat inputImage, int numberOfClusters, int iterations );
The first step is to map the input image to a float matrix of samples, with one row per pixel and one column per channel. This is very straightforward:
//step 1 : map the src to the samples
cv::Mat samples( static_cast<int>( inputImage.total() ), 3, CV_32F );
auto samples_ptr = samples.ptr<float>(0);
for( int row = 0; row != inputImage.rows; ++row){
//obtain a pointer to the beginning of the row:
auto src_begin = inputImage.ptr<uchar>(row);
//obtain a pointer to the end of the row
auto src_end = src_begin + inputImage.cols * inputImage.channels();
//while the end of the image hasn't been reached...
while(src_begin != src_end){
samples_ptr[0] = src_begin[0];
samples_ptr[1] = src_begin[1];
samples_ptr[2] = src_begin[2];
samples_ptr += 3; src_begin += 3;
}
}
The second step is to actually perform K-Means on this samples matrix. Let’s configure K-Means with the following parameters:
//step 2 : apply kmeans to find labels and centers
int clusterCount = numberOfClusters; //number of clusters
cv::Mat labels;
int attempts = iterations; //number of times the algorithm is executed using different initial labels
cv::Mat centers;
int flags = cv::KMEANS_PP_CENTERS;
cv::TermCriteria criteria = cv::TermCriteria( cv::TermCriteria::MAX_ITER | cv::TermCriteria::EPS, 10, 0.01 );
After setting up K-Means, call the OpenCV function:
//the call to k-means:
cv::kmeans( samples, clusterCount, labels, criteria, attempts, flags, centers );
Now, for every sample pixel we have a label and a cluster centroid. The label denotes the cluster to which that particular pixel belongs, while the cluster centroid holds the representative value (in this case, the mean color) of all the pixels belonging to that cluster. This is the color value we’re looking for – in the spoon case, the red color!
Let’s loop through the original image, and for every pixel, get its corresponding label. We use that label to fetch (wow, ok, a little bit of the hardware engineer in me just came out) the new color, that is, the cluster’s centroid:
//step 3 : map the centers to the output
cv::Mat clusteredImage( inputImage.size(), inputImage.type() );
for( int row = 0; row != inputImage.rows; ++row ){
//obtain a pointer to the beginning of the row
auto clusteredImageBegin = clusteredImage.ptr<uchar>(row);
//obtain a pointer to the end of the row
auto clusteredImageEnd = clusteredImageBegin + clusteredImage.cols * 3;
//obtain a pointer to the label:
auto labels_ptr = labels.ptr<int>(row * inputImage.cols);
//while the end of the image hasn't been reached...
while( clusteredImageBegin != clusteredImageEnd ){
//current label index:
int const cluster_idx = *labels_ptr;
//get the center of that index:
auto centers_ptr = centers.ptr<float>(cluster_idx);
clusteredImageBegin[0] = cv::saturate_cast<uchar>( centers_ptr[0] );
clusteredImageBegin[1] = cv::saturate_cast<uchar>( centers_ptr[1] );
clusteredImageBegin[2] = cv::saturate_cast<uchar>( centers_ptr[2] );
clusteredImageBegin += 3; ++labels_ptr;
}
}
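Converting the float centers back to 8-bit pixels involves rounding and clamping. The plain C++ sketch below shows that conversion together with the label-to-color lookup on flat vectors. `toByte` and `paintClusters` are hypothetical helpers for this example. `toByte` is similar in spirit to `cv::saturate_cast<uchar>`, although exact .5 ties may be rounded differently.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstdint>
#include <vector>

// Round to the nearest integer and clamp to the valid 8-bit range.
std::uint8_t toByte(float value) {
    const float rounded = std::round(value);
    return static_cast<std::uint8_t>(std::min(std::max(rounded, 0.0f), 255.0f));
}

// Replace every label by its cluster's center color (3 bytes per pixel).
std::vector<std::uint8_t> paintClusters(const std::vector<int>& labels,
                                        const std::vector<std::array<float, 3>>& centers) {
    std::vector<std::uint8_t> pixels;
    pixels.reserve(labels.size() * 3);
    for (int label : labels)
        for (float channel : centers[label])
            pixels.push_back(toByte(channel));
    return pixels;
}
```

With only two centers, the painted image contains exactly two colors, which is why the grayscale conversion and thresholding that follow separate the object from the background so cleanly.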
And that’s that! The final image is clusteredImage. We have a handy function that performs K-means image segmentation!
This post is based on one of my answers on Stack Overflow. This is the link to the original question.