Xi'an Technological University
Subject: Computer Science, Software Engineering
eISSN: 2470-8038
SEARCH WITHIN CONTENT
Keywords : Interactive Segmentation, Super-Pixel, Graph Cut, Maximum Flow/ Minimum Cut, K-Means, SLIC
Citation Information : International Journal of Advanced Network, Monitoring and Controls. Volume 3, Issue 4, Pages 25-30, DOI: https://doi.org/10.21307/ijanmc-2019-017
License : (CC-BY-NC-ND 4.0)
Published Online: 14-October-2019
Image segmentation has always been a research hotspot in computer vision. Traditional image segmentation is based on pixels, and it is difficult to achieve the desired target effect. The manual segmentation method is too cumbersome and has limited precision. There is a certain degree of instability in the selection of interactively segmented pixels, which will affect the final segmentation results to a large extent.
In this paper, the superpixel block generated by the SLIC algorithm is used to preprocess the image, and then based on the Graph cut algorithm, the selection of seed points is improved, and the pixels of the background area are attempted to be optimized to obtain a more excellent target point sample. On top of the image segmentation, the foreground object is further divided with a frame, the pixels in the background are optimized, and the advantages of other segmentation algorithms are extracted to achieve the goal of global optimization, and interactive image segmentation is achieved.
Image segmentation is the division process of independent regions in an image with particular meaning which makes the same region represent the same features. The purpose is extracting the interesting parts called “the foreground”, and the rest parts are called “the background”. However, the characteristic differences between the foreground and the background may be reflected in various aspects such as grayscale, contour, texture, etc., so there is currently no general algorithm to solve the both problems. The result of image segmentation will directly affect the subsequent image processing results. At present, the application field of interactive image processing is very broad, and there are many researches on segmentation algorithms for stereoscopic images and color images.
The superpixel is a set of adjacent pixels which have similar features in space. The superpixel block can retain the effective information of image without affecting the boundary information of the images. In the conventional image processing, the pixel is regarded as a basic processing unit with high complexity. If replace the pixel point with superpixel block, the complexity in the process can be reduced. Therefore, in image segmentation, the image is preprocessed by using superpixels to improve the accuracy and speed of image segmentation.
The basic idea of the SLIC (simple linear iterative clustering) algorithm is converting a color image into a 5-dimensional feature vectors in CIELAB color space, and setting a specific distance metric for the vectors, finally clustering the pixels in the image. The superpixel produced by the SLIC algorithm is not only fast, compact, but also nearly uniform and with good contour.
Specific implementation steps of the SLIC algorithm:
1) Initializing the seed points (or call the cluster center): to distribute the seed points evenly in the inner area of the image according to the number of superpixels we set. If the pixel of image is N and the number of divided superpixels is K, so the N/K is the size occupied by one superpixel, and the S=sqrt(N/K) is the distance between adjacent seed points.
2) Finding new seed points in the n*n proximity range: to calculate the gradient values of all the pixels in the range, and to set the seed points at the minimum.
3) Label distributing: to distribute the corresponding label for each pixel of each seed point in the adjacent area. It should be noted that the SLIC algorithm can only be searched within the range of 2S*2S.
4) Distance measuring: to calculate color distance and spatial distance of each searched pixel point and seed point separately. The calculation method of each distance is as shown in formulas (1), (2), and (3):
Among them, d_{c} represents the color, d_{s} represents the space, N_{s} represents the maximum spatial distance within the class, and the formula N_{s}=S=sqrt(N/K) is applicable to every cluster. The maximum color distance is related to the image and is usually replaced by a fixed constant m. The calculation formula of the distance metric D' in the above formula can be changed to the following formula (4):
Since some pixels are not searched by only one but multiple seed points, it is necessary to calculate all their distances and to compare the minimum value as the cluster center of this pixel.
5) Optimizing: to optimize the code by iterative method, the specific iteration idea pseudo code is implemented as follows.
/*initialization*/
Sampling the pixel by the step size S, initializing the cluster center C_{k} = [l_{k}a_{k}b_{k}x_{k}y_{k}]^{T}
Moving the smallest gradient seed point position in the S*S cluster center
Setting the label for each pixel i = -1
Setting the distance for each pixel d(i) = ∞
/*distribution*/
for every cluster center
for each pixel i is in a surrounding 2S*2S area calculating the distance D between C and i
if D<^{d(i)}
setting d(i)=D
setting l(i)=k
/*Update*/
calculating a new cluster center
calculating the residual E
If E<= threshold
End the cycle
6) Enhancing connectivity: after iterative optimization, the algorithm may have some shortcomings. For example, multiple connectivity phenomena, the generated superpixels are too small, one superpixel is cut into many discontinuous superpixels, and the like. The main solution to the problems is increasing connectivity. The specific method is employing a new tag table, setting the elements as -1 and using the "Z" type to redistribute the generated undesirable superpixels. Then stop experiment after all the points traversal ends.
In the SLIC superpixel generation algorithm, the segmentation results of the image are different because of the number of superpixel generation and the setting of the compact coefficient. The compact coefficient is the ratio of tightness between the color feature and the XY coordinate feature. Generally, compact coefficient affects the shape of the superpixel.
The number of superpixels has a certain influence on the segmentation effect. In the case where the compact coefficient is 32, change the number of superpixels, and the number of superpixels generated is 128, 256, 512. Segmenting the images (a) and (b) in Fig. 1 respectively, and the experimental results are as follows:
It can be seen from Fig. 1 that, sometimes, the number of divided super-pixels is slightly smaller than the set value. It is mainly because that the too small segmentation regions are merging together in the processing. We can obtain from the experimental results that as the set value of superpixel number increase, the edge degree of generated superpixel block and the accuracy of segmentation result are high and vice versa.
According to the definition of compact coefficient, it will affect the shape of generated superpixel. Figure 2, the image (a) is divided with the size of the compact factor changing. The experimental results are as follows:
It can be seen from Fig. 2, in the case where the number of superpixels is set to be certain, as the compact coefficientIt can be seen from Fig. 2, in the case where the number of superpixels is set to be certain, as the compact coefficient increases, the generated superpixels become more and more regular; on the contrary, they become more and more irregular. However, the larger compact coefficient changes, the greater time complexity of the algorithm will be. After multiple experiments, it is found that when the compact coefficient is set at 40-60, the generated superpixel blocks are relatively regular, which is in line with the further experimental requirements of this paper.
The basic idea of the Graph cut algorithm: users mark the foreground object and the background object, and assign the pixels with the highest probability to the pixels of unknown label according to the existing labels. The algorithm needs to combine the image segmentation problem with the image min cut problem, to parse and store the digital image by related properties of the matrix. Pixels in the grayscale image can be represented by a matrix, the row of the matrix represents the height of the image, the columns represents the width, the elements stored in the matrix are the pixels of image, and the values of matrix elements are the grayscale of pixels.
Step 1: Construct an undirected graph to represent an image which will be split, wherein V and E represent the sets of vertex and edge respectively. Using the Graph Cut algorithm to process the graphs, and the two endpoints are represented by "S" and "T", which called terminal vertices. The remaining endpoints are all connected with these two terminal vertices to form a part of edge collection. There are two different vertices, simultaneously there are also two edges in Graph Cut algorithm, as shown in Fig. 3.
If there is a value whose sum of edge weights is the smallest, then this energy-minimization-based algorithm can be converted into the formula (5):
How to cut these edges to find the minimum cut? In order to solve this problem, we can regard the entire Graph cut image segmentation as a pixel mark problem. The target label set as 1, and the background label set as 0. The entire setup process can be obtained by the energy minimization function described above. Then the cut which can separate the target from the background is the requirements of this article. Using the found cut to classify the pixels in the image and to make different marks on the background and foreground, then the segmentation process of the entire image ends.
The process of energy minimization can be solved according to the maximum flow algorithm, as shown in Fig. 4:
In this paper, the SLIC algorithm and the Graph cut algorithm are combined to segment the image. We use the average pixel of all pixels in the generated superpixel block to instead the original pixel, and the interactive factor is added. The user object selects the area where the target object is located, and the rest of the area is treated as the background point. The image is segmented by the Graph cut algorithm, and the segmentation result and the standard segmentation image are compared each other to analyze the segmentation effect. The algorithm flow is shown in Fig. 5:
We selected a set of images for segmentation, and compared the results by three data: scorpion rate, accuracy, and recall rate. Among them, the scorpion rate is a more comprehensive indicator that can reflect the regression rate and accuracy rate. The segmentation results are shown in Table 1:
Among the above 10 pictures, the 6th image "soldier" has a high accuracy rate, but a low regression rate, only 75%, similarly as the 1st image "house" and the 4th image "athlete". These three images are relatively complicate in gray value of the main object, all of which have the pixels of high and low gray value simultaneously. For example, "soldier" has a lower gray value clothing, while the helmet has a higher one. When the helmet is divided into the background, the final segmentation result lost a part of foreground object, so the accuracy is high, but the recall rate is low. On the contrary, it is also the reason why the 2nd image "chicken" has a high recall rate.
The parameter K value determines the proportional relationship between the boundary term and the region term. In this section, we study the difference between the image segmentation results under different K values. As shown in Fig. 7.
Figure 7 analyzes the change of recall rate under different K values, and we find that not all curves have a recall rate that increases with the rise of K value. This shows that when the image is segmented, only adjusting K value does not effectively increase the recall rate. Through the rate of change in the line chart, it is found that when K increases from 500 to 40500, the curve with the largest amplification in recall value changes by less than 7%. The change of K value is very limited to adjust the recall rate.
Figure 8 analyzes the relationship between K value and accuracy. We find that not all the accuracy of images increase with K value. Some images do not change much with increase of K, such as "cat"; some images will decrease as the K value increases, such as "car." It shows that the increase of K value is not as good as the selection of previous scenic spots and background points, or the selection of the frame selection. It can only distinguish the influence on segmentation result by eyes. Therefore, in actual operation, K value does not have to be set as a variable parameter, but can be specified as an appropriate value by users at the beginning.
It is concluded by complexed Fig. 7 and Fig. 8 that the selection of the K value can only change the edge of the image segmentation in terms of visual effects, and does not significantly affect the segmentation result numerically, and the difference in the influence of the K value on different images is also very big. The 11th sample "dog" image of Fig. 5 is significantly less accurate than the other images. Compared with other samples, we found that the foreground gray value of "dog" is larger than the background gray value, and the detection algorithm uses the small gray value points as the segmentation target to intersect with the marker image (GT image), which makes the image with light color as the main object has a high recall rate in the segmentation task but a obviously low accuracy. On the contrary, when the foreground and background of the image are quite different, the recall rate and accuracy will be high, so the image segmentation effect will be better.
Figure 9 analyzes the effect of the number of superpixels on the segmentation results. The algorithm process is: first, preprocess the superpixel block generated by the image; then use the superpixel block average value to instead the original pixel value; finally, the preprocessed image is operated by the Graph cut algorithm. If the number of super pixel blocks is different, the experimental results will be different. As shown in Fig. 9, we analyze the segmentation result by changing the number of superpixel blocks generated in a same photo.
As can be seen from Fig. 9, the effect of the number of superpixels on the segmentation result is different for different divided images. Within a certain range, as the number of superpixels increases, the accuracy of the segmentation result increases. While after exceeding this range, the accuracy of segmentation results will decrease when the number of superpixels increases. From the segmentation results of the above several images, we can infer that it is not the more the number of superpixels divided, the better the segmentation effect. When the number of superpixels increases to a certain range, the segmentation effect is almost no longer improved.
This paper implements an improved image segmentation software design based on the combination of superpixel and Graph cut algorithm. Test and analyze different scenarios and parameters, we have achieved a better robustness of the software, and making it adapt to most interactive segmentation tasks.
In the result detection, we find that the selection of the K value and the number of superpixels will determine the degree of detail retention at the edge of the image foreground. Since the changed points are the small proportion of the whole picture pixels, currently it is difficult to judge by the recall rate and the accuracy rate, and only the method of human eye recognition can be used to identify the advantages and disadvantages. Compared with the selection of the K value and the number of superpixels, the selection of the foreground or background points and the optimization of the frame clipping method can have a numerical influence on the segmentation result. The foreground gray value selection should be as large as possible from the background, which is beneficial to the image segmentation result. In the selection of the optimization frame, the foreground object should be framed as completely as possible, and the background should be placed as little as possible to achieve the effect of optimizing the segmentation.
The point and frame selection steps reflect the superiority of interactive segmentation, which can achieve the users' subjective willingness to split in the program. Even if there are multiple theme objects in some scenes, you can divide the different objects by framing them.
This work was supported by National Natural Science Foundation of China (Grant No. 61572392, 61671362), Natural Science Foundation in Shaanxi Province of China (Grant No.2017JC2-08) and National Joint Engineering Laboratory Fund Project of New Network and Detection Control (Grant No. GSYSJ2016006).