
International Journal on Smart Sensing and Intelligent Systems

Professor Subhas Chandra Mukhopadhyay

Exeley Inc. (New York)

Subject: Computational Science & Engineering, Engineering, Electrical & Electronic


eISSN: 1178-5608



VOLUME 8, ISSUE 2 (June 2015)


Vektor Dewanto * / Aprinaldi / Zulfikar Ian / Wisnu Jatmiko

Keywords : vision-based knowledge, knowledge-compatibility benchmarker, semantic segmentation, averaged class accuracy, regression

Citation Information : International Journal on Smart Sensing and Intelligent Systems. Volume 8, Issue 2, Pages 1284–1312, DOI: https://doi.org/10.21307/ijssis-2017-807

License : (CC BY-NC-ND 4.0)

Received Date : 15-January-2015 / Accepted: 24-March-2015 / Published Online: 01-June-2015



The quality of a semantic annotation is typically measured by its averaged class-accuracy value, whose computation requires scarce ground-truth annotations. We observe that humans accumulate knowledge through their vision, and we believe that the quality of a semantic annotation is proportional to its compatibility with this vision-based knowledge. We propose a knowledge-compatibility benchmarker whose backbone is a regression machine. It takes as input a semantic annotation and the vision-based knowledge, and outputs an estimate of the corresponding averaged class-accuracy value. The knowledge encodes three kinds of information: co-occurrence statistics, scene properties, and relative positions. We introduce three types of feature vectors for regression. Each characterizes a probability vector that captures the compatibility between an annotation and one kind of knowledge. Experimental results show that Gradient Boosting regression outperforms ν-Support Vector regression, achieving its best performance at an R² score of 0.737 and an MSE of 0.034. This indicates not only that the vision-based knowledge resembles human common sense but also that the proposed feature vectors for regression are justified.
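The abstract relies on several quantities without defining them: the averaged class accuracy that the benchmarker estimates, the co-occurrence statistics that form one kind of knowledge, and the R² score and MSE used to compare the regressors. The Python sketch below illustrates them under stated assumptions. Averaged class accuracy is taken as the mean per-class pixel recall over the classes present in the ground truth, which is a common definition but is not confirmed on this page. Co-occurrence is estimated as the conditional probability that class b appears in an image given that class a does. The paper's exact formulations may differ, and all function names here are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

# Hypothetical type: a label map is a 2-D grid of integer class indices.
LabelMap = Sequence[Sequence[int]]


def averaged_class_accuracy(pred: LabelMap, truth: LabelMap) -> float:
    """Mean per-class pixel recall, averaged over classes present in `truth`.

    Assumed definition: for each ground-truth class c, take the fraction of
    its pixels that `pred` also labels c, then average these fractions.
    """
    correct: Counter = Counter()
    total: Counter = Counter()
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            total[t] += 1
            if p == t:
                correct[t] += 1
    if not total:
        raise ValueError("ground truth contains no pixels")
    return sum(correct[c] / total[c] for c in total) / len(total)


def cooccurrence_probabilities(
    annotations: List[LabelMap],
) -> Dict[Tuple[int, int], float]:
    """Estimate P(class b appears | class a appears) over a set of label maps."""
    present: Counter = Counter()  # number of images containing each class
    pairs: Counter = Counter()    # number of images containing both a and b
    for ann in annotations:
        classes = {c for row in ann for c in row}
        present.update(classes)
        for a, b in combinations(sorted(classes), 2):
            pairs[(a, b)] += 1
            pairs[(b, a)] += 1
    return {(a, b): n / present[a] for (a, b), n in pairs.items()}


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average squared difference between true and estimated accuracies."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are equal")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    truth = [[0, 0, 1],
             [0, 1, 1]]
    pred = [[0, 1, 1],
            [0, 1, 1]]
    # Class 0: 2 of 3 pixels correct; class 1: 3 of 3 correct.
    # Averaged class accuracy = (2/3 + 1) / 2, i.e. about 0.833.
    print(averaged_class_accuracy(pred, truth))

    # Class 1 appears in one of the two images that contain class 0.
    cooc = cooccurrence_probabilities([[[0, 1]], [[0, 2]], [[1, 2]]])
    print(cooc[(0, 1)])

    # Toy true vs. estimated accuracies (invented, not the paper's data).
    y_true, y_est = [0.5, 0.7, 0.9], [0.6, 0.7, 0.8]
    print(mean_squared_error(y_true, y_est), r2_score(y_true, y_est))
```

In a full pipeline one would presumably fit a regressor such as scikit-learn's `GradientBoostingRegressor` or `NuSVR` on feature vectors derived from the knowledge, using `averaged_class_accuracy` values as targets; `sklearn.metrics.r2_score` and `sklearn.metrics.mean_squared_error` compute the same scores as the helpers above.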
