CHAPTER 2: LITERATURE REVIEW
2.3 Different Modules in a FOCR System
2.3.3 Feature Extraction and Feature Selection
2.3.3.1 Feature Extraction in OCR Systems
Various kinds of features can be extracted and/or computed in the feature extraction part of a FOCR system. Features are usually categorized into statistical (Parvez & Mahmoud, 2013), structural (Shanbehzadeh, Pezashki & Sarrafzadeh, 2007), global transformation (Al-Khateeb, Jiang, Ren, Khelifi & Ipson, 2009), and template-based matching and correlation features (Rafaa & Nordin, 2011).
a) Structural Features
Structural Features (StrF) describe the geometrical and topological characteristics of patterns, using their global and local properties (Gonzalez, Woods & Eddins, 2009). They are the features most often investigated by researchers in handwritten FOCR systems, because they capture the intuitive aspects of writing (Khorsheed, 2002). StrF are less influenced by sources of distortion, but they are highly dependent on writing style (Shanbehzadeh, Pezashki & Sarrafzadeh, 2007; Khedher, Abandah & Al-Khawaldeh, 2005). Although StrF are effective, they are not easy to extract, and researchers usually find them heuristically. They may be extracted from each row or column of an image, or from its skeleton or contour.
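As an illustration of how such skeleton-based structural features can be computed, the sketch below counts end points and branch points on a binary skeleton image. This is a hypothetical NumPy helper, not code from the cited works; it uses 4-connectivity for simplicity, whereas real systems often use 8-connectivity and more elaborate skeleton analysis.

```python
import numpy as np

def skeleton_points(skel):
    """Count end points (exactly 1 neighbour) and branch points
    (3 or more neighbours) in a binary skeleton, using 4-connectivity."""
    skel = skel.astype(bool)
    padded = np.pad(skel.astype(int), 1)
    # Sum the four axis-aligned neighbours of every pixel.
    neighbours = (
        np.roll(padded, 1, axis=0) + np.roll(padded, -1, axis=0)
        + np.roll(padded, 1, axis=1) + np.roll(padded, -1, axis=1)
    )[1:-1, 1:-1]
    ends = int(np.sum(skel & (neighbours == 1)))
    branches = int(np.sum(skel & (neighbours >= 3)))
    return ends, branches

# A plus-shaped skeleton: four end points, one branch point.
cross = np.zeros((5, 5), dtype=int)
cross[2, :] = 1
cross[:, 2] = 1
print(skeleton_points(cross))  # (4, 1)
```

Counts like these feed directly into the "number of specific points" features listed in Table 2.6.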
b) Statistical Features
Statistical Features (StaF) are derived from the statistical distribution of an image's pixels and describe measurable characteristics of a pattern. They comprise numerical values computed from part or all of an image. Although these features are easy to extract, they can mislead the system, because most of them are very sensitive to noise, scale, rotation, and other changes in the patterns.
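A few of these statistical measures are simple to compute directly from a binary image. The sketch below (illustrative NumPy code, not taken from the cited works) derives per-zone pixel densities and a normalised centre of mass, two of the statistical features listed in Table 2.6:

```python
import numpy as np

def statistical_features(img, zones=(2, 2)):
    """Per-zone pixel densities plus the normalised centre of mass
    of a binary character image."""
    img = img.astype(float)
    h, w = img.shape
    zh, zw = h // zones[0], w // zones[1]
    densities = [
        img[r * zh:(r + 1) * zh, c * zw:(c + 1) * zw].mean()
        for r in range(zones[0]) for c in range(zones[1])
    ]
    ys, xs = np.nonzero(img)
    if len(ys):
        com = [ys.mean() / (h - 1), xs.mean() / (w - 1)]
    else:
        com = [0.0, 0.0]
    return np.array(densities + com)

img = np.zeros((4, 4))
img[:2, :2] = 1          # ink only in the top-left zone
feats = statistical_features(img)
print(feats)  # zone densities [1, 0, 0, 0], centre of mass (1/6, 1/6)
```

Note that translating or rescaling the character changes every one of these values, which is exactly the sensitivity discussed above.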
c) Global Transformation Features
A transformation maps an image from one space to another, and usually reduces the dimensionality and computational cost in the new space. Transformations provide features that are invariant to global deformations such as translation, dilation, and rotation (Khorsheed, 2002). Well-known transformations include the Fourier, Wavelet, and Discrete Cosine transforms, as well as Fractal Codes. These transformations generate features such as Fourier descriptors, Modified Fourier Spectrum descriptors, and wavelet coefficients.
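The translation invariance mentioned above can be demonstrated concretely: the magnitudes of low-frequency 2-D Fourier coefficients are unchanged when the pattern is circularly shifted. The following NumPy sketch is for illustration only; the cited systems compute their descriptors in their own, generally more elaborate, ways.

```python
import numpy as np

def fourier_features(img, k=8):
    """Magnitudes of the k x k lowest-frequency 2-D Fourier coefficients.
    Magnitudes are invariant to circular translation of the pattern."""
    spectrum = np.abs(np.fft.fft2(img))
    return spectrum[:k, :k].ravel()

img = np.zeros((16, 16))
img[4:8, 4:8] = 1
shifted = np.roll(img, (3, 5), axis=(0, 1))   # translated copy
print(np.allclose(fourier_features(img), fourier_features(shifted)))  # True
```

Keeping only the low-frequency coefficients is also a form of dimensionality reduction: a 16 x 16 image collapses to a 64-element feature vector here.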
d) Template-based Features
Template-based features are usually created by matching pre-defined templates against the graphical input data. However, they are completely data-dependent, and thus usually cannot be transferred from one PR system to another.
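A minimal form of this idea slides a small template over the image and records where the mean-centred correlation is highest; the offset and score of the best match can then serve as features. This is an illustrative NumPy sketch, not a production matcher:

```python
import numpy as np

def best_match(img, template):
    """Return the (row, col) offset where the mean-centred correlation
    between the image window and the template is highest."""
    th, tw = template.shape
    t = template - template.mean()
    best_score, best_pos = -np.inf, (0, 0)
    for r in range(img.shape[0] - th + 1):
        for c in range(img.shape[1] - tw + 1):
            window = img[r:r + th, c:c + tw]
            score = ((window - window.mean()) * t).sum()
            if score > best_score:
                best_score, best_pos = score, (r, c)
    return best_pos

img = np.zeros((8, 8))
template = np.array([[1.0, 1.0], [0.0, 1.0]])
img[2:4, 3:5] = template                 # embed the template at (2, 3)
print(best_match(img, template))  # (2, 3)
```

The data dependence noted above is visible here: the feature is meaningful only for the specific template set chosen for one script and dataset.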
Some of the most frequently used structural features (Zhang, Suen & Bui, 2004; Karic & Martinovic, 2013; Peng, Cao, Setlur, Govindaraju & Natarajan, 2013; Impedovo, 2013; Shanbehzadeh, Pezashki & Sarrafzadeh, 2007), statistical features (Parvez & Mahmoud, 2013; Shah & Jethava, 2013; Chen, Beijun, Asharif & Yamashita, 2009; Abandah & Anssari, 2009; Alaei, Nagabhushan & Pal, 2010a; Noaparast & Broumandnia, 2009; Broumandnia & Shanbehzadeh, 2007; Enayatifar & Alirezanejad, 2011; Izakian, Monadjemi, Tork Ladani & Zamanifar, 2008; Rashnodi, Sajedi & Saniee, 2011; Kheyrkhah & Rahmanian, 2007; Singh, Singh & Dutta, 2010), and transformation features in OCR applications are shown in Table 2.6. Table 2.7 summarizes the feature extraction task in FOCR systems.
Table 2.6 : Some of the most used features in OCR applications

Structural features:
- Simple, double, and complex loops
- Loop positions, their types, and their relative locations
- Hills and valleys
- Open curves in different directions
- Ascenders and descenders
- Number and locations of dot(s) in each character
- Location of dots relative to baselines
- Starting, ending, branching, crossing, turning, and corner points in the image skeleton
- Curvature and length of the image segments
- Length of a character segment relative to other segments
- Location of a character segment relative to the center of mass of the image skeleton
- Image area and image perimeter
- Aspect ratio

Statistical features:
- Normalized Central, Zernike, Pseudo-Zernike, Fast Zernike, Legendre, Orthogonal Fourier-Mellin, Rotational, and Complex moments extracted from the whole body, only the main body, or only the secondary parts of an image
- Pixel distribution in the left, right, upper, and lower halves of the image
- Image density
- Mean, mode, variance, 2D standard deviation
- Average and variance of X and Y changes in portions of the image skeleton
- Ratio of the horizontal variance histogram to the vertical variance histogram
- Ratio of upper-half variance to lower-half variance of an image
- Thinness ratio
- Ratio of pixel distribution between two or more parts of the image
- Center of Mass (COM) (Center of Gravity)
- Centroid distance
- Radial coding
- Top, bottom, left, and right profile histograms
- Number of horizontal (row) and vertical (column) transitions
- Number of modified horizontal and vertical transitions
- Outer and inner contour directional chain code histograms
- Normalized and modified contour chain codes
- Fractal and Shadow Code descriptors
- Energy of the original image
- Number of specific points such as end, branch, and cross points
- Number of connected components
- Relative location of the start and end points of an image skeleton
- Pen width and line height
- Histogram of slopes along the contour
- Skeleton-based N-degree directional descriptors

Transformation features:
- M-band packet wavelet coefficients
- Fourier, DCT, and Radon coefficients
Table 2.7 : Summary of research in handwritten FOCR systems, based on feature extraction (asterisks mark applicability to digit, character, and/or word recognition)

| Reference | Features | No. of features | Digit / Character / Word |
|---|---|---|---|
| Shirali et al., 1994 | Zernike moments | 45 | * |
| Shirali et al., 1995 | Shadow code descriptors | 32 | * |
| Hosseini and Bouzerdoum, 1996 | Number of crossings between the digit body and horizontal and vertical raster lines | | |
| Mowlaei et al., 2002 | Haar wavelet coefficients | 64 | * * |
| Mowlaei and Faez, 2003 | Haar wavelet coefficients | 64 | * * |
| Sadri et al., 2003 | Derivatives of 4 different views from the 4 main directions, obtained by counting background pixels between the border and the outer boundary | | |
| Soltanzadeh and Rahmati, 2004 | Outer profiles of images at multiple orientations, crossing counts, projection histograms | | |
| Mozaffari et al., 2004a | Fractal codes | 64 | * * |
| Mozaffari et al., 2004b | Fractal codes, Haar wavelet transform | 64 | * * |
| Mozaffari et al., 2005a | Fractal codes | 240 | * |
| Mozaffari et al., 2005b | Average and variance of X and Y changes in different portions of the skeleton, end points, intersection points | | |
| Mozaffari et al., 2005c | Fractal codes | 240 | * |
| Mozaffari et al., 2005d | Fractal codes | 64 | * * |
| | Pixel density in a 12-segment digit pattern, moment of inertia, center of mass | | |
| Ziaratban et al., 2007a | Position and amount of the best template match in horizontal and vertical coordinates; templates: slanted lines, T-junctions, up, down, right and left curvatures, and so on | | |
| Alaei et al., 2009a | Chain code direction frequencies of the image contour | | |
| Alaei et al., 2009b | Modified chain code direction frequencies in the contour; modified horizontal and vertical transition levels | | |
| Salehpour and Behrad, 2010 | Automatic feature extraction using PCA | 20, 30, 40, 50 | * |
| Enayatifar & Alirezanejad, 2011 | Pixel accumulation, pixel direction | 48 | * |
| Alirezaee et al., 2004a | Relative energy of eroded versions with respect to the original image, displacement of center of mass, minimum and maximum eigenvalues | | |
| Alirezaee et al., 2004b | Invariant central moments | 7 | * |
| Shanbehzadeh et al., 2007 | Number of components in each character, number and location of dots relative to the baseline, number of pixels in each frame cell, center of mass of each cell | 78 | * |
| Ziaratban et al., 2008b | Terminal points, two-way branch points, three-way branch points | 32, 40, 64, 108 | * |
| Gharoie and Farajpoor | All pixels of an image | 900 | * |
| Alaei et al., 2010 | Modified chain code direction frequencies of the contour | | |
| Jenabzade et al., 2011 | Wavelet coefficients from the outer border, central moments | | |
| Rajabi et al., 2012 | Zoning densities, crossing counts, outer profiles | | |
| Alaei et al., 2012 | Dimensional gradient | 400 | * |
| Dehghan et al., 2001a | Slope, curvature, number of active pixels, and slope and curvature of each section, extracted from the contours | 20 × number of image frames | |
| Dehghani et al., 2001 | Pixel densities in various regions, contour pixels, angle of the line passing through the first and last points in each image part, … | | |
| Safabakhsh and Adibi, 2005 | Moments, Fourier descriptors, number of loops, aspect ratio, pixel densities, position of right and left connections, end, junction, branch, and crossing points | | |
| Broumandnia et al. | Wavelet packet transform coefficients | 16, 32, 96, 128, 160 | |
| Vaseghi et al., 2008 | Statistical density values | 4 × number of image frames | |
| Mozaffari et al., 2008b | Black-white pixel transitions | 10 × number of … | |
| Bagheri and Broumandnia, 2009 | Zernike moments | 9, 25, 49, 72, 100, 182 | |
| Bahmani et al., 2010 | Wavelet coefficients extracted from the smoothed word image profile in four directions (up, down, left, right) | 200, 400 | * |
2.3.3.2 Feature Selection (Feature Reduction, Dimensionality Reduction) in OCR Systems
Various kinds of features are computed and/or extracted in the feature extraction step of an OCR system. Some of the extracted features, however, might correspond to very small details of the patterns, or might be combinations of other features (non-orthogonal features), while others might have no efficacy in the recognition stage (Shayegan & Chan, 2012). Irrelevant or redundant features may degrade the recognition results, slow the learning algorithms, and significantly increase the time complexity of the recognition process (Azmi, Pishgoo, Norozi, Koohzadi & Baesi, 2010). Hence, using all extracted features does not always produce the desired results, and can also increase the time complexity of the recognition process (Shayegan & Aghabozorgi, 2014a).
Therefore, following the feature extraction process, the issue of Feature Selection (FS), also called Feature Reduction or Dimensionality Reduction, arises.
FS is typically a search problem: finding an optimal subset of m features out of the original M features. In other words, FS is a process for excluding irrelevant and redundant features from the feature vector. The final goals of the FS operation are to reduce system complexity, reduce the overall processing time, and increase system accuracy (Guyon & Elisseeff, 2003).
In this respect, several feature subset selection algorithms have been proposed. According to the criterion function used for finding one m-member subset out of the 2^M possible subsets (where M is the number of initial features), two general categories have been introduced for this important task: Wrapper algorithms and Filter algorithms (Dash & Liu, 1997). In Wrapper algorithms, the classifier's performance is used to evaluate the performance of a feature subset. In Filter algorithms, a feature evaluation function is used instead of optimizing classifier performance; the best individual features are found one by one. However, the m best individual features are not necessarily the best m features (Peng, Long & Ding, 2005). Usually, Wrapper methods are slower, but perform better than Filter methods.
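The Filter idea can be sketched in a few lines: each feature receives an individual relevance score, and the top m indices are kept without ever consulting a classifier. The scoring function below (absolute class-mean difference scaled by the feature's spread, for a two-class problem) is a hypothetical choice for illustration:

```python
import numpy as np

def filter_select(X, y, m):
    """Filter-style feature selection for a two-class problem:
    score each feature on its own and keep the indices of the top m."""
    scores = []
    for j in range(X.shape[1]):
        a, b = X[y == 0, j], X[y == 1, j]
        scores.append(abs(a.mean() - b.mean()) / (X[:, j].std() + 1e-12))
    return np.argsort(scores)[::-1][:m]

X = np.array([[0.0, 0.0, 5.0],
              [1.0, 0.0, 3.0],
              [0.0, 1.0, 4.0],
              [1.0, 1.0, 6.0]])
y = np.array([0, 0, 1, 1])
print(filter_select(X, y, 1))  # feature 1 separates the classes perfectly
```

Because features are judged one at a time, two strongly correlated features can both rank highly even though together they add little; this is precisely the weakness that Wrapper methods avoid at a higher computational cost.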
Dimensionality reduction (feature selection) is one of the objectives of this research. Hence, the various methods and related studies are reviewed in more detail. Based on their removal strategies, feature selection methods are categorized into three general groups, as follows:
a) Sequential Backward Selection
One method for decreasing the number of features is the Sequential Backward Selection (SBS) technique (El-Glaly & Quek, 2011). In this approach, features are deleted one at a time, and the system performance is measured after each deletion to determine the contribution of the removed feature.
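A generic SBS loop can be sketched as follows; `evaluate` is a placeholder for whatever performance measure (e.g. classifier accuracy) a particular system uses, and the toy score at the end is invented for demonstration:

```python
def backward_select(features, evaluate, min_size=1):
    """Sequential Backward Selection: repeatedly drop the single feature
    whose removal leaves the best evaluation score, stopping when every
    possible removal would make the score worse."""
    selected = list(features)
    while len(selected) > min_size:
        candidates = [
            (evaluate([f for f in selected if f != g]), g) for g in selected
        ]
        best_score, removed = max(candidates)
        if best_score < evaluate(selected):
            break  # every removal hurts performance
        selected.remove(removed)
    return selected

# Toy score: features 0 and 1 help, all others hurt.
useful = {0, 1}
def score(subset):
    return sum(+1 if f in useful else -1 for f in subset)

print(backward_select(range(4), score))  # [0, 1]
```

Note that SBS is greedy: each step evaluates only single-feature removals, so it can miss feature pairs that are useful only in combination.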
b) Genetic Algorithms
Another method for selecting a limited number of features out of a large set is the family of random search methods, such as Genetic Algorithms (GA). A GA keeps a set of the best solutions in a population (Bahmani, Alamdar, Azmi & Haratizadeh, 2010).
Azmi, Pishgoo, Norozi, Koohzadi and Baesi (2010) used GA in a handwritten FOCR system. Their proposed system initially had 81 features with an accuracy of 77%. After applying GA, the number of features was reduced to 55 and the accuracy increased from 77% to 80%.
Kheyrkhah and Rahmanian (2007) employed a GA to optimize the number of initially extracted features in a handwritten Farsi digit recognition system. They showed that not only are some extracted features useless for classification, but they also reduce the recognition accuracy and increase the system's learning time. They first selected a random subset of the initial feature set, trained a Bayesian classifier with this subset, and recorded the recognition errors. In the next stage, a new subset was formed from 50% of the previous features and 50% new features, and its recognition rate was computed and compared with the earlier results. Comparing the two recognition rates led to the selection of some features. This operation was repeated until satisfactory results were achieved. Their system reduced the number of features from 48 to 30 and increased the recognition rate from 75% to 94%, but the time needed to achieve this improvement was considerable.
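The GA-based schemes above share a common skeleton: candidate feature subsets are encoded as bit-masks, and selection, crossover, and mutation evolve the population toward fitter masks. The sketch below is a generic illustration; the population size, rates, and the toy fitness are arbitrary choices, not those of the cited systems.

```python
import random

def ga_select(n_features, fitness, pop_size=20, generations=30, seed=0):
    """Evolve bit-masks over the features: keep the fitter half of the
    population each generation and refill it with mutated crossovers."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[:pop_size // 2]
        children = []
        while len(children) < pop_size - len(survivors):
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, n_features)
            child = a[:cut] + b[cut:]          # one-point crossover
            if rng.random() < 0.2:
                i = rng.randrange(n_features)
                child[i] ^= 1                  # bit-flip mutation
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)

# Toy fitness: agreement with a known "ideal" mask of selected features.
target = [1, 0, 1, 0, 1, 0]
def fitness(mask):
    return sum(m == t for m, t in zip(mask, target))

best = ga_select(len(target), fitness)
```

In a real FOCR system, the fitness of a mask would be the accuracy of a classifier trained on the masked feature set, which is what makes GA-based selection a Wrapper method and explains its high computational cost.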
c) Principal Component Analysis
The third approach to feature selection is a group of statistical methods, such as Principal Component Analysis (PCA) (Song, Yang, Siadat & Pechenizkiy, 2013) and Random Projection (RP) (Van der Maaten, Postma & Van Den Herik, 2009), which have been applied to find important patterns in high-dimensional input data. PCA is a pre-processing step in recognition applications that converts a correlated feature space into a new, uncorrelated feature space. In the new space, features are reordered by decreasing variance, such that the first transformed feature accounts for the most variability in the data. PCA is based on the statistical representation of a random variable and has been widely used in data analysis (Bouchareb, Hamidi & Bedda, 2008). A brief description of the PCA technique is given in Appendix III. The discussion of these feature selection techniques is continued at the end of this chapter.
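A minimal PCA can be written directly from this description: centre the data, take the eigenvectors of its covariance matrix, and project onto the directions of largest variance. This is an illustrative NumPy sketch; production implementations typically use an SVD for numerical stability.

```python
import numpy as np

def pca(X, k):
    """Project X onto its k highest-variance directions (the top
    eigenvectors of the covariance matrix of the centred data)."""
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)      # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:k]       # top-k, descending
    return Xc @ eigvecs[:, order], eigvals[order]

# Points on a line in 3-D: a single component captures all the variance.
X = np.array([[t, 2.0 * t, -t] for t in range(5)])
projected, variances = pca(X, 1)
```

Here the first principal component accounts for the entire variance of the data (the covariance matrix has rank one), which is the extreme case of the reordering-by-variance property described above.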