Monday, December 21, 2015

Expectation-Maximization Algorithm

Data: Problems, Challenges, and the EM Algorithm


A common challenge encountered in machine learning and pattern recognition occurs when the observed data is incomplete, or when the distribution from which the observed data was generated (or drawn) is a) unknown, but its parameters can be estimated, or b) can be modeled as a mixture of PDFs whose parameters are not known. In both cases, reasonable estimates of the means and variances can be made even if some of the data is missing.

The Expectation-Maximization (EM) algorithm is ideally suited to situations where the available data is incomplete. Coined and explained in (Dempster et al., 1977), the moniker comes from its iterative two-step process of expectation (E) and maximization (M), although the use of the algorithm has been recorded as early as 1950, applied to gene frequency estimation.

In the expectation step, the current estimates of the parameters (initially, a guess) are used to generate the "expected" observed data by drawing from the unobserved, complete set. The maximization step computes new estimates of the parameters by maximizing the expected log-likelihood of the data (generated in the E-step). These new estimates are then used as inputs to the next E-step, and so on. The process continues until convergence, i.e. until there is no significant change in the parameter estimates, or until a set number of iterations is reached.

Exact expressions for both steps are indicated in the next sections.

Mixture Models


An unknown or arbitrary distribution p(x) can be modeled as a linear combination of PDFs p(x|j) in the form

p(x) = P1 p(x|1) + P2 p(x|2) + ... + PJ p(x|J)

(Introduction to Pattern Recognition, p. 11)

with the condition that the prior probabilities Pj sum to 1, for sufficiently large J. In most cases the p(x|j) are modeled as Gaussians N(mj, sj), j = 1, 2, ..., J. The resulting PDF is multi-modal, i.e. it has many peaks.

Shown below are 500 data points drawn using a mixture model of 3 Gaussian PDFs. For easy visualization, we have color-coded the points into three classes, although the groupings are generally unknown. Clustering algorithms may easily be adapted to classify them; otherwise, visual inspection should suffice for now.
Data points drawn from a mixture of 3 Gaussian PDFs
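
As a self-contained illustration (a minimal sketch, independent of the book's mixt_model helper used later; this is a 1D analogue of the three-component example, with illustrative values), such a mixture can be sampled by first picking a component according to the priors and then drawing from that component's Gaussian:

# minimal sketch: draw N points from a 1D mixture of three Gaussians
set.seed(1)
N = 500
P = c(0.4, 0.4, 0.2)                   # prior probabilities, must sum to 1
mu = c(1, 3, 6)                        # component means
sdev = sqrt(c(0.1, 0.2, 0.3))          # component standard deviations

j = sample(1:3, N, replace = TRUE, prob = P)   # pick a component for each point
x = rnorm(N, mean = mu[j], sd = sdev[j])       # draw from the chosen Gaussian
hist(x, breaks = 40)                           # the histogram is multi-modal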

Expectation Step


From the observed data, we can compute the conditional PDFs p(x|j; theta) together with the prior probabilities Pj (and hence the joint PDF), under the assumption that the data is generated from a Gaussian mixture.
E-step: Data points are generated using parameter estimates. The expected log-likelihood is then computed
(Pattern Recognition, p. 46)

E-step: exact expression for the E-step log-likelihood
(Pattern Recognition, p. 47)
Shown above are the exact expressions of the log-likelihood for the E-step. Because most components of the log-likelihood are expressed as exponential functions through the use of a Gaussian mixture model, we arrive at a form that can be easily evaluated. xk is the k-th data point. The unknown parameters are the means (mj), variances (sj), and priors Pj; J is the number of PDFs.
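
As a rough sketch of what this amounts to in practice (a 1D illustration, not the book's converted code), the E-step reduces to evaluating the responsibilities P(j|xk; theta(t)) for every point and every component:

# minimal E-step sketch for a 1D Gaussian mixture (illustrative)
# x: data vector; mu, sdev, P: current estimates of means, std devs, priors
e_step = function(x, mu, sdev, P) {
  J = length(mu)
  # dens[k, j] = Pj * N(xk; mj, sj) under the current estimates
  dens = sapply(1:J, function(j) P[j] * dnorm(x, mean = mu[j], sd = sdev[j]))
  dens / rowSums(dens)                 # normalize each row to get P(j | xk; theta(t))
}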

Maximization Step


From the E-step, we obtained an exact expression for the log-likelihood function. This is maximized with respect to each unknown parameter by computing the gradient of the log-likelihood with respect to that parameter and equating it to zero, i.e. finding the maximum at a critical point.

M-step: maximization of each unknown parameter
(Pattern Recognition, p. 47)
The expression P(j|xk; Theta(t)) (capital Theta) is evaluated using
M-step: expressions for P(j|xk; theta(t)) and p(j|xk; theta(t))
(Pattern Recognition, p. 47)
p(xk|j; theta(t)) (small theta) is simply the value of the Gaussian PDF at xk using the current estimates theta(t) as parameters. The algorithm is initiated by providing valid estimates for the prior probabilities Pj(t = 0); the estimates are valid only if they sum to 1.
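
A matching M-step sketch for the 1D case (again illustrative, not the book's code): the responsibilities computed in the E-step act as soft weights when re-estimating the means, variances, and priors.

# minimal M-step sketch for a 1D Gaussian mixture (illustrative)
# x: data vector; gamma: N x J matrix of responsibilities from the E-step
m_step = function(x, gamma) {
  Nj = colSums(gamma)                                # effective number of points per PDF
  mu = colSums(gamma * x) / Nj                       # updated means
  v  = colSums(gamma * (outer(x, mu, "-"))^2) / Nj   # updated variances
  P  = Nj / length(x)                                # updated priors (they sum to 1)
  list(mu = mu, sdev = sqrt(v), P = P)
}

Iterating e_step and m_step until the parameter estimates stop changing (within the tolerance) reproduces the loop described above.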

Demonstration

For this demonstration, we have converted Matlab(R) implementations of the algorithms (mixture models, EM) into R instead of creating our own versions of the code, since we are concerned only with the exposition of the technique. Links to our conversions of the code are provided in the reference section below. We have used examples from the Introduction to Pattern Recognition: A Matlab Approach book. Further investigation may be done by adjusting the initial estimates as well as the tolerance.

Important parameters for the mixture above are listed below. The 500 data points are distributed among three two-dimensional (2D) PDFs with a priori probabilities of 0.4, 0.4, and 0.2, respectively. We shall estimate all parameters (mean, S, P) using the EM algorithm.

Mixture Parameters

Parameter X1 X2 X3
mean1 1 3 2
mean2 1 3 6
S 0.1 0.2 0.3
P (apriori) 0.4 0.4 0.2
N 500

Results

Initial guess


Parameter X1 X2 X3
mean1 0 5 5
mean2 2 2 5
S 0.15 0.27 0.40
P (apriori) 0.33 (1/3) 0.33 (1/3) 0.33 (1/3)
tolerance 10^-5

Although we have fixed our initial guesses for the mean here, we may choose random values. We used a tolerance of 10^-5, although it may be set to a lower value at the expense of computational time (number of iterations). The prior probabilities are set equally among the three sets of data points.

Final estimates


Parameter X1 X2 X3
mean1 0.9749602 2.9819268 2.0322002
mean2 0.9800478 2.9859954 5.9081155
S 0.1012807 0.2139030 0.2915451
P (apriori) 0.399999 0.395992 0.204009
error 6.925152e-07
iterations 16

The algorithm was able to provide a reasonable estimate (error ~ 10^-7) of the parameters in 16 iterations. The mean, variance (S), and P estimates obtained from the data agree closely with the parameters used to generate the 500 data points (from a Gaussian mixture model).
error vs. iteration

regions within 2-sigma about the mean
In the above figure, we have highlighted the regions within 2 sigma (sigma being the square root of the variance) about each mean. Theoretically, these regions should encompass ~95% of all the points.

Poor initial estimates of parameters

It must be noted that the EM algorithm is highly sensitive to the validity or goodness of the initial estimates for the means, variances, and prior probabilities. For example, we ran the EM algorithm again using the initial estimates below.

Sample poor initial estimates

Parameter X1 X2 X3
mean1 1.6 1.4 1.3
mean2 1.4 1.6 1.5
S 0.2 0.4 0.3
P (apriori) 0.2 0.4 0.4
tolerance 10^-5

Not only did it fail to converge after 1000 iterations at the same tolerance (10^-5), its final estimates after the hard/fixed termination are mostly incorrect.

Parameter X1 X2 X3
mean1 0.5085580 2.6059821 0.9592023
mean2 0.9145188 3.8784266 0.9630420
S 0.00100000 1.42680773 0.09210382
P (apriori) 1.539032e-17 6.231477e-01 3.768523e-01
error 0.0003108576
iterations 1000

error after 1000 iterations
Incorrect parameter estimates
Shown above are the incorrect estimates of the PDF parameters due to bad initial estimates. Notice that the 95% region of the red PDF is negligible, i.e. it is nowhere to be found in the plot above. The green PDF's estimates of the mean and variance are off, and those of the blue PDF are totally incorrect, i.e. its mean and variance encompass the red PDF.

R-script used to test EM-Algorithm


# Mixture Models -----------------------------

# set initial random number generator seed
set.seed(0)

# generate mean for three 2D Gaussian PDFs
m = array(0, c(2,3))
m[,1] = c(1,1)
m[,2] = c(3,3)
m[,3] = c(2,6)

# ... and distribution variances (co-variances), each a 2x2 matrix
S = array(0,c(2,2,3))
S[,,1] = 0.1*diag(2)
S[,,2] = 0.2*diag(2)
S[,,3] = 0.3*diag(2)

# set prior probabilities for each PDF
P = c(0.4, 0.4, 0.2)

# distribute among 500 data points and initialize random number generator seed
N = 500
sed = 0

# generate mixture model
data = mixt_model(m, S, P, N, sed)

# plot mixture and their corresponding mean centroids
plot_data(data$X, data$y, m, 1)

# EM Algorithm -----------------------------

# set initial estimate of the mean
m_ini = array(0, c(2,3))
m_ini[,1] = c(0, 2)
m_ini[,2] = c(5, 2)
m_ini[,3] = c(5, 5)

# set initial estimate of the variances
s_ini = c(0.15, 0.27, 0.4)

# set initial apriori probability estimates
Pa_ini = c(1/3, 1/3, 1/3)

# set algorithm tolerance level
e_min = 10^(-5)

# generate estimates using EM algorithm
em_alg_function(data$X, m_ini, s_ini, Pa_ini, e_min)

References

  • A. P. Dempster, N. M. Laird, D. B. Rubin, "Maximum Likelihood from Incomplete Data via the EM Algorithm", Journal of the Royal Statistical Society. Series B (Methodological), Vol. 39, No. 1. (1977), pp. 1-38.
  • C. B. Do and S. Batzoglou, "What is the expectation maximization algorithm?", Nature Biotechnology, Vol. 26 (2008), pp. 897-899
  • S. Theodoridis and K. Koutroumbas, Pattern Recognition, 4th Ed. United Kingdom: Academic Press, 2009
  • S. Theodoridis and K. Koutroumbas, Introduction to Pattern Recognition: A Matlab Approach, United Kingdom: Academic Press, 2010

Thursday, December 17, 2015

Classes Apart

Curse of Dimensionality

A usual problem associated with pattern recognition is the so-called curse of dimensionality. Increasing the number of features considered for the classification task exponentially increases the required number of samples. This is a problem because it increases the computational complexity required to maintain an acceptable classifier performance. Oftentimes there is not much gain, especially if samples can be classified based on a smaller number of features.

This activity focuses on demonstrating how class separability measures are utilized in selecting and reducing the number of features and/or otherwise measuring the performance of a classifier. It is assumed that we already have a classifier and we only need to measure its performance.

Class Separability Measures (Scatter Matrices)

In order to select features that best discriminate between classes, it is useful to be able to transform the data according to some optimality criterion and measure which features maximize classification performance. In general, this involves selecting features that maximize the between-class separation of the sample data while minimizing the within-class separation.

This is illustrated in the figure below (modified Figure 5.5 in the Pattern Recognition book).

(Pattern Recognition, p. 282)

In the above figure, samples are classified into three classes using two features. In (a), the small within-class separation can be seen in how tightly the points cluster within each of the three classes. Although visual inspection seems to indicate that the samples were classified well even with the small between-class separation, there is the possibility that an outlier from any of the three classes will be incorrectly classified. In (b), the classification is even more problematic because of the large within-class and small between-class separation. This may indicate that the features selected for the classifier are not enough or do not discriminate well. Finally, (c) is the best-case scenario: there is a large between-class separation while the within-class separation remains small. This indicates a higher-performing classifier than (a) or (b); any outlier in any of the three classes may be classed as an exception.

Scatter Matrices

The difficulty in using class separability measures is that they are not easily computed unless a Gaussian assumption is employed, i.e. the data is normally distributed or may be modeled using a Gaussian distribution. A simpler criterion is available using scatter matrices.

Within-class scatter matrix (Pattern Recognition, p. 280):

Sw = sum over classes i of Pi Si, where Si is the covariance matrix of the samples of class i.

Between-class scatter matrix (Pattern Recognition, p. 281):

Sb = sum over classes i of Pi (ui - u0)(ui - u0)^T, where u0 = sum over i of Pi ui is the global mean vector.

Here Pi ~ ni/N is the prior probability that a sample belongs to class i, ni is the number of samples of class i out of the total N, and ui is the mean vector of all samples belonging to class i. The results are matrices because of the (outer) products between the feature vectors and/or the mean vectors.

From these matrices another can be derived, the mixture scatter matrix Sm = Sw + Sb, as well as the separability measures J1 = trace{Sm}/trace{Sw}, J2 = |Sm|/|Sw|, and J3 = trace{Sw^-1 Sm} (Pattern Recognition, p. 281).
From the form of these matrices, we can see that they are computable for any number of features, classes, and samples. They are also computable when the number of samples per class is not equal, though this is generally not ideal.

The trace function is the sum of a matrix's diagonal elements, while |X| denotes the determinant of the matrix X. trace{Sb} is a measure of the average distance of the class means from the global mean. trace{Sw} is the average (over all classes) of the variance of the features. Finally, the trace of Sm is the sum of the variances of the features around their respective global means.

J1 takes large values when the samples in the l-dimensional feature space are well clustered around their mean within each class and the clusters of the different classes are well separated. Large values of J1 also correspond to large values of the criterion J2; J2 and J3 have the additional property of being invariant under linear transformations.
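
As a rough sketch of how these quantities can be computed in R from a feature matrix X (rows are samples, columns are features) and a label vector y, following the textbook definitions above (the book's converted code used for the tables below may normalize Sw differently):

# minimal sketch: scatter matrices and the J separability measures
scatter_measures = function(X, y) {
  X = as.matrix(X)                       # rows are samples, columns are features
  N = nrow(X)
  u0 = colMeans(X)                       # global mean vector
  Sw = Sb = matrix(0, ncol(X), ncol(X))
  for (cl in unique(y)) {
    Xi = X[y == cl, , drop = FALSE]
    Pi = nrow(Xi) / N                    # prior estimate ni / N
    ui = colMeans(Xi)                    # class mean vector
    Si = cov(Xi) * (nrow(Xi) - 1) / nrow(Xi)   # class covariance (ML normalization)
    Sw = Sw + Pi * Si
    Sb = Sb + Pi * tcrossprod(ui - u0)   # Pi * (ui - u0)(ui - u0)^T
  }
  Sm = Sw + Sb
  list(Sw = Sw, Sb = Sb, Sm = Sm,
       J1 = sum(diag(Sm)) / sum(diag(Sw)),
       J2 = det(Sm) / det(Sw),
       J3 = sum(diag(solve(Sw) %*% Sm)))
}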


Demonstration

We now follow this discussion with a demonstration

A. large between-class, small within-class

Characteristics
Property X1 X2 X3
mean (feature 1, 2) 2.5, 7.5 7.5, 7.5 5.0, 2.5
Standard Deviation (feature 1, 2) 0.25, 0.25 0.50, 0.50 0.75, 0.75

Sb
4.1845548 -0.1811118
-0.1811118 5.9490106

Sw
0.92267549 0.02379992
0.02379992 0.94401401

Separability measures
Measure Value
J1 6.428629
J2 40.41522
J3 12.85402
trace{Sw} 1.86669
trace{Sb} 10.13357

B. small between-class, large within-class


Characteristics
Property X1 X2 X3
mean (feature 1, 2) 3.5, 7.5 6.5, 7.5 5.0, 3.5
Standard Deviation (feature 1, 2) 0.75, 0.75 0.65, 0.65 0.80, 0.80

Sb
1.47388079 0.06511365
0.06511365 3.80642500

Sw
1.3612369 0.0512204
0.0512204 1.3673277

Separability measures
Measure Value
J1 2.935195
J2 7.884645
J3 5.868463
trace{Sw} 2.728565
trace{Sb} 5.280306

C. large between-class, small within-class


Characteristics
Property X1 X2 X3
mean (feature 1, 2) 1.5, 7.5 8.5, 7.5 5.0, 1.5
Standard Deviation (feature 1, 2) 0.50, 0.50 0.20, 0.25 0.05, 0.05

Sb
8.296655045 0.009157317
0.009157317 8.022083912

Sw
0.055378282 -0.005966965
-0.005966965 0.076863988

Separability measures
Measure Value
J1 124.4003
J2 16025.31
J3 258.3551
trace{Sw} 0.1322423
trace{Sb} 16.31874

To simplify this demonstration, we used only two features and only 100 samples for each class. The variances used in the Gaussian distributions are kept equal between the two features in each class.

As demonstrated in the three cases above, the separability measures (J1, J2, and J3) are large when there is good separation between classes (Sb) and the within-class variance (Sw) is small. Between (A) and (B) we see a drop in the measures, while the best-case scenario (C) shows significant gains.

Receiver Operating Characteristics Curve (ROC Curve)

Another useful tool for gauging the performance of a classifier, as well as for feature selection, is the Receiver Operating Characteristic (ROC) curve. In the simplest case, it measures the performance of a binary classifier (i.e. one with two output classes).

(Pattern Recognition, p. 275)
In the example above (Figure 5.3 of Pattern Recognition, 2009), there are two overlapping distributions, with the second distribution inverted for easier visualization. The vertical line separating the overlapped regions represents the classifier threshold.

The ROC curve measures the performance of the classifier by comparing the number of correctly classified samples against the number of incorrectly classified ones as the threshold is varied. Each point plots the fraction of correct classifications (probability 1-B) against the fraction of false classifications at a given threshold a. If there is minimal overlap, i.e. the classifier correctly classifies the majority of the samples, the shaded area in the ROC curve is larger and the curve approaches the corner (0, 1), i.e. no false classifications and all correct classifications. Consequently, if the classifier is poor, i.e. the regions overlap more, the area under the curve is reduced.

In case there are more than two classes being compared, the comparison is usually one-to-all, i.e. one class is compared against all of the other classes combined.

Demonstration

We generated two sets of samples with 1000 data points each, both from the Gaussian distribution, and with these characteristics

Property X1 X2
mean 0.40 0.50
Standard Deviation 0.05 0.05

Shown above is the histogram of the sample data points for the two classes, red and blue. To construct the ROC curve, we count the number of correct classifications into the blue class with respect to the threshold.

At each threshold value a, we count the total number of samples classified as blue and divide it by the number of samples per class (1000). We also count the number of samples falsely classified as blue, i.e. the total number of red samples above the threshold.

For example, at a threshold value of a = 0.5, the number of correct classifications (blue) is 513, while the number of red samples incorrectly classified as blue is 14. Dividing both by 1000 (the number of samples per class; in general the classes may have different sizes), we obtain the coordinate (0.014, 0.513) at threshold a = 0.5.
ROC curve for correct classifications into the blue class as a function of threshold a
The full ROC curve for thresholds 0.0 <= a <= 1.0 at 0.1 intervals is shown above. The curve can be smoothed by increasing the number of threshold intervals. If we increase the separation between the two classes further, we should see an improvement in the ROC curve.
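
A minimal sketch of this counting procedure in R (illustrative; the class means and standard deviations follow the table above):

# minimal ROC sketch: two 1D Gaussian classes, sweep the threshold a
set.seed(0)
blue = rnorm(1000, mean = 0.50, sd = 0.05)     # class to be detected
red  = rnorm(1000, mean = 0.40, sd = 0.05)     # competing class
a = seq(0, 1, by = 0.1)                        # thresholds

tpr = sapply(a, function(t) mean(blue > t))    # fraction of correct classifications (1 - B)
fpr = sapply(a, function(t) mean(red > t))     # fraction of red samples above threshold

plot(fpr, tpr, type = "b",
     xlab = "fraction of red above threshold",
     ylab = "fraction of blue above threshold")
abline(0, 1, lty = 2)                          # chance line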

classifier outputs with no overlaps
ROC curve for classifier outputs with no overlaps
If the classifier outputs show no overlap, as shown in the histogram above, we see a vastly improved ROC curve completely encompassing the area above the threshold line. We also increased the number of intervals for a to 100.

classifier outputs with a significant amount of overlap
If we increase the overlap further between the distributions, we see a reduced area under the ROC curve.

 References

  • S. Theodoridis and K. Koutroumbas, Pattern Recognition, 4th Ed. United Kingdom: Academic Press, 2009

Monday, December 7, 2015

Face Time

Color: Better late than never

For a blog boasting the title Colored Pixels, we are curiously bereft of colorful examples. None of the previous activities discussed here has dealt exclusively with color; instead, all of our image processing techniques thus far have been applied to greyscale images. In this activity, we will now focus our attention on manipulating colored digital images.

Color is a property of an object that is visually perceived by a human and is classified according to categories like red, blue, yellow, etc. This happens when a spectrum of light (from an illuminated object) interacts with the receptor cells in the eye.  In objects and materials, color categories are also associated with physical properties such as light absorption, reflection, or emission spectra. Human vision is considered trichromatic, i.e. a minimum of three primary colors combined in different proportions allow all (or a very wide range ~ 10 million) of possible colors to be represented. However, due to the variations in the spectral sensitivities of these receptors among humans, color perception may be a subjective process: different persons perceive the same illuminated object differently.

Colored Pixels


Unlike texture, which is a property of an area, color in digital images is a property of a pixel. In order to quantify color, color spaces are often used: colors are organized according to some criteria and represented in a graph or plot (2D or 3D), so that a specific color can be referenced as a coordinate in that representation. Often these coordinates are triples (e.g. RGB) or quadruples (e.g. CMYK). Examples of color spaces are CIELAB and CIEXYZ.

Comparison of some RGB and CMYK colour gamuts

Early attempts at using color as a metric for scientific and practical applications have struggled because captured color (film, or digital) is very UNSTABLE. It is highly dependent on the material (e.g. surface reflectivity), the environment (e.g. lighting condition), as well as the properties of the capture device (e.g. camera sensitivity).
Effects of Lighting Condition on the Captured Image

Selfie for Science

In this activity we learn the basics of face detection using color image processing techniques. 

You may have noticed while operating your digital camera how a free-moving rectangle outline manages to zoom in on a face's location and often centers and auto-focuses on that face. This is a ubiquitous application of color image processing techniques. Another application that has already made its way into the market is face recognition technology: hardware implementations have been integrated into most modern notebook computers, as well as security appliances.

We shall begin our study of face detection with a simple model of the skin. Using sample regions from the forehead and the cheek, we computed the normalized chromaticity coordinates (NCC) representation of each sample.

Normalized Chromaticity Coordinates (NCC)

This is done by normalizing two of the color channels, e.g. R and G, by the sum over all channels, I = R + G + B, to obtain the coordinate representation in the rg space. The third coordinate is optional because it can be derived from the other two (b = 1 - r - g).
Normalized chromaticity coordinates for (a) full image, (b) forehead, (c) left cheek sample
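
A minimal sketch of the rg conversion in R (assuming the image has already been loaded as a rows x cols x 3 array, e.g. with a package such as png; the function and variable names here are our own):

# minimal NCC sketch: convert an RGB array (rows x cols x 3) into r and g coordinates
ncc = function(img) {
  R = img[,,1]; G = img[,,2]; B = img[,,3]
  I = R + G + B
  I[I == 0] = 1                 # avoid division by zero for black pixels
  list(r = R / I, g = G / I)    # b = 1 - r - g is redundant
}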

As shown in the results above, the pixels corresponding to the forehead and left-cheek regions are localized in the NCC space. This region is the skin locus. With these samples we are able to detect the face in the image, either by setting the pixels that do not correspond to the skin region to zero, or by histogram backprojection.

Face detection: histogram backprojection using NCC of skin samples
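
A hedged sketch of the backprojection step in rg space (illustrative; the 32-bin histogram size and the ncc() helper above are our own choices, not the exact implementation used here):

# minimal backprojection sketch: build a 2D r-g histogram from skin samples,
# then replace every image pixel with the histogram value of its (r, g) bin
backproject = function(img_rg, skin_rg, nbins = 32) {
  # img_rg, skin_rg: lists with matrices $r and $g as returned by ncc() above
  bin = function(v) pmin(floor(v * nbins) + 1, nbins)      # map [0, 1] to bins 1..nbins
  H = table(factor(bin(skin_rg$r), levels = 1:nbins),
            factor(bin(skin_rg$g), levels = 1:nbins))
  H = H / max(H)                                           # normalize to [0, 1]
  # look up each image pixel's (r, g) bin in the skin histogram
  matrix(H[cbind(as.vector(bin(img_rg$r)), as.vector(bin(img_rg$g)))],
         nrow = nrow(img_rg$r))
}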

Skin locus under different lighting conditions

It now remains to demonstrate the effects of lighting conditions on the skin locus. If possible, we shall also determine the challenges affecting the implementation of this technique in a more real-time setting. We took a 19-second video in which we walk through an area of varying lighting conditions. Shown here are 9 still images from the video. We computed the skin locus of the forehead region and plotted it in NCC space.

Skin locus under different lighting conditions

Even under varying lighting conditions, the skin locus is remarkably robust, remaining localized in NCC space. Using the skin locus and blob detection on the color-histogram-backprojected image, it is possible to track the face (or any object with a similar locus profile). It is also possible to model the upper and lower boundaries of this locus using two polynomials. This is useful because we may not need to apply explicit auto-white balancing or sample additional frames under different lighting conditions, as we have done here. The computations are quite straightforward and easily implemented.

References

  • Rolf Kuehni (2010) Color spaces. Scholarpedia, 5(3):9606
  • Maricor Soriano, Birgitta Martinkauppi, Sami Huovinen, Mika Laaksonen, Adaptive skin color modeling using the skin locus for selecting training pixels, Pattern Recognition, Volume 36, Issue 3, March 2003, Pages 681-690, ISSN 0031-3203, http://dx.doi.org/10.1016/S0031-3203(02)00089-4
  • Soriano, M (2015), Color Features, lecture notes distributed in Physics 301 - Special Topics in Experimental Physics (Advanced Signal and Image Processing) at National Institute of Physics, University of Philippines Diliman, Quezon City on 12 September 2015.

Friday, September 18, 2015

Local Binary Patterns as Texture Descriptors

Textures

In image processing, texture refers to a set of calculated metrics designed to quantify the features of an area or a specific region. Often, these contain information describing the spatial arrangement and variation of color or intensities in an image (or in a selected part of it). Textures can be artificially created or observed in captured images of natural scenes. They can be random, stochastic, or periodic, the last especially in the case of artificially created textures. Because texture is a feature of an area or of a spatial variation, a single pixel does not have a texture.

Texture analysis is used in much image segmentation and classification research. It has many industrial applications as well, such as surface inspection, remote sensing, and biomedical image analysis.

There are two approaches when analysing textures: structured and statistical. In the structured approach, textures are considered as containing repeated occurrences of primitive texture elements called texels. The statistical approach treats textures as a quantitative measure describing the arrangement of intensities in a region. In general the statistical approach is more widely used.

Local Binary Pattern

There are many ways to statistically analyze and extract textures. This is a challenging task because real-world textures are often not uniform, due to variations in scale, orientation, or other visual features. At a minimum, texture extraction methods must be grey-scale invariant, so that uneven illumination does not cause great variability in the classification.

The local binary pattern (LBP) operator was developed in order to allow two-dimensional surface textures to be described by two complementary measures: local spatial patterns and gray scale contrast.


In the original implementation by Ojala et al. (1996), the LBP encodes the spatial variation in the 3x3 neighborhood surrounding a pixel. Each surrounding pixel has a weight corresponding to a power of 2. A threshold is applied to all the neighboring pixels: neighbors with grey values greater than that of the center are set to 1, and to 0 otherwise. The LBP value of the pixel is then the sum of the weights (powers of 2) of the neighbors that are equal to 1 after thresholding. A contrast measure can also be computed by comparing the neighbors with grey values larger than that of the center pixel against the rest. One advantage of the LBP is that it is not computationally expensive and is very simple to implement.

The LBP values then map to 256 bins (0-255) and can be used for comparison or as part of a classification process. The operator was extended by Ojala et al. in 2002 to handle different neighborhood sizes.
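
A minimal sketch of the basic 3x3 operator in R (illustrative; img is assumed to be a numeric grey-scale matrix, and the clockwise neighbour/weight ordering is just one common convention):

# minimal 3x3 LBP sketch: threshold the 8 neighbours of each interior pixel
# against the centre and sum the weights (powers of 2) of those set to 1
lbp3x3 = function(img) {
  nr = nrow(img); nc = ncol(img)
  # neighbour offsets, clockwise from the top-left, with weights 2^0 .. 2^7
  dr = c(-1, -1, -1, 0, 1, 1, 1, 0)
  dc = c(-1, 0, 1, 1, 1, 0, -1, -1)
  centre = img[2:(nr - 1), 2:(nc - 1)]
  out = matrix(0, nr - 2, nc - 2)
  for (k in 1:8) {
    neigh = img[(2 + dr[k]):(nr - 1 + dr[k]), (2 + dc[k]):(nc - 1 + dc[k])]
    out = out + 2^(k - 1) * (neigh >= centre)   # ties counted as 1 (usual convention)
  }
  out                                           # LBP codes in 0..255
}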


Textures (MIT Visual Textures database)


We selected 5 samples from each of 3 categories of textures for this analysis. These texture data sets are available at the MIT Vision Texture Database. We computed the LBP and the corresponding histograms for each of the 15 images. Here are the results.

Bark



Barks 0001 and 0002 have very similar histogram plots, as well as LBP patterns; they are almost indistinguishable. The gaps in the histograms may indicate that they are reducible. In the next section we will perform a histogram reduction based on an enhanced LBP operator.

Brick




The same thing is observed here: Brick 0000 and 0001 have very similar histogram profiles as well as nearly indistinguishable LBP patterns. Brick 0003 is just Brick 0002 enlarged at a specific part. The peaks in the histograms of 0002 and 0003 are almost correlated, except that the changes in magnitude are very apparent, especially for LBP values between 112 and 148.


Fabric







Fabric 0000 and 0001, as well as 0002 and 0003, are almost identical, except that 0001 and 0003 have different orientations compared to 0000 and 0002, respectively. The histograms of 0000 and 0001 are nearly identical and indistinguishable, while in 0002 and 0003 the changes in the frequencies are much more pronounced (between LBP values of 0 and 52). Meanwhile, Fabric 0004 is completely different from the other four.

Histogram reduction via the LBPP,Rriu2 operator

There are rotational symmetries among LBP values (or LBP neighborhood sets). These are related to the number of transitions in the binary digits of the LBP, i.e. from 0 to 1 and vice versa. Thus the pattern 00000000 has 0 transitions, while 00000001 and 11000000 each have 2 transitions. A pattern is uniform if it contains at most 2 transitions.

Extending the LBP operator further, we obtain the rotation-invariant uniform operator

LBPP,Rriu2 = (number of 1s in the pattern) if U(LBPP,R) <= 2, and P + 1 otherwise,

where U(LBPP,R) counts the number of binary transitions. The superscript riu2 in LBPP,Rriu2 indicates that only uniform patterns are uniquely labeled; all non-uniform patterns receive the single label P + 1. Thus the histogram contains at most P + 2 values (including 0): uniform patterns are labelled 0 to P, while non-uniform patterns are labelled P + 1.
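
A minimal sketch of this mapping for P = 8 (illustrative): count the circular 0/1 transitions of each basic LBP code, label uniform codes by their number of ones, and lump everything else into the single label P + 1.

# minimal riu2 sketch (P = 8): map a basic LBP code (0..255) to its
# rotation-invariant uniform label (0..9)
lbp_riu2 = function(code, P = 8) {
  bits = as.integer(intToBits(code))[1:P]        # the P-bit pattern of the code
  U = sum(bits != c(bits[-1], bits[1]))          # number of circular 0/1 transitions
  if (U <= 2) sum(bits) else P + 1               # uniform: label = number of ones
}

# the full 0..255 histogram collapses into at most P + 2 = 10 bins
table(sapply(0:255, lbp_riu2))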

Reduced histograms


Overall, the reduced histograms are easier to compare. Barks 0001 and 0002 are clearly shown to have very similar, if not identical, textures. Barks 0000 and 0003 are also comparable to some extent.


Bricks 0000 and 0001 have nearly identical histograms. Bricks 0002 and 0003 have somewhat similar histograms, i.e. some features are found in both. However, there is a pronounced dip in the LBPP,Rriu2 = 6 peak. Is this the effect of scale? Possibly: when you zoom in on or magnify a region within a rough texture, it becomes smoother.

Finally, for the reduced histograms of the fabric textures, the slight differences in the orientation of each texture did not contribute to very significant changes in the reduced histogram.

Summary


We have investigated the use of local binary patterns as texture descriptors. The LBP operator is computationally inexpensive, and we were able to demonstrate its use in classifying different classes of textures.

References

  • Matti Pietikäinen (2010) Local Binary Patterns. Scholarpedia, 5(3):9775.
  • Wikipedia contributors, Image Texture, Wikipedia, The Free Encyclopedia, https://en.wikipedia.org/wiki/Image_texture
  • Wikipedia contributors, Local binary patterns, Wikipedia, The Free Encyclopedia, https://en.wikipedia.org/wiki/Local_binary_patterns
  • Timo Ojala, Matti Pietikäinen, Multiresolution Gray-Scale and Rotation Invariant Texture Classification with Local Binary Patterns, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 24, No. 7, July 2002
  • Soriano, M (2015), Texture Features, lecture notes distributed in Physics 301 - Special Topics in Experimental Physics (Advanced Signal and Image Processing) at National Institute of Physics, University of Philippines Diliman, Quezon City on 3 September 2015.