Splitting technique initialization in local PCA

Journal of Computer Science, Jan, 2006 by Alok Sharma, Kuldip K. Paliwal, Godfrey C. Onwubolu

Abstract: The local Principal Component Analysis (PCA) reduces linearly redundant components that may present in higher dimensional space. It deploys an initial guess technique which can be utilized when the distribution of a given multivariate data is known to the user. The problem in initialization arises when the distribution is not known. This study explores a technique that can be easily integrated in the local PCA design and is efficient even when the given statistical distribution is unknown. The initialization using this proposed splitting technique not only splits and reproduces the mean vector but also the orientation of components in the subspace domain. This would ensure that all clusters are used in the design. The proposed integration with the reconstruction distance local PCA design enables easier data processing and more accurate representation of multivariate data. A comparative approach is undertaken to demonstrate the greater effectiveness of the proposed approach in terms of percentage error.

Key words: Local PCA, hybrid distance, vector quantization, splitting technique, VQPCA

INTRODUCTION

Dimension reduction methods are employed in statistical pattern classification problem to represent higher dimensional embeddings in a lower dimensional space by eliminating or removing the redundant components that may present in multivariate data so that the data loss is minimal. The interpretation of multivariate data or feature vectors becomes quite unmanageable when the dimension size is high. This severely increases the memory/storage requirements and augments the problems in pattern classification. It then becomes essential to express and understand a given high dimensional vector onto a parsimonious data space that best describes the feature vectors. The given multivariate data can be in the form of image, sound/speech, financial data or any statistical data. The given data depend upon several characteristics; for example, in face recognition, the classification of faces depends upon the location of eyes, width and height of nose/mouth, length of eyebrows, complexion etc. All these characters constitute as one vector of a given multivariate data.

The conventional technique for dimension reduction is PCA also known as Karhunen-Loeve technique (KLT) [1]. The objective of PCA is to reduce redundant dimensional components subject to minimal loss of information. It will find a global linear transform of a given data in the feature space. The linear transform gives several basis vectors. The first basis vector will be in the direction of maximum variance of the given data. The second basis vector will be mutually orthogonal to the first basis vector. Similarly the remaining basis vectors are mutually orthogonal to the previous basis vectors and in order, maximize the remaining variances subject to the orthogonal condition. The principal axes of PCA are those orthonormal axes onto which the remaining variances under projection are maximum. These orthonormal axes are given by the dominant eigenvectors i.e. those eigenvectors that corresponds to the largest associated eigenvalues. The obtained components in reduced dimensional space are optimal in minimum mean square error (MSE) sense.

Perceiving the constraints involved in PCA, researchers have extended the basic PCA model. Oja [2] introduced a simple linear neuron model for PCA with constrained Hebbian-type modification and derived unconstrained learning rules and showed how the neuron model extracts the one dimensional principal components. Several other neural network algorithms for PCA have also been developed [3-9]. Hastie [10] introduced principal curves and surfaces as the estimates of non-linear generalizations of linear one dimensional PCA technique and Tibshirani [11] presented an alternative definition of principal curves based on a mixture model. Tipping and Bishop [12] demonstrated how principal axes of a set of observed patterns are determined through maximum likelihood estimation. Xu [13], De la Torre and Black [14] and Koren and Carnel [15] suggested robust PCA model which can perform well under the presence of outliers. A nonlinear form of PCA [16], local linear PCA [17-19] and mixture of local PCA [20] have also been developed. In the local linear PCA approach a class is partitioned into several disjoint regions by vector quantization and then performs local PCA about each cluster. This local PCA approach is further extended by utilizing hybrid distance measure also referred to as reconstruction distance [17], which has been proved to be one of the optimal techniques in terms of producing low reconstruction error. Local PCA based on hybrid distance is a replacement of Euclidean distance, which has been proved to be a better distance measurement tool in local linear approach for the cluster separability [17] (Vector quantization is one of the approaches used in removal of redundant feature vectors subject to minimal loss of information. This technique separates a set of feature vectors into small and dense regions known as Voronoi regions [21] and estimates the feature vectors in the obtained regions by their corresponding mean, referred as codewords. The number of Voronoi regions in a given class is known as the levels of the quantizer. For further information on VQ refer [22,23]). This distance criteria is derived from the MSE of the system. The hybrid-distance local PCA also known as vector quantization principal component analysis (VQPCA) for reconstruction distance deploys an initial guess technique [23]. This initial guess technique is usually applied to the data when there is at least some information about the distribution since an arbitrary initialization could lead to poor performance. This means that for a known statistical distribution (not completely) the reproductive vectors or codewords are initially defined by manually placing them in the vicinity of the data. The number of codewords previously defined will not change during the process except the location of codewords, which will be updated until the best region or cluster is found. In most of the practical cases, statistical distribution is not known to the user which augments the problem of initialization of codewords. Furthermore, in VQPCA some of the codewords if not very carefully selected, end up being isolated having no samples associated to it. This restricts the performance of VQPCA and thus the model is strongly dependent on the selection of initial codewords.


 

BNET TalkbackShare your ideas and expertise on this topic

Please add your comment:

  1. You are currently: a Guest |
  2.  

Basic HTML tags that work in comments are: bold (<b></b>), italic (<i></i>), underline (<u></u>), and hyperlink (<a href></a)

advertisement
CXO UnpluggedSmart Business interviews on BNET

See and hear how senior level executives across the Asia Pacific are developing smart business ideas across a variety of sectors. The focus is on the future, and on how businesses need to evolve.

advertisement
  • Click Here
  • Click Here
  • Click Here
advertisement
Click Here

Content provided in partnership with Thompson Gale