Principal Components Analysis (PCA)
Introduction
Principal Components Analysis (PCA) is a dimensionality-reduction algorithm that projects spectra onto orthogonal components, ordered by how much of the variance in the data each one explains. In SpectraGuru, PCA helps users inspect sample separation, explained variance, and loading patterns.
How to use
- Upload data and finish preprocessing if needed.
- Open Analytics Page.
- In Select Analytics Plot, choose Principal Components Analysis (PCA).
- Use Select Horizontal PC and Select Vertical PC to choose the principal components shown on the x- and y-axes of the score plot.
- Enable Coloring by setting labels to color the points by the label data provided in Data Upload.
Behavior
SpectraGuru displays a two-dimensional PCA score plot, a cumulative explained variance plot, a loading plot for the first three components, and a table of principal component values. When labels are available and label coloring is enabled, samples are colored by their uploaded labels.
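The outputs described above can be approximated outside SpectraGuru with NumPy and scikit-learn. This sketch makes several assumptions, none of which come from SpectraGuru's source:

- The synthetic two-label spectra are hypothetical stand-ins for uploaded data.
- The PC selectors use 1-based numbers (PC1, PC2, …).
- The loading plot shows the rows of `PCA.components_` without further rescaling.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for uploaded spectra: two labelled groups of 15 samples,
# each spectrum sampled at 200 points and built from two Gaussian peaks plus noise.
rng = np.random.default_rng(0)
wavenumbers = np.linspace(400, 1800, 200)
peak_a = np.exp(-((wavenumbers - 1000) / 30) ** 2)
peak_b = np.exp(-((wavenumbers - 1400) / 40) ** 2)
labels = np.repeat(["A", "B"], 15)
weight_a = np.where(labels == "A", 1.0, 0.3)  # group A has the stronger first peak
X = weight_a[:, None] * peak_a + (1.3 - weight_a)[:, None] * peak_b
X = X + 0.02 * rng.standard_normal(X.shape)

# Standard scaling followed by PCA, keeping all components.
Z = StandardScaler().fit_transform(X)
pca = PCA()
scores = pca.fit_transform(Z)  # one row per sample, one column per PC

# Score-plot axes, assuming 1-based selections such as PC1 (x) and PC2 (y).
horizontal_pc, vertical_pc = 1, 2
x = scores[:, horizontal_pc - 1]
y = scores[:, vertical_pc - 1]

# Points grouped by label, as label coloring would group them.
groups = {lab: (x[labels == lab], y[labels == lab]) for lab in np.unique(labels)}

# Cumulative explained variance curve (one value per component).
cumulative = np.cumsum(pca.explained_variance_ratio_)

# Loading curves for the first three PCs: one row per PC, one value per spectral point.
loadings = pca.components_[:3]

print("cumulative explained variance (PC1-PC3):", cumulative[:3].round(3))
for lab, (gx, gy) in groups.items():
    print(lab, "mean PC1 score:", round(float(gx.mean()), 2))
```

In this synthetic example, the two labels differ mainly in their peak ratio, so they should separate along PC1.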
Method
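SpectraGuru standardizes each spectral variable before PCA:
\[Z=\frac{X-\mu}{\sigma}\]
Here \(X\) is the matrix of spectra, with one row per sample and one column per spectral point. \(\mu\) and \(\sigma\) are the column-wise mean and standard deviation. The displayed scores are projections of the standardized spectra onto the principal component directions:
\[T=ZW\]
The columns of \(W\) are the principal component directions (loadings), and each column of \(T\) holds the scores for one component.
| Parameter | Tunable or fixed | Implementation |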
|---|---|---|
| Select Horizontal PC | Tunable | Principal component shown on the x-axis |
| Select Vertical PC | Tunable | Principal component shown on the y-axis |
| Coloring by setting labels | Tunable | Uses label data from Data Upload when enabled |
| Scaling | Fixed | Standard scaling before PCA |
| PCA implementation | Fixed | sklearn.decomposition.PCA |
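One way to check the two formulas is to compute them by hand and compare the results with scikit-learn. This sketch makes three assumptions:

- "Standard scaling" means `sklearn.preprocessing.StandardScaler`, which uses the column mean and the population standard deviation.
- \(W\) is the transpose of `PCA.components_`.
- The random input matrix is hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical spectra: 25 samples x 120 spectral points with a sloped baseline.
rng = np.random.default_rng(42)
X = rng.normal(loc=1.0, scale=0.5, size=(25, 120)) + np.linspace(0, 3, 120)

# Z = (X - mu) / sigma, column by column (population std, ddof=0, as StandardScaler uses).
mu = X.mean(axis=0)
sigma = X.std(axis=0)
Z_manual = (X - mu) / sigma
Z = StandardScaler().fit_transform(X)

# Fit PCA on the standardized data and build W from the principal directions.
pca = PCA(n_components=5).fit(Z)
W = pca.components_.T  # 120 x 5, columns are orthonormal principal directions

# Scores: T = Z W. PCA.transform also subtracts pca.mean_, which is ~0 after scaling.
T_manual = Z @ W
T = pca.transform(Z)

print(np.allclose(Z, Z_manual), np.allclose(T, T_manual))
```

`PCA.transform` centers its input with `mean_`. Because standardized columns already have (numerically) zero mean, the library scores match \(ZW\) up to floating-point rounding.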
References
- Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., … Duchesnay, E. (2011). Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 12, 2825-2830. https://doi.org/10.48550/arXiv.1201.0490
- Scikit-learn Developers. PCA. https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html