Visualizing Principal Components for Images

[This article was first published on R – Hi! I am Nagdev, and kindly contributed to R-bloggers].


Principal Component Analysis (PCA) is a great tool for data analysis projects for a lot of reasons. If you have never heard of PCA, in simple words it applies a linear transformation to your features, derived from their covariance or correlation matrix, that produces new uncorrelated features (the principal components) ordered by how much variance they capture. I will add a few links below if you want to know more about it. Some of the applications of PCA are dimensionality reduction, feature analysis, data compression, anomaly detection, clustering and many more. The first time I learnt about PCA, I found it confusing and hard to understand. But as I started to read about its applications in research papers, I got curious and tried them all out. Now I use it as a pre-processing step in most of my projects.
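To make the "linear transformation using covariance" idea concrete, here is a minimal sketch on synthetic data (the variables x1, x2, x3 are made up for illustration and are not part of the original post). It compares an eigen-decomposition of the covariance matrix with the output of prcomp.

```r
# Toy data: 200 observations of 3 features, two of them correlated
# (synthetic, for illustration only)
set.seed(42)
n <- 200
x1 <- rnorm(n)
x2 <- 0.8 * x1 + rnorm(n, sd = 0.3)
x3 <- rnorm(n)
X <- cbind(x1, x2, x3)

# PCA "by hand": eigen-decomposition of the covariance matrix
eig <- eigen(cov(X))

# PCA with prcomp (centers the columns by default, does not scale them)
pca <- prcomp(X)

# The eigenvalues are the variances of the principal components
same_variances <- all.equal(eig$values, pca$sdev^2)

# The eigenvectors match the rotation matrix up to sign
same_directions <- all.equal(abs(eig$vectors), abs(unname(pca$rotation)))

# The scores are the centered data projected onto those directions
scores <- scale(X, center = TRUE, scale = FALSE) %*% pca$rotation
same_scores <- all.equal(unname(scores), unname(pca$x))

c(same_variances, same_directions, same_scores)
```

All three comparisons should agree up to numerical tolerance, which is one way to see that PCA is simply a rotation of the centered data onto the eigenvectors of its covariance matrix.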

I recently added this topic to my data science curriculum, as PCA has become relevant in data science today. The first time I taught it to my students, 90% of the class had a blank look on their faces. Honestly, they reflected my own reaction the first time I learnt it. Then I leaned towards demonstrative teaching rather than using slides and talking for an hour, which made the topic a lot easier to understand. I thought I would share this example on my blog to help those in need.

For this example we will use the grey scale image shown below. I will also try to keep the R code in this example as minimal as possible.

Step 1: Image processing

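Load the imager library, load the image and convert it to a row x column matrix grid.

Next, we will visualize our image using the image function. A post on Stack Overflow helped me use the image function the right way: image draws a matrix rotated 90 degrees counter-clockwise relative to how it prints, so the code rotates the matrix back before plotting.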

library(imager)

# load the image and look at the image properties
# (named img rather than image so it is not confused with the image() plotting function used later)
img = load.image("/cloud/project/bwimage.JPG")
img
# Image. Width: 282 pix Height: 220 pix Depth: 1 Colour channels: 3 

# convert image data to data frame
image_df = as.data.frame(img)

head(image_df)
# x y cc     value
# 1 1 1  1 0.9372549
# 2 2 1  1 0.9254902
# 3 3 1  1 0.9254902
# 4 4 1  1 0.9294118
# 5 5 1  1 0.9372549
# 6 6 1  1 0.9372549

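# convert image into x and y grid using matrix function
# the image has 3 colour channels, so keep only the first one (cc == 1);
# matrix rows are y (height = 220) and matrix columns are x (width = 282)
image_mat = matrix(image_df$value[image_df$cc == 1], nrow = 220, ncol = 282, byrow = TRUE)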

# visualize the image
image(t(apply(image_mat, 2, rev)), col=grey(seq(0,1,length=256)))
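The conversion above relies on the data frame listing pixels with x varying fastest, which is why byrow = TRUE is needed. A small sketch on a hypothetical 4 x 3 "image" (toy_df is made up here and only mimics that long format) shows the effect:

```r
# Hypothetical 4-wide by 3-tall "image" in long format, x varying fastest
w <- 4
h <- 3
toy_df <- expand.grid(x = 1:w, y = 1:h)
toy_df$value <- toy_df$x + 10 * toy_df$y   # each value encodes its own position

# byrow = TRUE fills each matrix row with one image row (constant y)
toy_mat <- matrix(toy_df$value, nrow = h, ncol = w, byrow = TRUE)
toy_mat

# Element [y, x] now holds the pixel at column x, row y
pixel_ok <- toy_mat[2, 3] == 3 + 10 * 2

# Without byrow = TRUE the pixels land in the wrong cells
wrong_mat <- matrix(toy_df$value, nrow = h, ncol = w)
scrambled <- !identical(wrong_mat, toy_mat)

c(pixel_ok, scrambled)
```

Encoding the position in the value makes it easy to check by eye that each row of toy_mat corresponds to one horizontal line of the image.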

Step 2: PCA analysis

The next step is to pass the matrix to the principal component function, prcomp, to compute the components we will later use for reconstruction. Scaling is very important for PCA. Since the image I used is grey scale and all pixel values are already on the same 0 to 1 range, I have not scaled the data, to keep it simple (prcomp still centers each column by default). Then we plot the variances of the principal components and see that the first 5 account for most of the variance in the data, as shown in the image below.

# pca analysis     
pca_model = prcomp(image_mat)

# plot the scree plot
plot(pca_model)
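The scree plot shows the variances visually; the same information can be read off as numbers. The sketch below is an illustration only: it uses R's built-in volcano matrix as a stand-in for image_mat (an assumption, so that the code runs without the original image file), and the names demo_mat and demo_pca are made up.

```r
# Stand-in for image_mat: the built-in volcano elevation matrix (87 x 61)
demo_mat <- volcano
demo_pca <- prcomp(demo_mat)   # centered, unscaled, as in the post

# Variance of each component as a share of the total variance
var_explained <- demo_pca$sdev^2 / sum(demo_pca$sdev^2)
cum_explained <- cumsum(var_explained)

# Cumulative share of variance captured by the first 5 components
round(cum_explained[1:5], 3)

# Smallest number of components that reaches 95% of the variance
n_needed <- which(cum_explained >= 0.95)[1]
n_needed
```

Calling summary() on the prcomp object reports the same proportions in a table, which may be more convenient than computing them by hand.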

Step 3: Reconstruction and visualization

The final step is to visualize the reconstructed image for different numbers of components. Here, we will use the first 1, 3, 5, ..., 17 components and plot the nine results on a 3 x 3 grid to visualize the PCA reconstruction.

To perform the reconstruction, we multiply the scores of the selected components (say, the first PC, pca_model$x[, 1]) by the transpose of the corresponding columns of the rotation matrix (pca_model$rotation[, 1]). This generates a matrix with the same dimensions as our image. Finally, we take this reconstructed data and plot it as an image.

To make this a little easier, I have put the reconstruction and visualization into a function. Then we loop over the component counts with lapply to visualize the reconstructed images, as shown below.

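# Reconstruction and plotting
par(mfrow= c(3,3))
recon_fun = function(comp){
  # scores times transposed loadings for the first comp components;
  # drop = FALSE keeps the matrix shape when comp = 1
  recon = pca_model$x[, 1:comp, drop = FALSE] %*% t(pca_model$rotation[, 1:comp, drop = FALSE])
  # prcomp centered each column, so add the column means back
  recon = sweep(recon, 2, pca_model$center, "+")
  image(t(apply(recon, 2, rev)), col=grey(seq(0,1,length=256)), main = paste0("Principal Components = ", comp))
}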

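# run reconstruction with the first 1, 3, 5, ..., 17 components
# (invisible() suppresses the list of NULLs that lapply would otherwise print)
invisible(lapply(seq(1,18, by = 2), recon_fun))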
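Besides looking at the pictures, the quality of a reconstruction can be measured. The following sketch is not from the original post: it again uses the built-in volcano matrix as a stand-in for image_mat, and reconstruct is a hypothetical helper. Since prcomp centers the columns by default, the helper adds the column means back so that the result is on the original scale.

```r
# Stand-in for image_mat: the built-in volcano elevation matrix (87 x 61)
demo_mat <- volcano
demo_pca <- prcomp(demo_mat)

# Rebuild the matrix from the first k components, adding back the column means
reconstruct <- function(pca, k) {
  recon <- pca$x[, 1:k, drop = FALSE] %*% t(pca$rotation[, 1:k, drop = FALSE])
  sweep(recon, 2, pca$center, "+")
}

# Root-mean-square error against the original for an increasing number of components
ks <- c(1, 3, 5, 9, 17, ncol(demo_mat))
rmse <- sapply(ks, function(k) sqrt(mean((demo_mat - reconstruct(demo_pca, k))^2)))
data.frame(components = ks, rmse = rmse)
```

The error shrinks as components are added and is essentially zero when all of them are used, which matches what the grid of images shows.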

As we see in the above image, the more components we add to the reconstruction, the clearer the image gets. In a real-world application we could store just a few components as a representation of the image and reconstruct it from them. We could also feed this reconstructed image to a neural network to enhance its quality. Now you know how dimensionality reduction works for images using PCA. This step-by-step demonstrative approach has definitely helped while teaching my class, and I wish I had been taught this way.
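As a rough back-of-the-envelope check on the storage idea, the sketch below counts how many numbers would need to be kept for the 220 x 282 image with 17 components. It assumes we store the scores, the loadings and the column means, and it ignores any file-format overhead; the variable names are made up for illustration.

```r
# Dimensions from the post and the largest number of components plotted
n_rows <- 220
n_cols <- 282
k <- 17

# Full image: one value per pixel
full_size <- n_rows * n_cols

# Truncated PCA: k score columns, k loading columns, plus the column means
compressed_size <- k * (n_rows + n_cols) + n_cols

c(full = full_size, compressed = compressed_size,
  ratio = compressed_size / full_size)
# full = 62040, compressed = 8816, ratio is roughly 0.14
```

Under these assumptions the 17-component representation needs about 14% of the numbers in the full image.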

Below are some of the best tutorials on PCA out there.

I have written a few Jupyter notebooks on applications of PCA in anomaly detection and dimensionality reduction on my GitHub page. Feel free to check them out.

Thanks for stopping by and reading this article. Feel free to comment below and share this article with your colleagues. Also, check out my other articles.

