101 Machine Learning Algorithms for Data Science with Cheat Sheets

(This article was first published on R Programming - Data Science Blog | AI, ML, big data analytics, and kindly contributed to R-bloggers)

Think of this as a one-stop directory for your machine learning algorithms. The algorithms have been sorted into 9 groups: Anomaly Detection, Association Rule Learning, Classification, Clustering, Dimensionality Reduction, Ensemble, Neural Networks, Regression, and Regularization. In this post, you’ll find 101 machine learning algorithms, along with infographics (where available) to help you know when to use each one.

101 Machine Learning Algorithms

Each of the accordion drop-downs is embeddable if you want to take it with you. All you have to do is click the little ‘embed’ button in the lower left-hand corner and copy/paste the iframe. All we ask is that you link back to this post.

By the way, if you have trouble with Medium/TDS, just throw your browser into incognito mode.

Classification Algorithms

Any of these classification algorithms can be used to build a model that predicts the outcome class for a given dataset. The datasets can come from a variety of domains. Depending upon the dimensionality of the dataset, the attribute types, sparsity, and missing values, etc., one algorithm might give better predictive accuracy than most others. Let’s briefly discuss these algorithms. (18)
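To make "predicting the outcome class" concrete, here is a minimal sketch of one classification algorithm, k-nearest neighbors, written in plain Python. The toy points, the labels, and the function name `knn_predict` are made up for illustration; in practice you would likely reach for a library implementation instead.

```python
from collections import Counter
import math

def knn_predict(train_points, train_labels, query, k=3):
    """Predict the class of `query` by majority vote among its k nearest training points."""
    # Pair each training point's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    # Keep the labels of the k closest points and return the most common one.
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Toy data (hypothetical): two features per point, two classes.
train_points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.8), (4.9, 5.1)]
train_labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(train_points, train_labels, (1.1, 1.0)))
print(knn_predict(train_points, train_labels, (5.1, 5.0)))
```

The choice of `k` and of the distance measure are assumptions here; both usually need tuning for a real dataset.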

Regression Analysis

Regression Analysis is a statistical method for examining the relationship between two or more variables. There are many different types of Regression analysis, of which a few algorithms can be found below. (20)
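As a small illustration of examining the relationship between two variables, the sketch below fits a straight line by ordinary least squares, which is only one of the many regression types. The data and the helper name `fit_line` are hypothetical.

```python
def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Toy data (hypothetical): y grows by roughly 2 for each unit of x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]

slope, intercept = fit_line(xs, ys)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```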

Neural Networks

A neural network is a computational model loosely inspired by the human brain. These systems learn tasks from examples rather than from explicitly programmed rules. (11)
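The idea of learning from examples can be sketched with a single perceptron, the simplest building block of a neural network. It is far simpler than the architectures in this group. In this hypothetical example the perceptron is never told the rule for logical AND; it only sees inputs and the desired outputs.

```python
def predict(weights, bias, features):
    """Step activation: output 1 if the weighted sum is positive, else 0."""
    activation = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if activation > 0 else 0

def train_perceptron(samples, targets, epochs=20, learning_rate=1):
    """Adjust weights after each misclassified example (perceptron learning rule)."""
    weights = [0] * len(samples[0])
    bias = 0
    for _ in range(epochs):
        for features, target in zip(samples, targets):
            error = target - predict(weights, bias, features)
            # No change when the prediction is already correct (error == 0).
            weights = [w + learning_rate * error * x for w, x in zip(weights, features)]
            bias += learning_rate * error
    return weights, bias

# Examples of logical AND: inputs and the output we want.
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
targets = [0, 0, 0, 1]

weights, bias = train_perceptron(samples, targets)
print([predict(weights, bias, s) for s in samples])
```

A single perceptron can only learn linearly separable tasks such as AND, which is why practical networks stack many units in layers.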

Anomaly Detection

Also known as outlier detection, anomaly detection is used to find rare occurrences or suspicious events in your data. The outliers typically point to a problem or rare event. (5)
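One simple way to flag rare values is a z-score rule: mark anything that lies more than a chosen number of standard deviations from the mean. The sketch below assumes roughly bell-shaped data and a threshold of 2. Both are illustrative choices, and the readings are made up.

```python
import statistics

def zscore_outliers(values, threshold=2.0):
    """Return the values whose absolute z-score exceeds `threshold`."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return []  # all values are identical, so nothing stands out
    return [v for v in values if abs(v - mean) / std > threshold]

# Toy sensor readings (hypothetical) with one suspicious spike.
readings = [10, 11, 9, 10, 12, 10, 11, 50]
print(zscore_outliers(readings))
```

Because the outlier itself inflates the mean and standard deviation, median-based rules are often preferred on small or heavily contaminated samples.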

Dimensionality Reduction

With some problems, especially classification, there can be so many variables, or features, that it is difficult to visualize your data. Correlation amongst your features creates redundancies, and that’s where dimensionality reduction comes in. Dimensionality Reduction reduces the number of random variables you’re working with. (17)
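To see how correlated features create redundancy, the sketch below takes two strongly correlated features and replaces them with a single coordinate along their first principal component. It uses the closed-form angle for a 2x2 covariance matrix, so it only handles the two-feature case. The data and function name are hypothetical.

```python
import math

def first_principal_component_2d(points):
    """Project 2-D points onto their direction of greatest variance."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]
    # Entries of the 2x2 covariance matrix.
    var_x = sum(cx * cx for cx, _ in centered) / n
    var_y = sum(cy * cy for _, cy in centered) / n
    cov_xy = sum(cx * cy for cx, cy in centered) / n
    # Angle of the eigenvector that belongs to the larger eigenvalue.
    theta = 0.5 * math.atan2(2 * cov_xy, var_x - var_y)
    direction = (math.cos(theta), math.sin(theta))
    # One number per point instead of two.
    projected = [cx * direction[0] + cy * direction[1] for cx, cy in centered]
    return direction, projected

# Toy data (hypothetical): the second feature is roughly twice the first.
points = [(2.0, 4.1), (3.0, 5.9), (4.0, 8.2), (5.0, 9.8)]
direction, projected = first_principal_component_2d(points)
print(direction)
print(projected)
```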

Ensemble

Ensemble learning methods are meta-algorithms that combine several machine learning methods into a single predictive model to increase the overall performance. (11)
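A minimal sketch of the idea is majority voting: several weak rules each cast a vote and the ensemble returns the most common answer. The three threshold rules below are hypothetical stand-ins for trained models.

```python
from collections import Counter

def rule_first_feature(sample):
    """Weak rule: look only at the first feature."""
    return 1 if sample[0] > 0.5 else 0

def rule_second_feature(sample):
    """Weak rule: look only at the second feature."""
    return 1 if sample[1] > 0.5 else 0

def rule_sum(sample):
    """Weak rule: look at the sum of both features."""
    return 1 if sample[0] + sample[1] > 1.0 else 0

def majority_vote(classifiers, sample):
    """Combine the individual predictions into one by taking the most common vote."""
    votes = [classify(sample) for classify in classifiers]
    return Counter(votes).most_common(1)[0][0]

classifiers = [rule_first_feature, rule_second_feature, rule_sum]
print(majority_vote(classifiers, (0.9, 0.2)))
print(majority_vote(classifiers, (0.3, 0.6)))
```

Using an odd number of voters avoids ties in the two-class case; bagging and boosting methods build on the same combining idea with trained models.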

Clustering

In supervised learning, we know the labels of the data points and their distribution. However, the labels may not always be known. Clustering is the practice of assigning labels to unlabeled data using the patterns that exist in it. Clustering can be either semi-parametric or probabilistic. (14)
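As a sketch of assigning labels to unlabeled data, here is a bare-bones k-means loop. The points are made up, and the starting centroids are hand-picked for the example rather than chosen by any recommended initialization scheme.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and re-centering."""
    centroids = list(centroids)
    labels = []
    for _ in range(iterations):
        # Assignment step: index of the closest centroid for every point.
        labels = [
            min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for i in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids

# Toy data (hypothetical): two visibly separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.0), (8.5, 9.5)]
labels, centroids = kmeans(points, centroids=[points[0], points[3]])
print(labels)
print(centroids)
```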

Association Rule Analysis

Association rule analysis is a technique to uncover how items are associated with each other. (2)
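How strongly two items are associated is usually summarized with support, confidence, and lift. The sketch below computes all three for one candidate rule on a handful of made-up shopping baskets; it does not search for rules the way algorithms such as Apriori do.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in transactions) / len(transactions)

def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    both = support(transactions, set(antecedent) | set(consequent))
    confidence = both / support(transactions, antecedent)
    lift = confidence / support(transactions, consequent)
    return both, confidence, lift

# Toy baskets (hypothetical).
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread"},
    {"milk"},
    {"butter", "milk"},
]

sup, conf, lift = rule_metrics(transactions, {"bread"}, {"butter"})
print(f"support={sup:.2f}, confidence={conf:.2f}, lift={lift:.2f}")
```

A lift above 1 suggests the items appear together more often than they would if they were independent.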

Regularization

Regularization is used to prevent overfitting. Overfitting means that a machine learning algorithm has fit the training data so closely that it achieves high accuracy on that data but does not perform well on unseen data. (3)
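A minimal sketch of the idea, assuming a one-feature ridge (L2) penalty: adding a penalty `lam` to the denominator of the least-squares slope shrinks the slope toward zero, so the model reacts less strongly to the training data. The data, the name `ridge_fit`, and the penalty values are illustrative.

```python
def ridge_fit(xs, ys, lam):
    """Fit y = intercept + slope * x while penalizing lam * slope**2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    # With lam = 0 this is ordinary least squares; larger lam shrinks the slope.
    slope = sxy / (sxx + lam)
    intercept = mean_y - slope * mean_x  # the intercept is not penalized
    return slope, intercept

# Toy data (hypothetical).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]

for lam in (0.0, 1.0, 10.0):
    slope, intercept = ridge_fit(xs, ys, lam)
    print(f"lam={lam}: slope={slope:.3f}, intercept={intercept:.3f}")
```

In practice the penalty strength is typically chosen by checking performance on held-out data.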

Scikit-Learn Algorithm Cheat Sheet

First and foremost is the Scikit-Learn cheat sheet. If you click the image, you’ll be taken to the same graphic except it will be interactive. We suggest saving this site as it makes remembering the algorithms, and when best to use them, incredibly simple and easy.

SAS: The Machine Learning Algorithm Cheat Sheet

You can also find many of the same algorithms on SAS’s machine learning cheat sheet. The SAS website (click the pic) also gives great descriptions of how, when, and why to use each algorithm.

Microsoft Azure Machine Learning: Algorithm Cheat Sheet

Microsoft Azure’s cheat sheet is the simplest cheat sheet by far. Even though it is simple, Microsoft was still able to pack a ton of information into it. Microsoft also made its algorithm sheet available to download.

There you have it, 101 machine learning algorithms with cheat sheets, descriptions, and tutorials! We hope you are able to make good use of this list. If there are any algorithms that you think should be added, go ahead and leave a comment with the algorithm and a link to a tutorial. Thanks!
