Note: This blog is intended for beginners in Machine Learning. It is a reasonably long post covering many concepts that you might not have heard of. The blog is textual, with no images.
It is getting difficult to work in Information Technology (IT) without an understanding of Machine Learning (ML). We have ended up in a unique situation. Five generations (Generation Z, Generation Y or Millennials, Generation X, Baby Boomers, & Traditionalists) are trying to learn machine learning, and there is a plethora of content available if we google. I often get asked what the best way to learn ML is. I don't think there is a single best way to learn. There are many paths available, and depending on their situation, people may prefer different paths. Connecting the dots across concepts is one of the toughest jobs in any learning.
Our educational system teaches us bottom up. That means learn the basics first and then build upon them. Industry might prefer top down: start applying the concept with the help of a mentor or senior, and along the way brush up your basics and fill your gaps. Others might prefer a meet-in-the-middle approach: learn the bare minimum fundamentals, apply the concepts to real-world problems, and enrich your basics as you apply them. Words like basics and bare minimum fundamentals are subjective. What is basic for some might be advanced for others.
How do we go about learning ML? To keep it simple, we will define two stages for each of the learning topics identified below: basic, and intermediate or advanced. The journey is as follows.
- Stats, math, & probability
- Linear algebra
- Machine learning - Shallow learning
- Machine learning - Deep learning
There is plenty of material available for the above topics. Much of it is free. What the right amount of information is for you depends on your end goal of learning. The end goal could be
- A salesman trying to sell a product who wants to be comfortable with ML concepts in case questions arise about how his product incorporates ML.
- A student who is trying to attend an interview and wants to impress the interviewer with his ML understanding.
- An employee working in IT trying to switch his role.
- A researcher trying to improve the field of machine learning.
- An executive who is trying to expand his business horizons.
- An architect who is trying to build solutions in the ML space.
Defining a single path of learning for these varying end goals is difficult. I would recommend taking a look at "Proficiency in Data Science Skills" by Job Role.
The materials available for learning come in different forms. Some forms of learning appeal to one set of people but not to others. For example, some prefer to read a book as opposed to attending a course. Some prefer hands-on over theory. Similar to retail, learners might prefer click and mortar or brick and mortar; in other words, you can learn online or attend college to gain insights into the field of ML. Online learning is cheaper than attending college campuses or other organizations that teach. Some prefer the physical connection over virtual meetups; some prefer free over paid.
The materials available for learning can be broadly grouped into
- Text books, eBooks, freeBooks, etc.
- Massive Open Online Course (MOOC) platforms like Coursera, edX, Udacity, Udemy, Big Data University, NPTEL, etc.
- Blogs on specific topics by individuals.
- Paid materials created by individuals that are cheaper than a few other options, such as the content by Jason Brownlee.
- University websites that provide free materials like Stanford, MIT, etc.
- Video channels like youtube.
- Non-profit educational organization like Khan Academy.
- Websites of companies like Intel, SAS, NVIDIA, IBM, MapR, Cloudera, Hortonworks, etc. sharing ML materials.
- Websites of frameworks like Apache Spark, scikit-learn, H2O.ai, etc.
- Discussion forums which bring in different perspectives, like Quora, Stack Overflow, etc.
- Competition platforms like Kaggle
The learner's experience can differ while consuming materials from these sources. The experience differs based on
- Is it free versus paid?
- Is it accessible all the time?
- Can I get a certification at the end of my learning?
- Is it specific to a programming language or framework?
- Does it provide an overall road map?
- Does it provide examples for me to try out?
- Will the experience be different while using mobile, tablet, or desktop?
- How much bandwidth does it use, provided it is online?
- How much time do you need at your disposal during the journey?
- Is it theory versus hands on?
- Can I collaborate with other learners if I am stuck?
- Can I compete to see where I stand?
The role of a mentor is more important in machine learning because of its interdisciplinary nature. You can get lost by going too deep, or by not knowing the best strategy available at your disposal. It is a trade-off between breadth and depth. I highly recommend that you find a mentor who can guide you in this journey, at least during your initial days. It is relatively easy to get lost while learning ML.
You might hear terms like Data Science, ML, Statistics, & Data Mining. It is a bit difficult to tell what exactly makes these fields different. All of them focus on how to learn from data, with varying emphasis. You can see numerous articles debating the terminologies. The comparison of terms in machine learning versus statistics by Professor Rob Tibshirani highlights how close these fields are, provided we ignore terminologies. Now you know why some call it Statistical Machine Learning. Some blogs highlight the differences using Venn diagrams. Please take a look at the article "Battle of the Data Science Venn Diagrams". The Venn diagrams are visually appealing to the eye, but are they appealing to the brain? In short, there are subtle differences. I am not sure if it is worth hunting them down.
Knowledge Nugget: Definition of Data Science. "Work that takes more programming skills than most statisticians have, and more statistics skills than a programmer has." Source: Battle of the Data Science Venn Diagrams
Stats, Math, & Probability
Now we are on our journey to learning stats, math, & probability. As I mentioned previously, we will have to divide this into basic and intermediate or advanced. I am using my experience to define basic and intermediate or advanced. Not everyone has to accept this view.
Why do we need stats, math, & probability in ML? Our ultimate goal is to learn patterns from data and use the patterns to influence business processes. Whenever you are dealing with data, you end up using stats, math, & probability for many reasons. There is no hard and fast rule as to what makes something basic versus intermediate or advanced. Let us define some rules of thumb before we dive deep.
- The data can be classified as structured and unstructured. When you are learning the basics, it is better to stick to structured data. Unstructured content could involve text, image, voice, etc. I would recommend that examples involving unstructured text, image, or voice be treated as intermediate or advanced from a learner's perspective. Fields like Natural Language Processing (NLP), Vision, & Voice use ML extensively. We will treat examples in these fields as intermediate or advanced. In my experience, analysis of structured data dominates over unstructured data at present. This trend might change in the next decade.
- Within structured data, some data sets have time correlations. For simplicity, it is better to treat the time dimension as an intermediate or advanced topic as well. Let us focus on independent observations as part of the basics. It is inappropriate to treat a time series as a random sample of independent observations.
- The data we collect is the outcome of a stochastic process or a deterministic process. The natural world includes stochasticity, and building stochastic models is considerably more complex. As part of the basics, we will assume that we deal with data obtained from a deterministic process. Incorporating stochasticity into models, including the demographic or environmental stochasticity we face in real life, will be treated as intermediate or advanced.
- We are assuming that learning the basics involves processing data on a single machine, be it a desktop, laptop, or server. In reality, certain data sets cannot be processed on a single machine; at times you need distributed machines to process the data or build the models. Use of frameworks like Apache Spark for distributed processing is common in the real world. For simplicity, we will treat such activities as part of intermediate or advanced learning.
- Analyzing data streams is common in an IoT world. For example, a sensor could be sending data every second, and you will first have to check the data for outliers prior to using it. As part of basic learning, we will assume that we process files of data and not streams of data that could constantly change.
The field of statistics can be segmented into statistical inference, statistical populations, statistical models, etc. In short, it is about the collection, analysis, interpretation, and presentation of data. Let us look at some real-world scenarios where stats can help us.
- You have a large corpus of data. You want to subset this data so that you can process it to identify patterns. You need statistical techniques to do the subsetting. If you don't do it right, you might get a subset that does not represent the population, leading to wrong patterns. Exit polls and pre-poll surveys are an example: they are performed on a subset of the population that is carefully chosen to be representative of the entire population. If the subset is chosen poorly, the outcome can also go wrong.
- You are trying to predict an outcome like a house price. Which variables really influence the outcome? It could be the size of the house, its location, the number of bedrooms and bathrooms, crime in the locality, etc. There are two types of relationships you can see here: relationships among independent variables (like the size of the house and the number of bedrooms and bathrooms), and relationships between independent variables and the dependent variable. Stats can help you compute correlation and covariance, which let you see the direction and magnitude of a relationship.
- Outliers are possible in real data sets. Certain algorithms are influenced by outliers. Detecting and removing them is a necessity prior to model building.
- It is relatively common to see missing values in data sets. There are different techniques to deal with this problem, like deleting the records, deleting the variable, imputation with the mean, median, or mode, and prediction. If you are interested in knowing more about missing values and how they are dealt with, I would recommend reading the article "Missing Value Treatment" by Selva Prabhakaran.
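As a small illustration of the imputation techniques mentioned above, here is a minimal sketch in plain Python (the house sizes are made up) of mean and median imputation:

```python
from statistics import mean, median

# Hypothetical house sizes (sq ft) with missing values recorded as None.
sizes = [1400, None, 1800, 2100, None, 1600]

observed = [v for v in sizes if v is not None]

# Impute missing entries with the mean of the observed values.
mean_imputed = [v if v is not None else mean(observed) for v in sizes]

# Median imputation is more robust when outliers are present.
median_imputed = [v if v is not None else median(observed) for v in sizes]

print(mean_imputed)    # missing entries replaced by the mean, 1725
print(median_imputed)  # missing entries replaced by the median, 1700.0
```

Deleting the two incomplete records instead would shrink the data set by a third, which is why imputation is often preferred on small samples.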
What is listed above is only a sample of the scenarios you will face. In real-life predictions, it can get much more complex than what is listed here.
The wiki page on "Outline of statistics" lists many concepts in stats. It is beyond what a basic learner needs to know. If you are a person who learns best through visuals, I would recommend "Taxonomy of Statistics". Sometimes pictures communicate information much better.
What are the basic concepts in stats, math, & probability that we need to be aware of? Below is a sample list. Ensure that you understand them and work them out as part of learning the basics. I would recommend the material "Statistics Review" from the Nicholas School of the Environment at Duke University by Elizabeth A. Albright, PhD. Another course I would recommend to learn the basics is "I "Heart" Stats: Learning to Love Statistics" on edX by Dan Myers, Professor of Sociology, University of Notre Dame.
- Knowledge of common Greek letters/symbols used in statistics. Many find it difficult to understand equations because they don't have a basic understanding of what the symbols mean. Beyond the materials listed above, I would recommend taking a look at the probability and statistics symbols from other sites. Mostly these symbols are standardized across the globe. Expect some surprises and have some willingness to accept diversity.
- Basics of exponents, logarithms, and factorials.
- Linear equation, x-intercept, y-intercept.
- Types of Statistical Data (Qualitative, Quantitative): Numerical (Discrete & Continuous), Categorical, and Ordinal
- Measures of Central Tendency: Mean, Median, Mode
- Measures of dispersion: Range, Inter Quartile Range (IQR), Variance and Standard deviation
- Measures of Position: Quartiles, Percentiles, Deciles
- Location, Spread, Skewness, Kurtosis, Outliers
- Central limit theorem, Law of Large Numbers
- Hypothesis Testing: Null hypothesis, One-Tailed Test, Two-Tailed Test, Chi-Squared, T-Score, Z-Score, P-Value
- Parametric Data Analysis: ANalysis Of VAriance (ANOVA), Regression, and Chi-Square
- Variables: Discrete random variable, Continuous random variable
- Events: Exhaustive events, Complementary events, Equally likely events, Mutually exclusive events, Independent events, Dependent events
- Probability: Marginal probability, Joint probability, Conditional probability
- Bayes’ Theorem, Chain rule, Likelihood function
- Probability Distributions: Discrete, Binomial, Poisson, Normal
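Many of the items above, especially the measures of central tendency and dispersion, can be tried out directly with Python's built-in `statistics` module. A small sketch with a made-up sample of exam scores:

```python
import statistics

# Hypothetical sample of exam scores.
scores = [62, 71, 71, 75, 80, 84, 91]

print(statistics.mean(scores))     # arithmetic mean, about 76.29
print(statistics.median(scores))   # middle value: 75
print(statistics.mode(scores))     # most frequent value: 71
print(statistics.variance(scores)) # sample variance (n - 1 in the denominator)
print(statistics.stdev(scores))    # sample standard deviation

# Range, the simplest measure of dispersion:
print(max(scores) - min(scores))   # 29
```

Working out these numbers by hand first and then checking them against the library is a good way to cement the definitions.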
You have many options when choosing a programming language to try out concepts. I would recommend R for aspiring data scientists and Python for machine learning engineers. MATLAB might be an option for university students; its adoption is not that great in the industries I have worked with. Do not underestimate the power of an Excel sheet. If you are a business analyst or a manager, Excel might be the right tool for you to learn the stats concepts. I would recommend choosing what you are comfortable with, or, if you are attending a MOOC, they might restrict the choice of programming language for easier evaluation of assignments. If you are looking for a single programming language throughout your ML journey (starting with stats and ending with deep learning), I think you should stick with Python. R is an option, but it has limited libraries around deep learning compared to Python. R is very strong in the stats, visualization, and shallow learning areas.
The ultimate goal of analyzing data is to extract information, derive knowledge out of it, and distill wisdom (DIKW). We have tools and techniques to achieve this, but the human contribution is still considerable. We have not reached a stage where we could derive information, knowledge, & wisdom without manual intervention. The information, knowledge, & wisdom derived vary from person to person even when they are given the same data, because they use different tools and techniques to get more out of it. In short, it is difficult to teach that skill; it has to come from practice. That is why some say you have to breathe data to be successful.
Knowledge Nugget: Are data scientists born or made? From what I have seen, they are made.
We touched upon the basics of stats, math, & probability. What constitutes intermediate or advanced topics? The following is a probable list. If you are new to machine learning, go through the basics of each topic prior to attempting the advanced ones. Ultimately, advanced techniques are required to solve real-world problems in a better way. Attempting the basics across topics boosts your confidence that you can do it. If you start going deeper into one subject as part of learning, there is a high chance that you will lose interest due to the complexity. So ensure that you iterate over topics as you go deeper.
- Probability distributions beyond what we learned in basics. I would recommend the blog "Common Probability Distributions: The Data Scientist’s Crib Sheet" by Sean Owen.
- Multivariate Data Analysis, Principal component analysis (PCA)
- Cumulative Distribution Function (CDF), Kernel density estimation (KDE)
- Markov Chain Monte Carlo (MCMC)
- Prior, Posterior, Conjugate Prior
- Stochastic modeling
- Time series analysis
- Kalman filtering or Linear Quadratic Estimation (LQE)
- Understanding different schools of thought like Frequentists vs Bayesians vs Likelihoodists
There are many more advanced topics in statistics than what is listed above; some are advanced even for an advanced learner. I would recommend a few books and articles that you can take a look at. Quickly explore them before you start your journey.
- Think Stats, Exploratory Data Analysis in Python
- An Introduction to Statistical Learning with Applications in R
- The Elements of Statistical Learning: Data Mining, Inference, and Prediction
- Free Must Read Books on Statistics & Mathematics for Data Science
- 19 MOOCs on Mathematics & Statistics for Data Science & Machine Learning
Knowledge Nugget: Do good statisticians work in domains closer to nature as opposed to man-made fields like manufacturing or software engineering?
The next topic on our list is linear algebra. It is about matrices and vectors. You will hear about tensors when you are dealing with deep learning (especially frameworks like TensorFlow). It is better to understand the differences between a scalar, a vector, & a tensor using a real-world example. A matrix can be used to represent them. Many problems that you see in ML can be represented in some form of matrix representation and manipulation. Many toolkits and frameworks expect the input data in this form. Data frame is another term you will hear a lot. Getting a good grip on this subject is important if you want to go deep.
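One rough way to see the difference between a scalar, a vector, a matrix, and a higher-order tensor is as nested Python lists of increasing depth. This is a sketch for intuition only; real frameworks use dedicated array types:

```python
scalar = 5                       # a single number, rank 0
vector = [5, 3, 1]               # a 1-D array, rank 1
matrix = [[1, 2], [3, 4]]        # a 2-D array, rank 2
tensor = [[[1, 2], [3, 4]],      # a 3-D array, rank 3 (e.g. a color image
          [[5, 6], [7, 8]]]      # is height x width x channels)

def rank(x):
    """Count nesting depth: 0 for a scalar, 1 for a vector, and so on."""
    return 1 + rank(x[0]) if isinstance(x, list) else 0

print(rank(scalar), rank(vector), rank(matrix), rank(tensor))  # 0 1 2 3
```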
In what way is linear algebra related to machine learning? Let us look at some scenarios to understand it.
- Representing the relationship between dependent variables and independent variables using matrices. In short, representing linear equations in the form of matrices.
- Converting a high-dimensional space to a low-dimensional space using different techniques, and the representation of the data before and after.
- Creation of word vectors in NLP for further downstream processing. Other examples include Term Document Matrix (TDM).
- Representing images in the form of vectors or tensors for downstream processing (e.g., applying filters).
- The confusion matrix: the simplest matrix you will deal with, which has only two rows and two columns for a binary classifier.
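To make the first scenario concrete, here is a hedged sketch in plain Python (the coefficients are made up) of a system of two linear equations written as the matrix equation Ax = b and solved with the 2x2 inverse formula:

```python
# System of equations:  2x + y  = 5
#                        x + 3y = 10   ->   A x = b
A = [[2, 1],
     [1, 3]]
b = [5, 10]

# The inverse of a 2x2 matrix [[a, b], [c, d]] is (1/det) * [[d, -b], [-c, a]].
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]  # 2*3 - 1*1 = 5
inv = [[ A[1][1] / det, -A[0][1] / det],
       [-A[1][0] / det,  A[0][0] / det]]

# Solve: x = A^-1 b
x = [inv[0][0] * b[0] + inv[0][1] * b[1],
     inv[1][0] * b[0] + inv[1][1] * b[1]]
print(x)  # [1.0, 3.0] -> x = 1, y = 3
```

In practice a library routine would be used instead of the explicit inverse, but the idea of packing a linear system into a matrix is exactly what ML frameworks do at scale.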
A few topics that we need to consider in this field are eigenvalues, eigenvectors, Matrix Factorization, Non-negative Matrix Factorization (NMF), Singular Value Decomposition (SVD), & Alternating Least Squares (ALS). To understand these topics and practice them, we need to go deeper into matrices and vectors. Start with Khan Academy to catch up on matrices & vectors if you are very weak in these subjects. Then read up to the seventh chapter of the book "Introduction to Linear Algebra" by Gilbert Strang. That should be sufficient to cover the basics and a little beyond. All the videos of his lectures are publicly available. This grandpa is the man to go to when it comes to linear algebra, except that he expects some basics from you before you start. The blog "Linear Algebra for Machine Learning" by Jason Brownlee might be worth glancing at. The wiki page "List of linear algebra topics" provides a table of contents of what is possible. The basics should cover the following topics.
- Types of matrices
- Matrix operations: Addition, subtraction, multiplication (scalar, vector, matrix-matrix), and division (via the inverse).
- Matrix Theorems: Matrix Addition/Multiplication Properties (Commutative, Associative), Transposition Rules, Inverse Rules
- Inverse, transpose, determinant, rank, Singular/Nonsingular
- Vector spaces and subspaces: Row space, nullspace, column space, & left nullspace
- Matrix factorization
- Eigenvalues, eigenvectors
- Singular Value Decomposition (SVD)
- Six Great Theorems of Linear Algebra: Dimension Theorem, Counting Theorem, Rank Theorem, Fundamental Theorem, SVD, & Spectral Theorem
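A minimal sketch of the matrix operations and transposition rules above, in plain Python with no libraries assumed:

```python
def transpose(M):
    """Swap rows and columns."""
    return [list(row) for row in zip(*M)]

def matmul(A, B):
    """Multiply an m x n matrix by an n x p matrix (row-by-column sums)."""
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

A = [[1, 2],
     [3, 4]]
B = [[5, 6],
     [7, 8]]

print(matmul(A, B))  # [[19, 22], [43, 50]]
# Matrix multiplication is associative but NOT commutative:
print(matmul(B, A))  # [[23, 34], [31, 46]]
# One of the transposition rules: (AB)^T equals B^T A^T.
print(transpose(matmul(A, B)) == matmul(transpose(B), transpose(A)))  # True
```

Working these properties out on 2x2 examples like this makes the abstract theorems much easier to trust.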
The advanced topics in this space could include applications of these concepts in many areas. Some examples include
- The Linear Algebra Aspects of PageRank
- Wavelet transform and Fourier transform for audio and image compression
- Broyden–Fletcher–Goldfarb–Shanno (BFGS) algorithm for solving unconstrained nonlinear optimization
- Cholesky decomposition used in the Monte Carlo method or Kalman filters
If you are really looking for applying advanced concepts in real world applications, the book "Coding the Matrix: Linear Algebra through Applications to Computer Science" by Philip N. Klein might be worth it.
Knowledge Nugget: Is the default digital or analog? For many CEOs the default is digital, but for nature the default is analog.
"A picture is worth a thousand words". Humans tend to comprehend much better when information is viewed as a chart or graph. This is very true when dealing with large amounts of data. The intuition that you can gain by representing the data as graphs or charts cannot be obtained when viewed as a columns of data frame. Understanding visualization techniques are a must for machine learning practitioners. Should we learn the topics in order or can we learn visualization in parallel to learning stats and linear algebra? It makes sense to learn visualization in parallel to learning stats and linear algebra.
Visualization is a generic word. I am talking about it in the context of the charts and graphs required to visualize data or information. Don't confuse it with infographics, PowerPoint presentations, digital marketing, etc. Those are required, but not when you are trying to take baby steps.
Many of us are familiar with English grammar, but there is something called "The Grammar of Graphics". It might be too much to start with unless your job focuses on visualizations. You will be using ggplot if you have preferred R as your programming language. The gg in the ggplot stands for grammar of graphics.
There is an art/science behind choosing the right chart or graph. The best representation I have seen is "Choosing a good chart" by Andrew V. Abela. It covers the basics we need to start our journey. If you are interested, take a look at "Slide Chooser", which gives you guidance on choosing different slide layouts. The "Slide Chooser" might not directly help us in machine learning, but I shared it because I liked it. There are courses like "Data Visualization" by John C. Hart; it might be too much for a starter. It might be worth browsing the "Visualization taxonomy" to get some additional views. "The Data Visualization Catalogue" is worth reading.
There are many libraries available for drawing charts and graphs. To some degree, the programming language you choose decides the library. The R programming language provides ggplot. If you are a Python geek, stick with Matplotlib. Seaborn, which is built on top of Matplotlib, is another option for Python programmers. Excel has been excellent at visualizations for quite a while, so if you are using Excel, stick with what it provides. For a basic learner, any of these options is fine. The capabilities of these libraries and tools differ when it comes to advanced visualizations. You can build your data pipelines in Python and use ggplot for visualizations; after all, polyglot programming is on the rise in the ML space.
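As a hedged sketch of the Matplotlib route (assuming `matplotlib` is installed; the data is randomly generated, not real), here are a histogram and a box-and-whisker plot side by side:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line for interactive use
import matplotlib.pyplot as plt
import random

random.seed(0)
data = [random.gauss(100, 15) for _ in range(500)]  # hypothetical sample

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.hist(data, bins=20, edgecolor="black")
ax1.set_title("Histogram")
ax2.boxplot(data)
ax2.set_title("Box-and-whisker plot")
fig.savefig("distribution.png")
```

The same two plots are a one-liner in Excel or Seaborn as well; the point is to reach for them whenever you first meet a new variable.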
You will have to program graphs and charts if you are using ggplot or Matplotlib. Drag-and-drop visualizations have been with us for long; Excel supports this philosophy. There are commercially available tools like Tableau, Power BI, QlikView, etc., but I think they are overkill for a learner. If you are using toolkits like Weka for machine learning, they have built-in capabilities for visualization.
2D visualizations are sufficient for a learner. There are advanced 3D visualizations, which can be part of advanced learning. What are some of the visualization techniques that we can include as part of the basics? A sample list is provided.
- Line: Line Graph, Circular Line Graph, Density Plot, Star Plot, Vector Graph
- Bar: Bar Chart, Grouped Bar Chart, Stacked Bar Chart, Circular Bar Chart
- Point: Bubble Chart, Scatter Plot, Trend Line
- Circle: Pie Chart, Sector Graph
- Distribution: Histogram, Distribution Curve, Box-And-Whisker Plot
- Heat map, Tree map
- Area: Area Chart, Overlapped Area Chart, Stacked Area Chart
What can be treated as advanced when it comes to visualizations? The simplest answer would be anything that is not covered in the basics. Zoos are not only for animals: you will see many zoos like the Visualization Zoo, Model Zoo, Neural Network Zoo, etc. I would recommend taking a look at the "Visualization Zoo" to better understand what is possible. The research paper "High-Dimensional Visualizations" is worth browsing. Some examples of advanced visualizations include
- Scatter-Plot Matrix
- Complex Tree Maps like Circular Treemap
- Visualization of high-dimensional data sets using t-Distributed Stochastic Neighbor Embedding (t-SNE)
- Time series data representation
- Spiral Plot
Knowledge Nugget: In predictions it is easier to get the direction right than to get the magnitude right. Are we close to technological singularity? When will it happen?
Machine Learning - Shallow Learning
Machine learning is a vast field with more than 200 algorithms and combinations available for you to choose from. For simplicity, I have divided the space into shallow learning and deep learning. We will be defining a learning path for both. The classifications that currently exist in ML include
- Linear Algorithms vs Non-linear Algorithms vs Ensemble Algorithms
- Five tribes of machine learning (Symbolists, Connectionists, Evolutionaries, Bayesians, & Analogizers)
- Classification vs Clustering vs Regression vs Dimension Reduction
- Unsupervised Learning (Clustering, Dimension Reduction) vs Supervised Learning (Classification, Regression) vs Reinforcement Learning
- Generative Models vs Discriminative Models
- Static ML vs Dynamic ML
- First Generation (Perceptrons, Nearest Neighbor, Naive Bayes) vs Second Generation (ANNs, Kalman, HMMs, SVMs) vs Third Generation (PGMs, Fully Bayesian)
- Transparent ML vs Black-Box ML (Mostly from reasoning perspective)
The most common classification scheme I have seen is Unsupervised Learning versus Supervised Learning. I think we will follow that to define our learning path. The visualization by Isazi Consulting is good; they have looked at the classification from a use case perspective. It is very easy to take the same diagram and map algorithms to each of the leaf nodes. Reinforcement learning will not be part of the basics; it has to be advanced.
The blog "Which machine learning algorithm should I use?" from sas has a ML Cheat Sheet. It talks about algorithms and criteria for choosing them. Scikit-learn algorithm cheat sheet is another resource I would recommend. Dlib machine learning guide is good. "Types of machine learning algorithms" is another one. Understand the classification so that you have a mental map of what all you are going to learn and where do they fit.
There are many ML concepts that you need to learn in addition to algorithms. Any ML pipeline involves four high-level steps: data preprocessing, feature engineering, model building, & model validation. The feedback loop is important in machine learning because ultimately the model has to improve over time. The blog "A Comprehensive Guide to Data Exploration" is worth exploring.
There are too many tools/frameworks available for you to learn them all. I would recommend taking a look at the "Machine Learning Periodic Table". New tools/frameworks come up frequently, and any information can get outdated if not updated periodically. This periodic table does not include TensorFlow, because it was created prior to TensorFlow's release and was not updated afterwards.
I would recommend the MOOC courses "Machine Learning" by Dr. Andrew Ng and "The Analytics Edge" by Dr. Dimitris Bertsimas. These two courses should cover most of the topics in shallow learning. The programming assignments will have to be in Octave & R respectively. The "Outline of machine learning" wiki page gives you an exhaustive list of ML tools/frameworks, methods, and use cases/applications. "Basic Concepts in Machine Learning" by Dr. Jason Brownlee is worth browsing. "Learning From Data (Introductory Machine Learning)" by Yaser S. Abu-Mostafa is another option; for some reason I felt that it went deep into theory.
What are the concepts we should be clear about as part of the basics? A sample list is as follows.
- Independent variable, dependent variable. There are other names, like experimental or predictor variable and outcome variable.
- Training data, Validation data, Test data, Cross Validation (K-Fold)
- Class imbalance and techniques to deal with it. I would recommend the blog "Learning from Imbalanced Classes" - https://svds.com/learning-imbalanced-classes/
- Outlier Detection
- Missing Value Treatment
- Confusion matrix, Accuracy, True positive rate (Recall, Sensitivity), True negative rate (Specificity), Precision, False positive rate, False negative rate, f1 Score
- Overfitting/Underfitting, Regularization (Lasso/L1, Ridge/L2), Bias–Variance tradeoff, Pruning, Generalization
- Bagging and Boosting
- Loss function, Cost function, Objective function
- Mean Squared Error (MSE), Root Mean Squared Error (RMSE), R^2
- Receiver Operator Characteristic (ROC), Area Under the Curve (AUC)
- Optimization: Gradient Descent, Stochastic Gradient Descent, Learning Rate
- One-Hot Encoding
- Measuring Similarity and Distance: Euclidean distance, Cosine distance, Manhattan distance, Hamming distance, Jaccard similarity, Cosine similarity, etc
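Several of the evaluation metrics above fall out of the confusion matrix with a few lines of arithmetic. A sketch with hypothetical counts for a binary classifier:

```python
# Hypothetical confusion-matrix counts:
# 40 true positives, 10 false positives, 5 false negatives, 45 true negatives.
tp, fp, fn, tn = 40, 10, 5, 45

accuracy    = (tp + tn) / (tp + fp + fn + tn)  # 0.85
precision   = tp / (tp + fp)   # of predicted positives, how many were right: 0.8
recall      = tp / (tp + fn)   # true positive rate (sensitivity), about 0.889
specificity = tn / (tn + fp)   # true negative rate, about 0.818
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean, about 0.842

print(accuracy, precision, round(recall, 3), round(specificity, 3), round(f1, 3))
```

Note how accuracy alone hides the asymmetry between the 10 false positives and 5 false negatives, which is exactly why the other metrics exist.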
The ML Algorithm Mind Map created by Dr. Jason Brownlee is available in the blog "Machine Learning Algorithms Mindmap". We will not be covering all of them as part of our basics. The algorithms we will be choosing as part of the basics are as follows.
- Linear, Logistic, Polynomial, Ridge, LASSO (for regression)
- k-means, Hierarchical (for clustering)
- k-NN (for classification)
- Naive Bayes (for classification)
- Decision trees (for classification) - CART, Random Forest, XGBoost
- Perceptron (for classification)
- SVM (for classification)
- AdaBoost and RankBoost (classification and ranking)
- Principal component analysis (PCA) (dimensionality reduction)
- Apriori (for association rule mining)
- Multiclass Classification: One-vs-all (for classification)
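To show how simple some of these algorithms are at their core, here is a hedged plain-Python sketch of k-NN classification (the training points are made up; in practice you would use a library like scikit-learn):

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (features, label) pairs; distance is Euclidean.
    """
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical 2-D points forming two small clusters.
train = [((1, 1), "red"), ((1, 2), "red"), ((2, 1), "red"),
         ((6, 6), "blue"), ((6, 7), "blue"), ((7, 6), "blue")]

print(knn_predict(train, (2, 2)))  # red
print(knn_predict(train, (6, 5)))  # blue
```

The distance function is swappable (Manhattan, cosine, etc. from the list above), which is why k-NN is a good first algorithm to implement by hand.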
Knowledge Nugget: Do you believe in fashion over function? If so, the current world is just right for you.
There is too much to cover in the advanced topics for shallow learning. A sample list from my experience is as follows.
- Time series analysis using Auto Regressive Integrated Moving Average (ARIMA) or other variants.
- Topic Modeling: Latent Semantic Analysis (LSA), Latent Dirichlet allocation (LDA)
- Recommender systems: Collaborative filtering, Content-based filtering, Hybrid recommender systems
- Probabilistic Graphical Models: Bayesian Network (BN), Markov Random Fields (MRF), Conditional Random Fields (CRF)
- Advanced Optimization Techniques: SGD+momentum, Adagrad, Adadelta, Adam
- Simulation Methods: Markov Chain Monte Carlo (MCMC)
- NLP Concepts: Bag Of Words (BOW), Continuous Bag Of Words (CBOW), Term Document Matrix (TDM), Term Frequency/Inverse Document Frequency (TF/IDF), Vector Space Model (VSM)
- Other topics: Expectation Maximization (EM), Maximum Likelihood (ML)
- ML on Graphs
- Gaussian Mixture Model (GMM) for voice transcription
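As one example from the NLP concepts listed above, here is a minimal sketch of TF-IDF in plain Python (this is one common variant of the weighting; the toy documents are made up):

```python
import math

# Three toy documents, already tokenized.
docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "cats and dogs".split(),
]

def tf_idf(term, doc, docs):
    """Term frequency times inverse document frequency (one common variant)."""
    tf = doc.count(term) / len(doc)              # how frequent in this document
    df = sum(1 for d in docs if term in d)       # how many documents contain it
    idf = math.log(len(docs) / df)               # rarer terms score higher
    return tf * idf

# "the" appears in two of the three documents, so its idf is low;
# "cat" appears in only one, so it scores higher for that document.
print(tf_idf("the", docs[0], docs))
print(tf_idf("cat", docs[0], docs))
```

This is the weighting behind the Term Document Matrix and Vector Space Model entries above: each document becomes a vector of such scores.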
There are dedicated MOOC courses available for each of the advanced topics. I would recommend learning advanced topics when you have a real necessity.
Knowledge Nugget: Machine learning has more than 200 algorithms. With permutations and combinations, you can make the list much bigger. Do you believe in "The Master Algorithm" replacing all of them in the next couple of decades? I don't think so.
Machine Learning - Deep Learning
Deep learning is a subfield within machine learning. What makes shallow learning different from Deep Learning (DL)? Beyond the mechanics of how things work, there are two differences: the ability to learn in layers, and the ability to learn high-level features directly from data (avoiding handcrafted features). The "Periodic Table of Deep Learning Patterns" gives a glimpse of what is possible. The diagram available at the Neural Network Zoo gives the spectrum of possibilities. Even the basics of DL can be tough to understand. The best way to learn it is by practicing.
DL seems to outperform shallow learning on unstructured content (text, voice, & images). We mostly tend to use shallow learning for structured data. That does not mean we cannot use DL for structured data; there are many examples of DL used for stock price prediction and other areas based on historical facts. Shallow learning can also be applied to unstructured content. The use of a Gaussian Mixture Model (GMM) for voice transcription is one such example.
There are many frameworks available for you to use, including Caffe, Theano, Torch, Keras, TensorFlow, MXNet, BigDL, CNTK, etc. TensorFlow (Python based) is becoming the most popular. You cannot ignore Caffe for its speed, mainly in Convolutional Neural Networks (CNNs). Other frameworks are equally good. You will have to make a choice before you start. I would recommend starting with TensorFlow and learning others if needed.
There are a couple of free online books available. "Neural Networks and Deep Learning" by Michael Nielsen is a good start. "Deep Learning" by Ian Goodfellow, Yoshua Bengio, and Aaron Courville is a classic. You will start to appreciate Greek symbols and linear algebra when you start reading these books.
There are many DL courses available as MOOCs. I would recommend taking a look at the blog "Deep Learning Courses" by Dr. Jason Brownlee. I will define a different path. First, take a look at "Deep Learning 101" from Big Data University; it covers the basics. Once you finish that, attempt "Practical Deep Learning For Coders" by Jeremy Howard; you will be using Theano as part of this course. Third, attempt "Neural Networks for Machine Learning" by Geoffrey Hinton.
There are additional materials on specific topics on the websites of Intel, Nvidia, etc. Use them to supplement your learning. There are good blogs on specific topics by researchers like Christopher Olah, Denny Britz, Andrej Karpathy, Adit Deshpande, etc. DL is constantly changing, and you can quickly fall behind if you don't keep track of what is happening. Research papers are the best way to find out what is new.
Most of what you are going to learn works on a CPU for small data. DL works on large amounts of data and might require a GPU based on what you are trying to do. If you are willing to spend money, a GPU in the cloud is an option. Assembling your own GPU machine is another option; there are plenty of blogs with suggestions on how to build one.
"Deep Learning Glossary", "Deep Learning and Neural Network Glossary" are good resources to familiarize with terms/concepts that you are going to hear. The blog "Deep Learning in a Nutshell: Core Concepts" by Tim Dettmers is good. Some of the concepts that you will need to understand in DL are
- Perceptron, Node, Neuron, Layers (Input, Hidden, Output)
- Activation functions (ReLU, tanh, sigmoid, and many more)
- Channel (for images)
- Pooling (Max, Min, Average)
- Optimization (SGD, RMSProp, Adadelta, Momentum, Adagrad, Adam)
- Backpropagation
- Vanishing gradient problem
- Dropout, Cross-entropy cost function, Regularization methods (L1 and L2 regularization)
- Fully-Connected (FC) layer
- Encoder, Decoder
- Normalization, Softmax
- Iteration, Epoch, Learning Rate, Weight initialization
- Vectors, Tensors
- Word Embedding
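Many of these concepts come together in a single neuron. The sketch below (with hypothetical inputs and weights) shows a weighted sum plus bias passed through the ReLU and sigmoid activation functions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    return np.maximum(0.0, z)

# One neuron: weighted sum of inputs plus a bias, then an activation
x = np.array([0.5, -1.0, 2.0])   # inputs (hypothetical)
w = np.array([0.4, 0.3, 0.1])    # weights (hypothetical)
b = 0.1                          # bias

z = np.dot(w, x) + b             # 0.2 - 0.3 + 0.2 + 0.1 = 0.2
print(relu(z), sigmoid(z))
```

During training, backpropagation adjusts `w` and `b` using the gradient of a cost function; the choice of activation matters because its derivative feeds directly into that gradient (this is where the vanishing gradient problem comes from).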
What are some of the deep learning network topologies that we can learn as part of the basics? A sample list is as follows.
- Multi-Layer Perceptron (MLP)
- Convolutional Neural Network (CNN)
- Autoencoders: Denoising Autoencoders, Stacked Denoising Autoencoders
- Restricted Boltzmann Machines (RBM)
- Recurrent Neural Network (RNN), Long Short Term Memory (LSTM)
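As a minimal sketch of the simplest topology above, here is a forward pass through a tiny Multi-Layer Perceptron in NumPy, with randomly initialized (untrained) weights and a softmax output:

```python
import numpy as np

rng = np.random.default_rng(0)

# Layer sizes (hypothetical): 3 inputs -> 4 hidden units -> 2 outputs
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)

def forward(x):
    h = np.tanh(W1 @ x + b1)            # hidden layer with tanh activation
    logits = W2 @ h + b2                # fully-connected output layer
    e = np.exp(logits - logits.max())   # softmax (shifted for stability)
    return e / e.sum()

probs = forward(np.array([0.2, -0.5, 1.0]))
print(probs, probs.sum())  # the two class probabilities sum to 1
```

A framework like TensorFlow or Keras handles exactly this wiring for you, plus the backward pass; writing the forward pass once by hand makes the "layers of weighted sums and activations" picture concrete.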
Advanced topics in deep learning could include the following.
- Object recognition: AlexNet, VGG, ResNet, GoogLeNet, Inception. Take a look at the blog "Neural Network Architectures"
- Object detection: Faster R-CNN, SSD, YOLO
- Object tracking
- Transfer Learning
- Generative Adversarial Networks (GANs) and their variants
- Deep Belief Networks (DBN)
- Deep Reinforcement Learning
- Character-level Convolutional Networks for Text Classification
Most of what I have listed is from my experience or from my readings. I might have missed some topics. Please add them in the comments section for others to learn from your experience.
How long will it take to learn all these topics? If you are willing to spend 10 hours per week, you are looking at a 1 to 2 year journey to cover the basics and try them out. The overall duration depends on your existing knowledge and programming skills. Having a mentor can make the journey much smoother and shorter. Intuition matters more than theoretical knowledge; it comes from working through many examples and learning from others. New techniques are constantly emerging, so I would recommend subscribing to blogs to follow the latest developments in the field. There is quite a lot of hype in this field; I think it will subside over time. We are still early in the journey of building cognitive architectures, so it is better to keep track of what is happening so that you don't get completely lost. Job shifts are bound to happen in many fields; it is better to be prepared for them.