30/08/2021

Data Cleaning - Part 3 - How to Mark and Remove Missing Data

Real-world data often has missing values. Data can have missing values for a number of reasons, such as observations that were not recorded and data corruption.

Handling missing data is important as many machine learning algorithms do not support data with missing values. 

In this tutorial, you will discover how to handle missing data for machine learning with Python.

Specifically, after completing this tutorial you will know:
  • How to mark invalid or corrupt values as missing in your dataset.
  • How to confirm that the presence of marked missing values causes problems for learning algorithms.
  • How to remove rows with missing data from your dataset and evaluate a learning algorithm on the transformed dataset.
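
A minimal sketch of both steps with pandas, assuming a hypothetical CSV file named data.csv whose columns 1 through 5 use 0 to encode invalid values:

    # mark invalid zeros as missing, then drop the affected rows
    import numpy as np
    import pandas as pd

    df = pd.read_csv("data.csv", header=None)
    df[[1, 2, 3, 4, 5]] = df[[1, 2, 3, 4, 5]].replace(0, np.nan)
    print(df.isnull().sum())   # count of marked missing values per column
    df = df.dropna()           # remove all rows that contain a missing value
    print(df.shape)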

Data Cleaning - Part 2 - Outlier Identification and Removal

Sometimes a dataset can contain extreme values that are outside the range of what is expected and unlike the other data. These are called outliers, and machine learning modeling and model skill in general can often be improved by understanding and even removing these outlier values.

After completing this tutorial, you will know:
  • That an outlier is an unlikely observation in a dataset and may have one of many causes.
  • How to use simple univariate statistics like standard deviation and interquartile range to identify and remove outliers from a data sample.
  • How to use an outlier detection model to identify and remove rows from a training dataset in order to lift predictive modeling performance.
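
A minimal sketch of the univariate interquartile-range method on a synthetic Gaussian sample (the 1.5 x IQR cut-off is the conventional choice):

    # identify and remove outliers using the interquartile range
    import numpy as np

    np.random.seed(1)
    data = 5 * np.random.randn(10000) + 50   # Gaussian sample: mean 50, sd 5
    q25, q75 = np.percentile(data, 25), np.percentile(data, 75)
    iqr = q75 - q25
    lower, upper = q25 - 1.5 * iqr, q75 + 1.5 * iqr
    outliers = data[(data < lower) | (data > upper)]
    kept = data[(data >= lower) & (data <= upper)]
    print(len(outliers), "outliers removed,", len(kept), "observations kept")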

28/08/2021

Data Cleaning - Part 1 - Basic Data Cleaning

Data cleaning is a critically important step in any machine learning project. Before jumping to sophisticated methods, there are some very basic data cleaning operations that you probably should perform on every single machine learning project. 

In this tutorial, you will discover basic data cleaning methods. After completing this tutorial, you will know:
  • How to identify and remove column variables that only have a single value.
  • How to identify and consider column variables with very few unique values. 
  • How to identify and remove rows that contain duplicate observations.
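
A minimal sketch of these operations with pandas (data.csv is again a hypothetical input file):

    # drop single-value columns and duplicate rows
    import pandas as pd

    df = pd.read_csv("data.csv")
    single_valued = [col for col in df.columns if df[col].nunique() == 1]
    df = df.drop(columns=single_valued)   # a column with one value carries no information
    print(df.duplicated().sum(), "duplicate rows found")
    df = df.drop_duplicates()             # remove rows that duplicate earlier observations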

27/08/2021

Data Preparation Without Data Leakage

In this tutorial, you will discover how to avoid data leakage during data preparation when evaluating machine learning models. 
After completing this tutorial, you will know:
  • Naive application of data preparation methods to the whole dataset results in data leakage that causes incorrect estimates of model performance.
  • Data preparation must be fit on the training set only in order to avoid data leakage.
  • How to implement data preparation without data leakage for train-test splits and k-fold cross-validation in Python.
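
A minimal sketch using a scikit-learn Pipeline, which re-fits the scaler on the training folds only (the dataset here is synthetic):

    # data preparation without leakage: scaling happens inside the cross-validation loop
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import MinMaxScaler

    X, y = make_classification(n_samples=1000, n_features=20, random_state=7)
    pipeline = Pipeline([("scaler", MinMaxScaler()), ("model", LogisticRegression())])
    # each fold fits the whole pipeline on its training split, so no test data leaks
    scores = cross_val_score(pipeline, X, y, scoring="accuracy", cv=10)
    print("mean accuracy: %.3f" % scores.mean())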

26/08/2021

Why Data Preparation is So Important

Given that we have standard implementations of highly parameterized machine learning algorithms in open source libraries, fitting models has become routine. 

As such, the most challenging part of each predictive modeling project is how to prepare the one thing that is unique to the project: the data used for modeling. 

In this tutorial, you will discover the importance of data preparation for each machine learning project. 

25/08/2021

Data Preparation in a Machine Learning Project

Data preparation may be one of the most difficult steps in any machine learning project. The reason is that each dataset is different and highly specific to the project.

After completing this tutorial, you will know:
  • Each predictive modeling project with machine learning is different, but there are common steps performed on each project. 
  • Data preparation involves best exposing the unknown underlying structure of the problem to the learning algorithms.
  • The steps before and after data preparation in a project can inform what data preparation methods to apply, or at least explore.

23/08/2021

Cross-Entropy for Machine Learning

Cross-entropy is commonly used in machine learning as a loss function. 

Cross-entropy is a measure from the field of information theory, building upon entropy and generally calculating the difference between two probability distributions.

It is closely related to, but different from, KL divergence, which calculates the relative entropy between two probability distributions, whereas cross-entropy can be thought of as calculating the total entropy between the distributions.

Cross-entropy is also related to, and often confused with, logistic loss, called log loss. Although the two measures are derived from different sources, when used as loss functions for classification models, both measures calculate the same quantity and can be used interchangeably.

In this tutorial, you will discover cross-entropy for machine learning.
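
A minimal sketch of the calculation, H(P, Q) = -sum p(x) * log(q(x)), for two made-up discrete distributions:

    # cross-entropy between a true and a predicted distribution, in nats
    from math import log

    p = [0.10, 0.40, 0.50]   # true distribution
    q = [0.80, 0.15, 0.05]   # predicted distribution
    cross_entropy = -sum(p_i * log(q_i) for p_i, q_i in zip(p, q))
    print("H(P, Q) = %.3f nats" % cross_entropy)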

Divergence Between Probability Distributions

It is often desirable to quantify the difference between probability distributions for a given variable.

This occurs frequently in machine learning, when we may be interested in calculating the difference between an actual and observed probability distribution.

This can be achieved using techniques from information theory, such as the Kullback-Leibler Divergence (KL divergence), or relative entropy, and the Jensen-Shannon Divergence that provides a normalized and symmetrical version of the KL divergence.
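
A minimal sketch of both divergences for two made-up discrete distributions:

    # KL divergence and the normalized, symmetric Jensen-Shannon divergence
    from math import log

    def kl_divergence(p, q):
        # KL(P || Q) = sum p(x) * log(p(x) / q(x))
        return sum(p_i * log(p_i / q_i) for p_i, q_i in zip(p, q))

    def js_divergence(p, q):
        # JS(P || Q) = 0.5 * KL(P || M) + 0.5 * KL(Q || M), where M averages P and Q
        m = [(p_i + q_i) / 2 for p_i, q_i in zip(p, q)]
        return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

    p = [0.10, 0.40, 0.50]
    q = [0.80, 0.15, 0.05]
    print("KL: %.3f, JS: %.3f" % (kl_divergence(p, q), js_divergence(p, q)))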

22/08/2021

Probability Density Estimation

Probability density is the relationship between observations and their probability. 

Some outcomes of a random variable will have low probability density and other outcomes will have a high probability density. 

The overall shape of the probability density is referred to as a probability distribution, and the calculation of probabilities for specific outcomes of a random variable is performed by a probability density function, or PDF for short. 

It is useful to know the probability density function for a sample of data in order to know whether a given observation is unlikely, or so unlikely as to be considered an outlier or anomaly and whether it should be removed. 
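
A minimal sketch of parametric density estimation, fitting a normal PDF to a synthetic sample:

    # estimate the distribution parameters, then query the fitted density function
    import numpy as np
    from scipy.stats import norm

    np.random.seed(1)
    sample = 5 * np.random.randn(1000) + 50   # sample with mean 50 and sd 5
    mu, sigma = sample.mean(), sample.std()   # parameter estimates from the data
    pdf = norm(mu, sigma)                     # fitted probability density function
    print("p(x=50) = %.3f, p(x=70) = %.6f" % (pdf.pdf(50), pdf.pdf(70)))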

Probability Distributions

Probability can be used for more than calculating the likelihood of one event; it can summarize the likelihood of all possible outcomes. A thing of interest in probability is called a random variable, and the relationship between each possible outcome for a random variable and its probability is called a probability distribution.
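
A minimal sketch, using a fair die as the random variable:

    # a discrete probability distribution maps each outcome to its probability
    outcomes = [1, 2, 3, 4, 5, 6]
    distribution = {x: 1 / 6 for x in outcomes}
    print(sum(distribution.values()))   # sums to 1 (up to float rounding)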

21/08/2021

Information Entropy

Information theory is a subfield of mathematics concerned with transmitting data across a noisy channel. 

A cornerstone of information theory is the idea of quantifying how much information there is in a message. 

More generally, this can be used to quantify the information in an event and a random variable, called entropy, and is calculated using probability.
 
Calculating information and entropy is a useful tool in machine learning and is used as the basis for techniques such as feature selection, building decision trees, and, more generally, fitting classification models.

As such, a machine learning practitioner requires a strong understanding and intuition for information and entropy.
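
A minimal sketch of both calculations in bits:

    # information of a single event and entropy of a random variable
    from math import log2

    p_event = 0.5
    info = -log2(p_event)                        # h(x) = -log2(p(x)) -> 1 bit
    probs = [0.2, 0.3, 0.5]
    entropy = -sum(p * log2(p) for p in probs)   # H(X) = -sum p(x) * log2(p(x))
    print("information: %.1f bits, entropy: %.3f bits" % (info, entropy))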

20/08/2021

Data Visualization

Data visualization is an important skill in applied statistics and machine learning. It can be helpful when exploring and getting to know a dataset, helping to identify patterns, corrupt data, outliers, and much more.
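
As a minimal sketch, a histogram of a synthetic sample is often the first such plot (matplotlib assumed):

    # a histogram gives a quick view of a variable's distribution
    import numpy as np
    import matplotlib.pyplot as plt

    np.random.seed(1)
    data = np.random.randn(1000)
    plt.hist(data, bins=30)
    plt.title("Histogram of a synthetic sample")
    plt.show()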

18/08/2021

Examples Of Statistics In Machine Learning

Statistics and machine learning are two very closely related fields. In fact, the line between the two can be very fuzzy at times.

It would be fair to say that statistical methods are required to effectively work through a machine learning predictive modeling project.

17/08/2021

Linear Regression

Linear regression is a method for modeling the relationship between one or more independent variables and a dependent variable. 

It is a staple of statistics and is often considered a good introductory machine learning method. 

In this tutorial, you will discover the matrix formulation of linear regression and how to solve it using direct and matrix factorization methods.
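
A minimal sketch of both approaches on a small made-up dataset: the normal equation solved directly, and NumPy's factorization-based least-squares solver:

    # linear regression as a matrix problem: solve X b = y for the coefficients b
    import numpy as np

    X = np.array([[1, 1], [1, 2], [1, 3], [1, 4]])    # a column of ones adds the intercept
    y = np.array([1.2, 1.9, 3.1, 4.2])
    b_direct = np.linalg.inv(X.T @ X) @ X.T @ y       # normal equation, direct solution
    b_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)   # SVD-based, more stable
    print(b_direct, b_lstsq)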

Singular Value Decomposition

Matrix decomposition, also known as matrix factorization, involves describing a given matrix using its constituent elements. 

Perhaps the best known and most widely used matrix decomposition method is the Singular-Value Decomposition, or SVD.

All matrices have an SVD, which makes it more stable than other methods, such as the eigendecomposition. 

As such, it is often used in a wide array of applications including compression, denoising, and data reduction.
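
A minimal sketch with NumPy, factorizing a small matrix and reconstructing it from its parts:

    # singular value decomposition: A = U @ Sigma @ V^T
    import numpy as np

    A = np.array([[1, 2], [3, 4], [5, 6]])
    U, s, VT = np.linalg.svd(A)
    Sigma = np.zeros(A.shape)
    Sigma[:2, :2] = np.diag(s)              # singular values go on the diagonal
    print(np.allclose(A, U @ Sigma @ VT))   # True: the factors rebuild A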

16/08/2021

Eigendecomposition

Matrix decompositions are a useful tool for reducing a matrix to its constituent parts in order to simplify a range of more complex operations.

Perhaps the most used type of matrix decomposition is the eigendecomposition, which decomposes a matrix into eigenvectors and eigenvalues.
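
A minimal sketch with NumPy, checking the defining property A v = lambda v:

    # eigendecomposition of a square matrix
    import numpy as np

    A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
    values, vectors = np.linalg.eig(A)
    v, lam = vectors[:, 0], values[0]    # first eigenvector and eigenvalue
    print(np.allclose(A @ v, lam * v))   # True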

Matrix Decompositions

Many complex matrix operations cannot be solved efficiently or with stability using the limited precision of computers. 

Matrix decompositions are methods that reduce a matrix into constituent parts that make it easier to calculate more complex matrix operations. 

Matrix decomposition methods, also called matrix factorization methods, are a foundation of linear algebra in computers, even for basic operations such as solving systems of linear equations, calculating the inverse, and calculating the determinant of a matrix.
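
As a minimal sketch, the LU decomposition, the factorization behind many linear-equation solvers, via SciPy:

    # LU decomposition: A = P @ L @ U (permutation, lower and upper triangular)
    import numpy as np
    from scipy.linalg import lu

    A = np.array([[2.0, 1.0, 1.0], [4.0, 3.0, 3.0], [8.0, 7.0, 9.0]])
    P, L, U = lu(A)
    print(np.allclose(A, P @ L @ U))   # True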

15/08/2021

Principal Component Analysis

An important machine learning method for dimensionality reduction is called Principal Component Analysis (PCA).

It is a method that uses simple matrix operations from linear algebra and statistics to calculate a projection of the original data into the same number or fewer dimensions.

In this tutorial, you will discover the PCA machine learning method for dimensionality reduction and how to implement it from scratch in Python.
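
A minimal sketch of the from-scratch recipe: center the data, compute the covariance matrix, take its eigendecomposition, and project:

    # PCA from scratch with basic linear algebra
    import numpy as np

    A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
    M = A - A.mean(axis=0)   # center each column
    C = np.cov(M.T)          # covariance matrix of the centered data
    values, vectors = np.linalg.eig(C)
    P = M @ vectors          # project the data onto the principal components
    print(values)            # variance explained by each component
    print(P)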

14/08/2021

Introduction to Multivariate Statistics

Fundamental statistics are useful tools in applied machine learning for better understanding your data.

They are also the tools that provide the foundation for more advanced linear algebra operations and machine learning methods, such as the Covariance Matrix and Principal Component Analysis respectively.

In this tutorial, you will discover how fundamental statistical operations work and how to implement them using NumPy.
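
A minimal sketch of these operations with NumPy on two made-up variables:

    # mean, variance, covariance, and correlation
    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
    print(x.mean(), x.var(ddof=1))   # sample mean and sample variance
    print(np.cov(x, y)[0, 1])        # covariance between x and y
    print(np.corrcoef(x, y)[0, 1])   # Pearson correlation, close to 1 here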

Tensors and Tensor Arithmetic

In deep learning it is common to see a lot of discussion around tensors as the cornerstone data structure. 

Tensor even appears in the name of Google's flagship machine learning library: TensorFlow.

Tensors are a type of data structure used in linear algebra, and like vectors and matrices, you can calculate arithmetic operations with tensors.

This tutorial is divided into 3 parts; they are:
  • What are Tensors
  • Tensors in Python
  • Tensor Arithmetic
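
A minimal sketch of a tensor and element-wise tensor arithmetic with NumPy:

    # a 3x3x3 tensor is a three-dimensional array
    import numpy as np

    T1 = np.arange(27).reshape(3, 3, 3)
    T2 = np.ones((3, 3, 3))
    print((T1 + T2).shape)   # element-wise addition keeps the shape: (3, 3, 3)
    print((T1 * T2).shape)   # element-wise (Hadamard) product, also (3, 3, 3)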

13/08/2021

Sparse Matrix

A sparse matrix is a matrix that is comprised of mostly zero values. Sparse matrices are distinct from matrices with mostly non-zero values, which are referred to as dense matrices. 

Below is an example of a small 3 × 6 sparse matrix (see the sketch below). The example has 13 zero values of the 18 elements in the matrix, giving this matrix a sparsity score of 0.722, or about 72%.
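
A minimal sketch of one such matrix (the specific values are illustrative, chosen to match the stated 13-of-18 sparsity) and a compressed sparse representation:

    # a 3 x 6 matrix with 13 zeros, its sparsity score, and its CSR form
    import numpy as np
    from scipy.sparse import csr_matrix

    A = np.array([[1, 0, 0, 1, 0, 0],
                  [0, 0, 2, 0, 0, 1],
                  [0, 0, 0, 2, 0, 0]])
    sparsity = 1.0 - np.count_nonzero(A) / A.size   # 13 / 18 = 0.722
    print("sparsity: %.3f" % sparsity)
    S = csr_matrix(A)   # stores only the non-zero values and their positions
    print(S)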

11/08/2021

Loss and Loss Functions for Training Deep Learning Neural Networks

Neural networks are trained using stochastic gradient descent and require that you choose a loss function when designing and configuring your model.

There are many loss functions to choose from and it can be challenging to know what to choose, or even what a loss function is and the role it plays when training a neural network.
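
As a minimal sketch, two of the most common choices, mean squared error for regression and cross-entropy for classification, computed by hand:

    # two standard loss functions on made-up targets and predictions
    import numpy as np

    y_true = np.array([1.0, 0.0, 1.0, 1.0])
    y_pred = np.array([0.9, 0.2, 0.8, 0.6])
    mse = np.mean((y_true - y_pred) ** 2)                 # regression loss
    bce = -np.mean(y_true * np.log(y_pred)
                   + (1 - y_true) * np.log(1 - y_pred))   # binary cross-entropy
    print("MSE: %.3f, cross-entropy: %.3f" % (mse, bce))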

09/08/2021

4 Types of Classification Tasks in Machine Learning

Examples of classification problems include:
  • Given an example, classify whether it is spam or not.
  • Given a handwritten character, classify it as one of the known characters.
  • Given recent user behavior, classify it as churn or not.

Classification requires a training dataset with many examples of inputs and outputs from which to learn.
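
A minimal sketch of that idea with scikit-learn on a synthetic binary dataset:

    # learn a classifier from many example input-output pairs
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    X, y = make_classification(n_samples=200, n_features=4, random_state=1)
    model = LogisticRegression().fit(X, y)   # learn the input-to-label mapping
    print(model.predict(X[:3]))              # predicted class labels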

14 Different Types of Learning in Machine Learning

There are 14 types of learning that you must be familiar with as a practitioner; they are:

Learning Problems

1. Supervised Learning
2. Unsupervised Learning
3. Reinforcement Learning

Hybrid Learning Problems

4. Semi-Supervised Learning
5. Self-Supervised Learning
6. Multi-Instance Learning

Statistical Inference

7. Inductive Learning
8. Deductive Inference
9. Transductive Learning

Learning Techniques

10. Multi-Task Learning
11. Active Learning
12. Online Learning
13. Transfer Learning
14. Ensemble Learning

07/08/2021

Generative Adversarial Networks (GAN)

Reinforcement Learning

Natural Language Processing

NumPy Array Broadcasting

Arrays with different sizes cannot be added, subtracted, or generally used in arithmetic. A way to overcome this is to duplicate the smaller array so that it has the same dimensionality and size as the larger array. This is called array broadcasting.
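
A minimal sketch, adding a one-dimensional array to each row of a two-dimensional array:

    # NumPy broadcasting stretches the smaller array automatically
    import numpy as np

    A = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)
    b = np.array([10, 20, 30])             # shape (3,), broadcast across both rows
    print(A + b)                           # [[11 22 33], [14 25 36]]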

06/08/2021

Evaluation Metrics for Classification

A. Binary Classification

Confusion Matrix

The confusion matrix is a table with the number of correct and incorrect predictions broken down by class.
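
A minimal sketch with scikit-learn on made-up labels:

    # rows of the confusion matrix are actual classes, columns are predictions
    from sklearn.metrics import confusion_matrix

    y_true = [1, 0, 1, 1, 0, 1, 0, 0]
    y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
    print(confusion_matrix(y_true, y_pred))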



How to Accelerate Learning of Deep Neural Networks With Batch Normalization

Batch Normalization is a technique designed to automatically standardize the inputs to a layer in a deep learning neural network.

Once implemented, batch normalization has the effect of dramatically accelerating the training process of a neural network, and in some cases improves the performance of the model via a modest regularization effect.
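
A minimal sketch of where the layer can sit in a Keras model (one common placement, between a layer and its activation):

    # batch normalization standardizes layer inputs per mini-batch
    from tensorflow.keras.layers import Activation, BatchNormalization, Dense
    from tensorflow.keras.models import Sequential

    model = Sequential([
        Dense(32, input_shape=(10,)),
        BatchNormalization(),   # standardize before the nonlinearity
        Activation("relu"),
        Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")
    model.summary()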

05/08/2021

Lenet-5 Architecture

Lenet-5 is one of the earliest pre-trained models, proposed by Yann LeCun in the year 1998.

The network has 5 layers with learnable parameters and hence is named Lenet-5. It has three sets of convolution layers with a combination of average pooling. After the convolution and average pooling layers, we have two fully connected layers. At last, a Softmax classifier classifies the images into their respective classes.
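
A minimal sketch of this layout as it is commonly rendered in Keras (the tanh activations follow the 1998 design; the original paper used an RBF output layer rather than softmax):

    # Lenet-5: three convolutions, two average pools, then fully connected layers
    from tensorflow.keras.layers import AveragePooling2D, Conv2D, Dense, Flatten
    from tensorflow.keras.models import Sequential

    model = Sequential([
        Conv2D(6, kernel_size=5, activation="tanh", input_shape=(32, 32, 1)),  # C1
        AveragePooling2D(pool_size=2),                                         # S2
        Conv2D(16, kernel_size=5, activation="tanh"),                          # C3
        AveragePooling2D(pool_size=2),                                         # S4
        Conv2D(120, kernel_size=5, activation="tanh"),                         # C5
        Flatten(),
        Dense(84, activation="tanh"),      # F6, fully connected
        Dense(10, activation="softmax"),   # softmax classifier over the classes
    ])
    model.summary()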

Convolutional Neural Networks for Machine Learning

These networks preserve the spatial structure of the problem.

CNNs are popular because people are achieving state-of-the-art results on difficult computer vision and natural language processing tasks.

Given a dataset of grayscale images with a standardized size of 32 × 32 pixels each, a traditional feedforward neural network would require 1,024 input weights (plus one bias) for each neuron in the first hidden layer, since 32 × 32 = 1,024 pixels.

03/08/2021

When to use MLP, CNN, RNN Neural Networks?

What neural network is appropriate for your predictive modeling problem?

When to Use Multilayer Perceptrons?
  • Tabular datasets
  • Classification prediction problems
  • Regression prediction problems

01/08/2021

How to choose an Activation Function for Deep Learning

Activation functions are a critical part of the design of a neural network.

The choice of activation function in the hidden layer will control how well the network model learns the training dataset. The choice of activation function in the output layer will define the type of predictions the model can make.
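
A minimal sketch of three common choices and the roles described above:

    # ReLU for hidden layers; sigmoid and softmax for output layers
    import numpy as np

    def relu(x):
        return np.maximum(0.0, x)         # typical hidden-layer activation

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))   # binary-classification output

    def softmax(x):
        e = np.exp(x - np.max(x))         # subtract the max for numerical stability
        return e / e.sum()                # multi-class output; sums to 1

    x = np.array([-2.0, 0.0, 3.0])
    print(relu(x), sigmoid(x), softmax(x))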