20/10/2021

Non Maximum Suppression: Theory and Implementation in PyTorch


Non Maximum Suppression (NMS) is a technique used in numerous computer vision tasks. It is a class of algorithms for selecting one entity (e.g., one bounding box) out of many overlapping entities. We can choose the selection criteria to arrive at the desired results. The criteria are most commonly some form of probability score and some form of overlap measure (e.g., Intersection over Union).
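The greedy algorithm behind NMS can be sketched in a few lines of plain Python: keep the highest-scoring box, discard every remaining box whose Intersection over Union with it exceeds a threshold, and repeat. This is a minimal illustrative sketch (the function names are my own); in practice you would use an optimized implementation such as torchvision's NMS operator.

```python
def iou(box_a, box_b):
    # boxes given as (x1, y1, x2, y2) corner coordinates
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    # process box indices in order of descending confidence score
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # suppress remaining boxes that overlap the kept box too much
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # the second box overlaps the first and is suppressed
```

The probability score drives the selection order, and the IoU threshold controls how aggressively overlapping boxes are suppressed.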

17/10/2021

Object Detection Metrics With Worked Example

Average Precision (AP) and mean Average Precision (mAP) are the most popular metrics used to evaluate object detection models such as Faster R-CNN, Mask R-CNN, and YOLO, among others. The same metrics have also been used to evaluate submissions in competitions such as the COCO and PASCAL VOC challenges.
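As a rough sketch of how AP is computed: given a precision-recall curve, the precision values are first made monotonically decreasing, and AP is then the area under the resulting curve (this is the all-point interpolation used by COCO-style evaluation; the function below and its inputs are illustrative).

```python
def average_precision(recalls, precisions):
    # recalls assumed sorted ascending, paired with raw precision values
    # pad the curve so it spans recall 0 to recall 1
    mrec = [0.0] + list(recalls) + [1.0]
    mpre = [0.0] + list(precisions) + [0.0]
    # make precision monotonically decreasing (right-to-left running max)
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    # sum rectangle areas wherever recall increases
    ap = 0.0
    for i in range(1, len(mrec)):
        ap += (mrec[i] - mrec[i - 1]) * mpre[i]
    return ap

# worked example: precision 1.0 up to recall 0.5, then precision 0.5 at recall 1.0
print(average_precision([0.5, 1.0], [1.0, 0.5]))  # 0.5*1.0 + 0.5*0.5 = 0.75
```

mAP is then simply this AP averaged over all object classes (and, in COCO, over several IoU thresholds as well).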

13/10/2021

40 Must know Questions to test a data scientist on Dimensionality Reduction techniques

Q1) Imagine you have 1000 input features and 1 target feature in a machine learning problem. You have to select the 100 most important features based on the relationship between the input features and the target feature.

Do you think this is an example of dimensionality reduction?
A. Yes
B. No

Dimensionality Reduction - Part 5 - Practical Approach to Dimensionality Reduction Using PCA, LDA and Kernel PCA

Dimensionality reduction is an important approach in machine learning. A large number of features in a dataset may result in overfitting of the learning model. To identify the set of significant features and to reduce the dimension of the dataset, three popular dimensionality reduction techniques are commonly used.

In this article, we will discuss the practical implementation of these three dimensionality reduction techniques, and then compare them:
  • Principal Component Analysis (PCA)
  • Linear Discriminant Analysis (LDA), and
  • Kernel PCA (KPCA)
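The key practical difference between the three can be seen in a few lines of scikit-learn (assuming scikit-learn is installed; the iris dataset here is just a convenient example): PCA is unsupervised, LDA is supervised and needs the class labels, and Kernel PCA applies PCA in an implicit feature space via the kernel trick.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA, KernelPCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)  # 150 samples, 4 features, 3 classes

# PCA: unsupervised, keeps the directions of maximum variance
X_pca = PCA(n_components=2).fit_transform(X)

# LDA: supervised, keeps the directions that best separate the classes
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)

# Kernel PCA: nonlinear PCA via an RBF kernel
X_kpca = KernelPCA(n_components=2, kernel="rbf").fit_transform(X)

print(X_pca.shape, X_lda.shape, X_kpca.shape)  # each (150, 2)
```

All three map the 4-dimensional data down to 2 dimensions, but the projections they choose reflect their different objectives.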

12/10/2021

Dimensionality Reduction - Part 4 - How to Perform SVD Dimensionality Reduction

One of the most popular techniques for dimensionality reduction in machine learning is Singular Value Decomposition (SVD). This is a technique that comes from the field of linear algebra and can be used as a data preparation technique to create a projection of a sparse dataset prior to fitting a model.

In this tutorial, you will discover how to use SVD for dimensionality reduction when developing predictive models. After completing this tutorial, you will know:
  • SVD is a technique from linear algebra that can be used to automatically perform dimensionality reduction.
  • How to evaluate predictive models that use an SVD projection as input and make predictions with new raw data.
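A common way to use an SVD projection as model input is to chain it with the model in a scikit-learn Pipeline, so new raw data is projected the same way at prediction time (a sketch assuming scikit-learn; the synthetic dataset and component count are arbitrary):

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import TruncatedSVD
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# synthetic dataset standing in for real data
X, y = make_classification(n_samples=200, n_features=20,
                           n_informative=5, random_state=7)

# the pipeline applies the SVD projection before the model,
# both at fit time and at predict time on new raw rows
model = Pipeline([
    ("svd", TruncatedSVD(n_components=10, random_state=7)),
    ("lr", LogisticRegression()),
])
model.fit(X, y)
yhat = model.predict(X[:1])  # raw 20-feature row in, prediction out
```

TruncatedSVD is used here rather than a full SVD because it works directly on sparse matrices and lets you choose the number of retained components.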

Dimensionality Reduction - Part 3 - How to Perform PCA Dimensionality Reduction

The most popular technique for dimensionality reduction in machine learning is Principal Component Analysis (PCA). This is a technique that comes from the field of linear algebra and can be used as a data preparation technique to create a projection of a dataset prior to fitting a model. 

In this tutorial, you will discover how to use PCA for dimensionality reduction when developing predictive models. After completing this tutorial, you will know:
  • PCA is a technique from linear algebra that can be used to automatically perform dimensionality reduction.
  • How to evaluate predictive models that use a PCA projection as input and make predictions with new raw data.
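The same pipeline pattern applies to PCA, with the added benefit that the fitted PCA object reports how much variance the kept components retain (a sketch assuming scikit-learn; dataset and component count are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

X, y = make_classification(n_samples=300, n_features=15,
                           n_informative=5, random_state=1)

pipe = Pipeline([("pca", PCA(n_components=5)), ("lr", LogisticRegression())])
pipe.fit(X, y)

# fraction of the dataset's variance captured by the 5 kept components
retained = pipe.named_steps["pca"].explained_variance_ratio_.sum()

yhat = pipe.predict(X[:3])  # new raw rows go through the same projection
```

Sweeping n_components and cross-validating the whole pipeline is the usual way to pick the number of components for a predictive model.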

11/10/2021

Dimensionality Reduction - Part 2 - How to Perform LDA Dimensionality Reduction

Linear Discriminant Analysis (LDA) is a predictive modeling algorithm for multiclass classification.

It can also be used as a dimensionality reduction technique, providing a projection of a training dataset that best separates the examples by their assigned class.

The ability to use Linear Discriminant Analysis for dimensionality reduction often surprises practitioners.
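Used as a reducer, LDA projects the data onto the directions that best separate the classes. One constraint worth remembering: it can produce at most n_classes - 1 components. A minimal sketch, assuming scikit-learn and a synthetic 3-class dataset:

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# 3-class problem: LDA can yield at most n_classes - 1 = 2 components
X, y = make_classification(n_samples=200, n_features=10,
                           n_informative=5, n_classes=3, random_state=3)

lda = LinearDiscriminantAnalysis(n_components=2)
X_reduced = lda.fit_transform(X, y)  # note: y is required, unlike PCA
print(X_reduced.shape)  # (200, 2)
```

Because the projection uses the class labels, it must be fitted on training data only and then applied to test data with transform().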

10/10/2021

Dimensionality Reduction - Part 1 - What is Dimensionality Reduction?

The number of input variables or features for a dataset is referred to as its dimensionality. Dimensionality reduction refers to techniques that reduce the number of input variables in a dataset. More input features often make a predictive modeling task more challenging, a problem generally referred to as the curse of dimensionality.

High-dimensional statistics and dimensionality reduction techniques are often used for data visualization. Nevertheless, these techniques can be used in applied machine learning to simplify a classification or regression dataset in order to better fit a predictive model.

In this tutorial, you will discover a gentle introduction to dimensionality reduction for machine learning.

Advanced Transform - Part 3 - How to Save and Load Data Transforms

It is critical that any data preparation performed on a training dataset is also performed on a new dataset in the future. This may include a test dataset when evaluating a model or new data from the domain when using a model to make predictions. 

Typically, the model fit on the training dataset is saved for later use. The correct solution to preparing new data for the model in the future is to also save any data preparation objects, like data scaling methods, to file along with the model.

In this tutorial, you will discover how to save a model and data preparation object to file for later use. After completing this tutorial, you will know:
  • The challenge of correctly preparing test data and new data for a machine learning model.
  • The solution of saving the model and data preparation objects to file for later use.
  • How to save and later load and use a machine learning model and data preparation model on new data.
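One simple way to do this is to pickle the fitted data preparation object together with the model, so both can be restored and applied to new raw data in the same way (a sketch with Python's pickle module and scikit-learn; the filename and synthetic dataset are illustrative):

```python
import pickle
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import MinMaxScaler

X, y = make_classification(n_samples=100, n_features=5, random_state=1)

scaler = MinMaxScaler().fit(X)                       # data preparation object
model = LogisticRegression().fit(scaler.transform(X), y)

# save BOTH objects together, not just the model
with open("model.pkl", "wb") as f:
    pickle.dump((scaler, model), f)

# later (possibly in another process): load and apply the SAME scaling
with open("model.pkl", "rb") as f:
    scaler2, model2 = pickle.load(f)
yhat = model2.predict(scaler2.transform(X[:2]))
```

Saving only the model and re-fitting a fresh scaler on new data would scale the inputs differently from training, silently degrading predictions.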

08/10/2021

Advanced Transform - Part 2 - How to Transform the Target in Regression

On regression predictive modeling problems where a numerical value must be predicted, it can also be critical to scale and perform other data transformations on the target variable. This can be achieved in Python using the TransformedTargetRegressor class.

In this tutorial, you will discover how to use the TransformedTargetRegressor to scale and transform target variables for regression using the scikit-learn Python machine learning library.

After completing this tutorial, you will know:
  • The importance of scaling input and target data for machine learning.
  • The two approaches to applying data transforms to target variables.
  • How to use the TransformedTargetRegressor on a real regression dataset.
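The wrapper handles the bookkeeping for you: it applies the transform to y before fitting and inverts it at prediction time, so predictions come back on the original scale (a sketch on a synthetic dataset, assuming scikit-learn):

```python
from sklearn.compose import TransformedTargetRegressor
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=100, n_features=4, noise=0.1, random_state=2)

# y is scaled before fitting; predictions are automatically inverse-transformed
model = TransformedTargetRegressor(regressor=LinearRegression(),
                                   transformer=StandardScaler())
model.fit(X, y)
yhat = model.predict(X[:5])  # back on the original y scale
```

Alternatively, the func/inverse_func arguments accept a pair of functions (e.g. log and exp) instead of a transformer object.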

07/10/2021

Advanced Transform - Part 1 - How to Transform Numerical and Categorical Data

Applying data transforms like scaling or encoding categorical variables is straightforward when all input variables are the same type. It can be challenging when you have a dataset with mixed types and you want to selectively apply data transforms to some, but not all, input features.

The scikit-learn Python machine learning library provides the ColumnTransformer that allows you to selectively apply data transforms to different columns in your dataset. In this tutorial, you will discover how to use the ColumnTransformer to selectively apply data transforms to columns in a dataset with mixed data types. 

After completing this tutorial, you will know:
  • The challenge of using data transformations with datasets that have mixed data types.
  • How to define, fit, and use the ColumnTransformer to selectively apply data transforms to columns.
  • How to work through a real dataset with mixed data types and use the ColumnTransformer to apply different transforms to categorical and numerical data columns.
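The core idea is to list (name, transformer, columns) triples, one per column group (a sketch assuming scikit-learn and pandas; the toy DataFrame and its column names are made up for illustration):

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# toy mixed-type dataset (column names are illustrative)
df = pd.DataFrame({
    "age": [25, 32, 47, 51],
    "income": [40000, 52000, 81000, 60000],
    "city": ["york", "leeds", "york", "hull"],
})

ct = ColumnTransformer([
    ("num", MinMaxScaler(), ["age", "income"]),  # scale numeric columns
    ("cat", OneHotEncoder(), ["city"]),          # one-hot the categorical column
])
X = ct.fit_transform(df)
print(X.shape)  # 2 scaled columns + 3 one-hot columns = (4, 5)
```

Columns not listed are dropped by default; passing remainder="passthrough" keeps them untouched, and the fitted ColumnTransformer can itself be placed inside a Pipeline ahead of a model.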