The Right Machine Learning Approach

Whenever we talk about automating our decisions, the first thing that comes to mind is Machine Learning. ML has gained a huge market lately, with every sector digging deeper and deeper into it to get ahead.

Beginners, too, want to see themselves as ML engineers, data engineers, and so on, and they try their best to master Machine Learning. But even after pushing hard, some of them get stuck at certain points, and when they can’t find a solution they skip things, because many questions still need to be answered (where to start, what the right approach is, etc.).

By the end of this article you will have a clear picture of the path to follow when tackling a Machine Learning problem.

Why Learn Machine Learning?

In 1959, Arthur Samuel (an American IBMer and pioneer in the field of computer gaming and artificial intelligence) coined the term “Machine Learning” and defined it as a “Field of study that gives computers the capability to learn without being explicitly programmed”.

Not every problem can be solved with our typical if-else statements and for/while loops. Some need more than that, and this is where Machine Learning comes in, letting us automate things that are hard to spell out as explicit rules.

Prerequisites and Resources:

Machine Learning is a vast field to explore, so give yourself time to understand things. A few topics need to be clear before you proceed to the Machine Learning part. You don’t have to be an expert or hold a Master’s or Ph.D. degree in these topics; a basic understanding will do.

  1. Python: Python is currently the most popular programming language for Machine Learning. Other languages such as R, Scala, and Java will also do the job, but Python has a rich ecosystem of ready-made libraries for artificial intelligence, such as TensorFlow, scikit-learn, and many more.
  2. Pandas: Pandas is a fast, powerful, flexible, and easy-to-use open source data analysis and manipulation tool, built on top of the Python programming language.
  3. NumPy: NumPy provides n-dimensional arrays and the functions to create and manipulate them. Libraries such as Pandas and scikit-learn build on it.
  4. Statistics: A commonly quoted estimate is that collecting, cleaning, and processing data takes about 70-80% of your time, and statistics is the field that handles this. So you cannot skip this topic; it is the most important of all.
  5. Linear Algebra and Calculus: Unless you are focused on R&D in Machine Learning, the basics will be enough.

Moving forward with our approach, we’ll go through all the steps one by one and move gradually towards the final outcome.

Step 1. Data Collection


Collecting as much data as possible is good practice: Machine Learning algorithms learn from data, so the quantity should not be less than what the problem requires. The quality can then be improved by preprocessing.

There are numerous sites where you can collect data for a particular algorithm.

  1. Kaggle: Kaggle is a home for ML/DS engineers. One can participate in various competitions, learn different skills, gather data, post data, and perform numerous other tasks.

  2. UCI Machine Learning Repository: Another home for data collectors. Many of the datasets you can name are available here. It is a collection of databases, domain theories, and data generators that are used by the machine learning community for the empirical analysis of machine learning algorithms.

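Datasets from sites like these are often distributed as CSV files. As a minimal sketch of reading one, the snippet below parses CSV text with Python’s standard `csv` module. The column names and values are invented for illustration, and the inline string stands in for a downloaded file; in practice Pandas would usually do this job.

```python
import csv
import io

# Hypothetical sample standing in for a downloaded CSV file (invented values).
RAW_CSV = """sepal_length,sepal_width,species
5.1,3.5,setosa
4.9,3.0,setosa
6.3,3.3,virginica
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, one per data row."""
    reader = csv.DictReader(io.StringIO(text))
    return list(reader)


rows = load_rows(RAW_CSV)
print(len(rows), "rows; columns:", list(rows[0].keys()))
```

Note that `csv.DictReader` returns every value as a string, so numeric columns still need converting before any analysis.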

Step 2. Data Visualization


Data visualization is the pictorial representation of the information or data we have gathered. Visualization tools make it easier to find patterns and outliers in our data, which helps in making data-driven decisions.

We cannot make decisions just by looking at raw data. This is where visualization is needed: with a plot, one can easily differentiate between good and bad outliers.
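To illustrate how even a crude picture exposes an outlier, here is a small sketch that draws a text histogram using only the standard library. The `ages` values and the `text_histogram` helper are made up for this example; a plotting library such as Matplotlib would normally be used instead.

```python
from collections import Counter


def text_histogram(values, bin_width):
    """Return one line per non-empty bin: the bin's lower edge and a bar of '#'."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return [f"{edge:>4} | {'#' * counts[edge]}" for edge in sorted(counts)]


# Hypothetical ages; 95 is deliberately far from the rest.
ages = [22, 25, 27, 31, 33, 34, 38, 41, 95]
for line in text_histogram(ages, bin_width=10):
    print(line)
```

The lone bar at 90, well separated from the others, is the kind of value that is easy to miss in a table of numbers but obvious in a plot.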

Step 3. Exploratory Data Analysis


Exploratory data analysis (EDA) is used to analyze datasets and summarize their main characteristics. EDA helps determine how best to manipulate the data to get relevant outcomes, discover patterns, and test hypotheses.

The main idea of performing EDA is to check the data before making any assumptions about it. One can use EDA to ensure that the results produced are valid for the desired outcome.
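A first EDA pass often starts with summary statistics. The sketch below computes a few of them with Python’s standard `statistics` module. The `summarize` helper and the `heights_cm` values are assumptions made for illustration, not part of any particular library.

```python
import statistics


def summarize(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


heights_cm = [150, 160, 165, 170, 180]  # hypothetical values
summary = summarize(heights_cm)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Comparing the mean with the median is a quick check for skew: a large gap between them suggests extreme values worth a closer look.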

Step 4. Data Preprocessing

Data preprocessing is the technique of extracting relevant information from the gathered data and preparing it for analysis before sending it to the model. Put simply, it is the process of extracting diamonds from a coal mine.
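Two common preprocessing steps are filling in missing values and rescaling numeric columns. The following is a minimal sketch of both in plain Python, assuming missing entries are represented as `None`. Mean imputation and min-max scaling are chosen here only as examples; the right techniques depend on the data.

```python
import statistics


def impute_missing(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = statistics.mean(observed)
    return [fill if v is None else v for v in values]


def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid dividing by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


raw_income = [30, None, 50, 70]  # hypothetical column with one missing value
clean = impute_missing(raw_income)
scaled = min_max_scale(clean)
print(clean)
print(scaled)
```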

Step 5. Machine Learning Concepts

Now that we are done with the data part, here comes the ML part (the fun part). Before digging deeper, we have to start with the basics.

i) Terminologies:

i.i) Feature: Features are nothing but the input variables in ML. Features can be raw data. For example, if we want to predict whether a given input is a cat, we have to look at its features, such as ears, legs, body type, nose, etc.

i.ii) Target: A target, or label, is the value that our model needs to predict. In the example above, the model uses the features to predict whether the input is a cat or not.

i.iii) Model: A model is a program that is trained to recognize patterns. We train a model on a dataset, providing it with an algorithm that it can use to learn from that data.

i.iv) Model Training: Model training is the process of feeding an ML algorithm with data so that it learns patterns and gives the best possible outcome.
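To make these four terms concrete, here is a toy sketch in plain Python. It uses a one-nearest-neighbour classifier, picked only because it is short; the class name, the two features, and all the numbers are invented for illustration.

```python
class NearestNeighbourModel:
    """Toy model: predicts the target of the closest training example."""

    def fit(self, features, targets):
        # "Training" here just stores the labelled examples.
        self.features = list(features)
        self.targets = list(targets)
        return self

    def predict(self, row):
        # Squared Euclidean distance from `row` to every stored example.
        distances = [
            sum((a - b) ** 2 for a, b in zip(row, stored))
            for stored in self.features
        ]
        return self.targets[distances.index(min(distances))]


# Features: [weight_kg, ear_length_cm]; target: the animal label (made-up data).
X = [[4.0, 6.5], [5.0, 7.0], [30.0, 12.0], [25.0, 11.0]]
y = ["cat", "cat", "dog", "dog"]

model = NearestNeighbourModel().fit(X, y)  # model training
print(model.predict([4.5, 6.8]))
print(model.predict([28.0, 11.5]))
```

Here `X` holds the features, `y` holds the targets, `NearestNeighbourModel` is the model, and the call to `fit` is the training step.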

Step 6. Machine Learning Algorithms:

Choosing an appropriate algorithm for a dataset is the most important and crucial part.

Labeled Data → supervised learning.

Unlabeled Data → unsupervised learning.

i) Linear Regression: when the output variable is numerical

ii) Logistic Regression: when the output variable is binary

iii) Decision Tree: Regression and Classification

iv) Support Vector Machine: Regression and Classification
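As a rough illustration of the first two entries, the sketch below fits a one-feature linear regression by ordinary least squares and shows the sigmoid function that logistic regression uses to turn a score into a probability between 0 and 1. The data and function names are hypothetical, and a library such as scikit-learn would normally be used for real work.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def sigmoid(z):
    """Map any real-valued score to a value strictly between 0 and 1."""
    return 1 / (1 + math.exp(-z))


# Hypothetical data lying exactly on y = 2x + 1.
hours = [1, 2, 3, 4]
scores = [3, 5, 7, 9]
slope, intercept = fit_line(hours, scores)
print(slope, intercept)

# A score of 0 corresponds to a probability of one half.
print(sigmoid(0))
```

Linear regression outputs the number itself, while logistic regression passes a similar linear score through `sigmoid` and thresholds the result to pick one of two classes.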

After training our model, we have to compute the loss and check whether the model is performing well on the given dataset. The goal is to minimize the loss, and the algorithm that gives the lowest loss will be the best algorithm for that particular dataset.
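The comparison described above can be sketched as follows, using mean squared error as the loss. MSE is one common choice for numerical targets, assumed here for illustration; the two sets of predictions are invented stand-ins for the outputs of already-trained models.

```python
def mean_squared_error(actual, predicted):
    """Average of squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


actual = [3.0, 5.0, 7.0]

# Predictions from two hypothetical, already-trained models.
predictions = {
    "model_a": [2.5, 5.5, 7.0],
    "model_b": [4.0, 4.0, 9.0],
}

losses = {
    name: mean_squared_error(actual, preds)
    for name, preds in predictions.items()
}
best = min(losses, key=losses.get)  # name of the model with the lowest loss
print(losses, "-> best:", best)
```

In practice the loss is usually measured on data the model did not see during training, so that a low value reflects how well the model generalizes rather than how well it memorized.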


The main purpose of choosing this topic is to make you familiar with the right and effective approach for excelling in the field of Machine Learning. Do share your views; I’d love to know what you think.