Introduction

Most problems in the real world are mathematical or can be reduced to a mathematical one. Such a problem amounts to converting some input(s), by applying some form of logic, into a desired output (i.e. finding a functional mapping). Now, if the underlying logic of the function:

Is known -> easy peasy -> explicitly program the logic
- e.g. problems on CodeChef/SPOJ etc. (hard logic)
Is not known
- ML can be attempted... but when can it be attempted?

Why learn ML ?

According to a report from IBM, in 2015 there were 2.35 million openings for data analytics jobs in the US. It estimates that number will rise to 2.72 million by 2020.

What is ML ?

ML gives machines the ability to learn and improve automatically from experience (which can be in the form of data) without being explicitly programmed.

Real world examples :

The "Want to tag?" suggestions on Facebook
Smart (Google) Maps
Predictive keyboards

Learning (machine learning) is not memorisation:

Learning means figuring out a pattern that also applies to outside (unseen) data points. Memorisation means just cramming the data (this is called overfitting), so that the pattern applies only to the given dataset and no guarantee can be made about its performance on unseen data.
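To make the distinction concrete, here is a minimal sketch. The toy data, the assumed rule y = 2x + 1, and the function names `memorised` and `learned` are all made up for illustration; the "learning" step is deliberately crude (a line through the first and last training points).

```python
# Toy data generated from the assumed rule y = 2x + 1.
train = [(1, 3), (2, 5), (3, 7), (4, 9)]
unseen = [(5, 11), (10, 21)]

# "Memorisation": a lookup table that simply stores the training pairs.
lookup = dict(train)

def memorised(x):
    # Perfect on the training set, but returns None for any unseen x.
    return lookup.get(x)

# "Learning": estimate a pattern (here a straight line) from the data.
(x0, y0), (x1, y1) = train[0], train[-1]
slope = (y1 - y0) / (x1 - x0)
intercept = y0 - slope * x0

def learned(x):
    # Applies the estimated pattern, so it also works for unseen x.
    return slope * x + intercept

for x, y in unseen:
    print("x =", x, "true =", y, "memorised =", memorised(x), "learned =", learned(x))
```

The lookup table has nothing to say about x = 5 or x = 10, while the fitted line still produces sensible values there.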

When to apply ML ?

1. Pattern exists

Even if the problem has no underlying pattern, you can still try learning, though you will certainly fail in such a case. In other words, we can always try machine learning regardless of whether there is an underlying pattern or not.
How do we know if there is a pattern?
You can only find out in two ways:
- Visualising the data
- Hit and trial (just try an algo): there is something you can actually measure, called the performance metric, that will tell you whether you learned or not.
There is no harm done in trying machine learning.
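As a rough sketch of the "hit and trial" approach, the snippet below tries a very simple algorithm on held-out points and compares its error with a baseline that ignores the input entirely. Everything here is an assumption made for illustration: the data, the choice of 1-nearest-neighbour as the algorithm, mean squared error as the performance metric, and the "predict the training mean" baseline.

```python
def mean(values):
    values = list(values)
    return sum(values) / len(values)

def mse(y_true, y_pred):
    """Mean squared error: the performance metric here, lower is better."""
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))

def nearest_neighbour_predict(train_x, train_y, x):
    """Predict the target of the closest training point (1-NN)."""
    i = min(range(len(train_x)), key=lambda j: abs(train_x[j] - x))
    return train_y[i]

# Made-up data: y = 3x + 2 plus a small alternating offset standing in for noise.
xs = list(range(20))
ys = [3 * x + 2 + (0.5 if x % 2 == 0 else -0.5) for x in xs]

# Hold out every second point; the model never sees these during "training".
train_x, train_y = xs[0::2], ys[0::2]
test_x, test_y = xs[1::2], ys[1::2]

model_pred = [nearest_neighbour_predict(train_x, train_y, x) for x in test_x]
baseline_pred = [mean(train_y)] * len(test_y)  # ignores x completely

model_mse = mse(test_y, model_pred)
baseline_mse = mse(test_y, baseline_pred)
print("model MSE:", model_mse, "baseline MSE:", baseline_mse)
```

If the model's error on the held-out points is clearly below the baseline's, that is evidence that a pattern exists and was picked up. If it is not, either there is no pattern or this particular algorithm failed to find it.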

2. Cannot pin down the pattern mathematically

If you can pin down the pattern mathematically/explicitly, then machine learning will still work, BUT it is not a wise/optimal way to solve that problem.

3. Have enough Data

If you don’t have relevant data -> THEN MACHINE LEARNING IS NOT POSSIBLE

Let's do the “hello world” of ML

Linear regression

Given a dataset (x, y), where x is an independent variable and y is a continuous target variable, linear regression fits a straight line that predicts y from x.
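A minimal sketch of linear regression with a single feature, using the standard closed-form least-squares solution in plain Python. The dataset (hours studied vs. marks) and the function names are invented for illustration; in practice a library such as scikit-learn would normally be used.

```python
def fit_linear_regression(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

def predict(x, slope, intercept):
    return slope * x + intercept

# Made-up dataset: hours studied (x) vs. marks obtained (y).
xs = [1, 2, 3, 4, 5]
ys = [31, 39, 50, 61, 69]

slope, intercept = fit_linear_regression(xs, ys)
print("slope:", round(slope, 2), "intercept:", round(intercept, 2))
print("prediction for x = 6:", round(predict(6, slope, intercept), 2))
```

The fitted line can then be used to predict y for values of x that were not in the dataset, which is the whole point of learning a pattern rather than memorising points.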

What does a Machine Learning Algo consist of?

Model
1. Input data points
2. Examples of expected output (supervised learning)
3. Values that the model generates at the supplied input data points
Performance Metric (a feedback signal used to adjust the way the ML algorithm works)
- This adjustment is what is called LEARNING, or MACHINE LEARNING.
Loss function
Optimiser
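The pieces listed above can be sketched together in a few lines. This is only an illustration under stated assumptions: the model is a straight line, the loss function is mean squared error, the optimiser is plain gradient descent with a hand-picked learning rate of 0.05, and the data is made up from the rule y = 2x + 1. None of these choices come from the class material.

```python
def model(x, w, b):
    """Model: maps an input data point to a generated value."""
    return w * x + b

def loss(xs, ys, w, b):
    """Loss function: mean squared error between generated and expected outputs."""
    n = len(xs)
    return sum((model(x, w, b) - y) ** 2 for x, y in zip(xs, ys)) / n

def gradient_step(xs, ys, w, b, lr):
    """Optimiser: one gradient descent update of the parameters w and b."""
    n = len(xs)
    grad_w = sum(2 * (model(x, w, b) - y) * x for x, y in zip(xs, ys)) / n
    grad_b = sum(2 * (model(x, w, b) - y) for x, y in zip(xs, ys)) / n
    return w - lr * grad_w, b - lr * grad_b

# Input data points and examples of expected output (made up: y = 2x + 1).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = 0.0, 0.0  # start from an arbitrary guess
history = []     # loss before each update, to watch the feedback signal shrink
for step in range(2000):
    history.append(loss(xs, ys, w, b))
    w, b = gradient_step(xs, ys, w, b, lr=0.05)

print("w:", round(w, 3), "b:", round(b, 3))
print("first loss:", history[0], "last loss:", history[-1])
```

The loop is where the "learning" happens: the loss acts as the feedback signal, and the optimiser uses it to nudge the model's parameters towards values that reproduce the expected outputs.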

A non-exhaustive Classification of Machine Learning Algorithms:

[Figure: classification of machine learning algorithms]

Some commonly used basic machine learning algorithms are listed below. These algorithms may be applied to almost any data problem:

Linear Regression
Logistic Regression
Decision Tree
SVM
Naive Bayes
kNN
K-Means
Random Forest
Dimensionality Reduction Algorithms
Gradient Boosting algorithms
- GBM
- XGBoost
- LightGBM
- CatBoost
- AdaBoost

Implementation Of ML

Implementing ML requires a powerful programming language and the use of existing frameworks.

Language of choice

R (strictly for data science)
Python (+ added benefits)
MATLAB/Octave (for prototyping algorithms, rarely used in production systems)

We will prefer Python (3) during the course of the LR classes, as it has a gentle learning curve and almost all open-source frameworks support Python.

Apart from the frameworks, you will also need to study a few lower-level Python libraries to handle the datasets:

Python Libraries for handling and visualising data

numpy
scipy
pandas
matplotlib
seaborn
plotly

Common Frameworks :

scikit-learn (ease of use, easy experimentation)
tensorflow
theano
keras (HIGH-level framework, mainly for DL)
pytorch
caffe
Microsoft’s CNTK

Linear Regression Class Slides

Click here for linear regression slides.

Note: the slides contain extra topics that were not covered in the class.

Useful resources :

API documentation in one place : devdocs.io
Free GPU for 12 hrs at a time, after which your VM instance is reset (you will have to rerun your code after 12 hrs): Google Colab
Basic course for learning machine learning algorithms: Andrew Ng’s Course. For the purposes of Logical Rhythm, though, you can skip the MATLAB exercises and try things in Python instead. The choice is yours to make.
What is probability distribution
More on probability distributions
Blogs to follow:
- analytics vidhya
- towardsdatascience

WILL UPDATE THIS ARTICLE WITH MORE RESOURCES

MNNIT Computer Club

This repository contains the code, support links and other relevant materials for every class under Computer Club, MNNIT Allahabad.