MNNIT Computer Club

This repository contains the code, supporting links and other relevant material for every class conducted under Computer Club, MNNIT Allahabad.


Introduction

Most real-world problems are mathematical or can be reduced to a mathematical one. Each such problem amounts to converting an input (or inputs) into a desired output by applying some form of logic, i.e. finding a functional mapping. Now, if the underlying logic of the function:

  1. Is known -> easy peasy -> explicitly program the logic
    • e.g. problems on CodeChef/SPOJ etc. (hard-coded logic)
  2. Is not known -> learn the mapping from data (this is where ML comes in)

Why learn ML ?

According to a report from IBM, in 2015 there were 2.35 million openings for data analytics jobs in the US. It estimates that number will rise to 2.72 million by 2020.

What is ML ?

Machine learning gives machines the ability to automatically learn and improve from experience (which can be in the form of data) without being explicitly programmed.

Real world examples :

Learning (machine learning) is not memorisation:

Learning means figuring out a pattern that also applies to outside (unseen) data points. Memorisation means just cramming the data (this is called overfitting), so that the pattern applies only to the given dataset and no guarantee can be made about its performance on unseen data.
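The difference can be seen on a toy problem. The sketch below is only an illustration under assumptions: the data is synthetic (a true pattern y = 2x + 1 with made-up noise added to the training outputs), the "memoriser" is a simple nearest-neighbour lookup, and the helper names `memorise`, `learn_line` and `mse` are hypothetical.

```python
def memorise(xs, ys):
    """Memorisation: answer with the stored output of the nearest training input."""
    table = list(zip(xs, ys))

    def predict(x):
        _, nearest_y = min(table, key=lambda pair: abs(pair[0] - x))
        return nearest_y

    return predict


def learn_line(xs, ys):
    """Learning: fit y = w*x + b by least squares and return the predictor."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    w = sxy / sxx
    b = mean_y - w * mean_x
    return lambda x: w * x + b


def mse(predict, xs, ys):
    """Mean squared error of a predictor on a dataset."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Synthetic data (assumed): true pattern y = 2x + 1, noisy training outputs.
train_x = [float(i) for i in range(10)]
train_y = [2 * x + 1 + (1.5 if i % 2 == 0 else -1.5) for i, x in enumerate(train_x)]
# Unseen points lie between the training inputs and follow the true pattern.
test_x = [x + 0.25 for x in train_x]
test_y = [2 * x + 1 for x in test_x]

memoriser = memorise(train_x, train_y)
line = learn_line(train_x, train_y)

memoriser_train = mse(memoriser, train_x, train_y)
memoriser_test = mse(memoriser, test_x, test_y)
line_train = mse(line, train_x, train_y)
line_test = mse(line, test_x, test_y)

print(f"memoriser: train MSE = {memoriser_train:.3f}, test MSE = {memoriser_test:.3f}")
print(f"line:      train MSE = {line_train:.3f}, test MSE = {line_test:.3f}")
```

The memoriser is perfect on the data it has crammed but does worse than the fitted line on the unseen points, which is the behaviour described above as overfitting.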

When to apply ML ?

1. Pattern exists

2. Cannot pin down the pattern mathematically

3. Have enough Data

Let's do the “hello world” of ML

Linear regression
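As a minimal sketch of this "hello world", the snippet below fits a straight line y = w*x + b to a handful of points using the closed-form least-squares formulas. The data and the function name `fit_line` are made up for illustration; the class slides may derive the fit differently or use a library such as scikit-learn.

```python
def fit_line(xs, ys):
    """Return (w, b) minimising the sum of squared errors of y = w*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    w = sxy / sxx
    b = mean_y - w * mean_x
    return w, b


# Toy data (assumed for illustration): hours studied vs. marks scored.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
marks = [52.0, 55.0, 61.0, 64.0, 68.0]

w, b = fit_line(hours, marks)
predicted = w * 6.0 + b  # prediction for an input not in the dataset
print(f"w = {w:.2f}, b = {b:.2f}, prediction for 6 hours = {predicted:.2f}")
```

The fitted `w` and `b` are the "pattern" learned from the examples, and the last line applies that pattern to an unseen input.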

What does a machine learning algorithm consist of?

  1. Model
    1. Input data points
    2. Examples of expected output (supervised learning)
    3. Values that the model generates at the supplied input data points
  2. Performance Metric (a feedback signal to adjust the way the ML algorithm works)
    • This is what is called LEARNING -or- MACHINE LEARNING.
  3. Loss function
  4. Optimiser
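The pieces listed above can be seen working together in a small training loop. This is only a sketch under assumptions: a one-variable linear model, mean squared error as the loss function, plain gradient descent as the optimiser, and mean absolute error as the performance metric. The data, the learning rate and the number of steps are made up for illustration.

```python
# Input data points and examples of expected output (supervised learning).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # generated from y = 2x + 1


def model(x, w, b):
    """Model: the value generated at an input point for parameters w and b."""
    return w * x + b


def loss(w, b):
    """Loss function: mean squared error over the dataset."""
    return sum((model(x, w, b) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def metric(w, b):
    """Performance metric: mean absolute error (lower is better)."""
    return sum(abs(model(x, w, b) - y) for x, y in zip(xs, ys)) / len(xs)


def gradient_step(w, b, lr):
    """Optimiser: one step of gradient descent on the MSE loss."""
    n = len(xs)
    grad_w = sum(2 * (model(x, w, b) - y) * x for x, y in zip(xs, ys)) / n
    grad_b = sum(2 * (model(x, w, b) - y) for x, y in zip(xs, ys)) / n
    return w - lr * grad_w, b - lr * grad_b


w, b = 0.0, 0.0  # start from an uninformed guess
initial_loss = loss(w, b)
for _ in range(2000):
    w, b = gradient_step(w, b, lr=0.05)
final_loss = loss(w, b)

print(f"w = {w:.3f}, b = {b:.3f}, loss = {final_loss:.6f}, MAE = {metric(w, b):.6f}")
```

Each step uses the loss as a feedback signal to adjust `w` and `b`; repeating that adjustment is the "learning" part.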

A non-exhaustive Classification of Machine Learning Algorithms:

[Figure: classification of machine learning algorithms]

Below are some commonly used basic machine learning algorithms, which can be applied to almost any data problem:

  1. Linear Regression
  2. Logistic Regression
  3. Decision Tree
  4. SVM
  5. Naive Bayes
  6. kNN
  7. K-Means
  8. Random Forest
  9. Dimensionality Reduction Algorithms
  10. Gradient Boosting algorithms
    • GBM
    • XGBoost
    • LightGBM
    • CatBoost
    • AdaBoost

Implementation Of ML

Implementing ML requires a powerful programming language and the use of existing frameworks.

Language of choice

  1. R (strictly for data science)
  2. Python (plus added benefits)
  3. MATLAB/Octave (for prototyping algorithms; rarely used in production systems)

We will prefer Python 3 during the course of the LR classes, as it has a smooth learning curve and almost all open-source frameworks support Python.

Apart from the frameworks, you will also need to study a few lower-level Python libraries to handle the datasets:

Python Libraries for handling and visualising data

  1. numpy
  2. scipy
  3. pandas
  4. matplotlib
  5. seaborn
  6. plotly

Common Frameworks :

  1. scikit-learn (ease of use, easy experimentation)
  2. tensorflow
  3. theano
  4. keras (high-level framework, mainly for DL)
  5. pytorch
  6. caffe
  7. Microsoft’s CNTK

Linear Regression Class Slides

Click here for linear regression slides.

Note : the slides contain extra topics that were not covered in the class.

Useful resources :

WILL UPDATE THIS ARTICLE WITH MORE RESOURCES