- Introduction
- Implementation Of ML
Introduction
Most problems in the real world are mathematical or can be reduced to a mathematical one. Such problems come down to converting one or more inputs, after applying some form of logic, into a desired output (i.e. finding a functional mapping). Now, if the underlying logic of the function:
- Is known -> easy peasy -> explicitly program the logic
- e.g. problems on CodeChef/SPOJ etc. (hard logic)
- Is not known
- ML can be attempted... When can it be attempted?
Why learn ML ?
According to a report from IBM, in 2015 there were 2.35 million openings for data analytics jobs in the US. It estimates that number will rise to 2.72 million by 2020.
What is ML ?
ML gives machines the ability to automatically learn and improve from experience (which can be in the form of data) without being explicitly programmed.
Real world examples :
- "want to tag ...?" on Facebook
- smart (Google) Maps
- predictive keyboards
Learning (machine learning) is not memorisation:
Learning means figuring out a pattern that also applies to outside (unseen) data points. Memorisation means just cramming the data (called overfitting), so that the pattern applies only to the given dataset and no guarantee can be made about its performance on unseen data.
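The difference can be sketched with a toy example. The data, the hidden pattern y = 2x, and the names `memoriser` and `learner` are all assumptions made up for illustration; this is not a real ML workflow, just a minimal contrast between cramming and finding a pattern.

```python
# Toy training data following the hidden pattern y = 2x (assumed for illustration).
train = [(1, 2), (2, 4), (3, 6)]
unseen = [(10, 20), (25, 50)]

# Memorisation: a lookup table that simply stores the training pairs.
lookup = dict(train)

def memoriser(x):
    # Returns the stored answer, or None for an input it has never seen.
    return lookup.get(x)

# Learning: estimate the factor k in y = k * x from the training data
# (least-squares estimate for a line through the origin).
k = sum(x * y for x, y in train) / sum(x * x for x, _ in train)

def learner(x):
    return k * x

for x, y in train + unseen:
    print(f"x={x} expected={y} memoriser={memoriser(x)} learner={learner(x)}")
```

Both approaches are right on the training points, but only the learner has anything to say about the unseen ones.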
When to apply ML ?
1. Pattern exists
- If the problem has no underlying pattern, you can still try learning, but you will certainly fail. Machine learning can always be tried, whether or not a pattern exists; it just cannot succeed without one.
- How do we know if there is a pattern ?
- You can find out in only two ways:
- visualising the data
- hit and trial (just try an algo)
- There is something that you can actually measure called the performance metric that will tell you if you learned or not.
- There is no harm done in trying machine learning
2. Cannot pin down the pattern mathematically
- If you can pin down the pattern mathematically/ explicitly, then machine learning will still work, BUT it is certainly not a wise/optimal way to solve that problem.
3. Have enough Data
- If you don’t have relevant data -> THEN MACHINE LEARNING IS NOT POSSIBLE
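The "hit and trial" route with a performance metric can be sketched as follows. This is a minimal illustration under assumptions: the synthetic datasets, the 1-nearest-neighbour model, the 150/50 split and the `skill` score (error compared with a "predict the average" baseline) are all choices made for this example, not a prescribed method.

```python
import random

def nearest_neighbour_predict(train, x):
    """Predict y for x by copying the y of the closest training x (1-NN)."""
    return min(train, key=lambda point: abs(point[0] - x))[1]

def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def skill(points):
    """Performance metric on held-out points.

    Returns 1 - model_mse / baseline_mse, where the baseline always predicts
    the training average. Close to 1 means a pattern was learned; around 0 or
    below means the model is no better than guessing the average.
    """
    train, test = points[:150], points[150:]
    y_true = [y for _, y in test]
    train_mean = sum(y for _, y in train) / len(train)
    predictions = [nearest_neighbour_predict(train, x) for x, _ in test]
    model_mse = mean_squared_error(y_true, predictions)
    baseline_mse = mean_squared_error(y_true, [train_mean] * len(test))
    return 1 - model_mse / baseline_mse

rng = random.Random(0)  # fixed seed so the run is repeatable
xs = [rng.uniform(0, 10) for _ in range(200)]

# One dataset with a pattern (y depends on x) and one without (y is pure noise).
with_pattern = [(x, 3 * x + rng.gauss(0, 1)) for x in xs]
no_pattern = [(x, rng.gauss(0, 1)) for x in xs]

score_pattern = skill(with_pattern)
score_noise = skill(no_pattern)
print("skill with pattern:   ", round(score_pattern, 3))
print("skill without pattern:", round(score_noise, 3))
```

Trying the algorithm costs nothing in either case; the metric on unseen points is what tells you whether anything was actually learned.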
Let’s do the “hello world” of ML
Linear regression
- For a given dataset (x, y), where x is an independent variable and y is a continuous target variable, linear regression fits a straight line that predicts y from x.
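A minimal sketch of this "hello world" in plain Python, using the standard closed-form least-squares solution for one variable. The five data points and the names `slope`, `intercept` and `predict` are made up for illustration; in practice you would more likely use a library such as scikit-learn.

```python
# Small hand-made dataset (assumed for illustration): y is roughly 2 * x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Closed-form least squares for the line y = slope * x + intercept:
#   slope     = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x) ** 2)
#   intercept = mean_y - slope * mean_x
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
    (x - mean_x) ** 2 for x in xs
)
intercept = mean_y - slope * mean_x

def predict(x):
    """Predict the continuous target y for a new x."""
    return slope * x + intercept

print("slope:", round(slope, 3), "intercept:", round(intercept, 3))
print("prediction for x = 6:", round(predict(6.0), 3))
```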
What does a Machine Learning algorithm consist of?
- Model
- Input data points
- Examples of Expected Output(supervised learning)
- Values that the model generates at the supplied input data points
- Performance Metric (a feedback signal to adjust the way the ML algorithm works)
- Adjusting the model based on this feedback is what is called LEARNING -or- MACHINE LEARNING.
- Loss function
- Optimiser
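The pieces listed above can be sketched together in a few lines. This is only an illustration under assumptions: the data, the straight-line model, mean squared error as the loss, plain gradient descent as the optimiser, and the learning rate and step count are all choices made for this example.

```python
# Input data points and examples of expected output (supervised learning).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # generated from y = 2x + 1

def model(x, w, b):
    """Model: a straight line with learnable parameters w and b."""
    return w * x + b

def loss(w, b):
    """Loss function: mean squared error between model outputs and expected outputs."""
    return sum((model(x, w, b) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def gradient_step(w, b, learning_rate):
    """Optimiser: one step of gradient descent on the loss."""
    n = len(xs)
    grad_w = sum(2 * (model(x, w, b) - y) * x for x, y in zip(xs, ys)) / n
    grad_b = sum(2 * (model(x, w, b) - y) for x, y in zip(xs, ys)) / n
    return w - learning_rate * grad_w, b - learning_rate * grad_b

w, b = 0.0, 0.0  # start from an uninformed guess
initial_loss = loss(w, b)
for step in range(2000):
    # The loss is the feedback signal; each step adjusts w and b to reduce it.
    w, b = gradient_step(w, b, learning_rate=0.05)
final_loss = loss(w, b)

print("learned w:", round(w, 3), "learned b:", round(b, 3))
print("loss before:", round(initial_loss, 3), "loss after:", round(final_loss, 6))
```

The loop is where the learning happens: the model produces values, the loss measures how far they are from the expected outputs, and the optimiser nudges the parameters accordingly.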
A non-exhaustive classification of Machine Learning algorithms:
Below are some commonly used basic machine learning algorithms. They can be applied to almost any data problem:
- Linear Regression
- Logistic Regression
- Decision Tree
- SVM
- Naive Bayes
- kNN
- K-Means
- Random Forest
- Dimensionality Reduction Algorithms
- Gradient Boosting algorithms
- GBM
- XGBoost
- LightGBM
- CatBoost
- AdaBoost
Implementation Of ML
Implementing ML requires a powerful programming language and the use of existing frameworks.
Language of choice
- R (strictly for data science)
- Python (+ added benefits)
- MATLAB/Octave (for prototyping algorithms, rarely used in production systems)
We will prefer Python 3 during the course of LR classes, as it has a smooth learning curve and almost all open source frameworks support Python.
Apart from the frameworks you would also need to study a few lower level python libraries to handle the datasets:
Python Libraries for handling and visualising data
- numpy
- scipy
- pandas
- matplotlib
- seaborn
- plotly
Common Frameworks :
- scikit-learn (ease of use, easy experimentation)
- tensorflow
- theano
- keras (high-level framework, mainly for DL)
- pytorch
- caffe
- Microsoft’s CNTK
Linear Regression Class Slides
Click here for linear regression slides.
Note : it contains extra topics, which were not covered in the class.
Useful resources :
- API documentation in one place : devdocs.io
- Free GPU for 12 hrs at a time, after which your VM instance is reset (you will have to rerun your code after 12 hrs) : Google Colab
- Basic course for learning machine learning algorithms : Andrew Ng’s Course. For the purpose of Logical Rhythm you can skip the MATLAB exercises and try things in Python instead. The choice is yours to make.
- What is probability distribution
- More on probability distributions
- Blogs to follow :
WILL UPDATE THIS ARTICLE WITH MORE RESOURCES