Study Apache Spark MLlib on IPython—Classification—Linear SVM & Logistic Regression

Hi guys,

Let’s keep going with MLlib. Today, let’s study linear SVM and logistic regression.

For the mathematical background, you can refer to these links.

Good 3D

Wiki

Andrew Ng’s lecture

scikit-learn SVM kernel function

Spark MLlib

Of course, Andrew Ng’s Machine Learning course is an unbeatably excellent tutorial for ML beginners, and I strongly recommend it. Here’s the Coursera link.

As before, I will paste my IPython notebook code here; the GitHub repo is here.

I. SVM (scikit-learn)

A higher-degree kernel function fits the training data better, but it consumes more resources and may overfit.
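
To make the trade-off concrete, here is a minimal sketch that compares polynomial kernels of different degrees on the iris data. It is not the original notebook code: the choice of the first two feature columns, the 70/30 split, the feature scaling, and the degrees 1, 3 and 5 are all my assumptions for illustration. It uses `train_test_split` from `sklearn.model_selection`, which is where recent scikit-learn versions keep it.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

iris = load_iris()
X = iris.data[:, :2]  # assumption: first two columns (sepal length, sepal width)
y = iris.target

# Hold out 30% of the rows so training and test accuracy can be compared.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

results = {}
for degree in (1, 3, 5):  # illustrative degrees, not taken from the post
    model = make_pipeline(
        StandardScaler(),  # scaling keeps polynomial kernel values moderate
        SVC(kernel="poly", degree=degree, coef0=1.0, C=1.0),
    )
    model.fit(X_train, y_train)
    results[degree] = (model.score(X_train, y_train), model.score(X_test, y_test))

for degree, (train_acc, test_acc) in sorted(results.items()):
    print("degree=%d  train=%.3f  test=%.3f" % (degree, train_acc, test_acc))
```

If the training accuracy keeps rising with the degree while the test accuracy stalls or drops, that gap is the overfitting mentioned above.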

II. Logistic Regression

We can see that species 1 and species 0 do correspond to different combinations of the sepal measurements (sepal_length and sepal_width).
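
A rough sketch of how such a two-class fit could look in scikit-learn follows. I am assuming the data is the iris set, that species 0 and 1 are the first two target classes, and that the two features are the sepal columns; the original notebook may have prepared the data differently.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

iris = load_iris()

# Keep only species 0 and 1, and only the two sepal columns (assumption).
mask = iris.target < 2
X = iris.data[mask][:, :2]
y = iris.target[mask]

clf = LogisticRegression()
clf.fit(X, y)

# The sign of each coefficient shows which way a feature pushes the prediction.
accuracy = clf.score(X, y)
print("coefficients:", clf.coef_[0])
print("intercept:", clf.intercept_[0])
print("training accuracy:", accuracy)
```

Note that this only reports accuracy on the training rows, so it says how well the two species separate on these features, not how the model generalizes.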

III. PySpark SVM
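
For readers without the notebook, here is a minimal sketch of training a linear SVM with the RDD-based `pyspark.mllib` API, following the pattern in the Spark linear methods guide. The function names `error_rate` and `train_svm_sketch` and the `toy_rows` values are hypothetical and hand-made for illustration; they are not the data or code from my notebook.

```python
# Hand-made toy rows: (label, [feature_1, feature_2]). Hypothetical values.
toy_rows = [
    (0.0, [5.0, 3.5]),
    (0.0, [4.8, 3.2]),
    (0.0, [5.2, 3.6]),
    (1.0, [6.4, 2.9]),
    (1.0, [6.0, 2.7]),
    (1.0, [6.6, 3.0]),
]


def error_rate(labels_and_preds):
    """Return the fraction of (label, prediction) pairs that disagree."""
    pairs = list(labels_and_preds)
    wrong = sum(1 for label, pred in pairs if label != pred)
    return wrong / float(len(pairs))


def train_svm_sketch(sc, rows, iterations=100):
    """Train SVMWithSGD on (label, features) rows; return (model, training error)."""
    # Imported here so error_rate() stays usable without Spark installed.
    from pyspark.mllib.classification import SVMWithSGD
    from pyspark.mllib.regression import LabeledPoint

    data = sc.parallelize([LabeledPoint(label, feats) for label, feats in rows])
    model = SVMWithSGD.train(data, iterations=iterations)

    # Predict on the training rows and compare with the true labels.
    labels_and_preds = data.map(lambda p: (p.label, model.predict(p.features)))
    return model, error_rate(labels_and_preds.collect())
```

In a notebook that already has a SparkContext named `sc`, calling `model, err = train_svm_sketch(sc, toy_rows)` should give a fitted model and its training error. `SVMWithSGD` expects binary labels 0 and 1, so a three-species data set has to be reduced to two classes first.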

IV. PySpark LogisticRegression
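
The logistic regression version looks almost the same. This is again only a sketch using `LogisticRegressionWithLBFGS` from `pyspark.mllib`; the helper names `accuracy` and `train_logreg_sketch` are hypothetical, and fitting an intercept is my choice here, not necessarily what the notebook did.

```python
def accuracy(labels_and_preds):
    """Return the fraction of (label, prediction) pairs that agree."""
    pairs = list(labels_and_preds)
    right = sum(1 for label, pred in pairs if label == pred)
    return right / float(len(pairs))


def train_logreg_sketch(sc, rows, iterations=100):
    """Train logistic regression on (label, features) rows.

    Returns (model, training accuracy).
    """
    # Imported here so accuracy() stays usable without Spark installed.
    from pyspark.mllib.classification import LogisticRegressionWithLBFGS
    from pyspark.mllib.regression import LabeledPoint

    data = sc.parallelize([LabeledPoint(label, feats) for label, feats in rows])

    # intercept=True adds a bias term; the default is to fit without one.
    model = LogisticRegressionWithLBFGS.train(
        data, iterations=iterations, intercept=True
    )

    # Predict on the training rows and compare with the true labels.
    labels_and_preds = data.map(lambda p: (p.label, model.predict(p.features)))
    return model, accuracy(labels_and_preds.collect())
```

With an existing SparkContext `sc` and a list of `(label, [features])` rows with labels 0.0 and 1.0, `model, acc = train_logreg_sketch(sc, rows)` should return the fitted model and its training accuracy.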

Both SVM and LogisticRegression train well.

Reference

1. https://spark.apache.org/docs/1.4.0/mllib-linear-methods.html#logistic-regression

2. https://www.udemy.com/learning-python-for-data-analysis-and-visualization/, Jose Portilla

3. http://scikit-learn.org/stable/auto_examples/svm/plot_iris.html

4. http://scikit-learn.org/stable/modules/svm.html
