Today, let’s study Naive Bayes. Bayes’ theorem is well known to all of us, but how to apply it to classification may be puzzling. Here’s a very brief tutorial about it from Andrew Ng, which I highly recommend you take a look at (only 10 mins!)
1. Background on the different types of Bayes classifiers, which differ mainly in the assumed distribution of the features.
2. One piece of common sense: the “naive” in Naive Bayes means that every pair of features is assumed to be conditionally independent given the class.
Multinomial naive Bayes: in the text-classification context, each observation is a document and each feature represents a term, whose value is the frequency of that term in the document.
Bernoulli naive Bayes: each feature is a zero or one indicating whether the term appears in the document.
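To make the difference concrete, here is a minimal sketch using scikit-learn on a tiny, made-up term-count matrix (the corpus and labels are invented for illustration): `MultinomialNB` works on the raw counts, while `BernoulliNB` first binarizes each feature into presence/absence.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB, BernoulliNB

# Hypothetical tiny corpus: rows are documents, columns are term counts.
X = np.array([
    [3, 0, 1],
    [0, 2, 0],
    [1, 1, 4],
    [0, 0, 2],
])
y = np.array([0, 1, 0, 1])  # made-up class labels

# Multinomial NB models the term frequencies directly.
mnb = MultinomialNB().fit(X, y)

# Bernoulli NB binarizes features: any count > 0.5 becomes "term present".
bnb = BernoulliNB(binarize=0.5).fit(X, y)

print(mnb.predict([[2, 0, 1]]))
print(bnb.predict([[2, 0, 1]]))
```

Note that the two models can disagree on the same document, since Bernoulli NB throws away how often a term occurs and keeps only whether it occurs.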
Let’s look at the IPython code. The GitHub repo is here.
We will again use the classic Iris dataset, with all three categories.
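A quick sketch of the scikit-learn side, assuming a simple hold-out split (the exact split and random seed here are my assumptions, not necessarily what the notebook uses): we fit the Gaussian, Multinomial, and Bernoulli variants on Iris and compare test accuracy.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB

# All three Iris classes; 70/30 split is an illustrative choice.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

results = {}
for Model in (GaussianNB, MultinomialNB, BernoulliNB):
    clf = Model().fit(X_train, y_train)
    results[Model.__name__] = clf.score(X_test, y_test)
    print(Model.__name__, results[Model.__name__])
```

Since the Iris features are continuous measurements, GaussianNB is the natural fit here; the other two are included for comparison.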
Next, let’s use the MLlib model.
We can see that the accuracy of the MLlib Naive Bayes model is similar to that of the Bernoulli Naive Bayes in the scikit-learn package, while the Multinomial one achieves better accuracy. This is mostly because the Iris dataset features are not integers; if we converted them to integers, the accuracy should improve.
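One quick way to probe that last claim in scikit-learn (rounding to integers is my own rough discretization, and the split is again an assumed one):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Fit on the raw continuous features, and on features rounded to
# integer "counts", then compare test accuracy.
clf_float = MultinomialNB().fit(X_train, y_train)
clf_int = MultinomialNB().fit(np.rint(X_train), y_train)

print("float features:", clf_float.score(X_test, y_test))
print("int features:  ", clf_int.score(np.rint(X_test), y_test))
```

Whether rounding actually helps depends on the data; the point is simply that Multinomial NB’s count assumption is not met by the raw Iris measurements.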