I am back. I am sorry I haven't posted anything since September; I have been focused on my current job as a Django developer, working from the back end to the front end and even using some D3.js, lol.
Anyway, I am trying to keep studying big data and data mining in my free time, and below I list the resources I have gone through in the past half year, especially on Apache Spark.
Although these two books are relatively old, they give decent introductions to data mining on the web and to machine learning algorithms in Python, respectively, and are worth a quick look.
They present the machine learning algorithms very formally, with a PDF download available. But because they concentrate entirely on algorithms and derivations, they can be boring to read.
1.3 Text Processing
These two courses explain text processing thoroughly, with good explanations of nearly all the popular NLP algorithms. I strongly recommend going through them if you are interested in NLP.
2. Apache Spark
I passed the Apache Spark Certification by Databricks and O’Reilly yesterday. It is not too hard (even though I am working as a web developer now, not a Spark developer), but many questions are still pretty puzzling. I am not allowed to share the specific questions, but I will recommend the public materials that are useful for preparing for it.
Please don’t take the test online: the testing software is extremely hard to install, and using the laptop's built-in camera is not allowed, so you have to buy another camera. I strongly recommend taking the test onsite.
2.1 Learning Spark pdf
This book is still the bible of Apache Spark. You had better read it at least twice!
2.2 RDD original paper pdf
RDD is the core of Spark, and this is the paper that originally introduced it. I strongly recommend reading from Section 1 through Section 6.5.
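If you want a feel for the idea before opening the paper, here is a tiny pure-Python sketch of what I take to be the two key points: transformations are only recorded (the lineage), and nothing runs until an action asks for a result. `ToyRDD` is a hypothetical class I made up for illustration; it is not Spark's API and leaves out partitioning and distribution entirely.

```python
class ToyRDD:
    """Hypothetical stand-in for an RDD: records steps, runs them lazily."""

    def __init__(self, source, lineage=()):
        self._source = source            # base data (a plain list here)
        self._lineage = tuple(lineage)   # recorded (kind, function) steps

    def map(self, fn):
        # Transformation: only extends the lineage, computes nothing.
        return ToyRDD(self._source, self._lineage + (("map", fn),))

    def filter(self, pred):
        # Transformation: only extends the lineage, computes nothing.
        return ToyRDD(self._source, self._lineage + (("filter", pred),))

    def collect(self):
        # Action: replay the lineage over the source data.
        data = iter(self._source)
        for kind, fn in self._lineage:
            data = map(fn, data) if kind == "map" else filter(fn, data)
        return list(data)


numbers = ToyRDD([1, 2, 3, 4])
big_tens = numbers.map(lambda x: x * 10).filter(lambda x: x > 20)
print(big_tens.collect())
```

Because the result can always be rebuilt by replaying the recorded steps over the source, a lost result does not need to have been stored anywhere, which is roughly the fault-tolerance argument the paper makes in far more detail.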
2.3 Spark Summit 2014 to 2015
You may want to go through the slides and PDFs on the following websites. They should be easy to follow after you have read Learning Spark.
Another very good tutorial from Summit 2015 pdf
I suggest going through CS 200 Introduction to Big Data with Apache Spark.
2.5 UC Berkeley AMP Camp
You may want to take a quick look at the training materials from past camps.
2.6 IBM Big Data University
Although the UI design of this website is lame and the completion certificates cannot be shared on LinkedIn, the following courses are all good enough. If you are only interested in Spark, Spark Fundamentals I and II are suitable for you. However, the sandbox from IBM requires over 10 GB of RAM...
2.7 Databricks Spark Knowledge Git-Books
Spark Base pdf
Spark Reference Application pdf
2.8 Apache Spark API by La Trobe University pdf
Although the official Spark docs illustrate the API and application usage very well, this PDF from La Trobe University explains each API method in great detail. I strongly recommend reading every example.
2.9 Advanced Analytics with Spark pdf
This book provides commercial-grade code; go through it if you have time.
2.10 Coursera: Hadoop Platform and Application Framework
I am taking this course from UCSD. It seems decent enough, though it mainly introduces Hadoop.
For Chinese Resources,
——No Commercial Use for all the Links and pdf provided above——
Hopefully all of these materials are beneficial to you. From my past half year as a web developer, I feel that although big data and data analysis are extremely hot right now, we have to consider the whole picture: first understand the product and the market need, and only then think about how to exploit data analysis techniques, or even big data.