Welcome

I am a big data beginner living in the Bay Area who spent half a year learning Hadoop, YARN, Spark, Scala, and data analysis with Python. Having spent a lot of time on these topics, I would like to share my experience and learning materials, and I hope they help you learn big data.
My LinkedIn

Index

I. Online Resources for Big Data and Data Analysis

1. My Learning Curve of Big Data and Data Analysis I

2. My Learning Curve of Spark and Data Analysis II

II. Big Data Tools Installation

2. Step by Step of Installing Single-Node YARN on Ubuntu

3. Step by Step of Installing Apache Spark on Apache Hadoop

4. Step by Step of Installing Apache Cassandra on Ubuntu in Standalone Mode

5. Set Up IPython Notebook on PySpark

III. Infrastructure

6. Step by Step of Configuring Apache Spark to Connect with Cassandra

7. Step by Step of Installing Apache Kafka and Communicating with Spark

8. Step by Step of Installing Tachyon in Standalone Mode and Working with Apache Spark

IV. Scala Project on IntelliJ

9. Step by Step of Building a Scala SBT Project on IntelliJ

10. Study Spam Classifier Code by MLlib on IntelliJ

V. PySpark MLlib

11. Set Up IPython Notebook on PySpark

12. A Brief Introduction and Summary of MLlib

13. Study Apache Spark MLlib on IPython - Linear Regression

14. Study Apache Spark MLlib on IPython - Classification - Linear SVM & Logistic Regression

15. Study Apache Spark MLlib on IPython - Classification - Naive Bayes

16. Study Apache Spark MLlib on IPython - Regression & Classification - Decision Tree

17. Study Apache Spark MLlib on IPython - Regression & Classification - Random Forest & GBTs

18. Study Apache Spark MLlib on IPython - Clustering - K-Means

19. Study Apache Spark MLlib on IPython - Clustering - GMM

VI. ASIC Verification

20. ASIC Verification interview questions

VII. Software

21. Tips for Deploying Meteor App to AWS EC2

22. Web Scraping Using Scrapy and Deploying on Heroku

Reference of background image

4 comments

  1. Teng Huang · July 6, 2015

    Brother Chong is awesome!!!!!

  2. Nikitha · September 30

    Hi, first I would like to thank you for such a useful blog with clear explanations. I am working on Spark Streaming. By any chance, do you have any posts on Spark Streaming and doing data analysis on real-time data? Thanks so much.

    • cyrobin · October 1

      Thanks, Nikitha. However, I don't have any other posts on real-time data analysis.

  3. Vivek · October 6

    Hi, I tried the example above but I am getting the error below. Can you please help me?
    ----------------------------------------

    Exception in thread "main" java.io.IOException: Failed to open native connection to Cassandra at {192.168.0.3}:9042
    at com.datastax.spark.connector.cql.CassandraConnector$.com$datastax$spark$connector$cql$CassandraConnector$$createSession(CassandraConnector.scala:164)
    at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$2.apply(CassandraConnector.scala:150)
    at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$2.apply(CassandraConnector.scala:150)
    at com.datastax.spark.connector.cql.RefCountedCache.createNewValueAndKeys(RefCountedCache.scala:31)
    at com.datastax.spark.connector.cql.RefCountedCache.acquire(RefCountedCache.scala:56)
    at com.datastax.spark.connector.cql.CassandraConnector.openSession(CassandraConnector.scala:81)
    at com.datastax.spark.connector.cql.CassandraConnector.withSessionDo(CassandraConnector.scala:109)
