Scaling TBs of Data with Apache Spark and the Scala DSL in Production
Apache Spark is one of the top big-data processing platforms and has driven the adoption of Scala in many industry and academic settings. Since the entire Apache Spark framework is written in Scala, it is a real pleasure to explore the beauty of the functional Scala DSL with Spark.
This talk intends to present:
- Primary data structures (RDD, Dataset, DataFrame) and their use in large-scale data processing with HBase (data lake) and Hive (analytical engine).
- Case study: we will walk through the importance of physical data split-up techniques such as coalesce, partition, and repartition, along with other important Spark internals, while scaling TBs of data (~17 billion records).
- We will also examine crucial and very interesting aspects of parallel and concurrent distributed data processing: tuning memory, cache, and disk I/O, memory leaks, internal shuffle, Spark executors, the Spark driver, etc.
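The split-up techniques above rest on a simple piece of arithmetic: choosing a partition count so that each partition holds a manageable slice of the data. A minimal sketch follows; the ~128 MB target per partition is a common rule of thumb (not a Spark default for this calculation), and the `PartitionSizing` object and variable names are illustrative assumptions.

```scala
// Sketch of the partition-count arithmetic typically used before
// calling repartition/coalesce. Names and the 128 MB target are
// illustrative assumptions, not part of the Spark API.
object PartitionSizing {
  // Number of partitions so that each holds at most `targetBytes` of data.
  def targetPartitions(totalBytes: Long,
                       targetBytes: Long = 128L * 1024 * 1024): Int =
    math.max(1, math.ceil(totalBytes.toDouble / targetBytes).toInt)
}

// With a real DataFrame `df` in scope, the count would be applied as:
//   df.repartition(PartitionSizing.targetPartitions(inputBytes)) // full shuffle; grows or shrinks
//   df.coalesce(smallerCount)                                    // narrow op; only reduces partitions
```

The key design distinction the talk covers: `repartition` triggers a full shuffle and can increase or decrease the partition count, while `coalesce` avoids a shuffle but can only merge partitions down.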
Accion Labs Inc. / India
Chetan Khatri works as a Technical Lead at Accion Labs and has diverse experience in the fields of Data Science and Machine Learning. He is an open source contributor to Apache Spark, Apache HBase, the Apache Spark - HBase Connector, Elixir Lang, and many other open source projects. He has authored curricula in Artificial Intelligence, Data Science, and Distributed Computing at KSKV Kachchh University, Government of Gujarat, India. He has also reviewed a couple of books on Scala machine learning, TensorFlow deep learning, and machine learning for the web with Packt Publishing. He has delivered many talks, including at PyCon India 2016, PyKutch 2016, and FOSSASIA 2018:
- Distributing Machine Learning with Apache Spark - PyCon India 2016
- Think Machine Learning with Scikit-learn - PyKutch 2016
Open Source Contributor:
- Apache Spark
- Apache HBase
- Apache MXNet
- Spark HBase Connector