About the Author

More details about Deepak are available at www.linkedin.com/in/deepakngowda

His Spark credentials, obtained from Databricks in 2015, are shown below.

Target audience

This book is intended for data engineers and data scientists.

Preface

First of all, thank you for taking the time to read my book. I hope it meets your expectations; if it does not, your feedback is highly appreciated. I chose to write this book because I believe that learning Spark is much easier through visualization and hands-on practice than through reading text alone. Another strong reason was to help readers quickly reach the most efficient and effective solution to the problem they are trying to solve.

The book tries to answer your questions more through diagrams, visualizations, and code examples and less through theoretical concepts. Special effort has gone into making the visualizations easy to understand and remember.

Another distinctive feature of this book is its modularization. Transforming data is a multi-step process. Think of data transformation as assembling many Lego blocks to get the desired final shape. Each step of the process has a certain input and output. Data transformation is often challenging and needs to be broken down into a series of transformation tasks before the desired data is obtained. With this in mind, readers who already know Spark need not read the book in any particular order. Instead, you can jump directly to the sections that are relevant, and the book then points you along different paths toward solving your problem. My hope is that this book will help you find solutions to the majority of your questions.

The idea for this book came from my own daily need to find solutions quickly for my data problems. I often work with other data frame technologies and found it hard to memorize the syntax of all of them. Switching from one to another meant unlearning one and relearning the other, especially when switching to distributed data structures. I call this the soak time, and it was sometimes frustrating: data transformation techniques that I had already mastered were taking time all over again. Sometimes I ended up spending a few hours instead of a few minutes to find the right solution. Hence I started organizing my code to be more efficient and effective. One day it occurred to me that I could share these documents and code in the form of a book.

This is my first attempt to share my knowledge with the larger Spark developer community. Your feedback is very important in making the book more valuable and effective for the community.

Chapter 1: Spark Basics

Spark Data Frame Programming for Modern Data Engineering
