Gain a holistic view of essential data science skills through data engineering and the associated scalable computational methods. This book covers the most popular Python 3 frameworks for both local and distributed (on-premises and cloud-based) processing. Along the way, you will be introduced to many popular open-source frameworks, such as SciPy, scikit-learn, Numba, and Apache Spark. The book is structured around examples, so you will grasp core concepts through case studies and Python 3 code.
As data science projects grow ever larger and more complex, software engineering knowledge and experience are crucial for producing evolvable solutions. You'll see how to create maintainable software for data science and how to document data engineering practices.