Productive and Efficient Data Science with Python: With Modularizing, Memory profiles, and Parallel/
Tirthajyoti Sarkar
Summary
This book focuses on the Python-based tools and techniques to help you become highly productive at all aspects of typical data science stacks such as statistical analysis, visualization, model selection, and feature engineering.
You'll review the inefficiencies and bottlenecks lurking in the daily business process and solve them with practical solutions. Automation of repetitive data science tasks is a key mindset that is promoted throughout the book. You'll learn how to extend the existing coding practice to handle larger datasets with high efficiency with the help of advanced libraries and packages that already exist in the Python ecosystem.
The book covers topics such as measuring the memory footprint and execution speed of machine learning models, quality-testing a data science pipeline, and modularizing a data science pipeline for app development. You'll review Python libraries that come in handy for automating and speeding up day-to-day tasks.
In the end, you'll understand and perform data science and machine learning tasks beyond the traditional methods and utilize the full spectrum of the Python data science ecosystem to increase productivity.
What You'll Learn
- Write fast and efficient code for data science and machine learning
- Build robust and expressive data science pipelines
- Measure memory and CPU profile for machine learning methods
- Utilize the full potential of GPU for data science tasks
- Handle large and complex data sets efficiently
Who This Book Is For
Data scientists, data analysts, machine learning engineers, artificial intelligence practitioners, and statisticians who want to take full advantage of the Python ecosystem.
Chapter 2: Better Programming Principles for Efficient Data Science
Chapter Goal: Help readers grasp the idea of efficient programming techniques and how they can be applied to a typical data science task flow.
No. of pages: 15
Subtopics:
- The concept of time and space complexity, Big-O notation
- Why complexity matters for data science
- Examples of inefficient programming in data science tasks
- What you can do instead
- Measuring code execution timing
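As a rough illustration of the Chapter 2 themes (complexity and measuring execution time), the sketch below times two ways of checking a list for duplicates with the standard-library `timeit` module. It is not taken from the book; the function names and the input size are hypothetical choices made for this example.

```python
import timeit


def has_duplicates_quadratic(items):
    """O(n^2) approach: compare every pair of elements."""
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i] == items[j]:
                return True
    return False


def has_duplicates_linear(items):
    """O(n) on average: a set keeps only one copy of each distinct item."""
    return len(set(items)) != len(items)


# Hypothetical input with no duplicates, so both functions scan everything.
data = list(range(500))

# Run each function 5 times and record the total elapsed time in seconds.
t_quadratic = timeit.timeit(lambda: has_duplicates_quadratic(data), number=5)
t_linear = timeit.timeit(lambda: has_duplicates_linear(data), number=5)

print(f"quadratic: {t_quadratic:.4f} s, linear: {t_linear:.4f} s")
```

The absolute timings depend on the machine, so the useful comparison is how the gap between the two grows as the input gets larger.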
Chapter 3: How to Use Python Data Science Packages More Productively
Chapter Goal: Illustrate a handful of tricks and techniques for using the most well-known Python data science packages (NumPy, Pandas, Matplotlib, Seaborn, SciPy) more productively.
No. of pages: 20
Subtopics:
- Why NumPy is faster than regular Python code and how much
- Using NumPy efficiently
- Using Pandas productively
- Matplotlib and Seaborn code for productive EDA
- Using SciPy for common data science tasks
Chapter 4: Writing Machine Learning Code More Productively
Chapter Goal: Teach the reader to write efficient and modular machine learning code for a productive data science pipeline, with hands-on examples using Scikit-learn.
No. of pages: 15
Subtopics:
- Why modular code for machine learning and deep learning
- Scikit-learn tools and techniques
- Systematic evaluation of Scikit-learn ML algorithms in automated fashion
- Decision boundary visualization with a custom function
- Hyperparameter search in Scikit-learn
Chapter 5: Modular and Productive Deep Learning Code
Chapter Goal: Teach the reader to apply a modular programming style to deep learning code, with hands-on examples using Keras/TensorFlow.
No. of pages: 25
Subtopics:
- Why modular code and object-oriented style for deep learning
- Wrapper functions with Keras for faster deep learning experimentation
- A single function to streamline image classification task flow
- Visualize activation functions of neural networks
- Custom callback functions in Keras and their utilities
- Using the Scikit-learn wrapper for hyperparameter search in Keras
Chapter 6: Build Your Own Machine Learning Estimator/Package
Chapter Goal: Illustrate how to build a new Python machine learning module/package from scratch.
No. of pages: 15
Subtopics:
- Why write your own ML package/module?
- A simple example vs. a data scientist's example
- A good, old Linear Regression estimator, with a twist
- How do you start building?
- Add utility functions
- Do more with an object-oriented approach
Chapter 7: Some Cool Utility Packages
Chapter Goal: Introduce readers to the idea of executing data science tasks efficiently by going beyond the traditional stack and utilizing exciting new libraries.
No. of pages: 20
Subtopics:
- The great Python data science ecosystem
- Build a pipeline using "pdpipe"
- Check data integrity and expectations with "great_expectations"
- Speed up NumPy and Pandas using Numexpr
- Discover best-fitted distributions using "distfit"
Chapter 8: Testing the Machine Learning Code
Chapter Goal: Teach readers some basic principles of testing Python code and how to apply them to the specific case of a machine learning module.
No. of pages: 20
Subtopics:
- Why testing boosts productivity
- Basic principles and variations of testing
- Data science or machine learning testing is somewhat different
- A PyTest module for an ML module
Chapter 9: Memory and Timing Profiling
Chapter Goal: Illustrate how to measure and profile typical data science and machine learning code/modules.
No. of pages: 15
Subtopics:
- Why profiling is important
- Well-known profilers out there
- cProfile
- memory_profiler
- Scalene
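To give a flavor of the profiling covered in Chapter 9, here is a minimal sketch using the standard-library `cProfile` and `pstats` modules. It is not an excerpt from the book; `slow_sum_of_squares` and `pipeline` are hypothetical stand-ins for a real data science workload.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    """Deliberately naive pure-Python loop, used as the hot spot."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def pipeline():
    """Hypothetical workload: call the slow function repeatedly."""
    return [slow_sum_of_squares(20_000) for _ in range(20)]


# Collect profiling data only while the workload runs.
profiler = cProfile.Profile()
profiler.enable()
result = pipeline()
profiler.disable()

# Render the five most expensive entries, sorted by cumulative time.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

The report lists call counts and time per function, which is usually enough to spot which step of a pipeline deserves optimization first.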
Chapter 10: Scalable Data Science
Chapter Goal: Demonstrate the importance of scalability in data science tasks with hands-on examples.
No. of pages: 15
Subtopics:
- Data science pipeline needs to be easily scalable
- Common problems: out-of-memory and single-threading
- What options are out there?
- Hands-on example with Vaex
- Hands-on example with Modin
Chapter 11: Parallelized Data Science
Chapter Goal: Demonstrate the importance of parallel processing in data science tasks with hands-on examples.
No. of pages: 15
Subtopics:
- Data science pipeline should take advantage of parallel computing
- Two great options: Ray and Dask
- Hands-on example with Dask cluster
- Hands-on example with "Ray serve" and actors
Chapter 12: GPU-Based Data Science for High Productivity
Chapter Goal: Illustrate how to harness the power of GPU-based hardware for common data science tasks and classical machine learning.
No. of pages: 20
Subtopics:
- GPU-powered data science (not deep learning)
- The RAPIDS ecosystem
- CuPy vs. NumPy
- CuDF vs. Pandas
- CuML vs. Scikit-learn
Chapter 13: Other Useful Skills to Master
Chapter Goal: Give an overview of other related skills to master for executing data science tasks more efficiently.
No. of pages: 25
Subtopics:
- Key things to learn
- Understanding the basics of web technologies
- Going from local to cloud
- Simple web app to showcase a data science project
- GUI programming for a quick demo
- Being comfortable with container technologies
- Putting it all together
Chapter 14: Wrapping It Up
Chapter Goal: Summarize everything discussed and offer some future projections.
No. of pages: 10
Subtopics:
- Chapter-wise summary
- What was not discussed in this book
- Future projections
- General advice for upcoming data scientists
He has published data science books, and regularly contributes highly cited AI/ML-related articles on top platforms such as KDNuggets and Towards Data Science. Tirthajyoti has developed multiple open-source software packages in the field of statistical modeling and data analytics. He has 5 US patents and more than thirty technical publications in international journals and conferences.
He conducts regular workshops and participates in expert panels on various AI/ML topics and contributes to the broader data science community in numerous ways. Tirthajyoti holds a Ph.D. from the University of Illinois and a B.Tech degree from the Indian Institute of Technology, Kharagpur.
Technical specifications
Print | |
Publisher | Apress |
Author | Tirthajyoti Sarkar |
Publication date | 1 July 2022 |
Number of pages | 383 |
EAN13 | 9781484281208 |