Productive and Efficient Data Science with Python: With Modularizing, Memory profiles, and Parallel/

Tirthajyoti Sarkar

383 pages, published 1 July 2022

Summary

This book focuses on Python-based tools and techniques that help you become highly productive in all aspects of a typical data science stack, such as statistical analysis, visualization, model selection, and feature engineering.

You'll review the inefficiencies and bottlenecks lurking in the daily business process and address them with practical solutions. Automation of repetitive data science tasks is a key mindset promoted throughout the book. You'll learn how to extend existing coding practices to handle larger datasets efficiently with the help of advanced libraries and packages that already exist in the Python ecosystem.

The book focuses on topics such as how to measure the memory footprint and execution speed of machine learning models, quality-test a data science pipeline, and modularize a data science pipeline for app development. You'll review Python libraries that come in very handy for automating and speeding up day-to-day tasks.

In the end, you'll understand and perform data science and machine learning tasks beyond the traditional methods and utilize the full spectrum of the Python data science ecosystem to increase productivity.

What You'll Learn

Write fast and efficient code for data science and machine learning
Build robust and expressive data science pipelines
Measure the memory and CPU profiles of machine learning methods
Utilize the full potential of GPUs for data science tasks
Handle large and complex data sets efficiently

Who This Book Is For

Data scientists, data analysts, machine learning engineers, artificial intelligence practitioners, and statisticians who want to take full advantage of the Python ecosystem.

Chapter 1: What is Productive and Efficient Data Science?
Chapter Goal: To introduce readers to the concept of doing data science tasks efficiently and more productively, and to illustrate potential pitfalls in their everyday work.
No. of pages: 10
Subtopics:
* Typical data science pipeline
* Short examples of inefficient programming in data science
* Some pitfalls to avoid
* Efficiency and productivity go hand in hand
* Overview of tools and techniques for a productive data science pipeline
* Skills and attitude for productive data science

Chapter 2: Better Programming Principles for Efficient Data Science
Chapter Goal: Help readers grasp the idea of efficient programming techniques and how they can be applied to a typical data science task flow.
No. of pages: 15
Subtopics:
* The concept of time and space complexity, Big-O notation
* Why complexity matters for data science
* Examples of inefficient programming in data science tasks
* What you can do instead
* Measuring code execution timing

Chapter 3: How to Use Python Data Science Packages More Productively
Chapter Goal: Illustrate a handful of tricks and techniques to use the most well-known Python data science packages - NumPy, Pandas, Matplotlib, Seaborn, SciPy - more productively.
No. of pages: 20
Subtopics:
* Why NumPy is faster than regular Python code and how much
* Using NumPy efficiently
* Using Pandas productively
* Matplotlib and Seaborn code for productive EDA
* Using SciPy for common data science tasks

Chapter 4: Writing Machine Learning Code More Productively
Chapter Goal: Teach the reader about writing efficient and modular machine learning code for a productive data science pipeline with hands-on examples using Scikit-learn.
No. of pages: 15
Subtopics:
* Why modular code for machine learning and deep learning
* Scikit-learn tools and techniques
* Systematic evaluation of Scikit-learn ML algorithms in an automated fashion
* Decision boundary visualization with a custom function
* Hyperparameter search in Scikit-learn

Chapter 5: Modular and Productive Deep Learning Code
Chapter Goal: Teach the reader about mixing a modular programming style into deep learning code with hands-on examples using Keras/TensorFlow.
No. of pages: 25
Subtopics:
* Why modular code and object-oriented style for deep learning
* Wrapper functions with Keras for faster deep learning experimentation
* A single function to streamline image classification task flow
* Visualize activation functions of neural networks
* Custom callback functions in Keras and their utilities
* Using the Scikit-learn wrapper for hyperparameter search in Keras

Chapter 6: Build Your Own Machine Learning Estimator/Package
Chapter Goal: Illustrate how to build a new Python machine learning module/package from scratch.
No. of pages: 15
Subtopics:
* Why write your own ML package/module?
* A simple example vs. a data scientist's example
* A good, old Linear Regression estimator - with a twist
* How do you start building?
* Add utility functions
* Do more with an object-oriented approach

Chapter 7: Some Cool Utility Packages
Chapter Goal: Introduce readers to the idea of executing data science tasks efficiently by going beyond the traditional stack and utilizing exciting, new libraries.
No. of pages: 20
Subtopics:
* The great Python data science ecosystem
* Build pipeline using "pdpipe"
* Check data integrity and expectations with "great_expectations"
* Speed up NumPy and Pandas using Numexpr
* Discover best fitted distributions using "distfit"

Chapter 8: Testing the Machine Learning Code
Chapter Goal: Teach readers some basic principles of testing Python code and how to apply them to the specific case of a machine learning module.
No. of pages: 20
Subtopics:
* Why testing boosts productivity
* Basic principles and variations of testing
* Data science or machine learning testing is somewhat different
* A PyTest module for an ML module

Chapter 9: Memory and Timing Profiling
Chapter Goal: Illustrate how to measure and profile typical data science and machine learning code/modules.
No. of pages: 15
Subtopics:
* Why profiling is important
* Well-known profilers out there
* cProfile
* Memory_profile
* Scalene

Chapter 10: Scalable Data Science
Chapter Goal: Demonstrate the importance of scalability in data science tasks with hands-on examples.
No. of pages: 15
Subtopics:
* Data science pipeline needs to be easily scalable
* Common problems - out-of-memory and single-threading
* What options are out there?
* Hands-on example with Vaex
* Hands-on example with Modin

Chapter 11: Parallelized Data Science
Chapter Goal: Demonstrate the importance of parallel processing in data science tasks with hands-on examples.
No. of pages: 15
Subtopics:
* Data science pipeline should take advantage of parallel computing
* Two great options - Ray and Dask
* Hands-on example with Dask cluster
* Hands-on example with "Ray serve" and actors

Chapter 12: GPU-Based Data Science for High Productivity
Chapter Goal: Illustrate how to harness the power of GPU-based hardware for common data science tasks and classical machine learning.
No. of pages: 20
Subtopics:
* GPU-powered data science (not deep learning)
* The RAPIDS ecosystem
* CuPy vs. NumPy
* CuDF vs. Pandas
* CuML vs. Scikit-learn

Chapter 13: Other Useful Skills to Master
Chapter Goal: Give an overview of other related skills to master for executing data science tasks more efficiently.
No. of pages: 25
Subtopics:
* Key things to learn
* Understanding the basics of web technologies
* Going from local to cloud
* Simple web app to showcase a data science project
* GUI programming for a quick demo
* Being comfortable with container technologies
* Putting it all together

Chapter 14: Wrapping It Up
Chapter Goal: Show a summary of all the things discussed and some future projections.
No. of pages: 10
Subtopics:
* Chapter-wise summary
* What was not discussed in this book
* Future projections
* General advice for upcoming data scientists

Dr. Tirthajyoti Sarkar lives in the San Francisco Bay Area and works as a Data Science and Solutions Engineering Manager at Adapdix Corp., where he architects artificial intelligence and machine learning solutions for edge-computing-based systems powering the Industry 4.0 and smart manufacturing revolution across a wide range of industries. Before that, he spent more than a decade developing best-in-class semiconductor technologies for power electronics.
He has published data science books, and regularly contributes highly cited AI/ML-related articles on top platforms such as KDNuggets and Towards Data Science. Tirthajyoti has developed multiple open-source software packages in the field of statistical modeling and data analytics. He has 5 US patents and more than thirty technical publications in international journals and conferences.
He conducts regular workshops and participates in expert panels on various AI/ML topics and contributes to the broader data science community in numerous ways. Tirthajyoti holds a Ph.D. from the University of Illinois and a B.Tech degree from the Indian Institute of Technology, Kharagpur.

Technical specifications

	PRINT
Publisher(s)	Apress
Author(s)	Tirthajyoti Sarkar
Publication date	1 July 2022
Number of pages	383
EAN13	9781484281208
