Name: Data Science at Scale with Python and Dask
Price: 363.55 RON
Availability: InStock
Author: Jesse Daniel
ISBN: 9781617295607

Data Science at Scale with Python and Dask

Jesse Daniel

English-language paperback, 11 Oct 2019

The ecosystem covered in Data Science at Scale with Python and Dask is built around Dask's native integration with the pillars of data analysis in Python: Pandas, NumPy, and Scikit-learn. What caught our attention is author Jesse Daniel's pragmatic approach: rather than asking readers to overhaul their workflow, he proposes extending it with parallel and distributed computing. The book moves quickly from theory to execution, using Docker containers and AWS infrastructure to show how an algorithm can scale from a single laptop to a cluster with hundreds of nodes.

The book's structure is technical and progressive. The first part explores the building blocks of scalable computing; later parts apply those concepts to massive datasets, such as the archive of New York City parking tickets. A distinctive element is the focus on visualizing large datasets: using Seaborn and Datashader, the author tackles the problem of plotting millions of data points without sacrificing performance.

If Scaling Python with Dask gave you the theoretical framework and a quick introduction to the library's APIs, this book provides the practical tools and the full data engineering context. Unlike Spark-centered approaches, such as the one in Data Analysis with Python and PySpark, this volume stays anchored in the native Python world, making it ideal for those who want high performance without leaving the PyData ecosystem. It is an implementation guide that also covers Dask-ML and the handling of unstructured data with Bags and Arrays.


Why read this book

We recommend this book to data scientists who have run into the memory limits of Pandas. You will gain concrete skills in handling terabyte-scale datasets and in automating workflows with Dask Distributed. It is an essential handbook for moving from local prototypes to scalable production applications in the cloud.

Description

Summary

Dask is a native parallel analytics tool designed to integrate seamlessly with the libraries you're already using, including Pandas, NumPy, and Scikit-Learn. With Dask you can crunch and work with huge datasets, using the tools you already have. And Data Science with Python and Dask is your guide to using Dask for your data projects without changing the way you work!

Purchase of the print book includes a free eBook in PDF, Kindle, and ePub formats from Manning Publications. You'll find registration instructions inside the print book.

About the Technology

An efficient data pipeline means everything for the success of a data science project. Dask is a flexible library for parallel computing in Python that makes it easy to build intuitive workflows for ingesting and analyzing large, distributed datasets. Dask provides dynamic task scheduling and parallel collections that extend the functionality of NumPy, Pandas, and Scikit-learn, enabling users to scale their code from a single laptop to a cluster of hundreds of machines with ease.

About the Book

Data Science with Python and Dask teaches you to build scalable projects that can handle massive datasets. After meeting the Dask framework, you'll analyze data in the NYC Parking Ticket database and use DataFrames to streamline your process. Then, you'll create machine learning models using Dask-ML, build interactive visualizations, and build clusters using AWS and Docker.

What's inside

Working with large, structured and unstructured datasets
Visualization with Seaborn and Datashader
Implementing your own algorithms
Building distributed apps with Dask Distributed
Packaging and deploying Dask apps

About the Reader

For data scientists and developers with experience using Python and the PyData stack.

About the Author

Jesse Daniel is an experienced Python developer. He taught Python for Data Science at the University of Denver and leads a team of data scientists at a Denver-based media technology company.

Table of Contents

PART 1 - The Building Blocks of scalable computing
Why scalable computing matters
Introducing Dask

PART 2 - Working with Structured Data using Dask DataFrames
Introducing Dask DataFrames
Loading data into DataFrames
Cleaning and transforming DataFrames
Summarizing and analyzing DataFrames
Visualizing DataFrames with Seaborn
Visualizing location data with Datashader

PART 3 - Extending and deploying Dask
Working with Bags and Arrays
Machine learning with Dask-ML
Scaling and deploying Dask
