Big Data: Concepts, Technology, and Architecture

Balamurugan / Abirami R Balusamy

384 pages, published 10 May 2021

Summary

This book offers comprehensive coverage of Big Data tools, terminology, and technologies for researchers, business professionals, and graduates. It begins with an overview of what Big Data is and covers the key concepts end to end. Concepts, technologies, terminology, and techniques for storing, processing, and analyzing Big Data are logically organized and reinforced by diagrams and case studies, which give insight into the key concepts. The initial chapters shed light on the characteristics of Big Data that distinguish it from traditional database management systems. Big Data analytics is covered in detail in a separate chapter. Hadoop, the heart of Big Data, is handled in the Big Data processing chapter, which provides a deep understanding of its concepts.

Table of contents

Big Data - Concepts, Technology and Architecture
Book Description
1.1 Understanding Big Data
1.2 Evolution of Big Data
1.3 Failure of Traditional database in handling Big Data
1.3 (a) Data Mining Vs Big Data
1.4 3 V's of Big Data
1.4.1 Volume
1.4.2 Velocity
1.4.3 Variety
1.5 Sources of Big Data
1.6 Different Types of Data
1.6.1 Structured Data
1.6.2 Unstructured Data
1.6.3 Semi-Structured Data
1.7 Big Data Infrastructure
1.8 Big Data Life Cycle
1.8.1 Big Data Generation
1.8.2 Data Aggregation
1.8.3 Data Preprocessing
1.7.3 Big Data Analytics
1.7.4 Visualizing Big Data
1.8 Big Data Technology
1.8.1 Challenges faced by Big Data technology
1.8.1 Heterogeneity and incompleteness
1.8.2 Volume and velocity of the Data
1.8.3 Data Storage
1.8.4 Data Privacy
1.9 Big Data Applications
1.10 Big Data Use Cases
1.9.1 Healthcare
1.9.2 Telecom
1.9.3 Financial Services
Chapter 1 refresher
Conceptual short questions with answers
Frequently asked interview questions

Chapter Objective
Big Data Storage Concepts
2.1 Cluster computing
2.1.1 Types of cluster
2.1.1.1 High availability cluster
2.1.1.2 Load balancing cluster
2.1.2 Cluster structure
2.3 Distribution Models
2.3.1 Sharding
2.3.2 Data Replication
2.3.2.1 Master-Slave model
2.3.2.2 Peer-to-Peer model
2.3.3 Sharding and Replication
2.4 Distributed file system
2.5 Relational and Non Relational Databases
CoursesOffered
Figure 2.12 Data divided across multiple related tables
2.4.2 RDBMS Databases
2.4.3 NoSQL Databases
2.4.4 NewSQL Databases
2.5 Scaling Up and Scaling Out Storage
Chapter 2 refresher
Conceptual short questions with answers

Chapter Objective
3.1 Introduction to NoSQL
3.2 Why NoSQL
3.3 CAP theorem
3.4 ACID
3.5 BASE
3.6 Schemaless Database
3.7 NoSQL (Not Only SQL)
3.7.1 NoSQL Vs RDBMS
3.7.2 Features of NoSQL database
3.7.3 Types of NoSQL Technologies
3.7.3.1 Key-Value store database
3.7.3.2 Column-store database
3.7.3.3 Document Oriented Database
3.7.3.4 Graph-oriented Database
3.7.4 NoSQL Operations
3.9 Migrating from RDBMS to NoSQL
Chapter 3 refresher
Conceptual short questions with answers

Chapter Objective
4.1 Data Processing
4.2 Shared Everything Architecture
4.2.1 Symmetric multiprocessing architecture
4.2.2 Distributed Shared memory
4.3 Shared nothing architecture
4.4 Batch Processing
4.5 Real-Time Data Processing
4.6 Parallel Computing
4.7 Distributed Computing
4.8 Big Data Virtualization
4.8.1 Attributes of Virtualization
4.8.1.1 Encapsulation
4.8.1.2 Partitioning
4.8.1.3 Isolation
4.8.2 Big Data Server Virtualization
4.9 Introduction
4.10 Cloud computing types
4.11 Cloud Services
4.12 Cloud Storage
4.12.1 Architecture of GFS
4.12.1.1 Master
4.12.1.2 Client
4.13 Cloud Architecture
Cloud Challenges
Chapter 4 Refresher
Conceptual short questions with answers

Chapter Objective
5.1 Apache Hadoop
5.1.1 Architecture of Apache Hadoop
5.1.2 Hadoop Ecosystem Components Overview
5.2 Hadoop Storage
5.2.1 HDFS (Hadoop Distributed File System)
5.2.2 Why HDFS?
5.2.3 HDFS Architecture
5.2.4 HDFS Read/Write Operation
5.2.5 Rack Awareness
5.2.6 Features of HDFS
5.2.6.1 Cost-effective
5.2.6.2 Distributed storage
5.2.6.3 Data Replication
5.3 Hadoop Computation
5.3.1 MapReduce
5.3.1.1 Mapper
5.3.1.2 Combiner
5.3.1.3 Reducer
5.3.1.4 JobTracker and TaskTracker
5.3.2 MapReduce Input Formats
5.3.3 MapReduce Example
5.3.4 MapReduce Processing
5.3.5 MapReduce Algorithm
5.3.6 Limitations of MapReduce
5.4 Hadoop 2.0
5.4.1 Hadoop 1.0 limitations
5.4.2 Features of Hadoop 2.0
5.4.3 Yet Another Resource Negotiator (YARN)
5.4.3 Core components of YARN
5.4.3.1 ResourceManager
5.4.3.2 NodeManager
5.4.4 YARN Scheduler
5.4.4.1 FIFO scheduler
5.4.4.2 Capacity Scheduler
5.4.4.3 Fair Scheduler
5.4.5 Failures in YARN
5.4.5.1 ResourceManager failure
5.4.5.2 ApplicationMaster failure
5.4.5.3 NodeManager Failure
5.4.5.4 Container Failure
5.3 HBASE
5.4 Apache Cassandra
5.5 SQOOP
5.6 Flume
5.6.1 Flume Architecture
5.6.1.1 Event
5.6.1.2 Agent
5.7 Apache Avro
5.8 Apache Pig
5.9 Apache Mahout
5.10 Apache Oozie
5.10.1 Oozie Workflow
5.10.2 Oozie Coordinators
5.10.3 Oozie Bundles
5.11 Apache Hive
Hive Architecture
Hadoop Distributions
Chapter 5 refresher
Conceptual short questions with answers
Frequently asked Interview Questions

Chapter Objective
6.1 Terminologies of Big Data Analytics
Data Warehouse
Business Intelligence
Analytics
6.2 Big Data Analytics
6.2.1 Descriptive Analytics
6.2.2 Diagnostic Analytics
6.2.3 Predictive Analytics
6.2.4 Prescriptive Analytics
6.3 Data Analytics Lifecycle
6.3.1 Business case evaluation and Identify the source data
6.3.2 Data preparation
6.3.3 Data Extraction and Transformation
6.3.4 Data Analysis and visualization
6.3.5 Analytics application
6.4 Big Data Analytics Techniques
6.4.1 Quantitative Analysis
6.4.3 Statistical analysis
6.4.3.1 A/B testing
6.4.3.2 Correlation
6.4.3.3 Regression
6.5 Semantic Analysis
6.5.1 Natural Language Processing
6.5.2 Text Analytics
6.7 Big Data Business Intelligence
6.7.1 Online Transaction Processing (OLTP)
6.7.2 Online Analytical Processing (OLAP)
6.7.3 Real-Time Analytics Platform (RTAP)
6.6 Big Data Real Time Analytics Processing
6.7 Enterprise Data Warehouse
Chapter 6 Refresher
Conceptual short questions with answers

Chapter Objective
7.1 Introduction to Machine learning
7.2 Machine learning use cases
7.3 Types of Machine learning
7.3.1 Supervised machine learning algorithm
7.3.1.1 Classification
7.3.1.2 Regression
Support vector machines (SVM)
Big Data Analytics Practical Application
Chapter 7 Refresher
Conceptual short questions with answers

Chapter Objective
8.1 Itemset Mining
8.2 Association Rules
8.3 Frequent itemset generation
8.4 Itemset Mining Algorithms
8.4.1 Apriori Algorithm
8.4.1.2 Frequent Itemset generation using Apriori Algorithm
8.4.2 Eclat Algorithm - Equivalence Class Transformation Algorithm
8.4.3 FP growth algorithm
8.5 Maximal and Closed Frequent Itemset
Mining Closed Frequent Itemsets: Charm Algorithm
CHARM Algorithm implementation
Data Mining Methods
8.8 Prediction
8.8.2 Classification techniques
8.8.2.1 Bayesian Network
8.8.2.2 K-Nearest Neighbor Algorithm
8.8.2.2.1 The Distance metric
8.8.2.2.2 The parameter selection - cross validation
8.8.2.3 Decision tree classifier
Density based clustering algorithm
DBSCAN
Kernel Density Estimation
8.9.3 Artificial Neural Network
The Biological Neural Network
8.11 Mining Data Streams
Time Series Forecasting

9.1 Clustering
Application of Hierarchical methods
Kernel k-means clustering
Expectation Maximization Clustering Algorithm
Methods of determining the Number of clusters
Outlier detection
Types of Outliers
Outlier detection techniques
Training dataset based outlier detection
Assumption based outlier detection
Applications of outlier detection
9.6.3 Optimization Algorithm
Choosing the Number of Clusters
Bayesian Analysis of Mixtures
Fuzzy Clustering

10.1 Big Data Visualization
10.2 Conventional Data Visualization Techniques
10.2.1 Line Chart
10.2.2 Bar Chart
10.2.3 Pie Chart
10.2.4 Scatter Plot
10.2.5 Bubble plot
Tableau
Connecting to data
Connecting to data in Cloud
Connect to a file
Scatter plot in Tableau
Histogram using Tableau
Bar chart in Tableau
Line Chart
Pie chart
Bubble chart
Box Plot
Tableau Use Cases
Airlines
Office Supplies
Sports
Science - Earthquake Analysis
Tableau is used to analyze the magnitude of earthquakes and the frequency of occurrence over the years.
Installing R and Getting Ready
R Basic commands
Assigning value to a variable
Data Structures in R
Vector
Coercion
Length, Mean and median
Matrix
Arrays
Data frames
Lists
Importing data from a file
Importing data from a delimited text file
Control Structures in R
If-else
Nested if-else
for loops
Example
while loops
Break
Basic Graphs in R
Pie Charts
3D - Pie Charts
Bar Charts
Boxplots
Histograms
Line charts
Scatter plots

Technical details

  PRINT
Publisher(s) Wiley
Author(s) Balamurugan / Abirami R Balusamy
Publication date 10 May 2021
Number of pages 384
EAN13 9781119701828
