Data Mining with Microsoft SQL Server 2000 Technical Reference

Claude Seidman

368 pages, published 29 November 2001

Summary

With its state-of-the-art capabilities for rapidly processing and retrieving huge quantities of data, Microsoft® SQL Server 2000 is quickly growing in popularity among large corporations. But learning how to use the powerful, built-in data-mining services in SQL Server to turn all that data into meaningful information takes time and effort. Data Mining with Microsoft SQL Server 2000 Technical Reference is the ideal, in-depth reference guide for any database developer, administrator, or IT professional who needs comprehensive information about these new data-mining services. In particular, it examines the data-warehousing architecture in SQL Server 2000 to show how to take full advantage of the data-mining services in this RDBMS. This is the only Microsoft-approved technical guide to the data-mining services in SQL Server 2000.

Positioning Statement: The ideal, in-depth reference for anyone who needs complete information about the powerful new data-mining tools and services in SQL Server 2000.

Unique Selling Proposition: This title is the only Microsoft-approved technical guide to the data-mining services in SQL Server 2000.

Key Book Benefits:
  Fully examines the data-warehousing architecture in SQL Server 2000 to show how to take full advantage of the data-mining services in this RDBMS.
  Shows how the data-mining services in SQL Server fit into its complete suite of technologies for extracting meaningful information from data.
  Is the only Microsoft-approved technical guide to the data-mining services in Microsoft SQL Server 2000.

Contents

Acknowledgments Page xi
Introduction Page xiii
PART I INTRODUCING DATA MINING
1 Understanding Data Mining Page 3
What Is Data Mining? Page 3
Why Use Data Mining? Page 4
How Data Mining Is Currently Used Page 6
Defining the Terms Page 7
Data Mining Methodology Page 9
  Analyzing the Problem Page 10
  Extracting and Cleansing the Data Page 10
  Validating the Data Page 10
  Creating and Training the Model Page 10
  Querying the Data Mining Model Data Page 10
  Maintaining the Validity of the Data-Mining Model Page 10
Overview of Microsoft Data Mining Page 11
  Data Mining vs. OLAP Page 11
  Data-Mining Models Page 11
  Data-Mining Algorithms Page 12
  Using SQL Server Syntax to Data Mine Page 14
Summary Page 14
2 Microsoft SQL Server Analysis Services Architecture Page 15
Introduction to OLAP Page 16
  MOLAP Page 18
  ROLAP Page 18
  HOLAP Page 19
Server Architecture Page 20
  Data Mining Services Within Analysis Services Page 20
Client Architecture Page 21
  PivotTable Service Page 22
  OLE DB Page 23
  Decision Support Objects (DSO) Page 24
  Multidimensional Expressions (MDX) Page 25
  Prediction Joins Page 25
Summary Page 26
3 Data Storage Models Page 27
Why Data Mining Needs a Data Warehouse Page 27
  Maintaining Data Integrity Page 28
Reporting Against OLTP Data Can Be Hazardous to Your Performance Page 31
Data Warehousing Architecture for Data Mining Page 33
  Creating the Warehouse from OLTP Data Page 33
  Optimizing Data for Mining Page 36
  Physical Data Mining Structure Page 42
  Three-Tier Architecture Page 43
Relational Data Warehouse Page 43
  Advantages of Relational Data Storage Page 44
  Building Supporting Tables for Data Mining Page 45
OLAP Cubes Page 46
  How Data Mining Uses OLAP Structures Page 46
  Advantages of OLAP Storage Page 47
  When OLAP Is Not Appropriate for Data Mining Page 49
Summary Page 49
4 Approaches to Data Mining Page 51
Directed Data Mining Page 51
Undirected Data Mining Page 52
  Data Mining vs. Statistics Page 52
  Learning from Historical Data Page 57
  Predicting the Future Page 59
Training Data-Mining Models Page 61
  Evaluating the Models and Avoiding Errors Page 62
Summary Page 65
PART II DATA-MINING METHODS
5 Microsoft Decision Trees Page 69
Creating the Model Page 69
  Analysis Manager Page 70
Visualizing the Model Page 87
  Dependency Network Browser Page 94
  Inside the Decision Tree Algorithm Page 97
  How Predictions Are Derived Page 109
  Navigating the Tree Page 109
  Navigation vs. Rules Page 112
  When to Use Decision Trees Page 113
Summary Page 114
6 Creating Decision Trees with OLAP Page 115
Creating the Model Page 115
  Select Source Type Page 116
  Select Source Cube and Data-Mining Technique Page 116
  Select Case Page 118
  Select Predicted Entity Page 119
  Select Training Data Page 121
  Select Dimension and Virtual Cube Page 121
  Completing the Data-Mining Model Page 123
OLAP Mining Model Editor Page 125
  Content Detail Pane Page 126
  Structure Panel Page 126
  Prediction Tree List Page 126
Analyzing Data with the OLAP Data-Mining Model Page 126
  Using the Generated Virtual Cube Page 128
  Using the Generated Dimension Page 129
Summary Page 133
7 Microsoft Clustering Page 135
The Search for Order Page 136
Looking for Ways to Understand Data Page 136
Clustering as an Undirected Data-Mining Technique Page 137
How Clustering Works Page 138
  Overview of the Algorithm Page 138
  The K-Means Method Clustering Algorithm Page 138
  What Is Being Measured Exactly? Page 142
  Clustering Factors Page 142
  Measuring "Closeness" Page 143
When to Use Clustering Page 146
  Visualize Relationships Page 146
  Highlight Anomalies Page 146
  Create Samples for Other Data-Mining Efforts Page 148
  Weaknesses of Clustering Page 148
Creating a Data-Mining Model Using Clustering Page 149
  Select Source Type Page 150
  Select the Table or Tables for Your Mining Model Page 150
  Select the Data-Mining Technique Page 151
  Edit Joins Page 152
  Select the Case Key Column for Your Mining Model Page 152
  Select the Input and Predictable Columns Page 152
Viewing the Model Page 154
  Organization of the Cluster Nodes Page 154
  Order of the Cluster Nodes Page 156
Analyzing the Data Page 156
Summary Page 158
PART III CREATING DATA-MINING APPLICATIONS WITH CODE
8 Using Microsoft Data Transformation Services (DTS) Page 161
What Is DTS? Page 162
DTS Tasks Page 162
  Transform Page 162
  Bulk Insert Page 163
  Data Driven Query Page 163
  Execute Package Page 164
Connections Page 167
  Sources Page 167
  Configuring a Connection Page 168
DTS Package Workflow Page 169
  DTS Package Steps Page 169
  Precedence Constraints Page 170
DTS Designer Page 171
  Opening the DTS Designer Page 171
  Saving a DTS Package Page 172
dtsrun Utility Page 174
Using DTS to Create a Data-Mining Model Page 177
  Preparing the SQL Server Environment Page 178
  Creating the Package Page 182
Summary Page 208
9 Using Decision Support Objects (DSO) Page 209
Scripting vs. Visual Basic Page 210
  The Server Object Page 211
  The Database Object Page 219
Creating the Relational Data-Mining Model Using DSO Page 221
Creating the OLAP Data-Mining Model Using DSO Page 230
  The DataSource Object Page 232
  Data-Mining Model (Decision Support Objects) Page 233
Adding a New Data Source Page 233
Analysis Server Roles Page 234
  Data-Mining Model Roles Page 235
Summary Page 236
10 Understanding Data-Mining Structures Page 237
The Structure of the Data-Mining Model Case Page 237
  Data-Mining Models Look Like Tables Page 237
Using Code to Browse Data-Mining Models Page 238
Using the Schema Rowsets Page 243
  MINING_MODELS Schema Rowset Page 243
  MINING_COLUMNS Schema Rowset Page 249
  MINING_MODEL_CONTENT Schema Rowset Page 259
  MINING_SERVICES Schema Rowset Page 262
  SERVICE_PARAMETERS Schema Rowset Page 266
  MODEL_CONTENT_PMML Schema Rowset Page 268
Summary Page 269
11 Data Mining Using PivotTable Service Page 271
Redistributing Components Page 272
Installing and Registering Components Page 273
  File Locations Page 274
  Installation Registry Settings Page 275
  Redistribution Setup Programs Page 275
Connecting to the PivotTable Service Page 276
  Connect to Analysis Services Using PivotTable Service Page 276
  Connect to Analysis Services Using HTTP Page 280
Building a Local Data-Mining Model Page 280
  Storage of Local Mining Models Page 284
  SELECT INTO Statement Page 286
  INSERT INTO Statement Page 286
  OPENROWSET Syntax Page 287
  Nested Tables and the SHAPE Statement Page 289
Using XML in Data Mining Page 290
  The PMML Standard Page 290
Summary Page 296
12 Data-Mining Queries Page 297
Components of a Prediction Query Page 297
  The Basic Prediction Query Page 298
  Specifying the Test Case Source Page 298
  Specifying Columns Page 300
  The PREDICTION JOIN Clause Page 300
  Using Functions as Columns Page 304
  Using Tabular Values as Columns Page 304
  The WHERE Clause Page 306
  Prediction Functions Page 307
  Predict Page 307
  PredictProbability Page 308
  PredictSupport Page 308
  PredictVariance Page 309
  PredictStdev Page 310
  PredictProbabilityVariance Page 310
  PredictProbabilityStdev Page 310
  PredictHistogram Page 310
  TopCount Page 313
  TopSum Page 313
  TopPercent Page 314
  RangeMin Page 314
  RangeMid Page 314
  RangeMax Page 314
  PredictScore Page 314
  PredictNodeId Page 315
Prediction Queries with Clustering Models Page 315
  Cluster Page 315
  ClusterProbability Page 316
  ClusterDistance Page 316
Using DTS to Run Prediction Queries Page 317
Summary Page 322
APPENDIX Page 325
GLOSSARY Page 349
INDEX Page 359

Technical specifications

	PAPER
Publisher(s)	Microsoft Press
Author(s)	Claude Seidman
Publication date	29 November 2001
Number of pages	368
Format	19.2 x 24
Cover	Hardcover
Weight	1050 g
Interior	Black and white
EAN13	9780735612716
ISBN13	978-0-7356-1271-6