
Introduction to Data Science: Data Analysis and Prediction Algorithms with R
Rafael A. Irizarry
Summary
Introduction to Data Science: Data Analysis and Prediction Algorithms with R introduces concepts and skills that can help you tackle real-world data analysis challenges. It covers concepts from probability, statistical inference, linear regression, and machine learning. It also helps you develop skills such as R programming, data wrangling, data visualization, predictive algorithm building, file organization with UNIX/Linux shell, version control with Git and GitHub, and reproducible document preparation.
This book is a textbook for a first course in data science. No previous knowledge of R is necessary, although some experience with programming may be helpful. The book is divided into six parts: R, data visualization, statistics with R, data wrangling, machine learning, and productivity tools. Each part has several chapters meant to be presented as one lecture.
The author uses motivating case studies that realistically mimic a data scientist's experience. He starts by asking specific questions and answers these through data analysis so concepts are learned as a means to answering the questions. Examples of the case studies included are: US murder rates by state, self-reported student heights, trends in world health and economics, the impact of vaccines on infectious disease rates, the financial crisis of 2007-2008, election forecasting, building a baseball team, image processing of hand-written digits, and movie recommendation systems.
The statistical concepts used to answer the case study questions are only briefly introduced, so complementing with a probability and statistics textbook is highly recommended for in-depth understanding of these concepts. If you read and understand the chapters and complete the exercises, you will be prepared to learn the more advanced concepts and skills needed to become an expert.
I R
1. Installing R and RStudio
Installing R
Installing RStudio
2. Getting Started with R and RStudio
Why R?
The R console
Scripts
RStudio
The panes
Key bindings
Running commands while editing scripts
Changing global options
Installing R packages
3. R Basics
Case study: US Gun Murders
The very basics
Objects
The workspace
Functions
Other prebuilt objects
Variable names
Saving your workspace
Motivating scripts
Commenting your code
Exercises
Data types
Data frames
Examining an object
The accessor: $
Vectors: numerics, characters, and logical
Factors
Lists
Matrices
Exercises
Vectors
Creating vectors
Names
Sequences
Subsetting
Coercion
Not availables (NA)
Exercises
Sorting
sort
order
max and which.max
rank
Beware of recycling
Exercise
Vector arithmetics
Rescaling a vector
Two vectors
Exercises
Indexing
Subsetting with logicals
Logical operators
which
match
%in%
Exercises
Basic plots
plot
hist
boxplot
image
Exercises
4. Programming basics
Conditional expressions
Defining functions
Namespaces
For-loops
Vectorization and functionals
Exercises
5. The tidyverse
Tidy data
Exercises
Manipulating data frames
Adding a column with mutate
Subsetting with filter
Selecting columns with select
Exercises
The pipe: %>%
Exercises
Summarizing data
summarize
pull
Group then summarize with group_by
Sorting data frames
Nested sorting
The top n
Exercises
Tibbles
Tibbles display better
Subsets of tibbles are tibbles
Tibbles can have complex entries
Tibbles can be grouped
Create a tibble using tibble instead of data.frame
The dot operator
do
The purrr package
Tidyverse conditionals
case_when
between
Exercises
6. Importing data
Paths and the working directory
The filesystem
Relative and full paths
The working directory
Generating path names
Copying files using paths
The readr and readxl packages
readr
readxl
Exercises
Downloading files
R-base importing functions
scan
Text versus binary files
Unicode versus ASCII
Organizing Data with Spreadsheets
Exercises
II Data Visualization
7. Introduction to data visualization
8. ggplot2
The components of a graph
ggplot objects
Geometries
Aesthetic mappings
Layers
Tinkering with arguments
Global versus local aesthetic mappings
Scales
Labels and titles
Categories as colors
Annotation, shapes, and adjustments
Add-on packages
Putting it all together
Quick plots with qplot
Grids of plots
Exercises
9. Visualizing data distributions
Variable types
Case study: describing student heights
Distribution function
Cumulative distribution functions
Histograms
Smoothed density
Interpreting the y-axis
Densities permit stratification
Exercises
The normal distribution
Standard units
Quantile-quantile plots
Percentiles
Boxplots
Stratification
Case study: describing student heights (continued)
Exercises
ggplot2 geometries
Barplots
Histograms
Density plots
Boxplots
QQ-plots
Images
Quick plots
Exercises
10. Data visualization in practice
Case study: new insights on poverty
Hans Rosling's quiz
Scatterplots
Faceting
facet_wrap
Fixed scales for better comparisons
Time series plots
Labels instead of legends
Data transformations
Log transformation
Which base?
Transform the values or the scale?
Visualizing multimodal distributions
Comparing multiple distributions with boxplots and ridge plots
Boxplots
Ridge plots
Example: 1970 versus 2010 income distributions
Accessing computed variables
Weighted densities
The ecological fallacy and importance of showing the data
Logistic transformation
Show the data
11. Data visualization principles
Encoding data using visual cues
Know when to include
Do not distort quantities
Order categories by a meaningful value
Show the data
Ease comparisons
Use common axes
Align plots vertically to see horizontal changes and horizontally to see vertical changes
Consider transformations
Visual cues to be compared should be adjacent
Use color
Think of the color blind
Plots for two variables
Slope charts
Bland-Altman plot
Encoding a third variable
Avoid pseudo-three-dimensional plots
Avoid too many significant digits
Know your audience
Exercises
Case study: impact of vaccines on battling infectious diseases
Exercises
12. Robust summaries
Outliers
Median
The interquartile range (IQR)
Tukey's definition of an outlier
Median absolute deviation
Exercises
Case study: self-reported student heights
III Statistics with R
13. Introduction to Statistics with R
14. Probability
Discrete probability
Relative frequency
Notation
Probability distributions
Monte Carlo simulations for categorical data
Setting the random seed
With and without replacement
Independence
Conditional probabilities
Addition and multiplication rules
Multiplication rule
Multiplication rule under independence
Addition rule
Combinations and permutations
Monte Carlo example
Examples
Monty Hall problem
Birthday problem
Infinity in practice
Exercises
Continuous probability
Theoretical continuous distributions
Theoretical distributions as approximations
The probability density
Monte Carlo simulations for continuous variables
Continuous distributions
Exercises
15. Random variables
Random variables
Sampling models
The probability distribution of a random variable
Distributions versus probability distributions
Notation for random variables
The expected value and standard error
Population SD versus the sample SD
Central Limit Theorem
How large is large in the Central Limit Theorem
Statistical properties of averages
Law of large numbers
Misinterpreting law of averages
Exercises
Case study: The Big Short
Interest rates explained with chance model
The Big Short
Exercises
16. Statistical Inference
Polls
The sampling model for polls
Populations, samples, parameters and estimates
The sample average
Parameters
Polling versus forecasting
Properties of our estimate: expected value and standard error
Exercises
Central Limit Theorem in practice
A Monte Carlo simulation
The spread
Bias: why not run a very large poll?
Exercises
Confidence intervals
A Monte Carlo simulation
The correct language
Exercises
Power
p-values
Association Tests
Lady Tasting Tea
Two-by-two tables
Chi-square Test
The odds ratio
Confidence intervals for the odds ratio
Small count correction
Large samples, small p-values
Exercises
17. Statistical models
Poll aggregators
Poll data
Pollster bias
Data driven models
Exercises
Bayesian statistics
Bayes theorem
Bayes Theorem simulation
Bayes in practice
Hierarchical models
Exercises
Case study: Election forecasting
Bayesian approach
The general bias
Mathematical representations of models
Predicting the electoral college
Forecasting
Exercises
The t-distribution
18. Regression
Case study: is height hereditary?
The correlation coefficient
Sample correlation is a random variable
Correlation is not always a useful summary
Conditional expectations
The regression line
Regression improves precision
Bivariate normal distribution (advanced)
Variance explained
Warning: there are two regression lines
Exercises
19. Linear Models
Case Study: Moneyball
Sabermetrics
Baseball basics
No awards for BB
Base on Balls or Stolen Bases?
Regression applied to baseball statistics
Confounding
Understanding confounding through stratification
Multivariate regression
Least Squares Estimates
Interpreting linear models
Least Squares Estimates (LSE)
The lm function
LSE are random variables
Predicted values are random variables
Exercises
Linear regression in the tidyverse
The broom package
Exercises
Case study: Moneyball (continued)
Adding salary and position information
Picking 9 players
The regression fallacy
Measurement error models
Exercises
20. Association is not causation
Spurious correlation
Outliers
Reversing cause and effect
Confounders
Example: UC Berkeley admissions
Confounding explained graphically
Average after stratifying
Simpson's paradox
Exercises
IV Data Wrangling
21. Introduction to Data Wrangling
22. Reshaping data
gather
spread
separate
unite
Exercises
23. Joining tables
Joins
Left join
Right join
Inner join
Full join
Semi join
Anti-join
Binding
Binding columns
Binding by rows
Set operators
Intersect
Union
setdiff
setequal
Exercises
24. Web Scraping
HTML
The rvest package
CSS selectors
JSON
Exercises
25. String Processing
The stringr package
Case study 1: US murders data
Case study 2: self reported heights
How to escape when defining strings
Regular expressions
Strings are a regexp
Special characters
Character classes
Anchors
Quantifiers
White space \s
Quantifiers: *, ?, +
Groups
Search and replace with regex
Search and replace using groups
Testing and improving
Trimming
Changing letter case
Case study 2: self reported heights (continued)
The extract function
Putting it all together
String splitting
Case study 3: extracting tables from a PDF
Recoding
Exercises
26. Parsing Dates and Times
The date data type
The lubridate package
Exercises
27. Text mining
Case study: Trump tweets
Text as data
Sentiment analysis
Exercises
V Machine Learning
28. Introduction to Machine Learning
Notation
An example
Exercises
Evaluation Metrics
Training and test sets
Overall accuracy
The confusion matrix
Sensitivity and specificity
Balanced accuracy and F1 score
Prevalence matters in practice
ROC and precision-recall curves
The loss function
Exercises
Conditional probabilities and expectations
Conditional probabilities
Conditional expectations
Conditional expectation minimizes squared loss function
Exercises
Case study: is it a 2 or a 7?
29. Smoothing
Bin smoothing
Kernels
Local weighted regression (loess)
Fitting parabolas
Beware of default smoothing parameters
Connecting smoothing to machine learning
Exercises
30. Cross validation
Motivation with k-nearest neighbors
Over-training
Over-smoothing
Picking the k in kNN
Mathematical description of cross validation
K-fold cross validation
Exercises
Bootstrap
Exercises
31. The caret package
The caret train function
Cross validation
Example: fitting with loess
32. Examples of algorithms
Linear regression
The predict function
Exercises
Logistic regression
Generalized Linear Models
Logistic regression with more than one predictor
Exercises
k-nearest neighbors
Exercises
Generative models
Naive Bayes
Controlling prevalence
Quadratic Discriminant Analysis
Linear discriminant analysis
Connection to distance
Case study: more than three classes
Exercises
Classification and Regression Trees (CART)
The curse of dimensionality
CART motivation
Regression trees
Classification (decision) trees
Random Forests
Exercises
33. Machine learning in practice
Preprocessing
k-Nearest Neighbor and Random Forest
Variable importance
Visual assessments
Ensembles
Exercises
34. Large datasets
Matrix algebra
Notation
Converting a vector to a matrix
Row and column summaries
apply
Filtering columns based on summaries
Indexing with matrices
Binarizing the data
Vectorization for matrices
Matrix algebra operations
Exercises
Distance
Euclidean distance
Distance in higher dimensions
Euclidean distance example
Predictor Space
Distance between predictors
Exercises
Dimension reduction
Preserving distance
Linear transformations (advanced)
Orthogonal transformations (advanced)
Principal Component Analysis
Iris Example
MNIST Example
Exercises
Recommendation systems
Movielens data
Recommendation systems as a machine learning challenge
Loss function
A first model
Modeling movie effects
User effects
Exercises
Regularization
Motivation
Penalized Least Squares
Choosing the penalty terms
Exercises
Matrix factorization
Factor analysis
Connection to SVD and PCA
Exercises
35. Clustering
Hierarchical clustering
k-means
Heatmaps
Filtering features
Exercises
VI Productivity tools
36. Introduction to productivity tools
37. Accessing the terminal and installing Git
Accessing the terminal on a Mac
Installing Git on the Mac
Installing Git and Git Bash on Windows
Accessing the terminal on Windows
38. Organizing with Unix
Naming convention
The terminal
The filesystem
Directories and subdirectories
The home directory
Working directory
Paths
Unix commands
ls: Listing directory content
mkdir and rmdir: make and remove a directory
cd: Navigating the filesystem by changing directories
Some examples
More Unix commands
mv: moving files
cp: copying files
rm: removing files
less: looking at a file
Preparing for a data science project
Advanced Unix
Arguments
Getting help
Pipes
Wild cards
Environment variables
Shells
Executables
Permissions and file types
Commands you should learn
File manipulation in R
39. Git and GitHub
Why use Git and GitHub?
GitHub accounts
GitHub repositories
Overview of Git
Clone
Initializing a Git directory
Using Git and GitHub in RStudio
40. Reproducible projects with RStudio and R markdown
RStudio projects
R markdown
The header
R code chunks
Global options
knitR
More on R markdown
Organizing a data science project
Create directories in Unix
Create an RStudio project
Edit some R Scripts
Create some more directories using Unix
Add a README file
Initializing a Git directory
Add, commit and push files using RStudio
Technical specifications
Format | Paper
Publisher(s) | Taylor & Francis
Author(s) | Rafael A. Irizarry
Publication date | 6 November 2019
Number of pages | 713
EAN13 | 9780367357986