Essential R Commands for Statistical Data Analysis
Posted by Anonymous and classified in Mathematics
Data Types and Vector Creation
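- Nominal → named categories with no order
- Ordinal → categories with a meaningful order
- Interval → ordered values with equal spacing but no true zero
- Ratio → equal spacing plus a true zero, so ratios are meaningful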
Creating Vectors
c(1, 2, 3) # numeric
c("a", "b") # text
c(TRUE, FALSE) # logical
Vector Functions
numeric(5) # 0 0 0 0 0
rep(2, 5) # 2 2 2 2 2
seq(1, 10) # 1 to 10
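A short sketch of the vector commands above, run together. The variable names and values are illustrative only and are not part of the original notes.

```r
# Illustrative vectors; none of these come from the fund data.
zeros <- numeric(5)              # five zeros
twos  <- rep(2, 5)               # the value 2 repeated five times
ints  <- seq(1, 10)              # the integers 1 through 10
steps <- seq(0, 1, by = 0.25)    # 0.00 0.25 0.50 0.75 1.00

# class() reports the type of each vector
class(c(1, 2, 3))        # "numeric"
class(c("a", "b"))       # "character"
class(c(TRUE, FALSE))    # "logical"

# Mixing types in one vector coerces everything to the most general type
mixed <- c(1, "a", TRUE)
class(mixed)             # "character"
```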
Matrix Operations
A matrix is a rectangular table of numbers: matrix(1:12, nrow=3) builds a 3 x 4 matrix, filled column by column.
- + → add
- %*% → matrix multiply
- t() → transpose
- solve() → inverse
- det() → determinant
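The operators above can be tried on a small example. The matrices A and B below are made up for illustration. A is deliberately square and invertible, because solve() and det() only work on square matrices.

```r
# A and B are hypothetical 2 x 2 matrices chosen for illustration.
A <- matrix(c(2, 1, 1, 3), nrow = 2)   # filled column by column
B <- matrix(1:4, nrow = 2)

A + B              # element-wise addition
A %*% B            # matrix multiplication
t(B)               # transpose: rows become columns
det(A)             # determinant: 2*3 - 1*1 = 5
A_inv <- solve(A)  # inverse; needs a square, non-singular matrix

# A matrix times its inverse gives the identity matrix (up to rounding)
round(A %*% A_inv, 10)

# matrix(1:12, nrow = 3) is 3 x 4, so it has no inverse or determinant
dim(matrix(1:12, nrow = 3))   # 3 4
```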
Note: Standard deviation is the square root of the variance.
Understanding trim=0.10 in R
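In mean(x, trim=0.10), R sorts the data and drops 10% of the observations from each end before averaging, which reduces the influence of extreme values.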
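A minimal sketch of the trimmed mean, assuming a made-up sample of 10 values with one extreme observation. The manual calculation at the end is only there to show what trim does.

```r
# Hypothetical sample: nine ordinary values and one extreme value
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)

plain_mean   <- mean(x)                # 14.5, pulled up by the value 100
trimmed_mean <- mean(x, trim = 0.10)   # 5.5, drops 1 value from each end

# The same trimmed mean computed by hand
k <- floor(0.10 * length(x))           # observations dropped per end
manual <- mean(sort(x)[(k + 1):(length(x) - k)])

c(plain = plain_mean, trimmed = trimmed_mean, manual = manual)
```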
Reading Data
return <- fund_return[,1]
Stores the first column of fund_return (assumed to be already loaded) in the vector return.
Descriptive Statistics Functions
length(return)
Gives the number of observations (e.g., 76).
mean(return)
Calculates the ordinary average.
median(return)
Finds the middle value.
var(return)
Calculates the sample variance.
sd(return)
Calculates the standard deviation.
range(return)
Gives the minimum and maximum values:
- min = -2.70
- max = 91.15
IQR(return)
Calculates the interquartile range.
quantile(return, 0.25)
Returns the 25th percentile (Q1).
quantile(return, 0.95)
Returns the 95th percentile.
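The link between quantile(), IQR(), and range() can be checked on a small made-up vector. The data below are hypothetical, and quantile() is left on its default interpolation method.

```r
# Hypothetical data: the integers 1 to 9
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)

q <- quantile(x, c(0.25, 0.75))   # Q1 = 3, Q3 = 7 for this vector
iqr_manual <- unname(q[2] - q[1]) # Q3 - Q1 = 4
iqr_builtin <- IQR(x)             # 4, the same quantity

r <- range(x)                     # 1 9 (minimum, then maximum)
spread <- diff(r)                 # 8, maximum minus minimum
```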
Coefficient of Variation: CV <- sd(return)/mean(return)
Sum of Squares Function: ss <- function(x){sum((x-mean(x))^2)}
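A short check of how these two formulas relate to var() and sd(), using a made-up vector in place of return. Dividing the sum of squares by n - 1 should reproduce the sample variance.

```r
# Sum of squared deviations from the mean, as defined above
ss <- function(x) { sum((x - mean(x))^2) }

# Hypothetical data with mean 5
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

ss(x)                               # 32
var_manual <- ss(x) / (length(x) - 1)
all.equal(var_manual, var(x))       # TRUE: var() divides by n - 1

# Coefficient of variation: spread relative to the mean (unitless)
CV <- sd(x) / mean(x)
CV                                  # about 0.43
```

The CV is only meaningful when the mean is positive and well away from zero.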
What is dnorm()?
dnorm() gives the height of the normal curve at each x-value.
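As a quick illustration, dnorm() can be compared against the normal density formula written out by hand. The mean and standard deviation below are arbitrary example values.

```r
# Standard normal (mean 0, sd 1) is the default
peak <- dnorm(0)            # 1 / sqrt(2 * pi), about 0.3989
dnorm(c(-1, 0, 1))          # symmetric around the mean

# Hypothetical mean and sd for a non-standard normal curve
m <- 10
s <- 2
xs <- c(6, 8, 10, 12, 14)

heights <- dnorm(xs, mean = m, sd = s)

# The same heights from the density formula
manual <- 1 / (s * sqrt(2 * pi)) * exp(-(xs - m)^2 / (2 * s^2))
all.equal(heights, manual)  # TRUE
```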
summary(return)
Provides a summary including minimum, 1st quartile, median, mean, 3rd quartile, and maximum.
Graphical Commands
- stem(x): Draws a stem-and-leaf plot (use for small/medium datasets).
- hist(x): Draws a histogram (use to inspect shape, skewness, and outliers).
- hist(x, breaks=n): Controls the number of bins.
- boxplot(x): Displays median, quartiles, and outliers.
Normal Curve Overlay
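- xpt <- seq(-10, 100, by = 0.1): Creates x-values for a smooth curve.
- n_den <- dnorm(xpt, mean(x), sd(x)): Computes heights of a normal density curve with the same mean and standard deviation as x.
- lines(xpt, n_den): Adds the curve to an existing plot. Draw the histogram with hist(x, freq = FALSE) first so that both use the density scale.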
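The overlay steps can be put together as follows. This is a sketch on simulated data, since the original dataset is not available here. The grid is built from the data's own range rather than the fixed -10 to 100 used above.

```r
set.seed(1)                               # reproducible simulated data
x <- rnorm(200, mean = 40, sd = 15)       # hypothetical stand-in for the real data

# freq = FALSE puts the y-axis on the density scale, matching dnorm()
hist(x, freq = FALSE, main = "Histogram with normal curve")

# Grid of x-values covering the data, and the matching normal density heights
xpt   <- seq(min(x), max(x), length.out = 500)
n_den <- dnorm(xpt, mean(x), sd(x))

lines(xpt, n_den)                         # draw the curve on top of the histogram
```

If the curve looks cut off at the top, passing a larger ylim to hist() gives it more room.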
Log Transformation
- log(x): Takes the natural log (use for positive, right-skewed data).
- exp(x): Reverses a natural log.
- x[x > 0]: Keeps only the positive values, since log(0) returns -Inf and the log of a negative number returns NaN with a warning.
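A small sketch of the filter-then-transform pattern, using a made-up vector that contains a negative value and a zero.

```r
# Hypothetical right-skewed data with two values that cannot be logged
x <- c(-1, 0, 0.5, 2, 10, 150)

x_pos <- x[x > 0]        # keep only the positive values: 0.5 2 10 150
log_x <- log(x_pos)      # natural log of each remaining value

# exp() undoes log(), so the round trip recovers the filtered data
back <- exp(log_x)
all.equal(back, x_pos)   # TRUE

log(0)                   # -Inf, which is why zeros are filtered out
```

Filtering changes the sample size, so it is worth reporting how many observations were dropped.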
QQ Plots and Normality
- qqnorm(x): Draws a normal QQ plot.
- qqline(x): Adds a reference line.
- shapiro.test(x): Performs the Shapiro-Wilk test. If p > 0.05, there is not enough evidence to reject normality.
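A minimal sketch of the normality check on simulated data. The 0.05 cutoff follows the rule stated above, and the sample itself is hypothetical.

```r
set.seed(42)                 # reproducible simulated data
x <- rnorm(50)               # hypothetical sample drawn from a normal distribution

qqnorm(x)                    # points near a straight line suggest normality
qqline(x)                    # reference line through the quartiles

res <- shapiro.test(x)       # returns an object holding the W statistic and p-value
p <- res$p.value

# Apply the decision rule from the notes
if (p > 0.05) {
  message("Not enough evidence to reject normality")
} else {
  message("Normality is rejected at the 5% level")
}
```

A large p-value does not prove the data are normal, so the QQ plot should still be inspected.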
Statistical Workflow
For a numerical variable x, use this sequence: length(x), summary(x), mean(x), median(x), sd(x), IQR(x), range(x), followed by hist(x), boxplot(x), qqnorm(x), and shapiro.test(x).
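The sequence above can be wrapped in one helper. The function name describe_numeric and the simulated data are assumptions made for this sketch and do not appear in the original notes.

```r
# Hypothetical helper: runs the numeric summaries, draws the plots,
# and returns the numbers in a list.
describe_numeric <- function(x) {
  stats <- list(
    n       = length(x),
    summary = summary(x),
    mean    = mean(x),
    median  = median(x),
    sd      = sd(x),
    IQR     = IQR(x),
    range   = range(x)
  )
  hist(x)                    # shape, skewness, outliers
  boxplot(x)                 # median, quartiles, outliers
  qqnorm(x)                  # normal QQ plot
  qqline(x)                  # reference line for the QQ plot
  stats$shapiro_p <- shapiro.test(x)$p.value
  stats
}

set.seed(7)
x <- rexp(60, rate = 0.1)    # simulated right-skewed data
result <- describe_numeric(x)
result$n                     # 60
```

Note that shapiro.test() accepts between 3 and 5000 observations, so larger samples need a different check.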