Essential R Commands for Statistical Data Analysis

Posted by Anonymous and classified in Mathematics


Data Types and Vector Creation

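  • Nominal → named categories with no order
  • Ordinal → categories with a meaningful order
  • Interval → equal spacing between values, but no true zero
  • Ratio → equal spacing and a true zero, so ratios are meaningful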
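One possible way to mirror these four levels in R is sketched below. The data are invented for illustration, and the use of factor() for the two categorical levels is an assumption about how one might represent them, not something stated in these notes.

```r
# Hypothetical data, invented for illustration only.
colour <- factor(c("red", "blue", "red"))           # nominal: labels only
size   <- factor(c("small", "large", "medium"),
                 levels = c("small", "medium", "large"),
                 ordered = TRUE)                    # ordinal: labels with an order
temp_c <- c(10, 20, 30)                             # interval: differences are meaningful
weight <- c(50, 100, 150)                           # ratio: true zero, ratios are meaningful

size[1] < size[2]       # comparisons work for ordered factors
weight[2] / weight[1]   # "twice as heavy" makes sense on a ratio scale
```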


Creating Vectors

c(1, 2, 3) # numeric
c("a", "b") # text
c(TRUE, FALSE) # logical

Vector Functions

numeric(5) # 0 0 0 0 0
rep(2, 5) # 2 2 2 2 2
seq(1, 10) # 1 to 10
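A short sketch of the same functions with a few extra arguments. The by, length.out, and times arguments are standard base R, but the variable names here are made up for the example.

```r
zeros   <- numeric(5)                    # five zeros
twos    <- rep(2, 5)                     # the value 2 repeated five times
ints    <- seq(1, 10)                    # 1 through 10
odds    <- seq(1, 10, by = 2)            # 1 3 5 7 9
grid    <- seq(0, 1, length.out = 5)     # 0.00 0.25 0.50 0.75 1.00
pattern <- rep(c("a", "b"), times = 3)   # "a" "b" "a" "b" "a" "b"

length(ints)                             # 10
```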

Matrix Operations

A matrix is a rectangular table of numbers: matrix(1:12, nrow=3) creates a 3 × 4 matrix, filled column by column.

  • + → element-wise addition
  • %*% → matrix multiplication
  • t() → transpose
  • solve() → inverse (square, non-singular matrices only)
  • det() → determinant
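The operators above can be tried on two small matrices. The matrices A and B are made up for this sketch; A is chosen to be invertible so that solve() succeeds.

```r
A <- matrix(c(2, 1, 1, 3), nrow = 2)   # 2 x 2, filled column by column
B <- matrix(1:4, nrow = 2)

A + B                # element-wise sum
A %*% B              # matrix product
t(B)                 # transpose
det(A)               # 2*3 - 1*1 = 5
A_inv <- solve(A)    # inverse of A
A %*% A_inv          # identity matrix, up to rounding error
```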

Note: Standard deviation is the square root of the variance.


Understanding trim=0.10 in R

In mean(x, trim=0.10), it means the lowest 10% and the highest 10% of the sorted observations are dropped before the average is computed.
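A small worked example, using made-up values with one obvious outlier, shows the effect. The manual version assumes the rule documented for base R's mean(): floor(n * trim) observations are removed from each end.

```r
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)   # made-up values; 100 is an outlier

mean(x)                 # 14.5, pulled upward by the outlier
mean(x, trim = 0.10)    # 5.5, after dropping one value from each end

# Manual equivalent: sort, then drop floor(n * trim) values from each end
k <- floor(length(x) * 0.10)
trimmed <- sort(x)[(k + 1):(length(x) - k)]
mean(trimmed)           # 5.5
```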

Reading Data

return <- fund_return[,1] # first column of fund_return

Descriptive Statistics Functions

length(return)

Gives the number of observations (e.g., 76).

mean(return)

Calculates the ordinary average.

median(return)

Finds the middle value.

var(return)

Calculates the sample variance.

sd(return)

Calculates the standard deviation.

range(return)

Gives the minimum and maximum values:

  • min = -2.70
  • max = 91.15

IQR(return)

Calculates the interquartile range.

quantile(return, 0.25)

Returns the 25th percentile (Q1).

quantile(return, 0.95)

Returns the 95th percentile.
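The functions above can be tried without the original data. The sketch below simulates 76 values as a stand-in, so its numbers will not match the ones quoted in these notes. It also uses the hypothetical name returns rather than return, purely to avoid confusion with R's built-in return() function.

```r
set.seed(1)
returns <- round(rnorm(76, mean = 8, sd = 12), 2)   # simulated, NOT the original fund data

n <- length(returns)                     # number of observations
q <- quantile(returns, c(0.25, 0.75))    # Q1 and Q3
r <- range(returns)                      # c(min, max)

IQR(returns)           # interquartile range
unname(q[2] - q[1])    # same value: Q3 minus Q1
r[2] - r[1]            # width of the range: max minus min
```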

Coefficient of Variation: CV <- sd(return)/mean(return)
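As a quick illustration with invented numbers, the coefficient of variation is unit-free, so rescaling the data leaves it unchanged. It is generally only meaningful for ratio-scale data with a positive mean.

```r
x <- c(12, 15, 9, 14, 10)        # made-up positive values

CV <- sd(x) / mean(x)
CV * 100                         # often reported as a percentage

# Multiplying every value by 10 scales sd and mean equally, so CV is unchanged
CV_scaled <- sd(10 * x) / mean(10 * x)
all.equal(CV, CV_scaled)
```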


Sum of Squares Function: ss <- function(x){sum((x-mean(x))^2)}
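The sum of squares links directly to the variance and standard deviation. The sketch below checks that link on a small made-up vector: dividing by n - 1 gives var(), and the square root of that gives sd().

```r
ss <- function(x) {
  sum((x - mean(x))^2)   # squared deviations from the mean, summed
}

x <- c(2, 4, 4, 4, 5, 5, 7, 9)   # made-up values with mean 5

ss(x)                                 # 32
ss(x) / (length(x) - 1)               # sample variance, same as var(x)
sqrt(ss(x) / (length(x) - 1))         # standard deviation, same as sd(x)
```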


What is dnorm()?

dnorm(x, mean, sd) gives the height of the normal density curve at each x-value. With no mean or sd supplied, it uses the standard normal (mean 0, sd 1).
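A few calls make the idea concrete. The values are checked against the normal density formula, whose peak height is 1 / (sd * sqrt(2 * pi)).

```r
peak_std <- dnorm(0)                        # peak of the standard normal curve
1 / sqrt(2 * pi)                            # same value from the formula

sym <- dnorm(c(-1, 1))                      # equal heights: the curve is symmetric

peak_wide <- dnorm(10, mean = 10, sd = 2)   # a larger sd gives a lower, wider curve
1 / (2 * sqrt(2 * pi))                      # same value from the formula
```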

summary(return)

Provides a summary including minimum, 1st quartile, median, mean, 3rd quartile, and maximum.

Graphical Commands

  • stem(x): Draws a stem-and-leaf plot (use for small/medium datasets).
  • hist(x): Draws a histogram (use to inspect shape, skewness, and outliers).
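  • hist(x, breaks=n): Suggests the number of bins (R treats a single number as a suggestion and picks rounded breakpoints, so the actual count may differ).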
  • boxplot(x): Displays median, quartiles, and outliers.
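The commands in this list can be tried on simulated data. The right-skewed sample below is generated with rexp() purely for illustration, and the seed and sample size are arbitrary choices.

```r
set.seed(42)
x <- rexp(60, rate = 0.1)        # simulated right-skewed data

stem(x)                          # text-based stem-and-leaf plot
hist(x)                          # default binning
hist(x, breaks = 20)             # ask for roughly 20 bins
boxplot(x, horizontal = TRUE)    # median, quartiles, and flagged outliers

# plot = FALSE returns the binning without drawing, to see what R actually used
h <- hist(x, breaks = 20, plot = FALSE)
length(h$counts)                 # number of bins actually created
```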

Normal Curve Overlay

  • xpt <- seq(-10, 100, by = 0.1): Creates x-values for a smooth curve.
  • n_den <- dnorm(xpt, mean(x), sd(x)): Computes heights of a normal density curve.
  • lines(xpt, n_den): Adds the curve to an existing plot. Draw the histogram with hist(x, freq = FALSE) first so that both are on the density scale.
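Put together, the overlay might look like the sketch below. The data are simulated so the example runs on its own; freq = FALSE is needed so that the histogram is drawn on the density scale rather than as counts.

```r
set.seed(7)
x <- rnorm(200, mean = 40, sd = 15)      # simulated data, for illustration only

hist(x, freq = FALSE,
     main = "Histogram with normal curve")

xpt   <- seq(-10, 100, by = 0.1)         # x-values for a smooth curve
n_den <- dnorm(xpt, mean(x), sd(x))      # normal density with the sample mean and sd
lines(xpt, n_den, lwd = 2)               # draw the curve on top of the histogram
```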

Log Transformation

  • log(x): Takes the natural log (use for positive, right-skewed data).
  • exp(x): Reverses a natural log.
  • x[x > 0]: Keeps only positive values, since log() returns -Inf for 0 and NaN (with a warning) for negative numbers.
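The three steps can be combined as follows. The vector is made up and deliberately includes a zero to show why the filter matters.

```r
x <- c(0, 1.5, 3, 8, 20, 55, 150)   # made-up right-skewed values, including a zero

x_pos <- x[x > 0]                   # drop the zero before taking logs
log_x <- log(x_pos)                 # natural log

back <- exp(log_x)                  # undo the transformation
all.equal(back, x_pos)              # TRUE

log(0)                              # -Inf, which is why zeros are filtered out
```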

QQ Plots and Normality

  • qqnorm(x): Draws a normal QQ plot.
  • qqline(x): Adds a reference line.
  • shapiro.test(x): Performs the Shapiro-Wilk test. If p > 0.05, there is not enough evidence to reject normality.
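A minimal sketch of the normality check on simulated data. The 0.05 cutoff follows the rule stated above, and the seed and sample size are arbitrary. The p-value is deliberately not quoted here, since it depends on the simulated sample.

```r
set.seed(123)
x <- rnorm(50)          # simulated sample drawn from a normal distribution

qqnorm(x)               # points near a straight line suggest normality
qqline(x)               # reference line through the quartiles

res <- shapiro.test(x)  # returns a list-like "htest" object
res$p.value

if (res$p.value > 0.05) {
  message("Not enough evidence to reject normality")
} else {
  message("Evidence against normality")
}
```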

Statistical Workflow

For a numerical variable x, use this sequence: length(x), summary(x), mean(x), median(x), sd(x), IQR(x), range(x), followed by hist(x), boxplot(x), qqnorm(x), and shapiro.test(x).
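The sequence above could be wrapped into a small helper, sketched here. The function name describe and the simulated data are hypothetical and not part of the original notes.

```r
# Hypothetical helper: collects the numeric summaries from the workflow
describe <- function(x) {
  c(n      = length(x),
    mean   = mean(x),
    median = median(x),
    sd     = sd(x),
    IQR    = IQR(x),
    min    = min(x),
    max    = max(x))
}

set.seed(2024)
x <- rnorm(76, mean = 10, sd = 5)   # simulated data, for illustration only

stats <- describe(x)
stats
summary(x)

hist(x)              # shape and skewness
boxplot(x)           # quartiles and outliers
qqnorm(x)            # normal QQ plot
qqline(x)
shapiro.test(x)      # formal normality test
```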
