Data Mining and Machine Learning Techniques: A Comprehensive Overview
- CRISP-DM phases: business understanding, data understanding, data preparation, modeling, evaluation, deployment.
- KDD process: selection, preprocessing, transformation, data mining, interpretation/evaluation.
- Classification: the most frequently used technique; supervised machine learning whose output is categorical (nominal or ordinal) in nature.
- Assessment methods for classifiers: predictive accuracy, speed, robustness, scalability, interpretability.
- Confusion matrix formulas: accuracy = (TP + TN) / (TP + TN + FP + FN); true positive rate = TP / (TP + FN); true negative rate = TN / (TN + FP); precision = TP / (TP + FP); recall = TP / (TP + FN). (A worked sketch follows this list.)
- Overfitting: an excessively complex model, which can give bad predictions on new data. Underfitting: a model that is too simple (not flexible enough), which also gives bad predictions.
- k-fold cross-validation: split the data into k mutually exclusive subsets, use each subset once as the testing set and the rest as training, repeat k times, and aggregate the results (sketch below).
- Other evaluation methods: leave-one-out, bootstrapping, jackknifing (similar to leave-one-out), area under the ROC curve.
- Regression differs from classification in that the output is continuous; variants include linear, nonlinear, and multiple regression, regression trees, neural networks, and SVMs.
- Regression accuracy: forecast error = actual value - forecast value; R-squared = correlation(actual, forecast)^2; mean absolute deviation MAD = sum(|forecast error|) / n; mean squared error MSE = sum(error^2) / n; mean absolute percent error MAPE = (sum(|error / actual|) / n) * 100 (worked sketch below).
- Good clustering has high intra-cluster similarity and low inter-cluster similarity, and relies on a distance measure (Euclidean, Manhattan).
- k-means: k is a predetermined number of clusters; randomly generate k initial points, assign each point to the nearest cluster center, recompute the centers, and repeat (sketch below). Pros: easy and efficient. Cons: only applicable where a mean is defined (non-categorical data), k must be specified, and it cannot handle noisy or outlier data well.
- Association rule mining: rules of the form X -> Y (support%, confidence%). support(A -> B) = P(A and B) = count(A and B) / n; confidence(A -> B) = P(B | A) = (count(A and B) / n) / (count(A) / n); lift(X -> Y) = P(Y | X) / P(Y) = confidence / (count(Y) / n). Algorithms such as Apriori, ECLAT, FP-Growth, and their derivatives help identify frequent itemsets (worked example below).
- Decision trees use construction followed by pruning to improve purity. The Gini index measures the purity of the classes that results from branching on a certain attribute; information gain uses entropy to measure uncertainty (sketch below).
- Handouts, common blunders: selecting the wrong problem, ignoring what the sponsor thinks, not leaving enough time for data acquisition, selection, and preparation, looking only at aggregated results, being sloppy about keeping track of procedures, ignoring suspicious findings, running algorithms repeatedly and blindly, believing everything you are told about your methods and data, and measuring results differently than the sponsor does.
- Multi-class confusion matrix (three classes, with Ti the count of class i correctly classified and eij the count of class i instances misclassified as class j): accuracy = (T1 + T2 + T3) / (T1 + T2 + T3 + e12 + e13 + e21 + e23 + e31 + e32); sensitivity for class 1 = T1 / (T1 + e12 + e13); precision for class 1 = T1 / (T1 + e21 + e31); specificity for class 1 = T(not 1) / (T(not 1) + e21 + e31), where T(not 1) = T2 + T3 + e23 + e32.
- Multi-class accuracy (ordinal form) = (1/n) * sum(1 - |x - xbar| / |xmax - xmin|), where xmax and xmin are determined by how many classes there are.
- Neural networks: input, weight, neuron/processing element, transfer function, output. Biological analogy: soma = node, dendrite = input, axon = output, synapse = weight; biological neurons are slow but many, artificial ones fast but few. Data are split into training, validation, and testing sets; k-fold cross-validation gives less bias but is time-consuming. A neural network is a black-box model with no transparency; the usual remedy is sensitivity analysis.
- SVM (support vector machine): a popular machine learning technique; a generalized linear model that uses nonlinear kernel functions to transform nonlinear relationships into linearly separable feature spaces, and uses distance measures to determine the parallel separating hyperplanes.
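A minimal sketch of the binary confusion-matrix formulas above; the TP/TN/FP/FN counts are hypothetical, just to make the formulas concrete:

```python
# Hypothetical counts from a binary confusion matrix
TP, TN, FP, FN = 40, 45, 5, 10

accuracy    = (TP + TN) / (TP + TN + FP + FN)   # overall correctness
recall      = TP / (TP + FN)                    # true positive rate / sensitivity
specificity = TN / (TN + FP)                    # true negative rate
precision   = TP / (TP + FP)                    # correctness of positive predictions

print(accuracy, recall, specificity, precision)
```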
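A minimal k-fold cross-validation sketch following the steps in the notes; the `train` and `evaluate` callables are hypothetical placeholders supplied by the caller:

```python
import random

def k_fold_cross_validation(data, k, train, evaluate):
    """Split data into k mutually exclusive folds; each fold serves once as the
    test set while the remaining k-1 folds are used for training; the k scores
    are aggregated by averaging."""
    data = list(data)
    random.shuffle(data)
    folds = [data[i::k] for i in range(k)]          # k mutually exclusive subsets
    scores = []
    for i in range(k):
        test_set = folds[i]
        train_set = [row for j, fold in enumerate(folds) if j != i for row in fold]
        model = train(train_set)
        scores.append(evaluate(model, test_set))
    return sum(scores) / k                          # aggregated result
```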
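A worked sketch of the regression accuracy measures (MAD, MSE, MAPE); the actual and forecast values are made up:

```python
actuals   = [100, 120, 140, 160]
forecasts = [110, 115, 150, 150]

errors = [a - f for a, f in zip(actuals, forecasts)]   # forecast error = actual - forecast
n = len(errors)

mad  = sum(abs(e) for e in errors) / n                             # mean absolute deviation
mse  = sum(e ** 2 for e in errors) / n                             # mean squared error
mape = sum(abs(e / a) for e, a in zip(errors, actuals)) / n * 100  # mean absolute percent error

print(mad, mse, mape)
```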
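A compact k-means sketch of the steps listed above (k random initial centers, assign each point to the nearest center, recompute, repeat); the two-dimensional points and the value of k are placeholders:

```python
import random

def k_means(points, k, iterations=100):
    """points: list of (x, y) tuples; k: predetermined number of clusters."""
    centers = random.sample(points, k)              # start from k random points
    for _ in range(iterations):
        # assign each point to its nearest cluster center (squared Euclidean distance)
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: (p[0] - centers[i][0]) ** 2 +
                                                   (p[1] - centers[i][1]) ** 2)
            clusters[nearest].append(p)
        # recompute each center as the mean of its assigned points
        new_centers = [
            (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:                  # stop once the centers stabilize
            break
        centers = new_centers
    return centers, clusters
```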
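A worked example of support, confidence, and lift for a rule X -> Y; the transaction counts are hypothetical:

```python
n        = 1000     # total transactions (hypothetical)
count_x  = 200      # transactions containing X
count_y  = 300      # transactions containing Y
count_xy = 120      # transactions containing both X and Y

support    = count_xy / n                     # P(X and Y)         -> 0.12
confidence = (count_xy / n) / (count_x / n)   # P(Y | X)           -> 0.60
lift       = confidence / (count_y / n)       # P(Y | X) / P(Y)    -> 2.0

print(support, confidence, lift)
```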
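A minimal sketch of the Gini index and entropy-based information gain used to choose decision-tree splits; the class-count lists are placeholders:

```python
import math

def gini(class_counts):
    """Gini index of a node: 1 minus the sum of squared class proportions (0 = pure)."""
    total = sum(class_counts)
    return 1 - sum((c / total) ** 2 for c in class_counts)

def entropy(class_counts):
    """Entropy of a node, used by information gain to measure uncertainty."""
    total = sum(class_counts)
    return -sum((c / total) * math.log2(c / total) for c in class_counts if c)

# information gain of a split = parent entropy - weighted average child entropy
parent   = [6, 4]               # e.g. 6 positives, 4 negatives (hypothetical)
children = [[5, 1], [1, 3]]     # class counts after branching on some attribute
weighted  = sum(sum(c) / sum(parent) * entropy(c) for c in children)
info_gain = entropy(parent) - weighted
print(gini(parent), info_gain)
```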
- k-nearest neighbor (kNN): learns from historic cases; a simplistic and logical prediction method with competitive results. It is instance-based learning: learning happens at the time of prediction, not at modeling time. Cross-validation is used to determine the best value of k and the distance measure. The similarity measure is the Minkowski distance: q = 1 is Manhattan, q = 2 is Euclidean (sketch below).
- Sensitivity analysis for a neural network: Si = Vi / V(Ft) = V(E(Ft | X-i)) / V(Ft).
- Min-max normalization: v' = (v - minA) / (maxA - minA) * (new_maxA - new_minA) + new_minA, where v is the value, minA and maxA are the old minimum and maximum of the variable, and typically new_minA = 0 and new_maxA = 1 (sketch below).
- How to pick among models with different sensitivities: rescale each model's R^2 as R^2 of that model / sum of R^2 over all models, then take the sum-product of each variable's sensitivity score with the rescaled R^2 of each model (e.g. ANN SSvar1 * ANN rescaled R^2 + SVM SSvar1 * SVM rescaled R^2) (sketch below).
- Still to add: the calculation of how a neural network learns (a placeholder sketch follows below).
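A minimal k-nearest-neighbor sketch using the Minkowski distance (q = 1 Manhattan, q = 2 Euclidean); the training rows, query point, and choice of k are made up:

```python
from collections import Counter

def minkowski(a, b, q=2):
    """Minkowski distance: q = 1 gives Manhattan, q = 2 gives Euclidean."""
    return sum(abs(x - y) ** q for x, y in zip(a, b)) ** (1 / q)

def knn_predict(train, query, k=3, q=2):
    """train: list of (features, label) pairs; the prediction is the majority
    label among the k nearest rows, so learning happens at prediction time."""
    neighbors = sorted(train, key=lambda row: minkowski(row[0], query, q))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

# hypothetical two-feature training data
train = [((1, 1), "A"), ((1, 2), "A"), ((5, 5), "B"), ((6, 5), "B")]
print(knn_predict(train, (2, 1), k=3))   # -> "A"
```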
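A small sketch of the min-max normalization formula above, mapping values into the [0, 1] range (new_min = 0, new_max = 1); the sample values are made up:

```python
def min_max_normalize(v, min_a, max_a, new_min=0.0, new_max=1.0):
    """v' = (v - minA) / (maxA - minA) * (new_maxA - new_minA) + new_minA"""
    return (v - min_a) / (max_a - min_a) * (new_max - new_min) + new_min

values = [20, 35, 50, 80]                 # hypothetical raw values
lo, hi = min(values), max(values)
print([min_max_normalize(v, lo, hi) for v in values])   # [0.0, 0.25, 0.5, 1.0]
```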
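A sketch of combining variable sensitivity scores across models by rescaling each model's R^2 and taking the sum-product, as described above; the model names, R^2 values, and sensitivity scores are all hypothetical:

```python
# hypothetical per-model fit (R^2) and per-variable sensitivity scores
r2 = {"ANN": 0.90, "SVM": 0.80, "DT": 0.70}
sensitivity = {
    "ANN": {"var1": 0.50, "var2": 0.30, "var3": 0.20},
    "SVM": {"var1": 0.40, "var2": 0.40, "var3": 0.20},
    "DT":  {"var1": 0.60, "var2": 0.25, "var3": 0.15},
}

total_r2 = sum(r2.values())
rescaled = {model: value / total_r2 for model, value in r2.items()}  # R^2_model / sum of all R^2

# combined score per variable = sum-product of sensitivity and rescaled R^2 over models
combined = {
    var: sum(sensitivity[m][var] * rescaled[m] for m in r2)
    for var in sensitivity["ANN"]
}
print(combined)
```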
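The note above leaves the "how a NN learns" calculation to be filled in. As a placeholder, here is a minimal sketch of one common weight-update rule for a single neuron (the perceptron/delta-style update, w_new = w_old + learning_rate * error * input); this is an assumption about what that calculation would cover, not content from the original notes:

```python
def step(x):
    """Simple threshold transfer function for a single neuron."""
    return 1 if x >= 0 else 0

# hypothetical training data: a logical AND gate
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights = [0.0, 0.0]
bias = 0.0
learning_rate = 0.1

for _ in range(20):                       # repeat over the training set (epochs)
    for inputs, target in data:
        # forward pass: weighted sum of inputs plus bias, then transfer function
        output = step(sum(w * x for w, x in zip(weights, inputs)) + bias)
        error = target - output
        # update: adjust each weight in proportion to the error and its input
        weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
        bias += learning_rate * error

print(weights, bias)
```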