Predictive Modeling Techniques in Data Mining
CH6
Predictive Modeling Techniques
61) Predictive modeling is perhaps the most commonly practiced branch of data mining. What are three of the most popular predictive modeling techniques?
Answer:
- Artificial neural networks
- Support vector machines
- k-nearest neighbor
62) Why have neural networks shown much promise in many forecasting and business classification applications?
Answer: Because of their ability to "learn" from the data, their nonparametric nature (i.e., no rigid assumptions), and their ability to generalize.
Understanding Neural Networks
63) Each ANN is composed of a collection of neurons that are grouped into layers. One of these layers is the hidden layer. Define the hidden layer.
Answer: A hidden layer is a layer of neurons that takes input from the previous layer and converts those inputs into outputs for further processing.
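To make the definition concrete, here is a minimal sketch of what a single hidden layer computes. The sigmoid activation, the function names, and all numeric values are assumptions chosen for illustration; the source does not specify them.

```python
import math

def sigmoid(z):
    """Logistic activation: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def hidden_layer_forward(inputs, weights, biases):
    """Turn the previous layer's outputs into this layer's outputs.

    inputs  : list of floats coming from the previous layer
    weights : one list of weights per hidden neuron (same length as inputs)
    biases  : one bias value per hidden neuron
    """
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        # Each neuron forms a weighted sum of its inputs, then applies the activation.
        weighted_sum = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(sigmoid(weighted_sum))
    return outputs

# Two inputs feeding three hidden neurons (illustrative values only).
hidden_out = hidden_layer_forward(
    inputs=[0.5, -1.0],
    weights=[[0.1, 0.2], [0.0, 0.0], [-0.3, 0.4]],
    biases=[0.0, 0.0, 0.1],
)
```

The list hidden_out would then be passed on as the input to the next layer.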
64) How is a general Hopfield network represented architecturally?
Answer: Architecturally, a general Hopfield network is represented as a single large layer of neurons with total interconnectivity; that is, each neuron is connected to every other neuron within the network.
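The total interconnectivity can be sketched as a square weight matrix in which every neuron has a weight to every other neuron and none to itself. The Hebbian storage rule and the asynchronous sign update used below are common textbook choices and are assumptions here, not something the answer above prescribes.

```python
def train_hopfield(patterns):
    """Build a fully connected weight matrix from bipolar (+1/-1) patterns.

    Assumption: a simple Hebbian rule. The matrix is symmetric and has a
    zero diagonal, so no neuron is connected to itself.
    """
    n = len(patterns[0])
    weights = [[0.0] * n for _ in range(n)]
    for pattern in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    weights[i][j] += pattern[i] * pattern[j] / n
    return weights

def recall(weights, state, sweeps=5):
    """Update neurons one at a time; each looks at all other neurons."""
    state = list(state)
    n = len(state)
    for _ in range(sweeps):
        for i in range(n):
            activation = sum(weights[i][j] * state[j] for j in range(n))
            state[i] = 1 if activation >= 0 else -1
    return state

stored = [1, -1, 1, -1, 1, -1]
W = train_hopfield([stored])
noisy = [1, -1, 1, -1, 1, 1]   # last element flipped
recovered = recall(W, noisy)    # settles back to the stored pattern
```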
Developing an ANN Application
65) Describe the nine steps in the development process for an ANN application.
Answer:
- Collect, organize, and format the data.
- Separate data into training, validation, and testing sets.
- Decide on a network architecture and structure.
- Select a learning algorithm.
- Set network parameters and initialize their values.
- Initialize weights and start training (and validation).
- Stop training and freeze the network weights.
- Test the trained network.
- Deploy the network for use on unknown new cases.
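As an illustration of the second step, the following sketch shuffles a data set and separates it into training, validation, and testing sets. The 60/20/20 proportions, the fixed seed, and the function name split_dataset are assumptions made for the example; the source does not state any particular ratio.

```python
import random

def split_dataset(records, train_frac=0.6, valid_frac=0.2, seed=42):
    """Shuffle records and split them into three non-overlapping sets.

    Whatever is left after the training and validation portions
    becomes the testing set.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)   # seeded for repeatable splits
    n_train = round(len(shuffled) * train_frac)
    n_valid = round(len(shuffled) * valid_frac)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]
    return train, valid, test

# 100 placeholder records standing in for real, formatted data.
train_set, valid_set, test_set = split_dataset(range(100))
```

The training set drives the weight updates, the validation set is typically used to decide when to stop training, and the testing set is held back for the final evaluation.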
Backpropagation Learning Algorithm
66) What are the five steps in the backpropagation learning algorithm?
Answer:
- Initialize weights with random values and set other parameters.
- Read in the input vector and the desired output.
- Compute the actual output via the calculations, working forward through the layers.
- Compute the error.
- Change the weights by working backward from the output layer through the hidden layers.
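The five steps above can be sketched for a very small network with two inputs, two hidden neurons, and one output. The sigmoid activation, squared-error measure, learning rate, absence of bias terms, and the single training example are all simplifying assumptions; comments number the steps in the order listed above.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_step(x, target, w_hidden, w_out, lr=0.5):
    """Run one forward pass and one backward pass; return the squared error."""
    # Step 3: compute the actual output, working forward through the layers.
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x))) for ws in w_hidden]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)))

    # Step 4: compute the error.
    error = target - output

    # Step 5: change the weights, working backward from the output layer.
    delta_out = error * output * (1.0 - output)
    delta_hidden = [delta_out * w * h * (1.0 - h) for w, h in zip(w_out, hidden)]
    for j, h in enumerate(hidden):
        w_out[j] += lr * delta_out * h
    for j, ws in enumerate(w_hidden):
        for i, xi in enumerate(x):
            ws[i] += lr * delta_hidden[j] * xi
    return 0.5 * error ** 2

# Step 1: initialize weights with random values and set other parameters.
rng = random.Random(0)
w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(2)]
w_out = [rng.uniform(-0.5, 0.5) for _ in range(2)]

# Step 2: read in the input vector and the desired output.
x, target = [1.0, 0.0], 1.0

# Repeat steps 3 to 5; the error should shrink as the weights adapt.
losses = [train_step(x, target, w_hidden, w_out) for _ in range(200)]
```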
67) Define the term sensitivity analysis as it relates to ANNs.
Answer: Sensitivity analysis is a method for extracting the cause-and-effect relationships among the inputs and the outputs of a trained neural network model.
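One simple way to approximate this idea is to nudge each input in turn and record how much the model's output moves. The sketch below takes that perturbation approach; the function names, the step size, and toy_model (a stand-in for a trained network) are hypothetical and used only for illustration.

```python
def sensitivity(model, baseline, delta=0.01):
    """Estimate how strongly the output reacts to each input.

    model    : any callable that maps a list of inputs to one number
    baseline : the input values around which to perturb
    delta    : size of the perturbation applied to one input at a time
    """
    base_output = model(baseline)
    scores = []
    for i in range(len(baseline)):
        perturbed = list(baseline)
        perturbed[i] += delta
        scores.append(abs(model(perturbed) - base_output) / delta)
    return scores

def toy_model(inputs):
    # Hypothetical stand-in for a trained network: depends strongly on
    # inputs[0], weakly on inputs[1], and not at all on inputs[2].
    return 3.0 * inputs[0] + 0.5 * inputs[1]

scores = sensitivity(toy_model, [1.0, 1.0, 1.0])
```

Ranking the inputs by their scores gives a rough picture of which inputs matter most to the output.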
Support Vector Machines (SVMs)
68) In 1992, Boser, Guyon, and Vapnik suggested a way to create nonlinear classifiers by applying the kernel trick to maximum-margin hyperplanes. How does the resulting algorithm differ from the original optimal hyperplane algorithm proposed by Vladimir Vapnik in 1963?
Answer: The resulting algorithm is formally similar, except that every dot product is replaced by a nonlinear kernel function. This allows the algorithm to fit the maximum-margin hyperplane in the transformed feature space. The transformation may be nonlinear and the transformed space high dimensional; thus, although the classifier is a hyperplane in the high-dimensional feature space, it may be nonlinear in the original input space.
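A small numeric check can show why replacing the dot product with a kernel is equivalent to working in a transformed space. The degree-2 polynomial kernel and its explicit feature map below are a standard textbook pairing, chosen here as an assumption for illustration; the answer above does not name a specific kernel.

```python
import math

def dot(a, b):
    """Ordinary dot product, as used by the original linear algorithm."""
    return sum(ai * bi for ai, bi in zip(a, b))

def poly_kernel(a, b):
    """Degree-2 polynomial kernel, evaluated in the original input space."""
    return dot(a, b) ** 2

def feature_map(v):
    """Explicit transformation of a 2-D point into a 3-D feature space."""
    x1, x2 = v
    return [x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2]

a, b = [1.0, 2.0], [3.0, 0.5]

# Kernel value, computed without ever building the transformed vectors.
kernel_value = poly_kernel(a, b)

# Same quantity, computed as a plain dot product in the transformed space.
explicit_value = dot(feature_map(a), feature_map(b))
```

Both values agree up to floating-point rounding, which is why the algorithm can fit a hyperplane in the transformed space while only ever evaluating the kernel on the original inputs.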
69) What are the three steps in the process-based approach to the use of support vector machines (SVMs)?
Answer:
- Numericizing the data
- Normalizing the data
- Selecting the kernel type and kernel parameters
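The first two steps can be sketched with plain Python. One-hot encoding for numericizing and min-max scaling for normalizing are common choices but are assumptions here, as are the function names and the sample values; the third step (choosing a kernel and its parameters) depends on the SVM library in use and is not shown.

```python
def one_hot(values):
    """Numericize a categorical column: one 0/1 column per category."""
    categories = sorted(set(values))
    return [[1.0 if v == c else 0.0 for c in categories] for v in values]

def min_max_normalize(values):
    """Normalize a numeric column to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Illustrative raw data: one categorical and one numeric attribute.
colors = ["red", "green", "red", "blue"]
incomes = [30000.0, 60000.0, 45000.0, 90000.0]

encoded_colors = one_hot(colors)             # columns ordered: blue, green, red
scaled_incomes = min_max_normalize(incomes)

# Combine both into purely numeric, comparably scaled rows for an SVM.
rows = [enc + [inc] for enc, inc in zip(encoded_colors, scaled_incomes)]
```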
k-Nearest Neighbor Algorithm
70) Describe the k-nearest neighbor (k-NN) data mining algorithm.
Answer: k-NN is a prediction method for both classification-type and regression-type problems. It is a type of instance-based learning (or lazy learning), in which the function is only approximated locally and all computation is deferred until the actual prediction.
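A minimal sketch of both uses follows. Euclidean distance, majority voting for classification, and a plain average for regression are common defaults and are assumptions here, as are the function names and the sample data. Note that there is no training step: the stored examples are only consulted when a prediction is requested.

```python
import math
from collections import Counter

def nearest(training, query, k):
    """Return the k stored examples closest to the query point."""
    return sorted(training, key=lambda item: math.dist(item[0], query))[:k]

def knn_classify(training, query, k=3):
    """Predict the most common label among the k nearest neighbors."""
    labels = [label for _, label in nearest(training, query, k)]
    return Counter(labels).most_common(1)[0][0]

def knn_regress(training, query, k=3):
    """Predict the average target value of the k nearest neighbors."""
    targets = [target for _, target in nearest(training, query, k)]
    return sum(targets) / len(targets)

# Classification: (features, label) pairs forming two clusters.
labeled = [
    ((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
    ((5.0, 5.0), "B"), ((5.2, 4.9), "B"), ((4.8, 5.1), "B"),
]
predicted_label = knn_classify(labeled, (1.1, 1.0))

# Regression: (features, numeric target) pairs.
measured = [((1.0,), 10.0), ((2.0,), 20.0), ((3.0,), 30.0), ((10.0,), 100.0)]
predicted_value = knn_regress(measured, (2.1,))
```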