Essential Concepts in Statistical Modeling and Optimization Methods
Probability Distributions for Discrete Events
The following table matches common scenarios to their appropriate probability distributions:
| Scenario Description | Distribution Type |
|---|---|
| Number of people clicking an online banner ad each hour | Poisson |
| Number of arrivals to a flu-shot clinic each minute | Poisson |
| Number of hits to a real estate website each minute | Poisson |
| Number of arrivals to the ID-check queue at an airport each minute | Poisson |
| Number of people entering a grocery store each minute | Poisson |
| Number of penalty kicks taken until one is saved | Geometric |
| Number of faces correctly identified by Deep Learning (DL) software until an error occurs | Geometric |
| Of the first 100 people viewing a house listing, the number who tour it | Binomial |
| Number of days in a year with temperature 3+ degrees above forecast | Binomial |
| Time between arrivals to a flu-shot clinic | Exponential |
| Time between hits on a real estate website | Exponential |
| Time between people entering a grocery store | Exponential |
| Time from the start of a World Cup soccer match until a goal is scored | Weibull |
| Time from when a house is on the market until the first offer | Weibull |
| Time from the beginning of Fall until the first snowflake is seen | Weibull |
| Time from when a generator is turned on until it fails | Weibull |
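
The Poisson and exponential rows in the table describe the same arrival process from two angles: counts per interval versus time between arrivals. The following is a minimal sketch of that link, using only Python's standard library. The rate of 4 arrivals per minute and the function name `count_arrivals_per_interval` are illustrative assumptions, not values from the scenarios above.

```python
import random

def count_arrivals_per_interval(rate, n_intervals, rng):
    """Generate arrivals with exponential gaps and count them per unit-length interval."""
    counts = [0] * n_intervals
    t = rng.expovariate(rate)  # time of the first arrival
    while t < n_intervals:
        counts[int(t)] += 1         # the arrival falls in interval [int(t), int(t) + 1)
        t += rng.expovariate(rate)  # exponential time until the next arrival
    return counts

rng = random.Random(0)
rate = 4.0  # assumed average of 4 arrivals per minute
counts = count_arrivals_per_interval(rate, 10_000, rng)

mean_count = sum(counts) / len(counts)
var_count = sum((c - mean_count) ** 2 for c in counts) / len(counts)

# For Poisson counts, the mean and the variance should both be close to the rate.
print(f"mean arrivals per minute: {mean_count:.2f}")
print(f"variance of arrivals per minute: {var_count:.2f}")
```

If the gaps between arrivals are exponential with a constant rate, the per-interval counts behave like a Poisson variable, which is why the clinic, website, and grocery store scenarios each appear twice in the table.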
Strategies for Handling Missing Data and Imputation
Five models proposed for handling missing school rating data:
- Model 1: Imputation using the average school rating derived from the rest of the dataset.
- Model 2: Imputation using a regression model based on other available variables.
- Model 3: Two-step approach: First, classify whether the school was built due to population growth, then select the appropriate regression model based on that classification.
- Model 4: Use a binary variable to explicitly identify locations where information is missing.
- Model 5: Use a categorical variable approach with three categories: "data available," "missing, population growth," and "missing, other reason."
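
As a rough illustration of Model 1 and Model 4 from the list above, the sketch below uses a small made-up list of ratings. The data, the variable names, and the placeholder value 0.0 are all assumptions for the example.

```python
from statistics import mean

# Hypothetical school ratings; None marks a missing value.
ratings = [7.0, None, 5.0, 9.0, None, 6.0]

observed = [r for r in ratings if r is not None]
average_rating = mean(observed)

# Model 1: replace each missing rating with the average of the observed ratings.
imputed_mean = [average_rating if r is None else r for r in ratings]

# Model 4: add a binary indicator that flags where the rating is missing.
# The rating column still needs some placeholder; 0.0 is an assumed choice here.
is_missing = [1 if r is None else 0 for r in ratings]
rating_filled = [0.0 if r is None else r for r in ratings]

print(imputed_mean)
print(list(zip(rating_filled, is_missing)))
```

Model 5 would extend the indicator to three categories in the same way, which requires knowing the reason each value is missing.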
Model Feasibility Notes
- Model 3: Ratings can be used / Reasons can be inferred.
- Model 2: Can be used / Cannot be used (depending on context).
- Model 5: Cannot be used / Can be used (depending on context).
- Model 4: Cannot be used / Cannot be used (depending on context).
Formulating Optimization Constraints with Binary Variables
Let y represent binary variables (1 if eaten, 0 if not) and x represent continuous amounts eaten. M is a large constant.
Mutual Exclusivity Constraints
Out of peanut butter and cheese sauce, exactly one must be eaten:
y_peanutbutter + y_cheesesauce = 1
OR
y_peanutbutter = 1 - y_cheesesauce
Neither peanut butter nor cheese sauce can be eaten:
y_peanutbutter + y_cheesesauce = 0
Either peanut butter or cheese sauce, but not both, must be eaten (Exclusive OR):
y_peanutbutter = 1 - y_cheesesauce
Conditional Constraints (Broccoli)
Either cheese sauce or peanut butter (or both) must be eaten with broccoli:
y_broccoli ≤ y_cheesesauce + y_peanutbutter
(Note: This constraint means if broccoli is eaten (y_broccoli = 1), then the sum of the others must be at least 1.)
Broccoli can only be eaten if either cheese sauce or peanut butter (or both) is also eaten:
y_broccoli ≤ y_cheesesauce + y_peanutbutter
If cheese sauce and peanut butter are not eaten, then broccoli can't be eaten:
y_broccoli ≤ y_cheesesauce + y_peanutbutter
Limiting Total Items
No more than two of broccoli, cheese sauce, and peanut butter may be eaten:
y_broccoli + y_cheesesauce + y_peanutbutter ≤ 2
Broccoli, cheese sauce, and peanut butter all cannot be eaten together:
y_broccoli + y_cheesesauce + y_peanutbutter ≤ 2
Linking Continuous and Binary Variables (Big M Formulation)
No amount of cheese sauce may be eaten unless its binary variable is 1 (If any amount of cheese sauce is eaten, then its binary variable must be 1):
x_cheesesauce ≤ M · y_cheesesauce
Cheese sauce must be eaten:
y_cheesesauce = 1
Unless peanut butter is eaten, no amount of broccoli can be eaten:
x_broccoli ≤ M · y_peanutbutter
If any amount of broccoli is eaten, then peanut butter must also be eaten:
x_broccoli ≤ M · y_peanutbutter
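
Because each binary variable takes only the values 0 and 1, formulations like these can be checked by enumerating every combination. The sketch below compares the logical statement with its algebraic constraint. The short names `y_b`, `y_c`, `y_p`, the helper functions, the value `M = 1000`, and the assumption that amounts are non-negative are all illustrative.

```python
from itertools import product

def implies(p, q):
    """Logical implication: p -> q."""
    return (not p) or q

# "Broccoli only if cheese sauce or peanut butter (or both)"  <=>  y_b <= y_c + y_p
broccoli_ok = all(
    implies(y_b == 1, y_c == 1 or y_p == 1) == (y_b <= y_c + y_p)
    for y_b, y_c, y_p in product((0, 1), repeat=3)
)

# "Exactly one of peanut butter and cheese sauce"  <=>  y_p + y_c == 1
xor_ok = all(
    ((y_p == 1) != (y_c == 1)) == (y_p + y_c == 1)
    for y_p, y_c in product((0, 1), repeat=2)
)

def big_m_feasible(x, y, big_m):
    """True if a non-negative amount x satisfies x <= M * y for binary y."""
    return 0 <= x <= big_m * y

M = 1000.0  # assumed to exceed any amount that could realistically be eaten

print(broccoli_ok, xor_ok)
print(big_m_feasible(2.5, 1, M))  # amount allowed when the binary is 1
print(big_m_feasible(2.5, 0, M))  # amount forced to zero when the binary is 0
```

The Big M constraint only works as intended if M is at least as large as the largest feasible amount; otherwise it also limits the amount when the binary variable is 1.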
Discrete-Event Simulation and Replication
Stochastic Discrete-Event Simulation
When a company creates a stochastic discrete-event simulation, many replications are needed because of the inherent variability and randomness in the system being modeled.
Interpreting Simulation Run Results
- A simulation could stop after 300 or 400 events, but it could not stop after only 5 events (implying a minimum run length requirement).
- It is not true that the simulated wait time was 50 or less in just one of the runs (implying consistency).
- The expected wait time of simulated runs (replications) is likely to be between 65 and 75 (Confidence Interval interpretation).
- The expected wait time of simulated runs (replications) is not likely to be between 75 and 85.
- There is significant variability in the simulated wait time across the runs (replications); it is not true that there is very little variability.
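
The role of replications can be sketched with a small single-server queue simulation. Everything specific here is an assumption for illustration: the arrival and service rates, the 30 replications, the run length, and the use of a normal-approximation 95% interval (a t-based interval would be slightly wider).

```python
import random
from statistics import mean, stdev

def simulate_mean_wait(arrival_rate, service_rate, n_customers, rng):
    """One replication of a single-server FIFO queue; returns the average wait in queue."""
    wait = 0.0
    total_wait = 0.0
    for _ in range(n_customers):
        total_wait += wait
        service = rng.expovariate(service_rate)  # this customer's service time
        gap = rng.expovariate(arrival_rate)      # time until the next arrival
        # Lindley recursion: the next customer's wait depends on this one's wait and service.
        wait = max(0.0, wait + service - gap)
    return total_wait / n_customers

rng = random.Random(42)
replications = [simulate_mean_wait(0.8, 1.0, 2_000, rng) for _ in range(30)]

avg = mean(replications)
half_width = 1.96 * stdev(replications) / len(replications) ** 0.5

print(f"replication results range from {min(replications):.2f} to {max(replications):.2f}")
print(f"approximate 95% interval for the expected wait: {avg - half_width:.2f} to {avg + half_width:.2f}")
```

Individual replications can differ substantially from one another, while the interval for the expected wait is much narrower; that distinction is what the statements above about variability and likely ranges rely on.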
Simulation Validation
If the simulated wait time is 50% higher than observed reality, one must investigate to see what is wrong with the simulation, as it indicates a poor match to reality.
Classification of Optimization Problem Types
Optimization problems are classified based on the structure of their objective function and constraints:
Linear Programming (LP)
Objective:
$\sum_i c_i x_i$
Constraints:
$\sum_i a_{ij} x_i \ge b_j$
Convex Quadratic Programming (CQP)
Objective:
$\sum_i c_i x_i^2$
Constraints:
$\sum_i a_{ij} x_i \ge b_j$
Convex Programming (CP)
Objective:
$\sum_i c_i |x_i - 6|$
Constraints:
$\sum_i a_{ij} x_i \ge b_j$
Integer Programming (IP)
Objective:
$\sum_i c_i x_i$
Constraints:
$\sum_i a_{ij} x_i \ge b_j$, where $x_i \in \{0, 1\}$ (Binary/Integer variables)
General Non-Convex Programming (GNCP)
Objective:
$c_i \sin x_i$
Constraints: (Linear or non-linear)
General Non-Convex Programming (GNCP)
Objective:
$\sum_i c_i x_i$
Constraints:
$\sum_i \sum_k a_{ikj} x_i x_k \le b_j$ (Non-linear, non-convex constraints)
Linear Programming (LP)
Objective:
$(\log c)\, x_i$ (Assuming log c is a constant coefficient)
Constraints:
$\sum_i a_{ij} x_i \ge b_j$
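
One informal way to see why the absolute-value and squared objectives count as convex while the sine objective does not is a midpoint test on sample points. The sketch below is a numerical spot check under assumed sample points, not a proof: finding a violation shows non-convexity, but finding none only fails to disprove convexity. The function names are illustrative.

```python
import math

def violates_midpoint_convexity(f, points, tol=1e-12):
    """Return True if some pair (a, b) has f((a + b) / 2) > (f(a) + f(b)) / 2."""
    for a in points:
        for b in points:
            mid = (a + b) / 2
            if f(mid) > (f(a) + f(b)) / 2 + tol:
                return True
    return False

def abs_shift(x):
    return abs(x - 6)

def square(x):
    return x ** 2

grid = [i * 0.5 for i in range(-20, 21)]  # sample points in [-10, 10]

print("|x - 6| violates:", violates_midpoint_convexity(abs_shift, grid))
print("x^2 violates:    ", violates_midpoint_convexity(square, grid))
print("sin x violates:  ", violates_midpoint_convexity(math.sin, grid))
```

Note that the sums in the convex examples stay convex only when the coefficients on the convex terms are non-negative in a minimization problem.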
Queuing Theory and Markov Chain Properties
- To check system stability (utilization): Multiply the service rate of one server (μ) by the number of service lines (c) and check whether this total capacity is greater than the arrival rate (λ). If cμ > λ (equivalently, utilization ρ = λ/(cμ) < 1), the system is stable.
- If a process is not memoryless, the standard Markov chain model would not be well-defined or appropriate for modeling the system state transitions.
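
A minimal sketch of the stability check, assuming λ and μ are both expressed as rates (customers per unit time), μ is the rate of a single server, and all c servers are identical. Under those assumptions the usual condition is λ < cμ. The function names and the example numbers are hypothetical.

```python
def utilization(arrival_rate, service_rate, servers):
    """Fraction of total service capacity used: rho = lambda / (c * mu)."""
    return arrival_rate / (servers * service_rate)

def is_stable(arrival_rate, service_rate, servers):
    """A queue is stable when arrivals come in more slowly than the servers can work."""
    return utilization(arrival_rate, service_rate, servers) < 1.0

# Assumed example: 8 arrivals per hour, each of 3 servers handles 3 customers per hour.
rho = utilization(8.0, 3.0, 3)
print(f"utilization: {rho:.3f}")
print("stable:", is_stable(8.0, 3.0, 3))
print("stable at 10 arrivals per hour:", is_stable(10.0, 3.0, 3))
```

If the inputs are given as mean times rather than rates (for example, an average service time in minutes), they need to be converted to rates first by taking reciprocals.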
Decision Making and Statistical Measures
Exploration vs. Exploitation
- Use more exploration if observed rates are similar.
- Use exploitation if observed rates are very different (choose the lowest or highest rate depending on the context of the problem).
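
One common way to mix the two behaviours is an epsilon-greedy rule: with probability epsilon pick an option at random (exploration), otherwise pick the option with the best observed rate (exploitation). The sketch below assumes higher rates are better and at least two options. The `suggested_epsilon` heuristic is a hypothetical illustration of "explore more when rates are similar", not a standard formula.

```python
import random

def choose_option(observed_rates, epsilon, rng):
    """Epsilon-greedy: explore with probability epsilon, otherwise exploit the best rate."""
    if rng.random() < epsilon:
        return rng.randrange(len(observed_rates))
    return max(range(len(observed_rates)), key=lambda i: observed_rates[i])

def suggested_epsilon(observed_rates, max_epsilon=0.5):
    """Hypothetical heuristic: the closer the top two rates, the more exploration."""
    top, second = sorted(observed_rates, reverse=True)[:2]
    if top == 0:
        return max_epsilon
    relative_gap = (top - second) / top
    return max_epsilon * (1.0 - relative_gap)

rng = random.Random(7)
similar = [0.10, 0.11]
different = [0.02, 0.30]

print(f"epsilon for similar rates:   {suggested_epsilon(similar):.3f}")
print(f"epsilon for different rates: {suggested_epsilon(different):.3f}")
print("pure exploitation picks option", choose_option([0.1, 0.5, 0.2], 0.0, rng))
```

When lower rates are better (for example, error rates), the `max` in `choose_option` would become `min`.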
Choosing Appropriate Measures
- Binomial-based data: Use the highest rate or fraction.
- Parametric data: Use the average or mean.
- Non-parametric data: Use the median.
Regression Regularization Techniques
Regularization Constraints and Objective Functions
These techniques minimize the sum of squared errors (SSE) subject to constraints on the coefficients (aj):
Standard Linear Regression (No Regularization)
Minimize:
$\sum_{i=1}^{n} \left( y_i - \left( a_0 + \sum_{j=1}^{m} a_j x_{ij} \right) \right)^2$
Lasso Regression (L1 Penalty)
Constraint:
$\sum_j |a_j| \le T$ (T is the tuning parameter)
Ridge Regression (L2 Penalty)
Constraint:
$\sum_j (a_j)^2 \le T$
Elastic Net
Penalty Term:
$\lambda \sum_j |a_j| + (1 - \lambda) \sum_j (a_j)^2$ (Combines L1 and L2 penalties)
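
The quantities in these formulas are simple to compute directly. The sketch below evaluates the sum of squared errors and the three penalty terms for a small made-up example; the data, coefficients, and function names are assumptions for illustration.

```python
def sse(y, X, a0, a):
    """Sum of squared errors for intercept a0 and coefficients a."""
    return sum(
        (yi - (a0 + sum(aj * xij for aj, xij in zip(a, row)))) ** 2
        for yi, row in zip(y, X)
    )

def l1_norm(a):
    """Lasso term: sum of absolute coefficient values."""
    return sum(abs(aj) for aj in a)

def l2_squared(a):
    """Ridge term: sum of squared coefficient values."""
    return sum(aj ** 2 for aj in a)

def elastic_net_penalty(a, lam):
    """Elastic net term: lam * L1 + (1 - lam) * L2 squared."""
    return lam * l1_norm(a) + (1 - lam) * l2_squared(a)

# Hypothetical data: two observations, three predictors.
X = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]
y = [5.0, -3.0]
a0 = 1.0
a = [3.0, -4.0, 0.0]

print("SSE:", sse(y, X, a0, a))
print("L1 norm:", l1_norm(a))
print("L2 squared:", l2_squared(a))
print("Elastic net penalty (lam = 0.5):", elastic_net_penalty(a, 0.5))
```

In the constrained form, a coefficient vector is feasible for Lasso when `l1_norm(a) <= T` and for Ridge when `l2_squared(a) <= T`.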
Variable Selection Properties
- Lasso: Selects the fewest variables (performs feature selection by driving coefficients to zero).
- Elastic Net (EN): Selects a medium number of variables.
- Linear Regression (LR) / Ridge Regression (RR): Selects the most variables (all variables are retained, though coefficients may be small).
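
The difference in variable selection is easiest to see in a special case with closed-form solutions. The sketch assumes an orthonormal design matrix and the penalized forms ½·SSE + λ·Σ|a_j| for Lasso and ½·SSE + (λ/2)·Σa_j² for Ridge; under those assumptions Lasso soft-thresholds the least-squares coefficients and Ridge rescales them. The coefficient values and λ = 1 are made up.

```python
def lasso_orthonormal(ols_coefs, lam):
    """Soft-thresholding: shrink each coefficient toward zero by lam, clipping at zero."""
    result = []
    for b in ols_coefs:
        shrunk = max(abs(b) - lam, 0.0)
        if shrunk == 0.0:
            result.append(0.0)
        else:
            result.append(shrunk if b > 0 else -shrunk)
    return result

def ridge_orthonormal(ols_coefs, lam):
    """Proportional shrinkage: every coefficient is scaled by 1 / (1 + lam)."""
    return [b / (1.0 + lam) for b in ols_coefs]

ols = [2.5, -0.4, 0.9, -3.0]  # assumed least-squares coefficients
lam = 1.0

lasso = lasso_orthonormal(ols, lam)
ridge = ridge_orthonormal(ols, lam)

print("lasso:", lasso, "non-zero:", sum(c != 0.0 for c in lasso))
print("ridge:", ridge, "non-zero:", sum(c != 0.0 for c in ridge))
```

Small coefficients are set exactly to zero under Lasso but only scaled down under Ridge, which matches the ordering in the list above. Elastic Net combines both effects and usually lands in between.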
Model Complexity and Overfitting
- Should we seek a simpler model? YES. There isn't enough data to avoid overfitting a model with many factors, and a simpler model also improves interpretability and generalization.
- Should we seek a more complex model? NO (unless the complexity is justified by data volume and performance gains).
Optimization vs. Regression Perspective
- Optimization Perspective: Coefficients (a) are variables; inputs (x) are constants.
- Regression Perspective: Inputs (x) are variables; coefficients (a) are constants (parameters to be estimated).
Data Science Modeling Workflow
A typical workflow for developing and validating predictive models:
- Remove outliers from the dataset.
- Impute missing data values and scale the data appropriately.
- Fit a Lasso regression model on all available variables (for feature selection).
- Fit alternative models (e.g., linear regression, regression tree, and random forest) using only the variables chosen by the Lasso regression model.
- Pick the best model to use based on performance metrics evaluated on a dedicated validation dataset.
- Test the final chosen model on a separate, unseen test dataset to estimate its true generalization quality.
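
The last three steps of this workflow (fit alternatives, select on validation data, estimate quality on test data) can be sketched with the standard library alone. The synthetic data, the 60/20/20 split, and the two candidate models (a constant predictor and a one-variable least-squares line) are assumptions chosen to keep the example short; outlier removal, imputation, scaling, and the Lasso step are not shown.

```python
import random
from statistics import mean

def mse(model, rows):
    """Mean squared error of a model (a function of x) on (x, y) rows."""
    return mean((y - model(x)) ** 2 for x, y in rows)

def fit_constant(train):
    """Candidate 1: always predict the training mean."""
    avg = mean(y for _, y in train)
    return lambda x: avg

def fit_line(train):
    """Candidate 2: simple least-squares line y = intercept + slope * x."""
    x_bar = mean(x for x, _ in train)
    y_bar = mean(y for _, y in train)
    sxx = sum((x - x_bar) ** 2 for x, _ in train)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in train)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x

# Synthetic data (assumed): y = 2x + 1 plus Gaussian noise.
rng = random.Random(1)
data = []
for _ in range(300):
    x = rng.uniform(0, 10)
    data.append((x, 2.0 * x + 1.0 + rng.gauss(0, 1)))
rng.shuffle(data)

train, validation, test = data[:180], data[180:240], data[240:]

candidates = {"constant": fit_constant(train), "line": fit_line(train)}
validation_mse = {name: mse(model, validation) for name, model in candidates.items()}

best_name = min(validation_mse, key=validation_mse.get)  # choose on validation data only
test_mse = mse(candidates[best_name], test)              # report quality on unseen test data

print("validation MSE:", {k: round(v, 2) for k, v in validation_mse.items()})
print("chosen model:", best_name, "test MSE:", round(test_mse, 2))
```

The test set is touched exactly once, after the choice is made, so the reported error is not biased by the selection step.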
Advanced Analytical Methods for Decision Making
| Problem Description | Appropriate Analytical Method |
|---|---|
| Find the best airline schedule with uncertain delays | Stochastic Optimization |
| Find the best portfolio with uncertain investment returns | Stochastic Optimization |
| Determine the best route for delivery given uncertainties in traffic | Stochastic Optimization |
| Decide how many products to manufacture with uncertain demand | Stochastic Optimization |
| Estimate the required number of workers for a call center | Queuing Theory |
| Determine how many checkout lanes are needed in a supermarket or tables in a restaurant | Queuing Theory |
| Compare the median age of MSA students across campus and online programs | Non-parametric Test |
| Determine if the median home price is lower in one city versus another | Non-parametric Test |
| Identify which month has a higher median temperature | Non-parametric Test |
| Identify which sets of electives or recipes share common elements | Louvain Algorithm (Community Detection) |
| Find groups of electives that are often taken by the same students | Louvain Algorithm (Community Detection) |
| Find sets of terrorists (network community detection) | Louvain Algorithm (Community Detection) |
| Determine how much to bid in competitive situations | Game-Theoretic Analysis |
| Determine the best marketing strategy, given competitor reaction | Game-Theoretic Analysis |