Essential Concepts in Statistical Modeling and Optimization Methods

Written on November 9, 2025 in English with a size of 13.69 KB

Probability Distributions for Discrete Events

The following table matches common scenarios to their appropriate probability distributions:

Scenario Description	Distribution Type
Number of people clicking an online banner ad each hour	Poisson
Number of arrivals to a flu-shot clinic each minute	Poisson
Number of hits to a real estate website each minute	Poisson
Number of arrivals to the ID-check queue at an airport each minute	Poisson
Number of people entering a grocery store each minute	Poisson
Number of penalty kicks taken until one is saved	Geometric
Number of faces correctly identified by Deep Learning (DL) software until an error occurs	Geometric
Of the first 100 people viewing a house listing, the number who tour it	Binomial
Number of days in a year with temperature 3+ degrees above forecast	Binomial
Time between arrivals to a flu-shot clinic	Exponential
Time between hits on a real estate website	Exponential
Time between people entering a grocery store	Exponential
Time from the start of a World Cup soccer match until a goal is scored	Weibull
Time from when a house is on the market until the first offer	Weibull
Time from the beginning of Fall until the first snowflake is seen	Weibull
Time from when a generator is turned on until it fails	Weibull
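
The table pairs Poisson counts with exponential gaps for the same arrival processes (clinic, website, grocery store). A minimal Python sketch of that link follows: it draws exponential inter-arrival times and counts arrivals per minute, whose mean and variance should both come out near the rate. The rate of 3 arrivals per minute, the seed, and the function name are illustrative assumptions, not values from the scenarios above.

```python
import random

def arrivals_per_minute(rate, minutes, seed=0):
    """Count arrivals in each one-minute window when gaps are exponential."""
    rng = random.Random(seed)
    counts = [0] * minutes
    t = rng.expovariate(rate)          # time of the first arrival
    while t < minutes:
        counts[int(t)] += 1            # bucket the arrival into its minute
        t += rng.expovariate(rate)     # add the next exponential gap
    return counts

# Assumed rate: 3 arrivals per minute, observed over 10,000 minutes.
counts = arrivals_per_minute(rate=3.0, minutes=10_000)
mean = sum(counts) / len(counts)
var = sum((c - mean) ** 2 for c in counts) / len(counts)
print(f"mean count per minute: {mean:.2f}, variance: {var:.2f}")
```

A Poisson count has equal mean and variance, so seeing both close to the assumed rate is a quick sanity check that the per-minute counts behave like a Poisson variable.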

Strategies for Handling Missing Data and Imputation

Five models proposed for handling missing school rating data:

Model 1: Imputation using the average school rating derived from the rest of the dataset.
Model 2: Imputation using a regression model based on other available variables.
Model 3: Two-step approach: First, classify whether the school was built due to population growth, then select the appropriate regression model based on that classification.
Model 4: Use a binary variable to explicitly identify locations where information is missing.
Model 5: Use a categorical variable approach with three categories: "data available," "missing, population growth," and "missing, other reason."
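
As a rough illustration of two of these ideas, the sketch below fills missing ratings with the mean of the observed ones (Model 1) and also records a binary missing-data indicator (the variable used in Model 4). The ratings list and the function name are hypothetical, and combining the two in one helper is a convenience for the example, not something the models above prescribe.

```python
from statistics import mean

def impute_mean_with_flag(ratings):
    """Replace None with the mean of observed values and flag each imputed row."""
    observed = [r for r in ratings if r is not None]
    fill = mean(observed)                                   # Model 1: average rating
    imputed = [fill if r is None else r for r in ratings]
    missing_flag = [1 if r is None else 0 for r in ratings]  # Model 4: indicator
    return imputed, missing_flag

ratings = [7.0, None, 9.0, 8.0, None]   # hypothetical school ratings
imputed, missing_flag = impute_mean_with_flag(ratings)
print(imputed)        # [7.0, 8.0, 9.0, 8.0, 8.0]
print(missing_flag)   # [0, 1, 0, 0, 1]
```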

Model Feasibility Notes

Model 3: Ratings can be used / Reasons can be inferred.
Model 2: Can be used / Cannot be used (depending on context).
Model 5: Cannot be used / Can be used (depending on context).
Model 4: Cannot be used / Cannot be used (depending on context).

Formulating Optimization Constraints with Binary Variables

Let y represent binary variables (1 if eaten, 0 if not) and x represent continuous amounts eaten. M is a large constant, chosen to be at least as large as any amount x can take.

Mutual Exclusivity Constraints
Out of peanut butter and cheese sauce, exactly one must be eaten:
y_peanutbutter + y_cheesesauce = 1
OR
y_peanutbutter = 1 - y_cheesesauce
Neither peanut butter nor cheese sauce can be eaten:
y_peanutbutter + y_cheesesauce = 0
Either peanut butter or cheese sauce, but not both, must be eaten (Exclusive OR):
y_peanutbutter = 1 - y_cheesesauce
Conditional Constraints (Broccoli)
Either cheese sauce or peanut butter (or both) must be eaten with broccoli:
y_broccoli ≤ y_cheesesauce + y_peanutbutter
(Note: This constraint means if broccoli is eaten (y_broccoli=1), then the sum of the others must be at least 1.)
Broccoli can only be eaten if either cheese sauce or peanut butter (or both) is also eaten:
y_broccoli ≤ y_cheesesauce + y_peanutbutter
If cheese sauce and peanut butter are not eaten, then broccoli can't be eaten:
y_broccoli ≤ y_cheesesauce + y_peanutbutter
Limiting Total Items
No more than two of broccoli, cheese sauce, and peanut butter may be eaten:
y_broccoli + y_cheesesauce + y_peanutbutter ≤ 2
Broccoli, cheese sauce, and peanut butter all cannot be eaten together:
y_broccoli + y_cheesesauce + y_peanutbutter ≤ 2
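
Because each y takes only the values 0 and 1, a constraint like the ones above can be checked against its verbal statement by listing every combination. The Python sketch below does this by brute force; the helper name `feasible` and the argument order (broccoli, cheese sauce, peanut butter) are assumptions made for the example.

```python
from itertools import product

def feasible(constraint):
    """Return all (broccoli, cheese, peanut) 0/1 triples satisfying the constraint."""
    return [ys for ys in product((0, 1), repeat=3) if constraint(*ys)]

# y_broccoli <= y_cheesesauce + y_peanutbutter
needs_topping = feasible(lambda b, c, p: b <= c + p)
# y_broccoli + y_cheesesauce + y_peanutbutter <= 2
at_most_two = feasible(lambda b, c, p: b + c + p <= 2)
# y_peanutbutter + y_cheesesauce = 1
exactly_one = feasible(lambda b, c, p: p + c == 1)

print((1, 0, 0) in needs_topping)   # False: broccoli alone is ruled out
print((1, 1, 1) in at_most_two)     # False: all three together is ruled out
print(len(exactly_one))             # 4: one topping, broccoli free to be 0 or 1
```
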
Linking Continuous and Binary Variables (Big M Formulation)
No amount of cheese sauce may be eaten unless its binary variable is 1 (If any amount of cheese sauce is eaten, then its binary variable must be 1):
x_cheesesauce ≤ M · y_cheesesauce
Cheese sauce must be eaten:
y_cheesesauce = 1
Unless peanut butter is eaten, no amount of broccoli can be eaten:
x_broccoli ≤ M · y_peanutbutter
If any amount of broccoli is eaten, then peanut butter must also be eaten:
x_broccoli ≤ M · y_peanutbutter
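
The linking constraint x ≤ M · y can be read as a simple check: when y = 0 the amount is forced to 0, and when y = 1 the amount may be anything up to M. A minimal Python sketch follows; the value M = 100 and the function name are assumptions for illustration, and nonnegativity of x is assumed as well.

```python
def big_m_ok(x, y, M):
    """Check the linking constraint x <= M * y for a nonnegative amount x."""
    return 0 <= x <= M * y

M = 100.0  # assumed upper bound on any amount eaten

print(big_m_ok(0.0, 0, M))    # True: nothing eaten, binary is 0
print(big_m_ok(2.5, 0, M))    # False: a positive amount requires y = 1
print(big_m_ok(2.5, 1, M))    # True: amount allowed once y = 1
print(big_m_ok(150.0, 1, M))  # False: M was too small for this amount
```

The last case shows why M must be at least as large as any amount that should remain feasible: a value that is too small silently cuts off valid solutions.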

Discrete-Event Simulation and Replication

Stochastic Discrete-Event Simulation

When a company creates a stochastic discrete-event simulation, many replications are needed because of the inherent variability and randomness in the system being modeled.

Interpreting Simulation Run Results

A simulation could stop after 300 or 400 events, but it could not stop after only 5 events (implying a minimum run length requirement).
It is not the case that the simulated wait time was 50 or less in exactly one of the runs (implying consistency).
The expected wait time of simulated runs (replications) is likely to be between 65 and 75 (Confidence Interval interpretation).
The expected wait time of simulated runs (replications) is not likely to be between 75 and 85.
There is significant, not negligible, variability in the simulated wait time across the runs (replications).
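
The need for replications and the confidence-interval reading can be sketched with a toy single-server queue. The Python example below runs 30 replications with different seeds and summarizes the mean wait with an approximate 95% interval. The arrival rate, service rate, customer count, number of replications, and the normal-approximation factor 1.96 are all assumptions for the illustration and do not reproduce the runs discussed above.

```python
import random
from statistics import mean, stdev

def replicate_mean_wait(seed, n_customers=500, arrival_rate=0.8, service_rate=1.0):
    """One replication: average wait in a single-server FIFO queue."""
    rng = random.Random(seed)
    wait, total = 0.0, 0.0
    for _ in range(n_customers):
        total += wait                               # wait of the current customer
        service = rng.expovariate(service_rate)
        gap = rng.expovariate(arrival_rate)
        wait = max(0.0, wait + service - gap)       # Lindley recursion for the next one
    return total / n_customers

# Each seed gives a different random run, so results vary across replications.
results = [replicate_mean_wait(seed) for seed in range(30)]
center = mean(results)
half_width = 1.96 * stdev(results) / len(results) ** 0.5
print(f"mean wait: {center:.2f} +/- {half_width:.2f}")
print(f"range across replications: {min(results):.2f} to {max(results):.2f}")
```

A single run could land anywhere in the printed range, which is why the interval around the mean of many replications is the more trustworthy summary.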

Simulation Validation

If the simulated wait time is 50% higher than observed reality, one must investigate to see what is wrong with the simulation, as it indicates a poor match to reality.

Classification of Optimization Problem Types

Optimization problems are classified based on the structure of their objective function and constraints:

Linear Programming (LP)
Objective: ∑_i c_i x_i
Constraints: ∑_i a_ij x_i ≥ b_j
Convex Quadratic Programming (CQP)
Objective: ∑_i c_i x_i^2 (convex when minimizing with every c_i ≥ 0)
Constraints: ∑_i a_ij x_i ≥ b_j
Convex Programming (CP)
Objective: ∑_i c_i |x_i − 6| (convex when minimizing with every c_i ≥ 0, but neither linear nor quadratic)
Constraints: ∑_i a_ij x_i ≥ b_j
Integer Programming (IP)
Objective: ∑_i c_i x_i
Constraints: ∑_i a_ij x_i ≥ b_j, where x_i ∈ {0, 1} (Binary/Integer variables)
General Non-Convex Programming (GNCP)
Objective: ∑_i c_i sin(x_i)
Constraints: (Linear or non-linear)
General Non-Convex Programming (GNCP)
Objective: ∑_i c_i x_i
Constraints: ∑_i ∑_k a_ikj x_i x_k ≤ b_j (Quadratic constraints, non-convex in general)
Linear Programming (LP)
Objective: ∑_i (log c_i) x_i (log c_i is a constant coefficient, so the objective is still linear in x)
Constraints: ∑_i a_ij x_i ≥ b_j
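
The convex versus non-convex distinction in this classification can be probed numerically for one-variable versions of the objectives. The Python sketch below tests midpoint convexity, f((a+b)/2) ≤ (f(a)+f(b))/2, on a grid. The grid range, step count, tolerance, and function name are assumptions; a failed check proves non-convexity on that range, while a passed check is only evidence, not proof.

```python
import math

def looks_convex(f, lo=-10.0, hi=10.0, steps=200):
    """Numerically test midpoint convexity of f on a grid over [lo, hi]."""
    pts = [lo + (hi - lo) * k / steps for k in range(steps + 1)]
    return all(
        f((a + b) / 2) <= (f(a) + f(b)) / 2 + 1e-9   # small tolerance for rounding
        for a in pts for b in pts
    )

print(looks_convex(lambda x: x))            # True: linear (LP objective term)
print(looks_convex(lambda x: x ** 2))       # True: convex quadratic term
print(looks_convex(lambda x: abs(x - 6)))   # True: convex but not smooth
print(looks_convex(math.sin))               # False: non-convex term
```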

Queuing Theory and Markov Chain Properties

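To check system stability (utilization): multiply the service rate per server (μ) by the number of service lines (c) to get the total service capacity, and compare it with the arrival rate (λ). If λ < c·μ, that is, utilization ρ = λ / (c·μ) < 1, the system is stable; otherwise the queue grows without bound. Expressed in times rather than rates, the mean service time shared across servers, 1/(c·μ), must be shorter than the mean time between arrivals, 1/λ.
If a process is not memoryless, a standard Markov chain is not an appropriate model, because Markov transition probabilities may depend only on the current state and not on how the system reached it or how long it has been there.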
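
The stability check amounts to one division. A minimal Python sketch, with hypothetical function names and example rates (10 arrivals per hour, 4 services per hour per server):

```python
def utilization(arrival_rate, service_rate, servers=1):
    """Utilization rho = lambda / (c * mu)."""
    return arrival_rate / (servers * service_rate)

def is_stable(arrival_rate, service_rate, servers=1):
    """A queue is stable when arrivals are slower than total service capacity."""
    return utilization(arrival_rate, service_rate, servers) < 1

# Assumed rates: lambda = 10 per hour, mu = 4 per hour per server.
print(is_stable(10, 4, servers=3))   # True: rho = 10 / 12
print(is_stable(10, 4, servers=2))   # False: rho = 10 / 8
```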

Decision Making and Statistical Measures

Exploration vs. Exploitation

Use more exploration if observed rates are similar.
Use exploitation if observed rates are very different (choose the lowest or highest rate depending on the context of the problem).
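
One common way to mix the two is an epsilon-greedy rule, which these notes do not name and which is used here only as an illustration: with probability epsilon pick an option at random (explore), otherwise pick the option with the best observed rate (exploit). The Python sketch below assumes two hypothetical options with success rates 0.05 and 0.20; the rates, epsilon, round count, and seed are all made up for the example.

```python
import random

def epsilon_greedy(true_rates, epsilon, rounds, seed=0):
    """Simulate epsilon-greedy choices; return pulls and successes per option."""
    rng = random.Random(seed)
    pulls = [0] * len(true_rates)
    wins = [0] * len(true_rates)
    for _ in range(rounds):
        if rng.random() < epsilon or 0 in pulls:
            arm = rng.randrange(len(true_rates))        # explore
        else:
            rates = [w / n for w, n in zip(wins, pulls)]
            arm = rates.index(max(rates))               # exploit best observed rate
        pulls[arm] += 1
        wins[arm] += rng.random() < true_rates[arm]     # True counts as 1
    return pulls, wins

pulls, wins = epsilon_greedy([0.05, 0.20], epsilon=0.1, rounds=5000)
print("pulls per option:", pulls)
print("observed rates:", [round(w / n, 3) for w, n in zip(wins, pulls)])
```

A larger epsilon spends more rounds exploring, which matches the advice above for cases where the observed rates are too close to separate.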

Choosing Appropriate Measures

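Binomial-based data (successes out of a number of trials): Use the highest rate or fraction.
Parametric data (a distribution such as the normal is assumed): Use the mean.
Non-parametric data (no distributional assumption): Use the median.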

Regression Regularization Techniques

Regularization Constraints and Objective Functions

These techniques minimize the sum of squared errors (SSE) subject to constraints on the coefficients (a_j):

Standard Linear Regression (No Regularization)
Minimize: ∑_{i=1}^{n} (y_i − (a_0 + ∑_{j=1}^{m} a_j x_ij))^2
Lasso Regression (L1 Penalty)
Constraint: ∑_j |a_j| ≤ T (T is the tuning parameter)
Ridge Regression (L2 Penalty)
Constraint: ∑_j (a_j)^2 ≤ T
Elastic Net
Penalty Term: λ ∑_j |a_j| + (1 − λ) ∑_j (a_j)^2 (Combines L1 and L2 penalties)
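
The quantities in these formulas can be written out directly. The Python sketch below computes the SSE and the three penalty terms for a small made-up dataset and coefficient vector; the function names and all numbers are illustrative assumptions, and no fitting is performed.

```python
def sse(y, X, a0, a):
    """Sum of squared errors for a linear model with intercept a0 and coefficients a."""
    return sum(
        (yi - (a0 + sum(aj * xij for aj, xij in zip(a, row)))) ** 2
        for yi, row in zip(y, X)
    )

def l1_penalty(a):
    """Lasso term: sum of absolute coefficients."""
    return sum(abs(aj) for aj in a)

def l2_penalty(a):
    """Ridge term: sum of squared coefficients."""
    return sum(aj ** 2 for aj in a)

def elastic_net_penalty(a, lam):
    """Elastic net term: lam * L1 + (1 - lam) * L2."""
    return lam * l1_penalty(a) + (1 - lam) * l2_penalty(a)

a = [3.0, -4.0, 0.0]                       # hypothetical coefficients
X = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]     # hypothetical inputs
y = [1.0, 2.0]                             # hypothetical responses

print(sse(y, X, 0.0, a))                 # 40.0
print(l1_penalty(a))                     # 7.0
print(l2_penalty(a))                     # 25.0
print(elastic_net_penalty(a, lam=0.5))   # 16.0
```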

Variable Selection Properties

Lasso: Selects the fewest variables (performs feature selection by driving coefficients to zero).
Elastic Net (EN): Selects a medium number of variables.
Linear Regression (LR) / Ridge Regression (RR): Selects the most variables (all variables are retained, though coefficients may be small).

Model Complexity and Overfitting

Should we seek a simpler model? YES. There isn't enough data to avoid overfitting a model with many factors, and a simpler model is easier to interpret and tends to generalize better.
Should we seek a more complex model? NO, unless the added complexity is justified by data volume and performance gains.

Optimization vs. Regression Perspective

Optimization Perspective: Coefficients (a) are variables; inputs (x) are constants.
Regression Perspective: Inputs (x) are variables; coefficients (a) are constants (parameters to be estimated).

Data Science Modeling Workflow

A typical workflow for developing and validating predictive models:

Remove outliers from the dataset.
Impute missing data values and scale the data appropriately.
Fit a Lasso regression model on all available variables (for feature selection).
Fit alternative models (e.g., linear regression, regression tree, and random forest) using only the variables chosen by the Lasso regression model.
Pick the best model to use based on performance metrics evaluated on a dedicated validation dataset.
Test the final chosen model on a separate, unseen test dataset to estimate its true generalization quality.
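
The last two steps depend on keeping training, validation, and test data disjoint. A minimal Python sketch of such a three-way split follows; the 60/20/20 proportions, the seed, and the function name are assumptions, and the rows are stand-in integers rather than real records.

```python
import random

def three_way_split(rows, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle rows and split them into disjoint train, validation, and test sets."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test_frac)
    n_val = int(len(rows) * val_frac)
    test = rows[:n_test]                      # held out until the final check
    val = rows[n_test:n_test + n_val]         # used to pick among fitted models
    train = rows[n_test + n_val:]             # used to fit every candidate model
    return train, val, test

train, val, test = three_way_split(range(100))
print(len(train), len(val), len(test))   # 60 20 20
```

Choosing the model on the validation set and reporting quality on the test set keeps the final estimate from being inflated by the selection step.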

Advanced Analytical Methods for Decision Making

Problem Description	Appropriate Analytical Method
Find the best airline schedule with uncertain delays	Stochastic Optimization
Find the best portfolio with uncertain investment returns	Stochastic Optimization
Determine the best route for delivery given uncertainties in traffic	Stochastic Optimization
Decide how many products to manufacture with uncertain demand	Stochastic Optimization
Estimate the required number of workers for a call center	Queuing Theory
Determine how many checkout lanes are needed in a supermarket or tables in a restaurant	Queuing Theory
Compare the median age of MSA students across campus and online programs	Non-parametric Test
Determine if the median home price is lower in one city versus another	Non-parametric Test
Identify which month has a higher median temperature	Non-parametric Test
Identify which sets of electives or recipes share common elements	Louvain Algorithm (Community Detection)
Find groups of electives that are often taken by the same students	Louvain Algorithm (Community Detection)
Find sets of terrorists (network community detection)	Louvain Algorithm (Community Detection)
Determine how much to bid in competitive situations	Game-Theoretic Analysis
Determine the best marketing strategy, given competitor reaction	Game-Theoretic Analysis
