Key Concepts in Data Science & Scientific Computing
Visualization Techniques
- Overlapping Histograms: Use a semi-transparent `alpha` value so overlapping distributions can be compared.
Data Structures & Algorithms
- BFS Implementation: `collections.deque` is ideal for Breadth-First Search because appending and popping at either end is O(1).
- Grid Representation: Obstacles are often represented by a value like `1`.
Jupyter Notebook & Markdown
- Markdown Headings: Use a `#` prefix for headings in Jupyter Markdown cells.
Optimization & Least Squares
- Normal Equations: Direct matrix inversion for Least Squares: `β = (XᵀX)⁻¹Xᵀy`.
Numerical Integration & Simulation
- Orbit Simulation: Runge–Kutta 4th order method is a common integration technique.
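As a minimal sketch of one classical fourth-order Runge-Kutta step (not tied to any particular orbit model), the example below integrates a unit harmonic oscillator, whose closed phase-space trajectory serves as a simple stand-in for a periodic orbit. The names `rk4_step` and `oscillator` and the step count are illustrative assumptions.

```python
import numpy as np

def rk4_step(f, t, x, dt):
    """Advance state x from t to t + dt with one classical RK4 step."""
    k1 = f(t, x)
    k2 = f(t + dt / 2, x + dt / 2 * k1)
    k3 = f(t + dt / 2, x + dt / 2 * k2)
    k4 = f(t + dt, x + dt * k3)
    return x + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

def oscillator(t, x):
    """Hypothetical test system: d(pos)/dt = vel, d(vel)/dt = -pos."""
    pos, vel = x
    return np.array([vel, -pos])

n_steps = 1000
dt = 2 * np.pi / n_steps      # one full period split into n_steps
state = np.array([1.0, 0.0])  # start at pos = 1, vel = 0
for k in range(n_steps):
    state = rk4_step(oscillator, k * dt, state, dt)

# After one period the state should be back near its starting point.
closure_error = np.linalg.norm(state - np.array([1.0, 0.0]))
```

A small `closure_error` indicates that the trajectory closes on itself after one period, which is a quick sanity check for any periodic system.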
Search Algorithms
- Brute-Force Search: Often implemented using nested loops.
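A small illustration of the nested-loop pattern, assuming the task is to find two list entries that add up to a target. The function name `find_pair_with_sum` and the sample data are hypothetical.

```python
def find_pair_with_sum(values, target):
    """Return the first index pair (i, j), i < j, whose values sum to target."""
    for i in range(len(values)):
        for j in range(i + 1, len(values)):   # inner loop checks every later item
            if values[i] + values[j] == target:
                return i, j
    return None  # no pair found

pair = find_pair_with_sum([3, 8, 5, 1, 7], 12)  # 5 + 7 -> indices (2, 4)
```

Every pair is tested, so the running time grows roughly with the square of the list length.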
Python Ecosystem Fundamentals
Python Language Features
- Dynamic Typing: Names have no declared type; the type belongs to the object a name refers to and is checked at runtime.
- Indentation: Indentation defines code blocks (no braces); 4 spaces per level is the convention.
- Chained Comparisons: `0 < x < 1` is equivalent to `0 < x and x < 1`.
- Comprehension: Builds a new container in one line (e.g., list, dict, set).
- Exception Handling: `try/except` is preferred over pre-checks (EAFP: Easier to Ask for Forgiveness than Permission).
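The sketch below combines three of these features: an EAFP-style `try/except`, a chained comparison, and a comprehension. The helper `parse_ratio` and the input strings are made up for illustration.

```python
def parse_ratio(text):
    """Convert text to a float in the open interval (0, 1), or return None."""
    try:
        value = float(text)      # EAFP: just attempt the conversion
    except ValueError:
        return None              # text was not a number at all
    # Chained comparison: same as (0 < value) and (value < 1)
    return value if 0 < value < 1 else None

inputs = ["0.25", "abc", "1.5", "0.9"]
# Comprehension that keeps only the successfully parsed ratios
valid = [r for r in (parse_ratio(s) for s in inputs) if r is not None]
```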
Core Python Data Structures
- List: Ordered, mutable, allows duplicate elements.
- Tuple: Ordered, immutable; typically slightly cheaper to create and store than a list.
- Dictionary (Dict): Key-value map; average lookup time is O(1).
- Set: Unordered collection of unique elements; fast membership tests.
NumPy Concepts
- ndarray: N-dimensional homogeneous array; fixed-size, usually stored in one contiguous block of memory.
- Vectorization: Apply operations element-wise without explicit Python loops for performance.
- Broadcasting: Aligning shapes by "stretching" singleton dimensions for element-wise operations.
- ufuncs (Universal Functions): Fast C-implemented element-wise functions (e.g., `np.sin`, `np.exp`).
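To make broadcasting and vectorization concrete, here is a short sketch on a small hand-written array. The array values and variable names are arbitrary choices for illustration.

```python
import numpy as np

data = np.array([[1.0, 10.0, 100.0],
                 [3.0, 30.0, 300.0]])   # shape (2, 3)

col_means = data.mean(axis=0)           # shape (3,)
centered = data - col_means             # (2, 3) - (3,) broadcasts over rows

row_scale = np.array([[1.0], [2.0]])    # shape (2, 1): singleton column axis
scaled = data * row_scale               # stretched across the 3 columns

# Vectorized ufunc call: one C-level loop instead of a Python for-loop
decayed = np.exp(-data)
```

In both broadcast operations no data is physically copied to match shapes; NumPy only behaves as if the smaller operand had been repeated.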
Pandas Principles
- Series vs. DataFrame: 1-D labeled array vs. 2-D tabular data structure.
- Indexing: `.loc` for label-based indexing, `.iloc` for integer-position-based indexing.
- Missing Data: Represented by `NaN` (Not a Number) for floats; use `.dropna()` or `.fillna()`.
- GroupBy: Split-apply-combine pattern for aggregations.
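The following sketch walks through these ideas on a tiny, invented DataFrame; the column names, index labels, and values are assumptions chosen only to keep the results easy to verify by hand.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"city": ["Oslo", "Rome", "Oslo", "Rome"],
     "temp": [4.0, np.nan, 6.0, 18.0]},
    index=["a", "b", "c", "d"],
)

by_label = df.loc["c", "temp"]       # label-based: row "c", column "temp"
by_position = df.iloc[2, 1]          # position-based: third row, second column

filled = df["temp"].fillna(0.0)      # replace NaN with a chosen value
dropped = df.dropna()                # or drop rows that contain NaN

# Split-apply-combine: mean temperature per city (NaN is skipped by default)
mean_by_city = df.groupby("city")["temp"].mean()
```

Here `.loc` and `.iloc` reach the same cell because row label "c" happens to sit at position 2.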
Visualization Terms (Matplotlib)
- Figure vs. Axes: Figure is the overall canvas; Axes is an individual plot area.
- Legend: `ax.legend()` displays only labeled artists on the plot.
- Alpha: Opacity of plot elements (0 for transparent, 1 for opaque).
- Histogram Bins: Number of bins controls resolution versus noise in a histogram.
- KDE (Kernel Density Estimate): Smooth estimate of the underlying Probability Density Function (PDF); bandwidth controls smoothness.
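A possible way to put Figure, Axes, alpha, bins, and legend together is an overlapping-histogram plot. The two samples below are synthetic (seeded random draws), and the bin range is an arbitrary assumption.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)              # seeded for reproducibility
group_a = rng.normal(loc=0.0, scale=1.0, size=500)
group_b = rng.normal(loc=1.5, scale=1.0, size=500)

# Shared bin edges so the two histograms are directly comparable
bins = np.linspace(-4.0, 6.0, 31)

fig, ax = plt.subplots()                    # Figure = canvas, Axes = plot area
ax.hist(group_a, bins=bins, alpha=0.5, label="Group A")
ax.hist(group_b, bins=bins, alpha=0.5, label="Group B")
ax.set_xlabel("value")
ax.set_ylabel("count")
legend = ax.legend()                        # only labeled artists appear
```

Calling `plt.show()` afterwards displays the figure; fewer bins give a smoother but coarser picture, more bins a noisier one.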
Discrete Dynamics
- State Vector: Encapsulates all variables needed to advance a system's state.
- Time Stepping (Explicit Euler): `xₖ₊₁ = xₖ + f(xₖ)·dt`, where `f` here is the time derivative dx/dt (1st-order approximation).
- Equilibrium: For an update map `xₖ₊₁ = f(xₖ)`, a state `x*` such that `x* = f(x*)`.
- Stability: For that map, `x*` is locally stable when `|f′(x*)| < 1`.
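As a worked check of the fixed-point and stability conditions for a map of the form x_{k+1} = f(x_k), the sketch below uses the logistic map with r = 2.5. The choice of map, the parameter value, and the function names are assumptions made for illustration.

```python
def logistic_map(x, r=2.5):
    """One step of the logistic map x_{k+1} = r * x_k * (1 - x_k)."""
    return r * x * (1.0 - x)

def logistic_slope(x, r=2.5):
    """Derivative of the logistic map with respect to x."""
    return r * (1.0 - 2.0 * x)

r = 2.5
x_star = 1.0 - 1.0 / r             # nonzero fixed point, satisfies x* = f(x*)
slope = logistic_slope(x_star, r)  # f'(x*) = 2 - r = -0.5
is_stable = abs(slope) < 1.0       # local stability criterion

# Iterating from a different starting state should converge to x_star
x = 0.2
for _ in range(100):
    x = logistic_map(x, r)
```

Because the slope magnitude is 0.5, the distance to the fixed point roughly halves on each iteration.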
Optimization & Least Squares
- Objective Function: The function to minimize (cost) or maximize (gain).
- Decision Variables: Parameters that are adjusted during optimization.
- Constraints: Equalities or inequalities that restrict decision variables.
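- Normal Equations: Closed-form solution for Least Squares: `β = (XᵀX)⁻¹Xᵀy`.
- SciPy Optimization: Use `scipy.optimize.minimize(fun, x0, bounds=bounds, constraints=constraints)`; pass `bounds` and `constraints` as keyword arguments, since they are not the third and fourth positional parameters.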
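A minimal sketch of the normal equations in NumPy, assuming a straight-line model with an intercept and noise-free synthetic data. Solving the linear system is used here instead of forming the inverse explicitly, which is a common numerical preference rather than something the formula requires.

```python
import numpy as np

# Hypothetical noise-free data generated from y = 2 + 3x
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 + 3.0 * x

# Design matrix with an intercept column of ones
X = np.column_stack([np.ones_like(x), x])

# Normal equations: solve (X^T X) beta = X^T y without computing the inverse
beta_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Reference: NumPy's least-squares routine, more robust for ill-conditioned X
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
```

Both approaches should recover an intercept near 2 and a slope near 3 for this data.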
Search Algorithms (BFS)
- FIFO Queue: Ensures nodes are explored layer by layer, shallowest first, so the first path found to a node is a shortest one (in an unweighted graph).
- Visited Set/Map: Avoids revisiting nodes and records parents for backtracking.
- Grid Encoding: Common representation: `0` for free space, `1` for blocked/obstacle. Neighbors are typically up/down/left/right.
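The pieces above (FIFO queue, visited/parent map, 0/1 grid, four neighbors) can be sketched as one function. The name `bfs_shortest_path` and the sample grid are hypothetical, and the sketch assumes the start cell is free.

```python
from collections import deque

def bfs_shortest_path(grid, start, goal):
    """Return the shortest list of (row, col) cells from start to goal, or None."""
    rows, cols = len(grid), len(grid[0])
    queue = deque([start])
    parent = {start: None}                 # doubles as the visited set
    while queue:
        cell = queue.popleft()             # FIFO: oldest (shallowest) first
        if cell == goal:
            path = []
            while cell is not None:        # walk parents back to start
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] == 0 and (nr, nc) not in parent:
                parent[(nr, nc)] = cell
                queue.append((nr, nc))
    return None                            # goal unreachable

grid = [
    [0, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
]
path = bfs_shortest_path(grid, (0, 0), (2, 0))
```

In this grid the wall in the middle row forces the path around the right-hand side, so it visits seven cells.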
Jupyter Notebook & Python Style
- Shift+Enter: Runs the current cell and moves to the next.
- Markdown Cells: Cell type for text; prefix headings with `#`.
- PEP 8: Python style guide recommendations: 4-space indent, `snake_case` for variables/functions, max 79 characters per line, grouped imports.
Python & Plotting Utilities
- `*` / `*=`: Work with strings for repetition (e.g., `"abc" * 3` gives `"abcabcabc"`).
- `list(filter(func, items))`: Keeps only the elements for which `func` returns a true value; all others are dropped.
- `ax.fill_between(x, y1, y2)`: Fills the area between the curves `y1` and `y2` along `x`.
- `max(d, key=d.get)`: Returns the key with the maximum value in a dictionary `d`.
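A quick sketch of the pure-Python utilities from this list, using made-up sample values. Note in particular that `filter()` keeps the items for which the function returns a true value.

```python
banner = "ab" * 3                 # 'ababab'
line = "-"
line *= 5                         # in-place form; line is now '-----'

readings = [4, -1, 7, 0, -3]
# filter() KEEPS items for which the function returns True
positives = list(filter(lambda v: v > 0, readings))   # [4, 7]

scores = {"ann": 82, "bob": 95, "cy": 78}
top_name = max(scores, key=scores.get)                # 'bob'
```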
Practical Code Snippets & Examples
Python Language Features
- List Comprehension: Generate lists concisely.
e = [i for i in range(10) if i % 2 == 0] # [0, 2, 4, 6, 8]
- Sorting Lists: Sort in ascending or descending order.
m = sorted(my_list, reverse=True) # Sorts from largest to smallest
- F-strings (Formatted String Literals): Embed expressions inside string literals.
print(f'Distance {v:.2f} m') # Formats 'v' to two decimal places
- Finding Max Value and Index:
i = elec_power.index(max(elec_power)) # Locate the peak before rounding
peak_power = round(elec_power[i], 2) # A rounded value may not exist in the list
- New Line Character: `\n` inserts a new line.
- Dictionary Definition:
s = {'a': 1, 'b': 2}
- Dictionary Comprehension:
squares = {k: k*k for k in range(1, 6)} # {1: 1, 2: 4, 3: 9, 4: 16, 5: 25}
- Looping Through Strings:
for letter in input_str: # Iterates over each character
    print(letter)
- Boolean Checks:
  - `any(x > 5 for x in data)`: Returns `True` if any element in `data` is greater than 5.
  - `all(x > 0 for x in data)`: Returns `True` if all elements in `data` are greater than 0.
Math Module Usage
- Importing Pi:
from math import pi
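- Trigonometric Functions:
import math # 'from math import pi' alone does not define the name 'math'
math.cos(math.radians(angles[i])) # Cosine of an angle given in degrees
- Square Root:
math.sqrt(d)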
NumPy Array Operations
- Array Creation:
a = np.array([1, 2, 3])
- Array Properties:
  - `arr.shape`: Dimensions of the array, e.g. (rows, columns) for a 2-D array.
  - `arr.size`: Total number of elements.
  - `arr.dtype`: Data type of array elements.
- Statistical Functions:
  - `np.mean(data[:, 2])`: Mean of the third column.
  - `np.max(arr)`: Maximum value in the array.
  - `np.min(arr)`: Minimum value in the array.
  - `np.mean(arr)` / `np.median(arr)`: Mean or median of the array.
  - `np.sum(arr)`: Sum of array elements.
- Special Arrays:
  - `np.eye(N)`: Creates an N x N identity matrix.
- Sorting & Indexing:
  - `np.sort(arr)`: Returns a sorted copy of the array.
  - `np.argmax(arr)`: Returns the indices of the maximum values along an axis.
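The sketch below exercises several of these array operations on a small 2-D array with arbitrary values. One detail worth seeing in action is that `np.argmax` without an `axis` argument returns an index into the flattened array, which `np.unravel_index` can convert back to a (row, column) pair.

```python
import numpy as np

arr = np.array([[3, 9, 2],
                [7, 1, 8]])

shape, size = arr.shape, arr.size                 # (2, 3) and 6

flat_idx = np.argmax(arr)                         # index into the flattened array
row, col = np.unravel_index(flat_idx, arr.shape)  # back to (row, col)
per_column = np.argmax(arr, axis=0)               # row index of the max in each column

sorted_copy = np.sort(arr, axis=None)             # flattened, sorted copy
identity = np.eye(3)                              # 3 x 3 identity matrix
```

The original `arr` is left unchanged by `np.sort`, which returns a new array.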
Pandas DataFrame Operations
- DataFrame Creation: `df = pd.DataFrame(...)`.
- Loading & Saving Data:
  - `df = pd.read_csv('data.csv')`: Loads data from a CSV file.
  - `df.to_csv('output.csv', index=False)`: Saves DataFrame to CSV without index.
- DataFrame Properties:
  - `df.shape`: Returns a tuple representing the dimensions (rows, columns).
  - `df.describe()`: Generates descriptive statistics of the DataFrame.
  - `df.count()`: Counts non-null observations per column.
  - `df.sum()` / `df.mean()`: Sum or mean of DataFrame elements/columns.
- Data Selection & Filtering:
  - `ages = df['age']`: Selects a single column (Series).
  - `row5 = df.iloc[4]`: Selects the 5th row by integer position.
  - `adults = df[df.age >= 18]`: Filters rows where age is 18 or greater.
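- Grouping Data:
  - `grouped = df.groupby('category').mean(numeric_only=True)`: Groups by 'category' and computes the mean of the numeric columns (recent pandas versions raise an error on non-numeric columns unless `numeric_only=True` is given).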
Matplotlib Plotting Commands
- Basic Plotting:
ax.plot(x, y, color='blue', alpha=0.6, linestyle='-', marker='o', markersize=5, label='My Data')
Note: the title is set via `ax.set_title()`; `title` is not a valid keyword argument of `ax.plot()`.
- Setting Labels & Legend:
  - `ax.set_xlabel('x-axis label')`: Sets the label for the x-axis.
  - `ax.legend()`: Displays the legend for labeled plot elements.
- Displaying & Saving Plots:
  - `plt.show()`: Displays all open figures.
  - `plt.savefig('plot.png')`: Saves the current figure to a file.
- Common Plot Types:
  - `ax.plot(...)`: Line plot.
  - `ax.scatter(...)`: Scatter plot (individual points).
  - `ax.bar(...)`: Vertical bar chart.
  - `ax.barh(...)`: Horizontal bar chart.
  - `ax.hist(...)`: Histogram.
  - `ax.pie(...)`: Pie chart.
  - `ax.errorbar(...)`: Plot data with error bars.
Numerical Simulation Example (Projectile Motion)
import numpy as np
# Physical parameters
g = 9.81 # Gravity (m/s^2)
dt = 0.01 # Time step (s)
n_steps = 1000 # Number of simulation steps
# Initial conditions
y0 = 10.0 # Initial height (m)
v0 = 0.0 # Initial velocity (m/s)
def dynamics(xk):
    """
    Compute the next state [y, v] from the current state xk = [y, v].
    Applies explicit Euler integration.
    """
    yk, vk = xk
    vn = vk - g * dt
    yn = yk + vk * dt
    return [yn, vn]
# Pre-allocate state array
x = np.zeros((n_steps, 2))
x[0] = [y0, v0]
# Simulation loop
for k in range(n_steps - 1):
    x[k + 1] = dynamics(x[k])
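One way to sanity-check an explicit Euler loop of this kind is to compare it against the closed-form free-fall solution y(t) = y0 + v0·t − ½·g·t². The sketch below is self-contained and ignores ground contact; the helper `euler_fall` and the two step sizes are illustrative assumptions.

```python
def euler_fall(y0, v0, g, dt, n_steps):
    """Explicit Euler free fall; returns the height after n_steps steps."""
    y, v = y0, v0
    for _ in range(n_steps):
        y, v = y + v * dt, v - g * dt   # both updates use the old state
    return y

g, y0, v0, t_end = 9.81, 10.0, 0.0, 1.0
exact = y0 + v0 * t_end - 0.5 * g * t_end**2

err_coarse = abs(euler_fall(y0, v0, g, 0.01, 100) - exact)
err_fine = abs(euler_fall(y0, v0, g, 0.005, 200) - exact)
ratio = err_coarse / err_fine   # close to 2 for a first-order method
```

Halving the time step roughly halves the error, which is the signature of a first-order scheme.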