Python Fundamentals and String Methods for Data Preparation
Day 1: Python Basics for AI/ML Preparation
Output Statements
- print(): Displays output to the console.
- end=" ": Ends the output with a space instead of the default newline, so the next print() continues on the same line.
- sep=",": Sets the separator placed between multiple items in the output (the default is a single space).
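A minimal sketch of the three options above. The function name show_output and the sample words are only illustrative and do not come from the original notes.

```python
def show_output():
    # Default behaviour: items are separated by a space and a newline ends the line.
    print("Python", "for", "ML")
    # sep="," joins the items with a comma instead of a space.
    print("a", "b", "c", sep=",")
    # end=" " replaces the trailing newline, so the next print() stays on the same line.
    print("Hello", end=" ")
    print("World")


show_output()
```

Running this should print three lines: "Python for ML", "a,b,c", and "Hello World".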
Input and Type Casting
- The input() function always returns a string.
- Adding two strings results in *concatenation* (e.g., "5" + "10" gives "510").
- Use int() or float() to convert input to a number.
  Example: num = int(input("Enter a number: "))
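The difference between concatenation and numeric addition can be sketched as follows. To keep the example non-interactive, the variables raw_a and raw_b are assumed stand-ins for what input() would return.

```python
# Stand-ins for values typed by a user; input() always hands back strings.
raw_a = "5"
raw_b = "10"

joined = raw_a + raw_b            # string concatenation
total = int(raw_a) + int(raw_b)   # numeric addition after conversion
price = float("19.99")            # float() converts decimal text

print(joined)  # 510
print(total)   # 15
print(price)   # 19.99
```

Note that int() raises a ValueError if the text is not a whole number (for example "3.5" or "abc"), so real input usually needs a check before conversion.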
Variables and Data Types
- Common types include int (integer), float (decimal number), str (string), and bool (boolean).
- Use type(variable) to check the data type of any variable.
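A short sketch of checking types with type(). The variable names and values are made up for illustration.

```python
age = 25            # int
height = 5.9        # float
name = "Kamal"      # str
is_student = True   # bool

# Collect the type name of each value, then display them.
type_names = [type(value).__name__ for value in (age, height, name, is_student)]
print(type_names)  # ['int', 'float', 'str', 'bool']
```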
Formatted Strings (f-Strings)
- Used for easy string formatting and embedding variables.
  Example: print(f"Value: {num}")
- Limiting decimal places: use {num:.2f} inside an f-string to display the number with 2 decimal places.
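Both f-string forms can be tried side by side. The value of num and the label are assumed sample data.

```python
num = 3.14159
label = "pi"

plain = f"Value: {num}"                   # embeds the variable as-is
rounded = f"{label} is about {num:.2f}"   # .2f rounds the display to 2 decimal places

print(plain)    # Value: 3.14159
print(rounded)  # pi is about 3.14
```

The format specifier only changes how the number is displayed; num itself keeps its full value.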
Naming Convention
- Use snake_case for variable and function names (all lowercase, words separated by underscores).
Example:
my_variable_name = 10
Day 2: Python Data Cleaning String Methods
Essential String Trimming and Replacement
These methods remove or replace unwanted characters and spaces in strings, which is a crucial step in data preparation.
- .lstrip(): Removes whitespace or the specified characters from the left side.
  Example 1: " Kamal".lstrip() becomes "Kamal"
  Example 2: "__Kamal".lstrip("__") becomes "Kamal"
- .rstrip(): Removes whitespace or the specified characters from the right side.
  Example: "Kamal ".rstrip() becomes "Kamal"
- .strip(): Removes whitespace or the specified characters from both sides.
  Example 1: " Kamal ".strip() becomes "Kamal"
  Example 2: "1kamal3".strip("13") becomes "kamal"
- .replace(): Replaces all occurrences of a specified substring.
  Example 1: "Ram ** Shrestha".replace(" ** ", " ") becomes "Ram Shrestha"
  Note: You can also replace individual letters, such as replacing "R" with "Hari".
Note: The argument passed to lstrip(), rstrip(), or strip() is treated as a set of characters, not as an exact prefix or suffix. Every leading or trailing character found in that set is removed.
Cleaning Messy Text Workflow
Combine methods to handle complex data issues:
- Use strip() to clean leading/trailing characters.
- Use replace() to clean internal parts of the string.
- *Complex Example:* To clean "--My ----name is kamal 123___", apply strip(' -123_') followed by replace(' ----', ' '). The result is "My name is kamal".
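The Day 2 methods and the combined workflow can be sketched as follows. The strings are taken from the examples in these notes, except "abcab-data", which is an assumed sample added to show that the strip argument is a set of characters.

```python
# With no argument, the strip family removes whitespace.
left = "  Kamal".lstrip()     # "Kamal"
right = "Kamal  ".rstrip()    # "Kamal"
both = "  Kamal  ".strip()    # "Kamal"

# With an argument, every leading/trailing character found in the set is removed.
digits_removed = "1kamal3".strip("13")        # "kamal"
set_not_prefix = "abcab-data".lstrip("abc")   # "-data" (assumed sample)

# replace() swaps every occurrence of a substring.
name = "Ram ** Shrestha".replace(" ** ", " ")  # "Ram Shrestha"

# Combined workflow from the complex example.
raw = "--My ----name is kamal 123___"
trimmed = raw.strip(" -123_")             # "My ----name is kamal"
cleaned = trimmed.replace(" ----", " ")   # "My name is kamal"

print(cleaned)
```

Trimming first and replacing second keeps each step easy to check, since strip() only ever touches the ends of the string.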
Day 3: Advanced String Methods for Data Normalization
.title() Method
- Capitalizes the first letter of each word in a string.
- Example: 'john doe'.title() results in 'John Doe'.
- Often combined with strip() and replace() to clean text and properly capitalize names.
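A small sketch of title() on its own and chained with strip() and replace(). The messy input raw_name is a made-up sample, not data from the notes.

```python
simple = "john doe".title()   # "John Doe"

# Made-up messy name: padded with spaces, underscores between the words.
raw_name = "  john__doe  "
clean_name = raw_name.strip().replace("__", " ").title()   # "John Doe"

# Caveat: title() treats an apostrophe as a word boundary.
quirk = "they're here".title()   # "They'Re Here"

print(simple, clean_name, quirk, sep=" | ")
```

The last line shows why title() may need a manual check on text that contains apostrophes.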
.split() Method
- Splits a string into a list of substrings based on a delimiter (default is whitespace).
- Example: 'John Doe'.split() results in ['John', 'Doe'].
- Can unpack results directly into variables: first_name, last_name = name.split()
- Works with custom delimiters: 'Kamal,Raj,Paudel'.split(',') results in ['Kamal', 'Raj', 'Paudel'].
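The split() behaviours above can be sketched as follows. The maxsplit call at the end is an addition beyond the notes, included because direct unpacking only works when the number of parts matches the number of variables.

```python
name = "John Doe"

parts = name.split()                    # ['John', 'Doe']
first_name, last_name = name.split()    # unpack exactly two parts

names = "Kamal,Raj,Paudel".split(",")   # ['Kamal', 'Raj', 'Paudel']

# Unpacking three words into two variables would raise ValueError.
# maxsplit=1 splits only once, so the rest of the name stays together.
first, rest = "Kamal Raj Paudel".split(maxsplit=1)

print(parts, names, first, rest, sep=" | ")
```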
.capitalize() Method
- Capitalizes only the first letter of the entire string, converting all other letters to lowercase.
- Example: 'kamal paudel'.capitalize() results in 'Kamal paudel'.
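To see how capitalize() differs from title(), compare both on the same input. The mixed-case string is an assumed sample chosen to show the lowercasing.

```python
text = "kamal PAUDEL"

capitalized = text.capitalize()   # "Kamal paudel": first letter upper, the rest lower
titled = text.title()             # "Kamal Paudel": first letter of each word upper

print(capitalized)
print(titled)
```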
Practical Data Cleaning Tasks and Examples
- Name Cleaning: Use strip() + replace() + title() sequentially to handle messy characters and ensure proper capitalization of names.
- Extraction: Extract first and last names efficiently using split() after initial cleaning.
- Phone Number Normalization: Use replace() to remove unwanted characters or country codes (e.g., removing (+977)).
- Complex String Processing: Combine strip(), replace(), title(), and split() to clean and parse complex strings containing mixed data like names and phone numbers.
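One possible way to put these tasks together is sketched below. The helper names clean_name and normalize_phone, the record layout, and the phone number are all hypothetical and used only for illustration.

```python
def clean_name(raw):
    """Trim stray characters, turn underscores into spaces, and capitalize each word."""
    return raw.strip(" _-").replace("_", " ").title()


def normalize_phone(raw):
    """Drop the (+977) country code, dashes, and spaces, leaving only the digits."""
    return raw.replace("(+977)", "").replace("-", "").replace(" ", "")


# Hypothetical mixed record: a messy name and a phone number separated by a comma.
record = "__kamal_paudel__, (+977) 98-0000-0000"

raw_name, raw_phone = record.split(",")
name = clean_name(raw_name)             # "Kamal Paudel"
first_name, last_name = name.split()    # "Kamal", "Paudel"
phone = normalize_phone(raw_phone)      # "9800000000"

print(first_name, last_name, phone)
```

Splitting the record first keeps the name and phone rules separate, so each helper only has to handle one kind of data.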