Python Fundamentals and String Methods for Data Preparation

Posted by Anonymous and classified in Computers

Day 1: Python Basics for AI/ML Preparation

Output Statements

  • print() – Displays output to the console.
  • end=" " – Replaces the default trailing newline with a space, so the next print() continues on the same line.
  • sep="," – Sets the string placed between multiple items in the output (the default is a single space).
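
A minimal sketch of how sep and end change what print() writes. The sample words are placeholders chosen for illustration only.

```python
# Default behaviour: items are separated by a space and followed by a newline.
print("Python", "AI", "ML")            # Python AI ML

# sep changes the string placed between the items.
print("Python", "AI", "ML", sep=",")   # Python,AI,ML

# end replaces the trailing newline, so the next print() continues the line.
print("Loading", end=" ")
print("done")                          # Loading done
```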

Input and Type Casting

  • The input() function always returns a string data type.
  • Adding strings results in *concatenation* (e.g., "5" + "10" = "510").
  • Use int() or float() for numerical input conversion:

    Example: num = int(input("Enter a number: "))
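
The difference between concatenation and arithmetic can be sketched as follows. To keep the example runnable without typing anything, the variable raw stands in for a value that would normally come from input(); that substitution is an assumption made for illustration.

```python
raw = "5"  # stands in for: raw = input("Enter a number: ")

# input() gives back text, so + joins the two strings.
joined = raw + "10"            # "510"

# Convert first to do arithmetic.
total = int(raw) + int("10")   # 15
as_float = float("2.5") + 1    # 3.5

# int() raises ValueError when the text is not a whole number.
try:
    int("ten")
except ValueError:
    print("Not a valid integer")
```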

Variables and Data Types

  • Common types include int (whole number), float (number with a decimal point), str (string), and bool (True or False).
  • Use type(variable) to check the data type of any variable.
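
A short sketch of type() applied to the four common types. The variable names and values are illustrative only.

```python
age = 25           # int
price = 19.99      # float
name = "Kamal"     # str
is_ready = True    # bool

# type() returns the class of the value stored in each variable.
for value in (age, price, name, is_ready):
    print(value, type(value))
```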

Formatted Strings (f-Strings)

  • f-strings embed variables and expressions directly inside a string literal:

    Example: print(f"Value: {num}")

  • Limiting decimal places:

    Use {num:.2f} to display the number with 2 decimal places.
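
Both uses can be tried together in a few lines. The value of num here is an arbitrary sample.

```python
num = 3.14159

plain = f"Value: {num}"          # the variable is inserted as-is
rounded = f"Value: {num:.2f}"    # .2f formats it with two decimal places

print(plain)     # Value: 3.14159
print(rounded)   # Value: 3.14
```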

Naming Convention

  • Use snake_case for variable and function names (all lowercase, words separated by underscores).

    Example: my_variable_name = 10

Day 2: Python Data Cleaning String Methods

Essential String Trimming and Replacement

These methods remove unwanted characters or whitespace from strings, a crucial step in data preparation.

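  • .lstrip(): Removes leading whitespace, or any of the specified characters, from the left side.

    Example 1: " Kamal".lstrip() becomes "Kamal"

    Example 2: "__Kamal".lstrip("_") becomes "Kamal" (the argument is a set of characters to remove, not a prefix, so a single "_" is enough)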

  • .rstrip(): Removes trailing whitespace, or any of the specified characters, from the right side.

    Example: "Kamal ".rstrip() becomes "Kamal"

  • .strip(): Removes whitespace, or any of the specified characters, from both sides.

    Example 1: " Kamal ".strip() becomes "Kamal"

    Example 2: "1kamal3".strip("13") becomes "kamal"

  • .replace(): Returns a new string with all occurrences of a specified substring replaced.

    Example 1: "Ram ** Shrestha".replace(" ** ", " ") becomes "Ram Shrestha"

    Note: The substring can be a single character and the replacement can be any length; for example, "Ram".replace("R", "Hari") becomes "Hariam".
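
The four methods above can be compared side by side. This is a small sketch; the string "abcba-data" is an invented sample added to show that the argument to the strip methods is treated as a set of characters.

```python
raw = "  Kamal  "

left = raw.lstrip()    # "Kamal  "  (only the left side is trimmed)
right = raw.rstrip()   # "  Kamal"  (only the right side is trimmed)
both = raw.strip()     # "Kamal"

# The argument is a set of characters, not an exact prefix or suffix.
digits_removed = "1kamal3".strip("13")       # "kamal"
set_demo = "abcba-data".lstrip("abc")        # "-data"

# replace() changes every occurrence and returns a new string.
fixed = "Ram ** Shrestha".replace(" ** ", " ")   # "Ram Shrestha"

# Strings are immutable, so the original is unchanged.
print(repr(raw), repr(both), fixed)
```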

Cleaning Messy Text Workflow

Combine methods to handle complex data issues:

  • Use strip() to clean leading/trailing characters.
  • Use replace() to clean internal parts of the string.
  • *Complex Example:* To clean "--My ----name is kamal 123___":

    Apply strip(' -123_') followed by replace(' ----', ' '), which yields "My name is kamal".
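
The complex example can be run step by step as a short sketch. Note that this strip() set would also remove genuine trailing digits, so it suits this sample string rather than data in general.

```python
messy = "--My ----name is kamal 123___"

# Step 1: trim spaces, dashes, underscores and the digits 1, 2, 3 from both ends.
trimmed = messy.strip(" -123_")           # "My ----name is kamal"

# Step 2: fix the junk left inside the string.
cleaned = trimmed.replace(" ----", " ")   # "My name is kamal"

# The two steps can also be chained in one expression.
chained = messy.strip(" -123_").replace(" ----", " ")

print(cleaned)
```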

Day 3: Advanced String Methods for Data Normalization

  1. .title() Method

    • Capitalizes the first letter of each word in a string.
    • Example: 'john doe'.title() results in 'John Doe'.
    • Often combined with strip() and replace() to clean text and properly capitalize names.

  2. .split() Method

    • Splits a string into a list of substrings based on a delimiter (default is whitespace).
    • Example: 'John Doe'.split() results in ['John', 'Doe'].
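    • Can unpack results directly into variables when the string has exactly as many parts as there are variables (otherwise Python raises ValueError):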

      first_name, last_name = name.split()

    • Works with custom delimiters:

      'Kamal,Raj,Paudel'.split(',') results in ['Kamal', 'Raj', 'Paudel']

  3. .capitalize() Method

    • Capitalizes only the first letter of the entire string, converting all other letters to lowercase.
    • Example: 'kamal paudel'.capitalize() results in 'Kamal paudel'.
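
The three methods can be tried together in a short sketch. The maxsplit example at the end is an addition not covered in the notes; it shows one way to handle a name with more than two parts.

```python
titled = "john doe".title()                 # "John Doe"
capitalized = "kamal paudel".capitalize()   # "Kamal paudel"

parts = "John Doe".split()                  # ['John', 'Doe']
first_name, last_name = "John Doe".split()  # works because there are exactly two parts

csv_parts = "Kamal,Raj,Paudel".split(",")   # ['Kamal', 'Raj', 'Paudel']

# maxsplit=1 splits only once, so the rest of the name stays together.
first, rest = "Kamal Raj Paudel".split(maxsplit=1)

print(titled, capitalized, parts, csv_parts, first, rest)
```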

Practical Data Cleaning Tasks and Examples

  • Name Cleaning: Use strip() + replace() + title() sequentially to handle messy characters and ensure proper capitalization of names.
  • Extraction: Extract first and last names efficiently using split() after initial cleaning.
  • Phone Number Normalization: Use replace() to remove unwanted characters or country codes (e.g., removing (+977)).
  • Complex String Processing: Combine strip(), replace(), title(), and split() to clean and parse complex strings containing mixed data like names and phone numbers.
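
The tasks above might be combined as in the sketch below. The helper names clean_name and clean_phone, the sample record, and the phone number are all hypothetical, invented for illustration; real data will likely need a different set of characters to strip.

```python
def clean_name(raw):
    """Trim stray characters, fix the internal ' ** ' junk, and title-case (hypothetical helper)."""
    return raw.strip(" _-").replace(" ** ", " ").title()


def clean_phone(raw):
    """Drop the (+977) country code, dashes, and spaces (hypothetical helper)."""
    return raw.replace("(+977)", "").replace("-", "").replace(" ", "")


# Invented sample record: a messy name and phone number separated by a comma.
record = "  __kamal ** paudel__ , (+977) 98-4123-4567 "

name_part, phone_part = record.split(",")
full_name = clean_name(name_part)            # "Kamal Paudel"
first_name, last_name = full_name.split()    # "Kamal", "Paudel"
phone = clean_phone(phone_part)              # "9841234567"

print(first_name, last_name, phone)
```

Splitting on the comma first keeps the name rules and the phone rules separate, so each helper only has to deal with one kind of data.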
