🤖Review and expand DataFrame operations section in "Data Handling with Pandas" notebook #16

@clstaudt

Issue Description

The notebook introduces basic DataFrame operations but can be expanded to showcase a wider range of common manipulations, including handling missing data and more complex filtering.

Examples

The notebook could include examples of:

  • Handling missing data with methods like dropna() and fillna()
  • More complex boolean indexing with multiple conditions
  • The use of the .query() method for filtering
  • The .apply() method for applying a function to rows or columns

Proposed Change

  • Add new content sections demonstrating the above operations.
  • Explain why each of these operations is useful in data analysis.
  • Include best practice tips, such as avoiding in-place modifications when exploring data.
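The tip about avoiding in-place modifications could be illustrated along these lines. This is only a sketch: the toy DataFrame and its values are made up for illustration, and the column names simply mirror the ones used in the example implementation in this issue.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not taken from the notebook.
df = pd.DataFrame({
    "quality": [8.0, np.nan, 5.0],
    "color": ["red", "white", None],
})

# dropna() returns a new DataFrame; the original df is left untouched.
df_cleaned = df.dropna()

# Because df still holds every row, a different strategy can be tried
# without reloading the data: fill each column with its own default.
df_filled = df.fillna({
    "quality": df["quality"].median(),
    "color": "unknown",
})

print(len(df), len(df_cleaned))
print(df_filled)
```

Assigning each result to a new name keeps every intermediate step available for comparison, which is harder once inplace=True has overwritten the original frame.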

Example Implementation

# Handling missing data
df_cleaned = df.dropna()  # Drops rows with any missing values
df_filled = df.ffill()  # Forward-fill missing values (fillna(method='ffill') is deprecated since pandas 2.1)


# Complex boolean indexing: wrap each condition in parentheses and combine with & (and) or | (or)
high_quality_red = df[(df['quality'] > 7) & (df['color'] == 'red')]

# Using .query() for filtering
high_quality_red_query = df.query("quality > 7 and color == 'red'")

# Applying a function with .apply()
df['quality_label'] = df['quality'].apply(lambda x: 'high' if x > 7 else 'low')
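
To make the snippets above runnable in the notebook, a self-contained variant might look like the following. The sample data (including the alcohol column) and the helper name describe_row are assumptions made up for this sketch. It also shows row-wise .apply() with axis=1 and a vectorized alternative using np.where.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only 'quality' and 'color' come from the issue.
df = pd.DataFrame({
    "quality": [8, 6, 9, 8],
    "color": ["red", "red", "white", "red"],
    "alcohol": [12.5, 9.8, 11.0, 13.1],
})

# Boolean indexing and .query() are two spellings of the same filter.
mask_result = df[(df["quality"] > 7) & (df["color"] == "red")]
query_result = df.query("quality > 7 and color == 'red'")
same_rows = mask_result.equals(query_result)


def describe_row(row):
    """Combine two columns of one row into a single label (illustrative helper)."""
    level = "high" if row["quality"] > 7 else "low"
    return f"{row['color']}-{level}"


# axis=1 passes each row to the function as a Series.
df["label"] = df.apply(describe_row, axis=1)

# For a simple threshold, a vectorized expression avoids the per-element
# Python call that .apply() with a lambda makes.
df["quality_label"] = np.where(df["quality"] > 7, "high", "low")

print(same_rows)
print(df)
```

For simple element-wise rules, the vectorized form is usually faster than .apply(); .apply() remains useful when the logic needs several columns or does not map onto built-in operations.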

Labels: enhancement
