Beach Bliss Adventures & Insurance is dedicated to enhancing vacation planning services and providing comprehensive insurance packages, including unique coverage such as shark attack insurance.
To accurately price these insurance packages, we identified five key hypotheses to test and analyze:
- Sharks attack younger individuals more frequently than older individuals.
- The frequency of shark attacks differs between males and females.
- Shark attacks occur more frequently in the USA compared to other countries.
- There are more fatalities than injuries globally.
- Surfing is the most dangerous activity.
The dataset provided by the Global Shark Attack File (GSAF) is available here. This dataset logs shark attack incidents and includes various details such as:
- Date of the occurrence
- Year of the occurrence
- Type of incident
- Country where the attack occurred
- State where the attack occurred
- Location of the attack
- Activity the victim was engaged in
- Name of the victim
- Sex of the victim
- Age of the victim, if available
- Injury details, including whether it was fatal or non-fatal
- Species of the shark
- Source of the information
- Unknown columns
We used the Pandas library in Python to read this Excel data into a dataframe with the `read_excel` function and to manipulate it. Additionally, we used the Seaborn library to create charts and plots for our analysis.
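As a minimal sketch of the import step, the snippet below wraps `pd.read_excel` and normalizes column names. The helper names `load_attacks` and `tidy_columns` are hypothetical, the file path is left to the caller, and the tiny in-memory dataframe only stands in for the real spreadsheet.

```python
import pandas as pd


def load_attacks(path):
    """Read the GSAF Excel workbook into a dataframe (first sheet by default)."""
    return pd.read_excel(path)


def tidy_columns(df):
    """Strip stray whitespace from column names so later lookups are predictable."""
    return df.rename(columns=lambda name: str(name).strip())


# Illustrative stand-in for the spreadsheet; these rows are made up, not GSAF data.
sample = pd.DataFrame({"Sex ": ["M", "F"], " Age": ["17", "13"]})
tidy = tidy_columns(sample)
print(list(tidy.columns))
```

With the real file, the same two steps would chain as `tidy_columns(load_attacks(path))`.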
To address our hypotheses, we employed various data analysis techniques, including statistical testing, activity-specific trend analysis, and demographic segmentation. These methods help uncover patterns and correlations within the data that directly inform our insurance pricing strategy.
- Data Import
- Data Cleaning
- Data Standardization
- Data Filtering
During the data cleaning process, we discovered and removed several irrelevant columns that were not pertinent to our business plan.
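The cleaning, standardization, and filtering steps could look roughly like the sketch below. It assumes the columns are labeled `Year` and `Sex`, that the irrelevant columns are the ones pandas auto-names `Unnamed: N`, and that "outdated" means before a cutoff year. The `min_year=1900` default is a placeholder, not the cutoff used in the project.

```python
import pandas as pd


def clean_attacks(df, min_year=1900):
    """Drop unnamed columns, standardize text fields, and filter old or undated rows."""
    # Cleaning: drop header-less columns that pandas names "Unnamed: N".
    keep = [col for col in df.columns if not str(col).startswith("Unnamed")]
    out = df[keep].copy()
    # Standardization: trim whitespace and upper-case the sex codes.
    out["Sex"] = out["Sex"].astype("string").str.strip().str.upper()
    # Non-numeric years become NaN so they can be filtered out below.
    out["Year"] = pd.to_numeric(out["Year"], errors="coerce")
    # Filtering: keep rows at or after the cutoff (NaN years fail the comparison).
    out = out[out["Year"] >= min_year]
    return out.reset_index(drop=True)


# Made-up rows for illustration only.
raw = pd.DataFrame({
    "Year": [2015, 1850, None, "2003"],
    "Sex": [" m", "F", "M", "f "],
    "Unnamed: 22": [None, None, None, None],
})
cleaned = clean_attacks(raw)
print(cleaned)
```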
Objective: To identify the most common age among those who have been attacked by sharks.
Hypothesis: Sharks attack younger people more frequently than older individuals.
Approach:
- Clean the data (remove null and outdated data)
- Identify the most common age attacked by sharks
- Create a graph in Seaborn to visualize the data
- Narrow down to the top 15 age groups
Results:
- Shark attacks are most common in the 17-year-old cohort.
- 36% of attacks involve victims between the ages of 13 and 29.
- Based on this, the premium for this age bracket will be priced higher.
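The age approach above can be sketched as follows, assuming an `Age` column that mixes numbers with free text. The function names are hypothetical and the sample rows are invented. The Seaborn import sits inside the plotting helper so the counting logic runs even where plotting libraries are not installed.

```python
import pandas as pd


def age_counts(df, top_n=15):
    """Count attacks per age and keep the top_n most common ages."""
    ages = pd.to_numeric(df["Age"], errors="coerce").dropna().astype(int)
    return ages.value_counts().head(top_n)


def share_in_range(df, low, high):
    """Fraction of rows with a usable age that fall in [low, high]."""
    ages = pd.to_numeric(df["Age"], errors="coerce").dropna()
    return ages.between(low, high).mean()


def plot_top_ages(counts):
    """Bar chart of the age counts (requires seaborn and matplotlib)."""
    import seaborn as sns  # imported lazily; only needed for plotting
    ax = sns.barplot(x=counts.index, y=counts.values)
    ax.set(xlabel="Age", ylabel="Attacks")
    return ax


# Made-up rows for illustration only.
sample = pd.DataFrame({"Age": ["17", "17", "13", "teen", None, "45", "17", "13"]})
top = age_counts(sample)
share = share_in_range(sample, 13, 29)
print(top)
print(round(share, 2))
```

Entries such as "teen" are coerced to missing values and dropped rather than guessed at, which keeps the counts conservative.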
Objective: To identify the most common gender among those who have been attacked by sharks.
Hypothesis: There is a difference in shark attack rates between males and females.
Approach:
- Clean the data (remove null and outdated data)
- Create a table showing the relationship between age and gender
- Use DataFrame `groupby` on gender and age
- Create a graph in Seaborn to visualize the data
Results:
- Males account for 2999 attacks (85%) and females for 503 attacks (15%).
- The most attacked males are 17 years old (143 incidents).
- The most attacked females are 13 years old (28 incidents).
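A rough sketch of the gender and age grouping is shown below, assuming cleaned `Sex` and `Age` columns. The helper names are hypothetical and the sample rows are invented.

```python
import pandas as pd


def attacks_by_sex(df):
    """Number of attacks per sex, largest first."""
    return df.groupby("Sex").size().sort_values(ascending=False)


def peak_age_by_sex(df):
    """Most frequently attacked age for each sex."""
    counts = df.groupby(["Sex", "Age"]).size().reset_index(name="attacks")
    # After sorting by count, the first row seen for each sex is its peak age.
    peak = counts.sort_values("attacks", ascending=False).drop_duplicates("Sex")
    return peak.set_index("Sex")["Age"].to_dict()


# Made-up rows for illustration only.
sample = pd.DataFrame({
    "Sex": ["M", "M", "M", "F", "F", "F", "M"],
    "Age": [17, 17, 20, 13, 13, 30, 17],
})
by_sex = attacks_by_sex(sample)
peaks = peak_age_by_sex(sample)
print(by_sex / by_sex.sum())  # share of attacks per sex
print(peaks)
```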
Objective: To identify the most common country where shark attacks occur.
Hypothesis: There are more shark attacks in the USA than anywhere else in the world.
Approach:
- Clean the data (remove null and outdated data)
- Create a table showing the relationship between country and state by attacks
- Use DataFrame `groupby` on country with `size`
- Create a graph in Seaborn to visualize the data
Results:
- The USA has 1615 attacks (45%), the most of any country.
- The next four countries (Australia, South Africa, Bahamas, and Brazil) combined have fewer attacks than the USA.
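The country comparison can be illustrated with a short sketch, assuming a `Country` column with consistent spelling. `country_shares` is a hypothetical helper and the rows are invented. It compares the leading country with the next four combined, mirroring the result above.

```python
import pandas as pd


def country_shares(df):
    """Attack counts and share of total per country, largest first."""
    counts = df.groupby("Country").size().sort_values(ascending=False)
    table = counts.to_frame("attacks")
    table["share"] = table["attacks"] / table["attacks"].sum()
    return table


# Made-up rows for illustration only.
sample = pd.DataFrame({
    "Country": ["USA"] * 5 + ["AUSTRALIA"] * 2 + ["SOUTH AFRICA", "BAHAMAS"],
})
table = country_shares(sample)
leader = table["attacks"].iloc[0]
next_four = table["attacks"].iloc[1:5].sum()
print(table)
print(leader > next_four)
```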
Objective: To identify the most common injuries among those who have been attacked by sharks.
Hypothesis: There are more fatalities than injuries globally.
Approach:
- Clean the data (remove null and outdated data)
- Create a table showing the relationship between injuries in ascending order
- Use DataFrame `groupby` on injury with `size`
- Create a graph in Seaborn to visualize the data
Results:
- There are 321 fatalities, constituting 9% of all injuries, so the hypothesis is not supported: non-fatal injuries far outnumber fatalities.
- Leg injuries are the most common among all injuries.
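One way to separate fatal from non-fatal incidents is to scan the free-text injury description, as in the sketch below. This is an assumption about how the split could be made, not necessarily how the project did it. It assumes an `Injury` column, and the keyword rules are deliberately simple, so unusual wording can be misclassified.

```python
import pandas as pd


def fatal_flags(df):
    """True where the injury text mentions a fatality and not 'non-fatal'."""
    text = df["Injury"].fillna("").str.lower()
    mentions_fatal = text.str.contains("fatal", regex=False)
    mentions_non_fatal = text.str.contains(r"non[- ]?fatal")
    return mentions_fatal & ~mentions_non_fatal


def mentions_body_part(df, part):
    """True where the injury text contains a word starting with `part`."""
    text = df["Injury"].fillna("").str.lower()
    return text.str.contains(r"\b" + part)


# Made-up rows for illustration only.
sample = pd.DataFrame({"Injury": [
    "FATAL",
    "Lacerations to left leg",
    "Non-fatal bite to leg",
    None,
    "Fatal, leg severed",
]})
flags = fatal_flags(sample)
leg_count = int(mentions_body_part(sample, "leg").sum())
print(flags.mean())  # share of incidents flagged as fatal
print(leg_count)
```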
Objective: To identify the most dangerous activities among those who have been attacked by sharks.
Hypothesis: The most dangerous activities involve water sports.
Approach:
- Clean the data (remove null and outdated data)
- Create a table showing the relationship between activity and injury
- Use DataFrame `groupby` on activity with `size`
- Create a graph in Seaborn to visualize the data
Results:
- Surfing injuries total 848, making it the most dangerous activity.
- Swimming injuries total 622.
- The top 5 most dangerous activities make up 56% of all injuries.
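The activity ranking and the combined share of the top activities can be sketched as below, assuming an `Activity` column whose labels differ only in case and surrounding whitespace. The helper name is hypothetical and the rows are invented. These are raw incident counts and are not adjusted for how many people take part in each activity.

```python
import pandas as pd


def top_activity_share(df, top_n=5):
    """Return the top_n activities by count and their combined share of all rows."""
    activity = df["Activity"].astype("string").str.strip().str.title()
    counts = activity.value_counts()  # missing activities are excluded
    top = counts.head(top_n)
    return top, top.sum() / counts.sum()


# Made-up rows for illustration only.
sample = pd.DataFrame({"Activity": [
    "Surfing", "surfing ", "Swimming", "Fishing",
    "Surfing", "Swimming", None, "Wading",
]})
top, share = top_activity_share(sample, top_n=2)
print(top)
print(round(share, 2))
```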
Our analysis provided several clear insights:
- Age Factor: Younger individuals are more frequently targeted by sharks, suggesting age-based insurance rate adjustments.
- Gender Differences: Notable differences exist in shark attack rates between males and females, necessitating gender-specific risk assessments.
- Injury Types: Fatalities constitute 9% of all injuries, with leg injuries being the most common.
- Activity Risks: Surfing, swimming, and diving are the most dangerous activities, highlighting the need for activity-specific insurance packages.
- Geographic Variation: Shark attacks are significantly more frequent in the United States, justifying location-based insurance pricing.
By leveraging these insights, Beach Bliss Adventures & Insurance demonstrates a commitment to continuous improvement and innovation in its service offerings. The actionable recommendations derived from our data analysis enhance the precision of our insurance pricing and improve overall customer satisfaction and safety.
[Notebook on GitHub](https://docs.google.com/presentation/d/1hsMnwhCPld-1M7v_hPyn1199BGjw2ZxJBo6nrpb4zZE/edit?usp=sharing)