The primary objective of this project was to address high employee turnover within a company. It aimed to understand the factors that lead employees to stay with or leave the organization, and to develop strategies for reducing turnover. The broader aim was a data-driven approach to human resource management that contributes to a healthier and more stable work environment.
The project followed the CRISP-DM (Cross-Industry Standard Process for Data Mining) methodology. The initial focus was on business understanding: defining the objectives and the success criteria. The dataset was then explored and cleaned, using descriptive statistics and graphical summaries. In the modeling phase, Decision Trees, Random Forests, and Logistic Regression were used to identify potential predictors of attrition.
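The modeling step can be illustrated with a minimal scikit-learn sketch comparing the three model families. The project's dataset and settings are not given here, so this uses synthetic data from `make_classification` with an assumed 80/20 stay/leave split. The hyperparameters, stratified 5-fold cross-validation, and balanced-accuracy metric are illustrative assumptions, not the project's configuration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the HR table (assumed ~80% stayed, ~20% left).
X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=4,
    weights=[0.8, 0.2], random_state=0,
)

# Illustrative hyperparameters, not the project's settings.
models = {
    "decision_tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    # Logistic regression benefits from standardized inputs.
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
}

# Stratified folds keep the stay/leave ratio the same in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Balanced accuracy averages recall over both classes, so always
# predicting "stayed" scores 0.5 rather than the majority share.
scores = {
    name: cross_val_score(model, X, y, cv=cv, scoring="balanced_accuracy").mean()
    for name, model in models.items()
}
for name, score in scores.items():
    print(f"{name}: balanced accuracy = {score:.3f}")
```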
To refine the analysis, multicollinearity was assessed with the Variance Inflation Factor (VIF). Categorical features were encoded as dummy variables, and redundant dummy columns were later removed to keep the number of features manageable. A key aspect of the solution was tying the data mining goals to the business objectives, so that the technical findings addressed organizational needs.
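For predictor j, VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j on all other predictors. The minimal NumPy sketch below uses hypothetical columns (`years_at_company`, `years_in_role`, `age`, and a 3-level department code), not the project's data. It also shows why one dummy level per categorical feature is usually dropped: a full set of dummies sums to 1 in every row, which is perfectly collinear with the intercept and yields an infinite VIF. The statsmodels package offers `variance_inflation_factor` in `statsmodels.stats.outliers_influence` for the same purpose.

```python
import numpy as np


def vif(X: np.ndarray) -> np.ndarray:
    """Variance Inflation Factor for each column of X (rows = employees).

    VIF_j = 1 / (1 - R_j^2) = SS_tot / SS_res, where R_j^2 comes from an
    ordinary least squares regression of column j on all other columns
    plus an intercept.
    """
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        ss_res = np.sum((y - A @ coef) ** 2)
        ss_tot = np.sum((y - y.mean()) ** 2)
        # R_j^2 == 1 (perfect collinearity) means an infinite VIF.
        out[j] = np.inf if ss_res <= 1e-10 * ss_tot else ss_tot / ss_res
    return out


# Hypothetical data: two strongly related numeric columns, one independent
# column, and a 3-level categorical feature (e.g. department).
rng = np.random.default_rng(0)
n = 500
years_at_company = rng.normal(10, 3, n)
years_in_role = years_at_company + rng.normal(0, 0.5, n)
age = rng.normal(40, 8, n)
dept = rng.integers(0, 3, n)
one_hot = np.eye(3)[dept]  # full set of dummies: each row sums to 1

X_full = np.column_stack([years_at_company, years_in_role, age, one_hot])
X_drop = np.column_stack([years_at_company, years_in_role, age, one_hot[:, 1:]])

print(np.round(vif(X_full), 2))  # the three dummy columns come out as inf
print(np.round(vif(X_drop), 2))  # dropping one level makes every VIF finite
```

Values above 5 or 10 are common rules of thumb for problematic collinearity. Here the two related tenure columns exceed 10 while `age` stays close to 1.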
The project identified significant correlations between variables, highlighting factors such as years of service and promotions as key influences on attrition. Dummy variable reduction and careful consideration of statistical measures produced a refined final dataset. The predictive models achieved 80% accuracy, exceeding the defined success criteria.
However, the analysis can only go so far, because the data may not capture every issue in the workplace. For example, over half of all laboratory technicians worked overtime, yet only 26% had left, so overtime alone does not explain who leaves. Employees' motives and concerns are not recorded in this dataset. We therefore strongly recommend exit interviews, surveys, and focus groups to examine employees' motivations and ambitions.
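Because most employees stay, an accuracy figure is easiest to judge against the majority-class baseline, which is the accuracy of a model that predicts "stayed" for everyone. The minimal pure-Python sketch below uses hypothetical labels (1 = left, 0 = stayed), not the project's data. The 20% leaver share and both example "models" are assumptions chosen to produce round numbers.

```python
def evaluate(y_true, y_pred, positive=1):
    """Accuracy, majority-class baseline, precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    n = len(pairs)
    accuracy = sum(t == p for t, p in pairs) / n
    n_pos = sum(t == positive for t in y_true)
    baseline = max(n_pos, n - n_pos) / n  # accuracy of always guessing the majority class
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"accuracy": accuracy, "baseline": baseline, "precision": precision, "recall": recall}


# Hypothetical hold-out set: 20 leavers followed by 80 stayers.
y_true = [1] * 20 + [0] * 80

# A "model" that predicts that nobody leaves.
always_stay = [0] * 100
print(evaluate(y_true, always_stay))
# {'accuracy': 0.8, 'baseline': 0.8, 'precision': 0.0, 'recall': 0.0}

# A model that flags 12 of the 20 leavers but also 8 stayers.
flagging = [1] * 12 + [0] * 8 + [1] * 8 + [0] * 72
print(evaluate(y_true, flagging))
# {'accuracy': 0.84, 'baseline': 0.8, 'precision': 0.6, 'recall': 0.6}
```

The first model matches the 80% baseline yet finds no leavers. The second beats the baseline by only four points but identifies 60% of them. Reporting recall, or a full confusion matrix, alongside accuracy makes this difference visible.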
The main challenges were:
- High Dimensionality: The dataset contained many variables and features, so they had to be handled carefully to avoid information overload and keep the analysis meaningful.
- Multicollinearity: Addressing high correlations among variables to prevent distortions in regression coefficients and ensure accurate model interpretation.
- Data Quality Assurance: Ensuring the data was reliable and complete, including handling missing values and outliers that could undermine the validity of the analysis.
- Ethical Considerations: Navigating ethical challenges related to data privacy, especially when dealing with sensitive information about employees.
- Decision-Making in Feature Selection: Choosing relevant features for analysis and model development, considering business objectives and potential impacts on outcomes.
- Imbalanced Dataset: More employees stayed in their jobs than left, so the classes were imbalanced and predicting which employees would churn remained difficult even after oversampling the minority class with SMOTE (Synthetic Minority Over-sampling Technique).

Skills applied during the project:
- Data Preprocessing: Cleaning and preparing data for analysis, including handling missing values and outliers and transforming variables.
- Statistical Analysis: Applying a variety of statistical methods, including correlation analysis, VIF, and regression, to derive meaningful insights from data.
- Machine Learning Techniques: Implementing machine learning algorithms, such as Decision Trees, Random Forests, and Logistic Regression, for predictive modeling.
- Data Visualization: Creating effective visualizations using tools like Tableau to present complex findings in a clear and accessible manner.
- Research Design: Developing a robust research design, including defining research questions, objectives, and success criteria.
- Project Management: Strategically managing project timelines, milestones, and resources to ensure timely completion.
- Ethical Research Practices: Navigating ethical considerations in data analysis, ensuring compliance with research ethics and privacy guidelines.
- Problem-Solving: Developing a systematic approach to problem-solving, especially in addressing challenges related to data quality and analysis.
- Business Understanding: Integrating business objectives with data mining goals, aligning technical insights with organizational needs for actionable outcomes.
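The imbalanced-dataset challenge above mentions SMOTE. Its core idea is to create synthetic minority-class rows by interpolating between a minority sample and one of its k nearest minority-class neighbors. The NumPy sketch below illustrates only that idea on hypothetical data. The function name `smote` and its parameters are illustrative. In practice, the `SMOTE` class from the imbalanced-learn package (`imblearn.over_sampling.SMOTE`, used via `fit_resample`) is the usual choice. Oversampling should be applied to the training split only, so that synthetic rows never leak into the evaluation data.

```python
import numpy as np


def smote(X_min: np.ndarray, n_new: int, k: int = 5, seed: int = 0) -> np.ndarray:
    """Generate n_new synthetic rows from minority-class rows X_min (SMOTE-style sketch)."""
    rng = np.random.default_rng(seed)
    n = len(X_min)
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)
    # Pairwise Euclidean distances between minority samples.
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbor
    neighbors = np.argsort(dist, axis=1)[:, :k]
    # Pick a random base sample and one of its k nearest neighbors.
    base = rng.integers(0, n, size=n_new)
    nbr = neighbors[base, rng.integers(0, k, size=n_new)]
    # Place each new point at a random position on the segment base -> neighbor.
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[nbr] - X_min[base])


# Hypothetical training split: 90 stayers, 10 leavers, 3 numeric features.
rng = np.random.default_rng(1)
X_stay = rng.normal(0.0, 1.0, size=(90, 3))
X_left = rng.normal(2.0, 1.0, size=(10, 3))

X_synth = smote(X_left, n_new=len(X_stay) - len(X_left), k=5)
X_train = np.vstack([X_stay, X_left, X_synth])
y_train = np.array([0] * 90 + [1] * (10 + len(X_synth)))
print(X_train.shape, np.bincount(y_train))  # (180, 3) [90 90]
```

This sketch builds the full distance matrix, which is fine for small minority classes. Library implementations use nearest-neighbor search instead. Balancing the classes does not add new information about leavers, which is consistent with the observation that churn remained hard to predict.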