
A data science competition organised by the Consulting & Analytics Club, IIT Guwahati, with more than 2,000 participants; our team finished at rank 32.


mandeepnikhil/Cascade-Cup-20


Cascade-Cup-20

A data science competition organised by the Consulting & Analytics Club, IIT Guwahati, consisting of two rounds of data science hackathons and drawing more than 2,000 participants. Our team finished at rank 32.

Problem Overview

Round 1:

Understanding customers' intentions is very important: it can help improve their journey, e.g., by offering shortcuts or recommendations that improve the overall experience. With the progress of machine learning research, personalization of services has become very common. Typically, a user's intention on a website can be understood by looking at their past interactions. Concretely, each user leaves behind a sequence of events describing the history of their page views and interactions. An event can be, for example, a search query, a visit to an article page, or a received e-mail. This data forms the basis for the techniques that follow, so the first step is to collect or extract it. Trell has already done this step.
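The competition files themselves are tabular, so this step is not required for Round 1. As a hedged illustration of how a raw event sequence of the kind described above could be turned into per-user features, the sketch below counts events of each type per user. The event names and the `count_events_per_user` helper are hypothetical and are not part of the Trell data.

```python
from collections import Counter, defaultdict


def count_events_per_user(events):
    """Turn (user_id, event_type) pairs into per-user event-type counts.

    Hypothetical helper: the real Trell data is provided as a table and may
    not have been built this way.
    """
    counts = defaultdict(Counter)
    for user_id, event_type in events:
        counts[user_id][event_type] += 1
    # Convert to plain dicts so the result is easy to print or serialise.
    return {user: dict(c) for user, c in counts.items()}


# Made-up event log, in the order the events occurred.
events = [
    ("u1", "search"),
    ("u1", "article_view"),
    ("u2", "email_received"),
    ("u1", "search"),
]
print(count_events_per_user(events))
# {'u1': {'search': 2, 'article_view': 1}, 'u2': {'email_received': 1}}
```

Each user then becomes one row of count features, which is the shape a tabular classifier expects.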

In this world of big data, Trell wants you to use its data to predict the age group of its users based on their social media activity. This will help Trell segment its huge user base and cater to each segment differently. Given this large dataset, predict the age group of each user. The evaluation metric for the competition is the weighted F1 score.
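The weighted F1 score computes an F1 score (the harmonic mean of precision and recall) for each class, then averages those scores with each class weighted by its support, i.e. its number of true samples. Frequent age groups therefore count for more. A minimal stdlib sketch for illustration: `weighted_f1` is a hypothetical helper, and in practice `sklearn.metrics.f1_score(y_true, y_pred, average="weighted")` computes the same quantity.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-class F1 scores."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    support = Counter(y_true)  # number of true samples per class
    total = len(y_true)
    score = 0.0
    for cls, n_true in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        n_pred = sum(1 for p in y_pred if p == cls)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        # With no true positives, F1 is 0 (and precision may be undefined).
        f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
        score += (n_true / total) * f1
    return score


# Toy labels, not competition data.
y_true = [1, 1, 2, 3, 3, 3]
y_pred = [1, 2, 2, 3, 3, 1]
print(round(weighted_f1(y_true, y_pred), 4))  # 0.6778
```

Here the per-class F1 scores are 0.5, 2/3 and 0.8 with weights 2/6, 1/6 and 3/6, giving 61/90.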

The machine learning model you develop will help Trell provide a better experience to its users by serving age-specific content that they are likely to find more relatable.

About the data

The dataset has 25 independent features and 1 dependent (target) variable. You can download the datasets from the following links:

Training Data: https://dphi.s3.ap-south-1.amazonaws.com/dataset/train_age_dataset.csv

Test Data: https://dphi.s3.ap-south-1.amazonaws.com/dataset/test_age_dataset.csv
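Before modelling, each row has to be split into its 25 features and the target. The name of the target column is not stated here, so the sketch below takes it as a parameter; `split_features_target` and the column names in the example (`feature_a`, `feature_b`, `age_group`) are hypothetical. It uses only the standard library and returns values as strings.

```python
import csv
import io


def split_features_target(csv_text, target):
    """Parse CSV text and split each row into a feature dict and a target value.

    `target` must be the name of the dependent column in the header.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames is None or target not in reader.fieldnames:
        raise KeyError(f"target column {target!r} not found")
    X, y = [], []
    for row in reader:
        y.append(row.pop(target))  # remove the target, keep the rest as features
        X.append(row)
    return X, y


# Tiny made-up sample with hypothetical column names.
sample = "feature_a,feature_b,age_group\n1,0.5,2\n3,0.1,1\n"
X, y = split_features_target(sample, "age_group")
print(X)  # [{'feature_a': '1', 'feature_b': '0.5'}, {'feature_a': '3', 'feature_b': '0.1'}]
print(y)  # ['2', '1']
```

For the real files, the text could be fetched with `urllib.request.urlopen(url).read().decode()`, or `pandas.read_csv(url)` can load the CSV straight from the URL and handle type conversion.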


Round 2:

Absenteeism is a habitual pattern of absence from a duty or obligation without good reason; such absences are generally unplanned. A high degree of absenteeism signals a problem in the workplace: it has been viewed both as an indicator of poor individual performance and as a breach of the implicit contract between employee and employer. The dataset is a 740 x 22 table; what each column represents is given in the data dictionary. Your task is to prepare an extensive report by performing analysis on the dataset, providing as many meaningful inferences and correlations, supported by graphs, as you can.
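The task asks for correlations, and the team's report (linked at the end of this README) does not spell out its method here. As one hedged illustration, the sketch below computes Pearson's correlation coefficient between each column and the target and ranks the columns by its absolute value. The helpers `pearson` and `rank_by_correlation` are hypothetical, the column names are borrowed from the attribute list below, and the numbers are made up. With pandas, `DataFrame.corr()` (Pearson by default) returns the full correlation matrix in one call.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return float("nan")  # a constant column has no defined correlation
    return cov / (sx * sy)


def rank_by_correlation(columns, target):
    """Sort columns by |Pearson r| with columns[target], skipping constant ones."""
    ys = columns[target]
    scores = {}
    for name, xs in columns.items():
        if name == target:
            continue
        r = pearson(xs, ys)
        if not math.isnan(r):
            scores[name] = r
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Toy values, not taken from the dataset.
columns = {
    "Age": [25, 30, 35, 40],
    "Pet": [1, 0, 1, 0],
    "Absenteeism time in hours": [2, 4, 6, 8],
}
ranking = rank_by_correlation(columns, "Absenteeism time in hours")
print([(name, round(r, 3)) for name, r in ranking])  # [('Age', 1.0), ('Pet', -0.447)]
```

Pearson's r only measures linear association, so coded categorical columns such as Reason for absence or Day of the week are better summarised with group-wise averages or counts than with a single coefficient.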

Data Set Name: Analysis of absenteeism in a company

Abstract: The database was created with records of absenteeism at work from July 2007 to July 2010 at a courier company in Brazil.

Attribute Information:

  1. Individual identification (ID)
  2. Reason for absence (ICD). Absences attested by the International Code of Diseases (ICD) are stratified into 21 categories:
     - 1 Certain infectious and parasitic diseases
     - 2 Neoplasms
     - 3 Diseases of the blood and blood-forming organs and certain disorders involving the immune mechanism
     - 4 Endocrine, nutritional and metabolic diseases
     - 5 Mental and behavioral disorders
     - 6 Diseases of the nervous system
     - 7 Diseases of the eye and adnexa
     - 8 Diseases of the ear and mastoid process
     - 9 Diseases of the circulatory system
     - 10 Diseases of the respiratory system
     - 11 Diseases of the digestive system
     - 12 Diseases of the skin and subcutaneous tissue
     - 13 Diseases of the musculoskeletal system and connective tissue
     - 14 Diseases of the genitourinary system
     - 15 Pregnancy, childbirth and the puerperium
     - 16 Certain conditions originating in the perinatal period
     - 17 Congenital malformations, deformations and chromosomal abnormalities
     - 18 Symptoms, signs and abnormal clinical and laboratory findings, not elsewhere classified
     - 19 Injury, poisoning and certain other consequences of external causes
     - 20 External causes of morbidity and mortality
     - 21 Factors influencing health status and contact with health services
     
     There are also 7 categories without ICD: patient follow-up (22), medical consultation (23), blood donation (24), laboratory examination (25), unjustified absence (26), physiotherapy (27), dental consultation (28).
  3. Month of absence
  4. Day of the week (Monday (2), Tuesday (3), Wednesday (4), Thursday (5), Friday (6))
  5. Seasons (summer (1), autumn (2), winter (3), spring (4))
  6. Transportation expense
  7. Distance from Residence to Work (kilometers)
  8. Service time
  9. Age
  10. Workload average/day
  11. Hit target
  12. Disciplinary failure (yes=1; no=0)
  13. Education (high school (1), graduate (2), postgraduate (3), master and doctor (4))
  14. Son (number of children)
  15. Social drinker (yes=1; no=0)
  16. Social smoker (yes=1; no=0)
  17. Pet (number of pets)
  18. Weight
  19. Height
  20. Body mass index
  21. Absenteeism time in hours (target)
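Several of the attributes above are stored as integer codes. For readable tables and plot labels in the report, a small decoding step could help. This is a minimal sketch: the label maps are copied from the attribute list, but the column keys (`"Reason for absence"`, `"Day of the week"`, `"Seasons"`, `"Education"`) and the helpers `reason_group` and `decode_row` are hypothetical, and the real CSV headers may be spelled differently.

```python
# Code-to-label maps copied from the attribute list.
DAY_OF_WEEK = {2: "Monday", 3: "Tuesday", 4: "Wednesday", 5: "Thursday", 6: "Friday"}
SEASON = {1: "summer", 2: "autumn", 3: "winter", 4: "spring"}
EDUCATION = {1: "high school", 2: "graduate", 3: "postgraduate", 4: "master and doctor"}
NON_ICD_REASONS = {
    22: "patient follow-up",
    23: "medical consultation",
    24: "blood donation",
    25: "laboratory examination",
    26: "unjustified absence",
    27: "physiotherapy",
    28: "dental consultation",
}


def reason_group(code):
    """Label a 'Reason for absence' code as an ICD category or a non-ICD reason."""
    if 1 <= code <= 21:
        return f"ICD category {code}"
    if code in NON_ICD_REASONS:
        return NON_ICD_REASONS[code]
    return "unknown"  # codes outside 1-28 are not described in the data dictionary


def decode_row(row):
    """Return a copy of a record with coded columns replaced by readable labels.

    The column keys used here are hypothetical placeholders.
    """
    decoded = dict(row)
    decoded["Reason for absence"] = reason_group(row["Reason for absence"])
    decoded["Day of the week"] = DAY_OF_WEEK.get(row["Day of the week"], "unknown")
    decoded["Seasons"] = SEASON.get(row["Seasons"], "unknown")
    decoded["Education"] = EDUCATION.get(row["Education"], "unknown")
    return decoded


record = {"Reason for absence": 23, "Day of the week": 2, "Seasons": 1, "Education": 1}
print(decode_row(record))
# {'Reason for absence': 'medical consultation', 'Day of the week': 'Monday', 'Seasons': 'summer', 'Education': 'high school'}
```

Keeping the maps as plain dictionaries makes them easy to reuse, e.g. with pandas `Series.map` when relabelling a whole column.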

Data link: https://drive.google.com/drive/folders/1Ts8S-OTE6FU8ikhCudO3k21EEjaOkORM?usp=sharing

Our Report: https://drive.google.com/file/d/1oFWtR4PBBRlhEgdFAc0-mTkVeWZ-yk4P/view?usp=sharing
