This project aims to predict overall writing quality by analyzing keystroke logs that capture detailed writing process features. The goal is to explore how typing behavior affects essay outcomes and to provide insights for writing instruction, automated evaluation systems, and intelligent tutoring systems.
The project was developed as part of a Kaggle competition, where we achieved a top 100 position with a public score of 0.5718 and a private score of 0.5761.
- Extracted temporal features from keystroke logs to effectively represent typing behavior.
- Scaled and normalized features for compatibility with machine learning and deep learning models.
- Built a hybrid pipeline combining boosting techniques and custom neural networks:
  - Boosting Models:
    - GradientBoostingRegressor
    - AdaBoostRegressor with ElasticNet, SVR, RandomForest, and KNN base estimators
  - Neural Networks:
    - Custom architectures (Bottleneck DenseLayerNet, DenseNet) trained with weighted samples.
- Performed hyperparameter tuning to enhance model performance, achieving a 15% boost in ensemble accuracy and a 0.92 R² score on validation data.
- Applied dynamic sample weighting using residual errors for iterative performance improvement in AdaBoost.
- Combined predictions from boosting models and neural networks using a fusion strategy for robust final predictions.
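The feature-extraction step can be sketched as below. The column names (`id`, `down_time`, `up_time`) and the 2-second pause threshold are assumptions for illustration, not the project's exact log schema.

```python
import pandas as pd

def extract_temporal_features(logs: pd.DataFrame) -> pd.DataFrame:
    """Aggregate per-essay temporal statistics from raw keystroke logs.

    Assumes columns 'id' (essay id), 'down_time', and 'up_time' in ms
    (hypothetical names; adapt to the actual log schema).
    """
    logs = logs.sort_values(["id", "down_time"]).copy()
    # Inter-keystroke interval: gap between consecutive key presses.
    logs["iki"] = logs.groupby("id")["down_time"].diff()
    # Key hold duration.
    logs["hold"] = logs["up_time"] - logs["down_time"]
    # Long pauses (> 2 s) often mark planning or revision bursts.
    logs["long_pause"] = logs["iki"] > 2000
    feats = logs.groupby("id").agg(
        n_events=("down_time", "size"),
        iki_mean=("iki", "mean"),
        iki_std=("iki", "std"),
        hold_mean=("hold", "mean"),
        pause_count=("long_pause", "sum"),
        start=("down_time", "min"),
        end=("up_time", "max"),
    )
    feats["total_time"] = feats.pop("end") - feats.pop("start")
    return feats.reset_index()
```

The resulting per-essay feature table is then scaled (e.g. with scikit-learn's `StandardScaler`) before being fed to the models.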
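The boosting side of the pipeline might look like the sketch below; hyperparameters are illustrative placeholders (not the tuned values) and the data is synthetic. Note that AdaBoost.R2, as implemented in scikit-learn's `AdaBoostRegressor`, reweights samples by residual error at each iteration, which is the dynamic sample weighting described above.

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor, GradientBoostingRegressor
from sklearn.linear_model import ElasticNet
from sklearn.svm import SVR

# Synthetic stand-in for the scaled keystroke features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=200)

models = {
    "gbr": GradientBoostingRegressor(n_estimators=100, learning_rate=0.05),
    # AdaBoost.R2: samples with larger residuals receive more weight each round.
    "ada_enet": AdaBoostRegressor(estimator=ElasticNet(alpha=0.01),
                                  n_estimators=50, loss="square"),
    "ada_svr": AdaBoostRegressor(estimator=SVR(C=1.0), n_estimators=20),
}
for name, model in models.items():
    model.fit(X, y)
preds = {name: m.predict(X) for name, m in models.items()}
```

The same pattern extends to RandomForest and KNN base estimators by swapping the `estimator` argument.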
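The exact architecture of Bottleneck DenseLayerNet is not given here; the sketch below shows one plausible Keras reading: bottleneck Dense blocks with DenseNet-style concatenation skips, trained with per-sample weights via `model.fit(..., sample_weight=w)`. Layer sizes and block count are assumptions.

```python
import tensorflow as tf

def bottleneck_dense_net(n_features: int, n_blocks: int = 3,
                         bottleneck: int = 16, width: int = 64) -> tf.keras.Model:
    """DenseNet-style MLP for tabular regression (hypothetical architecture)."""
    inp = tf.keras.Input(shape=(n_features,))
    x = inp
    for _ in range(n_blocks):
        # Bottleneck: compress, re-expand, then concatenate with the running
        # feature map so each block sees all earlier representations.
        h = tf.keras.layers.Dense(bottleneck, activation="relu")(x)
        h = tf.keras.layers.Dense(width, activation="relu")(h)
        x = tf.keras.layers.Concatenate()([x, h])
    out = tf.keras.layers.Dense(1)(x)  # single writing-quality score
    model = tf.keras.Model(inp, out)
    model.compile(optimizer="adam", loss="mse")
    return model

# Weighted-sample training: larger weights on examples the ensemble
# currently mispredicts (weights derived from residuals), e.g.:
# model.fit(X, y, sample_weight=w, epochs=50)
```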
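The final fusion step is described only as a "fusion strategy"; a minimal reading is a weighted average of the boosting and network predictions, with weights one would tune on validation data (uniform weights below are a placeholder):

```python
import numpy as np

def fuse_predictions(preds, weights=None):
    """Weighted average of per-model prediction arrays (uniform by default)."""
    names = list(preds)
    if weights is None:
        weights = {name: 1.0 for name in names}
    total = sum(weights[name] for name in names)
    fused = sum(weights[name] * np.asarray(preds[name]) for name in names)
    return fused / total

# Example: blend a boosting model with a neural network, favoring the booster.
p = {"boost": np.array([1.0, 2.0]), "nn": np.array([3.0, 4.0])}
blended = fuse_predictions(p, weights={"boost": 3.0, "nn": 1.0})
```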
- Competition Rank: Top 100 on Kaggle leaderboard.
- Metrics:
  - Public Score: 0.5718
  - Private Score: 0.5761
  - Validation R²: 0.92
- Programming Language: Python
- Libraries:
  - TensorFlow/Keras
  - Scikit-learn
  - NumPy, Pandas
- Machine Learning Models:
  - GradientBoostingRegressor, AdaBoostRegressor
- Deep Learning Models:
  - Bottleneck DenseLayerNet, DenseNet
- Techniques:
  - Ensemble Learning
  - Model Fusion