This project analyzes how agricultural variables, such as harvested area, and climatic factors, such as rainfall, temperature, and humidity, relate to rice production in Sumatra. It uses multiple linear regression to build a predictive model of rice production, providing actionable insights for policymakers and stakeholders.
- Background
- Data Overview
- Methodology
- Results
- Conclusion
- Recommendations
- Tools Used
- Links to Google Colab and Slides
Indonesia is one of the largest rice producers globally, with rice being the staple food for its population. This project aims to:
- Analyze the impact of climate variables on rice production.
- Develop a predictive model using multiple linear regression to assist in agricultural planning and resource allocation.
The dataset contains 224 entries covering provinces in Sumatra, with the following columns:
- Provinsi (Province)
- Tahun (Year)
- Produksi (Rice Production, kg)
- Luas Panen (Harvested Area, ha)
- Curah Hujan (Rainfall, mm)
- Kelembapan (Humidity, %)
- Suhu Rata-rata (Average Temperature, °C)
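A minimal sketch of loading these columns with pandas and running the missing/duplicate checks described under Data Cleaning. The English column names and the commented-out file name `sumatra_rice.csv` are hypothetical, and the three rows in the demo are placeholders rather than values from the dataset.

```python
import pandas as pd

# Assumed mapping from the Indonesian column names listed above to English
# identifiers; the headers in the actual CSV may be cased differently.
COLUMN_NAMES = {
    "Provinsi": "province",
    "Tahun": "year",
    "Produksi": "production",
    "Luas Panen": "harvested_area",
    "Curah Hujan": "rainfall",
    "Kelembapan": "humidity",
    "Suhu Rata-rata": "avg_temperature",
}


def to_english(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with the Indonesian column names replaced by English ones."""
    return df.rename(columns=COLUMN_NAMES)


def quality_report(df: pd.DataFrame) -> dict:
    """Count rows, missing cells, and fully duplicated rows."""
    return {
        "rows": len(df),
        "missing": int(df.isna().sum().sum()),
        "duplicates": int(df.duplicated().sum()),
    }


if __name__ == "__main__":
    # Hypothetical file name; replace with the path to the project's CSV.
    # raw = pd.read_csv("sumatra_rice.csv")
    # Placeholder rows for illustration only, NOT values from the dataset.
    raw = pd.DataFrame({
        "Provinsi": ["A", "B", "B"],
        "Tahun": [2000, 2000, 2000],
        "Produksi": [1.0, 2.0, 2.0],
        "Luas Panen": [10.0, 20.0, 20.0],
        "Curah Hujan": [None, 2500.0, 2500.0],
        "Kelembapan": [80.0, 85.0, 85.0],
        "Suhu Rata-rata": [26.5, 27.0, 27.0],
    })
    df = to_english(raw)
    print(quality_report(df))  # → {'rows': 3, 'missing': 1, 'duplicates': 1}
```

On the real data, a report with `missing` and `duplicates` both equal to 0 corresponds to the Data Cleaning result stated for this project.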
- Data Cleaning:
  - Verified that the dataset had no missing or duplicate values.
- Exploratory Data Analysis (EDA):
  - Observed a strong correlation between harvested area and rice production (r = 0.91).
  - Found weak correlations between the climatic factors and production.
- Model Building:
  - Applied multiple linear regression with the following predictors:
    - X1: Rainfall
    - X2: Humidity
    - X3: Average Temperature
    - X4: Harvested Area
  - Split the dataset into 80% training and 20% testing data.
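The methodology steps above can be sketched with the libraries listed under Tools Used. This is a minimal illustration, not the project's notebook: the English column names, the `random_state` values, and the synthetic data (224 rows generated so that production depends mostly on harvested area) are all assumptions, so the printed numbers will not match the reported results.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# X1..X4 in the order listed above; the English names are assumptions.
FEATURES = ["rainfall", "humidity", "avg_temperature", "harvested_area"]
TARGET = "production"


def synthetic_data(n: int = 224, seed: int = 0) -> pd.DataFrame:
    """Placeholder data with the same columns. NOT the project's dataset."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "rainfall": rng.uniform(1_000, 4_000, n),
        "humidity": rng.uniform(70, 90, n),
        "avg_temperature": rng.uniform(25, 29, n),
        "harvested_area": rng.uniform(50_000, 900_000, n),
    })
    # Arbitrary relationship: production driven mostly by harvested area, plus noise.
    df[TARGET] = 4.5 * df["harvested_area"] + rng.normal(0, 200_000, n)
    return df


def fit_and_evaluate(df: pd.DataFrame, seed: int = 42):
    """80/20 split, ordinary least squares on X1..X4, metrics on the 20% test set."""
    X_train, X_test, y_train, y_test = train_test_split(
        df[FEATURES], df[TARGET], test_size=0.2, random_state=seed
    )
    model = LinearRegression().fit(X_train, y_train)
    y_pred = model.predict(X_test)
    return model, r2_score(y_test, y_pred), mean_squared_error(y_test, y_pred)


if __name__ == "__main__":
    data = synthetic_data()
    # EDA step: Pearson correlation of every column with production.
    print(data.corr()[TARGET].round(2))
    model, r2, mse = fit_and_evaluate(data)
    print(f"R^2 = {r2:.4f}, MSE = {mse:,.2f}")
    print(dict(zip(FEATURES, model.coef_.round(3))))
```

Fixing `random_state` makes the 80/20 split reproducible; with only 224 rows, choosing a different seed can shift the test metrics, so it is worth checking more than one split.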
- Model Evaluation:
  - R-squared: 0.8698 (the model explains 86.98% of the variance in rice production).
  - Mean Squared Error (MSE): 115,079,741,001.90, which corresponds to a root mean squared error (RMSE) of about 339,234 in the units of production.
- Key findings:
  - Harvested area has a strong positive effect on rice production.
  - Rainfall has a small positive effect.
  - Humidity and temperature have negative effects on rice production.
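The findings above compare features measured in different units (mm, %, °C, ha), so the size of a raw regression coefficient does not show which variable matters most. One common way to compare them, not necessarily what the project did, is to standardize the features before fitting, so that each coefficient is the change in predicted production per one standard deviation of its feature. A minimal sketch using scikit-learn's `StandardScaler` in a pipeline; the column names and the demo data are placeholders.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def standardized_coefficients(X: pd.DataFrame, y: pd.Series) -> pd.Series:
    """Fit OLS on z-scored features; return coefficients sorted by absolute size.

    Each value is the change in predicted production for a one-standard-deviation
    increase in that feature, holding the other features fixed.
    """
    pipe = make_pipeline(StandardScaler(), LinearRegression()).fit(X, y)
    coefs = pd.Series(pipe[-1].coef_, index=X.columns)  # pipe[-1] is the regressor
    return coefs.reindex(coefs.abs().sort_values(ascending=False).index)


if __name__ == "__main__":
    # Placeholder data, NOT the project's dataset.
    rng = np.random.default_rng(1)
    n = 224
    X = pd.DataFrame({
        "rainfall": rng.uniform(1_000, 4_000, n),
        "humidity": rng.uniform(70, 90, n),
        "avg_temperature": rng.uniform(25, 29, n),
        "harvested_area": rng.uniform(50_000, 900_000, n),
    })
    y = (4.5 * X["harvested_area"] + 20 * X["rainfall"]
         - 5_000 * X["humidity"] + rng.normal(0, 50_000, n))
    print(standardized_coefficients(X, y).round(1))
```

The signs of these coefficients describe associations in observational data; on their own they do not establish that higher humidity or temperature causes lower production.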
The study shows that harvested area is the main driver of rice production in Sumatra, while climatic variables have smaller effects. Even so, the negative effects of humidity and temperature point to the need for climate adaptation strategies and efficient resource management to keep rice production stable as environmental conditions change.
- Collect additional data (e.g., soil quality, irrigation practices) for improved accuracy.
- Experiment with advanced models (e.g., Random Forest, Gradient Boosting).
- Develop strategies to mitigate the effects of climate change on agriculture.
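For the second recommendation, a minimal sketch of comparing the current linear model with Random Forest and Gradient Boosting using 5-fold cross-validation, which tests on every row once instead of relying on a single 80/20 split. The models use scikit-learn's default hyperparameters with fixed seeds, and the demo data are placeholders.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score


def compare_models(X: pd.DataFrame, y: pd.Series, seed: int = 42) -> pd.Series:
    """Mean 5-fold cross-validated R^2 for each candidate model, best first."""
    models = {
        "linear_regression": LinearRegression(),
        "random_forest": RandomForestRegressor(random_state=seed),
        "gradient_boosting": GradientBoostingRegressor(random_state=seed),
    }
    # An integer random_state makes the shuffled folds identical for every model.
    cv = KFold(n_splits=5, shuffle=True, random_state=seed)
    scores = {
        name: cross_val_score(model, X, y, cv=cv, scoring="r2").mean()
        for name, model in models.items()
    }
    return pd.Series(scores).sort_values(ascending=False)


if __name__ == "__main__":
    # Placeholder data with a mild nonlinearity, NOT the project's dataset.
    rng = np.random.default_rng(2)
    n = 224
    X = pd.DataFrame({
        "rainfall": rng.uniform(1_000, 4_000, n),
        "humidity": rng.uniform(70, 90, n),
        "avg_temperature": rng.uniform(25, 29, n),
        "harvested_area": rng.uniform(50_000, 900_000, n),
    })
    y = (4.5 * X["harvested_area"] - 40_000 * (X["avg_temperature"] - 27) ** 2
         + rng.normal(0, 100_000, n))
    print(compare_models(X, y).round(3))
```

Because each row is a province-year record, shuffled folds can put different years of the same province in both the training and test sets; `GroupKFold` with provinces as groups, or `TimeSeriesSplit` over years, would give a stricter estimate.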
The following tools and libraries were used in this project:
- Python:
  - `pandas`: For data manipulation and analysis.
  - `numpy`: For numerical computations.
  - `matplotlib`: For creating static, animated, and interactive visualizations.
  - `seaborn`: For statistical data visualization.
  - `scikit-learn`:
    - `LinearRegression`: To build the multiple linear regression model.
    - `train_test_split`: To split the dataset into training and testing subsets.
    - `mean_squared_error` and `r2_score`: For evaluating model performance.
- Google Colab: To run and share the notebook interactively.