This repository contains various data analysis projects implemented using the R programming language. Each project aims to explore a specific problem or relationship using different datasets and statistical/visualization techniques. The goal is to showcase my abilities in data processing, analysis, and visualization in R.
Below is a brief description of each project found in this repository:
1. Analyzing the Relationship Between CPI and HDI (`Analyzing-the-Relationship-Between--CPI--and---HDI-.pdf`)
- Description: This project visualizes the relationship between the Corruption Perceptions Index (CPI) and the Human Development Index (HDI). It investigates the potential connections between countries' perceptions of corruption and their human development levels.
- Techniques Used: Data loading, scatter plot with `ggplot2` and a smooth line, data point labeling with `ggrepel`.
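A minimal sketch of this kind of plot is shown here. The data frame `toy` and its values are simulated purely for illustration (the project loads its own CPI/HDI data), and the linear smoother is an assumption, since the smoothing method used in the project is not stated.

```r
library(ggplot2)
library(ggrepel)

# Hypothetical toy data: NOT real CPI/HDI figures.
set.seed(1)
toy <- data.frame(
  country = paste("Country", LETTERS[1:12]),
  cpi = round(runif(12, min = 20, max = 90))
)
toy$hdi <- round(0.4 + 0.005 * toy$cpi + rnorm(12, sd = 0.03), 3)

# Scatter plot, fitted trend line, and non-overlapping point labels.
p <- ggplot(toy, aes(x = cpi, y = hdi)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +  # assumed smoother
  geom_text_repel(aes(label = country)) +
  labs(x = "Corruption Perceptions Index", y = "Human Development Index")

print(p)
```

`geom_text_repel()` nudges labels away from each other and from the points, which keeps country names readable in dense regions of the plot.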
- Description: This study demonstrates applications of Analysis of Variance (ANOVA). One-way and two-way ANOVA tests are used to examine the relationship between diamond prices and factors such as cut quality and color, using the `diamonds` dataset. A Tukey HSD post-hoc test is then applied to identify which specific groups differ.
- Techniques Used: Box plots, one-way ANOVA, two-way ANOVA, post-hoc tests (Tukey HSD).
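The workflow described above could look roughly like the following sketch. The exact model formulas are assumptions (for example, whether the two-way model includes the interaction term is not stated), and the ordered factors in `diamonds` are converted to unordered ones here so that the output uses ordinary group contrasts.

```r
library(ggplot2)  # provides the diamonds dataset

d <- as.data.frame(diamonds)
d$cut <- factor(d$cut, ordered = FALSE)
d$color <- factor(d$color, ordered = FALSE)

# Box plot of price by cut quality (base graphics).
boxplot(price ~ cut, data = d, xlab = "Cut", ylab = "Price (USD)")

# One-way ANOVA: does mean price differ across cut levels?
one_way <- aov(price ~ cut, data = d)
print(summary(one_way))

# Two-way ANOVA with interaction between cut and color (assumed form).
two_way <- aov(price ~ cut * color, data = d)
print(summary(two_way))

# Tukey HSD: pairwise comparisons between cut levels.
tukey <- TukeyHSD(one_way, which = "cut")
print(tukey)
```

With five cut levels, the Tukey table contains one row per pair of levels, each with an estimated difference, a confidence interval, and an adjusted p-value.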
- Description: This project performs comprehensive descriptive statistics and exploratory data analysis on the `mtcars` dataset. It explores relationships between variables, summarizes statistics by groups, and examines basic distribution characteristics.
- Techniques Used: Data manipulation with `dplyr`, visualizing multivariate relationships with `GGally` (`ggpairs`), group-wise descriptive statistics with the `psych` package, Kruskal-Wallis test.
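As an illustration of the group-wise summary and the Kruskal-Wallis test, here is a short sketch. The choice of grouping variable (`cyl`) and outcome (`mpg`) is an assumption made for this example, and the summary uses `dplyr` only rather than the `psych` package.

```r
library(dplyr)

# Group-wise descriptive statistics: fuel economy and power by cylinder count.
by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarise(
    n = n(),
    mean_mpg = mean(mpg),
    sd_mpg = sd(mpg),
    median_hp = median(hp)
  )
print(by_cyl)

# Kruskal-Wallis rank-sum test: do mpg distributions differ across cyl groups?
kw <- kruskal.test(mpg ~ cyl, data = mtcars)
print(kw)
```

The Kruskal-Wallis test is a rank-based alternative to one-way ANOVA, which makes it a reasonable choice when group sizes are small or normality is doubtful.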
- Description: This project introduces the `d`, `p`, `q`, and `r` functions for fundamental statistical distributions (Uniform, Binomial, Normal) in R and visually demonstrates the Central Limit Theorem (CLT) through simulation.
- Techniques Used: Statistical distribution functions, simulation, histogram and density curve plotting with `ggplot2`.
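The sketch below shows the four function families and a small CLT simulation. The sample size, number of replications, and the use of base graphics instead of `ggplot2` are choices made for this example only.

```r
# d = density/mass, p = cumulative probability, q = quantile, r = random draws.
dens <- dnorm(0)                          # standard normal density at 0
prob <- pnorm(1.96)                       # P(Z <= 1.96)
quant <- qnorm(0.975)                     # 97.5th percentile of N(0, 1)
mass <- dbinom(3, size = 10, prob = 0.5)  # P(X = 3) for Binomial(10, 0.5)
cum_u <- punif(0.25, min = 0, max = 1)    # P(U <= 0.25) for Uniform(0, 1)

# CLT: means of Uniform(0, 1) samples are approximately normal.
set.seed(42)
n <- 30        # observations per sample (assumed)
reps <- 5000   # number of simulated samples (assumed)
sample_means <- replicate(reps, mean(runif(n)))

# Uniform(0, 1) has mean 1/2 and variance 1/12, so the sample mean
# has mean 1/2 and standard deviation sqrt(1 / (12 * n)).
hist(sample_means, breaks = 40, freq = FALSE,
     main = "Sampling distribution of the mean", xlab = "Sample mean")
curve(dnorm(x, mean = 0.5, sd = sqrt(1 / (12 * n))), add = TRUE, lwd = 2)
```

The overlaid normal curve uses the theoretical mean and standard deviation rather than values estimated from the simulation, so a close match supports the theorem directly.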
5. Evaluation of Variables with Hypothesis Tests (`Evaluation-of-Variables-with-Hypothesis-Tests.pdf`)
- Description: This project illustrates how different types of hypothesis tests are applied in R. Common tests such as one-sample t-test, dependent/independent two-sample t-tests, proportion test, and chi-square test are explained with examples.
- Techniques Used: One-sample t-test, two-sample t-tests, `prop.test` (proportion test), `chisq.test` (chi-square test).
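Each of these tests is a single call in base R. The sketch below uses the built-in `mtcars` and `sleep` datasets and arbitrary null values as assumptions; the project's own examples may use different data.

```r
# One-sample t-test: is the mean mpg equal to 20?
t_one <- t.test(mtcars$mpg, mu = 20)

# Independent two-sample (Welch) t-test: mpg by transmission type.
t_ind <- t.test(mpg ~ am, data = mtcars)

# Dependent (paired) t-test: the same 10 subjects under two drugs.
drug1 <- sleep$extra[sleep$group == 1]
drug2 <- sleep$extra[sleep$group == 2]
t_dep <- t.test(drug1, drug2, paired = TRUE)

# Proportion test: is the share of manual cars equal to 0.5?
p_test <- prop.test(x = sum(mtcars$am == 1), n = nrow(mtcars), p = 0.5)

# Chi-square test of independence: cylinders vs. transmission.
# R may warn that the approximation is rough because some expected
# cell counts are small in this 32-row dataset.
chi <- chisq.test(table(mtcars$cyl, mtcars$am))

print(t_one); print(t_ind); print(t_dep); print(p_test); print(chi)
```

Each result is an `htest` object, so fields such as `p.value`, `statistic`, and `conf.int` can be extracted in the same way for every test.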
- Description: Using electricity price time series data, this project creates a heatmap to visualize the intensity of price changes on a yearly, weekly, and daily basis. This helps in identifying patterns and seasonal trends within time series data.
- Techniques Used: Time series data processing, line plots with `ggplot2`, heatmap creation with `geom_tile`, `viridis` color palette, `facet_wrap` for year-wise separation.
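A calendar-style heatmap of this kind can be sketched as follows. The price series is simulated (the project uses real electricity price data), the column names are hypothetical, and `scale_fill_viridis_c()` from `ggplot2` stands in for whichever viridis interface the project uses.

```r
library(ggplot2)

# Simulated daily prices: NOT real electricity data.
set.seed(7)
dates <- seq(as.Date("2021-01-01"), as.Date("2022-12-31"), by = "day")
day_of_year <- as.numeric(format(dates, "%j"))
prices <- data.frame(
  date = dates,
  price = 50 + 10 * sin(2 * pi * day_of_year / 365) + rnorm(length(dates), sd = 5)
)

# Derive year, week of year (Monday-based), and weekday (1 = Monday).
prices$year <- format(prices$date, "%Y")
prices$week <- as.integer(format(prices$date, "%W"))
prices$weekday <- factor(
  format(prices$date, "%u"),
  levels = 7:1,
  labels = c("Sun", "Sat", "Fri", "Thu", "Wed", "Tue", "Mon")
)

# One tile per day, one panel per year.
p <- ggplot(prices, aes(x = week, y = weekday, fill = price)) +
  geom_tile() +
  scale_fill_viridis_c() +
  facet_wrap(~ year, ncol = 1) +
  labs(x = "Week of year", y = NULL, fill = "Price")

print(p)
```

Stacking the yearly panels vertically makes it easy to compare the same week across years.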
- Description: This study examines the basic features and measures of dispersion of datasets (descriptive statistics with the `stargazer` package), correlation, and covariance between variables. Additionally, skewness and kurtosis values are calculated and visualized.
- Techniques Used: Summary statistics tables with `stargazer`, correlation, covariance, partial correlation, skewness and kurtosis calculations and visualizations.
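The measures listed above can be computed with base R alone, as in the sketch below. The helper functions `skewness()` and `kurtosis()` are written here for illustration (moment-based, without small-sample correction) and are not taken from the project; the variables from `mtcars` are likewise an assumption.

```r
# Covariance and Pearson correlation between fuel economy and weight.
cov_mpg_wt <- cov(mtcars$mpg, mtcars$wt)
cor_mpg_wt <- cor(mtcars$mpg, mtcars$wt)

# Partial correlation of mpg and wt, controlling for hp:
# correlate the residuals after regressing each variable on hp.
res_mpg <- resid(lm(mpg ~ hp, data = mtcars))
res_wt <- resid(lm(wt ~ hp, data = mtcars))
pcor_mpg_wt <- cor(res_mpg, res_wt)

# Moment-based skewness and kurtosis (illustrative helpers).
skewness <- function(v) {
  m <- mean(v)
  mean((v - m)^3) / mean((v - m)^2)^1.5
}
kurtosis <- function(v) {
  m <- mean(v)
  mean((v - m)^4) / mean((v - m)^2)^2
}

print(c(cov = cov_mpg_wt, cor = cor_mpg_wt, partial = pcor_mpg_wt))
print(c(skew = skewness(mtcars$mpg), kurt = kurtosis(mtcars$mpg)))
```

Under this definition a normal distribution has kurtosis 3; some packages report excess kurtosis instead, which subtracts 3.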
- Description: This project analyzes global port freight traffic data and visualizes it on a geographical map. The busiest ports are identified, and their locations and traffic volumes are displayed on the map.
- Techniques Used: Loading external data, data manipulation with `dplyr`, geographical visualization with `ggplot2` and `map_data`.
Each project is created as an R Markdown (`.Rmd`) file (the PDF outputs are compiled versions of these `.Rmd` files). To inspect, modify, or apply the source code of the projects with your own data, you can use the `.Rmd` files.
- Clone the Repository: `git clone https://github.com/Arif-F-Aytkn/R-Data-Analysis.git`
- Open in RStudio: Open the `.Rmd` file you are interested in with RStudio.
- Install Required Packages: The packages used in each project are listed at the beginning of its file. You may need to install them with `install.packages("package_name")`.
- Run the Code: You can knit the `.Rmd` file or run the code blocks individually to see the outputs and analyses.
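The install and knit steps can also be scripted. The helper `missing_packages()` below is a hypothetical convenience function written for this example, the package list is only a sample, and `project.Rmd` is a placeholder file name.

```r
# Return the subset of `pkgs` that is not yet installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  unname(pkgs[!installed])
}

# Example list; check the top of each .Rmd file for the packages it loads.
needed <- c("ggplot2", "dplyr", "ggrepel")
to_install <- missing_packages(needed)
print(to_install)

# Uncomment to install what is missing and knit a file from the console:
# if (length(to_install) > 0) install.packages(to_install)
# rmarkdown::render("project.Rmd")  # placeholder file name
```

Checking with `requireNamespace()` first avoids reinstalling packages that are already available.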
These projects were created to enhance my R data analysis skills. In the future, I aim to deepen the analyses, make assumption checks more explicit, and enrich the interpretation of the outputs.
Arif Furkan AYTKN