This repo summarizes the work done for Corporación Favorita Grocery Sales Forecasting.
The report folder contains the source code for reporting.
report/README.Rmd: R Markdown document that provides insights into the Corporación Favorita Grocery Sales data.
Further report details are in the report folder.
The process folder contains the source code for data processing.
model/data_integration.R: R script that integrates train.csv and test.csv with stores.csv, items.csv and transactions.csv.
Further processing details are in the process folder.
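The integration step described above can be sketched in base R as a series of left joins. This is only an illustration under stated assumptions, not the contents of data_integration.R: the function name `integrate`, the join keys, and the toy values are all hypothetical, chosen to match the column names listed in the data description.

```r
# Toy stand-ins for the real CSV files; all values are made up for illustration.
train <- data.frame(
  id = 1:3,
  date = c("2017-01-01", "2017-01-01", "2017-01-02"),
  store_nbr = c(1, 2, 1),
  item_nbr = c(100, 101, 100),
  unit_sales = c(3, 5, 2),
  onpromotion = c(FALSE, TRUE, FALSE)
)
stores <- data.frame(
  store_nbr = c(1, 2),
  city = c("Quito", "Guayaquil"),
  cluster = c(13, 6)
)
items <- data.frame(
  item_nbr = c(100, 101),
  family = c("GROCERY I", "DAIRY"),
  perishable = c(0, 1)
)
transactions <- data.frame(
  date = c("2017-01-01", "2017-01-01"),
  store_nbr = c(1, 2),
  transactions = c(770, 2111)
)

# Hypothetical helper: left-join sales rows with store, item and
# transaction metadata. all.x = TRUE keeps every sales row even when a
# lookup table has no match (the unmatched columns become NA).
integrate <- function(sales, stores, items, transactions) {
  out <- merge(sales, stores, by = "store_nbr", all.x = TRUE)
  out <- merge(out, items, by = "item_nbr", all.x = TRUE)
  out <- merge(out, transactions, by = c("date", "store_nbr"), all.x = TRUE)
  out[order(out$id), ]  # merge() reorders rows, so restore the id order
}

train_full <- integrate(train, stores, items, transactions)
print(train_full)
```

The left joins matter for test.csv in particular: transactions.csv only covers the training timeframe, so an inner join would drop every test row.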
The model folder contains the source code for data modeling.
Further modeling details are in the model folder.
The data comes from Corporación Favorita and is available through the Corporación Favorita Grocery Sales Forecasting competition on Kaggle.
There are 7 data files:
- train.csv: Training data, includes the target unit_sales by date, store_nbr, and item_nbr and a unique id to label rows. The onpromotion column tells whether that item_nbr was on promotion for a specified date and store_nbr.
- test.csv: Test data, with the date, store_nbr, item_nbr combinations that are to be predicted, along with the onpromotion information.
- stores.csv: Store metadata, including city, state, type and cluster (grouping of similar stores).
- items.csv: Item metadata, including family, class, and perishable (perishable items have a score weight of 1.25; otherwise, the weight is 1.0).
- transactions.csv: The count of sales transactions for each date, store_nbr combination. Only included for the training data timeframe.
- oil.csv: Daily oil price, includes values during both the train and test data timeframe.
- holidays_events.csv: Holidays and Events, with metadata.
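The perishable score weights mentioned for items.csv can be derived as shown in the sketch below. It assumes that perishable is coded as 0/1, and the `nwrmsle` helper is a hypothetical illustration of how such weights could enter a weighted log-error score; neither detail is taken from this repo's code.

```r
# Toy item metadata; values are made up for illustration.
items <- data.frame(item_nbr = c(100, 101), perishable = c(0, 1))

# Perishable items get a weight of 1.25, all other items 1.0.
items$weight <- ifelse(items$perishable == 1, 1.25, 1.0)

# Hypothetical helper: weighted root mean squared logarithmic error,
# normalized by the sum of the weights.
nwrmsle <- function(actual, predicted, weight) {
  sqrt(sum(weight * (log1p(predicted) - log1p(actual))^2) / sum(weight))
}

score <- nwrmsle(actual = c(3, 5), predicted = c(2, 6), weight = items$weight)
print(score)
```

With these weights, an error on a perishable item contributes 25% more to the squared-error sum than the same error on a non-perishable item.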