This project is a solution to the Carrefour Phenix Challenge. The application computes sales volume and sales amount (turnover) per store.
- The files do not contain duplicate data
- The file structures are assumed to be valid: a calculation task fails if a parsing error occurs.
- Each Transaction file has corresponding Product files.
The output results are described below:
It takes the Transaction file of the day and aggregates the sales volume of each product by store. The result keeps only the top 100 best sellers by store and is saved as:
top_100_ventes_${store_uuid}_YYYYMMDD.data
The structure of the result file :
productId|salesVolume
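The aggregation described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the `Transaction` case class, the helper names and the column order `txId|datetime|storeUuid|productId|quantity` are assumptions, since the transaction layout is not spelled out here.

```scala
object TopSalesVolume {
  // Hypothetical record type; only the fields needed for the aggregation are kept.
  final case class Transaction(storeUuid: String, productId: Long, quantity: Int)

  // Parse one pipe-delimited line, ASSUMED to be: txId|datetime|storeUuid|productId|quantity
  def parse(line: String): Transaction = {
    val cols = line.split('|')
    Transaction(cols(2), cols(3).toLong, cols(4).toInt)
  }

  // Sum quantities per (store, product), then keep the n best products of each store.
  def topByStore(txs: Iterator[Transaction], n: Int = 100): Map[String, Seq[(Long, Int)]] = {
    val totals = txs.foldLeft(Map.empty[(String, Long), Int]) { (acc, tx) =>
      val key = (tx.storeUuid, tx.productId)
      acc.updated(key, acc.getOrElse(key, 0) + tx.quantity)
    }
    totals.toSeq
      .groupBy { case ((store, _), _) => store }
      .map { case (store, rows) =>
        val ranked = rows
          .map { case ((_, product), qty) => (product, qty) }
          .sortBy { case (_, qty) => -qty } // highest volume first
          .take(n)
        store -> ranked
      }
  }

  // Render rows in the productId|salesVolume layout of the result file.
  def render(rows: Seq[(Long, Int)]): Seq[String] =
    rows.map { case (product, qty) => s"$product|$qty" }
}
```

Each entry of the returned map would correspond to one `top_100_ventes_${store_uuid}_YYYYMMDD.data` file, with `render` producing its lines.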
It takes the Transaction and Product files of the last 7 days (starting from today's date). The Transaction files are then joined to the Product files using the date in the file name (fileDate) and the storeUuid in order to calculate the turnover generated by each product in each store.
The result keeps only the top 100 best sellers by store and is saved as:
top_100_ca_${store_uuid}_YYYYMMDD-J7.data
The structure of the result file :
productId|salesAmount
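A possible shape for the 7-day turnover calculation is sketched below. The `Sale` and `Price` case classes and all function names are hypothetical, and the sketch assumes that the join key is (fileDate, storeUuid) at file level plus the product id at row level, and that the window covers today and the six previous days.

```scala
import java.time.LocalDate

object TopTurnover {
  // Hypothetical shapes: quantities come from Transaction files, unit prices from Product files.
  final case class Sale(fileDate: LocalDate, storeUuid: String, productId: Long, quantity: Int)
  final case class Price(fileDate: LocalDate, storeUuid: String, productId: Long, unitPrice: BigDecimal)

  // The 7 file dates of the window: `today` and the 6 days before it (an assumption).
  def lastSevenDays(today: LocalDate): Seq[LocalDate] =
    (0 until 7).map(d => today.minusDays(d.toLong))

  def turnoverByStore(sales: Seq[Sale], prices: Seq[Price], today: LocalDate,
                      n: Int = 100): Map[String, Seq[(Long, BigDecimal)]] = {
    val window = lastSevenDays(today).toSet
    // Index prices by the join key so each sale is resolved with one lookup.
    val priceIndex = prices.map(p => (p.fileDate, p.storeUuid, p.productId) -> p.unitPrice).toMap
    // Sales outside the window, or without a matching price, are skipped in this sketch.
    val amounts = for {
      s     <- sales if window(s.fileDate)
      price <- priceIndex.get((s.fileDate, s.storeUuid, s.productId))
    } yield (s.storeUuid, s.productId, price * s.quantity)
    amounts.groupBy(_._1).map { case (store, rows) =>
      val perProduct = rows.groupBy(_._2).map { case (product, rs) => product -> rs.map(_._3).sum }
      store -> perProduct.toSeq.sortBy { case (_, amount) => -amount }.take(n)
    }
  }
}
```

Using `BigDecimal` rather than `Double` for amounts avoids rounding drift when many small prices are summed; whether the real application does the same is not stated here.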
To run this application, Java 8 must be installed on your machine.
Java 8
Get the packaged JAR from the target folder and execute it as follows:
phenix-challenge 1.0.0
Usage: phenix-challenge [options]
--input.data.folder <value>
input data folder
--output.result.folder <value>
output result folder
The --input.data.folder argument must point to the folder that contains the Transaction and Product data files.
The --output.result.folder argument must contain the path to a valid folder where the calculation results will be saved.
Both arguments are mandatory.
Example :
java -jar phenix-challenge-1.0.0-RC.jar --input.data.folder=/path/to/data/folder --output.result.folder=/path/to/folder
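To illustrate how two mandatory `--key=value` arguments might be validated, here is a small sketch. It is not the application's actual parser (which may rely on a dedicated library); `CliArgs` and its behaviour are assumptions made for illustration only.

```scala
object CliArgs {
  private val Required = Seq("input.data.folder", "output.result.folder")

  // Turn Array("--k=v", ...) into Map(k -> v).
  // Tokens that are not shaped exactly like --key=value are ignored in this sketch.
  def parse(args: Array[String]): Either[String, Map[String, String]] = {
    val Opt = """--([^=\s]+)=(.+)""".r
    val parsed = args.collect { case Opt(key, value) => key -> value }.toMap
    Required.filterNot(parsed.contains) match {
      case Seq()   => Right(parsed)
      case missing => Left("Missing mandatory argument(s): " + missing.map("--" + _).mkString(", "))
    }
  }
}
```

With this shape, an option name separated from its value by whitespace reaches the program as two tokens and is reported as missing.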
In order to retrieve and parse the input files, all the required properties are read from the resource config file configs.yaml.
If the structure of the files changes, that file must be modified for the tasks to run.
Default configuration in configs.yaml
- fileType: Transaction
  fileNamePattern: "(transactions_)(${file_date})(.data)"
  fileDatePattern: "yyyyMMdd"
  fileProperties:
    delimiter: "|"
    hasHeader: false
    quote: "\""
    escape: "\\"
    charset: "UTF-8"
- fileType: Product
  fileNamePattern: "(reference_prod-)(${store_uuid})(_)(${file_date})(.data)"
  fileDatePattern: "yyyyMMdd"
  fileProperties:
    delimiter: "|"
    hasHeader: false
    quote: "\""
    escape: "\\"
    charset: "UTF-8"
- fileType: SalesResult
  fileNamePattern: "top_100_${measure_type}_${aggregation_level}_${file_date}${delta}.data"
  fileDatePattern: "yyyyMMdd"
  fileProperties:
    delimiter: "|"
    hasHeader: false
    quote: "\""
    escape: "\\"
    charset: "UTF-8"
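As an illustration of how a `fileNamePattern` with placeholders could be resolved, the sketch below expands `${store_uuid}` and `${file_date}` into regex fragments and extracts both values from a Product file name. The expansions chosen here (a 36-character UUID, an 8-digit date) and the `FileNamePatterns` helper are assumptions; the application may resolve the placeholders differently.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import scala.util.matching.Regex

object FileNamePatterns {
  // ASSUMED regex fragments for each placeholder.
  private val Placeholders = Map(
    "${store_uuid}" -> "[0-9a-fA-F-]{36}",
    "${file_date}"  -> "\\d{8}"
  )

  // Substitute the placeholders literally, and escape the dot of the extension
  // so that it no longer matches any character.
  def toRegex(fileNamePattern: String): Regex = {
    val expanded = Placeholders.foldLeft(fileNamePattern) {
      case (acc, (name, fragment)) => acc.replace(name, fragment)
    }
    expanded.replace("(.data)", "(\\.data)").r
  }

  private val ProductFile = toRegex("(reference_prod-)(${store_uuid})(_)(${file_date})(.data)")
  private val DateFormat  = DateTimeFormatter.ofPattern("yyyyMMdd")

  // Extract (storeUuid, fileDate) from a Product file name, if it matches the pattern.
  def storeAndDate(fileName: String): Option[(String, LocalDate)] = fileName match {
    case ProductFile(_, store, _, date, _) => Some(store -> LocalDate.parse(date, DateFormat))
    case _                                 => None
  }
}
```

The capture groups of the configured pattern map directly to the extractor's positions, which is one plausible reason for writing the pattern with explicit parentheses.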
Clone this repo to your local machine using :
git clone https://github.com/blackbishop313/phenix-challenge
Requirements :
- JVM
- Scala (v 2.11 or above)
- Maven
Run the tests with Maven:
mvn test
- Maven - Dependency Management
- complete tests
- improve memory usage (currently, using Scala Stream to handle large data files can cause memory problems because Stream memoizes its values).
- improve file importing (add escaping, quotes, handle line parsing errors)
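One way the memory item above could be approached is to fold over an `Iterator` instead of a `Stream`. The sketch below is only an illustration: `LineProcessing` is a hypothetical helper, and it assumes the quantity is the last pipe-delimited column.

```scala
import scala.io.Source

object LineProcessing {
  // Folding over an Iterator keeps no reference to lines already consumed,
  // whereas a Stream caches every evaluated cell while its head is reachable.
  def sumQuantities(lines: Iterator[String]): Long =
    lines.foldLeft(0L) { (total, line) => total + line.split('|').last.toLong }

  // Read a file lazily, line by line, and always release the file handle.
  def sumFile(path: String): Long = {
    val source = Source.fromFile(path, "UTF-8")
    try sumQuantities(source.getLines()) finally source.close()
  }
}
```

The trade-off is that an `Iterator` can be traversed only once, so any logic that needs several passes has to reopen the file or aggregate everything in a single fold.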
- Mounir Hamoudi - blackbishop313
This project is licensed under the Apache License; see the LICENSE.md file for details.