A Java-based command-line application that benchmarks various file compression algorithms. The tool allows you to compress your own data using different algorithms, measure their performance, and compare results such as compression ratio and time taken.
This project is written for the Boğaziçi University Master of Science in Software Engineering SWE 510.01 Data Structures and Algorithms course to showcase the usage of multiple classes, inheritance, different access permissions, polymorphism, arrays, strings, and multiple constructors using Java.
The repository is available at https://github.com/Yusufss4/file-compression-test-tool
- File Compression Test Tool
- Table of Contents
- Features
- Supported Compression Algorithms
- Prerequisites
- Installation
- Building the Project
- Running the Tool
- Usage Examples
- Sample Output
- Project Structure
- Additional Details
- Key Classes and Interfaces
- Polymorphism and Inheritance Usage
- Design Choices and Implementation Details
- Extensibility
- Further Developments and Improvements
- Benchmark Multiple Algorithms: Compress files using GZip, BZip2, LZ4, and Run-Length Encoding (RLE).
- Extensible Design: Easily add new compression algorithms and options.
- Detailed Output: Displays algorithm name, time taken, original size, compressed size, and compression ratio.
- Verbose Mode: Enable verbose output to see detailed processing information.
- GZip: Not implemented yet; present only as a mock-up.
- BZip2: Not implemented yet; present only as a mock-up.
- LZ4: Not implemented yet; present only as a mock-up.
- Run-Length Encoding (RLE): Fully implemented.
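For readers unfamiliar with Run-Length Encoding, the following is a minimal sketch of the idea, not the project's actual implementation. It assumes a simple format of (count, value) byte pairs with runs capped at 255; the class name RleSketch and both method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class RleSketch {

    /** Encodes the input as (count, value) byte pairs; each run is capped at 255. */
    public static byte[] encode(byte[] input) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < input.length) {
            byte value = input[i];
            int run = 1;
            // Extend the run while the next byte repeats and the count still fits in one byte.
            while (i + run < input.length && input[i + run] == value && run < 255) {
                run++;
            }
            out.write(run);
            out.write(value);
            i += run;
        }
        return out.toByteArray();
    }

    /** Reverses encode(): expands every (count, value) pair back into a run. */
    public static byte[] decode(byte[] encoded) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i + 1 < encoded.length; i += 2) {
            int run = encoded[i] & 0xFF; // read the count as an unsigned value
            for (int j = 0; j < run; j++) {
                out.write(encoded[i + 1]);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] original = "aaaabbbcc".getBytes(StandardCharsets.US_ASCII);
        byte[] packed = encode(original);
        System.out.println(original.length + " -> " + packed.length + " bytes"); // 9 -> 6 bytes
        System.out.println(Arrays.equals(original, decode(packed)));             // true
    }
}
```

Note that data without repeated bytes grows under this scheme, since every single byte becomes a two-byte pair.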
- Java Development Kit (JDK) 8 or higher: Download JDK
Clone the Repository
git clone https://github.com/Yusufss4/file-compression-test-tool
cd file-compression-test-tool
Use the Java compiler to build the project. The following command, run from the repository root, compiles all Java files in the project:
javac src/main/java/com/bu/compression/*.java src/main/java/com/bu/compression/compressors/*.java src/main/java/com/bu/compression/parsers/*.java
Run the compiled classes with the java command, followed by the options and the input file.
Basic Syntax
Move from the repository root to the source root before running the tool, so that the package path com.bu.compression resolves:
cd src/main/java
Run the tool with the following command:
java com.bu.compression.Main [options] <inputfile>
Options
- --help: Show the help message.
- --verbose: Enable verbose output.
- --gzip: Use GZip compression algorithm.
- --bzip2: Use BZip2 compression algorithm.
- --lz4: Use LZ4 compression algorithm.
- --rle: Use Run-Length Encoding compression algorithm.
java com.bu.compression.Main --gzip sample.txt
java com.bu.compression.Main --verbose --gzip --bzip2 --lz4 --rle sample.txt
java com.bu.compression.Main --help
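As a rough illustration of how such a command line could be split into flags and an input file, here is a minimal sketch. It is not the project's CommandLineParser; the class ArgsSketch, the nested Parsed holder, and the parse method are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ArgsSketch {

    // Algorithm flags taken from the options list above.
    static final List<String> ALGORITHM_FLAGS =
            Arrays.asList("--gzip", "--bzip2", "--lz4", "--rle");

    /** Holds the parsed command line; all fields are illustrative. */
    public static class Parsed {
        public boolean help;
        public boolean verbose;
        public final List<String> algorithms = new ArrayList<>();
        public String inputFile;
    }

    /** Sorts each argument into a flag, an algorithm selection, or the input file. */
    public static Parsed parse(String[] args) {
        Parsed parsed = new Parsed();
        for (String arg : args) {
            if (arg.equals("--help")) {
                parsed.help = true;
            } else if (arg.equals("--verbose")) {
                parsed.verbose = true;
            } else if (ALGORITHM_FLAGS.contains(arg)) {
                parsed.algorithms.add(arg.substring(2)); // strip the leading "--"
            } else if (arg.startsWith("--")) {
                throw new IllegalArgumentException("Unknown option: " + arg);
            } else {
                parsed.inputFile = arg; // the last non-option argument wins
            }
        }
        return parsed;
    }

    public static void main(String[] args) {
        Parsed parsed = parse(new String[] {"--verbose", "--gzip", "--rle", "sample.txt"});
        System.out.println(parsed.verbose);    // true
        System.out.println(parsed.algorithms); // [gzip, rle]
        System.out.println(parsed.inputFile);  // sample.txt
    }
}
```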
Running compressor: GZip
Algorithm: GZip
Lossless: true
Compression Level: 7
Time taken: 120 ms
Original Size: 204800 bytes
Compressed Size: 102400 bytes
Compression Ratio: 50.00%
Running compressor: LZ4
Algorithm: LZ4
Lossless: true
Compression Level: 5
Time taken: 80 ms
Original Size: 204800 bytes
Compressed Size: 153600 bytes
Compression Ratio: 75.00%
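Judging from the sample values above, the reported ratio appears to be the compressed size expressed as a percentage of the original size (102400 / 204800 = 50.00%). The sketch below reproduces that arithmetic under this assumption; the class RatioSketch and its method are hypothetical and not taken from the project.

```java
import java.util.Locale;

public class RatioSketch {

    /** Returns the compressed size as a percentage of the original size. */
    public static double compressionRatio(long originalSize, long compressedSize) {
        if (originalSize <= 0) {
            throw new IllegalArgumentException("originalSize must be positive");
        }
        return 100.0 * compressedSize / originalSize;
    }

    public static void main(String[] args) {
        // Values from the sample output above.
        System.out.println(String.format(Locale.ROOT,
                "Compression Ratio: %.2f%%", compressionRatio(204800, 102400))); // 50.00%
        System.out.println(String.format(Locale.ROOT,
                "Compression Ratio: %.2f%%", compressionRatio(204800, 153600))); // 75.00%
    }
}
```

Under this definition a lower percentage means better compression, and a value above 100% means the output grew.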
FileCompressionTestTool/
├── pom.xml // Currently it is not implemented. Only added for future Maven support.
├── README.md
├── src/
│   ├── main/
│   │   ├── java/
│   │   │   └── com/
│   │   │       └── bu/
│   │   │           └── compression/
│   │   │               ├── AlgorithmOptionParser.java
│   │   │               ├── BenchmarkResult.java
│   │   │               ├── CommandLineParser.java
│   │   │               ├── CompressionTestTool.java
│   │   │               ├── Main.java
│   │   │               ├── compressors/
│   │   │               │   ├── Compressor.java
│   │   │               │   ├── GZipCompressor.java
│   │   │               │   ├── BZip2Compressor.java
│   │   │               │   ├── LZ4Compressor.java
│   │   │               │   └── RunLengthEncodingCompressor.java
│   │   │               └── parsers/
│   │   │                   ├── GZipParser.java
│   │   │                   ├── BZip2Parser.java
│   │   │                   ├── LZ4Parser.java
│   │   │                   └── RLEParser.java
│   └── test/
│       └── java/
│           └── com/
│               └── bu/
│                   └── compression/
│                       └── (Unit test classes will go here)
- Compressor (Abstract Class)
- Concrete Compressor Classes:
  - GZipCompressor
  - BZip2Compressor
  - LZ4Compressor
  - RunLengthEncodingCompressor
- CommandLineParser (Class)
- CompressionTestTool (Class)
- BenchmarkResult (Class)
Inheritance allows us to define a base class (Compressor) and have multiple subclasses inherit from it. This enables code reuse and establishes a common interface for all compression algorithms.
- Definition: An abstract class that defines the common interface and shared properties for all compression algorithms.
- Key Methods and Properties:
  - compress(File inputFile, File outputFile): Abstract method to be implemented by subclasses. Every compressor must have a compress method.
  - Common properties like name, extension, description, isLossless, etc.
Each concrete compressor class extends the Compressor class and provides a specific implementation of the compress method.
- GZipCompressor
  - Extends Compressor.
  - Implements the compress method using mocked GZip compression.
  - May have additional properties like compressionLevel via overloaded constructors.
- BZip2Compressor
  - Extends Compressor.
  - Implements the compress method using mocked BZip2 compression.
- LZ4Compressor
  - Extends Compressor.
  - Implements the compress method using mocked LZ4 compression.
- RunLengthEncodingCompressor
  - Extends Compressor.
  - Implements the compress method using Run-Length Encoding.
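The relationship between the base class and its subclasses can be sketched as follows. This is a simplified stand-in, not the project's code: to stay self-contained it assumes a compress method that works on byte arrays instead of files, and the classes MockCompressor and DedupCompressor are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

public class InheritanceSketch {

    /** Simplified stand-in for the project's abstract Compressor class. */
    abstract static class Compressor {
        protected final String name;
        protected final String extension;

        protected Compressor(String name, String extension) {
            this.name = name;
            this.extension = extension;
        }

        public String getName() { return name; }
        public String getExtension() { return extension; }

        /** Every subclass must supply its own algorithm. */
        public abstract byte[] compress(byte[] input);
    }

    /** Mock compressor: returns a copy of the input unchanged. */
    static class MockCompressor extends Compressor {
        MockCompressor() { super("Mock", ".mock"); }

        @Override
        public byte[] compress(byte[] input) {
            return input.clone();
        }
    }

    /** Toy compressor: keeps one byte per run of equal bytes (lossy, illustration only). */
    static class DedupCompressor extends Compressor {
        DedupCompressor() { super("Dedup", ".dedup"); }

        @Override
        public byte[] compress(byte[] input) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            for (int i = 0; i < input.length; i++) {
                if (i == 0 || input[i] != input[i - 1]) {
                    out.write(input[i]);
                }
            }
            return out.toByteArray();
        }
    }

    public static void main(String[] args) {
        byte[] data = "aaabbc".getBytes(StandardCharsets.US_ASCII);
        List<Compressor> compressors = Arrays.asList(new MockCompressor(), new DedupCompressor());
        for (Compressor compressor : compressors) {
            // Prints "Mock: 6 bytes" and then "Dedup: 3 bytes".
            System.out.println(compressor.getName() + ": " + compressor.compress(data).length + " bytes");
        }
    }
}
```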
Polymorphism allows us to treat objects of different classes that share a common superclass as objects of the superclass type. This enables writing code that works with superclass types but operates on subclass objects, which keeps the calling code short and easier to maintain.
- List of Compressors: In the CommandLineParser and CompressionTestTool classes, we use a List<Compressor> to hold instances of various compressor subclasses.
// In CommandLineParser
private List<Compressor> compressors = new ArrayList<>();
// In CompressionTestTool
for (final Compressor compressor : compressors) {
    compressor.compress(inputFile, outputFile);
}
- Dynamic Method Dispatch: When we call compressor.compress(inputFile, outputFile);, the JVM determines at runtime which compress method to invoke based on the actual object type (e.g., GZipCompressor, BZip2Compressor, etc.).
- Extensibility: New compressor types can be added without modifying the code that uses the Compressor interface.
- Maintainability: Code that operates on Compressor objects doesn't need to know about the specifics of each subclass.
// compressors list contains various Compressor objects
List<Compressor> compressors = Arrays.asList(
    new GZipCompressor(),
    new BZip2Compressor(),
    new LZ4Compressor()
);
for (Compressor compressor : compressors) {
    compressor.compress(inputFile, outputFile);
    // The correct compress method is called based on the object's actual type
}
- Package Organization: Classes are organized into packages (compressors, parsers, etc.) to group related functionality.
- Interfaces and Abstract Classes: Use of interfaces and abstract classes to define contracts and common behavior.
- Encapsulation: Implementation details are hidden, exposing only necessary interfaces.
- Package Structure: Classes are organized into packages based on functionality (e.g., compressors, parsers). Also the folder structure is organized in a way to support Maven project structure.
- JavaDoc Comments: JavaDoc comments are used to document classes, methods, and fields. This could help in generating documentation and understanding the code.
In the Compressor class, the properties are declared as protected final and exposed through getter methods. Subclasses can read the fields directly, while code outside the class hierarchy is expected to use the getters. Because the fields are final, they are assigned once in the constructor and cannot be reassigned afterwards, which protects the internal state of the class.
public abstract class Compressor {
    protected final String name;
    protected final String extension;
    protected final String description;
    protected final boolean isLossless;
    protected final int compressionLevel;
    protected final String version;
    protected final String author;
    protected final Map<String, String> settings;
In the BenchmarkResult class, by contrast, the properties are declared as private and can only be read through getter methods. This ensures that the internal state of the class can only be set via the constructor. Otherwise a caller could accidentally change the values of the properties and mix up the results of different algorithms.
public class BenchmarkResult {
    private String algorithmName;
    private long timeTaken; // in milliseconds
    private long originalSize; // in bytes
    private long compressedSize; // in bytes
    private boolean isLossless;
    private int compressionLevel;
In our program we did not use the 'public' access modifier for the properties. We used 'protected' and 'private' access modifiers to hide the implementation details and to prevent the user from changing the values of the properties.
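The constructor-only pattern described for BenchmarkResult can be sketched as below. This is an illustration under assumptions rather than the project's class: the nested Result type is hypothetical, it carries only a subset of the fields, and it marks them final, which the excerpt above does not.

```java
public class EncapsulationSketch {

    /** Illustrative result holder: values are set once and then only read. */
    static final class Result {
        private final String algorithmName;
        private final long timeTaken;      // in milliseconds
        private final long originalSize;   // in bytes
        private final long compressedSize; // in bytes

        Result(String algorithmName, long timeTaken, long originalSize, long compressedSize) {
            this.algorithmName = algorithmName;
            this.timeTaken = timeTaken;
            this.originalSize = originalSize;
            this.compressedSize = compressedSize;
        }

        // Getters only: there are no setters, so a result cannot be altered after creation.
        public String getAlgorithmName() { return algorithmName; }
        public long getTimeTaken() { return timeTaken; }
        public long getOriginalSize() { return originalSize; }
        public long getCompressedSize() { return compressedSize; }
    }

    public static void main(String[] args) {
        Result result = new Result("RLE", 12L, 100L, 40L);
        System.out.println(result.getAlgorithmName() + " took " + result.getTimeTaken() + " ms");
        // result.compressedSize = 0; // would not compile outside this file: the field is private
    }
}
```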
Design Choice:
The 'Compressor' abstract class has overloaded constructors so that different compression algorithms can be created with different settings. For example, the 'GZipCompressor' class has a constructor that takes a compression level as an argument. This makes it possible to add new algorithms with specific settings without modifying the existing abstract class.
/**
 * Constructs a Compressor with the specified name and extension.
 *
 * @param name the name of the compression algorithm
 * @param extension the file extension for the compressed file
 */
public Compressor(String name, String extension) {
    this(name, extension, "", true, 5, "1.0", "Unknown", new HashMap<>());
}
/**
 * Constructs a Compressor with the specified name, extension, and description.
 *
 * @param name the name of the compression algorithm
 * @param extension the file extension for the compressed file
 * @param description a brief description of the algorithm
 */
public Compressor(String name, String extension, String description) {
    this(name, extension, description, true, 5, "1.0", "Unknown", new HashMap<>());
}
/**
 * Constructs a Compressor with the specified name, extension, and compression
 * level.
 *
 * @param name the name of the compression algorithm
 * @param extension the file extension for the compressed file
 * @param compressionLevel the level of compression (1-9)
 */
public Compressor(String name, String extension, int compressionLevel) {
    this(name, extension, "", true, compressionLevel, "1.0"
To add a new compression algorithm:
- Create a New Compressor Class
  - Extend the Compressor abstract class.
  - Implement the compress method.
- Register the Parser
  - Add the new parser to the algorithmParsers map in CommandLineParser.
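The two steps above might look roughly like the sketch below. Only the names Compressor, algorithmParsers, and AlgorithmOptionParser come from this README; their shapes here are assumptions. In particular, the createCompressor method, the byte-array compress signature, the NewAlgoCompressor class, and the --newalgo flag are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class RegistrySketch {

    /** Simplified stand-in for the project's abstract Compressor class. */
    abstract static class Compressor {
        private final String name;

        protected Compressor(String name) { this.name = name; }

        public String getName() { return name; }

        public abstract byte[] compress(byte[] input);
    }

    /** Hypothetical parser contract: builds the compressor for one command-line flag. */
    interface AlgorithmOptionParser {
        Compressor createCompressor();
    }

    /** Step 1: the new algorithm extends Compressor and implements compress. */
    static class NewAlgoCompressor extends Compressor {
        NewAlgoCompressor() { super("NewAlgo"); }

        @Override
        public byte[] compress(byte[] input) {
            return input.clone(); // placeholder: a real algorithm would go here
        }
    }

    /** Assumed shape of the algorithmParsers map: flag to parser. */
    static final Map<String, AlgorithmOptionParser> algorithmParsers = new HashMap<>();

    /** Looks up the parser for a flag and asks it for a compressor. */
    static Compressor lookup(String flag) {
        AlgorithmOptionParser parser = algorithmParsers.get(flag);
        if (parser == null) {
            throw new IllegalArgumentException("Unknown option: " + flag);
        }
        return parser.createCompressor();
    }

    public static void main(String[] args) {
        // Step 2: register the parser under its flag.
        algorithmParsers.put("--newalgo", NewAlgoCompressor::new);
        System.out.println(lookup("--newalgo").getName()); // NewAlgo
    }
}
```

With a lookup table like this, the code that runs the benchmarks does not change when an algorithm is added; only the registration line does.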
- Implement Compression Algorithms: Implement the GZip, BZip2, and LZ4 compression algorithms. Currently they exist only as mock-ups to showcase the extensibility of the program.
- Add Support for Algorithm Options: Add support for specifying algorithm-specific options, such as compression level, in the command line parser. Currently these options are supported at the class level but cannot be set from the command line.
- Unit Tests: Write unit tests to ensure the correctness of the compression algorithms and benchmarking logic.
- Maven Support: Add Maven support to manage dependencies and build the project.
- More Compression Algorithms: Add more compression algorithms to compare and benchmark.
Thank you for using the File Compression Test Tool!