A program that analyzes a directory and provides statistics such as the largest file, number of files and directories, most common words in text files, largest images, and vacant directories. It efficiently traverses the directory tree and processes file contents using system calls and C++ standard library functions. Watch a GIF of me interacting with it below!
- Largest File Detection: Identifies and reports the largest file in the directory, including its path and size.
- Total File and Directory Count: Computes the number of files and directories recursively within the given directory.
- Total File Size Calculation: Aggregates the size of all files in the directory.
- Most Common Words in .txt Files: Words are defined as sequences of at least 5 alphabetic characters, compared case-insensitively. Results are sorted by frequency in descending order (ties broken alphabetically).
- Largest Images Detection: Uses identify to detect image dimensions and returns the top N largest images by pixel count, sorted in descending order (ties broken alphabetically).
- Vacant Directory Identification: A vacant directory contains no files, even recursively. Only top-level vacant directories are reported (subdirectories of an already vacant directory are excluded), in alphabetical order.
- Efficient System Calls: Minimizes the number of stat(), opendir(), readdir(), fopen(), and popen() calls for optimized file traversal and processing.
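The word-counting rule above can be sketched as follows. This is a minimal illustration, not the code in main.cpp: the function name topWords is hypothetical, and it assumes a word is a maximal run of alphabetic characters (shorter runs are skipped) that is lowercased before counting.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical helper: returns the n most frequent words found in `text`.
// Assumption: a word is a maximal run of alphabetic characters with
// length >= 5, lowercased so that counting is case-insensitive.
std::vector<std::pair<std::string, int>> topWords(const std::string& text, std::size_t n) {
    std::unordered_map<std::string, int> counts;
    std::string current;

    // Count the run collected so far if it is long enough, then reset it.
    auto flush = [&]() {
        if (current.size() >= 5) ++counts[current];
        current.clear();
    };

    for (unsigned char c : text) {
        if (std::isalpha(c)) {
            current.push_back(static_cast<char>(std::tolower(c)));
        } else {
            flush();
        }
    }
    flush();  // handle a word that ends exactly at the end of the text

    std::vector<std::pair<std::string, int>> result(counts.begin(), counts.end());

    // Highest count first; equal counts fall back to alphabetical order.
    std::sort(result.begin(), result.end(), [](const auto& a, const auto& b) {
        if (a.second != b.second) return a.second > b.second;
        return a.first < b.first;
    });

    if (result.size() > n) result.resize(n);
    return result;
}
```

Sorting the full table is the simplest way to get the tie-breaking right. For very large vocabularies, std::partial_sort with the same comparator would avoid ordering entries that are never reported.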
- N specifies how many of the most common words and largest images to return.
- <directory_name> is the name of the directory to analyze.
- Example usage (see the usage() function inside of main.cpp for more help): ./analyzeDir 5 ./test11 will return the 5 most common words and 5 largest images in directory test11, along with other stats.
- Assumes that no file or directory name contains spaces or quotation marks.
- Assumes each file path contains fewer than 4096 characters.
- If multiple words have the same number of occurrences, or multiple images have the same number of pixels, they are returned in alphabetical order.
- Top-level vacant directories are returned in alphabetical order.
- Considers all files as potential images, regardless of their extension.
- Considers only files with the .txt extension when calculating the most common words.
- Calls to identify via popen() introduce some overhead, especially with many files, because each popen() call forks a shell, which in turn launches identify.
- To benchmark performance, run the program twice; the second run benefits from a warm filesystem cache and gives a more consistent timing:
time ./analyzeDir 10 ./test_directory # first run
time ./analyzeDir 10 ./test_directory # second run (more accurate)
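As an illustration of the popen() approach mentioned above, here is a minimal sketch of querying image dimensions through a pipe. The names runCommand, parseSize, imageSize, and ImageSize are hypothetical, and the sketch assumes identify is ImageMagick's command, whose -format option accepts %w and %h for width and height. The exact command line used in main.cpp may differ.

```cpp
#include <stdio.h>   // popen, pclose (POSIX)
#include <sstream>
#include <string>

// Hypothetical result type: `valid` stays false when no dimensions were parsed.
struct ImageSize {
    long width = 0;
    long height = 0;
    bool valid = false;
};

// Runs `command` through popen() and returns everything it wrote to stdout.
std::string runCommand(const std::string& command) {
    std::string output;
    FILE* pipe = popen(command.c_str(), "r");
    if (pipe == nullptr) return output;  // the shell could not be started

    char buffer[4096];
    while (fgets(buffer, sizeof(buffer), pipe) != nullptr) {
        output += buffer;
    }
    pclose(pipe);
    return output;
}

// Parses the first "<width> <height>" pair found in `text`.
ImageSize parseSize(const std::string& text) {
    ImageSize size;
    std::istringstream in(text);
    if (in >> size.width >> size.height && size.width > 0 && size.height > 0) {
        size.valid = true;
    }
    return size;
}

// Assumption: ImageMagick's identify is installed. One line is printed per
// frame, so only the first frame of a multi-frame image is used. Errors for
// non-image files are discarded, which leaves the output empty.
ImageSize imageSize(const std::string& path) {
    // The path is pasted into the command unquoted, which relies on the
    // stated assumption that names contain no spaces or quotation marks.
    return parseSize(runCommand("identify -format '%w %h\\n' " + path + " 2>/dev/null"));
}
```

A caller could rank images by width * height and skip any file whose result is not valid, which matches treating every file as a potential image.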
git clone https://github.com/yourusername/Directory-Analyzer.git
cd Downloads/Directory-Analyzer[-main] # -main will only be in the folder name if you downloaded as a ZIP
make
./analyzeDir <N> <directory_name>
Example:
./analyzeDir 5 test11
Example Output:
--------------------------------------------------------------
Largest file: "some_dir/largest_file.txt"
Largest file size: 10485760
Number of files: 342
Number of dirs: 45
Total file size: 123456789
Most common words from .txt files:
- "example" x 32
- "directory" x 25
- "analyze" x 21
Vacant directories:
- "empty_dir1"
- "empty_dir2"
Largest images:
- "images/img1.png" 1920x1080
- "images/img2.jpg" 1280x720
To remove the compiled binary and object files:
make clean
This will delete all .o files and the analyzeDir executable.