Directory-Analyzer

A program that analyzes a directory and reports statistics such as the largest file, the number of files and directories, the most common words in text files, the largest images, and vacant directories. It traverses the directory tree and processes file contents using system calls and C++ standard library functions.

Features

Largest File Detection: Identifies and reports the largest file in the directory, including its path and size.
Total File and Directory Count: Computes the number of files and directories recursively within the given directory.
Total File Size Calculation: Aggregates the size of all files in the directory.
Most Common Words in .txt Files: A word is a sequence of at least 5 alphabetic characters, compared case-insensitively. Results are sorted by frequency in descending order (ties broken alphabetically).
Largest Images Detection: Uses the identify command to detect image dimensions, and returns the top N largest images by pixel count, sorted in descending order (ties broken alphabetically).
Vacant Directory Identification: A vacant directory contains no files, even recursively; reports only top-level vacant directories (subdirectories of already vacant directories are excluded). Returned in alphabetical order.
Efficient System Calls: Minimizes the number of calls to stat(), opendir(), readdir(), fopen(), and popen() during file traversal and processing.

Usage / Limitations

Running the Directory Analyzer:

./analyzeDir <N> <directory_name>

<N> specifies how many of the most common words and largest images to return.
<directory_name> is the name of the directory to analyze.
Example usage (see the usage() function inside of main.cpp for more help):

./analyzeDir 5 ./test11

will return the 5 most common words and 5 largest images in directory test11, along with other stats.

Limitations:

Assumes that no file or directory name contains spaces or quotation marks.
Assumes each file path contains fewer than 4096 characters.
If multiple words have the same number of occurrences, or multiple images have the same number of pixels, they are returned in alphabetical order.
Top-level vacant directories are returned in alphabetical order.
Considers all files as potential images, regardless of their extension.
Considers only files with the .txt extension when calculating the most common words.
Calls to identify via popen() introduce some overhead, especially with many files: popen() is a library function that forks a shell, and the shell in turn typically starts identify as a separate process.
To benchmark performance, run the program twice so that the second run measures a warm filesystem cache:

time ./analyzeDir 10 ./test_directory               # first run
time ./analyzeDir 10 ./test_directory               # second run (more accurate)
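
The popen() overhead listed under Limitations comes from a pattern like the one below. This is a sketch under stated assumptions, not the project's actual code: run_command, parse_size, and image_size are hypothetical names, and the -format '%w %h\n' argument is one way to ask ImageMagick's identify for just the width and height. The path is concatenated unquoted, which is only safe under the stated limitation that names contain no spaces or quotation marks.

```cpp
#include <stdio.h>   // popen, pclose (POSIX)
#include <optional>
#include <string>

struct ImageSize {
    long width;
    long height;
};

// Hypothetical helper: runs `cmd` through popen() and returns its stdout.
// Each call starts a shell, which is where the per-file cost comes from.
std::string run_command(const std::string& cmd) {
    std::string out;
    FILE* pipe = popen(cmd.c_str(), "r");
    if (!pipe) return out;
    char buf[4096];
    while (fgets(buf, sizeof buf, pipe)) out += buf;
    pclose(pipe);
    return out;
}

// Hypothetical helper: parses "<width> <height>" from the start of `text`.
// Returns nullopt if two positive integers are not found.
std::optional<ImageSize> parse_size(const std::string& text) {
    ImageSize s{};
    if (sscanf(text.c_str(), "%ld %ld", &s.width, &s.height) != 2)
        return std::nullopt;
    if (s.width <= 0 || s.height <= 0) return std::nullopt;
    return s;
}

// Hypothetical helper: asks identify for the dimensions of `path`.
// Non-image files produce no usable output and yield nullopt.
std::optional<ImageSize> image_size(const std::string& path) {
    // Assumption: `path` has no spaces or quotes, so it is passed unquoted.
    return parse_size(
        run_command("identify -format '%w %h\\n' " + path + " 2>/dev/null"));
}
```

Since every file is treated as a potential image, a loop over image_size() pays the process-creation cost once per file, which is why large directories are dominated by this step.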

If you want to set up the project on your local machine:

1. Download the code as a ZIP and unzip it or clone the repository:

git clone https://github.com/yourusername/Directory-Analyzer.git

2. Open a terminal and navigate into the repository:

cd Downloads/Directory-Analyzer[-main]        # -main will only be in the folder name if you downloaded as a ZIP

3. Compile the code:

make

4. Run the code:

./analyzeDir <N> <directory_name>

Example:

./analyzeDir 5 test11

Example Output:

--------------------------------------------------------------
Largest file:      "some_dir/largest_file.txt"
Largest file size: 10485760
Number of files:   342
Number of dirs:    45
Total file size:   123456789
Most common words from .txt files:
 - "example" x 32
 - "directory" x 25
 - "analyze" x 21
Vacant directories:
 - "empty_dir1"
 - "empty_dir2"
Largest images:
 - "images/img1.png" 1920x1080
 - "images/img2.jpg" 1280x720

Cleaning Up:

To remove the compiled binary and object files:

make clean

This will delete all .o files and the analyzeDir executable.
