A program that analyzes directory contents using system calls, identifying file sizes, most common words, largest images, and vacant directories.

prempreetbrar/Directory-Analyzer

Directory-Analyzer

A program that analyzes a directory and provides statistics such as the largest file, number of files and directories, most common words in text files, largest images, and vacant directories. It efficiently traverses the directory tree and processes file contents using system calls and C++ standard library functions. Watch a GIF of me interacting with it below!

run  

Features

  • Largest File Detection: Identifies and reports the largest file in the directory, including its path and size.
  • Total File and Directory Count: Computes the number of files and directories recursively within the given directory.
  • Total File Size Calculation: Aggregates the size of all files in the directory.
  • Most Common Words in .txt Files: A word is a sequence of at least 5 alphabetic characters, matched case-insensitively. Results are sorted by frequency in descending order (ties broken alphabetically).
  • Largest Images Detection: Uses identify to detect image dimensions and returns the top N largest images by pixel count, sorted in descending order (ties broken alphabetically).
  • Vacant Directory Identification: A vacant directory contains no files, even recursively; reports only top-level vacant directories (subdirectories of already vacant directories are excluded). Returned in alphabetical order.
  • Efficient System Calls: Minimizes the number of calls to stat(), opendir(), readdir(), fopen(), and popen() for optimized file traversal and processing.
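The traversal-related features above (file and directory counts, total size, largest file, and top-level vacant directories) can be sketched in a single recursive pass. This is a minimal illustration and not the code in this repository: DirStats, scan(), and analyze() are hypothetical names, and counting the root directory, skipping symlinks, and reporting full paths are all assumptions.

```cpp
#include <dirent.h>
#include <sys/stat.h>

#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result container; the field names are illustrative only.
struct DirStats {
    long numFiles = 0;
    long numDirs = 0;          // assumption: the root directory is counted too
    long long totalSize = 0;
    long long largestSize = -1;
    std::string largestFile;
    std::vector<std::string> vacantDirs;
};

// Scans `path` recursively and returns how many regular files lie beneath it.
long scan(const std::string& path, DirStats& stats) {
    DIR* dir = opendir(path.c_str());
    if (dir == nullptr) return 0;  // unreadable: treated as holding no files
    stats.numDirs++;
    long filesBelow = 0;
    while (dirent* entry = readdir(dir)) {
        const std::string name = entry->d_name;
        if (name == "." || name == "..") continue;
        const std::string child = path + "/" + name;
        struct stat st;
        // lstat() does not follow symlinks, so links are neither counted nor entered.
        if (lstat(child.c_str(), &st) != 0) continue;
        if (S_ISDIR(st.st_mode)) {
            const std::size_t mark = stats.vacantDirs.size();
            const long n = scan(child, stats);
            if (n == 0) {
                // Drop vacant descendants recorded inside `child`; report `child` itself.
                stats.vacantDirs.resize(mark);
                stats.vacantDirs.push_back(child);
            }
            filesBelow += n;
        } else if (S_ISREG(st.st_mode)) {
            stats.numFiles++;
            stats.totalSize += st.st_size;
            if (st.st_size > stats.largestSize) {
                stats.largestSize = st.st_size;
                stats.largestFile = child;
            }
            filesBelow++;
        }
    }
    closedir(dir);
    return filesBelow;
}

// Entry point of the sketch: one opendir() per directory, one lstat() per entry.
DirStats analyze(const std::string& root) {
    DirStats stats;
    // Assumption: a root with no files at all is itself the only vacant directory.
    if (scan(root, stats) == 0) stats.vacantDirs = {root};
    std::sort(stats.vacantDirs.begin(), stats.vacantDirs.end());
    return stats;
}
```

Calling analyze("./test11") would fill a DirStats for that tree. Having scan() return the file count of each subtree is what lets a parent replace its vacant descendants with a single top-level entry.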

Usage / Limitations

Running the Directory Analyzer:

  • N specifies how many of the most common words and largest images to return.
  • <directory_name> is the name of the directory to analyze.
  • Example usage (see the usage() function inside of main.cpp for more help):
./analyzeDir 5 ./test11

will return the 5 most common words and 5 largest images in directory test11, along with other stats.

Limitations:

  • Assumes that no file or directory name contains spaces or quotation marks.
  • Assumes each file path contains fewer than 4096 characters.
  • If multiple words have the same number of occurrences, or multiple images have the same number of pixels, they are returned in alphabetical order.
  • Top-level vacant directories are returned in alphabetical order.
  • Considers all files as potential images, regardless of their extension.
  • Considers only files with the .txt extension when calculating the most common words.
  • Calls to identify via popen() introduce some overhead, especially with many files, because each popen() call starts a shell process that in turn launches identify.
  • To benchmark performance, run the program twice; the second run is less affected by cold filesystem caches and gives a more representative timing:
time ./analyzeDir 10 ./test_directory               # first run
time ./analyzeDir 10 ./test_directory               # second run (more accurate)
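To make the popen() cost mentioned in the limitations concrete, here is a hedged sketch of querying one file's dimensions. It assumes identify is the ImageMagick command-line tool and that its -format '%w %h' option prints width and height; the exact command this project runs is not shown in the README. The names runCommand(), parseSize(), and imageSize() are hypothetical.

```cpp
#include <cstdio>
#include <string>

struct ImageSize {
    long width = 0;
    long height = 0;
};

// Runs `command` through popen() and returns everything it wrote to stdout.
std::string runCommand(const std::string& command) {
    std::string output;
    FILE* pipe = popen(command.c_str(), "r");
    if (pipe == nullptr) return output;
    char buffer[4096];
    while (fgets(buffer, sizeof buffer, pipe) != nullptr) output += buffer;
    pclose(pipe);
    return output;
}

// Parses "<width> <height>"; returns false for anything else (e.g. empty output).
bool parseSize(const std::string& text, ImageSize& size) {
    return std::sscanf(text.c_str(), "%ld %ld", &size.width, &size.height) == 2 &&
           size.width > 0 && size.height > 0;
}

// Assumed invocation of ImageMagick's identify. The path is not quoted, which
// relies on the stated limitation that names contain no spaces or quotes.
bool imageSize(const std::string& path, ImageSize& size) {
    const std::string command = "identify -format '%w %h' " + path + " 2>/dev/null";
    return parseSize(runCommand(command), size);
}
```

Because every file is treated as a potential image, a non-image file simply produces no parsable output and imageSize() returns false. Each call pays for a shell plus an identify process, which is why the number of popen() calls matters on large directories.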

If you want to set up the project on your local machine:

1. Download the code as a ZIP and unzip it or clone the repository:

git clone https://github.com/prempreetbrar/Directory-Analyzer.git

2. Open a terminal and navigate into the repository:

cd Downloads/Directory-Analyzer[-main]        # -main will only be in the folder name if you downloaded as a ZIP



3. Compile the code:

make



4. Run the code:

./analyzeDir <N> <directory_name>

Example:

./analyzeDir 5 test11



Example Output:

--------------------------------------------------------------
Largest file:      "some_dir/largest_file.txt"
Largest file size: 10485760
Number of files:   342
Number of dirs:    45
Total file size:   123456789
Most common words from .txt files:
 - "example" x 32
 - "directory" x 25
 - "analyze" x 21
Vacant directories:
 - "empty_dir1"
 - "empty_dir2"
Largest images:
 - "images/img1.png" 1920x1080
 - "images/img2.jpg" 1280x720

Cleaning Up:

To remove the compiled binary and object files:

make clean



This will delete all .o files and the analyzeDir executable.
