This C++ project identifies every possible full sequence of words in a whitespace-less sequence of characters, as described in Project Instructions.txt
I am offering this code unlicensed (it's just a project for a class that was cancelled at the last minute due to air quality issues), but if you use any part of my code or concepts, please credit the original source in your file (a link to this page will suffice). Thanks!
Compile and run in a UNIX shell such as Bash (for example, via Terminal):
g++ -o file_recreation file_recreation.cpp
g++ -o file_recreation2 file_recreation2.cpp
Pass two arguments after the program name: the filepath to the dictionary .txt file, then the filepath to a "compressed" input .txt file.
./file_recreation dictionary.txt Examples/Ex1.txt
./file_recreation2 dictionary2.txt Examples/Ex2.txt
The output files, written to a generated Output directory, represent every possible "original file", where the dictionary .txt file defines what counts as a "word".
The rankings.txt file (also found in the Output directory) lists the rank of each file's content. In addition, the console prints up to 3 of the lowest-ranked files, which are the files with the highest probability of being the original file.
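
As a rough illustration of the enumeration idea (finding every way to split a whitespace-less string into dictionary words), here is a minimal backtracking sketch. It is not the code in file_recreation.cpp; the names `segment` and `all_segmentations` are hypothetical, and the dictionary is assumed to be already loaded into a `std::unordered_set`. Ranking is not covered.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: tries every dictionary word that starts at `start`,
// then recurses on the remainder. `current` holds the words chosen so far;
// each complete split of `text` is appended to `results`.
void segment(const std::string& text,
             const std::unordered_set<std::string>& dictionary,
             std::size_t start,
             std::vector<std::string>& current,
             std::vector<std::vector<std::string>>& results) {
    if (start == text.size()) {
        // Every character has been consumed, so `current` is a full split.
        results.push_back(current);
        return;
    }
    for (std::size_t end = start + 1; end <= text.size(); ++end) {
        const std::string candidate = text.substr(start, end - start);
        if (dictionary.count(candidate) != 0) {
            current.push_back(candidate);
            segment(text, dictionary, end, current, results);
            current.pop_back();  // Backtrack and try a longer candidate.
        }
    }
}

// Hypothetical entry point: returns every sequence of dictionary words whose
// concatenation equals `text`. The result is empty if no full split exists.
std::vector<std::vector<std::string>> all_segmentations(
    const std::string& text,
    const std::unordered_set<std::string>& dictionary) {
    std::vector<std::vector<std::string>> results;
    std::vector<std::string> current;
    segment(text, dictionary, 0, current, results);
    return results;
}
```

For example, with the dictionary {cat, cats, and, sand, dog}, calling `all_segmentations("catsanddog", dictionary)` yields two splits: "cat sand dog" and "cats and dog". The number of splits can grow exponentially with input length, so this plain recursion is only suitable for small inputs.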