This is the code for the paper "Finimizers: Variable-length bounded-frequency minimizers for k-mer sets" by J. N. Alanko, E. Biagi, and S. J. Puglisi.
First, pull the submodules with:
git submodule update --init --recursive
Then, go to the SBWT submodule and build it following the instructions in the submodule, for example with:
cd SBWT/build
cmake .. -DCMAKE_C_COMPILER=$(which gcc-10) -DCMAKE_CXX_COMPILER=$(which g++-10) -D MAX_KMER_LENGTH=250
make -j4
cd ../..
Select the desired branch: main (single index) or double (double index, which also includes reverse complements). The following instructions are for the main branch. Compile the experiments with:
make benchmark --always-make CXX=g++-10
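The double branch additionally indexes reverse complements. As a reminder of what that operation is, here is a minimal C++ sketch of the reverse complement over {A, C, G, T} and of the canonical form (the lexicographically smaller of a k-mer and its reverse complement). The helper names are hypothetical and not taken from this repository or from SBWT.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Complement of a single DNA base; input is assumed to be in {A,C,G,T}.
char complement(char c) {
    switch (c) {
        case 'A': return 'T';
        case 'C': return 'G';
        case 'G': return 'C';
        case 'T': return 'A';
    }
    assert(false && "base outside {A,C,G,T}");
    return 'N';
}

// Reverse complement: reverse the string, then complement every base.
std::string reverse_complement(const std::string& s) {
    std::string rc(s.rbegin(), s.rend());
    std::transform(rc.begin(), rc.end(), rc.begin(), complement);
    return rc;
}

// Canonical form: the lexicographically smaller of a k-mer and its reverse complement.
std::string canonical(const std::string& kmer) {
    return std::min(kmer, reverse_complement(kmer));
}
```

For example, the reverse complement of AAC is GTT, so both AAC and GTT have canonical form AAC.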
The code takes as input an SBWT file in plain-matrix format, built from canonical unitigs. You can build one, for example with k = 31, by running:
./SBWT/build/bin/sbwt build -i <unitigs.fna> -o <index.sbwt> -k 31
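The SBWT represents the set of distinct k-mers contained in the unitigs. As a rough illustration only (a hypothetical helper, not part of SBWT or of this repository), the following C++ sketch collects that k-mer set by sliding a window of length k over each unitig, keeping k-mers in the orientation in which they appear and adding no reverse complements.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Collects the distinct k-mers (the k-mer spectrum) of a set of sequences,
// in the orientation in which they appear.
std::set<std::string> kmer_spectrum(const std::vector<std::string>& seqs, std::size_t k) {
    assert(k > 0);
    std::set<std::string> kmers;
    for (const auto& s : seqs)
        for (std::size_t i = 0; i + k <= s.size(); ++i)
            kmers.insert(s.substr(i, k));
    return kmers;
}
```

Sequences shorter than k contribute no k-mers, and repeated k-mers are stored once.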
Then, you can build the Finimizers index with:
./benchmark build-fmin -o <finimizer-index> -i <index.sbwt> -u <unitigs.fna> [--lcs LCS.sdsl] [-t 1] [--type rarest]
Usage:
build-fmin [OPTION...]
  -o, --out-file arg    Output index filename prefix.
  -i, --index-file arg  SBWT file. This has to be a binary matrix.
  -u, --in-file arg     The unitigs in FASTA or FASTQ format, possibly gzipped.
                        Multi-line FASTQ is not supported.
      --type arg        Decide which streaming search type you prefer. 
                        Available types:  rarest shortest verify.
                        The latter two only provide some stats. (default: rarest)
  -t arg                Maximum finimizer frequency (default: 1)
      --lcs arg         Provide in input the LCS file if available. 
                        (default: "")
  -h, --help            Print usage
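The -t option bounds the frequency of the selected substrings. To make the idea concrete, the following brute-force C++ sketch picks, for a k-mer x, the shortest substring whose frequency is at most t (ties broken by lower frequency, then leftmost position). It rests on a simplifying assumption that is not taken from the paper: the frequency of a string is modelled as the number of k-mers in the set that contain it. It is not the repository's streaming algorithm, and it does not necessarily match any of the --type options exactly; all names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Assumed frequency model: number of k-mers in `kmers` containing `s` as a substring.
std::size_t frequency(const std::string& s, const std::vector<std::string>& kmers) {
    std::size_t count = 0;
    for (const auto& y : kmers)
        if (y.find(s) != std::string::npos) ++count;
    return count;
}

// Brute-force choice of a bounded-frequency substring of x: the shortest substring
// with frequency <= t, ties broken by lower frequency, then leftmost start.
// Returns (start, length). Assumes x is in `kmers` and t >= 1, so x itself qualifies.
std::pair<std::size_t, std::size_t> shortest_bounded_substring(
        const std::string& x, const std::vector<std::string>& kmers, std::size_t t) {
    assert(t >= 1);
    for (std::size_t len = 1; len <= x.size(); ++len) {
        bool found = false;
        std::size_t best_start = 0, best_freq = 0;
        for (std::size_t i = 0; i + len <= x.size(); ++i) {
            std::size_t f = frequency(x.substr(i, len), kmers);
            if (f <= t && (!found || f < best_freq)) {
                found = true;
                best_start = i;
                best_freq = f;
            }
        }
        if (found) return {best_start, len};
    }
    return {0, x.size()};  // only reached if x is absent from kmers
}
```

With the set {ACG, CGT, GTA} and t = 1, no single base of ACG is unique, but AC occurs only in ACG, so the sketch selects (0, 2). Raising t to 2 lets the single base A (frequency 2) qualify, which shows how a larger t yields shorter selected substrings.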
You can then query the index with:
./benchmark search-fmin -o <out-file> -i <finimizer-index> -q <query-file.fa>
Usage:
  search-fmin [OPTION...]
  -o, --out-file arg    Output filename, or stdout if not given.
  -i, --index-file arg  Index filename prefix.
  -q, --query-file arg  The query in FASTA or FASTQ format, possibly gzipped.
                        Multi-line FASTQ is not supported.
  -h, --help            Print usage
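Both commands accept FASTA (where a sequence may span several lines) or single-line FASTQ (four lines per record). The following C++ sketch, with hypothetical function names, illustrates these two record layouts by detecting the format from the first character of the input. It does not handle gzipped input, which the tools do accept.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Reads all sequences from a FASTA stream (records start with '>', sequence lines
// are concatenated) or a single-line FASTQ stream (@header, sequence, +, qualities).
std::vector<std::string> read_sequences(std::istream& in) {
    std::vector<std::string> seqs;
    const int first = in.peek();
    if (first == '@') {
        std::string header, seq, separator, qualities;
        while (std::getline(in, header) && std::getline(in, seq) &&
               std::getline(in, separator) && std::getline(in, qualities)) {
            assert(!separator.empty() && separator[0] == '+');
            seqs.push_back(seq);
        }
    } else if (first == '>') {
        std::string line;
        while (std::getline(in, line)) {
            if (line.empty()) continue;
            if (line[0] == '>') seqs.emplace_back();  // start a new record
            else seqs.back() += line;                 // append a sequence line
        }
    }
    return seqs;
}

// Convenience wrapper for parsing in-memory text.
std::vector<std::string> read_sequences_from_string(const std::string& text) {
    std::istringstream in(text);
    return read_sequences(in);
}
```

A multi-line FASTQ record would be misparsed by the four-lines-per-record loop, which is consistent with the restriction stated in the usage text.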
Localization queries are currently supported only for the "rarest" type. For each k-mer of the query, the output is a pair (unitig id, index), or (-1,-1) if the k-mer is not found.
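When checking localization results on small inputs, a naive hash-map reference can help. The following C++ sketch maps every k-mer of the unitigs to a (unitig id, index) pair and reports (-1,-1) for absent k-mers. It assumes, without confirmation from this README, that unitig ids are 0-based in file order and that the index is the 0-based start offset of the k-mer within its unitig; it matches k-mers in forward orientation only. The names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using Location = std::pair<std::int64_t, std::int64_t>;  // (unitig id, offset)

// Maps each k-mer to its (unitig id, offset). In a DSPSS each k-mer occurs once;
// if there are duplicates, emplace keeps the first occurrence.
std::unordered_map<std::string, Location> build_locator(
        const std::vector<std::string>& unitigs, std::size_t k) {
    assert(k > 0);
    std::unordered_map<std::string, Location> loc;
    for (std::size_t u = 0; u < unitigs.size(); ++u)
        for (std::size_t i = 0; i + k <= unitigs[u].size(); ++i)
            loc.emplace(unitigs[u].substr(i, k),
                        Location{static_cast<std::int64_t>(u), static_cast<std::int64_t>(i)});
    return loc;
}

// One location per k-mer of the query, or (-1,-1) if the k-mer is absent.
std::vector<Location> locate_all(const std::unordered_map<std::string, Location>& loc,
                                 const std::string& query, std::size_t k) {
    std::vector<Location> out;
    for (std::size_t i = 0; i + k <= query.size(); ++i) {
        auto it = loc.find(query.substr(i, k));
        out.push_back(it == loc.end() ? Location{-1, -1} : it->second);
    }
    return out;
}
```

For unitigs ACGTA and GGCC with k = 3, the query CGTT yields (0,1) for CGT and (-1,-1) for GTT.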
The code works with the DNA alphabet {A, C, G, T}.
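Since only {A, C, G, T} is supported, it can be useful to check or clean sequences beforehand. The following C++ sketch (hypothetical helpers) tests whether a string is over this alphabet and splits a sequence into maximal runs of A, C, G, T, dropping other characters such as N. It treats lowercase letters as invalid; whether the tools themselves normalize lowercase input is not stated here.

```cpp
#include <string>
#include <vector>

// True if every character of s is one of A, C, G, T (uppercase only).
bool is_dna(const std::string& s) {
    return s.find_first_not_of("ACGT") == std::string::npos;
}

// Splits s into maximal runs over {A,C,G,T}; any other character ends the current run.
std::vector<std::string> acgt_runs(const std::string& s) {
    std::vector<std::string> runs;
    std::string cur;
    for (char c : s) {
        if (c == 'A' || c == 'C' || c == 'G' || c == 'T') {
            cur += c;
        } else if (!cur.empty()) {
            runs.push_back(cur);
            cur.clear();
        }
    }
    if (!cur.empty()) runs.push_back(cur);
    return runs;
}
```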
Building the SBWT index requires a DSPSS (disjoint spectrum-preserving string set) as input. You can obtain canonical unitigs or eulertigs with ggcat:
ggcat build --min-multiplicity 1 -k <k> --output-file <unitigs.fna> --threads-count 48 <input.fna>
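The "disjoint" part of a DSPSS means that no k-mer occurs more than once across the strings. The following C++ sketch (hypothetical names) checks only this property, not spectrum preservation, which would require the original data. With `canonical` set, a k-mer and its reverse complement count as the same k-mer, which is the relevant notion for canonical unitigs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Reverse complement over {A,C,G,T}.
std::string reverse_complement(const std::string& s) {
    std::string rc(s.rbegin(), s.rend());
    for (char& c : rc) {
        switch (c) {
            case 'A': c = 'T'; break;
            case 'C': c = 'G'; break;
            case 'G': c = 'C'; break;
            case 'T': c = 'A'; break;
            default: assert(false && "base outside {A,C,G,T}");
        }
    }
    return rc;
}

// True if no k-mer occurs more than once across all strings. If `canonical` is set,
// each k-mer is first replaced by the smaller of itself and its reverse complement.
bool is_disjoint(const std::vector<std::string>& strings, std::size_t k, bool canonical) {
    assert(k > 0);
    std::unordered_set<std::string> seen;
    for (const auto& s : strings)
        for (std::size_t i = 0; i + k <= s.size(); ++i) {
            std::string kmer = s.substr(i, k);
            if (canonical) kmer = std::min(kmer, reverse_complement(kmer));
            if (!seen.insert(kmer).second) return false;  // repeated k-mer
        }
    return true;
}
```

For example, ACGTA with k = 3 has distinct forward k-mers, but it is not disjoint in the canonical sense because it contains both ACG and its reverse complement CGT.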
To reduce space usage, we recommend flipping the unitigs with unitig-flipper:
unitig_flipper --input <unitigs.fna> --output <flipped_unitigs.fna> -k <k>
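The criterion unitig-flipper uses to choose orientations is not described here. The following C++ sketch only illustrates why flipping is safe with respect to canonical k-mers: replacing a unitig by its reverse complement leaves its set of canonical k-mers unchanged. The `flip` mask and all function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Reverse complement over {A,C,G,T}.
std::string reverse_complement(const std::string& s) {
    std::string rc(s.rbegin(), s.rend());
    for (char& c : rc) {
        switch (c) {
            case 'A': c = 'T'; break;
            case 'C': c = 'G'; break;
            case 'G': c = 'C'; break;
            case 'T': c = 'A'; break;
            default: assert(false && "base outside {A,C,G,T}");
        }
    }
    return rc;
}

// Set of canonical k-mers (smaller of each k-mer and its reverse complement).
std::set<std::string> canonical_kmers(const std::vector<std::string>& seqs, std::size_t k) {
    assert(k > 0);
    std::set<std::string> out;
    for (const auto& s : seqs)
        for (std::size_t i = 0; i + k <= s.size(); ++i) {
            std::string kmer = s.substr(i, k);
            out.insert(std::min(kmer, reverse_complement(kmer)));
        }
    return out;
}

// Reverse-complements the unitigs selected by `flip` (one flag per unitig).
std::vector<std::string> flip_unitigs(std::vector<std::string> unitigs,
                                      const std::vector<bool>& flip) {
    assert(flip.size() == unitigs.size());
    for (std::size_t i = 0; i < unitigs.size(); ++i)
        if (flip[i]) unitigs[i] = reverse_complement(unitigs[i]);
    return unitigs;
}
```

For example, flipping ACGTA gives TACGT, and the canonical 3-mers of {ACGTA, AACC} and {TACGT, AACC} coincide.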