Skip to content

port functional version of partitions-M algorithm #116

@sritchie

Description

@sritchie

notes on a working version:

;;;;;; Partitions - [Algorithm M](https://archive.org/details/B-001-001-251/page/428/mode/2up)

;; In Algorithm M, the idea is to find the partitions of a list of items that
;; may contain duplicates. Within the algorithm, the collections are stored
;; as "multisets," which are maps that map items to their frequency. (keyval
;; pairs with a value of 0 are not included.) Note that in this algorithm, the
;; multisets not are stored as maps, but all multisets are stored together
;; across multiple vectors.

;; Here is what the internal vectors/variables will look like when the algorithm
;; is visiting the partition ([1 1 2 2 2] [1 2] [1]):

;; TODO make clear that the f row is totally wrong here??

;; c[i] =      1 2|1 2|1
;; v[i] =      2 3|1 1|1
;; u[i] =      4 4|2 1|1
;; ---------------------------
;;    i =      0 1 2 3 4 5
;; f[x]=i:     0   1   2 3
;; l = 2
;; n = 8
;; m = 2

;; You can think of (c,v) and (c,u) as the (keys,vals) pairs of two multisets.
;; u[i] represents how many c[i]'s were left before choosing the v values for the current partition.
;; (Note that v[i] could be 0 if u[i] is not 0.)
;; f[x] says where to begin looking in c, u, and v, to find information about the xth partition.
;; l is the number of partitions minus one.
;; n is the total amount of all items (including duplicates).
;; m is the total amount of distinct items.

;; During the algorithm, a and b are temporary variables that end up as f(l) and
;; f(l+1). In other words, they represent the boundaries of the "workspace" of
;; the most recently written-out partition.
;;
;; NOTE this now makes sense... they are the bounds of the current one that
;; we're working on?

(declare m2 m5 m6)

(def !counter (atom -1))

(defn- multiset-partitions-M
  ;; NOTE that this first arity is only ever called from the start.
  ([multiset r s] ;; M1
   ;; TODO we already know N... just pass it??
   (let [;; the number of distinct items.
         m (count multiset)

         ;; NOTE `f` consists of indices of the STARTS of each of the pieces of
         ;; the partition being considered. So here we start with 0, and
         ;; probably we could rewrite this shit to not have the final element.
         ;; But let's see how it goes.
         f [0 m]
         c []
         u []
         v []
         ;; NOTE that this is the initialization. These vectors will grow over
         ;; time, as new values are assoc'd into the next spots.
         ;;
         ;; NOTE `c` ends up as the keys, it's just the range. u and v end up as the values.
         [c u v] (loop [j 0, c c, u u, v v]
                   (if (= j m)
                     [c u v]
                     (let [j+1   (inc j)
                           j+1-v (multiset j+1)]
                       (recur j+1
                              (assoc c j j+1)
                              (assoc u j j+1-v)
                              (assoc v j j+1-v)))))]
     (reset! !counter -1)
     (multiset-partitions-M f c u v r s)))
  ;;`r` and `s` are the max and min bounds, respectively
  ([f c u v r s]
   (prn c u v)
   ;; "At this point we want to find all partitions of the vector u in the
   ;; current frame, into parts that are lexicographically < v. First we will
   ;; use v itself."

   ;; so in this loop, we are starting with the current frame, and writing a NEW
   ;; frame to the right.
   (let [n-blocks      (dec (count f))
         [f' c' u' v'] (m2 f c u v)]
     (cond
       ;; Did we march forward?
       (> (count f') (count f))
       (if (and r (= n-blocks r))
         (m5 f c u v r s)
         (recur f' c' u' v' r s))

       ;; Did we NOT march forward, but we don't have enough blocks yet?
       (and s (< n-blocks s))
       (do (swap! !counter inc)
           (m5 f c u v r s))

       :else
       (lazy-seq
        (let [part (for [[p q] (partition 2 1 f)]
                     ;; TODO recover the zero filter?
                     (zipmap (subvec c p q)
                             (subvec v p q)))]
          (cons part (m5 f c u v r s))))))))


;; WTF... okay, so
(defn- m2
  "Figure out the next partition conj-ed onto the end, AND choose the `v`!"
  [f c u v]
  ;; Remember, `a` and `b` are the bounds of the current stack frame. So we are
  ;; going to roll through the `subvec` from a to b-1, writing something new
  ;; from `b` onward. It would be more "functional" to build the new thing vs
  ;; writing it on the end, at least conj-ing it??
  ;;
  ;; NOTE we are setting the new row of `u` by subtracting the current `v` from
  ;; the current `u` to add a new partition. Then we set the new `v` by either
  ;; copying over the old one, or copying `u`.
  (let [a (peek (pop f))
        b (peek f)]
    ;; The assumption is obviously that you are going to hit a 0 through
    ;; subtraction ONLY, and then you are going to wipe out all of the zeros to
    ;; the right of that.
    ;;
    ;; Leading zeros should not matter.
    #_(assert (not (zero? (v a))))
    (loop [j a
           k b
           leading? true
           v-changed? false
           c c
           u u
           v v]
      (cond
        ;; This fix should work...
        #_#_(and leading?
                 (< j b)
                 (try (zero? (v j))
                      (catch Exception _ (prn f c u v) (throw _))))
        (recur (inc j) k true v-changed? c u v)

        (< j b)
        (let [vj (v j)
              uk (- (u j) vj)]
          (if (zero? uk)
            (recur (inc j) k false true c u v)
            (let [c (assoc c k (c j))
                  u (assoc u k uk)
                  v (assoc v k (if v-changed?
                                 uk
                                 (min uk vj)))
                  v-changed? (or v-changed? (< uk vj))]
              (recur (inc j) (inc k) false v-changed? c u v))))

        :else
        (let [f' (if (or (= k b)
                         (every? zero? (subvec v b k)))
                   f
                   (conj f k))]
          [f' c u v])))))

;; SO IF you have a prefix of zeros... then the first

;; So once `changed?` becomes true, it can never go unchanged again.

(defn- m5 [f c u v r s]
  (let [a  (peek (pop f))
        ;; TODO replace with `(count c)` don't keep final one!
        b  (peek f)
        l (- (count f) 2) ;; index of second-to-last elem
        ;; Also assumes no fully zero entries.
        j (loop [j (dec b)]
            ;; Go backwards to the first non-zero j entry, starting with `(dec
            ;; b)`.
            (if (zero? (v j))
              (recur (dec j))
              j))]
    ;; We are in the LAST partition of the bunch; given a limit, restricting
    ;; ourselves to the case where we are dealing with a set of a SINGLE
    ;; element... (= j a)
    ;;
    ;; The first predicate says, given that we are a single-elem set, if we
    ;; decrement, is `m2` going to make a bunch of stuff automatically that we
    ;; are going to end up skipping? Nice optimization...

    ;; The second predicate says, are we in a singleton? skip too.
    #_
    (when-not (= (v j) 1)
      (prn "remaining in set:   " (u j))
      (prn "budget:             " (- r l))
      (prn "if we dec m2 makes: " (> (quot (u j) (dec (v j)))
                                     (- r l)))
      ;; which equals...
      (prn "skip?               " (< (* (dec (v j)) (- r l))
                                     (u j))))
    ;; TODO SO what is the equivalent thing to do for minimums??
    ;;
    ;; Well, if you don't yet have enough, you can predict what is going to
    ;; happen and see if it leads to a place greater than the minimums. What he
    ;; is doing, I think, is reshuffling the remaining amounts so that he
    ;; DEFINITELY generates something beyond the mins.
    (cond
      ;; OKAY so the first check is, am I going to create a tail of singletons
      ;; that are going to be immediately trimmed off? I was trying to prevent
      ;; anything beyond that size from ever getting generated. but that is a
      ;; failure, I think, since you might need to generate a ton then trim off
      ;; the ends...
      (and (= j a)
           (or (= (v j) 1)
               (and r
                    (let [new-val (dec (v j))
                          uj      (u j)]
                      ;; (inc l) is the # of partitions
                      ;; (dec ...) is the number of new. cancels out of course.
                      (> (+ (inc l)
                            (dec (quot uj new-val)))
                         r)))))
      (if (zero? l)
        ()
        (recur (pop f)
               (subvec c 0 a)
               (subvec u 0 a)
               (subvec v 0 a)
               r
               s))

      :else
      ;; Decrement `v_j` and set all remaining elements in THIS partition to
      ;; `u`. NOTE: Every time you adjust `v_j` in any capacity, you set the
      ;; rest of the partition to `u`.
      (let [v      (update v j dec)
            prefix (subvec v 0 (inc j))
            v      (into prefix (subvec u (inc j)))
            amount-to-dec (if s
                            (let [diff-uv (apply + (for [i (range a (inc j))]
                                                     (- (u i) (v i))))
                                  min-partitions-left (- s (inc l))]
                              (prn "hi: " u v (- min-partitions-left diff-uv))
                              (max 0 (- min-partitions-left diff-uv)))
                            0)
            ;; So we are taking a single decrement... and we are then saying,
            ;;
            ;; 1. how many are left to allocate of these element types
            v (if (zero? amount-to-dec)
                v
                ;; go back from the end of the partition... now remember we have
                ;; a fixed `u` budget and we are deciding what set we'd like to
                ;; pull. USUALLY we just decrement, which ends up, say, pulling
                ;; out that single element into its own thing.
                ;;
                ;; Here we are doing that... but then we are additionally
                ;; modifying that `v`.
                ;;
                ;; The partition came in something like u=[2 2 2] and v=[2 1 0].
                ;; The DIFFERENCE here is going to be the next `row`, ie, the
                ;; set of leftovers available after this partition gets served.
                ;;
                ;; So when we subtract from there - `diff-uv` is the cardinality
                ;; of the block... like if we kept decrementing like `m5` we'd
                ;; be subtracting off these nonzero elems.
                ;;;;
                ;;
                (loop [k-1    (dec b)
                       v      v
                       amount amount-to-dec]
                  (let [vk (v k-1)]
                    (if (> amount vk)
                      (recur (dec k-1)
                             (assoc v k-1 0)
                             (- amount vk))
                      (assoc v k-1 (- vk amount))))))]
        ;; Here is the fix.
        (if (zero? (v a))
          (recur (pop f)
                 (subvec c 0 a)
                 (subvec u 0 a)
                 (subvec v 0 a)
                 r
                 s)
          (multiset-partitions-M f c u v r s))))))

(defn items->multiset
  "returns [ditems, multiset]"
  [items]
  (let [freqs  (frequencies items)
        ditems (into [] (distinct) items)]
    [ditems (into {} (map-indexed
                      (fn [i item]
                        (let [j (inc i)]
                          [j (freqs item)])))
                  ditems)]))

(defn multiset->items
  "Returns the items."
  [ditems mset]
  (into [] (mapcat
            (fn [[i n]]
              (repeat n (ditems (dec i)))))
        mset))

(defn- partitions-M
  [items & {from :min to :max}]
  (let [N (count items)]
    (if (= N 0)
      (if (<= (or from 0) 0 (or to 0))
        '(())
        ())
      ;; `from` and `to` only make sense inside the bounds.
      (let [from (if (and from (<= from 1)) nil from)
            to   (if (and to (>= to N)) nil to)]
        (cond
          ;; Check if the order is reversed?
          (not (<= 1 (or from 1) (or to N) N)) ()
          (= N 1) (list (list [(first items)]))
          :else
          (let [[ditems start-multiset] (items->multiset items)]
            (for [part (multiset-partitions-M start-multiset to from)]
              (for [multiset part]
                (multiset->items ditems multiset)))))))))

(defn partitions
  "All the lexicographic distinct partitions of items.
    Optionally pass in :min and/or :max to specify inclusive bounds on the number of parts the items can be split into."
  [items & args]
  (cond
    (= (count items) 0) (apply partitions-H items args)
    (all-different? items) (apply partitions-H items args)
    :else (apply partitions-M items args)))

(defn m2-test [f c u v i]
  (prn f c u v)
  (let [[f' c' u' v'] (m2 f c u v)]
    (cond (> i 4) [:failed]
          (> (count f') (count f))
          (recur f' c' u' v' (inc i))
          :else [f c u v])))

(defn checker [f c u v]
  (let [[f'] (m2-test f c u v 0)]
    (if (keyword? f')
      f'
      (- (- (count f') 2)
         (- (count f) 2)))))

#_
(checker [0 3]
         [1 2 3]
         [4 4 4]
         [0 1 2])


;; FINDINGS:
;;
;; First thing is that we might have to deal with leading zeros. Can we prevent that from happening, and NOT get back to a situation where we have one of those? I bet we can ignore that happening if we look at how Alex was shifting around the digits, and make sure there is never a leading zero. It makes sense, since we can't ever decrement... maybe?
;;
;; Next thing we need to do is make a better prediction for the max bound.

;;
;; Then I'll go back and try to figure out what to do about the min bound, and
;; what it is generating.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions