## rotational duplicates

APL-related discussions - a stream of APL consciousness.
Not sure where to start a discussion ? Here's the place to be
Forum rules
This forum is for discussing APL-related issues. If you think that the subject is off-topic, then the Chat forum is probably a better place for your thoughts !

### rotational duplicates

Skip Cave posted an interesting puzzle on the J Programming Forum on 2020-05-16. A rewording of the problem is as follows:
Given an integer matrix, find the rows which are rotational duplicates of each other. Two rows r and s are rotational duplicates if r≡k⌽s for some integer k.

Some solutions are presented in the following message.
.
.
.
.
.
.
.
.
.
.
.
.
.
Roger|Dyalog

Posts: 208
Joined: Thu Jul 28, 2011 10:53 am

### Re: rotational duplicates

Given an integer matrix, find the rows which are rotational duplicates of each other. Two rows r and s are rotational duplicates if r≡k⌽s for some integer k.

The crux of the problem is to find a signature (key) which uniquely identifies the rotational variants of a vector. The groups of rotational duplicates obtain by (sig mat){⊂⍵}⌸mat. The solutions here use as signature the lexicographically least (first in a sort) of the possible rotations of a vector.

Test Data

`      odometer← ⊢ ⊤⍤1 0 ⍳∘(×/)      x←,⍨ ⍪⍨ odometer 5⍴5      ⍴x6250 10      4↑x0 0 0 0 0 0 0 0 0 00 0 0 0 1 0 0 0 0 10 0 0 0 2 0 0 0 0 20 0 0 0 3 0 0 0 0 3      ¯4↑x4 4 4 4 1 4 4 4 4 14 4 4 4 2 4 4 4 4 24 4 4 4 3 4 4 4 4 34 4 4 4 4 4 4 4 4 4`

sig0

`      sig0← (⊃∘⍋ ⌷ ⊢) ∘ (⍳∘≢ ⌽⍤0 1 ⊢) ⍤1      ⍴ sig0 x6250 10      ⍴ q← ∪ sig0 x629 10      (10↑q) (¯10↑q)┌───────────────────┬───────────────────┐│0 0 0 0 0 0 0 0 0 0│2 4 4 4 3 2 4 4 4 3││0 0 0 0 1 0 0 0 0 1│2 4 4 4 4 2 4 4 4 4││0 0 0 0 2 0 0 0 0 2│3 3 3 3 3 3 3 3 3 3││0 0 0 0 3 0 0 0 0 3│3 3 3 3 4 3 3 3 3 4││0 0 0 0 4 0 0 0 0 4│3 3 3 4 4 3 3 3 4 4││0 0 0 1 1 0 0 0 1 1│3 3 4 3 4 3 3 4 3 4││0 0 0 1 2 0 0 0 1 2│3 3 4 4 4 3 3 4 4 4││0 0 0 1 3 0 0 0 1 3│3 4 3 4 4 3 4 3 4 4││0 0 0 1 4 0 0 0 1 4│3 4 4 4 4 3 4 4 4 4││0 0 0 2 1 0 0 0 2 1│4 4 4 4 4 4 4 4 4 4│└───────────────────┴───────────────────┘`

(Misalignments in the display are due to defects in the APL Chat Forum software.)

sig0 applies independently to each row of the matrix. It finds all the rotations of a row, and selects the first in the sort of these rotations.

sig1

`      sig1← (⊃∘⍋ ⌷ ⊢) ∘ ((⍸ ⊢ = ⌊/) ⌽⍤0 1 ⊢) ⍤ 1      (sig0 ≡ sig1) x1`

The lexicographically least rotation of a vector necessarily begins with the smallest element. Instead of doing all rotations of a row, sig1 works with only those rotations which begins with the smallest element.

sig2

`      sig2←{                                                      (n c)←⍴⍵                                              0 1↓ t ⌷⍨⊂ (⊂c×⍳n)⌷ ⍋t← (c⌿⍳n), ((n×c)⍴⍳c) ⌽⍤¯1 ⊢c⌿⍵      }                                                    (sig0 ≡ sig2) x1`

sig2 is like sig0, working with all rotations of a row, but does so on the entire matrix at once and not separately on each row (not f⍤1).

sig3

`      sig3←{        (n c)←⍴⍵            ⍝ # rows and columns                         r←(⍳n)×1+2×⌈/|,⍵    ⍝ the "radius" increment for each row        q←⍵+⍤¯1⊢r           ⍝ increase each row by the radius            b←q=⍤¯1⌊/q          ⍝ where elements equal a row minimum         s←+/b               ⍝ # times that happens for each row          r -⍤¯1⍨ t ⌷⍨ ⊂ (⊂+\¯1↓0,s) ⌷ ⍋t← (c|⍸,b) ⌽ s⌿q                           }      (sig0 ≡ sig3) x1`

Likewise, sig3 is sig1 reworked to work on the entire matrix at once. It assumes that the maximal element in the entire matrix (needed for the “radius”) is not so large as to consume all available precision. If the maximal element is indeed too large, do sig3⍳⍨x instead of just sig3 x.

Benchmarks

`      cmpx 'sig0 x' 'sig1 x' 'sig2 x' 'sig3 x'  sig0 x → 2.44E¯2 |   0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕  sig1 x → 1.99E¯2 | -19% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕  sig2 x → 1.94E¯2 | -21% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕  sig3 x → 2.73E¯3 | -89% ⎕⎕⎕      w←?100 600⍴10    ⍝ wide matrix      cmpx 'sig0 w' 'sig1 w' 'sig2 w' 'sig3 w'  sig0 w → 4.08E¯2 |    0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕  sig1 w → 3.84E¯3 |  -91% ⎕  sig2 w → 1.24E¯1 | +203% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕  sig3 w → 4.84E¯3 |  -89% ⎕            n←?6000 10⍴10    ⍝ narrow matrix      cmpx 'sig0 n' 'sig1 n' 'sig2 n' 'sig3 n'  sig0 n → 2.40E¯2 |   0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕  sig1 n → 1.68E¯2 | -30% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕  sig2 n → 1.85E¯2 | -23% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕  sig3 n → 1.66E¯3 | -94% ⎕⎕`

These benchmarks illustrate the rule-of-thumb that code that lets primitive functions “see” larger amounts of data tends to be faster.
Roger|Dyalog

Posts: 208
Joined: Thu Jul 28, 2011 10:53 am

### Re: rotational duplicates

The “radius” technique used in sig3 of the previous post converts a function on a cell (here vectors) into an equivalent one on every cell of an entire array. The idea is that by adding a different multiple of the radius to different cells, the function operating on all the cells at once automatically produces independent results.

We illustrate this with an example. Suppose we want to sort the rows of matrix y (sort each row; funny language, English :-) One way is to do {⍵[⍋⍵]}⍤1 ⊢y:

`      y 4 ¯18 17  52  ¯5 52 30 ¯10 ¯18  44¯7 ¯13 16 ¯20 ¯14 21 59  20   6  3420 ¯19 42  29  33 19 45 ¯35 ¯33 ¯34      {⍵[⍋⍵]}⍤1 ⊢y¯18 ¯18 ¯10  ¯5  4 17 30 44 52 52¯20 ¯14 ¯13  ¯7  6 16 20 21 34 59¯35 ¯34 ¯33 ¯19 19 20 29 33 42 45`

With the radius approach, we first find the number 1+2×⌈/|,y, the “radius”:

`      ⌈/|,y59      1+2×⌈/|,y 119       ⊢ r← (⍳≢y) × 1+2×⌈/|,y0 119 238`

When the rows are incremented by successive multiples of the radius, a value in a row is necessarily less than any of the values in the next row. If the incremented matrix is ravelled and then sorted, the values of a row stay together. A final reshape and subtraction of r produce the required result:

`      y +⍤¯1 ⊢r  4 ¯18  17  52  ¯5  52  30 ¯10 ¯18  44112 106 135  99 105 140 178 139 125 153258 219 280 267 271 257 283 203 205 204      {⍵[⍋⍵]} , y+⍤¯1 ⊢r   ⍝ newlines manually inserted into display¯18 ¯18 ¯10 ¯5 4 17 30 44 52 52         99 105 106 112 125 135 139 140 153 178       203 204 205 219 257 258 267 271 280 283      (⍴y) ⍴ {⍵[⍋⍵]} , y+⍤¯1 ⊢r¯18 ¯18 ¯10  ¯5   4  17  30  44  52  52 99 105 106 112 125 135 139 140 153 178203 204 205 219 257 258 267 271 280 283      r -⍤¯1⍨ (⍴y) ⍴ {⍵[⍋⍵]} , y+⍤¯1 ⊢r¯18 ¯18 ¯10  ¯5  4 17 30 44 52 52¯20 ¯14 ¯13  ¯7  6 16 20 21 34 59¯35 ¯34 ¯33 ¯19 19 20 29 33 42 45      f← {r -⍤¯1⍨ (⍴⍵)⍴ {⍵[⍋⍵]} , ⍵+⍤¯1 ⊢r←(⍳≢⍵)×1+2×⌈/|,⍵}      f y¯18 ¯18 ¯10  ¯5  4 17 30 44 52 52¯20 ¯14 ¯13  ¯7  6 16 20 21 34 59¯35 ¯34 ¯33 ¯19 19 20 29 33 42 45      {⍵[⍋⍵]}⍤1 ⊢y¯18 ¯18 ¯10  ¯5  4 17 30 44 52 52¯20 ¯14 ¯13  ¯7  6 16 20 21 34 59¯35 ¯34 ¯33 ¯19 19 20 29 33 42 45`

f invokes {⍵[⍋⍵]} once, independent of the number of rows in the argument. In contrast, {⍵[⍋⍵]}⍤1⊢y can (for all we know) invoke {⍵[⍋⍵]} on each row of y. The technique also works for grade ⍋ (and other functions).

`      g ← {(⊃⌽⍴⍵) | (⍴⍵) ⍴ ⍋ , ⍵+⍤¯1 ⊢(⍳≢⍵)×1+2×⌈/|,⍵}      (⍋⍤1 ≡ g) y1`

Benchmarks done in version 17.1. The results may change if/when the {⍵[⍋⍵]}⍤1 and ⍋⍤1 implementations are improved.

`      x←?1e4 10⍴100      cmpx 'f x' '{⍵[⍋⍵]}⍤1 ⊢x'  f x          → 2.67E¯3 |   0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕                {⍵[⍋⍵]}⍤1 ⊢x → 5.04E¯3 | +88% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕      cmpx 'g x' '⍋⍤1 ⊢x'  g x    → 3.32E¯3 |   0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕       ⍋⍤1 ⊢x → 3.93E¯3 | +18% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕      x←?1e4 5⍴100      cmpx 'f x' '{⍵[⍋⍵]}⍤1 ⊢x'  f x          → 1.73E¯3 |    0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕                    {⍵[⍋⍵]}⍤1 ⊢x → 4.44E¯3 | +156% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕      cmpx 'g x' '⍋⍤1 ⊢x'  g x    → 2.17E¯3 |   0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕               ⍋⍤1 ⊢x → 3.77E¯3 | +73% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕      x←?1e4 20⍴100      cmpx 'f x' '{⍵[⍋⍵]}⍤1 ⊢x'  f x          → 4.46E¯3 |   0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕          {⍵[⍋⍵]}⍤1 ⊢x → 6.11E¯3 | +37% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕      cmpx 'g x' '⍋⍤1 ⊢x'  g x    → 5.75E¯3 |   0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕  ⍋⍤1 ⊢x → 4.13E¯3 | -29% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕      x←?1e4 80⍴100      cmpx 'f x' '{⍵[⍋⍵]}⍤1 ⊢x'  f x          → 1.39E¯2 |   0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕  {⍵[⍋⍵]}⍤1 ⊢x → 1.02E¯2 | -27% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕              cmpx 'g x' '⍋⍤1 ⊢x'  g x    → 1.88E¯2 |   0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕  ⍋⍤1 ⊢x → 5.58E¯3 | -71% ⎕⎕⎕⎕⎕⎕⎕⎕⎕`

(Misalignments in the display are due to defects in the APL Chat Forum software.)
Roger|Dyalog

Posts: 208
Joined: Thu Jul 28, 2011 10:53 am