rotational duplicates

by **Roger|Dyalog** on Mon May 18, 2020 12:07 am

Skip Cave posted an interesting puzzle on the J Programming Forum on 2020-05-16. A rewording of the problem is as follows:

Given an integer matrix, find the rows which are rotational duplicates of each other. Two rows r and s are rotational duplicates if r≡k⌽s for some integer k.

Some solutions are presented in the following message.
.
.
.
.
.
.
.
.
.
.
.
.
.

by **Roger|Dyalog** on Mon May 18, 2020 12:08 am

Given an integer matrix, find the rows which are rotational duplicates of each other. Two rows r and s are rotational duplicates if r≡k⌽s for some integer k.

The crux of the problem is to find a signature (key) which uniquely identifies the rotational variants of a vector. The groups of rotational duplicates obtain by (sig mat){⊂⍵}⌸mat. The solutions here use as signature the lexicographically least (first in a sort) of the possible rotations of a vector.

Test Data

      odometer← ⊢ ⊤⍤1 0 ⍳∘(×/)

      x←,⍨ ⍪⍨ odometer 5⍴5
      ⍴x
6250 10
      4↑x
0 0 0 0 0 0 0 0 0 0
0 0 0 0 1 0 0 0 0 1
0 0 0 0 2 0 0 0 0 2
0 0 0 0 3 0 0 0 0 3
      ¯4↑x
4 4 4 4 1 4 4 4 4 1
4 4 4 4 2 4 4 4 4 2
4 4 4 4 3 4 4 4 4 3
4 4 4 4 4 4 4 4 4 4

sig0

      sig0← (⊃∘⍋ ⌷ ⊢) ∘ (⍳∘≢ ⌽⍤0 1 ⊢) ⍤1

      ⍴ sig0 x
6250 10
      ⍴ q← ∪ sig0 x
629 10
      (10↑q) (¯10↑q)
┌───────────────────┬───────────────────┐
│0 0 0 0 0 0 0 0 0 0│2 4 4 4 3 2 4 4 4 3│
│0 0 0 0 1 0 0 0 0 1│2 4 4 4 4 2 4 4 4 4│
│0 0 0 0 2 0 0 0 0 2│3 3 3 3 3 3 3 3 3 3│
│0 0 0 0 3 0 0 0 0 3│3 3 3 3 4 3 3 3 3 4│
│0 0 0 0 4 0 0 0 0 4│3 3 3 4 4 3 3 3 4 4│
│0 0 0 1 1 0 0 0 1 1│3 3 4 3 4 3 3 4 3 4│
│0 0 0 1 2 0 0 0 1 2│3 3 4 4 4 3 3 4 4 4│
│0 0 0 1 3 0 0 0 1 3│3 4 3 4 4 3 4 3 4 4│
│0 0 0 1 4 0 0 0 1 4│3 4 4 4 4 3 4 4 4 4│
│0 0 0 2 1 0 0 0 2 1│4 4 4 4 4 4 4 4 4 4│
└───────────────────┴───────────────────┘

(Misalignments in the display are due to defects in the APL Chat Forum software.)

sig0 applies independently to each row of the matrix. It finds all the rotations of a row, and selects the first in the sort of these rotations.

sig1

      sig1← (⊃∘⍋ ⌷ ⊢) ∘ ((⍸ ⊢ = ⌊/) ⌽⍤0 1 ⊢) ⍤ 1

      (sig0 ≡ sig1) x
1

The lexicographically least rotation of a vector necessarily begins with the smallest element. Instead of doing all rotations of a row, sig1 works with only those rotations which begins with the smallest element.

sig2

      sig2←{                                              
        (n c)←⍴⍵                                      
        0 1↓ t ⌷⍨⊂ (⊂c×⍳n)⌷ ⍋t← (c⌿⍳n), ((n×c)⍴⍳c) ⌽⍤¯1 ⊢c⌿⍵
      }
                                              
      (sig0 ≡ sig2) x
1

sig2 is like sig0, working with all rotations of a row, but does so on the entire matrix at once and not separately on each row (not f⍤1).

sig3

      sig3←{
        (n c)←⍴⍵            ⍝ # rows and columns                 
        r←(⍳n)×1+2×⌈/|,⍵    ⍝ the "radius" increment for each row
        q←⍵+⍤¯1⊢r           ⍝ increase each row by the radius    
        b←q=⍤¯1⌊/q          ⍝ where elements equal a row minimum 
        s←+/b               ⍝ # times that happens for each row  
        r -⍤¯1⍨ t ⌷⍨ ⊂ (⊂+\¯1↓0,s) ⌷ ⍋t← (c|⍸,b) ⌽ s⌿q                     
      }

      (sig0 ≡ sig3) x
1

Likewise, sig3 is sig1 reworked to work on the entire matrix at once. It assumes that the maximal element in the entire matrix (needed for the “radius”) is not so large as to consume all available precision. If the maximal element is indeed too large, do sig3⍳⍨x instead of just sig3 x.

Benchmarks

      cmpx 'sig0 x' 'sig1 x' 'sig2 x' 'sig3 x'
  sig0 x → 2.44E¯2 |   0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
  sig1 x → 1.99E¯2 | -19% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
  sig2 x → 1.94E¯2 | -21% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
  sig3 x → 2.73E¯3 | -89% ⎕⎕⎕

      w←?100 600⍴10    ⍝ wide matrix
      cmpx 'sig0 w' 'sig1 w' 'sig2 w' 'sig3 w'
  sig0 w → 4.08E¯2 |    0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
  sig1 w → 3.84E¯3 |  -91% ⎕
  sig2 w → 1.24E¯1 | +203% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
  sig3 w → 4.84E¯3 |  -89% ⎕
      
      n←?6000 10⍴10    ⍝ narrow matrix
      cmpx 'sig0 n' 'sig1 n' 'sig2 n' 'sig3 n'
  sig0 n → 2.40E¯2 |   0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
  sig1 n → 1.68E¯2 | -30% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
  sig2 n → 1.85E¯2 | -23% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
  sig3 n → 1.66E¯3 | -94% ⎕⎕

These benchmarks illustrate the rule-of-thumb that code that lets primitive functions “see” larger amounts of data tends to be faster.

by **Roger|Dyalog** on Mon May 18, 2020 10:49 pm

The “radius” technique used in sig3 of the previous post converts a function on a cell (here vectors) into an equivalent one on every cell of an entire array. The idea is that by adding a different multiple of the radius to different cells, the function operating on all the cells at once automatically produces independent results.

We illustrate this with an example. Suppose we want to sort the rows of matrix y (sort each row; funny language, English :-) One way is to do {⍵[⍋⍵]}⍤1 ⊢y:

      y
 4 ¯18 17  52  ¯5 52 30 ¯10 ¯18  44
¯7 ¯13 16 ¯20 ¯14 21 59  20   6  34
20 ¯19 42  29  33 19 45 ¯35 ¯33 ¯34

      {⍵[⍋⍵]}⍤1 ⊢y
¯18 ¯18 ¯10  ¯5  4 17 30 44 52 52
¯20 ¯14 ¯13  ¯7  6 16 20 21 34 59
¯35 ¯34 ¯33 ¯19 19 20 29 33 42 45

With the radius approach, we first find the number 1+2×⌈/|,y, the “radius”:

      ⌈/|,y
59
      1+2×⌈/|,y 
119 
      ⊢ r← (⍳≢y) × 1+2×⌈/|,y
0 119 238

When the rows are incremented by successive multiples of the radius, a value in a row is necessarily less than any of the values in the next row. If the incremented matrix is ravelled and then sorted, the values of a row stay together. A final reshape and subtraction of r produce the required result:

      y +⍤¯1 ⊢r
  4 ¯18  17  52  ¯5  52  30 ¯10 ¯18  44
112 106 135  99 105 140 178 139 125 153
258 219 280 267 271 257 283 203 205 204

      {⍵[⍋⍵]} , y+⍤¯1 ⊢r   ⍝ newlines manually inserted into display
¯18 ¯18 ¯10 ¯5 4 17 30 44 52 52   
      99 105 106 112 125 135 139 140 153 178 
      203 204 205 219 257 258 267 271 280 283

      (⍴y) ⍴ {⍵[⍋⍵]} , y+⍤¯1 ⊢r
¯18 ¯18 ¯10  ¯5   4  17  30  44  52  52
 99 105 106 112 125 135 139 140 153 178
203 204 205 219 257 258 267 271 280 283

      r -⍤¯1⍨ (⍴y) ⍴ {⍵[⍋⍵]} , y+⍤¯1 ⊢r
¯18 ¯18 ¯10  ¯5  4 17 30 44 52 52
¯20 ¯14 ¯13  ¯7  6 16 20 21 34 59
¯35 ¯34 ¯33 ¯19 19 20 29 33 42 45

      f← {r -⍤¯1⍨ (⍴⍵)⍴ {⍵[⍋⍵]} , ⍵+⍤¯1 ⊢r←(⍳≢⍵)×1+2×⌈/|,⍵}
      f y
¯18 ¯18 ¯10  ¯5  4 17 30 44 52 52
¯20 ¯14 ¯13  ¯7  6 16 20 21 34 59
¯35 ¯34 ¯33 ¯19 19 20 29 33 42 45

      {⍵[⍋⍵]}⍤1 ⊢y
¯18 ¯18 ¯10  ¯5  4 17 30 44 52 52
¯20 ¯14 ¯13  ¯7  6 16 20 21 34 59
¯35 ¯34 ¯33 ¯19 19 20 29 33 42 45

f invokes {⍵[⍋⍵]} once, independent of the number of rows in the argument. In contrast, {⍵[⍋⍵]}⍤1⊢y can (for all we know) invoke {⍵[⍋⍵]} on each row of y. The technique also works for grade ⍋ (and other functions).

      g ← {(⊃⌽⍴⍵) | (⍴⍵) ⍴ ⍋ , ⍵+⍤¯1 ⊢(⍳≢⍵)×1+2×⌈/|,⍵}
      (⍋⍤1 ≡ g) y
1

Benchmarks done in version 17.1. The results may change if/when the {⍵[⍋⍵]}⍤1 and ⍋⍤1 implementations are improved.

      x←?1e4 10⍴100
      cmpx 'f x' '{⍵[⍋⍵]}⍤1 ⊢x'
  f x          → 2.67E¯3 |   0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕              
  {⍵[⍋⍵]}⍤1 ⊢x → 5.04E¯3 | +88% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
      cmpx 'g x' '⍋⍤1 ⊢x'
  g x    → 3.32E¯3 |   0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕     
  ⍋⍤1 ⊢x → 3.93E¯3 | +18% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕

      x←?1e4 5⍴100
      cmpx 'f x' '{⍵[⍋⍵]}⍤1 ⊢x'
  f x          → 1.73E¯3 |    0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕                  
  {⍵[⍋⍵]}⍤1 ⊢x → 4.44E¯3 | +156% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
      cmpx 'g x' '⍋⍤1 ⊢x'
  g x    → 2.17E¯3 |   0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕             
  ⍋⍤1 ⊢x → 3.77E¯3 | +73% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕

      x←?1e4 20⍴100
      cmpx 'f x' '{⍵[⍋⍵]}⍤1 ⊢x'
  f x          → 4.46E¯3 |   0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕        
  {⍵[⍋⍵]}⍤1 ⊢x → 6.11E¯3 | +37% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
      cmpx 'g x' '⍋⍤1 ⊢x'
  g x    → 5.75E¯3 |   0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
  ⍋⍤1 ⊢x → 4.13E¯3 | -29% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕

      x←?1e4 80⍴100
      cmpx 'f x' '{⍵[⍋⍵]}⍤1 ⊢x'
  f x          → 1.39E¯2 |   0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
  {⍵[⍋⍵]}⍤1 ⊢x → 1.02E¯2 | -27% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕        
      cmpx 'g x' '⍋⍤1 ⊢x'
  g x    → 1.88E¯2 |   0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
  ⍋⍤1 ⊢x → 5.58E¯3 | -71% ⎕⎕⎕⎕⎕⎕⎕⎕⎕

(Misalignments in the display are due to defects in the APL Chat Forum software.)

The tool of thought for

software solutions

rotational duplicates

rotational duplicates

Re: rotational duplicates

Re: rotational duplicates

Who is online

QUICK LINKS