⍷ follies

APL-related discussions - a stream of APL consciousness.
Not sure where to start a discussion ? Here's the place to be
Forum rules
This forum is for discussing APL-related issues. If you think that the subject is off-topic, then the Chat forum is probably a better place for your thoughts !

⍷ follies

Postby Roger|Dyalog on Tue Feb 16, 2021 11:47 pm

I had occasion to write a more comprehensive QA suite for the ⍷ (find) function. Of course, that requires that an understanding what ⍷ is supposed to do. I soon came to the following examples:

      x  ← 1 4⍴'abab'
x0 ← 0 4⍴'abab'
y ← 4 9⍴'abab'

x ⍷ y
1 0 1 0 1 0 0 0 0
0 1 0 1 0 1 0 0 0
1 0 1 0 1 0 0 0 0
0 1 0 1 0 1 0 0 0

x0 ⍷ y
1 1 1 1 1 1 0 0 0
1 1 1 1 1 1 0 0 0
1 1 1 1 1 1 0 0 0
1 1 1 1 1 1 0 0 0

It is fairly obvious that first result is correct. But is the second result correct? Why or why not?

Definition

I believe the definition of ⍺⍷⍵ posits that a template of shape ⍴⍺ is moved over each valid position in ⍵, and ⍺ is matched against the templated items in ⍵. (A position is invalid if the template there placed would extend beyond the confines of ⍵.) This definition suffices to explain both of the above results:

      ⍳⍴y
┌───┬───┬───┬───┬───┬───┬───┬───┬───┐
0 0│0 1│0 2│0 3│0 4│0 5│0 6│0 7│0 8│
├───┼───┼───┼───┼───┼───┼───┼───┼───┤
1 0│1 1│1 2│1 3│1 4│1 5│1 6│1 7│1 8│
├───┼───┼───┼───┼───┼───┼───┼───┼───┤
2 0│2 1│2 2│2 3│2 4│2 5│2 6│2 7│2 8│
├───┼───┼───┼───┼───┼───┼───┼───┼───┤
3 0│3 1│3 2│3 3│3 4│3 5│3 6│3 7│3 8│
└───┴───┴───┴───┴───┴───┴───┴───┴───┘

x ≡ (⍴x) ↑ 0 1 ↓ y
0
x ≡ (⍴x) ↑ 1 3 ↓ y
1

(Misalignments in the display are due to defects in the APL Chat Forum software.)

For x⍷y, the valid positions are those marked in red above (4 6↑⍳⍴y). For a position i, the templated items are (⍴x)↑i↓y. For example, (x⍷y)[⊂0 1] is 0 because x≡(⍴x)↑0 1↓y is 0, and (x⍷y)[⊂1 3] is 1 because x≡(⍴x)↑1 3↓y is 1. Likewise, for x0⍷y, the valid positions are also those marked in red, and for each such position i the templated items are (⍴x0)↑i↓y. Since x0≡(⍴x0)↑i↓y, the x0⍷y result is correct.

Model

The definition can be codified as follows:

      
ebar←{
r←(≢⍴⍺)⌈≢⍴⍵ ⍝ maximum rank
r>≢⍴⍺:(⍺⍴⍨(⍴⍺),⍨(r-≢⍴⍺)⍴1)∇ ⍵ ⍝ if ⍺ has lesser rank, make it the same rank
(⍴⍺)∨.>r↑(⍴⍵),¯1:(⍴⍵)⍴0 ⍝ return 0s if ⍺ has greater rank or is longer
ww←⍵
(⍴⍵) ↑ ⍺∘{⍺≡(⍴⍺)↑⍵↓ww}¨ ⍳(×⍴⍺)+(⍴⍵)-⍴⍺
}

Monsters

I proceeded to compare the model against ⍷ itself, and found some disagreements:

      (0 3⍴'a') (⍷ ≡ ebar) 3 4⍴'b'
1
(0 3⍴'a') (⍷ ≡ ebar) 3 4⍴5
0
(0 3⍴2) (⍷ ≡ ebar) 3 4⍴'b'
0
(0 3⍴2) (⍷ ≡ ebar) 3 4⍴5
1

Side-by-side comparisons where the results disagree:

      (0 3⍴'a') ⍷ 3 4⍴5             (0 3⍴'a') ebar 3 4⍴5
1 1 0 0 0 0 0 0
1 1 0 0 0 0 0 0
1 1 0 0 0 0 0 0

(0 3⍴2) ⍷ 3 4⍴'b' (0 3⍴2) ebar 3 4⍴'b'
1 1 0 0 0 0 0 0
1 1 0 0 0 0 0 0
1 1 0 0 0 0 0 0

So which of ⍷ or ebar is correct? I submit that ⍷ is wrong, because ⍷ and ebar are fundamentally based on ≡, and it is not possible for an array of characters to match an array of numbers.

      (0 3⍴'a') ≡ 0 3⍴5
0
(0 3⍴2) ≡ 0 3⍴'a'
0

How about a more monstrous monster?

      a←0 3⍴⊂⍬
b←1 13⍴(3⍴⊂⍬),3⍴⊂''

a ebar b ⍝ correct
1 1 1 0 0 0 1 1 1 0 0 0 0

a ⍷ b ⍝ incorrect
1 1 1 1 1 1 1 1 1 1 1 0 0

Here, we can not say that either argument is an array of characters or an array of numbers. Instead, the analysis is based on that ⍺ is matched against the templated items in ⍵. The following are the first 6 applications of ≡ in a⍷b:

      (0 3⍴⊂⍬) ≡ 0 3⍴(⊂⍬ ),(⊂⍬ ),⊂⍬
1
(0 3⍴⊂⍬) ≡ 0 3⍴(⊂⍬ ),(⊂⍬ ),⊂''
1
(0 3⍴⊂⍬) ≡ 0 3⍴(⊂⍬ ),(⊂''),⊂''
1
(0 3⍴⊂⍬) ≡ 0 3⍴(⊂''),(⊂''),⊂''
0
(0 3⍴⊂⍬) ≡ 0 3⍴(⊂''),(⊂''),⊂⍬
0
(0 3⍴⊂⍬) ≡ 0 3⍴(⊂''),(⊂⍬ ),⊂⍬
0

A key point about ⍺≡⍵ (in the Dyalog/APL2 style of arrays) is that, for ⍺≡⍵ to be true, the shapes must match, and the items must match or (if there are no items, that is, if the arrays are empty) the prototypes must match.

Missteps

Several missteps were made in the APL model before arriving at the current one. A subtle one was using indexing to do the templating.

      ⍺∘{⍺≡ww[⍵+⍳⍴⍺]}⍤0  ⍝ misstep
⍺∘{⍺≡(⍴⍺)↑⍵↓ww}¨ ⍝ correct

It was a misstep to use indexing to get the templated items, because if ⍺ is empty, ⍵+⍳⍴⍺ loses the information that the template is positioned at ⍵. The "more monstrous monster" brings this flaw to the fore.

Another misstep was one of those "off-by-1" errors. (And, this being APL, several off-by-1 errors can be committed by the one expression.) At one point ⍳1+(⍴⍵)-⍴⍺ was used to compute the valid positions, producing invalid positions if ⍴⍺ had a 0 in any dimension. This was fixed by using ⍳(×⍴⍺)+(⍴⍵)-⍴⍺.
Roger|Dyalog
 
Posts: 238
Joined: Thu Jul 28, 2011 10:53 am

Re: ⍷ follies

Postby Adam|Dyalog on Tue Aug 23, 2022 4:24 pm

In an email on May 26, 2021, Roger wrote:I will write an appendix to the APL Chat Forum post ⍷ Follies at an appropriate time.

The email was in reaction to a great number of emails that had been exchanged internally at Dyalog about the issue at hand. Unfortunately, Roger never got a chance to do write the appendix, before being taken from us. Here is my attempt at summarising what had transpired until then.

In February, Roger had posed that the primitive had a bug in that it was finding empty subarrays of the wrong type, while defined in terms of match (≡), which does distinguish between empty arrays of unequal type. I devised an alternative mental model to describe Find's behaviour where, rather than checking if the left argument could be extracted from the right argument by peeling off outer elements, one could check if the left argument could be overlaid on the right argument, without the right argument changing. I wrote two almost identical models, to emphasise the difference between the extraction model and the overlay model (like Roger's code, assumes ⎕IO←0):

      
ee←{ ⍝ extraction model
ra←≢sa←⍴⍺ ⋄ rw←≢sw←⍴⍵
rm←ra⌈rw
rm>ra:⍵ ∇⍨⍺⍴⍨sa,⍨1⍴⍨rm-ra
sa∨.>rm↑sw,¯1:sw⍴0
_Extract_←{ ⍝ does extracting ⍺⍺ from ⍵⍵ change ⍺⍺?
⍺⍺≡⍺↑⍵↓⍵⍵
}
sw↑sa∘(⍺ _Extract_ ⍵)¨(-⍨∘×⍨sa)↓⍳sw
}
eo←{ ⍝ overlay model
ra←≢sa←⍴⍺ ⋄ rw←≢sw←⍴⍵
rm←ra⌈rw
rm>ra:⍵ ∇⍨⍺⍴⍨sa,⍨1⍴⍨rm-ra
sa∨.>rm↑sw,¯1:sw⍴0
_Overlay_←{ ⍝ does overlaying ⍺⍺ on ⍵⍵ change ⍵⍵?
⍵⍵≡⍺⍺@((⍳⍺)+⊂⍵)⊢⍵⍵
}
sw↑sa∘(⍺ _Overlay_ ⍵)¨(-⍨∘×⍨sa)↓⍳sw
}

Morten Kromberg speculated that the behaviour stemmed from early flat APL where Match didn't exist. Instead, common practice was to use And-reduction (often written as the inner product ∧.=) over element-wise equality, which ignores type mismatches because the comparison of two empty arrays (a scalar function application) itself is empty, thus making the reduction yield the identity element of And, which is true (1).

In April, Roger wrote that he disagree[d] strongly with the "alternative APL and mental model" which I had devised, because it among other things [meant he] can not give a good accounting of it. Also that all the descriptions (APL or non-APL) of string search/find that [he had] seen do not use that mental model.

Morten agreed with Roger that my model was strained at best and clearly a modern construction based on a more complete understanding of ≡ and prototypes, than a possible explanation for what the implementors where thinking when they did this work. He reiterated his theory about And-reductions over equality in a moving window, thus posing that the current behaviour can be seen as correct.

In May, I found support for Morten's theory, based on that exact usage in Adin Falkoff's APL '79 conference proceeding A note on pattern matching: Where do you find the match to an empty array?. Indeed, replacing ≡ in Roger's above ebar model with {(⍺≡⍥⍴⍵)∧(∧/⍺≡¨⍥,⍵)} (≡¨ is needed instead of = because = pervades nested arrays) would make it align with the behaviour of the primitive as implemented:
      
ebar2←{
r←(≢⍴⍺)⌈≢⍴⍵ ⍝ maximum rank
r>≢⍴⍺:(⍺⍴⍨(⍴⍺),⍨(r-≢⍴⍺)⍴1)∇ ⍵ ⍝ if ⍺ has lesser rank, make it the same rank
(⍴⍺)∨.>r↑(⍴⍵),¯1:(⍴⍵)⍴0 ⍝ return 0s if ⍺ has greater rank or is longer
ww←⍵
(⍴⍵) ↑ ⍺∘{⍺ {(⍺≡⍥⍴⍵)∧(∧/⍺≡¨⍥,⍵)} (⍴⍺)↑⍵↓ww}¨ ⍳(×⍴⍺)+(⍴⍵)-⍴⍺
}
User avatar
Adam|Dyalog
 
Posts: 134
Joined: Thu Jun 25, 2015 1:13 pm


Return to APL Chat

Who is online

Users browsing this forum: No registered users and 1 guest