⍷ follies
I had occasion to write a more comprehensive QA suite for the ⍷ (find) function. Of course, that requires an understanding of what ⍷ is supposed to do. I soon came to the following examples:
x ← 1 4⍴'abab'
x0 ← 0 4⍴'abab'
y ← 4 9⍴'abab'
x ⍷ y
1 0 1 0 1 0 0 0 0
0 1 0 1 0 1 0 0 0
1 0 1 0 1 0 0 0 0
0 1 0 1 0 1 0 0 0
x0 ⍷ y
1 1 1 1 1 1 0 0 0
1 1 1 1 1 1 0 0 0
1 1 1 1 1 1 0 0 0
1 1 1 1 1 1 0 0 0
It is fairly obvious that the first result is correct. But is the second result correct? Why or why not?
Definition
I believe the definition of ⍺⍷⍵ posits that a template of shape ⍴⍺ is moved over each valid position in ⍵, and ⍺ is matched against the templated items in ⍵. (A position is invalid if the template there placed would extend beyond the confines of ⍵.) This definition suffices to explain both of the above results:
⍳⍴y
┌───┬───┬───┬───┬───┬───┬───┬───┬───┐
│0 0│0 1│0 2│0 3│0 4│0 5│0 6│0 7│0 8│
├───┼───┼───┼───┼───┼───┼───┼───┼───┤
│1 0│1 1│1 2│1 3│1 4│1 5│1 6│1 7│1 8│
├───┼───┼───┼───┼───┼───┼───┼───┼───┤
│2 0│2 1│2 2│2 3│2 4│2 5│2 6│2 7│2 8│
├───┼───┼───┼───┼───┼───┼───┼───┼───┤
│3 0│3 1│3 2│3 3│3 4│3 5│3 6│3 7│3 8│
└───┴───┴───┴───┴───┴───┴───┴───┴───┘
x ≡ (⍴x) ↑ 0 1 ↓ y
0
x ≡ (⍴x) ↑ 1 3 ↓ y
1
(Misalignments in the display are due to defects in the APL Chat Forum software.)
For x⍷y, the valid positions are the first 6 columns of the display above, that is, 4 6↑⍳⍴y (marked in red in the original post). For a position i, the templated items are (⍴x)↑i↓y. For example, (x⍷y)[⊂0 1] is 0 because x≡(⍴x)↑0 1↓y is 0, and (x⍷y)[⊂1 3] is 1 because x≡(⍴x)↑1 3↓y is 1. Likewise, for x0⍷y, the valid positions are the same 4 6↑⍳⍴y, and for each such position i the templated items are (⍴x0)↑i↓y. Since x0≡(⍴x0)↑i↓y is 1 for every such i, the x0⍷y result is correct.
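The reasoning above can be replayed one step at a time. The following is a minimal sketch, assuming Dyalog APL with ⎕IO←0; the helper names valid and window are introduced here for illustration only and are not part of the original post.

```apl
⎕IO←0                              ⍝ index origin 0, as in the post
x ←1 4⍴'abab'
x0←0 4⍴'abab'
y ←4 9⍴'abab'

⍝ valid (hypothetical name): shape of the block of valid positions
valid←{(×⍴⍺)+(⍴⍵)-⍴⍺}
x  valid y                         ⍝ 4 6
x0 valid y                         ⍝ 4 6

⍝ window (hypothetical name): templated items for template ⍺ at position ⍵ in y
window←{(⍴⍺)↑⍵↓y}
⍴x  window 1 3                     ⍝ 1 4
⍴x0 window 1 3                     ⍝ 0 4
x ≡x  window 1 3                   ⍝ 1
x ≡x  window 0 1                   ⍝ 0
∧/,{x0≡x0 window ⍵}¨⍳x0 valid y    ⍝ 1: x0 matches at every valid position
```

Every window taken for x0 is an empty 0 4 character matrix, which is why x0 matches wherever the template fits.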
Model
The definition can be codified as follows:
⍝
ebar←{
    r←(≢⍴⍺)⌈≢⍴⍵ ⍝ maximum rank
    r>≢⍴⍺:(⍺⍴⍨(⍴⍺),⍨(r-≢⍴⍺)⍴1)∇ ⍵ ⍝ if ⍺ has lesser rank, make it the same rank
    (⍴⍺)∨.>r↑(⍴⍵),¯1:(⍴⍵)⍴0 ⍝ return 0s if ⍺ has greater rank or is longer
    ww←⍵
    (⍴⍵) ↑ ⍺∘{⍺≡(⍴⍺)↑⍵↓ww}¨ ⍳(×⍴⍺)+(⍴⍵)-⍴⍺
}
Monsters
I proceeded to compare the model against ⍷ itself, and found some disagreements:
(0 3⍴'a') (⍷ ≡ ebar) 3 4⍴'b'
1
(0 3⍴'a') (⍷ ≡ ebar) 3 4⍴5
0
(0 3⍴2) (⍷ ≡ ebar) 3 4⍴'b'
0
(0 3⍴2) (⍷ ≡ ebar) 3 4⍴5
1
Side-by-side comparisons where the results disagree:
(0 3⍴'a') ⍷ 3 4⍴5          (0 3⍴'a') ebar 3 4⍴5
1 1 0 0                    0 0 0 0
1 1 0 0                    0 0 0 0
1 1 0 0                    0 0 0 0
(0 3⍴2) ⍷ 3 4⍴'b'          (0 3⍴2) ebar 3 4⍴'b'
1 1 0 0                    0 0 0 0
1 1 0 0                    0 0 0 0
1 1 0 0                    0 0 0 0
So which of ⍷ or ebar is correct? I submit that ⍷ is wrong, because ⍷ and ebar are fundamentally based on ≡, and it is not possible for an array of characters to match an array of numbers.
(0 3⍴'a') ≡ 0 3⍴5
0
(0 3⍴2) ≡ 0 3⍴'a'
0
How about a more monstrous monster?
a←0 3⍴⊂⍬
b←1 13⍴(3⍴⊂⍬),3⍴⊂''
a ebar b ⍝ correct
1 1 1 0 0 0 1 1 1 0 0 0 0
a ⍷ b ⍝ incorrect
1 1 1 1 1 1 1 1 1 1 1 0 0
Here, we cannot say that either argument is an array of characters or an array of numbers. Instead, the analysis rests on the definition: ⍺ is matched against the templated items in ⍵. The following are the first 6 applications of ≡ in a⍷b:
(0 3⍴⊂⍬) ≡ 0 3⍴(⊂⍬ ),(⊂⍬ ),⊂⍬
1
(0 3⍴⊂⍬) ≡ 0 3⍴(⊂⍬ ),(⊂⍬ ),⊂''
1
(0 3⍴⊂⍬) ≡ 0 3⍴(⊂⍬ ),(⊂''),⊂''
1
(0 3⍴⊂⍬) ≡ 0 3⍴(⊂''),(⊂''),⊂''
0
(0 3⍴⊂⍬) ≡ 0 3⍴(⊂''),(⊂''),⊂⍬
0
(0 3⍴⊂⍬) ≡ 0 3⍴(⊂''),(⊂⍬ ),⊂⍬
0
A key point about ⍺≡⍵ (in the Dyalog/APL2 style of arrays) is that, for ⍺≡⍵ to be true, the shapes must match, and the items must match or (if there are no items, that is, if the arrays are empty) the prototypes must match.
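The role of the prototype can be made visible directly. A small sketch, assuming Dyalog APL with the default ⎕ML, where monadic ⊃ applied to an empty array returns its prototype:

```apl
' '≡⊃0 3⍴'a'                ⍝ 1: the prototype of an empty character array is a blank
0≡⊃0 3⍴5                    ⍝ 1: the prototype of an empty numeric array is 0
⍬≡⊃0 3⍴⊂⍬                   ⍝ 1
''≡⊃0 3⍴⊂''                 ⍝ 1
⍬≡''                        ⍝ 0: so (0 3⍴⊂⍬) and (0 3⍴⊂'') cannot match
⍬≡⊃0 3⍴(⊂⍬),(⊂''),⊂''       ⍝ 1: the prototype is taken from the first item
```

The last line is consistent with the six applications of ≡ listed above: a window matches a exactly when its first item is ⍬.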
Missteps
Several missteps were made in the APL model before arriving at the current one. A subtle one was using indexing to do the templating.
⍺∘{⍺≡ww[⍵+⍳⍴⍺]}⍤0 ⍝ misstep
⍺∘{⍺≡(⍴⍺)↑⍵↓ww}¨ ⍝ correct
It was a misstep to use indexing to get the templated items, because if ⍺ is empty, ⍵+⍳⍴⍺ loses the information that the template is positioned at ⍵. The "more monstrous monster" brings this flaw to the fore.
Another misstep was one of those "off-by-1" errors. (And, this being APL, several off-by-1 errors can be committed by the one expression.) At one point ⍳1+(⍴⍵)-⍴⍺ was used to compute the valid positions, producing invalid positions if ⍴⍺ had a 0 in any dimension. This was fixed by using ⍳(×⍴⍺)+(⍴⍵)-⍴⍺.
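The off-by-1 shows up in plain shape arithmetic. A small sketch using the shapes of x, x0 and y from the opening examples (the names sx, sx0 and sy are introduced here for illustration only):

```apl
sx←1 4 ⋄ sx0←0 4 ⋄ sy←4 9     ⍝ ⍴x, ⍴x0 and ⍴y
1+sy-sx                        ⍝ 4 6
(×sx)+sy-sx                    ⍝ 4 6: identical when ⍴⍺ has no 0
1+sy-sx0                       ⍝ 5 6: one row more than y has
(×sx0)+sy-sx0                  ⍝ 4 6
```

Because ×⍴⍺ is 1 for a non-zero length and 0 for a zero length, the fix only changes the result along dimensions where the template is empty.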
 RogerDyalog
 Posts: 238
 Joined: Thu Jul 28, 2011 10:53 am
Re: ⍷ follies
In an email on May 26, 2021, Roger wrote: "I will write an appendix to the APL Chat Forum post ⍷ Follies at an appropriate time."
The email was in reaction to a great number of emails that had been exchanged internally at Dyalog about the issue at hand. Unfortunately, Roger never got a chance to write the appendix before being taken from us. Here is my attempt at summarising what had transpired until then.
In February, Roger had posited that the primitive had a bug, in that it found empty subarrays of the wrong type even though it is defined in terms of match (≡), which does distinguish between empty arrays of unequal type. I devised an alternative mental model to describe Find's behaviour: rather than checking whether the left argument could be extracted from the right argument by peeling off outer elements, one could check whether the left argument could be overlaid on the right argument without the right argument changing. I wrote two almost identical models to emphasise the difference between the extraction model and the overlay model (like Roger's code, they assume ⎕IO←0):
⍝
ee←{ ⍝ extraction model
    ra←≢sa←⍴⍺ ⋄ rw←≢sw←⍴⍵
    rm←ra⌈rw
    rm>ra:⍵ ∇⍨⍺⍴⍨sa,⍨1⍴⍨rm-ra
    sa∨.>rm↑sw,¯1:sw⍴0
    _Extract_←{ ⍝ does extracting ⍺⍺ from ⍵⍵ change ⍺⍺?
        ⍺⍺≡⍺↑⍵↓⍵⍵
    }
    sw↑sa∘(⍺ _Extract_ ⍵)¨(-⍨∘×⍨sa)↓⍳sw
}
eo←{ ⍝ overlay model
    ra←≢sa←⍴⍺ ⋄ rw←≢sw←⍴⍵
    rm←ra⌈rw
    rm>ra:⍵ ∇⍨⍺⍴⍨sa,⍨1⍴⍨rm-ra
    sa∨.>rm↑sw,¯1:sw⍴0
    _Overlay_←{ ⍝ does overlaying ⍺⍺ on ⍵⍵ change ⍵⍵?
        ⍵⍵≡⍺⍺@((⍳⍺)+⊂⍵)⊢⍵⍵
    }
    sw↑sa∘(⍺ _Overlay_ ⍵)¨(-⍨∘×⍨sa)↓⍳sw
}
Morten Kromberg speculated that the behaviour stemmed from early flat APL where Match didn't exist. Instead, common practice was to use And-reduction (often written as the inner product ∧.=) over element-wise equality, which ignores type mismatches because the comparison of two empty arrays (a scalar function application) itself is empty, thus making the reduction yield the identity element of And, which is true (1).
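The difference between the two styles of comparison is easy to see on empty arguments. A minimal sketch in Dyalog APL, contrasting the And-reduction idiom with Match:

```apl
(0⍴'a')∧.=0⍴5                ⍝ 1: And-reduction over no comparisons at all
∧/,(0 3⍴'a')=0 3⍴5           ⍝ 1: the same for empty matrices
(0 3⍴'a')≡0 3⍴5              ⍝ 0: Match also compares the prototypes
'abc'∧.='abc'                ⍝ 1
'abc'∧.='abd'                ⍝ 0
```

For non-empty arguments of equal shape the two approaches agree on simple arrays; they part ways only when there are no items to compare.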
In April, Roger wrote that he disagree[d] strongly with the "alternative APL and mental model" which I had devised, because, among other things, it [meant he] can not give a good accounting of it, and that all the descriptions (APL or non-APL) of string search/find that [he had] seen do not use that mental model.
Morten agreed with Roger that my model was strained at best and clearly a modern construction based on a more complete understanding of ≡ and prototypes, rather than a possible explanation for what the implementors were thinking when they did this work. He reiterated his theory about And-reductions over equality in a moving window, thus positing that the current behaviour can be seen as correct.
In May, I found support for Morten's theory, based on that exact usage in Adin Falkoff's APL '79 conference proceeding A note on pattern matching: Where do you find the match to an empty array?. Indeed, replacing ≡ in Roger's above ebar model with {(⍺≡⍥⍴⍵)∧(∧/⍺≡¨⍥,⍵)} (≡¨ is needed instead of = because = pervades nested arrays) would make it align with the behaviour of the primitive as implemented:
⍝
ebar2←{
    r←(≢⍴⍺)⌈≢⍴⍵ ⍝ maximum rank
    r>≢⍴⍺:(⍺⍴⍨(⍴⍺),⍨(r-≢⍴⍺)⍴1)∇ ⍵ ⍝ if ⍺ has lesser rank, make it the same rank
    (⍴⍺)∨.>r↑(⍴⍵),¯1:(⍴⍵)⍴0 ⍝ return 0s if ⍺ has greater rank or is longer
    ww←⍵
    (⍴⍵) ↑ ⍺∘{⍺ {(⍺≡⍥⍴⍵)∧(∧/⍺≡¨⍥,⍵)} (⍴⍺)↑⍵↓ww}¨ ⍳(×⍴⍺)+(⍴⍵)-⍴⍺
}
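To see what the replacement for ≡ does in isolation, it can be given a name and tried on the arguments that caused the disagreements. A small sketch, assuming Dyalog APL 18.0 or later (for ⍥); the name relaxed is introduced here for illustration only:

```apl
relaxed←{(⍺≡⍥⍴⍵)∧(∧/⍺≡¨⍥,⍵)}     ⍝ shapes match, and all items (if any) match
(0 3⍴'a') relaxed 0 3⍴5          ⍝ 1, whereas ≡ gives 0
(0 3⍴⊂⍬) relaxed 0 3⍴⊂''         ⍝ 1, whereas ≡ gives 0
(0 3⍴'a') relaxed 0 4⍴'a'        ⍝ 0: the shapes still have to agree
'abc' relaxed 'abc'              ⍝ 1
'abc' relaxed 'abd'              ⍝ 0
```

With no items to compare, the And-reduction yields 1 regardless of the prototypes, which is the behaviour that distinguishes ebar2 from ebar.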

AdamDyalog  Posts: 143
 Joined: Thu Jun 25, 2015 1:13 pm