Experimenting with the new NGET & NPUT

by **crishog** on Tue Sep 13, 2016 10:47 am

txt←'Hello'

(txt,(⎕ucs 13),txt,(⎕ucs 13),txt,⎕ucs 13)⎕NPUT 'd:\tmp' 1

And because I want to examine what they are up to, let's read it "raw":

'd:\tmp'⎕ntie ¯1 ⋄ ⎕ucs ⎕nread ¯1 80 ¯1 0 ⋄ ⎕nuntie ¯1
72 101 108 108 111 13 72 101 108 108 111 13 72 101 108 108 111 13

So far so good, next:

(txt,(⎕ucs 10),txt,(⎕ucs 10),txt,⎕ucs 10)⎕NPUT 'd:\tmp' 1
'd:\tmp'⎕ntie ¯1 ⋄ ⎕ucs ⎕nread ¯1 80 ¯1 0 ⋄ ⎕nuntie ¯1
72 101 108 108 111 13 10 72 101 108 108 111 13 10 72 101 108 108 111 13 10
{( ⍴⎕←⊃⍵) ,3⊃⍵}⎕NGET 'd:\tmp' 1
Hello Hello Hello
3 13 10

The LF has been translated to CR/LF, which is what the documentation says, but to get just 10, I now have to supply a translation as well as a newline - could we supply an empty vector for the default encoding?

Now this is quite neat:

(txt,(⎕ucs 133),txt,(⎕ucs 13),txt,⎕ucs 13)⎕NPUT 'd:\tmp' 1
'd:\tmp'⎕ntie ¯1 ⋄ ⎕ucs ⎕nread ¯1 80 ¯1 0 ⋄ ⎕nuntie ¯1
72 101 108 108 111 194 133 72 101 108 108 111 13 72 101 108 108 111 13
{( ⍴⎕←⊃⍵) ,3⊃⍵}⎕NGET 'd:\tmp' 1
Hello Hello Hello
3 133

As the documentation says, the newline element is the first delimiter found, but it still recognizes the CRs as delimiters. However:

(txt,(⎕ucs 12),txt,(⎕ucs 12),txt,⎕ucs 12)⎕NPUT 'd:\tmp' 1
'd:\tmp'⎕ntie ¯1 ⋄ ⎕ucs ⎕nread ¯1 82 ¯1 0 ⋄ ⎕nuntie ¯1
72 101 108 108 111 12 72 101 108 108 111 12 72 101 108 108 111 12
{( ⍴⎕←⊃⍵) ,3⊃⍵}⎕NGET 'd:\tmp' 1
Hello Hello Hello
3

The form feed is recognized as a delimiter, but not returned in the newline element - the same applies to Vertical Tab (11), the line & paragraph separators (8232 & 8233). As you can see from the examples above, 13, 13 10 & 133 are returned correctly, and 10 is returned as 13 10 unless I specify the translation & newline.

One final observation:

(txt,(⎕ucs 13),txt,(⎕ucs 13),txt)⎕NPUT 'd:\tmp' 1
'd:\tmp'⎕ntie ¯1 ⋄ ⎕ucs ⎕nread ¯1 80 ¯1 0 ⋄ ⎕nuntie ¯1
72 101 108 108 111 13 72 101 108 108 111 13 72 101 108 108 111 13 10
{( ⍴⎕←⊃⍵) ,3⊃⍵}⎕NGET 'd:\tmp' 1
Hello Hello Hello
3 13

Because I didn't put a trailing 13, I get the default 13 10 appended to the file - this is true whatever delimiter I choose (unless I supply translation & newline, when the trailing delimiter is the newline character(s)) & I believe it's mentioned somewhere. Is it strictly necessary to have a trailing delimiter on a file? Is it just good practice or will the absence of a delimiter break some applications which read text files?

by **Richard|Dyalog** on Wed Sep 14, 2016 2:01 pm

> The LF has been translated to CR/LF, which is what the documentation says, but to get just 10, I now have to supply a translation as well as a newline - could we supply an empty vector for the default encoding?

For convenience, ⎕NGET recognises any line ending sequence and processes it as such, but normalises it (when it presents the text in the default character vector format) to a single line feed character. This simplifies the APL processing the text as it has only ever to deal with the line feed case.
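A minimal sketch of that normalisation, assuming one of the test files written earlier in the thread still exists at d:\tmp. The names text, enc and nl are only illustrative, and no output is shown because it depends on what the file currently contains:

```apl
⍝ Default flags (0): the content comes back as one simple character vector
(text enc nl)←⎕NGET 'd:\tmp'
⍝ Inspect the code points; per the explanation above, each line ending
⍝ in the file should show up here as a single 10
⎕UCS text
⍝ The line ending detected in the file is reported separately
nl
```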

As a result ⎕NPUT, which "undoes" this mapping, only ever maps line feed characters back to the specified (or default) line ending sequence, anything else is left alone. An omitted encoding (or empty vector encoding) means the default, which is dependent on the host convention (CRLF or LF). If you want LF line endings you do have to specify that - but note that this does not change any other line endings to LF too.
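As a rough illustration of asking for LF endings explicitly, the left argument can be given as (content encoding newline). This is a sketch only: 'UTF-8' is assumed here as the encoding, the name lf is illustrative, and the newline item is ravelled to a one-element vector as a precaution rather than because it is known to be required:

```apl
txt←'Hello'
lf←⎕UCS 10
⍝ Third item 10 requests bare LF as the line ending on output
((txt,lf,txt,lf,txt,lf) 'UTF-8' (,10)) ⎕NPUT 'd:\tmp' 1
⍝ Read the bytes back raw, as in the first post, to check what was written
'd:\tmp'⎕NTIE ¯1 ⋄ ⎕UCS ⎕NREAD ¯1 80 ¯1 0 ⋄ ⎕NUNTIE ¯1
```

Only the line feed characters in the content are affected by this setting; any CR or other separator already in the text is written as it stands.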

> The form feed is recognized as a delimiter, but not returned in the newline element - the same applies to Vertical Tab (11), the line & paragraph separators (8232 & 8233).

This is because the Unicode standard says that all newline elements must be processed as such. But as Paragraph Separator etc. are never used as line endings on any host platform, it probably wouldn't be useful to report them as the line ending type in the file.

> Is it strictly necessary to have a trailing delimiter on a file?

Yes, according to the POSIX standard. Specifically, a text file consists of lines, and lines require a trailing delimiter - see
http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_397 and
http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_206

I hope this helps!

by **crishog** on Thu Dec 01, 2016 1:59 pm

And I finally return to playing with this

> The form feed is recognized as a delimiter, but not returned in the newline element - the same applies to Vertical Tab (11), the line & paragraph separators (8232 & 8233).

In fact it returns a 0 for all of these.

The 11 12 8232 & 8233 can happily be included in the data string for a put, but you can't specify them as delimiters (you get a domain error) - however the get (with the parameter to return a nested vector) returns the sub-strings correctly separated by these characters and trailing ones removed.
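A small sketch of how this can be checked with form feed (12), reusing the d:\tmp test file from earlier. The error guard is only there to report the DOMAIN ERROR (event 11) without interrupting the session, and 'UTF-8' is an assumed encoding:

```apl
txt←'Hello'
ff←⎕UCS 12
⍝ Form feed as the newline item: expected to be rejected with a DOMAIN ERROR
{11::'DOMAIN ERROR' ⋄ (txt 'UTF-8' (,12)) ⎕NPUT ⍵ 1}'d:\tmp'
⍝ Form feed inside the content itself is accepted
(txt,ff,txt,ff,txt) ⎕NPUT 'd:\tmp' 1
⍝ Reading as a nested vector should still split the text at the form feeds
⊃⎕NGET 'd:\tmp' 1
```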

So I take it that this is a mechanism to "chop up" the lines of a file with characters which aren't valid line endings, and that the only valid line endings are LF, CR, CR/LF & 133 (which is a sort of CR + LF character).

I confused myself by reading "Table 3: Line separators:" of the NGET documentation as if the "other line separator characters" part were also valid as entries for the 3rd element of NPUT's left argument.
