Extract parts of the string between brackets

Post by alexeyv on Thu Mar 03, 2016 10:29 pm

Hi,

I'm toying with a simple function that should extract the parts of a string between brackets (e.g. HTML tag names) into a nested array of strings:
      str←'hel<o><worl>d'
      DISPLAY parse_brackets str '<' '>'
┌→───────────┐
│ ┌→┐ ┌→───┐ │
│ │o│ │worl│ │
│ └─┘ └────┘ │
└∊───────────┘


My (naive) implementation of this parse_brackets is the following:

      R←parse_brackets(str br1 br2);open;close
⍝ APL2 compatibility, to test with GNU APL
⎕ML←3
⍝ Indices of the opening bracket br1
open←(str=br1)/⍳⍴str
⍝ Indices of the closing bracket br2, minus 1
⍝ (the ¯1 excludes the closing bracket itself)
close←¯1+(str=br2)/⍳⍴str
⍝ Build a two-row matrix: start indices in the
⍝ first row, lengths of the extracted words in
⍝ the second row;
⍝ split it into columns;
⍝ for each pair (start index, length),
⍝ drop up to the start and take the length
R←{⍵[2]↑⍵[1]↓str}¨⊂[1]open,[0.5]close-open
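
Traced on the sample string, the intermediate values look as follows. This is only a sketch of a session, assuming ⎕IO←1 and ⎕ML←3 as in the function above, with '<' and '>' written out in place of br1 and br2; the name pairs is introduced here purely for illustration:

```apl
      ⍝ assumes ⎕IO←1 and ⎕ML←3 (APL2-style ⊂ with axis)
      str←'hel<o><worl>d'
      open←(str='<')/⍳⍴str          ⍝ positions of each '<'
      open
4 7
      close←¯1+(str='>')/⍳⍴str      ⍝ positions just before each '>'
      close
5 11
      open,[0.5]close-open          ⍝ row 1: starts, row 2: lengths
4 7
1 4
      pairs←⊂[1]open,[0.5]close-open   ⍝ one (start, length) pair per column: (4 1) and (7 4)
      {⍵[2]↑⍵[1]↓str}¨pairs         ⍝ drop up to the start, take the length
 o  worl
```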


But I feel this implementation is rather clumsy, and there must be several more elegant ways to do it. Any criticism, or ideas on how to implement this in a more APL-ish way, would be welcome. (I feel I am still programming in the 'functional' style here, i.e. transforming the data into a list and applying a lambda function to each element of that list.)
alexeyv
 
Posts: 56
Joined: Tue Nov 17, 2015 4:18 pm

Re: Extract parts of the string between brackets

Post by Morten|Dyalog on Thu Mar 03, 2016 11:08 pm

Here are a few variations:

      txt←'hel<o><worl>d'     
      {1↓¨(⍵=⊃⍵)⊂⍵}{(+\1 ¯1 0['<>'⍳⍵])/⍵}txt
 o  worl
      {1↓¨({⍵×⌈\⍵}+\1 ¯1 0['<>'⍳⍵])⊂⍵}txt              ⍝ IBM style ⊂, requires ⎕ML←3
 o  worl
      {(¯1+⍵⍳¨'>')↑¨⍵}{1↓¨(⍵='<')⊂⍵}txt
 o  worl
      ('<(\w+)>' ⎕S '\1')txt
 o  worl


The last is perhaps not "very APL-ish", but in this case the REGEX is pretty elegant, IMHO.
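
For anyone following along, the third variation can be unpacked step by step. This is a sketch of a session, assuming ⎕IO←1 and Dyalog's default migration level (so that ⊂ with a Boolean left argument is partitioned enclose, not the IBM-style partition); the name parts is hypothetical and used only for this illustration:

```apl
      ⍝ assumes ⎕IO←1 and ⎕ML<3 (Dyalog partitioned enclose)
      txt←'hel<o><worl>d'
      parts←1↓¨(txt='<')⊂txt       ⍝ start a segment at each '<', then drop the '<'
      parts
 o>  worl>d
      parts⍳¨'>'                   ⍝ position of the first '>' in each segment
2 5
      (¯1+parts⍳¨'>')↑¨parts       ⍝ keep only what precedes that '>'
 o  worl
```

Text before the first '<' ('hel' here) is discarded by the partitioned enclose itself, because the mask starts with zeros.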

P.S. We are toying with the idea of using ⊆ to denote the APL2-style partitioned enclose in Dyalog v16.0 (with monadic ⊆ becoming "enclose if simple"). We'd like to make all the useful functionality from different migration levels available with ⎕ML=1.
Morten|Dyalog
 
Posts: 394
Joined: Tue Sep 09, 2008 3:52 pm

Re: Extract parts of the string between brackets

Post by Roger|Dyalog on Fri Mar 04, 2016 6:09 pm

The phrase +\1 ¯1 0['<>'⍳⍵] is number 5 in my list of Sixteen APL Amuse-Bouches. It has an ancient pedigree and stars in an amusing anecdote.

I am in awe of the phrasing (and the sarcasm) "... who runs computing in Bavaria from his headquarters in Munich, and ... who runs computing all over the world from his headquarters in Holland."
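
As a small illustration of what the phrase computes, here is a session sketch on a nested example. It assumes ⎕IO←1; the string expr and the use of round brackets are made up for this illustration and do not come from the thread:

```apl
      ⍝ assumes ⎕IO←1
      expr←'(a(b)c)'
      '()'⍳expr                    ⍝ 1 for '(', 2 for ')', 3 for anything else
1 3 1 3 2 3 2
      (1 ¯1 0)['()'⍳expr]          ⍝ +1 on open, ¯1 on close, 0 elsewhere
1 0 1 0 ¯1 0 ¯1
      +\(1 ¯1 0)['()'⍳expr]        ⍝ running sum: nesting depth at each character
1 1 2 2 1 1 0
```

Note that at each closing bracket the depth has already been decremented, which is why the closing '>' drops out of the partitions in the examples earlier in the thread.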
Roger|Dyalog
 
Posts: 218
Joined: Thu Jul 28, 2011 10:53 am

Re: Extract parts of the string between brackets

Post by alexeyv on Fri Mar 04, 2016 9:05 pm

Wow, great replies! So far I've only studied the reply that uses IBM's notation, because after I posted this question I was looking through Mastering Dyalog APL and remembered that IBM's ⊂ function does allow splitting a vector into a nested array.

It took me a while to understand this second answer, and in particular the
      (1 ¯1 0)['<>'⍳⍵]
construction, which marks the beginning and end of words with '1' and '¯1'.

Next, it was interesting to see how to fill the array between the 1 and the ¯1 with 1s.

But I failed to see the need of {⍵×⌈\⍵} function. To me it looks like the task is already solved even without this function:
      str
hel<o><worl>d
      1↓¨(+\(1 ¯1 0)['<>'⍳str])⊂str
o worl

Re: Extract parts of the string between brackets

Post by Morten|Dyalog on Sat Mar 05, 2016 9:35 am

alexeyv wrote:But I failed to see the need of {⍵×⌈\⍵} function.

I can't see it either - now - I must have been moving a bit too fast and confused myself.
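
A quick check in the session is consistent with this, at least for the sample string. The sketch below assumes ⎕IO←1; the name mask is introduced only for illustration:

```apl
      ⍝ assumes ⎕IO←1
      str←'hel<o><worl>d'
      mask←+\(1 ¯1 0)['<>'⍳str]
      mask
0 0 0 1 1 0 1 1 1 1 1 0 0
      {⍵×⌈\⍵}mask                  ⍝ running maximum times the mask itself
0 0 0 1 1 0 1 1 1 1 1 0 0
      mask≡{⍵×⌈\⍵}mask             ⍝ identical for this 0/1 mask
1
```

For a mask containing only 0s and 1s, the running maximum is 0 before the first 1 and 1 afterwards, so the product equals the mask. With nested brackets the running sum can exceed 1 and the product would no longer equal it, so this only shows the step is redundant for non-nested input like the example.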

