Extract parts of the string between brackets
Extract parts of the string between brackets
Hi,
I'm toying with a simple function that should extract the parts of a string between brackets, e.g. HTML tag names, into a nested array of strings:
str←'hel<o><worl>d'
DISPLAY parse_brackets str '<' '>'
┌→───────────┐
│ ┌→┐ ┌→───┐ │
│ │o│ │worl│ │
│ └─┘ └────┘ │
└∊───────────┘
My (naive) implementation of parse_brackets is the following:
R←parse_brackets(str br1 br2);open;close
⍝ APL2 compatibility to test with GNU APL
⎕ML←3
⍝ Indexes of the open bracket br1
open←(str=br1)/⍳⍴str
⍝ Indexes of the close bracket br2 - 1
⍝ ¯1 to exclude closing bracket
close←¯1+(str=br2)/⍳⍴str
⍝ construct a matrix with the start positions in
⍝ the first row and the lengths of the extracted
⍝ words in the second row;
⍝ split it into columns;
⍝ for each pair (start of the word; length)
⍝ drop up to the start and take the length
R←{⍵[2]↑⍵[1]↓str}¨⊂[1]open,[0.5]close-open
But I feel this implementation is rather clumsy, and there must be at least several more elegant ways to do it. Any criticism and ideas on how to implement this in a more APLish way are welcome (here I feel I am still programming in the 'functional' way, i.e. transforming data into a list and applying a lambda function to each element of that list).
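To see what each step of the function produces, its body can be traced on the sample string. The session below is only a sketch: it assumes ⎕IO←1 and ⎕ML←3, and it substitutes the literal brackets '<' and '>' for the arguments br1 and br2.

```apl
      ⍝ Hypothetical session trace, assuming ⎕IO←1 and ⎕ML←3
      str←'hel<o><worl>d'
      open←(str='<')/⍳⍴str        ⍝ positions of each '<'
      open
4 7
      close←¯1+(str='>')/⍳⍴str    ⍝ positions just before each '>'
      close
5 11
      open,[0.5]close-open         ⍝ row 1: characters to drop; row 2: characters to take
4 7
1 4
      ⍝ each column of that matrix becomes one (drop; take) pair
      {⍵[2]↑⍵[1]↓str}¨⊂[1]open,[0.5]close-open   ⍝ yields the two strings 'o' and 'worl'
```

Because the start positions are 1-based, dropping open[i] characters also removes the opening bracket itself, which is why no extra adjustment is needed on that side.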
- alexeyv
- Posts: 56
- Joined: Tue Nov 17, 2015 4:18 pm
Re: Extract parts of the string between brackets
Here are a few variations:
txt←'hel<o><worl>d'
{1↓¨(⍵=⊃⍵)⊂⍵}{(+\1 ¯1 0['<>'⍳⍵])/⍵}txt
o worl
{1↓¨({⍵×⌈\⍵}+\1 ¯1 0['<>'⍳⍵])⊂⍵}txt ⍝ IBM style ⊂, requires ⎕ML←3
o worl
{(¯1+⍵⍳¨'>')↑¨⍵}{1↓¨(⍵='<')⊂⍵}txt
o worl
('<(\w+)>' ⎕S '\1')txt
o worl
The last is perhaps not "very APL-ish", but in this case the REGEX is pretty elegant, IMHO.
P.S. We are toying with the idea of using ⊆ to denote the APL2-style partitioned enclose in Dyalog v16.0 (with monadic ⊆ becoming "enclose if simple"). We'd like to make all the useful functionality from different migration levels available with ⎕ML=1.
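For illustration, this is roughly how the IBM-style variation might read if dyadic ⊆ did behave as APL2-style partition under ⎕ML←1. It is a sketch under that assumption only, and the name depth is introduced here purely for readability.

```apl
⍝ Assumes an interpreter where dyadic ⊆ is APL2-style partition,
⍝ as proposed in the P.S.; depth is a hypothetical helper name.
⎕ML←1
txt←'hel<o><worl>d'
depth←+\1 ¯1 0['<>'⍳txt]    ⍝ 1 from each '<' up to (not including) its '>'
1↓¨depth⊆txt                ⍝ partition on the 1s, then drop each leading '<'
```

Under that assumption the expression would not need ⎕ML←3 at all, which is the point of the proposal.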
- Morten|Dyalog
- Posts: 394
- Joined: Tue Sep 09, 2008 3:52 pm
Re: Extract parts of the string between brackets
The phrase +\1 ¯1 0['<>'⍳⍵] is number 5 in my list of Sixteen APL Amuse-Bouches. It has an ancient pedigree and stars in an amusing anecdote.
I am in awe of the phrasing (and the sarcasm) "... who runs computing in Bavaria from his headquarters in Munich, and ... who runs computing all over the world from his headquarters in Holland."
- Roger|Dyalog
- Posts: 218
- Joined: Thu Jul 28, 2011 10:53 am
Re: Extract parts of the string between brackets
Wow, great replies! So far I've only studied the reply that uses IBM's notation, because after I posted this question I was looking through Mastering Dyalog APL and remembered that IBM's ⊂ function can split a vector into a nested array.
It took me a while to understand this second answer, and in particular the (1 ¯1 0)['<>'⍳⍵] construction, which marks the beginning and end of each word with 1 and ¯1.
Next, it was interesting to see how the array is filled with 1s between each 1 and ¯1.
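Spelled out on the sample string, the stages look like this (a sketch of a session, assuming ⎕IO←1):

```apl
      str←'hel<o><worl>d'
      '<>'⍳str                    ⍝ 1 for '<', 2 for '>', 3 for anything else
3 3 3 1 3 2 1 3 3 3 3 2 3
      (1 ¯1 0)['<>'⍳str]          ⍝ map those indices to 1, ¯1 and 0
0 0 0 1 0 ¯1 1 0 0 0 0 ¯1 0
      +\(1 ¯1 0)['<>'⍳str]        ⍝ running sum: 1 from each '<' up to, but not including, its '>'
0 0 0 1 1 0 1 1 1 1 1 0 0
```

The closing bracket itself lands on a 0, so the partition excludes it, while the opening bracket stays in and has to be removed afterwards with 1↓¨.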
But I failed to see the need for the {⍵×⌈\⍵} function. To me it looks like the task is already solved even without it:
str
hel<o><worl>d
1↓¨(+\(1 ¯1 0)['<>'⍳str])⊂str
o worl
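A quick check on this particular string seems to confirm it: the running sum only takes the values 0 and 1 here, so multiplying it by its own running maximum changes nothing. In the sketch below, depth is a name introduced only for illustration.

```apl
      str←'hel<o><worl>d'
      depth←+\(1 ¯1 0)['<>'⍳str]
      depth
0 0 0 1 1 0 1 1 1 1 1 0 0
      ⌈\depth                     ⍝ running maximum: stays at 1 after the first '<'
0 0 0 1 1 1 1 1 1 1 1 1 1
      depth≡{⍵×⌈\⍵}depth          ⍝ the extra step returns the same vector
1
```

This only covers the sample input; strings with nested or unbalanced brackets would need to be checked separately.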
- alexeyv
- Posts: 56
- Joined: Tue Nov 17, 2015 4:18 pm
Re: Extract parts of the string between brackets
alexeyv wrote: But I failed to see the need for the {⍵×⌈\⍵} function.
I can't see it either - now - I must have been moving a bit too fast and confused myself.
- Morten|Dyalog
- Posts: 394
- Joined: Tue Sep 09, 2008 3:52 pm