Problem Scraping a Web Page

Postby PGilbert on Tue May 19, 2015 12:18 pm

I would like to obtain the LOTTO numbers displayed on this web page for my personal use: http://www.lotterypost.com/game/246/results. They are in a table named 'podtable preformatted'. When I fetch the content of the web page using Conga:

Code: Select all
rc hdrs response←Samples.HTTPGet 'http://www.lotterypost.com/game/246/results'


the table is not filled with the numbers. I suppose the page needs to be executed inside a web browser (JavaScript code that needs to run?). If I try to get the page executed inside a web browser:

Code: Select all
      ⎕USING←'System.Windows.Controls,WPF/PresentationFramework.dll' 'System,System.dll'
      url←'http://www.lotterypost.com/game/246/results'
      wb←⎕NEW WebBrowser
      wb.Navigate(⎕NEW Uri(⊂url))
      wb.Document
mshtml.HTMLDocumentClass


The displayed HTML should be somewhere inside wb.Document, but according to Microsoft 'The Document object needs to be cast to the COM interface you are expecting.' before it can be used, and I don't know how to do that.

Any help will be appreciated in getting the displayed HTML.
PGilbert
Posts: 436
Joined: Sun Dec 13, 2009 8:46 pm
Location: Montréal, Québec, Canada

Re: Problem Scraping a Web Page

Postby Morten|Dyalog on Wed May 20, 2015 8:33 am

I *think* Dyalog has already done the casting. I can "dot into" the object and see things, e.g.

Code: Select all
      'b' wb.Document.⎕NL ¯2
 baseUrl  bgColor  body
      wb.Document.bgColor
#ffffff


Lots of names don't return anything, and I have no idea where to locate your HTML, though... But it is probably in there somewhere...
Morten|Dyalog
Posts: 453
Joined: Tue Sep 09, 2008 3:52 pm

Re: Problem Scraping a Web Page

Postby PGilbert on Wed May 20, 2015 12:07 pm

Thanks Morten for your answer. I also think that everything is there, but I can't seem to get it out (it needs some more casting on a System.__ComObject object). However, I just found that with the WebBrowser object of Windows Forms everything is exposed properly in the Document property:

Code: Select all
 url←'http://www.lotterypost.com/game/246/results'

 ⎕USING←'System.Windows.Forms,System.Windows.Forms.dll'

 wb←⎕NEW WebBrowser
 wb.Navigate(⊂url)
 wb.Document ⍝ Need to wait here for the document to load. Otherwise VALUE ERROR

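A minimal sketch of that wait, for anyone who wants to try it. It assumes that the control's ReadyState property reaches Complete once the page has loaded and that calling Application.DoEvents from the APL thread is enough to let the control process its messages. Neither assumption has been checked against this particular site, and scripts that fill the table may still be running after Complete is reported. The function name GetRenderedHtml and the limit of about 30 seconds are illustrative choices, not part of Dyalog or Conga.

```apl
∇ html←GetRenderedHtml url;wb;tries;⎕USING
  ⍝ Hypothetical helper: load url in a Windows Forms WebBrowser
  ⍝ and return the HTML of the body as rendered by the control.
  ⎕USING←'System.Windows.Forms,System.Windows.Forms.dll'
  wb←⎕NEW WebBrowser
  wb.ScriptErrorsSuppressed←1      ⍝ keep script-error dialogs from blocking the load
  wb.Navigate(⊂url)
  tries←0
  :While (~wb.ReadyState.Equals WebBrowserReadyState.Complete)∧tries<300
      Application.DoEvents         ⍝ let the control process pending window messages
      ⎕DL 0.1                      ⍝ pause 0.1 s between checks (about 30 s in total)
      tries+←1
  :EndWhile
  ⍝ If the loop timed out, the next line may still fail because no document is loaded.
  html←wb.Document.Body.OuterHtml
∇
```

It would be called as html←GetRenderedHtml url, after which an expression such as ∨/'podtable'⍷html shows whether the table markup is present in the result at all.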

If anybody finds a better way to scrape a JavaScript-heavy web page, please let us know on this thread.

Re: Problem Scraping a Web Page

Postby kai on Wed May 20, 2015 2:11 pm

It's amazing: whenever I try setting ⎕USING I almost always end up staring at an error of some sort:

Assembly load failed:
Could not load file or assembly 'C:\Program Files\Dyalog\Dyalog APL-64 14.1 Unicode\WPF/PresentationFramework.dll' or one of its dependencies. The system cannot find the file specified.
Could not load file or assembly 'C:\Windows\Microsoft.NET\Framework64\v2.0.50727\WPF/PresentationFramework.dll' or one of its dependencies. The system cannot find the file specified.
kai
Posts: 137
Joined: Thu Jun 18, 2009 5:10 pm
Location: Hillesheim / Germany

Re: Problem Scraping a Web Page

Postby PGilbert on Wed May 20, 2015 2:49 pm

Hello Kai, using the 'WPF/' sub-directory in ⎕USING to point to some of Microsoft's assemblies is still a mystery to me too, because no such sub-directory exists on my computer. But I must say that it has been working for me without problems. I have not seen it done that way in the C# community.

It was discussed in a previous post but not resolved. If Dyalog would like to explain how this way of reaching some Microsoft assemblies works, that would be welcome.

Re: Problem Scraping a Web Page

Postby MikeHughes on Thu May 21, 2015 8:23 am

Pierre,

It does work the same way as all the other subdirectories.

On my machine C:\Windows\Microsoft.NET\Framework\v4.0.30319 is the directory where all the .Net stuff is stored and there is a WPF subdirectory. This path depends on the version of .Net you are using.

With ⎕USING, unless the full file name is given explicitly, Dyalog looks for the DLL to be loaded in two places, resolving the subdirectory relative to:
1. the directory where the .exe is running; if it is not found there (or there is no such subdirectory),
2. the standard Microsoft .NET library directory; mine is the one above.

It resolves the relative subdirectory using these two directories for the search. This is why Syncfusion/4.5/....dll works as it is relative to the Dyalog.exe directory and why the WPF/...dll works as it is relative to the Microsoft directory. Adding a WPF subdirectory under Dyalog.exe would allow you to add your own dll (of the same name) to be loaded in place of the Microsoft one if you really wanted to do that.
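The search order described above can be sketched in plain APL. The two base directories are the ones that appear in the error message earlier in the thread. The variable names are illustrative, ⎕NEXISTS is assumed to be available (Dyalog 15.0 or later), and the explicit path on the last line is only an example built from the v4.0.30319 directory mentioned above, so it will differ by machine and .NET version.

```apl
      exeDir←'C:\Program Files\Dyalog\Dyalog APL-64 14.1 Unicode\'   ⍝ where the .exe runs
      netDir←'C:\Windows\Microsoft.NET\Framework64\v2.0.50727\'      ⍝ .NET directory in use
      rel←'WPF/PresentationFramework.dll'                            ⍝ relative name given in ⎕USING
      candidates←exeDir netDir,¨⊂rel   ⍝ the two full names tried, in order
      ↑candidates                      ⍝ display them one per row
      ⎕NEXISTS¨candidates              ⍝ 1 where a file is present (assumes ⎕NEXISTS is available)
      ⍝ Alternative: give the full file name explicitly (example path, machine-specific)
      ⎕USING←'System.Windows.Controls,C:\Windows\Microsoft.NET\Framework\v4.0.30319\WPF\PresentationFramework.dll'
```

The two candidates are exactly the names reported in the "Assembly load failed" message, which suggests that on that machine neither directory contained a WPF subdirectory with the DLL.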
MikeHughes
Posts: 86
Joined: Thu Nov 26, 2009 9:03 am
Location: Market Harborough, Leicestershire, UK

Re: Problem Scraping a Web Page

Postby MikeHughes on Thu May 21, 2015 8:52 am

I think the reason you don't see WPF/ in C# is that Visual Studio is designed to search all the official Microsoft directories (including WPF/) when you browse the assemblies section to resolve the DLLs, so they all get listed.
By the time an assembly is added to a Visual Studio project, its full name has been resolved and is used in its properties. If you look at the properties of a WPF file in Visual Studio, you will see WPF as a subdirectory in the name.

Re: Problem Scraping a Web Page

Postby PGilbert on Fri May 22, 2015 1:20 am

Thanks Michael for the education. I can understand now where the WPF sub-directory is located and the logic behind ⎕USING.

Thanks for your time.