Problem Scraping a Web Page
8 posts
• Page 1 of 1
Problem Scraping a Web Page
I would like to obtain the LOTTO numbers displayed on this web page for my personal use: http://www.lotterypost.com/game/246/results. They are in a table named 'podtable preformatted'. When I get the content of the web page using Conga:
- Code: Select all
rc hdrs response←Samples.HTTPGet 'http://www.lotterypost.com/game/246/results'
the table is not filled with the numbers. I suppose the page needs to be executed inside a web browser (JavaScript code that needs to run?). If I try to get the page executed inside a web browser:
- Code: Select all
⎕using← 'System.Windows.Controls,WPF/PresentationFramework.dll' 'System,System.dll'
url←'http://www.lotterypost.com/game/246/results'
wb←⎕NEW WebBrowser
wb.Navigate(⎕NEW Uri(⊂url))
wb.Document
mshtml.HTMLDocumentClass
The displayed HTML should be somewhere inside wb.Document, but according to Microsoft 'The Document object needs to be cast to the COM interface you are expecting.' and I don't know how to do that.
Any help in getting the displayed HTML would be appreciated.
-
PGilbert - Posts: 436
- Joined: Sun Dec 13, 2009 8:46 pm
- Location: Montréal, Québec, Canada
Re: Problem Scraping a Web Page
I *think* Dyalog has already done the casting. I can "dot into" the object and see things, e.g.
- Code: Select all
'b' wb.Document.⎕NL ¯2
baseUrl bgColor body
wb.Document.bgColor
#ffffff
Lots of names don't return anything, and I have no idea where to locate your HTML, though... But it is probably in there somewhere...
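For anyone exploring the object the same way, here is a small sketch of the name-list technique used above. It assumes wb is the WebBrowser instance from the first post and that the page has finished loading; the names returned will depend on the object.

```apl
      wb.Document.⎕NL ¯2        ⍝ names of all exposed properties
      'o' wb.Document.⎕NL ¯2    ⍝ only the property names starting with 'o'
      wb.Document.⎕NL ¯3        ⍝ names of the exposed methods
```

The left argument of ⎕NL filters on the first letter, so trying a few letters is a quick way to look for candidates.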
-
Morten|Dyalog - Posts: 453
- Joined: Tue Sep 09, 2008 3:52 pm
Re: Problem Scraping a Web Page
Thanks Morten for your answer. I also think that everything is there, but I can't seem to get it out (it needs some more casting on a System.__ComObject object). However, I just found that with the WebBrowser object of Windows Forms everything is exposed properly in the Document property:
- Code: Select all
url←'http://www.lotterypost.com/game/246/results'
⎕USING←'System.Windows.Forms,System.Windows.Forms.dll'
wb←⎕NEW WebBrowser
wb.Navigate(⊂url)
wb.Document ⍝ Need to wait here for the document to load. Otherwise VALUE ERROR
If anybody finds a better way to scrape a heavily JavaScript-driven web page, please let us know in this thread.
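The wait mentioned in the comment above could be sketched roughly as follows. This is only a sketch under assumptions: GetRenderedHtml is a hypothetical function name, the 30-second limit is arbitrary, and it assumes that calling Application.DoEvents from APL is enough to let the control finish loading. Scripts that fill the table after the page reports Complete may need a further delay.

```apl
∇ html←GetRenderedHtml url;wb;tries;⎕USING
  ⍝ Hypothetical helper: load url in a Windows Forms WebBrowser
  ⍝ and return the HTML of the page as rendered
  ⎕USING←'System.Windows.Forms,System.Windows.Forms.dll'
  wb←⎕NEW WebBrowser
  wb.ScriptErrorsSuppressed←1       ⍝ keep script-error dialogs from blocking the load
  wb.Navigate(⊂url)
  tries←0
  :While ('Complete'≢⍕wb.ReadyState)∧tries<300   ⍝ give up after roughly 30 seconds
      Application.DoEvents          ⍝ let the control process its pending messages
      ⎕DL 0.1
      tries+←1
  :EndWhile
  html←wb.Document.Body.OuterHtml   ⍝ markup of the body as it currently stands
∇
```

It would be called as html←GetRenderedHtml url, after which the table could be searched for in html.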
-
PGilbert - Posts: 436
- Joined: Sun Dec 13, 2009 8:46 pm
- Location: Montréal, Québec, Canada
Re: Problem Scraping a Web Page
It's amazing: whenever I try setting ⎕USING I almost always end up staring at an error of some sort:
Assembly load failed:
Could not load file or assembly 'C:\Program Files\Dyalog\Dyalog APL-64 14.1 Unicode\WPF/PresentationFramework.dll' or one of its dependencies. The system cannot find the file specified.
Could not load file or assembly 'C:\Windows\Microsoft.NET\Framework64\v2.0.50727\WPF/PresentationFramework.dll' or one of its dependencies. The system cannot find the file specified.
-
kai - Posts: 137
- Joined: Thu Jun 18, 2009 5:10 pm
- Location: Hillesheim / Germany
Re: Problem Scraping a Web Page
Hello Kai, using the 'WPF/' sub-directory in ⎕USING to point to some Microsoft assemblies is still a mystery to me too, because no such sub-directory exists on my computer. But I must say that it has been working for me without problems. I have not seen this approach used in the C# community.
It was discussed in a previous post but not resolved. If Dyalog could explain this way of reaching some Microsoft assemblies, that would be welcome.
-
PGilbert - Posts: 436
- Joined: Sun Dec 13, 2009 8:46 pm
- Location: Montréal, Québec, Canada
Re: Problem Scraping a Web Page
Pierre,
It works the same way as all the other subdirectories.
On my machine, C:\Windows\Microsoft.NET\Framework\v4.0.30319 is the directory where all the .NET assemblies are stored, and it contains a WPF subdirectory. This path depends on the version of .NET you are using.
With ⎕USING, unless the full file name is given explicitly, Dyalog looks for the DLL in two places, resolving the subdirectory relative to:
1. the directory from which the .exe is running; if the DLL is not found there (or there is no such subdirectory), then
2. the standard Microsoft .NET library directory (mine is the one above).
This is why Syncfusion/4.5/....dll works, as it is relative to the Dyalog.exe directory, and why WPF/...dll works, as it is relative to the Microsoft directory. Adding a WPF subdirectory under the Dyalog.exe directory would allow you to have your own DLL (of the same name) loaded in place of the Microsoft one, if you really wanted to do that.
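To make the two forms concrete, here is a small sketch. The full path is the v4.0.30319 directory quoted above and is only an example; it will differ with the machine, the .NET version and 32/64-bit. In Kai's error message the second directory searched is the v2.0.50727 one, which presumably has no WPF subdirectory, and that would explain the failure there.

```apl
      ⍝ Relative name: looked up under the Dyalog.exe directory first,
      ⍝ then under the .NET Framework directory
      ⎕USING←⊂'System.Windows.Controls,WPF/PresentationFramework.dll'

      ⍝ Full file name given explicitly, so no search is needed
      ⍝ (example path only, adjust to your installation)
      ⎕USING←⊂'System.Windows.Controls,C:\Windows\Microsoft.NET\Framework\v4.0.30319\WPF\PresentationFramework.dll'
```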
-
MikeHughes - Posts: 86
- Joined: Thu Nov 26, 2009 9:03 am
- Location: Market Harborough, Leicestershire, UK
Re: Problem Scraping a Web Page
I think the reason you don't see WPF/ in C# is that VS searches all the official MS directories (including WPF/) when you browse the assemblies section to resolve the DLLs, so they all get listed.
By the time they are added to a VS project, the full name has been resolved and is used in the properties. In VS, if you look at the properties of a WPF file, you will see WPF as a subdirectory in the path.
-
MikeHughes - Posts: 86
- Joined: Thu Nov 26, 2009 9:03 am
- Location: Market Harborough, Leicestershire, UK
Re: Problem Scraping a Web Page
Thanks Michael for the education. I now understand where the WPF sub-directory is located and the logic behind ⎕USING.
Thanks for your time.
-
PGilbert - Posts: 436
- Joined: Sun Dec 13, 2009 8:46 pm
- Location: Montréal, Québec, Canada