OpenOffice.org Forum at OOoForum.orgThe OpenOffice.org Forum
 
 [Home]   [FAQ]   [Search]   [Memberlist]   [Usergroups]   [Register
 [Profile]   [Log in to check your private messages]   [Log in

Webscraping web "GET" queries via dynamic URL

 
Post new topic   Reply to topic    OOoForum.org Forum Index -> OpenOffice.org Code Snippets
View previous topic :: View next topic  

Is Calc more programmable than Excel?
way more
0%
 0%  [ 0 ]
more
50%
 50%  [ 1 ]
same thing only different
50%
 50%  [ 1 ]
about the same
0%
 0%  [ 0 ]
almost
0%
 0%  [ 0 ]
needs more work
0%
 0%  [ 0 ]
features? we don't need no steenkin' features!
0%
 0%  [ 0 ]
Total Votes : 2

Author Message
leebert
General User
General User


Joined: 12 May 2005
Posts: 19

PostPosted: Wed Apr 15, 2009 8:41 am    Post subject: Webscraping web "GET" queries via dynamic URL Reply with quote

Assume you have two sheets, one named "Analysis" and one named "Income"

In the first cell of the Analysis sheet is a stock symbol, e.g.:
"IBM"

In the first cell of the Income sheet is the formula:

="http://stockdata.com/stocks/income?SYMBOL=" & $Analysis.A1 & "&another_arg"

income.A1's value should look like this:
http://stockdata.com/stocks/income?SYMBOL=IBM&another_arg
(n.b. this is not a real URL... you might try some other financial websites with tabular HTML field data Wink

Now call the following macro:

Dim cURL as String
Dim Doc As Object
Dim Sheet As Object
Dim CellRange As Object
Dim Cell As Object

Doc = thiscomponent

' income annual
Sheet = Doc.Sheets.getByName("Income")
CellRange = Sheet.getCellRangeByName("A1:A1")
Cell = CellRange.GetCellByPosition(0,0)
cURL = Cell.String()

CellRange = Sheet.getCellRangeByName("A3:A3")
Cell = CellRange.GetCellByPosition(0,0)

aLinks = ThisComponent.AreaLinks
aLinks.InsertAtPosition(Cell.CellAddress, cURL, "HTML_Tables", "calc_HTML_WebQuery", "")

This should insert the HTML table from the website into the A3:A3 range in the "Income" sheet.


Last edited by leebert on Sat Apr 18, 2009 8:47 am; edited 3 times in total
Back to top
View user's profile Send private message
leebert
General User
General User


Joined: 12 May 2005
Posts: 19

PostPosted: Sat Apr 18, 2009 8:46 am    Post subject: Parsing out HTML links from spreadsheet cell Reply with quote

Related to first webscraping exercise, extracting an internal, unique Company ID# from first fetch of HTML table:

In Cell B33 there'd be a a cell with 4 links (might look like this):
Income Statements | Balance Sheets | Statements of Cash Flow | Financial Ratios

The first link would be:
http:/stockdata.com/info.asp?CompanyID=8528&Info=INCOME&Symbol=IBM&Periods=4

We want to extract that CompanyID 8528, and to do that we have to parse out the http link.

' pick first field of text hash with URL in it, where earlier webquery placed it - say, in sheet $BasicIncome.B33:

Sheet = Doc.Sheets.getByName("BasicIncomel")
CellRange = Sheet.getCellRangeByName("B33:B33")
Cell = CellRange.GetCellByPosition(0,0)

' get data out of text hash with URL in it
' additional HTML links / URLs could be stored in the same cell, as index 2, 3, 4, etc....
CELL_URL = Cell.getTextfields.getByIndex(1).URL

' now store this in the working cell near top of sheet
CellRange = Sheet.getCellRangeByName("A3:A3")
Cell = CellRange.GetCellByPosition(0,0)
Cell.String = CELL_URL ' not a numeric value, so use Cell.String, not Cell.Value


Now we have that URL
(http:/stockdata.com/info.asp?CompanyID=8528&Info=INCOME&Symbol=IBM)
in a proper cell instead of embedded as a HTML link, we can parse out the ComanyID to build other, related, URLs to get more data on CompanyID #8528

That can be done In the spreadsheet by getting the position of the CompanyID GET argument:
=FIND("CompanyID=", A3)+10
...and then use the LEFT() & RIGHT() functions to extract the Company ID value 8528

And build new http GET formatted strings for reports with more annual periods (10 instead of 4).
http:/stockdata.com/info.asp?CompanyID=8528&Info=INCOME&Symbol=IBM&Periods=10

To get 10 yeas of data.

Wink
Back to top
View user's profile Send private message
Display posts from previous:   
Post new topic   Reply to topic    OOoForum.org Forum Index -> OpenOffice.org Code Snippets All times are GMT - 8 Hours
Page 1 of 1

 
Jump to:  
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum


Powered by phpBB © 2001, 2005 phpBB Group