Website Parsing/Retrieval using XML
February 27, 2008 – 9:51 pm by JP
I recently posted links to the HTML and Internet Explorer object references. These are excellent tools to get web data into MS Office for further processing.
Lately I have been reading into XML as it appears to be a much faster way to access web pages than instantiating Internet Explorer. My first experiment into XML a few months ago was a disaster; I chalk it up to jumping in too fast without studying, but often that is the way I learn best.
Here is some sample code showing how to retrieve information (in Excel) from a website using XML. First you will need to set a reference to Microsoft XML, in my case (Excel 2003) it is version 6.0. The filename in Windows XP is c:\windows\system32\msxml6.dll.
-
Sub TestXML()
-
' check google.com using xml
-
Dim xSite As XMLHTTP60
-
Set xSite = New XMLHTTP60
-
xSite.Open "GET", "http://www.google.com/", False
-
xSite.Send
-
-
Do Until xSite.readyState = 4
-
Loop
-
-
MsgBox xSite.getAllResponseHeaders
-
MsgBox xSite.getResponseHeader("Last-Modified")
-
MsgBox "Status Text: " & xSite.statusText & vbCr & vbCr & "Status: " & xSite.Status
-
-
Range("A1") = Left$(xSite.responseText, 100)
-
End Sub
Please note that as far as I can tell, this will only work in IE 7.
First we set a reference to the XML object, then use the Open method to build a connection string. The Send method actually executes the action and accesses the website. Then we run through some message boxes showing some samples of what you can do with the query.
The first message box shows the hidden response headers from the google homepage. Once you list them, you can single them out, which is what the second message box does. The third message box shows off the Status and StatusText properties, which you can use (along with readyState) to see if a webpage has finished loading before your macro continues.
The last part shows how you can parse the website into an Excel range.
For more information on the properties and methods you can access with XML, check out:
XMLHttpRequest Object
Enjoy,
JP
Print This Post
|
Email This Post
|
Permalink
|
Filed Under: Excel, Internet Explorer, VBA, automation
Tags: Excel, Internet Explorer, parsing, VBA, XML
This post has 452 views since February 27, 2008 – 9:51 pm.






