Website Parsing/Retrieval using XML


February 27, 2008 – 9:51 pm by JP

I recently posted links to the HTML and Internet Explorer object references. These are excellent tools to get web data into MS Office for further processing.

Lately I have been reading into XML as it appears to be a much faster way to access web pages than instantiating Internet Explorer. My first experiment into XML a few months ago was a disaster; I chalk it up to jumping in too fast without studying, but often that is the way I learn best.

Here is some sample code showing how to retrieve information (in Excel) from a website using XML. First you will need to set a reference to Microsoft XML, in my case (Excel 2003) it is version 6.0. The filename in Windows XP is c:\windows\system32\msxml6.dll.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
Sub TestXML()
' check google.com using xml
Dim xSite As XMLHTTP60
Set xSite = New XMLHTTP60
xSite.Open "GET", "http://www.google.com/", False
xSite.Send
 
Do Until xSite.readyState = 4
Loop
 
MsgBox xSite.getAllResponseHeaders
MsgBox xSite.getResponseHeader("Last-Modified")
MsgBox "Status Text: " & xSite.statusText & vbCr & vbCr & "Status: " & xSite.Status
 
Range("A1") = Left$(xSite.responseText, 100)
End Sub

Please note that as far as I can tell, this will only work in IE 7.

First we set a reference to the XML object, then use the Open method to build a connection string. The Send method actually executes the action and accesses the website. Then we run through some message boxes showing some samples of what you can do with the query.

The first message box shows the hidden response headers from the google homepage. Once you list them, you can single them out, which is what the second message box does. The third message box shows off the Status and StatusText properties, which you can use (along with readyState) to see if a webpage has finished loading before your macro continues.

The last part shows how you can parse the website into an Excel range.

For more information on the properties and methods you can access with XML, check out:
XMLHttpRequest Object

Enjoy,
JP



Print This Post Print This Post  |  Email This Post Email This Post  |  Permalink  |  Subscribe to this feed Subscribe now!

Filed Under: Excel, Internet Explorer, VBA, automation
Tags: , , , ,

This post has 139 views since February 27, 2008 – 9:51 pm.

Post a Comment

Browse Posts:


« Resending Outlook Messages via VBA Code | Data Filter Tool Add-In - Coming Soon »