Website Parsing/Retrieval using XML


February 27, 2008 – 9:51 pm by JP

I recently posted links to the HTML and Internet Explorer object references. These are excellent tools to get web data into MS Office for further processing.

Lately I have been reading into XML as it appears to be a much faster way to access web pages than instantiating Internet Explorer. My first experiment into XML a few months ago was a disaster; I chalk it up to jumping in too fast without studying, but often that is the way I learn best.

Here is some sample code showing how to retrieve information (in Excel) from a website using XML. First you will need to set a reference to Microsoft XML, in my case (Excel 2003) it is version 6.0. The filename in Windows XP is c:\windows\system32\msxml6.dll.


Sub TestXML()
' check google.com using xml
Dim xSite As XMLHTTP60
Set xSite = New XMLHTTP60
xSite.Open "GET", "http://www.google.com/", False
xSite.Send

Do Until xSite.readyState = 4
Loop

MsgBox xSite.getAllResponseHeaders
MsgBox xSite.getResponseHeader("Last-Modified")
MsgBox "Status Text: " & xSite.statusText & vbCr & vbCr & "Status: " & xSite.Status

Range("A1") = Left$(xSite.responseText, 100)
End Sub

Please note that as far as I can tell, this will only work in IE 7.

First we set a reference to the XML object, then use the Open method to build a connection string. The Send method actually executes the action and accesses the website. Then we run through some message boxes showing some samples of what you can do with the query.

The first message box shows the hidden response headers from the google homepage. Once you list them, you can single them out, which is what the second message box does. The third message box shows off the Status and StatusText properties, which you can use (along with readyState) to see if a webpage has finished loading before your macro continues.

The last part shows how you can parse the website into an Excel range.

For more information on the properties and methods you can access with XML, check out:
XMLHttpRequest Object

Enjoy,
JP


If you enjoyed this page, share it!
    StumbleUpon Technorati Digg Google del.icio.us MisterWong TwitThis

Print This Post Print This Post  |  Email This Post Email This Post  |  rss Subscribe to Posts Feed  |  rss Subscribe to Comments

Filed Under: Excel, Internet Explorer, VBA, automation
Tags: , , , ,

Post a Comment


Certain comments (including first-time comments) are subject to moderation and will not appear immediately. You can use HTML tags in your comment. If you include a greater-than or less-than sign or anything else that could be interpreted as HTML, you need to escape those characters. To post VBA code in your comment, use [VBA] tags, like this: [VBA]Code goes here[/VBA].





Subscribe without commenting

Keep Reading:

Browse Posts:


« Resending Outlook Messages via VBA Code || Data Filter Tool Add-In - Coming Soon »