Using XPath to select content in an XML document to scrape for SEO is nothing new, but SEOs have traditionally done it within Google Docs for simplicity and ease of use. That approach has some limitations, though, including a limit of 50 ImportXML calls per spreadsheet and the fact that it’s not done in-line while browsing. I’ve been playing around with a Google Chrome extension called Scraper, which lets you scrape content in-line while browsing.
Let’s walk through some examples of how this is awesome.
Prospecting Guest Posts
Let’s say we’re quickly trying to find guest post opportunities for a food related site.
To do this, I search for food inurl:"write for us" and show 100 results per page (you have to turn off Google Instant for this).
Step 1 – Advanced Search Query
Step 2 – Select and Right Click Listing
Select “Scrape Similar” in the menu, and the extension will work out the XPath for the selected content, then extract it along with the repeated elements similar to it.
Step 3 – View Output
At this stage, you can edit the XPath and remove fields of data that have been extracted. You can also define presets for frequently used XPath expressions. The extension does a fair job of selecting the content correctly, but depending on the markup of the page, you may need to edit the XPath to select the right text. In this example, it worked perfectly without correction.
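I don’t know how Scraper builds its XPath internally, but one way to picture what “similar” means is to take the path to the single element you clicked and drop the position numbers, so the same path matches every sibling. Here is a minimal Python sketch of that idea; the example path and the `generalize_xpath` helper are made up for illustration and are not part of the extension.

```python
import re


def generalize_xpath(xpath: str) -> str:
    """Drop numeric position predicates such as [3] so the path
    matches every sibling element instead of one specific node."""
    return re.sub(r"\[\d+\]", "", xpath)


# Hypothetical path to one clicked search result link.
specific = "/html/body/div[2]/ul/li[3]/a"
general = generalize_xpath(specific)
print(general)  # /html/body/div/ul/li/a
```

If the generalized path grabs too much or too little, that’s the point where you would hand-edit the XPath in the extension.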
Step 4 – Export to Google Docs, FTW
From here, it’ll send the data directly into Google Docs, where you can mash it up with the SEOmoz API or other data.
7 More Examples
#1 An Alltop Scraper
You can scrape the curated blog lists at Alltop, such as this huge list of marketing blogs.
It looks like this, in a matter of seconds.
#2 Scrape WordPress Blog Post Comments
Let’s say I wanted to quickly contact everyone who left a comment on a post on Outspoken Media’s blog, such as my link building personas post.
I had to make a quick edit to the XPath so it didn’t select the comment anchor URL.
Run this on your guest posts or the comments being left on your competitor’s site.
(You’ll likely have to customize it per blog if the extension doesn’t get it automatically.)
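To show the kind of XPath tweak I mean, here is a rough Python sketch using only the standard library. The comment markup is a simplified stand-in I wrote for illustration (real themes differ, and the `class="url"` on the author link is an assumption about the theme). A broad path picks up both the commenter’s site and the comment permalink, while a narrower path keeps only the commenter’s site.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified comment markup; a real blog theme will differ.
COMMENTS = """
<ol>
  <li id="comment-1">
    <cite><a class="url" href="http://example.com/alice">Alice</a></cite>
    <a href="http://example.com/post#comment-1">January 1 at 9:00 am</a>
    <p>Great post.</p>
  </li>
  <li id="comment-2">
    <cite><a class="url" href="http://example.org/bob">Bob</a></cite>
    <a href="http://example.com/post#comment-2">January 2 at 10:30 am</a>
    <p>Thanks for sharing.</p>
  </li>
</ol>
"""

root = ET.fromstring(COMMENTS)

# Too broad: every link inside a comment, including the permalink anchors.
all_links = [a.get("href") for a in root.findall(".//li//a")]

# Narrowed: only the commenter's own link inside <cite>.
author_links = [a.get("href") for a in root.findall(".//li/cite/a[@class='url']")]

print(all_links)
print(author_links)
```

The same narrowing can be typed straight into the extension’s XPath field; the Python is just a way to test the idea against a saved snippet of markup.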
#3 Blog Directory Scraper
Need a quick list of 102 gaming blogs? Just head over to the BOTW Blog Directory.
A little edit to the XPath: //div/ul/li/a/@href
And in a few seconds you have a spreadsheet list of URLs to 102 gaming blogs.
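If you want to try the same expression outside the browser, here is a small Python sketch. The sample markup is invented for illustration and is not the real BOTW page. One caveat: the standard library’s ElementTree only supports a subset of XPath and has no `/@href` step, so the sketch selects the `a` elements with `//div/ul/li/a` and reads the attribute from each one.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified directory markup; the real page will differ.
SAMPLE = """
<html><body>
  <div>
    <ul>
      <li><a href="http://example.com/blog-one">Blog One</a></li>
      <li><a href="http://example.com/blog-two">Blog Two</a></li>
    </ul>
  </div>
  <p><a href="http://example.com/about">About</a></p>
</body></html>
"""


def extract_hrefs(markup: str) -> list[str]:
    """Return the href of every link matching //div/ul/li/a."""
    root = ET.fromstring(markup)
    # Select the <a> elements, then read href from each element,
    # since ElementTree cannot select attributes directly.
    links = root.findall(".//div/ul/li/a")
    return [a.get("href") for a in links if a.get("href")]


urls = extract_hrefs(SAMPLE)
print(urls)
```

Note that the “About” link is skipped because it doesn’t sit inside a div/ul/li, which is exactly why the more specific path is useful on a directory page.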
#4 Link Placement and Buys
Similar to the guest post search, but try these in Google and scrape.
inurl:edu alumni discount code
inurl:sponsor intitle:sponsors seattle
#5 Followerwonk Scraper
A quick search for zombie on Followerwonk.
A little XPath: //div/table/tbody/tr/td/a
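Here is a rough Python sketch of what that expression pulls out of a results table and how it could end up as spreadsheet rows. The table markup and account names are placeholders I made up, not Followerwonk’s actual output, and `links_to_csv` is a hypothetical helper.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical, simplified results table; the real page will differ.
SAMPLE_TABLE = """
<html><body>
  <div>
    <table>
      <tbody>
        <tr><td><a href="http://twitter.com/example_one">example_one</a></td><td>1200</td></tr>
        <tr><td><a href="http://twitter.com/example_two">example_two</a></td><td>850</td></tr>
      </tbody>
    </table>
  </div>
</body></html>
"""


def links_to_csv(markup: str) -> str:
    """Write the text and href of each //div/table/tbody/tr/td/a link as CSV."""
    root = ET.fromstring(markup)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["text", "href"])
    for a in root.findall(".//div/table/tbody/tr/td/a"):
        writer.writerow([(a.text or "").strip(), a.get("href", "")])
    return out.getvalue()


csv_text = links_to_csv(SAMPLE_TABLE)
print(csv_text)
```

Only the cells that contain a link are captured, so the follower counts in the second column are left out unless you add another column with its own XPath.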
#6 Tumblr Submit Scraper
Looking to launch content on Tumblr?
#7 A Google Plus Profile Scraper
Looking for food bloggers on Google Plus?
Right Tool, Right Job
This doesn’t replace the benefit of doing some XPath within Google Docs, since you can do scripting and iterate on imports. However, I really like this tool so far. It can do a lot very easily and very quickly.
It does have some bugs and I’ve had to restart my browser a few times because it can stop working.
If you have any other ideas on how it could be used, be sure to drop a comment below.