Data scraping is a great skill for any technical marketer to have in their tool belt. If you’re at an early-stage startup and don’t have an outbound call team, this tactic is a great way to test the waters with outbound B2B lead generation and put your scraping skills to work for your company.
Tools required: Link Prospector and URL Profiler.
For our example, we’re selling a product aimed at the restaurant industry, and we just want to test in the land that I call home: California.
First, we need a list of cities in California. Luckily, this list from Wikipedia is great.
For the purposes of this blog post, I scraped it for you, but if you call yourself a hunter, then you need to learn how to fish. For tasks like this, where you are grabbing rows of well-structured HTML, just use the Scraper Chrome extension. Here’s a video that shows how simple it is to use this Chrome extension to scrape that list of California cities. It’s not always that easy, and if you want to be a pro, learn ImportXML in Google Sheets.
We are going to use Link Prospector, a tool that scrapes Google results, to get our lead data. But first we need to feed it search terms. Just scraping for [city name] CA “restaurant” will return a lot of general results and Yelp listings. By going niche and creating a list of restaurant types (Chinese, Greek, sushi, etc.) to append to the search queries, we can get a much more diverse collection of leads.
I couldn’t find a good list of restaurant types, so I used some common sense and compiled this list.
Here’s how I like to set up my spreadsheets to build out the search queries. Feel free to explore the spreadsheet for yourself and take a copy for your records 🙂
We’ve created a database that contains every combination of city name and major restaurant type in the state of California… rock on.
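If you’d rather build the query list in code than in a spreadsheet, here’s a minimal Python sketch of the same concatenation. The sample cities, the sample restaurant types, the `build_queries` helper name, and the exact query format (city, then CA, then restaurant type, then “restaurant” in quotes) are my assumptions for illustration, so match them to whatever your own sheet produces.

```python
from itertools import product

def build_queries(cities, cuisines, state="CA"):
    """Return one search query per (city, cuisine) pair."""
    return [
        f'{city} {state} {cuisine} "restaurant"'
        for city, cuisine in product(cities, cuisines)
    ]

# Sample rows only; swap in the full city list and your own restaurant types.
cities = ["Fresno", "Oakland", "San Diego"]
cuisines = ["chinese", "greek", "sushi"]

queries = build_queries(cities, cuisines)
print(len(queries))  # 3 cities x 3 restaurant types = 9 queries
print(queries[0])
```

Every city gets paired with every restaurant type, which is exactly what the spreadsheet does with its concatenated column.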
Link Prospector is a fantastic tool with limitless uses. At its core, it allows you to scrape Google results in bulk. Custom reports in Link Prospector can take up to 1000 search terms at a time, and you can get pretty specific with how you set up your scraper, as seen in the screenshot below:
Drop in 1000 queries from the concatenated list, and let the report run. This is a ton of scraping – it takes some time depending on what else is in the queue. It may take up to 36 hours to get a report back.
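Since a custom report takes up to 1000 search terms, a longer query list has to go in as several batches. Here’s a rough Python sketch of the split; `batch_queries` is a hypothetical helper name, and the placeholder queries just stand in for your concatenated list.

```python
def batch_queries(queries, size=1000):
    """Split a list of queries into consecutive batches of at most `size`."""
    return [queries[i:i + size] for i in range(0, len(queries), size)]

# Placeholder data standing in for the real concatenated list.
all_queries = [f"query {n}" for n in range(2500)]

batches = batch_queries(all_queries)
for number, batch in enumerate(batches, start=1):
    print(f"report {number}: {len(batch)} queries")
```

Each batch can then be pasted into its own custom report.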
Now it’s time to trim the fat.
Not everything that comes back from Link Prospector is going to be perfect. Looking at our report, we can see a few sites that pop up a bunch because they operate directories that list restaurants.
While it is possible to extract phone numbers from these types of sites, you should not scrape the same domain hundreds of times from the same IP address. You could get caught, and your IP could get blacklisted. To clean up the data, select export domains from the finished Link Prospector report.
Go through the Link Prospector export and remove any domain that has a bunch of listings because it is a directory or business platform (like Yelp, Seamless, GrubHub, etc.). If you know your vertical (you should), this shouldn’t take more than two minutes per export. Here’s a video showing my process. It’s stupid simple. Just refer back to the export list for results with lots of listings.
If you are going after SMBs, sort the results by column D, which is a measurement of the home page PageRank. Most sites above a PR 4 will not be local restaurants. Don’t blindly delete everything above a PR 4, but pay special attention to high-PR sites, since they will most likely have switchboards and gatekeepers.
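If you’d rather flag the repeat offenders with a script before eyeballing the list, here’s a minimal Python sketch. The sample URLs, the helper names, and the `min_count` threshold are all assumptions for illustration; it only surfaces candidates, and you should still review them by hand the way the video shows.

```python
from collections import Counter
from urllib.parse import urlparse

def domain_of(url):
    """Return the lowercased host of a URL, without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host

def frequent_domains(urls, min_count=5):
    """Domains that appear at least `min_count` times (likely directories)."""
    counts = Counter(domain_of(u) for u in urls)
    return {d for d, n in counts.items() if n >= min_count}

def drop_domains(urls, blocked):
    """Keep only URLs whose domain is not in the blocked set."""
    return [u for u in urls if domain_of(u) not in blocked]

# Made-up sample export; real exports will be much longer.
urls = [
    "https://www.yelp.com/biz/a",
    "https://www.yelp.com/biz/b",
    "https://yelp.com/biz/c",
    "http://goldendragonfresno.example/contact",
    "http://sushi-oakland.example/",
]

blocked = frequent_domains(urls, min_count=3)
cleaned = drop_domains(urls, blocked)
print(blocked)
print(cleaned)
```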
Now that you have a cleaned up data set, let’s automate phone number extraction.
Open URL Profiler, and copy all the URLs from the Link Prospector export that you want to run. Right-click and select paste on the white box in the “URL List” section.
Now let’s set up the custom scraper for phone number and title tag extraction. The title tags are key: they will be used later to inform your outbound callers of the lead quality, relevance and restaurant type. Some bad results will make it onto this list (Google isn’t perfect), so be sure to let your team know it’s okay to reject some of the leads on this list.
In the custom regex section, use this regular expression (it only matches US phone numbers):
1?\W*([2-9][0-8][0-9])\W*([2-9][0-9]{2})\W*([0-9]{4})(\se?x?t?(\d*))?
There are bound to be some bad results in this list, so to inform your outbound folks, let’s pull the title tag as well with this XPath:
/html/head/title
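To see what those two extractors pull from a page, here’s a small Python sketch. It is not how URL Profiler works internally: I’m assuming the regex behaves the same way in Python’s `re` module, and I’m using the standard library’s `html.parser` as a stand-in for the XPath. The sample page and the helper names are made up.

```python
import re
from html.parser import HTMLParser

# The same US-only pattern as above, compiled for Python's re module.
PHONE_RE = re.compile(
    r"1?\W*([2-9][0-8][0-9])\W*([2-9][0-9]{2})\W*([0-9]{4})(\se?x?t?(\d*))?"
)

def extract_phones(text):
    """Return each match as ten digits: area code + exchange + line number."""
    return ["".join(m.group(1, 2, 3)) for m in PHONE_RE.finditer(text)]

class TitleParser(HTMLParser):
    """Collects the text inside the <title> element."""
    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

def extract_title(html):
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip()

# Made-up page for illustration.
page = (
    "<html><head><title>Golden Dragon | Chinese Restaurant in Fresno, CA"
    "</title></head><body><p>Call (559) 555-0142 for reservations.</p>"
    "</body></html>"
)

print(extract_title(page))
print(extract_phones(page))
```

Joining the three capture groups gives you a clean ten-digit number, which sidesteps most of the stray punctuation the raw match drags along.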
2400 results took my computer about 7 minutes to scrape.
Open up the export and delete columns A and C-P, so you are left with something like this:
I find it easier to clean up extra characters in a Google Doc, so let’s copy and paste into a fresh sheet.
Run find and replace, replacing each of the following characters with nothing:
; , = . < > ? . / ) ( -|#$@+][
Replace each character one at a time, and remember to also replace blank spaces with nothing to close any gaps between the digits.
Sometimes you’ll get errors in the Google Doc due to an equals sign getting scraped at the beginning of the phone number. To tackle this, I sort column B by values and then fix by hand. Here’s the finished list. Not every result is a fit, and there are some irrelevant results in the mix, but there are a ton of good leads in that spreadsheet.
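The same cleanup can be done in one pass with a short script. This is a minimal Python sketch that strips the characters listed above plus blank spaces; the `clean_phone` name and the sample values are assumptions for illustration.

```python
# The characters from the find-and-replace list, plus a blank space.
STRIP_CHARS = ";,=.<>?/)(-|#$@+][ "
STRIP_TABLE = str.maketrans("", "", STRIP_CHARS)

def clean_phone(raw):
    """Remove the listed punctuation and spaces from a scraped phone number."""
    return raw.translate(STRIP_TABLE)

# Made-up scraped values.
scraped = ["=(415) 555-1234", "559.555.0142", "+1 310-555-0199"]
for raw in scraped:
    print(clean_phone(raw))
```

Because the equals sign is stripped along with everything else, this also avoids the spreadsheet treating a scraped value as a formula.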
Let’s make those phones ring!
But first, make sure to develop a script for yourself or your team. Outbound calls are a different beast from inbound phone leads, but if your business sees even some success with this tactic, it may be worth investing in. If you aren’t sure whether outbound calls are right for your business, using Link Prospector and URL Profiler is a great way to find out if they’re viable.
Happy cold calling 🙂