Tuesday, July 21, 2020

How to scrape for email contacts


Polls might tell you that VBA is a dying language. According to the PYPL index, Python has the top share with a +3.9% trend, while VBA sits at #14 with 0% movement. Still, that puts it in the top 20, ahead of Ruby at #15. With that said, let's do a quick, simple project to show that VBA is still powerful, or at least useful, in some respects.

Web scraping

We all have a preferred language, and I know that for some, VBA is not one of them. Let's pause for a second and admit that VBA is still potent for web scraping, even though it still relies on an Internet Explorer library. Because the script drives a real browser, its requests look like ordinary browsing, which helps on websites that try to discourage automated crawling (robots.txt itself is only advisory).
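To give a feel for that Internet Explorer library, here is a minimal sketch that opens the directory page from VBA and prints its title. It assumes a Windows machine where the InternetExplorer.Application COM object is still available, and the Sub name ShowPageTitle is made up for this example.

```vb
Option Explicit

' Minimal sketch: open a page in Internet Explorer and print its title.
' Late binding is used, so no library reference needs to be set.
' Assumption: the InternetExplorer.Application COM object is installed.
Sub ShowPageTitle()
    Dim ie As Object
    Set ie = CreateObject("InternetExplorer.Application")

    ie.Visible = False
    ie.Navigate "https://directory.utexas.edu/index.php?"

    ' Wait until the page has finished loading (4 = READYSTATE_COMPLETE).
    Do While ie.Busy Or ie.ReadyState <> 4
        DoEvents
    Loop

    ' The title appears in the Immediate window (Ctrl+G in the VBA editor).
    Debug.Print ie.Document.Title

    ie.Quit
    Set ie = Nothing
End Sub
```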

The Goal

We need a script that will scrape email addresses. Yes, it's a goldmine for the email marketer. I will only attempt to scrape publicly shown contact information. For this demo, I will scrape the email addresses of students or, potentially, school staff. One drawback of any web scraping technique is the maximum number of requests some websites allow, which ties back to the site's crawling rules (robots.txt). But you can always add a delay between calls if you do a bulk fetch. Here are the steps; let me know your thoughts in the comment section below.
  1. Let's use https://directory.utexas.edu/index.php? It's the University of Texas directory.
  2. Let's try to fetch contact details for a commonly used name, "James". It will return all matching names, each with its corresponding URL to an individual info page.
  3. Scrape the email address and other relevant info.
The Script:
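Here is a minimal sketch of the three steps above, not a finished tool. It assumes Excel VBA (Application.Wait is Excel-only) on a Windows machine where the InternetExplorer.Application COM object is available. The names ScrapeEmails, LoadPage and CollectEmails, the two-second pause, the same-host link filter and the email pattern are all assumptions made for illustration. searchUrl is whatever address your browser shows after you search the directory for "James"; it is deliberately not hard-coded.

```vb
Option Explicit

' Sketch only. searchUrl is the results-page address taken from your browser.
Sub ScrapeEmails(ByVal searchUrl As String)
    Dim ie As Object, link As Object
    Dim pages As Object, emails As Object
    Dim key As Variant

    Set ie = CreateObject("InternetExplorer.Application")
    Set pages = CreateObject("Scripting.Dictionary")   ' unique page URLs
    Set emails = CreateObject("Scripting.Dictionary")  ' unique addresses

    ' Step 2: load the results page and collect its links first,
    ' because navigating away invalidates the link collection.
    LoadPage ie, searchUrl
    For Each link In ie.Document.Links
        ' Assumption: individual info pages live on the same host.
        ' Tighten this filter once you know what those URLs look like.
        If InStr(1, link.href, "directory.utexas.edu", vbTextCompare) > 0 Then
            pages(CStr(link.href)) = True
        End If
    Next link

    ' Step 3: visit each page and keep anything shaped like an email.
    For Each key In pages.Keys
        LoadPage ie, CStr(key)
        CollectEmails ie.Document.body.innerText, emails
        Application.Wait Now + TimeValue("00:00:02")   ' delay between calls
    Next key

    ie.Quit
    Set ie = Nothing

    ' Print the unique addresses to the Immediate window.
    For Each key In emails.Keys
        Debug.Print key
    Next key
End Sub

' Navigate and block until the page has finished loading.
Private Sub LoadPage(ByVal ie As Object, ByVal pageUrl As String)
    ie.Navigate pageUrl
    Do While ie.Busy Or ie.ReadyState <> 4   ' 4 = READYSTATE_COMPLETE
        DoEvents
    Loop
End Sub

' Add every email-shaped match in pageText to the found dictionary.
Private Sub CollectEmails(ByVal pageText As String, ByVal found As Object)
    Dim re As Object, m As Object

    Set re = CreateObject("VBScript.RegExp")
    re.Global = True
    re.IgnoreCase = True
    ' Simple pattern; it will not catch every valid address.
    re.Pattern = "[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}"

    For Each m In re.Execute(pageText)
        found(LCase$(m.Value)) = True   ' dictionary keys keep addresses unique
    Next m
End Sub
```

To try it, call ScrapeEmails from the Immediate window with the results-page URL as its argument. The addresses it finds are printed there, one per line.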
That's it; this is just the foundation. You can tweak this script a little to fit your requirements. To learn more about VBA, click the Buy Now button for the book, now available on Amazon.
