As the number of stocks in my portfolio grows, I'm spending more and more time tracking the expected dividends. So I thought: why not write a web crawler that does this job for me?

One of the most popular stocks in the DGI community is Realty Income (O), and in this guide I'll illustrate how to crawl their website in order to get the latest known dividend payment.

When building a web crawler, the first thing to do is find a pattern in the website that lets you tell a machine what it should extract. For Realty Income, this is a table on their dividend payment information page, as illustrated below.

Realty Income dividend table

If you look at the source code of that page, you will notice that this is the only table on it. This gives us a unique identifier that a script can look for. You will also notice that the first row of that table always holds the information about the next dividend.
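Since everything depends on there being exactly one table, it can be worth checking that assumption in code, so a site redesign does not silently break the crawler. Here is a minimal sketch, assuming the page HTML is already available as a string. The sample markup, its values and the count_tables() helper are hypothetical and only stand in for the real page:

```php
<?php
# Hypothetical sample markup standing in for the real dividend page
$html = '<html><body><table>'
      . '<tr><th>Dividend</th><th>Payment date</th></tr>'
      . '<tr><td>$0.10</td><td>01/15/2015</td></tr>'
      . '</table></body></html>';

# Returns the number of <table> elements found in an HTML string
function count_tables(string $html): int
{
    $doc = new DOMDocument();
    # Real-world pages are rarely valid HTML, so collect parse warnings quietly
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $doc->getElementsByTagName('table')->length;
}

$table_count = count_tables($html);
if ($table_count !== 1) {
    # The "only table on the page" assumption no longer holds
    echo "Expected exactly one table, found $table_count\n";
}
```

If the count is ever something other than 1, the crawler should stop rather than read the wrong table.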

Therefore it's quite simple: write a script that keeps an eye on that table, and you'll always have updated information about the dividend payments for O.

So let's get to it. As in my guide about getting stock quotes, I'll use PHP as an illustration here, but the logic is very similar in any programming language: you just need to find that unique identifier.


$doc = new DOMDocument();
# Load the dividend page; $url is assumed to hold its address
@$doc->loadHTMLFile($url);

# Look for our unique identifier: the table
$tables = $doc->getElementsByTagName('table');
# Get the rows of that table
$rows = $tables->item(0)->getElementsByTagName('tr');

# And loop through the rows
foreach ($rows as $row) {
    $cols = $row->getElementsByTagName('td');

    # Only do something if there is content to be found
    if (!empty($cols->length)) {
        # The cell values have additional characters we do not need,
        # therefore use substr() to get rid of them
        # The dividend per share is given in column 1: item(0)
        # The payment date is given in column 5: item(4)
        $dividend_per_share = substr($cols->item(0)->textContent, 1, -2);
        $payment_date = substr($cols->item(4)->textContent, 0, -2);
        # In this example we just need the first row, so quit
        break;
    }
}

Well, it's that easy. The variables $dividend_per_share and $payment_date now contain the information we want.
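Both variables are still plain strings at this point. If you want to calculate with them or sort by date, you could convert them to typed values first. This is only a sketch: it assumes the amount is a plain decimal string such as "0.10" and the date is written as month/day/year, which may not match what the site actually prints. The normalize_dividend() helper is a hypothetical name:

```php
<?php
# Converts the raw cell strings into typed values.
# Assumption: amount looks like "0.10", date looks like "01/15/2015" (m/d/Y).
# Returns null when either value does not match the expected form.
function normalize_dividend(string $amount, string $date): ?array
{
    $amount = trim($amount);
    # The leading "!" resets all fields that are not part of the format
    $parsed = DateTimeImmutable::createFromFormat('!m/d/Y', trim($date));

    if (!is_numeric($amount) || $parsed === false) {
        return null;
    }

    return [
        'dividend_per_share' => (float) $amount,
        'payment_date'       => $parsed->format('Y-m-d'),
    ];
}
```

Returning null on unexpected input makes it easy to skip the update instead of storing garbage when the page layout changes.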

With those variables set, you can do whatever you want with them. I compare them to my database, and update it if the information is new. The database then contains up-to-date information on the dividends to be received, as illustrated on my expected dividends page. With this list being updated automatically, it means less manual work for me.
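The compare-and-update step could look roughly like the sketch below. It is not my actual setup: the expected_dividends table, its columns and the save_if_new() helper are hypothetical names, and an in-memory SQLite database (which needs the pdo_sqlite extension) is used only to keep the example self-contained:

```php
<?php
# Stores the dividend if it differs from what is already saved.
# Returns true when the database was changed, false when nothing is new.
function save_if_new(PDO $db, string $ticker, string $dividend, string $payment_date): bool
{
    $select = $db->prepare(
        'SELECT dividend_per_share, payment_date FROM expected_dividends WHERE ticker = ?'
    );
    $select->execute([$ticker]);
    $current = $select->fetch(PDO::FETCH_ASSOC);

    if ($current !== false
        && $current['dividend_per_share'] === $dividend
        && $current['payment_date'] === $payment_date) {
        return false; # Same values as last time, nothing to do
    }

    # REPLACE inserts the row, or overwrites the existing row for this ticker
    $write = $db->prepare(
        'REPLACE INTO expected_dividends (ticker, dividend_per_share, payment_date) VALUES (?, ?, ?)'
    );
    $write->execute([$ticker, $dividend, $payment_date]);

    return true;
}

# Hypothetical schema; values are kept as text exactly as they were extracted
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE expected_dividends (
    ticker TEXT PRIMARY KEY,
    dividend_per_share TEXT NOT NULL,
    payment_date TEXT NOT NULL
)');
```

Calling save_if_new($db, 'O', $dividend_per_share, $payment_date) after each crawl then only touches the database when the first row of the table has changed.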

Let me know if you have any questions, as this may be somewhat technical. I'll try to give more examples with other stocks in the future.
