Wednesday, August 1, 2012

Google PageRank checksum algorithm

The PageRank feature in Google Toolbar works by querying Google's server for the PageRank of a specific page. This might seem easy enough to implement in your own program or website, but the toolbar calculates a checksum of the page URL before querying the server, and the server only responds if the checksum is correct. Fortunately, the checksum algorithm has been reverse engineered from Google Toolbar 7. A friend gave me a hand-decompiled version of the algorithm in C, and I rewrote it in PHP for web development use. You can find both versions below.

As an example, the query URL for the page 'http://en.wikipedia.org/wiki/Cypherpunk' is http://toolbarqueries.google.com/tbr?client=navclient-auto&features=Rank&q=info:http://en.wikipedia.org/wiki/Cypherpunk&ch=783735859783

A query for this page with any checksum other than 783735859783 results in a '403 Forbidden' response.
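
To make the shape of the request concrete, here is a minimal sketch in Python of how the query URL above is put together. It does not compute the checksum; it takes it as an input. The function name `build_query_url` and the constant `TOOLBAR_ENDPOINT` are made up for this illustration, and the parameter names are copied from the example URL above.

```python
from urllib.parse import urlencode

# Endpoint taken from the example query URL in this post.
TOOLBAR_ENDPOINT = "http://toolbarqueries.google.com/tbr"


def build_query_url(page_url, checksum):
    """Assemble a toolbar-style PageRank query URL.

    `checksum` must already be computed for `page_url`;
    this hypothetical helper only formats the request.
    """
    params = [
        ("client", "navclient-auto"),
        ("features", "Rank"),
        ("q", "info:" + page_url),
        ("ch", checksum),
    ]
    # safe=":/" leaves ':' and '/' unescaped so the result
    # looks like the example URL; other reserved characters
    # in the page URL are still percent-encoded.
    return TOOLBAR_ENDPOINT + "?" + urlencode(params, safe=":/")
```

Calling `build_query_url('http://en.wikipedia.org/wiki/Cypherpunk', '783735859783')` reproduces the example query URL character for character.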
Enjoy.

C Version (original):

PHP Version:

~ Dmitry

3 comments:

  1. Very cool, but I wonder how quickly they'll modify their checksum now that this is out.

  2. This is the Ruby implementation I did for the gem PageRankr: https://github.com/blatyo/page_rankr/blob/master/lib/page_rankr/ranks/google/checksum.rb

    Google hasn't changed the algorithm since I first wrote my implementation, which was in 2010.

  3. AFAIK it was changed in 2011. http://www.seroundtable.com/pagerank-update-may12-15097.html
