[google.com...]
I can access this XML document directly within IE5. I assume other browsers support XML too. It contains lots of cool info in plain text, including the PageRank. The only problem is that the ch variable seems to be some type of redundant encryption of the URL. In other words, you have to know the correct ch to get the XML document. It might also be encrypted to your specific IP, so I'm not sure that you will be able to access my page. Does anybody know how to generate the ch?
[google.com...]
in which "www.mydomain.org" is the site whose PageRank you want, you will get back this from Google (all angle brackets were changed to braces):
{?xml version="1.0" encoding="ISO-8859-1" standalone="no" ?}
{!DOCTYPE GSP (View Source for full doctype...)}
- {GSP VER="3.1"}
{TM}0.159834{/TM}
{Q}info:http://www.mydomain.org/nsearch.html{/Q}
- {RES SN="1" EN="1"}
{M}0{/M}
- {R N="1" L="1"}
{U}http://www.mydomain.org/nsearch.html{/U}
{T}MyDomain Name Search{/T}
{RK}6{/RK}
{S}MyDomain name search. If you can't spell somebody's name, use{br} your best guess for their last name only: Last name only: {b}...{/b}{/S}
- {HAS}
{L TAG="link:" /}
{C SZ="3k" TAG="cache:" /}
{RT TAG="related:" /}
{/HAS}
{/R}
{/RES}
{/GSP}
The PageRank is between the {RK} and {/RK} -- in this case, it's a 6.
You can see that the title and the first sentence on the page also come back. If the page is in the ODP directory (not the case in this example) this info also comes back from Google, with the category that it is in.
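For anyone who wants to pull those fields out of a saved response, here is a minimal Python sketch. It assumes the response looks like the sample above (with either real angle brackets or the braces used in this post), and the function name extract_fields is just something made up for illustration. It uses a plain regex rather than a real XML parser because the pasted sample, with its abbreviated doctype, is not well-formed XML.

```python
import re

def extract_fields(response_text):
    """Return the URL, title and PageRank found in a toolbar response."""
    # Accept either real angle brackets or the braces used in this post.
    text = response_text.replace("{", "<").replace("}", ">")
    fields = {}
    for name, tag in (("url", "U"), ("title", "T"), ("pagerank", "RK")):
        # Non-greedy match of the text between <TAG> and </TAG>.
        match = re.search(r"<%s>(.*?)</%s>" % (tag, tag), text, re.S)
        fields[name] = match.group(1).strip() if match else None
    if fields["pagerank"] is not None:
        fields["pagerank"] = int(fields["pagerank"])
    return fields

# A few lines copied from the sample response above.
sample = """{R N="1" L="1"}
{U}http://www.mydomain.org/nsearch.html{/U}
{T}MyDomain Name Search{/T}
{RK}6{/RK}
{/R}"""

print(extract_fields(sample))
```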
However, there's a catch that makes it more complex. You need the "ch=0123456789" in the query string. It appears to be a ten-digit checksum based on the domain name you are requesting. If the number does not match with that domain, from which it is apparently generated within the toolbar code, you get a "not authorized" message in Explorer instead of the above information.
Writing a script would require knowing how this 10-digit checksum is generated. You'd have to collect a bunch of domains and checksums, and try to see if there's a pattern. It might be a simple checksum, or it might even be some sort of one-way hash.
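A rough sketch of what that pattern hunt could look like in Python: collect (domain, checksum) pairs, then see whether any well-known checksum function reproduces all of them. The candidate functions below are pure guesses, and nothing suggests the toolbar uses any of them. The demo pairs are made up too: they are generated with Adler-32 only so the example has something to find.

```python
import zlib

# Candidate checksum functions to try. These are guesses, nothing more:
# there is no evidence the toolbar uses any of them.
CANDIDATES = {
    "crc32": lambda s: zlib.crc32(s.encode("ascii")) & 0xFFFFFFFF,
    "adler32": lambda s: zlib.adler32(s.encode("ascii")) & 0xFFFFFFFF,
    "byte_sum": lambda s: sum(s.encode("ascii")),
}

def matching_candidates(pairs):
    """Return the names of candidates that reproduce every observed checksum."""
    return [
        name for name, func in CANDIDATES.items()
        if all(func(domain) == checksum for domain, checksum in pairs)
    ]

# Made-up demonstration data, NOT real toolbar checksums: the "observed"
# values are generated with Adler-32 so the search has a known answer.
demo_domains = ["www.example.org", "www.example.com", "www.example.net"]
demo_pairs = [
    (d, zlib.adler32(d.encode("ascii")) & 0xFFFFFFFF) for d in demo_domains
]

print(matching_candidates(demo_pairs))
```

If the real thing is a one-way hash with a secret mixed in, a search like this will simply come back empty.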
I don't think it's worth the effort for a single-digit PageRank.
As doofus says, the calculations necessary make this a tricky task. While it's definitely cool to see the data being received, there's probably limited scope for automating it :(
I suspect that as soon as the checksum algo was decoded it could be changed anyway. Since the googlebar is self-updating, it would be a moving target.
Anyway, I'm sure there is some method of stepping through the software with a debugger or whatnot to determine the checksum method, but just hacking it out by looking at patterns seems unlikely to work. Oh well.
Doesn't anybody here do a little bit of assembly?
I learned everything I needed to reverse engineer the toolbar and rip out the checksum function in an afternoon. It's so easy. In fact, there is no real protection in this toolbar, you know.
If you are afraid of assembly, you can put a proxy between MSIE and the web and script MSIE to issue queries.
Your script requests a URL via MSIE.
The toolbar detects the "openurl" event and launches a request to Google's backend through your proxy.
You can do everything you want thanks to your proxy: either store the URL with the valid checksum for later retrieval, or get the response data...
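The "store the URL with the valid checksum" step could be sketched in Python like this, assuming the proxy hands you each request URL as a string. The host name below is a placeholder, and the q parameter name is an assumption based on the {Q} element in the response. Only ch comes from the actual query string discussed in this thread.

```python
from urllib.parse import urlsplit, parse_qs

def record_checksum(logged_url, store):
    """Save the ch value from a proxied toolbar request, keyed by its query."""
    params = parse_qs(urlsplit(logged_url).query)
    # Only keep requests that carry both the query and its checksum.
    if "ch" in params and "q" in params:
        store[params["q"][0]] = params["ch"][0]
    return store

# Hypothetical logged request: placeholder host, assumed "q" parameter,
# and the dummy checksum used earlier in this thread.
logged = (
    "http://toolbar-backend.example/search"
    "?q=info:http://www.mydomain.org/&ch=0123456789"
)

store = record_checksum(logged, {})
print(store)
```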
Hmm, I remember that there are now some more things to find out, such as the encoding for the timestamps. Look at it for 10 seconds and you will find that it's a kind of uuencoding.
And now what?
You build a tool to cheat at Google, Google will do its best to catch you, and the cycle continues ad nauseam...
And one day, to avoid bankruptcy, Google will let people pay to have their result at the top of the SERP. The guy with more dollars wins once again.
What a nice world.
Take a break, take a deep breath, relax and go design a nice and usable website with interesting content. Play by the rules and you will have traffic!
While building the links (which obviously takes a hell of a long time), it is useful to check your PR to see how it is building, and what level you are at compared to your competitors.
It is boring enough building links, let alone checking every page on your website for the PR. I just wanted a little time saver in what is (when we're honest) a rather monotonous marketing position!!
(Apologies if you thought I was trying to cheat it somehow)
I know this isn't what people had in mind, but anything else will probably get you a letter from Google's attorneys (if you don't stop when they ask you nicely).
Gee, maybe if business slows down, I could open "Chris_R's PR Checking Service". For only $19.99 a month I will check 100 pages for you and give you their PR and reverse link count on a nice Excel spreadsheet.
It's not too surprising that the algo was changed. What's more surprising is that Google cleverly does not return an error message for PageRank queries coming in that use the obsolete checksum. Instead of an error message, you get bogus PageRank values. These values are typically plus or minus two complete digits on the 0-10 scale. Sites that were a 7 might be a 9. One site that was an 8 became a 10.
This is the famous Google sense of humor at work.
Since the toolbar is self-updating, the checksum algo can be made a moving target. Anyone who goes to all the trouble to decompile and analyze the algo, still has to keep checking with the latest toolbar in Explorer, to make sure the PR values coming back are not bogus due to a change on Google's end. Whatever clever program anyone writes after cracking the checksum algo will not be self-updating from Google, I presume.
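That cross-check against the live toolbar could be as simple as the Python sketch below. It assumes you have noted down the PageRank the toolbar shows in Explorer for a few reference pages and compares them with what your own program gets back. The function name and all the numbers are made up for illustration.

```python
def suspicious_results(script_values, toolbar_values):
    """Return URLs whose scripted PageRank differs from the toolbar's."""
    return sorted(
        url for url, pr in script_values.items()
        if url in toolbar_values and toolbar_values[url] != pr
    )

# Made-up values: a.example reads 7 in the toolbar but 9 from the script,
# the kind of two-digit drift an obsolete checksum reportedly produces.
toolbar = {"a.example": 7, "b.example": 8}
script = {"a.example": 9, "b.example": 8, "c.example": 5}

print(suspicious_results(script, toolbar))
```

Any URL that shows up in the list is a hint that the checksum algo has changed again and the script's numbers can no longer be trusted.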
None of us likes using Explorer with the Google toolbar. But Google makes the rules, and Google finds ways to make us play by their rules.