
Forum Library, Charter, Moderators: incrediBILL & jatar k & martinibuster

Google AdSense Forum

    
Crawler script that verifies Adsense publisher ID
moheybee




msg:4073946
 11:50 pm on Feb 3, 2010 (gmt 0)

I found a script called Adsense Code Checker that scans your files via FTP to check if your publisher ID has been swapped out with another one.

I'm curious if anyone knows of a script that could be run via cron and would crawl a specified number of pages (or specific URLs) to see if a publisher ID was different than the one specified (instead of scanning the files directly via FTP).

I know it wouldn't necessarily have to be a script directly written for this purpose, but in my searches so far I haven't come up with anything that would work.

(I know if someone unauthorized has access to your website files and database, then Adsense publisher ID should be the least of one's concerns, but this is still important for reasons I'd rather not explain)

 

incrediBILL




msg:4073991
 1:23 am on Feb 4, 2010 (gmt 0)

Yes, a single command-line tool called "grep" (standard on Linux, and available for Windows as a port) will scan all your files and easily show you any lines that contain a pub ID other than yours.

For instance:

grep -i "pub-" *.html | grep -iv "pub-your.id.here"

The first grep scans all html files for any pub ID and pipes the matching lines into a second grep that only displays a line if it doesn't contain your pub ID.

That's all it takes to find out.

grep -i "pub-" *.html | grep -iv "pub-your.id.here" | wc -l

Add "wc -l" on the end (wc is the word count tool, and -l makes it count lines) and it'll tell you how many suspicious lines there are, so checking that the result is 0 is all you need to confirm everything is OK.

You could call this from PHP or a cron job once a day and email the results, or set it to send email to a pager alarm if the value is greater than 0.
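A minimal sketch of that daily check as a script, in Python rather than grep. It assumes publisher IDs look like "pub-" followed by digits; the names foreign_pub_ids and scan_directory are made up for illustration. Unlike the grep pipeline, it also flags a line that contains both your ID and someone else's.

```python
import re
from pathlib import Path

# Assumption: a publisher ID is "pub-" followed by digits (e.g. inside "ca-pub-...").
PUB_ID_RE = re.compile(r"pub-\d+", re.IGNORECASE)


def foreign_pub_ids(text, own_id):
    """Return the set of publisher IDs found in text that differ from own_id."""
    found = {match.lower() for match in PUB_ID_RE.findall(text)}
    return found - {own_id.lower()}


def scan_directory(directory, own_id, pattern="*.html"):
    """Map each matching file name to the foreign IDs it contains.

    Files with no foreign ID are left out, so an empty dict means "all OK".
    """
    report = {}
    for path in sorted(Path(directory).glob(pattern)):
        foreign = foreign_pub_ids(path.read_text(errors="replace"), own_id)
        if foreign:
            report[path.name] = foreign
    return report
```

A cron wrapper could call scan_directory once a day and send an email only when the returned dict is not empty.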

moheybee




msg:4074461
 6:18 pm on Feb 4, 2010 (gmt 0)

Thank you for your post incrediBILL.

I don't think I was clear enough. I'm looking for a script that actually crawls URLs on a website (either a specified list of URLs or a list built at the time of crawling) and checks for publisher IDs.

I have multiple cases where a publisher ID is stored in a MySQL database (ad system, forum, etc.) and would prefer to have the check done against the front-end HTML that is actually served.

Anyone have any ideas?

incrediBILL




msg:4074515
 7:26 pm on Feb 4, 2010 (gmt 0)

I'm not aware of anything specific, but there are a ton of Perl and PHP crawler scripts out there that will do the crawling for you. Just make sure you limit the actual crawl to your domain.

A simple bit of code inserted in a full site crawler can check each page returned for pub IDs that aren't yours and display an error or send an email.

If you want something much simpler, look into site alarm monitors. Many of them not only alert you if your site is down, but can also alert you if certain things don't appear in the HTML, such as your pub ID. You pick the pages to monitor, and I'd suspect your top 10 pages would be enough for a quick sanity check.
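As a rough sketch of the "pick your top pages and check the served HTML" idea, the Python below fetches a fixed list of URLs and reports any page that shows a foreign pub ID or is missing yours. The function names and the "pub-" plus digits pattern are assumptions, and network errors are deliberately left unhandled to keep it short.

```python
import re
import urllib.request

# Assumption: a publisher ID is "pub-" followed by digits.
PUB_ID_RE = re.compile(r"pub-\d+", re.IGNORECASE)


def foreign_pub_ids(html, own_id):
    """Return the publisher IDs in html that are not own_id."""
    found = {match.lower() for match in PUB_ID_RE.findall(html)}
    return found - {own_id.lower()}


def fetch(url, timeout=10):
    """Download one page and return its body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def check_urls(urls, own_id, fetcher=fetch):
    """Return {url: problem} for pages with a foreign ID or without your ID."""
    problems = {}
    for url in urls:
        html = fetcher(url)
        foreign = foreign_pub_ids(html, own_id)
        if foreign:
            problems[url] = "foreign IDs: " + ", ".join(sorted(foreign))
        elif own_id.lower() not in html.lower():
            problems[url] = "own ID missing"
    return problems
```

Because the check runs on the HTML as served, it also covers IDs that come out of a database or ad system. The fetcher argument is only there so the check can be tried without touching the network.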

Quoting moheybee: "multiple cases where a publisher ID is stored in a MySQL database"

Then you don't need to crawl the website. You just need to write a simple query to make sure only your pub IDs exist, which is way easier than crawling a website.
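A hedged sketch of the query approach follows. The thread does not describe the actual schema, so the table and column names are supplied by the caller and are purely hypothetical. It is written against Python's generic DB-API and demonstrated with the built-in sqlite3 module as a stand-in; for MySQL you would pass a connection from a MySQL driver instead.

```python
import re
import sqlite3  # stand-in for demonstration only; the thread is about MySQL

# Assumption: a publisher ID is "pub-" followed by digits.
PUB_ID_RE = re.compile(r"pub-\d+", re.IGNORECASE)


def foreign_ids_in_db(conn, columns, own_id):
    """Scan (table, column) pairs and return the foreign IDs found in each.

    columns must come from trusted configuration: identifiers cannot be
    bound as query parameters, so they are placed into the SQL text directly.
    """
    problems = {}
    cursor = conn.cursor()
    for table, column in columns:
        cursor.execute(f"SELECT {column} FROM {table}")
        for (value,) in cursor.fetchall():
            found = {m.lower() for m in PUB_ID_RE.findall(str(value or ""))}
            foreign = found - {own_id.lower()}
            if foreign:
                problems.setdefault((table, column), set()).update(foreign)
    return problems
```

An empty result means every stored ID matched yours. Note that this only checks what is stored, not what is finally rendered to visitors.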


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved