
Forum Moderators: incrediBILL & martinibuster


Crawler script that verifies Adsense publisher ID


moheybee

11:50 pm on Feb 3, 2010 (gmt 0)

10+ Year Member



I found a script called Adsense Code Checker that scans your files via FTP to check if your publisher ID has been swapped out with another one.

I'm curious whether anyone knows of a script that could run via cron, crawl a specified number of pages (or specific URLs), and check whether the publisher ID on them differs from the one specified, instead of scanning the files directly via FTP.

I know it wouldn't necessarily have to be a script directly written for this purpose, but in my searches so far I haven't come up with anything that would work.

(I know that if someone unauthorized has access to your website files and database, the Adsense publisher ID should be the least of one's concerns, but this is still important for reasons I'd rather not explain.)

incrediBILL

1:23 am on Feb 4, 2010 (gmt 0)

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Yes, a single command-line tool called "grep" (standard on Linux; on Windows you'd need to install a port of it) will scan all your files and show you any lines that contain a pub ID other than yours.

For instance:

grep -i "pub-" *.html | grep -iv "pub-your.id.here"

The first grep finds every line in your .html files that contains "pub-" (case-insensitive) and pipes those lines into a second grep, which, because of -v, prints only the lines that do not contain your own ID.

That's all it takes to find out.

grep -i "pub-" *.html | grep -iv "pub-your.id.here" | wc -l

Add "wc -l" (word count in line mode) on the end and it'll tell you how many matching lines there are, so all you need to check is that the result is 0.

You could call this from PHP or a daily cron job and email the results, or have it send an email to a pager alarm if the value is greater than 0.
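A minimal sketch of the cron version, assuming a server with GNU or BSD grep (for -r, -o and --include) and a working mail(1)-style "mail" command; neither assumption comes from the thread. It is a slightly stricter take on the grep | grep -v | wc -l pipeline: instead of counting lines, it pulls out each "pub-" ID and compares whole IDs, so a line holding both your ID and someone else's still gets flagged. The function names are made up for illustration.

```shell
#!/bin/sh
# Sketch of a daily pub ID check (assumes GNU/BSD grep and a "mail" command).

# Print how many pub IDs found in *.html files under directory $1 are not $2.
# -o extracts each ID on its own line; -x compares whole IDs, case-insensitively.
count_foreign_ids() {
  grep -rhoi --include='*.html' 'pub-[0-9][0-9]*' "$1" \
    | grep -vixF "$2" \
    | wc -l \
    | tr -d ' '   # BSD wc pads the number with spaces
}

# Email address $3 and return 1 when the count is greater than 0; else return 0.
alert_if_foreign() {
  n=$(count_foreign_ids "$1" "$2")
  if [ "$n" -gt 0 ]; then
    echo "$n foreign pub ID occurrence(s) found under $1" \
      | mail -s "AdSense pub ID alert" "$3"
    return 1
  fi
  return 0
}
```

To run it daily, you might end the script with a call such as alert_if_foreign /var/www/html pub-your.id.here you@example.com (all placeholders) and add a crontab entry like 0 6 * * * /path/to/check_pub_ids.sh.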

moheybee

6:18 pm on Feb 4, 2010 (gmt 0)

10+ Year Member



Thank you for your post, incrediBILL.

I don't think I was clear enough. I'm looking for a script that actually crawls URLs on a website (either a specified list of URLs or a list built at the time of crawling) and checks for publisher IDs.

I have multiple cases where a publisher ID is stored in a MySQL database (ad system, forum, etc.), so I would prefer to run the check against the front-end HTML that is actually served.

Anyone have any ideas?

incrediBILL

7:26 pm on Feb 4, 2010 (gmt 0)

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



I'm not aware of anything specific, but there are a ton of Perl and PHP crawler scripts out there that will do the crawling for you; just make sure you limit the actual crawl to your own domain.

A simple bit of code inserted in a full site crawler can check each page returned for pub IDs that aren't yours and display an error or send an email.
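Here is a rough idea of what that "simple bit of code" could look like, written as a shell sketch rather than inside a Perl or PHP crawler. It assumes curl is installed and that a hypothetical file holds one URL per line (hand-written, or produced by whatever crawler you use); the function names are also made up. It checks the HTML as actually served, so IDs coming out of a database are covered too.

```shell
#!/bin/sh
# Sketch: report pages whose served HTML contains a pub ID that isn't yours.
# Assumes curl and a grep that supports -o.

# Read HTML on stdin; print each distinct pub ID (lowercased) other than $1.
foreign_ids_in_html() {
  grep -oi 'pub-[0-9][0-9]*' | tr 'A-Z' 'a-z' | sort -u | grep -vixF "$1"
}

# Fetch every URL listed in file $1 (one per line) and report pages that fail
# to load or serve a pub ID other than $2. Returns 1 if anything was reported.
check_urls() {
  status=0
  while IFS= read -r url; do
    [ -z "$url" ] && continue            # skip blank lines
    if ! html=$(curl -fsSL "$url"); then
      printf '%s: fetch failed\n' "$url"
      status=1
      continue
    fi
    ids=$(printf '%s\n' "$html" | foreign_ids_in_html "$2")
    if [ -n "$ids" ]; then
      printf '%s: %s\n' "$url" "$ids"
      status=1
    fi
  done < "$1"
  return "$status"
}
```

Reporting fetch failures separately is a deliberate choice: a page that returns an error would otherwise look "clean" simply because no HTML came back.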

If you want to do something much simpler, look into site alarm monitors: many of them not only alert you if your site is down, but can also alert you if certain things, such as your pub ID, don't appear in the HTML. You pick the pages to monitor; I'd suspect your top 10 pages would be enough for a quick sanity check.
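If you'd rather roll your own version of that kind of monitor, a minimal sketch might look like the following. It assumes curl is available; the function name and the URLs you pass it are illustrative. It only checks that your ID is present on each chosen page, which is the "certain things don't appear" style of alarm rather than a full crawl.

```shell
#!/bin/sh
# Minimal "is my pub ID still there?" check (assumes curl).

# Usage: missing_pub_id MY_ID URL...
# Prints each URL whose HTML lacks MY_ID (or could not be fetched) and
# returns 1 if any were found. -w keeps "pub-123" from matching "pub-1234".
missing_pub_id() {
  my_id=$1
  shift
  rc=0
  for url in "$@"; do
    if ! curl -fsSL "$url" | grep -qiwF "$my_id"; then
      echo "MISSING: $url"
      rc=1
    fi
  done
  return "$rc"
}
```

Its output can be piped into mail from a cron job, the same way as any other alert.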

"multiple cases where a publisher ID is stored in a MySQL database"

Then you don't need to crawl the website; you need to write a simple query to make sure only your pub IDs exist in the database, which is way easier than crawling a website.
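A sketch of that query, run through the standard mysql command-line client. The database, table and column names (mydb, ad_settings, ad_code) are hypothetical; point it at wherever your ad system or forum actually stores its ad code, and repeat it for each table. Credentials are assumed to come from ~/.my.cnf.

```shell
#!/bin/sh
# Sketch only: mydb, ad_settings and ad_code are HYPOTHETICAL names.
# Credentials are assumed to be in ~/.my.cnf.

# Print a query counting rows that mention a pub ID other than $1.
# Caveats: pass your full ID; a row holding both your ID and a foreign
# one is not counted by this simple LIKE test.
foreign_id_query() {
  printf "SELECT COUNT(*) FROM ad_settings WHERE ad_code LIKE '%%pub-%%' AND ad_code NOT LIKE '%%%s%%';\n" "$1"
}

# Run the query with the mysql client and print only the number (0 means OK).
count_foreign_rows() {
  foreign_id_query "$1" | mysql --batch --skip-column-names mydb
}
```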