Cloaking Script

explain

         

drewst

8:14 pm on Apr 26, 2005 (gmt 0)

10+ Year Member



I've written a script that logs Google search keywords and stores them in a database for later consultation.
Assume $referer is $_SERVER['HTTP_REFERER'].

$url = strtolower(urldecode($referer));

if (eregi("www\.google+(\.[a-z]{2,3})+[/\]+search",$url))
{
preg_match("'(\?|&)q=(.*?)(&|$)'si", " $url ", $keywords);
$this->referer = "Google";
}

$keys = substr($url, strpos($url,"q="));
$keys = substr($keys,2);

if (strpos($keys,"&"))
$keys = substr($keys, 0,strpos($keys,"&"));

$keywords = urldecode($keys);

echo $keywords;

Simply put, would this be detected as cloaking by Googlebot?

Thanks

Drew
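
One detail in the posted order of operations is worth illustrating: the whole referer is URL-decoded before it is cut at "&", so an ampersand that was part of the search terms (sent as %26) ends up looking like a parameter separator. The sketch below compares the two orderings. The referer string and both function names are made up for illustration, and it assumes the referer always contains "q=".

```php
<?php
// Hypothetical referer: the search was for "tom&jerry", so the ampersand
// inside the keywords arrives percent-encoded as %26.
$referer = 'http://www.google.com/search?q=tom%26jerry&hl=en';

// Order used in the post: decode the whole URL first, then cut at "&".
function keywordsDecodeFirst(string $referer): string
{
    $url  = strtolower(urldecode($referer));
    $keys = substr($url, strpos($url, 'q=') + 2);
    $amp  = strpos($keys, '&');
    return $amp === false ? $keys : substr($keys, 0, $amp);
}

// Alternative order: cut the raw value out first, decode it afterwards.
function keywordsSplitFirst(string $referer): string
{
    $keys = substr($referer, strpos($referer, 'q=') + 2);
    $amp  = strpos($keys, '&');
    if ($amp !== false) {
        $keys = substr($keys, 0, $amp);
    }
    return strtolower(urldecode($keys));
}

echo keywordsDecodeFirst($referer), "\n"; // tom
echo keywordsSplitFirst($referer), "\n";  // tom&jerry
```

Comparing the strpos() result against false, rather than relying on its truthiness, also keeps a match at position 0 from being mistaken for "not found".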

volatilegx

10:07 pm on Apr 26, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I think not.

drewst

10:11 pm on Apr 26, 2005 (gmt 0)

10+ Year Member



The only reason I ask is that when I was researching the issue, I found some posts suggesting it may cause a problem when a bot visits. I didn't think so myself, but I felt I should check.

Drew

MrSpeed

2:52 am on Apr 27, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I have a question about the code.
What are you doing in the if statement? It looks like it's true if the referer was Google, but you don't really do anything with that info, do you?

Maybe this was just well written pseudo code?

It's also interesting how you have some really cool regular expressions but then switch over to the strpos and substr functions to get the query terms.

I code by the seat of my pants and I am happy when things work no matter how I got there, so I'm just trying to learn a bit here.

drewst

9:17 am on Apr 27, 2005 (gmt 0)

10+ Year Member



First of all, regular expressions are used because you don't know the exact format of the referer:

www.google.com/search
www.google.co.uk/search

etc...

If we detect it's a Google search referer, then we know the keywords will be between

q= ...keywords... &

For this reason we can strip everything to the left and right of it!

To be honest, that keyword code should be within the if statement. I think that is what you're referring to, but I just copied and pasted some of my test code. You're right, it should be in the if.

The result is the search engine name and the keywords.

Is this clearer?

Drew
I'll change the code above.
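
For comparison, the "everything between q= and &" step can also be left to PHP's built-in URL helpers instead of manual string slicing. This is only a sketch under assumptions: the function name queryKeywords and the sample referers are invented here, and it ignores the host check entirely.

```php
<?php
// Returns the decoded value of the "q" parameter, or null when it is absent.
function queryKeywords(string $referer): ?string
{
    // parse_url() gives back only the part after "?" (null/false if there is none).
    $query = parse_url($referer, PHP_URL_QUERY);
    if (!is_string($query)) {
        return null;
    }

    // parse_str() splits on "&" and URL-decodes each value for us.
    parse_str($query, $params);
    if (!isset($params['q']) || !is_string($params['q'])) {
        return null;
    }
    return $params['q'];
}

// Hypothetical referers for illustration.
var_dump(queryKeywords('http://www.google.co.uk/search?hl=en&q=php+cloaking+script&meta='));
var_dump(queryKeywords('http://www.google.com/search?hl=en'));
```

Because parse_str() does the splitting before the decoding, it does not matter where q sits in the query string or whether it is the last parameter.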

drewst

9:20 am on Apr 27, 2005 (gmt 0)

10+ Year Member



if (eregi("www\.google+(\.[a-z]{2,3})+[/\]+search",$url))
{
preg_match("'(\?|&)q=(.*?)(&|$)'si", " $url ", $keywords);

$this->referer = "Google";

$keys = substr($url, strpos($url,"q="));
$keys = substr($keys,2);

if (strpos($keys,"&"))
$keys = substr($keys, 0,strpos($keys,"&"));
}

drewst

11:18 am on Apr 27, 2005 (gmt 0)

10+ Year Member



Actually, a better expression is this:

eregi("www\.google((\.[a-z]{2,3}){1,2})[/\]search",$url)

The earlier code was just a bit of a bash :)
This expression allows:

www.google.co.uk = OK
www.google.com = OK

www.google.coms = INVALID, extension longer than 3 characters
www.google.co.uk.uk = INVALID, more than 2 extensions

So the final code looks a bit like this:

if (eregi("www\.google((\.[a-z]{2,3}){1,2})[/\]search",$url))
{

// Check that we have some keywords first, or at least the structure for them:
// a ? or &, then q=, then any number of characters, ending at & or the end of the string.
// $keywords[0] is the whole match, [1] the leading ? or &, [2] the keywords, [3] the trailing & (or empty).

$this->keywords = preg_match("'(\?|&)q=(.*?)(&|$)'", $url, $keywords)
    ? urldecode($keywords[2])
    : "Unknown Keywords";

$this->referer = "Google";
}

Feedback would be great.

Thanks

Drew
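
The expression and the four test hosts in this post can be checked with a short script. Note that eregi() was deprecated in PHP 5.3 and removed in PHP 7.0, so the sketch below uses preg_match() with the "i" flag instead. The PCRE translation is an assumption rather than the original call, and it accepts only a forward slash before "search" (the original bracket also allowed a backslash).

```php
<?php
// PCRE translation of the post's expression (an assumption, not the original
// eregi call). "#" is the delimiter so "/" needs no escaping; "i" ignores case.
$pattern = '#www\.google((\.[a-z]{2,3}){1,2})/search#i';

// The four hosts from the post, each followed by the /search path.
$hosts = [
    'www.google.co.uk/search',
    'www.google.com/search',
    'www.google.coms/search',
    'www.google.co.uk.uk/search',
];

foreach ($hosts as $url) {
    $verdict = preg_match($pattern, $url) === 1 ? 'OK' : 'INVALID';
    printf("%-28s %s\n", $url, $verdict);
}
```

The first two hosts should report OK and the last two INVALID, matching the list in the post.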

MrSpeed

11:50 am on Apr 27, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member




you dont know the exact format of the referer
www.google.com/search
www.google.co.uk/search

I would not have thought of that. I probably would have tested for strpos($url,"google.com/search?")
That's why I'm just a hack :)

The reason I'm kind of interested in this is that I was just about to write some code to do exactly the same thing on raw log files.

I was going to sort the search terms by engine and then by the number of times each term was searched.
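
A rough sketch of that grouping and sorting, assuming the referer URLs have already been pulled out of the raw log lines into an array. The $engines table, the function name countSearchTerms, and the sample referers are all hypothetical; only two engines are listed, and the host patterns and parameter names are illustrative assumptions.

```php
<?php
// Hypothetical table: display name => [host pattern, query parameter name].
$engines = [
    'Google' => ['#^www\.google(\.[a-z]{2,3}){1,2}$#i', 'q'],
    'Yahoo'  => ['#^search\.yahoo\.com$#i', 'p'],
];

// Returns [engine => [term => count]], engines sorted by name and
// terms sorted by count, highest first.
function countSearchTerms(array $referers, array $engines): array
{
    $counts = [];
    foreach ($referers as $referer) {
        $host  = parse_url($referer, PHP_URL_HOST);
        $query = parse_url($referer, PHP_URL_QUERY);
        if (!is_string($host) || !is_string($query)) {
            continue; // not a URL with a query string
        }
        parse_str($query, $params);

        foreach ($engines as $name => [$hostPattern, $param]) {
            if (preg_match($hostPattern, $host) !== 1) {
                continue; // different engine, try the next one
            }
            if (isset($params[$param]) && is_string($params[$param])) {
                $term = strtolower(trim($params[$param]));
                $counts[$name][$term] = ($counts[$name][$term] ?? 0) + 1;
            }
            break; // host matched, no need to test other engines
        }
    }

    ksort($counts);
    foreach ($counts as &$terms) {
        arsort($terms);
    }
    unset($terms);
    return $counts;
}

// Made-up sample data standing in for referers read from a log file.
$referers = [
    'http://www.google.com/search?q=php+logger',
    'http://www.google.co.uk/search?hl=en&q=PHP+logger',
    'http://www.google.com/search?q=cloaking',
    'http://search.yahoo.com/search?p=php+logger',
    'http://www.example.com/page.html',
];

print_r(countSearchTerms($referers, $engines));
```

Lowercasing the terms before counting means "PHP logger" and "php logger" are tallied together; drop the strtolower() call if case should be kept.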

I have looked around the web and have yet to see any free scripts that will do this basic task.

I have a few scripts that do some very specific things that others may be interested in but I'm afraid to put them on the web because I'll be the laughing stock of the coding community. It sometimes takes me 10 lines to do what others can do with one.

drewst

12:05 pm on Apr 27, 2005 (gmt 0)

10+ Year Member



Mate, I'm in the middle of writing a fully functional web logger/tracker. I have a few posts around, so check them out.

I'm attempting to produce a PHP class that is far superior to anything out there, with a few hidden gems :) and all open source. Unfortunately, with PHP being server-side, it's difficult to get information about the user's environment. I can get the browser, OS, etc., but things like screen resolution are difficult. I have a few options that I am investigating which avoid the use of JavaScript.

At present I can detect 20 different search engines and identify a handful of bots/spiders.

Feel free to give me a hand with the code. I need all the help I can get :)

If there are any features you would like to see in it, just let me know. I'll be happy to include them.

Drew