Google query with curl - PHP Server Side Scripting forum at WebmasterWorld

Forum Moderators: coopster

Google query with curl

They're detecting the effort?

cameraman

2:16 am on Mar 25, 2007 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

For some reason my pages are flying in and out of the supplemental index daily and it's making me jump all over the place for my most important search phrase. Not that it does any good, but I keep wanting to find myself - it just drives me batty not knowing where I am.

So last night I wrote a script that is supposed to submit the phrase, look through the results page by page, and finally show me the page that has me on it.

What happens instead is no matter what I supply for the starting point, the first page comes back and they've fooled with the links - the navigation links at the bottom contain the domain name which is doing the submission instead of 'google.com'.

It occurred to me, after triple-checking my script for errors, that Google might take steps to keep people from doing such a thing. While my intentions are honorable, I could see how someone could use the same method to display search results sans advertisements.

Do you think that this is the case: can they detect that the query comes via curl, they don't like it, and it's hopeless? I think I can accomplish the same thing with JavaScript, but JS isn't my 'native tongue', so I'd prefer a PHP solution. This is my first foray into curl, so I don't mind tweaking options etc. if there's hope, but I don't want to waste my time if they're going to keep shutting me down.

capulet_x

2:46 am on Mar 25, 2007 (gmt 0)

10+ Year Member

Could this be related?

[googlewebmastercentral.blogspot.com...]

cameraman

3:02 am on Mar 25, 2007 (gmt 0)

I don't think so - I'm trying to submit a search through automation, not influence one. For example, my script sends out:
http://www.google.com/search?q=bright+blue+widget&hl=en&start=10&sa=N
so that the script can look through the resulting page for:
www.example.com.
If it doesn't find it, it advances the start number by 10 and looks again. When you do a search, the numbered links at the bottom have that syntax, but the links that my script gets back are instead:
http://www.example.com/search?q=bright+blue+widget&hl=en&start=10&sa=N
and no matter what start= value I send, I get the first page for the query, not the one requested.

What they're talking about in the article is when a page is crafted for their bot to read that winds up influencing other searches.
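For reference, the page-by-page loop described above might look roughly like the sketch below. It is not the script from this thread: `buildSearchUrl()`, `findResultPage()`, `curlFetch()` and the `$maxPages` cap are hypothetical names, and the fetcher is passed in as a callable so the loop can be tried without hitting the network.

```php
<?php
// Sketch only. All function names here are hypothetical, not from the thread.

// Build the search URL in the same shape as the one quoted in the post.
function buildSearchUrl(string $phrase, int $start): string
{
    return 'http://www.google.com/search?' . http_build_query([
        'q'     => $phrase,
        'hl'    => 'en',
        'start' => $start,
        'sa'    => 'N',
    ], '', '&');
}

// Walk the results ten at a time; return the start offset of the first page
// whose HTML contains $needle, or null if it is not found within $maxPages.
function findResultPage(string $phrase, string $needle, callable $fetch, int $maxPages = 10): ?int
{
    for ($page = 0; $page < $maxPages; $page++) {
        $start = $page * 10;
        $html  = $fetch(buildSearchUrl($phrase, $start));
        if (is_string($html) && stripos($html, $needle) !== false) {
            return $start;
        }
    }
    return null;
}

// One possible fetcher using the curl extension; returns the body or false.
function curlFetch(string $url)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the page instead of printing it
    curl_setopt($ch, CURLOPT_TIMEOUT, 15);
    return curl_exec($ch);
}
```

A call would be something like `findResultPage('bright blue widget', 'www.example.com', 'curlFetch')`. The plain substring match is a simplification: it will also fire if the domain appears anywhere else on the page.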

joelgreen

10:50 pm on Mar 25, 2007 (gmt 0)

10+ Year Member

They could put in some simple checks for bots. Does your script identify itself as a browser? I mean, does it set a user agent string? Google could also check for other things that are "must haves" for a usual browser.

cameraman

6:27 am on Mar 26, 2007 (gmt 0)

I figured it out - I'm a dingaling. All they were doing is using relative URLs, so when the document landed at the browser, the browser resolved them against my domain. I just had to turn them into absolute URLs before sending the page to the browser.
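A minimal sketch of that fix, assuming the links are root-relative (they start with a single `/`, which matches the `/search?...` links described earlier). `absolutizeLinks()` is a hypothetical helper, not the code actually used here, and a regex is only a rough stand-in for a real HTML parser.

```php
<?php
// Sketch only: absolutizeLinks() is a hypothetical helper.

// Prefix root-relative href/src values with $base so they still point at the
// original host when the fetched page is served from another domain.
function absolutizeLinks(string $html, string $base = 'http://www.google.com'): string
{
    $prefix = rtrim($base, '/') . '/';

    // Matches href="/..." or src="/..." but skips protocol-relative "//host/..." values.
    $result = preg_replace_callback(
        '~\b(href|src)=(["\'])/(?!/)~i',
        function (array $m) use ($prefix) {
            return $m[1] . '=' . $m[2] . $prefix;
        },
        $html
    );

    // preg_replace_callback() returns null on a regex error; fall back to the input.
    return $result ?? $html;
}
```

Links that are already absolute are left alone. Document-relative links such as `search?q=...` with no leading slash are not handled by this sketch. Another option would be to add a `<base href="http://www.google.com/">` tag to the page's head and let the browser do the resolving.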

Yes, I'm passing the agent along from $_SERVER.
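Passing the agent along might look something like the sketch below. `pickUserAgent()`, `fetchWithAgent()` and the fallback string are hypothetical; only `CURLOPT_USERAGENT` and the other curl options are standard. The fallback is there because `$_SERVER['HTTP_USER_AGENT']` is not set when the script runs from the command line or cron.

```php
<?php
// Sketch only: both function names and the fallback agent string are hypothetical.

// Use the visiting browser's user agent if present, otherwise a fallback.
function pickUserAgent(array $server, string $fallback = 'Mozilla/5.0 (compatible; rank-check sketch)'): string
{
    $agent = trim((string) ($server['HTTP_USER_AGENT'] ?? ''));
    return $agent !== '' ? $agent : $fallback;
}

// Fetch a URL with curl, sending the chosen user agent; returns the body or false.
function fetchWithAgent(string $url, string $agent)
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,   // return the page instead of printing it
        CURLOPT_USERAGENT      => $agent, // sent as the User-Agent request header
        CURLOPT_TIMEOUT        => 15,
    ]);
    return curl_exec($ch);
}
```

Usage would be along the lines of `fetchWithAgent($url, pickUserAgent($_SERVER))`.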

Thanks for the replies, capulet_x and joelgreen.

justageek

7:50 pm on Mar 26, 2007 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Just to close out with a few thoughts on the original question... If you smack Google enough times, they will replace the SERP with a page that says something along the lines of "we noticed a lot of activity from your machine so you may want to check it for a virus". It is IP-based, and after a certain amount of time you can see the SERP again.

But you have to hit them pretty hard to get this to show up so most people will never see it.

JAG
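One way to stay well under that kind of limit is to pause between requests and stop at the first sign of trouble. The sketch below is only an illustration: `fetchPaced()`, `curlFetchWithStatus()` and the five-second default are hypothetical, and treating any non-200 status as a stop signal is an assumption, since the thread does not say what status the warning page comes back with.

```php
<?php
// Sketch only: function names and the default delay are hypothetical.

// Fetch URLs one at a time with a pause in between. $fetch must return
// [int $status, string $body]. Stops at the first non-200 response
// (assumption: a non-200 status means it is time to back off).
function fetchPaced(array $urls, callable $fetch, float $delaySeconds = 5.0): array
{
    $pages = [];
    $first = true;
    foreach ($urls as $url) {
        if (!$first && $delaySeconds > 0) {
            usleep((int) round($delaySeconds * 1000000));
        }
        $first = false;

        [$status, $body] = $fetch($url);
        if ($status !== 200) {
            break; // stop rather than keep hitting the server
        }
        $pages[$url] = $body;
    }
    return $pages;
}

// One possible fetcher using the curl extension.
function curlFetchWithStatus(string $url): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $body   = curl_exec($ch);
    $status = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    return [$status, $body === false ? '' : $body];
}
```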