
%09 in Logfiles

Googlebot requesting strange pages.

   
10:27 am on Nov 30, 2005 (gmt 0)

10+ Year Member



Hi

I've done a quick search, but can't find reference to this. Googlebot has requested hundreds and hundreds of pages from our website with '%09%09%09%09' inserted into the querystring/URL.

At the moment, all of these pages fail. It would be possible for us to capture the requested page, remove all of the '%09' characters and return the resulting page - but we're worried Google might then see our site as having infinite pages.
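For illustration, here is roughly what that capture-and-clean could look like as a small guard at the top of a classic ASP/JScript page script. This is only a sketch of the idea, not a tested implementation, and it redirects to the cleaned URL rather than serving the page directly, so each page keeps a single address:

<%@ Language="JScript" %>
<%
// Hypothetical guard: strip encoded tabs (%09) from the query string
// and 301-redirect to the cleaned URL so the content has one address.
var qs = String(Request.ServerVariables("QUERY_STRING"));
if (qs.indexOf("%09") != -1) {
    var clean = qs.replace(/%09/g, "");
    var self = String(Request.ServerVariables("SCRIPT_NAME"));
    Response.Status = "301 Moved Permanently";
    Response.AddHeader("Location", self + (clean ? "?" + clean : ""));
    Response.End();
}
%>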

Has anyone else experience with this? Is there a best course of action?

6:42 pm on Dec 1, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



%09 is the URL escape code for a tab character. Not sure why this would be showing up in a URL. I tried some experiments in plain-text HTML and cannot reproduce this. It could be a parsing error in your code in reference to a link. It could be that G is broken for some reason.
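For reference, %09 round-trips to a tab with the standard JavaScript encode/decode functions, which is an easy way to sanity-check what the bot is sending:

<script type="text/javascript">
// %09 is the percent-encoded form of the tab character (ASCII 9)
alert(decodeURIComponent('%09') === '\t');   // true
alert(encodeURIComponent('\t'));             // "%09"
</script>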

Are your pages plain HTML pages?

6:45 pm on Dec 1, 2005 (gmt 0)

10+ Year Member



You may want to try this.

[webmasterworld.com...]

12:46 pm on Dec 2, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Your link tells of the problem. I wonder what causes it. What platforms and servers are you guys running? Linux/Apache?

2:39 pm on Dec 2, 2005 (gmt 0)

10+ Year Member



Hi and thanks for the responses....

We're running on a Microsoft server platform. caspita isn't, from what I've read, so I guess it's not particularly platform-related.

We do run a dynamic site, but we're actually returning .htm pages with querystrings. The %09%09...'s are inserted into the querystring. I'm thinking that it is either Googlebot malfunctioning, or that Googlebot is using this as a technique to check we don't return an infinite number of pages (i.e. we don't return a page for just any querystring; it has to follow specific rules).

For now, unless we hear a reason otherwise, we're going to allow the site to error and return a 404 page when these pages are requested, rather than fix it.
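A minimal sketch of doing that deliberately rather than letting the page script fall over (same classic ASP/JScript shape as the cleaning sketch above; the response body text is purely illustrative):

<%@ Language="JScript" %>
<%
// Hypothetical guard: answer %09-laden requests with an explicit 404
// instead of letting the page script error out.
if (String(Request.ServerVariables("QUERY_STRING")).indexOf("%09") != -1) {
    Response.Status = "404 Not Found";
    Response.Write("Page not found.");
    Response.End();
}
%>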

Incidentally, we're 99% certain we have no links with (this many) tabs in the querystring from within our site, so we think Googlebot has just 'made up' these pages.

Interesting though.....

4:23 pm on Dec 2, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yep,

I think returning 404 pages is the best solution. Theoretically, the pages should be dropped eventually. But I have been having a hard time with old pages still showing in the SERPs lately.

Good Luck!

3:19 am on Dec 7, 2005 (gmt 0)



webmaster99:

It would not surprise me one bit if Google tested dynamic/server-generated pages more thoroughly than standard ones. No offense to you; I am sure that if Google does it, it does it for some reason, likely related to influencing search results / spam or thereabouts.

webdude:
I always take my old pages and leave them on my server, but I re-code them to redirect the user AND the search engines to the new page as follows:
oldpage.htm code:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2//EN">

<html>
<head>
<meta name="robots" content="noindex,follow">

<title></title>
<script language="JavaScript" type="text/javascript">
<!--
// send visitors (and script-running crawlers) straight to the new page
location.replace('http://yourdomaine.com/newpage.htm');
//-->
</script>
</head>

<body>
</body>
</html>

Then upload oldpage.htm and leave it in place. I have pages that changed 3-4 years ago, but I leave them because, for some reason, certain search engines / links have never updated, and from time to time a visitor is still sent to oldpage.htm (which now kindly and immediately redirects the visitor to newpage.htm).
And I think it's more user-friendly than a 404. There are SOME pages which I can not redirect to a new page (because there IS no new page); some of those I redirect to the MAIN page, and the few that are left get a custom 404 with several links to the main parts of my site.
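One design note on that approach: a JavaScript redirect is invisible to crawlers that don't execute scripts. Where the server allows it, a server-side 301 in the old page's place tells search engines the move is permanent. A minimal classic ASP/JScript sketch, assuming oldpage could be served as an .asp script (the domain placeholder is the same one used above):

<%@ Language="JScript" %>
<%
// oldpage.asp (hypothetical): permanent, server-side redirect that
// search engines can follow and credit to the new URL.
Response.Status = "301 Moved Permanently";
Response.AddHeader("Location", "http://yourdomaine.com/newpage.htm");
Response.End();
%>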

Peace out

8:57 pm on Dec 7, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



topsites

I think you misunderstood me. I get the redirect stuff. I do it all the time. I was talking about getting the old pages out of the SERPs. I have pages in there from 2002 that haven't been accessed or viewed. I would like to get rid of those. I've tried just about everything. It's a G supplemental problem.

 
