Format1 = www.site.com/FL/Miami/123456.asp
Format2 = www.site.com/page.asp?state=FL&cityid=123456&city=Miami
Googlebot log entry:
2004-01-25 08:55:38 69.44.59.30 GET /page.asp state=FL&cityid=123456&city=Miami 80 - 64.68.82.27 Googlebot/2.1+(+http://www.googlebot.com/bot.html) - 200 0 0 8929 218
Also, in the log entry, shouldn't "/page.asp state=FL" look like "/page.asp?state=FL"?
Thanks
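On the missing "?": IIS W3C extended logs record the path (cs-uri-stem) and the query string (cs-uri-query) as separate space-delimited fields, so the entry looks normal in that respect. Here is a minimal Python sketch that rebuilds the requested URL from such a line. It assumes the field order seen in the entry above (date, time, server IP, method, stem, query), and the function name is made up for illustration.

```python
def rebuild_url(log_line):
    """Rejoin the path and query string of a W3C-style log line.

    Assumes the space-delimited field order: date, time, server IP,
    method, URI stem, URI query. A real log declares its order in the
    "#Fields:" header, so adjust the indexes to match.
    """
    fields = log_line.split()
    stem, query = fields[4], fields[5]
    # A lone "-" is the placeholder for an empty field.
    return stem if query == "-" else stem + "?" + query


line = ("2004-01-25 08:55:38 69.44.59.30 GET /page.asp "
        "state=FL&cityid=123456&city=Miami 80 - 64.68.82.27 "
        "Googlebot/2.1+(+http://www.googlebot.com/bot.html) - 200 0 0 8929 218")
print(rebuild_url(line))  # /page.asp?state=FL&cityid=123456&city=Miami
```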
Did Googlebot find a link off your site that was in the "wrong" format?
Not at all. All links are in the following format: www.site.com/FL/Miami/123456.asp
That is why I am so confused. I was under the impression that the rewrite filter would change things server side, and Googlebot would never see the page www.site.com/page.asp?state=FL&cityid=123456&city=Miami
Hopefully some of the isapi_rewrite gurus can shed some light on this issue.
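To illustrate what "server side" means here: the rewrite maps the friendly path onto the script URL internally, and the client is never told about it. This is a rough Python sketch of that mapping, not the actual ISAPI_Rewrite rule. The pattern and the function name are assumptions based on the two formats above.

```python
import re

# Assumed pattern for Format1: /<state>/<city>/<cityid>.asp
FRIENDLY = re.compile(r"^/([A-Za-z]{2})/([^/]+)/(\d+)\.asp$")


def rewrite(path):
    """Map a Format1 path to the Format2 script URL, or leave it alone."""
    match = FRIENDLY.match(path)
    if match is None:
        return path  # not a friendly URL, pass through unchanged
    state, city, city_id = match.groups()
    return f"/page.asp?state={state}&cityid={city_id}&city={city}"


print(rewrite("/FL/Miami/123456.asp"))  # /page.asp?state=FL&cityid=123456&city=Miami
```

If the server logs the URL after this step rather than before it, the log would show the Format2 form even when the request came in as Format1.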
PR is equally misleading. The static pages show decent PR but don't appear in results, while the dynamic pages have zero PR but show up well in many cases. "Spidering" the pages shows everything OK - good URLs, 200 OK headers, etc.
There are a few things I could try, but I'm afraid to get too aggressive lest I lose the decent traffic going to the site now.
I don't think there are many links to the query string pages, and it almost seems as if Google is crawling them from memory.
I use a 404 processor rather than ISAPI_rewrite, but Google continues to index the dynamic URLs.
Hmmm, I might be a little leery of entrusting a URI rewrite routine to a 404 processor. But, knowing you, I'm going to assume that you've covered all your bases.
I would definitely look at each step involved with this method. Is it possible that the 404 processor is returning a 302 somewhere along the way?
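One way to check is to request a page without following redirects and look at the raw status. Here is a minimal sketch using Python's standard http.client. The host and path in the usage comment are the example ones from this thread, and the helper names are made up.

```python
import http.client

# Redirect status codes worth flagging, with a rough label for each.
REDIRECTS = {301: "permanent", 302: "temporary", 303: "temporary", 307: "temporary"}


def describe(status, location=None):
    """Summarise a response status for a quick redirect check."""
    kind = REDIRECTS.get(status)
    if kind is None:
        return f"{status} (no redirect)"
    return f"{status} ({kind} redirect to {location})"


def check(host, path, user_agent="Googlebot/2.1 (+http://www.googlebot.com/bot.html)"):
    """Send one GET and report the status; http.client never follows redirects."""
    conn = http.client.HTTPConnection(host, timeout=10)
    try:
        conn.request("GET", path, headers={"User-Agent": user_agent})
        response = conn.getresponse()
        return describe(response.status, response.getheader("Location"))
    finally:
        conn.close()


# Example (needs network access):
# print(check("www.site.com", "/FL/Miami/123456.asp"))
```

Anything other than a plain 200 on the friendly URL would be worth chasing through the 404 handler step by step.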
In my system, I rewrite all URLs to a script file. Googlebot gets the correct pages but the server writes the rewritten filename (the script's name) to the log. I was hoping there might be an isapi-rewrite setting/switch or something to make the requested URL be written to the log. I want to know which URLs Googlebot is requesting.
Nobody got any ideas?
If there isn't a standard way of dealing with it, I suppose I could write the requested URLs to a text file from the script.
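Something along these lines, sketched in Python rather than ASP. How the script gets hold of the originally requested URL depends on the rewrite setup, so here it is simply passed in. The function name, file name, and line layout are all made up for illustration.

```python
from datetime import datetime, timezone


def log_request(requested_url, user_agent, log_path="requested_urls.txt"):
    """Append one tab-separated line per Googlebot request; return True if logged."""
    if "googlebot" not in user_agent.lower():
        return False  # only interested in Googlebot here
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
    # Append mode keeps earlier entries and creates the file if needed.
    with open(log_path, "a", encoding="utf-8") as handle:
        handle.write(f"{stamp}\t{requested_url}\t{user_agent}\n")
    return True
```

The script would call this once per request, before doing anything else, with whatever value holds the URL as the visitor requested it.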
[isapirewrite.com...]
Second item from the bottom is about the "U" flag which, apparently, causes the "U"nmangled (requested) URL to be written to the log.
I'm gonna test it.