
Microsoft IIS Web Server and ASP.NET Forum

    
ISAPI_rewrite - wrong URLs crawled
defanjos




msg:947655
 5:00 pm on Jan 25, 2004 (gmt 0)

I am using ISAPI_Rewrite and wrote a rule so that all URLs in Format1 are served by Format2 (see below).
The problem is that when Googlebot came around, it "saw" Format2 and not Format1 as I wanted. What did I do wrong?
All links to the pages are like <a href="http://www.site.com/FL/Miami/123456.asp">Link Text</a>, and the browser window shows format1 once you go to the page.

Format1 = www.site.com/FL/Miami/123456.asp

Format2 = www.site.com/page.asp?state=FL&cityid=123456&city=Miami

Googlebot log entry:

2004-01-25 08:55:38 69.44.59.30 GET /page.asp state=FL&cityid=123456&city=Miami 80 - 64.68.82.27 Googlebot/2.1+(+http://www.googlebot.com/bot.html) - 200 0 0 8929 218

Also, in the log entry, shouldn't "/page.asp state=FL" look like "/page.asp?state=FL"?

Thanks

 

mattglet




msg:947656
 6:12 pm on Jan 25, 2004 (gmt 0)

Not sure what the answer to your first problem is, but I know that IIS logs don't usually show the "?" in the URI; the path and the query string are recorded as separate fields. What you're seeing is common.

-Matt
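
For anyone who wants the "?" back when reading the logs, here is a minimal Python sketch that rejoins the path and query string from the log entry quoted above. The field order in `ASSUMED_FIELDS` is an assumption inferred from that one line; the real order is whatever the `#Fields:` header of your own log file says.

```python
# Assumed field order (check the "#Fields:" header of the actual log file).
ASSUMED_FIELDS = [
    "date", "time", "s-ip", "cs-method", "cs-uri-stem", "cs-uri-query",
    "s-port", "cs-username", "c-ip", "cs(User-Agent)",
]


def parse_log_line(line: str) -> dict:
    """Map the leading space-separated values of a log line to field names."""
    return dict(zip(ASSUMED_FIELDS, line.split()))


def requested_url(entry: dict) -> str:
    """Rejoin path and query string; a "-" means there was no query string."""
    stem = entry["cs-uri-stem"]
    query = entry["cs-uri-query"]
    return stem if query == "-" else f"{stem}?{query}"


LOG_LINE = (
    "2004-01-25 08:55:38 69.44.59.30 GET /page.asp "
    "state=FL&cityid=123456&city=Miami 80 - 64.68.82.27 "
    "Googlebot/2.1+(+http://www.googlebot.com/bot.html) - 200 0 0 8929 218"
)

entry = parse_log_line(LOG_LINE)
print(requested_url(entry))
```

The space between `/page.asp` and `state=FL` in the log is just the separator between those two fields.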

defanjos




msg:947657
 8:00 pm on Jan 25, 2004 (gmt 0)

Thanks Matt.
I never noticed before that the "?" was not in the logs.

tedster




msg:947658
 8:13 pm on Jan 25, 2004 (gmt 0)

Did Googlebot find a link off your site that was in the "wrong" format? (or is that a dumb question, maybe?)

I'm just preparing to get into my first URL rewrite project on a Windows server, so this is a vital issue for me.

defanjos




msg:947659
 8:42 pm on Jan 25, 2004 (gmt 0)

Did Googlebot find a link off your site that was in the "wrong" format?

Not at all, all links are in the following format: www.site.com/FL/Miami/123456.asp

That is why I am very confused. I was under the impression that the rewrite filter would change things server side, and Googlebot would never see the page www.site.com/page.asp?state=FL&cityid=123456&city=Miami

Hopefully some of the ISAPI_Rewrite gurus can shed some light on this issue.

defanjos




msg:947660
 8:55 pm on Jan 25, 2004 (gmt 0)

Actually let me rephrase something, the links look like:
<a href="FL/Miami/123456.asp">Link Text</a>

I wonder if that is causing the problem, I am going to change them to:
<a href="http://www.site.com/FL/Miami/123456.asp">Link Text</a>
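
To see what difference the two link styles make, here is a small Python sketch of how a client resolves each form against the page it is currently on. The `current_page` value is a hypothetical starting point, and this only illustrates standard URL resolution; it is not a claim about what Googlebot actually requested.

```python
from urllib.parse import urljoin

# Hypothetical page the visitor (or crawler) is currently on.
current_page = "http://www.site.com/FL/Miami/123456.asp"

# A link without a leading slash resolves against the current directory.
relative_link = urljoin(current_page, "FL/Miami/123456.asp")

# A link with a leading slash resolves against the site root.
root_relative_link = urljoin(current_page, "/FL/Miami/123456.asp")

# A full URL is used as-is.
absolute_link = urljoin(current_page, "http://www.site.com/FL/Miami/123456.asp")

print(relative_link)
print(root_relative_link)
print(absolute_link)
```

From a page that already sits two folders deep, the first form picks up the current folder path as a prefix, while the other two always point at the same address.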

tedster




msg:947661
 9:46 pm on Jan 25, 2004 (gmt 0)

I meant to ask if there might be an inbound link to your pages, like on a forum or link partner somewhere, that was in the wrong format. However, I didn't state my question very well.

defanjos




msg:947662
 9:52 pm on Jan 25, 2004 (gmt 0)

I meant to ask if there might be an inbound link to your pages, like on a forum or link partner somewhere, that was in the wrong format

No, these pages are a couple of weeks old, and other than me, no one knows of their existence.

f00sion




msg:947663
 9:05 pm on Jan 26, 2004 (gmt 0)

Googlebot saw the correct links, but IIS is serving up page.asp with all the parameters, and that is what it records in your log files. As long as there are no links anywhere to the unfriendly URIs, Google will never be the wiser.
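
A rough way to picture this is the Python sketch below. It is not ISAPI_Rewrite syntax, only an illustration of an internal rewrite: the `FRIENDLY` pattern and the `rewrite` function are assumptions based on the Format1 and Format2 examples earlier in the thread.

```python
import re

# Assumed shape of Format1: /<two-letter state>/<city>/<numeric id>.asp
FRIENDLY = re.compile(r"^/([A-Za-z]{2})/([^/]+)/(\d+)\.asp$")


def rewrite(path: str) -> str:
    """Return the internal target for a friendly path, or the path unchanged."""
    match = FRIENDLY.match(path)
    if match is None:
        return path
    state, city, city_id = match.groups()
    return f"/page.asp?state={state}&cityid={city_id}&city={city}"


requested = "/FL/Miami/123456.asp"   # what the client asks for and keeps seeing
served = rewrite(requested)          # what the server runs and, by default, logs

print(requested, "->", served)
```

The client never receives a redirect in this model, so the address it holds stays in Format1; only the server side knows about the Format2 target.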

pageoneresults




msg:947664
 9:13 pm on Jan 26, 2004 (gmt 0)

I'd also recommend that you get into the habit of using Absolute URIs in any rewriting routine. I've seen some ill effects occur when using Relative URIs.

Absolute = http://www.example.com/sub/file.asp
Relative = /sub/file.asp

defanjos




msg:947665
 9:45 pm on Jan 26, 2004 (gmt 0)

f00sion,

googlebot saw the correct links but iis is serving up page.asp with all the parameters, which is what it is recording in your log files.

That makes a lot of sense, thanks.
I feel better now.

Pageoneresults,
Thanks, I'll try that.

pageoneresults




msg:947666
 10:06 pm on Jan 26, 2004 (gmt 0)

Just to be on the safe side, I would add a Disallow: in your robots.txt file...

User-agent: *
Disallow: /page.asp
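
A quick way to sanity-check a rule like this before relying on it is Python's standard-library robots.txt parser, as in the sketch below. The two URLs are the Format1 and Format2 examples from earlier in the thread; the parser follows its own matching rules, so treat the result as an approximation of how a given crawler will behave.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /page.asp
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

dynamic_url = "http://www.site.com/page.asp?state=FL&cityid=123456&city=Miami"
friendly_url = "http://www.site.com/FL/Miami/123456.asp"

# Disallow rules match by prefix, so the query-string URL is covered too.
dynamic_allowed = parser.can_fetch("Googlebot", dynamic_url)
friendly_allowed = parser.can_fetch("Googlebot", friendly_url)

print("dynamic URL allowed:", dynamic_allowed)
print("friendly URL allowed:", friendly_allowed)
```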

rogerd




msg:947667
 10:24 pm on Jan 26, 2004 (gmt 0)

Be cautious with this. I've got a site with a similar situation. I use a 404 processor rather than ISAPI_rewrite, but Google continues to index the dynamic URLs. Virtually all linkage is to the "static" URLs, hence the relative PR of the static pages should shove any duplicate content on query string URLs out. That hasn't happened yet, though.

PR is equally misleading. The static pages show decent PR but don't appear in results, while the dynamic pages have zero PR but show up well in many cases. "Spidering" the pages shows everything OK - good URLs, 200 OK headers, etc.

There are a few things I could try, but I'm afraid to get too aggressive lest I lose the decent traffic going to the site now.

I don't think there are many links to the query string pages, and it almost seems as if Google is crawling them from memory.

pageoneresults




msg:947668
 10:39 pm on Jan 26, 2004 (gmt 0)

I use a 404 processor rather than ISAPI_rewrite, but Google continues to index the dynamic URLs.

Hmmm, I might be a little leery entrusting a URI rewrite routine to a 404 processor. But, knowing you, I'm going to assume that you've covered all your bases.

I would definitely look at each step involved with this method. Is it possible that the 404 processor is returning a 302 somewhere along the way?

f00sion




msg:947669
 7:29 am on Jan 27, 2004 (gmt 0)

I would think rogerd's problem is related to the server returning a 404 and then possibly a 301 or 302 when the 404 page redirects to the real page. I went the 404 route for a while, but it always seemed like a Mickey Mouse fix. Using one of the ISAPI filters is so much cleaner and more seamless.

PhilC




msg:947670
 5:01 pm on Feb 1, 2004 (gmt 0)

Hmmm. I came looking for the answer to this isapi_rewrite problem, but it doesn't look like anyone has it.

In my system, I rewrite all URLs to a script file. Googlebot gets the correct pages but the server writes the rewritten filename (the script's name) to the log. I was hoping there might be an isapi-rewrite setting/switch or something to make the requested URL be written to the log. I want to know which URLs Googlebot is requesting.

Nobody got any ideas?

If there isn't a standard way of dealing with it, I suppose I could write the requested URLs to a text file from the script.

PhilC




msg:947671
 5:05 pm on Feb 1, 2004 (gmt 0)

Found the answer! - I think.

[isapirewrite.com...]

Second item from the bottom is about the "U" flag which, apparently, causes the "U"nmangled (requested) URL to be written to the log.

I'm gonna test it.


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved