Forum Moderators: phranque


How is google able to see my rewritten URL?

I'd rather like it not to

         

miyazaki

8:52 pm on Dec 21, 2008 (gmt 0)

10+ Year Member



Hello,

I've got a working .htaccess file for my shopping cart system, where various 'items' all get redirected to a single page, like so:

RewriteRule ^item_saw.html$ item.php?18?%{QUERY_STRING} [L]
RewriteRule ^item_hammer.html$ item.php?19?%{QUERY_STRING} [L]
RewriteRule ^item_screwdriver.html$ item.php?20?%{QUERY_STRING} [L]

And this works great. But Google is not only indexing the pages called

/item_saw.html
/item_hammer.html
/item_screwdriver.html

etc., but it's also indexing

/item.php?18?
/item.php?19?
/item.php?20?

etc... which I don't want it to do! I suspect that breaks up my PageRank somewhat, and it's just plain unattractive.

I have no links to the underlying pages on my site, so how come Google is able to tell what page I'm internally serving? Surely the item.php bit never makes it out of my server?

Any idea how Google is able to work out the underlying page name? And how to stop it?

g1smd

9:56 pm on Dec 21, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



*** various 'items' all get redirected to a single page ***

Note this is supposed to be an internal rewrite, not a redirect.

However, you may have some other rules that are interfering with this. All redirects should be listed first, and all rewrites should be listed last.
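Below is a minimal .htaccess sketch of that ordering. It assumes mod_rewrite is enabled and that the file sits in the document root of www.example.com. The host-name redirect is a hypothetical stand-in for "some other rules".

The order matters because in .htaccess context a rewritten URL is run through the rules again. A redirect listed after the rewrites could therefore match item.php on that second pass and send /item.php?18 straight back to the browser or robot.

```apache
# Sketch only: assumes mod_rewrite is available and this .htaccess file is
# in the document root. The host-name redirect is a hypothetical example.
RewriteEngine On

# 1) External redirects first. The R flag sends a new URL back to the client.
#    (An empty Host header, sent by some HTTP/1.0 clients, is left alone.)
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

# 2) Internal rewrites last. No R flag, so the client never sees item.php.
RewriteRule ^item_saw\.html$ item.php?18 [QSA,L]
RewriteRule ^item_hammer\.html$ item.php?19 [QSA,L]
RewriteRule ^item_screwdriver\.html$ item.php?20 [QSA,L]
```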

You need to use the Live HTTP Headers add-on for Firefox to examine the server response. My guess is that you'll see a 302 returned somewhere in the system.

Your internal filepath is also likely invalid. You have two question marks in it. Maybe one of those should be an ampersand?

You should also set up a series of redirects so that any request for one of the parameterised URLs is redirected, making the browser (or robot) request the correct static URL instead.
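One way to do that is sketched below, under the same assumptions as above. The item-to-page mapping is taken from the rules earlier in this thread.

The conditions test %{THE_REQUEST}, the original request line sent by the client (for example `GET /item.php?18? HTTP/1.1`). Internal rewrites do not change it. As a result, these redirects fire only when a browser or robot asks for item.php directly, never on the internal rewrite of item_saw.html, so they cannot loop. The `[&?\ ]` class also matches the double-question-mark URLs already in Google's index.

```apache
# Sketch only: one redirect per item, using the item numbers from this thread.
# THE_REQUEST is the raw request line, e.g. "GET /item.php?18? HTTP/1.1".
# The trailing "?" on each substitution discards the old query string.
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /item\.php\?18[&?\ ]
RewriteRule ^item\.php$ http://www.example.com/item_saw.html? [R=301,L]

RewriteCond %{THE_REQUEST} ^[A-Z]+\ /item\.php\?19[&?\ ]
RewriteRule ^item\.php$ http://www.example.com/item_hammer.html? [R=301,L]

RewriteCond %{THE_REQUEST} ^[A-Z]+\ /item\.php\?20[&?\ ]
RewriteRule ^item\.php$ http://www.example.com/item_screwdriver.html? [R=301,L]
```

Like any redirect, these belong before the internal rewrites in the file. Any extra parameters on the old URLs are dropped by the trailing "?".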

jdMorgan

10:26 pm on Dec 21, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Also, the substitution URL syntax is invalid. Use:

RewriteRule ^item_saw\.html$ item.php?18&%{QUERY_STRING} [L]

- or -
RewriteRule ^item_saw\.html$ item.php?18 [QSA,L]

Note also that the literal periods in the RewriteRule pattern have been escaped.
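Here is what each form would pass to item.php, assuming a hypothetical request for /item_saw.html?color=red ("color" is an illustrative parameter, not one from this thread).

With the original double-question-mark form, the script would receive `18?color=red` instead. A typical query-string parser splits only on "&", so it would read that as a single parameter named `18?color`. Use only one of the two rules below; the second is commented out.

```apache
# Form 1: append the original query string by hand.
#   /item_saw.html?color=red  ->  item.php?18&color=red
#   /item_saw.html            ->  item.php?18&   (harmless trailing "&")
RewriteRule ^item_saw\.html$ item.php?18&%{QUERY_STRING} [L]

# Form 2: let mod_rewrite merge the query strings with the QSA flag.
#   /item_saw.html?color=red  ->  item.php?18&color=red
#   /item_saw.html            ->  item.php?18
#RewriteRule ^item_saw\.html$ item.php?18 [QSA,L]
```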

Jim

miyazaki

10:30 pm on Dec 21, 2008 (gmt 0)

10+ Year Member



Thanks for the reply. Yes, you're right, it's supposed to be an internal rewrite, not a redirect. And yes, having two '?'s in my URLs does indeed suck, but I only ever read the query string from my own JavaScript, so it doesn't actually seem to break anything... I'll have a go at removing the extra '?', though I suspect that won't affect the Google-related problem I'm having.

I've only got a few other rules in my .htaccess file, and I'm pretty sure they're not interfering...

I used an online app to test the HTTP headers I get back, and there's no redirection at all! But, in the HTML that's returned, I do have this:

<!-- PASS THE POST-MOD-REWRITE URL TO JAVASCRIPT -->
<script type="text/javascript">
current_url = "http://www.example.com/item.php?27?";
</script>

Google can't fish URLs out of JavaScript variables, can it?

jdMorgan

10:44 pm on Dec 21, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Google has just released their Chrome browser, complete with their very own "V8" JavaScript engine. So it's a bad assumption to think that they don't (or won't) understand JS.

You might consider cloaking the JS if the HTTP User-Agent header indicates a search robot is making the request.

Jim

miyazaki

11:01 pm on Dec 21, 2008 (gmt 0)

10+ Year Member



That's a very good (and scary) point, jdMorgan.

And it appears that Google can now execute JavaScript code and follow links contained within it. I found another chap who seems to have experienced the same situation:

I can't post the URL here, but the number 1 result for a google search for 'new reality google follows links' should get you the article I just read. :)

Guess I'll have to cloak my Javascript like you say! Yurgh... site... getting... messier.

Thanks for the advice guys.

jdMorgan

3:09 pm on Dec 22, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yeah, you might want to look into moving some of this functionality to the server side, for two reasons: first, to avoid needing to 'know' the post-rewrite URL 'on the page' where it can be 'seen'; and second, because the page will then work for users with JS disabled.

Like simple JS, Ajax is 'cool' and all that, but it should not be used for critical functions that will break your site if they aren't executed. I see a lot of folks who think that, for example, they can *either* use client-side scripting like Ajax, *or* they can use server-side scripting such as PHP. The truth is that they should use both, choosing whichever is most appropriate and most robust for each job rather than whichever is 'their favorite language'. This situation reminds me of the old saw, "If all you have is a hammer, then every problem looks like a nail."

Jim