Forum Moderators: Robert Charlton & goodroi


Is whitelisting dynamic URLs important for rankings?


Sgt_Kickaxe

9:44 pm on Feb 3, 2012 (gmt 0)



In reviewing my website logs I've noticed Google 'fishing' for dynamic URLs that don't exist more often recently. The pages being called have never existed and do not show up in GWT as 404s. In some instances 4-5 seemingly random parameters are tested.

My site is dynamic and parameters do in fact change the content of the page, although Google hasn't touched on a combination that would make a change (yet). I have a list of roughly 100 parameters that will change the page in ways I intend, and potentially 10,000 that change it in ways I do not want.

Should I place the 100 parameters in an array and whitelist them, returning 404 error codes if the parameter passed is NOT on the list?

Would having 100 parameters in an array/whitelist significantly slow down page rendering over time, or is that an acceptable number?

Just wanting this 'loose thread' tucked away before Google yanks on it. Any other suggestions?
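For what it's worth, the whitelist check is cheap. A minimal PHP sketch of the idea (the parameter names and the 404.php include are placeholders, not your real site's names); flipping the array gives hash lookups, so even a few hundred entries won't measurably slow a page:

```php
<?php
// Sketch: return true only if every query parameter is on the whitelist.
// Parameter names below are placeholders, not taken from any real site.
function params_allowed(array $get, array $whitelist) {
    // array_flip turns the list into a hash map, so each lookup is O(1)
    // instead of a linear in_array() scan over ~100 entries.
    $allowed = array_flip($whitelist);
    foreach (array_keys($get) as $param) {
        if (!isset($allowed[$param])) {
            return false;
        }
    }
    return true;
}

// Usage, at the top of the page controller:
$whitelist = array('color', 'size', 'page');  // ...the ~100 intended names
if (!params_allowed($_GET, $whitelist)) {
    header('HTTP/1.1 404 Not Found');
    include '404.php';   // your existing error page
    exit;
}
```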

tedster

2:52 am on Feb 4, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If your URL schema exposes the query string, then I'd say returning a 404 for invalid parameters makes a lot of sense.

g1smd

7:39 am on Feb 4, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Yes, return 404 for invalid values and invalid combinations of parameters, etc.

However, the sheer complexity of doing this is the reason I swapped to extensionless rewritten URLs some years ago. It made the whole process of blocking invalid URLs quite a bit easier.
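For readers unfamiliar with that setup, it usually looks something like the following .htaccess sketch (the index.php router and the "page" parameter are illustrative, not a prescription): every extensionless URL funnels through one script, which can then 404 anything it doesn't recognise.

```apache
Options +FollowSymLinks
RewriteEngine On

# Leave real files and directories alone.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d

# Everything else goes to a single router script,
# which validates the path and 404s unknown URLs.
RewriteRule ^(.*)$ /index.php?page=$1 [L,QSA]
```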

enigma1

10:18 am on Feb 4, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It depends on the request. Can you give an example of the requests? Is it something like:

http://example.com/blue-widgets.html?color=red
where color is the part unknown to you? What are these random parameters? If they aren't site-specific, please let us know.

or something like
http://example.com/widgets
http://example.com/blue-widgets
http://example.com/blue-widgets.php
etc...
but you have a url of
http://example.com/big-blue-widgets.html

or something else?

In the first case I return a 200 OK for the document, as I don't process a parameter called "color"; in general I don't process unknown parameters, but I do have the document. In the second case I will try to find the most relevant document and do a 301 to it. There is no telling where the spider found the link, whether it was a real anchor or something that merely looked like a link, mangled with some text or broken HTML on another domain.
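In code terms, that first case amounts to only ever reading the parameters you expect. A PHP sketch (parameter names are made up for illustration):

```php
<?php
// Sketch of the "ignore what you don't know" approach: the page reads
// only the parameters it expects, so an unknown ?color=red has no
// effect on the output and the document is served 200 OK as usual.
function read_params(array $get) {
    $params = array('page' => 1, 'sort' => 'name');  // known names + defaults
    foreach ($params as $name => $default) {
        if (isset($get[$name])) {
            $params[$name] = $get[$name];
        }
    }
    return $params;   // anything else in $get is simply never consulted
}
```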

By the way, it will be a problem if the robot parsed your domain's pages and found, or thought it found, those links. I know for a fact GWT won't show all site errors (it's been tested), even though the pages have been indexed for a long time.

Sgt_Kickaxe

1:31 pm on Feb 4, 2012 (gmt 0)



An example would be more like

example.com/?stuff=things

I am contemplating a 301 to a much simpler example.com/things via htaccess, but I remember it being a pain to get right when the ? comes right after the .com/ in a shared hosting environment.
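If memory serves, the pain is that RewriteRule never sees the query string; you have to match it in a RewriteCond and discard it with a trailing "?" on the substitution. A sketch (the "stuff" parameter name is just from the example above):

```apache
RewriteEngine On

# Match /?stuff=things at the domain root only;
# %1 captures the parameter's value ("things").
RewriteCond %{QUERY_STRING} ^stuff=([^&]+)$

# ^$ matches the empty path after "example.com/";
# the trailing "?" drops the old query string from the redirect.
RewriteRule ^$ /%1? [R=301,L]
```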

What to do... :) thanks for the opinions.

g1smd

7:42 pm on Feb 4, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



In your RewriteRule regex, test for ^(index\.php)?$ to match "/index.php" OR "/".
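In full, that could look something like this (a sketch; the "stuff" parameter is taken from the earlier example):

```apache
RewriteEngine On

# Capture the parameter value from the query string.
RewriteCond %{QUERY_STRING} ^stuff=([^&]+)$

# ^(index\.php)?$ matches both "/" and "/index.php", so one rule
# covers either form; the trailing "?" strips the query string.
RewriteRule ^(index\.php)?$ /%1? [R=301,L]
```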