
Apache Web Server Forum

    
Redirect dynamic url starting with ?
IngoZ



 
Msg#: 4440079 posted 4:24 pm on Apr 12, 2012 (gmt 0)

I want to redirect a few URLs to a 404 Not Found page but don't know how. These dynamic URLs were indexed by Google. I tried to remove them in Google Webmaster Tools, but it isn't possible: the URLs redirect to the homepage, and when I create a removal request it appears as a "site removal", not a "page removal". All I want is to eliminate these pages.

/?start=50

/?start=100

/?ref=akagunduz.com

/?refsite=www.n1ads.com&ref=alexa-traffic-rank

 

enigma1

Msg#: 4440079 posted 5:23 pm on Apr 12, 2012 (gmt 0)

First, you shouldn't redirect from a URL on your own domain to a 404 page. You can return a 404 straight away, although in this case you don't even need to do that. Just make sure these incorrect links aren't exposed anywhere on your domain; otherwise the errors you see won't go away.

lucy24

Msg#: 4440079 posted 10:09 pm on Apr 12, 2012 (gmt 0)

You want less code, not more. Take away the line that redirects bogus requests to the home page.

When you say "tried to remove", are you talking about the URL Removal area or the "ignore parameters" area? Here you need the parameters area. That function is for parameters you no longer use, and for parameters that don't affect the content of the page.

IngoZ



 
Msg#: 4440079 posted 10:27 pm on Apr 12, 2012 (gmt 0)

I used the URL removal option. The pages aren't really redirecting; they display the content of the homepage, and I've lost positions for most keywords. I removed all the duplicate pages from the SERPs except the ones above, and I also used rel canonical.
At your suggestion, I added "?" and "=" as parameters.

incrediBILL

Msg#: 4440079 posted 12:34 am on Apr 13, 2012 (gmt 0)

Um, I'm not sure how that's going to work: "?" and "=" aren't parameters; "start", "ref" and "refsite" are the actual parameters.

Another simple method is to put code into your pages so that, when parameters Googlebot should ignore are passed, the page includes a meta robots NOINDEX tag in its head section.
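A server-level alternative to editing every page template is the X-Robots-Tag HTTP header, which Google honours like a meta robots tag. This is a sketch, not a drop-in fix: it assumes Apache 2.4 (needed for `<If>`) with mod_headers enabled, and that only the start, ref and refsite parameters from the first post should trigger noindex. Google must be able to fetch a URL to see the header, so this only works if the same URLs are not also disallowed in robots.txt.

```apache
# Sketch for Apache 2.4 + mod_headers: add a noindex header to any response
# whose query string contains start=, ref= or refsite= (assumed parameter
# names, taken from the URLs in the first post), instead of a
# <meta name="robots"> tag in the page itself.
<If "%{QUERY_STRING} =~ /(^|&)(start|ref|refsite)=/">
    Header set X-Robots-Tag "noindex"
</If>
```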

Also, did you add those URLs into robots.txt?

Once you make a crawling mess, getting rid of it can create just as big a mess, if not a bigger one.

BTW, use methods that remove stuff from ALL search engines; otherwise it'll still show up in Bing, Yahoo, etc., and ultimately end up scraped somewhere and right back in Google all over again.

IngoZ



 
Msg#: 4440079 posted 8:56 am on Apr 13, 2012 (gmt 0)

I have added this to my robots.txt:

User-agent: *
Disallow: /*=
Disallow: /*?
Disallow: /*&

IngoZ



 
Msg#: 4440079 posted 7:55 pm on Apr 13, 2012 (gmt 0)

I still think I should use something in .htaccess to block these URLs; it would be faster.

For another site I have a URL indexed like this:
website.tld/?refsite=www.n1ads.com
It also displays the content of the main page, and I can't remove it.

g1smd

Msg#: 4440079 posted 2:15 am on Apr 14, 2012 (gmt 0)

Send 404 or even 410:

RewriteCond %{QUERY_STRING} (^|&)something=value(&|$)
RewriteRule ^somepath - [G]
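A filled-in sketch of this suggestion, with the query-string test written as a RewriteCond (a RewriteRule pattern only sees the path, never the query string). It assumes the rules sit in an .htaccess file in the document root, where the homepage path is matched as an empty string, and it uses ref=akagunduz.com from the first post as the parameter and value. The `(^|&)` and `(&|$)` anchors let it match whether other parameters come before or after.

```apache
RewriteEngine On
# Answer "410 Gone" for the homepage when the query string contains
# ref=akagunduz.com, whether it is the first parameter or a later one.
RewriteCond %{QUERY_STRING} (^|&)ref=akagunduz\.com(&|$)
# "-" means no substitution; [G] sends 410 and implies [L].
RewriteRule ^$ - [G]
```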

IngoZ



 
Msg#: 4440079 posted 12:29 pm on Apr 14, 2012 (gmt 0)

I used something like this and it works:

RewriteEngine On
RewriteCond %{QUERY_STRING} ^ref=(.*)
RewriteRule ^.* /404.php%1? [NE,R=permanent]

g1smd

Msg#: 4440079 posted 1:41 pm on Apr 14, 2012 (gmt 0)

Your condition will match only when ref is the first parameter. My example allowed for there to be preceding parameters and still match.

The (.*) capture will grab the value of the ref parameter plus all the remaining parameter names and values in the query string. My code captured only the first value.

The condition will now be checked for all requests: pages, images, stylesheets, js files. You should limit what is checked.

You're now sending status "301 Moved" in response to those requests. That is a problem. You should send 404.

The rule needs the L flag.
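Putting those points together, the rule might be sketched as below. Assumptions: the rules live in the document-root .htaccess, the "pages" are the homepage and .php files (adjust the pattern to your own URLs), and start, ref and refsite, the names from the first post, are never used legitimately on the site. The [G] flag sends 410 Gone and implies L, so no separate L flag is needed.

```apache
RewriteEngine On
# mod_rewrite tests the RewriteRule pattern before its RewriteCond lines,
# so limiting the pattern to "" (the homepage) and *.php files keeps
# images, stylesheets and JS files from being checked at all.
# The condition matches the parameter name whether it comes first or
# follows another parameter.
RewriteCond %{QUERY_STRING} (^|&)(start|ref|refsite)=
# "-" leaves the URL unchanged; [G] sends "410 Gone" (not a 301) and implies [L].
RewriteRule ^(.*\.php)?$ - [G]
```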

lucy24

Msg#: 4440079 posted 5:16 pm on Apr 14, 2012 (gmt 0)

Quoting g1smd: You're now sending status "301 Moved" in response to those requests. That is a problem. You should send 404.

:: detour to mod_rewrite docs, which I really ought to have memorized by now ::

Is there a mod_rewrite flag that says 404? I've only ever found a 410 [G].
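For reference, the Apache 2.4 mod_rewrite documentation says the R flag accepts any valid HTTP status code, and that for a code outside the 300-399 range the substitution is dropped and rewriting stops as if L were used. So, assuming Apache 2.4 and the same document-root .htaccess setup, a plain 404 could be sketched as:

```apache
RewriteEngine On
# With a non-redirect status, R returns that status instead of redirecting;
# the "-" substitution is ignored and processing stops as if [L] were set.
RewriteCond %{QUERY_STRING} (^|&)(start|ref|refsite)=
RewriteRule ^$ - [R=404]
```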

I don't think we ever nailed down the original question: did these queries formerly exist, or are they purely the product of Google's fevered imagination? Does the site use query strings at all?


I just checked something I should have checked ages ago on my own (100% static) site. I was distressed to discover that if I make up a completely random query and tack it onto the name of a completely random HTML page, the query is simply ignored. Is this a problem?

g1smd

Msg#: 4440079 posted 6:55 pm on Apr 14, 2012 (gmt 0)

Yes, it is.

It's a potential source of infinite duplicate content. However, search engines should be quite good at spotting this problem. With no dynamic content on the page, all URL versions should be byte-for-byte identical.

If you use no query strings at all for anything, then such requests can all be either redirected or blocked.
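For a fully static site, a minimal sketch of the redirect option might look like this. It assumes Apache with the rules in the document-root .htaccess, and www.example.com is a placeholder for your own canonical hostname. The trailing ? in the substitution drops the query string (Apache 2.4 also has a QSD flag for this). To block instead of redirect, the rule could be `RewriteRule ^ - [G]`.

```apache
RewriteEngine On
# On a site that uses no query strings, any non-empty query string is bogus:
# 301-redirect to the same path with the query string removed.
RewriteCond %{QUERY_STRING} .
# www.example.com is a placeholder for the site's canonical hostname.
RewriteRule ^(.*)$ https://www.example.com/$1? [R=301,L]
```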

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved