

410 an url in htaccess

     
10:05 pm on Feb 1, 2011 (gmt 0)

10+ Year Member



I would like to return a 410 for a URL like the one below, using .htaccess. I have tried a few things but nothing seems to work.

The URL looks like this:

http://www.example.com/?TB_iframe=true&height=505&width=1200

Any help is appreciated.
11:23 pm on Feb 1, 2011 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Test the QUERY_STRING value with a RewriteCond.

The RewriteRule will need the [G] flag to send "410 Gone".
12:12 am on Feb 2, 2011 (gmt 0)

10+ Year Member



Is it possible to 410 Gone any url with a question mark in it?
12:49 am on Feb 2, 2011 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Yes it is.

However, do you mean "any and every URL with a question mark even when there is nothing after the question mark", or "any and every URL with attached parameters" or what?
1:41 am on Feb 2, 2011 (gmt 0)

10+ Year Member



"any and every URL with a question mark even when there is nothing after the question mark" is what I mean

What would I use for this?
9:50 am on Feb 2, 2011 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Since the question mark is a separator between the path part of the URL and the attached parameters, you will need to instead test THE_REQUEST and look for a question mark within.

In the code, the question mark will need to be escaped, thus \?, and it might also be worth looking for the percent-encoded form, %3F (the question mark is ASCII 0x3F).

This is a topic that has been discussed several times before, and I remember that some of those threads contain a lot of example code.
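
As a rough sketch of the approach described above (not taken from those earlier threads): the condition below looks for a literal question mark, or its percent-encoded form, anywhere in the URL portion of THE_REQUEST. It assumes the code sits in the site's root .htaccess, that mod_rewrite is enabled, and that every URL containing a question mark really should be gone.

```apache
# Assumes mod_rewrite is available; omit this line if the engine is already on.
RewriteEngine On

# THE_REQUEST holds the raw request line, e.g. "GET /page?x=1 HTTP/1.1".
# Match the method, a space, then any non-space characters up to a literal "?"
# or its percent-encoded form (%3F or %3f).
RewriteCond %{THE_REQUEST} ^[A-Z]+\ [^\ ]*(\?|%3[Ff])
# "-" leaves the URL unchanged; [G] sends "410 Gone".
RewriteRule ^ - [G]
```

Because the rule pattern is just ^, this applies to every path on the site, so it is only suitable if no legitimate URL carries a query string.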
9:32 pm on Feb 15, 2011 (gmt 0)

10+ Year Member



Well I was lucky enough that someone responded to my problem at another forum and it fixed my problem.

Here is what worked.

RewriteCond %{QUERY_STRING} ^TB_iframe=true&height=505&width=1200$
RewriteRule ^.*$ - [G]
9:48 pm on Feb 15, 2011 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



The .* pattern will mean that every request for any page, any image, any stylesheet, even requests for robots.txt, will be further tested to see if this query string is present.
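
One way to narrow this, sketched under the assumption that the unwanted query string only ever appears on the site root URL and that the code lives in the root .htaccess: replace .* with ^$ so that only a request for "/" goes on to the query string test.

```apache
# The RewriteRule pattern is checked first; the condition is evaluated only
# when it matches. In a root .htaccess, ^$ matches a request for "/" only.
RewriteCond %{QUERY_STRING} ^TB_iframe=true&height=505&width=1200$
RewriteRule ^$ - [G]
```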
12:49 am on Feb 18, 2011 (gmt 0)

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Also, this only generates a 410 response if the query string is exactly "TB_iframe=true&height=505&width=1200".

You might want

RewriteCond %{THE_REQUEST} ^[A-Z]+\ /[^?\ ]*\?
RewriteRule ^$ - [G]

This returns a 410 for any request for the "site root URL" with *any* query appended.

Everything depends on *exactly* which URLs you want to return a 410, and which ones you don't.

Jim

[edited by: jdMorgan at 10:26 pm (utc) on Mar 17, 2011]

12:58 am on Mar 12, 2011 (gmt 0)

WebmasterWorld Senior Member crobb305 is a WebmasterWorld Top Contributor of All Time 10+ Year Member



I need some help removing some redirect URLs with old attached parameters from Google, and encouraging Google to stop crawling nonexistent URLs (and hopefully my redirect file altogether).

I use a php redirect script to house my affiliate links. Over the years, some parameters have come and gone, yet Google continues to index them (despite denial in robots.txt) and Google keeps trying to crawl them (despite being served 404). In my WMT account, I see dozens of these old urls, in the format http://www.example.com/go/go.php?url=someoldparameter

At the current time, my go.php file contains only about 5 valid affiliate links, so there is no need for Google to keep trying to crawl old ones that were deleted a year ago.

A) Should I serve a 410 for each parameter?
B) If I serve a 410, I assume I would list these one by one in htaccess?
C) Can I add a "noindex" in the go.php file to stop all future crawling/indexing of these redirects since my denial in robots.txt has proven insufficient?

Thanks in advance for your help
C
10:31 pm on Mar 17, 2011 (gmt 0)

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Remove the Disallow from robots.txt. If you Disallow a URL in robots.txt, then you cannot expect to reliably redirect that URL because a robots.txt-compliant client will never request that URL.

Consider modifying your script to look up the "affiliate links" in a database and determine if they are currently valid. If not, redirect to remove the affiliate ID (if that's what you're getting at here).

I'd certainly consider redirection over a 410 or 404 -- You don't want to be throwing away the credit for the inbound links, do you? (I'm asking)

Jim
10:06 pm on Mar 20, 2011 (gmt 0)

WebmasterWorld Senior Member crobb305 is a WebmasterWorld Top Contributor of All Time 10+ Year Member



"I'd certainly consider redirection over a 410 or 404 -- You don't want to be throwing away the credit for the inbound links, do you? (I'm asking)"


None of the dead affiliate links (links that no longer exist in the redirect file) have any IBLs. So I am not worried about losing any link juice. I just think Googlebot needs to know they are 410, and I am curious to know how to 410 them based on parameter. They all have the link structure http://www.example.com/go/go.php?url=someoldparameter. Is your recommended code above valid in this case?

I will also follow your advice and remove the robots.txt restriction. In fact, I am sitting here thinking that it *may* have had an impact on my quality score (since Panda), given that Googlebot keeps encountering 28 restricted URLs (affiliate links), and 22 of them are dead. I may have a really inflated ad-to-content ratio just based on what it "thinks" the number of active affiliate links on my site is (pure speculation, but trying to get out of Panda Prison).

Nevertheless, I do want to 410 those dead parameters.
12:38 am on Mar 21, 2011 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Something like this...

RewriteCond %{QUERY_STRING} url=(something|otherthing|thisthing|thatthing)
RewriteRule go/go\.php - [G]
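
A slightly stricter variant of the same idea, offered as a sketch: the unanchored patterns above would also match longer values such as url=something2, or a path like /old/go/go.php. Assuming the code is in the root .htaccess and that "something" and "otherthing" stand in for the real dead parameter values, anchoring both patterns limits the 410 to exact matches.

```apache
# Match url=<value> only as a complete parameter: at the start of the query
# string or after "&", and ending at "&" or the end of the query string.
RewriteCond %{QUERY_STRING} (^|&)url=(something|otherthing)(&|$)
# Anchored path, relative to the directory holding this .htaccess
# (assumed here to be the site root).
RewriteRule ^go/go\.php$ - [G]
```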
12:58 am on Mar 21, 2011 (gmt 0)

WebmasterWorld Senior Member crobb305 is a WebmasterWorld Top Contributor of All Time 10+ Year Member



g1smd, thanks for your help.

I was about to test something similar, but your way is more efficient (multiple parameters handled in the same rule).
1:11 am on Mar 21, 2011 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



A "local OR" parses much faster for code placed in a .htaccess file.
1:24 am on Mar 21, 2011 (gmt 0)

WebmasterWorld Senior Member crobb305 is a WebmasterWorld Top Contributor of All Time 10+ Year Member



g1smd,

I'm not sure I follow. Do I need to write the rule differently, to include an OR? I was just going to use what you had specified above, with about 15 parameters, in the format you suggested.
1:43 am on Mar 21, 2011 (gmt 0)

WebmasterWorld Senior Member crobb305 is a WebmasterWorld Top Contributor of All Time 10+ Year Member



It works. The old urls return 410, I submitted them for removal, and I removed the deny in robots.txt. Thanks for your help :)
7:46 am on Mar 21, 2011 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



The (this|that) construct is called a "local OR" and it parses faster than the usual [OR] construct:

RewriteCond %{...} ... [OR]
RewriteCond %{...} ...
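
Spelled out with the placeholder values from the go.php example earlier in the thread, the two forms would look roughly like this. Either one on its own should be enough; they are shown together only for comparison.

```apache
# Form 1: separate conditions chained with the [OR] flag.
RewriteCond %{QUERY_STRING} url=something [OR]
RewriteCond %{QUERY_STRING} url=otherthing
RewriteRule go/go\.php - [G]

# Form 2: a "local OR", where the alternation sits inside a single pattern.
RewriteCond %{QUERY_STRING} url=(something|otherthing)
RewriteRule go/go\.php - [G]
```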
 
