Home / Forums Index / Code, Content, and Presentation / Apache Web Server
Forum Library, Charter, Moderators: Ocean10000 & incrediBILL & phranque

Apache Web Server Forum

    
410 an url in htaccess
troyid

10+ Year Member



 
Msg#: 4261430 posted 10:05 pm on Feb 1, 2011 (gmt 0)

I would like to return a 410 for a URL like the one below using .htaccess. I have tried a few things but nothing seems to work.

The URL looks like this:

http://www.example.com/?TB_iframe=true&height=505&width=1200

Any help is appreciated.

 

g1smd

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 4261430 posted 11:23 pm on Feb 1, 2011 (gmt 0)

Test the QUERY_STRING value with a RewriteCond.

The RewriteRule will need the [G] flag to send "410 Gone".

troyid

10+ Year Member



 
Msg#: 4261430 posted 12:12 am on Feb 2, 2011 (gmt 0)

Is it possible to return 410 Gone for any URL with a question mark in it?

g1smd

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 4261430 posted 12:49 am on Feb 2, 2011 (gmt 0)

Yes it is.

However, do you mean "any and every URL with a question mark even when there is nothing after the question mark", or "any and every URL with attached parameters" or what?

troyid

10+ Year Member



 
Msg#: 4261430 posted 1:41 am on Feb 2, 2011 (gmt 0)

"any and every URL with a question mark even when there is nothing after the question mark" is what I mean.

What would I use for this?

g1smd

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 4261430 posted 9:50 am on Feb 2, 2011 (gmt 0)

Since the question mark is a separator between the path part of the URL and the attached parameters, it is not part of either the path that RewriteRule matches or the QUERY_STRING value. You will need to test THE_REQUEST instead and look for a question mark within it.

In the code, the question mark will need to be escaped, thus \?, and it might be worth also looking for the percent-encoded form, %3F (3F is the hexadecimal ASCII code for the question mark).
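
As a rough illustration of that idea, here is a minimal sketch. It assumes the code lives in the .htaccess file in the document root and that mod_rewrite is enabled. It also assumes you really do want every request containing a question mark to fail, including any legitimate URLs that carry parameters.

```apache
RewriteEngine On

# THE_REQUEST is the raw request line, for example:
#   GET /page.html?x=1 HTTP/1.1
# Look for a literal "?" or its percent-encoded form "%3F" anywhere in the
# requested URL (the part between the method and the protocol).
# [NC] lets the pattern match "%3f" as well as "%3F".
RewriteCond %{THE_REQUEST} ^[A-Z]+\ [^\ ]*(\?|%3F) [NC]
# "^" matches every request path, "-" leaves the URL untouched,
# and [G] sends "410 Gone".
RewriteRule ^ - [G]
```

Because the condition matches on the question mark itself, a request such as /page.html? with nothing after the question mark is caught as well.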

This is a topic that has been discussed several times before, and I remember that some of those threads contain a lot of example code.

troyid

10+ Year Member



 
Msg#: 4261430 posted 9:32 pm on Feb 15, 2011 (gmt 0)

Well I was lucky enough that someone responded to my problem at another forum and it fixed my problem.

Here is what worked.

RewriteCond %{QUERY_STRING} ^TB_iframe=true&height=505&width=1200$
RewriteRule ^.*$ - [G]

g1smd

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 4261430 posted 9:48 pm on Feb 15, 2011 (gmt 0)

The .* pattern will mean that every request for any page, any image, any stylesheet, even requests for robots.txt, will be further tested to see if this query string is present.
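
One possible way to narrow this, sketched under the assumption that the unwanted URL is only ever requested at the site root (as in the example URL at the top of the thread) and that the .htaccess file sits in the document root:

```apache
RewriteEngine On

# Same exact-match condition as the rule posted above.
RewriteCond %{QUERY_STRING} ^TB_iframe=true&height=505&width=1200$
# "^$" matches only the empty per-directory path, i.e. a request for "/".
# mod_rewrite checks the rule pattern before its conditions, so requests for
# pages, images, stylesheets, and robots.txt never reach the RewriteCond.
RewriteRule ^$ - [G]
```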

jdMorgan

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 4261430 posted 12:49 am on Feb 18, 2011 (gmt 0)

Also, this only generates a 410 response if the query string is exactly "TB_iframe=true&height=505&width=1200".

You might want

RewriteCond %{THE_REQUEST} ^[A-Z]+\ /[^?\ ]*\?
RewriteRule ^$ - [G]

This returns a 410 for any request for the "site root URL" with *any* query appended.

Everything depends on *exactly* which URLs you want to return a 410, and which ones you don't.
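
To show one point on that spectrum, the sketch below sits between the exact-match rule and the any-query rule. It assumes, purely for illustration, that every root request carrying a TB_iframe parameter should be gone, regardless of the height and width values, and that the .htaccess file is in the document root.

```apache
RewriteEngine On

# Match a TB_iframe parameter at the start of the query string or after "&",
# whatever its value and whatever other parameters accompany it.
RewriteCond %{QUERY_STRING} (^|&)TB_iframe=
# Only the site root ("/") is affected; other paths are left alone.
RewriteRule ^$ - [G]
```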

Jim

[edited by: jdMorgan at 10:26 pm (utc) on Mar 17, 2011]

crobb305

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 4261430 posted 12:58 am on Mar 12, 2011 (gmt 0)

I need some help removing some redirect URLs with old attached parameters from Google, and encouraging Google to stop crawling nonexistent URLs (and hopefully my redirect file altogether).

I use a php redirect script to house my affiliate links. Over the years, some parameters have come and gone, yet Google continues to index them (despite denial in robots.txt) and Google keeps trying to crawl them (despite being served 404). In my WMT account, I see dozens of these old urls, in the format http://www.example.com/go/go.php?url=someoldparameter

At the current time, my go.php file contains only about 5 valid affiliate links, so there is no need for Google to keep trying to crawl old ones that were deleted a year ago.

A) Should I serve a 410 for each parameter?
B) If I serve a 410, I assume I would list these one by one in htaccess?
C) Can I add a "noindex" in the go.php file to stop all future crawling/indexing of these redirects since my denial in robots.txt has proven insufficient?

Thanks in advance for your help
C

jdMorgan

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 4261430 posted 10:31 pm on Mar 17, 2011 (gmt 0)

Remove the Disallow from robots.txt. If you Disallow a URL in robots.txt, then you cannot expect to reliably redirect that URL because a robots.txt-compliant client will never request that URL.

Consider modifying your script to look up the "affiliate links" in a database and determine if they are currently valid. If not, redirect to remove the affiliate ID (if that's what you're getting at here).

I'd certainly consider redirection over a 410 or 404 -- You don't want to be throwing away the credit for the inbound links, do you? (I'm asking)

Jim

crobb305

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 4261430 posted 10:06 pm on Mar 20, 2011 (gmt 0)

jdMorgan wrote: "I'd certainly consider redirection over a 410 or 404 -- You don't want to be throwing away the credit for the inbound links, do you? (I'm asking)"


None of the dead affiliate links (links that no longer exist in the redirect file) have any IBLs. So I am not worried about losing any link juice. I just think Googlebot needs to know they are 410, and I am curious to know how to 410 them based on parameter. They all have the link structure http://www.example.com/go/go.php?url=someoldparameter. Is your recommended code above valid in this case?

I will also follow your advice, and remove the robots.txt restriction. In fact, I am sitting here thinking that it *may* have had an impact on my quality score (since Panda), given that Googlebot keeps encountering 28 restricted urls (affiliate links), and 22 of them are dead. I may have a really inflated ad-to-content ratio just based on what it "thinks" the number of active affiliate links on my site is (pure speculation, but trying to get out of Panda Prison).

Nevertheless, I do want to 410 those dead parameters.

g1smd

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 4261430 posted 12:38 am on Mar 21, 2011 (gmt 0)

Something like this...

RewriteCond %{QUERY_STRING} url=(something|otherthing|thisthing|thatthing)
RewriteRule go/go\.php - [G]
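
A slightly stricter variant of the same idea might look like the sketch below. The anchors are an addition, not part of the suggestion above: they keep the rule from firing on a longer value such as url=somethingelse or on a parameter such as myurl=something. It assumes the .htaccess file is in the document root, and the parameter values are placeholders.

```apache
RewriteEngine On

# "url" must start the query string or follow "&", and its value must end
# at "&" or at the end of the query string.
RewriteCond %{QUERY_STRING} (^|&)url=(something|otherthing|thisthing|thatthing)(&|$)
# Anchor the path so only /go/go.php itself is matched.
RewriteRule ^go/go\.php$ - [G]
```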

crobb305

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 4261430 posted 12:58 am on Mar 21, 2011 (gmt 0)

g1smd, thanks for your help.

I was about to test something similar, but your way is more efficient (multiple parameters handled in the same rule).

g1smd

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 4261430 posted 1:11 am on Mar 21, 2011 (gmt 0)

A "local OR" parses much faster for code placed in a .htaccess file.

crobb305

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 4261430 posted 1:24 am on Mar 21, 2011 (gmt 0)

g1smd,

I'm not sure I follow. Do I need to write the rule differently, to include an OR? I was just going to use what you had specified above, with about 15 parameters, in the format you suggested.

crobb305

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 4261430 posted 1:43 am on Mar 21, 2011 (gmt 0)

It works. The old urls return 410, I submitted them for removal, and I removed the deny in robots.txt. Thanks for your help :)

g1smd

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 4261430 posted 7:46 am on Mar 21, 2011 (gmt 0)

The (this|that) construct is called a "local OR" and it parses faster than the usual [OR] construct:

RewriteCond %{...} ... [OR]
RewriteCond %{...} ...
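
For comparison, here are the two forms side by side, using the placeholder parameter values from earlier in the thread. They are alternatives that should match the same requests, so only one of them would normally go in the file.

```apache
# Form 1: one RewriteCond per value, chained with the [OR] flag.
RewriteCond %{QUERY_STRING} url=something [OR]
RewriteCond %{QUERY_STRING} url=otherthing
RewriteRule go/go\.php - [G]

# Form 2: a single RewriteCond using alternation inside the pattern
# (the "local OR").
RewriteCond %{QUERY_STRING} url=(something|otherthing)
RewriteRule go/go\.php - [G]
```

With about 15 values, the second form needs one condition line instead of 15.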

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved