Moderators: Ocean10000 & incrediBILL & phranque

Apache Web Server Forum

    
.htaccess confusion
Battling to sort out my Sitemap...
Mikroz




msg:4588567
 7:06 pm on Jun 28, 2013 (gmt 0)

Hi everyone,

Although I enjoy dabbling with .htaccess files, I'll readily and quickly admit to them defeating me more often than not. Now is one of those times...

When you visit my website, the .htaccess in the root changes your destination to my blog, located at blog.example.com.

I have an .htaccess in the root and a robots.txt in the root. There is another one of each in the 'blog' folder.

ROOT .htaccess (applicable lines)

RewriteCond $1 !^(sitemap\.xml|sitemap\.xml\.gz|robots\.txt)$
# All other requests to...
RewriteRule ^(.*)$ http://blog.example.com/$1 [R=301,L]

# Sitemap redirection
Redirect 301 /sitemap.xml http://blog.example.com/sitemap.xml
Redirect 301 /sitemap.xml.gz http://blog.example.com/sitemap.xml.gz

ROOT robots.txt (applicable line)

#Sitemap
Sitemap: http://blog.example.com/sitemap.xml.gz

==============================

BLOG .htaccess (applicable lines)

# Sitemap redirection
Redirect 301 /sitemap.xml http://blog.example.com/sitemap.xml
Redirect 301 /sitemap.xml.gz http://blog.example.com/sitemap.xml.gz

BLOG robots.txt (applicable line)

# Sitemap
Sitemap: http://blog.example.com/sitemap.xml.gz


Can one of you brainiacs kindly help me decipher why I [or Google] cannot access my Sitemap, please?

[edited by: phranque at 7:12 pm (utc) on Jun 28, 2013]
[edit reason] unlinked & exemplified urls [/edit]

 

phranque




msg:4588578
 7:31 pm on Jun 28, 2013 (gmt 0)

what response are you getting?

you shouldn't mix mod_alias and mod_rewrite directives within the same configuration.

http://httpd.apache.org/docs/2.2/rewrite/avoid.html [httpd.apache.org]:
when there are Redirect and RewriteRule directives in the same scope, the RewriteRule directives will run first, regardless of the order of appearance in the configuration file.
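
For illustration, here is a minimal sketch of a root .htaccess that does the whole job with mod_rewrite and no Redirect lines. The HTTP_HOST condition is an assumption (it only matters if both hostnames are served from the same directory tree), and the sketch assumes the sitemap files should follow everything else to the blog hostname.

```apache
# Root .htaccess sketch: mod_rewrite only, no mod_alias Redirect lines.
RewriteEngine On

# Assumption: skip requests that already arrived on the blog hostname,
# so the rule below can never redirect the blog to itself.
RewriteCond %{HTTP_HOST} !^blog\.example\.com(:[0-9]+)?$ [NC]
# Leave the root robots.txt alone so it is served from the root as-is.
RewriteCond %{REQUEST_URI} !^/robots\.txt$
# Everything else, sitemap files included, goes to the same path on the blog.
RewriteRule ^(.*)$ http://blog.example.com/$1 [R=301,L]
```

With the sitemap exclusions dropped from the RewriteCond, the catch-all rule already sends /sitemap.xml and /sitemap.xml.gz to the same path on blog.example.com, so no separate sitemap lines are needed. Separately, a Redirect whose target is the very URL it matches, like the lines in the blog folder, would send a client back to the same address over and over.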

Mikroz




msg:4588585
 7:47 pm on Jun 28, 2013 (gmt 0)

Hi phranque,

Firefox gives me this:

Firefox has detected that the server is redirecting the request for this address in a way that will never complete.

You've lost me on the alias vs. rewrite bit... Isn't what I have a rewrite directive?

EDIT: I did read the link! Promise. :D

Thanks for the reply.

lucy24




msg:4588592
 8:01 pm on Jun 28, 2013 (gmt 0)

Why are you redirecting the sitemap? The chances that someone will have an old bookmark for the sitemap are vanishingly small. Just put its current actual URL in robots.txt. Or put the sitemap in the root; search engines will look for it there even if robots.txt doesn't say anything about it.

No reason to muck about with explicit .gz either. Let the server take care of compression if it feels so inclined.
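
A minimal sketch of leaving compression to the server, assuming mod_deflate is available and that the sitemap is served with an XML content type (both are assumptions about this particular setup):

```apache
<IfModule mod_deflate.c>
    # Compress XML responses on the fly for clients that accept gzip.
    AddOutputFilterByType DEFLATE application/xml text/xml
</IfModule>
```

With something like this in place, robots.txt can list the plain sitemap.xml URL and no separate .gz entry is needed.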

The main problem is that your question mixes up two different things. One is the URL; the other is physical location. Your blog may live in a directory within a directory within a directory, but the only thing visitors-- including the googlebot-- need to know is that its URL is blog.example.com.

If you have a subdomain, you need one robots.txt at
www.example.com/robots.txt
and another at
blog.example.com/robots.txt
regardless of where they physically live. If the two robots.txt happen to be identical, you can rewrite -- NOT redirect -- requests for one so they point to the other. Not even the googlebot knows when it has been rewritten.
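
As a rough sketch of the rewrite-not-redirect idea, assuming the blog's files live in a folder named blog directly under the root site's document root (the /blog/ path is an assumption, not something confirmed above):

```apache
RewriteEngine On
# Internal rewrite: no R flag and no hostname in the target, so the client
# still sees /robots.txt and never learns that another file was served.
RewriteRule ^robots\.txt$ /blog/robots.txt [L]
```

A rule like this would have to sit above any catch-all redirect in the same .htaccess, otherwise the catch-all gets to the request first.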

Same goes for sitemaps. As far as a visitor is concerned,
blog.example.com
is the root.

Mikroz




msg:4588596
 8:09 pm on Jun 28, 2013 (gmt 0)

Thanks lucy. That makes a lot of sense to me.

My worry was that when Google came looking for the sitemap, at website.com, it wouldn't find it, so I was trying to tell it [Google] to look at blog.website.com for it. :)

The blog is powered by WordPress, which uses a plugin to generate, optionally compress and list the sitemap, so it sits at the blog.website.com URL, not in the root.

The root and the blog subdomain run separate robots files.

I will remove the .htaccess entries and see what happens.

Mikroz




msg:4588600
 8:17 pm on Jun 28, 2013 (gmt 0)

It's working! :D

Thank you. Consider me well schooled.
