Home / Forums Index / Google / Google SEO News and Discussion
Forum Library, Charter, Moderators: Robert Charlton & aakk9999 & brotherhood of lan & goodroi

Google SEO News and Discussion Forum

Crawling error 404 on Live Pages

Msg#: 4448406 posted 12:19 pm on May 2, 2012 (gmt 0)

Hello. Although I have had this problem for several months, only now have I had time to deal with it. I had also hoped that Google would resolve the issue on its own, but unfortunately it hasn't. I have spent the last few days giving this my full attention, reading and informing myself, but I have not found a real solution, so I really need your help.

I have two subdomains with different content; they are adult sites, so I will not post the real links in public. In Google Webmaster Tools, all pages except the main page of each subdomain show a 404 Not Found error in the crawl stats, and none of them are indexed in Google (again except the main page of each subdomain). All the pages are live, and Fetch as Google can even read the page content, but it still reports a 404 Not Found HTTP response.
I checked with my hosting company and they confirm everything is fine on their end, nothing to report. Please see below an example of one Fetch as Google result.

A possible hint: my root domain redirects from non-www to www; the subdomains have no redirects of any kind, and www.subdomain is not even set up in a host record.
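For reference, a non-www to www redirect of the kind described is usually a directive pair like the following in the root domain's .htaccess (a hedged sketch; yyyyyy.com is a placeholder, since the real domain is not published in this thread):

```apache
# Sketch of a canonical non-www to www 301 redirect in .htaccess.
# yyyyyy.com stands in for the real (unpublished) domain.
RewriteEngine On
RewriteCond %{HTTP_HOST} ^yyyyyy\.com$ [NC]
RewriteRule ^(.*)$ http://www.yyyyyy.com/$1 [R=301,L]
```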

This is an old domain with quite a good reputation and a lot of work put in over the years. The fact that pages are returning 404 errors, not to mention that hundreds of pages with unique content are not being indexed, is really hurting our income.

Any suggestions please... Thanks, Andy

Fetch as Googlebot
This is how Googlebot fetched the page.
URL: http://www.example.com/
Date: Wednesday, May 2, 2012 4:19:45 AM PDT
Googlebot Type: Web
Download Time (in milliseconds): 995
HTTP/1.1 404 Not Found
Date: Wed, 02 May 2012 11:19:47 GMT
Server: Apache
Content-Encoding: gzip
X-Content-Encoded-By: Joomla! 1.5
Expires: Mon, 1 Jan 2001 00:00:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Set-Cookie: 53f94ae28286625d467fde6c991deb78=j71bn6iersl2oe1nri5la1f9o5; path=/
Set-Cookie: OlWeb=example; expires=Mon, 22-Apr-2013 11:19:47 GMT; path=/
Set-Cookie: ColorCSS=red; expires=Mon, 22-Apr-2013 11:19:47 GMT; path=/
Set-Cookie: ScreenType=wide; expires=Mon, 22-Apr-2013 11:19:47 GMT; path=/
Set-Cookie: FontSize=4; expires=Mon, 22-Apr-2013 11:19:47 GMT; path=/
Last-Modified: Wed, 02 May 2012 11:19:48 GMT
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Transfer-Encoding: chunked
Content-Type: text/html; charset=utf-8

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">

[edited by: Andy_Langton at 2:10 pm (utc) on May 2, 2012]
[edit reason] De-linked example [/edit]



lucy24 (WebmasterWorld Senior Member)

Msg#: 4448406 posted 9:01 pm on May 2, 2012 (gmt 0)

First and most obvious question: What happens when you yourself request a page? Not in FTP but in the usual way, by entering its URL in your browser? I can't tell for sure if you're saying that humans can find the pages just fine, but g### can't, or that nobody can get there if they ask for a page by name.

Are the subdomains named in the form

subdomain.example.com ?

Do the index pages link directly to the other pages? Can anyone follow the links, or does it require some kind of validation? (This isn't likely to be the problem, or g### would be throwing 401 instead of 404, but let's cover all bases.) Does it make any difference whether you enter the URL directly as opposed to clicking a link?

Preliminary ideas: Simple rewrite error. Simple cookie error.
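For illustration, a "simple rewrite error" in this context often means a front-controller funnel that is missing the usual file/directory guards, or guards that exist in one .htaccess but not in the one actually being applied. A minimal sketch of the guarded funnel (index.php as the assumed front controller; not taken from the site in question):

```apache
# Sketch of a front-controller funnel with the usual guards.
# Without the two RewriteCond lines, requests for real files and
# directories would also be rewritten, and a handler that does not
# recognise the path may answer 404 even though the file exists.
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ index.php [L]
```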


Msg#: 4448406 posted 11:47 am on May 3, 2012 (gmt 0)

Hi, thanks for the reply. The links work just fine: it doesn't matter whether you type the URL, navigate directly in the browser, or click a link, everything works. In fact the site has traffic, and visitors navigate it with no problem.
Yes, the subdomains are named subdomain.example.com.
None of our pages violate any rules; it is all clean, hard manual work.
I also suspect a rewrite error, a cookie issue, or something similar, because I just created a new subdomain on this domain and it has the same problem, so all three subdomains are affected.
I am posting the domain's .htaccess below. The subdomains have the standard Joomla 1.5 htaccess.txt file. The Joomla admin has an option to enable mod_rewrite or not; I have to enable it, because otherwise the URLs are not SEO friendly. But I have enabled this same option on four other domains / hosting accounts and it works fine there.

Thanks, Andy
RewriteEngine On
# Note: these two conditions have no RewriteRule immediately after them,
# so they apply to the first RewriteRule further down; the non-www to www
# redirect rule they were presumably written for is missing from the paste.
RewriteCond %{HTTP_HOST} ^www\.yyyyyy\.com$ [OR]
RewriteCond %{HTTP_HOST} ^yyyyyy\.com$
<IfModule pagespeed_module>
ModPagespeed on
ModPagespeedEnableFilters extend_cache
ModPagespeedEnableFilters combine_css
ModPagespeedEnableFilters move_css_to_head
ModPagespeedEnableFilters elide_attributes
ModPagespeedEnableFilters rewrite_css
ModPagespeedEnableFilters rewrite_images
ModPagespeedEnableFilters collapse_whitespace
ModPagespeedEnableFilters inline_javascript
</IfModule>
FileETag None
#php_value upload_max_filesize 128M
#php_value max_input_time 60
#php_value memory_limit 200M
#php_value post_max_size 128M
# -FrontPage-
IndexIgnore .htaccess */.?* *~ *# */HEADER* */README* */_vti*
Options +FollowSymLinks
RewriteRule vvvv/([^/.]+)/([^/.]+)/([^/.]+) index.php?page=content&p=$1&id=$2&ti=$3 [L,QSA]
RewriteRule yyyy/([^/.]+)/([^/.]+) index.php?page=content&p=$1&ti=$2 [L,QSA]


5+ Year Member

Msg#: 4448406 posted 2:56 pm on May 3, 2012 (gmt 0)

Maybe Googlebot is banned by the server software.
A simple way to check is to use Fetch as Googlebot on another site on the same IP, or to ask your hosting service.
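A server-level crawler ban, if one exists, would typically look something like this in a vhost config or .htaccess (purely illustrative, not taken from this site; note that such a rule normally produces a 403 rather than a 404, though some setups route blocked requests to an error page that answers 404):

```apache
# Illustrative only: the kind of user-agent block to look for.
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} Googlebot [NC]
RewriteRule .* - [F]
```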


Msg#: 4448406 posted 8:34 pm on May 3, 2012 (gmt 0)

Pffffff... Well, I found the problem. Strangely, after all these days, the idea came to me while explaining the .htaccess issue to lucy24. The problem was, as most of you suggested, the mod_rewrite setup: the root domain does not have Joomla installed on the server, but all the subdomains run Joomla. I just had to copy and paste the content of htaccess.txt into the .htaccess of the root domain, and with that the mod_rewrite was properly set up. It looks like even if each subdomain has its own .htaccess, it is the root domain's .htaccess that counts...
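For anyone hitting the same issue: the relevant part of Joomla 1.5's shipped htaccess.txt is its SEF section, which funnels non-file, non-directory URLs to index.php. From memory it is approximately the following; check the htaccess.txt in your own Joomla installation rather than copying this verbatim:

```apache
# Approximate core SEF section of Joomla 1.5's htaccess.txt (from memory;
# the shipped file contains additional conditions and security rules).
RewriteEngine On
RewriteCond %{REQUEST_URI} !^/index\.php
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule (.*) index.php [L]
```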

Well, thanks and good luck


lucy24 (WebmasterWorld Senior Member)

Msg#: 4448406 posted 11:16 pm on May 3, 2012 (gmt 0)

It looks like even if each subdomain has its own .htaccess, it is the root domain's .htaccess that counts...

.htaccess files build on each other. The part that's hardest to wrap your brain around is this: even if DNS points the user's request straight at a specific domain or subdomain, the request still has to go through the server's config file and any intervening .htaccess files.
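Concretely, if the subdomain's document root lives under the main domain's document root (a common shared-hosting layout, assumed here for illustration), Apache walks the directory chain and merges every .htaccess it finds along the way:

```apache
# Assumed layout: /public_html is the root domain's docroot and
# /public_html/sub is the subdomain's docroot.
#
# /public_html/.htaccess        <- processed first, even for sub.example.com
# /public_html/sub/.htaccess    <- processed second; note that mod_rewrite
#                                  rules in a child .htaccess REPLACE the
#                                  parent's rewrite rules by default, unless
#                                  the child opts in to inheriting them:
RewriteOptions Inherit
```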
