
Crawling error 404 on Live Pages



12:19 pm on May 2, 2012 (gmt 0)

Hello. Although I have had this problem for several months, only now have I had time to deal with it; I had also hoped Google would solve the issue in time, but unfortunately it hasn't. I have spent the last few days giving this my full attention, reading and informing myself, but I have not found a real solution, so I really need your help please.

I have 2 subdomains with 2 different sets of content; they are adult sites, so I will not post the real links in public. In Google Webmaster Tools, every page except the main page of each subdomain shows a 404 Not Found error in the crawl stats, and none of them are indexed in Google (except the main page of each subdomain). All the pages are live, and even Fetch as Google can read the page content, yet it still gets a 404 Not Found HTTP response.
I checked with my hosting company and they confirm everything is fine on their end, nothing to report there. Please see below an example of one Fetch as Google result.

A possible hint: my root domain redirects from non-www to www. The subdomains have no redirects of any kind, and www.subdomain is not even set up in a host record.
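For comparison, the non-www to www redirect mentioned above is typically done in the Apache .htaccess with something roughly like this (yyyyyy.com is a placeholder); the host condition is what keeps it from firing for the subdomains:

```apache
RewriteEngine On
# redirect only the bare root domain, never the subdomains
RewriteCond %{HTTP_HOST} ^yyyyyy\.com$ [NC]
RewriteRule ^(.*)$ http://www.yyyyyy.com/$1 [R=301,L]
```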

This is an old domain with quite a good reputation and a lot of work over the years... the fact that the pages return 404 errors, not to mention that hundreds of pages with unique content are not being indexed, is really hurting our income...

Any suggestions please... Txs, Andy

Fetch as Googlebot
This is how Googlebot fetched the page.
URL: http://www.example.com/
Date: Wednesday, May 2, 2012 4:19:45 AM PDT
Googlebot Type: Web
Download Time (in milliseconds): 995
HTTP/1.1 404 Not Found
Date: Wed, 02 May 2012 11:19:47 GMT
Server: Apache
Content-Encoding: gzip
X-Content-Encoded-By: Joomla! 1.5
Expires: Mon, 1 Jan 2001 00:00:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Set-Cookie: 53f94ae28286625d467fde6c991deb78=j71bn6iersl2oe1nri5la1f9o5; path=/
Set-Cookie: OlWeb=example; expires=Mon, 22-Apr-2013 11:19:47 GMT; path=/
Set-Cookie: ColorCSS=red; expires=Mon, 22-Apr-2013 11:19:47 GMT; path=/
Set-Cookie: ScreenType=wide; expires=Mon, 22-Apr-2013 11:19:47 GMT; path=/
Set-Cookie: FontSize=4; expires=Mon, 22-Apr-2013 11:19:47 GMT; path=/
Last-Modified: Wed, 02 May 2012 11:19:48 GMT
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Transfer-Encoding: chunked
Content-Type: text/html; charset=utf-8

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">

[edited by: Andy_Langton at 2:10 pm (utc) on May 2, 2012]
[edit reason] De-linked example [/edit]


9:01 pm on May 2, 2012 (gmt 0)

lucy24 (WebmasterWorld Senior Member)

First and most obvious question: What happens when you yourself request a page? Not in FTP but in the usual way, by entering its URL in your browser? I can't tell for sure if you're saying that humans can find the pages just fine, but g### can't, or that nobody can get there if they ask for a page by name.

Are the subdomains named in the form

subdomain.example.com ?

Do the index pages link directly to the other pages? Can anyone follow the links, or does it require some kind of validation? (This isn't likely to be the problem, or g### would be throwing 401 instead of 404, but let's cover all bases.) Does it make any difference whether you enter the URL directly as opposed to clicking a link?

Preliminary ideas: Simple rewrite error. Simple cookie error.


11:47 am on May 3, 2012 (gmt 0)

Hi, txs for the reply. The links are working just fine; it doesn't matter if you type the URL, navigate directly in the browser, or click links, everything works. In fact the site has traffic and clients navigate the site with no problem.
Yes, the subdomain is named: subdomain.example.com
Our pages do not violate any kind of rules; it's all clean, hard manual work...
I also think the issue could be a rewrite error, a cookie, or something similar, because I just created a new subdomain on this domain and it has the same problem, so all 3 subdomains have the same issue...
I'll leave you the domain's .htaccess here. The subdomains have the standard Joomla 1.5 htaccess.txt file. In the Joomla admin you have the option to enable mod_rewrite or not; I have to enable it, because otherwise the URLs are not SEO friendly. But I have enabled this same option on 4 other domains / hosting accounts and it works OK...

Txs Andy
RewriteEngine On
RewriteCond %{HTTP_HOST} ^www\.yyyyyy\.com$ [OR]
RewriteCond %{HTTP_HOST} ^yyyyyy\.com$
<IfModule pagespeed_module>
ModPagespeed on
ModPagespeedEnableFilters extend_cache
ModPagespeedEnableFilters combine_css
ModPagespeedEnableFilters move_css_to_head
ModPagespeedEnableFilters elide_attributes
ModPagespeedEnableFilters rewrite_css
ModPagespeedEnableFilters rewrite_images
ModPagespeedEnableFilters collapse_whitespace
ModPagespeedEnableFilters inline_javascript
</IfModule>
FileETag None
#php_value upload_max_filesize 128M
#php_value max_input_time 60
#php_value memory_limit 200M
#php_value post_max_size 128M
# -FrontPage-
IndexIgnore .htaccess */.?* *~ *# */HEADER* */README* */_vti*
Options +FollowSymLinks
RewriteRule vvvv/([^/.]+)/([^/.]+)/([^/.]+) index.php?page=content&p=$1&id=$2&ti=$3 [L,QSA]
RewriteRule yyyy/([^/.]+)/([^/.]+) index.php?page=content&p=$1&ti=$2 [L,QSA]


2:56 pm on May 3, 2012 (gmt 0)

5+ Year Member

Maybe Googlebot is banned by the server software.
A simple way to check is to run Fetch as Googlebot on another site on the same IP, or to ask your hosting service.
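If it is a ban, it usually shows up as a user-agent match somewhere in the server config or an .htaccess file up the tree. A hypothetical sketch of what such a block looks like (note that a blocked bot would normally get a 403, not a 404):

```apache
# a rule like this anywhere along the path would block Googlebot
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} Googlebot [NC]
RewriteRule .* - [F,L]
```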


8:34 pm on May 3, 2012 (gmt 0)

Pffffff... Well, I found the problem. Strangely, after all these days, the idea came to me while explaining the .htaccess issue to "lucy24". The problem was there, as you suggested: most probably the mod_rewrite. The root domain does not have Joomla installed on the server, but all the subdomains run Joomla. I just had to copy and paste the contents of htaccess.txt into the .htaccess of the root domain, and with that the mod_rewrite was done properly. It looks like even if each subdomain has its own .htaccess, the master that counts is the root domain's .htaccess...

Well txs and good luck
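For anyone hitting the same thing: the part of the htaccess.txt that Joomla ships which matters here is its SEF section. Sketched roughly from the 1.5-era file (check your own copy for the exact conditions), it funnels any request that doesn't match a real file or directory into index.php:

```apache
RewriteEngine On
# route requests for non-existent files/directories through Joomla
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule (.*) index.php [L]
```

Without these rules active at the level Apache actually consults, a request for a SEF URL matches no physical file and falls through to a 404.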


11:16 pm on May 3, 2012 (gmt 0)

lucy24 (WebmasterWorld Senior Member)

It looks like even if each subdomain has its own .htaccess, the master that counts is the root domain's .htaccess...

htaccess files build on each other. The part that's hardest to wrap your brain around is this: Even if the DNS points the user's request straight at a specific domain or subdomain, the request still has to go through the server's config file and any intervening htaccess files.
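As a concrete sketch of the merge order when subdomains live in subdirectories of the main account's docroot (a common shared-hosting layout, assumed here; paths are illustrative):

```
/home/account/public_html/.htaccess        <- root domain's file; processed first
/home/account/public_html/sub1/.htaccess   <- subdomain's docroot; merged on top
```

Apache walks every directory on the path to the requested file and applies each .htaccess it finds, later ones overriding earlier ones, which is why the root domain's file affected the subdomains.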
