That's been happening for the last 6 months or so, but not consistently. Occasionally Google will show a number that is reasonably close to accurate. (I'm assuming that's an issue of which data center the search is drawn from.)
Now, yesterday, prompted by a comment in another thread, I did a site:mysite.com search and it came up with 4515 pages.
Also, I've noticed a growing number of URL-only listings when I do site: searches. Again, the number is not consistent.
When Google returns the 4400+ page count, the number of URL-only listings is also much larger.
Initially the URL-only listings showed up only for bottom-level, low-traffic pages. Recently they began to include higher-traffic, higher-level pages.
A year or so ago I deleted about 1,100 pages and made the folders they had been in noindex, nofollow. I needed to keep the folders because of other content (images) that is still in those folders.
I also deleted a couple hundred other pages and replaced them with pages that redirected (meta refreshed actually) to my home page. I've since deleted those couple hundred pages.
Questions
Does anyone have an idea of why Google is showing so many phantom pages?
Am I likely headed for, or currently suffering, a duplicate content penalty?
If so, how do I avoid or recover from it?
Possible solutions I'm considering
Creating a 301 redirect from mysite.com to www.mysite.com.
[I don't know much about 301 redirects so I searched around WW and found this thread
An Introduction to Redirecting URLs on an Apache Server [webmasterworld.com]
by DaveAtIFG. I'm hoping I can understand enough of that to work my way through doing a 301 redirect if that's the best course of action.]
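For anyone else trying to picture what the 301 would do, here is a minimal sketch of the decision logic only, written in Python rather than as server configuration. On Apache this is normally handled in .htaccess as the linked thread describes; the function name `canonical_redirect` and the http:// scheme are assumptions made up for illustration.

```python
# Sketch of the "one canonical host" rule, not real server configuration.
CANONICAL_HOST = "www.mysite.com"  # the host the post proposes to standardize on


def canonical_redirect(host, path):
    """Return (status, location) if the request should be redirected, else None.

    `canonical_redirect` is a hypothetical helper; the http:// scheme is assumed.
    """
    if host.lower() == CANONICAL_HOST:
        return None  # already on the canonical host, serve the page normally
    # Any other host name gets a permanent (301) redirect to the same path.
    return 301, f"http://{CANONICAL_HOST}{path}"
```

With that rule in place, a request for mysite.com/page.html is answered with a 301 pointing at www.mysite.com/page.html, and a request that already uses www.mysite.com is served as-is.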
Because the issue seems to be intermittent, I don't know if I actually have to do anything.
I'm hoping for some guidance, suggestions, or ideas on what to do if anything.
>> absolute = http://www.domain.com/page.html
Err, not quite.
/page.html, and anything else that begins with a /, is absolute as to its position within the site, but relative to whatever domain the page was loaded from.
../page.html or ../../ is a relative URL.
Anything that includes the full domain name is always fully absolute.
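To see the three cases side by side, here is a small sketch using Python's standard urllib.parse.urljoin, which resolves a link against the page it appears on much as a browser or crawler would. The `base` page URL and the `resolve` helper are made up for illustration.

```python
from urllib.parse import urljoin

# Hypothetical page on which the links below appear.
base = "http://www.domain.com/folder/sub/page.html"


def resolve(link):
    """Resolve `link` against `base`, as a browser or crawler would."""
    return urljoin(base, link)


# Begins with "/": position in the site is fixed, the domain comes from the page.
root_relative = resolve("/page.html")
# Begins with "../": both position and domain depend on the page.
doc_relative = resolve("../page.html")
# Includes the full domain name: nothing is taken from the page.
fully_absolute = resolve("http://domain.com/page.html")
```

Note that the root-relative link picks up www.domain.com from the page it sits on, so the same link served from domain.com would resolve to the non-www host instead.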
If you link to a folder, always include the trailing / at the end. It is important.
I just sorted out a mess where Xenu generated a massive site map for a site, one that was much larger than expected, contained every page duplicated, and had loads of entries with a title of "301 Moved".
It turns out that although the site uses domain.com as the base in all the internal links, the host name is configured as www.domain.com, and many of the internal links did not include a trailing / on folder names. There was a valid .htaccess file directing calls for www.domain.com over to domain.com, and it was correctly set up.
So, what happens when you link to domain.com/folder is that there is an automatic server redirect to www.domain.com/folder/ (remember the host name is set to www.domain.com here), and then the 301 redirect inside the .htaccess file takes over and sends the visitor over to domain.com/folder/ instead. Including the trailing / would have avoided this.
Changing the server host name so that it does not include the www is also a good idea, but even if that were done, any request for domain.com/folder would still need an automatic redirect to domain.com/folder/ anyway. So, always include the / on the end to avoid any redirect happening at all.
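The double hop described above can be modeled in a few lines of Python. This is only a simplified model of the behaviour as described in this post, not of how any particular server is implemented; the `FOLDERS` set and the function names are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

SERVER_NAME = "www.domain.com"  # host name configured on the server (from the post)
CANONICAL_HOST = "domain.com"   # host the .htaccess 301 sends visitors to
FOLDERS = {"/folder"}           # hypothetical set of directories on the site


def next_hop(url):
    """Return the URL this simplified server would redirect to, or None."""
    scheme, host, path, query, frag = urlsplit(url)
    # Folder requested without its trailing slash: the slash is added and the
    # redirect target is built from the server's own configured host name.
    if path in FOLDERS:
        return urlunsplit((scheme, SERVER_NAME, path + "/", query, frag))
    # The .htaccess rule: anything on the www host is sent to the non-www host.
    if host == SERVER_NAME:
        return urlunsplit((scheme, CANONICAL_HOST, path, query, frag))
    return None


def redirect_chain(url):
    """Follow redirects from `url` and return every URL visited, in order."""
    chain = [url]
    while (nxt := next_hop(chain[-1])) is not None:
        chain.append(nxt)
    return chain
```

In this model, redirect_chain("http://domain.com/folder") visits three URLs (the original, then www.domain.com/folder/, then domain.com/folder/), while redirect_chain("http://domain.com/folder/") stays on the one URL with no redirect at all.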
Also, as part of this redirect issue, if you link to an index page inside a folder, do NOT include the filename in the link. End with the folder name and a trailing / at the very end.
Why? This confuses me (of course, confusing me is easily done). Does a search engine get confused between ...
.com/index.htm and
.com/folder/index.htm
or .... what?
Link to www.domain.com/folder/ or to /folder/ to avoid that problem ever happening.
Make sure the foldername ends in a / to avoid the server having to do a redirect from www.domain.com/folder to www.domain.com/folder/ which may go via domain.com/folder/ if the host name is not the same as the one in, or implied by, your link.
It's a bad link because when you change the technology of your site over to PHP or ASP then all your links will be instantly broken.
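If you want to check a list of internal links against both rules (no index filename, trailing / on folders), something like this Python sketch would do it. The `INDEX_NAMES` list and the `clean_link` helper are assumptions for illustration; adjust the filenames to whatever your server treats as a directory index.

```python
import posixpath

# Assumed set of index filenames; adjust to match your own server setup.
INDEX_NAMES = {"index.htm", "index.html", "index.php", "index.asp"}


def clean_link(path, is_folder=False):
    """Return `path` with any index filename dropped and a trailing / on folders.

    `is_folder` must be supplied by the caller, because a URL path alone
    does not say whether its last segment is a folder or a file.
    """
    head, tail = posixpath.split(path)
    if tail.lower() in INDEX_NAMES:
        # Link to the folder itself rather than to its index file.
        return head if head.endswith("/") else head + "/"
    if is_folder and not path.endswith("/"):
        return path + "/"
    return path
```

So clean_link("/folder/index.htm") gives /folder/, and that link keeps working if index.htm later becomes index.php.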
Thanks for the quick response. That makes it clearer for me.
[I've just been talking to folks about converting from a static page site to a dynamic one. Part of the conversation has been that maybe I should convert only the old part(s) that would most benefit from being dynamic, and leave the rest static. So knowing some of the possible problems with conversion could surely help.]
Today it dropped from over 6000 pages this morning to the correct count of 1800 pages this evening.
Hope the site comes back now that the phantoms have fled the index...keeping fingers crossed.