louieramos

msg:4466996 | 2:20 am on Jun 19, 2012 (gmt 0) |
Is your site a blog and have tags and multiple categories?
|
mslina2002

msg:4467004 | 2:51 am on Jun 19, 2012 (gmt 0) |
| "...we have omitted some results very similar..." |
| Usually means dupe content. When you click on that you will be thrown back to page 1 again. Click through again until you can get to the last page that G will show you. The last few pages you can usually see why you have dupes.
|
g1smd

msg:4467099 | 5:51 am on Jun 19, 2012 (gmt 0) |
Change the listings to 100 URLs per page. Save all pages of both result sets. Compare the two listings to find out which pages are dropped. I recently did that by cutting off the page header and footer from each page, joining all the results together in order in two files, and then running DIFF against the two files.
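A rough sketch of that comparison in Python, assuming each listing has already been reduced to a plain text file with one URL per line. The function names `load_urls` and `dropped_urls` are made up for illustration, and this uses a set difference rather than DIFF, so the order of results in the two files does not matter.

```python
def load_urls(path):
    """Read one URL per line, skipping blank lines and surrounding whitespace."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def dropped_urls(filtered, unfiltered):
    """Return URLs that appear in the unfiltered listing but not in the
    filtered one, keeping the order of the unfiltered listing."""
    seen = set(filtered)
    return [url for url in unfiltered if url not in seen]
```

With two hypothetical files, `dropped_urls(load_urls("filtered.txt"), load_urls("unfiltered.txt"))` would list the URLs that only show up once the omitted results are included.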
|
phranque

msg:4467113 | 6:31 am on Jun 19, 2012 (gmt 0) |
a search like this will be helpful: http://www.google.com/search?q=site%3Aexample.com&safe=off&filter=0&num=100 i think you will have to turn off Instant for the &num=100 to work.
| "...we have omitted some results very similar..." |
| this can also mean links were followed to robots.txt-excluded urls and the "snippet-less"/url-only results start to look "very similar". how many pages do you expect to have indexed? have you specified and/or submitted a sitemap? have you crawled the site?
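For anyone building that search URL for several domains, here is a minimal sketch using only the Python standard library. The helper name `site_query_url` is hypothetical, and the parameters are simply the ones quoted in the URL above; nothing here checks whether Google still honours them.

```python
from urllib.parse import urlencode


def site_query_url(domain, num=100):
    """Build a site: search URL with the filter switched off (filter=0)."""
    params = {
        "q": f"site:{domain}",  # the colon is percent-encoded as %3A
        "safe": "off",
        "filter": "0",
        "num": str(num),
    }
    return "http://www.google.com/search?" + urlencode(params)
```

Calling `site_query_url("example.com")` reproduces the URL given in the post.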
|
g1smd

msg:4467114 | 6:39 am on Jun 19, 2012 (gmt 0) |
Another thing to check.
site:example.com -inurl:www
site:www.example.com
One of those should return zero results.
|
phranque

msg:4467117 | 6:50 am on Jun 19, 2012 (gmt 0) |
| site:example.com -inurl:www
site:www.example.com
One of those should return zero results. |
| ...unless you are also using other subdomains such as blog.example.com, secure.example.com, etc...
|
driller41

msg:4467195 | 2:34 pm on Jun 19, 2012 (gmt 0) |
Feb 2011, that is a year and a half ago - perhaps it is time to move on and build a new site if this one is still not performing - just a thought.
|
realmaverick

msg:4467197 | 2:36 pm on Jun 19, 2012 (gmt 0) |
One of the best ways is to take a sentence of text from your page and search for site:www.example.com "insert sentence here". Ensure you click to view omitted results too.
|
synthese

msg:4467343 | 8:25 pm on Jun 19, 2012 (gmt 0) |
Thanks for some awesome responses. @louieramos - Yes it is a blog - all tags and categories have been noindexed for a long time. @mslina - I've done that and cannot see any difference between these results and the ones before you click the "show omitted" link.
|
synthese

msg:4467344 | 8:30 pm on Jun 19, 2012 (gmt 0) |
@g1smd @phranque This has shown an https://example.com in the result set. Which is bizarre, as I have SSL turned off at the hosting but it's serving a default Apache page. Not sure how to get rid of this - something in DNS settings?
|
synthese

msg:4467348 | 8:39 pm on Jun 19, 2012 (gmt 0) |
@phranque - Expecting about 550 pages to be indexed. This is what is in the sitemap. I haven't crawled the site -- what would you use to do that (maybe a sitemap generator?).
|
synthese

msg:4467352 | 8:55 pm on Jun 19, 2012 (gmt 0) |
Okay this is weird. G is showing about 3 forum.example.com urls -- despite the forum subdomain being deleted in 2006, and all non www. urls 301'd to www. (apache rewrite rule). I've also noticed that there's a good 50 or so URLs in the dupe index that have been returning 404s for over 12 months... this is frustrating.
|
synthese

msg:4467353 | 9:03 pm on Jun 19, 2012 (gmt 0) |
@driller41 - Big call, and I've certainly thought about it. Pre-Panda: 918k visits in the month before Panda hit. Now: 170k visits this last month. I've had another domain sitting there for some time wondering whether to shift the whole site and start over -- but the risk is that things will get even worse.
|
g1smd

msg:4467357 | 9:25 pm on Jun 19, 2012 (gmt 0) |
I'd not particularly worry about a single root https domain holding page being listed. If it takes only minutes to fix then I would try to get your branding there or a redirect to http in place.
|
phranque

msg:4467369 | 10:31 pm on Jun 19, 2012 (gmt 0) |
is your server returning that content? e.g. is it your IP address? which port? i typically use xenu and/or screaming frog to crawl sites. re: unexpected urls in the index - are you excluding any urls from being crawled in robots.txt? e.g. what does [forum.example.com...] say?
|
netmeg

msg:4467371 | 10:33 pm on Jun 19, 2012 (gmt 0) |
I'd sic screaming frog on it. I know we're not supposed to mention specific tools, but to my mind this is such an essential SEO tool and probably so necessary to what you're trying to figure out, I'm gonna risk it.
|
phranque

msg:4467409 | 12:08 am on Jun 20, 2012 (gmt 0) |
forgot about the example subdomain/forum linking problem... what does http://forum.example.com/robots.txt say?
|
synthese

msg:4467714 | 7:52 pm on Jun 20, 2012 (gmt 0) |
http://forum.example.com does not exist, and has not for 5 years. I have a redirect setup: anything.example.com -> www.example.com
[edited by: tedster at 10:07 pm (utc) on Jun 20, 2012]
|
g1smd

msg:4467719 | 8:00 pm on Jun 20, 2012 (gmt 0) |
Make sure that it is a 301 redirect. Make sure that from every non-canonical URL the redirect to the matching canonical URL happens in a single step.
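One way to check both points is to record every hop a request goes through and then test the chain. The sketch below is an assumption about how that could be done with the Python standard library, not a description of any particular tool; `trace` and `is_single_step_301` are hypothetical names, and `trace` sends HEAD requests, which some servers answer differently from GET.

```python
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = (301, 302, 303, 307, 308)


def trace(url, max_hops=5):
    """Follow redirects by hand and return a list of (url, status) hops."""
    hops = []
    for _ in range(max_hops):
        parts = urlsplit(url)
        if parts.scheme == "https":
            conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
        else:
            conn = http.client.HTTPConnection(parts.netloc, timeout=10)
        try:
            path = parts.path or "/"
            if parts.query:
                path += "?" + parts.query
            conn.request("HEAD", path)
            response = conn.getresponse()
            status = response.status
            location = response.getheader("Location")
        finally:
            conn.close()
        hops.append((url, status))
        if status not in REDIRECT_CODES or not location:
            break
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
    return hops


def is_single_step_301(hops, canonical_url):
    """True only if the chain is exactly: one 301, then the canonical URL with 200."""
    if len(hops) != 2:
        return False
    (_, first_status), (final_url, final_status) = hops
    return first_status == 301 and final_url == canonical_url and final_status == 200
```

For example, `is_single_step_301(trace("http://example.com/page"), "http://www.example.com/page")` would come back false for a 302, for a chain of two redirects, or for a redirect that lands on the wrong URL.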
|
tedster

msg:4467766 | 10:11 pm on Jun 20, 2012 (gmt 0) |
| I have a redirect setup: anything.example.com -> www.example.com |
| Warning - that kind of "wildcard" subdomain set-up has sometimes been used by competitors to wreak havoc. Following g1smd's advice about using a 301 redirect is pretty good insurance, but only "pretty good." This is a case where I would strongly prefer a 404 status - if it ain't there, then don't resolve the request.
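The difference between the wildcard set-up and the 404 approach can be sketched as a small decision function. This is only an illustration in Python, assuming the bare domain is the one non-canonical hostname that should still redirect; the names `CANONICAL_HOST`, `REDIRECT_HOSTS` and `decide` are hypothetical, and a real site would express the same logic in its server configuration.

```python
CANONICAL_HOST = "www.example.com"
# Hostnames that are known to exist and should redirect (assumed for this sketch).
REDIRECT_HOSTS = {"example.com"}


def decide(host, path):
    """Return (status, location) for a request, given its Host header and path."""
    hostname = host.lower().split(":")[0]  # drop any port suffix
    if hostname == CANONICAL_HOST:
        return (200, None)
    if hostname in REDIRECT_HOSTS:
        # Single-step 301 straight to the matching canonical URL.
        return (301, f"http://{CANONICAL_HOST}{path}")
    # Anything else, e.g. a subdomain someone invented, does not resolve.
    return (404, None)
```

Under this scheme a request for forum.example.com gets a 404 instead of being redirected to www.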
|