If a site: search shows many pages, then Googlebot is indexing those pages. But only you can tell whether those Disallow lines still permit the pages you want in the index to be indexed. It's certainly valid syntax.
Btw, how old is your business site?
My site is very old. Back in 2005 it took a sudden, catastrophic loss of placement in Google. I still get top-notch placement in the other search engines.
I may just need to ask G what the deal is. I know that they're "banning" my site to some degree. I can search for unique text strings on the front page of my site and none of them show up in the G serps.
However (again) I can do a site:domain.com search and I get a couple of thousand pages returned.
My PR is zero. Before 2005 it was 6.
I'm baffled. We're a nationwide services company, with many affiliates and we're kicking butt on the other SEs.
What percentage of the URLs you have in Google's index are tagged supplemental?
You may wish to file a reinclusion request [mattcutts.com].
You can do that within Google Sitemaps. It's under "Tools": Submit a reinclusion request.
It looks like you have been suppressed using "site-unique bias" like Kinderstart.
Google does not like directories. A site with "thousands of pages" sounds like a directory. It is therefore likely that nothing you can do will fix the problem.
In order to request reinclusion you now have to stipulate that you did something wrong and fixed it. This is hard to do if you don't know what you did wrong. It is even harder if you did not do anything wrong and were suppressed because Google doesn't like your site. Google does not solicit reinclusion requests from sites that they banned or suppressed for editorial reasons.
" It looks like you have been suppressed using "site-unique bias" like Kinderstart."
When I looked at their site way back when, they also had a very bad www/non-www issue that amounted to massive subdomain cross-linking.
Ask EFV all about that little problem or you could ask me, been there, done that, don't care to repeat that.
In Google Webmaster Tools you can test your robots.txt file very effectively to find out whether anything prevents the site from being crawled. I would suggest you sign up and use the tool before making reinclusion requests etc.
Google bans sites for using deceptive practices but they also ban or use site-unique bias to suppress sites that they just don't like for undisclosed "editorial" reasons.
If your "nationwide" business is a large business, you are probably in luck. Google pretty much does not ban or use site-unique bias on sites belonging to large businesses strictly for editorial or competitive reasons. (I am not aware of a single case.) There are huge, multi-million-page, database-driven and highly duplicative sites (e.g. Amazon) out there that are heavily indexed in Google. (They will indeed ban a site that uses deceptive practices regardless of business size, but will usually rapidly reinstate a large business that corrects the problem. Smaller businesses might have to wait a long time for a sandbox penalty to run out.)
I would recommend the following:
Submit the site for reinclusion. If you have inadvertently triggered a spam trap (seems unlikely), they might tell you and you can fix it and resubmit. If the site was suppressed for editorial reasons you will get no response. The people in the reinclusion dept. certainly don't have the authority to reinclude sites that violate Google's editorial policy, regardless of merit. The reinclusion procedure is clearly only for sites that have fixed a deceptive practice.
If that doesn't work try to contact Matt Cutts. Explain the nature of the business and why the material in the site is non-duplicative and helpful to customers. That has worked for some people. Matt has more discretion regarding merit of your case.
If that doesn't work have someone in your management try to call Google. Writing to Google is probably futile. They likely get 1,000 letters a day that go directly into a special dumpster.
If that doesn't work, you are probably "SOL".
[edited by: tedster at 5:07 pm (utc) on April 4, 2007]
> Disallow: /enter.cgi?
There are no guarantees how Google will respond to wildcards in robots.txt. Those are not part of the robots.txt standard, and usage is potluck. I would remove your robots.txt and see if that does it in 30 days.
Google accepts an extension to the robot exclusion rules as outlined here:
Using Disallow: /enter would disallow all URLs that start with /enter and would make your robots file simpler.
Would that be useful, or is there an enter4.cgi that you do want to be spidered?
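For anyone who wants to check the prefix behaviour locally before changing a live file, here is a minimal sketch using Python's standard-library robots.txt parser. The robots.txt content, the www.example.com host and the is_allowed helper are made up for illustration. The standard-library parser only does plain prefix matching and does not implement Google's extensions, so it shows how a standard-only robot would read the rule, not necessarily how Googlebot does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt using the simpler prefix rule suggested above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /enter
"""

def is_allowed(path, agent="Googlebot"):
    """Return True if `agent` may fetch `path` under ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    # www.example.com is a placeholder host, not the poster's site.
    return parser.can_fetch(agent, "http://www.example.com" + path)

if __name__ == "__main__":
    for path in ("/enter.cgi?id=1", "/enter4.cgi", "/index.html"):
        print(path, "allowed" if is_allowed(path) else "blocked")
```

Note that /enter4.cgi is caught by the same prefix, which is exactly why the question above matters.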
If it is not the robots.txt, my guess is it is 'on page' factors.
To make a blanket 'Google does not like directories' statement is not entirely accurate. There are ways to get a large directory indexed and ranked, but you will need to follow the 'clear hierarchy' Google suggests to a T. Make sure you *do not* duplicate titles, descriptions, or headings (unless there is a good reason to), and find a way to make a 'clear' indication of where the most important page(s) are located.
I have been 'playing' with a directory for the last two years, running multiple versions of the software I am using in different sections of the site to gauge search engine responses. I have found that the toughest time an SE has with a directory site is finding the 'key' page for a search term. Where all pages are 'weighted' evenly by whatever system you are running, *no* (or very few) pages will rank.
IOW, if you have a 2000-page directory site and try to weight 2000 pages evenly, you may end up having issues. But if you are running a 2000-page site and clearly define 200 'upper-level' pages, which allow visitors to easily locate all 2000 pages, you will probably have an easier time.
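On the duplicate titles point, a rough way to audit a directory is to group pages by title and list any title used more than once. This is only a sketch: the find_duplicate_titles helper and the sample URLs and titles are invented, and it assumes you have already collected each page's title by whatever means you prefer.

```python
from collections import defaultdict

def find_duplicate_titles(pages):
    """Given a dict of URL -> page title, return a dict mapping each
    title used more than once (lowercased, whitespace collapsed) to
    the sorted list of URLs that share it."""
    by_title = defaultdict(list)
    for url, title in pages.items():
        normalized = " ".join(title.lower().split())
        by_title[normalized].append(url)
    return {t: sorted(urls) for t, urls in by_title.items() if len(urls) > 1}

if __name__ == "__main__":
    # Made-up sample data for illustration only.
    sample = {
        "/oh/page1": "Plumbers in Ohio",
        "/oh/page2": "Plumbers  in ohio",
        "/tx/page1": "Plumbers in Texas",
    }
    for title, urls in find_duplicate_titles(sample).items():
        print(title, "->", ", ".join(urls))
```

The same grouping works for meta descriptions or headings if you pass those in instead of titles.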