| 11:14 pm on May 11, 2009 (gmt 0)|
A url-only result usually means that the site has prohibited the url from being indexed - but there are backlinks in play that make Google want to list it anyway.
Have you checked your robots.txt file? And if the site assembles the content dynamically, have you checked the robots meta tag to make sure that it doesn't say "noindex"?
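For reference, a page-level block via the robots meta tag would look something like this (a generic sketch, not taken from the site in question):

```html
<head>
  <!-- Tells compliant crawlers not to index this page -->
  <meta name="robots" content="noindex">
</head>
```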
[edited by: tedster at 4:05 am (utc) on May 12, 2009]
| 11:16 pm on May 11, 2009 (gmt 0)|
Check the robots.txt file and meta robots tags for errors.
Does the site do any redirects (like non-www to www, and so on)?
*** But we didn't make any changes at all that could be related. ***
What *did* you change? Maybe not intentionally, but perhaps you did change something that mattered.
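As an aside, a typical non-www to www redirect under Apache looks something like this (a hedged sketch with a placeholder domain, assuming mod_rewrite is available):

```apache
RewriteEngine On
# 301-redirect example.com to www.example.com, preserving the requested path
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```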
| 12:22 am on May 12, 2009 (gmt 0)|
Thanks for your replies.
We haven't really changed anything in the structure for at least a year. Our robots.txt file:
When do you think we can expect a deep recrawl, considering the main page is PR5?
[edited by: encyclo at 12:27 am (utc) on May 12, 2009]
[edit reason] replaced link with robots.txt contents [/edit]
| 12:23 am on May 12, 2009 (gmt 0)|
The site is dynamic, with cached content.
Redirects... we might have some in place, but again, those are years old. I don't think Google could've caught any old links, and we certainly don't have any of the URLs from the old system that are showing.
And it's not that all results are showing like that, but a lot are - at least 50%.
Is there anything wrong with our robots.txt file?
[edited by: Kres7787 at 12:26 am (utc) on May 12, 2009]
| 12:28 am on May 12, 2009 (gmt 0)|
Your robots.txt file bans all robots - it must have been changed by you or your developers. You should remove it as soon as possible - for the moment, just delete the robots.txt file!
| 12:32 am on May 12, 2009 (gmt 0)|
I just checked. It was edited a week ago! WHAT THE HELL!
So can you please explain what that line does? Does it completely block ALL bots?
Thanks a lot man.
| 12:34 am on May 12, 2009 (gmt 0)|
Yes, it blocks everything (it disallows access to the root-level folder - / - and everything below it). Delete the file completely right now; you can create a new one later at your leisure. :)
Google will work it out soon enough, but it might take anywhere from a few days to a few weeks for things to get back to their previous state.
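For anyone following along, the deny-all pattern described above is just two lines (a generic reconstruction - the thread's actual file contents were removed by the moderators):

```
User-agent: *
Disallow: /
```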
| 12:38 am on May 12, 2009 (gmt 0)|
I just did and have as well emailed our developers about this.
We could've been hacked, as I don't believe our guys would do that.
I can only say three words, DAMN and THANK YOU :)
Could that be the reason why we were just getting URL-only results? As tedster said, some backlinks could've made those URL-only entries visible?
| 12:42 am on May 12, 2009 (gmt 0)|
I wish there was a paid deep crawl :)
| 12:49 am on May 12, 2009 (gmt 0)|
I believe I know what happened...
I found a robots.txt file on our test domain as well, where we're developing new stuff. I can understand the need for such a robots setup on that test domain.
BUT perhaps somebody from the developer team accidentally published the robots.txt from the test domain to the live site.
Who's going to laugh at this? That move could cost us some 900,000 uniques if it takes 30 days for a recrawl to occur. We'll know tomorrow morning - they're sleeping :)
| 10:32 am on May 12, 2009 (gmt 0)|
The exact same thing happened once on a project I worked on (luckily I wasn't the culprit).
New item for the testing to live rollout checklist!
| 10:52 am on May 12, 2009 (gmt 0)|
The same thing happened to me a while back - check your robots.txt, it has banned your website from crawling.
Go to your webmaster account and check the website status there. You will also find an option to generate a robots.txt file there. Or use this in your robots.txt file for the time being
| 12:42 pm on May 12, 2009 (gmt 0)|
The "Allow:" directive is *not* supported by all robots.
If the previous robots.txt file contained only the directives posted above, then all that is necessary is to delete it or to replace it with a blank file just to prevent 404 errors on robots.txt fetch attempts.
If the "Allow:" directive is used, it should be used only in a robots policy record addressed to the robots which support it (check the individual robots' "help" pages for support info).
| 12:58 pm on May 12, 2009 (gmt 0)|
Thanks again, people. We're working on this now - the first time we're seriously working on it.
Thanks for all your input.
| 6:19 pm on May 12, 2009 (gmt 0)|
A robots.txt disallow means that Google will show those as URL-only entries. As soon as someone says "URL only" here, that's the big clue as to where to look first. :)
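If you want to check for yourself what a given robots.txt actually blocks, Python's standard-library parser is handy (a small sketch using a hypothetical example.com URL; it parses the text directly, no network access needed):

```python
from urllib.robotparser import RobotFileParser

# Parse a deny-all robots.txt without touching the network
rp = RobotFileParser()
rp.parse("User-agent: *\nDisallow: /".splitlines())

# Every path is blocked for every user agent
print(rp.can_fetch("Googlebot", "https://www.example.com/"))           # False
print(rp.can_fetch("Googlebot", "https://www.example.com/page.html"))  # False

# Compare with an empty robots.txt, which allows everything
rp_open = RobotFileParser()
rp_open.parse("".splitlines())
print(rp_open.can_fetch("Googlebot", "https://www.example.com/"))      # True
```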
I do not rely on using robots.txt files.
The development server is locked down with .htpasswd so that nothing gets in and has a look round.
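For anyone wanting the same setup, password-protecting a development server under Apache takes a few lines of .htaccess (a sketch; the .htpasswd path is a placeholder, and the file itself is created with the htpasswd utility):

```apache
# Require a valid login for everything on the dev host
AuthType Basic
AuthName "Development server"
AuthUserFile /path/to/.htpasswd
Require valid-user
```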
| 6:32 pm on May 12, 2009 (gmt 0)|
Even if you haven't changed anything in a year, things might change anyway: AdSense all of a sudden started showing charity ads. Why? There were some syntax errors in my robots.txt file that Google seems to have ignored for years. They obviously changed their parser a couple of weeks ago - and I was cut off for a while before figuring out what went wrong.
| 8:28 pm on May 12, 2009 (gmt 0)|
I mentioned this in the other thread. Our Google earnings went down from a steady $130 per day to $15 per day due to this robots.txt screw-up. So not only has our traffic taken a major blow, but so have our earnings. Lovely.
So be careful, people. Learn from idiots like us. ;)
| 9:07 pm on May 12, 2009 (gmt 0)|
Kres7787, if you're only generating $130.00 per day, I would also look very hard at the way you're presenting the ads - there is something wrong.
| 12:24 pm on May 13, 2009 (gmt 0)|
|I would as well look very hard at the way you're presenting the ads - there is something wrong |
Not necessarily - it could just be a low paying niche.
| 2:09 pm on May 13, 2009 (gmt 0)|
Yes, a somewhat lower CPM. We also don't show AdSense on all pages, due to some restrictions with ad agencies.
[edited by: tedster at 4:12 pm (utc) on May 13, 2009]