|Site not indexed - is image file name the problem?|
google indexing, seo
Hi, I have a site that's been up and running for a month. It's not being indexed by Google or Yahoo or Bing but I've done all the regular things to help with the indexing process. This includes:
- On Google Webmaster Tools, submitting the site, verifying the URL, and submitting sitemap.
- Submitted the site to a handful of good directories for listing.
Currently, the site already has a small handful of backlinks from forums (organic stuff; I didn't hire any backlink builders).
I've tried to make the site SEO-friendly with good URL structure and target keywords in relevant places only. There has been no attempt to keyword stuff (i.e. there is no keyword stuffing in the alt tags for images).
In short, it's been a fairly typical process and nothing blackhat. The ONLY thing that might be causing an issue is the image file names. Virtually all images on the website have variations of target keywords as part of their file name (e.g. target_keyword.png). I'm wondering if the bots view this as similar to keyword stuffing in the alt tags. But I don't see this being mentioned in any SEO articles I've seen, so I wanted to hear your thoughts on it.
I also know some people estimate it takes as long as 3 months for the bots to crawl new sites and to have them indexed, so perhaps it's too early to worry. But most websites I've built in the past would get indexed within 2 weeks so I'm not sure what's causing the delay this time.
So what do you think? Is the keyword "stuffing" in the image file name an issue?
Welcome to WebmasterWorld, skipper9
I do not think it is your keyword-rich image names. You seem to have a problem with all three major search engines, which would point to some kind of technical issue.
Have you checked your robots.txt to make sure it is not blocking the site?
What do your logs say - are Googlebot and other bots visiting?
Have you tried "Fetch as Googlebot" of a home page and a few internal pages in Google Webmaster Tools? What is the result?
If you do site:example.com does it really return no results?
|It's not being indexed by Google or Yahoo or Bing |
Not being indexed or not being crawled?
Are you in North America? In ARIN territory (dot com and similar) you'd have to try hard to prevent search engines from knowing a new domain exists. Forget three months; you should expect the robots within three days! But in some other registry areas, robots may not know a new domain exists unless someone else links to them.
aakk will correct me on this, but my impression is that even "nofollow" in a link doesn't mean "pretend you haven't seen this". It simply means "we make no warranties for the quality of what's on the other end". So if anyone at all links to you, the search engines will know about it. That's assuming the forums and directories themselves get crawled regularly.
Thanks aakk999 and lucy24.
To your questions, I actually don't even have a robots.txt file for this particular site, which is why I didn't think I had been accidentally blocking the bots. As far as I know, there's no other way to block them if there aren't any explicit instructions provided in a robots.txt file?
As for my logs, it does seem like they're crawling through. Here's a sample log:
|18.104.22.168 - - [27/Oct/2013:03:02:06 -0400] "GET /myurl HTTP/1.1" 200 12523 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" |
The IP location is Mountain View, CA. It also actually says Googlebot in the log so looks likely to be one. If they're crawling through, is there any reason why they aren't indexing?
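For anyone wanting to pull crawler visits out of a larger log, here is a minimal sketch that assumes the log is in the usual combined format, like the sample line above. The names `LOG_PATTERN`, `BOT_MARKERS` and `parse_bot_hit` are made up for illustration, and the marker list is not exhaustive. Keep in mind that a user-agent string can be faked, so a match only suggests a real crawler; a reverse DNS lookup on the IP is the more reliable check.

```python
import re

# Combined log format: ip ident user [time] "request" status bytes "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Substrings used to spot the three crawlers discussed here (illustrative only).
BOT_MARKERS = ("Googlebot", "bingbot", "Slurp")


def parse_bot_hit(line):
    """Return a small dict for a crawler request, or None for anything else."""
    match = LOG_PATTERN.search(line)
    if match is None:
        return None
    agent = match.group("agent").lower()
    bot = next((m for m in BOT_MARKERS if m.lower() in agent), None)
    if bot is None:
        return None
    return {
        "bot": bot,
        "path": match.group("path"),
        "status": int(match.group("status")),
    }


sample = (
    '18.104.22.168 - - [27/Oct/2013:03:02:06 -0400] "GET /myurl HTTP/1.1" 200 12523 '
    '"-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'
)
print(parse_bot_hit(sample))
```

A 200 status on a crawler hit, as in the sample, means the page was served normally, so the crawl itself is not where things are failing.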
aakk999 - thanks for the suggestion to try Fetch as Googlebot. I'd missed out on that one and just did it. Clicked "Submit to Index" so I hope that's the right way to get it indexed.
- in Google Webmaster Tools, the Crawl Errors page states that there are no crawl errors. The Crawl Stats page also shows that the bots have been crawling the site.
- The site is secured with an SSL cert, and I'd submitted both the http and https versions of my site to Webmaster Tools.
And yep, I'd already tried the site:myurl.com search and it returned 0 results.
lucy24 - I'm not in North America but the server is (which is probably what really matters), and the site does end with .com. And yes, AFAIK the search engines don't even have to respect the nofollow since it's really just a request on your part to them.
So the question now is, without a robots.txt file and seeing that Googlebots are actually crawling through, is there any reason why my site isn't being indexed?
Appreciate the help!
Next question is one aak also asked: How do you know you're not indexed? One way is with a site:example.com search. Another is to search for some fairly long bit of text from your site, in quotation marks. Anything?
You're already in wmt. There's a tab for Indexing. What does it currently say?
:: detour to get exact directions ::
Google Index in sidebar: Index Status: change clickbox to Advanced and ask it to show "ever crawled" along with "total indexed". Click Update. Are you seeing non-zero numbers?
:: idly wondering how the Total Indexed can increase by 24 overnight with no change in Ever Crawled number ::
|How do you know you're not indexed? One way is with a site:example.com search. |
Hi Lucy, as mentioned in my post just above your latest one, I have already done a google search using site:myURL.com and it is returning zero results. This is how I know that the site is not being indexed.
|Another is to search for some fairly long bit of text from your site, in quotation marks. Anything? |
I've tried this previously as well and it also returned zero results.
|Google Index in sidebar: Index Status: change clickbox to Advanced and ask it to show "ever crawled" along with "total indexed". Click Update. Are you seeing non-zero numbers? |
Yes! When I followed these steps, it showed the Ever Crawled was 13 but the "Total Indexed", "Blocked by Robots" and "Removed" all show 0. So I know Google is crawling but then not indexing the site. Any idea why it would be doing this?
|I actually don't even have a robots.txt file for this particular site |
This is a long shot, but can you check in your logs that the request for robots.txt returns 404 Not Found (and not something like HTTP 500)?
A few months ago I was asked to look at a site that had dropped out of the index; the reason was that robots.txt was returning HTTP 500 instead of HTTP 404.
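If digging through the logs is awkward, the same check can be done from outside. This is only a rough sketch using Python's standard library; `robots_status` and `interpret` are hypothetical helper names, and the wording of the interpretations reflects my understanding of how Google says it handles robots.txt responses, not a guarantee about every crawler.

```python
import urllib.error
import urllib.request


def robots_status(site_url, timeout=10):
    """Return the HTTP status code the server gives for <site_url>/robots.txt."""
    url = site_url.rstrip("/") + "/robots.txt"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the status code is exactly what we want.
        return err.code


def interpret(status):
    """Rough reading of what a robots.txt status code means for crawling."""
    if 200 <= status < 300:
        return "file served: crawlers will parse its rules"
    if status == 429 or status >= 500:
        return "server error: crawlers may treat the whole site as blocked"
    if 400 <= status < 500:
        return "not found: crawlers assume nothing is blocked"
    return "redirect or other: check where it leads"


# Example (needs network access; example.com is a placeholder):
# print(interpret(robots_status("https://example.com")))
print(interpret(404))
print(interpret(500))
```

A missing robots.txt should land in the 404 branch, which is harmless. It is the 5xx branch that matches the dropped-out-of-index case described above.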
Also, just double-check that you do not have a meta noindex set and that, if you use a canonical link element, it is set up properly.
Because all this smells to me like a technical error somewhere.
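A quick way to check for both of those without reading the page source by eye is sketched below. It assumes you already have the page's HTML as a string (saved from the browser or fetched some other way); `IndexingHints` and `check_page` are illustrative names, and the sample page is invented to show a positive hit.

```python
from html.parser import HTMLParser


class IndexingHints(HTMLParser):
    """Collects a robots noindex directive and the canonical URL, if present."""

    def __init__(self):
        super().__init__()
        self.noindex = False
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        attr = {name: (value or "") for name, value in attrs}
        if tag == "meta" and attr.get("name", "").lower() in ("robots", "googlebot"):
            if "noindex" in attr.get("content", "").lower():
                self.noindex = True
        elif tag == "link" and "canonical" in attr.get("rel", "").lower().split():
            self.canonical = attr.get("href")


def check_page(html):
    """Return the noindex flag and canonical URL found in an HTML document."""
    parser = IndexingHints()
    parser.feed(html)
    return {"noindex": parser.noindex, "canonical": parser.canonical}


# Invented sample page that carries both signals.
page = """<html><head>
<meta name="robots" content="noindex, follow">
<link rel="canonical" href="https://example.com/other-page">
</head><body>Hello</body></html>"""
print(check_page(page))
```

On a healthy page you would expect `noindex` to be False and `canonical` to be either None or the page's own URL. A canonical pointing somewhere else tells the search engine to index that other URL instead.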
Ok, problem solved. Turns out there was an .htaccess file in the root directory that was blocking bots. When you choose to Edit that file in File Manager, it shows a blank slate (i.e. no code); the only way to see the code and realise that it's actually not an empty file is to View it instead of Editing it. VERY strange. Thanks for the help!
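Since the file manager's editor hid the contents, one way to avoid relying on it is to read the raw file and flag any lines that mention user agents or bots. The sketch below is only that; the actual rules in the file above were never posted, so `sample_rules` is an invented example of a typical bot-blocking rewrite, and `SUSPECT` and `suspicious_lines` are hypothetical names.

```python
import re

# Words that tend to show up in rules aimed at crawlers (a guess, not a complete list).
SUSPECT = re.compile(r"user[-_]?agent|bot|slurp|crawler|spider", re.IGNORECASE)


def suspicious_lines(htaccess_text):
    """Return (line_number, text) pairs for non-comment lines that mention bots."""
    hits = []
    for number, line in enumerate(htaccess_text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        if SUSPECT.search(stripped):
            hits.append((number, stripped))
    return hits


# Invented example of the kind of rule that returns 403 Forbidden to crawlers.
sample_rules = """RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (googlebot|bingbot|slurp) [NC]
RewriteRule .* - [F,L]
"""
for number, text in suspicious_lines(sample_rules):
    print(number, text)
```

To run it on a real file, read the text with something like `open(".htaccess", encoding="utf-8", errors="replace").read()` and pass that in. Anything flagged still needs a human look, since a rule that mentions a bot is not necessarily blocking it.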
I've been bitten a few times by accidentally double-clicking on .htaccess in ftp. Then it downloads the file-- which promptly becomes invisible and I don't even know it's sitting in the Downloads folder :)
View File probably means "show what you'd see in a browser". And in the case of htaccess, that's ... nothing.