Forum Moderators: Robert Charlton & goodroi
Maybe I am paranoid at the moment, but I started to look deeper when I saw that Google hadn't cached anything since the day before I moved servers.
But when I try to "Fetch as Google" on a folder, it comes back with "Error"
Google says "HTTP/1.1 403 Forbidden"
If I place a dummy index.html file in there, then I get "HTTP/1.1 200 OK"
Was I blocking Google from spidering those folders once I removed the index.html files?
OK, what am I missing here? Why on earth don't you simply say "Options -Indexes" in your top-level htaccess?
Or rather: Why wasn't it there all along?
Well, when I started the site, I barely knew HTML and was self-taught. It was on a Windows server, so .htaccess wasn't used.
Auto-indexes are for things like long lists of publicly downloadable files that you've dumped out there for everyone to paw through.
Before I made my dummy index.html files I spotted some results that were example.com/folder/ and needed to block them from being shown, as they displayed the file structure of the site - something I do not wish to have.
NO. You were blocking Google-- and everyone else, good and bad, human and robot-- from seeing an auto-generated file called "index.html" which lists every single thing in the directory.
There's one huge question you didn't answer.
That's easy: unless there was a post that was removed and I didn't see it, it wasn't asked.
How do visitors-- whether human or robot-- know what's in those directories?
Like I said in my original post, I can "Fetch as Google" on individual files that are inside those folders, and it comes back as a success.
I have to assume you don't expect your normal human visitors to blunder into a raw index and click on filenames.
It was years later that I moved to an Apache server. I was using a reseller, and it was they who made the .htaccess file. They also said that it was done by hand rather than in cPanel, because cPanel was buggy.
It wasn't an auto-generated file.
Yes, cPanel and other such systems generate awful htaccess files, with broken syntax and many errors.
You would have thought that after all these years, cPanel would get it correct. Both .htaccess and cPanel have been around for years, so they should have ironed out any bugs by now.
If you have no DirectoryIndex directive and you allow indexing, the server auto-generates a list of the filenames that are within the folder when the bare folder URL is requested.
That is exactly what I was seeing before I put my dummy index.html files into every folder.
If there is a filename defined in the DirectoryIndex directive, users see the content of that file when the bare folder URL is requested.
That is exactly what I was seeing after I put my dummy index.html files into every folder.
If you turn indexing off, users see a 403 error if there is no DirectoryIndex directive and no index file and the bare folder URL is requested.
That is exactly what is happening now.
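For anyone who finds the three cases above easier to follow as a picture, here is a minimal Python sketch of the decision. It is only a simplified model of the behaviour described in this thread, not Apache's actual code; the function name and its two parameters are made up for illustration.

```python
from typing import Tuple


def resolve_folder_request(index_file_present: bool,
                           indexes_enabled: bool) -> Tuple[int, str]:
    """Simplified model of a request for a bare folder URL like /folder/.

    Hypothetical helper for illustration only; a real server takes many
    more settings into account.
    """
    # Case 2 in the thread: an index file (e.g. index.html) exists,
    # so the visitor sees the content of that file.
    if index_file_present:
        return 200, "contents of the index file"
    # Case 1 in the thread: no index file, but indexing is allowed,
    # so the server generates a listing of the filenames in the folder.
    if indexes_enabled:
        return 200, "auto-generated directory listing"
    # Case 3 in the thread: no index file and indexing turned off.
    return 403, "Forbidden"


if __name__ == "__main__":
    for present, enabled in [(False, True), (True, False), (False, False)]:
        status, body = resolve_folder_request(present, enabled)
        print(f"index file={present}, indexing={enabled} -> {status} {body}")
```

The last case (no index file, indexing off) is the one that "Fetch as Google" reported as "HTTP/1.1 403 Forbidden", and it is what "Options -Indexes" gives you for a folder without an index file.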
I can only answer questions if they are asked.
I have had people who have disassembled my URLs.
If you came to my site and browsed it, you would only see URLs that end in .html
I don't use URLs like example.com/folder/
I can only answer questions if they are asked.
Nonsense. You can anticipate questions and prepare the answers ahead of time. In fact you won't last long in business if you don't.
I have had people who have disassembled my URLs.
Sure. And that's what your custom 403 page is for. "Sorry, passing human, there's nothing here." But you can only disassemble a URL if you have a URL to disassemble.
If you came to my site and browsed it, you would only see URLs that end in .html
I don't use URLs like example.com/folder/
But search engines do, so you've now got Duplicate Content all over the place, unless you're forcibly redirecting from /folder/ to /folder/index.html. That's "redirecting", not "quietly rewriting" via mod_dir or IIS equivalent. And if the only function of those named /index.html pages is to prevent people from seeing the auto-generated index, you've got some pretty thin pages in high-profile locations.
Why the lecture?
Luckily I had my flame-retardant pants on, but it was rather harsh.
Lame_Wolf didn't know what to do and finds this stuff confusing,
'Tis true. Along with a lifetime of sleep deprivation, I have 2 medical conditions that make it hard to take things in.
Obviously visitors found the locations with "dummy index" pages in place, so your question does not need to be asked or answered.
In most instances I found that the visitor arrived on a valid URL and then breadcrumbed the site. I know I saw them when I used the site operator, and I am sure I saw one in the regular SERPs for something obscure. (The real page was above it, so it wasn't a thin content page.)
Who cares about duplicate content in this situation since duplicate URLs are grouped together and "the best one" is picked by Google?
Exactly. Who cares. I'd rather rank for "click here" ;)
The only way they're "high-profile" locations is if they're highly linked, but they're "dummy pages", which would implicitly not be highly linked, so they're not high-profile.
Very true.
It's 6 of one, half a dozen of the other. The mechanism creating the "page" or "exposing the structure" was the issue; all Lame_Wolf asked was whether the issue was solved adequately with the new procedure put in place.
Thank you for understanding the question asked, and for answering it in a professional and polite way.
Np again, and I'm glad I could help. I thought the question you asked initially coupled with the information provided was very easy to understand and answer.
Thank you. I tried to give a clear picture of things, so it was clear for others to understand, even if I didn't use the correct terminology.
I also think it's impressive you've overcome the challenges you have to not only teach yourself to write HTML, but to use mod_rewrite at all. Mod_Rewrite is a very difficult "thing" to understand and use.
I know people who don't have the same challenges as you outlined who throw their hands up and say "Ahhhhh, help! I can't do this!" when they try to use it, so I think it's fantastic you've been persistent and determined enough to overcome the obstacles in your path to even use it at all.
Believe me, it can be extremely stressful at times, especially if I cannot grasp things when I know I should be able to do so. I am not thick, but if I cannot create a picture, then the brain melts into jelly.
I'd like to learn .htaccess but I haven't found a site yet where I can understand it in the way I need it. I need something that explains things in a simple manner, what things like [NC] mean, when to use them/not to use them, why you do things before others, etc etc
Thanks for your support and kindness. I bow my humble head.
If you need more or something different, let me know how it could be better... If you do, I'll see if I can find JD01 to post another one, and if he's too busy or something, then I'll see what I can come up with. Not sure if I can post the way he did though. Funny how he just disappeared, isn't it?
Oh, no... I don't have the same challenges to overcome that you do. All my respect for being able to persevere through challenges and adversity.
I think it's awesome to see someone overcome challenges they face and I'm inspired by it. So, Thank You!
...so the .htaccess seems to be working albeit rather messy. I will work on it again once the dust has settled with the site move, and my head is clearer.
Thanks again for everything. If I can ever return the favour, I will.
Back on topic (in case someone complains)
make sure everything is available on an xml-sitemap and the <changefreq> (change frequency) is set to daily
Funny how he just disappeared, isn't it?
I forgot to reply to this bit. Yes, it is. He could be dead. His site is parked, so it is possible.
If you posted it in the Apache Forum [webmasterworld.com] you might get some help with it.
(Especially if you let me know when you do.)
Don't even worry about it. I've gotten a Ton of help and information from here over the years as a reader, so I'm glad I can "give back" a bit to at least one of the contributors. I don't post questions often here, because I try to give back answers out of appreciation for those who contribute to making this place the authority for information and best place to visit when I need to know something.
I too answer far more than I ask. In fact, I think I have had more posts removed than I have asked for help. :)
I'm glad you're getting reindexed again. I'd definitely make sure everything is available on an xml-sitemap and the <changefreq> (change frequency) is set to daily (e.g. <changefreq>daily</changefreq>; reference: [sitemaps.org...])
I have dropped a lot in the SERPs for certain keywords, mainly due to being offline and DNS issues, neither of which were my fault. (So far, the hosting provider has said that there have been DNS issues, a problem with the container, a problem with the datacenter, and a problem with a node.) One long nightmare.
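In case it helps with the xml-sitemap suggestion above, here is a rough Python sketch that builds a sitemap with <changefreq> set to daily for every URL. The function name and the example.com URLs are placeholders I made up; the element names and the namespace are the ones from the sitemaps.org protocol. Treat it as a starting point rather than a finished tool.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls, changefreq="daily"):
    """Return sitemap XML (as a string) listing each URL with a <changefreq>.

    Hypothetical helper for illustration; `urls` is any iterable of
    absolute page URLs.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in urls:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = page
        ET.SubElement(url, "changefreq").text = changefreq
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    # Placeholder URLs; list the real .html pages here, not the bare folders.
    pages = [
        "http://example.com/folder/page-one.html",
        "http://example.com/folder/page-two.html",
    ]
    print(build_sitemap(pages))
```

Save the output as sitemap.xml (with an XML declaration on the first line if you want one) and submit it in Webmaster Tools.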