Indexing pages that reside in folders but are not part of site

         

irishaff

3:22 am on Nov 30, 2003 (gmt 0)

10+ Year Member



I have some new files sitting on my web server which I'm planning to integrate into my site in the next few days. Google has already indexed them, and they are showing in the SERPs. Has anybody seen this before?

johnser

5:30 pm on Nov 30, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If the pages are accessible on the server root and you don't have a robots.txt file in place, then expect every spider around to hit on them.

I got spam yesterday showing a snapshot of a new site that is not yet live but is accessible in the root.

"We noticed that "www.unpublished-site.com" is not listed in over 753,000 search engines etc etc.."

Use a robots.txt file - (See the WW one)
J
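For anyone who wants to sanity-check their rules before uploading them, here is a minimal sketch using Python's standard `urllib.robotparser`. The `/newfiles/` folder and the example.com URLs are made up for illustration, not taken from anyone's real site. Keep in mind that robots.txt only keeps out spiders that choose to obey it; it does not hide the files.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents: the /newfiles/ folder is an assumed
# name for wherever the not-yet-published pages live.
ROBOTS_TXT = """\
User-agent: *
Disallow: /newfiles/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a rule-following crawler may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for path in ("/index.html", "/newfiles/draft.html"):
        url = "http://www.example.com" + path
        print(path, is_allowed(ROBOTS_TXT, "Googlebot", url))
```

The same check can be run with any user-agent string to see how a given spider should treat each path.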

caveman

4:38 am on Dec 1, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yikes, just read this post, looked at the server, and found an old copy of the index page in the root dir.

It was not named "index.html," it was named something like "indexblue.html," but basically it was identical to the index page as far as content goes (it was there as a demo for some clients).

Could this have gotten our index page blown up? Never occurred to us that G would index a page not linked to by any other page in the site. Geez, G has trouble just finding all of our regular linked pages...

rfgdxm1

4:45 am on Dec 1, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Google *can't* find pages unless they are linked to from somewhere. A simple mention like "take a look at some new stuff I just wrote at (insert URL)" on a bulletin board somewhere can be enough for Google to find it.

Stefan

4:46 am on Dec 1, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Could this have gotten our index page blown up?

If you've still got all the raw logs from the period when it sat on the server alongside the real index, you can see whether it got crawled by Googlebot. If you open them with Wordpad and "Find" the file name, it won't take that long. If it didn't get crawled, it's not the problem.
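If the logs are too big to search by hand, something like the rough sketch below does the same job. It assumes the logs are in Apache "combined" format (other formats would need a different pattern), and the sample lines, including the IP addresses, are invented purely for the demo.

```python
import re

# Matches the request, status, and user-agent fields of an Apache
# "combined" format log line. This format is an assumption.
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
    r' "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_hits(lines, filename):
    """Yield (path, status) for Googlebot requests whose path mentions filename."""
    for line in lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # skip lines that are not in the expected format
        if filename in match["path"] and "googlebot" in match["agent"].lower():
            yield match["path"], int(match["status"])


# Made-up sample lines for illustration only.
SAMPLE = [
    '192.0.2.1 - - [30/Nov/2003:05:12:01 +0000] "GET /index.html HTTP/1.0" '
    '200 5120 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"',
    '192.0.2.7 - - [30/Nov/2003:06:40:13 +0000] "GET /indexblue.html HTTP/1.1" '
    '200 5120 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"',
    '192.0.2.1 - - [30/Nov/2003:07:02:55 +0000] "GET /indexblue.html HTTP/1.0" '
    '200 5120 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"',
]

if __name__ == "__main__":
    for path, status in googlebot_hits(SAMPLE, "indexblue.html"):
        print(path, status)
```

To run it against a real log, pass an open file object instead of `SAMPLE`. The status code is reported as well, so a request answered with a redirect shows up as a 301 rather than a 200.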

caveman

2:48 pm on Dec 1, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



rfg, Stefan, thanks.

The page I referred to was not crawled...I looked in the logs and also checked variations of the URL in G; found no results.

I did find another odd thing however, or maybe not odd. I will note it since it relates to posts about duplicate content.

During Dom we realized that a page called homepage.html was potentially causing us problems. We used it as an alternate homepage for non-Adwords PPC efforts, since it triggered a pop up (which Google doesn't allow for AdWords efforts). The page was identical to the index.html page but with a pop up javascript.

When we stopped running pop-ups back in April, we stopped using the homepage.html in PPC efforts...but we didn't delete homepage.html from the server until Dom/Es, when we realized it might be causing us a penalty for duplicate content.

Right after Dom, we deleted the page, and did a 301 back to the index.html.

Yesterday I did a search on G for the exact URL of that old page (www.mydomain.com/homepage.html) and the search returned our current "index.html" homepage.

Since we did a 301 from homepage.html back to the index.html page, perhaps this is not surprising. What bothers me is that G still keeps this way old, deleted URL (/homepage.html) with a cache of the current homepage. Essentially, they show two URLs with the same cache - our index.html homepage.

Any chance that this could be causing a problem? FYI, shortly after doing the 301, our index page reappeared. But the page has been gone since Florida for its main KW search. Still shows up for other searches.

<mods, if this belongs in one of the threads re dup content by all means move it>

Stefan

3:11 pm on Dec 1, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hey, Caveman

I'd say yes, there's a chance. If the old URI is totally gone then it should correct eventually, (maybe, perhaps, at a guess).

In the early days of Florida, people were reporting finding pages in the serps that had been gone for many months. Back during dom/esm, GG indicated that they liked to use an older, more stable database for their algo changes. Maybe that's what happened to you. As soon as we're full into the rolling updates again, (judging by the way I'm getting crawled that's now), then it might self-correct.

Right after Dom, we deleted the page, and did a 301 back to the index.html.

The page, URI, the whole thing, is totally off the server is it?

ADDED: Wondering why you needed a redirect if the file was gone... if there's a server redirect still in place then maybe google thinks the old URI is still good, and then sees it as dupe content.
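One way to see what the server actually answers for the old URI is to request it without following the redirect. Below is a minimal sketch with Python's standard `http.client`, which never follows redirects on its own. The `DemoHandler` server is only a local stand-in I made up so the example runs anywhere; against a real site you would call `check_redirect` with your own host name and port 80.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


def check_redirect(host, port, path):
    """Return (status, Location header) without following any redirect."""
    conn = http.client.HTTPConnection(host, port, timeout=10)
    try:
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in server: 301 for /homepage.html, 200 otherwise."""

    def do_HEAD(self):
        if self.path == "/homepage.html":
            self.send_response(301)
            self.send_header("Location", "/index.html")
        else:
            self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def run_demo():
    """Start the stand-in server on a free local port and probe two paths."""
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        return [check_redirect("127.0.0.1", port, p)
                for p in ("/homepage.html", "/index.html")]
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(run_demo())
```

A 301 with a Location header means the redirect is still in place; a 404 would mean the old URI is really gone.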

caveman

4:59 pm on Dec 1, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The page, URI, the whole thing, is totally off the server is it?

ADDED: Wondering why you needed a redirect if the file was gone... if there's a server redirect still in place then maybe google thinks the old URI is still good, and then sees it as dupe content.


Stefan, good questions. Page was deleted during/after Dom. Yes, a 301 redirect was in place ... just removed it. This sort of implies one should not leave the 301 in place for more than a month or so, which come to think of it, I believe I've read before anyway.

I don't really think that this is the cause of my index page sinking into the depths, especially since:
- it comes up for searches other than the optimized two-word phrase that it was primarily targeting, and,
- 19 of the top 20 sites pre-Florida are also gone.

But, one never knows. Perhaps if not for this, there would have been two of us left standing, not just one lone competitor using cloaking!

Stefan

5:49 pm on Dec 1, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Doing a bit of reading here via the site search, it seems that SEs that encounter a server redirect can use the URI they came in on, list that URI as live, and list the content of the page they got redirected to. In your case, I bet there's a chance that Google was finding dupe content on two URIs, homepage and index, even though they both led to the same destination. (I might be wrong on this).

As you say, that might not be the problem, but I always figured when you really wanted a page gone, you just removed it from the server along with all internal links pointing to it, then the SE's wouldn't be able to find the URI, and it would eventually disappear from the serps.

If you do find that removing the redirect solves it, could you post a follow-up in this thread? It might help others with a similar problem.

ADDED: Perhaps the homepage crawl wouldn't have shown in the logs because of the server redirect... I'm not 100% on that.

caveman

6:44 pm on Dec 1, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yep, we should have deleted it after 30 days, i.e., long enough to ensure the 301 had been logged. That was lazy on our parts, or disorganized. Will post if anything good happens.

Man, the ways you can get slammed...