Cloaking Forum
Cloaking - Part 2 Question
Cloaking and directory structures...
Canton
msg:675436 - 8:14 pm on Nov 2, 2001 (gmt 0)

First off, I'd like to thank Air and WebGuerrilla for answering previous cloaking questions...very helpful folks.

Now, for my next question(s).

I'm going to redirect human visitors to a site not located on my server, and make the pages in the root directory of each site the ones I optimize for (and serve to) Google. So, when a spider comes calling: if it's Google, the spider will simply get the page I submitted; if it's another engine, it will get the appropriate engine-specific page. The question revolves around where those other engine-specific files are located and how the robots.txt file should be used in conjunction with them.

The question: Can I 1) create directories within the root, e.g. named "av," "ink," "fast," etc., 2) give the files in each sub-directory the same names as the page files in the root directory, and 3) have the cloaking script grab the appropriate page from the appropriate sub-directory, given that the page submitted to the engine was the root-directory page with the same file name?

whew...hope that made sense.

Further, because I want the engine-specific pages to be in sub-directories and, presumably, be picked up by engines only when the engines come to the submitted (root directory) page, should I place in my robots.txt file a command for engines to totally ignore all these engine-specific sub-directories (so that they'll always pick up www.site.com/keyword1.htm instead of www.site.com/fast/kwl.htm), and so forth?

If I do this, will the engines still be able to access those pages in the sub-directories, since each engine actually thinks it's requesting the root-directory page URL? Or will I be shooting myself in the foot by keeping them out of the sub-directories with the robots.txt file?

One more question to add: since links to other pages are important for spidering, themes, etc., I'm assuming that if everything I noted above is on the mark, relative links should be used between pages, so that spiders would see links between root-directory pages even though the links are technically between sub-directory pages. Yes?

OK...my rambling has come to an end. The first task for the experts is to decode it...Thanks in advance.

~Canton

 

WebGuerrilla
msg:675437 - 9:43 pm on Nov 2, 2001 (gmt 0)

> First off, I'd like to thank Air and WebGuerrilla for answering previous cloaking questions...very helpful folks.

You are very welcome. :)

> Can I 1) create directories within the root, e.g. named "av," "ink," "fast," etc.,

I'm not sure I completely understand the redirection part, but as far as the specific engines on your secondary server go, you would simply use the main files in the root directory to trigger the script, which grabs the appropriate page from wherever it sits. Rather than having an actual Google page in your root directory and sub-directories for all the others, you would have a sub-directory for each engine.

The files in the root directory would actually be your script or some type of SSI call to trigger the script.

Example:

yourdomain.com/widget.html would actually not contain any content. It would either be your actual CGI script with an .html extension, or it would be a blank file with an SSI call that launches the script. (Most cloaking scripts are set up one of these two ways.)
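For the SSI variant, the "blank" file would hold nothing but a directive along these lines. Apache-style SSI is assumed here, and the script path is purely illustrative:

```
<!--#include virtual="/cgi-bin/cloak.cgi" -->
```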

If Google were to request that page, the script would retrieve the appropriate page out of the Google folder. If it were AltaVista, it would grab the page from the AltaVista folder, and so on.
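A minimal sketch of such a dispatcher in Python, run as the CGI script behind yourdomain.com/widget.html. Everything in it is an assumption for illustration: the directory names, the user-agent markers, and the user-agent-only check (serious cloaking scripts of this era matched spider IP lists, not just user agents):

```python
#!/usr/bin/env python3
# Hypothetical cloaking dispatcher (illustrative sketch only).
# Served as e.g. /widget.html, it picks a copy of widget.html from a
# per-engine sub-directory based on who is asking.
import os
import sys

# Map a user-agent substring to the sub-directory holding that
# engine's pages (names are assumptions, mirroring the thread).
ENGINE_DIRS = {
    "googlebot": "google",
    "scooter": "av",    # AltaVista's crawler
    "slurp": "ink",     # Inktomi's crawler
    "fast": "fast",
}
DEFAULT_DIR = "human"   # copy served to ordinary visitors

def pick_directory(user_agent: str) -> str:
    ua = user_agent.lower()
    for marker, directory in ENGINE_DIRS.items():
        if marker in ua:
            return directory
    return DEFAULT_DIR

def main() -> None:
    user_agent = os.environ.get("HTTP_USER_AGENT", "")
    # SCRIPT_NAME is e.g. "/widget.html"; the same file name is reused
    # inside whichever sub-directory is chosen.
    page = os.path.basename(os.environ.get("SCRIPT_NAME", "widget.html"))
    path = os.path.join(pick_directory(user_agent), page)

    sys.stdout.write("Content-Type: text/html\r\n\r\n")
    with open(path) as f:
        sys.stdout.write(f.read())

if __name__ == "__main__":
    main()
```

The point of the layout is that every requester, spider or human, sees only yourdomain.com/widget.html; the per-engine sub-directories exist solely on the server side.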

> because I want the engine-specific pages to be in sub-directories and, presumably, be picked up by engines only when the engines come to the submitted (root directory) page, should I place in my robots.txt file a command for engines to totally ignore all these engine-specific sub-directories (so that they'll always pick up www.site.com/keyword1.htm instead of www.site.com/fast/kwl.htm), and so forth?

I would strongly recommend that you not list the folders containing engine-only pages in your robots.txt file. Spiders are not the only ones who look at robots.txt files! Listing the directories there would expose your SE-only content to anyone who cared to fetch the file.
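To make the warning concrete, this is the kind of robots.txt entry to avoid. The directory names are the hypothetical ones from earlier in the thread, and the file amounts to a public map of the cloaking layout:

```
# Anti-pattern -- do NOT do this. Anyone can fetch /robots.txt
# and read exactly where the engine-only pages live.
User-agent: *
Disallow: /google/
Disallow: /av/
Disallow: /ink/
Disallow: /fast/
```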

If you set it up properly (i.e. don't goof and put a link on one of your pages to the engine sub-directories), search engines will never see them. Regardless of where the content came from, it will always be presented to whoever requested it at yourdomain.com/widget.html.

> since links to other pages are important for spidering, themes, etc., I'm assuming that if everything I noted above is on the mark, relative links should be used between pages, so that spiders would see links between root-directory pages even though the links are technically between sub-directory pages. Yes?

Yes.

All the hyperlinks on the pages you deliver need to point to the actual location of your individual scripts or switch files, not the actual sub-directory location! You need to be very careful about this, especially if you are using any of the popular WYSIWYG editors. It can be very easy to make a mistake and include a link to a page within your engine directories. That will lead to your search-engine-only pages getting listed in the SERPs.

Canton
msg:675438 - 3:19 pm on Nov 3, 2001 (gmt 0)

WebGuerrilla...again, thanks for your help. One point I would like to clarify:

You note: All the hyperlinks on the pages you deliver need to point to the actual location of your individual scripts or switch files, not the actual subdirectory location!

So basically, the links on my page located at www.mysite.com/google/keyword1.htm should be ABSOLUTE links to the pages in the root directory that correspond with the engine-specific pages. E.g., instead of making the link on that page relative (href=keyword2.htm), which would allow the spider to pick up pages in my engine folders, I should make the link href=www.mysite.com/keyword2.htm.

Did I understand you correctly, and repeat it correctly here?

It would seem, though, that since the engine "THINKS" it's viewing root-directory pages all the time, relative links on the engine-specific (sub-directory) pages would also lead it to "believe" that the relative link was, in fact, relative to the root-directory page rather than the sub-directory page. Yes?

Again, thanks for your time.

~Canton

Air
msg:675439 - 4:29 pm on Nov 3, 2001 (gmt 0)

The links can be relative or absolute and will be followed correctly. I think what WebG was cautioning against is absolute links pointing to the actual sub-directory holding the search engine and regular-visitor pages, and the fact that some WYSIWYG editors (if you use one) will build absolute links out of relative links that point to your current working directory. Obviously you want to guard against that, or you will end up with absolute links to your optimized pages (sub-directory and all) being indexed and found by everyone.
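Air's point about resolution is easy to check: relative links resolve against the URL the spider requested, not against wherever the file happens to be stored on the server. A quick sketch with Python's standard library, using the hypothetical URLs from this thread:

```python
from urllib.parse import urljoin

# The spider requested the page at the root URL, so a relative link
# resolves against the root -- the /google/ storage location never
# enters the picture.
requested = "http://www.mysite.com/widget.html"
print(urljoin(requested, "keyword2.htm"))
# -> http://www.mysite.com/keyword2.htm

# An absolute link hard-coded to the sub-directory, by contrast,
# would hand the engine-only copy to everyone:
print(urljoin(requested, "/google/keyword2.htm"))
# -> http://www.mysite.com/google/keyword2.htm
```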

