Is mod_rewrite safe?


roots

10:37 am on Aug 6, 2003 (gmt 0)

10+ Year Member



Hi...

Is mod_rewrite based on IP and user agent a safe way to cloak a site?

wkitty42

4:47 pm on Aug 6, 2003 (gmt 0)

10+ Year Member



how else would you do it?

roots

12:39 pm on Aug 7, 2003 (gmt 0)

10+ Year Member



With PHP or with text "behind" SWF...
But that doesn't matter because if I'm going to cloak, I will use mod_rewrite.

I just want to know: is there a way a bot can find out what's going on?

wkitty42

3:35 pm on Aug 7, 2003 (gmt 0)

10+ Year Member



a bot? no... a human editor doing a follow up? yes...

mod_rewrite is an "internal to the server" switcheroo...

roots

9:08 am on Aug 8, 2003 (gmt 0)

10+ Year Member



But what if Google sends a bot from another IP with a different agent name?

Has anybody noticed such activity?

Are there any other Google IPs besides these:

User agent: googlebot-Image/1.0
IP Address: 64.68.84.

User agent: googlebot/2.1
IP Address: 216.239.46. - Deepcrawler
IP Address: 64.68.82. - Freshbot

roots

10:02 am on Aug 8, 2003 (gmt 0)

10+ Year Member



I found a whole list at arin.net:

GOOGLE (GOOGLE)
Google Inc. (GOGL)
Google Inc. (ZG39-ARIN) arin-contact@google.com +1-650-318-0200
Google Inc. (AS15169) GOOGLE 15169
Google Inc. (AS15169) GOOGLE 15169
GOOGLE CW-204-188-0-B (NET-204-188-0-0-2) 204.188.0.0 - 204.188.0.255
Google Inc. GOOGLE (NET-216-239-32-0-1) 216.239.32.0 - 216.239.63.255
Google Inc. GOOGLE (NET-216-239-32-0-1) 216.239.32.0 - 216.239.63.255
Google Inc. EC12-1-GOOGLE (NET-64-68-80-0-1) 64.68.80.0 - 64.68.87.255
Google Inc. GOOGLE-2 (NET-66-102-0-0-1) 66.102.0.0 - 66.102.15.255
GOOGLE MFN-C177-64-124-78-16-29 (NET-64-124-78-16-1) 64.124.78.16 - 64.124.78.23
GOOGLE MFN-C177-64-124-78-24-29 (NET-64-124-78-24-1) 64.124.78.24 - 64.124.78.31
GOOGLE MFN-C177-64-124-78-80-29 (NET-64-124-78-80-1) 64.124.78.80 - 64.124.78.87
Google SBC068076250040030610 (NET-68-76-250-40-1) 68.76.250.40 - 68.76.250.47
Google Inc SBC067126100008030728 (NET-67-126-100-8-1) 67.126.100.8 - 67.126.100.15
Google Inc SBC068122234224030731 (NET-68-122-234-224-1) 68.122.234.224 - 68.122.234.231
Google Inc SBC068122234232030731 (NET-68-122-234-232-1) 68.122.234.232 - 68.122.234.239
Google Inc. MFN-T324-216-200-251-112-29 (NET-216-200-251-112-1) 216.200.251.112 - 216.200.251.119

ukgimp

10:20 am on Aug 8, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>>safe way to cloak site?

There is not really such a thing. Without seeming rude, if you have to ask then it is very dangerous for you and I would not start to play that game until you know it inside out.

roots

10:54 am on Aug 8, 2003 (gmt 0)

10+ Year Member



I know that there is no 100% safe cloak.
I just want to use the safest method for cloaking, and I'm asking for your opinions.

The plan is to use mod_rewrite plus a script that checks [ws.arin.net...] for IP changes every day.

RewriteEngine on

# [NC] because the real user agent string is "Googlebot", with a capital G
RewriteCond %{HTTP_USER_AGENT} google [NC,OR]
RewriteCond %{REMOTE_ADDR} ^204.188.0.0$|^204.188.0.255$ [OR]
RewriteCond %{REMOTE_ADDR} ^216.239.32.0$|^216.239.63.255$ [OR]
RewriteCond %{REMOTE_ADDR} ^64.68.80.0$|^64.68.87.255$ [OR]
RewriteCond %{REMOTE_ADDR} ^66.102.0.0$|^66.102.15.255$ [OR]
RewriteCond %{REMOTE_ADDR} ^64.124.78.16$|^64.124.78.23$ [OR]
RewriteCond %{REMOTE_ADDR} ^64.124.78.24$|^64.124.78.31$ [OR]
RewriteCond %{REMOTE_ADDR} ^64.124.78.80$|^64.124.78.87$ [OR]
RewriteCond %{REMOTE_ADDR} ^68.76.250.40$|^68.76.250.47$ [OR]
RewriteCond %{REMOTE_ADDR} ^67.126.100.8$|^67.126.100.15$ [OR]
RewriteCond %{REMOTE_ADDR} ^68.122.234.224$|^68.122.234.231$ [OR]
RewriteCond %{REMOTE_ADDR} ^68.122.234.232$|^68.122.234.239$ [OR]
RewriteCond %{REMOTE_ADDR} ^216.200.251.112$|^216.200.251.119$

RewriteRule ^index\.php$ index.php?google=here [L]

claus

11:42 am on Aug 8, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>> play that game

- or you would want to start by placing very limited bets - don't raise the stakes above what you're willing to lose

Mod_rewrite is a pretty strong card played the right way but the SE's always hold the trump card and your competitors tend to be on their side, no doubt about that. If it's visible to a human eye someone will see it sooner or later - then nothing, something, or anything in-between will happen.

Cloaking is not always "a bad thing", it depends much on when, how, why, where, who, etc. I think some thread in here defines the term as "showing something to someone and something else to others" or the like - that's pretty much a standard procedure for all kinds of personalization, i.e. the way the web works.

If you choose to let some random SE see URLs without session IDs, that will also be cloaking by the above definition. Although it's in their own best interest, there's no guarantee that they will not decide to penalize you anyway (this decision is only theirs), but there's a whole minefield between this example and "real SE spam" (doorway pages etc.) and you should expect to lose some domains on the way if you plan to be very agg... competitive.

At least that's what i've heard out there... so the story goes, that is.

/claus


added: someone just saw your post and reminded me that i forgot to mention the hmm... whatsit... "no something".. "archive"... hmm... "cache"? -thingy meta-word... don't know what it means though.


added2:

I do know a little about those rewrite cond's though - it seems you're not doing what you intend to here (actually all the lines, this is just an example):

RewriteCond %{REMOTE_ADDR} ^216.239.32.0$|^216.239.63.255$ [OR]

What you search for here are these two IP's (one or the other):

(a) 216.239.32.0, and
(b) 216.239.63.255

- if you look at the ARIN text it says something else:

Google Inc. GOOGLE (NET-216-239-32-0-1) 216.239.32.0 - 216.239.63.255

That is all numbers from (a) to (b), not just those two numbers, so you'll miss a lot of them. To catch all possible IP's from the above ARIN line you will have to do something like this:

RewriteCond %{REMOTE_ADDR} "^216\.239\.(3[2-9]|[4-5][0-9]|6[0-3])\." [OR]

It seems like you have to write a pretty complicated script if this kind of syntax should be generated on the fly.
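
To make the difference concrete, here is a minimal sketch that runs both patterns over every address in that ARIN range. Python, and the names `count_matches`, `endpoints_only` and `whole_range`, are assumptions for illustration only; nothing in the thread uses them. Python's `re` is not the regex engine Apache uses, but these simple patterns should behave the same way in both.

```python
import ipaddress
import re

# Range taken from the ARIN line quoted above.
FIRST = ipaddress.IPv4Address("216.239.32.0")
LAST = ipaddress.IPv4Address("216.239.63.255")

# The original condition: only the two endpoint addresses.
endpoints_only = re.compile(r"^216.239.32.0$|^216.239.63.255$")
# The corrected condition: third octet anywhere from 32 to 63.
whole_range = re.compile(r"^216\.239\.(3[2-9]|[4-5][0-9]|6[0-3])\.")


def count_matches(pattern):
    """Count how many addresses in FIRST..LAST the pattern matches."""
    hits = 0
    for n in range(int(FIRST), int(LAST) + 1):
        if pattern.search(str(ipaddress.IPv4Address(n))):
            hits += 1
    return hits


total = int(LAST) - int(FIRST) + 1
print("addresses in range:", total)                             # 8192
print("matched by endpoint pattern:", count_matches(endpoints_only))  # 2
print("matched by range pattern:", count_matches(whole_range))        # 8192
```

The endpoint pattern catches 2 of the 8192 addresses; the range pattern catches all of them and rejects the neighbours on either side (216.239.31.255 and 216.239.64.0).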

roots

10:52 am on Aug 11, 2003 (gmt 0)

10+ Year Member



Thanks a lot for the help...

As you can see, I used a wrong example of an IP range from this forum.

I have written new conditions based on your example:

#RewriteCond %{REMOTE_ADDR} ^64.68.80.0$|^64.68.87.255$ [OR]
RewriteCond %{REMOTE_ADDR} ^64\.68\.8[0-7]\. [OR]

#RewriteCond %{REMOTE_ADDR} ^64.124.78.16$|^64.124.78.23$ [OR]
RewriteCond %{REMOTE_ADDR} ^64\.124\.78\.(1[6-9]|2[0-3])$ [OR]

#RewriteCond %{REMOTE_ADDR} ^64.124.78.24$|^64.124.78.31$ [OR]
RewriteCond %{REMOTE_ADDR} ^64\.124\.78\.(2[4-9]|3[0-1])$ [OR]

#RewriteCond %{REMOTE_ADDR} ^64.124.78.80$|^64.124.78.87$ [OR]
RewriteCond %{REMOTE_ADDR} ^64\.124\.78\.8[0-7]$ [OR]

#RewriteCond %{REMOTE_ADDR} ^66.102.0.0$|^66.102.15.255$ [OR]
RewriteCond %{REMOTE_ADDR} ^66\.102\.([0-9]|1[0-5])\. [OR]

#RewriteCond %{REMOTE_ADDR} ^67.126.100.8$|^67.126.100.15$ [OR]
RewriteCond %{REMOTE_ADDR} ^67\.126\.100\.([8-9]|1[0-5])$ [OR]

#RewriteCond %{REMOTE_ADDR} ^68.76.250.40$|^68.76.250.47$ [OR]
RewriteCond %{REMOTE_ADDR} ^68\.76\.250\.4[0-7]$ [OR]

#RewriteCond %{REMOTE_ADDR} ^68.122.234.224$|^68.122.234.231$ [OR]
#RewriteCond %{REMOTE_ADDR} ^68.122.234.232$|^68.122.234.239$ [OR]
RewriteCond %{REMOTE_ADDR} ^68\.122\.234\.(22[4-9]|23[0-9])$ [OR]

#RewriteCond %{REMOTE_ADDR} ^204.188.0.0$|^204.188.0.255$ [OR]
RewriteCond %{REMOTE_ADDR} ^204\.188\.0\. [OR]

#RewriteCond %{REMOTE_ADDR} ^216.200.251.112$|^216.200.251.119$ [OR]
RewriteCond %{REMOTE_ADDR} ^216\.200\.251\.11[2-9]$ [OR]

#RewriteCond %{REMOTE_ADDR} ^216.239.32.0$|^216.239.63.255$
RewriteCond %{REMOTE_ADDR} ^216\.239\.(3[2-9]|[4-5][0-9]|6[0-3])\.

Now I have a problem with the URI conditions...
RewriteCond %{REQUEST_URI} /index\.php$ [NC]
RewriteCond %{REQUEST_URI} ^/$ [NC]

because I need something like:
IF (uri1 OR uri2) THEN IF (addr1 OR addr2 ...)

BTW, for now I'll write the conditions manually. But if the ranges change frequently, I'll have to spend a few days developing a script.
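
For what it's worth, such a script may be smaller than it sounds. The following is only a sketch in Python, and the function names `octet_range_regex` and `rewrite_cond` are hypothetical. It assumes every range either covers whole /24 blocks (last octet 0 to 255) or stays inside a single /24, which happens to hold for all the ARIN ranges listed in this thread; anything else raises an error rather than guessing.

```python
import ipaddress


def octet_range_regex(lo, hi):
    """Return a regex fragment matching every decimal number in lo..hi."""
    if not 0 <= lo <= hi <= 255:
        raise ValueError("expected 0 <= lo <= hi <= 255")
    groups = {}  # leading digits -> last digits seen, in ascending order
    for value in range(lo, hi + 1):
        text = str(value)
        groups.setdefault(text[:-1], []).append(text[-1])
    parts = []
    for prefix, digits in groups.items():
        if len(digits) == 1:
            parts.append(prefix + digits[0])
        else:
            # Last digits are contiguous inside one group, so a class works.
            parts.append("%s[%s-%s]" % (prefix, digits[0], digits[-1]))
    return parts[0] if len(parts) == 1 else "(" + "|".join(parts) + ")"


def rewrite_cond(first, last):
    """Build one RewriteCond line (without flags) for the range first..last."""
    a = ipaddress.IPv4Address(first).packed  # four bytes, one per octet
    b = ipaddress.IPv4Address(last).packed
    if a[:2] != b[:2]:
        raise ValueError("first two octets must be equal in this sketch")
    head = r"^%d\.%d\." % (a[0], a[1])
    if a[3] == 0 and b[3] == 255:
        # Whole /24 blocks: vary the third octet, leave the fourth open.
        pattern = head + octet_range_regex(a[2], b[2]) + r"\."
    elif a[2] == b[2]:
        # Inside one /24: vary the fourth octet and anchor the end.
        pattern = head + str(a[2]) + r"\." + octet_range_regex(a[3], b[3]) + "$"
    else:
        raise ValueError("range shape not handled by this sketch")
    return "RewriteCond %{REMOTE_ADDR} " + pattern


print(rewrite_cond("64.68.80.0", "64.68.87.255"))
print(rewrite_cond("64.124.78.16", "64.124.78.23"))
```

The two calls print the same conditions as the hand-written ones for 64.68.80.0 - 64.68.87.255 and 64.124.78.16 - 64.124.78.23. The output is not always the shortest possible pattern (32 to 63 comes out as `(3[2-9]|4[0-9]|5[0-9]|6[0-3])`), but it matches the same addresses.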

volatilegx

6:10 pm on Aug 11, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Cloaking via IP based detection is about as safe as you can get, but Mod_Rewrite probably isn't the best way to cloak.

You might want to build in some additional protections, such as using the NOARCHIVE meta tag on the optimized pages, and also making it so translators don't see the optimized version.

roots

7:48 am on Aug 12, 2003 (gmt 0)

10+ Year Member



> Mod_Rewrite probably isn't the best way to cloak

Do you mean mod_rewrite alone, or is there something better?

> using the NOARCHIVE meta tag on the optimized pages

I found out that this meta is a disaster.
[searchengineworld.com...]

> making it so translators don't see the optimized version

What do you mean by translators?

What's the reason for hiding the cached page on Google?

volatilegx

5:02 pm on Aug 12, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>> Mod_Rewrite probably isn't the best way to cloak
>Do you mean mod_rewrite alone, or is there something better?

I mean Mod_Rewrite alone. IMHO, it is too limiting. There are commercial cloaking programs out there which offer a lot more features.

> What do you mean by translators?

Like Alta Vista's Babelfish translator, or the Google toolbar's translator. Many of these translators pull the text from a website with an IP address similar to (or the same as) a spider's IP. Commercial cloaking software can differentiate translators from spiders.

> What's the reason for hiding the cached page on Google?

So people wanting to steal your optimized text can't see it in the Google cache.

The NOARCHIVE meta tag is still somewhat controversial and whether or not to use it is just a judgement call.

claus

2:17 am on Aug 13, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>> noarchive, translators

... and the alexa internet archiver should be disabled as well as the SE caches ...in fact all sorts of things that could possibly show your optimized pages to someone who shouldn't see them - or the opposite.

One thing is making sure the right pages go to the right people, the next important thing is privacy... to make sure that the content you have carefully developed for a select few does not reach anyone but these people. And integrity... that these people will not by accident see content intended for others / not intended for them.

>> Now I have problem with URI conditions...
>> IF, THEN

These should not be the rewrite conditions, you should make them the rewrite rules instead. Or perhaps you need more than one rewrite block, it all depends on the problem at hand.
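
A rough sketch of that layout, written as a small Python helper that prints the rule block (the helper name `build_rewrite_block` is made up for illustration). The URI test sits in the RewriteRule pattern and the address tests stay in the [OR]'ed conditions; since conditions are only checked when the rule pattern matches, that gives the "IF (uri1 OR uri2) THEN IF (addr1 OR addr2 ...)" shape. The pattern `^(index\.php)?$` assumes the rules live in an .htaccess file, where the leading slash is stripped before matching; in the main server config the path would start with "/".

```python
import re

# Assumed per-directory pattern: matches "" (the bare directory) or "index.php".
URI_PATTERN = r"^(index\.php)?$"


def build_rewrite_block(addr_patterns, uri_pattern, target):
    """Return RewriteCond lines joined by [OR], followed by one RewriteRule."""
    lines = ["RewriteEngine on"]
    for i, pattern in enumerate(addr_patterns):
        # Every condition except the last carries [OR].
        flag = " [OR]" if i < len(addr_patterns) - 1 else ""
        lines.append("RewriteCond %{REMOTE_ADDR} " + pattern + flag)
    lines.append("RewriteRule " + uri_pattern + " " + target + " [L]")
    return "\n".join(lines)


# Quick check of which request paths the rule pattern accepts.
for path in ("", "index.php", "other.php"):
    print(repr(path), bool(re.search(URI_PATTERN, path)))

print(build_rewrite_block(
    [r"^64\.68\.8[0-7]\.", r"^216\.239\.(3[2-9]|[4-5][0-9]|6[0-3])\."],
    URI_PATTERN,
    "index.php?google=here",
))
```

Only two of the address patterns from the earlier post are passed in here to keep the output short; the full list would go in the same way.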

>> software

There is always more than one way of doing things. You could also develop/use server-side scripts to customize content (that's the PHP from post #3 - it's really just another flavour of software). Anyway, you specifically asked for some advice on mod_rewrite and you can do it with mod_rewrite, but it's more labour than doing it script-based with pre-developed software that you don't have to write yourself. Plus you would want to take precautions, i.e. no archive and such stuff.

OTOH i would be a bit careful when selecting such software - you invariably lose some element of control with each step you do not do on your own. Anyway, i think i've heard something about a whole category on the dmoz/ODP/whatever with such stuff.

/claus