Alexa impersonator?

Is someone faking an Alexa spider?

         

Ralph_Slate

4:31 pm on May 29, 2002 (gmt 0)

10+ Year Member Top Contributors Of The Month



I have a program on my site that blocks spiders/site suckers, except for those belonging to a search engine. I have extra verification by IP, since most search engines spider from a fixed set of IPs. If I temporarily block one, it's easy enough to spot and correct because they typically identify themselves in the HTTP_USER_AGENT field.
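
For readers who want to try the same idea, here is a minimal sketch of a user-agent plus IP check, not the actual program described above. The names `KNOWN_SPIDERS` and `classify_request` are hypothetical, and the network in the table is a documentation-only placeholder range, not a real Alexa range; you would fill it in from your own logs.

```python
import ipaddress

# Hypothetical allowlist: user-agent token -> networks that spider is expected to use.
# 192.0.2.0/24 is a reserved documentation range, used here only as a placeholder.
KNOWN_SPIDERS = {
    "ia_archiver": [ipaddress.ip_network("192.0.2.0/24")],
}


def classify_request(user_agent, remote_addr, known=KNOWN_SPIDERS):
    """Return 'allow', 'suspect', or 'unknown' for one request.

    'allow'   : the user agent names a known spider and the IP is in its networks
    'suspect' : the user agent names a known spider but the IP is not in its networks
    'unknown' : the user agent matches no known spider
    """
    addr = ipaddress.ip_address(remote_addr)  # raises ValueError on a malformed IP
    ua = user_agent.lower()
    for token, networks in known.items():
        if token in ua:
            if any(addr in net for net in networks):
                return "allow"
            return "suspect"
    return "unknown"
```

A "suspect" result is the case worth logging and checking by hand: the visitor claims to be a known spider but comes from an address you have not seen that spider use.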

I am seeing a lot of activity from a spider identifying itself as ia_archiver. Normally that's Alexa Internet's spider, and its IP usually resolves to alexa.com. This IP doesn't resolve.
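
The "does the IP resolve to alexa.com" test can be automated roughly like this. It is a sketch under assumptions: the function names are made up for illustration, and the forward-confirmation step (checking that the hostname maps back to the same IP) is an extra precaution not mentioned above, added because a reverse DNS record on its own is set by whoever controls the IP block.

```python
import socket


def reverse_dns(ip, lookup=socket.gethostbyaddr):
    """Return the PTR hostname for ip, or None if it does not resolve."""
    try:
        return lookup(ip)[0]
    except OSError:  # socket.herror and socket.gaierror are subclasses of OSError
        return None


def forward_ips(hostname, lookup=socket.gethostbyname_ex):
    """Return the IPv4 addresses hostname resolves to, or [] on failure."""
    try:
        return lookup(hostname)[2]
    except OSError:
        return []


def is_verified_spider(ip, expected_domain="alexa.com",
                       rev=reverse_dns, fwd=forward_ips):
    """True only if ip reverse-resolves into expected_domain AND that
    hostname resolves forward to the same ip again."""
    host = rev(ip)
    if host is None:
        return False  # the case in this thread: the IP does not resolve
    host = host.rstrip(".").lower()
    if host != expected_domain and not host.endswith("." + expected_domain):
        return False  # resolves, but to some other domain
    return ip in fwd(host)
```

The `rev` and `fwd` parameters exist only so the logic can be exercised without live DNS; in normal use you would call `is_verified_spider(ip)` with the defaults. An IP that fails this check is not proven to be an impostor, it just cannot be confirmed through DNS.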

When I run a traceroute on that IP with VisualRoute, I get this:

¦ Hop ¦ %Loss ¦ IP Address ¦ Node Name ¦ Location ¦ Tzone ¦ ms ¦ Graph ¦ Network ¦
¦ 0 ¦ ¦ 161.58.180.113 ¦ win10115.iad.dn.net ¦ Dulles, VA, USA ¦ -05:00 ¦ ¦ ¦ Verio, Inc. ¦
¦ 1 ¦ ¦ 161.58.176.129 ¦ - ¦ ?Englewood, CO 80112 ¦ ¦ 0 ¦ x ¦ Verio, Inc. ¦
¦ 2 ¦ ¦ 161.58.156.140 ¦ - ¦ ?Englewood, CO 80112 ¦ ¦ 0 ¦ x ¦ Verio, Inc. ¦
¦ 3 ¦ ¦ 129.250.27.215 ¦ ge-1-3-0.r02.stngva01.us.bb.verio.net ¦ Sterling, VA, USA ¦ -05:00 ¦ 0 ¦ x ¦ Verio, Inc. ¦
¦ 4 ¦ ¦ 129.250.5.47 ¦ p16-7-0-0.r02.mclnva02.us.bb.verio.net ¦ Mclean, VA, USA ¦ -05:00 ¦ 0 ¦ x ¦ Verio, Inc. ¦
¦ 5 ¦ ¦ 129.250.5.249 ¦ p4-3-0.r00.mclnva02.us.bb.verio.net ¦ Mclean, VA, USA ¦ -05:00 ¦ 0 ¦ x ¦ Verio, Inc. ¦
¦ 6 ¦ ¦ 205.215.2.37 ¦ - ¦ ?Atlanta, GA 30303-1537 ¦ ¦ 0 ¦ x ¦ NetRail, Inc. ¦
¦ 7 ¦ ¦ 66.28.28.173 ¦ g14-1.core01.dca01.atlas.cogentco.com ¦ Washington, DC, USA ¦ -05:00 ¦ 0 ¦ x ¦ Cogent Communications ¦
¦ 8 ¦ ¦ 66.28.4.22 ¦ p15-0.core02.dca01.atlas.cogentco.com ¦ Washington, DC, USA ¦ -05:00 ¦ 0 ¦ x ¦ Cogent Communications ¦
¦ 9 ¦ ¦ 66.28.4.82 ¦ p6-0.core01.jfk02.atlas.cogentco.com ¦ New York, NY, USA ¦ -05:00 ¦ 51 ¦ --x------- ¦ Cogent Communications ¦
¦ 10 ¦ ¦ 66.28.4.14 ¦ p15-0.core02.jfk02.atlas.cogentco.com ¦ New York, NY, USA ¦ -05:00 ¦ 0 ¦ x ¦ Cogent Communications ¦
¦ 11 ¦ ¦ 66.28.4.86 ¦ p14-0.core02.ord01.atlas.cogentco.com ¦ Chicago, IL, USA ¦ -06:00 ¦ 53 ¦ -x-- ¦ Cogent Communications ¦
¦ 12 ¦ 10 ¦ 66.28.4.61 ¦ p15-0.core01.ord01.atlas.cogentco.com ¦ Chicago, IL, USA ¦ -06:00 ¦ 31 ¦ x ¦ Cogent Communications ¦
¦ 13 ¦ ¦ 66.28.4.42 ¦ p5-0.core01.sfo01.atlas.cogentco.com ¦ - ¦ ¦ 69 ¦ x- ¦ Cogent Communications ¦
¦ 14 ¦ ¦ 66.28.6.154 ¦ g49.ba01.b001865-1.sfo01.atlas.cogentco.com ¦ - ¦ ¦ 62 ¦ x- ¦ Cogent Communications ¦
¦ 15 ¦ 100 ¦ ?66.28.31.74 ¦ ?Alexa-Internet.demarc.cogentco.com ¦ ¦ ¦ ¦ ¦ Cogent Communications ¦
¦ 16 ¦ ¦ 66.28.250.174 ¦ - ¦ ?Washington, DC 20007 ¦ ¦ 107 ¦ -x ¦ Cogent Communications ¦

It seems to be registered to Cogent Communications, to a machine that may have "Alexa" in its name. Cogent seems to be a high-end, high-traffic ISP.

What do you think? Is this an Alexa imposter, or does this spider really belong to Alexa?

Ralph Slate

volatilegx

5:56 pm on May 29, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



As far as I'm concerned, ia_archiver is prime blocking material. Alexa isn't really a search engine. I'd block the IP whether or not it resolves.

<added>By the way, welcome to WMW!</added>

Ralph_Slate

12:42 am on May 30, 2002 (gmt 0)

10+ Year Member Top Contributors Of The Month



I can see how people would want to block Alexa, but I can at least see them doing something constructive with the data they grab. It's an entirely different story, however, if someone is impersonating their spider to do who-knows-what with the data.

Ralph

wilderness

3:29 am on May 30, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



<snip>but I can at least see them doing something constructive with the data they grab>

Hey Ralph,
have you ever seen the site archive.org?
Let us suppose you're an ongoing business with an active presence on the internet which you monitor and protect?

For whatever reason you change the name of your internet presence or leave the internet altogether?

What would you think happens to the previously copyrighted materials you presented on the internet?
Should this go on without your knowledge or approval?

How about if you just update a page that is no longer valid, or even replace a page which you realized might have opened your presence up to some litigation?
Did you realize that thanks to Alexa that page can likely still be viewed?

Now to the real skinny of Alexa!
The software is used to make comparisons of websites, among other things. However, the software only presents facts based on the websites the user adds? Misleading?

Of course Alexa cannot be held responsible for the actions of a user to whom the software is freely provided!
Then WHY are they providing it?
How does Alexa benefit?
It must generate revenue someplace or somehow?

In the end, the software user can do whatever they desire with what is obtained from your website, under the IP block of Alexa.

Perhaps I'm just blind and I don't see the benefit of allowing Alexa to pick my pockets ;-)

mbauser2

7:45 am on May 30, 2002 (gmt 0)

10+ Year Member



How does Alexa benefit?
It must generate revenue someplace or somehow?

Insisting that a website "must generate revenue" is possibly the single weakest argument in the history of the Internet.

Do you go around looking for the ulterior motives of libraries and historical museums, too?

Every one of your arguments could be used against a library. They've got out-of-print books! They let anybody read them! They don't ask the publisher's permission! Oh, the horror of it all!

Those damn librarians. When is somebody gonna do something about them?

wilderness

10:20 am on May 30, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



<snip>Those damn librarians. When is somebody gonna do something about them?>

Have you looked on your telephone bill lately?

NOWHERE in my statement did I say that websites must generate revenue!
I did say:

"How does Alexa benefit?
It must generate revenue someplace or somehow?"

I would hardly compare Alexa and ia_archiver with a library, especially when both enter my websites. However, even a library, depending upon its intended use of my content, wouldn't be beyond scrutiny.

Cornell University freely offers materials from the mid 1800s to 1926, which are beyond copyright.
Does that mean that Cornell doesn't generate profit, and that I should allow them to roam and gather from my website what they please?