Linkapediabot

         

aristotle

7:15 pm on Aug 23, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I don't remember seeing this one before. A web search shows a few results for an old bot with the same name, but I doubt it's the same one.
Host: 54.147.52.254 
/
Http Code: 200 Date: Aug 23 14:33:52 Http Version: HTTP/1.0 Size in Bytes: 25446
Referer: -
Agent: linkapediabot (+http://www.linkapedia.com)

The IP belongs to AWS, and the request was made with HTTP/1.0, which is unusual.
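
For anyone who wants to pick such visits out of their own logs, here is a minimal sketch that parses a record laid out like the one above and lists the things that stand out. The field layout is copied from this one entry; the names `parse_record` and `oddities` and the choice of what counts as odd are assumptions for illustration, not anything the log software provides.

```python
import re

# The record quoted above, line for line.
RECORD = """Host: 54.147.52.254
/
Http Code: 200 Date: Aug 23 14:33:52 Http Version: HTTP/1.0 Size in Bytes: 25446
Referer: -
Agent: linkapediabot (+http://www.linkapedia.com)"""

# Pattern for the third line; assumes the labels appear exactly as shown.
STATUS_LINE = re.compile(
    r"Http Code: (\d+) Date: (.+?) Http Version: (\S+) Size in Bytes: (\d+)"
)


def parse_record(text):
    """Turn one five-line log record into a dict of fields."""
    lines = [ln.strip() for ln in text.strip().splitlines()]
    match = STATUS_LINE.search(lines[2])
    if match is None:
        raise ValueError("unexpected status line: " + lines[2])
    return {
        "host": lines[0].split(":", 1)[1].strip(),
        "path": lines[1],
        "status": int(match.group(1)),
        "date": match.group(2),
        "http_version": match.group(3),
        "size": int(match.group(4)),
        "referer": lines[3].split(":", 1)[1].strip(),
        "agent": lines[4].split(":", 1)[1].strip(),
    }


def oddities(fields):
    """List the traits of a request that might merit a closer look."""
    notes = []
    if fields["http_version"] == "HTTP/1.0":
        notes.append("HTTP/1.0 request")
    if fields["referer"] == "-":
        notes.append("no referer")
    if "bot" in fields["agent"].lower():
        notes.append("self-declared bot")
    return notes


if __name__ == "__main__":
    fields = parse_record(RECORD)
    print(fields["host"], fields["agent"], oddities(fields))
```

None of these traits proves anything on its own; the sketch only gathers them in one place so similar visits are easier to spot.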

There's hardly any information on the website. The "about us" page seems to be a joke:

About Us
Mike leads our development team located in Colombia, South America. [goofy picture]
Kate is our Marketing Manager with a decade of startup experience in growth hacking. [goofy picture]
Simon is our designer. Our CEO, Dai is a Zen monk. [goofy picture]

Doesn't look trustworthy to me.

lucy24

8:58 pm on Aug 23, 2015 (gmt 0)


goofy picture

Do you know, I had to go check out the site just to see whether "goofy picture" was their literal text* or your synopsis.

I am always annoyed by sites that show a perfectly blank screen in Camino. Just what are they doing that's so fancy and sophisticated, only the most cutting-edge of browsers can display it? And if that is the case, how 'bout a "sorry, your browser is too old"** message? A plain white screen is a pretty good sign of, um, Technical Ineptitude or whatever it is that Google calls it.


* Happens a lot, or the "daily wtf" site would be out of business ;)
** Not to be confused with the oldbrowser message you get on sites that are too dumb to understand that your Safari version is directly linked to your OS version, so they should be looking for a mismatch rather than raw numbers.

keyplyr

1:49 am on Aug 24, 2015 (gmt 0)


Our CEO, Dai is a Zen monk

That may explain the blank screen.

topr8

1:22 pm on Oct 20, 2015 (gmt 0)


it's been showing up a fair bit recently, although it can only get robots.txt anyway for various reasons ... it seems to just be ripping all your information for its own pages and then giving you a little reference link at the bottom. tediously slow to work, i assume the site is pre beta, although it is live.
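
The post does not say how the bot ends up limited to robots.txt, so the following is only a sketch of one way such a policy could look: robots.txt is served to everyone, and every other path is refused when the user-agent contains a blocked token. The function name `status_for`, the token list, and the 403 response are all assumptions.

```python
# Hypothetical block list; the token is taken from the Agent string
# quoted earlier in this thread.
BLOCKED_AGENT_TOKENS = ("linkapediabot",)


def status_for(path, agent):
    """Return the HTTP status this illustrative policy would send."""
    # Everyone may read robots.txt, so a bot can at least see the rules.
    if path == "/robots.txt":
        return 200
    # Anything else is refused when the agent matches a blocked token.
    lowered = agent.lower()
    if any(token in lowered for token in BLOCKED_AGENT_TOKENS):
        return 403
    return 200


if __name__ == "__main__":
    bot = "linkapediabot (+http://www.linkapedia.com)"
    print(status_for("/robots.txt", bot))  # robots.txt stays readable
    print(status_for("/", bot))            # every other path is refused
```

In practice a rule like this would normally live in the server configuration rather than in application code; the function only spells out the decision.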

keyplyr

3:22 am on Oct 27, 2015 (gmt 0)


scraper

lucy24

5:24 am on Oct 27, 2015 (gmt 0)


Huh. Poring over raw logs I found one isolated visit from earlier this year. Or should I call it two visits? robots.txt, followed by a lone interior page ... eight hours later. Both from 54.162.224.26, so 80,000 guesses what response they got on the html.

I remain stumped about that one page, since it's of very narrow appeal and I can't think of anywhere it would be linked from. (Anywhere outside, I mean. If GWT/GSC knows anything, they're not saying.) The only distinguishing feature of the page is that it's got a couple hundred links* to pages on a third-party site ... but scrapers certainly don't need my help locating that other site's files.

As I said: Huh.


* I cross-checked my HTML. There appear to be 357 total ... which is a mystery of its own, because the count is supposed to be 319. I must have listed a lot of things twice.
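
A cross-check like this can be scripted. The sketch below counts every `href` on a page and reports which URLs appear more than once, which would show whether duplicates account for the gap between the two counts. The names `LinkCollector` and `link_report` and the sample markup are made up for illustration; only the standard library is used.

```python
from collections import Counter
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, duplicates included."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                self.hrefs.append(value)


def link_report(html):
    """Return (total links, distinct links, {url: count} for repeats)."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    counts = Counter(parser.hrefs)
    repeats = {url: n for url, n in counts.items() if n > 1}
    return len(parser.hrefs), len(counts), repeats


if __name__ == "__main__":
    # Made-up sample markup, not the page discussed above.
    sample = (
        '<a href="http://example.org/a">A</a>'
        '<a href="http://example.org/b">B</a>'
        '<a href="http://example.org/a">A again</a>'
    )
    print(link_report(sample))
```

If the distinct count matches the expected figure, the surplus is simply items listed twice.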

keyplyr

8:11 am on Oct 27, 2015 (gmt 0)


I remain stumped about that one page, since it's of very narrow appeal and I can't think of anywhere it would be linked from

Bots get their URLs in many different ways, and not all of them are linear... meaning not all bots are crawlers/spiders following links posted somewhere on the internet. Some are vertical... getting addresses from a list or dump. Some URLs are gathered at data centers for different reasons: security checking, data mining & now of course social recursive and its implications for marketing research.

I have pages where the content would not be logical if the user did not come from the preceding page but I often see direct requests for that page alone.

Moral of the story - a bot doesn't care. A bot just GETs. I for one like that and am not a big fan of artificial intelligence. I still remember what happened with Kubrick's H.A.L.