Forum Moderators: Ocean10000 & incrediBILL

PediaSearch.com Crawler

     
10:35 pm on Jul 27, 2006 (gmt 0)

Senior Member from US 

keyplyr (WebmasterWorld Top Contributor of All Time; 10+ Year Member; Top Contributor of the Month)

joined:Sept 26, 2001
posts:5805
votes: 64


While I do get traffic from Wikipedia articles that reference my webpage content, this has to be a violation of my copyright. From what I understand, this bot follows those links, scrapes your site's content, compiles it into a book, and sells the book to the user.

85.214.51.184 - - [27/Jul/2006:10:53:39 -0400] "GET / HTTP/1.0" 200 10956 "-" "PediaSearch.com Crawler"
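The log line above is in Apache's standard "combined" format, where the user-agent is the last quoted field. A minimal sketch (assuming combined-format logs; the function name is mine) for flagging requests from this crawler:

```python
import re

# Apache "combined" log format: IP, identd, user, [time], "request",
# status, size, "referer", "user-agent".
LOG_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

def is_pediasearch(line):
    """True if a combined-format log line carries the PediaSearch user-agent."""
    m = LOG_RE.match(line)
    return bool(m) and "PediaSearch.com Crawler" in m.group("agent")

line = ('85.214.51.184 - - [27/Jul/2006:10:53:39 -0400] '
        '"GET / HTTP/1.0" 200 10956 "-" "PediaSearch.com Crawler"')
print(is_pediasearch(line))  # True
```

The same pattern also matches log lines where the first field is a resolved hostname (as in the pediapress.com entries later in this thread), since `\S+` accepts either form.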

6:13 pm on July 29, 2006 (gmt 0)

Preferred Member

10+ Year Member

joined:June 3, 2002
posts:566
votes: 0


and it does not fetch robots.txt

deny from 85.214.
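A fuller Apache sketch of that block, pairing the partial-IP deny with a user-agent match (assuming an Apache 1.3/2.0-era `.htaccess` where `Order`/`Deny` and `SetEnvIf` are allowed; the `bad_bot` variable name is mine):

```apache
# Block the 85.214.x.x range by partial IP
Order Allow,Deny
Allow from all
Deny from 85.214

# Also refuse anything identifying itself as the PediaSearch crawler
SetEnvIfNoCase User-Agent "PediaSearch" bad_bot
Deny from env=bad_bot
```

The user-agent rule catches the crawler even if it moves to a different IP range, while the IP rule catches it even if it changes its user-agent string.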

10:01 pm on July 29, 2006 (gmt 0)

Preferred Member

10+ Year Member

joined:Sept 21, 2005
posts:379
votes: 0


It did ask for robots.txt when it visited one of our sites, but whether it honours it remains to be seen; I'd never heard of it previously.

pediapress.com - - [24/Jul/2006:03:45:48 +1000] "GET /robots.txt HTTP/1.0" 200 1930 "-" "PediaSearch.com Crawler"
pediapress.com - - [24/Jul/2006:03:45:50 +1000] "GET / HTTP/1.0" 200 9489 "-" "PediaSearch.com Crawler"
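If the crawler does honour robots.txt, a disallow rule would be the politer first line of defence. The user-agent token it obeys isn't documented anywhere in this thread, so the token below is an assumption based on the User-Agent string it sends:

```
# robots.txt sketch; token is a guess, taken from the bot's
# reported User-Agent string "PediaSearch.com Crawler"
User-agent: PediaSearch.com Crawler
Disallow: /
```

Since compliance is unverified, this is best treated as a supplement to, not a replacement for, a server-side deny rule.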

10:57 pm on July 29, 2006 (gmt 0)

Senior Member

wilderness (WebmasterWorld Top Contributor of All Time; 10+ Year Member; Top Contributor of the Month)

joined:Nov 11, 2001
posts:5408
votes: 2


and it does not fetch robots.txt

deny from 85.214.

Yo Jan!
Get with the program!

deny from 85. ;)

Have you changed locales yet and did you purchase that "dingy" I suggested for local transportation?

Best Don

10:59 pm on July 29, 2006 (gmt 0)

Preferred Member

10+ Year Member

joined:Oct 1, 2004
posts:607
votes: 0


According to the website, only genuine Wikipedia articles are printed, which is perfectly legal provided they follow the GFDL. However, it does say "PediaPress suggests further articles based on what's in your book" - maybe they scrape the external links from Wikipedia articles?

The company behind it is located in Germany.

11:20 pm on July 29, 2006 (gmt 0)

Senior Member

wilderness (WebmasterWorld Top Contributor of All Time; 10+ Year Member; Top Contributor of the Month)

joined:Nov 11, 2001
posts:5408
votes: 2


only genuine Wikipedia articles are printed

Isn't that an oxymoron?
It's a user-based forum.

I've seen some of these pages that provide "not so accurate" information.

You, I, or anybody may go to any page and add inaccuracies should we desire, and there are no controls to prevent such a thing (except another user correcting the change later).

11:47 pm on July 29, 2006 (gmt 0)

Preferred Member

10+ Year Member

joined:Oct 1, 2004
posts:607
votes: 0


Wilderness, I mean "genuine" in the sense that the content printed is (according to the bot's site) genuine, original and legal Wikipedia content.

Whether that content is always accurate is another question; I think there's an ongoing thread elsewhere on WW about the rights and wrongs of Wikipedia.

1:47 am on July 30, 2006 (gmt 0)

Senior Member

wilderness (WebmasterWorld Top Contributor of All Time; 10+ Year Member; Top Contributor of the Month)

joined:Nov 11, 2001
posts:5408
votes: 2


zCat,
I realized what you meant and was just yanking your chain.

Don

7:19 am on July 30, 2006 (gmt 0)

Preferred Member

10+ Year Member

joined:Oct 1, 2004
posts:607
votes: 0


Ah, the forum software must have swallowed your smileys ;-).

I used to be quite active in Wikipedia, but I think it's getting out of control.