Search Engine Spider and User Agent Identification Forum

    
PediaSearch.com Crawler
keyplyr

Msg#: 3025007 posted 10:35 pm on Jul 27, 2006 (gmt 0)

While I do get traffic from Wikipedia articles that reference my content, this has to be a violation of my copyright. From what I understand, this bot follows those links, scrapes your site's content, compiles it into a book, and sells it to the user.

85.214.51.184 - - [27/Jul/2006:10:53:39 -0400] "GET / HTTP/1.0" 200 10956 "-" "PediaSearch.com Crawler"

 

bull

Msg#: 3025007 posted 6:13 pm on Jul 29, 2006 (gmt 0)

and it does not fetch robots.txt

deny from 85.214.
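
For anyone who wants to block it outright, something along these lines in .htaccess ought to do it - the user-agent match is taken from keyplyr's log line above, and denying the whole 85.214 range is just a guess based on the one IP reported so far:

# flag the PediaSearch crawler by its UA string (needs mod_setenvif)
<IfModule mod_setenvif.c>
SetEnvIfNoCase User-Agent "PediaSearch" bad_bot
</IfModule>
# allow everyone, then deny the flagged UA and the reported IP range
Order Allow,Deny
Allow from all
Deny from 85.214
Deny from env=bad_bot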

Mokita

Msg#: 3025007 posted 10:01 pm on Jul 29, 2006 (gmt 0)

It did ask for robots.txt when it visited one of our sites, but since I'd never heard of it before, it remains to be seen whether it honours the file.

pediapress.com - - [24/Jul/2006:03:45:48 +1000] "GET /robots.txt HTTP/1.0" 200 1930 "-" "PediaSearch.com Crawler"
pediapress.com - - [24/Jul/2006:03:45:50 +1000] "GET / HTTP/1.0" 200 9489 "-" "PediaSearch.com Crawler"
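
If it does honour the file, a user-agent block would be the polite way to opt out. I'm only guessing at the token it matches on, based on the UA string in the logs:

User-agent: PediaSearch.com Crawler
Disallow: /

And if it turns out to ignore robots.txt after all, a server-side deny like the one above is the only thing that will actually stop it.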

wilderness

Msg#: 3025007 posted 10:57 pm on Jul 29, 2006 (gmt 0)

"and it does not fetch robots.txt
deny from 85.214."

Yo Jan!
Get with the program!

deny from 85. ;)

Have you changed locales yet, and did you purchase that "dinghy" I suggested for local transportation?

Best Don

zCat

Msg#: 3025007 posted 10:59 pm on Jul 29, 2006 (gmt 0)

According to the website, only genuine Wikipedia articles are printed, which is perfectly legal provided they follow the GFDL. However, it does say "PediaPress suggests further articles based on what's in your book" - maybe they scrape the external links from Wikipedia articles?

The company behind it is located in Germany.

wilderness

Msg#: 3025007 posted 11:20 pm on Jul 29, 2006 (gmt 0)

"only genuine Wikipedia articles are printed"

Isn't that an oxymoron?
It's a user-edited site.

I've seen some of these pages that provide "not so accurate" information.

You or I, or anybody, may go to any page and add inaccuracies should we desire, and there are no controls to prevent such a thing (other than another user correcting the change later).

zCat

Msg#: 3025007 posted 11:47 pm on Jul 29, 2006 (gmt 0)

Wilderness, I mean "genuine" in the sense that the content printed is (according to the bot's site) genuine, original and legal Wikipedia content.

Whether that content is always accurate is another question; I think there's an ongoing thread elsewhere on WW about the rights and wrongs of Wikipedia.

wilderness

Msg#: 3025007 posted 1:47 am on Jul 30, 2006 (gmt 0)

zCat,
I realized what you meant and was just yanking your chain.

Don

zCat

Msg#: 3025007 posted 7:19 am on Jul 30, 2006 (gmt 0)

Ah, the forum software must have swallowed your smileys ;-).

I used to be quite active in Wikipedia, but I think it's getting out of control.
