Using spider to check for copyright violation

  #1  
Old 05-29-2004, 07:50 PM
crafty crafty is offline
Newbie
 
Join Date: Dec 2002
Posts: 8

I have invested time in putting together a text document that I have openly shared with an online community. Now I believe that someone might be claiming authorship of my work. Searching for a string on Google returns the expected hits, but my "nemesis" probably would not be indexed by a search engine.

Could I create (or download) a spider that would search for a specific string in a text file and merely report the URL back to me via email when found?

I'm aware of the philosophical issue about open source, and my having freely placed it on the web. I'm specifically interested in the mechanics of searching web sites for the contents of text files.

Thanks

Crafty

  #2  
Old 05-29-2004, 08:23 PM
Barti1987 Barti1987 is offline
Web Hosting Master
 
Join Date: Mar 2004
Location: USA
Posts: 4,342
Hmm...

interesting!

I never dealt with bots and spiders... yet..

Peace,

__________________
Testing 1.. Testing 1..2.. Testing 1..2..3...

  #3  
Old 05-29-2004, 11:12 PM
TechSolution TechSolution is offline
Web Hosting Evangelist
 
Join Date: Mar 2004
Posts: 502
It's possible. I've done one before (to a limited extent). It just depends on how much time and bandwidth you have available.

  #4  
Old 05-29-2004, 11:52 PM
t c t c is offline
Web Hosting Master
 
Join Date: Mar 2003
Location: VA
Posts: 640
If you know how to program, you could write a simple application that can run anywhere and uses a control such as INET or Winsock to fetch pages and search them. You could then use the same type of control to email yourself the results.

I suggest asking around about this type of thing.
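
To make the fetch, search, and email idea above concrete, here is a minimal Python sketch (a swapped-in language, not the INET or Winsock controls mentioned). The function names, the example addresses, and the `localhost` SMTP host are all assumptions for illustration. It only shows the matching and the notification; it does not crawl anything.

```python
import re
import smtplib
from email.message import EmailMessage


def normalize(text):
    """Collapse whitespace and lowercase so line wrapping does not hide a match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def contains_phrase(page_text, phrase):
    """Return True if the phrase appears in the page text after normalization."""
    return normalize(phrase) in normalize(page_text)


def build_report(url, phrase, sender, recipient):
    """Build (but do not send) an email naming the URL where the phrase was found."""
    msg = EmailMessage()
    msg["Subject"] = "Phrase found at " + url
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("The phrase %r was found at:\n%s\n" % (phrase, url))
    return msg


def send_report(msg, smtp_host="localhost"):
    """Send the report through an SMTP server (the host name is an assumption)."""
    with smtplib.SMTP(smtp_host) as server:
        server.send_message(msg)
```

Picking a distinctive sentence from the document as the phrase should keep false positives down. Normalizing whitespace means a copy that has been re-wrapped still matches.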

  #5  
Old 05-30-2004, 02:15 AM
crafty crafty is offline
Newbie
 
Join Date: Dec 2002
Posts: 8
The information that I find on the Internet about spidering is about how to use a spider to improve your standing on a search engine, or to download all the MP3s or porn from a given website.

I obviously don't have the URL for the site. I was told by an associate to check a URL and I found my work credited to someone else. They told me they got it from another URL which I also checked and contacted. I've been naive about this. I don't intend to bring anyone to court, but I'd like to be able to email them and ask them to remove the content (or else credit me for the work).

After finding these two, I googled them and found they weren't indexed on the major search engines. I'm sure there is more out there and I would guess that a spider would be the most efficient way. Maybe I'm barking up the wrong tree. Is there a better way?

Thanks

Crafty

  #6  
Old 05-30-2004, 02:39 AM
Burhan Burhan is offline
Community Guide
 
Join Date: Jul 2003
Location: Kuwait
Posts: 5,100
Well, without a starting point, there is no way for your spider to find your content. The whole point of hyperlinks is so that there is a way to reach pages from other content.

If a page is never linked from another page, it is very unlikely that your spider will find it. There are a lot of search engines other than the popular Google, MSN or Yahoo.

Think of it this way -- how would you find where your content was posted? If you aren't able to locate it "pragmatically" -- that is, from typing a keyword in a search engine -- then it will be very difficult to write a spider that does the same.
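
The dependence on a starting point can be seen in a minimal breadth-first crawler sketch in Python. Everything here (the `crawl` and `fetch` names, the `max_pages` limit, the plain substring match) is an assumption for illustration, not a finished tool. The crawler can only ever visit pages reachable by links from the seed URLs.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch(url):
    """Download a page as text; return "" if the request raises OSError or ValueError."""
    try:
        with urlopen(url, timeout=10) as response:
            return response.read().decode("utf-8", errors="replace")
    except (OSError, ValueError):
        return ""


def crawl(seeds, phrase, fetch_page=fetch, max_pages=100):
    """Breadth-first crawl from the seed URLs; return URLs whose text contains phrase."""
    queue = deque(seeds)
    seen = set(seeds)
    hits = []
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        page = fetch_page(url)
        fetched += 1
        if phrase.lower() in page.lower():
            hits.append(url)
        parser = LinkExtractor()
        parser.feed(page)
        for href in parser.links:
            # Resolve relative links and drop #fragments before queueing.
            link = urldefrag(urljoin(url, href)).url
            if link.startswith(("http://", "https://")) and link not in seen:
                seen.add(link)
                queue.append(link)
    return hits
```

A page that nothing links to never enters the queue, however many pages are fetched. A crawler run against real sites should also honor robots.txt and pause between requests, which this sketch leaves out.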

There are other issues with this -- how will you prove you hold the copyright to the content?

__________________
In order to understand recursion, one must first understand recursion.
If you feel like it, you can read my blog
Signal > Noise

  #7  
Old 05-30-2004, 09:44 AM
crafty crafty is offline
Newbie
 
Join Date: Dec 2002
Posts: 8
Good point. I don't know how to start. That may end my quest right there! I guess that I would look at sites that had similar content, and harvest the URLs that people had surfed in from. I know this is available to the web host (it is on my site) but may not be available to a bot. I know these people would not have links to their sites, so the breadcrumbs that I would be looking for would be the users.

If they posted to the site maybe. I don't know. This is not my area of expertise hence my posting here.

As for how I would resolve it afterwards, there is no legal remedy since this isn't legally protected. I'm counting on people doing the right thing after I contact them. But that isn't really the point of the post.

Crafty

  #8  
Old 05-30-2004, 10:43 AM
TechSolution TechSolution is offline
Web Hosting Evangelist
 
Join Date: Mar 2004
Posts: 502
You could use DMOZ/ODP data for a starting point, along with referrer logs from your web host.

How much bandwidth would you have available to do this?
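
As a rough illustration of the referrer-log idea, the Python sketch below pulls external referrer URLs out of access-log lines so they can serve as crawl starting points. It assumes the host writes logs in the Apache "combined" format. The `referrer_seeds` name and the `own_host` parameter are hypothetical.

```python
import re
from urllib.parse import urlsplit

# Apache "combined" log format ends with: "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(r'"[^"]*" \d{3} \S+ "(?P<referer>[^"]*)" "[^"]*"\s*$')


def referrer_seeds(log_lines, own_host):
    """Return unique external referrer URLs, in first-seen order."""
    seeds = []
    seen = set()
    for line in log_lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # Not a combined-format line; skip it.
        referer = match.group("referer")
        host = urlsplit(referer).hostname
        # Skip empty referrers ("-"), repeats, and links from our own site.
        if referer in seen or host is None or host == own_host:
            continue
        seen.add(referer)
        seeds.append(referer)
    return seeds
```

Only visitors who followed a link to the original document show up this way, so the list is a starting point, not a complete picture.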
