  1. #1
    Join Date
    May 2005
    Posts
    98

    Screen scrapers, data mining issues?

    We have a big problem with people using bots to grab our dynamic information. When one of our customers releases new info, the site gets flooded with scrapers downloading all the new content. We tried banning IPs, but that backfires: the scrapers use big IP pools, so we end up banning real users. We tried setting cookies on the client's computer to make sure visitors are legit, but the scrapers started handling cookies in their scripts. Now they are using proxies, so the IP keeps changing, which makes it very hard to even track whether it's the same person. You can still tell, though, because the searches are in sequence and very fast. Any ideas on how to get around this?

  2. #2
    Join Date
    Sep 2002
    Location
    Nashville, TN
    Posts
    237
    I'm not really sure what you mean...

    -Chris

  3. #3
    Join Date
    May 2005
    Posts
    98
    We have customers (airlines) that have dynamic fares so people can search for flights. Other people write scripts (bots) that search every day, all the time, and pull all the flights so they can sell them or use them for something else. This uses a lot of resources and needs to be stopped.

  4. #4
    Join Date
    Apr 2005
    Location
    silicon and earthquakes
    Posts
    258
    Implement graphical authentication.

  5. #5
    Join Date
    May 2005
    Posts
    98
    Yeah, but the boss does not want that because it slows down real users.

  6. #6
    Join Date
    Nov 2004
    Location
    Marietta PA
    Posts
    138
    There is a good article in this month's issue of Information Security about screen scrapers and how to protect your site. Google "Information Security Magazine" and you should find the link off of techtarget.com. I'm not sure if you can view the article online, but I will try to get the info from my copy.
    Digital Offensive
    http://www.digitaloffensive.com
    Take an offensive approach to Security know what your foes know!

  7. #7
    Join Date
    Jul 2003
    Location
    Nothing but, net
    Posts
    2,064
    Originally posted by detz
    Yeah, but the boss does not want that because it slows down real users.
    Just have them type it in once per session and limit overall searches to 100 per session. That will completely shut down scrapers.

    No one is going to mind typing in 6 letters/numbers once and no one is going to search for over 100 flights in 1 session. If they do they can just type the 6 letters/numbers again.
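
    To make the idea concrete, here is a rough sketch of the bookkeeping, not a finished implementation. The names Session, pass_challenge and allow_search are made up for illustration, and it assumes the site already keeps some server-side session state and has its own way of showing and checking the code.

```python
from dataclasses import dataclass

MAX_SEARCHES_PER_SESSION = 100  # the limit suggested in the post above


@dataclass
class Session:
    """Hypothetical per-visitor state; a real site would store this server-side."""
    verified: bool = False   # True once the visitor has typed the code correctly
    searches_left: int = 0   # searches remaining before the next challenge


def pass_challenge(session: Session) -> None:
    """Call this after the visitor has typed the code correctly."""
    session.verified = True
    session.searches_left = MAX_SEARCHES_PER_SESSION


def allow_search(session: Session) -> bool:
    """Return True if the search may run, False if the code must be entered first."""
    if not session.verified or session.searches_left <= 0:
        session.verified = False  # quota used up: ask for the code again
        return False
    session.searches_left -= 1
    return True
```

    A normal visitor types the code once and never sees it again unless they run more than 100 searches. A script has to solve a new code every 100 requests, which is the part that costs the scraper.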

  8. #8
    Join Date
    May 2005
    Posts
    98
    So something like:
    User arrives
    Searches
    ---Enter code---
    Results
    Searches again
    Results
    Searches again
    Results
    ...etc.
    So they only enter the code the first time?

  9. #9
    Join Date
    Nov 2004
    Location
    Marietta PA
    Posts
    138
    Yeah, the code should store a cookie allowing them to auth once. You could also look at using .htaccess rules to block bots from spidering.
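
    Since the scrapers already handle cookies, the cookie should at least be something they can't forge. A minimal sketch of one way to do that with an HMAC signature follows. SECRET_KEY, COOKIE_LIFETIME and the function names are assumptions for illustration, not anything from your site.

```python
import hashlib
import hmac
import time
from typing import Optional

SECRET_KEY = b"replace-with-a-random-server-side-secret"  # assumption: never sent to clients
COOKIE_LIFETIME = 30 * 60  # assumption: cookie is good for 30 minutes (in seconds)


def _sign(payload: str) -> str:
    """HMAC-SHA256 of the payload, as a hex string."""
    return hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()


def make_auth_cookie(session_id: str, now: Optional[float] = None) -> str:
    """Build the cookie value to set after the visitor enters the code correctly."""
    current = time.time() if now is None else now
    payload = f"{session_id}:{int(current + COOKIE_LIFETIME)}"
    return f"{payload}:{_sign(payload)}"


def cookie_is_valid(cookie: str, now: Optional[float] = None) -> bool:
    """Check the signature and the expiry time of a cookie value."""
    try:
        session_id, expires, signature = cookie.rsplit(":", 2)
        expiry = int(expires)
    except ValueError:
        return False  # wrong number of fields or non-numeric expiry
    payload = f"{session_id}:{expires}"
    if not hmac.compare_digest(signature.encode(), _sign(payload).encode()):
        return False  # cookie was edited or made up by the client
    current = time.time() if now is None else now
    return current < expiry
```

    A signed cookie only proves the code was entered at some point. A script can still replay a valid cookie, so this works best together with the per-session search limit mentioned earlier in the thread.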
