x10Hosting Forums

Corporate Free Hosting for the Masses.



x10 Sophmore

Join Date: Mar 2008
Posts: 120
Credits: 2,727
rockee has a spectacular aura about
04-11-2008, 02:00 AM
Deal With Spybots, Spambots and Scrapers

THE HTACCESS APPROACH
I was going to write a comprehensive tutorial on banning scrapers, spy bots and other misbehaving robots that either don't follow the instructions in the robots.txt file, ignore the file altogether, or read it and then carry on scraping and spidering your web site anyway.

There are clearly many such tutorials on the Internet already. A Google search using all of the key words below in a single query found more information than I could post here (the forum has a size limit on posts):

bad bots spam bots online downloaders

The search returns many pages of results, so don't stop at the first page; you may find more goodies such as a useful PHP-Nuke module, spider traps and country-specific bad bots.

The most useful and seemingly easiest to follow, and very much in the format I would have used here, are located at these addresses:

How to block spambots, ban spybots, and tell unwanted robots to go to hell.

List of Bad Bots

Blocking Bad Bots and Scrapers with .htaccess

htaccess guide - Blocking offline browsers and 'bad bots'


There are many and varied sites to check out, and they all have very useful information if you want to squish some of these server-hogging pests.


Because this is not a new subject you may find some of the information and sites a bit dated, but the principle is sound, especially if you can find a more recent list of bad bots or you are a prolific reader and analyzer of your site's log files.
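
Once you have such a list, the .htaccess rules themselves are mechanical to build. Here is a minimal Python sketch that turns a list of user agent names into a mod_rewrite block that answers matching requests with 403 Forbidden. The function name and the bot names are placeholders for illustration, not a vetted list, and it assumes your host has mod_rewrite enabled and allows rewrite directives in .htaccess.

```python
import re


def build_htaccess_block(user_agents):
    """Return mod_rewrite lines that send 403 Forbidden to matching User-Agents."""
    if not user_agents:
        return ""
    lines = ["RewriteEngine On"]
    last = len(user_agents) - 1
    for i, agent in enumerate(user_agents):
        # Escape regex metacharacters so the name is matched literally.
        pattern = re.escape(agent)
        # Every condition except the last is OR-ed with the one after it;
        # NC makes the match case-insensitive.
        flags = "[NC]" if i == last else "[NC,OR]"
        lines.append(f"RewriteCond %{{HTTP_USER_AGENT}} {pattern} {flags}")
    # "-" means no substitution; F answers with 403 Forbidden.
    lines.append("RewriteRule .* - [F,L]")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder names only; substitute the list you trust.
    print(build_htaccess_block(["EmailSiphon", "WebZIP"]))
```

Paste the printed lines into your .htaccess file and test on a quiet page first, since a pattern that is too broad can lock out real visitors.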

Also remember that some of these bad bots actually hijack web sites and servers (zombies) so they can masquerade on the Internet at will. Log file analysis may let you spot these, and perhaps a common reference point, such as a shared user agent string or IP range, that allows an effective .htaccess entry.
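
As a rough illustration of that kind of log file analysis, the Python sketch below counts requests per user agent and how many different IP addresses each one came from. It assumes the log uses Apache's "combined" format; the function name and the sample lines are made up for the example.

```python
import re
from collections import Counter, defaultdict

# Matches the Apache "combined" log format: IP, identity, user, [time],
# "request", status, size, "referer", "user agent".
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def summarize_agents(lines):
    """Return (user_agent, hits, distinct_ips) tuples, busiest agent first."""
    hits = Counter()
    ips = defaultdict(set)
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that are not in the expected format
        agent = match.group("agent")
        hits[agent] += 1
        ips[agent].add(match.group("ip"))
    return [(agent, count, len(ips[agent])) for agent, count in hits.most_common()]


if __name__ == "__main__":
    sample = [
        '10.0.0.1 - - [11/Apr/2008:02:00:00 -0500] "GET / HTTP/1.1" 200 512 "-" "BadBot/1.0"',
        '10.0.0.2 - - [11/Apr/2008:02:00:01 -0500] "GET /a HTTP/1.1" 200 512 "-" "BadBot/1.0"',
        '10.0.0.3 - - [11/Apr/2008:02:00:02 -0500] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    ]
    for agent, count, ip_count in summarize_agents(sample):
        print(f"{count:6d} hits from {ip_count:4d} IPs  {agent}")
```

An agent with many hits from many unrelated IPs is a candidate for a user agent rule, while many hits from one IP or range suggests an IP-based entry instead.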


THE ROBOTS TEXT FILE APPROACH
An alternative to the .htaccess file is the robots.txt file, but as I have outlined above it is only effective if these bad bots read the dang thing and follow your instructions - most don't.
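
To see what a well-behaved robot would make of your robots.txt, you can feed it to the parser in Python's standard library. This is only a sketch: "BadBot", "GoodBot" and the example.com URLs are hypothetical, and the result tells you what a compliant robot should do, not what a bad bot will actually do.

```python
from urllib.robotparser import RobotFileParser

# A small robots.txt: shut one named robot out completely and keep
# everyone else out of /private/ only.
ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def allowed(robots_txt, agent, url):
    """Return True if a compliant robot named `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for agent in ("BadBot", "GoodBot"):
        for url in ("http://example.com/index.html", "http://example.com/private/x"):
            print(agent, url, allowed(ROBOTS_TXT, agent, url))
```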


A good place to start, and the authority on all matters relating to the robots.txt file, is located here:

The Web Robots Pages at robotstxt.org

The site covers the items below plus much more, and uses Previous/Next navigation for easier reading and understanding:
  • database of robots
  • robots.txt file checker
  • robot related meta tag information
  • IP look up
  • how to get the best listing in search engines
The above site is worth a visit so you can get a handle on this robots.txt file and use it to your best advantage.

Wikipedia's article on botnets is also useful background.


I hope this article will be of use. Please post back if you can add to it with your own current lists of known mischievous robots and your experiences.

Regards,
Rocky
x10 Sophmore

CoolFinalFan's Avatar

Join Date: Oct 2005
Posts: 190
Credits: 1,510
CoolFinalFan is on a distinguished road
Location: Myrtle Beach, SC USA

04-13-2008, 01:51 PM
Re: Deal With Spybots, Spambots and Scrapers

Hey, thanks for the FYI here!
Subeesh R, X10 Senior Moderator

tittat's Avatar

Join Date: Sep 2007
Posts: 1,723
Credits: 1,581
tittat is a glorious beacon of light
Location: Kerala,India

04-14-2008, 04:01 AM
Re: Deal With Spybots, Spambots and Scrapers

One doubt... not related to this topic.

I have my .htaccess file with rewrites and a lot of other stuff.
My question is: if my .htaccess file gets too big, will that affect my site's response time?

rockee
04-14-2008, 04:38 AM
Re: Deal With Spybots, Spambots and Scrapers

You would not notice any overhead from a large .htaccess file doing mod_rewrites or any of its other tasks. It is only a tiny, folder-by-folder extension of the server's httpd.conf file anyway. Apache does re-read .htaccess on each request, whereas httpd.conf is parsed once at startup, but imagine the size of a hosting company like X10 Hosting and the huge server configuration files it uses: you would not notice much overhead at the browser level from any of it.

My .htaccess file is huge by normal standards, about 80% of it mod_rewrite rules, and there is no noticeable overhead. In any case, how would you measure that latency, if there is any at all?

Even with many entries and jobs to do, the .htaccess file is usually much less than 10 KB, and most are less than 1 KB. Compared with a 30 KB web page or a 60 KB graphic image being served, the .htaccess file uses only a flea bite of the server's resources.
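
On the question of how you might measure any latency: one rough way is to time the same page many times with the small .htaccess file in place, then again with the large one, and compare the medians. The Python sketch below assumes that approach; the function names and the example.com URL are placeholders, and network jitter will usually swamp whatever the .htaccess file adds.

```python
import statistics
import time
import urllib.request


def fetch(url):
    """Download a page and discard the body."""
    with urllib.request.urlopen(url, timeout=10) as response:
        response.read()


def median_seconds(action, repeats=20):
    """Run `action` `repeats` times and return the median wall-clock time."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        action()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


if __name__ == "__main__":
    url = "http://example.com/"  # placeholder: use a page on your own site
    print(f"median: {median_seconds(lambda: fetch(url)):.4f} s")
```

Run it once with each version of the file, at a similar time of day, and only trust a difference that shows up repeatedly.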

Regards,
Rocky

tittat
04-14-2008, 05:01 AM
Re: Deal With Spybots, Spambots and Scrapers

Quote:
You would not notice any overhead from a large .htaccess
Thanks rockee, this is what I wished to hear...


Does anyone else have a different opinion?

rockee
04-14-2008, 05:35 AM
Re: Deal With Spybots, Spambots and Scrapers

If you want a definitive answer or an informed opinion, then you should post your question in the forum that the tech support staff frequent most, as they are the only people at X10 Hosting who can give you the correct answer in relation to their servers.

Before I retired, the parsing of clients' .htaccess files on my servers did not noticeably affect those servers. What did affect them was the massive amount of needless traffic from bad bots and scrapers, which the .htaccess files and the measures in place at the servers effectively reduced.

Regards,
Rocky

tittat
04-16-2008, 06:33 AM
Re: Deal With Spybots, Spambots and Scrapers

Whenever I read rockee's comments I am forced to give him reputation points... and I did.
I will say
Quote:
rockee will become famous soon.
Regards,
Subeesh

rockee
04-16-2008, 07:14 AM
Re: Deal With Spybots, Spambots and Scrapers

Thank you kindly Subeesh. I can appreciate your hunger for knowledge as I too have been there and done that, but for the life of me, I still can't satisfy my hunger.

Kindest regards and best wishes always,
Rocky
x10 Lieutenant

Zangetsu's Avatar

Join Date: Mar 2008
Posts: 414
Credits: 2,681
Zangetsu will become famous soon enough
Location: somewhere out there

04-16-2008, 11:53 AM
Re: Deal With Spybots, Spambots and Scrapers

Cool, but is there also a way to get rid of those spiders?