February 27th, 2005
My web server started running incredibly slowly the other day and my
first thought was that something nefarious was going on. So the
question was: how do I know what's going on? I'd never had cause to
find out before. I knew Apache kept activity logs, somewhere, and I'd
heard of Webalizer - but I'd never looked any further.
Webalizer is a log analysis tool that reads the web server's log
files and presents a large set of statistics, as tables and charts, in
a series of web pages. Apache on Fedora Core 3 comes complete with
Webalizer, so all I had to do was find out how to access it.
By default, you can only view the Webalizer pages from the web
server machine itself (i.e. from 127.0.0.1) but you can change this by
editing Webalizer's config file.
I edited /etc/httpd/conf.d/webalizer.conf and added:
Allow from <IP address>
to allow me to view the Webalizer pages from my laptop (the specified IP address).
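For context, here is a rough sketch of where that line goes. The Alias path and the surrounding directives are assumptions based on a typical Fedora install, so your file may differ, and 192.0.2.10 is just a placeholder address:

```apache
# Sketch of /etc/httpd/conf.d/webalizer.conf (Apache 2.0 access-control syntax).
# The Alias path and the pre-existing lines are assumed, not copied from my server.
Alias /usage /var/www/usage

<Location /usage>
    # Deny is evaluated first, then any matching Allow overrides it
    Order deny,allow
    Deny from all
    Allow from 127.0.0.1
    # Added line: 192.0.2.10 is a placeholder for the laptop's IP address
    Allow from 192.0.2.10
</Location>
```

Because of the "Order deny,allow" line, everyone is refused except the addresses listed in the Allow lines.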
However, I use Apache as a proxy for the Zope web server for most of the stuff on my web site, so itauthor.com/anything gets rewritten and forwarded on to the Zope server unless you specifically tell Apache not to. Before I could browse to itauthor.com/usage/, I therefore had to add the following rewrite condition to the httpd.conf file:
RewriteCond %{REQUEST_URI} !^/usage/.*
I put this before the URL rewrite rule (described elsewhere in this site).
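To show how the condition and the rule sit together, here is a minimal sketch. The proxy rule is hypothetical: the backend address, the port and the pattern are assumptions made for illustration, not the rule I actually use.

```apache
RewriteEngine on

# Leave Webalizer's pages to Apache itself
RewriteCond %{REQUEST_URI} !^/usage/.*
# Hypothetical proxy rule: backend host and port are assumed for illustration
RewriteRule ^/(.*) http://localhost:8080/$1 [P,L]
```

A RewriteCond only applies to the RewriteRule that immediately follows it, which is why the position of the condition matters.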
As with any change to the httpd.conf file, I then had to restart Apache for the changes to take effect:
service httpd restart
On browsing to <yourdomain>/usage/ you are
presented with lots of nice stats about how many hits you've had, which
pages are accessed most, by whom, etc. You can drill down by month and
week, to find detailed information about what's been going on.
My stats showed me that the page on my site that was getting most hits was /cgi-bin/mt/mt-comments.cgi - MovableType's comments script.
Having had my fingers burnt in the past by allowing people to
comment on my blog, I have now disabled commenting. The script
therefore does nothing, and the spammers' calls to it simply waste a
little of my bandwidth and some of the server's processing power. I've
renamed the file to something fairly obscure, so that spammers can't
find it and can't try to use it. I could have deleted it, but I might
want to use it at some point in the future.
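Another option, which I haven't tried here, would be to have Apache answer requests for the old script name directly. This is only a sketch, and it assumes the rule lives in httpd.conf above any rule that proxies requests elsewhere:

```apache
# Assumes server-level config (httpd.conf), placed before any proxying rule
RewriteEngine on
# "-" means no substitution; [G] answers 410 Gone; [L] stops further rewriting
RewriteRule ^/cgi-bin/mt/mt-comments\.cgi$ - [G,L]
```

The spammers still get to make the request, but the server does almost no work to answer it.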
I'd read some stuff recently about bandwidth-theft, which is where
people use your site to supply content to theirs, without using space
on their server. The most common example of this is where people
display an image on your site within their own site, by "hotlinking" to
a URL on your domain.
There are two issues here:
a) You might not want them using the image in the first place (especially if they don't credit your site as the source).
b) You certainly don't want them slowing down your site by using your bandwidth rather than their own.
This is only really a problem if you have lots of images that are
being reused in this way, or if your images are high-resolution
pictures that use a lot of bandwidth to serve up.
To reduce the scale of this problem, you can rewrite requests for an
image, based on the referring site. You can either simply refuse to
send the image, or you can replace every image that's requested with a
single, small replacement image. If you take the latter approach, the
replacement image appears on the other sites, rather than the images
the sites' authors tried to link to. For example, the image I send is:
[image: hotlinksbanned.gif]
To achieve this I first enabled the use of .htaccess by editing the httpd.conf file (again), changing:
<Directory "/var/www/html">
    Options Indexes Includes FollowSymLinks
    AllowOverride None
    Allow from all
    Order allow,deny
</Directory>
to:
<Directory "/var/www/html">
    Options Indexes Includes FollowSymLinks
    AllowOverride All
    Allow from all
    Order allow,deny
</Directory>
Again, I then restarted Apache.
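If you would rather not open up every kind of override, a narrower setting should be enough for this purpose. This is a sketch rather than what I actually run: mod_rewrite's directives fall under the FileInfo override class, and rewriting in .htaccess also relies on FollowSymLinks being enabled.

```apache
<Directory "/var/www/html">
    # FollowSymLinks is needed for rewrite rules in .htaccess files to work
    Options Indexes Includes FollowSymLinks
    # FileInfo is the override class that covers the mod_rewrite directives
    AllowOverride FileInfo
    Order allow,deny
    Allow from all
</Directory>
```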
Once overrides are enabled, you can put an .htaccess file in the directory specified in the config file, or in a child directory of it. In my case, I put it in the html directory, meaning that all content on the site is affected by it (because its effects cascade down into child directories).
The .htaccess file in my html directory contains:
RewriteEngine on
RewriteCond %{HTTP_REFERER} ^$
RewriteCond %{REQUEST_URI} !^/imageslinkable/[^/]*$
RewriteRule \.(gif|jpg|jpeg)$ http://www.itauthor.com/imageslinkable/nosuchimage.gif [R,L]
RewriteCond %{HTTP_REFERER} !^http://[^.]+\.itauthor\.com/.* [NC]
RewriteCond %{HTTP_REFERER} !^http://itauthor\.com/.* [NC]
RewriteCond %{HTTP_REFERER} !^http://nnn\.nnn\.nnn\.nnn*$ [NC]
RewriteCond %{REQUEST_URI} !^/imageslinkable/[^/]*$
RewriteRule \.(gif|GIF|jpg|JPG|jpeg|JPEG|png|PNG)$ http://www.itauthor.com/imageslinkable/hotlinksbanned.gif [R,L]
In this example, nnn\.nnn\.nnn\.nnn is an IP address.
This catches requests for the image types I use that don't come from itauthor.com and redirects them, so that all that is returned is the very small hotlinksbanned.gif
file, shown above. This trick is obviously only worth doing if the
replacement image is nice and small, otherwise you're not going to
reduce the bandwidth use by much.
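The other option mentioned earlier, simply refusing to send the image, might look something like this. It's a sketch I haven't run on this site, and the single referrer pattern is my own shorthand for "itauthor.com or any of its subdomains":

```apache
RewriteEngine on
# Any referrer that is not itauthor.com or one of its subdomains
# (this also catches requests that send no referrer at all)
RewriteCond %{HTTP_REFERER} !^http://([^.]+\.)?itauthor\.com/ [NC]
# "-" leaves the URL alone; [F] returns 403 Forbidden; [NC] ignores case
RewriteRule \.(gif|jpe?g|png)$ - [F,NC]
```

Because nothing is redirected, this version doesn't need a separate directory for a replacement image.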
I had to add a directory called imageslinkable for the hotlinksbanned.gif file and then use the line:
RewriteCond %{REQUEST_URI} !^/imageslinkable/[^/]*$
to exempt this directory from URL rewriting. If I hadn't done this,
then once the original URL had been rewritten the browser would go off to get
http://www.itauthor.com/imageslinkable/hotlinksbanned.gif, at which point this URL would itself be caught by the rewrite rule and redirected to http://www.itauthor.com/imageslinkable/hotlinksbanned.gif again,
and so on, so you would never see the replacement image. Most browsers
would eventually show the user an error message about redirection.
Important
Don't rewrite the URL to the URL of an image on someone else's web site or you then become one of the bad guys!
Testing your rewrite rule
http://altlab.com/ has an explanation of hotlinking and how to prevent bandwidth theft:
http://altlab.com/htaccess_tutorial.html
although some of the examples are missing a closing ] at the end of a line.
Note: In rewrite conditions [NC] means to ignore case, and
[OR] means treat this condition and the following one as "this OR
that", rather than the implied "this AND that".
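None of the rules above need [OR], so here is a small sketch of where it would be useful: blocking image requests from a list of specific sites. The two host names are made up for illustration.

```apache
RewriteEngine on
# Hypothetical offending sites: a match on EITHER condition triggers the rule
RewriteCond %{HTTP_REFERER} ^http://(www\.)?badsite\.example/ [NC,OR]
RewriteCond %{HTTP_REFERER} ^http://(www\.)?another\.example/ [NC]
RewriteRule \.(gif|jpe?g|png)$ - [F,NC]
```

Without the [OR] on the first line, a request would have to match both conditions at once, which a single referrer never could.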
The altlab.com site also has a very nice hotlink checker (http://altlab.com/hotlinkchecker.php), which allows you to enter the URL for an image on your site to check whether it can be viewed when called from another web site.
The bad news
This rewriting trick has a down-side. Some anti-virus software, or
proxying within a network, removes or rewrites the REFERER data. This
can result in some people being able to bypass the rewriting and see
the original image instead of the replacement one. More of a problem
is the fact that some visitors to your site will not be able to see any
images, or will see the replacement image instead of each and every
image on your site.
This isn't likely to be a huge problem, but it will happen, so
maybe you just have to decide whether it's worth inconveniencing some
visitors to your site in order to prevent your bandwidth being
hijacked. If you want to use it for a personal site, you can always try
it out and see how many people complain. That's the approach I've
adopted.
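If the complaints do start coming in, a gentler variant is to let through any request that arrives with no referrer at all, in place of the rule that sends such requests a substitute image. This is a sketch of that idea, not what I currently run:

```apache
RewriteEngine on
# Skip the rule entirely when no referrer is sent
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://([^.]+\.)?itauthor\.com/ [NC]
RewriteCond %{REQUEST_URI} !^/imageslinkable/[^/]*$
RewriteRule \.(gif|jpe?g|png)$ http://www.itauthor.com/imageslinkable/hotlinksbanned.gif [R,L,NC]
```

The trade-off is that anyone whose software strips the referrer gets the original image, including people looking at a page that hotlinks to it.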
Related information
For related information about spam referrers, check out:
www.candygenius.com/how_to_lock_out_those_rat_bastard_comment_and_referrer_spammers
www.thebluesmokeband.com/stop.spamdexing.php
Incidentally, ...
It turned out the sluggishness of my web
server had nothing to do with bandwidth theft. The machine I run Apache
on was also running a broken IMAP mail server, which was the culprit.
I've now turned that off, so normal service is resumed.