Web

Checking website links

November 28th, 2003

I was thinking of writing a Perl script that would work as a spider and check links on a website. Then I thought, yes I could do that, but someone must have done it already.

A search on HotScripts.com for link-checking scripts found: checklinks by James Marshall
http://www.jmarshall.com/tools/cl/ (last updated March 26, 2000)

This command-line Perl script parses HTML pages and checks the links (images as well as other HTML pages) and prints the details of any links that return an error.
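To make the "parses HTML pages" step concrete, here is a minimal sketch in Python (not Perl, and not taken from checklinks itself) of pulling the link targets out of a page. The names LinkCollector and extract_links are made up for this illustration, and it only looks at <a href> and <img src>; the real script may well handle more tags than that.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the targets of <a href> and <img src> from one HTML page.

    Hypothetical helper for illustration; not part of checklinks.
    """

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Map each tag we care about to the attribute that holds its link.
        wanted = {"a": "href", "img": "src"}.get(tag)
        if wanted is None:
            return
        # attrs is a list of (name, value) pairs; value can be None.
        for name, value in attrs:
            if name == wanted and value:
                self.links.append(value)


def extract_links(html):
    """Return every link target found in the given HTML text, in order."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.links


if __name__ == "__main__":
    page = '<a href="intro.html">Intro</a> <img src="images/logo.gif">'
    for link in extract_links(page):
        print(link)
```

A link checker would then request each of these targets in turn and report the ones that come back with an error.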

The script only checks pages on the same host as the script itself; it won't check remotely hosted pages. You run it from the command line, and running checklinks.pl on its own shows the usage and options.

Here is a typical command:
checklinks.pl -v -I documentation http://www.whatever.xxx/documentation/index.html > checkresults.txt

This checks all links that have "documentation" somewhere in their path (this is specified with the -I flag), starting with the page http://www.whatever.xxx/documentation/index.html and puts all the results in a file called checkresults.txt.

Notes:

  1. The -v specifies verbose mode.
  2. Because I've restricted the checking to paths containing "documentation", it won't check a link on a documentation page to mystuff/web-pages/images/mypic.gif, because that path doesn't contain the word "documentation". This stops the checking from spreading to non-documentation pages, but it also means it doesn't ensure that all documentation pages are free of broken links.
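The effect of that restriction can be sketched in a few lines of Python. This is only an illustration of the idea, assuming the filter is a plain substring test on the path of the resolved link; I haven't checked how checklinks actually matches, and the function name should_check is made up. It also assumes the image link is written relative to the site root.

```python
from urllib.parse import urljoin, urlparse


def should_check(link, base_url, include="documentation"):
    """Return True if the resolved link's path contains the include string.

    Hypothetical stand-in for the -I restriction; the real matching
    rules in checklinks may differ.
    """
    # Resolve relative links against the page they were found on.
    absolute = urljoin(base_url, link)
    return include in urlparse(absolute).path


if __name__ == "__main__":
    page = "http://www.whatever.xxx/documentation/index.html"
    for link in ("install.html", "/mystuff/web-pages/images/mypic.gif"):
        verdict = "check" if should_check(link, page) else "skip"
        print(verdict, link)
```

Under these assumptions the image is skipped even though a documentation page links to it, which is exactly the gap described in note 2.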

This works quite nicely and I could adapt it to suit my own requirements, but when I showed a colleague, he said he used Xenu Link Sleuth to check the links on his site. So I went and checked that out.


HTML tidy

November 22nd, 2003

Following on from my previous posting about perltidy, here's an example of the output from perltidy. Note: this script is something I was working on a while ago – an online glossary – and I am not sure the version I've HTML-ised is the finished version. It's intended purely as an example of what perltidy does, not as an example of a working Perl script.

I have now found a program that will reformat your HTML. I should have found it first time of looking really, because it's a very close relative of perltidy.

HTML Tidy (originally written by Dave Raggett, now maintained by volunteers on SourceForge)
http://tidy.sourceforge.net/

You can download a Windows executable at: http://tidy.int64.org/

This is a command-line program – i.e. you open a console window (Start > Run > "cmd") and type in a command. For best results, put tidy.exe somewhere in your PATH (see What is your PATH?).

Tidy UI (a Windows graphical user interface for HTML Tidy, written by Charles Reitzel)
http://users.rcn.com/creitzel/tidy.html#tidyui

This is a very nice GUI version of HTML Tidy. You'll want to use this if you don't like working on the command line. From a brief test it seems to do the job and is very easy to use.

HTMLtrim
http://sourceforge.net/projects/htmltrim

This is another GUI front end to HTML Tidy, but after one quick try-out I don't like it as much as tidyui, because it's not as Windowsy and doesn't seem to offer much more than using the command line. By default it overwrites your original HTML file with the reformatted one. I'm sure this is one of the configuration options you can change, but I don't like overwriting the input file as default behaviour.
