Checking website links
November 28th, 2003
I was thinking of writing a Perl script that would work as a spider and check the links on a website. Then I thought: yes, I could do that, but someone must have done it already.
A search on HotScripts.com for link-checking scripts found: checklinks by James Marshall
http://www.jmarshall.com/tools/cl/ (last updated March 26, 2000)
This command-line Perl script parses HTML pages and checks the links (images as well as other HTML pages) and prints the details of any links that return an error.
The script only checks pages on the same host as the script. It won't check remotely hosted pages. You run it from the command line, and running checklinks.pl on its own shows the usage and options.
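To illustrate the general idea of pulling links out of a page and ignoring remote ones, here is a minimal Python sketch. It is not how checklinks is implemented: the names LinkCollector and same_host_links are made up for this example, and it assumes "same host" can be approximated by comparing each link's host with the host of the page it was found on.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect link targets from <a href> and <img src> tags (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Only anchors and images are considered in this sketch.
        wanted = {"a": "href", "img": "src"}.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


def same_host_links(html, base_url):
    """Return the absolute links in html that share a host with base_url."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    host = urlparse(base_url).netloc
    return [link for link in collector.links if urlparse(link).netloc == host]
```

A real checker would then request each of those links and report the ones that come back with an error. That step is left out here.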
Here is a typical command:
checklinks.pl -v -I documentation http://www.whatever.xxx/documentation/index.html > checkresults.txt
This checks all links that have "documentation" somewhere in their path (this is specified with the -I flag), starting with the page http://www.whatever.xxx/documentation/index.html. The > redirect sends all the results to a file called checkresults.txt.
Notes:
- The -v specifies verbose mode.
- Because I've restricted the checking to paths containing "documentation", it won't check a link on a documentation page that points to mystuff/web-pages/images/mypic.gif, because that path doesn't contain the word "documentation". This stops the checking from spreading to non-documentation pages, but it doesn't ensure that all documentation pages are free of broken links.
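The effect of the path restriction described in these notes can be sketched in a few lines of Python. This is only an illustration of the behaviour as I've described it, assuming a plain substring match on the path. The function name should_check is hypothetical and the code is not taken from checklinks.

```python
from urllib.parse import urlparse


def should_check(link, include):
    """Return True if the link's path contains the include string (assumed substring match)."""
    return include in urlparse(link).path


links = [
    "http://www.whatever.xxx/documentation/setup.html",
    "http://www.whatever.xxx/mystuff/web-pages/images/mypic.gif",
]

# Only links whose path mentions "documentation" would be followed.
to_check = [link for link in links if should_check(link, "documentation")]
skipped = [link for link in links if not should_check(link, "documentation")]
```

Here the image link ends up in skipped, which is the gap mentioned above: a broken image on a documentation page goes unnoticed if its path lies outside the documentation folder.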
This works quite nicely and I could adapt it to suit my own requirements, but when I showed a colleague, he said he used Xenu Link Sleuth to check the links on his site. So I went and checked that out.