This form spiders a local copy of a Web site tree, starting at the page indicated and working outwards, mainly to locate Anchor errors. Miscellaneous other findings are also reported. Pages are read into an iframe and parsed by the browser. See also program CHEKLINX.EXE, via index.
The table may change or be out-of-date.
| Browser | Misc. | Timeout!=0 | Timeout==0 | Read Self | Go Up | Usable |
|---|---|---|---|---|---|---|
| MS IE 8 | [1] | OK | NO | YES | yes [5] | yes |
| Firefox 3.0 | OK | OK | NO [2] | NO [3] | YES | |
| Opera 9.6 | OK | OK | NO [4] | YES | YES | |
| Safari 4.0 | OK | OK | YES | YES | YES | |
| Chrome 3.0 | OK | OK | YES | YES | YES | |
This page has been developed mainly in Chrome (which is fast) and Opera (where success came first), using Windows XP sp3.
The pages scanned should be free of HTML and onload script syntax errors.
It is assumed that all folder and file names (but not anchor names) within the site will be lower-case on the server, and that therefore they will be lower-case in the links arrays and can be made lower-case within this code.
Program CHEKLINX.EXE reads the files exactly as on disc, with simplified parsing. This LINXCHEK.HTM uses the page structure at the completion of loading. Scripts executed during loading can, but commonly do not, add and/or remove anchors and/or links.
The Directory File (if any) is read first, and its lines are stored using an Object as in the box on the left.
A complete Entry
{Name: string,
Shib: number,
Ankas: object,
Dupes: number,
Cites: object,
Next: object}
The named page is read next. Page data is held in a linked list, and a complete entry is as the boxed form on the right. When a page is read, its anchors and links arrays are attached to Ankas and Cites. A file named in Cites is added to the list if it is new, is not a folder name, exists on the disc (according to the object FromDIR), and has an extension given as acceptable. An object PagesObj is given an entry named for each file added to the list, and is used to determine whether a name is new. While a page is being read, a similar object detects duplicate anchors.
After all necessary pages have been read, the entries are scanned to see whether all anchors are present on the appropriate pages, and whether any are duplicated on a page, etc.
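The list-building and checking described above can be sketched as follows. This is an illustration only, not the code of this page: the field names follow the complete Entry shown earlier, but the function names `makeEntry`, `spider` and `missingAnchors`, the stand-in `readPage` (which replaces loading a page into the iframe), and the use of plain arrays for Ankas and Cites are all assumptions.

```javascript
// Sketch only: field names follow the "complete Entry"; all else is assumed.
function makeEntry(name) {
  // Shib is left at 0 here; its meaning is not described in the text.
  return { Name: name, Shib: 0, Ankas: [], Dupes: 0, Cites: [], Next: null };
}

// readPage(name) is a hypothetical stand-in for reading a page; it returns
// { anchors: [names], links: [{ file, anchor }] }.
function spider(startName, readPage, fromDIR, okExtensions) {
  const pagesObj = Object.create(null);      // one property per queued file
  const head = makeEntry(startName);
  pagesObj[startName] = head;
  let tail = head;
  for (let entry = head; entry !== null; entry = entry.Next) {
    const page = readPage(entry.Name);
    const seen = Object.create(null);        // detects duplicate anchors
    for (const anchor of page.anchors) {
      if (seen[anchor]) entry.Dupes++;
      seen[anchor] = true;
    }
    entry.Ankas = page.anchors;
    entry.Cites = page.links;
    for (const link of page.links) {
      const file = link.file.toLowerCase();  // file names are taken as lower-case
      const ext = file.slice(file.lastIndexOf(".") + 1);
      const wanted = !file.endsWith("/") && fromDIR[file] === true &&
                     okExtensions.includes(ext) && !(file in pagesObj);
      if (wanted) {
        tail.Next = makeEntry(file);         // append to the linked list
        tail = tail.Next;
        pagesObj[file] = tail;
      }
    }
  }
  return { head, pagesObj };
}

// After all pages are read: list cited anchors that the target page lacks.
// Anchor names are compared case-sensitively.
function missingAnchors(head, pagesObj) {
  const report = [];
  for (let entry = head; entry !== null; entry = entry.Next) {
    for (const link of entry.Cites) {
      const target = pagesObj[link.file.toLowerCase()];
      if (link.anchor && target && !target.Ankas.includes(link.anchor)) {
        report.push(entry.Name + " -> " + link.file + "#" + link.anchor);
      }
    }
  }
  return report;
}
```

Because new entries are appended at the tail while the loop walks from the head, each page is read once, in the order in which it was first cited.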
The code uses simple browser-testing in order that, at least on my machine, the controls are initially set suitably.
If the page is invoked with something like linxchek.htm?GoAt=page.htm&Tout=0 (the query part is case-dependent) then it will immediately run beginning at page.htm, with elements of the Form optionally set by name from the query string. Form input controls currently are, in order, GoAt Xtns Tout Shib Smod Self GoUp WkDy . For checkboxes, use 0 or 1, or true or false.
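A minimal sketch of reading such a query string is given here. It is not this page's own code: the function name `settingsFromQuery` is hypothetical, the choice of Self, GoUp and WkDy as the checkboxes is an assumption, and `URLSearchParams` is a modern API that the browsers in the table above do not provide.

```javascript
// Control names as listed above; which of them are checkboxes is assumed.
const CONTROL_NAMES = ["GoAt", "Xtns", "Tout", "Shib", "Smod", "Self", "GoUp", "WkDy"];
const CHECKBOXES = ["Self", "GoUp", "WkDy"];

// Returns an object holding only the controls named in the query string.
function settingsFromQuery(query) {
  const params = new URLSearchParams(query);  // a leading "?" is ignored
  const settings = {};
  for (const name of CONTROL_NAMES) {
    if (!params.has(name)) continue;          // names are case-dependent
    const value = params.get(name);
    settings[name] = CHECKBOXES.includes(name)
      ? (value === "1" || value === "true")   // 0/1 or true/false
      : value;
  }
  return settings;
}
```

For example, `settingsFromQuery("?GoAt=page.htm&Tout=0")` gives an object with GoAt and Tout set as strings; a parameter spelt `goat` would be ignored.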
Otherwise, you must set the controls and press the long button in the usual way.
A "Directory File" should be named. Otherwise, linked site files will be read on the presumption that they exist. If they do not, the consequences may be browser-dependent.
c:\current\astron-1.htm
c:\current\programs\
c:\current\programs\someprog.pas
Its contents must resemble what, at a Windows XP command prompt, is given by DIR /B /S > $DIR.TXT (see green box on right). The characters / and \ are equivalent; directories do not need a trailing slash; case does not matter. If you test on the system that you are using to read this, the c:\current\ part must match the corresponding part of what appears on yellow above after ' this : '.
The program will then read only page files that exist and have listed extensions.
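The treatment of a Directory File can be sketched as below, assuming one path per line as produced by DIR /B /S. The names `parseDirListing` and `isPresent` are hypothetical, and the real FromDIR object may be organised differently; here the whole normalised path is used as the key.

```javascript
// Normalise a path: \ becomes /, trailing slash dropped, lower-cased.
function normalisePath(path) {
  return path.trim().replace(/\\/g, "/").replace(/\/$/, "").toLowerCase();
}

// Build a lookup object with one property per listed file or directory.
function parseDirListing(text) {
  const fromDIR = Object.create(null);  // no inherited property names
  for (const line of text.split(/\r?\n/)) {
    const path = normalisePath(line);
    if (path !== "") fromDIR[path] = true;
  }
  return fromDIR;
}

// A file is deemed Present only if the listing names it.
function isPresent(fromDIR, path) {
  return fromDIR[normalisePath(path)] === true;
}
```

With this normalisation, `c:\current\programs\` and `C:/Current/Programs` give the same key, in keeping with the rules stated above.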
The decision as to whether a file is deemed Present or Missing depends entirely on the Directory File that the user provides. Peculiar cases may remain to be handled. For example, a link to a file name containing ^ was earlier considered to be a link to a file with %5E in that position.
The Directory File is read using the iframe. Its contents are read with textContent or innerText.
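The choice between the two properties can be sketched as below; the function name `textOfFrame` is hypothetical. The need for both arises because IE 8 lacks textContent, while Firefox 3.0 lacks innerText.

```javascript
// frameDocument is the document loaded in the iframe (or any object
// of the same shape, for testing).
function textOfFrame(frameDocument) {
  const body = frameDocument.body;
  // Prefer the standard property; fall back to innerText where it is absent.
  return typeof body.textContent === "string" ? body.textContent : body.innerText;
}
```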
When the WkDy box is set, the pages are scanned for anything like an ISO 8601 date (yyyy-mm-dd) followed by whitespace then exactly three letters (XXX). Whenever that is found, the date part is checked for validity and the day-of-week of the date is compared with XXX. Any apparent error is reported in a Confirm box. Certain common XXXs, such as "the" and "and", are disregarded. The code to do this was taken from Day-of-Week Checking in JavaScript Miscellany 1, with subsequent changes.
Years above 275350 are not checked. Negative years may be a problem?
The best ways of dealing with false positives include
• If the date is intentionally invalid, use another separator;
• Rephrase so that the date, as displayed, is not followed by a TLA.
This option slows the program considerably, in some browsers.
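The date test can be sketched as follows. This is an illustration and not the code taken from JavaScript Miscellany 1: the name `checkDates` is hypothetical, only four-digit years are matched, and the list of disregarded words holds just the two named above.

```javascript
const DAYS = ["sun", "mon", "tue", "wed", "thu", "fri", "sat"];
const IGNORED = ["the", "and"];  // the full list is not given in the text

// Returns one record per apparent error found in the text.
function checkDates(text) {
  const problems = [];
  // yyyy-mm-dd, whitespace, then exactly three letters
  const re = /\b(\d{4})-(\d\d)-(\d\d)\s+([A-Za-z]{3})\b/g;
  let m;
  while ((m = re.exec(text)) !== null) {
    const year = Number(m[1]), month = Number(m[2]), day = Number(m[3]);
    const word = m[4].toLowerCase();
    if (IGNORED.includes(word)) continue;
    // setUTCFullYear avoids the 1900+ mapping applied to years 0 to 99
    const date = new Date(0);
    date.setUTCFullYear(year, month - 1, day);
    // An invalid date rolls over, so the fields no longer match the input
    const valid = date.getUTCFullYear() === year &&
                  date.getUTCMonth() === month - 1 &&
                  date.getUTCDate() === day;
    if (!valid) {
      problems.push({ found: m[0], reason: "invalid date" });
    } else if (DAYS[date.getUTCDay()] !== word) {
      problems.push({ found: m[0], reason: "expected " + DAYS[date.getUTCDay()] });
    }
  }
  return problems;
}
```

For the text `2000-01-01 Sat, 2000-01-02 Mon, 2000-02-30 Sun` this reports the second item (that day was a Sunday) and the third (there is no 30th of February), and accepts the first.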
There may be no test for the initially-named page being missing; but that will rapidly become obvious.
A browser may not let a page load a copy of itself into its own iframe. If Self coerces to false, this page will not be queued for reading. To check that, try it as Start File.
A browser may not be able to handle pages in subdirectories here. If GoUp coerces to false, subdirectory pages will not be queued for reading.
If a file or directory name could be interpreted by JavaScript as a number, then there could possibly be a clash with another name that corresponded to the same number.
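The following shows how such a clash could arise if a name were ever converted to a number before being used as a property name, and one possible guard against it. Whether this page's code makes such a conversion is not stated; the helper `keyFor` and its prefix are hypothetical.

```javascript
// Hypothetical guard: a prefix ensures that no key can look numeric.
function keyFor(name) {
  return "f:" + name;
}

const byNumber = {};
byNumber[Number("1e3")] = "1e3";     // stored under the key "1000"
byNumber[Number("1000")] = "1000";   // overwrites the same key

const byName = {};
byName[keyFor("1e3")] = "1e3";       // key "f:1e3"
byName[keyFor("1000")] = "1000";     // key "f:1000", a separate entry
```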
Some browsers MAY get confused about which directory a page is in.
With some browsers, e.g. Opera 10.01, it may be necessary that the pages scanned are free of "major" errors.
The code may seem slow to start; maybe a browser needs to get resources.
The code may tend to appear to run in fits and starts.
The browser's error-display system will indicate, for the pages scanned, any recognised errors in their HTML and in any of their scripts that run during or on page-load. Dismissing such an error should allow this page to proceed.
A run is completed and analysed when the status line of the form goes green, the iframe vanishes, and more buttons appear instead, within the blue Form. Press them in any sequence.
More into Consolidation? Check drive matches $dir.txt. Remove Root from FromDIR indexes?? Test a single file??