
netrik hacker's manual
>========================<

[This file contains a description of the page handling system. See hacking.txt or hacking.html for an overview of the manual.]

The page handling system is responsible for the central browser functionality: Loading pages to be displayed, loading new pages when links are followed etc. This also includes all page history handling, as all page loads either affect the page list (history), or depend on it, or both; and all history commands involve a page load.

The page handling is also a central component because it invokes all of the other modules: The file loader (hacking-load.*) is used to fetch a new document (file) if necessary, and the layout engine (hacking-layout.*) to prepare it for rendering; the pager (hacking-pager.*) is (indirectly) invoked to display the new page; and the link handling mechanism tells when and what page to load. Moreover, most of the modules are also coupled to the page handling, because they use the page structure handled by the page loading mechanism.

So the page handling is really the central component, controlling the program flow. Thus it can only partially be placed in a file of its own (page.c); part of the page handling has to be done in main() directly.

load_page()

load_page() is the main function of page.c, and does most of the work. It is responsible for loading a page so that it will be shown in the pager the next time display() (see hacking-pager.*) is called.

The exact actions necessary to achieve that depend on the nature of the load operation. In the most basic operation mode it requires adding a new entry to the page list and loading a new document to use. Sometimes, however, only the page list changes while the document is reused, or the page history stays unchanged while some existing entry is reactivated and its settings are reloaded. Reloading some document while the page list isn't altered is possible as well.

load_page() takes almost the same arguments as layout(): a base URL, usually the URL of the current page, which is used as the base when following a relative link or the like; a main URL (as a string), which can be absolute or relative, in the latter case to be combined with the base URL to form an absolute target URL; an optional form item, which tells where to find the form data of a form to submit while loading a new document; the page width, telling how to lay out loaded pages; and an error handle, informing whether some problem occurred while loading the document.

In case a new document needs to be loaded, all of these parameters are simply passed on to layout(), which is responsible for actually loading a document from a file or an HTTP server, as well as invoking all necessary layouting passes so that the document can actually be rendered and displayed in the pager. This process is described in hacking-layout.*.

load_page() takes an additional "reference" parameter. This one may refer to some entry in the page list (normally the current entry), which contains an already loaded document to reuse. In this case no document load needs to be performed (i.e. layout() isn't called), as the layouting data of the reference page is reused. (Most of the other parameters aren't used in this case.) This feature is used when jumping to an anchor inside a document -- it doesn't require loading a new document.

The first action (even before loading the document) is creating a page descriptor -- regardless of whether a new document is loaded or existing layouting data is reused.

This however is left out if the page is reloaded from history (indicated by "url" being NULL), and thus already has a descriptor somewhere in the page list.
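The two decisions described so far can be written down as a pair of small predicates. This is only a sketch: the function names are hypothetical, and encoding "no reference page" as -1 is an assumption made here, not something taken from the netrik sources. Only the conditions themselves ("url" being NULL, a "reference" entry being given) come from the text above.

```c
#include <assert.h>
#include <stddef.h>

/* A new descriptor is created unless the page is reloaded from history,
 * which is indicated by "url" being NULL. */
int needs_new_descriptor(const char *url)
{
    return url != NULL;
}

/* layout() is only called when no reference entry supplies layout data.
 * Encoding "no reference" as -1 is an assumption of this sketch. */
int needs_document_load(int reference, int num_entries)
{
    assert(reference < num_entries);   /* must name an existing entry */
    return reference < 0;
}
```

Note that the two questions are independent: a history reload (no new descriptor) may or may not come with a reference page.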

Page List Handling

The page list (history) is a global variable (of type "struct Page_list"), which basically consists of an array of pointers to individual page descriptors. It also stores the current number of entries in the list, as well as the currently active entry (the one that describes the page visible in the pager).

The list has one entry for each page in the page history. Normally the last entry is the visible page, but other pages are also possible after going back in history.

Each page descriptor is of type "struct Page". This struct contains:

The page list is manipulated exclusively by load_page() and its helper functions. load_page() is also responsible for keeping the list up to date, so it always contains exactly those entries that shall be used by the history commands.

Thus, before a new page descriptor is created, the page list needs to be adjusted.

Normally a new entry is simply appended at the end of the list; however, there are several other cases.

When loading some new page while not being at the last entry in the page list (after going back to some older page from the page history), all entries after the current one have to be discarded.

Moreover, if the current page is an internal page (either a page loaded from stdin, or an error page), it isn't to be kept in history; in that case, we go further back in history until we find the last normal page entry.

All entries after the current or the last normal one are then cleared by calling free_page() in a loop.

Afterwards, the new page descriptor is created.

When reloading a page from history, it may also be necessary to delete internal pages if the page being left is one. The last non-internal page is determined (starting from the end of the list), and all entries following it are deleted.
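The trimming done before a new entry is added can be sketched roughly as follows. The "internal" flag, the reduced struct layouts, and the helper free_last_page() (a stand-in for the real free_page()) are assumptions made for illustration; the actual structures hold more than this, and the handling of "pos" may differ.

```c
#include <assert.h>
#include <stdlib.h>

struct Page {
    int internal;        /* hypothetical flag: page from stdin or error page */
};

struct Page_list {
    int num;             /* number of entries currently in the list */
    int pos;             /* entry of the currently visible page */
    struct Page **page;  /* array of pointers to the page descriptors */
};

/* Stand-in for free_page(): drops the last entry of the list. */
static void free_last_page(struct Page_list *list)
{
    free(list->page[list->num - 1]);
    list->page[--list->num] = NULL;
}

/* Discard every entry after the last one that is to stay in history. */
void trim_page_list(struct Page_list *list)
{
    int keep = list->pos;   /* start at the current entry */

    assert(list->pos < list->num);

    /* Internal pages are not kept: go back to the last normal entry. */
    while (keep >= 0 && list->page[keep]->internal)
        --keep;

    /* Clear everything behind it, one entry at a time. */
    while (list->num > keep + 1)
        free_last_page(list);

    list->pos = keep;       /* -1 if no normal entry is left */
}
```

With four entries, the third of which is the current one and internal, the call leaves two entries and makes the second one current.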

add_page()

The add_page() function (in url-history.c) is responsible for actually creating the new page list entry. The list is a "struct Page_list", and contains the following information:

  • "num" is the number of entries currently stored in the list
  • "pos" is the entry number of the entry corresponding to the currently visible page
  • The history entries themselves are stored in "page", which is an array of pointers to the page descriptors of all pages.

The new entry is added at the position indicated by "page_list.pos".

First the array is resized to the new history size -- the history now will end with the new entry generated. Then a new page descriptor is created and the pointer to it is stored at the proper position (the last list entry).

After creating the descriptor, some default values are set. (Pager position etc.)
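A minimal sketch of these steps, under some assumptions: the descriptor fields shown ("url", "pager_pos") are a hypothetical subset, the list is taken to have been trimmed already so that "pos" points just past the last kept entry, and returning -1 on allocation failure is this sketch's own convention.

```c
#include <assert.h>
#include <stdlib.h>

struct Page {
    char *url;       /* hypothetical subset of the real descriptor */
    int pager_pos;   /* hypothetical default: start at the top of the page */
};

struct Page_list {
    int num;             /* number of entries currently in the list */
    int pos;             /* entry of the currently visible page */
    struct Page **page;  /* array of pointers to the page descriptors */
};

/* Returns 0 on success, -1 if memory runs out. */
int add_page_sketch(struct Page_list *list)
{
    struct Page **resized;
    struct Page *desc;

    /* Assumes the list was trimmed and "pos" already names the new slot. */
    assert(list->pos == list->num);

    /* First resize the array: the history will end with the new entry. */
    resized = realloc(list->page, (size_t)(list->pos + 1) * sizeof *resized);
    if (resized == NULL)
        return -1;
    list->page = resized;

    /* Then create the descriptor and store it as the last list entry. */
    desc = calloc(1, sizeof *desc);
    if (desc == NULL)
        return -1;
    list->page[list->pos] = desc;
    list->num = list->pos + 1;

    /* Finally set the defaults. */
    desc->url = NULL;
    desc->pager_pos = 0;
    return 0;
}
```

The realloc() result is assigned to a temporary first, so the old array is not lost if the resize fails.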

Loading

Now having the page descriptor, we need to get the layouting data some way so the page can be displayed in the pager.

Again, the standard case is loading a new document using layout(). The pointer to the layouting data returned by that function is simply stored in the page descriptor. The URL is extracted from the layouting data and stored in the explicit "url" pointer of the page descriptor, which is necessary in case the layout data is discarded (when loading another document), but the page is kept in history.

Local Links

As mentioned before, instead of loading a new document it's also possible to reuse the existing layouting data of another one by passing a "reference" page. The primary application of that is when following a link that points to some anchor in the same document -- that doesn't require reloading the whole document, but just jumping to the anchor. Of course, it is also used when returning to the previous page after following such a local link, or going forward again.

The pointer to the layout data descriptor is simply copied from the page list entry indicated by the given "reference" parameter.

As layout() and thus also init_load() (see hacking-load.*) isn't used in this case, merge_urls() has to be called directly. If URL merging fails here, load_page() returns immediately; no page descriptor is created and nothing else is changed.

Also, if some anchor is active in the reference page, highlight_link() needs to be used to remove the highlighting, to get a "clean" item tree.

Anchors

If the URL contains a fragment identifier, the corresponding anchor is retrieved from the anchor list, and stored in "page->active_anchor"; this is described under Anchors in hacking-links.*. The pager then jumps to the anchor position and highlights it upon startup.
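To illustrate the lookup, here is a simplified sketch that works on a plain URL string and a flat array of anchors. Both are assumptions: netrik works with split URLs and its own anchor list (see hacking-links.*), and "struct Anchor", url_fragment() and find_anchor() are hypothetical names used only here.

```c
#include <assert.h>
#include <string.h>

struct Anchor {
    const char *name;   /* anchor name as given in the document */
    int line;           /* hypothetical: position the pager jumps to */
};

/* Returns the fragment identifier (text after '#'), or NULL if none. */
const char *url_fragment(const char *url)
{
    const char *hash;

    assert(url != NULL);
    hash = strchr(url, '#');
    return hash != NULL ? hash + 1 : NULL;
}

/* Returns the index of the anchor called "name", or -1 if not found. */
int find_anchor(const struct Anchor *anchors, int count, const char *name)
{
    int i;

    for (i = 0; i < count; ++i)
        if (strcmp(anchors[i].name, name) == 0)
            return i;
    return -1;
}
```

The index returned by find_anchor() plays the role of the value stored in "page->active_anchor"; a result of -1 would mean that no anchor is to be activated.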

Handling in main()

Although load_page() does a great part of the work, some things have to be taken care of in main(); particularly determining in which manner the new page is to be loaded (reuse current document or load new one), and clearing the layout data of an old document before loading a new one. Maybe that could be done in load_page() too; however, it's not worth considering that now, as it will probably need to be handled completely differently with the planned new basic program structure. (Using an event queue and a main dispatcher.)

Another thing that needs to be handled in main() is initiating a page load in reaction to commands given by the user inside the pager (or on the command prompt), which often can't be clearly separated from the load operation itself.

If the user activates the command prompt (by typing ':' inside the pager) and issues the ":e" or ":E" command, the URL is extracted from the command, and load_page() is called to load the desired new page. The URL of the current page is used as base for a relative URL with ":e". With ":E" (and also for ":e", if the current page is internal), no base is used; the URL is always interpreted absolutely.

These commands never pass a reference page, i.e. always involve loading a new document. (Even though it is possible to jump to a local anchor using ":e #anchor".)

If a link/form control was activated by pressing <return> on a selected link inside the pager, the action depends on the link or form element type. For normal links load_page() is used with the current URL as base, just as with ":e". The link URL is extracted from the text item containing the link with get_link(), by help of the "link_list" structure. This process is described under Following Links in hacking-links.*.

The only difference to the ":e" command (except for the way of getting the target URL) is that the current page is passed as "reference" to load_page() if the URL starts with '#' (i.e. points to an anchor in the same document), so that the document isn't reloaded in that case, but only the anchor is activated.
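The choice of base URL for ":e"/":E" and the choice of reference page for followed links can be sketched as two small helpers. The names are hypothetical, the URLs are plain strings rather than netrik's split URL structures, and "no base" / "no reference" are encoded as NULL and -1 purely for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the base URL to pass to load_page(), or NULL for "no base". */
const char *command_base(char cmd, const char *current_url,
                         int current_is_internal)
{
    assert(cmd == 'e' || cmd == 'E');

    /* Only ":e" on a normal page resolves relative to the current URL. */
    if (cmd == 'e' && !current_is_internal)
        return current_url;
    return NULL;   /* ":E", or ":e" while on an internal page */
}

/* Returns the page list entry to pass as "reference", or -1 for none. */
int link_reference(const char *link_url, int current_pos)
{
    assert(link_url != NULL);

    /* A link starting with '#' points into the current document. */
    return link_url[0] == '#' ? current_pos : -1;
}
```

Only link_reference() applies to followed links; for ":e" and ":E" no reference is passed at all, so even ":e #anchor" loads the document anew.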

Form submit buttons are quite similar to normal links. First, get_form_item() (also in hacking-links.*) is used to retrieve the item (from the structure tree) that describes the form in which the button resides. This is used to get the form's submit address ("action") first. Having this, load_page() is used to do the submit; the form item is passed as the "form" argument. (And passed on to init_load() there; see hacking-load.*.) This is both to tell init_load() that a form is to be submitted, and where to find the form data. init_load() (and its sub-functions) then take care of extracting the form data (using url_encode() or mime_encode() from forms.c, also described in hacking-links.*) and submitting it to the server. The resulting response page is loaded just like any other document.

Other form controls do not issue a load operation, but only adjust the form value appropriately. (This is described under Manipulating in hacking-links.*.)

The 'u', 'U' and 'c' pager commands also aren't really load operations, but they are described here because they involve similar actions as the preparations for a page load.

If 'u' was typed, the link URL is retrieved the same way as when following a link, and then just printed to the screen.

'U' is similar. Instead of printing the (relative) link URL directly, it merges it with the current page URL, thus getting the same absolute target URL that would be used if the link was actually followed.

'c' simply prints the "full_url" component of the current page URL.

If a history command was given, load_page() is called with the (split) URL taken from the requested "page_list" entry. We know which entry to take by "page_list.pos", which is set to the desired new value before returning from the pager.

If the history entry refers to the same HTML document as the one displayed up to now, the current page descriptor is passed as "reference". To determine whether it is the same document, we need to check if all entries between the old and the new one (regardless of whether the new one is before or after the old one in history) have "local" URLs, i.e. if the newer of the two entries was created only by following links to local anchors from the older one.
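This check can be sketched as a loop over the entries that lie after the older and up to the newer of the two positions. The "local" flag in the descriptor and the function name are assumptions made for this illustration; how netrik actually marks local URLs is not shown here.

```c
#include <assert.h>

struct Page {
    int local;   /* hypothetical flag: entry was created by a "#anchor" link */
};

/* Returns 1 if moving from old_pos to new_pos stays in the same document. */
int same_document(struct Page *const *page, int old_pos, int new_pos)
{
    int older = old_pos < new_pos ? old_pos : new_pos;
    int newer = old_pos < new_pos ? new_pos : old_pos;
    int i;

    assert(older >= 0);

    /* Every step from the older entry up to the newer one must be local. */
    for (i = older + 1; i <= newer; ++i)
        if (!page[i]->local)
            return 0;
    return 1;
}
```

The older entry itself is not tested, since only the steps leading away from it matter; the direction of the history move makes no difference to the result.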