Pages list

The page list lets you keep track of the various URLs that are (or were) seen by the proxy. Besides giving you an overview of, and direct access to, the content of individual pages, the page list also lets you visit the various proxy preview domains, fine-tune inclusion/exclusion settings, and check the translation progress of individual pages.

[Figure: Pages list]

This menu has two main parts visible when you first open it: the “Pages, resources, external links & unknown” pane and the include/exclude rules.

Also notice the language selector on the right of the project selector. Information in the translation progress bars will refer to the selected language, and both the List and Highlight Views will open for the selected target language. The same is true for the various types of Preview links.

Make sure that you have selected a target language to gain access to many powerful features in the page list. If you have not (or there is no target language on the project at all), neither the Workbench links nor the Previews will be available, and most of the context menu options will be greyed out.

Include & exclude rules

Inclusion rules let you set the scope of translation, that is, the list of pages that the crawler is allowed to visit. You can enter the prefixes that you wish to limit the scope to (or exclude from it). You can also exclude individual pages should you see fit.

The Dashboard has a number of features that support inclusion/exclusion prefixes, such as Auto-pretranslation or Work packages. However, the rules you specify here are special and powerful: they affect the entire project. An excluded page will stay untranslated on every proxy domain (both Preview and Live), and excluded pages are ignored by the crawler.

NOTE: You can enable/disable the application of rules using the checkbox associated with them or you can delete them completely.

Rule Application

Whenever the proxy comes across a page in an affected context (crawling, serving, translating, etc.), it will evaluate it according to the rules you provide. This process is summarized in the flowchart below:

[Figure: Evaluation of Inclusion/Exclusion Rules]

A few points of note concerning inclusion/exclusion rules:

  • Path names are first checked for inclusion, then exclusion.
  • A path name has to match only one of the inclusion rules. The rules are applied to the path in sequence until a match is found or there are no more rules.
  • The rules are plain strings, matched from the beginning of the path name. The proxy does not analyze them in detail or produce complex internal representations.
  • Query parameters are supported (but be careful, you can’t really count on a query parameter to have a set position!).
  • If a path falls outside of the scope of your inclusion rules or an exclusion rule applies to it, it will be greyed out in the page list and the text “Excluded by rule” will be visible next to it.
  • Paths that are excluded by rule can’t be included using the context menu. You need to edit your rules if they gobble a path that you want to include.
  • Manual page exclusions override all other inclusion rules.
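The points above can be sketched in a few lines of Python. This is only an illustration of the described behavior, not the proxy's actual implementation: the Rules class and the is_included function are hypothetical names, and the sketch assumes that an empty inclusion list leaves every path in scope.

```python
from dataclasses import dataclass, field


@dataclass
class Rules:
    """Hypothetical container for a project's rules (not the proxy's data model)."""
    include_prefixes: list[str] = field(default_factory=list)
    exclude_prefixes: list[str] = field(default_factory=list)
    manually_excluded: set[str] = field(default_factory=set)


def is_included(path: str, rules: Rules) -> bool:
    """Evaluate a path name: manual exclusions, then inclusion, then exclusion."""
    # Manual page exclusions take precedence over every rule.
    if path in rules.manually_excluded:
        return False
    # Inclusion is checked first; a single matching prefix is enough.
    # Assumption: with no inclusion rules at all, every path is in scope.
    if rules.include_prefixes and not any(
        path.startswith(prefix) for prefix in rules.include_prefixes
    ):
        return False
    # Exclusion is checked next; any matching prefix excludes the path.
    if any(path.startswith(prefix) for prefix in rules.exclude_prefixes):
        return False
    return True


rules = Rules(
    include_prefixes=["/en/"],
    exclude_prefixes=["/en/blog/"],
    manually_excluded={"/en/legal.html"},
)
print(is_included("/en/products.html", rules))     # True
print(is_included("/en/blog/post-1.html", rules))  # False (exclusion rule)
print(is_included("/de/index.html", rules))        # False (out of scope)
print(is_included("/en/legal.html", rules))        # False (manual exclusion)

# Rules are matched from the start of the string, so a query parameter
# only matches when it appears exactly where the rule expects it.
query_rules = Rules(include_prefixes=["/search?lang=en"])
print(is_included("/search?lang=en&page=2", query_rules))  # True
print(is_included("/search?page=2&lang=en", query_rules))  # False
```

The last two calls show why query parameters in rules need care: the same parameter in a different position no longer matches the prefix.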

NOTE: If all pages are excluded on a project, crawls cannot be started, even if the currently active rules would allow for the inclusion of some as-yet-undiscovered page. In this case, crawls will exit after 0 pages visited. You have to ensure an entry point, that is, at least one of the known pages must be in an included state. Otherwise, the crawler can’t get its foot in the door.

Though it may seem nonsensical to exclude every single URL on a project, we note this unusual case because it can come about from inadvertent use of inclusion rules.

For example, if you set /en/ as the sole “Include only” rule on your project but no page starting with /en/ is in the page list, the crawler is left without a single valid entry point.
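Before starting a crawl, you could check for this situation yourself with a small script over an exported list of known pages. The sketch below is a simplified illustration: find_entry_points is a hypothetical helper, it looks only at "Include only" prefixes (exclusion rules and manual exclusions are left out), and it assumes that an empty inclusion list leaves every page in scope.

```python
def find_entry_points(known_pages: list[str], include_prefixes: list[str]) -> list[str]:
    """Return the known pages that match at least one inclusion prefix."""
    if not include_prefixes:
        # Assumption: with no inclusion rules, nothing is out of scope.
        return list(known_pages)
    return [
        page
        for page in known_pages
        if any(page.startswith(prefix) for prefix in include_prefixes)
    ]


known_pages = ["/", "/about.html", "/products/"]

# No known page starts with /en/, so the crawler would have no entry point.
print(find_entry_points(known_pages, ["/en/"]))  # []

# Once a matching page is known, it can serve as the entry point.
print(find_entry_points(known_pages + ["/en/index.html"], ["/en/"]))  # ['/en/index.html']
```

An empty result means that a crawl would exit after 0 pages visited, so either a matching page has to be added to the page list or the rule has to be relaxed.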