From 10831dd274ff65d4852b47dbc398adae61845206 Mon Sep 17 00:00:00 2001 From: Bernhard Posselt Date: Sat, 4 May 2013 00:15:41 +0200 Subject: use html purifier for sanitation --- 3rdparty/htmlpurifier/docs/dev-advanced-api.html | 26 + 3rdparty/htmlpurifier/docs/dev-code-quality.txt | 29 + 3rdparty/htmlpurifier/docs/dev-config-bcbreaks.txt | 79 ++ 3rdparty/htmlpurifier/docs/dev-config-naming.txt | 164 +++ 3rdparty/htmlpurifier/docs/dev-config-schema.html | 412 +++++++ 3rdparty/htmlpurifier/docs/dev-flush.html | 68 ++ 3rdparty/htmlpurifier/docs/dev-includes.txt | 281 +++++ 3rdparty/htmlpurifier/docs/dev-naming.html | 83 ++ 3rdparty/htmlpurifier/docs/dev-optimization.html | 33 + 3rdparty/htmlpurifier/docs/dev-progress.html | 309 +++++ .../htmlpurifier/docs/dtd/xhtml1-transitional.dtd | 1201 ++++++++++++++++++++ 3rdparty/htmlpurifier/docs/enduser-customize.html | 850 ++++++++++++++ 3rdparty/htmlpurifier/docs/enduser-id.html | 148 +++ 3rdparty/htmlpurifier/docs/enduser-overview.txt | 59 + 3rdparty/htmlpurifier/docs/enduser-security.txt | 18 + 3rdparty/htmlpurifier/docs/enduser-slow.html | 120 ++ 3rdparty/htmlpurifier/docs/enduser-tidy.html | 231 ++++ 3rdparty/htmlpurifier/docs/enduser-uri-filter.html | 204 ++++ 3rdparty/htmlpurifier/docs/enduser-utf8.html | 1060 +++++++++++++++++ 3rdparty/htmlpurifier/docs/enduser-youtube.html | 153 +++ 3rdparty/htmlpurifier/docs/entities/xhtml-lat1.ent | 196 ++++ .../htmlpurifier/docs/entities/xhtml-special.ent | 80 ++ .../htmlpurifier/docs/entities/xhtml-symbol.ent | 237 ++++ 3rdparty/htmlpurifier/docs/examples/basic.php | 23 + 3rdparty/htmlpurifier/docs/fixquotes.htc | 9 + 3rdparty/htmlpurifier/docs/index.html | 188 +++ 3rdparty/htmlpurifier/docs/proposal-colors.html | 49 + 3rdparty/htmlpurifier/docs/proposal-config.txt | 23 + .../htmlpurifier/docs/proposal-css-extraction.txt | 34 + 3rdparty/htmlpurifier/docs/proposal-errors.txt | 211 ++++ .../htmlpurifier/docs/proposal-filter-levels.txt | 137 +++ 
3rdparty/htmlpurifier/docs/proposal-language.txt | 64 ++ .../htmlpurifier/docs/proposal-new-directives.txt | 44 + 3rdparty/htmlpurifier/docs/proposal-plists.txt | 218 ++++ 3rdparty/htmlpurifier/docs/ref-content-models.txt | 50 + 3rdparty/htmlpurifier/docs/ref-css-length.txt | 30 + 3rdparty/htmlpurifier/docs/ref-devnetwork.html | 47 + .../htmlpurifier/docs/ref-html-modularization.txt | 166 +++ .../htmlpurifier/docs/ref-proprietary-tags.txt | 26 + 3rdparty/htmlpurifier/docs/ref-whatwg.txt | 26 + 3rdparty/htmlpurifier/docs/specimens/LICENSE | 10 + .../docs/specimens/html-align-to-css.html | 165 +++ 3rdparty/htmlpurifier/docs/specimens/img.png | Bin 0 -> 2138 bytes .../docs/specimens/jochem-blok-word.html | 129 +++ .../specimens/windows-live-mail-desktop-beta.html | 74 ++ 3rdparty/htmlpurifier/docs/style.css | 76 ++ 46 files changed, 7840 insertions(+) create mode 100644 3rdparty/htmlpurifier/docs/dev-advanced-api.html create mode 100644 3rdparty/htmlpurifier/docs/dev-code-quality.txt create mode 100644 3rdparty/htmlpurifier/docs/dev-config-bcbreaks.txt create mode 100644 3rdparty/htmlpurifier/docs/dev-config-naming.txt create mode 100644 3rdparty/htmlpurifier/docs/dev-config-schema.html create mode 100644 3rdparty/htmlpurifier/docs/dev-flush.html create mode 100644 3rdparty/htmlpurifier/docs/dev-includes.txt create mode 100644 3rdparty/htmlpurifier/docs/dev-naming.html create mode 100644 3rdparty/htmlpurifier/docs/dev-optimization.html create mode 100644 3rdparty/htmlpurifier/docs/dev-progress.html create mode 100644 3rdparty/htmlpurifier/docs/dtd/xhtml1-transitional.dtd create mode 100644 3rdparty/htmlpurifier/docs/enduser-customize.html create mode 100644 3rdparty/htmlpurifier/docs/enduser-id.html create mode 100644 3rdparty/htmlpurifier/docs/enduser-overview.txt create mode 100644 3rdparty/htmlpurifier/docs/enduser-security.txt create mode 100644 3rdparty/htmlpurifier/docs/enduser-slow.html create mode 100644 3rdparty/htmlpurifier/docs/enduser-tidy.html create mode 
100644 3rdparty/htmlpurifier/docs/enduser-uri-filter.html create mode 100644 3rdparty/htmlpurifier/docs/enduser-utf8.html create mode 100644 3rdparty/htmlpurifier/docs/enduser-youtube.html create mode 100644 3rdparty/htmlpurifier/docs/entities/xhtml-lat1.ent create mode 100644 3rdparty/htmlpurifier/docs/entities/xhtml-special.ent create mode 100644 3rdparty/htmlpurifier/docs/entities/xhtml-symbol.ent create mode 100644 3rdparty/htmlpurifier/docs/examples/basic.php create mode 100644 3rdparty/htmlpurifier/docs/fixquotes.htc create mode 100644 3rdparty/htmlpurifier/docs/index.html create mode 100644 3rdparty/htmlpurifier/docs/proposal-colors.html create mode 100644 3rdparty/htmlpurifier/docs/proposal-config.txt create mode 100644 3rdparty/htmlpurifier/docs/proposal-css-extraction.txt create mode 100644 3rdparty/htmlpurifier/docs/proposal-errors.txt create mode 100644 3rdparty/htmlpurifier/docs/proposal-filter-levels.txt create mode 100644 3rdparty/htmlpurifier/docs/proposal-language.txt create mode 100644 3rdparty/htmlpurifier/docs/proposal-new-directives.txt create mode 100644 3rdparty/htmlpurifier/docs/proposal-plists.txt create mode 100644 3rdparty/htmlpurifier/docs/ref-content-models.txt create mode 100644 3rdparty/htmlpurifier/docs/ref-css-length.txt create mode 100644 3rdparty/htmlpurifier/docs/ref-devnetwork.html create mode 100644 3rdparty/htmlpurifier/docs/ref-html-modularization.txt create mode 100644 3rdparty/htmlpurifier/docs/ref-proprietary-tags.txt create mode 100644 3rdparty/htmlpurifier/docs/ref-whatwg.txt create mode 100644 3rdparty/htmlpurifier/docs/specimens/LICENSE create mode 100644 3rdparty/htmlpurifier/docs/specimens/html-align-to-css.html create mode 100644 3rdparty/htmlpurifier/docs/specimens/img.png create mode 100644 3rdparty/htmlpurifier/docs/specimens/jochem-blok-word.html create mode 100644 3rdparty/htmlpurifier/docs/specimens/windows-live-mail-desktop-beta.html create mode 100644 3rdparty/htmlpurifier/docs/style.css (limited to 
'3rdparty/htmlpurifier/docs') diff --git a/3rdparty/htmlpurifier/docs/dev-advanced-api.html b/3rdparty/htmlpurifier/docs/dev-advanced-api.html new file mode 100644 index 000000000..4002fb8be --- /dev/null +++ b/3rdparty/htmlpurifier/docs/dev-advanced-api.html @@ -0,0 +1,26 @@ + + + + + + + +Advanced API - HTML Purifier + + + +

Advanced API

+ +
Filed under Development
+
Return to the index.
+
HTML Purifier End-User Documentation
+ +

+ Please see Customize! +

+ + + + diff --git a/3rdparty/htmlpurifier/docs/dev-code-quality.txt b/3rdparty/htmlpurifier/docs/dev-code-quality.txt new file mode 100644 index 000000000..afce502f4 --- /dev/null +++ b/3rdparty/htmlpurifier/docs/dev-code-quality.txt @@ -0,0 +1,29 @@ + +Code Quality Issues + +Okay, face it. Programmers can get lazy, cut corners, or make mistakes. They +also can do quick prototypes, and then forget to rewrite them later. Well, +while I can't list mistakes in here, I can list prototype-like segments +of code that should be aggressively refactored. This does not list +optimization issues, that needs to be done after intense profiling. + +docs/examples/demo.php - ad hoc HTML/PHP soup to the extreme + +AttrDef - a lot of duplication, more generic classes need to be created; +a lot of strtolower() calls, no legit casing + Class - doesn't support Unicode characters (fringe); uses regular expressions + Lang - code duplication; premature optimization + Length - easily mistaken for CSSLength + URI - multiple regular expressions; missing validation for parts (?) 
+ CSS - parser doesn't accept advanced CSS (fringe) + Number - constructor interface inconsistent with Integer +Strategy + FixNesting - cannot bubble nodes out of structures, duplicated checks + for special-case parent node + RemoveForeignElements - should be run in parallel with MakeWellFormed +URIScheme - needs to have callable generic checks + mailto - doesn't validate emails, doesn't validate querystring + news - doesn't validate opaque path + nntp - doesn't constrain path + + vim: et sw=4 sts=4 diff --git a/3rdparty/htmlpurifier/docs/dev-config-bcbreaks.txt b/3rdparty/htmlpurifier/docs/dev-config-bcbreaks.txt new file mode 100644 index 000000000..31114b2b7 --- /dev/null +++ b/3rdparty/htmlpurifier/docs/dev-config-bcbreaks.txt @@ -0,0 +1,79 @@ + +Configuration Backwards-Compatibility Breaks + +In version 4.0.0, the configuration subsystem (composed of the outwards +facing Config class, as well as the ConfigSchema and ConfigSchema_Interchange +subsystems), was significantly revamped to make use of property lists. +While most of the changes are internal, some internal APIs were changed for the +sake of clarity. HTMLPurifier_Config was kept completely backwards compatible, +although some of the functions were retrofitted with an unambiguous alternate +syntax. Both of these changes are discussed in this document. + + + +1. Outwards Facing Changes +-------------------------------------------------------------------------------- + +The HTMLPurifier_Config class now takes an alternate syntax. The general rule +is: + + If you passed $namespace, $directive, pass "$namespace.$directive" + instead. + +An example: + + $config->set('HTML', 'Allowed', 'p'); + +becomes: + + $config->set('HTML.Allowed', 'p'); + +New configuration options may have more than one namespace, they might +look something like %Filter.YouTube.Blacklist. While you could technically +set it with ('HTML', 'YouTube.Blacklist'), the logical extension +('HTML', 'YouTube', 'Blacklist') does not work. 
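The rule above amounts to joining the two old arguments with a dot. A minimal sketch of that mapping follows; the helper name legacy_to_key is hypothetical and is not part of HTML Purifier, it only illustrates how an old-style call site could be rewritten:

```php
<?php
// Hypothetical helper (not an HTML Purifier API): joins the legacy
// ($namespace, $directive) pair into the dotted key form.
function legacy_to_key(string $namespace, string $directive): string
{
    return $namespace . '.' . $directive;
}

// Old call:  $config->set('HTML', 'Allowed', 'p');
// New call:  $config->set(legacy_to_key('HTML', 'Allowed'), 'p');
echo legacy_to_key('HTML', 'Allowed'), "\n"; // prints "HTML.Allowed"
```

In practice the dotted key would simply be written out by hand; the helper only makes the rule explicit.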
+ +The old API will still work, but will emit E_USER_NOTICEs. + + + +2. Internal API Changes +-------------------------------------------------------------------------------- + +Some overarching notes: we've completely eliminated the notion of namespace; +it's now an informal construct for organizing related configuration directives. + +Also, the validation routines for keys (formerly "$namespace.$directive") +have been completely relaxed. I don't think it really should be necessary. + +2.1 HTMLPurifier_ConfigSchema + +First off, if you're interfacing with this class, you really shouldn't. +HTMLPurifier_ConfigSchema_Builder_ConfigSchema is really the only class that +should ever be creating HTMLPurifier_ConfigSchema, and HTMLPurifier_Config the +only class that should be reading it. + +All namespace related methods were removed; they are completely unnecessary +now. Any $namespace, $name arguments must be replaced with $key (where +$key == "$namespace.$name"), including for addAlias(). + +The $info and $defaults member variables are no longer indexed as +[$namespace][$name]; they are now indexed as ["$namespace.$name"]. + +All deprecated methods were finally removed, after having yelled at you as +an E_USER_NOTICE for a while now. + +2.2 HTMLPurifier_ConfigSchema_Interchange + +Member variable $namespaces was removed. + +2.3 HTMLPurifier_ConfigSchema_Interchange_Id + +Member variable $namespace and $directive removed; member variable $key added. +Any method that took $namespace, $directive now takes $key. + +2.4 HTMLPurifier_ConfigSchema_Interchange_Namespace + +Removed. 
+ + vim: et sw=4 sts=4 diff --git a/3rdparty/htmlpurifier/docs/dev-config-naming.txt b/3rdparty/htmlpurifier/docs/dev-config-naming.txt new file mode 100644 index 000000000..1f85b6545 --- /dev/null +++ b/3rdparty/htmlpurifier/docs/dev-config-naming.txt @@ -0,0 +1,164 @@ +Configuration naming + +HTML Purifier 4.0.0 features a new configuration naming system that +allows arbitrary nesting of namespaces. While there are certain cases +in which using two namespaces is obviously better (the canonical example +is where we were using AutoFormatParam to contain directives for AutoFormat +parameters), it is unclear whether or not a general migration to highly +namespaced directives is a good idea or not. + +== Case studies == + +=== Attr.* === + +We have a dead duck HTML.Attr.Name.UseCDATA which migrated before we decided +to think this out thoroughly. + +We currently have a large number of directives in the Attr.* namespace. +These directives tweak the behavior of some HTML attributes. They have +the properties: + +* While they apply to only one attribute at a time, the attribute can + span over multiple elements (not necessarily all attributes, either). + The information of which elements it impacts is either omitted or + informally stated (EnableID applies to all elements, DefaultImageAlt + applies to tags, AllowedRev doesn't say but only applies to a tags). + +* There is a certain degree of clustering that could be applied, especially + to the ID directives. The clustering could be done with respect to + what element/attribute was used, i.e. + + *.id -> EnableID, IDBlacklistRegexp, IDBlacklist, IDPrefixLocal, IDPrefix + img.src -> DefaultInvalidImage + img.alt -> DefaultImageAlt, DefaultInvalidImageAlt + bdo.dir -> DefaultTextDir + a.rel -> AllowedRel + a.rev -> AllowedRev + a.target -> AllowedFrameTargets + a.name -> Name.UseCDATA + +* The directives often reference generic attribute types that were specified + in the DTD/specification. 
However, some of the behavior specifically relies + on the fact that other use cases of the attribute are not, at current, + supported by HTML Purifier. + + AllowedRel, AllowedRev -> heavily specific; if ends up being + allowed, we will also have to give users specificity there (we also + want to preserve generality) DTD %Linktypes, HTML5 distinguishes + between and / + AllowedFrameTargets -> heavily specific, but also used by + and
. Transitional DTD %FrameTarget, not present in strict, + HTML5 calls them "browsing contexts" + Default*Image* -> as a default parameter, is almost entirely exclusive + to + EnableID -> global attribute + Name.UseCDATA -> heavily specific, but has heavy other usage by + many things + +== AutoFormat.* == + +These have the fairly normal pluggable architecture that lends itself to +large amounts of namespaces (pluggability may be the key to figuring +out when gratuitous namespacing is good.) Properties: + +* Boolean directives are fair game for being namespaced: for example, + RemoveEmpty.RemoveNbsp triggers RemoveEmpty.RemoveNbsp.Exceptions, + the latter of which only makes sense when RemoveEmpty.RemoveNbsp + is set to true. (The same applies to RemoveNbsp too) + +The AutoFormat string is a bit long, but is the only bit of repeated +context. + +== Core.* == + +Core is the potpourri of directives, mostly regarding some minor behavioral +tweaks for HTML handling abilities. + + AggressivelyFixLt + ConvertDocumentToFragment + DirectLexLineNumberSyncInterval + LexerImpl + MaintainLineNumbers + Lexer + CollectErrors + Language + Error handling (Language is ostensibly a little more general, but + it's only used for error handling right now) + ColorKeywords + CSS and HTML + Encoding + EscapeNonASCIICharacters + Character encoding + EscapeInvalidChildren + EscapeInvalidTags + HiddenElements + RemoveInvalidImg + Lexing/Output + RemoveScriptContents + Deprecated + +== HTML.* == + + AllowedAttributes + AllowedElements + AllowedModules + Allowed + ForbiddenAttributes + ForbiddenElements + Element set tuning + BlockWrapper + Child def advanced twiddle + CoreModules + CustomDoctype + Advanced HTMLModuleManager twiddles + DefinitionID + DefinitionRev + Caching + Doctype + Parent + Strict + XHTML + Global environment + MaxImgLength + Attribute twiddle? 
(applies to two attributes) + Proprietary + SafeEmbed + SafeObject + Trusted + Extra functionality/tagsets + TidyAdd + TidyLevel + TidyRemove + Tidy + +== Output.* == + +These directly affect the output of Generator. These are all advanced +twiddles. + +== URI.* == + + AllowedSchemes + OverrideAllowedSchemes + Scheme tuning + Base + DefaultScheme + Host + Global environment + DefinitionID + DefinitionRev + Caching + DisableExternalResources + DisableExternal + DisableResources + Disable + Contextual/authority tuning + HostBlacklist + Authority tuning + MakeAbsolute + MungeResources + MungeSecretKey + Munge + Transformation behavior (munge can be grouped) + + diff --git a/3rdparty/htmlpurifier/docs/dev-config-schema.html b/3rdparty/htmlpurifier/docs/dev-config-schema.html new file mode 100644 index 000000000..39d866d4a --- /dev/null +++ b/3rdparty/htmlpurifier/docs/dev-config-schema.html @@ -0,0 +1,412 @@ + + + + + + + + Config Schema - HTML Purifier + + + +

Config Schema

+ +
Filed under Development
+
+
HTML Purifier End-User Documentation
+ +

+ HTML Purifier has a fairly complex system for configuration. Users + interact with a HTMLPurifier_Config object to + set configuration directives. The values they set are validated according + to a configuration schema, HTMLPurifier_ConfigSchema. +

+ +

+ The schema is mostly transparent to end-users, but if you're doing development + work for HTML Purifier and need to define a new configuration directive, + you'll need to interact with it. We'll also talk about how to define + userspace configuration directives at the very end. +

+ +

Write a directive file

+ +

+ Directive files define configuration directives to be used by + HTML Purifier. They are placed in library/HTMLPurifier/ConfigSchema/schema/ + in the form Namespace.Directive.txt (I + couldn't think of a more descriptive file extension.) + Directive files are actually what we call StringHashes, + i.e. associative arrays represented in a string form reminiscent of + PHPT tests. Here's a + sample directive file, Test.Sample.txt: +

+ +
Test.Sample
+TYPE: string/null
+DEFAULT: NULL
+ALLOWED: 'foo', 'bar'
+VALUE-ALIASES: 'baz' => 'bar'
+VERSION: 3.1.0
+--DESCRIPTION--
+This is a sample configuration directive for the purposes of the
+<code>dev-config-schema.html</code> documentation.
+--ALIASES--
+Test.Example
+ +

+ Each of these segments has a specific meaning: +

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Key | Example | Description
ID | Test.Sample | The name of the directive, in the form Namespace.Directive (implicitly the first line)
TYPE | string/null | The type of variable this directive accepts. See below for details. You can also add /null to the end of any basic type to allow null values too.
DEFAULT | NULL | A parseable PHP expression of the default value.
DESCRIPTION | This is a... | An HTML description of what this directive does.
VERSION | 3.1.0 | Recommended. The version of HTML Purifier in which this directive was added. Directives that have been around since 1.0.0 don't have this, but any new ones should.
ALIASES | Test.Example | Optional. A comma separated list of aliases for this directive. This is most useful for backwards compatibility and should not be used otherwise.
ALLOWED | 'foo', 'bar' | Optional. Set of allowed values for a directive, given as a comma separated list of parseable PHP expressions. This is only allowed for string, istring, text and itext TYPEs.
VALUE-ALIASES | 'baz' => 'bar' | Optional. Mapping of one value to another, given as a comma separated list of key-value pairs. This is only allowed for string, istring, text and itext TYPEs.
DEPRECATED-VERSION | 3.1.0 | Not shown. Indicates that the directive was deprecated in this version.
DEPRECATED-USE | Test.NewDirective | Not shown. Indicates what new directive should be used instead. Note that the directives will be functionally different, although they should offer the same functionality. If they are identical, use an alias instead.
EXTERNAL | CSSTidy | Not shown. Indicates if there is an external library the user will need to download and install to use this configuration directive. As of right now, this is merely a Google-able name; future versions may also provide links and instructions.
+ +
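To make the directive file layout concrete, here is a minimal sketch of how such a file could be read into an associative array: the first line becomes the ID, KEY: value lines become single-line entries, and each --KEY-- marker opens a multiline entry. This is an illustration written for this document under those assumptions; the function name parse_directive_text is hypothetical, and the library's own HTMLPurifier_StringHashParser may differ in its details.

```php
<?php
// Illustrative sketch only, not HTML Purifier's actual parser.
function parse_directive_text(string $text): array
{
    $hash  = [];
    $block = null;  // name of the multiline block being read, if any
    $first = true;
    foreach (preg_split('/\R/', trim($text)) as $line) {
        if ($first) {
            // The first line is implicitly the directive ID.
            $hash['ID'] = trim($line);
            $first = false;
        } elseif (preg_match('/^--([A-Z-]+)--$/', $line, $m)) {
            // A --KEY-- marker opens a new multiline value.
            $block = $m[1];
            $hash[$block] = '';
        } elseif ($block !== null) {
            // Inside a block, every line belongs to that block.
            $hash[$block] .= ($hash[$block] === '' ? '' : "\n") . $line;
        } elseif (preg_match('/^([A-Z-]+):\s*(.*)$/', $line, $m)) {
            // Before the first block, lines are single-line KEY: value pairs.
            $hash[$m[1]] = $m[2];
        }
    }
    return $hash;
}

$sample = <<<'TXT'
Test.Sample
TYPE: string/null
DEFAULT: NULL
VALUE-ALIASES: 'baz' => 'bar'
--DESCRIPTION--
This is a sample configuration directive.
--ALIASES--
Test.Example
TXT;

$parsed = parse_directive_text($sample);
echo $parsed['ID'], "\n";      // prints "Test.Sample"
echo $parsed['TYPE'], "\n";    // prints "string/null"
echo $parsed['ALIASES'], "\n"; // prints "Test.Example"
```

Note that the values stay as raw strings here; turning DEFAULT or ALLOWED into PHP values is a separate step that this sketch does not attempt.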

+ Some notes on format and style: +

+ + + +

+ Also, as promised, here is the set of possible types: +

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Type | Example | Description
string | 'Foo' | String without newlines
istring | 'foo' | Case insensitive ASCII string without newlines
text | "A\nb" | String with newlines
itext | "a\nb" | Case insensitive ASCII string with newlines
int | 23 | Integer
float | 3.0 | Floating point number
bool | true | Boolean
lookup | array('key' => true) | Lookup array, used with isset($var[$key])
list | array('f', 'b') | List array, with ordered numerical indexes
hash | array('key' => 'val') | Associative array of keys to values
mixed | new stdClass | Any PHP variable is fine
+ +

+ The examples represent what will be returned out of the configuration + object; users have a little bit of leeway when setting configuration + values (for example, a lookup value can be specified as a list; + HTML Purifier will flip it as necessary.) These types are defined + in + library/HTMLPurifier/VarParser.php. +
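The "flip" mentioned above can be pictured with plain PHP arrays. The sketch below is only an illustration of the idea and is not the library's code; the function name list_to_lookup is hypothetical.

```php
<?php
// Illustrative sketch: turn a list of names into a lookup array
// whose keys can be tested with isset().
function list_to_lookup(array $list): array
{
    return array_fill_keys($list, true);
}

$lookup = list_to_lookup(['script', 'style']);
var_dump(isset($lookup['script'])); // bool(true)
var_dump(isset($lookup['div']));    // bool(false)
```

A lookup array makes membership tests a single isset() call, which is why a directive may be stored that way even when the user supplied a list.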

+ +

+ For more information on what values are allowed, and how they are parsed, + consult + library/HTMLPurifier/ConfigSchema/InterchangeBuilder.php, as well + as + library/HTMLPurifier/ConfigSchema/Interchange/Directive.php for + the semantics of the parsed values. +

+ +

Refreshing the cache

+ +

+ You may have noticed that your directive file isn't doing anything + yet. That's because it hasn't been added to the runtime + HTMLPurifier_ConfigSchema instance. Run + maintenance/generate-schema-cache.php to fix this. + If there were no errors, you're good to go! Don't forget to add + some unit tests for your functionality! +

+ +

+ If you ever make changes to your configuration directives, you + will need to run this script again. +

+

Adding in-house schema definitions

+ +

+ Placing stuff directly in HTML Purifier's source tree is generally not a + good idea, so HTML Purifier 4.0.0+ has some facilities in place to make your + life easier. +

+ +

+ The first is to pass an extra parameter to maintenance/generate-schema-cache.php + with the location of your directory (relative or absolute path will do). For example, + if I'm storing my custom definitions in /var/htmlpurifier/myschema, run: + php maintenance/generate-schema-cache.php /var/htmlpurifier/myschema. +

+ +

+ Alternatively, you can create a small loader PHP file in the HTML Purifier base + directory named config-schema.php (this is the same directory + you would place a test-settings.php file). In this file, add + the following line for each directory you want to load: +

+ +
$builder->buildDir($interchange, '/var/htmlpurifier/myschema');
+ +

You can even load a single file using:

+ +
$builder->buildFile($interchange, '/var/htmlpurifier/myschema/MyApp.Directive.txt');
+ +

Storing custom definitions that you don't plan on sending back upstream in + a separate directory is definitely a good idea! Additionally, picking + a good namespace can go a long way to saving you grief if you want to use + someone else's change, but they picked the same name, or if HTML Purifier + decides to add support for a configuration directive that has the same name.

+ + + +

Errors

+ +

+ All directive files go through a rigorous validation process + through + library/HTMLPurifier/ConfigSchema/Validator.php, as well + as some basic checks during building. While + listing every error out here is out-of-scope for this document, we + can give some general tips for interpreting error messages. + There are two types of errors: builder errors and validation errors. +

+ +

Builder errors

+ +
+

+ Exception: Expected type string, got + integer in DEFAULT in directive hash 'Ns.Dir' +

+
+ +

+ You can identify a builder error by the keyword "directive hash." + These are the easiest to deal with, because they directly correspond + with your directive file. Find the offending directive file (which + is the directive hash plus the .txt extension), find the + offending index ("in DEFAULT" means the DEFAULT key) and fix the error. + This particular error would occur if your default value is not the same + type as TYPE. +

+ +

Validation errors

+ +
+

+ Exception: Alias 3 in valueAliases in directive + 'Ns.Dir' must be a string +

+
+ +

+ These are a little trickier, because we're not actually validating + your directive file, or even the direct string hash representation. + We're validating an Interchange object, and the error messages do + not mention any string hash keys. +

+ +

+ Nevertheless, it's not difficult to figure out what went wrong. + Read the "context" statements in reverse: +

+ +
+
in directive 'Ns.Dir'
+
This means we need to look at the directive file Ns.Dir.txt
+
in valueAliases
+
There's no key actually called this, but there's one that's close: + VALUE-ALIASES. Indeed, that's where to look.
+
Alias 3
+
The value alias that is equal to 3 is the culprit.
+
+ +

+ In this particular case, you're not allowed to alias integer values to + string values. +

+ +

+ The most difficult part is translating the Interchange member variable (valueAliases) + into a directive file key (VALUE-ALIASES), but there's a one-to-one + correspondence currently. If the two formats diverge, any discrepancies + will be described in + library/HTMLPurifier/ConfigSchema/InterchangeBuilder.php. +

+ +

Internals

+ +

+ Much of the configuration schema framework's codebase deals with + shuffling data from one format to another, and doing validation on this + data. + The keystone of all of this is the HTMLPurifier_ConfigSchema_Interchange + class, which represents the purest, parsed representation of the schema. +

+ +

+ Hand-writing this data is unwieldy, however, so we write directive files. + These directive files are parsed by HTMLPurifier_StringHashParser + into HTMLPurifier_StringHashes, which then + are run through HTMLPurifier_ConfigSchema_InterchangeBuilder + to construct the interchange object. +

+ +

+ From the interchange object, the data can be siphoned into other forms + using HTMLPurifier_ConfigSchema_Builder subclasses. + For example, HTMLPurifier_ConfigSchema_Builder_ConfigSchema + generates a runtime HTMLPurifier_ConfigSchema object, + which HTMLPurifier_Config uses to validate its incoming + data. There is also an XML serializer, which is used to build documentation. +

+ + + + + diff --git a/3rdparty/htmlpurifier/docs/dev-flush.html b/3rdparty/htmlpurifier/docs/dev-flush.html new file mode 100644 index 000000000..0fddafcd2 --- /dev/null +++ b/3rdparty/htmlpurifier/docs/dev-flush.html @@ -0,0 +1,68 @@ + + + + + + + + Flushing the Purifier - HTML Purifier + + + +

Flushing the Purifier

+ +
Filed under Development
+
Return to the index.
+
HTML Purifier End-User Documentation
+ +

+ If you've been poking around the various folders in HTML Purifier, + you may have noticed the maintenance directory. Almost + all of these scripts are devoted to flushing out the various caches + HTML Purifier uses. Normal users don't have to worry about this: + regular library usage is transparent. However, when doing development + work on HTML Purifier, you may find you have to flush one of the + caches. +

+ +

+ As a general rule of thumb, run flush.php whenever you make + any major changes, or when tests start mysteriously failing. + In more detail, run this script if: +

+ + + +

+ You can check out the corresponding scripts for more information on what they + do. +

+ + + + diff --git a/3rdparty/htmlpurifier/docs/dev-includes.txt b/3rdparty/htmlpurifier/docs/dev-includes.txt new file mode 100644 index 000000000..e128a812a --- /dev/null +++ b/3rdparty/htmlpurifier/docs/dev-includes.txt @@ -0,0 +1,281 @@ + +INCLUDES, AUTOLOAD, BYTECODE CACHES and OPTIMIZATION + +The Problem +----------- + +HTML Purifier contains a number of extra components that are not used all +of the time, only if the user explicitly specifies that we should use +them. + +Some of these optional components are optionally included (Filter, +Language, Lexer, Printer), while others are included all the time +(Injector, URIFilter, HTMLModule, URIScheme). We will stipulate that these +are all developer specified: it is conceivable that certain Tokens are not +used, but this is user-dependent and should not be trusted. + +We should come up with a consistent way to handle these things and ensure +that we get the maximum performance when there are bytecode caches and +when there are not. Unfortunately, these two goals seem contrary to each +other. + +A peripheral issue is the performance of ConfigSchema, which has been +shown to take a large, constant amount of initialization time, and is +intricately linked to the issue of includes due to its pervasive use +in our plugin architecture. + +Pros and Cons +------------- + +We will assume that user-based extensions will be included by them. + +Conditional includes: + Pros: + - User management is simplified; only a single directive needs to be set + - Only necessary code is included + Cons: + - Doesn't play nicely with opcode caches + - Adds complexity to standalone version + - Optional configuration directives are not exposed without a little + extra coaxing (not implemented yet) + +Include it all: + Pros: + - User management is still simple + - Plays nicely with opcode caches and standalone version + - All configuration directives are present + Cons: + - Lots of (how much?) 
extra code is included + - Classes that inherit from external libraries will cause compile + errors + +Build an include stub (Let's do this!): + Pros: + - Only necessary code is included + - Plays nicely with opcode caches and standalone version + - require (without once) can be used, see above + - Could further extend as a compilation to one file + Cons: + - Not implemented yet + - Requires user intervention and use of a command line script + - Standalone script must be chained to this + - More complex and compiled-language-like + - Requires a whole new class of system-wide configuration directives, + as configuration objects can be reused + - Determining what needs to be included can be complex (see above) + - No way of autodetecting dynamically instantiated classes + - Might be slow + +Include stubs +------------- + +This solution may be "just right" for users who are heavily oriented +towards performance. However, there are a number of picky implementation +details to work out beforehand. + +The number one concern is how to make the HTML Purifier files "work +out of the box", while still being able to easily get them into a form +that works with this setup. As the codebase stands right now, it would +be necessary to strip out all of the require_once calls. The only way +we could get rid of the require_once calls is to use __autoload or +use the stub for all cases (which might not be a bad idea). + + Aside + ----- + An important thing to remember, however, is that these require_once's + are valuable data about what classes a file needs. Unfortunately, there's + no distinction between whether or not the file is needed all the time, + or whether or not it is one of our "optional" files. Thus, it is + effectively useless. + + Deprecated + ---------- + One of the things I'd like to do is have the code search for any classes + that are explicitly mentioned in the code. If a class isn't mentioned, I + get to assume that it is "optional," i.e. 
included via introspection. + The choice is either to use PHP's tokenizer or use regexps; regexps would + be faster but a tokenizer would be more correct. If this ends up being + unfeasible, adding dependency comments isn't a bad idea. (This could + even be done automatically by search/replacing require_once, although + we'd have to manually inspect the results for the optional requires.) + + NOTE: This ends up not being necessary, as we're going to make the user + figure out all the extra classes they need, and only include the core + which is predetermined. + +Using the autoload framework with include stubs works nicely with +introspective classes: instead of having to have require_once inside +the function, we can let autoload do the work; we simply need to +new $class or accept the object straight from the caller. Handling filters +becomes a simple matter of ticking off configuration directives, and +if ConfigSchema spits out errors, adding the necessary includes. We could +also use the autoload framework as a fallback, in case the user forgets +to make the include, but doesn't really care about performance. + + Insight + ------- + All of this talk is merely a natural extension of what our current + standalone functionality does. However, instead of having our code + perform the includes, or attempting to inline everything that possibly + could be used, we boot the issue to the user, making them include + everything or set up the fallback autoload handler. + +Configuration Schema +-------------------- + +A common deficiency for all of the conditional include setups (including +the dynamically built include PHP stub) is that if one of these +conditionally included files includes a configuration directive, it +is not accessible to configdoc. 
A stopgap solution for this problem is +to have it piggy-back off of the data in the merge-library.php script +to figure out what extra files it needs to include, but if the file also +inherits classes that don't exist, we're in big trouble. + +I think it's high time we centralized the configuration documentation. +However, the type checking has been a great boon for the library, and +I'd like to keep that. The compromise is to use some other source, and +then parse it into the ConfigSchema internal format (sans all of those +nasty documentation strings which we really don't need at runtime) and +serialize that for future use. + +The next question is that of format. XML is very verbose, and the prospect +of setting defaults in it gives me willies. However, this may be necessary. +Splitting up the file into manageable chunks may alleviate this trouble, +and we may be even want to create our own format optimized for specifying +configuration. It might look like (based off the PHPT format, which is +nicely compact yet unambiguous and human-readable): + +Core.HiddenElements +TYPE: lookup +DEFAULT: array('script', 'style') // auto-converted during processing +--ALIASES-- +Core.InvisibleElements, Core.StupidElements +--DESCRIPTION-- +

+ Blah blah +

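A file in the shape shown above could be read with a very small parser. The following is only a sketch under stated assumptions: the first line is the directive name, `KEY: value` lines follow until the first `--HEADER--` marker, and everything after a marker belongs to that multiline block. The function name `parse_directive_file()` and the `ID` key are hypothetical and not part of HTML Purifier.

```php
<?php
// Hypothetical parser for the PHPT-like directive format; the function name
// and the returned array layout are assumptions made for illustration.
function parse_directive_file($text)
{
    $lines = preg_split('/\r\n|\r|\n/', trim($text));
    // The first line is the directive name.
    $result = array('ID' => trim(array_shift($lines)));
    $block = null; // name of the multiline block currently being read
    foreach ($lines as $line) {
        if (preg_match('/^--([A-Z-]+)--$/', trim($line), $m)) {
            // A --HEADER-- line starts a new multiline value.
            $block = $m[1];
            $result[$block] = '';
        } elseif ($block !== null) {
            // Inside a multiline block: keep the line verbatim.
            $result[$block] .= $line . "\n";
        } elseif (strpos($line, ':') !== false) {
            // Before the first --HEADER--, each line is "KEY: value".
            // Trailing "// ..." comments are kept as part of the value.
            list($key, $value) = explode(':', $line, 2);
            $result[trim($key)] = trim($value);
        }
    }
    // Drop the trailing newline that multiline blocks accumulate.
    foreach ($result as $key => $value) {
        $result[$key] = rtrim($value, "\n");
    }
    return $result;
}
```

Type conversion (for example turning the DEFAULT string into an actual array) is deliberately left out; the sketch only splits the file into named string values.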
+
+The first line is the directive name, the lines after that prior to the
+first --HEADER-- block are single-line values, and then after that
+the multiline values are there. No value is restricted to a particular
+format: DEFAULT could very well be multiline if that would be easier.
+This would make it insanely easy, also, to add arbitrary extra parameters,
+like:
+
+VERSION: 3.0.0
+ALLOWED: 'none', 'light', 'medium', 'heavy' // this is wrapped in array()
+EXTERNAL: CSSTidy // this would be documented somewhere else with a URL
+
+The final loss would be that you wouldn't know what file the directive
+was used in; with some clever regexps it should be possible to
+figure out where $config->get($ns, $d); occurs. Reflective calls to
+the configuration object are mitigated by the fact that getBatch is
+used, so we can simply talk about that in the namespace definition page.
+This might be slow, but it would only happen when we are creating
+the documentation for consumption, and is sugar.
+
+We can put this in a schema/ directory, outside of HTML Purifier. The serialized
+data gets treated like entities.ser.
+
+The final thing that needs to be handled is user defined configurations.
+They can be added at runtime using ConfigSchema::registerDirectory()
+which globs the directory and grabs all of the directives to be incorporated
+in. Then, the result is saved. We may want to take advantage of the
+DefinitionCache framework, although it is not altogether certain what
+configuration directives would be used to generate our key (meta-directives!)
+
+    Further thoughts
+    ----------------
+    Our master configuration schema will only need to be updated once
+    every new version, so it's easily versionable. User specified
+    schema files are far more volatile, but it's far too expensive
+    to check the filemtimes of all the files, so a DefinitionRev style
+    mechanism works better. However, we can uniquely identify the
+    schema based on the directories they loaded, so there's no need
+    for a DefinitionId until we give them full programmatic control.
+
+    These variables should be directly incorporated into ConfigSchema,
+    and ConfigSchema should handle serialization. Some refactoring will be
+    necessary for the DefinitionCache classes, as they are built with
+    Config in mind. If the user changes something, the cache file gets
+    rebuilt. If the version changes, the cache file gets rebuilt. Since
+    our unit tests flush the caches before we start, and the operation is
+    pretty fast, this will not negatively impact unit testing.
+
+One last thing: certain configuration directives require that files
+get added. They may even be specified dynamically. It is not a good idea
+for the HTMLPurifier_Config object to be used directly for such matters.
+Instead, the userland code should explicitly perform the includes. We may
+put in something like:
+
+REQUIRES: HTMLPurifier_Filter_ExtractStyleBlocks
+
+This indicates that if that class doesn't exist, and the user is attempting
+to use the directive, we should fatally error out. The stub includes the core files,
+and the user includes everything else. Any reflective things like new
+$class would be required to tie in with the configuration.
+
+It would work very well with rarely used configuration options, but it
+wouldn't be so good for "core" parts that can be disabled. In such cases
+the core include file would need to be modified, and the only way
+to properly do this is to use the configuration object. Once again, our
+ability to create cache keys saves the day: we can create arbitrary
+stub files for arbitrary configurations and include those. They could
+even be single-file affairs. The only thing we'd need to include,
+then, would be HTMLPurifier_Config! Then, the configuration object would
+load the library.
+
+    An aside...
+    -----------
+    One questions, however, the wisdom of letting PHP files write other PHP
+    files. It seems like a recipe for disaster, or at least lots of headaches
+    in highly secured setups, where PHP does not have the ability to write
+    to its root. In such cases, we could use sticky bits or tell the user
+    to manually generate the file.
+
+    The other troublesome bit is actually doing the calculations necessary.
+    For certain cases, it's simple (such as URIScheme), but for AttrDef
+    and HTMLModule the dependency trees are very complex in relation to
+    %HTML.Allowed and friends. I think that this idea should be shelved
+    and revisited at a later, less insane date.
+
+An interesting dilemma presents itself when a configuration form is offered
+to the user. Normally, the configuration object is not accessible without
+editing PHP code; this facility changes things. The sensible thing to do
+is stipulate that all classes required by the directives you allow must
+be included.
+
+Unit testing
+------------
+
+Setting up the parsing and translation into our existing format would not
+be difficult to do. It might represent a good time for us to rethink our
+tests for these facilities; as creative as they are, they are often hacky
+and require public visibility for things that ought to be protected.
+This is especially applicable for our DefinitionCache tests.
+
+Migration
+---------
+
+Because we are not *adding* anything essentially new, it should be trivial
+to write a script to take our existing data and dump it into the new format.
+Well, not trivial, but fairly easy to accomplish. Primary implementation
+difficulties would probably involve formatting the file nicely.
+
+Backwards-compatibility
+-----------------------
+
+I expect that the ConfigSchema methods should stick around for a little bit,
+but display E_USER_NOTICE warnings that they are deprecated. This will
+require documentation!
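The deprecation approach described above might look roughly like the following. This is a minimal sketch, not the library's code: `Legacy_ConfigSchema` and its `define()` method are invented stand-ins for whichever ConfigSchema methods end up deprecated. The only real API used is PHP's `trigger_error()` with `E_USER_NOTICE`.

```php
<?php
// Hypothetical stand-in for a legacy ConfigSchema entry point; the class
// and method names are assumptions made for illustration only.
class Legacy_ConfigSchema
{
    public static $directives = array();

    public static function define($name, $default)
    {
        // Emit a notice, but keep working so existing callers do not break.
        trigger_error(
            __METHOD__ . ' is deprecated; use a schema file instead',
            E_USER_NOTICE
        );
        self::$directives[$name] = $default;
    }
}
```

Because the notice is non-fatal, old userland code keeps running while the message points authors toward the new schema files.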
+
+New stuff
+---------
+
+VERSION: Version number in which the directive was introduced
+DEPRECATED-VERSION: If the directive was deprecated, when was it deprecated?
+DEPRECATED-USE: If the directive was deprecated, what should the user use now?
+REQUIRES: What classes does this configuration directive require, but are
+    not part of the HTML Purifier core?
+
+    vim: et sw=4 sts=4
diff --git a/3rdparty/htmlpurifier/docs/dev-naming.html b/3rdparty/htmlpurifier/docs/dev-naming.html
new file mode 100644
index 000000000..4060005bf
--- /dev/null
+++ b/3rdparty/htmlpurifier/docs/dev-naming.html
@@ -0,0 +1,83 @@
+Naming Conventions - HTML Purifier

Naming Conventions

+ +
Filed under Development
+
Return to the index.
+
HTML Purifier End-User Documentation
+ +

The classes in this library follow a few naming conventions, which may help you find the correct functionality more quickly. Here they are:

+ +
+ +
All classes occupy the HTMLPurifier pseudo-namespace.
+
This means that all classes are prefixed with HTMLPurifier_. As such, all names under HTMLPurifier_ are reserved. I recommend that you use the name HTMLPurifierX_YourName_ClassName, especially if you want to take advantage of HTMLPurifier_ConfigDef.
+ +
All classes correspond to their path if library/ was in the include path
+
HTMLPurifier_AttrDef is located at HTMLPurifier/AttrDef.php; replace underscores with slashes and append .php and you'll have the location of the class.
+ +
Harness and Test are reserved class names for unit tests
+
The suffix Test indicates that the class is a subclass of UnitTestCase (of the Simpletest library) and is testable. "Harness" indicates a subclass of UnitTestCase that is not meant to be run but to be extended into concrete test cases and contains custom test methods (i.e. assert*()).
+ +
Class names do not necessarily represent inheritance hierarchies
+
While we try to reflect inheritance in naming to some extent, it is not guaranteed (for instance, none of the classes inherit from HTMLPurifier, the base class). However, all class files have require_once declarations for the classes they are tightly coupled to.
+ +
Strategy has a meaning different from the Gang of Four pattern
+
In Design Patterns, the Gang of Four describes a Strategy object as encapsulating an algorithm so that algorithms can be switched at run-time. While our strategies are indeed algorithms, they are not meant to be substituted: all must be present in order for proper functioning.
+ +
Abbreviations are avoided
+
We try to avoid abbreviations as much as possible, but in some cases, an abbreviated version is more readable than the full version. Here, we list common abbreviations:
  • Attr to Attributes (note that it is plural, i.e. $attr = array())
  • Def to Definition
  • $ret is the value to be returned in a function
+ +
Ambiguity concerning the definition of Def/Definition
+
While a definition normally defines the structure/acceptable values of an entity, most of the definitions in this application also attempt to validate and fix the value. I am unsure of a better name, as "Validator" would exclude fixing the value, "Fixer" doesn't invoke the proper image of "fixing" something, and "ValidatorFixer" is too long! Some other suggestions were "Handler", "Reference", "Check", "Fix", "Repair" and "Heal".
+ +
Transform not Transformer
+
Transform is both a noun and a verb, and thus we define a "Transform" as something that "transforms," leaving "Transformer" (which sounds like an electrical device/robot toy) unused.
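The class-to-path convention described in the list above (replace underscores with slashes and append .php) can be sketched as a small helper. The function name `class_to_path()` and the `library/` location relative to the calling script are assumptions for illustration, not part of HTML Purifier; `spl_autoload_register()` is standard PHP.

```php
<?php
// Sketch of the naming rule: underscores become slashes, then ".php".
// The function name is hypothetical, not an HTML Purifier API.
function class_to_path($class)
{
    return str_replace('_', '/', $class) . '.php';
}

// Optional autoloader built on the same rule. It assumes the library/
// directory sits next to this script, which may not match your layout.
spl_autoload_register(function ($class) {
    if (strpos($class, 'HTMLPurifier') !== 0) {
        return; // not one of ours, let other autoloaders handle it
    }
    $file = __DIR__ . '/library/' . class_to_path($class);
    if (is_file($file)) {
        require $file;
    }
});
```

For the example given in the text, `class_to_path('HTMLPurifier_AttrDef')` yields `HTMLPurifier/AttrDef.php`.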
+ +
diff --git a/3rdparty/htmlpurifier/docs/dev-optimization.html b/3rdparty/htmlpurifier/docs/dev-optimization.html
new file mode 100644
index 000000000..681e034f5
--- /dev/null
+++ b/3rdparty/htmlpurifier/docs/dev-optimization.html
@@ -0,0 +1,33 @@
+Optimization - HTML Purifier

Optimization

+ +
Filed under Development
+
Return to the index.
+
HTML Purifier End-User Documentation
+ +

Here are some possible optimization techniques we can apply to code sections if they turn out to be slow. Be sure not to prematurely optimize: if you get that itch, put it here!

diff --git a/3rdparty/htmlpurifier/docs/dev-progress.html b/3rdparty/htmlpurifier/docs/dev-progress.html
new file mode 100644
index 000000000..2243b8202
--- /dev/null
+++ b/3rdparty/htmlpurifier/docs/dev-progress.html
@@ -0,0 +1,309 @@
+Implementation Progress - HTML Purifier

Implementation Progress

+ +
Filed under Development
+
Return to the index.
+
HTML Purifier End-User Documentation
+ +

Warning: This table is kept for historical purposes and is not being actively updated.

+ +

Key

+ + + + + + + + +
Implemented
Partially implemented
Not priority to implement
Dangerous attribute/property
Present in CSS1
Feature, requires extra work
+ +

CSS

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +</
Name | Notes

Standard
  background-color | COMPOSITE(<color>, transparent)
  background | SHORTHAND, currently alias for background-color
  border | SHORTHAND, MULTIPLE
  border-color | MULTIPLE
  border-style | MULTIPLE
  border-width | MULTIPLE
  border-* | SHORTHAND
  border-*-color | COMPOSITE(<color>, transparent)
  border-*-style | ENUM(none, hidden, dotted, dashed, solid, double, groove, ridge, inset, outset)
  border-*-width | COMPOSITE(<length>, thin, medium, thick)
  clear | ENUM(none, left, right, both)
  color | <color>
  float | ENUM(left, right, none), May require layout precautions with clear
  font | SHORTHAND
  font-family | CSS validator may complain if fallback font family not specified
  font-size | COMPOSITE(<absolute-size>, <relative-size>, <length>, <percentage>)
  font-style | ENUM(normal, italic, oblique)
  font-variant | ENUM(normal, small-caps)
  font-weight | ENUM(normal, bold, bolder, lighter, 100, 200, 300, 400, 500, 600, 700, 800, 900), maybe special code for in-between integers
  letter-spacing | COMPOSITE(<length>, normal)
  line-height | COMPOSITE(<number>, <length>, <percentage>, normal)
  list-style-position | ENUM(inside, outside), Strange behavior in browsers
  list-style-type | ENUM(...), Well-supported values are: disc, circle, square, decimal, lower-roman, upper-roman, lower-alpha and upper-alpha. See also CSS 3. Mostly IE lack of support.
  list-style | SHORTHAND
  margin | MULTIPLE
  margin-* | COMPOSITE(<length>, <percentage>, auto)
  padding | MULTIPLE
  padding-* | COMPOSITE(<length>(positive), <percentage>(positive))
  text-align | ENUM(left, right, center, justify)
  text-decoration | No blink (argh my eyes), not enum, can be combined (composite sorta): underline, overline, line-through
  text-indent | COMPOSITE(<length>, <percentage>)
  text-transform | ENUM(capitalize, uppercase, lowercase, none)
  width | COMPOSITE(<length>, <percentage>, auto), Interesting
  word-spacing | COMPOSITE(<length>, normal), IE 5 no support

Table
  border-collapse | ENUM(collapse, separate)
  border-spacing | MULTIPLE
  caption-side | ENUM(top, bottom)
  empty-cells | ENUM(show, hide), No IE support makes this useless, possible fix with &nbsp;? Unknown release milestone.
  table-layout | ENUM(auto, fixed)
  vertical-align | COMPOSITE(ENUM(baseline, sub, super, top, text-top, middle, bottom, text-bottom), <percentage>, <length>) Also applies to others with explicit height

Absolute positioning, unknown release milestone
  bottom | Dangerous, must be non-negative to even be considered, but it's still possible to arbitrarily position by running over.
  left |
  right |
  top |
  clip | -
  position | ENUM(static, relative, absolute, fixed) relative not absolute?
  z-index | Dangerous

Unknown
  background-image | Dangerous
  background-attachment | ENUM(scroll, fixed), Depends on background-image
  background-position | Depends on background-image
  cursor | Dangerous but fluffy
  display | ENUM(...), Dangerous but interesting; will not implement list-item, run-in (Opera only) or table (no IE); inline-block has incomplete IE6 support and requires -moz-inline-box for Mozilla. Unknown target milestone.