| 123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341 | 
							
- Install
 
-     How to install HTML Purifier
 
- HTML Purifier is designed to run out of the box, so actually using the
 
- library is extremely easy.  (Although... if you were looking for a
 
- step-by-step installation GUI, you've downloaded the wrong software!)
 
- While the impatient can get going immediately with some of the sample
 
- code at the bottom of this library, it's well worth reading this entire
 
- document--most of the other documentation assumes that you are familiar
 
- with these contents.
 
- ---------------------------------------------------------------------------
 
- 1.  Compatibility
 
- HTML Purifier is PHP 5 and PHP 7, and is actively tested from PHP 5.3
 
- and up. It has no core dependencies with other libraries.
 
- These optional extensions can enhance the capabilities of HTML Purifier:
 
-     * iconv  : Converts text to and from non-UTF-8 encodings
 
-     * bcmath : Used for unit conversion and imagecrash protection
 
-     * tidy   : Used for pretty-printing HTML
 
- These optional libraries can enhance the capabilities of HTML Purifier:
 
-     * CSSTidy : Clean CSS stylesheets using %Core.ExtractStyleBlocks
 
-         Note: You should use the modernized fork of CSSTidy available
 
-         at https://github.com/Cerdic/CSSTidy
 
-     * Net_IDNA2 (PEAR) : IRI support using %Core.EnableIDNA
 
-         Note: This is not necessary for PHP 5.3 or later
 
- ---------------------------------------------------------------------------
 
- 2.  Reconnaissance
 
- A big plus of HTML Purifier is its inerrant support of standards, so
 
- your web-pages should be standards-compliant.  (They should also use
 
- semantic markup, but that's another issue altogether, one HTML Purifier
 
- cannot fix without reading your mind.)
 
- HTML Purifier can process these doctypes:
 
- * XHTML 1.0 Transitional (default)
 
- * XHTML 1.0 Strict
 
- * HTML 4.01 Transitional
 
- * HTML 4.01 Strict
 
- * XHTML 1.1
 
- ...and these character encodings:
 
- * UTF-8 (default)
 
- * Any encoding iconv supports (with crippled internationalization support)
 
- These defaults reflect what my choices would be if I were authoring an
 
- HTML document, however, what you choose depends on the nature of your
 
- codebase.  If you don't know what doctype you are using, you can determine
 
- the doctype from this identifier at the top of your source code:
 
-     <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
 
-         "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
 
- ...and the character encoding from this code:
 
-     <meta http-equiv="Content-type" content="text/html;charset=ENCODING">
 
- If the character encoding declaration is missing, STOP NOW, and
 
- read 'docs/enduser-utf8.html' (web accessible at
 
- http://htmlpurifier.org/docs/enduser-utf8.html).  In fact, even if it is
 
- present, read this document anyway, as many websites specify their
 
- document's character encoding incorrectly.
 
- ---------------------------------------------------------------------------
 
- 3.  Including the library
 
- The procedure is quite simple:
 
-     require_once '/path/to/library/HTMLPurifier.auto.php';
 
- This will setup an autoloader, so the library's files are only included
 
- when you use them.
 
- Only the contents in the library/ folder are necessary, so you can remove
 
- everything else when using HTML Purifier in a production environment.
 
- If you installed HTML Purifier via PEAR, all you need to do is:
 
-     require_once 'HTMLPurifier.auto.php';
 
- Please note that the usual PEAR practice of including just the classes you
 
- want will not work with HTML Purifier's autoloading scheme.
 
- Advanced users, read on; other users can skip to section 4.
 
- Autoload compatibility
 
- ----------------------
 
-     HTML Purifier attempts to be as smart as possible when registering an
 
-     autoloader, but there are some cases where you will need to change
 
-     your own code to accomodate HTML Purifier. These are those cases:
 
-     AN __autoload FUNCTION IS DECLARED AFTER OUR AUTOLOADER IS REGISTERED
 
-         spl_autoload_register() has the curious behavior of disabling
 
-         the existing __autoload() handler. Users need to explicitly
 
-         spl_autoload_register('__autoload'). Because we use SPL when it
 
-         is available, __autoload() will ALWAYS be disabled. If __autoload()
 
-         is declared before HTML Purifier is loaded, this is not a problem:
 
-         HTML Purifier will register the function for you. But if it is
 
-         declared afterwards, it will mysteriously not work. This
 
-         snippet of code (after your autoloader is defined) will fix it:
 
-             spl_autoload_register('__autoload')
 
- For better performance
 
- ----------------------
 
-     Opcode caches, which greatly speed up PHP initialization for scripts
 
-     with large amounts of code (HTML Purifier included), don't like
 
-     autoloaders. We offer an include file that includes all of HTML Purifier's
 
-     files in one go in an opcode cache friendly manner:
 
-         // If /path/to/library isn't already in your include path, uncomment
 
-         // the below line:
 
-         // require '/path/to/library/HTMLPurifier.path.php';
 
-         require 'HTMLPurifier.includes.php';
 
-     Optional components still need to be included--you'll know if you try to
 
-     use a feature and you get a class doesn't exists error! The autoloader
 
-     can be used in conjunction with this approach to catch classes that are
 
-     missing. Simply add this afterwards:
 
-         require 'HTMLPurifier.autoload.php';
 
- Standalone version
 
- ------------------
 
-     HTML Purifier has a standalone distribution; you can also generate
 
-     a standalone file from the full version by running the script
 
-     maintenance/generate-standalone.php . The standalone version has the
 
-     benefit of having most of its code in one file, so parsing is much
 
-     faster and the library is easier to manage.
 
-     If HTMLPurifier.standalone.php exists in the library directory, you
 
-     can use it like this:
 
-         require '/path/to/HTMLPurifier.standalone.php';
 
-     This is equivalent to including HTMLPurifier.includes.php, except that
 
-     the contents of standalone/ will be added to your path. To override this
 
-     behavior, specify a new HTMLPURIFIER_PREFIX where standalone files can
 
-     be found (usually, this will be one directory up, the "true" library
 
-     directory in full distributions). Don't forget to set your path too!
 
-     The autoloader can be added to the end to ensure the classes are
 
-     loaded when necessary; otherwise you can manually include them.
 
-     To use the autoloader, use this:
 
-         require 'HTMLPurifier.autoload.php';
 
- For advanced users
 
- ------------------
 
-     HTMLPurifier.auto.php performs a number of operations that can be done
 
-     individually. These are:
 
-         HTMLPurifier.path.php
 
-             Puts /path/to/library in the include path. For high performance,
 
-             this should be done in php.ini.
 
-         HTMLPurifier.autoload.php
 
-             Registers our autoload handler HTMLPurifier_Bootstrap::autoload($class).
 
-     You can do these operations by yourself, if you like.
 
- ---------------------------------------------------------------------------
 
- 4. Configuration
 
- HTML Purifier is designed to run out-of-the-box, but occasionally HTML
 
- Purifier needs to be told what to do.  If you answer no to any of these
 
- questions, read on; otherwise, you can skip to the next section (or, if you're
 
- into configuring things just for the heck of it, skip to 4.3).
 
- * Am I using UTF-8?
 
- * Am I using XHTML 1.0 Transitional?
 
- If you answered no to any of these questions, instantiate a configuration
 
- object and read on:
 
-     $config = HTMLPurifier_Config::createDefault();
 
- 4.1. Setting a different character encoding
 
- You really shouldn't use any other encoding except UTF-8, especially if you
 
- plan to support multilingual websites (read section three for more details).
 
- However, switching to UTF-8 is not always immediately feasible, so we can
 
- adapt.
 
- HTML Purifier uses iconv to support other character encodings, as such,
 
- any encoding that iconv supports <http://www.gnu.org/software/libiconv/>
 
- HTML Purifier supports with this code:
 
-     $config->set('Core.Encoding', /* put your encoding here */);
 
- An example usage for Latin-1 websites (the most common encoding for English
 
- websites):
 
-     $config->set('Core.Encoding', 'ISO-8859-1');
 
- Note that HTML Purifier's support for non-Unicode encodings is crippled by the
 
- fact that any character not supported by that encoding will be silently
 
- dropped, EVEN if it is ampersand escaped.  If you want to work around
 
- this, you are welcome to read docs/enduser-utf8.html for a fix,
 
- but please be cognizant of the issues the "solution" creates (for this
 
- reason, I do not include the solution in this document).
 
- 4.2. Setting a different doctype
 
- For those of you using HTML 4.01 Transitional, you can disable
 
- XHTML output like this:
 
-     $config->set('HTML.Doctype', 'HTML 4.01 Transitional');
 
- Other supported doctypes include:
 
-     * HTML 4.01 Strict
 
-     * HTML 4.01 Transitional
 
-     * XHTML 1.0 Strict
 
-     * XHTML 1.0 Transitional
 
-     * XHTML 1.1
 
- 4.3. Other settings
 
- There are more configuration directives which can be read about
 
- here: <http://htmlpurifier.org/live/configdoc/plain.html>  They're a bit boring,
 
- but they can help out for those of you who like to exert maximum control over
 
- your code.  Some of the more interesting ones are configurable at the
 
- demo <http://htmlpurifier.org/demo.php> and are well worth looking into
 
- for your own system.
 
- For example, you can fine tune allowed elements and attributes, convert
 
- relative URLs to absolute ones, and even autoparagraph input text! These
 
- are, respectively, %HTML.Allowed, %URI.MakeAbsolute and %URI.Base, and
 
- %AutoFormat.AutoParagraph. The %Namespace.Directive naming convention
 
- translates to:
 
-     $config->set('Namespace.Directive', $value);
 
- E.g.
 
-     $config->set('HTML.Allowed', 'p,b,a[href],i');
 
-     $config->set('URI.Base', 'http://www.example.com');
 
-     $config->set('URI.MakeAbsolute', true);
 
-     $config->set('AutoFormat.AutoParagraph', true);
 
- ---------------------------------------------------------------------------
 
- 5. Caching
 
- HTML Purifier generates some cache files (generally one or two) to speed up
 
- its execution. For maximum performance, make sure that
 
- library/HTMLPurifier/DefinitionCache/Serializer is writeable by the webserver.
 
- If you are in the library/ folder of HTML Purifier, you can set the
 
- appropriate permissions using:
 
-     chmod -R 0755 HTMLPurifier/DefinitionCache/Serializer
 
- If the above command doesn't work, you may need to assign write permissions
 
- to group:
 
-     chmod -R 0775 HTMLPurifier/DefinitionCache/Serializer
 
- You can also chmod files via your FTP client; this option
 
- is usually accessible by right clicking the corresponding directory and
 
- then selecting "chmod" or "file permissions".
 
- Starting with 2.0.1, HTML Purifier will generate friendly error messages
 
- that will tell you exactly what you have to chmod the directory to, if in doubt,
 
- follow its advice.
 
- If you are unable or unwilling to give write permissions to the cache
 
- directory, you can either disable the cache (and suffer a performance
 
- hit):
 
-     $config->set('Core.DefinitionCache', null);
 
- Or move the cache directory somewhere else (no trailing slash):
 
-     $config->set('Cache.SerializerPath', '/home/user/absolute/path');
 
- ---------------------------------------------------------------------------
 
- 6.   Using the code
 
- The interface is mind-numbingly simple:
 
-     $purifier = new HTMLPurifier($config);
 
-     $clean_html = $purifier->purify( $dirty_html );
 
- That's it!  For more examples, check out docs/examples/ (they aren't very
 
- different though).  Also, docs/enduser-slow.html gives advice on what to
 
- do if HTML Purifier is slowing down your application.
 
- ---------------------------------------------------------------------------
 
- 7.   Quick install
 
- First, make sure library/HTMLPurifier/DefinitionCache/Serializer is
 
- writable by the webserver (see Section 5: Caching above for details).
 
- If your website is in UTF-8 and XHTML Transitional, use this code:
 
- <?php
 
-     require_once '/path/to/htmlpurifier/library/HTMLPurifier.auto.php';
 
-     $config = HTMLPurifier_Config::createDefault();
 
-     $purifier = new HTMLPurifier($config);
 
-     $clean_html = $purifier->purify($dirty_html);
 
- ?>
 
- If your website is in a different encoding or doctype, use this code:
 
- <?php
 
-     require_once '/path/to/htmlpurifier/library/HTMLPurifier.auto.php';
 
-     $config = HTMLPurifier_Config::createDefault();
 
-     $config->set('Core.Encoding', 'ISO-8859-1'); // replace with your encoding
 
-     $config->set('HTML.Doctype', 'HTML 4.01 Transitional'); // replace with your doctype
 
-     $purifier = new HTMLPurifier($config);
 
-     $clean_html = $purifier->purify($dirty_html);
 
- ?>
 
-     vim: et sw=4 sts=4
 
 
  |