NAME
    HTML::StripScripts - strip scripting constructs out of HTML
SYNOPSIS
      use HTML::StripScripts;
      my $hss = HTML::StripScripts->new({ Context => 'Inline' });
      $hss->input_start_document;
      $hss->input_start('<i>');
      $hss->input_text('hello, world!');
      $hss->input_end('</i>');
      $hss->input_end_document;
      print $hss->filtered_document;
DESCRIPTION
    This module strips scripting constructs out of HTML, leaving as much
    non-scripting markup in place as possible. This allows web applications
    to display HTML originating from an untrusted source without introducing
    XSS (cross site scripting) vulnerabilities.
    You will probably use HTML::StripScripts::Parser rather than using this
    module directly.
    The process is based on whitelists of tags, attributes and attribute
    values. This approach is the most secure against disguised scripting
    constructs hidden in malicious HTML documents.
    As well as removing scripting constructs, this module ensures that there
    is a matching end for each start tag, and that the tags are properly
    nested.
    Previously, in order to customise the output, you needed to subclass
    "HTML::StripScripts" and override methods. Now, most customisation can
    be done through the "Rules" option provided to "new()".
    The HTML document must be parsed into start tags, end tags and text
    before it can be filtered by this module. Use either
    HTML::StripScripts::Parser or HTML::StripScripts::Regex instead if you
    want to input an unparsed HTML document.
CONSTRUCTORS
    new ( CONFIG )
        Creates a new "HTML::StripScripts" filter object, bound to a
        particular filtering policy. If present, the CONFIG parameter must
        be a hashref. The following keys are recognized (unrecognized keys
        will be silently ignored).
            $s = HTML::StripScripts->new({
                Context         => 'Document|Flow|Inline|NoTags',
                BanList         => [qw( br img )] | {br => '1', img => '1'},
                BanAllBut       => [qw(p div span)],
                AllowSrc        => 0|1,
                AllowHref       => 0|1,
                AllowRelURL     => 0|1,
                AllowMailto     => 0|1,
                EscapeFiltered  => 0|1,
                Rules           => { See below for details },
            });
        "Context"
            A string specifying the context in which the filtered document
            will be used. This influences the set of tags that will be
            allowed.
            If present, the "Context" value must be one of:
            "Document"
                If "Context" is "Document" then the filter will allow a full
                HTML document, including the "HTML" tag and "HEAD" and
                "BODY" sections.
            "Flow"
                If "Context" is "Flow" then most of the cosmetic tags that
                one would expect to find in a document body are allowed,
                including lists and tables but not including forms.
            "Inline"
                If "Context" is "Inline" then only inline tags such as "B"
                and "FONT" are allowed.
            "NoTags"
                If "Context" is "NoTags" then no tags are allowed.
            The default "Context" value is "Flow".
        "BanList"
            If present, this option must be an arrayref or a hashref. Any
            tag that would normally be allowed (because it presents no XSS
            hazard) will be blocked if the lowercase name of the tag is in
            this list.
            For example, in a guestbook application where "HR" tags are used
            to separate posts, you may wish to prevent posts from including
            "HR" tags, even though "HR" is not an XSS risk.
        "BanAllBut"
            If present, this option must be a reference to an array holding a
            list of lowercase tag names. This has the effect of adding all
            but the listed tags to the ban list, so that only those tags
            listed will be allowed.
        "AllowSrc"
            By default, the filter won't allow constructs that cause the
            browser to fetch things automatically, such as "SRC" attributes
            in "IMG" tags. If this option is present and true then those
            constructs will be allowed.
        "AllowHref"
            By default, the filter won't allow constructs that cause the
            browser to fetch things if the user clicks on something, such as
            the "HREF" attribute in "A" tags. Set this option to a true
            value to allow this type of construct.
        "AllowRelURL"
            By default, the filter won't allow relative URLs such as
            "../foo.html" in "SRC" and "HREF" attribute values. Set this
            option to a true value to allow them. "AllowHref" and / or
            "AllowSrc" also need to be set to true for this to have any
            effect.
        "AllowMailto"
            By default, "mailto:" links are not allowed. If "AllowMailto" is
            set to a true value, then this construct will be allowed.
        "EscapeFiltered"
            By default, any filtered tags are output as
            "<!--filtered-->". If "EscapeFiltered" is set to a true value,
            then the filtered tags are converted to HTML entities.
            For instance:
              <br>  -->  &lt;br&gt;
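            As a rough sketch of the difference, the following feeds the
            same rejected tag through two filters, one with the default
            settings and one with "EscapeFiltered" enabled. The helper
            name filter_script_tag is made up for this illustration and
            is not part of the module.

```perl
use strict;
use warnings;
use HTML::StripScripts;

# Hypothetical helper: push a rejected start/end tag pair through a
# filter built from $config and return the filtered output.
sub filter_script_tag {
    my ($config) = @_;
    my $hss = HTML::StripScripts->new($config);
    $hss->input_start_document;
    $hss->input_start('<script>');
    $hss->input_end('</script>');
    $hss->input_end_document;
    return $hss->filtered_document;
}

my $default_out = filter_script_tag( {} );
my $escaped_out = filter_script_tag( { EscapeFiltered => 1 } );

print "default: $default_out\n";    # rejected tags replaced by a comment
print "escaped: $escaped_out\n";    # rejected tags shown as HTML entities
```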
        "Rules"
            The "Rules" option provides a very flexible way of customising
            the filter.
            The focus is safety-first, so it is applied after all of the
            previous validation. This means that all malicious data should
            already have been cleared by the time your rules are applied.
            Rules can be specified for tags and for attributes. Any tag or
            attribute not explicitly listed will be handled by the default
            "*" rules.
            The following is a synopsis of all of the options that you can
            use to configure rules. Below, an example is broken into
            sections and explained.
             Rules => {
                 tag => 0 | 1 | sub { tag_callback }
                        | {
                            attr      => 0 | 1 | 'regex' | qr/regex/ | sub { attr_callback},
                            '*'       => 0 | 1 | 'regex' | qr/regex/ | sub { attr_callback},
                            required  => [qw(attrname attrname)],
                            tag       => sub { tag_callback }
                          },
                '*' => 0 | 1 | sub { tag_callback }
                       | {
                           attr => 0 | 1 | 'regex' | qr/regex/ | sub { attr_callback},
                           '*'  => 0 | 1 | 'regex' | qr/regex/ | sub { attr_callback},
                           tag  => sub { tag_callback }
                         }
                }
            EXAMPLE:
                Rules => {
                    ##########################
                    ##### EXPLICIT RULES #####
                    ##########################
                    ## Allow <br> tags, reject <img> tags
                    br          => 1,
                    img         => 0,
                    ## Send all <div> tags to a sub
                    div         => sub { tag_callback },
                    ## Allow <blockquote> tags, and allow the 'cite' attribute
                    ## All other attributes are handled by the default '*' rule
                    blockquote  => {
                        cite    => 1,
                    },
                    ## Allow <a> tags, and
                    a  => {
                        ## Allow the 'title' attribute
                        title     => 1,
                        ## Allow the 'href' attribute if it matches the regex
                        href    =>   '^http://yourdomain.com'
                   OR   href    => qr{^http://yourdomain.com},
                        ## 'style' attributes are handled by a sub
                        style     => sub { attr_callback },
                        ## All other attributes are rejected
                        '*'       => 0,
                        ## Additionally, the <a> tag should be handled by this sub
                        tag       => sub { tag_callback},
                        ## If the <a> tag doesn't have these attributes, filter the tag
                        required  => [qw(href title)],
                    },
                    ##########################
                    ##### DEFAULT RULES #####
                    ##########################
                    ## The default '*' rule - accepts all the same options as above.
                    ## If a tag or attribute is not mentioned above, then the default
                    ## rule is applied:
                    ## Reject all tags
                    '*'         => 0,
                    ## Allow all tags and all attributes
                    '*'         => 1,
                    ## Send all tags to the sub
                    '*'         => sub { tag_callback },
                    ## Allow all tags, reject all attributes
                    '*'         => { '*'  => 0 },
                    ## Allow all tags, and
                    '*' => {
                        ## Allow the 'title' attribute
                        title   => 1,
                        ## Allow the 'href' attribute if it matches the regex
                        href    =>   '^http://yourdomain.com'
                   OR   href    => qr{^http://yourdomain.com},
                        ## 'style' attributes are handled by a sub
                        style   => sub { attr_callback },
                        ## All other attributes are rejected
                        '*'     => 0,
                        ## Additionally, all tags should be handled by this sub
                        tag     => sub { tag_callback},
                    },
                }
            Tag Callbacks
                    sub tag_callback {
                        my ($filter,$element) = (@_);
                        $element = {
                            tag      => 'tag',
                            content  => 'inner_html',
                            attr     => {
                                attr_name => 'attr_value',
                            }
                        };
                        return 0 | 1;
                    }
                A tag callback accepts two parameters, the $filter object
                and the $element. It should return 0 to completely ignore
                the tag and its content (which includes any nested HTML
                tags), or 1 to accept and output the tag.
                The $element is a hash ref containing the keys:
            "tag"
                This is the tagname in lowercase, eg "a", "br", "img". If
                you set the tag value to an empty string, then the tag will
                not be output, but the tag contents will.
            "content"
                This is the equivalent of DOM's innerHTML. It contains the
                text content and any HTML tags contained within this
                element. You can change the content or set it to an empty
                string so that it is not output.
            "attr"
                "attr" contains a hashref containing the attribute names and
                values.
            If, for instance, you wanted to replace "<b>" tags with "<span>"
            tags, you could do this:
                sub b_callback {
                    my ($filter,$element)  = @_;
                    $element->{tag}        = 'span';
                    $element->{attr}{style} = 'font-weight:bold';
                    return 1;
                }
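            To show how such a callback is wired in, here is a minimal
            sketch that registers it under the "b" key of "Rules" and
            runs one element through the filter. The input text is
            arbitrary; the exact attribute formatting of the output is
            left to the module.

```perl
use strict;
use warnings;
use HTML::StripScripts;

# Tag callback: rewrite <b> as a styled <span>.
sub b_callback {
    my ( $filter, $element ) = @_;
    $element->{tag}         = 'span';
    $element->{attr}{style} = 'font-weight:bold';
    return 1;    # accept and output the (rewritten) tag
}

my $hss = HTML::StripScripts->new({
    Context => 'Inline',
    Rules   => { b => \&b_callback },
});

$hss->input_start_document;
$hss->input_start('<b>');
$hss->input_text('important');
$hss->input_end('</b>');
$hss->input_end_document;

my $out = $hss->filtered_document;
print "$out\n";
```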
        Attribute Callbacks
                sub attr_callback {
                    my ( $filter, $tag, $attr_name, $attr_val ) = @_;
                    return undef | '' | 'value';
                }
            Attribute callbacks accept four parameters: the $filter object,
            the $tag name, the $attr_name and the $attr_val.
            A callback should return either "undef" to reject the attribute, or the
            value to be used. An empty string keeps the attribute, but
            without a value.
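            As an illustration, the sketch below uses an attribute
            callback to keep "href" values only when they start with
            "https://". The names https_only and filter_link and the
            example.com URLs are invented for this example. Because
            rules run after the built-in validation, "AllowHref" must
            also be set for any "href" to reach the callback.

```perl
use strict;
use warnings;
use HTML::StripScripts;

# Attribute callback: keep the value only for https:// links,
# return undef to reject the attribute otherwise.
sub https_only {
    my ( $filter, $tag, $attr_name, $attr_val ) = @_;
    return $attr_val =~ m{^https://}i ? $attr_val : undef;
}

# Hypothetical helper: filter a single <a> element whose href is $url.
sub filter_link {
    my ($url) = @_;
    my $hss = HTML::StripScripts->new({
        AllowHref => 1,    # href must pass the built-in check first
        Rules     => { a => { href => \&https_only } },
    });
    $hss->input_start_document;
    $hss->input_start(qq{<a href="$url">});
    $hss->input_text('link');
    $hss->input_end('</a>');
    $hss->input_end_document;
    return $hss->filtered_document;
}

my $kept    = filter_link('https://example.com/docs');
my $dropped = filter_link('http://example.com/docs');

print "https: $kept\n";
print "http:  $dropped\n";
```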
        "BanList" vs "BanAllBut" vs "Rules"
            It is not necessary to use "BanList" or "BanAllBut", since everything
            can be done via "Rules". However, it may be simpler to write:
                BanAllBut => [qw(p div span)]
            The logic works as follows:
               * If BanAllBut exists, then ban everything but the tags in the list
               * Add to the ban list any elements in BanList
               * Any tags mentioned explicitly in Rules (eg a => 0, br => 1)
                 are added or removed from the BanList
               * A default rule of { '*' => 0 } would ban all tags except
                 those mentioned in Rules
               * A default rule of { '*' => 1 } would allow all tags except
                 those disallowed in the ban list, or by explicit rules
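            The two styles can be compared side by side. This sketch
            builds one filter with "BanAllBut" and another with what
            should be the matching "Rules", then feeds both the same
            "<b>" and "<u>" elements. The helper name run_filter is an
            assumption made for the example.

```perl
use strict;
use warnings;
use HTML::StripScripts;

# Hypothetical helper: run <b>b</b><u>u</u> through a filter built
# from $config and return the filtered output.
sub run_filter {
    my ($config) = @_;
    my $hss = HTML::StripScripts->new($config);
    $hss->input_start_document;
    for my $tag (qw(b u)) {
        $hss->input_start("<$tag>");
        $hss->input_text($tag);
        $hss->input_end("</$tag>");
    }
    $hss->input_end_document;
    return $hss->filtered_document;
}

# Only <b> and <i> allowed, expressed two ways.
my $via_ban   = run_filter( { BanAllBut => [qw(b i)] } );
my $via_rules = run_filter( { Rules => { b => 1, i => 1, '*' => 0 } } );

print "BanAllBut: $via_ban\n";
print "Rules:     $via_rules\n";
```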
METHODS
    This class provides the following methods:
    input_start_document ()
        This method initializes the filter, and must be called once before
        starting on each HTML document to be filtered.
    input_start ( TEXT )
        Handles a start tag from the input document. TEXT must be the full
        text of the tag, including angle-brackets.
    input_end ( TEXT )
        Handles an end tag from the input document. TEXT must be the full
        text of the end tag, including angle-brackets.
    input_text ( TEXT )
        Handles some non-tag text from the input document.
    input_process ( TEXT )
        Handles a processing instruction from the input document.
    input_comment ( TEXT )
        Handles an HTML comment from the input document.
    input_declaration ( TEXT )
        Handles a declaration from the input document.
    input_end_document ()
        Call this method to signal the end of the input document.
    filtered_document ()
        Returns the filtered document as a string.
SUBCLASSING
    The only reason for subclassing this module now is to add to the list of
    accepted tags, attributes and styles (See "WHITELIST INITIALIZATION
    METHODS"). Everything else can be achieved with "Rules".
    The "HTML::StripScripts" class is subclassable. Filter objects are plain
    hashes and "HTML::StripScripts" reserves only hash keys that start with
    "_hss". The filter configuration can be set up by invoking the
    hss_init() method, which takes the same arguments as new().
OUTPUT METHODS
    The filter outputs a stream of start tags, end tags, text, comments,
    declarations and processing instructions, via the following "output_*"
    methods. Subclasses may override these to intercept the filter output.
    The default implementations of the "output_*" methods pass the text on
    to the output() method. The default implementation of the output()
    method appends the text to a string, which can be fetched with the
    filtered_document() method once processing is complete.
    If the output() method or the individual "output_*" methods are
    overridden in a subclass, then filtered_document() will not work in that
    subclass.
    output_start_document ()
        This method gets called once at the start of each HTML document
        passed through the filter. The default implementation does nothing.
    output_end_document ()
        This method gets called once at the end of each HTML document passed
        through the filter. The default implementation does nothing.
    output_start ( TEXT )
        This method is used to output a filtered start tag.
    output_end ( TEXT )
        This method is used to output a filtered end tag.
    output_text ( TEXT )
        This method is used to output some filtered non-tag text.
    output_declaration ( TEXT )
        This method is used to output a filtered declaration.
    output_comment ( TEXT )
        This method is used to output a filtered HTML comment.
    output_process ( TEXT )
        This method is used to output a filtered processing instruction.
    output ( TEXT )
        This method is invoked by all of the default "output_*" methods. The
        default implementation appends the text to the string that the
        filtered_document() method will return.
    output_stack_entry ( TEXT )
        This method is invoked when a tag plus all text and nested HTML
        content within the tag has been processed. It adds the tag plus its
        content to the content for its parent tag.
REJECT METHODS
    When the filter encounters something in the input document which it
    cannot transform into an acceptable construct, it invokes one of the
    following "reject_*" methods to put something in the output document to
    take the place of the unacceptable construct.
    The TEXT parameter is the full text of the unacceptable construct.
    The default implementations of these methods output an HTML comment
    containing the text "filtered". If "EscapeFiltered" is set to true, then
    the rejected text is HTML escaped instead.
    Subclasses may override these methods, but should exercise caution. The
    TEXT parameter is unfiltered input and may contain malicious constructs.
    reject_start ( TEXT )
    reject_end ( TEXT )
    reject_text ( TEXT )
    reject_declaration ( TEXT )
    reject_comment ( TEXT )
    reject_process ( TEXT )
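    A minimal sketch of overriding one of these methods follows. The
    subclass name My::LoudFilter and the "removed" marker are invented
    for illustration. The override deliberately ignores TEXT, since it
    is unfiltered input, and emits a fixed comment through
    output_comment() instead.

```perl
package My::LoudFilter;    # hypothetical subclass name
use strict;
use warnings;
use base 'HTML::StripScripts';

# Replace rejected start tags with a fixed marker. The unfiltered
# TEXT argument is never copied into the output.
sub reject_start {
    my ( $self, $text ) = @_;
    $self->output_comment('<!--removed-->');
}

package main;

my $hss = My::LoudFilter->new( { Context => 'Inline' } );
$hss->input_start_document;
$hss->input_start('<script>');
$hss->input_end_document;

my $out = $hss->filtered_document;
print "$out\n";
```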
WHITELIST INITIALIZATION METHODS
    The filter refers to various whitelists to determine which constructs
    are acceptable. To modify these whitelists, subclasses can override the
    following methods.
    Each method is called once at object initialization time, and must
    return a reference to a nested data structure. These references are
    installed into the object, and used whenever the filter needs to refer
    to a whitelist.
    The default implementations of these methods can be invoked as class
    methods.
    init_context_whitelist ()
        Returns a reference to the "Context" whitelist, which determines
        which tags may appear at each point in the document, and which other
        tags may be nested within them.
        It is a hash, and the keys are context names, such as "Flow" and
        "Inline".
        The values in the hash are hashrefs. The keys in these subhashes are
        lowercase tag names, and the values are context names, specifying
        the context that the tag provides to any other tags nested within
        it.
        The special context "EMPTY" as a value in a subhash indicates that
        nothing can be nested within that tag.
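        Since the default implementation can be invoked as a class
        method, the structure can be inspected directly. The following
        is only a sketch for exploring the data; the printed layout is
        arbitrary.

```perl
use strict;
use warnings;
use HTML::StripScripts;

# Fetch the default Context whitelist without building a filter.
my $context = HTML::StripScripts->init_context_whitelist;

# One line per context: its name and how many tags it permits.
for my $name ( sort keys %$context ) {
    printf "%-12s %d tags\n", $name, scalar keys %{ $context->{$name} };
}

# The value for a tag is the context it provides to nested tags.
my $inside_b = $context->{Inline}{b};
print "inside <b>: $inside_b\n" if defined $inside_b;
```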
    init_attrib_whitelist ()
        Returns a reference to the "Attrib" whitelist, which determines
        which attributes each tag can have and the values that those
        attributes can take.
        It is a hash, and the keys are lowercase tag names.
        The values in the hash are hashrefs. The keys in these subhashes are
        lowercase attribute names, and the values are attribute value class
        names, which are short strings describing the type of values that
        the attribute can take, such as "color" or "number".
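        As a hedged sketch of extending this whitelist in a subclass,
        the code below allows a "lang" attribute on "span" tags,
        validated with the "word" value class. The subclass name
        My::LangFilter and the choice of attribute are assumptions for
        the example. The default whitelist is copied rather than
        modified in place, in case it is shared between objects.

```perl
package My::LangFilter;    # hypothetical subclass name
use strict;
use warnings;
use base 'HTML::StripScripts';

sub init_attrib_whitelist {
    my ($self) = @_;
    my $default = $self->SUPER::init_attrib_whitelist;

    # Shallow-copy the top level, then replace the span entry with
    # a copy that also permits lang (value class "word").
    my %whitelist = %$default;
    $whitelist{span} = { %{ $default->{span} || {} }, lang => 'word' };
    return \%whitelist;
}

package main;

my $hss = My::LangFilter->new( { Context => 'Inline' } );
$hss->input_start_document;
$hss->input_start('<span lang="en">');
$hss->input_text('hello');
$hss->input_end('</span>');
$hss->input_end_document;

my $out = $hss->filtered_document;
print "$out\n";
```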
    init_attval_whitelist ()
        Returns a reference to the "AttVal" whitelist, which is a hash that
        maps attribute value class names from the "Attrib" whitelist to
        coderefs to subs to validate (and optionally transform) a particular
        attribute value.
        The filter calls the attribute value validation subs with the
        following parameters:
        "filter"
            A reference to the filter object.
        "tagname"
            The lowercase name of the tag in which the attribute appears.
        "attrname"
            The name of the attribute.
        "attrval"
            The attribute value found in the input document, in canonical
            form (see "CANONICAL FORM").
        The validation sub can return undef to indicate that the attribute
        should be removed from the tag, or it can return the new value for
        the attribute, in canonical form.
    init_style_whitelist ()
        Returns a reference to the "Style" whitelist, which determines which
        CSS style directives are permitted in "style" tag attributes. The
        keys are value names such as "color" and "background-color", and the
        values are class names to be used as keys into the "AttVal"
        whitelist.
    init_deinter_whitelist ()
        Returns a reference to the "DeInter" whitelist, which determines
        which inline tags the filter should attempt to automatically
        de-interleave if they are encountered interleaved. For example, the
        filter will transform:
          <b>hello <i>world</b> !</i>
        Into:
          <b>hello <i>world</i></b><i> !</i>
        because both "b" and "i" appear as keys in the "DeInter" whitelist.
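        The interleaved input above can be fed to the filter token by
        token. This is a minimal sketch using the default whitelists;
        it prints whatever the filter produces rather than assuming a
        particular result.

```perl
use strict;
use warnings;
use HTML::StripScripts;

my $hss = HTML::StripScripts->new( { Context => 'Inline' } );

$hss->input_start_document;
$hss->input_start('<b>');
$hss->input_text('hello ');
$hss->input_start('<i>');
$hss->input_text('world');
$hss->input_end('</b>');    # closes <b> while <i> is still open
$hss->input_text(' !');
$hss->input_end('</i>');
$hss->input_end_document;

my $out = $hss->filtered_document;
print "$out\n";
```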
CHARACTER DATA PROCESSING
    These methods transform attribute values and non-tag text from the input
    document into canonical form (see "CANONICAL FORM"), and transform text
    in canonical form into a suitable form for the output document.
    text_to_canonical_form ( TEXT )
        This method is used to reduce non-tag text from the input document
        to canonical form before passing it to the filter_text() method.
        The default implementation unescapes all entities that map to
        "US-ASCII" characters other than ampersand, and replaces any
        ampersands that don't form part of valid entities with "&amp;".
    quoted_to_canonical_form ( VALUE )
        This method is used to reduce attribute values quoted with
        doublequotes or singlequotes to canonical form before passing it to
        the handler subs in the "AttVal" whitelist.
        The default behavior is the same as that of
        "text_to_canonical_form()", plus it converts any CR, LF or TAB
        characters to spaces.
    unquoted_to_canonical_form ( VALUE )
        This method is used to reduce attribute values without quotes to
        canonical form before passing it to the handler subs in the "AttVal"
        whitelist.
        The default implementation simply replaces all ampersands with
        "&amp;", since that corresponds with the way most browsers treat
        entities in unquoted values.
    canonical_form_to_text ( TEXT )
        This method is used to convert the text in canonical form returned
        by the filter_text() method to a form suitable for inclusion in the
        output document.
        The default implementation runs anything that doesn't look like a
        valid entity through the escape_html_metachars() method.
    canonical_form_to_attval ( ATTVAL )
        This method is used to convert the text in canonical form returned
        by the "AttVal" handler subs to a form suitable for inclusion in
        doublequotes in the output tag.
        The default implementation converts CR, LF and TAB characters to a
        single space, and runs anything that doesn't look like a valid
        entity through the escape_html_metachars() method.
    validate_href_attribute ( TEXT )
        If the "AllowHref" filter configuration option is set, then this
        method is used to validate "href" type attribute values. TEXT is the
        attribute value in canonical form. Returns a possibly modified
        attribute value (in canonical form) or "undef" to reject the
        attribute.
        The default implementation allows only absolute "http" and "https"
        URLs, permits port numbers and query strings, and imposes reasonable
        length limits.
        It does not URI escape the query string, and it does not guarantee
        properly formatted URIs; it only tries to produce safe URIs. You can
        always use an attribute callback (see "Attribute Callbacks") to
        provide stricter handling.
    validate_mailto ( TEXT )
        If the "AllowMailto" filter configuration option is set, then this
        method is used to validate "href" type attribute values which begin
        with "mailto:". TEXT is the attribute value in canonical form.
        Returns a possibly modified attribute value (in canonical form) or
        "undef" to reject the attribute.
        This uses a lightweight regex and does not guarantee that email
        addresses are properly formatted. You can always use an attribute
        callback (see "Attribute Callbacks") to provide stricter handling.
    validate_src_attribute ( TEXT )
        If the "AllowSrc" filter configuration option is set, then this
        method is used to validate "src" type attribute values. TEXT is the
        attribute value in canonical form. Returns a possibly modified
        attribute value (in canonical form) or "undef" to reject the
        attribute.
        The default implementation behaves as validate_href_attribute().
OTHER METHODS TO OVERRIDE
    As well as the output, reject, init and cdata methods listed above, it
    might make sense for subclasses to override the following methods:
    filter_text ( TEXT )
        This method will be invoked to filter blocks of non-tag text in the
        input document. Both input and output are in canonical form, see
        "CANONICAL FORM".
        The default implementation does no filtering.
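        As an illustration, a subclass could use this hook to mask
        unwanted words in text. The subclass name My::PoliteFilter and
        the masked word are invented for the sketch; note that the
        text arrives and must be returned in canonical form.

```perl
package My::PoliteFilter;    # hypothetical subclass name
use strict;
use warnings;
use base 'HTML::StripScripts';

# Mask one word in every block of non-tag text.
sub filter_text {
    my ( $self, $text ) = @_;
    $text =~ s/\bdarn\b/****/gi;
    return $text;
}

package main;

my $hss = My::PoliteFilter->new( { Context => 'Inline' } );
$hss->input_start_document;
$hss->input_text('darn it');
$hss->input_end_document;

my $out = $hss->filtered_document;
print "$out\n";
```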
    escape_html_metachars ( TEXT )
        This method is used to escape all HTML metacharacters in TEXT. The
        return value must be a copy of TEXT with metacharacters escaped.
        The default implementation escapes a minimal set of metacharacters
        for security against XSS vulnerabilities. The set of characters to
        escape is a compromise between the need for security and the need to
        ensure that the filter will work for documents in as many different
        character sets as possible.
        Subclasses which make strong assumptions about the document
        character set will be able to escape much more aggressively.
    strip_nonprintable ( TEXT )
        Returns a copy of TEXT with runs of nonprintable characters replaced
        with spaces or some other harmless string. Avoids replacing anything
        with the empty string, as that can lead to other security issues.
        The default implementation strips out only NULL characters, in order
        to avoid scrambling text for as many different character sets as
        possible.
        Subclasses which make some sort of assumption about the character
        set in use will be able to have a much wider definition of a
        nonprintable character, and hence a more secure strip_nonprintable()
        implementation.
ATTRIBUTE VALUE HANDLER SUBS
    References to the following subs appear in the "AttVal" whitelist
    returned by the init_attval_whitelist() method.
    _hss_attval_style( FILTER, TAGNAME, ATTRNAME, ATTRVAL )
        Attribute value handler for the "style" attribute.
    _hss_attval_size ( FILTER, TAGNAME, ATTRNAME, ATTRVAL )
        Attribute value handler for attributes whose values are some sort of
        size or length.
    _hss_attval_number ( FILTER, TAGNAME, ATTRNAME, ATTRVAL )
        Attribute value handler for attributes whose values are a simple
        integer.
    _hss_attval_color ( FILTER, TAGNAME, ATTRNAME, ATTRVAL )
        Attribute value handler for color attributes.
    _hss_attval_text ( FILTER, TAGNAME, ATTRNAME, ATTRVAL )
        Attribute value handler for text attributes.
    _hss_attval_word ( FILTER, TAGNAME, ATTRNAME, ATTRVAL )
        Attribute value handler for attributes whose values must consist of
        a single short word, with minus characters permitted.
    _hss_attval_wordlist ( FILTER, TAGNAME, ATTRNAME, ATTRVAL )
        Attribute value handler for attributes whose values must consist of
        one or more words, separated by spaces and/or commas.
    _hss_attval_wordlistq ( FILTER, TAGNAME, ATTRNAME, ATTRVAL )
        Attribute value handler for attributes whose values must consist of
        one or more words, separated by commas, with optional doublequotes
        around words and spaces allowed within the doublequotes.
    _hss_attval_href ( FILTER, TAGNAME, ATTRNAME, ATTRVAL )
        Attribute value handler for "href" type attributes. If the
        "AllowHref" configuration option is set, uses the
        validate_href_attribute() method to check the attribute value.
    _hss_attval_src ( FILTER, TAGNAME, ATTRNAME, ATTRVAL )
        Attribute value handler for "src" type attributes. If the "AllowSrc"
        configuration option is set, uses the validate_src_attribute()
        method to check the attribute value.
    _hss_attval_stylesrc ( FILTER, TAGNAME, ATTRNAME, ATTRVAL )
        Attribute value handler for "src" type style pseudo attributes.
    _hss_attval_novalue ( FILTER, TAGNAME, ATTRNAME, ATTRVAL )
        Attribute value handler for attributes that have no value or a value
        that is ignored. Just returns the attribute name as the value.
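    As an illustration of the calling convention, here is a minimal sketch
    of a handler in the style of _hss_attval_word. The sub name
    my_attval_word, the regular expression, the 20 character limit and the
    convention of returning undef to reject a value are all assumptions
    made for this example, not the module's actual implementation.

```perl
use strict;
use warnings;

# Hypothetical handler: accepts a single short word, minus characters
# permitted. Called with the same four arguments as the handlers above.
sub my_attval_word {
    my ($filter, $tagname, $attrname, $attrval) = @_;

    # Assumed rule: 1 to 20 word characters or minus signs, nothing else.
    return $attrval =~ /\A([\w-]{1,20})\z/ ? $1 : undef;
}

# The filter object is not used by this sketch, so undef stands in for it.
my $ok  = my_attval_word(undef, 'td', 'align', 'center');
my $bad = my_attval_word(undef, 'td', 'align', 'not a word');

print defined $ok  ? "accepted: $ok\n"  : "rejected\n";
print defined $bad ? "accepted: $bad\n" : "rejected\n";
```

    A handler of this shape can both validate and normalise: whatever it
    returns is what this sketch treats as the cleaned attribute value.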
CANONICAL FORM
    Many of the methods described above deal with text from the input
    document, encoded in what I call "canonical form", defined as follows:
    All characters other than ampersands represent themselves. Literal
    ampersands are encoded as "&amp;". Non "US-ASCII" characters may appear
    as literals in whatever character set is in use, or they may appear as
    named or numeric HTML entities such as "&aelig;", "&#31337;" and
    "&#xFF;". Unknown named entities such as "&foo;" may appear.
    The idea is to be able to reduce input text to a minimal
    form, without making too many assumptions about the character set in
    use.
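    The ampersand rule can be sketched as follows. This is a simplified
    illustration, not the module's text_to_canonical_form() implementation:
    the sub name encode_bare_ampersands and the entity pattern are
    assumptions, and the sketch only escapes ampersands that do not already
    start something shaped like a named or numeric entity.

```perl
use strict;
use warnings;

# Hypothetical helper: escape ampersands that do not begin an entity.
sub encode_bare_ampersands {
    my ($text) = @_;

    # Leave "&name;", "&#123;" and "&#xFF;" alone; encode any other "&".
    $text =~ s/&(?!(?:[a-zA-Z][a-zA-Z0-9]*|#[0-9]+|#[xX][0-9a-fA-F]+);)/&amp;/g;
    return $text;
}

print encode_bare_ampersands('Fish & chips &aelig; &#xFF; &foo; &#31337;'), "\n";
```

    Note that an unknown named entity such as "&foo;" passes through
    untouched, which matches the definition above.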
PRIVATE METHODS
    The following methods are internal to this class, and should not be
    invoked from elsewhere. Subclasses should not use or override these
    methods.
    _hss_prepare_ban_list (CFG)
        Returns a hash ref representing all the banned tags, based on the
        values of "BanList" and "BanAllBut".
    _hss_prepare_rules (CFG)
        Returns a hash ref representing the tag and attribute rules (See
        "Rules").
        Returns undef if no filters are specified, in which case the
        attribute filter code has very little performance impact. If any
        rules are specified, then every tag and attribute is checked.
    _hss_get_attr_filter ( DEFAULT_FILTERS, TAG_FILTERS, ATTR_NAME )
        Returns the attribute filter rule to apply to this particular
        attribute.
        Checks for:
          - a named attribute rule in a named tag
          - a default * attribute rule in a named tag
          - a named attribute rule in the default * rules
          - a default * attribute rule in the default * rules
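        The lookup order above can be sketched as a chain of hash lookups.
        This is an illustrative sketch, not the module's code: the sub name
        get_attr_filter, the argument layout (two hash refs keyed by
        attribute name, with "*" as the default key), the use of exists()
        and the sample rules are all assumptions.

```perl
use strict;
use warnings;

# Hypothetical lookup: most specific rule first, then the fallbacks.
sub get_attr_filter {
    my ($default_filters, $tag_filters, $attr_name) = @_;

    for my $candidate (
        [ $tag_filters,     $attr_name ],   # named attribute, named tag
        [ $tag_filters,     '*'        ],   # default attribute, named tag
        [ $default_filters, $attr_name ],   # named attribute, default rules
        [ $default_filters, '*'        ],   # default attribute, default rules
    ) {
        my ($filters, $key) = @$candidate;
        return $filters->{$key} if $filters && exists $filters->{$key};
    }
    return;    # no rule found at any level
}

my %default = ( '*' => 'default-star', title => 'default-title' );
my %img     = ( alt => 'img-alt' );

print get_attr_filter(\%default, \%img, 'alt'),    "\n";   # rule from the tag
print get_attr_filter(\%default, \%img, 'title'),  "\n";   # named default rule
print get_attr_filter(\%default, \%img, 'onload'), "\n";   # default * rule
```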
    _hss_join_attribs (FILTERED_ATTRIBS)
        Accepts a hash ref containing the attribute names as the keys, and
        the attribute values as the values. Escapes them and returns a
        string ready for output to HTML.
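        A minimal sketch of this kind of join, using a hypothetical
        join_attribs sub and a hypothetical escape_value helper. It assumes
        the values are already in canonical form, so ampersands are left
        alone and only double quotes and angle brackets are escaped. The
        escaping rules and the sorted key order are choices made for this
        example; the module's own behaviour may differ.

```perl
use strict;
use warnings;

# Hypothetical helper: escape characters that are unsafe inside a
# double-quoted attribute value. Ampersands are left alone on the
# assumption that the value is already in canonical form.
sub escape_value {
    my ($value) = @_;
    my %entity = ( '"' => '&quot;', '<' => '&lt;', '>' => '&gt;' );
    $value =~ s/(["<>])/$entity{$1}/g;
    return $value;
}

# Hypothetical join: hash ref in, attribute string out.
sub join_attribs {
    my ($attribs) = @_;

    # Sort the names so the output order is predictable.
    return join ' ',
        map { $_ . '="' . escape_value( $attribs->{$_} ) . '"' }
        sort keys %$attribs;
}

print join_attribs(
    { href => 'http://example.com/?a=1&amp;b=2', title => 'say "hi"' }
), "\n";
```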
    _hss_decode_numeric ( NUMERIC )
        Returns the string that should replace the numeric entity NUMERIC in
        the text_to_canonical_form() method.
    _hss_tag_is_banned ( TAGNAME )
        Returns true if the lower case tag name TAGNAME is on the list of
        harmless tags that the filter is configured to block, false
        otherwise.
    _hss_get_to_valid_context ( TAG )
        Tries to get the filter to a context in which the tag TAG is
        allowed, by introducing extra end tags or start tags if necessary.
        TAG can be either the lower case name of a tag or the string
        'CDATA'.
        Returns 1 if an allowed context is reached, or 0 if there's no
        reasonable way to get to an allowed context and the tag should just
        be rejected.
    _hss_close_innermost_tag ()
        Closes the innermost open tag.
    _hss_context ()
        Returns the current named context of the filter.
    _hss_valid_in_context ( TAG, CONTEXT )
        Returns true if the lowercase tag name TAG is valid in context
        CONTEXT, false otherwise.
    _hss_valid_in_current_context ( TAG )
        Returns true if the lowercase tag name TAG is valid in the filter's
        current context, false otherwise.
BUGS AND LIMITATIONS
    Performance
        This module does a lot of work to ensure that tags are correctly
        nested and are not left open, causing unnecessary overhead for
        applications where that doesn't matter.
        Such applications may benefit from using the more lightweight
        HTML::Scrubber::StripScripts module instead.
    Strictness
        URIs and email addresses are cleaned up to be safe, but they are
        not necessarily checked for accuracy, because that would have
        required adding dependencies. Attribute callbacks can be used to
        add this functionality if required, or the validation methods can
        be overridden.
        By default, filtered HTML may not be valid strict XHTML, for
        instance empty required attributes may be output. However, with
        "Rules", it should be possible to force the HTML to validate.
SEE ALSO
    HTML::Parser, HTML::StripScripts::Parser, HTML::StripScripts::Regex
AUTHOR
    Original author Nick Cleaton <nick@cleaton.net>
    New code added and module maintained by Clinton Gormley
    <clint@traveljury.com>
COPYRIGHT
    Copyright (C) 2003 Nick Cleaton. All Rights Reserved.
    Copyright (C) 2007 Clinton Gormley. All Rights Reserved.
    This module is free software; you can redistribute it and/or modify it
    under the same terms as Perl itself.