lib/PPI/Token/HereDoc.pm
package PPI::Token::HereDoc;

=pod

=head1 NAME

PPI::Token::HereDoc - Token class for the here-doc

=head1 INHERITANCE

  PPI::Token::HereDoc
  isa PPI::Token
      isa PPI::Element

=head1 DESCRIPTION

Here-docs are incredibly handy when writing Perl, but incredibly tricky
when parsing it, primarily because they don't follow the general flow of
input.

They jump ahead and nab lines directly off the input buffer. Whitespace
and newlines may not matter in most Perl code, but they matter in here-docs.

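As a minimal standalone sketch of what "nabbing lines" means (this is not
PPI's own code; the helper C<collect_heredoc_body> is hypothetical), the
body is every following line up to the first line that is I<exactly> the
terminator plus a newline. That exact comparison is why leading or trailing
whitespace on the terminator line matters.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of PPI: given the lines that follow a
# here-doc declaration, collect the body up to (but not including) the
# first line that is exactly the terminator plus a newline. Returns the
# body lines (as an array ref) and how many input lines were consumed.
sub collect_heredoc_body {
    my ( $terminator, @lines ) = @_;
    my @body;
    my $consumed = 0;
    for my $line (@lines) {
        $consumed++;
        # Exact comparison: "  FOO\n" or "FOO \n" do not end the here-doc
        return ( \@body, $consumed ) if $line eq "$terminator\n";
        push @body, $line;
    }
    # Input ran out before the terminator was found (unterminated here-doc)
    return ( \@body, $consumed );
}

my @input = ( "hello\n", "  FOO\n", "world\n", "FOO\n", "print 1;\n" );
my ( $body, $used ) = collect_heredoc_body( 'FOO', @input );
print scalar(@$body), " body lines, $used lines consumed\n";
# prints "3 body lines, 4 lines consumed"
```

The line after the terminator (C<print 1;>) is left for the normal token
flow, which is the "jump ahead" behaviour described above.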
They are also tricky to store as an object. They look sort of like an
operator and a string, but they don't act like it. And they have a second
section that should be something like a separate token, but isn't, because
a string can span from above the here-doc content to below it.

So when parsing, this is what we do.

Firstly, the PPI::Token::HereDoc object does not represent the <<
operator, or the "END_FLAG", or the content, or even the terminator.

It represents all of them at once.

The token itself has only the declaration part as its "content".

  # This is what the content of a HereDoc token is
  <<FOO

  # Or this
  <<"FOO"

  # Or even this
  <<      'FOO'

That is, the "operator", any whitespace separator, and the quoted or bare
terminator. When you call the C<content> method on a HereDoc token, you
get the declaration exactly as written, for example '<< "FOO"'.

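The declaration alone is enough to work out the terminator and how the body
should be treated. The following is a minimal standalone sketch (not PPI's
own API; C<parse_declaration> is a hypothetical name) whose patterns follow
the forms accepted by C<__TOKENIZER__on_char>, including unescaping a quote
character inside a quoted terminator.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of PPI: split a here-doc declaration
# (the token "content") into a mode and a terminator string.
sub parse_declaration {
    my ($content) = @_;
    if ( $content =~ /^<<(\w+)$/ ) {
        return ( 'interpolate', $1 );    # bareword, e.g. <<FOO
    }
    if ( $content =~ /^<<\s*'(.*)'$/ ) {
        ( my $term = $1 ) =~ s/\\'/'/g;  # unescape \' in the terminator
        return ( 'literal', $term );
    }
    if ( $content =~ /^<<\s*"(.*)"$/ ) {
        ( my $term = $1 ) =~ s/\\"/"/g;  # unescape \"
        return ( 'interpolate', $term );
    }
    if ( $content =~ /^<<\s*`(.*)`$/ ) {
        ( my $term = $1 ) =~ s/\\`/`/g;  # unescape \`
        return ( 'command', $term );
    }
    return;    # not a declaration this sketch recognises
}

for my $decl ( '<<FOO', '<<"FOO"', q{<<      'FOO'} ) {
    my ( $mode, $term ) = parse_declaration($decl);
    print "$decl => $mode, $term\n";
}
```

As in the tokenizer, a bareword terminator must follow the C<< << >>
directly; C<<< << FOO >>> with a space is not treated as a here-doc here.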
As for the content and terminator, when treated purely in "content" terms
they do not exist.

The content is made available with the C<heredoc> method, and the name of
the terminator with the C<terminator> method.

To make things work in the way you expect, PPI has to play some games
when doing line/column location calculation for tokens, and also during
the content parsing and generation processes.

Documents containing here-docs cannot simply be recreated by stitching
together the token contents; they require a somewhat more expensive
procedure, although the extra expense should be negligible unless you are
processing huge quantities of them.

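The reassembly problem can be sketched as follows. This assumes a
hypothetical flat list of hash-based token records rather than PPI's real
objects, and is not the actual C<PPI::Document> serializer: whenever a
here-doc token is seen, its body and terminator line are queued and written
out immediately after the next newline, which is exactly what naive
stitching of contents would miss.

```perl
use strict;
use warnings;

# Hypothetical token records (plain hashes, not PPI objects): every token
# has 'content'; here-doc tokens also carry 'heredoc' (body lines) and
# 'terminator_line' (undef if the here-doc was never terminated).
sub serialize {
    my @tokens = @_;
    my $out    = '';
    my @pending;    # here-doc lines waiting for the end of the current line
    for my $token (@tokens) {
        my $content = $token->{content};
        if ( @pending and $content =~ /\n/ ) {
            # Emit up to and including the first newline, then the queued
            # here-doc lines, then the remainder of this token.
            my $pos = index( $content, "\n" ) + 1;
            $out .= substr( $content, 0, $pos )
                  . join( '', @pending )
                  . substr( $content, $pos );
            @pending = ();
        }
        else {
            $out .= $content;
        }
        if ( $token->{heredoc} ) {
            push @pending, @{ $token->{heredoc} };
            push @pending, $token->{terminator_line}
                if defined $token->{terminator_line};
        }
    }
    # Anything still queued belongs at the end of the file
    return $out . join( '', @pending );
}

my @tokens = (
    { content => 'print' },
    { content => ' ' },
    { content => '<<FOO', heredoc => ["Hello\n"], terminator_line => "FOO\n" },
    { content => ';' },
    { content => "\n" },
    { content => 'exit;' },
    { content => "\n" },
);
print serialize(@tokens);
```

Because bodies are queued in order, several here-docs declared on the same
line come out one after another, matching how perl reads them.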
Please note that due to the immature nature of PPI in general, we expect
here-docs to be a rich (bad) source of corner-case bugs for quite a while,
but for the most part they should more or less DWYM (Do What You Mean).

=head2 Comparison to other string types

Although technically it can be considered a quote, for the time being
HereDocs are being treated as a completely separate Token subclass, and
will not be found in a search for PPI::Token::Quote or
PPI::Token::QuoteLike objects.

This may change in the future, most likely with HereDocs ending up under
QuoteLike.

=head1 METHODS

In addition to the standard set of Token methods, HereDoc objects have
two methods of their own, C<heredoc> and C<terminator>.

=cut

use strict;
use base 'PPI::Token';
use vars qw{$VERSION};
BEGIN {
        $VERSION = '1.107';
}
#####################################################################
# PPI::Token::HereDoc Methods
=pod

=head2 heredoc

The C<heredoc> method is the authoritative method for accessing the
contents of the here-doc.

It returns the contents of the here-doc as a list of newline-terminated
strings. If called in scalar context, it returns the number of lines in
the here-doc, B<excluding> the terminator line.

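To illustrate the context sensitivity without needing a parsed document,
here is a sketch using a hypothetical stand-in class, C<My::FakeHereDoc>,
that stores its lines in the same C<_heredoc> field; with real code you
would obtain the token from a PPI document instead.

```perl
use strict;
use warnings;

# Hypothetical stand-in, not a real PPI class: it stores the here-doc
# body in _heredoc, the same field that PPI::Token::HereDoc uses.
package My::FakeHereDoc;

sub new {
    my ( $class, @lines ) = @_;
    return bless { _heredoc => [@lines] }, $class;
}

# Same behaviour as the heredoc method: the lines in list context,
# the number of lines in scalar context.
sub heredoc {
    my $self = shift;
    return wantarray ? @{ $self->{_heredoc} } : scalar @{ $self->{_heredoc} };
}

package main;

my $token = My::FakeHereDoc->new( "line one\n", "line two\n" );
my @lines = $token->heredoc;    # list context: the lines themselves
my $count = $token->heredoc;    # scalar context: 2
print "$count lines; the first is: $lines[0]";
```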
=cut
sub heredoc {
        wantarray
                ? @{shift->{_heredoc}}
                : scalar @{shift->{_heredoc}};
}
=pod

=head2 terminator

The C<terminator> method returns the name of the terminating string for the
here-doc.

The terminator is returned unescaped, so in the rare case that it contains
an escaped quote, the backslash is removed.

=cut

sub terminator {
        shift->{_terminator};
}
#####################################################################
# Tokenizer Methods
# Parse in the entire here-doc in one call
sub __TOKENIZER__on_char {
        my $t     = $_[1];
        my $token = $t->{token} or return undef;
        # We are currently located on the first char after the <<
        # Get the rest of the line. Use a lexical rather than assigning
        # to $_, so we don't clobber the caller's $_.
        my $rest_of_line = substr( $t->{line}, $t->{line_cursor} );
        # Handle the most common form first for simplicity and speed reasons
        ### FIXME - This regex, and this method in general, do not yet allow
        ### for the null here-doc, which terminates at the first
        ### empty line.
        unless ( $rest_of_line =~ /^(\s*(?:"[^"]*"|'[^']*'|`[^`]*`|\w+))/ ) {
                # Degenerate to a left-shift operation
                $token->set_class('Operator') or return undef;
                return $t->_finalize_token->__TOKENIZER__on_char( $t );
        }
        # Add the rest of the token, work out what type it is,
        # and suck in the content until the end.
        $token->{content} .= $1;
        $t->{line_cursor} += length $1;
        # Find the terminator, clean it up and determine
        # the type of here-doc we are dealing with.
        my $content = $token->{content};
        if ( $content =~ /^\<\<(\w+)$/ ) {
                # Bareword
                $token->{_mode}       = 'interpolate';
                $token->{_terminator} = $1;
        } elsif ( $content =~ /^\<\<\s*\'(.*)\'$/ ) {
                # ''-quoted literal
                $token->{_mode}       = 'literal';
                $token->{_terminator} = $1;
                $token->{_terminator} =~ s/\\'/'/g;
        } elsif ( $content =~ /^\<\<\s*\"(.*)\"$/ ) {
                # ""-quoted literal
                $token->{_mode}       = 'interpolate';
                $token->{_terminator} = $1;
                $token->{_terminator} =~ s/\\"/"/g;
        } elsif ( $content =~ /^\<\<\s*\`(.*)\`$/ ) {
                # ``-quoted command
                $token->{_mode}       = 'command';
                $token->{_terminator} = $1;
                $token->{_terminator} =~ s/\\`/`/g;
        } else {
                # WTF?
                return undef;
        }
        # Define $line outside of the loop, so that if we encounter the
        # end of the file, we have access to the last line still.
        my $line;
        # Suck in the HEREDOC
        $token->{_heredoc} = [];
        my $terminator = $token->{_terminator} . "\n";
        while ( defined($line = $t->_get_line) ) {
                if ( $line eq $terminator ) {
                        # Keep the actual termination line for consistency
                        # when we are re-assembling the file
                        $token->{_terminator_line} = $line;
                        # The HereDoc is now fully parsed
                        return $t->_finalize_token->__TOKENIZER__on_char( $t );
                }
                # Add the line
                push @{$token->{_heredoc}}, $line;
        }
        # End of file.
        # Error: Didn't reach end of here-doc before end of file.
        # The loop above only exits once _get_line returns undef, so $line
        # is always undef here; inspect the last collected line instead.
        $token->{_terminator_line} = undef;
        my $heredoc = $token->{_heredoc};
        if ( @$heredoc and defined $heredoc->[-1] ) {
                # Trim off the trailing whitespace, if the tokenizer
                # flagged it with source_eof_chop
                if ( $t->{source_eof_chop} ) {
                        chop $heredoc->[-1];
                        $t->{source_eof_chop} = '';
                }
                # If the last line matches the terminator but is missing
                # the newline, we want to allow it anyway (like perl itself
                # does). In this case perl would normally throw a warning,
                # but we will also ignore that as well.
                if ( $heredoc->[-1] eq $token->{_terminator} ) {
                        $token->{_terminator_line} = pop @$heredoc;
                }
        }
        # Set a hint for PPI::Document->serialize so it can
        # inexpensively repair it if needed when writing back out.
        $token->{_damaged} = 1;
        # The HereDoc is not fully parsed
        $t->_finalize_token->__TOKENIZER__on_char( $t );
}
1;
=pod

=head1 TO DO

- Implement PPI::Token::Quote interface compatibility

- Check CPAN for any use of the null here-doc or here-doc-in-s///e

- Add support for the null here-doc

- Add support for here-doc in s///e

=head1 SUPPORT

See the L<support section|PPI/SUPPORT> in the main module.

=head1 AUTHOR

Adam Kennedy, L<http://ali.as/>, cpan@ali.as

=head1 COPYRIGHT

Copyright (c) 2001 - 2005 Adam Kennedy. All rights reserved.

This program is free software; you can redistribute
it and/or modify it under the same terms as Perl itself.

The full text of the license can be found in the
LICENSE file included with this module.

=cut