=encoding utf8

=head1 Name

SPVM::Document::Language::Tokenization - Tokenization in the SPVM Language

=head1 Description

This document describes the tokenization in the SPVM language.

=head1 Tokenization

This section describes L<lexical analysis|https://en.wikipedia.org/wiki/Lexical_analysis> in the SPVM Language.

This is called tokenization.

See L<SPVM::Document::Language::SyntaxParsing> about syntax parsing.

=head2 Character Encoding

The character encoding of SPVM source codes is UTF-8.

If a character is an ASCII character, it must be an ASCII printable character or a L<space character|/"Space Characters">.

Compilation Errors:

The charactor encoding of SPVM source codes must be UTF-8. Otherwise a compilation error occurs.

If a character is an ASCII character, it must be an L<ASCII printable character|https://en.wikipedia.org/wiki/ASCII#Printable_characters> or a L<space character|/"Space Characters">. Otherwise a compilation error occurs.

=head2 Line Terminators

The line terminator is ASCII C<LF>.

When a line terminator appears, the current line number is incremented by 1.

=head2 Space Characters

The space characters are ASCII C<SP>, C<HT>, C<FF>, C<LF>.

=head2 Word Characters

The word characters are ASCII C<a-zA-Z>, C<0-9>, C<_>.

=head2 Names

This section describes names.

=head3 Symbol Name

A symbol name consists of L<word characters|/"Word Characters"> and C<::>.

It dose not contains C<__>.

It dose not begin with C<0-9>.

It dose not begin with C<::>.

It dose not end with C<::>.

It dose not contains C<::::>.

It dose not begin with C<0-9>.

Compliation Errors:

If a symbol name is invald, a compilation error occurs.

Examples:

  # Symbol names
  foo
  foo_bar2
  Foo::Bar
  
  # Invalid symbol names
  2foo
  foo__bar
  ::Foo
  Foo::
  Foo::::Bar

=head3 Class Name

A class name is a L<symbol name|/"Symbol Name">.

Each partial name of a class name must begin with an uppercase letter.

Partial names are individual names separated by C<::>. For example, the partial names of C<Foo::Bar::Baz> are C<Foo>, C<Bar>, and C<Baz>.

Compilation Errors:

If a class name is invalid, a compilation error occurs.

Examples:
  
  # Class names
  Foo
  Foo::Bar
  Foo::Bar::Baz3
  Foo::bar
  Foo_Bar::Baz_Baz
  
  # Invalid class names
  Foo
  Foo::::Bar
  Foo::Bar::
  Foo__Bar
  Foo::bar

=head3 Method Name

A method name is a L<symbol name|/"Symbol Name"> without C<::> or an empty string C<"">.

Method names with the same name as L<keywords|/"Keywords"> are allowed.

Compilation Errors:

If a method name is invalid, a compilation error occurs.

Examples:

  # Method names
  FOO
  FOO_BAR3
  foo
  foo_bar
  _foo
  _foo_bar_
  
  # Invalid method names
  foo__bar
  3foo

=head3 Field Name

A field name is a L<symbol name|/"Symbol Name"> without C<::>.

Field names with the same name as L<keywords|/"Keywords"> are allowed.

Compilation Errors:

If a field names is invalid, a compilation error occurs.

Examples:

  # Field names
  FOO
  FOO_BAR3
  foo
  foo_bar
  _foo
  _foo_bar_
  
  # Invalid field names
  foo__bar
  3foo
  Foo::Bar

=head3 Variable Name

A variable name begins with C<$> and is followed by a L<symbol name|/"Symbol Name">.

The symbol name in a variable name can be surrounded by C<{> and C<}>.

Compilation Errors:

If a field names is invalid, a compilation error occurs.

If an opening C<{> exists and the closing C<}> dose not exist, a compilation error occurs.

Examples:

  # Variable names
  $name
  $my_name
  ${name}
  $Foo::name
  $Foo::Bar::name
  ${Foo::name}
  
  # Invalid variable names
  $::name
  $name::
  $Foo::::name
  $my__name
  ${name

=head4 Class Variable Name

A class variable name is a L<variable name|/"Variable Name">.

Examples:

  # Class variable names
  $NAME
  $MY_NAME
  ${NAME}
  $FOO::NAME
  $FOO::BAR::NAME
  ${FOO::NAME_BRACE}
  $FOO::name
  
  # Invalid class variable names
  $::NAME
  $NAME::
  $FOO::::NAME
  $MY__NAME
  $3FOO
  ${NAME

=head4 Local Variable Name

A local variable name is a L<variable name|/"Variable Name"> without C<::>.

Examples:

  # Local variable names
  $name
  $my_name
  ${name_brace}
  $_name
  $NAME

  # Invalid local variable names
  $::name
  $name::
  $Foo::name
  $Foo::::name
  $my__name
  ${name
  $3foo

=head2 Keywords

The List of Keywords:

  alias
  allow
  as
  basic_type_id
  break
  byte
  can
  case
  cmp
  class
  compile_type_name
  copy
  default
  die
  div_uint
  div_ulong
  double
  dump
  elsif
  else
  enum
  eq
  eval
  eval_error_id
  extends
  for
  float
  false
  gt
  ge
  has
  if
  interface
  int
  interface_t
  isa
  isa_error
  isweak
  is_compile_type
  is_type
  is_error
  is_read_only
  args_width
  last
  length
  lt
  le
  long
  make_read_only
  my
  mulnum_t
  method
  mod_uint
  mod_ulong
  mutable
  native
  ne
  next
  new
  new_string_len
  of
  our
  object
  print
  private
  protected
  public
  precompile
  pointer
  return
  require
  required
  rw
  ro
  say
  static
  switch
  string
  short
  scalar
  true
  type_name
  undef
  unless
  unweaken
  use
  version
  void
  warn
  while
  weaken
  wo
  INIT
  __END__
  __PACKAGE__
  __FILE__
  __LINE__

=head2 Operator Tokens

The List of Operator Tokens:

  !
  !=
  $
  %
  &
  &&
  &=
  =
  ==
  ^
  ^=
  |
  ||
  |=
  -
  --
  -=
  ~
  @
  +
  ++
  +=
  *
  *=
  <
  <=
  >
  >=
  <=>
  %
  %=
  <<
  <<=
  >>=
  >>
  >>>
  >>>=
  .
  .=
  /
  /=
  \
  (
  )
  {
  }
  [
  ]
  ;
  :
  ,
  ->
  =>

=head2 Comment

Comments have no meaning.

  #COMMENT

A comment begins with C<#>.

It is followed by any string I<COMMENT>.

It ends with ASCII C<LF>.

L<Line directives|/"Line Directive"> take precedence over comments.

L<File directives|/"File Directive"> take precedence over comments.

Examples:

  # This is a comment line

=head2 line Directive

A line directive set the current line number.

  #line NUMBER

A line directive begins with C<#line> from the beggining of the line.

It is followed by one or more ASCII C<SP>.

It is followed by I<NUMBER>. I<NUMBER> is a positive 32bit integer.

It ends with ASCII C<LF>.

The current line number of the source code is set to I<NUMBER>.

Line directives take precedence over L<comments|/"Comment">.

Compilation Errors:

A line directive must begin from the beggining of the line. Otherwise an compilation error occurs.

A line directive must end with "\n". Otherwise an compilation error occurs.

A line directive must have a line number. Otherwise an compilation error occurs.

The line number given to a line directive must be a positive 32bit integer. Otherwise an compilation error occurs.

Examples:

  class MyClass {
    
    static method main : void () {
      
  #line 39
      
    }
  }

=head2 file Directive

A file directive set the current file path.

  #file "FILE_PATH"
  
A file directive begins from the beggining of the source code excluding a shebang line.

A shebang line before a file directive is allowed.

  #!command
  #file "FILE_PATH"

It is followed by one or more ASCII C<SP>.

It is followed by C<">.

It is followed by I<FILE_PATH>. I<FILE_PATH> is a string that represetns a file path.

It is closed with C<">.

It ends with ASCII C<LF>.

The current file path is set to I<FILE_PATH>.

File directives take precedence over L<comments|/"Comment">.

Compilation Errors:

A file directive must begin from the beggining of the source code. Otherwise an compilation error occurs.

A file directive must end with "\n". Otherwise an compilation error occurs.

A file directive must have a file path. Otherwise an compilation error occurs.

A file directive must end with ". Otherwise an compilation error occurs.

Examples:

  #file "/path/MyClass.spvm"
  class MyClass {
  
  }

=head2 lib Directive

A lib directive gives a hint for a class search directory to L<spvm> command and L<spvmcc> command.

  #lib "CLASS_SEARCH_DIRECTORY"
  
A lib directive begins from the beggining of a line.

It is followed by one or more ASCII C<SP>.

It is followed by C<">.

It is followed by I<CLASS_SEARCH_DIRECTORY>. I<CLASS_SEARCH_DIRECTORY> is a string that represetns a L<class search directory|SPVM::Document::Language::Class/"Class Search Directories">.

It is closed with C<">.

It ends with ASCII C<LF>.

The line directives take precedence over L<comments|/"Comment">.

I<CLASS_SEARCH_DIRECTORY> can contains C<$FindBin::Bin>. This is expaned to the directory where the SPVM script is placed.

  #lib "$FindBin::Bin/lib/SPVM"

Compilation Errors:

A lib directive must begin from the beggining of a line. Otherwise an compilation error occurs.

The directory specified by a lib directive end with "\n". Otherwise an compilation error occurs.

The directory specified by a lib directive must not be an empty string. Otherwise an compilation error occurs.

The directory specified by a lib directive must end with ". Otherwise an compilation error occurs.

Examples:

C<my_script.spvm>:

  #lib "$FindBin::Bin/lib/SPVM"
  
  class {
  
  }

=head2 __END__

If a line begins with C<__END__> and ends with ASCII C<LF>, the line with C<__END__> and the below lines are interpreted as L<comments|/"Comment">.

Examples:
  
  class MyClass {
    
  }
  
  __END__
  
  foo
  bar

=head2 POD

POD is a syntax to write multiline comment. POD has no meaning.

The Beginning of a POD:

  =NAME

The beginning of a POD begins with C<=> from the beggining of the line.

It is followed by I<NAME>. I<NAME> is any string that begins with ASCII C<a-zA-Z>.

It ends with ASCII C<LF>.

The End of a POD:

  =cut

The end of a POD begins with C<=> from the beggining of the line.

It is followed by C<cut>.

It ends with ASCII C<LF>.

Examples:

  
  =pod
  
  Comment1
  Comment2
  
  =cut
  
  =head1
  
  Comment1
  Comment2
  
  =cut

=head2 Fat Comma

A fat comma is

  =>

The fat comma is an alias for a comma C<,>.

  # Comma
  ["a", "b", "c", "d"]
  
  # Fat Comma
  ["a" => "b", "c" => "d"]

If the left operand of a fat comma is a L<symbol name|/"Symbol Name"> without C<::>, it is wrraped by C<"> and is treated as a L<string literal|/"String Literal">.

  # foo_bar2 is treated as "foo_bar2"
  [foo_bar2 => "Mark"]
  
  ["foo_bar2" => "Mark"]

=head1 Literals

A literal represents a constant value.

=head2 Numeric Literals

A numeric literal represents a constant L<number|SPVM::Document::Language::Types/"Number">.

=head2 Integer Literals

A interger literal represents a constant number of an L<integer type|SPVM::Document::Language::Types/"Integer Types">.

=head3 Integer Literal Decimal Notation

The interger literal decimal notation represents a number of int type or long type using decimal numbers C<0-9>.
  
It can begin with a minus C<->.

It is followed by one or more of C<0-9>.

C<_> can be placed at the any positions after the first C<0-9> as a separator. C<_> has no meaning.

It can end with the suffix C<L> or C<l>.

If the suffix C<L> or C<l> exists, the return type is long type. Otherwise the return type is int type.

Compilation Errors:

If the return type is int type and the value is greater than the max value of int type or less than the minimal value of int type, a compilation error occurs.

If the return type is long type and the value is greater than the max value of long type or less than the minimal value of long type, a compilation error occurs.

Examples:

  123
  -123
  123L
  123l
  123_456_789
  -123_456_789L

=head3 Integer Literal Hexadecimal Notation

The interger literal hexadecimal notation represents a number of int type or long type using hexadecimal numbers C<0-9a-zA-Z>.

It can begin with a minus C<->.

It is followed by C<0x> or C<0X>.

It is followed by one or more C<0-9a-zA-Z>. This is called hexadecimal numbers part.

C<_> can be placed at the any positions after C<0x> or C<0X> as a separator. C<_> has no meaning.

It can end with the suffix C<L> or C<l>.

If the suffix C<L> or C<l> exists, the return type is long type. Otherwise the return type is int type.

If the return type is int type, the hexadecimal numbers part is interpreted as an unsigned 32 bit integer, and is converted to a signed 32-bit integer without changing the bits. For example, C<0xFFFFFFFF> is  -1.

If the return type is long type, the hexadecimal numbers part is interpreted as unsigned 64 bit integer, and is converted to a signed 64-bit integer without changing the bits. For example, C<0xFFFFFFFFFFFFFFFFL> is C<-1L>.

Compilation Errors:

If the return type is int type and the hexadecimal numbers part is greater than hexadecimal C<FFFFFFFF>, a compilation error occurs.

If the return type is long type and the hexadecimal numbers part is greater than hexadecimal C<FFFFFFFFFFFFFFFF>, a compilation error occurs.

Examples:

  0x3b4f
  0X3b4f
  -0x3F1A
  0xDeL
  0xFFFFFFFF
  0xFF_FF_FF_FF
  0xFFFFFFFFFFFFFFFFL

=head3 Integer Literal Octal Notation

The interger literal octal notation represents a number of int type or long type using octal numbers C<0-7>.

It can begin with a minus C<->.

It is followed by C<0>.

It is followed by one or more C<0-7>. This is called octal numbers part.

C<_> can be placed at the any positions after C<0> as a separator. C<_> has no meaning.

It can end with the suffix C<L> or C<l>.

If the suffix C<L> or C<l> exists, the return type is long type. Otherwise the return type is int type.

If the return type is int type, the octal numbers part is interpreted as an unsigned 32 bit integer, and is converted to a signed 32-bit integer without changing the bits. For example, C<037777777777> is  -1.

If the return type is long type, the octal numbers part is interpreted as unsigned 64 bit integer, and is converted to a signed 64-bit integer without changing the bits. For example, C<01777777777777777777777L> is C<-1L>.

If the return type is long type, the value that is except for C<-> is interpreted as unsigned 64 bit integer C<uint64_t> type in the C language, and the following conversion is performed.

Compilation Errors:

If the return type is int type and the octal numbers part is greater than octal 37777777777, a compilation error occurs.

If the return type is long type and the octal numbers part is greater than octal 1777777777777777777777, a compilation error occurs.

Examples:

  0755
  -0644
  0666L
  0655_755

=head3 Integer Literal Binary Notation

The interger literal binary notation represents a number of int type or long type using binary numbers C<0> and C<1>.

It can begin with a minus C<->.

It is followed by C<0b> or C<0B>.

It is followed by one or more C<0> and C<1>. This is called binary numbers part.

C<_> can be placed at the any positions after C<0b> or C<0B> as a separator. C<_> has no meaning.

It can end with the suffix C<L> or C<l>.

If the suffix C<L> or C<l> exists, the return type is long type. Otherwise the return type is int type.

If the return type is int type, the binary numbers part is interpreted as an unsigned 32 bit integer, and is converted to a signed 32-bit integer without changing the bits. For example, C<0b11111111111111111111111111111111> is  -1.

If the return type is long type, the binary numbers part is interpreted as unsigned 64 bit integer, and is converted to a signed 64-bit integer without changing the bits. For example, C<0b1111111111111111111111111111111111111111111111111111111111111111L> is C<-1L>.

Compilation Errors:

If the return type is int type and the value that is except for C<-> is greater than binary C<11111111111111111111111111111111>, a compilation error occurs.

If the return type is long type and the value that is except for C<-> is greater than binary C<1111111111111111111111111111111111111111111111111111111111111111>, a compilation error occurs.

Examples:

  0b0101
  -0b1010
  0b110000L
  0b10101010_10101010

=head2 Floating Point Literals

The floating point litral represetns a floating point number.

=head3 Floating Point Literal Decimal Notation

The floating point litral decimal notation represents a number of float type and double type using decimal numbers C<0-9>.

It can begin with a minus C<->.

It is followed by one or more C<0-9>.

C<_> can be placed at the any positions after the first C<0-9>.

It can be followed by a floating point part, an exponent part, or a combination of a floating point part and an exponent part.

[Floating Point Part Begin]

A floating point part begins with C<.>.

It is followed by one or more C<0-9>.

[Floating Point Part End]

[Exponent Part Begin]

An exponent part begins with C<e> or C<E>.

It can be followed by C<+> or C<->

It is followed by one or more C<0-9>.

[Exponent Part End]

A floating point litral decimal notation can end with a suffix C<f>, C<F>, C<d>, or C<D>.

If a suffix does not exists, a floating point litral decimal notation must have a floating point part or an exponent part.

If the suffix C<f> or C<F> exists, the return type is float type. Otherwise the return type is double type.

Compilation Errors:

If the return type is float type, the floating point litral decimal notation without the suffix must be able to be parsed by the C<strtof> function in the C language. Otherwise, a compilation error occurs.

If the return type is double type, the floating point litral decimal notation without the suffix must be able to be parsed by the C<strtod> function in the C language. Otherwise, a compilation error occurs.

Examples:

  1.32
  -1.32
  1.32f
  1.32F
  1.32d
  1.32D
  1.32e3
  1.32e-3
  1.32E+3
  1.32E-3
  1.32e3f
  12e7

=head3 Floating Point Literal Hexadecimal Notation

The floating point litral hexadecimal notation represents a number of float type and double type using hexadecimal numbers C<0-9a-zA-Z>.

It can begin with a minus C<->.

It is followed by C<0x> or C<0X>.

It is followed by one or more C<0-9a-zA-Z>.

C<_> can be placed at the any positions after C<0x> or C<0X>.

It can be followed by a floating point part, an exponent part, or a combination of a floating point part and an exponent part.

[Floating Point Part Begin]

A floating point part begins with C<.>

It is followed by one or more C<0-9a-zA-Z>.

[Floating Point Part End]

[Exponent Part Begin]

An exponent part begins with C<p> or C<P>.

It can be followed by C<+> or C<->.

It is followed by one or more C<0-9>.

[Exponent Part End]

A floating point litral hexadecimal notation can end with a suffix C<f>, C<F>, C<d>, or C<D>.

If a suffix does not exists, a floating point litral hexadecimal notation must have a floating point part or an exponent part.

Compilation Errors:

If the return type is float type, the floating point litral hexadecimal notation without the suffix must be able to be parsed by the C<strtof> function in the C language. Otherwise, a compilation error occurs.

If the return type is double type, thefloating point litral hexadecimal notation without the suffix must be able to be parsed by the C<strtod> function in the C language. Otherwise, a compilation error occurs.

Examples:

  0x3d3d.edp0
  0x3d3d.edp3
  0x3d3d.edP3
  0x3d3d.edP+3
  0x3d3d.edP-3f
  0x3d3d.edP-3F
  0x3d3d.edP-3d
  0x3d3d.edP-3D
  0x3d3dP+3

=head2 Bool Literals

The bool literal represents a bool object.

=head3 true

C<true> is the alias for L<Bool#TRUE|SPVM::Bool/"TRUE">.

  true

Examples:

  # true
  my $bool_object_true = true;

=head3 false

C<false> is the alias for L<Bool#FALSE|SPVM::Bool/"FALSE">.

  false

Examples:

  # false
  my $bool_object_false = false;

=head2 Character Literal

A character literal represents a number of L<byte type|SPVM::Document::Language::Types/"byte Type"> that normally represents an ASCII character.

It begins with C<'>.

It is followed by a printable ASCII character C<0x20-0x7e> or an L<character literal escape character|/"Character Literal Escape Characters">.

It ends with C<'>.

The return type is byte type.

Compilation Errors:

If the format of the character literal is invalid, a compilation error occurs.

=head3 Character Literal Escape Characters

The List of Character Literal Escape Characters:

=begin html

<table>
  <tr>
    <th>
      Character Literal Escape Characters
    </th>
    <th>
      Values
    </th>
  </tr>
  <tr>
    <td>
      \a
    </td>
    <td>
      <code>0x07</code> BEL
    </td>
  </tr>
  <tr>
    <td>
      \t
    </td>
    <td>
      <code>0x09</code> HT
    </td>
  </tr>
  <tr>
    <td>
      \n
    </td>
    <td>
      <code>0x0A</code> LF
    </td>
  </tr>
  <tr>
    <td>
      \f
    </td>
    <td>
      <code>0x0C</code> FF
    </td>
  </tr>
  <tr>
    <td>
      \r
    </td>
    <td>
      <code>0x0D</code> CR
    </td>
  </tr>
  <tr>
    <td>
      \"
    </td>
    <td>
      <code>0x22</code> "
    </td>
  </tr>
  <tr>
    <td>
      \'
    </td>
    <td>
      <code>0x27</code> '
    </td>
  </tr>
  <tr>
    <td>
      \\
    </td>
    <td>
      <code>0x5C</code> \
    </td>
  </tr>
  <tr>
    <td>
      <a href="#Octal-Escape-Character">Octal Escape Character</a>
    </td>
    <td>
      A number represented by an octal escape character
    </td>
  </tr>
  <tr>
    <td>
      <a href="#Hexadecimal-Escape-Character">Hexadecimal Escape Character</a>
    </td>
    <td>
      A number represented by a hexadecimal escape character
    </td>
  </tr>
</table>

=end html

The type of every character literal escape character is byte type.

Examples:

  # Charater literals
  'a'
  'x'
  '\a'
  '\t'
  '\n'
  '\f'
  '\r'
  '\"'
  '\''
  '\\'
  ' '
  '\0'
  '\012'
  '\377'
  '\o{1}'
  '\xab'
  '\xAB'
  '\x0D'
  '\x0A'
  '\xD'
  '\xA'
  '\xFF'
  '\x{A}'

=head2 Octal Escape Character

The octal escape character represents an unsined 8-bit integer using octal numbers C<0-7>.

The octal escape character is a part of a L<string literal|/"String Literal"> and a L<character literal|/"Character Literal">.

It begins with C<\0>, C<\1>, C<\2>, C<\3>, C<\4>, C<\5>, C<\6>, C<\7>, or C<\o{>.

If it begins with C<\0>, C<\1>, C<\2>, C<\3>, C<\4>, C<\5>, C<\6>, or C<\7>, it is followed by one to two C<0-7>.

If it begins with C<\o{>, it is followed by one to three C<0-7>, and ends with C<}>.

The octal numbers after C<\> or C<\o{> is called octal numbers part.

Octal numbers part is interpreted as an unsined 8-bit integer, and is converted to a number of byte type without changing the bits.

Compilation Errors:

The octal numbers part must be less than or equal to C<377>. Otherwise a compilation error occurs.

If an octal escape character begins with C<\o{>, the close C<}> must exist. Otherwise a compilation error occurs.

Examples:
  
  # Octal escape characters
  \0
  \01
  \03
  \012
  \001
  \077
  \377
  \o{1}
  \o{12}

=head2 Hexadecimal Escape Character

The hexadecimal escape character represents an unsined 8-bit integer using hexadecimal numbers C<0-9a-fA-F>.

The hexadecimal escape character is a part of a L<string literal|/"String Literal"> and a L<character literal|/"Character Literal">.

The hexadecimal escape character begins with C<\x>.

It can be followed by C<{>.

It is followed by one or two C<0-9a-fA-F>. This is called hexadecimal numbers part.

If it contains C<{>, it must be followed by C<}>.

Hexadecimal numbers part is interpreted as an unsined 8-bit integer, and is converted to a number of byte type without changing the bits.

Compilation Errors:

If the format of the hexadecimal escape character is invalid, a compilation error occurs.

Examples:
  
  # Hexadecimal escape characters
  \xab
  \xAB
  \x0D
  \x0A
  \xD
  \xA
  \xFF
  \x{A}

=head2 String Literal

A string literal represents a constant L<string|SPVM::Document::Language::Types/"String">.

A string literal begins with C<">.

It is followed by zero or more UTF-8 characters, L<string literal escape characters|/"String Literal Escape Characters">, or L<variable expansions|/"Variable Expansion">.

It ends with C<">.

The return type is L<string type|SPVM::Document::Language::Types/"string Type">.

Compilation Errors:

If the format of the string literal is invalid, a compilation error occurs.

Examples:

  # String literals
  ""
  "abc";
  "ã‚ã„ã†"
  "hello\tworld\n"
  "hello\x0D\x0A"
  "hello\xA"
  "hello\x{0A}"
  "hello\0"
  "hello\012"
  "hello\377"
  "AAA $foo BBB"
  "AAA $FOO BBB"
  "AAA $$foo BBB"
  "AAA $foo->{x} BBB"
  "AAA $foo->[3] BBB"
  "AAA $foo->{x}[3] BBB"
  "AAA $@ BBB"
  "\N{U+3042}\N{U+3044}\N{U+3046}"

=head3 String Literal Escape Characters

The List of String Literal Escape Characters:

=begin html

<table>
  <tr>
    <th>
      String Literal Escape Characters
   </th>
    <th>
      Values
   </th>
  </tr>
  <tr>
    <td>
      \a
    </td>
    <td>
      <code>0x07</code> BEL
    </td>
  </tr>
  <tr>
    <td>
      \t
    </td>
    <td>
      <code>0x09</code> HT
    </td>
  </tr>
  <tr>
    <td>
      \n
    </td>
    <td>
      <code>0x0A</code> LF
    </td>
  </tr>
  <tr>
    <td>
      \f
    </td>
    <td>
      <code>0x0C</code> FF
    </td>
  </tr>
  <tr>
    <td>
      \r
    </td>
    <td>
      <code>0x0D</code> CR
    </td>
  </tr>
  <tr>
    <td>
      \"
    </td>
    <td>
      <code>0x22</code> "
    </td>
  </tr>
  <tr>
    <td>
      \$
    </td>
    <td>
      <code>0x24</code> $
    </td>
  </tr>
  <tr>
    <td>
      \'
    </td>
    <td>
      <code>0x27</code> '
    </td>
  </tr>
  <tr>
    <td>
      \\
    </td>
    <td>
      <code>0x5C</code> \
    </td>
  </tr>
  <tr>
    <td>
      <a href="#Octal-Escape-Character">Octal Escape Character</a>
    </td>
    <td>
      A number represented by an octal escape character
    </td>
  </tr>
  <tr>
    <td>
      <a href="#Hexadecimal-Escape-Character">Hexadecimal Escape Character</a>
    </td>
    <td>
      A number represented by a hexadecimal escape character
    </td>
  </tr>
  <tr>
    <td>
      <a href="#Unicode-Escape-Character">A Unicode escape character</a>
    </td>
    <td>
      Numbers represented by an Unicode escape character
    </td>
  </tr>
  <tr>
    <td>
      <a href="#Raw-Escape-Characters">A raw escape character</a>
    </td>
    <td>
      Numbers represented by a hexadecimal escape character
    </td>
  </tr>
</table>

=end html

The type of every string literal escape character ohter than the Unicode escape character and the raw escape character is byte type.

The type of each number contained in the Unicode escape character and the raw escape character is byte type.

=head3 Unicode Escape Character

The Unicode escape character represents an UTF-8 character.

An UTF-8 character is represented by an Unicode code point with hexadecimal numbers C<0-9a-fA-F>.

This is one to four numbers of byte type.

The Unicode escape character is a part of a L<string literal|/"String Literal">.

It begins with C<\N{U+>.

It is followed by one or more C<0-9a-fA-F>. This is called code point part.

It ends with C<}>.

Compilation Errors:

If a code point part is not a Unicode scalar value, a compilation error occurs.

Examples:
  
  # Unicode escape characters
  
  # ã‚
  \N{U+3042}
  
  # ã„
  \N{U+3044}
  
  # ã†
  \N{U+3046}"

=head3 Raw Escape Characters

A raw escape character is an escapa character that <\> is interpreted as ASCII C<\> and the following character is interpreted as itself.

For example, a raw escape character C<\s> is ASCII chracters C<\s>.

A raw escape character is a part of a L<string literal|/"String Literal">.

The List of Raw Escape Characters:

=begin html

<table>
  <tr><th>Raw Escape Characters</th></tr>
  <tr><td>\!</td></tr>
  <tr><td>\#</td></tr>
  <tr><td>\%</td></tr>
  <tr><td>\&</td></tr>
  <tr><td>\(</td></tr>
  <tr><td>\)</td></tr>
  <tr><td>\*</td></tr>
  <tr><td>\+</td></tr>
  <tr><td>\,</td></tr>
  <tr><td>\-</td></tr>
  <tr><td>\.</td></tr>
  <tr><td>\/</td></tr>
  <tr><td>\:</td></tr>
  <tr><td>\;</td></tr>
  <tr><td>\<</td></tr>
  <tr><td>\=</td></tr>
  <tr><td>\></td></tr>
  <tr><td>\?</td></tr>
  <tr><td>\@</td></tr>
  <tr><td>\A</td></tr>
  <tr><td>\B</td></tr>
  <tr><td>\D</td></tr>
  <tr><td>\G</td></tr>
  <tr><td>\H</td></tr>
  <tr><td>\K</td></tr>
  <tr><td>\N</td></tr>
  <tr><td>\P</td></tr>
  <tr><td>\R</td></tr>
  <tr><td>\S</td></tr>
  <tr><td>\V</td></tr>
  <tr><td>\W</td></tr>
  <tr><td>\X</td></tr>
  <tr><td>\Z</td></tr>
  <tr><td>\[</td></tr>
  <tr><td>\]</td></tr>
  <tr><td>\^</td></tr>
  <tr><td>\_</td></tr>
  <tr><td>\`</td></tr>
  <tr><td>\b</td></tr>
  <tr><td>\d</td></tr>
  <tr><td>\g</td></tr>
  <tr><td>\h</td></tr>
  <tr><td>\k</td></tr>
  <tr><td>\p</td></tr>
  <tr><td>\s</td></tr>
  <tr><td>\v</td></tr>
  <tr><td>\w</td></tr>
  <tr><td>\z</td></tr>
  <tr><td>\{</td></tr>
  <tr><td>\|</td></tr>
  <tr><td>\}</td></tr>
  <tr><td>\~</td></tr>
</table>

=end html

=head3 Variable Expansion

The variable expasion is a syntax to embed L<getting a local variable|SPVM::Document::Language::Operators/"Getting a Local Variable">, L<getting a class variables|SPVM::Document::Language::Operators/"Getting a Class Variable">, a L<dereference|SPVM::Document::Language::Operators/"Dereference Operator">, L<getting a field|SPVM::Document::Language::Operators/"Getting a Field">, L<getting an array element|SPVM::Document::Language::Operators/"Getting an Array Element">, L<getting the exception variable|SPVM::Document::Language::Operators/"Getting the Exception Variable"> into a L<string literal|"String Literal">.

  "AAA $foo BBB"
  "AAA $FOO BBB"
  "AAA $$foo BBB"
  "AAA $foo->{x} BBB"
  "AAA $foo->[3] BBB"
  "AAA $foo->{x}[3] BBB"
  "AAA $foo->{x}->[3] BBB"
  "AAA $@ BBB"
  "AAA ${foo}BBB"

The above codes are expanded to the following codes.

  "AAA " . $foo . " BBB"
  "AAA " . $FOO . " BBB"
  "AAA " . $$foo . " BBB"
  "AAA " . $foo->{x} . " BBB"
  "AAA " . $foo->[3] . " BBB"
  "AAA " . $foo->{x}[3] . " BBB"
  "AAA " . $foo->{x}->[3] . " BBB"
  "AAA " . $@ . "BBB"
  "AAA " . ${foo} . "BBB"

The operation of getting field does not contain L<space characters|/"Space Characters"> between C<{> and C<}>.

The index of getting array element must be a constant interger.

The getting array dose not contain L<space characters|/"Space Characters"> between C<[> and C<]>.

The end C<$> is interpreted by C<$>, not interpreted as a variable expansion.
  
  # AAA$
  "AAA$"

=head2 Single-Quoted String Literal

A single-quoted string literal represents a constant string without variable expansions with a few escape characters.

It begins with C<q'>.

It is followed by zero or more UTF-8 characters, or L<single-quoted string literal escape characters|/"Single-Quoted String Literal Escape Characters">.

It ends with C<'>.

The return type is L<string type|SPVM::Document::Language::Types/"string Type">.

Compilation Errors:

A single-quoted string literal must be end with C<'>. Otherwise a compilation error occurs.

If the escape character in a single-quoted string literal is invalid, a compilation error occurs.

Examples:

  # Single-quoted string literals
  q'abc';
  q'abc\'\\';

=head3 Single-Quoted String Literal Escape Characters

The List of Single-Quoted String Literal Escape Characters:

=begin html

<table>
  <tr>
    <th>
      Single-Quoted String Literal Escape Characters
   </th>
    <th>
      Values
   </th>
  </tr>
  <tr>
    <td>
      \'
    </td>
    <td>
      <code>0x27</code> '
    </td>
  </tr>
  <tr>
    <td>
      \\
    </td>
    <td>
      <code>0x5C</code> \
    </td>
  </tr>
</table>

=end html

The type of every single-quoted string literal escape character is byte type.

=head2 Here Document

A here document represents a constant string in multiple lines without escape characters and L<variable expansions|/"Variable Expansion">.

  <<'HERE_DOCUMENT_NAME';
  LINE1
  LINE2
  LINEn
  HERE_DOCUMENT_NAME

A here document begins with C<<<'HERE_DOCUMENT_NAME';> and ASCII C<LF>.

I<HERE_DOCUMENT_NAME> is a L<here document name|/"Here Document Name">.

It is followed by a string in multiple lines.

It ends with I<HERE_DOCUMENT_NAME> from the beginning of a line and ASCII C<LF>.

Compilation Errors:

C<<<'HERE_DOCUMENT_NAME';> must not contain L<space characters|/"Space Characters">. Otherwise a compilation error occurs.

Examples:
  
  # Here document
  my $string = <<'EOS';
  Hello
  World
  EOS

=head3 Here Document Name

A here document name consist of C<a-z>, C<A-Z>, C<_>, C<0-9>.

The length of a here document name is greater than or equal to 0.

A here document name cannot begin with C<0-9>.

A here document name cannot contain C<__>.

Compilaition Errors:

If the format of a here document name is invalid, a compilatio error occurs.

=head1 See Also

=over 2

=item * L<SPVM::Document::Language::SyntaxParsing>

=item * L<SPVM::Document::Language::Statements>

=item * L<SPVM::Document::Language::Operators>

=item * L<SPVM::Document::Language::Class>

=item * L<SPVM::Document::Language>

=item * L<SPVM::Document>

=back

=head1 Copyright & License

Copyright (c) 2023 Yuki Kimoto

MIT License