JSON::Tokenize - tokenize a string containing JSON


use JSON::Tokenize ':all';
my $input = '{"tuttie":["fruity", true, 100]}';
my $token = tokenize_json ($input);
print_tokens ($token, 0);

sub print_tokens
{
    my ($token, $depth) = @_;
    while ($token) {
        # Byte offsets of this token within $input.
        my $start = tokenize_start ($token);
        my $end = tokenize_end ($token);
        my $type = tokenize_type ($token);
        print "   " x $depth;
        my $value = substr ($input, $start, $end - $start);
        print ">>$value<< has type $type\n";
        # Descend into arrays and objects before moving on.
        my $child = tokenize_child ($token);
        if ($child) {
            print_tokens ($child, $depth+1);
        }
        my $next = tokenize_next ($token);
        $token = $next;
    }
}

This outputs

>>{"tuttie":["fruity", true, 100]}<< has type object
   >>"tuttie"<< has type string
   >>:<< has type colon
   >>["fruity", true, 100]<< has type array
      >>"fruity"<< has type string
      >>,<< has type comma
      >>true<< has type literal
      >>,<< has type comma
      >>100<< has type number


This documents version 0.46 of JSON::Tokenize corresponding to git commit 82e253446533e967f63181cf57cefa12b96d601e released on Wed Nov 16 13:25:28 2016 +0900.


This is a module for tokenizing a JSON string. It breaks the string into individual tokens without creating any Perl structures. Thus it can be used for tasks such as picking out or searching through parts of a large JSON structure without storing each part of the entire structure as individual Perl variables in memory.
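As a rough illustration of picking out one part of a JSON text without building Perl structures, the following sketch returns the raw text of the value stored under a given top-level key. The helpers find_value and token_text are hypothetical names invented for this example and are not part of JSON::Tokenize. The sketch assumes that a key is a string token immediately followed by a colon token, as in the example output above, and it compares keys as raw text, so keys containing escapes are not handled.

```perl
use strict;
use warnings;
use JSON::Tokenize ':all';

# Return the raw JSON text of the value stored under $key in the
# top-level object of $json, or undef if there is no such key.
# find_value is a hypothetical helper, not part of JSON::Tokenize.
sub find_value
{
    my ($json, $key) = @_;
    # $top stays in scope while its descendants are walked.
    my $top = tokenize_json ($json);
    return undef unless $top && tokenize_type ($top) eq 'object';
    my $token = tokenize_child ($top);
    while ($token) {
        my $next = tokenize_next ($token);
        # A key is taken to be a string token followed by a colon.
        if ($next
            && tokenize_type ($token) eq 'string'
            && tokenize_type ($next) eq 'colon'
            && token_text ($json, $token) eq "\"$key\"") {
            my $value = tokenize_next ($next);
            return $value ? token_text ($json, $value) : undef;
        }
        $token = $next;
    }
    return undef;
}

# Cut the text of $token out of $json using its byte offsets.
# token_text is also a hypothetical helper.
sub token_text
{
    my ($json, $token) = @_;
    my $start = tokenize_start ($token);
    my $end = tokenize_end ($token);
    return substr ($json, $start, $end - $start);
}

my $input = '{"tuttie":["fruity", true, 100]}';
my $found = find_value ($input, 'tuttie');
print "$found\n" if defined $found;
```

Only the top level of the object is scanned, so the contents of the array are never visited, and the only string copied out of the input is the one value requested.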

This module is an experimental part of JSON::Parse and its interface is likely to change. The tokenizing functions are currently written in a very primitive way.



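my $token = tokenize_json ($json);

Tokenize the JSON text in $json. The return value is the first token, which is the top of the tree of tokens.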


my $next = tokenize_next ($token);

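Walk the tree of tokens. This returns the token which follows $token at the same depth, or a false value if $token is the last one at that depth.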


my $child = tokenize_child ($token);

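Walk the tree of tokens. This returns the first token contained in $token, such as the first key of an object or the first element of an array, or a false value if $token has no children.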


my $start = tokenize_start ($token);

Get the start of the token as a byte offset from the start of the string. Note that this is a byte offset, not a character offset.


my $end = tokenize_end ($token);

Get the end of the token as a byte offset from the start of the string. Note that this is a byte offset, not a character offset. As the example at the top shows, the length of the token in bytes is $end - $start.
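The difference between byte offsets and character offsets only shows up with non-ASCII text. The following sketch uses core Perl only and does not call JSON::Tokenize. The values of $start and $end are hypothetical, worked out by hand to stand in for what tokenize_start and tokenize_end might return for the value token, on the assumption that the offsets refer to the UTF-8 encoded bytes of the input.

```perl
use strict;
use warnings;
use utf8;
use Encode qw(encode_utf8 decode_utf8);

# A JSON text held as a Perl character string (note "use utf8" above).
my $chars = '{"名前":"値"}';
# The same text as UTF-8 encoded bytes.
my $bytes = encode_utf8 ($chars);

# The colon is character 5 but byte 9, because each of the two
# characters in the key occupies three bytes in UTF-8.
my $char_offset = index ($chars, ':');
my $byte_offset = index ($bytes, ':');

# Hypothetical byte offsets for the value token, quotes included.
# It starts right after the colon and is 1 + 3 + 1 = 5 bytes long.
my $start = $byte_offset + 1;
my $end = $start + 5;

# Slice the bytes using the byte offsets, then decode the slice to
# get a character string back.
my $value = decode_utf8 (substr ($bytes, $start, $end - $start));

binmode STDOUT, ':encoding(UTF-8)';
print "character offset $char_offset, byte offset $byte_offset\n";
print "value: $value\n";
```

Applying the same byte offsets with substr to the character string $chars would select the wrong span, which is why the slice is taken from the encoded bytes.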


my $type = tokenize_type ($token);

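Get the type of the token as a string. The possible return values include the types seen in the example output at the top of this document,

"object",
"array",
"string",
"number",
"literal",
"colon",
"comma",

as well as

"initial state",
"unicode escape"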
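One possible use of the type strings is to tally the tokens in a document by type. This is only a sketch: count_types is a hypothetical helper written for this example, not part of JSON::Tokenize, and it relies only on the functions documented above.

```perl
use strict;
use warnings;
use JSON::Tokenize ':all';

# Count how many tokens of each type occur at or below $token.
# count_types is a hypothetical helper, not part of JSON::Tokenize.
sub count_types
{
    my ($token, $counts) = @_;
    while ($token) {
        my $type = tokenize_type ($token);
        $counts->{$type}++;
        # Recurse into the contents of objects and arrays.
        my $child = tokenize_child ($token);
        if ($child) {
            count_types ($child, $counts);
        }
        $token = tokenize_next ($token);
    }
    return $counts;
}

my $input = '{"tuttie":["fruity", true, 100]}';
my $top = tokenize_json ($input);
my $counts = count_types ($top, {});
for my $type (sort keys %$counts) {
    print "$type: $counts->{$type}\n";
}
```

The tally is keyed on whatever strings tokenize_type returns, so the sketch does not depend on knowing the full list of types in advance.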


Ben Bullock


If you'd like to see this module continued, let me know that you're using it. For example, send an email, write a bug report, star the project's GitHub repository, add a patch, add a ++ on MetaCPAN, or write a rating at CPAN ratings. It really does make a difference. Thanks.


This package and associated files are copyright (C) 2016 Ben Bullock.

You can use, copy, modify and redistribute this package and associated files under the Perl Artistic Licence or the GNU General Public Licence.