NAME

Neo4j::Driver::Net - Explains the design of the networking modules

VERSION

version 0.40

OVERVIEW

Each Neo4j::Driver::Session has exactly one network controller instance that is used by the session and all of its transactions to communicate with the Neo4j server. This document discusses the features and known limitations of the network controllers.

Unless you're planning to develop custom network adapters for the driver, you probably don't need to read this document.

The controllers don't communicate with the server directly. Instead, they use another module that has responsibility for the actual network transmissions. For HTTP connections, that other module can be customised via "http_adapter_factory" in Neo4j::Driver::Plugin. For Bolt, see "EXTENSIONS" below.

Network responses received from the server will be parsed for Neo4j statement results by the appropriate result handler for the response format used by the server. A custom networking module can also provide custom response parsers, for example implemented in XS code.

Please note that the division of labour between sessions or transactions on the one hand and networking controllers on the other hand is an internal implementation detail of the driver and as such is subject to unannounced change. While some of those details are explained in this document, this is done only to help contributors and users of public APIs better understand the driver's design. See "USE OF INTERNAL APIS" in Neo4j::Driver::Plugin for more information on this topic.

SYNPOSIS

$helper = Neo4j::Driver::Net::HTTP->new({
  net_module => 'Local::Neo4jUserAgentHTTP',
});

$helper->_set_database($db_name);
$helper->{active_tx} = ...;
@results = $helper->_run($tx, @statements);

# Parsing a JSON result
die unless $helper->{http_agent}->http_header->{success};
$json_coder = $helper->{http_agent}->json_coder;
$json_coder->decode( $helper->{http_agent}->fetch_all );

WARNING: Some of these calls are private APIs. See "USE OF INTERNAL APIS" in Neo4j::Driver::Plugin.

FEATURES

The networking controllers primarily deal with the following tasks:

  • Establish a database connection.

  • Provide Neo4j::Driver::ServerInfo.

  • Handle certain generic protocol requirements (such as HTTP content negotiation).

  • Sync state between server transactions and driver transaction objects.

  • Control the translation of Cypher statements to network transmissions, and of network transmissions to statement results.

HTTP connections use proactive content negotiation (RFC 7231) to obtain a suitable response from the Neo4j server. The driver supports both Jolt and JSON as result formats. There is also a fallback result handler, which is used to parse error messages out of text/* responses. The HTTP result handlers are individually queried for the media types they support. This information is cached by the networking controller.

All result handlers inherit a common interface from Neo4j::Driver::Result. They provide methods to initialise and bless result data records as Neo4j::Driver::Record objects. Result handlers are also responsible that all values returned from Neo4j are provided to users in the format that is documented for "get" in Neo4j::Driver::Record. For backwards compatibility, a lot of the internal data structures currently match the format of Neo4j JSON responses.

The first HTTP connection to a Neo4j server is always made to the Discovery API, which is used to obtain Neo4j::Driver::ServerInfo and the transaction endpoint URI template. These are the only GET requests made by the driver. Because of a known issue with Neo4j, the Accept request header field needs to be varied by HTTP request method (#12644).

With HTTP being a stateless protocol, Neo4j supports multiple concurrent transactions by using a different URL for each one in a REST-like fashion. For requests made to such explicit transaction endpoints, the Neo4j Transactional HTTP API always provides transaction status information in the response. Transactions that remain open include an expiration time. The networking controller parses and stores this timestamp and uses it to track which transactions are still open and which have timed out. The origination Date field is used to synchronise the clocks of the driver and the Neo4j server (RFC 7231).

Bolt, on the other hand, currently only supports a single open transaction per connection. While a Bolt connection can be viewed as a simple state machine in the backend Bolt library (see Bolt Protocol Server State Spec), Neo4j::Bolt currently doesn't allow users to directly observe state changes, so it is currently somewhat difficult to determine the Bolt connection state. The driver attempts to infer it based on the behaviour of Neo4j::Bolt, and mostly gets it right, but there may be some as-yet-unknown issues. Bug reports are welcome.

One key difference between HTTP and Bolt is the handling of transaction state in case of Neo4j errors. According to Neo4j Status Codes, the effect of errors is always a transaction rollback. On HTTP, these rollbacks take place immediately. On Bolt, however, the transaction is typically only marked as uncommittable on the Neo4j server, but the Bolt connection is not actually put into the FAILED state immediately. To try and work around this difference between HTTP and Bolt, this driver's Bolt network controller always attempts an explicit transaction rollback if faced with any error condition. Again, this approach mostly gets it right, but there may be some remaining issues, particularly when network errors and server errors happen simultaneously.

COMPATIBILITY

Neo4j::Driver version 0.21 is compatible with Neo4j::Bolt 0.01 or later.

When using HTTP, Neo4j::Driver 0.21 supports determining the version of any Neo4j server via "server" in Neo4j::Driver::Session. This even works on Neo4j 1.x, but running statements on Neo4j 1.x will fail, because it lacks the transactional API.

Neo4j::Driver 0.21 is compatible with Neo4j versions 2.x, 3.x, and 4.x. It supports HTTP responses in the formats JSON and Jolt (both strict mode and sparse mode).

For Bolt as well as HTTP, future versions of the driver will tend to implement new requirements in order to stay compatible with newer versions of Neo4j. Support for old Neo4j or library versions is likely to only be dropped with major updates to the driver (such as 0.x to 1.x).

BUGS AND LIMITATIONS

As described above, there may be cases in which the state of a Bolt connection is not determined correctly, leading to unexpected failures. If such bugs do exist, they are expected to mostly happen in rare edge cases, but please be sure to report any problems.

Clock synchronisation using the HTTP Date header does not take into account network delays. In case of high network latency, the driver may treat transactions as open even though they have already expired on the server. To address this, you could either increase the transaction idle timeout in neo4j.conf or manipulate the return value of date_header() in a custom network adapter.

The metadata in HTTP JSON responses is often insufficient to fully describe the response data. In particular:

  • Path metadata doesn't include node labels or relationship type (#12613).

  • Records with fields that are maps or lists have unparsable metadata (#12306).

  • Byte arrays are coded as lists of integers in JSON results.

As of Neo4j 4.2, the Jolt documentation for byte arrays doesn't match the implementation (#12660). Future Neo4j versions might fix the implementation to match the docs.

Neo4j spatial and temporal types are not currently implemented in all response format parsers.

IPv6 / dual-stack support

The driver supports IPv4 and IPv6 addresses as well as regular hostnames of all varieties.

However, there appears to be an issue contacting Neo4j 4.0 and newer in some dual-stack environments. The issue only occurs when using HTTP networking with dual-stack hostnames (such as localhost). The issue causes a delay upon sending a query to the server, usually by the duration of the timeout (as configured in the driver, or the socket's default timeout); however, an actual timeout exception is never reported to the driver. The issue only occurs sometimes and at this point it's not known how to reproduce it reliably.

I think this is caused in part by some slightly odd behaviour in setup() in IO::Socket::IP. For the localhost address, the while loop is usually executed twice, with the first call to $self->connect( $addr ) failing (with a $! of "Connection refused"), but the second call succeeding. It seems like the first call uses IPv6 to ::1, but the second uses IPv4 to 127.0.0.1. For some reason, the first call sometimes blocks until the timeout. The second call then still succeeds, which is why there is no timeout error reported by LWP. This behaviour is documented under "Timeout => NUM" in IO::Socket::IP.

Specifying 127.0.0.1 instead of localhost seems to avoid this problem entirely. The same seems to be true for ::1 (not sure why).

This issue never seemed to occur with Neo4j 3.5, which raises the question which of the changes that came with version 4.0 may have caused this. While I don't have the answer, it seems that explicitly enabling or disabling IPv6 in the Neo4j 4.0 server configuration in neo4j.conf may help to avoid triggering this issue:

# Disable IPv6 in neo4j.conf
dbms.jvm.additional=-Djava.net.preferIPv4Stack=true

# Enable IPv6 in neo4j.conf
dbms.jvm.additional=-Djava.net.preferIPv4Stack=false
dbms.jvm.additional=-Djava.net.preferIPv6Addresses=true

EXTENSIONS

The net_module config option available in driver versions 0.21 through 0.30 has been replaced with an experimental plug-in API; see "Custom networking modules" in Neo4j::Driver::Deprecations.

Custom Bolt networking modules

By default, Bolt networking uses the amazing XS module Neo4j::Bolt by Mark A. Jensen (MAJENSEN), which in turn uses the C library libneo4j-client to actually connect to the Neo4j server. Updates and improvements are quite possibly best made directly in those libraries, so that not only Neo4j::Driver, but also other users benefit from them.

There is currently no public API for custom Bolt network adapters; see "Network adapter API for Bolt" in Neo4j::Driver::Plugin. However, the driver's test suite still uses the former net_module option internally, so it will remain available for the time being.

use Local::MyBolt;
$driver->config(uri => 'bolt://...');
$driver->{config}{net_module} = 'Local::MyBolt';  # private API

The module name provided will be used in place of Neo4j::Bolt and will have to match its API exactly. Note that there is no support for this approach. For details, see "USE OF INTERNAL APIS" in Neo4j::Driver::Plugin.

Custom HTTP networking modules

Neo4j::Driver includes a default HTTP network adapter. The included adapter may change in future. As of version 0.21, the default adapter uses LWP directly. Earlier versions used REST::Client.

The driver's test suite still uses the former net_module option internally. Modules to be used as net_module must implement the API for an HTTP adapter; see "Network adapter API for HTTP" in Neo4j::Driver::Plugin. Additionally, they must implement the following method:

new
sub new {
  my ($class, $driver) = @_;
  ...
}

Initialises the object. May or may not establish a network connection. May access $driver config options using the method "config" in Neo4j::Driver only.

Custom result handlers

This section has been replaced by "Result handler API" in Neo4j::Driver::Plugin.

USE OF INTERNAL APIS

This section has been replaced by "USE OF INTERNAL APIS" in Neo4j::Driver::Plugin.

AUTHOR

Arne Johannessen <ajnn@cpan.org>

If you contact me by email, please make sure you include the word "Perl" in your subject header to help beat the spam filters.

COPYRIGHT AND LICENSE

This software is Copyright (c) 2016-2023 by Arne Johannessen.

This is free software; you can redistribute it and/or modify it under the terms of the Artistic License 2.0 or (at your option) the same terms as the Perl 5 programming language system itself.