Stream API (deprecated)
Warning
This document describes the old experimental streaming API included in version 0.1.0. That API was replaced in version 0.2.0 by a new registry-based implementation. This document is retained for historical purposes. A section at the end describes some of the drawbacks of this approach and why it was replaced.
The “stream” API is meant to provide a mechanism for creating high level bindings for parsing a JSON character buffer directly into static objects. You can use it to make your C/C++ structures JSON-serializable.
The JSON event stream is provided by the LexerParser, which maintains a state
machine for the tokenization and semantic parsing of a JSON text document. It
provides GetNextEvent(), which sequentially returns each semantic event in the
stream.
Parsing
Parsing JSON entities into C++ entities is generally done through overloads of
the ParseValue() function. There is one overload for each parseable ValueType,
with the following signature:
void ParseValue(const Event& event, LexerParser* stream, ValueType* value);
ParseValue() is essentially a dispatcher which invokes either a token parser
(for scalar types), an object parser (for object types) or an array parser (for
array types).
For scalar types, the implementation generally looks something like this, where it dispatches a token parser:
void ParseValue(const Event& event, LexerParser* stream, Foo* value) {
  if (event.typeno != json::Event::VALUE_LITERAL) {
    LOG(WARNING) << fmt::format("Cannot parse {} as value at {}:{}",
                                json::Event::ToString(event.typeno),
                                event.token.location.lineno,
                                event.token.location.colno);
    // The event is not a literal, so there is no token to parse.
    return;
  }
  ParseToken(event.token, value);
}
For C/C++ structures and classes, the implementation generally looks like this:
void ParseValue(const Event& event, LexerParser* stream, Foo* out) {
  ParseObject(event, stream, out);
}
ParseObject() is a function template which simply does the following:
- Ensures that the parse stream matches the expected sequence:
  - Starting with OBJECT_BEGIN
  - Followed by a sequence of (OBJECT_KEY, value) pairs
  - Ending with OBJECT_END
- Iterates through each of the (OBJECT_KEY, value) pairs and calls
  ParseField() on each pair.
ParseField() is, in turn, overloaded for every serializable type. The purpose
of ParseField() is just to select which member to call ParseValue() on, given
the current JSON object key. The implementation generally looks like this:
int ParseField(const re2::StringPiece& key, const Event& event,
               LexerParser* stream, Foo* out) {
  uint64_t keyid = RuntimeHash(key);
  switch (keyid) {
    case Hash("field_a"):
      ParseValue(event, stream, &out->field_a);
      break;
    case Hash("field_b"):
      ParseValue(event, stream, &out->field_b);
      break;
    case Hash("field_c"):
      ParseValue(event, stream, &out->field_c);
      break;
    default:
      SinkValue(event, stream);
      return 1;
  }
  return 0;
}
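The switch above only compiles if Hash() can be evaluated at compile time, since case labels must be constant expressions. The following is a minimal sketch of that technique, assuming C++17 and using 64-bit FNV-1a as a stand-in; the names Fnv1a and AssignField are hypothetical, and the document does not say which hash function the library's Hash() and RuntimeHash() actually use.

```cpp
#include <cstdint>
#include <string_view>

// Hypothetical stand-in for Hash()/RuntimeHash(): 64-bit FNV-1a.
// Being constexpr, the same function serves at compile time (case labels)
// and at run time (the key read from the JSON text).
constexpr std::uint64_t Fnv1a(std::string_view text) {
  std::uint64_t hash = 14695981039346656037ull;  // FNV-1a offset basis
  for (char c : text) {
    hash ^= static_cast<unsigned char>(c);
    hash *= 1099511628211ull;  // FNV-1a prime
  }
  return hash;
}

struct Foo {
  int field_a = 0;
  int field_b = 0;
};

// Selects a member by key. Returns 0 if the key was recognized and 1
// otherwise, mirroring the return convention of ParseField() above.
int AssignField(std::string_view key, int value, Foo* out) {
  switch (Fnv1a(key)) {
    case Fnv1a("field_a"):
      out->field_a = value;
      return 0;
    case Fnv1a("field_b"):
      out->field_b = value;
      return 0;
    default:
      return 1;
  }
}
```

One consequence of this design is that two field names with the same hash would produce duplicate case labels, which the compiler rejects. An unknown runtime key that happens to collide with a known field name would be dispatched to that field silently.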
In summary, the logic follows this call-map:
ParseValue
├─ ParseToken
└─ ParseObject (template)
└─ ParseField (overload)
└─ ParseValue (overload)
All JSON-parsable types must implement ParseValue(). JSON-parsable scalars may
use ParseToken() if an overload exists, or they may implement the token parser
directly in ParseValue(). JSON-parsable objects must implement ParseValue() as
a single-line function call to ParseObject() and must also implement
ParseField().
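The call-map can be exercised end to end with a toy model. The sketch below is not the library's code: Event, ToyStream and Point are hypothetical stand-ins, the stream replays a pre-tokenized event list instead of lexing a buffer, and ParseField() compares strings rather than hashes. It only aims to show how the scalar overload, the object overload, the ParseObject() template and ParseField() fit together.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Toy stand-ins for the library's Event and LexerParser (hypothetical).
struct Event {
  enum Type { OBJECT_BEGIN, OBJECT_KEY, VALUE_LITERAL, OBJECT_END };
  Type typeno;
  std::string text;  // key name or literal spelling
};

// Replays a pre-tokenized event list instead of lexing a character buffer.
struct ToyStream {
  std::vector<Event> events;
  std::size_t next = 0;

  bool GetNextEvent(Event* out) {
    if (next >= events.size()) return false;
    *out = events[next++];
    return true;
  }
};

struct Point {
  int x = 0;
  int y = 0;
};

// The overloads are declared before the template that calls them.
void ParseValue(const Event& event, ToyStream* stream, int* out);
void ParseValue(const Event& event, ToyStream* stream, Point* out);
int ParseField(const std::string& key, const Event& event, ToyStream* stream,
               Point* out);

// Consumes (OBJECT_KEY, value) pairs until a non-key event (normally
// OBJECT_END) and hands each pair to the ParseField() overload for T.
template <class T>
void ParseObject(const Event& event, ToyStream* stream, T* out) {
  if (event.typeno != Event::OBJECT_BEGIN) return;
  Event key;
  Event value;
  while (stream->GetNextEvent(&key) && key.typeno == Event::OBJECT_KEY) {
    if (!stream->GetNextEvent(&value)) return;
    ParseField(key.text, value, stream, out);
  }
}

// Scalar overload: parses the literal directly.
void ParseValue(const Event& event, ToyStream*, int* out) {
  if (event.typeno != Event::VALUE_LITERAL) return;
  *out = std::atoi(event.text.c_str());
}

// Object overload: a single call to ParseObject().
void ParseValue(const Event& event, ToyStream* stream, Point* out) {
  ParseObject(event, stream, out);
}

// Selects the member that matches the key. Unknown keys are ignored here;
// this toy does not skip nested values the way SinkValue() would.
int ParseField(const std::string& key, const Event& event, ToyStream* stream,
               Point* out) {
  if (key == "x") { ParseValue(event, stream, &out->x); return 0; }
  if (key == "y") { ParseValue(event, stream, &out->y); return 0; }
  return 1;
}

// Usage: replay the events for {"x": 3, "y": 4} and parse them into a Point.
Point ParseExample() {
  ToyStream stream;
  stream.events = {{Event::OBJECT_BEGIN, "{"},  {Event::OBJECT_KEY, "x"},
                   {Event::VALUE_LITERAL, "3"}, {Event::OBJECT_KEY, "y"},
                   {Event::VALUE_LITERAL, "4"}, {Event::OBJECT_END, "}"}};
  Event first;
  Point point;
  if (stream.GetNextEvent(&first)) ParseValue(first, &stream, &point);
  return point;
}
```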
Note that ParseValue() overloads are necessary mostly due to the static nature
of C++. You could imagine an implementation that looks like the following:
void ParseValue(const Event& event, LexerParser* stream, Foo* value) {
  if (event.typeno == json::Event::BEGIN_OBJECT) {
    ParseObject(event, stream, value);
  } else if (event.typeno == json::Event::BEGIN_ARRAY) {
    ParseArray(event, stream, value);
  } else {
    ParseScalar(event.token, value);
  }
}
But for a given type Foo, only one of these functions will have an applicable
overload. The other two won't exist, and every branch must still compile, so
we'll get compiler errors. This is why ParseValue() needs to be overloaded for
every type.
Problems with two-stage lookup
The above strategy is problematic due to the idiosyncrasies of two-stage
(two-phase) name lookup in C++. In order to compile successfully, the template
definition for ParseObject() must come after the declarations of all
ParseValue() overloads in the translation unit where it is instantiated for a
particular type. This leads to the ugly pattern of:
- Include the basic stream headers
- Declare all the overloads (or include the headers that declare them)
- Include the “stream_tpl.h” header, which has the implementation of the helper templates.
- Define the overloads.
This means that correct compilation of any translation unit depends on a strict and esoteric order of the includes. It also requires including a header after some C++. This could be mitigated by hiding the template definition within a macro, but then it would require a magic macro to drop a template definition somewhere in the source file. And, again, the placement of that macro call will be specific and esoteric.
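The lookup rule behind this ordering constraint can be reproduced in a few lines. The sketch below uses hypothetical names (Describe, DescribeVia, Widget) rather than the library's functions. It shows that an overload declared after the template is still found for a class type, through argument-dependent lookup at the point of instantiation, while an overload for a fundamental type such as int must already be visible where the template is defined.

```cpp
#include <string>

struct Widget {};

// Declared BEFORE the template: visible to ordinary unqualified lookup
// from the template definition.
std::string Describe(int*) { return "int"; }

template <class T>
std::string DescribeVia(T* value) {
  // Dependent call: candidates come from declarations visible here, plus
  // argument-dependent lookup (ADL) at the point of instantiation.
  return Describe(value);
}

// Declared AFTER the template: found only through ADL. That works because
// Widget is a class type whose associated namespace is the global one.
std::string Describe(Widget*) { return "Widget"; }

// If Describe(int*) were moved below DescribeVia, the call DescribeVia(&n)
// for an int n would no longer compile: int has no associated namespace, so
// ADL contributes nothing and ordinary lookup never sees the later overload.
```

Under these assumptions, `DescribeVia(&n)` for an `int n` selects the first overload and `DescribeVia(&w)` for a `Widget w` selects the second. Overloads whose argument types carry no useful associated namespace are the ones that force the strict include order described above.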
All of this leads to a poor library user experience and does not provide a good mechanism for extensibility. Starting with version 0.2.0, a new pluggable scheme is implemented which avoids these pitfalls.