Extism host SDKs (Java / PHP / Ruby / … the polyglot tail)

The aozora-extism crate compiles to one portable wasm32-unknown-unknown artifact — aozora.wasm — that any language with an Extism host SDK can load. The bytes are identical on every platform, so there is no per-(OS × arch) native build to produce, sign, and publish: a Java, PHP, or Ruby host loads the same wasm a Go host does.

This is the breadth strategy for new languages (ADR-0006). The native bindings stay where they already pay their way — Python (PyO3) and the browser WASM (wasm-bindgen) are in-process and faster — and the C ABI remains for max-performance embedders willing to ship a native library per platform. Extism covers everyone else with a single artifact and mechanically generated types.

The contract is the same “text in → bytes out” waist as the C ABI: each export takes the Aozora source as input bytes and returns either HTML, a round-tripped source string, or a versioned JSON envelope. Every JSON path delegates to aozora::wire — the single cross-driver authority — so the output is byte-identical to the C ABI, the browser WASM, and the PyO3 drivers.

The plugin contract

aozora.wasm exports seven #[plugin_fn] entry points. Each returns a string, and all but schema_version take the source text as input:

Export                 Input      Returns  Shape
to_html                source     string   Semantic HTML5 with aozora-* class hooks.
serialize              source     string   Canonical 青空文庫 source (round-trip).
diagnostics_json       source     string   Wire envelope of diagnostics.
nodes_json             source     string   Wire envelope of source-keyed nodes.
pairs_json             source     string   Wire envelope of matched open/close pairs.
container_pairs_json   source     string   Wire envelope of container open/close pairs.
schema_version         (ignored)  string   The wire schema version as a decimal string.

The four *_json exports each emit the standard wire envelope:

{ "schema_version": 1, "data": [ /* … entries … */ ] }

The per-endpoint data entry shapes — and the committed JSON Schema for each — are documented in the Wire format chapter. to_html and serialize return a bare string (no envelope), and schema_version returns just the integer rendered as text (e.g. "1"); it ignores its input, so a host calls it with an empty buffer.
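
As a rough illustration of the envelope shape, the sketch below splits an envelope string into its two parts using only the Python standard library. The helper name parse_envelope is hypothetical, and the sample input is hand-written for the example rather than real plugin output. The entries inside data are left undecoded because their shapes are defined in the Wire format chapter, not here.

```python
import json
from typing import Any


def parse_envelope(raw: str) -> tuple[int, list[Any]]:
    """Split a wire envelope into (schema_version, data).

    Hypothetical helper: it checks only the two envelope keys and
    leaves the entries in `data` undecoded.
    """
    env = json.loads(raw)
    if not isinstance(env, dict):
        raise ValueError("envelope must be a JSON object")
    version = env.get("schema_version")
    data = env.get("data")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(version, int) or isinstance(version, bool):
        raise ValueError("schema_version must be an integer")
    if not isinstance(data, list):
        raise ValueError("data must be an array")
    return version, data


# Hand-written sample, not real plugin output.
version, data = parse_envelope('{"schema_version": 1, "data": []}')
```

The same helper would apply to any of the four *_json exports, since they share the envelope and differ only in the entries carried by data.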

A source larger than the parser’s 4 GiB (u32::MAX) span limit is rejected on the Extism error channel rather than aborting the instance — the same guard the C ABI and browser WASM apply.

The schema_version wire contract

Every *_json export wraps its payload in { "schema_version": N, "data": [...] }, where N is aozora::wire::SCHEMA_VERSION baked into the wasm at build time.

A host MUST call schema_version at load time and assert that the returned integer equals the version its types were generated for, for two reasons:

  1. The wasm and the host’s generated types are version-locked. Mismatch means the data array may not decode into the types you compiled against.
  2. schema_version is a cheap, input-free probe — the canonical place to fail fast, before the first real parse.

A SCHEMA_VERSION bump is a breaking change to the wire shape (a new kind value, a field rename, an envelope restructuring). Per ADR-0006’s consequences, a bump forces:

  • regeneration of every language’s types (just types-langs, drift-gated), and
  • a coordinated SDK release — the wasm release asset and the host SDKs are released together, version-locked.

So a host that asserts schema_version == <generated-for> at load can treat any other value as “this wasm is from a different release than my types” and refuse to proceed, rather than silently decoding against the wrong shape.
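
One possible shape for that load-time check is sketched below in Python. Everything named here is an assumption made for illustration: EXPECTED_SCHEMA_VERSION stands for whatever version the host's types were generated for, and SchemaVersionMismatch and check_schema_version are hypothetical names. The function takes the bytes returned by the schema_version export, so it does not depend on any particular host SDK.

```python
# Assumption: the version this host's types were generated for.
EXPECTED_SCHEMA_VERSION = 1


class SchemaVersionMismatch(RuntimeError):
    """The wasm and the generated types come from different releases."""


def check_schema_version(raw: bytes, expected: int = EXPECTED_SCHEMA_VERSION) -> int:
    """Parse the decimal string returned by `schema_version` and compare it."""
    text = raw.decode("utf-8")
    # Accept plain ASCII digits only; anything else is treated as a mismatch.
    if not text.isascii() or not text.isdigit():
        raise SchemaVersionMismatch(f"not a decimal version string: {text!r}")
    actual = int(text)
    if actual != expected:
        raise SchemaVersionMismatch(
            f"aozora.wasm reports schema_version {actual}, types expect {expected}"
        )
    return actual
```

A host would pass in the result of calling the schema_version export with an empty input, immediately after constructing the plugin and before the first real parse.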

Worked example: the Go SDK

The reference host SDK is aozora-go — a pure-Go host built on the wazero runtime (no cgo, no native build). It is the concrete instance of the language-agnostic pattern below: load aozora.wasm, assert schema_version, call the exports, and decode the envelopes with types generated from the committed JSON Schema. Every other Extism host SDK follows the identical shape — only the host-SDK API calls and the generated type syntax differ.

See aozora-go for the worked, idiomatic version; the section below is the template every language instantiates.

Language-agnostic “call a plugin export” template

The steps are the same in every Extism host SDK; only the method names and type syntax change.

  1. Obtain aozora.wasm. Download it from a GitHub release asset, or build it yourself with just extism-build (see Building the plugin).
  2. Create an Extism plugin from the bytes. Hand the wasm bytes to your host SDK’s plugin constructor. WASI is not required — the plugin needs no filesystem or environment access.
  3. Call schema_version and assert. Invoke schema_version with an empty input, parse the returned decimal string to an integer, and assert it equals the version your types were generated for. Abort on mismatch.
  4. Call to_html(source). Pass the source bytes; receive the HTML5 string.
  5. Call nodes_json(source) (or any *_json export). Receive the JSON envelope string and parse it.
  6. Decode data with generated types. Deserialize the envelope’s data array into the types generated from the committed JSON Schema for that endpoint.

In pseudocode:

plugin  = ExtismPlugin(read("aozora.wasm"))      // step 2

ver     = int(plugin.call("schema_version", ""))  // step 3
assert ver == EXPECTED_SCHEMA_VERSION

html    = plugin.call("to_html", source)          // step 4

env     = json_parse(plugin.call("nodes_json", source))   // step 5
assert env.schema_version == EXPECTED_SCHEMA_VERSION
nodes   = decode<NodeWire[]>(env.data)            // step 6

One plugin instance is not concurrency-safe. A single Extism plugin wraps a single wasm instance with its own linear memory; do not call into one instance from multiple threads at once. Use one instance per thread, or pool them.
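
A minimal sketch of the pooling option, in Python with only the standard library, might look like the following. PluginPool and its factory argument are hypothetical: the factory stands in for whatever your host SDK uses to construct a plugin from the aozora.wasm bytes, and the pool itself knows nothing about Extism. It only guarantees that each instance is lent to one caller at a time.

```python
import queue
from contextlib import contextmanager
from typing import Callable, Generic, Iterator, TypeVar

P = TypeVar("P")


class PluginPool(Generic[P]):
    """Fixed-size pool that lends each plugin instance to one caller at a time."""

    def __init__(self, factory: Callable[[], P], size: int) -> None:
        if size < 1:
            raise ValueError("size must be at least 1")
        # queue.Queue is thread-safe, so no extra locking is needed here.
        self._idle: "queue.Queue[P]" = queue.Queue()
        for _ in range(size):
            self._idle.put(factory())

    @contextmanager
    def borrow(self) -> Iterator[P]:
        plugin = self._idle.get()  # blocks until an instance is free
        try:
            yield plugin
        finally:
            self._idle.put(plugin)  # always hand the instance back
```

A worker thread would wrap each call in `with pool.borrow() as plugin:` so that no two threads ever hold the same instance. Whether an instance is safe to reuse after a failed call is not covered by this sketch.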

Per-language pointers

Extism publishes host SDKs for roughly 15 languages — including Java, PHP, Ruby, .NET, Elixir, Haskell, OCaml, C/C++, and more — plus the pure-Go aozora-go reference. Browse the current set at the Extism host-SDK docs.

  • Types for every supported language are generated from the committed wire JSON Schema by just types-langs (the quicktype driver), wired into the same drift-gate that guards the TypeScript .d.ts. Generate once per SCHEMA_VERSION; commit the output.
  • The wasm ships as a GitHub release asset (one artifact, all platforms) and is reproducible locally via just extism-build.

Building the plugin

just extism-build

Builds aozora-extism for wasm32-unknown-unknown and runs binaryen’s wasm-opt (the pinned, bulk-memory-capable build baked into the dev image), producing:

  • crates/aozora-extism/dist/aozora.wasm — the portable plugin artifact.

To exercise it end-to-end:

just smoke-extism

Both recipes run inside the dev image; never invoke cargo or wasm-opt directly on the host.

See also

  • Choosing a binding — native vs. C ABI vs. Extism, and when to reach for each.
  • Go SDK — the reference Extism host SDK (pure-Go wazero).
  • Wire format — the envelope shape, the four endpoint payloads, and their JSON Schemas.
  • C ABI — the in-process alternative for embedders that ship a native library.
  • ADR-0006 — why Extism + schema-driven type generation is the breadth strategy.