Export and Import

Moving documents in and out of Planck stores is a manifest-driven operation. You write a single YAML file that describes what to move, in which format, and how the nested structures should map onto flat files. The same manifest works in both directions. You can run it once, or you can let the workbench scheduler run it on a cron schedule.

This page is the operator guide for that flow. It covers the manifest format, the path data takes through the engine during each operation, and the difference between catalog-level export/import (which is what this page is about) and the file-level backup/restore that planctl gives you for disaster recovery.

One note on scope: backup, gc, WAL truncate, stats, export, import, and restore all run as scheduler tasks inside the workbench. Replication is a separate, continuous path between primary and replica, and it is not driven by the scheduler.

Where it runs

Export and import live in three places.

  • Workbench UI. The control plane on port 2369. There is a Run Now dialog and a Schedule dialog, both reachable from Server Overview or from the Schedules panel. This is the usual entry point.
  • Workbench HTTP API. POST /api/export and POST /api/import. Both take the same YAML manifest that the UI uses. Handy when you are scripting from CI or from your own ops tooling.
  • Scheduler. A persisted manifest plus a cron expression. The scheduler resolves any template variables at execution time and runs the same engine path that the Run Now dialog uses.

The engine itself (the planck binary) is the thing that actually reads or writes the documents. The workbench parses the manifest, translates the optional YQL filter into the engine's query form, and then forwards the request over the wire protocol.

Note that planctl has no export or import subcommand of its own. For one-off catalog moves, use the workbench UI or the HTTP API. For full-host disaster recovery, use planctl backup and planctl restore. The difference between the two flows is covered below.

The manifest

Every operation, scheduled or one-shot, takes one YAML manifest. It is the single input that covers every format and every layout.

Top-level fields

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| store | string | Yes | Store namespace, for example orders or stores.orders. |
| format | string | Yes | bson, json, or csv. |
| output_dir | string | Yes | Output directory for export; source directory or file path for import. |
| query | string | No | YQL filter to export a subset of documents. Export only. |
| fields | array | No | Field type hints for JSON import. See JSON type hints. |
| entities | array | CSV only | Required for CSV. Optional for JSON and BSON. Defines the file mapping. |

Entity definitions

The entities list is how you describe a multi-file layout. Each entry is one CSV file or one logical JSON shard.

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| name | string | Yes | Entity name. Used for file naming and parent references. |
| role | string | Yes | parent or child. |
| file | string | Yes | Filename for this entity's data. |
| parent | string | No | Parent entity name. Use this to nest a child under another child. Defaults to the root parent. |
| parent_field | string | Child only | Array field on the parent document where this entity nests. |
| join_key | string | Child only | Column that links child rows to the parent. It is read directly from the CSV header, not from fields. |
| fields | array | Yes | List of field descriptors for document columns. The join_key column does not need to be declared here. |

Field descriptors

| Property | Type | Required | Description |
| --- | --- | --- | --- |
| name | string | Yes | Field name in the document. |
| type | string | Yes | Data type used for coercion. |

Supported types:

| Type | Maps to |
| --- | --- |
| string | Text values. |
| int | Integer (i64). |
| double | Floating-point (f64). |
| bool | Boolean. |
| datetime | Timestamp in epoch milliseconds. |
| objectid | BSON ObjectId (12 bytes, written as a 24-character hex string). |
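
The rules in the tables above lend themselves to a quick pre-flight check before a manifest is submitted. The following is a minimal sketch, not part of Planck: validate_manifest is a hypothetical helper that takes a manifest already parsed into a Python dict and reports the problems it finds.

```python
from typing import Any

FORMATS = {"bson", "json", "csv"}
TYPES = {"string", "int", "double", "bool", "datetime", "objectid"}


def validate_manifest(manifest: dict[str, Any]) -> list[str]:
    """Hypothetical helper: return a list of problems (empty if none found)."""
    errors: list[str] = []

    # Top-level required fields.
    for key in ("store", "format", "output_dir"):
        if not manifest.get(key):
            errors.append(f"missing required field: {key}")

    fmt = manifest.get("format")
    if fmt and fmt not in FORMATS:
        errors.append(f"unknown format: {fmt}")

    # CSV carries no structure of its own, so entities must be present.
    entities = manifest.get("entities") or []
    if fmt == "csv" and not entities:
        errors.append("entities is required for csv")

    names = {e.get("name") for e in entities}
    for entity in entities:
        label = entity.get("name") or "<unnamed>"
        for key in ("name", "role", "file", "fields"):
            if not entity.get(key):
                errors.append(f"{label}: missing {key}")
        role = entity.get("role")
        if role and role not in ("parent", "child"):
            errors.append(f"{label}: role must be parent or child")
        if role == "child":
            for key in ("parent_field", "join_key"):
                if not entity.get(key):
                    errors.append(f"{label}: child needs {key}")
        parent = entity.get("parent")
        if parent and parent not in names:
            errors.append(f"{label}: unknown parent {parent}")
        for field in entity.get("fields") or []:
            if field.get("type") not in TYPES:
                errors.append(f"{label}.{field.get('name')}: unknown type")
    return errors
```

The engine performs its own validation; a check like this only catches the obvious mistakes earlier, for example in CI.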

Hierarchies

Children can nest under children, to any depth. The parent field on each child names another entity, and together these references form a tree.

orders (parent)
  +-- items (child of orders)
  |     +-- attributes (child of items)
  |     |     +-- tags (child of attributes)
  |     +-- reviews (child of items)
  +-- payments (child of orders)

On export, the engine walks the tree top down from the root, writes the parent rows, and then writes each child file with the join key injected from the parent it belongs to.

On import, entities are sorted deepest-first, loaded, and grouped by join key. The importer then walks the tree bottom up, embedding child groups as arrays under the configured parent_field until the root document is fully assembled. The root documents are inserted into the store as BSON, and the engine flushes once the batch is done.
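
To see how the parent references resolve into the tree shown above, here is a small sketch. render_tree is a hypothetical helper, and it assumes that the entity list contains exactly one parent and no reference cycles.

```python
def render_tree(entities):
    """Hypothetical helper: list entity names depth-first, indented by depth."""
    root = next(e["name"] for e in entities if e["role"] == "parent")

    # A child without an explicit parent nests under the root.
    children = {}
    for entity in entities:
        if entity["role"] == "child":
            parent = entity.get("parent", root)
            children.setdefault(parent, []).append(entity["name"])

    lines = []

    def visit(name, depth):
        lines.append("  " * depth + name)
        for child in children.get(name, []):
            visit(child, depth + 1)

    visit(root, 0)
    return lines


entities = [
    {"name": "orders", "role": "parent"},
    {"name": "items", "role": "child"},
    {"name": "attributes", "role": "child", "parent": "items"},
    {"name": "tags", "role": "child", "parent": "attributes"},
    {"name": "reviews", "role": "child", "parent": "items"},
    {"name": "payments", "role": "child"},
]
print("\n".join(render_tree(entities)))
```

Reading the output top down gives the export order; reading it from the deepest indentation upward gives the import order.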

Filtering exports

The optional query field accepts a YQL expression. Without it, the export covers every document in the store.

```yaml
query: 'orders.filter(status = "completed" and total > 100)'
```

For scheduled exports you usually want a relative date range. Hardcoded timestamps go stale the moment they ship. The scheduler resolves a small set of template variables to epoch milliseconds at execution time:

| Variable | Resolves to |
| --- | --- |
| ${today} | Start of the current day. |
| ${yesterday} | Start of the previous day. |
| ${tomorrow} | Start of the next day. |
| ${now} | Current timestamp. |
| ${week_ago} | Seven days before ${now}. |
| ${month_ago} | Thirty days before ${now}. |

A typical nightly export of the previous day's sales:

```yaml
query: "sales.filter(sale_date >= ${yesterday} and sale_date < ${today})"
```

The variables are resolved at run time, not at the moment you save the schedule.
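
For intuition, the substitution can be sketched in a few lines of Python. This is an illustration only, not the scheduler's code: resolve_templates is a hypothetical function, and it assumes that day boundaries are taken in UTC, which this page does not specify.

```python
from datetime import datetime, timedelta, timezone


def resolve_templates(text: str, now: datetime) -> str:
    """Hypothetical sketch: replace ${...} variables with epoch milliseconds."""

    def ms(moment: datetime) -> int:
        return int(moment.timestamp() * 1000)

    # Assumption: "start of day" means midnight UTC.
    today = now.replace(hour=0, minute=0, second=0, microsecond=0)
    values = {
        "today": today,
        "yesterday": today - timedelta(days=1),
        "tomorrow": today + timedelta(days=1),
        "now": now,
        "week_ago": now - timedelta(days=7),
        "month_ago": now - timedelta(days=30),
    }
    for name, moment in values.items():
        text = text.replace("${" + name + "}", str(ms(moment)))
    return text


run_at = datetime(2024, 1, 15, 10, 30, 0, tzinfo=timezone.utc)
query = "sales.filter(sale_date >= ${yesterday} and sale_date < ${today})"
print(resolve_templates(query, run_at))
```

Because the function takes the run time as an argument, the same stored query yields a different date window on every execution.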

Running an operation

From the workbench UI

There are two entry points, and both open the same dialog:

  1. Server Overview, Schema tab. Click Export or Import next to any store. The dialog opens with the store namespace pre-filled.
  2. Schedules panel. Create a task with export or import as its type, then paste or upload a manifest.

The dialog supports two modes:

| Mode | Behavior |
| --- | --- |
| Run Now | Executes immediately. The dialog shows progress and a final result panel. |
| Schedule | Persists the manifest as a scheduler task and runs it on the given cron. |

Manifests can be typed straight into the editor or uploaded from disk. The dialog accepts .yaml, .yml, and .txt files, loads the content into the editor, and lets you review it before running.

Scheduling fields

When you save as a scheduled task:

| Field | Required | Description |
| --- | --- | --- |
| Name | Yes | Schedule name, for example nightly-orders-export. |
| Cron | Yes | Standard 5-field cron expression. |
| Description | No | Free-form note for operators. |

Common cron presets:

| Preset | Expression |
| --- | --- |
| Daily, 2 am | 0 2 * * * |
| Daily, 4 am | 0 4 * * * |
| Weekly, Sun 3 am | 0 3 * * 0 |
| Hourly | 0 * * * * |

Scheduled export and import tasks show up in the Schedules panel alongside backup, gc, WAL truncate, stats, and restore tasks. They can be paused, resumed, edited, or run on demand from the same panel.

From the HTTP API

The same manifest, posted as the request body:

```sh
curl -X POST https://workbench.example.com:2369/api/export \
  -H "Content-Type: application/yaml" \
  -H "Authorization: Bearer $WB_KEY" \
  --data-binary @orders-export.yaml
```

/api/import takes the manifest in the same way. Both are meant for scripted use from CI or from ops automation.
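
If your automation is written in Python rather than shell, the same request can be built with the standard library. This is a sketch under the same assumptions as the curl call above (host, port, bearer token, YAML body); build_request is a hypothetical helper, and the response format is not documented on this page, so the sketch stops short of parsing it.

```python
import urllib.request


def build_request(base_url: str, operation: str, manifest_yaml: str,
                  api_key: str) -> urllib.request.Request:
    """Hypothetical helper: build a POST to /api/export or /api/import."""
    if operation not in ("export", "import"):
        raise ValueError("operation must be 'export' or 'import'")
    return urllib.request.Request(
        url=f"{base_url}/api/{operation}",
        data=manifest_yaml.encode("utf-8"),  # the manifest is the raw body
        method="POST",
        headers={
            "Content-Type": "application/yaml",
            "Authorization": f"Bearer {api_key}",
        },
    )


manifest = "store: orders\nformat: json\noutput_dir: /data/exports/orders\n"
req = build_request("https://workbench.example.com:2369", "export",
                    manifest, "example-key")
# To send it: urllib.request.urlopen(req)
```

Keeping request construction separate from sending makes the helper easy to unit test without a running workbench.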

Manifest examples

JSON, whole store

```yaml
store: orders
format: json
output_dir: /data/exports/orders
```

JSON is self-describing, so entities is not needed here. The export covers every document in the store.

JSON, filtered export

```yaml
store: orders
format: json
output_dir: /data/exports/shipped-orders
query: "orders.filter(TotalDue > 10000)"
```

JSON import, auto-inferred types

```yaml
store: orders
format: json
output_dir: /data/imports/orders.json
```

For a JSON import, output_dir is the path to the file. The file must be a single JSON array of objects. With no fields section, the types come straight from the JSON syntax:

  • JSON string to BSON string.
  • JSON number with no decimal to BSON int64.
  • JSON number with a decimal to BSON double.
  • JSON boolean to BSON boolean.
  • JSON null to BSON null.
  • JSON object to BSON embedded document.
  • JSON array to BSON array.
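
The mapping above can be previewed for a sample file before importing it. The sketch below uses Python's json module, which draws the same line between numbers with and without a decimal point; inferred_bson_type is a hypothetical helper, and the returned names are labels for illustration rather than engine identifiers.

```python
import json


def inferred_bson_type(value) -> str:
    """Hypothetical helper: name the BSON type that auto-inference would pick."""
    if value is None:
        return "null"
    # Check bool before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "int64"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, dict):
        return "embedded document"
    if isinstance(value, list):
        return "array"
    raise TypeError(f"not a JSON value: {value!r}")


sample = json.loads(
    '{"EmployeeID": "289", "OrderQty": 3, "TotalDue": 9.99, '
    '"IsOnline": true, "Comment": null, "Address": {}, "Tags": []}'
)
for field, value in sample.items():
    print(field, "->", inferred_bson_type(value))
```

Here EmployeeID comes out as a string because it is quoted in the source, which is the kind of case the type hints in the next section address.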

JSON type hints

Auto-inference is convenient, but it does not always produce the BSON type you actually want. The common cases are numeric strings that should become integers, ISO date strings that should become timestamps, and integer literals that should become doubles.

Add a fields block to coerce specific fields. Any field that you do not list keeps its auto-inferred type.

```yaml
store: orders
format: json
output_dir: /data/imports/orders.json
fields:
  - name: EmployeeID
    type: int
  - name: CustomerID
    type: int
  - name: TotalDue
    type: double
  - name: OrderDate
    type: datetime
  - name: IsOnline
    type: bool
```

Coercion rules:

| Declared type | JSON value | BSON result |
| --- | --- | --- |
| int | "289" (string) | int64 289 |
| int | 289 (number) | int64 289 |
| double | "9.99" (string) | double 9.99 |
| double | 100 (number) | double 100.0 |
| bool | "true", "1", "yes" (string) | boolean true |
| bool | 1 (number) | boolean true |
| bool | 0 (number) | boolean false |
| datetime | "2024-01-15" (string) | int64 1705276800000 (epoch ms) |
| datetime | "2024-01-15T10:30:00Z" (string) | int64 1705314600000 (epoch ms) |
| string | anything | stored as written (default). |

The supported datetime formats are YYYY-MM-DD (midnight UTC) and YYYY-MM-DDTHH:MM:SSZ (ISO 8601 UTC). The T may also be a space.
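
The coercion table can be expressed as a small function, which is handy for checking what a hint will do to a sample value. This is a sketch, not the engine's implementation: coerce is a hypothetical helper, objectid is left out because the table does not cover it, and two behaviors are assumptions: any string outside "true", "1", "yes" is treated as false, and a numeric datetime is taken to be epoch milliseconds already.

```python
from datetime import datetime, timezone

TRUE_STRINGS = {"true", "1", "yes"}
DATETIME_FORMATS = ("%Y-%m-%d", "%Y-%m-%dT%H:%M:%SZ", "%Y-%m-%d %H:%M:%SZ")


def to_epoch_ms(text: str) -> int:
    """Parse one of the supported datetime formats as UTC epoch milliseconds."""
    for fmt in DATETIME_FORMATS:
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue
        return int(parsed.replace(tzinfo=timezone.utc).timestamp() * 1000)
    raise ValueError(f"unsupported datetime format: {text!r}")


def coerce(value, declared_type: str):
    """Hypothetical helper mirroring the coercion table above."""
    if declared_type == "int":
        return int(value)
    if declared_type == "double":
        return float(value)
    if declared_type == "bool":
        if isinstance(value, str):
            # Assumption: strings other than the listed ones are false.
            return value.strip().lower() in TRUE_STRINGS
        return bool(value)
    if declared_type == "datetime":
        if isinstance(value, str):
            return to_epoch_ms(value)
        return int(value)  # assumption: already epoch milliseconds
    if declared_type == "string":
        return value  # stored as written
    raise ValueError(f"type not covered by this sketch: {declared_type}")


print(coerce("2024-01-15T10:30:00Z", "datetime"))
```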

BSON

```yaml
store: products
format: bson
output_dir: /data/exports/products
```

Documents are written out as length-prefixed BSON blobs. The same manifest, with output_dir pointing at a .bson file, performs an import.

CSV, flat

```yaml
store: products
format: csv
output_dir: /data/exports/products

entities:
  - name: products
    role: parent
    file: products.csv
    fields:
      - name: ProductID
        type: int
      - name: ProductNumber
        type: string
      - name: ProductName
        type: string
      - name: StandardCost
        type: double
      - name: ListPrice
        type: double
```

One CSV, the listed columns, the declared types. The same manifest runs as an import as well: the engine reads each row, coerces the fields to their declared BSON types, and inserts.

CSV, parent and child

```yaml
store: orders
format: csv
output_dir: /data/exports/orders

entities:
  - name: orders
    role: parent
    file: orders.csv
    fields:
      - name: OrderDate
        type: string
      - name: CustomerID
        type: int
      - name: SubTotal
        type: double
      - name: TotalDue
        type: double

  - name: details
    role: child
    parent_field: SalesOrderDetails
    join_key: CustomerID
    file: order_details.csv
    fields:
      - name: SalesOrderDetailID
        type: int
      - name: ProductID
        type: int
      - name: OrderQty
        type: int
      - name: UnitPrice
        type: double
      - name: LineTotal
        type: double
```

Export flattens the orders into orders.csv and their SalesOrderDetails arrays into order_details.csv, with CustomerID injected from the parent row. Import goes the other way around: rows in order_details.csv are grouped by CustomerID, then embedded as the SalesOrderDetails array on the matching order before it is inserted.

CSV, three levels deep

```yaml
store: orders
format: csv
output_dir: /data/exports/orders-full

entities:
  - name: orders
    role: parent
    file: orders.csv
    fields:
      - name: OrderDate
        type: string
      - name: CustomerID
        type: int
      - name: TotalDue
        type: double

  - name: details
    role: child
    parent_field: SalesOrderDetails
    join_key: CustomerID
    file: order_details.csv
    fields:
      - name: SalesOrderDetailID
        type: int
      - name: ProductID
        type: int
      - name: OrderQty
        type: int
      - name: UnitPrice
        type: double

  - name: attributes
    role: child
    parent: details
    parent_field: Attributes
    join_key: SalesOrderDetailID
    file: detail_attributes.csv
    fields:
      - name: AttrName
        type: string
      - name: AttrValue
        type: string
```

Here attributes nests under details because of the parent: details line. Without that line it would nest under the root instead.

CSV, flattening a sub-document

```yaml
store: customers
format: csv
output_dir: /data/exports/customers

entities:
  - name: customers
    role: parent
    file: customers.csv
    fields:
      - name: CustomerID
        type: int
      - name: FirstName
        type: string
      - name: LastName
        type: string
      - name: FullName
        type: string

  - name: addresses
    role: child
    parent_field: Address
    join_key: CustomerID
    file: customer_addresses.csv
    fields:
      - name: Street
        type: string
      - name: City
        type: string
      - name: State
        type: string
      - name: ZipCode
        type: string
      - name: Country
        type: string
```

A nested Address sub-document is flattened into its own CSV, with CustomerID carried across.

Scheduled export, daily sales

```yaml
store: sales
format: csv
output_dir: /data/exports/daily-sales
query: "sales.filter(sale_date >= ${yesterday} and sale_date < ${today})"

entities:
  - name: sales
    role: parent
    file: sales.csv
    fields:
      - { name: sale_id, type: int }
      - { name: sale_date, type: datetime }
      - { name: register_id, type: string }
      - { name: total, type: double }

  - name: line_items
    role: child
    parent_field: items
    join_key: sale_id
    file: sale_items.csv
    fields:
      - { name: sku, type: string }
      - { name: quantity, type: int }
      - { name: price, type: double }
```

Save this with cron 0 1 * * * and the scheduler will write yesterday's sales to two CSVs at 1 am every night.

What happens under the hood

Export

  1. The workbench receives the manifest YAML.
  2. If query is set, the workbench parses the YQL into the engine's query form.
  3. The parsed manifest, along with any parsed query, is sent to the engine over the wire protocol.
  4. The engine flushes the memtable so that the export sees the most recent writes.
  5. The engine scans the store, applies the predicate, and walks the entity tree to write each file. For each child, the join key from the parent row is injected into the child file.
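
Step 5 is the part that is easiest to misread, so here is a minimal in-memory sketch of it. It is an illustration under stated assumptions, not the engine's code: flatten is a hypothetical function, rows are collected per file name instead of being written to disk, and a single sub-document under parent_field is treated as a one-element array.

```python
def flatten(docs, entities):
    """Hypothetical sketch: turn nested documents into rows per entity file."""
    root = next(e for e in entities if e["role"] == "parent")

    # Map each entity name to the child entities that nest under it.
    children = {}
    for entity in entities:
        if entity["role"] == "child":
            parent = entity.get("parent", root["name"])
            children.setdefault(parent, []).append(entity)

    rows = {entity["file"]: [] for entity in entities}

    def walk(entity, doc, injected):
        # The injected join key comes first, then the declared fields.
        row = dict(injected)
        for field in entity["fields"]:
            row[field["name"]] = doc.get(field["name"])
        rows[entity["file"]].append(row)

        for child in children.get(entity["name"], []):
            key = child["join_key"]
            nested = doc.get(child["parent_field"]) or []
            if isinstance(nested, dict):
                nested = [nested]
            for item in nested:
                # The join key value is taken from the enclosing document.
                walk(child, item, {key: doc.get(key)})

    for doc in docs:
        walk(root, doc, {})
    return rows


entities = [
    {"name": "orders", "role": "parent", "file": "orders.csv",
     "fields": [{"name": "CustomerID", "type": "int"},
                {"name": "TotalDue", "type": "double"}]},
    {"name": "details", "role": "child", "file": "order_details.csv",
     "parent_field": "SalesOrderDetails", "join_key": "CustomerID",
     "fields": [{"name": "ProductID", "type": "int"},
                {"name": "OrderQty", "type": "int"}]},
]
docs = [{"CustomerID": 7, "TotalDue": 30.0,
         "SalesOrderDetails": [{"ProductID": 1, "OrderQty": 2},
                               {"ProductID": 5, "OrderQty": 1}]}]
print(flatten(docs, entities))
```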

Import

  1. The workbench receives the manifest YAML and forwards it to the engine.
  2. The engine sorts the entities deepest-first.
  3. At each depth, the child file is loaded and the rows are grouped by their join_key column (read from the CSV header, not from the fields list).
  4. The importer walks back up the tree, embedding the already-grouped children as arrays under the configured parent_field.
  5. The root parent file is read row by row. The matching child groups are embedded, the document is assembled, and the row is inserted as BSON.
  6. After the batch finishes, the engine flushes to persist the new documents.
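
The reverse direction, steps 2 to 5, can be sketched the same way. Again this is an illustration rather than the importer itself: assemble is a hypothetical function that works on rows already read from the CSVs and coerced, and it makes two assumptions that this page does not confirm: the join_key column is dropped from the embedded child documents, and a parent with no matching children gets an empty array.

```python
from collections import defaultdict


def assemble(rows_by_entity, entities):
    """Hypothetical sketch: rebuild nested documents from per-entity rows."""
    root = next(e for e in entities if e["role"] == "parent")
    by_name = {e["name"]: e for e in entities}

    def parent_name(entity):
        return entity.get("parent", root["name"])

    def depth(entity):
        if entity is root:
            return 0
        return 1 + depth(by_name[parent_name(entity)])

    # Work on copies so the caller's rows are left untouched.
    built = {name: [dict(row) for row in rows]
             for name, rows in rows_by_entity.items()}

    # Deepest-first: each child is complete before it is embedded in its parent.
    child_entities = [e for e in entities if e is not root]
    for child in sorted(child_entities, key=depth, reverse=True):
        key = child["join_key"]
        groups = defaultdict(list)
        for row in built[child["name"]]:
            # Assumption: the join key only links rows and is not stored.
            doc = {k: v for k, v in row.items() if k != key}
            groups[row[key]].append(doc)
        for parent_row in built[parent_name(child)]:
            parent_row[child["parent_field"]] = groups.get(parent_row[key], [])

    return built[root["name"]]


entities = [
    {"name": "orders", "role": "parent"},
    {"name": "details", "role": "child",
     "parent_field": "SalesOrderDetails", "join_key": "CustomerID"},
    {"name": "attributes", "role": "child", "parent": "details",
     "parent_field": "Attributes", "join_key": "SalesOrderDetailID"},
]
rows = {
    "orders": [{"CustomerID": 7, "TotalDue": 30.0}],
    "details": [{"CustomerID": 7, "SalesOrderDetailID": 1, "ProductID": 10},
                {"CustomerID": 7, "SalesOrderDetailID": 2, "ProductID": 11}],
    "attributes": [{"SalesOrderDetailID": 1, "AttrName": "color",
                    "AttrValue": "red"}],
}
print(assemble(rows, entities))
```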

Scheduled execution

  1. The scheduler reads the stored manifest from the schedule document.
  2. The template variables are resolved to epoch milliseconds at that moment.
  3. The resolved manifest then takes the same path as a Run Now operation. The result is logged back to the schedule task, so you can see it in the Schedules panel.

Export and import vs backup and restore

It is worth being explicit about the split here. Both live in the workbench scheduler, but they are not the same operation.

| Concern | Export / import | Backup / restore |
| --- | --- | --- |
| Driven by | YAML manifest | App or system selector |
| Scope | A single store, optionally filtered | A whole app's data dir, or the workbench system DB |
| Format | JSON, BSON, CSV | File-level snapshot (vlog segments, B+ tree, WAL) |
| Purpose | Seed data, cross-environment migration | Disaster recovery |
| CLI surface | None. UI, /api/export, and /api/import | planctl backup and planctl restore |
| Scheduler task type | export, import | backup, restore |

planctl backup takes a one-shot file-level snapshot of the running app and downloads it to your machine:

```sh
planctl backup --app myapp --profile prod --output ./backups
```

planctl restore writes a snapshot back into the target host. The flags you pass determine the restore mode:

```sh
# restore an app
planctl restore --app myapp --backup ./backups/myapp-2025-06-01.tar --profile prod

# restore one service on an app
planctl restore --app myapp --service orders --backup ./backups/orders.tar --profile prod

# restore the workbench's own system DB
planctl restore --system --backup ./backups/system-2025-06-01.tar
```

Use planctl backup and planctl restore for the bit-for-bit recovery scenario. For everything else, use the export/import manifests.

Practical notes

  • Use absolute paths for output_dir. Relative paths resolve against the engine process's working directory, which is rarely where you want them to land.
  • Declare types for every CSV column. CSV carries no type information of its own, so the manifest is the only source of truth. For JSON imports, add a fields block for any field where auto-inference produces the wrong BSON type.
  • Schedule heavy operations off-hours. Export scans the whole store. Import batches the inserts. Both consume CPU, disk, and memory in proportion to the data set.
  • Use template variables for date-bounded scheduled exports. A hardcoded date in a recurring task is a bug waiting to ship.
  • Test with Run Now before scheduling. The same manifest, the same engine path. If it works once, it will work on a cron too.
  • One manifest, both directions. The same file can drive an export on the source host and an import on the target host. That is the intended way to move data between environments.