
Export and Import data in Planck using YAML Manifests

Overview

Planck exports and imports data through YAML manifests. A manifest is a single document that describes what data to move, in what format, the document structure, field types, parent-child relationships, and file mappings. The same manifest format works for both export and import.

Operations can be run immediately or saved as recurring scheduled tasks with cron expressions.

Manifest Format

Every export and import is manifest-driven. The manifest is the single input that covers all scenarios:

  • BSON: store, format, output directory (documents are self-describing)
  • JSON: store, format, output directory, optional fields for type coercion during import
  • CSV flat: store, format, output directory, single entity with fields and types
  • CSV nested: store, format, output directory, multiple entities with parent/child relationships, join keys, and field definitions

Top-Level Fields

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `store` | string | Yes | Store namespace (e.g., `orders` or `stores.orders`) |
| `format` | string | Yes | Output format: `bson`, `json`, or `csv` |
| `output_dir` | string | Yes | Directory for output files (export) or source files (import) |
| `query` | string | No | YQL filter expression to export a subset of documents (export only) |
| `fields` | array | No | Field type descriptors for JSON import (see JSON Import with Type Hints) |

Entity Definition

The entities list describes how documents map to files. Required for CSV; optional for JSON and BSON.

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `name` | string | Yes | Entity name (used for file naming and parent references) |
| `role` | string | Yes | `parent` or `child` |
| `file` | string | Yes | Filename for this entity's data |
| `parent` | string | No | Parent entity name (for children nested under another child, not the root). Defaults to the root parent |
| `parent_field` | string | Child only | Array field name in the parent document where this entity nests |
| `join_key` | string | Child only | Column used to link child rows to the parent. Does not need to appear in the child's `fields` list; it is read directly from the CSV header |
| `fields` | array | Yes | List of field descriptors (document fields only; the `join_key` column is handled automatically) |

Field Descriptor

Each field in the fields list defines a column:

| Property | Type | Required | Description |
|----------|------|----------|-------------|
| `name` | string | Yes | Field name in the document |
| `type` | string | Yes | Data type for conversion |

Available Field Types

| Type | Description |
|------|-------------|
| `string` | Text values |
| `int` | Integer (i64) |
| `double` | Floating-point (f64) |
| `bool` | Boolean (true/false) |
| `datetime` | Timestamp (epoch milliseconds) |
| `objectid` | BSON ObjectId (12-byte hex) |

Hierarchy, Unlimited Depth

The parent field on each child entity references another entity by name, building a tree of any depth:

```
orders (parent)
  +-- items (child of orders)
  |     +-- attributes (child of items)
  |     |     +-- tags (child of attributes)
  |     +-- reviews (child of items)
  +-- payments (child of orders)
```

Export: top-down recursion from root, writing child rows at each level with the join key injected from the parent.

Import: entities are sorted by depth (deepest first), loaded and grouped bottom-up by join key, then assembled top-down into nested documents.
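
A manifest for a tree like the one above could look like the following sketch. The store name, file names, join keys, and fields here are illustrative, and the deeper branches are elided:

```yaml
store: orders
format: csv
output_dir: /data/exports/orders-tree

entities:
  - name: orders
    role: parent
    file: orders.csv
    fields:
      - { name: OrderID, type: int }

  - name: items
    role: child              # no parent field: defaults to the root parent (orders)
    parent_field: Items
    join_key: OrderID
    file: items.csv
    fields:
      - { name: ItemID, type: int }

  - name: attributes
    role: child
    parent: items            # nests under items, not under the root
    parent_field: Attributes
    join_key: ItemID
    file: attributes.csv
    fields:
      - { name: AttrName, type: string }
```

Each additional level is just another child entity whose `parent` names the entity one step up.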

Query Filter (Export Only)

The optional query field accepts a YQL expression to export only matching documents. Without it, the entire store is exported.

Static Query

```yaml
query: "orders.filter(status = \"completed\" and total > 100)"
```

Template Variables (Scheduled Exports)

For scheduled exports, dates cannot be hardcoded. Template variables are resolved by the scheduler at execution time:

| Variable | Resolves To |
|----------|-------------|
| `${today}` | Start of current day (epoch ms) |
| `${yesterday}` | Start of previous day (epoch ms) |
| `${tomorrow}` | Start of next day (epoch ms) |
| `${now}` | Current timestamp (epoch ms) |
| `${week_ago}` | 7 days ago (epoch ms) |
| `${month_ago}` | 30 days ago (epoch ms) |

Example: a daily export of yesterday's sales:

```yaml
query: "sales.filter(sale_date >= ${yesterday} and sale_date < ${today})"
```

Variables are resolved at execution time, not at schedule creation time.
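
The substitution step can be sketched in Python. This is a hypothetical helper, not the scheduler's code; it assumes UTC day boundaries and measures `${week_ago}`/`${month_ago}` from the current instant:

```python
from datetime import datetime, timedelta, timezone


def resolve_template_vars(query, now=None):
    """Replace ${...} template variables with epoch-millisecond values.

    Illustrative sketch: day boundaries are taken as UTC midnight, and
    ${week_ago}/${month_ago} are offsets from the current instant.
    """
    now = now or datetime.now(timezone.utc)
    today = now.replace(hour=0, minute=0, second=0, microsecond=0)

    def ms(dt):
        return str(int(dt.timestamp() * 1000))

    values = {
        "${today}": ms(today),
        "${yesterday}": ms(today - timedelta(days=1)),
        "${tomorrow}": ms(today + timedelta(days=1)),
        "${now}": ms(now),
        "${week_ago}": ms(now - timedelta(days=7)),
        "${month_ago}": ms(now - timedelta(days=30)),
    }
    for var, value in values.items():
        query = query.replace(var, value)
    return query
```

Resolving the daily-sales query above against a fixed clock replaces each variable with a concrete millisecond bound, producing a plain filter expression the engine can evaluate.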

Running Exports and Imports

From the UI

Exports and imports can be triggered from two places:

  1. Server Overview > Schema tab: click the Export or Import button next to any store. The dialog opens pre-filled with the store namespace.
  2. Schedules panel: create an export or import task with a manifest.

Execution Modes

Both dialogs offer two modes:

| Mode | Description |
|------|-------------|
| Run Now | Execute immediately. The dialog shows progress and results. |
| Schedule | Save as a recurring task. Runs at the specified cron time. |

Scheduling

When scheduling, provide:

| Field | Required | Description |
|-------|----------|-------------|
| Name | Yes | Schedule name (e.g., `nightly-orders-export`) |
| Cron | Yes | Cron expression |
| Description | No | Optional description |

Cron presets:

| Preset | Expression |
|--------|------------|
| Daily 2am | `0 2 * * *` |
| Daily 4am | `0 4 * * *` |
| Weekly Sun 3am | `0 3 * * 0` |
| Hourly | `0 * * * *` |

Scheduled tasks appear in the Schedules panel alongside backup and GC tasks, and can be enabled/disabled, edited, or deleted.

Manifest File Upload

Both dialogs support uploading manifest files from your local machine. Click Upload file and select a .yaml, .yml, or .txt file. The content loads into the editor for review before executing.

Sample Manifests

JSON, Simple Export (entire store)

```yaml
store: orders
format: json
output_dir: /data/exports/orders
```

No entities are needed; JSON is self-describing. Exports all documents.

JSON, Export with Query Filter

```yaml
store: orders
format: json
output_dir: /data/exports/shipped-orders
query: "orders.filter(TotalDue > 10000)"
```

Only exports orders matching the filter.

JSON, Import (auto-infer types)

```yaml
store: orders
format: json
output_dir: /data/imports/orders.json
```

For JSON import without fields, output_dir is the path to the JSON file containing an array of objects. Types are auto-inferred from JSON syntax:

  • JSON strings → BSON string
  • JSON numbers (no decimal) → BSON int64
  • JSON numbers (with decimal) → BSON double
  • JSON booleans → BSON boolean
  • JSON null → BSON null
  • JSON objects → BSON embedded document
  • JSON arrays → BSON array
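
The inference rules above can be sketched as a dispatch over the value types that a JSON parser produces. This is a hypothetical helper for illustration, not the engine's code, and the returned labels are just names for the BSON types:

```python
def infer_bson_type(value):
    """Map a parsed JSON value to the BSON type the importer would infer."""
    if isinstance(value, bool):   # must come before int: bool is an int subclass
        return "boolean"
    if isinstance(value, int):    # JSON number with no decimal point
        return "int64"
    if isinstance(value, float):  # JSON number with a decimal point
        return "double"
    if isinstance(value, str):
        return "string"
    if value is None:
        return "null"
    if isinstance(value, dict):
        return "embedded document"
    if isinstance(value, list):
        return "array"
    raise TypeError(f"unexpected JSON value: {value!r}")
```

Note the `bool` check must precede the `int` check, since Python parses JSON `true`/`false` into a subtype of `int`.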

JSON, Import with Type Hints

Auto-inference doesn't always produce the correct BSON types. Common problems:

  • Numeric strings like "EmployeeID": "289" stay as strings instead of int64
  • Date strings like "OrderDate": "2024-01-15" stay as strings instead of datetime
  • Numbers like 100 that should be doubles get stored as int64

Add a fields section to control type coercion for specific fields:

```yaml
store: orders
format: json
output_dir: /data/imports/orders.json
fields:
  - name: EmployeeID
    type: int
  - name: CustomerID
    type: int
  - name: TotalDue
    type: double
  - name: OrderDate
    type: datetime
  - name: IsOnline
    type: bool
```

Fields listed in the manifest are coerced to the specified type. Fields not listed use auto-inference (current behavior). This gives you explicit control where you need it without having to declare every field.

Type coercion rules:

| Declared Type | JSON Value | BSON Result |
|---------------|------------|-------------|
| `int` | `"289"` (string) | int64 `289` |
| `int` | `289` (number) | int64 `289` |
| `double` | `"9.99"` (string) | double `9.99` |
| `double` | `100` (number) | double `100.0` |
| `bool` | `"true"`, `"1"`, `"yes"` (string) | boolean `true` |
| `bool` | `1` (number) | boolean `true` |
| `bool` | `0` (number) | boolean `false` |
| `datetime` | `"2024-01-15"` (string) | int64 `1705276800000` (epoch ms) |
| `datetime` | `"2024-01-15T10:30:00Z"` (string) | int64 `1705314600000` (epoch ms) |
| `string` | any | stored as-is (default) |

Date format support: YYYY-MM-DD (date only, midnight UTC) and YYYY-MM-DDTHH:MM:SSZ (full ISO 8601 UTC). The T separator can also be a space: YYYY-MM-DD HH:MM:SS.
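
These rules can be sketched as a small coercion function. It is illustrative rather than the engine's implementation, but the date layouts and epoch values match the table above:

```python
from datetime import datetime, timezone

# Date layouts from the spec: date-only (midnight UTC), full ISO 8601 UTC,
# and the same with a space in place of the "T" separator.
_DATE_FORMATS = ("%Y-%m-%d", "%Y-%m-%dT%H:%M:%SZ", "%Y-%m-%d %H:%M:%S")


def coerce(value, declared_type):
    """Coerce a JSON value to the declared manifest type (illustrative sketch)."""
    if declared_type == "int":
        return int(value)
    if declared_type == "double":
        return float(value)
    if declared_type == "bool":
        if isinstance(value, str):
            return value.lower() in ("true", "1", "yes")
        return bool(value)
    if declared_type == "datetime":
        for fmt in _DATE_FORMATS:
            try:
                dt = datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
                return int(dt.timestamp() * 1000)  # epoch milliseconds
            except ValueError:
                continue
        raise ValueError(f"unsupported date format: {value!r}")
    return value  # string: stored as-is (default)
```

A date-only value lands on midnight UTC, so `"2024-01-15"` and `"2024-01-15T00:00:00Z"` coerce to the same millisecond timestamp.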

BSON, Export

```yaml
store: products
format: bson
output_dir: /data/exports/products
```

Exports all documents as length-prefixed BSON blobs.

BSON, Import

```yaml
store: products
format: bson
output_dir: /data/imports/products.bson
```

CSV, Flat Export (single entity)

```yaml
store: products
format: csv
output_dir: /data/exports/products

entities:
  - name: products
    role: parent
    file: products.csv
    fields:
      - name: ProductID
        type: int
      - name: ProductNumber
        type: string
      - name: ProductName
        type: string
      - name: StandardCost
        type: double
      - name: ListPrice
        type: double
```
Writes a single CSV file with the specified columns.

CSV, Flat Import

```yaml
store: products
format: csv
output_dir: /data/imports

entities:
  - name: products
    role: parent
    file: products.csv
    fields:
      - name: ProductID
        type: int
      - name: ProductNumber
        type: string
      - name: ProductName
        type: string
      - name: StandardCost
        type: double
      - name: ListPrice
        type: double
```

Reads the CSV, converts each row to a BSON document with correct field types, and inserts into the store.

CSV, Nested Export (parent + child)

```yaml
store: orders
format: csv
output_dir: /data/exports/orders

entities:
  - name: orders
    role: parent
    file: orders.csv
    fields:
      - name: OrderDate
        type: string
      - name: CustomerID
        type: int
      - name: SubTotal
        type: double
      - name: TotalDue
        type: double

  - name: details
    role: child
    parent_field: SalesOrderDetails
    join_key: CustomerID
    file: order_details.csv
    fields:
      - name: SalesOrderDetailID
        type: int
      - name: ProductID
        type: int
      - name: OrderQty
        type: int
      - name: UnitPrice
        type: double
      - name: LineTotal
        type: double
```

Flattens orders into orders.csv and their SalesOrderDetails array into order_details.csv with CustomerID injected from the parent row.

CSV, Nested Import (parent + child)

The same manifest works for import. The importer reads order_details.csv, groups rows by CustomerID, reads orders.csv, embeds matching detail rows as the SalesOrderDetails array in each order document, and inserts the assembled documents.
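
That group-then-embed step can be sketched with Python's csv module. The inline sample data stands in for the two files, and type coercion is omitted for brevity, so every value stays a string:

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data standing in for orders.csv / order_details.csv.
orders_csv = "CustomerID,OrderDate,TotalDue\n7,2024-01-15,120.50\n"
details_csv = (
    "CustomerID,SalesOrderDetailID,ProductID,OrderQty\n"
    "7,1,901,2\n"
    "7,2,902,1\n"
)

# Group child rows by the join_key column, dropping it from the child document.
children = defaultdict(list)
for row in csv.DictReader(io.StringIO(details_csv)):
    key = row.pop("CustomerID")
    children[key].append(row)

# Read the parent CSV and embed matching child groups under parent_field.
documents = []
for row in csv.DictReader(io.StringIO(orders_csv)):
    row["SalesOrderDetails"] = children.get(row["CustomerID"], [])
    documents.append(row)
```

Each assembled document now carries its detail rows as the `SalesOrderDetails` array, ready to be converted to BSON and inserted.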

CSV, Multi-Level Hierarchy (3 levels)

```yaml
store: orders
format: csv
output_dir: /data/exports/orders-full

entities:
  - name: orders
    role: parent
    file: orders.csv
    fields:
      - name: OrderDate
        type: string
      - name: CustomerID
        type: int
      - name: TotalDue
        type: double

  - name: details
    role: child
    parent_field: SalesOrderDetails
    join_key: CustomerID
    file: order_details.csv
    fields:
      - name: SalesOrderDetailID
        type: int
      - name: ProductID
        type: int
      - name: OrderQty
        type: int
      - name: UnitPrice
        type: double

  - name: attributes
    role: child
    parent: details
    parent_field: Attributes
    join_key: SalesOrderDetailID
    file: detail_attributes.csv
    fields:
      - name: AttrName
        type: string
      - name: AttrValue
        type: string
```

Three levels: orders -> details (child of orders) -> attributes (child of details). The parent: details field on attributes makes it nest under details instead of orders.

CSV, Sub-document Flattening

```yaml
store: customers
format: csv
output_dir: /data/exports/customers

entities:
  - name: customers
    role: parent
    file: customers.csv
    fields:
      - name: CustomerID
        type: int
      - name: FirstName
        type: string
      - name: LastName
        type: string
      - name: FullName
        type: string

  - name: addresses
    role: child
    parent_field: Address
    join_key: CustomerID
    file: customer_addresses.csv
    fields:
      - name: Street
        type: string
      - name: City
        type: string
      - name: State
        type: string
      - name: ZipCode
        type: string
      - name: Country
        type: string
```

Flattens the nested Address sub-document into a separate CSV with CustomerID for joining back.

CSV, Export with Query Filter and Template Variables

```yaml
store: orders
format: csv
output_dir: /data/exports/big-orders
query: "orders.filter(TotalDue > 50000 and ShipDate > ${yesterday})"

entities:
  - name: orders
    role: parent
    file: big_orders.csv
    fields:
      - name: OrderDate
        type: string
      - name: CustomerID
        type: int
      - name: TotalDue
        type: double
      - name: ShipDate
        type: string
```

When scheduled with a cron expression, ${yesterday} resolves to the start of the previous day at execution time.

Scheduled Export, Daily Sales

```yaml
store: sales
format: csv
output_dir: /data/exports/daily-sales
query: "sales.filter(sale_date >= ${yesterday} and sale_date < ${today})"

entities:
  - name: sales
    role: parent
    file: sales.csv
    fields:
      - { name: sale_id, type: int }
      - { name: sale_date, type: datetime }
      - { name: register_id, type: string }
      - { name: total, type: double }

  - name: line_items
    role: child
    parent_field: items
    join_key: sale_id
    file: sale_items.csv
    fields:
      - { name: sku, type: string }
      - { name: quantity, type: int }
      - { name: price, type: double }
```

Schedule with cron 0 1 * * * (nightly at 1am) to export yesterday's sales with line items to two CSVs.

How It Works

Export Flow

  1. The workbench receives the manifest YAML
  2. If the manifest has a query field, the workbench parses the YQL into a JSON query using planck.pql.parse()
  3. The parsed manifest and optional JSON query are sent to the engine
  4. The engine flushes the memtable to ensure all recently written data is on disk
  5. The engine scans the store, applies query predicates to filter documents, and writes output files per the manifest's entity definitions
  6. Parent fields are written to the parent CSV; child arrays are flattened into child CSVs with join keys injected
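
Steps 5 and 6 amount to a flattening pass per document. A minimal sketch, assuming a hypothetical order document shaped like the nested-export example:

```python
# Hypothetical assembled document, as it would be stored in the orders store.
order = {
    "CustomerID": 7,
    "TotalDue": 120.50,
    "SalesOrderDetails": [
        {"ProductID": 901, "OrderQty": 2},
        {"ProductID": 902, "OrderQty": 1},
    ],
}

parent_rows = []
child_rows = []

# Parent fields become one row in the parent CSV; the child array is
# flattened into child rows with the join key injected from the parent.
parent_rows.append({"CustomerID": order["CustomerID"], "TotalDue": order["TotalDue"]})
for item in order["SalesOrderDetails"]:
    child_rows.append({"CustomerID": order["CustomerID"], **item})
```

Deeper hierarchies repeat the same pass recursively, each child level injecting its own join key into the rows one level down.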

Import Flow

  1. The workbench receives the manifest YAML and sends it to the engine
  2. The engine sorts entities by depth (deepest children first)
  3. Starting from the deepest level, each entity's CSV is loaded and rows are grouped by their join_key column (read from the CSV header, not the fields list)
  4. At each level, already-assembled child groups from deeper levels are embedded as arrays, building enriched documents bottom-up
  5. The root parent CSV is read row by row; matching child groups (already containing their own nested children) are embedded under the specified parent_field
  6. Assembled documents are inserted into the store as BSON
  7. The engine flushes to persist the imported data to disk

Scheduled Execution

  1. The scheduler reads the stored manifest from the schedule document
  2. Template variables (${today}, ${yesterday}, etc.) are resolved to epoch milliseconds
  3. The resolved manifest follows the same export/import flow as a "Run Now" operation

Best Practices

  • Use field selection in export manifests to reduce output size and exclude internal fields
  • Specify types for all fields in CSV manifests to ensure correct BSON storage types. For JSON imports, add a fields section for any field where auto-inference produces the wrong type (especially numeric strings and dates)
  • Use absolute paths for output_dir to avoid ambiguity
  • Schedule large operations during off-hours: exports perform full store scans and imports do batch inserts, both of which consume CPU, disk I/O, and memory
  • Use template variables for scheduled exports that need date-based filtering instead of hardcoding dates
  • Test manifests by running them with "Run Now" before creating scheduled tasks
  • Use the same manifest for round-trips: a manifest works for both export and import, making it easy to move data between environments