Export and Import data in Planck using YAML Manifests
Overview
Planck exports and imports data through YAML manifests. A manifest is a single document that describes what data to move, in what format, and how: document structure, field types, parent-child relationships, and file mappings. The same manifest format works for both export and import.
Operations can be run immediately or saved as recurring scheduled tasks with cron expressions.
Manifest Format
Every export and import is manifest-driven. The manifest is the single input that covers all scenarios:
- BSON: store, format, output directory (documents are self-describing)
- JSON: store, format, output directory, optional `fields` for type coercion during import
- CSV flat: store, format, output directory, single entity with fields and types
- CSV nested: store, format, output directory, multiple entities with parent/child relationships, join keys, and field definitions
Top-Level Fields
| Field | Type | Required | Description |
|---|---|---|---|
| store | string | Yes | Store namespace (e.g., orders or stores.orders) |
| format | string | Yes | Output format: bson, json, or csv |
| output_dir | string | Yes | Directory for output files (export) or source files (import) |
| query | string | No | YQL filter expression to export a subset of documents (export only) |
| fields | array | No | Field type descriptors for JSON import (see JSON Import with Type Hints) |
Entity Definition
The entities list describes how documents map to files. Required for CSV; optional for JSON and BSON.
| Field | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Entity name (used for file naming and parent references) |
| role | string | Yes | parent or child |
| file | string | Yes | Filename for this entity's data |
| parent | string | No | Parent entity name (for children nested under another child, not the root). Defaults to the root parent |
| parent_field | string | Child only | Array field name in the parent document where this entity nests |
| join_key | string | Child only | Column used to link child rows to the parent. It does not need to appear in the child's fields list; it is read directly from the CSV header |
| fields | array | Yes | List of field descriptors (document fields only; the join_key column is handled automatically) |
Field Descriptor
Each field in the fields list defines a column:
| Property | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Field name in the document |
| type | string | Yes | Data type for conversion |
Available Field Types
| Type | Description |
|---|---|
| string | Text values |
| int | Integer (i64) |
| double | Floating-point (f64) |
| bool | Boolean (true/false) |
| datetime | Timestamp (epoch milliseconds) |
| objectid | BSON ObjectId (12-byte hex) |
Hierarchy, Unlimited Depth
The parent field on each child entity references another entity by name, building a tree of any depth:
```text
orders (parent)
+-- items (child of orders)
|   +-- attributes (child of items)
|   |   +-- tags (child of attributes)
|   +-- reviews (child of items)
+-- payments (child of orders)
```

Export: top-down recursion from the root, writing child rows at each level with the join key injected from the parent.

Import: entities are sorted by depth (deepest first), loaded and grouped bottom-up by join key, then assembled top-down into nested documents.
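To make the import side concrete, here is a minimal Python sketch of the bottom-up assembly: entities are processed deepest-first, child rows are grouped by their join key, and each level embeds the already-grouped children below it. The data structures and names here are illustrative, not Planck's internal API.

```python
from collections import defaultdict

def assemble(entities, rows):
    """Assemble nested documents bottom-up.

    entities: name -> {"parent": str|None, "parent_field": str, "join_key": str}
    rows: name -> list of row dicts; child rows include their join_key column.
    Returns the fully nested root documents.
    """
    def depth(name):
        d = 0
        while entities[name]["parent"] is not None:
            name = entities[name]["parent"]
            d += 1
        return d

    children_of = defaultdict(list)
    for name, spec in entities.items():
        if spec["parent"] is not None:
            children_of[spec["parent"]].append(name)

    grouped = {}   # child entity -> {join key value: [assembled docs]}
    root_docs = []
    # Deepest entities first, so each level can embed already-grouped children.
    for name in sorted(entities, key=depth, reverse=True):
        docs = []
        for row in rows[name]:
            doc = dict(row)
            for child in children_of[name]:
                spec = entities[child]
                doc[spec["parent_field"]] = grouped[child].get(doc[spec["join_key"]], [])
            docs.append(doc)
        spec = entities[name]
        if spec["parent"] is None:
            root_docs = docs
        else:
            groups = defaultdict(list)
            for doc in docs:
                groups[doc[spec["join_key"]]].append(doc)
            grouped[name] = groups
    return root_docs
```

Because a child's depth is always its parent's depth plus one, the deepest-first ordering guarantees every child group exists before the level above it is assembled.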
Query Filter (Export Only)
The optional query field accepts a YQL expression to export only matching documents. Without it, the entire store is exported.
Static Query
```yaml
query: "orders.filter(status = \"completed\" and total > 100)"
```

Template Variables (Scheduled Exports)
For scheduled exports, dates cannot be hardcoded. Template variables are resolved by the scheduler at execution time:
| Variable | Resolves To |
|---|---|
| ${today} | Start of current day (epoch ms) |
| ${yesterday} | Start of previous day (epoch ms) |
| ${tomorrow} | Start of next day (epoch ms) |
| ${now} | Current timestamp (epoch ms) |
| ${week_ago} | 7 days ago (epoch ms) |
| ${month_ago} | 30 days ago (epoch ms) |
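A sketch of how such substitution could work. The variable names match the table above; whether `${week_ago}` and `${month_ago}` are anchored to "now" or to midnight is an assumption about the scheduler:

```python
from datetime import datetime, timedelta, timezone

def resolve_vars(query, now=None):
    """Substitute ${...} template variables with epoch-millisecond values (sketch)."""
    now = now or datetime.now(timezone.utc)
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)

    def ms(dt):
        return str(int(dt.timestamp() * 1000))

    values = {
        "${now}": ms(now),
        "${today}": ms(midnight),
        "${yesterday}": ms(midnight - timedelta(days=1)),
        "${tomorrow}": ms(midnight + timedelta(days=1)),
        "${week_ago}": ms(now - timedelta(days=7)),       # assumed: relative to now
        "${month_ago}": ms(now - timedelta(days=30)),     # assumed: relative to now
    }
    for var, val in values.items():
        query = query.replace(var, val)
    return query
```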
Example, daily export of yesterday's sales:
```yaml
query: "sales.filter(sale_date >= ${yesterday} and sale_date < ${today})"
```

Variables are resolved at execution time, not at schedule creation time.
Running Exports and Imports
From the UI
Exports and imports can be triggered from two places:
- Server Overview > Schema tab: click the Export or Import button next to any store. The dialog opens pre-filled with the store namespace.
- Schedules panel: create an `export` or `import` task with a manifest.
Execution Modes
Both dialogs offer two modes:
| Mode | Description |
|---|---|
| Run Now | Execute immediately. The dialog shows progress and results. |
| Schedule | Save as a recurring task. Runs at the specified cron time. |
Scheduling
When scheduling, provide:
| Field | Required | Description |
|---|---|---|
| Name | Yes | Schedule name (e.g., nightly-orders-export) |
| Cron | Yes | Cron expression |
| Description | No | Optional description |
Cron presets:
| Preset | Expression |
|---|---|
| Daily 2am | 0 2 * * * |
| Daily 4am | 0 4 * * * |
| Weekly Sun 3am | 0 3 * * 0 |
| Hourly | 0 * * * * |
Scheduled tasks appear in the Schedules panel alongside backup and GC tasks, and can be enabled/disabled, edited, or deleted.
Manifest File Upload
Both dialogs support uploading manifest files from your local machine. Click Upload file and select a .yaml, .yml, or .txt file. The content loads into the editor for review before executing.
Sample Manifests
JSON, Simple Export (entire store)
```yaml
store: orders
format: json
output_dir: /data/exports/orders
```

No entities needed; JSON is self-describing. Exports all documents.
JSON, Export with Query Filter
```yaml
store: orders
format: json
output_dir: /data/exports/shipped-orders
query: "orders.filter(TotalDue > 10000)"
```

Only exports orders matching the filter.
JSON, Import (auto-infer types)
```yaml
store: orders
format: json
output_dir: /data/imports/orders.json
```

For JSON import without `fields`, output_dir is the path to the JSON file containing an array of objects. Types are auto-inferred from JSON syntax:
- JSON strings → BSON string
- JSON numbers (no decimal) → BSON int64
- JSON numbers (with decimal) → BSON double
- JSON booleans → BSON boolean
- JSON null → BSON null
- JSON objects → BSON embedded document
- JSON arrays → BSON array
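The inference rules above can be summarized in a few lines of Python. Note that the boolean check must precede the integer check, since Python's `bool` is a subclass of `int`. This is an illustrative sketch, not the engine's code:

```python
import json

def infer_bson_type(value):
    """Classify a decoded JSON value per the auto-inference rules."""
    if isinstance(value, bool):   # must precede int: bool subclasses int in Python
        return "boolean"
    if isinstance(value, int):
        return "int64"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if value is None:
        return "null"
    if isinstance(value, dict):
        return "embedded document"
    if isinstance(value, list):
        return "array"
    raise TypeError("not a JSON value: %r" % (value,))

doc = json.loads('{"id": 289, "price": 9.99, "active": true, "tags": []}')
for key, val in doc.items():
    print(key, "->", infer_bson_type(val))
```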
JSON, Import with Type Hints
Auto-inference doesn't always produce the correct BSON types. Common problems:
- Numeric strings like `"EmployeeID": "289"` stay as strings instead of int64
- Date strings like `"OrderDate": "2024-01-15"` stay as strings instead of datetime
- Numbers like `100` that should be doubles get stored as int64
Add a fields section to control type coercion for specific fields:
```yaml
store: orders
format: json
output_dir: /data/imports/orders.json
fields:
  - name: EmployeeID
    type: int
  - name: CustomerID
    type: int
  - name: TotalDue
    type: double
  - name: OrderDate
    type: datetime
  - name: IsOnline
    type: bool
```

Fields listed in the manifest are coerced to the specified type. Fields not listed use auto-inference (the current behavior). This gives you explicit control where you need it without having to declare every field.
Type coercion rules:
| Declared Type | JSON Value | BSON Result |
|---|---|---|
| int | "289" (string) | int64 289 |
| int | 289 (number) | int64 289 |
| double | "9.99" (string) | double 9.99 |
| double | 100 (number) | double 100.0 |
| bool | "true", "1", "yes" (string) | boolean true |
| bool | 1 (number) | boolean true |
| bool | 0 (number) | boolean false |
| datetime | "2024-01-15" (string) | int64 1705276800000 (epoch ms) |
| datetime | "2024-01-15T10:30:00Z" (string) | int64 1705314600000 (epoch ms) |
| string | any | stored as-is (default) |
Date format support: YYYY-MM-DD (date only, midnight UTC) and YYYY-MM-DDTHH:MM:SSZ (full ISO 8601 UTC). The T separator can also be a space: YYYY-MM-DD HH:MM:SS.
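A hypothetical Python version of these coercion rules, useful for sanity-checking a manifest against sample data before importing (a sketch, not Planck's implementation):

```python
from datetime import datetime, timezone

def coerce(value, declared):
    """Apply the coercion rules from the table above."""
    if declared == "int":
        return int(value)                 # "289" or 289 -> int64 289
    if declared == "double":
        return float(value)               # "9.99" or 100 -> double
    if declared == "bool":
        if isinstance(value, str):
            return value.strip().lower() in ("true", "1", "yes")
        return bool(value)                # 1 -> true, 0 -> false
    if declared == "datetime":
        s = str(value).replace(" ", "T")  # allow the space separator
        if s.endswith("Z"):
            s = s[:-1]
        if "T" not in s:
            s += "T00:00:00"              # date-only -> midnight UTC
        dt = datetime.fromisoformat(s).replace(tzinfo=timezone.utc)
        return int(dt.timestamp() * 1000)
    return value                          # string/default: stored as-is
```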
BSON, Export
```yaml
store: products
format: bson
output_dir: /data/exports/products
```

Exports all documents as length-prefixed BSON blobs.
BSON, Import
```yaml
store: products
format: bson
output_dir: /data/imports/products.bson
```

CSV, Flat Export (single entity)
```yaml
store: products
format: csv
output_dir: /data/exports/products
entities:
  - name: products
    role: parent
    file: products.csv
    fields:
      - name: ProductID
        type: int
      - name: ProductNumber
        type: string
      - name: ProductName
        type: string
      - name: StandardCost
        type: double
      - name: ListPrice
        type: double
```

Writes a single CSV file with the specified columns.
CSV, Flat Import
```yaml
store: products
format: csv
output_dir: /data/imports
entities:
  - name: products
    role: parent
    file: products.csv
    fields:
      - name: ProductID
        type: int
      - name: ProductNumber
        type: string
      - name: ProductName
        type: string
      - name: StandardCost
        type: double
      - name: ListPrice
        type: double
```

Reads the CSV, converts each row to a BSON document with the correct field types, and inserts it into the store.
CSV, Nested Export (parent + child)
```yaml
store: orders
format: csv
output_dir: /data/exports/orders
entities:
  - name: orders
    role: parent
    file: orders.csv
    fields:
      - name: OrderDate
        type: string
      - name: CustomerID
        type: int
      - name: SubTotal
        type: double
      - name: TotalDue
        type: double
  - name: details
    role: child
    parent_field: SalesOrderDetails
    join_key: CustomerID
    file: order_details.csv
    fields:
      - name: SalesOrderDetailID
        type: int
      - name: ProductID
        type: int
      - name: OrderQty
        type: int
      - name: UnitPrice
        type: double
      - name: LineTotal
        type: double
```

Flattens orders into orders.csv and their SalesOrderDetails array into order_details.csv, with CustomerID injected from the parent row.
CSV, Nested Import (parent + child)
The same manifest works for import. The importer reads order_details.csv, groups rows by CustomerID, reads orders.csv, embeds matching detail rows as the SalesOrderDetails array in each order document, and inserts the assembled documents.
CSV, Multi-Level Hierarchy (3 levels)
```yaml
store: orders
format: csv
output_dir: /data/exports/orders-full
entities:
  - name: orders
    role: parent
    file: orders.csv
    fields:
      - name: OrderDate
        type: string
      - name: CustomerID
        type: int
      - name: TotalDue
        type: double
  - name: details
    role: child
    parent_field: SalesOrderDetails
    join_key: CustomerID
    file: order_details.csv
    fields:
      - name: SalesOrderDetailID
        type: int
      - name: ProductID
        type: int
      - name: OrderQty
        type: int
      - name: UnitPrice
        type: double
  - name: attributes
    role: child
    parent: details
    parent_field: Attributes
    join_key: SalesOrderDetailID
    file: detail_attributes.csv
    fields:
      - name: AttrName
        type: string
      - name: AttrValue
        type: string
```

Three levels: orders -> details (child of orders) -> attributes (child of details). The `parent: details` field on attributes makes it nest under details instead of orders.
CSV, Sub-document Flattening
```yaml
store: customers
format: csv
output_dir: /data/exports/customers
entities:
  - name: customers
    role: parent
    file: customers.csv
    fields:
      - name: CustomerID
        type: int
      - name: FirstName
        type: string
      - name: LastName
        type: string
      - name: FullName
        type: string
  - name: addresses
    role: child
    parent_field: Address
    join_key: CustomerID
    file: customer_addresses.csv
    fields:
      - name: Street
        type: string
      - name: City
        type: string
      - name: State
        type: string
      - name: ZipCode
        type: string
      - name: Country
        type: string
```

Flattens the nested Address sub-document into a separate CSV with CustomerID for joining back.
CSV, Export with Query Filter and Template Variables
```yaml
store: orders
format: csv
output_dir: /data/exports/big-orders
query: "orders.filter(TotalDue > 50000 and ShipDate > ${yesterday})"
entities:
  - name: orders
    role: parent
    file: big_orders.csv
    fields:
      - name: OrderDate
        type: string
      - name: CustomerID
        type: int
      - name: TotalDue
        type: double
      - name: ShipDate
        type: string
```

When scheduled with a cron expression, `${yesterday}` resolves to the start of the previous day at execution time.
Scheduled Export, Daily Sales
```yaml
store: sales
format: csv
output_dir: /data/exports/daily-sales
query: "sales.filter(sale_date >= ${yesterday} and sale_date < ${today})"
entities:
  - name: sales
    role: parent
    file: sales.csv
    fields:
      - { name: sale_id, type: int }
      - { name: sale_date, type: datetime }
      - { name: register_id, type: string }
      - { name: total, type: double }
  - name: line_items
    role: child
    parent_field: items
    join_key: sale_id
    file: sale_items.csv
    fields:
      - { name: sku, type: string }
      - { name: quantity, type: int }
      - { name: price, type: double }
```

Schedule with cron `0 1 * * *` (nightly at 1am) to export yesterday's sales with line items to two CSVs.
How It Works
Export Flow
- The workbench receives the manifest YAML
- If the manifest has a `query` field, the workbench parses the YQL into a JSON query using `planck.pql.parse()`
- The parsed manifest and optional JSON query are sent to the engine
- The engine flushes the memtable to ensure all recently written data is on disk
- The engine scans the store, applies query predicates to filter documents, and writes output files per the manifest's entity definitions
- Parent fields are written to the parent CSV; child arrays are flattened into child CSVs with join keys injected
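A simplified single-level version of the last two steps in Python: each parent document becomes one CSV row, and its child array is flattened into a second CSV with the join key copied in from the parent. All names here are illustrative:

```python
import csv

def export_nested(docs, parent_fields, child_field, child_cols, join_key,
                  parent_path, child_path):
    """Flatten documents into a parent CSV and one child CSV (sketch)."""
    with open(parent_path, "w", newline="") as pf, \
         open(child_path, "w", newline="") as cf:
        pw = csv.DictWriter(pf, fieldnames=parent_fields)
        cw = csv.DictWriter(cf, fieldnames=[join_key] + child_cols)
        pw.writeheader()
        cw.writeheader()
        for doc in docs:
            pw.writerow({k: doc.get(k) for k in parent_fields})
            for child in doc.get(child_field, []):
                row = {k: child.get(k) for k in child_cols}
                row[join_key] = doc[join_key]  # inject join key from the parent
                cw.writerow(row)
```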
Import Flow
- The workbench receives the manifest YAML and sends it to the engine
- The engine sorts entities by depth (deepest children first)
- Starting from the deepest level, each entity's CSV is loaded and rows are grouped by their `join_key` column (read from the CSV header, not the `fields` list)
- At each level, already-assembled child groups from deeper levels are embedded as arrays, building enriched documents bottom-up
- The root parent CSV is read row by row; matching child groups (already containing their own nested children) are embedded under the specified `parent_field`
- Assembled documents are inserted into the store as BSON
- The engine flushes to persist the imported data to disk
Scheduled Execution
- The scheduler reads the stored manifest from the schedule document
- Template variables (`${today}`, `${yesterday}`, etc.) are resolved to epoch milliseconds
- The resolved manifest follows the same export/import flow as a "Run Now" operation
Best Practices
- Use field selection in export manifests to reduce output size and exclude internal fields
- Specify types for all fields in CSV manifests to ensure correct BSON storage types. For JSON imports, add a `fields` section for any field where auto-inference produces the wrong type (especially numeric strings and dates)
- Use absolute paths for `output_dir` to avoid ambiguity
- Use template variables for scheduled exports that need date-based filtering instead of hardcoding dates
- Test manifests by running them with "Run Now" before creating scheduled tasks
- Reuse manifests for round-trips: the same manifest works for both export and import, making it easy to move data between environments