botserver/WORKFLOW_PLAN.md
2026-03-16 23:22:48 -03:00

BASIC Workflow Engine Plan

Current State

  • workflow_executions and workflow_events tables exist in DB
  • WorkflowExecution / WorkflowEvent models exist in core/shared/models/workflow_models.rs
  • ORCHESTRATE WORKFLOW keyword exists in basic/keywords/orchestration.rs (stub)
  • STEP keyword registered but not durable
  • Compiler (basic/mod.rs) produces Rhai AST and runs it in one shot via engine.eval_ast_with_scope
  • HEAR currently blocks a thread (new) — works but not crash-safe

Goal

BASIC scripts run as durable step sequences. Each keyword is a step. On crash/restart, execution resumes from the last completed step. No re-run. No Rhai for control flow.

' ticket.bas
TALK "Describe the issue"         ' Step 1
HEAR description                  ' Step 2 (suspends, waits)
SET ticket = CREATE(description)  ' Step 3
TALK "Ticket #{ticket} created"   ' Step 4

Two Execution Modes

The compiler serves both modes via a pragma at the top of the .bas file:

' Default: Rhai mode (current behavior, fast, no durability)
TALK "Hello"

' Workflow mode (durable, crash-safe)
#workflow
TALK "Hello"
HEAR name

ScriptService::compile() detects #workflow and returns either:

  • ExecutionPlan::Rhai(AST) — current path, unchanged
  • ExecutionPlan::Workflow(Vec<Step>) — new path

ScriptService::run() dispatches accordingly.
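
The pragma check could look roughly like the sketch below. It assumes the pragma must be the first meaningful line, with blank lines and `'` comments allowed before it. `Mode` and `detect_mode` are hypothetical names used for illustration, not existing botserver APIs.

```rust
/// Which execution path a script takes (mirrors the two ExecutionPlan variants).
#[derive(Debug, PartialEq)]
pub enum Mode {
    Rhai,
    Workflow,
}

/// Hypothetical helper: returns Mode::Workflow when the first meaningful line
/// (skipping blank lines and `'` comments) is exactly `#workflow`.
pub fn detect_mode(source: &str) -> Mode {
    let first = source
        .lines()
        .map(str::trim)
        .find(|line| !line.is_empty() && !line.starts_with('\''));
    match first {
        Some("#workflow") => Mode::Workflow,
        _ => Mode::Rhai,
    }
}
```

A script with no pragma, or with the pragma placed after the first keyword, stays in Rhai mode under this rule, which matches the backwards-compatibility goal.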


Architecture

1. Compiler changes (basic/mod.rs, basic/compiler/)

Add compile_to_steps(script: &str) -> Result<Vec<Step>>:

pub enum Step {
    Talk { template: String },
    Hear { variable: String, input_type: String },
    Set  { variable: String, expression: String },
    If   { condition: String, then_steps: Vec<Step>, else_steps: Vec<Step> },
    Call { function: String, args: Vec<String>, result_var: Option<String> },
    // ... one variant per keyword
}

Expressions inside steps (condition, expression, template) are still evaluated by Rhai, but only as a pure expression evaluator: no custom syntax, no side effects. This keeps Rhai as a math/string engine only.
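
As an illustration, a line-oriented compile_to_steps for three keywords might be sketched as follows. This is a simplified assumption, not the planned compiler: the Step enum is reduced to TALK, HEAR, and SET, the input_type field is omitted, and errors are plain strings.

```rust
#[derive(Debug, PartialEq)]
pub enum Step {
    Talk { template: String },
    Hear { variable: String },
    Set { variable: String, expression: String },
}

/// Sketch only: handles TALK, HEAR and SET, one statement per line.
/// IF blocks, CALL and real quoting rules are out of scope here.
pub fn compile_to_steps(script: &str) -> Result<Vec<Step>, String> {
    let mut steps = Vec::new();
    for (idx, raw) in script.lines().enumerate() {
        let line = raw.trim();
        // Skip blank lines, `'` comments and the `#workflow` pragma.
        if line.is_empty() || line.starts_with('\'') || line.starts_with('#') {
            continue;
        }
        // Split the keyword from the rest of the statement.
        let (keyword, rest) = match line.split_once(char::is_whitespace) {
            Some((k, r)) => (k, r.trim()),
            None => (line, ""),
        };
        let step = match keyword.to_ascii_uppercase().as_str() {
            "TALK" => Step::Talk { template: rest.trim_matches('"').to_string() },
            "HEAR" if !rest.is_empty() => Step::Hear { variable: rest.to_string() },
            "SET" => {
                let (var, expr) = rest
                    .split_once('=')
                    .ok_or_else(|| format!("line {}: SET needs '='", idx + 1))?;
                Step::Set {
                    variable: var.trim().to_string(),
                    expression: expr.trim().to_string(),
                }
            }
            other => return Err(format!("line {}: unsupported statement {}", idx + 1, other)),
        };
        steps.push(step);
    }
    Ok(steps)
}
```

The expression and template strings are stored unevaluated, so the step list stays plain data that can be persisted and resumed; evaluation happens later, per step.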

2. WorkflowEngine (basic/workflow/engine.rs)

pub struct WorkflowEngine {
    state: Arc<AppState>,
    session: UserSession,
}

impl WorkflowEngine {
    /// Start a new workflow or resume existing one for this session+script
    pub async fn run(&self, script_path: &str, steps: Vec<Step>) -> Result<()>

    /// Execute one step, persist result, return next action
    async fn execute_step(&self, exec_id: Uuid, step: &Step, vars: &mut Variables) -> StepResult

    /// Load execution state from DB
    async fn load_state(&self, exec_id: Uuid) -> (usize, Variables)

    /// Persist step completion
    async fn save_state(&self, exec_id: Uuid, step_index: usize, vars: &Variables, status: &str)
}

pub enum StepResult {
    Continue,           // go to next step
    Suspend,            // HEAR — save state, return, wait for next message
    Done,               // script finished
}
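
The resume-from-last-completed-step behaviour can be sketched with an in-memory stand-in for the workflow_executions row. Everything below is an assumption for illustration: it is synchronous, SavedState replaces the database, and only TALK and HEAR are modelled.

```rust
#[derive(Debug, Clone)]
pub enum Step {
    Talk { template: String },
    Hear { variable: String },
}

#[derive(Debug, PartialEq)]
pub enum StepResult {
    Continue,
    Suspend,
    Done,
}

/// In-memory stand-in for one workflow_executions row (hypothetical).
#[derive(Debug)]
pub struct SavedState {
    pub current_step: usize,
    pub status: String,
}

/// Executes a single step; TALK output is collected in `output`.
fn execute_step(step: &Step, output: &mut Vec<String>) -> StepResult {
    match step {
        Step::Talk { template } => {
            output.push(template.clone());
            StepResult::Continue
        }
        Step::Hear { .. } => StepResult::Suspend,
    }
}

/// Runs from the saved index, so steps before current_step are never re-run.
pub fn run(steps: &[Step], saved: &mut SavedState, output: &mut Vec<String>) -> StepResult {
    while saved.current_step < steps.len() {
        match execute_step(&steps[saved.current_step], output) {
            StepResult::Continue => {
                // In the real engine this is where save_state would persist progress.
                saved.current_step += 1;
                saved.status = "running".to_string();
            }
            StepResult::Suspend => {
                saved.status = "waiting".to_string();
                return StepResult::Suspend;
            }
            StepResult::Done => break,
        }
    }
    saved.status = "done".to_string();
    StepResult::Done
}
```

Starting run with current_step = 1 simulates a restart after the first step was persisted: only the second step onward executes.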

3. HEAR in workflow mode

No thread blocking. Instead:

  1. execute_step(Hear) saves state to workflow_executions with status = "waiting", current_step = N
  2. Returns StepResult::Suspend → engine returns to caller
  3. Next user message → stream_response checks workflow_executions for session_id with status = "waiting"
  4. Loads variables, sets variables[<HEAR variable>] = user_input (variables["description"] in ticket.bas), advances current_step, resumes
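
The resume side of the steps above can be sketched as follows, assuming the variable name is read from the HEAR step at current_step. Execution and resume_with_input are hypothetical names; values are stored as plain strings here instead of Rhai Dynamic values.

```rust
use std::collections::HashMap;

pub enum Step {
    Talk { template: String },
    Hear { variable: String },
}

/// In-memory stand-in for a suspended workflow_executions row (hypothetical).
pub struct Execution {
    pub current_step: usize,
    pub status: String,
    pub variables: HashMap<String, String>,
}

/// Binds the user's message to the pending HEAR variable and advances one step.
pub fn resume_with_input(
    steps: &[Step],
    exec: &mut Execution,
    user_input: &str,
) -> Result<(), String> {
    if exec.status != "waiting" {
        return Err("execution is not waiting".to_string());
    }
    match steps.get(exec.current_step) {
        Some(Step::Hear { variable }) => {
            exec.variables.insert(variable.clone(), user_input.to_string());
            exec.current_step += 1;
            exec.status = "running".to_string();
            Ok(())
        }
        _ => Err("current step is not HEAR".to_string()),
    }
}
```

Rejecting a resume when the status is not "waiting" guards against a second message being bound to a HEAR step that has already been satisfied.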

4. stream_response dispatch (core/bot/mod.rs)

// At top of stream_response, before LLM:
if let Some(exec) = WorkflowEngine::find_waiting(state, session_id).await {
    WorkflowEngine::resume(state, exec, message_content).await?;
    return Ok(());
}
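
The lookup behind find_waiting amounts to a filter on session and status. The sketch below is an assumption-laden stand-in: it searches an in-memory slice instead of querying workflow_executions, uses u32 ids in place of Uuid, and the Dispatch enum is hypothetical.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct Execution {
    pub id: u32,
    pub session_id: u32,
    pub status: String,
}

/// What stream_response should do with an incoming message (hypothetical).
#[derive(Debug, PartialEq)]
pub enum Dispatch {
    ResumeWorkflow(u32),
    CallLlm,
}

/// Returns the waiting execution for this session, if any.
pub fn find_waiting(executions: &[Execution], session_id: u32) -> Option<&Execution> {
    executions
        .iter()
        .find(|e| e.session_id == session_id && e.status == "waiting")
}

/// A waiting workflow takes priority over the LLM path.
pub fn dispatch(executions: &[Execution], session_id: u32) -> Dispatch {
    match find_waiting(executions, session_id) {
        Some(exec) => Dispatch::ResumeWorkflow(exec.id),
        None => Dispatch::CallLlm,
    }
}
```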

5. DB schema (already exists, minor additions)

-- Already exists:
workflow_executions (id, bot_id, workflow_name, current_step, state_json, status, ...)

-- Add:
ALTER TABLE workflow_executions ADD COLUMN session_id UUID;
ALTER TABLE workflow_executions ADD COLUMN script_path TEXT;
-- state_json stores: { "variables": {...}, "step_index": N }

Migration Path

Phase 1 — Parallel mode (no breaking changes)

  • Add compile_to_steps() alongside existing compile()
  • Add WorkflowEngine as new struct
  • #workflow pragma routes to new path
  • All existing .bas files unchanged, run via Rhai as before

Phase 2 — Keyword parity

Implement step variants for all keywords used in practice: TALK, HEAR, SET, IF/ELSE/END IF, CALL (HTTP, LLM, tool), SEND MAIL, SCHEDULE

Phase 3 — Default for new scripts

New .bas files default to workflow mode. Rhai mode is kept for backward compatibility and for tool scripts (short-lived, no HEAR).

Phase 4 — Rhai scope reduction

Remove Rhai custom syntax registrations. Keep Rhai only as expression evaluator:

engine.eval_expression_with_scope::<Dynamic>(&mut scope, &expr)

File Map

basic/
  mod.rs                    ← add compile_to_steps(), ExecutionPlan enum
  compiler/
    mod.rs                  ← existing Rhai compiler, unchanged
    step_compiler.rs        ← NEW: BASIC → Vec<Step>
  workflow/
    mod.rs                  ← NEW: WorkflowEngine
    engine.rs               ← NEW: execute_step, load/save state
    variables.rs            ← NEW: Variables (HashMap<String, Dynamic>)
    steps.rs                ← NEW: Step enum
  keywords/                 ← existing, unchanged in Phase 1

Open Source Alternatives Considered

| Engine | Lang | Latency | RAM | Rust SDK | Verdict |
|---|---|---|---|---|---|
| Custom (this plan) | Rust | ~1ms | 0 extra | Native | Best fit |
| Restate | Rust server | ~5ms | ~50MB | official | Fallback option |
| Rhythm | Rust | ~2ms | ~10MB | Native | Experimental |
| Temporal | Go+Java | ~20ms | ~500MB | | Too heavy |
| Windmill | Rust+TS | ~10ms | ~200MB | | Wrong abstraction |

Why custom over Restate: Restate requires its own server as a proxy between HTTP requests and handlers, which adds a network hop and an extra process. The custom plan uses the PostgreSQL instance already running in the stack, so it needs zero extra infrastructure.

Escape hatch: The Step enum in this plan maps 1:1 to Restate workflow steps. If the custom engine proves too complex to maintain, migration to Restate is mechanical — swap WorkflowEngine::execute_step internals, keep the compiler and Step enum unchanged.


  • No re-run ever. Steps before current_step are skipped on resume.
  • Rhai never removed entirely — used for expression eval only.
  • Backwards compatible — no #workflow = Rhai mode, existing bots unaffected.
  • HEAR in workflow mode = zero threads held. State in DB, not RAM.
  • Tool scripts (called by LLM) stay in Rhai mode — they're short-lived, no HEAR needed.