OpenClaw Browser Posting Pipeline Incident Postmortem: From “fields are required” to a System-Level Fix for Stale Refs

This article is a complete postmortem of a real production troubleshooting incident. The goal is to clearly explain “why it broke, why it wasn’t a problem before, and what exactly changed this time,” and to provide repeatable repair steps.

Intended audience:

  • People using the browser tool in OpenClaw to automate posting/form filling
  • People who have encountered fields are required, Element "eXX" not found or not visible, or No tool call found for function call output ...
  • People who want to upgrade a “temporary firefight” into a “stable, reusable fix”

1. Symptoms and Business Impact

1) Primary errors

In this incident, three typical errors appeared:

  1. fields are required
  2. Element "e92" not found or not visible. Run a new snapshot to see current page elements.
  3. 400 No tool call found for function call output with call_id ...

2) Business behavior

  • The page could be opened
  • The DOM could be snapshotted
  • But as soon as the flow reached act/fill or node input actions, the browser control service returned an error and the flow stopped
  • As a result, the cross-site posting flows (Linux.do / V2EX / 2libra / iSharkFly) could not run end to end

This kind of failure is easy to misdiagnose as an “account permission issue” or a “context too long” problem, but this incident showed that neither was the cause.


2. Timeline (Key Nodes)

  • Around 2026-02-23 12:44 (CST): the local OpenClaw installation was installed or updated to 2026.2.22-2
  • After that, browser actions failed repeatedly with fields are required
  • After getting past that error, stale refs appeared: Element "e92" not found or not visible
  • Older sessions also showed: No tool call found for function call output ...
  • Posting was finally restored after the protocol-compatibility and stale-ref fallback fixes

Notes:

  • No tool call found ... and fields are required are not the same problem.
  • Reducing the context window (e.g., 1048576 → 200000) cannot directly fix these two underlying errors.

3. Root Cause Breakdown

Root cause A: fill protocol incompatibility (the first trigger)

The upstream action request commonly looked like:

{
  "kind": "fill",
  "ref": "e57",
  "text": "..."
}

But the current routing implementation effectively requires:

{
  "kind": "fill",
  "fields": [
    { "ref": "e57", "type": "text", "value": "..." }
  ]
}

When fields is missing and no compatibility mapping exists, the request fails immediately with fields are required.
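To make the mismatch concrete, here is a minimal TypeScript sketch of a strict route check that accepts the canonical shape and rejects the legacy one. The type names and strictFillCheck are hypothetical; only the field names and the error text come from the examples above.

```typescript
// Hypothetical types for the two request shapes shown above.
interface FillField {
  ref: string;
  type: string;
  value: string;
}

interface FillRequest {
  kind: "fill";
  ref?: string;         // legacy shape
  text?: string;        // legacy shape
  fields?: FillField[]; // canonical shape
}

// Mimics a strict route: without a fields array the request is rejected outright.
function strictFillCheck(req: FillRequest): FillField[] {
  if (req.fields === undefined) {
    throw new Error("fields are required");
  }
  return req.fields;
}

const legacyRequest: FillRequest = { kind: "fill", ref: "e57", text: "..." };
const canonicalRequest: FillRequest = {
  kind: "fill",
  fields: [{ ref: "e57", type: "text", value: "..." }],
};

strictFillCheck(canonicalRequest); // accepted
// strictFillCheck(legacyRequest) throws Error("fields are required")
```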

Root cause B: Dynamic pages cause ref drift (stale refs)

For example:

  • At snapshot time you get e92
  • After a component re-render / modal switch, e92 has become invalid
  • A later type/fill still uses the old ref, triggering not found or not visible

This is very common in dynamic UIs such as the V2EX node-selection input box.
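The drift mechanism can be illustrated with a toy model. This is not OpenClaw’s ref implementation; ToyPage and its methods are hypothetical, and the model only assumes that a ref is bound to the render that existed when the snapshot was taken.

```typescript
// Toy model: refs handed out by a snapshot are only valid for that render.
class ToyPage {
  private generation = 0;                        // bumps on every re-render
  private issued = new Map<string, number>();    // ref -> generation it belongs to
  private nextId = 1;

  snapshot(count: number): string[] {
    const refs: string[] = [];
    for (let i = 0; i < count; i++) {
      const ref = `e${this.nextId++}`;
      this.issued.set(ref, this.generation);
      refs.push(ref);
    }
    return refs;
  }

  // A component re-render or modal switch replaces the underlying nodes.
  rerender(): void {
    this.generation++;
  }

  type(ref: string, text: string): string {
    if (this.issued.get(ref) !== this.generation) {
      throw new Error(
        `Element "${ref}" not found or not visible. Run a new snapshot to see current page elements.`,
      );
    }
    return `typed "${text}" into ${ref}`;
  }
}

const page = new ToyPage();
const [oldRef] = page.snapshot(1);   // "e1"
page.rerender();                     // component re-render or modal switch
// page.type(oldRef, "node") would now throw "... not found or not visible ..."
const [freshRef] = page.snapshot(1); // re-snapshot issues a ref for the current render
page.type(freshRef, "node");         // succeeds
```

This is also why re-snapshotting before a retry helps: only a fresh snapshot yields refs bound to the current render.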

Root cause C: No tool call found ... is a session call-state mismatch

This error comes from an inconsistency in the session/tool-call history (a call_id mismatch: a function call output refers to a call_id with no matching tool call), not from a failed browser DOM operation.
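A sketch of the invariant this error points to, assuming a simplified transcript format (TranscriptItem and findOrphanOutputs are hypothetical; real session formats carry more fields): every function call output must be preceded by a tool call with the same call_id.

```typescript
// Simplified transcript items; only the call/output pairing by call_id is modeled.
type TranscriptItem =
  | { type: "function_call"; call_id: string; name: string }
  | { type: "function_call_output"; call_id: string; output: string };

// Returns the call_ids of outputs that have no earlier function_call with the same id,
// which is the condition reported as "No tool call found for function call output".
function findOrphanOutputs(items: TranscriptItem[]): string[] {
  const seenCalls = new Set<string>();
  const orphans: string[] = [];
  for (const item of items) {
    if (item.type === "function_call") {
      seenCalls.add(item.call_id);
    } else if (!seenCalls.has(item.call_id)) {
      orphans.push(item.call_id);
    }
  }
  return orphans;
}
```

One way such an orphan could arise is history trimming or replay that drops a call but keeps its output; checking for orphans directly keeps this class of failure separate from DOM errors.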


4. Fix Goals and Strategy

The goal is not a “single-point patch,” but to make the posting pipeline “compatible + self-healing + observable”:

  1. Accept the old request shape so it no longer triggers fields are required
  2. Recover from stale refs automatically, without a manual re-snapshot every time
  3. Provide a fallback even when the retry fails, so the flow doesn’t crash at critical write points

5. Actual Changes (Runtime dist)

Note: The packaged OpenClaw build contains multiple hashed artifacts that carry the same logic; every critical branch must be patched in all of them, otherwise the runtime may load an unpatched bundle.

1) fill compatibility layer

Add a compatibility mapping to the case "fill" branch of the /act route:

  • When fields is missing, automatically convert {ref,text/value} into fields:[{ref,type:"text",value}]
  • Default the field type to "text" when it is not given
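A sketch of this compatibility mapping, assuming a body shape that mirrors the JSON examples in section 3 (normalizeFillBody and the type names are hypothetical, not OpenClaw identifiers):

```typescript
// Raw body: legacy {ref, text|value}, or canonical fields[] with an optional type.
interface RawField {
  ref: string;
  type?: string;
  value: string;
}

interface RawFillBody {
  kind: "fill";
  ref?: string;
  text?: string;
  value?: string;
  fields?: RawField[];
}

interface CanonicalFill {
  kind: "fill";
  fields: { ref: string; type: string; value: string }[];
}

function normalizeFillBody(body: RawFillBody): CanonicalFill {
  // Canonical shape: keep it, but default a missing field type to "text".
  if (body.fields !== undefined) {
    return {
      kind: "fill",
      fields: body.fields.map((f) => ({ ref: f.ref, type: f.type ?? "text", value: f.value })),
    };
  }
  // Legacy shape: map {ref, text|value} to a single text field.
  // A migration hint could be logged here.
  const value = body.text ?? body.value;
  if (body.ref !== undefined && value !== undefined) {
    return { kind: "fill", fields: [{ ref: body.ref, type: "text", value }] };
  }
  // Neither shape present: keep the original error so callers see the same message.
  throw new Error("fields are required");
}
```

Keeping the original fields are required error for bodies that match neither shape means genuinely malformed requests still fail loudly.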

2) Stale-ref automatic retry

In both case "type" and case "fill":

  • Catch Unknown ref and not found or not visible errors
  • Automatically run snapshotRoleViaPlaywright({ refsMode: "aria" }) once
  • Then retry the original action
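The retry can be sketched as a wrapper around an action. This is a synchronous simplification: the helper names are hypothetical, the resnapshot callback stands in for snapshotRoleViaPlaywright({ refsMode: "aria" }), and real Playwright calls are async, so production code would await both callbacks. As described above, the retry re-runs the original action, which relies on the fresh snapshot making its ref resolvable again.

```typescript
// Matches the two stale-ref error texts quoted in this postmortem.
function isStaleRefError(err: unknown): boolean {
  const msg = err instanceof Error ? err.message : String(err);
  return /Unknown ref|not found or not visible/i.test(msg);
}

// Runs an action; on a stale-ref error, re-snapshots once and retries once.
// Any other error is rethrown untouched.
function withStaleRefRetry<T>(action: () => T, resnapshot: () => void): T {
  try {
    return action();
  } catch (err) {
    if (!isStaleRefError(err)) throw err;
    resnapshot();
    // A second failure propagates to the caller (the next fallback layer).
    return action();
  }
}
```

Retrying exactly once keeps a genuinely missing element from looping forever.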

3) Second-layer fallback (key enhancement in this incident)

If “re-snapshot + retry” still fails due to stale refs:

  • For type: write the text directly into document.activeElement and dispatch input and change events
  • For fill: in single-field text scenarios, use the same activeElement write fallback

This covers a common scenario: focus is still on the target input, but its old ref is no longer valid.
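A sketch of the activeElement fallback. It uses minimal structural types (FocusedLike and writeToActiveElement are hypothetical) so it can run outside a browser; in a page you would pass document.activeElement and an event factory such as (t) => new Event(t, { bubbles: true }). For fill, a caller would invoke it only when there is a single text field.

```typescript
// Minimal shape of the focused element; real input/textarea elements satisfy it.
interface FocusedLike {
  tagName: string;
  value?: string;
  isContentEditable?: boolean;
  textContent: string | null;
  dispatchEvent(event: { type: string }): boolean;
}

// Writes text into the focused element, then fires input and change so page
// listeners observe the new value. Returns false if nothing editable is focused.
function writeToActiveElement(
  active: FocusedLike | null,
  text: string,
  makeEvent: (type: string) => { type: string },
): boolean {
  if (!active) return false;
  const tag = active.tagName.toUpperCase();
  if (tag === "INPUT" || tag === "TEXTAREA") {
    active.value = text;
  } else if (active.isContentEditable) {
    active.textContent = text;
  } else {
    return false; // focus is elsewhere; let the caller surface the original error
  }
  active.dispatchEvent(makeEvent("input"));
  active.dispatchEvent(makeEvent("change"));
  return true;
}
```

Some frameworks (React is a common example) track input values internally and may ignore a plain value assignment; if the fallback writes text but the page does not react, that is a likely cause to check.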

4) Files involved

  • /opt/homebrew/lib/node_modules/openclaw/dist/routes-CmNAokG-.js
  • /opt/homebrew/lib/node_modules/openclaw/dist/routes-FGJF5gtZ.js
  • /opt/homebrew/lib/node_modules/openclaw/dist/pi-embedded-helpers-CNhhELVT.js
  • /opt/homebrew/lib/node_modules/openclaw/dist/pi-embedded-helpers-DxTyisc4.js
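Because the same logic lives in several hashed bundles, it helps to check mechanically that none was missed. A hypothetical sketch: given a map of file name to contents (in Node this could be filled with fs.readdirSync and fs.readFileSync over the dist directory), report files that contain the fill branch but not a patch marker. The marker comment is an assumption; you would add it yourself when patching.

```typescript
// Lists bundles that contain the route signature but lack the patch marker.
// Both strings are chosen by whoever applies the patch.
function findUnpatchedBundles(
  files: Record<string, string>,
  routeSignature: string,
  patchMarker: string,
): string[] {
  return Object.keys(files)
    .filter((fileName) => {
      const src = files[fileName];
      return src.includes(routeSignature) && !src.includes(patchMarker);
    })
    .sort();
}

// Example: one bundle patched, one missed, one unrelated.
const missed = findUnpatchedBundles(
  {
    "routes-A.js": 'switch (kind) { case "fill": /* fill-compat-patch */ break; }',
    "routes-B.js": 'switch (kind) { case "fill": break; }',
    "other.js": "export {};",
  },
  'case "fill"',
  "/* fill-compat-patch */",
);
// missed -> ["routes-B.js"]
```

Running a check like this after each patch, and again after each upgrade since new hashes appear, guards against the runtime loading a copy that was missed.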

6. Verification Steps (Reproducible)

  1. Restart the gateway and confirm the new process is running (the PID has changed)
  2. Re-run the automated posting flow along the same path that failed
  3. Check whether the logs still show:
    • fields are required
    • Element "eXX" not found or not visible
  4. Verify the business result: confirm the post was actually created
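Step 3 can be automated with a small log filter. The sketch below assumes each log line starts with an ISO 8601 timestamp followed by a space (a hypothetical format; adjust it to the real gateway logs) and ignores anything logged before the restart, so old errors are not mistaken for new ones.

```typescript
// Error texts to look for after a restart (taken from the verification step above).
const WATCHED_ERRORS: RegExp[] = [
  /fields are required/,
  /Element "e\d+" not found or not visible/,
];

// Assumes a hypothetical line format: "<ISO-8601 timestamp> <message>".
function errorsSinceRestart(lines: string[], restartedAt: Date): string[] {
  const cutoff = restartedAt.getTime();
  return lines.filter((line) => {
    const space = line.indexOf(" ");
    if (space < 0) return false;
    const ts = Date.parse(line.slice(0, space));
    // Skip unparseable lines and anything logged before the new process started.
    if (Number.isNaN(ts) || ts < cutoff) return false;
    return WATCHED_ERRORS.some((re) => re.test(line));
  });
}

// Illustrative lines: only the stale-ref error after the restart is reported.
const hits = errorsSinceRestart(
  [
    "2026-01-01T04:40:00Z fields are required",
    '2026-01-01T05:10:00Z Element "e92" not found or not visible.',
    "2026-01-01T05:11:00Z post created",
  ],
  new Date("2026-01-01T05:00:00Z"),
);
```

An empty result is necessary but not sufficient; step 4 still has to confirm the post exists.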

Result this time:

  • The broken path was restored and posting succeeded.

7. Why Was It Fine Two Days Ago, and Why Did the Fix Need So Many Changes?

The conclusion is clear:

  1. Version changes amplified protocol differences

    • Current local version: OpenClaw 2026.2.22-2
    • In this version, the fill entry is stricter about the fields[] shape
  2. The same issue must be patched in multiple dist entry points

    • It looks like many files changed, but essentially the same logic is duplicated across several bundled artifacts
  3. Dynamic-page ref drift is an inherent risk

    • Not hitting it before doesn’t mean it wasn’t there
    • Once pages re-render more often, these failures show up in bursts

So this was not purely a configuration issue, but two gaps exposed at the same time: protocol compatibility and dynamic-UI stability.


8. Follow-up Recommendations (Regression Prevention)

1) Protocol-layer recommendations

  • Clearly define the canonical schema for fill in the browser action protocol
  • Maintain a backward-compatibility window for the old schema, and log migration hints

2) Execution-layer recommendations

  • Add a configurable “auto snapshot before action” switch for type/fill/click (high-stability mode)
  • Add structured logs for the stale-ref retry branch (to confirm whether fallbacks were triggered)
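A sketch of what a structured record for the fallback branch might look like. The event name, field names, and formatFallbackLog are all hypothetical; the point is one JSON object per line, which makes it easy to count how often each fallback stage fires and whether it succeeded.

```typescript
type FallbackStage = "snapshot_retry" | "active_element_write";

interface FallbackLogRecord {
  event: "stale_ref_fallback";
  action: "type" | "fill" | "click";
  ref: string;
  stage: FallbackStage;
  succeeded: boolean;
  at: string; // ISO 8601 timestamp
}

// Emits one JSON object per line so records are easy to grep and aggregate.
function formatFallbackLog(
  rec: Omit<FallbackLogRecord, "event" | "at">,
  now: Date = new Date(),
): string {
  const full: FallbackLogRecord = { event: "stale_ref_fallback", ...rec, at: now.toISOString() };
  return JSON.stringify(full);
}

// Example: the re-snapshot retry rescued a type action on e92.
const logLine = formatFallbackLog({ action: "type", ref: "e92", stage: "snapshot_retry", succeeded: true });
```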

3) Ops-layer recommendations

  • After each upgrade, run a minimal E2E: open → snapshot → fill → submit
  • Hook key errors (fields are required, Unknown ref) into alerting
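The minimal E2E and the alerting hook can be sketched together. runSmoke, shouldAlert, and the step stubs are hypothetical, and the steps are synchronous for brevity; real browser steps would be async.

```typescript
// One named step of the post-upgrade smoke test (open, snapshot, fill, submit).
type Step = { name: string; run: () => void };

interface SmokeResult {
  passed: boolean;
  failedStep?: string;
  error?: string;
}

// Runs steps in order and stops at the first failure, recording which step broke.
function runSmoke(steps: Step[]): SmokeResult {
  for (const step of steps) {
    try {
      step.run();
    } catch (err) {
      const error = err instanceof Error ? err.message : String(err);
      return { passed: false, failedStep: step.name, error };
    }
  }
  return { passed: true };
}

// Alert only on the key error texts named in the recommendation above.
const ALERT_PATTERNS: RegExp[] = [/fields are required/, /Unknown ref/];

function shouldAlert(result: SmokeResult): boolean {
  return !result.passed && ALERT_PATTERNS.some((re) => re.test(result.error ?? ""));
}
```

In practice each step would call the browser tool; stopping at the first failure makes the failing step name the most useful part of the alert.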

4) Session-layer recommendations

  • Investigate No tool call found ... separately (session integrity/replay consistency); don’t mix it with browser DOM issues

9. Troubleshooting Checklist (For Future Me)

When you hit a similar issue, follow this order:

  1. First classify the error by layer (protocol / DOM / session)
  2. Check the latest log timestamps so you don’t diagnose from old, pre-fix errors
  3. Verify whether the request payload shape matches the current version
  4. Add “snapshot retry + activeElement fallback” for stale refs
  5. Restart and confirm the new process loaded the new code
  6. Re-test with the same path and confirm business success, not just the disappearance of errors

10. Closing

This fix wasn’t about “hiding the error,” but about making the posting pipeline resilient in three layers:

  • Layer 1: Protocol compatibility (prevent structure mismatch)
  • Layer 2: Auto recovery (retry after snapshot)
  • Layer 3: Input fallback (write via activeElement)

The ultimate goal is just one thing: make automation sustainable, reproducible, and maintainable on real dynamic web pages.

If you also do OpenClaw browser automation, consider baking these three layers into your default template.