POST /api/onboarding/start-scraping
Start the onboarding scrape
curl --request POST \
  --url https://app.puffle.ai/api/onboarding/start-scraping \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "website": "<string>",
  "email": "user@example.com",
  "firstName": "<string>",
  "lastName": "<string>"
}
'
{ "sessionId": "7c2b2d4e-e29b-41d4-a716-446655440000", "runId": "run_a1b2c3d4e5f6" }

Overview

Creates a new onboarding_sessions row and kicks off the onboarding-scrape Trigger.dev task. The task:
  1. Resolves the supplied website and crawls up to 10 pages with Firecrawl
  2. Runs three parallel Sonnet generations to produce company_context, market_context, and an Ideal Customer Profile
  3. Writes the results onto the caller’s user_profiles row and flips the session status to completed
The call returns immediately with a sessionId and runId. Actual context population is asynchronous — poll GET /api/onboarding/status until status === "completed".
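The dispatch-then-poll flow described above could be sketched roughly as follows. This is a minimal illustration, not Puffle's client code. The helper names, the 4-second interval, the 180-second timeout, and the default assumption that the status payload exposes a top-level status field are all hypothetical. Only the endpoint path, the sessionId query parameter, and the completed and failed values come from this page.

```python
import json
import time
import urllib.parse
import urllib.request

BASE_URL = "https://app.puffle.ai"
TERMINAL_STATUSES = {"completed", "failed"}


def fetch_status(token, session_id):
    """GET /api/onboarding/status?sessionId=... and decode the JSON body."""
    query = urllib.parse.urlencode({"sessionId": session_id})
    request = urllib.request.Request(
        f"{BASE_URL}/api/onboarding/status?{query}",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def poll_until_terminal(fetch, interval=4.0, timeout=180.0,
                        get_status=lambda payload: payload.get("status"),
                        sleep=time.sleep, clock=time.monotonic):
    """Call fetch() until the status is terminal; raise TimeoutError otherwise.

    get_status is an assumption about the payload shape: pass a different
    extractor if the status is nested (for example under a session object).
    """
    deadline = clock() + timeout
    while True:
        payload = fetch()
        if get_status(payload) in TERMINAL_STATUSES:
            return payload
        if clock() >= deadline:
            raise TimeoutError("onboarding session did not reach a terminal status")
        sleep(interval)
```

A caller would typically wire the two together as poll_until_terminal(lambda: fetch_status(token, session_id)). The sleep and clock parameters exist only so the loop can be exercised without real waiting.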

AI agent notes

When would an agent call this?
Almost never on its own. Onboarding is normally driven by the human the first time they sign up; by the time an autonomous agent is operating the workspace, GET /api/onboarding/status should already return completed.
The one legitimate agent use case is re-bootstrapping a workspace that has session: null, and even then only after the human has explicitly asked the agent to run onboarding on their behalf. If you see session: null, the safer default is to stop and escalate. Do not guess a website URL.
If the agent wants to refresh context on an already-onboarded workspace (e.g. company rebranded, URL changed), use regenerate-context instead; it is rate-limited and safer.
Polling cadence after dispatch: every 3 to 5 seconds against GET /api/onboarding/status?sessionId=.... Typical completion: 30 to 90 seconds. On status === "failed", report errorMessage and errorStep to the human; do not retry.
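The guidance above amounts to a small decision table, which an agent might encode along these lines. This is a sketch under stated assumptions: the Action names and both functions are hypothetical, and it assumes the status response carries a session object (or null) with status, errorMessage, and errorStep fields on it, which this page implies but does not spell out.

```python
from enum import Enum


class Action(Enum):
    ESCALATE = "escalate"                  # stop and ask the human
    START_ONBOARDING = "start_onboarding"  # POST /api/onboarding/start-scraping
    KEEP_POLLING = "keep_polling"          # check again in 3 to 5 seconds
    REPORT_FAILURE = "report_failure"      # surface the error, do not retry
    NOTHING = "nothing"                    # already onboarded


def choose_action(session, human_requested_onboarding=False):
    """Map the session object from the status endpoint to the next agent step."""
    if session is None:
        # Only bootstrap when the human explicitly asked; never guess a website.
        if human_requested_onboarding:
            return Action.START_ONBOARDING
        return Action.ESCALATE
    status = session.get("status")
    if status == "completed":
        return Action.NOTHING
    if status == "failed":
        return Action.REPORT_FAILURE
    return Action.KEEP_POLLING


def failure_report(session):
    """Format the two error fields the agent is expected to pass on."""
    return (
        f"Onboarding failed at step {session.get('errorStep')!r}: "
        f"{session.get('errorMessage')}"
    )
```

Refreshing context on an already-onboarded workspace is deliberately absent from this table, since that goes through regenerate-context rather than this endpoint.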
See also: Check onboarding status · Regenerate context · Agent Playbook.

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Body

application/json

Minimum required input is a website URL. Everything else is optional and falls back to auth session values.

website
string
required

Company website URL. The scrape task uses Firecrawl to crawl up to 10 pages of this domain to build company_context, market_context, and the ICP.

Minimum string length: 1
email
string<email>

Email of the person signing up. Defaults to the authenticated user's email when omitted. Used to seed first_name when firstName isn't provided.

Pattern: ^(?!\.)(?!.*\.\.)([A-Za-z0-9_'+\-\.]*)[A-Za-z0-9_+-]@([A-Za-z0-9][A-Za-z0-9\-]*\.)+[A-Za-z]{2,}$
firstName
string

First name for the user_profiles row. If omitted, derived from the email local-part (e.g. an address whose local-part is sid yields sid).

lastName
string

Last name for the user_profiles row. Optional.
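As an illustration of the body schema above, a client could validate its input and assemble the request like this before sending it. The function names are hypothetical and the checks only mirror what this page documents (a non-empty website and the published email pattern); the server may apply further validation. first_name_fallback merely illustrates the documented local-part fallback and is not the server's code.

```python
import json
import re
import urllib.request

START_URL = "https://app.puffle.ai/api/onboarding/start-scraping"

# Email pattern copied from the schema above.
EMAIL_RE = re.compile(
    r"^(?!\.)(?!.*\.\.)([A-Za-z0-9_'+\-\.]*)[A-Za-z0-9_+-]"
    r"@([A-Za-z0-9][A-Za-z0-9\-]*\.)+[A-Za-z]{2,}$"
)


def build_body(website, email=None, first_name=None, last_name=None):
    """Return the JSON body, leaving out optional fields that were not given."""
    if not isinstance(website, str) or len(website) < 1:
        raise ValueError("website is required and must be a non-empty string")
    if email is not None and not EMAIL_RE.match(email):
        raise ValueError("email does not match the documented pattern")
    body = {"website": website}
    optional = {"email": email, "firstName": first_name, "lastName": last_name}
    body.update({key: value for key, value in optional.items() if value is not None})
    return body


def build_start_request(token, **fields):
    """Build (but do not send) the POST request for this endpoint."""
    data = json.dumps(build_body(**fields)).encode("utf-8")
    return urllib.request.Request(
        START_URL,
        data=data,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def first_name_fallback(email):
    """Illustrative only: the local-part that would seed first_name."""
    return email.split("@", 1)[0]
```

Passing the result of build_start_request to urllib.request.urlopen would send the call. Omitted optional fields are left out of the body so that the documented server-side defaults apply.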

Response

Session created and background task dispatched. The session starts as pending; a Trigger.dev task then transitions it through looking_up → gathering_info → generating_context → completed (or failed). Poll the status endpoint to follow progress.
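The status progression described here can be turned into a simple progress label for logs or a UI. The sketch below is only an illustration: the function names and the label format are hypothetical, and it assumes the statuses are reported with exactly the spellings shown on this page.

```python
# Order taken from the response description; "failed" can end the run at any step.
PIPELINE = ["pending", "looking_up", "gathering_info", "generating_context", "completed"]


def is_terminal(status):
    """True once the background task will make no further transitions."""
    return status in ("completed", "failed")


def describe_progress(status):
    """Render a status as 'name (step/total)', with 'pending' counted as step 0."""
    if status == "failed":
        return "failed"
    if status not in PIPELINE:
        return f"unknown status: {status}"
    step = PIPELINE.index(status)
    return f"{status} ({step}/{len(PIPELINE) - 1})"
```

Unknown values are reported rather than raised, so a status added later would not break a polling client.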

sessionId
string<uuid>
required

ID of the newly-created onboarding session. Pass this to GET /api/onboarding/status?sessionId=... to poll the scrape progress.

Pattern: ^([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[1-8][0-9a-fA-F]{3}-[89abAB][0-9a-fA-F]{3}-[0-9a-fA-F]{12}|00000000-0000-0000-0000-000000000000|ffffffff-ffff-ffff-ffff-ffffffffffff)$
runId
string
required

Trigger.dev run identifier for the background onboarding-scrape task. Keep for support — uniquely identifies this scrape across Vercel logs and the Trigger dashboard.
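Since both response fields are required, a client may want to check them before it starts polling. The following is a minimal sketch: StartResult and parse_start_response are hypothetical names, and the UUID pattern is copied from the sessionId schema above.

```python
import re
from dataclasses import dataclass

# UUID pattern copied from the sessionId schema above.
UUID_RE = re.compile(
    r"^([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[1-8][0-9a-fA-F]{3}"
    r"-[89abAB][0-9a-fA-F]{3}-[0-9a-fA-F]{12}"
    r"|00000000-0000-0000-0000-000000000000"
    r"|ffffffff-ffff-ffff-ffff-ffffffffffff)$"
)


@dataclass(frozen=True)
class StartResult:
    session_id: str  # pass to GET /api/onboarding/status?sessionId=...
    run_id: str      # keep for support and log correlation


def parse_start_response(payload):
    """Validate the decoded JSON response and return its two identifiers."""
    session_id = payload.get("sessionId")
    run_id = payload.get("runId")
    if not isinstance(session_id, str) or not UUID_RE.match(session_id):
        raise ValueError("sessionId is missing or is not a UUID")
    if not isinstance(run_id, str):
        raise ValueError("runId is missing or is not a string")
    return StartResult(session_id=session_id, run_id=run_id)
```

Logging run_id alongside session_id at dispatch time makes it easier to hand both to support if the scrape later fails.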