DatasetSource
createTrainer accepts one dataset, expressed as a discriminated union on type:
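As a rough sketch, the union could be modeled with the types below. They are reconstructed from the field tables on this page and are not copied from the SDK; the alias names `HuggingFaceSource` and `BlobSource` and the helper `describeSource` are hypothetical.

```typescript
// Local sketch of the union described on this page (not the SDK's own declarations).
type HuggingFaceSource = {
  type: "huggingface"; // discriminant
  name: string;        // repository name, e.g. "arkorlab/triage-demo"
  split?: string;      // optional split override
  subset?: string;     // optional subset
};

type BlobSource = {
  type: "blob"; // discriminant
  url: string;  // HTTPS URL the backend can fetch
  token?: string;
};

type DatasetSource = HuggingFaceSource | BlobSource;

// Hypothetical helper: switching on `type` narrows to one variant,
// so variant-specific fields are available in each branch.
function describeSource(source: DatasetSource): string {
  switch (source.type) {
    case "huggingface":
      return `huggingface:${source.name}`;
    case "blob":
      return `blob:${source.url}`;
  }
}
```

Because `type` is a literal in each variant, the compiler rejects a value that mixes fields from the two forms.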
HuggingFace
| Field | Type | Notes |
|---|---|---|
| `type` | `"huggingface"` | Discriminant. |
| `name` | `string` | Repository name (e.g. `arkorlab/triage-demo`). Public repos work without further auth. |
| `split` | `string?` | Override the default split. Optional. |
| `subset` | `string?` | For datasets that publish multiple subsets. Optional. |
The HuggingFace form is what triage / translate / redaction use. Most projects start here.
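A HuggingFace source might be written as in the sketch below. The type is declared locally from the table above rather than imported from the SDK, and the `split` value `"train"` is only an illustrative assumption about the repository.

```typescript
// Local type mirroring the HuggingFace table (assumption: not the SDK's own export).
type HuggingFaceSource = {
  type: "huggingface";
  name: string;
  split?: string;
  subset?: string;
};

// Minimal form: only the discriminant and the repository name.
const minimal: HuggingFaceSource = {
  type: "huggingface",
  name: "arkorlab/triage-demo",
};

// With an explicit split; "train" is a hypothetical split name.
const withSplit: HuggingFaceSource = {
  ...minimal,
  split: "train",
};
```

Leaving out `split` and `subset` keeps the defaults, which is usually enough for a public repo.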
Blob URL
| Field | Type | Notes |
|---|---|---|
| `type` | `"blob"` | Discriminant. |
| `url` | `string` | HTTPS URL the backend can fetch. |
| `token` | `string?` | Forwarded to the cloud-api in the job config; the backend uses it when fetching the blob. The exact HTTP wire format (header, scheme, etc.) is backend-defined and not part of the SDK contract. |
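One way to build a blob source is sketched below. The `blobSource` helper and its client-side HTTPS check are hypothetical conveniences, not SDK behavior, and the example URL is a placeholder.

```typescript
// Local type mirroring the Blob URL table (assumption: not the SDK's own export).
type BlobSource = {
  type: "blob";
  url: string;
  token?: string;
};

// Hypothetical helper: rejects non-HTTPS URLs early and only sets `token` when given.
function blobSource(url: string, token?: string): BlobSource {
  if (!url.startsWith("https://")) {
    throw new Error(`blob url must be HTTPS, got: ${url}`);
  }
  return token === undefined
    ? { type: "blob", url }
    : { type: "blob", url, token };
}

// Placeholder URL; a signed URL typically needs no token.
const signed = blobSource("https://example.com/datasets/data.jsonl");
```

Since the wire format for `token` is backend-defined, the sketch passes the string through unchanged instead of formatting a header.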
Picking a form
- Reach for `huggingface` when the dataset is already on the Hub. It is the most-tested path.
- Reach for `blob` when you need a dataset that cannot live on the Hub (proprietary content, signed URL, internal-only).
Local file sources (for example `{ type: "file", path: "./data.jsonl" }`) are not in DatasetSource today. To use a local file, host it as a blob URL or upload it to a private HuggingFace repo first.
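When a dataset config arrives as untyped data (say, parsed from JSON), a runtime guard can catch unsupported forms before a job is submitted. The sketch below assumes the two variants listed on this page; `isDatasetSource` is a hypothetical helper, not part of the SDK.

```typescript
// Local sketch of the union from this page (assumption: not the SDK's own export).
type DatasetSource =
  | { type: "huggingface"; name: string; split?: string; subset?: string }
  | { type: "blob"; url: string; token?: string };

// Hypothetical guard: accepts only the two documented forms and checks
// the one required field of each. Optional fields are not validated here.
function isDatasetSource(value: unknown): value is DatasetSource {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  if (v.type === "huggingface") return typeof v.name === "string";
  if (v.type === "blob") return typeof v.url === "string";
  return false; // e.g. { type: "file", ... } is not supported
}

const rejected = isDatasetSource({ type: "file", path: "./data.jsonl" });
const accepted = isDatasetSource({ type: "huggingface", name: "arkorlab/triage-demo" });
```

In typed code the compiler already rejects a `"file"` source, so a guard like this is only useful at untyped boundaries.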