Llama 3.3 70B Instruct fp8 Fast

Model details

Model: @cf/meta/llama-3.3-70b-instruct-fp8-fast
Provider: cloudflare-workers-ai
API: openai-completions
Base URL: https://api.cloudflare.com/client/v4/accounts/{CLOUDFLARE_ACCOUNT_ID}/ai/v1
Input: text
Reasoning: No
Context window: 24,000
Max tokens: 24,000

Show configuration

{
  "providers": {
    "cloudflare-workers-ai": {
      "apiKey": "YOUR_API_KEY",
      "models": [
        {
          "id": "@cf/meta/llama-3.3-70b-instruct-fp8-fast",
          "name": "Llama 3.3 70B Instruct fp8 Fast",
          "reasoning": false,
          "input": [
            "text"
          ],
          "contextWindow": 24000,
          "maxTokens": 24000,
          "cost": {
            "input": 0.293,
            "output": 2.253,
            "cacheRead": 0,
            "cacheWrite": 0
          },
          "compat": {
            "supportsStore": false,
            "supportsDeveloperRole": false,
            "supportsLongCacheRetention": false,
            "sendSessionAffinityHeaders": true
          }
        }
      ],
      "api": "openai-completions",
      "baseUrl": "https://api.cloudflare.com/client/v4/accounts/{CLOUDFLARE_ACCOUNT_ID}/ai/v1"
    }
  }
}

Pricing

USD per million tokens. A tier is selected from the total input tokens in each request and applies to that entire request.

Pricing rates for Llama 3.3 70B Instruct fp8 Fast
Request input	Input	Output	Cache read	Cache write
All requests	$0.293	$2.253	$0	$0

Session cost calculator

Estimate the requests made during an agent session. A user turn can make several model calls while using tools, so costs are calculated per model request.

Session shape

Rounded median shapes from recent Pi agent sessions.

Model requestsStarting contexttokensContext added / requesttokensOutput / requesttokens

Warm-request cache reuse

96%Effective token reuse combines partial hits and occasional complete misses.

Starting context is already cached

Estimated session$0.00

Uncached input: —
Cache reads: —
Uncached prefixes: —
Cache writes: —
Output: —
Without caching: —

The first request starts cold unless marked otherwise. Output is billed separately from context growth because reasoning tokens are not always retained. This remains a directional estimate: providers differ in eligibility, rounding, and retention.

Compatibility flags

Effective values after applying Pi's API defaults and model overrides.

Effective compatibility flags for Llama 3.3 70B Instruct fp8 Fast
Feature	Value
`supportsStore`	No
`supportsDeveloperRole`	No
`supportsReasoningEffort`	Yes
`supportsUsageInStreaming`	Yes
`maxTokensField`	`max_completion_tokens`
`requiresToolResultName`	No
`requiresAssistantAfterToolResult`	No
`requiresThinkingAsText`	No
`requiresReasoningContentOnAssistantMessages`	No
`thinkingFormat`	`openai`
`openRouterRouting`	`Empty`
`vercelGatewayRouting`	`Empty`
`chatTemplateKwargs`	`Empty`
`zaiToolStream`	No
`supportsStrictMode`	Yes
`cacheControlFormat`	None
`sendSessionAffinityHeaders`	Yes
`supportsLongCacheRetention`	No

Also available from other providers