ago

package module
v0.0.0-...-7b4c35c
Published: Feb 27, 2026 License: Apache-2.0 Imports: 7 Imported by: 0

README

ago

Production-grade AI agent framework for Go.

Build agents that think, act, and compose — with the performance and reliability Go is known for.

go get github.com/Harshal1000/ago

Why ago?

Most AI frameworks are Python-first, abstraction-heavy, and break in production. ago is different.

  • Native Go — struct literals, interfaces, context propagation. No magic.
  • Real agentic loop — tool calling, parallel execution, multi-turn reasoning. Not a wrapper around an API call.
  • Agents as tools — compose agents into hierarchies. An agent can delegate to sub-agents seamlessly.
  • Streaming built-in — SSE-ready streaming with Go 1.23 iterators. for range over your agent's output.
  • Backend agnostic — swap LLM providers without changing agent code. One import switches everything.
  • Production errors — two error lanes: recoverable tool errors (sent back to the model) and infrastructure failures (stop the loop). Your agent handles both correctly.

Quick Start

package main

import (
    "context"
    "fmt"
    "log"

    "github.com/Harshal1000/ago"
    "github.com/Harshal1000/ago/agent"
    "github.com/Harshal1000/ago/tools"
    _ "github.com/Harshal1000/ago/llm" // register backends
)

func main() {
    // Define a tool
    search := &tools.FunctionTool{
        ToolName:    "search",
        Description: "Search for information",
        Parameters: &ago.Schema{
            Type: ago.TypeObject,
            Properties: map[string]*ago.Schema{
                "query": {Type: ago.TypeString, Description: "search query"},
            },
            Required: []string{"query"},
        },
        Fn: func(ctx context.Context, args map[string]any) (map[string]any, error) {
            query := args["query"].(string)
            return map[string]any{"results": doSearch(query)}, nil
        },
    }

    // Create an agent
    a := &agent.Agent{
        Name:         "assistant",
        Backend:      agent.BackendGenAI,
        Model:        "gemini-2.0-flash",
        SystemPrompt: "You are a helpful assistant.",
        Tools:        []ago.Tool{search},
    }
    a.InitLLM()

    // Run it
    result, err := ago.Run(context.Background(), a, []*ago.Content{
        ago.NewTextContent(ago.RoleUser, "Find the latest Go release notes"),
    })
    if err != nil {
        log.Fatal(err)
    }

    fmt.Println(result.Response.Candidates[0].Content.Parts[0].Text)
}

That's it. The agent calls the LLM, decides to use the search tool, executes it, feeds the result back, and returns a synthesized response. All automatic.

Core Concepts

Agents

An agent is a configuration: a model, a system prompt, and a set of tools. No inheritance, no base classes — just a struct.

a := &agent.Agent{
    Name:          "researcher",
    Backend:       agent.BackendGenAI,
    Model:         "gemini-2.0-flash",
    SystemPrompt:  "You research topics thoroughly.",
    Tools:         []ago.Tool{searchTool, readTool},
    MaxIterations: 10,
}
Tools

Any Go function becomes a tool. Implement the Tool interface or use FunctionTool for zero boilerplate.

calculator := &tools.FunctionTool{
    ToolName:    "calculate",
    Description: "Evaluate a math expression",
    Parameters:  &ago.Schema{
        Type: ago.TypeObject,
        Properties: map[string]*ago.Schema{
            "expression": {Type: ago.TypeString},
        },
        Required: []string{"expression"},
    },
    Fn: func(ctx context.Context, args map[string]any) (map[string]any, error) {
        expr := args["expression"].(string)
        result := evaluate(expr)
        return map[string]any{"result": result}, nil
    },
}
Agent Composition

Wrap any agent as a tool. The sub-agent runs its own full loop with its own tools.

// Register a specialist agent
agent.Register(&agent.Agent{
    Name:         "code-reviewer",
    Model:        "gemini-2.0-flash",
    SystemPrompt: "You review code for bugs and security issues.",
    Tools:        []ago.Tool{readFileTool, lintTool},
})

// Use it as a tool in another agent
reviewTool := &tools.AgentTool{
    ToolName:    "code_review",
    Description: "Run a thorough code review",
    AgentName:   "code-reviewer",
}
Streaming

Stream responses with Go iterators. Works through tool calls — the stream pauses during tool execution and resumes when the LLM continues.

for chunk, err := range ago.RunSSE(ctx, agent, contents) {
    if err != nil {
        log.Fatal(err)
    }
    for _, p := range chunk.Candidates[0].Content.Parts {
        fmt.Print(p.Text)
    }
}
SkipSynthesis

When a tool's output IS the answer, skip the extra LLM call. Saves tokens and latency.

apiLookup := &tools.FunctionTool{
    ToolName: "get_weather",
    Fn:       fetchWeather,
    ToolOptions: ago.ToolOptions{SkipSynthesis: true},
    // ...
}

Error Handling

ago distinguishes between two kinds of errors:

Error Type           | What Happens                                                                       | Example
---------------------|------------------------------------------------------------------------------------|--------
Tool error           | Sent to the model as context. Loop continues. The LLM can retry or work around it. | API rate limit, invalid input, no results found
Infrastructure error | Loop stops immediately. Returned to caller.                                        | Network down, context cancelled, invalid credentials
// Tool error — return it in the result, loop continues
return &ago.ToolResult{Error: fmt.Errorf("no results found")}, nil

// Infrastructure error — return as Go error, loop stops
return nil, fmt.Errorf("database connection lost")

Context & Cancellation

context.Context flows through the entire execution chain. Set deadlines, propagate cancellation, pass values — it all works exactly as Go developers expect.

ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()

result, err := ago.Run(ctx, agent, contents)
// Timeout cancels the LLM call, any running tools, and sub-agents

Requirements

  • Go 1.23+
  • A supported LLM backend API key (e.g. GOOGLE_API_KEY for Gemini)

License

MIT

Documentation

Overview

Package ago is a Go framework for building AI agents with tool-calling capabilities.

ago provides a backend-agnostic abstraction over LLM providers (OpenAI, Google GenAI, etc.) and an agentic execution loop that handles multi-turn tool calling automatically.

Architecture Overview

The framework is organized into a few key layers:

  • Core types (this package): Content, Part, Tool, Schema, GenerateConfig, and the LLM interface. These are the building blocks shared by all backends and the executor.

  • Agent (ago/agent): A configuration struct that bundles a backend, model, system prompt, tools, and generation config into a single reusable unit. Register agents globally or create them inline.

  • LLM backends (ago/llm): Implementations of the LLM interface for specific providers. Import ago/llm with a blank import to auto-register all backends via init().

  • Executor (ago.Run / ago.RunSSE): The agentic loop that sends messages to the LLM, detects tool calls, executes tools in parallel, feeds results back, and repeats until the model produces a final text response or the iteration limit is reached.
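
The control flow described above can be sketched in plain Go. The types below (finishReason, response, runLoop) are illustrative stand-ins, not the ago API; the sketch only shows the call-tools-and-loop-or-stop decision the executor makes on each iteration.

```go
package main

import "fmt"

// Illustrative stand-ins for ago's FinishReason and Response; not the real API.
type finishReason string

const (
	reasonStop     finishReason = "stop"
	reasonToolCall finishReason = "tool_call"
)

type response struct {
	Reason finishReason
	Text   string
}

// runLoop sketches the executor's control flow: call the model, run tools on a
// tool-call finish, feed results back, and stop on a final text response or
// when the iteration limit is exhausted.
func runLoop(generate func(history []string) response, runTool func() string, maxIterations int) (string, error) {
	var history []string
	for i := 0; i < maxIterations; i++ {
		resp := generate(history)
		switch resp.Reason {
		case reasonToolCall:
			history = append(history, runTool()) // feed tool output back into context
		case reasonStop:
			return resp.Text, nil
		}
	}
	return "", fmt.Errorf("iteration limit reached")
}

func main() {
	// Fake model: requests a tool once, then answers using the tool's output.
	gen := func(h []string) response {
		if len(h) == 0 {
			return response{Reason: reasonToolCall}
		}
		return response{Reason: reasonStop, Text: "done after " + h[0]}
	}
	out, err := runLoop(gen, func() string { return "tool-result" }, 10)
	fmt.Println(out, err)
}
```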

Quick Start

1. Define tools implementing the ago.Tool interface (or use tools.FunctionTool).
2. Create an agent.Agent with a backend, model, system prompt, tools, and optional config.
3. Call ago.Run(ctx, agent, messages) for synchronous execution, or ago.RunSSE for streaming.

All core types live in this package — there is no separate "types" sub-package.

Constants

const DefaultMaxIterations = 10

DefaultMaxIterations is the default maximum number of agentic loop iterations.

Variables

This section is empty.

Functions

func RunSSE

func RunSSE(ctx context.Context, agent AgentConfig, contents []*Content) iter.Seq2[*StreamChunk, error]

RunSSE executes the agentic loop with streaming, yielding chunks to the caller.

Types

type AgentConfig

type AgentConfig interface {
	GetName() string
	GetModel() string
	GetLLM() LLM
	GetTools() []Tool
	GetMaxIterations() int
	GetGenerateConfig() *GenerateConfig
	GetSystemInstruction() *Content
}

AgentConfig is the interface the executor needs from an agent. This avoids importing the agent package from the root package.

type Blob

type Blob struct {
	// MIMEType identifies the data format (e.g. "image/png", "audio/mp3").
	MIMEType string `json:"mimeType,omitempty"`

	// Data is the raw binary content.
	Data []byte `json:"data,omitempty"`
}

Blob holds inline binary data (images, audio, documents) with a MIME type.

Use this for embedding small binary payloads directly in a message. For large files, prefer FileData with a URI reference instead.

type Candidate

type Candidate struct {
	// Content is the generated message (text, tool calls, or both).
	Content *Content `json:"content,omitempty"`

	// FinishReason explains why the model stopped generating this candidate.
	FinishReason FinishReason `json:"finishReason,omitempty"`

	// Index is the candidate's position when multiple candidates are returned (usually 0).
	Index int `json:"index,omitempty"`

	// TokenCount is the number of tokens in this specific candidate (backend-dependent).
	TokenCount int32 `json:"tokenCount,omitempty"`

	// FinishMessage is an optional human-readable message about why generation stopped.
	FinishMessage string `json:"finishMessage,omitempty"`
}

Candidate is one possible completion from the model.

Most requests produce a single candidate. Each candidate has the generated Content, a FinishReason explaining why generation stopped, and optional token count metadata.

type CodeExecutionResult

type CodeExecutionResult struct {
	// Outcome indicates whether execution succeeded, failed, or timed out.
	Outcome CodeOutcome `json:"outcome,omitempty"`

	// Output is the stdout/stderr captured during execution.
	Output string `json:"output,omitempty"`
}

CodeExecutionResult holds the outcome and output of running model-generated code.

type CodeLanguage

type CodeLanguage string

CodeLanguage identifies the programming language of executable code.

const (
	// CodeLanguageUnspecified means the language was not specified.
	CodeLanguageUnspecified CodeLanguage = "LANGUAGE_UNSPECIFIED"

	// CodeLanguagePython indicates Python code.
	CodeLanguagePython CodeLanguage = "PYTHON"
)

type CodeOutcome

type CodeOutcome string

CodeOutcome indicates how code execution ended.

const (
	// CodeOutcomeUnspecified means the outcome was not reported.
	CodeOutcomeUnspecified CodeOutcome = "OUTCOME_UNSPECIFIED"

	// CodeOutcomeOK means execution completed successfully.
	CodeOutcomeOK CodeOutcome = "OUTCOME_OK"

	// CodeOutcomeFailed means execution encountered an error.
	CodeOutcomeFailed CodeOutcome = "OUTCOME_FAILED"

	// CodeOutcomeDeadlineExceeded means execution timed out.
	CodeOutcomeDeadlineExceeded CodeOutcome = "OUTCOME_DEADLINE_EXCEEDED"
)

type Content

type Content struct {
	// Role identifies who produced this content (user, model, system, or tool).
	Role Role `json:"role,omitempty"`

	// Parts holds the payload segments. A single Content can have multiple parts,
	// for example a model response with both text and a tool call, or a user message
	// with text and an image.
	Parts []*Part `json:"parts,omitempty"`
}

Content is a single turn in a conversation: a role plus one or more parts.

A conversation history is a slice of Content values, alternating between user messages, model responses, tool-call requests, and tool results. Each Content carries a Role identifying the sender and a slice of Part values holding the actual payload.

Examples of what a Content can represent:

  • A user's text question: Role=RoleUser, Parts=[TextPart("What is 2+2?")]
  • A model's text answer: Role=RoleModel, Parts=[TextPart("4")]
  • A model requesting a tool call: Role=RoleModel, Parts=[FunctionCallPart(...)]
  • Tool results sent back to model: Role=RoleTool, Parts=[FunctionResponsePart(...)]
  • A multimodal user message: Role=RoleUser, Parts=[TextPart("describe this"), BlobPart("image/png", data)]

func NewFunctionCallContent

func NewFunctionCallContent(calls ...*FunctionCall) *Content

NewFunctionCallContent returns model Content containing one or more function calls (for tool-call turns).

func NewFunctionResponseContent

func NewFunctionResponseContent(responses ...*FunctionResponse) *Content

NewFunctionResponseContent returns tool Content containing one or more function results (for tool-response turns).

func NewModelContent

func NewModelContent(parts ...*Part) *Content

NewModelContent returns model Content built from the given parts.

func NewTextContent

func NewTextContent(role Role, text string) *Content

NewTextContent creates a Content with the given role and a single text Part.

This is the most common way to build conversation messages:

userMsg := ago.NewTextContent(ago.RoleUser, "What is 2+2?")
sysMsg  := ago.NewTextContent(ago.RoleSystem, "You are a helpful assistant.")

func NewUserContent

func NewUserContent(parts ...*Part) *Content

NewUserContent returns user Content built from the given parts (text, blobs, file refs, etc.).

type ExecutableCode

type ExecutableCode struct {
	// Code is the source code the model generated.
	Code string `json:"code,omitempty"`

	// Language identifies the programming language (e.g. CodeLanguagePython).
	Language CodeLanguage `json:"language,omitempty"`
}

ExecutableCode is code produced by the model for execution.

Some backends support code-execution flows where the model can write code, have it executed, and use the output. This struct holds the generated code and its language.

type ExecutorResult

type ExecutorResult struct {
	Response *Response  // final LLM response (or synthetic response for SkipSynthesis)
	History  []*Content // full conversation including tool calls/responses
}

ExecutorResult holds the outcome of an executor run.

func Run

func Run(ctx context.Context, agent AgentConfig, contents []*Content) (*ExecutorResult, error)

Run executes the agentic loop for the given agent config synchronously.

type FileData

type FileData struct {
	// MIMEType identifies the file format (e.g. "application/pdf", "image/jpeg").
	MIMEType string `json:"mimeType,omitempty"`

	// FileURI is the URI where the file can be accessed (e.g. "gs://bucket/file.pdf").
	FileURI string `json:"fileUri,omitempty"`

	// DisplayName is a human-readable label for the file (optional).
	DisplayName string `json:"displayName,omitempty"`
}

FileData references external data by URI, with optional metadata.

Use this to point the model at files stored externally (e.g. Google Cloud Storage, uploaded files) without embedding the raw bytes in the message.

type FinishReason

type FinishReason string

FinishReason explains why the model stopped generating.

The executor uses FinishReason to decide what to do next: if the model stopped to make a tool call (FinishReasonToolCall), the executor runs the tool and loops back; if the model stopped normally (FinishReasonStop), the executor returns the final response.

const (
	// FinishReasonStop means the model finished generating naturally (produced a complete response).
	FinishReasonStop FinishReason = "stop"

	// FinishReasonMaxTokens means the model hit the MaxOutputTokens limit before completing.
	FinishReasonMaxTokens FinishReason = "max_tokens"

	// FinishReasonToolCall means the model stopped to request one or more tool invocations.
	// The executor will execute the tools and feed results back for the next iteration.
	FinishReasonToolCall FinishReason = "tool_call"

	// FinishReasonSafety means the response was blocked by safety/content filters.
	FinishReasonSafety FinishReason = "safety"

	// FinishReasonError means an error occurred during generation.
	FinishReasonError FinishReason = "error"
)

type FunctionCall

type FunctionCall struct {
	// ID is a backend-assigned identifier for this tool call (used by OpenAI to match
	// tool calls with their responses; may be empty for other backends).
	ID string `json:"id,omitempty"`

	// Name is the tool/function name the model wants to invoke. Must match a registered
	// Tool's Name() return value.
	Name string `json:"name,omitempty"`

	// Args is the map of argument name to value that the model wants to pass to the tool.
	// Values are JSON-decoded: strings, float64 for numbers, bools, nested maps/slices.
	Args map[string]any `json:"args,omitempty"`
}

FunctionCall represents the model requesting a tool/function invocation.

When the model decides it needs to use a tool, it returns a Part with a FunctionCall containing the tool name and a map of arguments. The executor matches the Name against registered Tool implementations and calls Tool.Execute with the Args.

The ID field is backend-specific: OpenAI uses it to correlate tool calls with responses; GenAI may leave it empty. The executor handles ID propagation automatically.
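
The note about JSON-decoded values is worth seeing concretely: when Go's encoding/json unmarshals into map[string]any, every JSON number arrives as a float64, so a tool reading a numeric argument must assert float64 rather than int. A minimal stdlib-only demonstration:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decodeArgs shows why FunctionCall.Args values carry float64 for numbers:
// encoding/json decodes any JSON number into float64 when the target is map[string]any.
func decodeArgs(raw string) (map[string]any, error) {
	var args map[string]any
	if err := json.Unmarshal([]byte(raw), &args); err != nil {
		return nil, err
	}
	return args, nil
}

func main() {
	args, _ := decodeArgs(`{"query": "go", "limit": 10}`)
	fmt.Printf("%T %T\n", args["query"], args["limit"]) // string float64
}
```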

type FunctionDeclaration

type FunctionDeclaration struct {
	// Name is the unique identifier for this tool. The model uses this name when requesting
	// a tool call, and the executor uses it to look up the Tool implementation.
	Name string

	// Description tells the model what this tool does and when to use it.
	// Be specific: "Calculate mathematical expressions" is better than "Math tool".
	Description string

	// Parameters defines the arguments this tool accepts as a JSON Schema.
	// Use TypeObject with Properties and Required to define named parameters.
	// Set to nil or an empty object schema if the tool takes no arguments.
	Parameters *Schema
}

FunctionDeclaration describes a callable tool that can be offered to the model.

Each FunctionDeclaration has a name (used by the model to invoke it), a description (tells the model when and how to use it), and a parameter schema (tells the model what arguments to pass).

You typically don't create these directly — implement the Tool interface instead and let Tool.Declaration() return the FunctionDeclaration. The executor collects declarations from all registered tools and passes them to the LLM.

type FunctionResponse

type FunctionResponse struct {
	// ID matches the FunctionCall.ID this response is for (required by OpenAI, optional elsewhere).
	ID string `json:"id,omitempty"`

	// Name is the tool name this response is for (matches the original FunctionCall.Name).
	Name string `json:"name,omitempty"`

	// Response is the tool's output as a key-value map. On success, this is whatever the
	// tool returned. On tool-level errors, this is {"error": "message"} and the loop continues.
	Response map[string]any `json:"response,omitempty"`
}

FunctionResponse is the result of a tool invocation, sent back to the model.

After the executor runs a tool, it wraps the result in a FunctionResponse and appends it to the conversation history. The model reads these to incorporate tool output into its next response.

type GenerateConfig

type GenerateConfig struct {
	// MaxOutputTokens caps the number of tokens the model can generate in one response.
	// 0 means use the backend's default limit.
	MaxOutputTokens int

	// Temperature controls randomness in generation. Higher values (e.g. 0.8) make output
	// more random; lower values (e.g. 0.2) make it more deterministic. nil = backend default.
	Temperature *float64

	// TopP controls nucleus sampling: only tokens with cumulative probability <= TopP are
	// considered. nil = backend default (usually 1.0).
	TopP *float64

	// TopK limits sampling to the top K most likely tokens. nil = backend default.
	// Note: not all backends support TopK (OpenAI does not).
	TopK *float64

	// StopSequences is a list of strings that, when generated, cause the model to stop.
	// Empty means no custom stop sequences.
	StopSequences []string

	// ResponseMIMEType constrains the response format (e.g. "application/json" for JSON mode).
	// Empty means no constraint (default text).
	ResponseMIMEType string

	// ResponseSchema defines the expected JSON structure when ResponseMIMEType is "application/json".
	// The model will attempt to produce output matching this schema.
	ResponseSchema *Schema

	// ThinkingConfig enables and configures extended reasoning/thinking tokens.
	// Set Enabled=true and Budget to the max thinking tokens. nil = thinking disabled.
	ThinkingConfig *ThinkingConfig

	// Seed sets a deterministic seed for reproducible generation. nil = non-deterministic.
	// Note: reproducibility is best-effort and not guaranteed by all backends.
	Seed *int

	// PresencePenalty penalizes tokens that have already appeared in the output, encouraging
	// the model to talk about new topics. Range: typically -2.0 to 2.0. nil = no penalty.
	PresencePenalty *float64

	// FrequencyPenalty penalizes tokens proportional to how often they've appeared, reducing
	// repetition. Range: typically -2.0 to 2.0. nil = no penalty.
	FrequencyPenalty *float64
}

GenerateConfig holds optional sampling and output parameters for an LLM request.

This struct controls how the model generates (temperature, token limits, penalties, etc.), not what it generates from. System instructions and tools are configured on the Agent and injected automatically by the executor; do not set them here.

Set this on Agent.Config to apply defaults to every request for that agent:

a := &agent.Agent{
    Name:         "my-agent",
    Backend:      agent.BackendOpenAI,
    Model:        "gpt-4o",
    SystemPrompt: "You are a helpful assistant.",
    Tools:        []ago.Tool{myTool},
    Config: &ago.GenerateConfig{
        MaxOutputTokens: 1000,
        Temperature:     &[]float64{0.7}[0],
    },
}

All fields are optional. Zero values mean "use the backend's default".
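
The &[]float64{0.7}[0] trick above works but is noisy. A small generic helper (a common Go idiom since generics landed in 1.18, not part of the ago API) reads better for pointer-typed optional fields like Temperature or TopP:

```go
package main

import "fmt"

// ptr returns a pointer to any value. Handy for optional pointer fields such as
// GenerateConfig.Temperature; this is a generic helper you define yourself,
// not something ago exports.
func ptr[T any](v T) *T { return &v }

func main() {
	temp := ptr(0.7) // *float64, cleaner than &[]float64{0.7}[0]
	fmt.Println(*temp)
}
```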

type GenerateParams

type GenerateParams struct {
	// Contents is the conversation history: a sequence of user messages, model responses,
	// tool calls, and tool results that form the context for this generation.
	Contents []*Content

	// Config holds sampling parameters (temperature, max tokens, penalties, etc.).
	// May be nil to use backend defaults.
	Config *GenerateConfig

	// SystemInstruction is the system prompt as a Content value.
	// Set by the executor from Agent.SystemPrompt. Backends inject this into their
	// native system-message mechanism (e.g. OpenAI system message, GenAI config field).
	SystemInstruction *Content

	// Tools is the list of tool declarations available to the model for this request.
	// Set by the executor from Agent.Tools. The model can choose to call any of these.
	Tools []*FunctionDeclaration
}

GenerateParams bundles everything the LLM needs for a single generation request.

The executor builds this from the Agent's configuration and the conversation history, then passes it to LLM.Generate or LLM.GenerateStream. This keeps GenerateConfig focused on sampling parameters while GenerateParams carries the full request context.

You typically don't create this directly — the executor (ago.Run / ago.RunSSE) builds it from the AgentConfig. If you're calling LLM.Generate directly (advanced usage), you'll need to construct this yourself.

type LLM

type LLM interface {
	// Name returns a stable identifier for this backend (e.g. "openai", "genai").
	Name() string

	// Generate performs a single non-streaming completion request.
	// The model name, conversation history, system instruction, tools, and sampling config
	// are all provided via the GenerateParams struct.
	// Returns the full response with candidates and usage, or an error.
	Generate(ctx context.Context, model string, params *GenerateParams) (*Response, error)

	// GenerateStream performs a streaming completion request, yielding chunks as they arrive.
	// Each yielded StreamChunk may contain partial text or tool calls. The final chunk has
	// Complete=true with definitive candidates and usage data.
	// Iterate the returned sequence with a range loop; break early to cancel.
	GenerateStream(ctx context.Context, model string, params *GenerateParams) iter.Seq2[*StreamChunk, error]

	// Close releases any resources (connections, clients) held by the backend.
	// Call this when the LLM is no longer needed. Safe to call multiple times.
	Close() error
}

LLM is the interface that all LLM backend implementations must satisfy.

Each backend (OpenAI, Google GenAI, etc.) implements this interface to provide a uniform way to generate completions. The executor calls Generate or GenerateStream; users can also call them directly for advanced use cases that bypass the agentic loop.

Implementors must handle:

  • Converting GenerateParams.Contents to the backend's native message format.
  • Injecting GenerateParams.SystemInstruction via the backend's system-message mechanism.
  • Converting GenerateParams.Tools to the backend's native tool/function-call format.
  • Applying GenerateParams.Config sampling parameters.
  • Mapping the backend's response back to ago types (Response, StreamChunk, etc.).

The Name method returns a stable identifier (e.g. "openai", "genai") used for logging and debugging. Close releases any resources held by the backend.

type ModalityTokenCount

type ModalityTokenCount struct {
	// Modality is the content type (e.g. "TEXT", "IMAGE").
	Modality string `json:"modality,omitempty"`

	// TokenCount is the number of tokens for this modality.
	TokenCount int32 `json:"tokenCount,omitempty"`
}

ModalityTokenCount reports the token count for a specific modality (e.g. TEXT, IMAGE, AUDIO).

Some backends break down token usage by modality so you can see how many tokens were consumed by text vs. images vs. other content types.

type Part

type Part struct {
	// Text holds plain text content. Non-empty for text messages, empty otherwise.
	Text string `json:"text,omitempty"`

	// FunctionCall is set when the model requests a tool invocation.
	// Contains the tool name and arguments. The executor uses this to dispatch to the
	// matching Tool implementation.
	FunctionCall *FunctionCall `json:"functionCall,omitempty"`

	// FunctionResponse is set when returning a tool's result to the model.
	// Contains the tool name and the response data (key-value map). The executor
	// builds these automatically after executing tools.
	FunctionResponse *FunctionResponse `json:"functionResponse,omitempty"`

	// InlineData holds binary data (e.g. images, audio clips) embedded directly in the
	// message, along with its MIME type. Use BlobPart() to create these.
	InlineData *Blob `json:"inlineData,omitempty"`

	// FileData references external data by URI (e.g. a GCS path or uploaded file URL).
	// Use FileDataPart() to create these.
	FileData *FileData `json:"fileData,omitempty"`

	// ExecutableCode holds code the model produced for execution (used in code-execution
	// flows where the model can write and run code).
	ExecutableCode *ExecutableCode `json:"executableCode,omitempty"`

	// CodeExecutionResult holds the outcome and output of running model-generated code.
	CodeExecutionResult *CodeExecutionResult `json:"codeExecutionResult,omitempty"`

	// VideoMetadata provides timing and FPS info for video content in InlineData or FileData.
	VideoMetadata *VideoMetadata `json:"videoMetadata,omitempty"`

	// Thought is true when this part contains model reasoning/thinking tokens.
	// These are typically passed through opaquely and not displayed to the end user.
	Thought bool `json:"thought,omitempty"`

	// ThoughtSignature is an opaque signature for thought tokens, used by some backends
	// to validate or reference thinking content across turns.
	ThoughtSignature []byte `json:"thoughtSignature,omitempty"`
}

Part is one segment of a Content message.

A Part carries exactly one kind of payload. Set one (and only one) of the fields below. The executor and LLM backends inspect which field is non-zero to determine the part type.

Common part types:

  • Text: Plain text (user question, model answer, system prompt).
  • FunctionCall: The model is requesting a tool invocation (name + arguments).
  • FunctionResponse: The result of a tool invocation, sent back to the model.
  • InlineData: Binary data (image, audio) embedded directly in the message.
  • FileData: A reference to external data by URI (e.g. GCS, uploaded file).
  • ExecutableCode: Code the model generated for execution (code-execution flows).
  • CodeExecutionResult: The output of running model-generated code.

The Thought and ThoughtSignature fields are used when the model returns reasoning/thinking tokens (e.g. with ThinkingConfig enabled). These are typically opaque to the caller.

Use the constructor functions (TextPart, FunctionCallPart, BlobPart, etc.) for convenience.
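
The "set exactly one field" convention can be sketched with local stand-in types (not the real ago.Part); the switch mirrors how the executor and backends discriminate part types by checking which payload field is non-zero:

```go
package main

import "fmt"

// Local stand-ins for ago's Part payloads; illustrative only.
type functionCall struct{ Name string }

type part struct {
	Text         string
	FunctionCall *functionCall
}

// kind returns the part type by finding the non-zero payload field,
// the same discrimination backends perform when converting parts.
func kind(p *part) string {
	switch {
	case p.FunctionCall != nil:
		return "function_call"
	case p.Text != "":
		return "text"
	default:
		return "empty"
	}
}

func main() {
	fmt.Println(kind(&part{Text: "hi"}))
	fmt.Println(kind(&part{FunctionCall: &functionCall{Name: "search"}}))
}
```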

func BlobPart

func BlobPart(mimeType string, data []byte) *Part

BlobPart returns a Part with inline binary data (e.g. image) and the given MIME type.

func FileDataPart

func FileDataPart(mimeType, fileURI string) *Part

FileDataPart returns a Part that references data by URI (e.g. GCS or uploaded file).

func FunctionCallPart

func FunctionCallPart(id, name string, args map[string]any) *Part

FunctionCallPart returns a Part representing a single function call (id, name, args).

func FunctionResponsePart

func FunctionResponsePart(id, name string, resp map[string]any) *Part

FunctionResponsePart returns a Part representing a single function result (id, name, response).

func TextPart

func TextPart(text string) *Part

TextPart returns a Part containing only plain text.

type Response

type Response struct {
	// Candidates holds the generated completions. Typically contains one candidate.
	Candidates []*Candidate

	// Usage reports token consumption for this request.
	Usage TokenUsage

	// ModelVersion is the specific model version string returned by the backend
	// (e.g. "gpt-4o-2024-05-13" or "gemini-1.5-pro-001").
	ModelVersion string
}

Response is the result of a non-streaming LLM.Generate call.

It contains one or more Candidate completions (most commonly just one), token usage statistics, and the model version string reported by the backend.

In the executor loop, the first candidate's Content is what gets appended to the conversation history and checked for tool calls.

type Role

type Role string

Role identifies the sender of a message in a conversation.

Every Content in a conversation history has a Role that tells the LLM who produced it. The executor and LLM backends use Role to correctly format messages for each provider's API.

Standard roles:

  • RoleUser: Messages from the end user (questions, instructions, follow-ups).
  • RoleModel: Messages from the LLM (text responses, tool-call requests).
  • RoleSystem: System instructions that guide the model's behavior. Typically set once via Agent.SystemPrompt; the framework injects it automatically.
  • RoleTool: Messages carrying tool/function execution results, sent back to the model so it can incorporate the tool output into its next response.
const (
	// RoleUser represents a message from the end user.
	RoleUser Role = "user"

	// RoleModel represents a message generated by the LLM.
	RoleModel Role = "model"

	// RoleSystem represents a system instruction that shapes the model's behavior.
	// You typically don't create system Content manually — set Agent.SystemPrompt instead
	// and the framework handles injection into the LLM request.
	RoleSystem Role = "system"

	// RoleTool represents a message carrying tool/function call results back to the model.
	// The executor creates these automatically after executing tools; you rarely build them by hand.
	RoleTool Role = "tool"
)

type Schema

type Schema struct {
	// Type is the JSON Schema type (string, number, integer, boolean, array, object).
	Type SchemaType

	// Description explains what this value represents. Shown to the model to guide
	// argument generation — be specific and helpful.
	Description string

	// Enum restricts the value to one of the listed strings (only for TypeString).
	Enum []string

	// Properties defines the fields of an object schema (only for TypeObject).
	// Each key is a field name; the value is the field's schema.
	Properties map[string]*Schema

	// Required lists the field names that must be present (only for TypeObject).
	Required []string

	// Items defines the element schema for array types (only for TypeArray).
	Items *Schema

	// Format is an optional format hint (e.g. "date-time", "email", "uri").
	Format string

	// Nullable indicates the value may be null.
	Nullable bool
}

Schema defines the shape of tool parameters or structured response JSON.

This is a simplified JSON Schema representation used to tell the model what arguments a tool expects and what shape its output should have. LLM backends convert this to their native schema format (e.g. OpenAI JSON Schema, GenAI Schema).

Example — a tool that takes a required "query" string and optional "limit" integer:

&ago.Schema{
    Type: ago.TypeObject,
    Properties: map[string]*ago.Schema{
        "query": {Type: ago.TypeString, Description: "The search query to run"},
        "limit": {Type: ago.TypeInteger, Description: "Maximum number of results (default 10)"},
    },
    Required: []string{"query"},
}

type SchemaType

type SchemaType string

SchemaType is a JSON Schema primitive type used to define tool parameter shapes and structured output formats.

Use these constants when building Schema definitions for tool parameters:

&ago.Schema{
    Type: ago.TypeObject,
    Properties: map[string]*ago.Schema{
        "query": {Type: ago.TypeString, Description: "Search query"},
        "limit": {Type: ago.TypeInteger, Description: "Max results"},
    },
    Required: []string{"query"},
}
const (
	// TypeString represents a JSON string value.
	TypeString SchemaType = "string"

	// TypeNumber represents a JSON number (float64).
	TypeNumber SchemaType = "number"

	// TypeInteger represents a JSON integer.
	TypeInteger SchemaType = "integer"

	// TypeBoolean represents a JSON boolean.
	TypeBoolean SchemaType = "boolean"

	// TypeArray represents a JSON array. Set Items to define the element schema.
	TypeArray SchemaType = "array"

	// TypeObject represents a JSON object. Set Properties and Required to define the shape.
	TypeObject SchemaType = "object"
)

type StreamChunk

type StreamChunk struct {
	// Candidates holds partial or complete candidate data for this chunk.
	Candidates []*Candidate `json:"candidates,omitempty"`

	// Usage is populated on the final chunk with complete token usage statistics.
	Usage *TokenUsage `json:"usageMetadata,omitempty"`

	// Complete is true only on the final chunk, indicating the response is fully received.
	// The final chunk's Candidates and Usage are the source of truth.
	Complete bool `json:"complete,omitempty"`

	// ErrorMessage is set if the stream encountered an error.
	ErrorMessage string `json:"error,omitempty"`
}

StreamChunk is a single piece of a streaming LLM response.

When using LLM.GenerateStream or ago.RunSSE, the response arrives as a sequence of StreamChunk values. Each chunk may contain partial text, partial tool calls, or usage metadata. The final chunk has Complete=true and contains the definitive candidates and usage data (source of truth — earlier chunks may have partial/incomplete data).

Typical streaming flow:

  1. Several chunks with partial text in Candidates[0].Content.Parts[0].Text
  2. A final chunk with Complete=true, full candidates, and usage metadata

type ThinkingConfig

type ThinkingConfig struct {
	// Enabled turns on thinking/reasoning mode.
	Enabled bool

	// Budget is the maximum number of thinking tokens the model may use.
	Budget int
}

ThinkingConfig enables and configures extended reasoning tokens for models that support it (e.g. Gemini with thinking mode).

When enabled, the model may produce "thinking" parts in its response that show its reasoning process. These are captured as Part values with Thought=true.

type TokenUsage

type TokenUsage struct {
	// PromptTokenCount is the total number of tokens in the input (prompt + history).
	PromptTokenCount int32 `json:"promptTokenCount,omitempty"`

	// CandidatesTokenCount is the total tokens in the model's output (completion).
	CandidatesTokenCount int32 `json:"candidatesTokenCount,omitempty"`

	// TotalTokenCount is PromptTokenCount + CandidatesTokenCount.
	TotalTokenCount int32 `json:"totalTokenCount,omitempty"`

	// CachedContentTokenCount is the number of prompt tokens served from cache.
	CachedContentTokenCount int32 `json:"cachedContentTokenCount,omitempty"`

	// ThoughtsTokenCount is the number of tokens consumed by model reasoning/thinking.
	ThoughtsTokenCount int32 `json:"thoughtsTokenCount,omitempty"`

	// ToolUsePromptTokenCount is the number of prompt tokens attributed to tool definitions.
	ToolUsePromptTokenCount int32 `json:"toolUsePromptTokenCount,omitempty"`

	// TrafficType is a backend-specific traffic classification string.
	TrafficType string `json:"trafficType,omitempty"`

	// CacheTokensDetails breaks down cached tokens by modality.
	CacheTokensDetails []*ModalityTokenCount `json:"cacheTokensDetails,omitempty"`

	// CandidatesTokensDetails breaks down completion tokens by modality.
	CandidatesTokensDetails []*ModalityTokenCount `json:"candidatesTokensDetails,omitempty"`

	// PromptTokensDetails breaks down prompt tokens by modality.
	PromptTokensDetails []*ModalityTokenCount `json:"promptTokensDetails,omitempty"`

	// ToolUsePromptTokensDetails breaks down tool-definition tokens by modality.
	ToolUsePromptTokensDetails []*ModalityTokenCount `json:"toolUsePromptTokensDetails,omitempty"`
}

TokenUsage reports detailed token consumption for an LLM request.

This includes prompt tokens, completion (candidate) tokens, cached tokens, thinking tokens, tool-use tokens, and per-modality breakdowns. Not all fields are populated by every backend — check your provider's documentation for which fields are available.

Use this to monitor costs, debug context-length issues, or implement token budgeting.

type Tool

type Tool interface {
	Name() string
	Declaration() *FunctionDeclaration
	Execute(ctx context.Context, args map[string]any) (*ToolResult, error)
	Options() ToolOptions
}

Tool is the interface for all tools usable by the executor.

type ToolOptions

type ToolOptions struct {
	// SkipSynthesis, when true, causes the executor to return the tool result
	// directly without an additional LLM synthesis turn.
	SkipSynthesis bool
}

ToolOptions configures tool behavior within the executor loop.

type ToolResult

type ToolResult struct {
	Response map[string]any
	Error    error
}

ToolResult holds the outcome of a tool execution. Response is sent back to the model as FunctionResponse data. Error is a tool-level, recoverable error: it is sent to the model as {"error": "..."} and the executor loop continues, so the model can retry or adjust.

type VideoMetadata

type VideoMetadata struct {
	// StartOffset is the start time of the video segment.
	StartOffset time.Duration `json:"startOffset,omitempty"`

	// EndOffset is the end time of the video segment.
	EndOffset time.Duration `json:"endOffset,omitempty"`

	// FPS is the frames-per-second of the video (nil if unspecified).
	FPS *float64 `json:"fps,omitempty"`
}

VideoMetadata describes a video segment's timing and frame rate.

Attach this to a Part that contains video data (via InlineData or FileData) to specify which portion of the video the model should consider.

Directories

Path	Synopsis
cmd	example command
