
query()

Query the knowledge base with natural language. Returns either a complete result dict or a streaming response.
result = client.query("What is ACME's revenue?")
print(result["summary"])

Parameters

query (string, required)
  Natural-language question to answer.

stream (bool, default: False)
  If True, return a StreamingResponse that yields tokens as they arrive.

response_mode (string, default: "chat")
  Controls answer depth and reasoning style.
    "chat"      Concise, direct answers. Fastest.
    "thinking"  Includes reasoning chains and cross-document analysis.
    "research"  Exhaustive multi-hop research with parallel strategies. Slowest.

query_depth (string)
  Override the retrieval depth independently of response_mode. Values: "basic", "thinking", "research".

model (string)
  LLM model override (e.g. "gpt-4o"). Uses the default cost-efficient model if not specified.

session_id (string)
  Explicit conversation session ID to continue. Use this or maintain_context, not both.

maintain_context (bool, default: False)
  If True, maintain conversation state across queries. The client tracks the session ID automatically.

include_summary (bool, default: True)
  If True, include the AI-generated summary. Set to False for raw fact retrieval.

web_search_enabled (bool, default: False)
  Enable web search augmentation for questions that may need external information.

conversation_upload_ids (List[str])
  List of upload IDs to include as additional context for this query.
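For example, a multi-turn conversation can be run either by letting the client track the session (maintain_context=True) or by passing the session_id from an earlier result explicitly. A sketch using only the parameters and response keys documented on this page:

```python
# Option A: let the client track the session automatically
r1 = client.query("What is ACME's revenue?", maintain_context=True)
r2 = client.query("How does that compare to last year?", maintain_context=True)

# Option B: continue an explicit session using the ID from an earlier result
first = client.query("What is ACME's revenue?")
followup = client.query(
    "How does that compare to last year?",
    session_id=first["session_id"],
)
```

Remember to use one mechanism or the other for a given conversation, not both.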

Non-streaming response

result = client.query("What is ACME's revenue?")
Returns a dict with:
{
  "success": true,
  "summary": "ACME Corp reported $50M revenue in Q4 2025...",
  "session_id": "sess_abc123",
  "total_facts": 12,
  "total_chunks": 5,
  "metadata": {
    "entities": ["ACME Corp"],
    "model": "gpt-4o-mini",
    "search_time": "1.2s"
  }
}
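The fields above can be read directly from the dict; for instance (key names as shown in the example response):

```python
result = client.query("What is ACME's revenue?")
if result["success"]:
    print(result["summary"])
    print(f'{result["total_facts"]} facts from {result["total_chunks"]} chunks')
    print("Model used:", result["metadata"]["model"])
```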

Streaming response

resp = client.query("Summarize Q4 results", stream=True)
for token in resp:
    print(token, end="", flush=True)

# Metadata available after iteration
print(resp.total_facts)
print(resp.sources)
See StreamingResponse for all available properties.

Response modes

# Fast, concise answer
result = client.query("Who is ACME's CEO?", response_mode="chat")

# Reasoning chains included
result = client.query(
    "Why did revenue decline in Q3?",
    response_mode="thinking"
)

# Exhaustive multi-hop research
result = client.query(
    "Compare ACME and Widget Corp's growth trajectories",
    response_mode="research"
)
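Because query_depth overrides retrieval independently of response_mode, the two can be mixed; for example, a concise "chat"-style answer backed by deeper retrieval (parameter values taken from the tables above):

```python
# Concise answer style, but with "research"-depth retrieval
result = client.query(
    "Which segments drove ACME's growth?",
    response_mode="chat",
    query_depth="research",
)
```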

query_facts()

Fast fact retrieval without AI summary generation. Returns the same dict structure but with raw graph facts and vector chunks only.
facts = client.query_facts("ACME revenue")

Parameters

query (string, required)
  Natural-language question.

max_results (int, default: 10)
  Maximum number of results to return.
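A call capping the result count might look like this (total_facts as shown in the query() response above, since the structure is the same):

```python
facts = client.query_facts("ACME revenue", max_results=5)
print(facts["total_facts"])
```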

Returns

Same dict structure as query(), but the summary field will be empty or minimal since no LLM generation occurs.

Insufficient coverage

When the knowledge base has zero relevant facts for a query, Vrin returns early without calling the LLM:
result = client.query("What is the weather on Mars?")
if result.get("insufficient_coverage"):
    print("No relevant knowledge found for this query")
In streaming mode, check resp.insufficient_coverage after iteration completes.
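A small helper illustrating the check (pure Python over the dict shape shown above; the helper name is ours, not part of the SDK):

```python
def has_coverage(result: dict) -> bool:
    """Return True when the knowledge base produced relevant facts.

    Assumes the documented dict shape: an "insufficient_coverage"
    flag that is present and truthy only when Vrin returned early
    without calling the LLM.
    """
    return not result.get("insufficient_coverage", False)

# Sample dicts mirroring the documented responses:
assert has_coverage({"success": True, "summary": "...", "total_facts": 12})
assert not has_coverage({"insufficient_coverage": True})
```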