
query()

Query the knowledge base with natural language. Returns either a complete result dict or a streaming response.
result = client.query("What is ACME's revenue?")
print(result["summary"])

Parameters

query (string, required)
  Natural-language question to answer.

stream (bool, default: False)
  If True, return a StreamingResponse that yields tokens as they arrive.

response_mode (string, default: "chat")
  Controls answer depth and reasoning style.
    "chat"      Concise, direct answers. Fastest.
    "thinking"  Includes reasoning chains and cross-document analysis.
    "research"  Exhaustive multi-hop research with parallel strategies. Slowest.

query_depth (string)
  Override the retrieval depth independently of response_mode. Values: "basic", "thinking", "research".

model (string)
  LLM model override (e.g. "gpt-4o"). Uses the default cost-efficient model if not specified.

session_id (string)
  Explicit conversation session ID to continue. Use this or maintain_context, not both.

maintain_context (bool, default: False)
  If True, maintain conversation state across queries. The client tracks the session ID automatically.

include_summary (bool, default: True)
  If True, include the AI-generated summary. Set to False for raw fact retrieval.

web_search_enabled (bool, default: False)
  Enable web search augmentation for questions that may need external information.

conversation_upload_ids (List[str])
  List of upload IDs to include as additional context for this query.
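For example, a multi-turn conversation can be run either by letting the client track the session (maintain_context=True) or by passing the session_id from an earlier result explicitly. A sketch using only the parameters and response keys documented on this page:

```python
# Option A: let the client track the session automatically
r1 = client.query("What is ACME's revenue?", maintain_context=True)
r2 = client.query("How does that compare to last year?", maintain_context=True)

# Option B: continue an explicit session using the ID from an earlier result
first = client.query("What is ACME's revenue?")
followup = client.query(
    "How does that compare to last year?",
    session_id=first["session_id"],
)
```

Remember to use one mechanism or the other for a given conversation, not both.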

Non-streaming response

result = client.query("What is ACME's revenue?")
Returns a dict with:
{
  "success": true,
  "summary": "ACME Corp reported $50M revenue in Q4 2025...",
  "session_id": "sess_abc123",
  "total_facts": 12,
  "total_chunks": 5,
  "metadata": {
    "entities": ["ACME Corp"],
    "model": "gpt-4o-mini",
    "search_time": "1.2s"
  }
}
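The fields above can be read directly from the dict; for instance (key names as shown in the example response):

```python
result = client.query("What is ACME's revenue?")
if result["success"]:
    print(result["summary"])
    print(f'{result["total_facts"]} facts from {result["total_chunks"]} chunks')
    print("Model used:", result["metadata"]["model"])
```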

Streaming response

resp = client.query("Summarize Q4 results", stream=True)
for token in resp:
    print(token, end="", flush=True)

# Metadata available after iteration
print(resp.total_facts)
print(resp.sources)
See StreamingResponse for all available properties.

Response modes

# Fast, concise answer
result = client.query("Who is ACME's CEO?", response_mode="chat")

# Reasoning chains included
result = client.query(
    "Why did revenue decline in Q3?",
    response_mode="thinking"
)

# Exhaustive multi-hop research
result = client.query(
    "Compare ACME and Widget Corp's growth trajectories",
    response_mode="research"
)
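Because query_depth overrides retrieval independently of response_mode, the two can be mixed; for example, a concise "chat"-style answer backed by deeper retrieval (parameter values taken from the tables above):

```python
# Concise answer style, but with "research"-depth retrieval
result = client.query(
    "Which segments drove ACME's growth?",
    response_mode="chat",
    query_depth="research",
)
```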

query_facts()

Fast fact retrieval without AI summary generation. Returns the same dict structure but with raw graph facts and vector chunks only.
facts = client.query_facts("ACME revenue")

Parameters

query (string, required)
  Natural-language question.

max_results (int, default: 10)
  Maximum number of results to return.
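A call capping the result count might look like this (total_facts as shown in the query() response above, since the structure is the same):

```python
facts = client.query_facts("ACME revenue", max_results=5)
print(facts["total_facts"])
```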

Returns

Same dict structure as query(), but the summary field will be empty or minimal since no LLM generation occurs.

Insufficient coverage

When the knowledge base has zero relevant facts for a query, Vrin returns early without calling the LLM:
result = client.query("What is the weather on Mars?")
if result.get("insufficient_coverage"):
    print("No relevant knowledge found for this query")
In streaming mode, check resp.insufficient_coverage after iteration completes.
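A small helper illustrating the check (pure Python over the dict shape shown above; the helper name is ours, not part of the SDK):

```python
def has_coverage(result: dict) -> bool:
    """Return True when the knowledge base produced relevant facts.

    Assumes the documented dict shape: an "insufficient_coverage"
    flag that is present and truthy only when Vrin returned early
    without calling the LLM.
    """
    return not result.get("insufficient_coverage", False)

# Sample dicts mirroring the documented responses:
assert has_coverage({"success": True, "summary": "...", "total_facts": 12})
assert not has_coverage({"insufficient_coverage": True})
```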