Runtime hints

Which tool to use per runtime: short-context agents, reasoners, IDEs, REST.

RAGfly — Runtime Hints

The MCP protocol and REST API are agent-agnostic. This document complements the technical reference with behavioral guidance for each runtime type: when to use which tool, how to read responses, and what to avoid.


Short-context agents (Codex, GPT-4o-mini, Haiku, small models)

Recommended pattern: direct sequence without iteration.

estado_sesion()                         # confirm connection
→ buscar_chunks(q="your query", limite=5) # direct semantic search
→ use chunks[].texto in the prompt        # no re-processing needed

Or if the corpus is large and a generated response is needed:

preguntar(texto="your question")        # full RAG in one call
→ use the response directly

Why: these models benefit from short, complete flows. Calling listar_documentos → ver_documento → buscar_chunks in sequence wastes context unnecessarily when preguntar already does the full RAG.

Useful field: chunks[].texto comes pre-processed, so no extraction or cleaning is needed. The search response carries several scores per document (rrf_score, similitud_max, score_rerank) and one per chunk (similitud). Use the min_similitud parameter to filter at the source rather than post-processing manually.


Autonomous reasoning agents (Claude, o1/o3, Gemini 2.5 Pro)

When to use each tool:

| Situation | Recommended tool |
| --- | --- |
| Free natural language query | preguntar: full RAG with citations |
| You already have a codigo_documento | ver_documento: detail without search |
| You need raw chunks (for own reranking, score calculation, manual synthesis) | buscar_chunks |
| You need cross-document synthesis over a workspace | ejecutar_habilidad with a RESUMIR or ANALIZAR skill |
| You want to know what documents are available before asking | listar_documentos(estado="VECTORIZADO") |

Scores: RAGfly returns multiple scores per result: rrf_score (hybrid fusion), similitud_max (best vector similarity for the doc), score_rerank (Cohere reranker), and similitud per chunk. A low score on a specific question may indicate the corpus does not contain the answer — better to respond "no evidence found" than to force a response with irrelevant chunks. Use the min_similitud parameter to filter at source.
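The abstention rule above can be sketched as a small client-side check. This is only an illustration: the function names and the 0.5 threshold are assumptions, not RAGfly defaults, and filtering at the source with min_similitud remains the preferred route. The chunk dictionaries use the texto and similitud fields described in this document.

```python
NO_EVIDENCE = "no evidence found"


def select_evidence(chunks, min_similitud=0.5):
    """Keep chunks at or above the threshold, highest similitud first.

    The 0.5 default is a hypothetical value chosen for illustration.
    """
    kept = [c for c in chunks if c.get("similitud", 0.0) >= min_similitud]
    return sorted(kept, key=lambda c: c["similitud"], reverse=True)


def answer_or_abstain(chunks, min_similitud=0.5):
    """Return the joined evidence text, or abstain when nothing qualifies."""
    evidence = select_evidence(chunks, min_similitud)
    if not evidence:
        return NO_EVIDENCE
    return "\n".join(c["texto"] for c in evidence)
```

Abstaining when the list is empty keeps an agent from synthesizing an answer out of chunks that only weakly match the question.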

LLM Skills: RAGfly has skills configurable by the group administrator (see listar_habilidades). Before implementing your own synthesis, check if a skill already does what you need.


IDE-embedded agents (Cursor, Cline, Continue.dev, Copilot with MCP)

Citations — not plain text: RAGfly returns chunks with structured metadata. Render as references, not inline:

{
  "texto": "The contract establishes a 5% penalty...",
  "nombre_documento": "Supplier_Contract_2024.pdf",
  "tipo_documento": "PDF",
  "similitud": 0.87,
  "score_rerank": 0.94
}

Suggested presentation in IDE:

> Source: Supplier_Contract_2024.pdf (similitud: 0.87)
> "The contract establishes a 5% penalty..."

tipo_documento indicates the origin format: PDF, DOCX, XLSX, TXT, MD, etc. Useful for deciding whether to show an icon or open with a specific viewer.
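As a minimal sketch, the suggested presentation can be produced from a chunk like the one shown above. The helper name format_citation is hypothetical; only the field names (texto, nombre_documento, similitud) come from the example payload.

```python
def format_citation(chunk: dict) -> str:
    """Render one chunk as a two-line quoted reference."""
    source = (
        f"> Source: {chunk['nombre_documento']} "
        f"(similitud: {chunk['similitud']:.2f})"
    )
    quote = f'> "{chunk["texto"]}"'
    return f"{source}\n{quote}"


example = {
    "texto": "The contract establishes a 5% penalty...",
    "nombre_documento": "Supplier_Contract_2024.pdf",
    "tipo_documento": "PDF",
    "similitud": 0.87,
    "score_rerank": 0.94,
}
print(format_citation(example))
```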

Document states: a document with state other than VECTORIZADO will not appear in semantic searches. If the user asks about a doc and it doesn't appear, check with listar_documentos — it may be in the pipeline (CHUNKEADO, ESCANEADO) or with an error (REVISAR, NO_ESCANEABLE).
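One way to turn that check into a user-facing message is sketched below. The grouping follows the states named above; the function name and the message wording are assumptions, and any state not listed here falls through to a generic answer because the full state list is not given in this document.

```python
SEARCHABLE = "VECTORIZADO"
IN_PIPELINE = {"CHUNKEADO", "ESCANEADO"}
ERRORED = {"REVISAR", "NO_ESCANEABLE"}


def explain_state(estado: str) -> str:
    """Map a document state to a short explanation for the user."""
    if estado == SEARCHABLE:
        return "searchable"
    if estado in IN_PIPELINE:
        return "still processing, not searchable yet"
    if estado in ERRORED:
        return "processing failed, needs attention"
    return "unknown state"
```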


REST / n8n / Make / Zapier integrations (no LLM agent)

Minimum flow:

POST /auth/login  →  JWT
GET  /auth/me     →  verify grupo_activo
POST /documentos/buscar-semantico  →  chunks with scores

Pagination: GET /documentos/paginado returns { items, total, page, limit }. Iterate with page=1,2,... until items is empty or page * limit >= total.
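The stopping rule can be sketched as a generator. To keep the example independent of any HTTP client, it takes a fetch_page callable (a hypothetical helper you would implement around GET /documentos/paginado) that returns the { items, total, page, limit } body as a dict; the default limit of 50 is also an assumption.

```python
from typing import Callable, Iterator


def iter_items(
    fetch_page: Callable[[int, int], dict], limit: int = 50
) -> Iterator[dict]:
    """Yield every item, requesting pages 1, 2, ... until exhausted."""
    page = 1
    while True:
        body = fetch_page(page, limit)
        items = body["items"]
        if not items:
            return  # an empty page means there is nothing left
        yield from items
        if page * limit >= body["total"]:
            return  # the last page has been consumed
        page += 1
```

Checking both conditions avoids one extra request when the total is an exact multiple of the limit, and still terminates if total changes between calls.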

SSE Streaming: POST /chat/conversaciones/{id}/mensajes/stream returns Server-Sent Events. In n8n use the HTTP node with streaming mode; in Make use the HTTP module → SSE parse. Text fragments arrive as {"text": "..."}. The final event is {"done": true, "id_mensaje_user": N, "id_mensaje_assistant": N}. See full protocol in REST.md § SSE protocol.
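For integrations that read the stream by hand, a minimal sketch of the consuming side follows. It assumes each event arrives as a single standard SSE `data:` line carrying the JSON payloads described above; the exact framing is defined in REST.md, so treat the parsing details and the function name collect_stream as assumptions.

```python
import json
from typing import Iterable, Optional, Tuple


def collect_stream(lines: Iterable[str]) -> Tuple[str, Optional[dict]]:
    """Concatenate text fragments and return them with the final event."""
    parts = []
    final = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and other SSE fields
        event = json.loads(line[len("data:"):].strip())
        if event.get("done"):
            final = event  # carries id_mensaje_user / id_mensaje_assistant
            break
        if "text" in event:
            parts.append(event["text"])
    return "".join(parts), final
```

The function accepts any iterable of decoded lines, so it can be fed from an HTTP client's line iterator or from a recorded stream during testing.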

Stateless conversations: if your automation doesn't need history, create a new conversation per request and discard it. Simpler than maintaining id_conversacion across executions.


Python SDK (pip install ragfly)

The SDK wraps the REST API. For most cases use it directly:

from ragfly import RAGfly

client = RAGfly(api_key="slm_live_...")
resp = client.ask("What are the penalty clauses?")
print(resp.answer)

# Direct semantic search (without LLM)
results = client.search("contracts 2024", limit=5)
for doc in results.documents:
    print(f"[rrf={doc.rrf_score:.3f}] {doc.nombre}")
    for chunk in doc.chunks[:2]:
        print(f"  similitud={chunk.similitud:.3f}: {chunk.texto[:120]}")

See SDK.md for the full client reference.


Summary — choosing a tool or endpoint

| Need | MCP tool | REST endpoint |
| --- | --- | --- |
| Verify connection | estado_sesion | GET /auth/me |
| Natural language question (RAG) | preguntar | POST /chat/conversaciones/{id}/mensajes/stream |
| Semantic search (raw chunks) | buscar_chunks | POST /documentos/buscar-semantico |
| See what documents exist | listar_documentos | GET /documentos/paginado |
| Document detail | ver_documento | GET /documentos/{codigo} |
| AI synthesis / analysis | ejecutar_habilidad | POST /habilidades/{codigo}/ejecutar |
| Pipeline state | ver_cola | GET /cola-estados-docs/paginado |