Developer API
Before using the Notate API, you'll need to generate an API key from the Settings page under Developer Integration. Keys are stored in your local SQLite database.
API Key Management
Key Features
- Generate new API keys
- Add custom names for organization
- Set expiration periods
- View and manage active keys
Key Management
- Copy keys to clipboard
- Delete unused keys
- View key expiration status
API Endpoints
The Notate API provides three main endpoints for different types of interactions with your collections and AI models.
Vector Query
Perform semantic searches across your collections using vector embeddings. Returns the most semantically similar documents based on the input query.
Endpoint:
POST http://127.0.0.1:47372/api/vector
Request Body:
{ "input": "string", // Required - The search query to find similar documents "collection_name": "string", // Required - Collection to search within "top_k": number, // Optional - Number of results to return (default: 5) }
Example Response:
{ "status": "success", "results": [ { "content": "The document content...", "metadata": { "source": "document.txt", "title": "Document Title", "author": "", "description": "Document description...", "keywords": "", "ogImage": "" } } ] }
RAG Query
Retrieves AI-generated responses with source citations using Retrieval-Augmented Generation. This endpoint combines semantic search with LLM capabilities to provide contextually relevant answers.
Important Note:
For external AI providers (OpenAI, Anthropic, etc.), you must specify both the provider and model parameters, and have valid API keys configured in your settings. If you only specify a model without a provider, the system will default to using Ollama with local models.
Endpoint:
POST http://127.0.0.1:47372/api/rag
Request Body:
{ "input": "string", // Required - The query to answer "collection_name": "string", // Required - Collection for context retrieval "model": "string", // Required - Model to use (e.g., "gpt-4", "llama2") "provider": "string", // Required for external APIs (e.g., "openai", "anthropic") - Omit for local models "prompt": "string", // Optional - Custom system prompt "top_k": number, // Optional - Number of context chunks (default: 5) "temperature": number, // Optional - Response creativity (default: 0.5) "max_completion_tokens": number, // Optional - Max response length (default: 2048) "top_p": number, // Optional - Nucleus sampling (default: 1) "frequency_penalty": number, // Optional - Repetition control (default: 0) "presence_penalty": number, // Optional - Topic diversity (default: 0) "is_ooba": boolean // Optional - Use Oobabooga LLM processing (default: false) "is_ollama": boolean // Optional - Use Ollama LLM processing (default: false) "character": string // Optional - Oobabooga character }
Example Response:
{ "id": "local-llama3.2-1735945911", "choices": [{ "finish_reason": "stop", "index": 0, "message": { "content": "Example response from RAG query", "role": "assistant" } }], "created": 1735945911, "model": "llama3.2", "object": "chat.completion", "usage": { "completion_tokens": -1, "prompt_tokens": -1, "total_tokens": -1 } }
LLM Query
Interact directly with the configured language model, with no collection context. Useful for general AI interactions where retrieval from your collections isn't needed.
Important Note:
For external AI providers (OpenAI, Anthropic, etc.), you must specify both the provider and model parameters, and have valid API keys configured in your settings. If you only specify a model without a provider, the system will default to using Ollama with local models.
Endpoint:
POST http://127.0.0.1:47372/api/llm
Request Body:
{ "input": "string", // Required - The prompt or question for the LLM "model": "string", // Required - Model to use (e.g., "gpt-4", "claude-2", "llama2") "provider": "string", // Required for external APIs (e.g., "openai", "anthropic") - Omit for local models "prompt": "string", // Optional - Custom system prompt "temperature": number, // Optional - Response creativity (default: 0.5) "max_completion_tokens": number, // Optional - Max response length (default: 2048) "top_p": number, // Optional - Nucleus sampling (default: 1) "frequency_penalty": number, // Optional - Repetition control (default: 0) "presence_penalty": number, // Optional - Topic diversity (default: 0) "is_ollama": boolean // Optional - Use Ollama LLM processing (default: false) "is_ooba": boolean // Optional - Use Oobabooga LLM processing (default: false) "character": string // Optional - Oobabooga character }
Example Response:
{ "id": "local-llama3.2-1735945911", "choices": [{ "finish_reason": "stop", "index": 0, "message": { "content": "Example response from RAG query", "role": "assistant" } }], "created": 1735945911, "model": "llama3.2", "object": "chat.completion", "usage": { "completion_tokens": -1, "prompt_tokens": -1, "total_tokens": -1 } }
Error Handling
The API uses standard HTTP status codes and returns detailed error messages to help you debug issues.
Common Error Codes
{ "400": "Bad Request - Invalid parameters", "401": "Unauthorized - Invalid API key", "403": "Forbidden - Insufficient permissions", "404": "Not Found - Resource doesn't exist", "409": "Conflict - SQLite constraint violation", "429": "Too Many Requests - Rate limit exceeded", "500": "Internal Server Error - Database error", "503": "Service Unavailable - SQLite database locked" }
Best Practices
Coming soon! We're working on comprehensive best practices documentation for the API.
What to Expect
Our upcoming best practices guide will cover performance optimization, security recommendations, integration patterns, and more. Check back soon!