C.R.A.I.G — Documentation

VERSION 1.0 — INITIAL RELEASE

WHAT'S NEW

C.R.A.I.G v1.0 is the first public release. This section presents the complete feature set shipped with this version.

RELEASE HIGHLIGHTS

LAB — AI Document Analysis

Upload images or scanned documents and let C.R.A.I.G orchestrate a multi-step AI pipeline to extract, classify, and structure their content.

AUTH — User Management

Centralized user registry with JWT authentication. Create accounts, manage access, and generate API tokens directly from the Portal.

INCIDENT — Bug & Feature Reports

Integrated feedback form to report bugs or request new features. All reports are tracked for continuous improvement.

Multi-Model AI Orchestration

LLaVA generates the pipeline dynamically. Florence-2 handles OCR and visual tasks. Mistral / Qwen structure the extracted data.

Multi-Page Document Analysis

Upload all pages of a document at once. C.R.A.I.G automatically selects aggregation mode (≤10 pages) or sliding window mode (>10 pages) for optimal extraction.

MCP Client Connectors

Send analysis results to external systems via the Model Context Protocol. Register connectors dynamically from the Lab sidebar — no code required.

Prompt Catalog

12 ready-to-use business prompts across Finance, Accounting, Legal, Technical, and HR domains. One click applies the prompt and selects the optimal model.

Architecture Overview

Interactive Mermaid diagram of the full C.R.A.I.G architecture displayed as the Lab home screen. Printable directly from the toolbar.

Async Job Queue

Jobs are processed asynchronously via RabbitMQ. Results are delivered by webhook and stored in Redis for real-time polling.

Monitoring

Prometheus metrics exposed by each service. Grafana dashboards available at craig.monitoring.weesoft.fr (SSO-protected).

TECHNOLOGY STACK

Python 3.11, FastAPI, RabbitMQ, Redis, PostgreSQL 15, Ollama 0.6.8, LLaVA, Mistral, Qwen2.5-VL, Florence-2, MCP (Model Context Protocol), React + TypeScript, Mermaid, Caddy, Docker, Prometheus / Grafana, GCP (Tesla T4 GPU)

INTRODUCTION

OVERVIEW

C.R.A.I.G — Cognitive Reasoning Agent for Intelligent Graphical — is an AI platform designed to automate document analysis using a chain of specialized models.

ARCHITECTURE

C.R.A.I.G is composed of four independent services that communicate asynchronously:

Service	Role	Port
craig-auth	JWT authentication, user management, token generation	8000
craig-api-semantic	Job reception, RabbitMQ publishing, Redis status, webhook receiver	8001
craig-api-vision	Florence-2 inference: OCR, classification, object detection, captioning	8002
craig-worker	RabbitMQ consumer — orchestrates LLaVA → vision pipeline → webhook	—

JOB LIFECYCLE

1. SUBMIT: The frontend (Lab) sends an image and a question via POST /ask. The API returns a jobId immediately.
2. QUEUE: The job is published to the craig_tasks RabbitMQ queue as a persistent message.
3. ORCHESTRATE: The worker consumes the job and calls LLaVA with the image to generate a pipeline JSON describing which vision steps to execute.
4. EXECUTE: The worker executes each pipeline step against craig-api-vision (Florence-2) and, optionally, a text LLM (Mistral/Qwen) for structuring.
5. DELIVER: Results are sent via webhook to craig-api-semantic and stored in Redis. The frontend polls /jobs/{id}/status until the status is "done".

ACCESS

URL	Service
craig.weesoft.fr	Portal — main entry point
craig.lab.weesoft.fr	Lab — document analysis interface
craig.feedback.weesoft.fr	Incident — bug & feature reports
craig.api.weesoft.fr	Semantic API (public endpoint)
craig.auth.weesoft.fr	Auth API
craig.monitoring.weesoft.fr	Grafana (SSO)

MODULE

LAB

The Lab is the main document analysis interface. Upload an image, select an AI model, ask a question, and get structured results.

GETTING STARTED

1. SIGN IN: Log in with your C.R.A.I.G credentials at craig.lab.weesoft.fr. Your session is shared across the platform.
2. EXPLORE THE ARCHITECTURE: The Lab opens on the Architecture screen, an interactive Mermaid diagram of all C.R.A.I.G components. Use the zoom controls or click FIT to fill the screen. Click LAUNCH THE LAB or any sidebar icon to proceed.
3. UPLOAD A DOCUMENT: Drag and drop one or more images (PNG, JPEG, TIFF, PDF page) into the SOURCE panel, or click to browse. For multi-page documents, set the Number of pages field first. Click RESET next to the Source title to clear all loaded documents and start over.
4. SELECT AN AI MODEL: Choose the orchestration model from the dropdown, or use the Prompt Catalog (book icon) to apply a business prompt, which sets both the model and the instruction in one click.
5. AI INSTRUCTIONS (optional): Type a natural language question or task about the document (e.g. "Extract the invoice total" or "Identify the patient name"). Whether this field is required depends on your pipeline:

Pipeline JSON	structuration_llm	prompt_by_type	Required?
absent	n/a	n/a	Yes: used to generate the pipeline
present	absent	n/a	No: no LLM text step
present	present	absent	Yes: becomes the LLM task
present	present	present	No: each doc type has its own prompt

6. RUN THE ANALYSIS: Click LAUNCH THE PIPELINE. Jobs are processed asynchronously. For multi-page documents, a progress badge shows each page as it completes. The final JSON result appears in the Output panel when all pages are done.
7. SEND TO AN MCP CONNECTOR: Once a result is available, open the MCP Connectors panel (plug icon). Connected servers show a green dot and a SEND button. Click it to push the result to an external system (e.g. PostgreSQL).
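
The decision table in step 5 can be read as a small function. The sketch below is only an illustration of that table: the function name is hypothetical, and the handling of pipelines with several structuration_llm steps (the instruction is treated as required as soon as one of them lacks prompt_by_type) is an assumption the source does not state.

```python
from typing import Optional


def instructions_required(pipeline: Optional[dict]) -> bool:
    """Mirror the decision table: is the AI Instructions field needed?"""
    if not pipeline:
        # No custom pipeline: the instruction is used to generate one.
        return True
    llm_steps = [
        step for step in pipeline.values()
        if isinstance(step, dict) and step.get("action") == "structuration_llm"
    ]
    if not llm_steps:
        # Vision-only pipeline: no text LLM step consumes the instruction.
        return False
    # Assumption: required if any LLM step has no per-type prompts of its own.
    return any("prompt_by_type" not in step for step in llm_steps)


print(instructions_required(None))                            # True
print(instructions_required({"etape1": {"action": "ocr"}}))   # False
```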

AI MODELS

Model	Type	Use Case
LLaVA	Multimodal LLM	Pipeline orchestration: analyses the image and generates the sequence of vision steps
Florence-2-base	Vision model	OCR, image classification (tri), object detection, captioning
Mistral	Text LLM	Post-processing: structures raw OCR text into JSON
Qwen 2.5 VL	Multimodal LLM	Alternative to LLaVA for pipeline generation, handles complex layouts

PIPELINE STEPS (VISION)

tri — Classifies the document: determines if it is readable (lisible: true/false) and provides a detailed caption.
ocr — Extracts all text content from the image using Florence-2's OCR engine.
detection — Locates and labels objects or regions of interest within the image with bounding boxes.
poser_question — Answers a specific visual question about the image content (VQA).
structuration_llm — Sends raw OCR text to a text LLM (Mistral or Qwen) to return structured JSON output.

PIPELINE (CUSTOM)

Advanced users can provide a custom pipeline JSON via the Pipeline field, bypassing the LLaVA generation step:

{
  "etape1": { "action": "tri" },
  "etape2": { "action": "ocr", "if_previous": "etape1.lisible" },
  "etape3": { "action": "structuration_llm", "modele": "mistral" }
}

Steps are executed in alphabetical order. The if_previous field conditionally skips a step if the referenced field is falsy.
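
To make the ordering rule concrete, here is a minimal sketch that assumes "alphabetical" means a plain string sort of the step keys; the source does not say how the worker actually sorts, and the helper name is hypothetical. Under that assumption a key such as etape10 would run before etape2, which is worth checking before writing pipelines with ten or more steps.

```python
pipeline = {
    "etape2": {"action": "ocr", "if_previous": "etape1.lisible"},
    "etape10": {"action": "structuration_llm", "modele": "mistral"},
    "etape1": {"action": "tri"},
}


def execution_order(steps: dict) -> list:
    """Return the step keys in plain alphabetical (string) order."""
    return sorted(steps)


print(execution_order(pipeline))  # ['etape1', 'etape10', 'etape2']
```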

SIDEBAR PANELS

Panel	Description
Architecture	Full Mermaid diagram of C.R.A.I.G components. Shown by default on startup. Zoomable and printable.
Explorer	Main document analysis interface: source, pipeline, output.
Prompt Catalog	12 ready-to-use prompts across 5 business domains. Click Apply to set the model and prompt in one click.
MCP Connectors	Register external MCP servers and send analysis results to them directly from the sidebar.
Settings	Temperature, context window, system prompts, and system health check.

MULTI-PAGE DOCUMENTS

Set the Number of pages field before launching the pipeline.
Upload one image per page. C.R.A.I.G assigns a shared documentId and processes each page in sequence.
For ≤10 pages: all OCR texts are merged before LLM extraction (aggregation mode).
For >10 pages: sliding window mode — pages are processed in batches of 5 with context carryover.
See the Multi-Page Analysis reference for full details.

PROMPT CATALOG

Access the catalog via the book icon in the sidebar activity bar.
12 prompts across Finance, Accounting, Legal, Technical, and HR domains.
Each prompt specifies a recommended model (LLaVA, Mistral, or Qwen).
Click APPLY: the model selector and system prompt update instantly. The catalog closes and the pipeline is ready to run.
Changing the model mid-analysis automatically cancels the current job.

JOB STATUS

pending: job is queued or processing
done: results are available
annule: job was cancelled
error: processing failed

The Lab polls the status endpoint every 2 seconds. With a Tesla T4 GPU, processing typically completes in 15–60 seconds per page depending on model and document complexity.
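
An API client can follow the same polling pattern. The sketch below is an illustration rather than the Lab's actual code: fetch_status is a hypothetical callable standing in for GET /jobs/{jobId}/status, the 2-second interval and the status names come from this page, and the 300-second timeout is an arbitrary assumption.

```python
import time
from typing import Callable

# Statuses after which the job will not change any more.
TERMINAL_STATUSES = {"done", "annule", "error"}


def poll_until_finished(
    fetch_status: Callable[[], dict],
    interval: float = 2.0,
    timeout: float = 300.0,  # assumption: not specified in the source
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_status() until the job leaves the 'pending' state."""
    waited = 0.0
    while True:
        payload = fetch_status()
        if payload.get("status") in TERMINAL_STATUSES:
            return payload
        if waited >= timeout:
            raise TimeoutError(f"job still pending after {waited:.0f}s")
        sleep(interval)
        waited += interval


# Usage with canned responses instead of real HTTP calls.
responses = iter([
    {"status": "pending"},
    {"status": "pending"},
    {"status": "done", "result": "{}"},
])
final = poll_until_finished(lambda: next(responses), sleep=lambda _: None)
print(final["status"])  # done
```

Injecting the sleep function keeps the loop testable without waiting in real time.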

SETTINGS

CONFIGURATION

All parameters can be adjusted at runtime from the Lab's Settings panel. No service restart is required.

SETTINGS PANEL

Parameter	Default	Description
Temperature	0.00	Controls output randomness. 0.0 = fully deterministic. Raise to 0.3–0.5 for more creative pipeline generation.
Context Window (NumCtx)	8192	Token context passed to Ollama. Larger values handle longer documents but use more RAM/VRAM.
System Prompts	appsettings.json	Edit the system prompt for LLaVA, Qwen and Mistral. Click Apply Prompts — changes are stored in Redis and take effect immediately.

PROMPTS — SETTINGS PANEL VS PIPELINE EDITOR

The Lab offers two complementary ways to control AI behaviour:

Where	What it controls	When to use
Settings Panel → System Prompts	The orchestration prompt sent to LLaVA / Mistral / Qwen to generate the pipeline JSON.	When you want to tune how the model interprets a document type (e.g. always extract IBAN, always output French).
Pipeline Editor (central zone)	The pipeline JSON itself — the list of actions (tri, ocr, detection…) that Florence-2 will execute.	When you already know exactly which actions to run and want to bypass LLM generation entirely.

Both can be used together: set the system prompt in Settings to improve automatic generation, and override with the editor when you need precise control.

SYSTEM CHECK PANEL

The Settings panel includes a System Check section that verifies the status of all installed components in real time. No manual SSH required.

Component	What is checked
RabbitMQ	Queue broker connectivity — jobs cannot be submitted if unavailable.
Redis	Job status store — polling will not work if unavailable.
Ollama	LLM inference service reachability.
LLaVA / Mistral / Qwen2.5-VL	Model presence in Ollama (`ollama list`). Status missing means the model was not pulled yet.
Florence-2	Vision microservice (OCR / detection / tri) reachability.

Click REFRESH to re-run all checks. Statuses: ok — missing — unavailable.

HOW PROMPTS WORK

1. EDIT: Open the Settings panel, select a model tab (llava / qwen / mistral) and edit the system prompt.
2. APPLY: Click Apply Prompts. The prompt is sent to POST /config and stored in Redis under craig:runtime_config.
3. EFFECT: The worker reads the Redis config on each job. While an override is stored there, it takes precedence over the appsettings.json value.
4. PERSISTENCE: Overrides have no TTL, so they survive service restarts as long as Redis is running. To make a prompt permanent, update appsettings.json and redeploy.
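
The override logic described above can be sketched as a merge of two sources. This is an illustration only: the JSON layout stored under craig:runtime_config (one object per model) and the function name are assumptions, and the Redis read is replaced by a plain string so the example runs on its own.

```python
import json
from typing import Optional

# Static defaults, shaped after the AIProfiles.*.Prompt keys of appsettings.json.
APPSETTINGS = {
    "AIProfiles": {
        "llava": {"Prompt": "Default LLaVA prompt", "Temperature": 0.0, "NumCtx": 8192},
        "mistral": {"Prompt": "Default Mistral prompt", "Temperature": 0.0, "NumCtx": 8192},
    }
}


def effective_profile(model: str, runtime_config_raw: Optional[str]) -> dict:
    """Apply the runtime override (if any) on top of the static profile."""
    profile = dict(APPSETTINGS["AIProfiles"].get(model, {}))
    if runtime_config_raw:
        overrides = json.loads(runtime_config_raw).get(model, {})
        profile.update(overrides)
    return profile


# Stand-in for the value read from craig:runtime_config at the start of a job.
stored = json.dumps({"llava": {"Prompt": "Always answer in French."}})
print(effective_profile("llava", stored)["Prompt"])    # Always answer in French.
print(effective_profile("mistral", stored)["Prompt"])  # Default Mistral prompt
```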

STATIC CONFIGURATION (appsettings.json)

Server-side defaults live in /opt/craig-services/api-semantic/appsettings.json. Edit this file to permanently change prompts, MCP connectors, or Ollama endpoints. A service restart is required after file changes.

AI.OllamaModel — Default orchestration model.
AIProfiles.*.Prompt — Default system prompts (overridable at runtime).
AIProfiles.*.NumCtx / Temperature — Default context and creativity per model.
MCPServers — MCP connector endpoints (ERP, Google Drive…).

MODULE

AUTH

The Auth module manages all users and API access. It is accessible from the Portal via the AUTH button (administrator only).

AUTHENTICATION FLOW

1. LOGIN: POST your email and password to /auth/login. A JWT bearer token is returned.
2. SSO COOKIE: The Portal sets a craig_token cookie on the .weesoft.fr domain so it is shared with Lab and Incident automatically.
3. TOKEN VERIFICATION: Each service validates tokens by calling GET /auth/verify on craig-auth. No shared secret is needed between services.

USER MANAGEMENT

Create user — Enter an email and password in the New User form and click Add. A confirmation appears and the user list refreshes.
List users — All registered accounts are displayed with their login and active/inactive status.
Delete user — Click the Delete button next to a user. A confirmation dialog prevents accidental deletion.
Generate API token — Click Token next to a user to generate a long-lived JWT. Copy it to authenticate programmatic API calls.

API TOKEN USAGE

Use the generated token as a Bearer token in any API call:

# Example: submit a job via the semantic API
curl -X POST https://craig.api.weesoft.fr/ask \
  -H "Authorization: Bearer <token>" \
  -F "image=@document.png" \
  -F "question=Extract the invoice total" \
  -F "modeleIA=llava"

SECURITY NOTES

Passwords are hashed server-side. They are never stored in plaintext.
JWT tokens expire after 24 hours by default. Generated API tokens have a longer TTL.
The /jobs/{id}/webhook endpoint is internal-only and does not require authentication.
Grafana monitoring is protected by the Auth service via Caddy forward_auth.
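
Because tokens expire, a client script may want to check a token's age before calling the API. The sketch below assumes that C.R.A.I.G tokens carry the standard JWT exp claim, which the source does not confirm; the function names are hypothetical. It only decodes the payload and does not verify the signature, so it is no substitute for the server-side check done through /auth/verify.

```python
import base64
import json
import time
from typing import Optional


def jwt_expiry(token: str) -> Optional[float]:
    """Return the 'exp' claim (Unix seconds) without verifying the signature."""
    try:
        payload_b64 = token.split(".")[1]
        # JWT segments are base64url without padding; restore it before decoding.
        padded = payload_b64 + "=" * (-len(payload_b64) % 4)
        claims = json.loads(base64.urlsafe_b64decode(padded))
    except (IndexError, ValueError):
        return None
    if not isinstance(claims, dict):
        return None
    exp = claims.get("exp")
    return exp if isinstance(exp, (int, float)) else None


def is_expired(token: str, now: Optional[float] = None) -> bool:
    """Treat tokens that cannot be decoded as expired."""
    exp = jwt_expiry(token)
    if exp is None:
        return True
    current = time.time() if now is None else now
    return current >= exp
```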

MODULE

INCIDENT

The Incident module lets any authenticated user report a bug or request a new feature. Access it via the Portal → INCIDENT button.

HOW TO SUBMIT A REPORT

1. SELECT REPORT TYPE: Choose Bug to report a malfunction, or Feature to suggest an improvement.
2. FILL IN THE SUBJECT: Write a short, descriptive title summarizing the issue (e.g. "Invisible button on mobile").
3. DESCRIBE THE ISSUE: Provide as much context as possible: steps to reproduce, expected behavior, screenshots if relevant.
4. SEND: Click Send. A confirmation is displayed. You can submit another report immediately.

REPORT TYPES

Type	When to use
Bug	Something is broken, produces an error, or does not behave as expected.
Feature	A new capability you would like added, or an existing behaviour you would like changed.

GOOD PRACTICES

One report per issue — do not combine several problems in a single submission.
For bugs, include the exact error message and the steps that led to it.
For feature requests, explain the use case and the expected outcome.
If the issue is urgent, contact the WeeSoft team directly in addition to filing a report.

REFERENCE

PIPELINE FORMAT

The pipeline is a JSON object describing the ordered sequence of AI steps to execute on a document. It is generated by LLaVA or provided manually.

STRUCTURE

{
  "etape1": {
    "action": "tri"                   // required
  },
  "etape2": {
    "action": "ocr",
    "if_previous": "etape1.lisible",  // optional condition
    "model": "florence-2-base"        // optional, default: florence-2-base
  },
  "etape3": {
    "action": "structuration_llm",
    "modele": "mistral",              // mistral | qwen2.5vl
    "source": "etape2"                // which step provides the text
  }
}

AVAILABLE ACTIONS

Action	Model	Output fields
tri	Florence-2	`description`, `lisible` (bool)
ocr	Florence-2	`texte`, `succes` (bool)
detection	Florence-2	`bboxes` (array), `labels` (array)
poser_question	Florence-2	`reponse`, `succes` (bool)
structuration_llm	Mistral / Qwen	`resultat_structure` (JSON string), `succes` (bool)

CONDITIONAL STEPS

Use if (or the alias if_previous) to skip a step based on a previous result. Supported syntax:

Syntax	Behaviour
`"etapeX.field"`	Skip if `field` is falsy (false, null, 0, empty string)
`"etapeX.field == true"`	Skip unless `field` is strictly `true`
`"etapeX.field == false"`	Skip unless `field` is falsy
`"etapeX.field == somevalue"`	Skip unless `field` equals `somevalue` (case-insensitive)

// Only store the document if the identity check found a known client
"etape4": {
  "action": "mcp_store_mock",
  "if": "etape3.found == true"
}
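
The four condition forms in the table above could be evaluated along the following lines. This is a sketch based on the table only, not the worker's actual parser: the function name and the shape of the results dictionary (step name mapped to that step's output fields) are assumptions.

```python
def should_run(condition: str, results: dict) -> bool:
    """Return True when the step should run, False when it should be skipped."""
    if "==" in condition:
        reference, expected = (part.strip() for part in condition.split("==", 1))
    else:
        reference, expected = condition.strip(), None

    # "etape3.found" -> step "etape3", field "found"
    step, _, field = reference.partition(".")
    value = results.get(step, {}).get(field)

    if expected is None:
        return bool(value)            # bare reference: skip if falsy
    if expected.lower() == "true":
        return value is True          # run only if strictly true
    if expected.lower() == "false":
        return not value              # run only if falsy
    # Any other value: case-insensitive comparison.
    return str(value).lower() == expected.lower()


results = {"etape1": {"lisible": True}, "etape3": {"found": False}}
print(should_run("etape1.lisible", results))        # True
print(should_run("etape3.found == true", results))  # False
```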

ADAPTIVE PROMPTS (prompt_by_type)

Add prompt_by_type to a structuration_llm step to automatically select the extraction prompt based on the doc_type returned by a preceding classify step. When this field is present, the AI Instructions field in the UI is ignored.

"etape5": {
  "action": "structuration_llm",
  "modele": "mistral",
  "source": "etape1",
  "prompt_by_type": {
    "bank_statement": "Extract IBAN, balances, dates and amounts as JSON.",
    "invoice": "Extract vendor, invoice number, total, VAT and due date as JSON.",
    "CV": "Extract candidate name, skills, experience and last position as JSON.",
    "default": "Summarize the key information as structured JSON."
  }
}

The default key is used when no entry matches the detected doc_type.
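
The lookup with fallback can be sketched in a few lines. The function name is hypothetical, and exact, case-sensitive matching of doc_type is an assumption; the source does not say how keys are compared.

```python
from typing import Optional

PROMPT_BY_TYPE = {
    "bank_statement": "Extract IBAN, balances, dates and amounts as JSON.",
    "invoice": "Extract vendor, invoice number, total, VAT and due date as JSON.",
    "CV": "Extract candidate name, skills, experience and last position as JSON.",
    "default": "Summarize the key information as structured JSON.",
}


def select_prompt(prompt_by_type: dict, doc_type: Optional[str]) -> Optional[str]:
    """Pick the prompt for doc_type, falling back to the 'default' entry."""
    if doc_type is not None and doc_type in prompt_by_type:
        return prompt_by_type[doc_type]
    return prompt_by_type.get("default")


print(select_prompt(PROMPT_BY_TYPE, "invoice"))
print(select_prompt(PROMPT_BY_TYPE, "pay_slip"))  # no entry: falls back to default
```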

REFERENCE

MULTI-PAGE ANALYSIS

C.R.A.I.G can analyze multi-page documents by processing each page independently via OCR, then aggregating the full text before running the LLM extraction. Two modes are available depending on the number of pages declared by the user.

HOW IT WORKS

The user specifies the total number of pages in the document before launching the pipeline.
Each page image is submitted as a separate job, sharing a common documentId (UUID).
Each job runs OCR and stores the extracted text in Redis under craig:doc:{documentId}:page:{i}.
An atomic counter tracks completed pages. When the last page is processed, the final LLM extraction is triggered.
All temporary Redis keys are cleaned up after the final result is produced.
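
The steps above can be sketched with an in-memory stand-in for Redis so the example runs without a server. The page key pattern comes from this page; the counter key name (…:done), the FakeStore class, and the record_page function are hypothetical. A real worker would presumably rely on an atomic Redis increment where the stand-in uses a lock.

```python
import threading
from typing import Optional


class FakeStore:
    """In-memory stand-in for Redis: string keys plus an atomic counter."""

    def __init__(self) -> None:
        self._data: dict = {}
        self._lock = threading.Lock()

    def set(self, key: str, value: str) -> None:
        with self._lock:
            self._data[key] = value

    def get(self, key: str) -> Optional[str]:
        with self._lock:
            return self._data.get(key)

    def incr(self, key: str) -> int:
        with self._lock:
            self._data[key] = int(self._data.get(key, 0)) + 1
            return self._data[key]

    def delete(self, *keys: str) -> None:
        with self._lock:
            for key in keys:
                self._data.pop(key, None)


def record_page(store: FakeStore, document_id: str, page_index: int,
                total_pages: int, text: str) -> Optional[str]:
    """Store one page's OCR text; return the merged text once all pages are in."""
    store.set(f"craig:doc:{document_id}:page:{page_index}", text)
    counter_key = f"craig:doc:{document_id}:done"  # hypothetical key name
    if store.incr(counter_key) < total_pages:
        return None  # other pages are still being processed
    page_keys = [f"craig:doc:{document_id}:page:{i}" for i in range(total_pages)]
    merged = "\n".join(store.get(key) or "" for key in page_keys)
    store.delete(*page_keys, counter_key)  # clean up temporary keys
    return merged


store = FakeStore()
print(record_page(store, "doc-1", 1, 2, "second page"))  # None
print(record_page(store, "doc-1", 0, 2, "first page"))   # merged text, in page order
```

Pages may finish in any order; the merge step reads them back by index, so the final text stays in document order.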

ANALYSIS MODES

Mode	Threshold	Behaviour
Aggregation	≤ 10 pages	All OCR texts are concatenated into a single prompt. Mistral extracts the requested information from the full document in one pass.
Sliding Window	> 10 pages	Pages are processed in batches of 5. Each batch receives a summary of the previous batch as context. A final synthesis pass consolidates all partial results.
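
As an illustration of the threshold and the batching, the following sketch picks a mode from the declared page count and splits page indexes into batches of 5. The function names and the mode labels are hypothetical; only the 10-page threshold and the batch size come from the table.

```python
def select_mode(total_pages: int) -> str:
    """Aggregation up to 10 pages, sliding window beyond that."""
    return "aggregation" if total_pages <= 10 else "sliding_window"


def window_batches(total_pages: int, size: int = 5) -> list:
    """Group zero-based page indexes into consecutive batches of `size`."""
    return [
        list(range(start, min(start + size, total_pages)))
        for start in range(0, total_pages, size)
    ]


print(select_mode(10))     # aggregation
print(select_mode(11))     # sliding_window
print(window_batches(12))  # [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9], [10, 11]]
```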

FRONTEND USAGE

Upload all page images in the Lab Source panel.
Set the Number of pages field to the actual page count of the document.
The label updates automatically: aggregation for ≤10 pages, sliding window for >10.
A green progress badge shows real-time status: Page 2 / 5 processed…
The final JSON result is delivered once all pages have been processed.

JOB PAYLOAD FIELDS

Field	Type	Description
`documentId`	string (UUID)	Shared identifier across all pages of the same document.
`pageIndex`	integer	Zero-based index of the current page (0 = first page).
`totalPages`	integer	Total number of pages in the document.

RESULT STATUS VALUES

Status	Meaning
`page_traitee`	This page's OCR is complete. Waiting for remaining pages.
`document_complet`	All pages processed. Final extraction result is included.

REFERENCE

MCP CONNECTORS

C.R.A.I.G acts as an MCP client (Model Context Protocol). After an analysis, results can be sent to any registered MCP server — PostgreSQL, GitHub, Slack, Google Drive, and more — without writing any code.

WHAT IS MCP?

MCP (Model Context Protocol) is an open standard that allows AI applications to communicate with external tools and data sources via a unified interface.
C.R.A.I.G is an MCP Host + Client: it initiates connections to MCP Servers and calls their tools.
MCP Servers expose tools (e.g. insert_row, upload_file, create_issue) that C.R.A.I.G can invoke with the analysis JSON as payload.
C.R.A.I.G does not store analysis results on disk — all data flows to the external MCP server.

ARCHITECTURE

C.R.A.I.G (MCP Client)
  │
  ├──► supergateway (stdio → SSE bridge)
  │         └──► mcp-server-postgres ──► PostgreSQL (craig_resultats)
  │
  ├──► MCP Server GitHub   ──► Create issue / push file
  ├──► MCP Server Slack    ──► Post message to channel
  └──► MCP Server Drive    ──► Upload JSON to Google Drive
      

REGISTERING A CONNECTOR

Open the Lab sidebar and click the plug icon (MCP Connectors panel).
Click + ADD and enter a name and the SSE URL of the MCP server (e.g. http://127.0.0.1:8010/sse).
Click SAVE. The connector is stored in Redis and immediately available.
Connectors declared in appsettings.json are static (read-only). User-added connectors are marked custom and can be deleted.

SENDING A RESULT

Run an analysis in the Lab. Once a result is available, the MCP panel shows a green "Result ready" banner.
Each connector with status ok shows a SEND button.
Clicking SEND calls POST /mcp/send. The backend connects to the MCP server via SSE, lists available tools, and calls the first tool with the analysis JSON as argument.
The tool name and status are displayed below the connector card.
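
The "list tools, then call the first one" behaviour can be sketched against a minimal session interface. McpSession and FakeSession below are hypothetical stand-ins, not the API of any MCP SDK, and the SSE connection itself is left out; the second tool name is invented for the example.

```python
from typing import Any, Protocol


class McpSession(Protocol):
    """Hypothetical minimal view of a connected MCP session."""

    def list_tools(self) -> list: ...

    def call_tool(self, name: str, arguments: dict) -> Any: ...


def send_result(session: McpSession, analysis: dict) -> tuple:
    """Call the first tool exposed by the server with the analysis JSON."""
    tools = session.list_tools()
    if not tools:
        raise RuntimeError("connector exposes no tools")
    tool_name = tools[0]
    return tool_name, session.call_tool(tool_name, analysis)


class FakeSession:
    """Test double that records calls instead of talking to a server."""

    def __init__(self) -> None:
        self.calls: list = []

    def list_tools(self) -> list:
        return ["insert_row", "query"]

    def call_tool(self, name: str, arguments: dict) -> dict:
        self.calls.append((name, arguments))
        return {"status": "ok"}


session = FakeSession()
print(send_result(session, {"total": "120.00"}))  # ('insert_row', {'status': 'ok'})
```

Since the first tool is always chosen, the order in which a server lists its tools determines what SEND does.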

POSTGRESQL DEMO SETUP

Install a local MCP PostgreSQL server on the VM using supergateway (no Docker image required for the MCP layer).

# Install supergateway globally
npm install -g supergateway

# Create the results table in craig-postgres
docker exec craig-postgres psql -U craig_admin -d craig -c "
CREATE TABLE IF NOT EXISTS craig_resultats (
  id SERIAL PRIMARY KEY,
  document_nom TEXT,
  modele TEXT,
  question TEXT,
  resultat JSONB,
  created_at TIMESTAMP DEFAULT NOW()
);"

# Start the MCP server on port 8010 (this is the command run by the optional mcp-postgres systemd unit)
supergateway --port 8010 \
  --stdio "npx -y @modelcontextprotocol/server-postgres \
  postgresql://craig_admin:PASSWORD@127.0.0.1:5432/craig"

API ENDPOINTS

Method	Path	Description
GET	`/mcp/status`	List all connectors (static + dynamic) with connectivity status.
GET	`/mcp/connectors`	List dynamic connectors stored in Redis.
POST	`/mcp/connectors`	Add a connector `{"name": "…", "url": "…"}`.
DELETE	`/mcp/connectors/{name}`	Remove a dynamic connector.
GET	`/mcp/tools/{name}`	List available tools on a connector.
POST	`/mcp/send`	Send analysis result to a connector `{"serverName": "…", "data": {…}}`.

REFERENCE

API REFERENCE

All endpoints exposed by craig-api-semantic (port 8001 / craig.api.weesoft.fr). All routes require a Bearer token except /health and the webhook receiver.

SUBMIT A JOB

POST /ask

Form data:
  image          File    PNG / JPEG image
  question       string  Natural language question
  modeleIA       string  llava | qwen2.5vl (default: llava)
  pipelineSaisi  string  Optional custom pipeline JSON

Response:
  { "jobId": "uuid" }

POLL JOB STATUS

GET /jobs/{jobId}/status

Response (pending):
  { "status": "pending" }

Response (done):
  {
    "status": "done",
    "result": "{ pipeline, resultats, resultat_final }"
  }

BATCH PROCESSING

POST /process/batch

JSON body:
{
  "transactionId": "string",
  "webhookUrl": "https://your-endpoint/callback",
  "imagesBase64": ["base64...", "base64..."],
  "questionGlobale": "Extract all dates",
  "modeleIA": "llava"
}
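
A client could assemble this body as follows. The sketch assumes that imagesBase64 expects standard base64 of the raw image bytes with no data-URI prefix, which the source does not specify; the function name and the sample values are hypothetical.

```python
import base64
import json


def build_batch_payload(transaction_id: str, webhook_url: str, images: list,
                        question: str, model: str = "llava") -> str:
    """Serialize a /process/batch request body from raw image bytes."""
    body = {
        "transactionId": transaction_id,
        "webhookUrl": webhook_url,
        # Assumption: plain standard base64, one string per image.
        "imagesBase64": [base64.b64encode(img).decode("ascii") for img in images],
        "questionGlobale": question,
        "modeleIA": model,
    }
    return json.dumps(body)


payload = build_batch_payload(
    "tx-001",
    "https://your-endpoint/callback",
    [b"first image bytes", b"second image bytes"],
    "Extract all dates",
)
print(json.loads(payload)["modeleIA"])  # llava
```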

OTHER ENDPOINTS

Method	Path	Description
GET	/health	Returns RabbitMQ + Redis connectivity status
GET	/resources	Returns available AI models and actions
POST	/jobs/{id}/cancel	Marks a job as cancelled in Redis
POST	/jobs/{id}/webhook	Internal — receives results from the worker

REFERENCE

INFRASTRUCTURE

C.R.A.I.G runs on a single GCP VM (europe-west1-b, Tesla T4 GPU) with Docker for data services, systemd for Python services, and optional MCP servers managed manually by the operator.

DOCKER SERVICES

Container	Image	Port
craig-postgres	postgres:15	5432 (localhost only)
craig-rabbit	rabbitmq:3-management	5672 / 15672
craig-redis	redis:7-alpine	6379
craig-ollama	ollama/ollama	11434
craig-prometheus	prom/prometheus	9090
craig-grafana	grafana/grafana	3000

SYSTEMD SERVICES

Unit	Command
craig-auth	uvicorn main:app --host 127.0.0.1 --port 8000
craig-api-semantic	uvicorn main:app --host 127.0.0.1 --port 8001
craig-api-vision	uvicorn vision_service:app --host 127.0.0.1 --port 8002
craig-worker	python start_worker.py
mcp-postgres (optional)	supergateway --port 8010 --stdio "npx -y @modelcontextprotocol/server-postgres …"

MONITORING

Prometheus scrapes metrics from all FastAPI services via the /metrics endpoint (prometheus-fastapi-instrumentator).
The worker exposes custom metrics on port 9092: craig_jobs_processed_total labelled by status (success/error).
Grafana is accessible at craig.monitoring.weesoft.fr. Authentication is delegated to craig-auth via Caddy forward_auth.

USEFUL COMMANDS

# Check all service statuses
systemctl status craig-auth craig-api-semantic craig-api-vision craig-worker

# View worker logs in real time
journalctl -u craig-worker -f

# Restart all Python services
systemctl restart craig-auth craig-api-semantic craig-api-vision craig-worker

# Pull a new Ollama model
docker exec craig-ollama ollama pull mistral

# Check MCP Postgres service
systemctl status mcp-postgres

# View all analysis results stored via MCP
docker exec craig-postgres psql -U craig_admin -d craig \
  -c "SELECT id, document_nom, question, created_at FROM craig_resultats ORDER BY created_at DESC LIMIT 10;"

OPERATIONS

BENCHMARK

The C.R.A.I.G benchmark suite runs the catalog pipelines against a reference set of fixtures and grades each result by structural correctness (JSON validity, field coverage) and ground-truth value matching.

LATEST RUN — 12 MAY 2026

Metric	Value
Avg score / 100	88.7
Perfect (100)	21
Partial (60-99)	2
Weak (<60)	4

27 entries tested — 0 errors, 0 skipped — auto-retry active on 11 entries (avg 2.5 iterations).

PER-ENTRY RESULTS

Test	Category	Score	Coverage	Note
Bank Statement (2 months)	Finance	100	100%	—
Invoice Analysis (2 pages)	Accounting	97	100%	—
Contract Analysis (2 pages)	Legal	88	100%	—
Finishes Schedule (multi-page)	Technical	100	100%	—
CV / Resume (2 pages)	HR	80	60%	missing email, experience
Pay Slip (2 months)	HR	49	33%	missing employee_name, period
Detect Document Type (2 docs)	Sort	100	100%	—
Finance Pack (3 docs)	Sort	100	100%	—
Mixed Pack (CV + Contract + Payslip)	Sort	100	100%	—
Gift Card — Anonymize (2 cards)	Anonymization	70	100%	—
SEPA Transfer Order	Finance	100	100%	—
Purchase Order	Accounting	100	100%	—
Expense Report	Accounting	100	100%	—
Extrait KBIS	Legal	100	100%	—
Product Datasheet	Technical	100	100%	—
Invoice Reconciliation vs Database	MCP	100	100%	—
Lettre manuscrite	Manuscripts	100	100%	—
Carte régionale PACA	Cartography	100	100%	—
Carte de France	Cartography	100	100%	—
Personal Data — GDPR Anonymization	Anonymization	100	100%	—
Architectural Floor Plan	Technical	50	0%	missing project_name, scale, area
Relevé de compte bancaire (FR)	Finance	100	100%	—
Virement SEPA (FR)	Finance	100	100%	—
Analyse de facture (FR)	Accounting	100	100%	—
Plan d'appartement (FR)	Technical	63	33%	missing project_name, scale
Extract + Store in PostgreSQL	MCP	50	0%	missing meta fields
Full pipeline: OCR + Embed + Store	MCP	50	0%	missing meta fields

HOW TO RUN

From a machine with Python 3 and access to the API:

cd craig-api-semantic/benchmark
pip install httpx cairosvg

# Auth via JWT token
export CRAIG_TOKEN="<your_jwt>"

# Smoke test on MCP only (~4 min)
python run_benchmark.py --category MCP --include-mcp --output mcp_smoke.json

# Full benchmark with MCP (~25 min)
python run_benchmark.py --include-mcp --output full_with_mcp.json

The --include-hidden flag adds catalog entries marked hidden:true (skipped by default).

OPERATIONS

HARDWARE

C.R.A.I.G is fully self-hosted — no third-party AI APIs (OpenAI, Anthropic, etc.) required. All inference runs locally through Ollama on a GPU-equipped Linux host. The configuration below describes the minimum viable production setup, currently in use.

MINIMUM CONFIGURATION (CURRENT PRODUCTION)

Component	Specification	Notes
CPU	4 vCPU (Intel/AMD x86_64)	e.g. GCP `n1-standard-4`
RAM	15 GB	16 GB recommended
GPU	1× NVIDIA Tesla T4 — 16 GB VRAM	or equivalent: L4 (24 GB), A10, V100, RTX 3090/4090
Disk	100 GB SSD (pd-balanced)	~25 GB for models, ~30 GB for OS+containers
OS	Ubuntu 22.04 LTS (kernel 6.8+)	or Debian 12, RHEL 9
Network	Public IP (static recommended)	+ DNS for 4 subdomains, TLS auto via Caddy

SOFTWARE STACK

NVIDIA Driver — 535+ for T4, 550+ for L4
Docker + Compose — container runtime
nvidia-container-toolkit — GPU access from containers
Python 3.11 — backend services (managed via venv)
PostgreSQL 15 — application database (Docker)
Redis 7 — job state and dynamic MCP connectors (Docker)
RabbitMQ 3 — job queue for the worker (Docker)
Ollama 0.6+ — local LLM runtime (Docker, GPU-enabled)
Caddy 2 — reverse proxy with automatic Let's Encrypt TLS

AI MODELS (LOADED IN OLLAMA, ~25 GB TOTAL)

Model	Role	Size
`minicpm-v`	Vision (OCR, document understanding)	5.5 GB
`mistral-nemo:12b`	Text (structuring, delivery, chat)	7.1 GB
`qwen2.5vl`	Vision alternative (cartography, plans)	6.0 GB
`nomic-embed-text`	Semantic embeddings (RAG)	0.3 GB

EQUIVALENT CLOUD CONFIGURATIONS

Provider	Instance	GPU
Google Cloud (current)	`n1-standard-4` + T4	NVIDIA Tesla T4 (16 GB)
Google Cloud (upgrade)	`g2-standard-4` + L4	NVIDIA L4 (24 GB) — ~2× faster
AWS	`g4dn.xlarge`	NVIDIA Tesla T4 (16 GB)
Azure	`NC4as T4 v3`	NVIDIA Tesla T4 (16 GB)
OVH / Scaleway	GPU instance with T4 / L4	Equivalent EU options
On-premise	Workstation / server	RTX 3090/4090, A4000+, or 2× RTX 3060

COST REFERENCE (PREEMPTIBLE / SPOT)

The current production VM runs in preemptible mode on GCP (≈€80/month), trading stability (max 24h uptime, 30s termination notice) for ~65% cost savings.
A standard non-preemptible setup costs ≈€250/month for the same hardware.
Recommendation: preemptible is fine for staging/dev workloads or production with an auto-restart mechanism (Cloud Scheduler hitting the start endpoint every 5 minutes).

MODULE

PIPELINE BUILDER

Pipeline Builder is the unified workspace inside the Lab sidebar where you compose a C.R.A.I.G pipeline. It merges the conversational AI assistant ("Ask C.R.A.I.G") and the visual step editor (formerly "Pipeline Creator") into a single panel, accessible from the sidebar with a single icon.

TWO COMPLEMENTARY MODES

Chat mode — describe what you want to extract in plain language. The assistant generates a complete pipeline (sort, OCR, classify, structure_llm, embed, MCP…) and renders it as a chip list inside the conversation bubble.
Editor mode — the classic step-by-step visual editor. Add, remove, reorder steps; edit each step's parameters (model, engine, prompt, source, conditions).
A toggle at the top of the panel switches between the two modes at any time; the conversation history is preserved.

WORKFLOW

Persona	Typical workflow
Business user	Type the request in Chat mode → click `APPLY THIS PIPELINE`. Done.
Hybrid user	Generate a draft in Chat → click `EDIT` to switch to Editor mode with the pipeline pre-filled → adjust 1-2 steps → APPLY.
Power user	Open Editor mode directly → build the pipeline manually step by step (same UX as the legacy Pipeline Creator).

PIPELINE PREVIEW (CHAT MODE)

Each generated pipeline is shown as a colored chip list (e.g. 1. SORT · florence-2-base, 2. OCR · paddleocr, 3. CLASSIFY, 4. LLM · mistral-nemo) — friendlier than raw JSON.
The raw JSON is still available via a collapsible ▸ View JSON details block.
Two action buttons: EDIT (open in Editor mode) and APPLY THIS PIPELINE →.
Validation badge: green check (Pipeline ready) or red triangle (Pipeline invalid), plus the model that generated it.

CONVERSATIONAL REFINEMENT

The chat keeps full context: ask follow-up questions like "add a step to save the result in PostgreSQL via MCP" or "change step 4 to use qwen instead of mistral" and the assistant produces an updated pipeline.
The conversation can be cleared with the Reset button (top-right).

LEGAL

LICENSE

C.R.A.I.G is distributed under the PolyForm Noncommercial License 1.0.0, a source-available license that allows free personal, educational, research and other non-commercial use, while reserving commercial rights to WeeSoft.

AT A GLANCE

You may	You may not
Use, install, copy and modify C.R.A.I.G for personal, educational or research purposes.	Resell C.R.A.I.G or any modified version of it.
Self-host the platform inside your organisation for non-commercial use.	Embed C.R.A.I.G into a commercial product or paid service.
Distribute the source to others under the same license.	Offer C.R.A.I.G as a hosted SaaS to third parties.
Submit pull requests and improvements (under the same license).	Remove or alter the license notices in the code.

OFFICIAL LICENSE TEXT

The full, legally binding text of the PolyForm Noncommercial License 1.0.0 is published by the PolyForm Project. Always refer to the official document — the table above is a plain-English summary, not the legal text.

Official text — polyformproject.org/licenses/noncommercial/1.0.0
In the C.R.A.I.G repository — see the LICENSE file at the root.
License author — PolyForm Project (lead drafter: Heather Meeker).

COMMERCIAL USE

Any commercial use of C.R.A.I.G — including, but not limited to, reselling, embedding it inside a commercial product, or providing it as a hosted service to third parties — requires a separate commercial license from WeeSoft.

To request a commercial license, write to: contact@weesoft.fr.

PROFESSIONAL SERVICES (SEPARATE FROM LICENSING)

Deployment & installation on your infrastructure (on-premise or private cloud).
Configuration & integration with your existing systems (databases, ERPs, document repositories) via the MCP layer.
Custom pipelines tailored to your document types (invoices, contracts, technical plans, audio…).
Training & onboarding for your teams (Lab usage, prompt design, monitoring).
Production support with defined SLA, monitoring and incident response.

These services are billed independently of the underlying license and are available for both non-commercial and commercial users. Contact: contact@weesoft.fr.

CONTRIBUTING

Pull requests are welcome — by contributing you agree that your contribution is licensed under the same PolyForm Noncommercial License 1.0.0.
For larger changes, please open an issue first to discuss the scope.

DISCLAIMER

This page provides a high-level summary of the C.R.A.I.G licensing model. It is not legal advice and does not replace the official PolyForm Noncommercial License 1.0.0 text. In case of any discrepancy between this summary and the official license, the official license text prevails.