AYITA System Deployment Guide

AYITA can be deployed as a local AI assistant in a privacy-preserving environment. This guide provides step-by-step instructions for setting up AYITA with Docker or via manual installation.

System Requirements

AYITA requires sufficient computing resources to run LLMs, RAG processing, and fine-tuning workflows efficiently.

Minimum & Recommended System Requirements

| Component | Minimum Requirements | Recommended Configuration |
| --- | --- | --- |
| CPU | 4 cores (x86-64) | 8+ cores (x86-64 or ARM64) |
| GPU | Optional (CPU-only inference) | NVIDIA GPU (RTX 3090/4090, A100) with CUDA support |
| RAM | 16 GB | 32 GB+ |
| VRAM (GPU memory) | 8 GB | 24 GB+ |
| Storage | 50 GB SSD | 200 GB+ SSD/NVMe (for large LLMs) |
| OS | Linux / macOS / Windows (WSL2) | Ubuntu 22.04 / macOS 13+ / Windows Server (with WSL2) |
| Docker support | Required for containerized deployment | Recommended for isolated environments |

Deployment via Docker

The easiest way to deploy AYITA is by using Docker, which isolates dependencies and simplifies installation.

Step 1: Install Docker & Docker Compose
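On Ubuntu, both can be installed with Docker's convenience script and the Compose plugin; see the official Docker documentation for other platforms:

# Install Docker Engine using Docker's convenience script
curl -fsSL https://get.docker.com | sudo sh

# Install the Compose plugin (Debian/Ubuntu)
sudo apt install -y docker-compose-plugin

# Verify both installations
docker --version && docker compose version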

Step 2: Pull the AYITA Docker Image

docker pull realmdata/ayita:latest

Step 3: Run AYITA as a Container

docker run --gpus all -p 8000:8000 -d realmdata/ayita

Note that --gpus all requires the NVIDIA Container Toolkit on the host; omit the flag for CPU-only inference. Once running, AYITA will be available at:

http://localhost:8000
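To confirm the container is up, a quick check (assuming the web UI answers on the root path):

# List the running AYITA container
docker ps --filter ancestor=realmdata/ayita

# Request the landing page headers
curl -I http://localhost:8000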

Manual Installation

For users who prefer to run AYITA without Docker, a manual setup is required.

Step 1: Install Dependencies

AYITA depends on Python, Haystack, and various AI libraries. Run the following to install requirements:

# Install Python and the virtual-environment module (Debian/Ubuntu)
sudo apt update && sudo apt install -y python3 python3-venv

# Create and activate an isolated environment for AYITA
python3 -m venv ayita-env
source ayita-env/bin/activate

# Install the core AI dependencies
pip install --upgrade pip
pip install farm-haystack transformers torch sentence-transformers

Step 2: Download AYITA Source Code

Clone the AYITA repository and install it:

git clone https://github.com/realmdata/ayita.git
cd ayita
pip install -r requirements.txt

Step 3: Launch AYITA

python run.py --gpu

AYITA will start at http://localhost:8000.
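To keep the manual install running after the terminal closes, one simple approach (with the virtual environment from Step 1 still activated) is to background the process and capture its output:

# Run AYITA in the background and log its output
nohup python run.py --gpu > ayita.log 2>&1 &

# Follow the logs
tail -f ayita.log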

Configuring Local Models

AYITA runs local LLMs by default, so inference stays fully private and on your own hardware.

Supported Model Backends:

  • GPT-J / GPT-NeoX / Llama 2 (via transformers)

  • Mistral / Falcon (optimized for small VRAM usage)

  • Fine-tuned models using PEFT (Parameter-Efficient Fine-Tuning)

To specify a custom model, use:

python run.py --model llama-2-13b --gpu
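The named model must be available locally. If AYITA resolves model names from the Hugging Face cache (an assumption about its loader, not confirmed here), a typical download looks like:

# Install the Hugging Face Hub CLI
pip install -U huggingface_hub

# Log in first if the model is gated (Llama 2 requires accepting Meta's license)
huggingface-cli login

# Download the weights into the local HF cache
huggingface-cli download meta-llama/Llama-2-13b-hf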

For embedding-based retrieval (RAG):

python run.py --use-rag
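Assuming the CLI flags compose as in most command-line tools, both options can be combined to serve a local model with retrieval enabled:

python run.py --model llama-2-13b --use-rag --gpu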

Managing Memory & Optimization

AYITA supports memory-efficient deployment, including:

  • LoRA / PEFT fine-tuning for adapting LLMs with minimal VRAM.

  • Quantization (bitsandbytes) to run large models on consumer GPUs.

  • Streaming RAG responses, so tokens are returned as they are generated rather than after the full answer completes.

To enable quantization:

python run.py --quantization 8bit
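At 8-bit precision, weights occupy roughly 1 byte per parameter instead of 2 bytes in fp16, so a 13B model drops from about 26 GB to about 13 GB of weight memory, which is what lets it fit on a 24 GB consumer GPU. Assuming flags can be combined, quantization can be paired with a model selection:

python run.py --model llama-2-13b --quantization 8bit --gpu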

To enable fast RAG processing:

python run.py --rag-cache

Running AYITA in a Private Network

For enterprise users, AYITA can be deployed on an internal network with authentication:

docker run -p 8000:8000 -e AUTH=true realmdata/ayita

Access will require a username/password, preventing unauthorized usage.
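To expose AYITA only on a specific internal interface rather than all of them, Docker's host-IP port binding can be used (the address below is an example; substitute your internal NIC's address):

docker run -p 10.0.0.5:8000:8000 -e AUTH=true -d realmdata/ayita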

Next Steps

AYITA is now ready to use! 🎉 For further details:

  • [Developer Guide](developer_guide.html) – Learn about APIs and external integrations.

  • [AYITA Use Cases](use_cases.html) – Explore real-world applications.

  • [Fine-Tuning Guide](fine_tuning.html) – Customize AYITA for your needs.