AYITA System Deployment Guide

AYITA can be deployed as a local AI assistant in a privacy-preserving environment. This guide provides step-by-step instructions for setting up AYITA with Docker or via manual installation.

System Requirements

AYITA requires sufficient computing resources to run LLMs, RAG processing, and fine-tuning workflows efficiently.

Minimum & Recommended System Requirements

| Component | Minimum Requirements | Recommended Configuration |
| --- | --- | --- |
| CPU | 4 cores (x86-64) | 8+ cores (x86-64 or ARM64) |
| GPU | Optional (CPU-only inference) | NVIDIA GPU (RTX 3090/4090, A100) with CUDA support |
| RAM | 16 GB | 32 GB+ |
| VRAM (GPU memory) | 8 GB | 24 GB+ |
| Storage | 50 GB SSD | 200 GB+ SSD/NVMe (for large LLMs) |
| OS | Linux / macOS / Windows (WSL2) | Ubuntu 22.04 / macOS 13+ / Windows Server (with WSL2) |
| Docker support | Required for containerized deployment | Recommended for isolated environments |

Deployment via Docker

The easiest way to deploy AYITA is by using Docker, which isolates dependencies and simplifies installation.

Step 1: Install Docker & Docker Compose
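On Ubuntu, both can be installed with Docker's convenience script and the Compose plugin; see the official Docker documentation for other platforms:

# Install Docker Engine using Docker's convenience script
curl -fsSL https://get.docker.com | sudo sh

# Install the Compose plugin (Debian/Ubuntu)
sudo apt install -y docker-compose-plugin

# Verify both installations
docker --version && docker compose version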

Step 2: Pull the AYITA Docker Image

docker pull realmdata/ayita:latest

Step 3: Run AYITA as a Container

docker run --gpus all -p 8000:8000 -d realmdata/ayita

Note that --gpus all requires the NVIDIA Container Toolkit on the host; omit the flag for CPU-only inference. Once running, AYITA will be available at:

http://localhost:8000
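To confirm the container is up, a quick check (assuming the web UI answers on the root path):

# List the running AYITA container
docker ps --filter ancestor=realmdata/ayita

# Request the landing page headers
curl -I http://localhost:8000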

Manual Installation

For users who prefer to run AYITA without Docker, a manual setup is required.

Step 1: Install Dependencies

AYITA depends on Python, Haystack, and various AI libraries. Run the following to install requirements:

# Install Python and the virtual-environment module (Debian/Ubuntu)
sudo apt update && sudo apt install -y python3 python3-venv

# Create and activate an isolated environment for AYITA
python3 -m venv ayita-env
source ayita-env/bin/activate

# Install the core AI dependencies
pip install --upgrade pip
pip install farm-haystack transformers torch sentence-transformers

Step 2: Download AYITA Source Code

Clone the AYITA repository and install it:

git clone https://github.com/realmdata/ayita.git
cd ayita
pip install -r requirements.txt

Step 3: Launch AYITA

python run.py --gpu

AYITA will start at http://localhost:8000.
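To keep the manual install running after the terminal closes, one simple approach (with the virtual environment from Step 1 still activated) is to background the process and capture its output:

# Run AYITA in the background and log its output
nohup python run.py --gpu > ayita.log 2>&1 &

# Follow the logs
tail -f ayita.log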

Configuring Local Models

AYITA runs local LLMs by default, so inference stays fully private and on your own hardware.

Supported Model Backends:

  • GPT-J / GPT-NeoX / Llama 2 (via transformers)

  • Mistral / Falcon (optimized for small VRAM usage)

  • Fine-tuned models using PEFT (Parameter-Efficient Fine-Tuning)

To specify a custom model, use:

python run.py --model llama-2-13b --gpu
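The named model must be available locally. If AYITA resolves model names from the Hugging Face cache (an assumption about its loader, not confirmed here), a typical download looks like:

# Install the Hugging Face Hub CLI
pip install -U huggingface_hub

# Log in first if the model is gated (Llama 2 requires accepting Meta's license)
huggingface-cli login

# Download the weights into the local HF cache
huggingface-cli download meta-llama/Llama-2-13b-hf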

For embedding-based retrieval (RAG):

python run.py --use-rag
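Assuming the CLI flags compose as in most command-line tools, both options can be combined to serve a local model with retrieval enabled:

python run.py --model llama-2-13b --use-rag --gpu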

Managing Memory & Optimization

AYITA supports memory-efficient deployment, including:

  • LoRA / PEFT fine-tuning for adapting LLMs with minimal VRAM.

  • Quantization (bitsandbytes) to run large models on consumer GPUs.

  • Streaming RAG responses, so tokens are returned as they are generated rather than after the full answer completes.

To enable quantization:

python run.py --quantization 8bit
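At 8-bit precision, weights occupy roughly 1 byte per parameter instead of 2 bytes in fp16, so a 13B model drops from about 26 GB to about 13 GB of weight memory, which is what lets it fit on a 24 GB consumer GPU. Assuming flags can be combined, quantization can be paired with a model selection:

python run.py --model llama-2-13b --quantization 8bit --gpu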

To enable fast RAG processing:

python run.py --rag-cache

Running AYITA in a Private Network

For enterprise users, AYITA can be deployed on an internal network with authentication:

docker run -p 8000:8000 -e AUTH=true realmdata/ayita

Access will require a username/password, preventing unauthorized usage.
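To expose AYITA only on a specific internal interface rather than all of them, Docker's host-IP port binding can be used (the address below is an example; substitute your internal NIC's address):

docker run -p 10.0.0.5:8000:8000 -e AUTH=true -d realmdata/ayita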

Next Steps

AYITA is now ready to use! 🎉 For further details:

  • [Developer Guide](developer_guide.html) – Learn about APIs and external integrations.

  • [AYITA Use Cases](use_cases.html) – Explore real-world applications.

  • [Fine-Tuning Guide](fine_tuning.html) – Customize AYITA for your needs.