Services

On-premise infrastructure: your own compute for AI and applications

Pimento designs and builds in-house compute infrastructure for AI models - from GPU selection and architecture, through Kubernetes, virtualization and MLOps, to backup, disaster recovery and maintenance. We take you from plan to a running server.

What hardware does an in-house LLM need?

Hardware requirements depend mostly on model size: the more parameters, the more GPU memory is needed for efficient operation. Smaller models run on a single card, larger ones need a server with several GPUs. We match the configuration to the model, the number of users and the budget - without oversizing.
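As a rough illustration of how model size translates into GPU memory, the sketch below multiplies the parameter count by the bytes each weight occupies and adds a margin for runtime overhead. The 20% overhead figure and the function names are assumptions made for this example, not a sizing rule; real requirements also depend on context length and the number of concurrent users.

```python
import math

# Bytes per parameter for common weight precisions (standard sizes).
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def estimate_vram_gb(params_billions: float, precision: str = "fp16",
                     overhead_fraction: float = 0.2) -> float:
    """Rough GPU memory needed to serve a model, in GB (1 GB = 1e9 bytes).

    overhead_fraction is an ASSUMED allowance for KV cache, activations
    and runtime buffers; it is illustrative, not a measured value.
    """
    weights_gb = params_billions * BYTES_PER_PARAM[precision]
    return weights_gb * (1.0 + overhead_fraction)


def gpus_needed(params_billions: float, gpu_memory_gb: float,
                precision: str = "fp16") -> int:
    """Smallest number of identical cards whose combined memory covers the estimate."""
    return math.ceil(estimate_vram_gb(params_billions, precision) / gpu_memory_gb)


if __name__ == "__main__":
    # A 7B-parameter model in fp16 on a hypothetical 24 GB card.
    print(gpus_needed(7, gpu_memory_gb=24))
    # A 70B-parameter model in fp16 on hypothetical 80 GB cards.
    print(gpus_needed(70, gpu_memory_gb=80))
```

Under these assumptions a 7B model fits on a single 24 GB card, while a 70B model in fp16 needs a multi-GPU server; quantizing to a lower precision shrinks the estimate proportionally.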

Kubernetes, Docker, virtualization - what does the stack look like?

We use containers by default: Docker for packaging services, Kubernetes for orchestration and scaling, and virtualization where environment isolation is needed. This stack lets you update and scale the system without downtime.

On-premise MLOps - how are models maintained locally?

MLOps covers the processes that keep models healthy in production: versioning of models and data, automated deployments, monitoring of answer quality and performance. We set these up fully on-premise, so the model lifecycle needs no external services.

Backup and disaster recovery

We design backups and recovery procedures for the whole stack - from data and configuration to models. We agree with you on the acceptable recovery time (RTO) and the acceptable data loss (RPO), and we test the procedures, so a hardware failure doesn't stop the business.

Own server or cloud API - how to compare costs?

An on-premise server is an upfront investment with a predictable maintenance cost; a cloud API is billed per use, so its cost grows with query volume. With a steady, high query volume your own infrastructure usually comes out ahead over a few years - and gives full control over your data. We help you run the numbers for both scenarios.
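The comparison above can be sketched as a simple break-even calculation: cumulative on-premise cost (upfront plus monthly maintenance) against cumulative pay-per-use cost. Every figure and name below is a made-up placeholder for illustration, and the model deliberately ignores factors such as hardware depreciation, energy prices and API price changes.

```python
from typing import Optional


def cumulative_on_prem_cost(upfront: float, monthly_maintenance: float,
                            months: int) -> float:
    """Total on-premise spend after the given number of months."""
    return upfront + monthly_maintenance * months


def cumulative_api_cost(cost_per_1k_queries: float, queries_per_month: float,
                        months: int) -> float:
    """Total pay-per-use spend after the given number of months."""
    return cost_per_1k_queries * queries_per_month / 1000 * months


def break_even_month(upfront: float, monthly_maintenance: float,
                     cost_per_1k_queries: float, queries_per_month: float,
                     horizon_months: int = 60) -> Optional[int]:
    """First month in which on-premise is no more expensive than the API.

    Returns None if that never happens within horizon_months.
    """
    for month in range(1, horizon_months + 1):
        on_prem = cumulative_on_prem_cost(upfront, monthly_maintenance, month)
        api = cumulative_api_cost(cost_per_1k_queries, queries_per_month, month)
        if on_prem <= api:
            return month
    return None


if __name__ == "__main__":
    # Illustrative placeholder figures only, in an arbitrary currency.
    print(break_even_month(upfront=60_000, monthly_maintenance=1_000,
                           cost_per_1k_queries=5, queries_per_month=600_000))
    print(break_even_month(upfront=60_000, monthly_maintenance=1_000,
                           cost_per_1k_queries=5, queries_per_month=100_000))
```

With these placeholder numbers the high-volume scenario breaks even in month 30, while the low-volume one never does within five years, which matches the point that on-premise pays off mainly at steady, high query volumes.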

Questions about this service

Yes - many deployments start with a single GPU server handling the proof of concept and first users. We design the architecture so it can scale without a rebuild from scratch.

GPU servers have higher power and thermal requirements than typical office hardware - as part of the project we check the server-room conditions and advise on changes, or consider colocation when that's the better option.

Often yes - sensitive data and the model can stay on-premise while less critical workloads run in the cloud. A hybrid combines control over data with flexibility and is often the best cost compromise.

Yes - we offer maintenance: updates, monitoring, backup and incident response. Scope and SLA are agreed in a maintenance contract.

Let's talk about your project

A free consultation - no strings attached, focused on your case.