On-device, CPU-only, fully offline translation from spoken English to spoken Hindi. Runs on ARM Android (arm64-v8a with NEON) and on x86 Linux (host build for development). No cloud services and no Python at runtime.