How To Install TextGen WebUI on AWS

4 minute read
Content level: Intermediate

There are so many open source LLMs that sometimes we wish we could try many of them from a single unified interface. This guide shows you how to install text-generation-webui from oobabooga on AWS.

Text Generation Web UI is a tool that lets you effortlessly run multiple Large Language Models on the same instance. From a single UI, anyone can easily run models using Hugging Face Transformers on GPU, models reduced in size through GPTQ, or even models running on CPU (such as llama.cpp). In this step-by-step guide I will walk you through installing it on an EC2 instance on AWS for testing different models.

TextGeneration UI with Falcon-40B


For this short tutorial, we will be using:

  • An AWS Account
  • A Deep Learning AMI based on Ubuntu 20.04 and PyTorch 2.0.
  • A GPU-based EC2 instance, such as the g5 instances, with enough disk to hold multiple LLMs.

You can do this using the console. If you prefer to use a CloudFormation template, you can use the following:

AWSTemplateFormatVersion: '2010-09-09'
Description: CloudFormation template for launching EC2 instance with Deep Learning Image and text generation web UI setup.

Parameters:
  YourIpAddress:
    Description: IP address range that is allowed SSH access to the EC2 instance
    Type: String
    MinLength: '9'
    MaxLength: '18'
    AllowedPattern: '(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})/(\d{1,2})'
    ConstraintDescription: Must be a valid IP CIDR range of the form x.x.x.x/x.

  AmiId:
    Description: AMI ID for the EC2 instance (default is for us-west-2 region)
    Type: String
    Default: ami-05ac04cf9d9989c1d

  InstanceType:
    Description: EC2 instance type
    Type: String
    AllowedValues:
      - g5.12xlarge
      - g5.2xlarge
    Default: g5.12xlarge

  KeyPairName:
    Description: Name of an existing EC2 key pair to enable SSH access
    Type: AWS::EC2::KeyPair::KeyName

Resources:
  MyEC2Instance:
    Type: AWS::EC2::Instance
    Properties:
      InstanceType: !Ref InstanceType
      ImageId: !Ref AmiId
      KeyName: !Ref KeyPairName
      SecurityGroups:
        - !Ref MySecurityGroup
      BlockDeviceMappings:
        - DeviceName: /dev/sda1
          Ebs:
            VolumeType: gp3
            VolumeSize: 1024 # 1TB
      UserData:
        Fn::Base64: !Sub |
          #!/bin/bash
          sudo apt update
          sudo apt upgrade -y

  MySecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Security group for EC2 instance with text generation web UI
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: 22
          ToPort: 22
          CidrIp: !Ref YourIpAddress

Outputs:
  InstanceId:
    Description: The Instance ID
    Value: !Ref MyEC2Instance

  PublicDNS:
    Description: Public DNS of the EC2 instance
    Value: !GetAtt MyEC2Instance.PublicDnsName

  PublicIP:
    Description: Public IP address of the EC2 instance
    Value: !GetAtt MyEC2Instance.PublicIp
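You can also deploy the template from the AWS CLI. A sketch, assuming the template is saved as textgen.yaml and your key pair is named my-key (stack name, file name, key name and IP are placeholders):

```shell
# Create the stack; KeyPairName and YourIpAddress are required parameters
aws cloudformation create-stack \
  --stack-name textgen-webui \
  --template-body file://textgen.yaml \
  --parameters \
    ParameterKey=KeyPairName,ParameterValue=my-key \
    ParameterKey=YourIpAddress,ParameterValue=203.0.113.10/32

# Block until the instance is running, then print the stack outputs
aws cloudformation wait stack-create-complete --stack-name textgen-webui
aws cloudformation describe-stacks --stack-name textgen-webui \
  --query 'Stacks[0].Outputs' --output table
```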

Once you have your instance setup, you can connect to the instance using ssh

ssh -i "<my_key.pem>" ubuntu@<public_ip>
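If ssh rejects the connection with an "UNPROTECTED PRIVATE KEY FILE" warning, tighten the permissions on the freshly downloaded .pem file first (my_key.pem stands for the key file from the command above):

```shell
# OpenSSH refuses private keys that are readable by other users;
# restrict the file to owner read-only before connecting
chmod 400 my_key.pem
```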

Once you are inside the machine, you can follow the steps described in the official GitHub page:

conda create -n textgen python=3.10.9
conda activate textgen
conda init bash
pip install torch torchvision torchaudio
git clone https://github.com/oobabooga/text-generation-webui
cd text-generation-webui
pip install -r requirements.txt
pip uninstall -y llama-cpp-python
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python --no-cache-dir
python download-model.py facebook/opt-1.3b

If you use the same instance type as in the example (g5.12xlarge), you can use the following command to get a public web URL, valid for 72 hours:

python server.py --auto-devices --gpu-memory 24 24 24 24 --cpu-memory 186 --share
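As a sanity check on those numbers: a g5.12xlarge has four A10G GPUs with 24 GB each and 192 GiB of system RAM, so --gpu-memory 24 24 24 24 caps each GPU at its full capacity and --cpu-memory 186 leaves a few GiB of RAM for the OS:

```shell
# Memory budget implied by the flags above (in GiB)
gpu_total=$((24 + 24 + 24 + 24))   # --gpu-memory 24 24 24 24
cpu_total=186                      # --cpu-memory 186
echo "total offload budget: $((gpu_total + cpu_total)) GiB"
```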

From within the UI you can now try many models from Hugging Face. These are some which I tried successfully from the TheBloke user on Hugging Face:

  • TheBloke/falcon-40b-instruct-GPTQ
  • TheBloke/guanaco-65B-GPTQ
  • TheBloke/WizardCoder-15B-1.0-GPTQ
  • TheBloke/vicuna-13b-v1.3-GPTQ
  • TheBloke/LLaMa-65B-GPTQ-3bit


If you want to verify that it is actually using the GPUs, and see how much GPU memory the models are using, you can install nvtop:

sudo apt install nvtop
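nvtop gives an interactive view; if you prefer a simple snapshot, nvidia-smi ships with the NVIDIA driver on the Deep Learning AMI and reports per-GPU memory and utilization:

```shell
# One-off snapshot of GPU utilization and memory
nvidia-smi
# Or refresh it every second while the model is generating
watch -n 1 nvidia-smi
```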


TextGeneration-WebUI is an interesting project which can help you test multiple LLMs using AWS infrastructure. For more advanced use cases, or to run these models in production, I would recommend checking SageMaker JumpStart and SageMaker endpoints.