Hi,
The error you're encountering occurs because the SageMaker Notebook instance is trying to reach a Docker repository that is unavailable or inaccessible. The likely cause is that the instance runs Amazon Linux, which is different from CentOS.
1. Remove the Docker repository that is causing the error:
$ sudo rm -rf /etc/yum.repos.d/docker.repo
2. Clean the package manager cache:
$ sudo yum clean all
3. Update the installed packages:
$ sudo yum update -y
After completing these steps, you should be able to proceed with the installation of Tesseract OCR as described in my previous response.
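The three steps above can be collected into one small script. This is only a minimal sketch: the `run` helper, the `cleanup_docker_repo` function, and the `DRY_RUN` variable are names made up here for illustration, and the repo path is the one from step 1. By default it only prints the commands so you can review them before changing anything on the instance.

```shell
#!/bin/sh
# Sketch: remove the stale Docker repo, then refresh yum.
# DRY_RUN, run() and cleanup_docker_repo() are illustrative names,
# not part of yum or SageMaker.

DRY_RUN="${DRY_RUN:-1}"   # default: print only; set DRY_RUN=0 to execute

run() {
    # Print the command; execute it only when DRY_RUN is 0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

cleanup_docker_repo() {
    run sudo rm -f /etc/yum.repos.d/docker.repo   # step 1: drop the repo file
    run sudo yum clean all                        # step 2: clear cached metadata
    run sudo yum update -y                        # step 3: update packages
}
```

After sourcing the file, calling `cleanup_docker_repo` prints the three commands. Set `DRY_RUN=0` before calling it to actually run them.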
If you still encounter issues, you can try installing Tesseract OCR using the Amazon Linux Extra packages repository. Follow these steps:
1. Enable the Amazon Linux Extra packages repository:
$ sudo amazon-linux-extras install epel
2. Install Tesseract OCR from the newly enabled repository:
$ sudo yum install tesseract
This should install Tesseract OCR and its dependencies on your SageMaker Notebook instance.
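Since `amazon-linux-extras` is not present on every image, it may help to check for it before running step 1. The following is a rough sketch under that assumption. `have_cmd` and `epel_enable_command` are hypothetical helper names, and the function only prints the command to run rather than executing it.

```shell
#!/bin/sh
# Sketch: decide how to enable EPEL based on which tool the instance has.
# have_cmd() and epel_enable_command() are illustrative helper names.

have_cmd() {
    # Succeeds when the named command is found on PATH.
    command -v "$1" >/dev/null 2>&1
}

epel_enable_command() {
    # Print the command to run; nothing is executed here.
    if have_cmd amazon-linux-extras; then
        echo "sudo amazon-linux-extras install epel"
    else
        echo "amazon-linux-extras not found; this is probably not Amazon Linux 2" >&2
        return 1
    fi
}
```

If `epel_enable_command` fails, the instance likely needs a different way of adding EPEL than the one described above.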
Links:
Amazon Linux 2 Extras Library (for installing Tesseract OCR): https://aws.amazon.com/premiumsupport/knowledge-center/ec2-install-extras-repository/
To upgrade Tesseract OCR to version 3.05 or higher on Amazon Linux 2, you'll need to compile it from source, as the default repositories don't provide newer versions. Here are the detailed steps:
- First, remove the existing Tesseract installation:
sudo yum remove tesseract
- Install the necessary dependencies:
sudo yum install -y autoconf automake libtool libpng-devel libjpeg-devel libtiff-devel zlib-devel libwebp-devel gcc-c++ make
- Install Leptonica, which is required for Tesseract:
wget http://www.leptonica.org/source/leptonica-x.xx.x.tar.gz
tar -xzvf leptonica-x.xx.x.tar.gz
cd leptonica-x.xx.x
./configure
make
sudo make install
sudo ldconfig
export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig:$PKG_CONFIG_PATH
pkg-config --modversion lept
- Download and compile Tesseract from source:
wget https://github.com/tesseract-ocr/tesseract/archive/4.1.1.tar.gz
tar xzvf 4.1.1.tar.gz
cd tesseract-4.1.1
./autogen.sh
export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig:$PKG_CONFIG_PATH
./configure
make
sudo make install
sudo ldconfig
- Verify the installation:
tesseract --version
This should show the newly installed version of Tesseract.
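To go one step further than reading the version by eye, the check can be scripted. This is a minimal sketch, not an official Tesseract tool: `parse_tesseract_version` and `version_ge` are hypothetical helper names, and it assumes the first line of the `--version` output looks like `tesseract 4.1.1`.

```shell
#!/bin/sh
# Sketch: confirm the installed Tesseract meets a minimum version.
# parse_tesseract_version() and version_ge() are illustrative helper names.

parse_tesseract_version() {
    # Read `tesseract --version` output on stdin and print the version
    # token from the first line ("tesseract 4.1.1" -> "4.1.1").
    # A leading "v" is stripped in case a build prints "v5.x".
    awk 'NR == 1 { sub(/^v/, "", $2); print $2 }'
}

version_ge() {
    # Succeed when version $1 >= version $2, comparing dot-separated
    # fields numerically (so "05" is treated as 5, missing fields as 0).
    awk -v a="$1" -v b="$2" 'BEGIN {
        n = split(a, x, "."); m = split(b, y, ".")
        len = (n > m) ? n : m
        for (i = 1; i <= len; i++) {
            xi = (i <= n) ? x[i] + 0 : 0
            yi = (i <= m) ? y[i] + 0 : 0
            if (xi > yi) exit 0
            if (xi < yi) exit 1
        }
        exit 0
    }'
}
```

A possible use is `installed="$(tesseract --version 2>&1 | parse_tesseract_version)"` followed by `version_ge "$installed" 3.05 && echo "new enough"`. The `2>&1` is there on the assumption that some older builds write the version banner to stderr.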
Hi, I installed Leptonica 1.82.0 and got version errors when installing Tesseract 4.1.1. I also installed Leptonica 1.83.0 and got installation errors when installing Tesseract 5.0.0. I cannot paste the full error details here due to the size limit. Could you please verify that the commands work and help me install Tesseract 5 or above?
Hi,
You need to add EPEL to the repos accessed by yum in order to get tesseract-ocr:
rpm -Uvh https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
yum -y update
Then, yum install should succeed.
To make sure that the repo above was added, you can also run yum repolist to get the list of repos known by yum after your command.
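If you want to script that check, something like the following could work. It is only a sketch: `repolist_has_epel` is a hypothetical helper name, and it assumes the repo id appears at the start of a line, optionally prefixed by `!` or `*` as yum sometimes prints it.

```shell
#!/bin/sh
# Sketch: check whether EPEL appears in `yum repolist` output.
# repolist_has_epel() is an illustrative helper name.

repolist_has_epel() {
    # Read repolist output on stdin; succeed if any line's repo id
    # starts with "epel" (optionally prefixed by "!" or "*").
    grep -Eq '^[!*]?epel([/[:space:]]|$)'
}
```

For example, `yum repolist | repolist_has_epel && echo "EPEL is enabled"`.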
Best,
Didier
Hi,
Running the above command returns the following error:
sh-4.2$ rpm -Uvh https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
Retrieving https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
curl: (22) The requested URL returned error: 404
error: skipping https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm - transfer failed
Thank you! The solution is working.
You're welcome.
One more issue: with the above commands I am able to install Tesseract 3.04. How can I upgrade it to a version above 3.05, or to 4.x? I tried to update it using sudo yum update tesseract, but it did not work.