Web crawler on Amazon Q


Hi, I am trying to crawl a website by entering source sitemaps in Amazon Q. Is there a way to check which web pages Amazon Q has crawled?


asked 2 months ago · 190 views
1 Answer

Hi,

Yes, there is a way: use the script q_list_documents.py in my repo: https://github.com/didier-durand/qstensils

See the documentation at https://github.com/didier-durand/qstensils/blob/main/doc/q_list_documents.md

Feel free to reuse it and share it further: it is released under the permissive MIT license.

This script lets you find documents that failed to index, like the second one below:

    {
        "createdAt": "2024-02-21 11:31:00.422000+01:00",
        "documentId": "s3://bucket-name/Togo.json",
        "error": {},
        "status": "INDEXED",
        "updatedAt": "2024-02-21 11:47:09.220000+01:00"
    },
    {
        "createdAt": "2024-02-21 11:31:00.709000+01:00",
        "documentId": "s3://bucket-name/What Ever Happened to Baby Jane?.json",
        "error": {},
        "status": "DOCUMENT_FAILED_TO_INDEX",
        "updatedAt": "2024-02-21 11:47:46.031000+01:00"
    },
    {
        "createdAt": "2024-02-21 11:31:00.698000+01:00",
        "documentId": "s3://bucket-name/Vicky Donor.json",
        "error": {},
        "status": "INDEXED",
        "updatedAt": "2024-02-21 11:47:53.677000+01:00"
    }
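
The kind of check described above can be sketched in a few lines of Python. This is only a minimal illustration, not the q_list_documents.py script itself: `failed_documents` and `fetch_documents` are hypothetical names, the sample records are copied from the listing above, and `fetch_documents` assumes the boto3 `qbusiness` client and its `list_documents` call, with an application ID and index ID that you would have to supply.

```python
# Sample records copied from the listing above (same fields, same values).
SAMPLE = [
    {
        "createdAt": "2024-02-21 11:31:00.422000+01:00",
        "documentId": "s3://bucket-name/Togo.json",
        "error": {},
        "status": "INDEXED",
        "updatedAt": "2024-02-21 11:47:09.220000+01:00",
    },
    {
        "createdAt": "2024-02-21 11:31:00.709000+01:00",
        "documentId": "s3://bucket-name/What Ever Happened to Baby Jane?.json",
        "error": {},
        "status": "DOCUMENT_FAILED_TO_INDEX",
        "updatedAt": "2024-02-21 11:47:46.031000+01:00",
    },
    {
        "createdAt": "2024-02-21 11:31:00.698000+01:00",
        "documentId": "s3://bucket-name/Vicky Donor.json",
        "error": {},
        "status": "INDEXED",
        "updatedAt": "2024-02-21 11:47:53.677000+01:00",
    },
]


def failed_documents(documents, ok_statuses=("INDEXED",)):
    """Return the records whose status is not one of ok_statuses."""
    return [doc for doc in documents if doc.get("status") not in ok_statuses]


def fetch_documents(application_id, index_id):
    """Hypothetical helper: collect all document records for one index.

    Assumption: the boto3 "qbusiness" client and its list_documents call;
    application_id and index_id are placeholders you must provide.
    """
    import boto3  # imported lazily so the filter above works without AWS access

    client = boto3.client("qbusiness")
    kwargs = {"applicationId": application_id, "indexId": index_id}
    documents = []
    while True:
        page = client.list_documents(**kwargs)
        documents.extend(page.get("documentDetailList", []))
        token = page.get("nextToken")
        if not token:
            return documents
        kwargs["nextToken"] = token  # request the next page of results


if __name__ == "__main__":
    # Print only the documents that did not reach the INDEXED status.
    for doc in failed_documents(SAMPLE):
        print(doc["documentId"], doc["status"])
```

For a web crawler data source, the document IDs would presumably be page URLs rather than S3 paths, so the same filter could be applied to see which pages were crawled and indexed.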

Best,

Didier

AWS EXPERT
answered 2 months ago
EXPERT
reviewed 2 months ago
  • Thanks a ton for the quick response, Didier. I'm not a full-time developer, but I will try it out :-)
