Direct answer to your summary:
No, you cannot use the "drawing" tool independently of Queries.
The purpose of a Textract Dataset is specifically to train an Adapter, which is designed to improve Query performance. The drawing process is strictly tied to a Query; you are essentially telling the model: "This specific area is the answer to Query X."
If you do not want to use Queries and prefer not to manually draw boxes:
- Skip the Dataset/Adapter process entirely.
- Instead, use the standard AnalyzeDocument API with the FORMS feature.
- FORMS automatically detects key-value pairs without any training or manual "drawing" required.
In short: you only need to repeat the drawing process if you specifically need an Adapter to improve Query accuracy on complex documents. Standard field extraction with FORMS requires no training.
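As a rough illustration of the FORMS route, here is a minimal Python sketch, assuming boto3 is installed and AWS credentials are configured. The helper names (`extract_key_values`, `analyze_forms`), the region, and the file path are placeholders of mine, not anything from the question. It calls AnalyzeDocument with the FORMS feature and pairs each detected key with its value by following the block relationships in the response.

```python
def extract_key_values(response):
    """Turn an AnalyzeDocument (FORMS) response into a {key_text: value_text} dict."""
    blocks = {b["Id"]: b for b in response.get("Blocks", [])}

    def text_of(block):
        # Join the WORD / SELECTION_ELEMENT children of a KEY or VALUE block.
        parts = []
        for rel in block.get("Relationships", []):
            if rel["Type"] != "CHILD":
                continue
            for child_id in rel["Ids"]:
                child = blocks[child_id]
                if child["BlockType"] == "WORD":
                    parts.append(child["Text"])
                elif child["BlockType"] == "SELECTION_ELEMENT":
                    parts.append(child["SelectionStatus"])
        return " ".join(parts)

    pairs = {}
    for block in blocks.values():
        is_key = (block["BlockType"] == "KEY_VALUE_SET"
                  and "KEY" in block.get("EntityTypes", []))
        if not is_key:
            continue
        value_text = ""
        for rel in block.get("Relationships", []):
            if rel["Type"] == "VALUE":
                value_text = " ".join(text_of(blocks[i]) for i in rel["Ids"])
        pairs[text_of(block)] = value_text
    return pairs


def analyze_forms(path, region="us-east-1"):
    """Call Textract synchronously on a local file (region and path are placeholders)."""
    import boto3  # imported here so the parsing helper works without boto3

    client = boto3.client("textract", region_name=region)
    with open(path, "rb") as f:
        return client.analyze_document(
            Document={"Bytes": f.read()},
            FeatureTypes=["FORMS"],
        )

# Example usage (requires AWS access):
# print(extract_key_values(analyze_forms("form.png")))
```

No dataset, adapter, or annotation is involved here; the key-value pairs come straight from the pretrained FORMS feature.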
When creating a Textract dataset for training an adapter, queries are indeed the primary focus. The dataset is specifically designed to train the adapter on how to respond to queries you define for your documents.
Regarding form fields: The dataset creation process for adapters focuses on queries and their responses, not on form field extraction. Form field extraction (key-value pairs) is a separate feature in Textract that doesn't require custom training through adapters. When you annotate documents for adapter training, you're establishing the "ground truth" by linking queries to their answers in your documents.
For the annotation process, you have two options:
- Auto-labeling: You specify your queries, and Textract attempts to automatically extract and annotate the answers from your documents. You then verify and correct these annotations as needed.
- Manual labeling: You create queries and manually link them to the relevant answers in your documents.
You don't need to draw boxes for every field 500 times in your scenario. Instead, you define queries (questions) for the information you want to extract, and then either let Textract auto-label the answers or manually indicate where the answers are located in your documents. You need a minimum of 5 training documents and 5 testing documents to create an adapter.
The key is that you're teaching the model what information is important by linking queries to answers across your sample documents, not manually marking every instance of every field.
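To make the query-to-answer link concrete, here is a minimal Python sketch of how a trained adapter might be used at inference time with boto3. The helper names, the aliases, and the adapter ID and version values are hypothetical placeholders; the request fields (`QueriesConfig`, `AdaptersConfig`) are the ones AnalyzeDocument accepts for the QUERIES feature.

```python
def build_query_request(document_bytes, queries, adapter_id=None, adapter_version=None):
    """Build AnalyzeDocument kwargs for the QUERIES feature.

    `queries` maps an alias to the question text, e.g.
    {"INVOICE_NO": "What is the invoice number?"}.
    """
    request = {
        "Document": {"Bytes": document_bytes},
        "FeatureTypes": ["QUERIES"],
        "QueriesConfig": {
            "Queries": [{"Text": text, "Alias": alias} for alias, text in queries.items()]
        },
    }
    # The adapter is optional; without it the pretrained Queries model answers.
    if adapter_id and adapter_version:
        request["AdaptersConfig"] = {
            "Adapters": [{"AdapterId": adapter_id, "Version": adapter_version}]
        }
    return request


def extract_query_answers(response):
    """Map each query alias (or its text) to the first answer found, or None."""
    blocks = {b["Id"]: b for b in response.get("Blocks", [])}
    answers = {}
    for block in blocks.values():
        if block["BlockType"] != "QUERY":
            continue
        name = block["Query"].get("Alias") or block["Query"]["Text"]
        answers[name] = None
        for rel in block.get("Relationships", []):
            if rel["Type"] == "ANSWER" and rel["Ids"]:
                # QUERY_RESULT blocks carry the answer text.
                answers[name] = blocks[rel["Ids"][0]].get("Text")
                break
    return answers

# Example usage (requires AWS access; adapter values are placeholders):
# import boto3
# client = boto3.client("textract")
# with open("invoice.png", "rb") as f:
#     req = build_query_request(f.read(),
#                               {"INVOICE_NO": "What is the invoice number?"},
#                               adapter_id="<your-adapter-id>", adapter_version="1")
# print(extract_query_answers(client.analyze_document(**req)))
```

The queries sent at inference time should match the ones annotated in the dataset, since those are what the adapter was trained to answer.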
Sources
Preparing training and testing datasets - Amazon Textract
Customizing your Queries Responses - Amazon Textract
So, to summarize, do I have to repeat the "drawing" process 5 times if I just want to use the draw tool to indicate the fields and not actual queries?

If my answer was helpful, I would appreciate it if you could mark it as the accepted answer.