There are four common reasons why a crawler creates separate tables:
The source files are not all the same file format (CSV, Parquet, or JSON). Check whether the source files in your folder are all of the same type (see the sketch after this list).
The source files use different compression types (Snappy, gzip, bzip2). Make sure the compression type is the same across all of your source files.
The source files do not share the same schema. As mentioned in the earlier answer, for the crawler to detect a single schema across both folders, a 70% threshold must be met; that is, the schemas of the two sources must be at least 70% similar.
The Amazon S3 partition structure differs between the two data sets. *I do not think this is the issue in your case, since your S3 partition structures are almost the same, but please do check it.
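To rule out the first two reasons quickly, a minimal boto3 sketch like the one below can tally the file extensions under your crawler's S3 prefix, so mixed formats or compression suffixes (for example `.csv` next to `.parquet` or `.csv.gz`) stand out immediately. The bucket and prefix names here are placeholders, not values from your setup:

```python
import boto3
from collections import Counter

# Placeholders -- replace with your crawler's actual data store path.
BUCKET = "my-data-bucket"
PREFIX = "sales/"

s3 = boto3.client("s3")
extensions = Counter()

# Walk every object under the prefix and tally file extensions.
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
    for obj in page.get("Contents", []):
        key = obj["Key"]
        if key.endswith("/"):  # skip folder placeholder objects
            continue
        # Keep the last two dotted parts to catch compound suffixes like .csv.gz
        parts = key.rsplit("/", 1)[-1].split(".")
        extensions[".".join(parts[-2:]) if len(parts) > 2 else parts[-1]] += 1

print(extensions)  # e.g. Counter({'parquet': 120, 'csv.gz': 3})
```

If the counter shows more than one format or compression suffix, that alone can explain the extra tables.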
You can find the exact reason your crawler created multiple tables by checking its logs: log in to the console, select your crawler, and choose Logs to view them.
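You can also pull the same logs programmatically. Crawler logs land in the `/aws-glue/crawlers` CloudWatch log group, in a stream named after the crawler; the crawler name below is a placeholder:

```python
import boto3

logs = boto3.client("logs")

# Fetch crawler log events mentioning "table" to see which tables were
# created and why. "my-crawler" is a placeholder for your crawler's name.
resp = logs.filter_log_events(
    logGroupName="/aws-glue/crawlers",
    logStreamNames=["my-crawler"],
    filterPattern="table",
)
for event in resp["events"]:
    print(event["message"])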
For more details, please follow this article.
For multiple schemas to be merged into one, they need to be similar enough to meet a threshold. See the examples in the reference link below, which discuss a partition threshold higher than 70%. In your case, I am assuming there are only two schemas, one for each version you have.
The crawler infers a schema at the folder level and compares the schemas across all folders. If the compared schemas match, that is, if the partition threshold is higher than 70%, the folders are treated as partitions of a single table. If they don't match, the crawler creates a table for each folder, resulting in a larger number of tables.
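To make the threshold concrete, here is a toy Python sketch of a column-overlap comparison. This is not Glue's actual internal algorithm (which is not publicly specified and also considers more than column names); it only illustrates why two folders whose schemas overlap less than 70% end up as separate tables:

```python
# Toy illustration of a 70% schema-similarity check -- NOT Glue's real
# comparison, just the intuition behind the threshold.
def similarity(cols_a: set, cols_b: set) -> float:
    """Fraction of columns shared between two inferred folder schemas."""
    return len(cols_a & cols_b) / len(cols_a | cols_b)

folder_v1 = {"id", "name", "price", "created_at"}
folder_v2 = {"id", "name", "price", "updated_at"}

score = similarity(folder_v1, folder_v2)
print(f"{score:.0%}")  # 60%
print("one table" if score >= 0.70 else "separate tables")  # separate tables
```

In this example the two versions share three of five distinct columns (60%), so under a 70% rule they would become two tables rather than two partitions of one.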