Hello, I’m new to Athena so I’m not sure if this is the appropriate place to post this.
I’m trying to import Azure data into Athena so that it can be ingested downstream by Amazon QuickSight. I’ve downloaded a usage details .csv file from Azure, placed it in S3, and am trying to create an external table in Athena based on this .csv file.
Here’s the error I receive when trying to create the external table based on the .csv file (it appears when I click the Explain button within Athena):
line 1:48: mismatched input 'EXTERNAL'. Expecting: 'MATERIALIZED', 'OR', 'ROLE', 'SCHEMA', 'TABLE', 'VIEW'
This query ran against the "<masking my database name>" database, unless qualified by the query.
I’ve tried executing this both as a manually written SQL statement and through the “Create table from S3 bucket data” feature; both lead to the above error. I even tried converting all data types to string (thinking it could be some kind of data type mismatch), but to no avail.
I’d greatly appreciate any help troubleshooting this, as the error message feels vague.
My CREATE TABLE statement (DB name and S3 location masked for confidentiality):
CREATE EXTERNAL TABLE IF NOT EXISTS <masked for reasons of confidentiality>.azure_usage_apr_2023 (
  BillingAccountId string,
  BillingAccountName string,
  BillingPeriodStartDate date,
  BillingPeriodEndDate date,
  BillingProfileId string,
  BillingProfileName string,
  AccountOwnerId string,
  AccountName string,
  SubscriptionId string,
  SubscriptionName string,
  Date date,
  Product string,
  PartNumber string,
  MeterId string,
  ServiceFamily string,
  MeterCategory string,
  MeterSubCategory string,
  MeterRegion string,
  MeterName string,
  Quantity double,
  EffectivePrice double,
  Cost double,
  UnitPrice double,
  BillingCurrency string,
  ResourceLocation string,
  AvailabilityZone string,
  ConsumedService string,
  ResourceId string,
  ResourceName string,
  ServiceInfo1 string,
  ServiceInfo2 string,
  AdditionalInfo string,
  Tags string,
  InvoiceSectionId date,
  InvoiceSection string,
  CostCenter string,
  UnitOfMeasure string,
  ResourceGroup string,
  ReservationId string,
  ReservationName string,
  ProductOrderId string,
  ProductOrderName string,
  OfferId string,
  IsAzureCreditEligible string,
  Term string,
  PublisherName string,
  PlanName string,
  ChargeType string,
  Frequency string,
  PublisherType string,
  PayGPrice double,
  PricingModel string,
  CostAllocationRuleName string,
  benefitId string,
  benefitName string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES ('field.delim' = ',')
STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '<S3 location is masked for reasons of confidentiality>'
TBLPROPERTIES (
  'classification' = 'csv',
  'skip.header.line.count' = '1'
);
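One thing I’m unsure about, mentioning it in case it matters: as far as I understand, LazySimpleSerDe splits each row on the raw delimiter and is not quote-aware, so if any field in the Azure export contains a comma inside double quotes, the columns would misalign. A quick local check of that failure mode (plain Python; the sample row is made up, not from my data):

```python
import csv

# Made-up sample row: a quoted field containing the delimiter.
row = 'ResourceA,"Compute, Reserved",10.5'

naive = row.split(',')            # roughly what LazySimpleSerDe does
quoted = next(csv.reader([row]))  # quote-aware parsing

print(len(naive))   # 4 -- the quoted field gets split in two
print(len(quoted))  # 3 -- the quoted field stays intact
```

If that turns out to be a problem with my data, I assume I’d need a quote-aware SerDe such as OpenCSVSerde instead, though I realize that would be separate from the parse error above, which happens before any data is read.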
Column headers of the .csv file (I’m not posting the data for reasons of confidentiality; it’s the standard Azure usage details format):
BillingAccountId,BillingAccountName,BillingPeriodStartDate,BillingPeriodEndDate,BillingProfileId,BillingProfileName,AccountOwnerId,AccountName,SubscriptionId,SubscriptionName,Date,Product,PartNumber,MeterId,ServiceFamily,MeterCategory,MeterSubCategory,MeterRegion,MeterName,Quantity,EffectivePrice,Cost,UnitPrice,BillingCurrency,ResourceLocation,AvailabilityZone,ConsumedService,ResourceId,ResourceName,ServiceInfo1,ServiceInfo2,AdditionalInfo,Tags,InvoiceSectionId,InvoiceSection,CostCenter,UnitOfMeasure,ResourceGroup,ReservationId,ReservationName,ProductOrderId,ProductOrderName,OfferId,IsAzureCreditEligible,Term,PublisherName,PlanName,ChargeType,Frequency,PublisherType,PayGPrice,PricingModel,CostAllocationRuleName,benefitId,benefitName
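For what it’s worth, I sanity-checked locally that the header has the same number of columns as my DDL (55), using plain Python:

```python
# Header line copied from the Azure usage details export.
header = (
    "BillingAccountId,BillingAccountName,BillingPeriodStartDate,BillingPeriodEndDate,"
    "BillingProfileId,BillingProfileName,AccountOwnerId,AccountName,SubscriptionId,"
    "SubscriptionName,Date,Product,PartNumber,MeterId,ServiceFamily,MeterCategory,"
    "MeterSubCategory,MeterRegion,MeterName,Quantity,EffectivePrice,Cost,UnitPrice,"
    "BillingCurrency,ResourceLocation,AvailabilityZone,ConsumedService,ResourceId,"
    "ResourceName,ServiceInfo1,ServiceInfo2,AdditionalInfo,Tags,InvoiceSectionId,"
    "InvoiceSection,CostCenter,UnitOfMeasure,ResourceGroup,ReservationId,"
    "ReservationName,ProductOrderId,ProductOrderName,OfferId,IsAzureCreditEligible,"
    "Term,PublisherName,PlanName,ChargeType,Frequency,PublisherType,PayGPrice,"
    "PricingModel,CostAllocationRuleName,benefitId,benefitName"
)

columns = header.split(",")
print(len(columns))  # 55, matching the 55 columns in my CREATE TABLE statement
```

So the column count at least lines up; the mismatch error appears even so.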