'orc.column.index.access'='false' for an Athena table


Hello.

I have an Athena table partitioned by 'client_id' and 'dt', and the data format is ORC.

I am trying to add a new column to this table, so I want to set "orc.column.index.access" to false.
(I know that the default is true.)

Does the "orc.column.index.access" option affect query performance for the table?

I tried to find out whether performance changes when this option is changed, but couldn't.

Looking forward to an answer.

Thanks

reikim
asked 4 years ago, 531 views
1 Answer

No, this shouldn't affect performance whichever way it is configured. One setting does an index lookup and the other does a map lookup by name, and the lookup is performed at a high level once per file (relevant Presto source: https://github.com/prestodb/presto/blob/6647e13f64883f7cfa89221d91b981bcc3a57618/presto-hive/src/main/java/com/facebook/presto/hive/HiveUtil.java#L976).
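To illustrate why the cost is per file rather than per row, here is a minimal Python sketch of the two lookup styles. It is not the Presto implementation; `resolve_positions`, `read_rows`, and the column names are hypothetical and exist only for this example. The column mapping is resolved once for a file, and every row is then read through the same resolved positions.

```python
from typing import List, Optional, Sequence


def resolve_positions(
    table_columns: Sequence[str],
    file_columns: Sequence[str],
    by_index: bool,
) -> List[Optional[int]]:
    """Map each table column to its position in one file (None if absent).

    This runs once per file, not once per row.
    """
    if by_index:
        # Index lookup: table column i reads file column i, if the file has it.
        return [i if i < len(file_columns) else None
                for i in range(len(table_columns))]
    # Name lookup: build a name -> position map once, then look up each column.
    name_to_pos = {name: pos for pos, name in enumerate(file_columns)}
    return [name_to_pos.get(name) for name in table_columns]


def read_rows(rows, positions):
    """Project rows through the already-resolved positions."""
    return [[row[p] if p is not None else None for p in positions]
            for row in rows]


# Hypothetical schema: "note" was added to the table after the file was written.
table_columns = ["id", "amount", "note"]
file_columns = ["id", "amount"]
rows = [[1, 10.0], [2, 20.0]]

by_index = resolve_positions(table_columns, file_columns, by_index=True)
by_name = resolve_positions(table_columns, file_columns, by_index=False)
print(by_index, by_name)
print(read_rows(rows, by_index))
```

In this sketch both modes resolve to the same positions for a column appended at the end, and the per-row work is identical once the positions are known.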

More interesting is the functional difference. I'm curious why you are setting this to false in order to add a column, since you can also add a column with it set to true. Note that in Athena, 'orc.column.index.access'='true' (the default) maps columns by index, and 'false' maps them by name. Generally, mapping by name is easier to manage. The one area where mapping by index is useful is renaming a column without rewriting the data, since the column is bound to its index rather than its name. The problem with index mapping is that you have to write data in the same order as the metadata definition, even if a column is removed. For most schema evolution use cases (adding or removing columns), mapping by name is easier to work with.

There is some discussion about this in the Presto repo, since in standard Presto the corresponding flag defaults to false (index lookup), while Athena internally manages its own flags for this (with different defaults): https://github.com/prestodb/presto/issues/8911
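The trade-off between the two mappings can be sketched in a few lines of Python. This is an illustration under assumed names (`read_file` and the columns `client_name`, `customer_name`, `amount` are hypothetical), not Athena's or Presto's actual reader. It shows a rename surviving under index mapping but not under name mapping, and a dropped column misaligning the data under index mapping.

```python
def read_file(table_columns, file_columns, row, by_index):
    """Return one row as {table column: value} under the chosen mapping."""
    if by_index:
        # Position i in the table metadata reads position i in the file.
        return {col: (row[i] if i < len(row) else None)
                for i, col in enumerate(table_columns)}
    # Name mapping: match table columns to file columns by name.
    by_name = dict(zip(file_columns, row))
    return {col: by_name.get(col) for col in table_columns}


# A file written under the original (hypothetical) schema.
file_columns = ["client_name", "amount"]
row = ["acme", 10]

# Case 1: the first column is renamed in the table metadata only.
renamed = ["customer_name", "amount"]
rename_by_index = read_file(renamed, file_columns, row, by_index=True)
rename_by_name = read_file(renamed, file_columns, row, by_index=False)

# Case 2: the first column is dropped from the table metadata only.
dropped = ["amount"]
drop_by_index = read_file(dropped, file_columns, row, by_index=True)
drop_by_name = read_file(dropped, file_columns, row, by_index=False)

print(rename_by_index, rename_by_name)
print(drop_by_index, drop_by_name)
```

Under index mapping the renamed column still reads the old data, but the dropped-column case returns the wrong value for `amount` because old files keep their original order. Under name mapping the opposite holds: the drop is handled cleanly, while the renamed column reads as null for old files.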

rrupp
answered 4 years ago
