Temp space used up in Aurora PostgreSQL while creating an index concurrently

0

I use an Aurora PostgreSQL cluster (4 nodes in total). The table is partitioned by month, with the largest partition being around 1.3 TB. One of the columns in the table is of type JSONB. I want to add a GIN index on that column so that I can query by fields within the JSONB object.

I am creating the GIN index concurrently so as not to affect live traffic. I was able to create the GIN index in the QA environment because the data there is relatively small. However, when I try to create the GIN index in production, the server runs out of temp storage while building the index (see here for the list of temp storage available per node size).
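For reference, the kind of statement involved looks like the following (table, column, and index names here are hypothetical stand-ins, not from the original post):

```sql
-- Hypothetical names; CONCURRENTLY builds the index without
-- taking a lock that blocks writes on the partition
CREATE INDEX CONCURRENTLY idx_events_payload_gin
    ON events_2024_01 USING GIN (payload);

-- Once built, containment queries on JSONB fields can use it:
SELECT count(*)
FROM events_2024_01
WHERE payload @> '{"status": "shipped"}';
```

Note that `CREATE INDEX CONCURRENTLY` cannot target the partitioned parent table directly; it has to be run against each partition individually.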

An easy answer would be 'just scale up the nodes in the cluster' so there is more temp space to build the GIN index. This would likely work, but it seems like overkill. If I scaled the nodes up just for GIN index creation and then scaled them down, I would be in a position where the current hardware could not rebuild that GIN index should it ever need rebuilding - that seems like a smell in production.

If I scaled the instances up and left them scaled up, they would be massively overprovisioned and very expensive.

I'm curious whether there is any workaround for this temp space issue so that I would not have to scale the servers up so drastically. Also, if I scaled the servers up and then back down after the GIN index build completes, would that be risky? Is there any reason the GIN index would have to be completely rebuilt after the initial build?

Thanks

asked 2 years ago · 1,700 views
2 Answers
3

As you correctly note, the amount of temporary storage is tied to the Aurora PostgreSQL DB instance class, and the maximum temporary storage for each instance class is stated in the document below-

[+] Temporary storage limits for Aurora PostgreSQL: https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraPostgreSQL.Managing.html#AuroraPostgreSQL.Managing.TempStorage

Furthermore, any operation that uses temporary storage, including index builds in PostgreSQL, will fail once it needs more than this allocated space, with an error similar to "SQL Error [xxxx]: ERROR: could not write block xxxx of temporary file: No space left on device".
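To gauge how much temporary storage a workload writes, the standard `pg_stat_database` statistics view (part of core PostgreSQL, so available on Aurora) tracks cumulative temporary-file usage:

```sql
-- Cumulative temp-file counts and bytes written per database
-- (counters accumulate since the last statistics reset)
SELECT datname,
       temp_files,
       pg_size_pretty(temp_bytes) AS temp_written
FROM pg_stat_database
WHERE datname IS NOT NULL
ORDER BY temp_bytes DESC;
```

This won't show the peak usage of a single index build, but it indicates how temp-heavy the workload is overall.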

The suggested workaround is to temporarily move from the current DB instance class to a larger one that has enough local storage for the index creation activity. You can scale back down afterwards if this is a one-time activity.

If this is an ongoing activity, you will need to plan for a larger instance class in the long run. For this purpose, refer to the per-instance-class local storage allocations in the temporary storage limits document above.

However, I understand that you are concerned about over-provisioning resources and, as a result, over-spending on them.

With that said, I would recommend considering an Amazon Aurora Serverless v2 cluster, which is an on-demand, autoscaling configuration for Amazon Aurora. As the official Aurora documentation notes, Aurora Serverless v2 automates monitoring the workload and adjusting capacity for your databases, with capacity adjusted automatically based on application demand.

Therefore, you are charged only for the resources that your DB cluster consumes. This helps you stay within budget and avoid paying for compute resources that you don't use.

Please refer to the following document for further information:

[+] Using Aurora Serverless v2: https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/aurora-serverless-v2.html

I would strongly recommend testing your database and workload against an Aurora Serverless v2 cluster to understand which configuration suits your environment.

Additionally, please consider setting an appropriate value for the work_mem DB parameter, which sets the base maximum amount of memory a query operation can use before writing to temporary disk files. Kindly review the following document for more details-

[+] Local storage space is used by temporary tables or files: https://aws.amazon.com/premiumsupport/knowledge-center/postgresql-aurora-storage-issue/#Local_storage_space_is_used_by_temporary_tables_or_files
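As a sketch of session-level tuning, one thing worth noting is that bulk index builds such as `CREATE INDEX` are governed by maintenance_work_mem rather than work_mem, so that is the setting to raise for the build itself. The values and index name below are illustrative, not recommendations:

```sql
-- Raise memory limits for the current session only, before the build;
-- values are illustrative, not sized for any particular instance class
SET maintenance_work_mem = '2GB';  -- used by CREATE INDEX / VACUUM
SET work_mem = '256MB';            -- used by sorts/hashes in ordinary queries

CREATE INDEX CONCURRENTLY idx_events_payload_gin  -- hypothetical name
    ON events_2024_01 USING GIN (payload);
```

More memory for the build means less spilling to local storage, though a 1.3 TB partition may still exceed what memory alone can absorb.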

AWS
SUPPORT ENGINEER
Ruchi_P
answered 2 years ago
  • Thanks for confirming that. Just to follow up on my last question:

    "Also if I scaled the servers up, then scaled them down after the GIN indexing completes, would this be risky, is there any reason why the GIN index would have to completely rebuild after the initial build."

    Would you say it's a risk to be running in production with instance sizes that are not capable of reproducing the current state/functionality of the database? For the second part of the question ("any reason"), please read that as 'any automated task' that would run to try to rebuild the index automatically (and subsequently fail due to the temp space issue), as opposed to a maintenance task specifically triggered by an operator/engineer.

    Thanks

0

This is in response to your follow-up query:

As you might already be aware, there are several scenarios in which you would need to reindex your database tables, and I have listed them below-

  • If an index becomes corrupted.
  • If the index consists of many empty or nearly empty pages and as a result becomes bloated.
  • If you configure a storage parameter and want to ensure that the changes are in effect.
  • If an index build fails.

In all of the above scenarios, the recommended approach is to analyze the situation, provision appropriate resources, and run REINDEX accordingly.

This has been explained in-depth on the official PostgreSQL documentation below-

[+] REINDEX: https://www.postgresql.org/docs/current/sql-reindex.html
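On PostgreSQL 12 and later, a rebuild can itself be done without blocking writes (index name hypothetical):

```sql
-- Rebuilds the index in place without blocking concurrent writes;
-- needs roughly the same working/temp space as the original build
REINDEX INDEX CONCURRENTLY idx_events_payload_gin;
```

Note the temp-space implication: a concurrent rebuild faces the same local storage constraint as the initial build, which is why instance sizing matters for reindex scenarios too.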

Additionally, per the GIN tips page in the PostgreSQL documentation, insertion into a GIN index can be slow, so for large bulk insertions it may be worthwhile to drop the GIN index first and recreate it after the load completes:

[+] GIN Tips and Tricks: https://www.postgresql.org/docs/current/gin-tips.html
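The same page also covers the GIN pending-list mechanism, which trades insert speed against lookup speed. A sketch of the relevant knobs (index name hypothetical):

```sql
-- Disable the pending list for this index if predictable lookup
-- speed matters more than insert speed
ALTER INDEX idx_events_payload_gin SET (fastupdate = off);

-- Or keep fastupdate on and control how large the pending list
-- may grow before it is merged into the main index structure
SET gin_pending_list_limit = '16MB';
```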

In a nutshell, the scenarios above are the known cases in which reindexing may be required. Whether they occur depends on your database and workload, so please test GIN indexes in your staging environment before implementing them in your production database.

Furthermore, I suggest involving your in-house DBA to assist with rebuilding indexes in such cases, as this falls outside the scope of AWS Support. Due to the AWS Data Security Policy and Shared Responsibility Model, we do not have access to or visibility into customers' databases, data, code, workloads, or queries; we mainly support RDS at the service and infrastructure level. For further information, kindly refer to the following links-

[+] AWS Support scope: https://aws.amazon.com/compliance/services-in-scope/

[+] Shared Responsibility Model: https://aws.amazon.com/compliance/shared-responsibility-model/

Regarding your query about automated tasks rebuilding an index: unless something is explicitly programmed to rebuild an index, there is no reason for a REINDEX to be triggered automatically.

AWS
SUPPORT ENGINEER
Ruchi_P
answered 2 years ago
