By using AWS re:Post, you agree to the AWS re:Post Terms of Use

In RDS, how are temp_bytes and FreeLocalStorage related?

0

Enter image description here

Both queries are the same except WHERE entity in (different identifier) as the first step in a CTE. The entity in 2nd query had 2x the number of rows as another entity in the 1st query

I will describe left 2 peaks because 3rd is same as 1st.

temp_bytes:

  • The peaks are at (12.14UTC, 13.14UTC) with values (172mil, 330mil).

FreeLocalStorage:

  • Starting from 133.687bil, the valleys reach 123.346 bil (~10 mil drop), 113.865 bil (~20 mil drop)

Based on the graph, i can see temp_bytes always spiking when the query ends, which is also when FreeLocalStorage goes back to full amount of 133 billion bytes.

Questions

  1. How is y-value of temp_bytes peak related to amount of drop in FreeLocalStorage?
  2. Why does temp_bytes only peak at end of query instead of go up immediately upon query start
  3. Is it likely that the doubling of height in temp_bytes from query 1 to 2 and doubling of drop in FreeLocalStorage is due to 2nd entity having double number of rows as 1st entity after the 1st query in the CTE finishes its filtering?
  4. What is the value of knowing temp_bytes and FreeLocalStorage? What decisions do they impact? If I see the temp_bytes spike is too high or FreeLocalStorage drop is too low what is the implication and what can I do?
  5. Is FreeLocalStorage the postgres temp directory on disk? I know increasing instance type to get higher FreeLocalStorage, is a solution to "ERROR: could not write to file "base/pgsql_tmp/pgsql_tmp32058.285": No space left on device" which happens when work_mem is low. This partially answers question 4.
  6. How do I predict the amount of temp_bytes or FreeLocalStorage from a query plan? If Question 3 is True, can predict query 2 metrics given query 1 metrics, and relative number of rows with query 1. However, the ideal is not to depend on other queries.

P.S Setting work_mem higher may lead to Out of memory errors, thus a low value in question 5

asked 6 months ago283 views
2 Answers
0

Han,

I'll try to be brief in my response, but feel free to ask me if you have any questions.

  1. Relationship between temp_bytes Peaks and FreeLocalStorage Drops:

    • The y-value of temp_bytes peaks represents the temporary storage used by the queries. The amount of drop in FreeLocalStorage correlates with the space consumed by these temporary files. The more temp_bytes used, the larger the drop in FreeLocalStorage.
  2. Why temp_bytes Peaks at Query End:

    • temp_bytes peaks at the end of the query because the temporary files are created and possibly expanded throughout the query execution but are only fully accounted for in the metrics when the query completes.
  3. Doubling of temp_bytes and FreeLocalStorage Drops:

    • Yes, it is likely that the doubling of the temp_bytes peak and the corresponding drop in FreeLocalStorage is due to the second entity having double the number of rows. More rows mean more temporary storage usage.
  4. Value of Knowing temp_bytes and FreeLocalStorage:

    • Monitoring these metrics helps in understanding the temporary storage requirements of your queries. High temp_bytes spikes or significant drops in FreeLocalStorage can indicate that your queries are using more temporary storage than expected. If temp_bytes spikes are too high or FreeLocalStorage drops are too low, it might indicate the need for optimizing queries or increasing instance size.
  5. Is FreeLocalStorage the PostgreSQL Temp Directory on Disk?

    • Yes, FreeLocalStorage represents the available space in the temporary storage directory on disk. When you encounter "No space left on device" errors, increasing the instance type to get higher FreeLocalStorage is a viable solution, alongside adjusting work_mem.
  6. Predicting temp_bytes or FreeLocalStorage from a Query Plan:

    • Predicting the exact amount of temp_bytes or FreeLocalStorage required from a query plan can be complex. However, you can estimate based on the number of rows processed and the operations involved (e.g., sorting, hashing). If the correlation between the number of rows and temp_bytes holds, you can use metrics from similar queries to make predictions.

Bests.

profile picture
EXPERT
answered 5 months ago
0

There are also some answers (incomplete though) on this in the thread below https://aws.amazon.com/blogs/database/tune-sorting-operations-in-postgresql-with-work_mem/

My issue is with tuning guide at https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/PostgreSQL.ManagingTempFiles.html recommending to turn on temp_bytes, but no explanation of what it means or how it helps choose instance size.

Quoting the same point i made in that thread "I'm trying to relate what we see from the query plan to temp_bytes. Is that possible? If no, then it's not a useful indicator, and the linked article in my 1st question recommending to turn temp_bytes on would not help query writing or instance sizing"

I will ignore temp_bytes for now and just observe FreeLocalStorage which relate strongly to number of rows scanned in query plan and the sum(size) column from

select CURRENT_TIMESTAMP, count(*), sum(size) from pg_ls_tmpdir();

The query above will show temp files being created in groups of (parallel workers+ 1 leader) files, growing to a size of 1GB before the next group of temp files are created. Meaning if max_parallel_workers_per_gather = 2, 3 files are created at once, filled to 1GB each at roughly uniform speed, then another set of 3 temp files appear, again filled to 1GB each, and so on

answered 5 months ago

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.

Guidelines for Answering Questions