- Più recenti
- Maggior numero di voti
- Maggior numero di commenti
Hello Yi. Thank you for your question. You can take an alternative approach to handle this conflict.
It's essential to implement a dependency when scheduling a process at regular interval. You can setup an audit table with one entry per process (or a table) to keep track of the status of your latest run of the stored procedure. In that audit table you can maintain columns like:
- Process_Name
- Process_Start_Time
- Process_End_Time
- Process_State
For a given stored procedure, you set the Process_State as “Ready” before the stored procedure runs for the first time.
As a first step inside the stored procedure, you can check the **Process_State ** of latest run from the audit table and implement a logic within stored procedure to check the following :
- if Process_State is “Running” then end the procedure successfully, as it indicates that the previous iteration is not completed yet and next iteration is not ready to start.
- If the Process_State is “Ready,” then start the load process, as it indicates that the previous iteration was completed successfully and the next iteration is ready to start. And the next step will be to update the audit table to change the Process_State to “Running”. Optionally you can maintain the start and end time of the process for better debugging.
Also consider referring to this blog for more sophisticated implementation using Step Function, Lambda, DynamoDB and Redshift Data API: https://aws.amazon.com/blogs/big-data/accelerate-orchestration-of-an-elt-process-using-aws-step-functions-and-amazon-redshift-data-api/
Hello Yi. Above query should be able to determine existing lock and will serve your purpose, except during the very first run of the Stored procedure when the target table is empty. SVV_TABLE_INFO starts showing up a record for a given table only after at least one row is loaded into the table. However, SVV objects are complex views and you might see delays while returning data in a timely manner when the system is busy or if you have large number of objects. You can consider using PG_locks which is much lighter weight system table, based on your transaction ID. While system objects can serve your short term purpose, recommendation is to implement a solid control over auditing techniques either inside (call audit stored proc) or outside of Redshift (E.g. DynamoDB) if you have a strict ingestion SLAs or data freshness requirement.
Thanks Poulomi!
My understanding is once Redshift acquires a lock for the first time, Redshift will hold on to the lock for the entire transaction even if the lock is only needed at the beginning of the Tx. Given that, we know exactly which table lock to check for our SP. We went with a simple solution that checks if a lock has been held (see below), and if it is, the SP will error out without proceeding:
WITH LockedTables AS (
SELECT relation as table_id
FROM SVV_TRANSACTIONS
WHERE txn_db = 'app' AND lock_mode in ('ShareRowExclusiveLock', 'ExclusiveLock') AND
granted = true AND lockable_object_type = 'relation'
)
SELECT COUNT(1) as lock_count from SVV_TABLE_INFO
INNER JOIN LockedTables USING (table_id)
WHERE database='app' AND "schema"='ingestion' AND "table"='email_send_metadata';
Any drawbacks you can think of for the above approach?
Hello,
Thank you for posting your question.
Looking at the use case, ideally and how redshift works you cannot avoid locking on tables. You would have to schedule the SP in such a way to avoid concurrent queries. Please try limiting and schedule to avoid conflict on statements causing locks with the below document.
https://repost.aws/knowledge-center/prevent-locks-blocking-queries-redshift
query to identify sessions that are holding locks:
select a.txn_owner, a.txn_db, a.xid, a.pid, a.txn_start, a.lock_mode, a.relation as table_id,nvl(trim(c."name"),d.relname) as tablename, a.granted,b.pid as blocking_pid ,datediff(s,a.txn_start,getdate())/86400||' days '||datediff(s,a.txn_start,getdate())%86400/3600||' hrs '||datediff(s,a.txn_start,getdate())%3600/60||' mins '||datediff(s,a.txn_start,getdate())%60||' secs' as txn_duration from svv_transactions a left join (select pid,relation,granted from pg_locks group by 1,2,3) b on a.relation=b.relation and a.granted='f' and b.granted='t' left join (select * from stv_tbl_perm where slice=0) c on a.relation=c.id left join pg_class d on a.relation=d.oid where a.relation is not null;
Please add for any concerns. Thanks
Contenuto pertinente
- AWS UFFICIALEAggiornata 2 anni fa
- AWS UFFICIALEAggiornata 2 anni fa