- Newest
- Most votes
- Most comments
Below some suggestions to address and fix your issue:
-
Check for Blocking Transactions: Sometimes, other transactions or processes can block the DBCC CHECKDB from completing. You can use the sp_who2 or sp_whoisactive stored procedures to identify any blocking sessions and address them.
-
Review SQL Server Error Logs: Look for any error messages or warnings in the SQL Server error logs that might give us more insight into why the DBCC CHECKDB process is getting stuck.
-
Resource Constraints: Ensure that your server has sufficient resources (CPU, memory, disk I/O) to handle the integrity check. Resource constraints can cause long-running processes to hang.
-
Update SQL Server: Make sure your SQL Server instance is up to date with the latest service packs and updates. Sometimes, bugs that cause these issues are fixed in newer updates.
-
Restart SQL Server: As a last resort, you might need to restart the SQL Server service to clear the stuck process. This should be done with caution, especially in a production environment.
-
Consider Using a Different Method: If the issue persists, you might want to consider using a different method for checking database integrity, such as running DBCC CHECKDB with different options or scheduling it during off-peak hours.
Is this happening with a maintenance job set to process all databases or all user databases; or does the issue occur also when manually running DBCC CHECKDB against any database; or only when it's run against a specific, perhaps large user database?
If the issue appears to be specific to, or you can narrow it down to, one or more particular databases, how big is the smallest one to exhibit the issue?
Also, does DBCC CHECKDB appear as KILLED in the sp_who2 output all on its own or only after you've tried to terminate it manually? If it only gets in the ROLLBACK state on its own, can you see in CloudWatch metrics for the EBS volume if there's heavy disk I/O on the data, log, or tempdb drives while it says it's rolling back? If so, would it be possible to wait for enough time for SQL Server to finish processing the whole database contents and see if it finishes eventually? If you can see heavy I/O, you could divide the combined size of the database's data files by the MB/s read and write rates shown in CW metrics to estimate the amount of time it's likely to take. The smaller the database you can reproduce the issue with, the less time there'd be to wait.
This issue occurs during a maintenance job set to process all databases. The job runs Ola Hallengren's DatabaseIntegrityCheck with @Databases = 'ALL_DATABASES'.
The issue doesn't appear to be specific to any one database. It has happened with both a 200GB database and a 1.5TB database. I haven't attempted to terminate it. The process appears in sp_who2 and sys.dm_exec_requests as 'KILLED/ROLLBACK'. How it reaches this status is a mystery to me. Maybe it have something to do with backups / index maintenance procceses running on the server.
There is usually heavy load on this server in terms of I/O and memory usage. Normally, DBCC takes around 10-11 hours to complete when it finishes successfully.
I tried running it with PHYSICAL_ONLY, which saves only around 5-6% of runtime.
I could divide the process, but that might introduce other issues I'm trying to avoid.
Thanks for the replay.
Relevant content
- asked 2 years ago
- asked 2 years ago
- asked 3 years ago
- AWS OFFICIALUpdated 4 months ago
- AWS OFFICIALUpdated 2 months ago
- AWS OFFICIALUpdated 10 months ago
- AWS OFFICIALUpdated 6 months ago
The only blocking issue I see is when DBCC is rolling back. For some reason, it blocks user transactions and sometimes system background processes (SPIDs below 50 and occasionally negative SPIDs like -2 or -5).
My SQL Server is patched, but I'm more concerned about the resource constraint issue. However, that alone doesn't fully explain the situation. I have similar servers where DBCC runs for hours and completes successfully.
I'm posting here because I'm trying to avoid restarting SQL Server. Currently, that's my workaround, but the recovery time for the largest database is 15 minutes.
Thanks for the reply. Any help would be appreciated.