在 RDS for SQL Server 中使用本机备份和还原时可能会出现哪些常见问题?

6 分钟阅读
0

我正在为我的 Microsoft SQL Server 实例的 Amazon Relational Database Service(Amazon RDS)执行本机备份或还原。在这个过程中我可能会遇到哪些常见错误?

解决方法

使用 RDS for SQL Server 本机备份和还原选项时,可能会遇到验证错误。这些错误会立即显示,并且不会创建任务。以下是常见的错误和建议修复方法:

错误:由于任务失败或并发 RESTORE_DB 请求而中止了任务

如果您从 Amazon Elastic Compute Cloud(Amazon EC2)或本地还原备份时遇到数据库实例上的空间相关问题,则会出现此错误:

[2022-04-07 05:21:22.317] Aborted the task because of a task failure or a concurrent RESTORE_DB request.
[2022-04-07 05:21:22.437] Task has been aborted
[2022-04-07 05:21:22.440] There is not enough space on the disk to perform restore database operation.

要解决此错误,请执行以下操作:

选项 1:

1.    在源实例(EC2 或本地)上运行以下命令。此命令验证数据库的大小,包括数据文件和 Tlog 文件。在以下示例中,将 [DB_NAME] 替换为数据库的名称。

SELECT DB_NAME(database_id) AS DatabaseName,
Name AS Logical_Name,
Physical_Name, (size*8)/1024/1024 SizeGB
FROM sys.master_files
WHERE DB_NAME(database_id) = '[DB_NAME]'
GO
Database Size = (DB_Name size + DB_Name_Log size)

2.    将源实例的数据库大小与数据库实例上的可用存储空间进行比较。相应地增加可用存储空间,然后还原数据库。

选项 2:

缩小源 SQL Server 上的当前数据库日志文件以清理未使用的空间,然后执行数据库备份。

使用以下命令缩小日志文件。

DBCC SHRINKFILE (LogFileName, Desired Size in MB)

错误:由于任务失败或并发 RESTORE_DB 请求而中止了任务

当您遇到与 AWS Identity and Access Management(IAM)角色或与 SQLSERVER_BACKUP_RESTORE 选项关联的策略相关的权限问题时,会出现以下错误:

[2020-12-15 08:56:22.143] Aborted the task because of a task failure or a concurrent RESTORE_DB request.
[2020-12-15 08:56:22.213] Task has been aborted
[2020-12-15 08:56:22.217] Access Denied

要解决此错误,请执行以下操作:

1.    验证还原查询,以确保 S3 存储桶和文件夹前缀正确无误:

exec msdb.dbo.rds_restore_database
      @restore_db_name='database_name',
      @s3_arn_to_restore_from='arn:aws:s3:::bucket_name/file_name_and_extension';

2.    验证 IAM policy 是否包含以下属性:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Resource": "arn:aws:s3:::bucket_name"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObjectAttributes",
        "s3:GetObject",
        "s3:PutObject",
        "s3:ListMultipartUploadParts",
        "s3:AbortMultipartUpload"
      ],
      "Resource": "arn:aws:s3:::bucket_name/*"
    }
  ]
}

**注意:**将 arn:aws:s3:::bucket_name 替换为您的 S3 存储桶的 ARN。

3.    验证该策略是否与 SQLSERVER_BACKUP_RESTORE 选项中给定的角色正确关联。

4.    验证 SQLSERVER_BACKUP_RESTORE 选项是否是与数据库实例关联的选项组:

S3 存储桶 ARN
S3 文件夹前缀(可选)

有关更多信息,请参阅如何执行运行 SQL Server 的 Amazon RDS 数据库实例的本机备份?

错误:由于任务失败或并发 RESTORE_DB 请求而中止了任务

此错误通常与跨账户数据库还原有关。

示例:

  • 账户 A 有一个 S3 存储桶,用于存储备份。
  • 账户 B 有一个 RDS 数据库实例,需要在其中进行还原。

当您在与选项关联的 IAM 角色或策略中遇到权限相关问题时,就会出现错误。或者,与跨账户中的 S3 存储桶关联的存储桶策略存在权限问题。

[2022-02-03 15:57:22.180] Aborted the task because of a task failure or a concurrent
RESTORE_DB request.
[2022-02-03 15:57:22.260] Task has been aborted
[2022-02-03 15:57:22.263] Error making request with Error Code Forbidden and Http Status Code Forbidden. No further error information was returned by the service.

要解决此错误,请执行以下操作:

1.    验证账户 B(您要还原的数据库实例所在的账户)中的 IAM policy 是否包含以下属性:

{
  "Version": "2012-10-17",
  "Statement":
    [
      {
        "Effect": "Allow",
        "Action":
          [
            "s3:ListBucket",
            "s3:GetBucketLocation"
          ],
        "Resource": "arn:aws:s3:::name_of_bucket_present_in_Account_A"
      },
      {
        "Effect": "Allow",
        "Action":
          [
            "s3:GetObject",
            "s3:PutObject",
            "s3:ListMultipartUploadParts",
            "s3:AbortMultipartUpload"
          ],
        "Resource": "arn:aws:s3::: name_of_bucket_present_in_Account_A /*"
      },
      {
        "Action": [
          "kms:DescribeKey",
          "kms:GenerateDataKey",
          "kms:Decrypt",
          "kms:Encrypt"
          "kms:ReEncryptTo",
          "kms:ReEncryptFrom"
        ],
        "Effect": "Allow",
        "Resource": [
          "arn:aws: PUT THE NAME OF THE KEY HERE",
          "arn:aws:s3::: name_of_bucket_present_in_Account_A /*"
        ]
      }
    ]
}

2.    验证与账户 A 中的 S3 存储桶关联的存储桶策略是否包含以下属性:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Permission to cross account",
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "arn:aws:iam::AWS-ACCOUNT-ID-OF-RDS:role/service-role/PUT-ROLE-NAME"   /*---- Change Details here
        ]
      },
      "Action": [
        "s3:ListBucket",
        "s3:GetBucketLocation"
             ],
      "Resource": [
        "arn:aws:s3:::PUT-BUCKET-NAME"   /*---- Change Details here
      ]
    },
    {
      "Sid": "Permission to cross account on object level",
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "arn:aws:iam::AWS-ACCOUNT-ID-OF-RDS:role/service-role/PUT-ROLE-NAME"   /*---- Change Details here
        ]
      },
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:ListMultipartUploadParts",
        "s3:AbortMultipartUpload"
      ],
      "Resource": [
        "arn:aws:s3::: PUT-BUCKET-NAME/*"  /*---- Change Details here
      ]
    }
  ]
}

有关更多信息,请参阅以下内容:

错误:找不到带有指纹“XXXXXX”的服务器证书

当您尝试使用透明数据加密(TDE)将数据库从 EC2 或本地还原到 RDS for SQL Server 时,会出现此错误:

[2022-06-1511:55:22.280] Cannot find server certificate with thumbprint 'XXXXXXX'.
[2022-06-15 11:55:22.280] RESTORE FILELIST is terminating abnormally.
[2022-06-15 11:55:22.300] Aborted the task because of a task failure or a concurrent RESTORE_DB request.
[2022-06-15 11:55:22.333] Task has been aborted
[2022-06-15 11:55:22.337] Empty restore file list result retrieved.

此错误表示有人试图将使用 TDE 加密的数据库的备份还原到除其原始服务器之外的 SQL 实例。必须将原始服务器的 TDE 证书导入目标服务器。有关导入服务器证书和相应限制的更多信息,请参阅在 SQL Server 中支持透明数据加密

除导入证书外,要解决此错误,请执行以下操作:

有两种解决办法可以防止出现此错误。

选项 1:数据库备份来源于本地或 EC2 实例,但目标 RDS SQL Server 位于多可用区中

1.    在 TDE 开启的情况下创建源数据库的备份。

2.    在本地服务器中使用将备份还原为新数据库。

3.    在新创建的数据库上关闭 TDE。使用以下命令关闭 TDE:

运行以下命令关闭数据库的加密。在以下命令中,将 Databasename 替换为您的数据库的正确名称。

USE master;
GO
ALTER DATABASE [Databasename] SET ENCRYPTION OFF;
GO

运行以下命令删除用于加密的 DEK。在以下命令中,将 Databasename 替换为您的数据库的正确名称。

USE [Databasename];
GO
DROP DATABASE ENCRYPTION KEY;
GO

4.    创建本机 SQL Server 备份并将这个新备份还原到所需的 RDS 实例。有关更多信息,请参阅如何执行运行 SQL Server 的 Amazon RDS 数据库实例的本机备份?

5.    在新的 RDS 数据库中重新开启 TDE

选项 2:该数据库来源于使用 TDE 加密的 RDS for SQL Server 数据库

1.    使用源实例的快照将数据库还原到新实例

2.    关闭从快照创建的数据库上的 TDE

3.    创建本机 SQL 备份并将这个新备份还原到所需的 RDS 实例。

4.    在新的 RDS 数据库中重新开启 TDE

观察到的 RDS for SQL Server 上的本机备份的常见错误

错误:由于任务失败或与 RDS 自动备份的首选备份窗口重叠而中止了任务

当您遇到与 SQLSERVER_BACKUP_RESTORE 选项关联的 IAM 角色或策略相关的权限问题时,就会出现以下错误。

[2022-07-16 16:08:22.067]
Task execution has started. 
[2022-07-16 16:08:22.143] Aborted the task because of a task failure or an overlap with your preferred backup window for RDS automated backup.
[2022-07-16 16:08:22.147] Task has been aborted [2022-07-16 16:08:22.150] Access Denied

要解决此问题,请执行以下操作:

1.    验证还原查询,以确保 S3 存储桶和文件夹前缀正确无误:

exec msdb.dbo.rds_restore_database
      @restore_db_name='database_name',
      @s3_arn_to_restore_from='arn:aws:s3:::bucket_name/file_name_and_extension';

2.    验证 IAM policy 是否包含以下属性:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Resource": "arn:aws:s3:::bucket_name"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObjectAttributes",
        "s3:GetObject",
        "s3:PutObject",
        "s3:ListMultipartUploadParts",
        "s3:AbortMultipartUpload"
      ],
      "Resource": "arn:aws:s3:::bucket_name/*"
    }
  ]
}

**注意:**将 arn:aws:s3:::bucket_name 替换为您的 S3 存储桶的 ARN。

3.    验证该策略是否与 SQLSERVER_BACKUP_RESTORE 选项中显示的角色正确关联。

4.    验证选项组中的 SQLSERVER_BACKUP_RESTORE 选项是否与数据库实例关联。

S3 存储桶 ARN
S3 文件夹前缀(可选)

有关更多信息,请参阅如何执行运行 SQL Server 的 Amazon RDS 数据库实例的本机备份?

错误:在“XXX”上写入失败,无法将区块写入 S3,S3 写入流上传失败

这是 RDS for SQL Server 的一个已知问题。数据库大小有时估计不正确,导致备份过程失败并出现以下错误。

[2022-04-21 16:45:04.597] reviews_consumer/reviews_consumer_PostUpdate_042122.bak: Completed processing 100% of S3 chunks.
[2022-04-21 16:47:05.427] Write on "XXXX" failed: 995(The I/O operation has been aborted because of either a thread exit or an application request.) A nonrecoverable I/O error occurred on file "XXXX:" 995(The I/O operation has been aborted because of either a thread exit or an application request.). BACKUP DATABASE is terminating abnormally.
[2022-04-21 16:47:22.033] Unable to write chunks to S3 as S3 processing has been aborted. [2022-04-21 16:47:22.040] reviews_consumer/reviews_consumer_PostUpdate_042122.bak: Aborting S3 upload, waiting for S3 workers to clean up and exit
[2022-04-21 16:47:22.053] Aborted the task because of a task failure or an overlap with your preferred backup window for RDS automated backup.
[2022-04-21 16:47:22.060] reviews_consumer/reviews_consumer_PostUpdate_042122.bak: Aborting S3 upload, waiting for S3 workers to clean up and exit
[2022-04-21 16:47:22.067] S3 write stream upload failed. Encountered an error while uploading an S3 chunk: Part number must be an integer between 1 and 10000, inclusive S3 write stream upload failed. Encountered an error while uploading an S3 chunk: Part number must be an integer between 1 and 10000, inclusive S3 write stream upload failed. Encountered an error while uploading an S3 chunk: Part number must be an integer between 1 and 10000, inclusive S3 write stream upload failed. Encountered an error while uploading an S3 chunk: Part number must be an integer between 1 and 10000, inclusive

解决此错误的方法是开启数据库备份压缩。这会压缩备份,使 S3 更容易接收文件。

运行以下命令开启备份压缩:

exec rdsadmin..rds_set_configuration 'S3 backup compression', 'true';

AWS 官方
AWS 官方已更新 1 年前