Testing Backups and Offloading CheckDB

While every DBA knows they need to backup all their databases, not all may realize the importance of testing those backups. Performing backups is pointless, if you’re unable to restore them.

I wanted to restore our backups on a regular basis, so I set up a process to test them automatically, on a nightly basis.

Background

Early on when I started working on the SQL Servers at Stack Overflow, we were taking daily backups. We had a handful of databases that were being restored for other processes, but the majority weren’t actively tested to ensure the backups were good. Since you never want to be in a situation where you need to restore a database and find it doesn’t work, my goal was to create a process to automatically restore our backups to a separate server, and then run DBCC CHECKDB on it.

Now, you might be wondering why I’d run CHECKDB on the backup after I restore it and asking yourself “aren’t you supposed to run CHECKDB on the SQL Server you take a backup on?”

Yes, technically many people suggest you do that, but we don’t, because while we take full backups on the primary replica in our Always On Availability Groups, we don’t have maintenance windows, and running CHECKDB on a very busy production SQL Server like we have for Stack Overflow can cause issues (ask me how I know). We can’t easily run a full CHECKDB in production, so since I was going to be restoring the backups to make sure they worked, I might as well run CHECKDB on it to verify everything.

Automating the Testing

Before I jumped into creating a new process there were some questions I needed to answer for this project:

What databases do we want or need to test the backups for?
Where can we perform this testing? Do we have a server available for use?
How often do we want to test the backups? Nightly? Weekly? What would the cadence be for this process?

The answer to the first question was pretty clear, since we can’t easily run DBCC CHECKDB in production basically, any database that was in an Always On Availability Group was on the list to have the backup restored and tested.

Next, I needed a place to perform the testing. Thankfully, we had a separate server where we were already restoring some backups, so it was the perfect candidate for this. The only issue was lack of disk space for some of the large databases. We resolved the lack of space issue by purchasing a couple of new NVMEs which gave us a 14TB drive to use for the restores.

Lastly, it was figuring out how often we wanted to test our restores. Since we take full backups nightly, we decided we would restore all databases daily, then run DBCC CHECKDB against it. Is that a little much? Maybe, but it also gives us confidence that our backups are working.

Now that I had the answer to the questions above, it was time to create a process that would do the following:

Automatically add or remove databases that need testing
Restore the our backups on a separate server
Run DBCC CHECKDB on the backup after being restored

Maintaining a List of Databases to Test

I knew the initial list of databases we wanted to test, but what if that list changed? What would happen if we added new databases, or removed databases, and no longer had backups? I didn’t want the nightly process to fail, and even though I typically know when we add or remove databases, I didn’t want to hand-hold and edit the list manually, I wanted it to be dynamic.

To do this, I had to save the list of databases to restore. I did this by first creating a table in the master database on the server we were using for backup and restore testing. The table structure is:

1
2
3
4
5
6
7
8


CREATE TABLE [dbo].DatabasesToRestore(
	[Id] [int] IDENTITY(1,1) NOT NULL,
	[Name] [sysname] NOT NULL,
	[AvailabilityGroup] nvarchar(100) NULL,
	IsActive bit NOT NULL DEFAULT(1)
	CONSTRAINT [PK_DatabasesToRestore] PRIMARY KEY CLUSTERED ([Id] ASC)
) ON [PRIMARY]
GO

This table stores basic information about each database, including name, but also the name of the AvailabilityGroup that the database is in. I purposely stored the AG name because of the way we save our backups. Our backup directory structure uses the Availability Group name instead of the name of the SQL Server in the file path because it allows the backup locations to stay consistent no matter what server is the primary. The file share for backups uses this directory structure:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12


|--- server
    |--- Backups
        |--- SQL
            |--- AG-name1
                |--- Database1
                    --- database1_20210101.bak
                    --- database1_20210102.bak
                |--- Database2
                    --- database2_20210101.bak
            |--- AG-name2
                |--- Database1
                |--- Database2

By storing the name of the Availability Group for each database, it allowed me to dynamically create the path to the backup for testing. It also made the process to find additions and deletions for each availability group more efficient, so I didn’t have to manually adjust the list of databases that would need nightly testing. All that was needed was the initial population of the databases to restore, and then the stored procedure would keep the list up to date. The stored procedure that I wrote is:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85


CREATE OR ALTER Procedure [dbo].[spUpdateDatabasesToRestore] 
AS
BEGIN
  DECLARE @fileSearch nvarchar(500) = '',
        @backupPath NVARCHAR(500) = '',
        @error nvarchar(600);

  DECLARE @ag_name nvarchar(100)

  DECLARE ag_cursor CURSOR FOR
    SELECT DISTINCT AvailabilityGroup
    FROM dbo.DatabasesToRestore
  OPEN ag_cursor
  FETCH NEXT FROM ag_cursor INTO @ag_name
  WHILE @@FETCH_STATUS = 0
  BEGIN

    PRINT 'Getting databases for AG: '+ @ag_name

	-- go through the list of availability groups and their backup path
	-- update the list of databases to restore 
	-- adding new ones and deleting old ones that are no longer active databases

	SET @backupPath = concat('\\fileserver\Backups\SQL\', @ag_name, '\')
	SET @fileSearch = 'DIR /S /b/a-d/od/t:c ' + @backupPath;

	DECLARE @files table (ID int IDENTITY, FileName varchar(max), DBName sysname null, FileDate date null)

	INSERT INTO @files (FileName)
	EXEC master.sys.xp_cmdshell @fileSearch

	UPDATE @files 
	SET DBName = Left(replace(FileName, @backupPath, ''), charindex('\', replace(FileName, @backupPath, ''))-1),
	  FileDate = 
                cast(substring(right(replace(FileName, @backuppath, ''), len(replace(FileName, @backuppath, ''))
                -charindex('\', replace(FileName, @backuppath, ''))), 
                charindex('_FULL_', right(replace(FileName, @backuppath, ''), 
                len(replace(FileName, @backuppath, ''))-charindex('\', replace(FileName, @backuppath, '')))) + 6, 8) as date)
	WHERE FileName like '%.bak%'

	PRINT 'inserting new databases for AG: '+ @ag_name

	-- Insert new databases into the Databases table 
	INSERT INTO DatabasesToRestore (Name, AvailabilityGroup)
	SELECT DBName, AGName = @ag_name
	FROM
	(
		SELECT DBName, 
			rn = row_number() over(partition by DBName order by FileDate desc)
		FROM @files
		WHERE FileDate >= GETDATE()-7
			AND FileName LIKE '%'+@ag_name+'%'
	) f
	WHERE rn = 1
		AND NOT EXISTS (SELECT 1
				FROM dbo.DatabasesToRestore d
				WHERE f.DBName = d.Name
				AND d.AvailabilityGroup = @ag_name
    				AND d.IsActive = 1)

	PRINT 'Removing any old databases for AG: '+ @ag_name

	-- mark any Databases that are no longer producing backups as IsActive = 0
	UPDATE d
	SET d.IsActive = 0
	FROM dbo.DatabasesToRestore d
	LEFT JOIN 
	(
		SELECT DISTINCT DBName, AGName = @ag_name
		FROM @files 
		WHERE FileDate >= GETDATE()-7
			AND DBName IS NOT NULL
			AND FileName LIKE '%'+@ag_name+'%'
	) f
	    ON d.Name = f.DBName
	    AND d.AvailabilityGroup = f.AGName
	WHERE d.IsActive = 1
		AND d.AvailabilityGroup = @ag_name
		AND f.DBName IS NULL

	FETCH NEXT FROM ag_cursor INTO @ag_name
  END
  CLOSE ag_cursor  
  DEALLOCATE ag_cursor 
END

This procedure uses xp_cmdshell to get the list of files in the backup directory, then adds new databases that are missing from the current list, and removes databases that haven’t had a new .bak file in the last seven days. Finally, we have a scheduled job that executes this stored procedure nightly to keep the table of DatabasesToRestore up to date.

Restoring Database Backups

Once I had a list of the databases to restore, and a way to maintain that list on its own, it was time to get the process in place to restore the backups for testing nightly.

In addition to restoring the backups, I also wanted to keep a record of what was being restored, as a reference, in the event there were issues or failures with the nightly job. To do this, I created another table which I put in the master database.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15


CREATE TABLE dbo.DatabaseRestoreLog
(
	[ID] [int] IDENTITY(1,1) NOT NULL,
	[DatabaseId] [int] NOT NULL,
	[RestoredName] [sysname] NULL,
	[RestoredDate] [date] NOT NULL,
	[RestoreStartTime] datetime NULL,
	[RestoreEndTime] datetime NULL,
	[DBCCStartTime] datetime NULL,
	[DBCCEndTime] datetime NULL,
	[PassedDBCCChecks] [bit] NOT NULL DEFAULT (0)
	CONSTRAINT [PK_DatabaseRestoreLog] PRIMARY KEY CLUSTERED ([ID] ASC),
	CONSTRAINT [FK_DatabaseRestoreLog_DatabasesToRestore] 
		FOREIGN KEY ([DatabaseId]) REFERENCES dbo.DatabasesToRestore ([Id])
);

This table allows the job to capture critical details, like how long the restore took, it also tracked whether or not the backup passed the DBCC checks (more on that later).

As I mentioned earlier, we were restoring some databases in another process, so instead of inventing something from scratch, I decided to modify the existing stored procedure and expand it for the restore testing. Again, this procedure utilizes xp_cmdshell and loops through all of the databases in the DatabasesToRestore table and looks for the oldest databases that haven’t been restored recently, creates the path to the backup, and then grabs the most recent .bak file to restore for testing.

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201


CREATE OR ALTER Procedure dbo.spBackupRestoreTesting 
	@TimeLimit int = 43200,		-- the amount of time we want to limit the restores - the default is 12 hours
    @exec BIT = 0
AS
BEGIN
	DECLARE @fileSearch nvarchar(500) = '',
		@backupFile nvarchar(500),
		@data_dir nvarchar(100) = 'F:\DBRestore\', 
		@cmd nvarchar(max),
		@dropCmd nvarchar(500),
		@RestoredDBName sysname,
		@backupPath NVARCHAR(500) = '',
		@error nvarchar(600);

	-- Set the time we started the restore process so we know when we want to exit 
	-- if it hits the TimeLimit, we'll stop for the day
	DECLARE @StartTime datetime
	SET @StartTime = GETDATE()

	DECLARE @fileList Table (rowNumber int identity(1,1), backupFile nvarchar(255), backupPath nvarchar(500));

	DECLARE @FileListTable Table (
		[LogicalName] nvarchar(128),
		[PhysicalName] nvarchar(260), 
		[Type] char(1), 
		[FileGroupName] nvarchar(128), 
		[Size] numeric(20,0),
		[MaxSize] numeric(20,0), 
		[FileId] bigint, 
		[CreateLSN] numeric(25,0), 
		[DropLSN] numeric(25,0), 
		[UniqueId] uniqueidentifier, 
		[ReadOnlyLSN] numeric(25,0), 
		[ReadWriteLSN] numeric(25,0),
		[BackupSizeInBytes] bigint, 
		[SourceBlockSize] int, 
		[FileGroupId] int, 
		[LogGroupGUID] uniqueidentifier, 
		[DifferentialBaseLSN] numeric(25,0), 
		[DifferentialBaseGUID] uniqueidentifier, 
		[IsReadOnly] bit, 
		[IsPresent] bit, 
		[TDEThumbprint] varbinary(32), 
		[SnapshotUrl] nvarchar(360));

	DECLARE @database_id int
	DECLARE @database_name sysname
	DECLARE @ag_name nvarchar(100)
	DECLARE @last_restore_date date

	-- loop through the list of databases
	-- restore them, then run DBCC checks on them
	WHILE GETDATE() < DATEADD(ss,@TimeLimit,@StartTime)
	BEGIN
		-- grab the oldest databases that haven't been restored recently
		SELECT TOP 1 @database_id = d.Id, 
            @database_name = d.Name, 
            @ag_name = d.AvailabilityGroup, 
            @last_restore_date = drl.LastRestoredDate
		FROM DatabasesToRestore d
		LEFT JOIN
		(
			-- get the last date each database was restored
			SELECT 
				DatabaseId,
				LastRestoredDate = MAX(RestoredDate)
			FROM DatabaseRestoreLog
			GROUP BY DatabaseId
		) drl
			ON d.Id = drl.DatabaseId
		WHERE d.IsActive = 1
			AND (drl.LastRestoredDate <> CAST(GETDATE() AS DATE) 
			    OR drl.LastRestoredDate IS NULL)
		ORDER BY drl.LastRestoredDate 

		IF @@ROWCOUNT = 0
          BEGIN
            BREAK
          END

		PRINT CONCAT('Restoring : ', @database_name, ' in AG ', @ag_name, ' last restored date: ', @last_restore_date)

		SET @backupPath = CONCAT('\\fileserver\Backups\SQL\', @ag_name, '\', @database_name, '\')
		PRINT @backupPath

		SET @fileSearch = 'DIR *.bak /b /O:D ' + @backupPath;

		INSERT INTO @fileList(backupFile)
		EXEC master.sys.xp_cmdshell @fileSearch;
		UPDATE @fileList Set backupPath = @backupPath;

		SELECT Top 1 @backupFile = backupFile, @backupPath = backupPath
		FROM @fileList
		WHERE backupFile Like @database_name + '%'
		ORDER BY backupFile DESC

		IF @backupFile Is Not Null
			BEGIN
				DECLARE @fullPath nvarchar(500) = @backupPath + @backupFile;
				PRINT 'Backup File Found: ' + @fullPath;

				INSERT INTO @FileListTable
				EXEC('Restore FileListOnly From Disk = ''' + @fullPath + '''');

				SET @RestoredDBName = REPLACE(@backupFile, '.bak', '')

				-- insert into the log the Database we're restoring
				INSERT INTO dbo.[DatabaseRestoreLog] (DatabaseId, RestoredName, RestoredDate, RestoreStartTime)
				VALUES (@database_id, @RestoredDBName, GETDATE(), GETDATE());

				SET @cmd = 'Restore Database [' + @RestoredDBName + '] ' + char(13) + '  From Disk = ''' + @fullPath + ''' With File = 1, ' + char(13);
				SELECT @cmd = @cmd + '  Move N''' + LogicalName + ''' To N''' + @data_dir + LogicalName + '_' + @RestoredDBName + '.' +
					REVERSE(SUBSTRING(REVERSE(PhysicalName), 0, CHARINDEX('.', REVERSE(PhysicalName),0))) + ''', ' + char(13)
				FROM @FileListTable

				SET @cmd = @cmd + '  NoUnload, Stats = 5;' + char(13);
				SET @cmd = @cmd + 'Alter Database [' + @RestoredDBName + '] Set Recovery Simple With No_Wait;'

				-- if for some reason the database with the restored name exists, drop it first
				IF EXISTS (SELECT 1 
							FROM sys.databases 
							WHERE Name = @RestoredDBName)
					BEGIN
						PRINT 'Dropping existing [' + @RestoredDBName +']';
						SET @dropCmd = 'Alter Database [' + @RestoredDBName + '] Set Single_User With Rollback Immediate; Drop Database [' + @RestoredDBName + '];';
						PRINT(@dropCmd);

						IF @exec = 1 
							EXEC(@dropCmd);
					END

				-- if it doesn't exist, then restore the database
				PRINT 'Restoring [' + @database_name + '] from ' + @backupFile;
				PRINT(@cmd);
				IF @exec = 1 
					EXEC(@cmd);

				-- make sure the DB exists before trying to run DBCC CHECKDB on it
				IF EXISTS (SELECT 1 FROM sys.databases WHERE Name = @RestoredDBName) 
					BEGIN
						-- When the restore is complete, update the Log to reflect the end time
						UPDATE dbo.DatabaseRestoreLog
						SET RestoreEndTime = GETDATE(),
							DBCCStartTime = GETDATE()
						WHERE DatabaseId = @database_id
							AND RestoredName = @RestoredDBName

						EXEC dbo.spRunDBCCChecks @dbName = @RestoredDBName, @exec = @exec;

						-- When the DBCC Check is complete, update the Log to reflect the end time
						UPDATE dbo.DatabaseRestoreLog
						SET DBCCEndTime = GETDATE()
						WHERE DatabaseId = @database_id
							AND RestoredName = @RestoredDBName
					END
				ELSE 
					-- if the restore failed, then raise an error that it failed and send an emails
					BEGIN
						SET @error = 'Restore of database ''' + @RestoredDBName + ''' failed. Check the log on the server.' ;
						-- send an email alert to that the DBCC CHECKDB failed for the restored item
						EXEC msdb.dbo.sp_send_dbmail @profile_name = 'Mail', @recipients = 'useyourown@email.com', @subject = @error, @body = @error;
						RAISERROR(@error, 18, 1);
					END

			END
		ELSE
			BEGIN
				SET @error = 'No Recent backup was found in ''' + @backupPath + '''';
				RAISERROR(@error, 18, 1);
			END

		-- delete the info on the one we just restored
		DELETE FROM @fileList
		DELETE FROM @FileListTable
	END

	-- once we're done, let's clear out the DBCC_History_Log table a bit, cause it grows fast
	DELETE
	FROM dbo.DBCC_History_Log
	WHERE DBCCCheckDate < GETDATE() - 30;

	-- send an email to sql-alerts with a count of what was done and what passed
	DECLARE @TotalDatabasesRestored int 
	DECLARE @TotalPassedCheck int 

	SELECT 
		@TotalDatabasesRestored = count(*),
		@TotalPassedCheck = count(case when PassedDBCCChecks = 1 then DatabaseId end)
	FROM dbo.DatabaseRestoreLog
	WHERE RestoredDate = CAST(GETDATE() as date)

	DECLARE @subject nvarchar(100) = CONCAT('Database restore results for ', cast(getdate() as date))
	DECLARE @body nvarchar(500) = CONCAT('Total Databases Restored: ', @TotalDatabasesRestored, 'Total Databases Passing DBCC Check:',  @TotalPassedCheck)

	EXEC msdb.dbo.sp_send_dbmail @profile_name = 'Mail', 
		@recipients = 'useyourown@email.com', 
		@subject = @subject, 
		@body = @body;

	
END

Once the database is restored we perform a DBCC CHECKDB on it to see if there are any issues.

Since we restore hundreds of databases daily, I set a limit on the amount of time we perform restore testing each day. We currently limit testing to 12 hours a day, but if need be, we could easily adjust this to 18 hours or some other value. I attempted to make this process flexible to account for adding or removing databases.

Running DBCC CheckDB

Outside of restoring our backups, the last piece is the most important…running a CHECKDB against the database since we do not do so in production.

If you look closely at the previous stored procedure, you won’t see a direct call to DBCC CHECKDB. That’s because I created a separate stored procedure to make that call. But before we get to that stored procedure, there is one more table that I created for this process. Unless you are actively watching the CHECKDB run, you’re going to want to capture the results so you can see any issues that were found. In order to do this, I created one more table to capture the lengthy results:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30


CREATE TABLE dbo.DBCC_History_Log
(
	[ID] [int] IDENTITY(1,1) NOT NULL CONSTRAINT PK_DBCC_History_Log PRIMARY KEY,
	[DatabaseName] [sysname] NULL,
	[DBCCCheckDate] [date] NOT NULL,
	[Error] [int] NULL,
	[Level] [int] NULL,
	[State] [int] NULL,
	[MessageText] [varchar](7000) NULL,
	[RepairLevel] nvarchar(500) NULL,
	[Status] [int] NULL,
	[DbId] [int] NULL,
	[DbFragId] [int] NULL,
	[ObjectId] [int] NULL,
	[IndexId] [int] NULL,
	[PartitionID] [bigint] NULL,
	[AllocUnitID] [bigint] NULL,
	[RidDbId] [int] NULL,
	[RidPruId] [int] NULL,
	[File] [int] NULL,
	[Page] [int] NULL,
	[Slot] [int] NULL,
	[RefDbId] [int] NULL,
	[RefPruId] [int] NULL,
	[RefFile] [int] NULL,
	[RefPage] [int] NULL,
	[RefSlot] [int] NULL,
	[Allocation] [int] NULL,
	[TimeStamp] [datetime] NULL CONSTRAINT [DF_dbcc_history_TimeStamp] DEFAULT (GETDATE())
);

You might be wondering why I’d want to save this info? Since this is a nightly process, I just want it to run in the background and only notify me if something is wrong in the CHECKDB. I also don’t want to have to rerun CHECKDB if there’s an issue. Essentially, I only need to review the data in the table to see the error, and investigate from there.

This table contains the standard output from a CHECKDB, but also includes a few additional columns including the name of the database being tested and the date that the check was performed. The table is populated via the final stored procedure in this process:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81


CREATE OR ALTER Procedure dbo.spRunDBCCChecks
    @dbName sysname,
    @exec BIT = 0 
AS
BEGIN
	DECLARE @dropCmd nvarchar(500),
			@error nvarchar(600),
			@MailMsg nvarchar(500);

	-- DBCC CheckDB will output results that we want to capture to verify if there are errors
	IF OBJECT_ID('tempdb..#tempdbcccheck') IS NOT NULL
		DROP TABLE #tempdbcccheck

	CREATE TABLE #tempdbcccheck
	(
		[Error] [int] NULL,
		[Level] [int] NULL,
		[State] [int] NULL,
		[MessageText] [varchar](7000) NULL,
		[RepairLevel] nvarchar(500) NULL,
		[Status] [int] NULL,
		[DbId] [int] NULL,
		[DbFragId] [int] NULL,
		[ObjectId] [int] NULL,
		[IndexId] [int] NULL,
		[PartitionID] [bigint] NULL,
		[AllocUnitID] [bigint] NULL,
		[RidDbId] [int] NULL,
		[RidPruId] [int] NULL,
		[File] [int] NULL,
		[Page] [int] NULL,
		[Slot] [int] NULL,
		[RefDbId] [int] NULL,
		[RefPruId] [int] NULL,
		[RefFile] [int] NULL,
		[RefPage] [int] NULL,
		[RefSlot] [int] NULL,
		[Allocation] [int] NULL
	);

	PRINT 'Starting DBCC CHECKDB on ' + @dbName

	INSERT INTO #tempdbcccheck
	EXEC ('DBCC CHECKDB(''' + @dbName + ''') with tableresults');

	-- insert the results in the history log table
	INSERT INTO [dbo].[DBCC_History_Log] (DatabaseName, DBCCCheckDate, Error, [Level], [State], MessageText, RepairLevel, [Status], DbId, DbFragId, 
		ObjectId, IndexId, PartitionID, AllocUnitID, RidDbId, RidPruId, [File], Page, Slot, RefDbId, RefPruId, RefFile, RefPage, RefSlot, Allocation)
	SELECT @dbName, GETDATE(), t.*
	FROM #tempdbcccheck t

	-- if the there were no errors in the database check, then we can safely drop the DB
	IF NOT EXISTS (SELECT 1 
					FROM dbo.DBCC_History_Log
					WHERE DatabaseName = @dbName
						AND DBCCCheckDate = cast(getdate() as date)
						AND [Level] > 10 )
		BEGIN
			UPDATE [dbo].[DatabaseRestoreLog]
			SET [PassedDBCCChecks] = 1
			WHERE [RestoredName] = @dbName
				AND [RestoredDate] = cast(getdate() as date)

			PRINT 'Clean DBCCCheck - dropping database [' + @dbName +']';
			SET @dropCmd = 'Alter Database [' + @dbName + '] Set Single_User With Rollback Immediate; Drop Database [' + @dbName + '];';
			PRINT(@dropCmd);

			IF @exec = 1 
			BEGIN
				EXEC(@dropCmd);
			END
		END
	ELSE
		BEGIN
			SET @error = 'DBCC CHECKDB generated errors for database ''' + @dbName + ''' not dropping for further review.' ;
			-- send an email alert to sql-alerts that the DBCC CHECKDB failed for the restored item
			EXEC msdb.dbo.sp_send_dbmail @profile_name = 'Mail', @recipients = 'useyourown@email.com', @subject = @error, @body = @error;
			RAISERROR(@error, 18, 1);
		END

END

In this stored procedure I do a couple of things, run CHECKDB and then validate that the CHECKDB was successful. If it wasn’t successful, then we don’t drop the database, and we send an email alert stating that something needs to be investigated. If everything is ok with the CHECKDB, then we drop the database and it will either move to the next database to restore, or it will exit the job if we have hit the 12 hour time limit.

To get this entire process done, I have two SQL Agent jobs running - one to update the list of the DatabasesToRestore via spUpdateDatabasesToRestore and one that executes the Backup and Restore Testing procedure (spBackupRestoreTesting).

I’ve added all these scripts to GitHub, feel free to peruse them that way or tell me how terrible they are.

Final Thoughts

Currently, we’re restoring 398 databases daily and running the DBCC CHECKDB on each one of them. Does that seem like a lot? Probably, but this process has, in fact, caught issues with some of our databases.

If the CHECKDB finds an issue, it doesn’t drop the database (leaving it for investigation), it sends an email alert. This gives me ample time to figure out what has gone wrong and fix the issue on the production database.

Will this capture issues on the production server that don’t exist in the backup? No, but it does give us confidence in our backups, and has found issues with databases that needed to be fixed - like data corruption. Ideally, you’d be able to run CHECKDB in production, but when you can’t, you need to get creative. This process does two things - tests our backups daily to ensure they’re working, and runs the CHECKDB to make sure there is no corruption in the backup (which would bite us if we ever needed a backup).

How do you run CHECKDB? Do you run it on your production servers or do you run it on a separate server?

Table of Contents