I just ran a few simple tests in Azure SQL DB to see how each would perform.
I have a target table with an identity column that is the clustered primary key, and two other indexes that are the same except for the field order. (Whether having both is useful is a question for another day.) Here’s the DDL for the target table:
CREATE TABLE [ETL1]( Field1 [numeric](12, 0) NULL, Field2 [smallint] NULL, Field3 [int] NULL, Field4 [smallint] NULL, Field5 [nvarchar](1) NULL, Field6 [nvarchar](2) NULL, Field7 [numeric](4, 0) NULL, Field8 [numeric](2, 0) NULL, Field9 [nvarchar](2) NULL, Field10 [nvarchar](8) NULL, Field11 [datetime2](7) NULL, Field12 [nvarchar](8) NULL, Field13 [datetime2](7) NULL, [UpdateType] [nchar](1) NOT NULL, [RowCreated] [datetime2](7) NOT NULL, Id BIGINT IDENTITY NOT NULL ) ON [PRIMARY] GO CREATE UNIQUE CLUSTERED INDEX iscdpk ON ETL1 (Id) WITH (DATA_COMPRESSION = PAGE); CREATE UNIQUE NONCLUSTERED INDEX iscd1 ON ETL1 (UpdateType, RowCreated, Field1, Field2) WITH (DATA_COMPRESSION = PAGE); CREATE UNIQUE NONCLUSTERED INDEX iscd2 ON ETL1 (Field1, Field2, UpdateType, RowCreated) WITH (DATA_COMPRESSION = PAGE); GO
Test 1: Truncate the Target, Drop the Indexes, Insert the Data, Recreate the Indexes.
Test 2: Drop the Indexes, Truncate the Target, Insert the Data, Recreate the Indexes
Test 3: Just Truncate the Target and Insert the Data
Test 4: Truncate the Target, Drop the non-clustered Indexes (leaving the the clustered index on the identity column), Insert the Data, Recreate the non-clustered Indexes.
Here are the results. All timings are in milliseconds. These were run on a PRS1 instance of Azure SQL Database.
Trunc but don’t
|Drop Index 1||5||23,630||n/a||2|
|Drop Index 2||6||2||n/a||2|
|Insert 1.84 M rows||83,033||82,315||161,706||83,205|
|Create Index 1||12,149||12,264||n/a||12,265|
|Create Index 2||11,142||11,313||n/a||11,247|
|Total Time (ms)||126,801||150,735||161,706||106,725|
|Total Time (mins)||2.11||2.51||2.70||1.78|
Test 4 was the clear winner as it avoided the cost of recreating the clustered index. Which makes sense as the clustered index was being filled in order by the identity column as rows were added. Test 1 came in second, so if your clustered index is not on an identity column, or you have no clustered index, you are still better off dropping and recreating the indexes.
Conclusion: When inserting larger data sets into an empty table, drop the indexes before inserting the data, unless the index is clustered on an identity column.
2 thoughts on “When Loading Data, Should I Drop Indexes or Not?”
Loved reading this thhank you
Thanks! Glad to hear it was useful.