Dealing with SQL Server LOCK resource errors

I was recently doing some testing that required a fairly large table. I created the test table and set it up to insert about 30 million rows, at which point I got an error that read:

The instance of the SQL Server Database Engine cannot obtain a LOCK resource at this time. Rerun your statement when there are fewer active users. Ask the database administrator to check the lock and memory configuration for this instance, or to check for long-running transactions.

Naturally, having not encountered that before, I started by Googling the error, which led me to Error 1204 and Microsoft’s support article. The support article helpfully suggested setting locks to 0 using sp_configure, but that is the default value and mine was already set that way.
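For reference, here is a minimal sketch of how to check that setting yourself (locks is an advanced option, so it has to be made visible first):

exec sp_configure 'show advanced options', 1
reconfigure

exec sp_configure 'locks' --a config_value of 0 means dynamic lock management, the default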

So, based on Jeff Moden’s Tally Table article, I created a simplified setup that would reliably produce the error.

if OBJECT_ID('dbo.BigTallyTable', 'U') is not null
    drop table dbo.BigTallyTable

create table dbo.BigTallyTable
(n bigint primary key)

insert into dbo.BigTallyTable (n)
select top (30000000)
    row_number() over (order by s1.name) as n
from master.dbo.syscolumns s1,
    master.dbo.syscolumns s2

And I started playing with it. Since this was all tied to locks, one thing I did was break the insert up into batches:

declare @maxValue int, @maxPerRound int, @last int
set @maxValue = 30000000 --30 million
set @maxPerRound = 500000

set @last = 0
while @last < @maxValue
begin
    if @maxPerRound > @maxValue - @last
        set @maxPerRound = @maxValue - @last

    insert into dbo.BigTallyTable (n)
    select top (@maxPerRound)
        row_number() over (order by s1.name) + @last as n
    from master.dbo.syscolumns s1,
        master.dbo.syscolumns s2

    print 'Finished a round with @last = ' + cast(@last as varchar) + ' and @maxPerRound = ' + cast(@maxPerRound as varchar)

    set @last = @last + @maxPerRound
end

This avoided the error and generated the results I was looking for. On a multi-user system (I was doing this on a test instance of SQL Server on my personal machine), it would also have allowed other queries that were waiting for locks to execute more freely between batches, though whether that is good or not depends on the circumstances.

I also got it to work by directing SQL Server to take a table lock right from the beginning:

insert into dbo.BigTallyTable with (TABLOCK) (n)
select top (30000000)
    row_number() over (order by s1.name) as n
from master.dbo.syscolumns s1,
    master.dbo.syscolumns s2

This ran substantially faster than splitting the insert into batches, and it is clearly a simpler query. It does take a more aggressive table lock which, as the article on lock escalation points out, can cause more problems with blocking other queries if that is a concern.
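If you want to see the difference for yourself, one way (a sketch, run from a second session while the insert is executing) is to look at sys.dm_tran_locks. With the TABLOCK hint you should see a single table-level OBJECT lock rather than an ever-growing pile of row and page locks:

select resource_type, request_mode, request_status, count(*) as lock_count
from sys.dm_tran_locks
where resource_database_id = DB_ID()
group by resource_type, request_mode, request_status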


Order of a Result Set in SQL

I have seen people build procedures that rely on a result set being returned in a certain order.  This is fine if an order by clause is used, but SQL Server will not guarantee the order of the results without one.

It is interesting to look at the way SQL Server normally returns a result set though.  So, let’s make a small sample table:

create table orderTest (n varchar(15))

insert into orderTest (n)
values ('2'),
       ('1'),
       ('3'),
       ('1')

If we run a simple “select n from orderTest”, the results I get back are 2, 1, 3, 1.  In other words, it returns the results in the order they were inserted.  This is relatively consistent; I have tried similar tests on multiple versions of SQL Server on multiple machines.  As I said at the beginning, I have seen programs rely on the data being returned in the order it was inserted, and those programs have gone for extended periods of time without any problems.  But I want to emphasize that SQL Server will not guarantee the order without an order by clause.  If you want to be able to guarantee that you can get the data back in the order it was inserted in, you can add an identity column and order by that.
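For example, a minimal sketch of that approach, using a hypothetical orderTest2 table:

create table orderTest2
(id int identity(1,1),
n varchar(15))

insert into orderTest2 (n)
values ('2'), ('1'), ('3'), ('1')

select n
from orderTest2
order by id --guaranteed to return 2, 1, 3, 1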

The tendency to return things in the order they were inserted seems fairly stable with simple where clauses, but asking SQL Server to return distinct results will change the order as it checks for duplicates.  For instance, “select distinct n from orderTest” gives me 1, 2, 3.

And SQL Server seems to return the results of a union all by just concatenating the results of the two queries.  So

select n
from orderTest
UNION ALL
select n
from orderTest

Gives 2, 1, 3, 1, 2, 1, 3, 1: essentially, the contents of the table repeated twice.  But union makes the results distinct, and so it changes the order.  Redoing that with union instead of union all gives 1, 2, 3.

Indexes will also affect the order.  You may have noticed the tests were all done on a heap with no indexes.  But if I add an index (either clustered or nonclustered), the order for “select n from orderTest” becomes 1, 1, 2, 3.  Of course, you cannot always rely on the data being returned in index order either, unless there is an order by clause.  Similarly, as Conor Cunningham pointed out, the size of the dataset can break the apparent order as the optimizer seeks to increase parallelism.
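To try this yourself, the index I am describing is nothing more elaborate than something like:

create clustered index IX_orderTest_n on orderTest (n)

select n from orderTest --now typically 1, 1, 2, 3, but still not guaranteed without an order by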

In short, SQL Server has certain ways it will normally return data, and it can be helpful to be aware of them.  But even small changes to the query, or changes in the size of the data, can have a major impact on the order in which the results are returned.  Without an order by clause, SQL Server is not guaranteed to return the data in the way it normally does, and you cannot rely on that order.  If the order actually matters, use an order by clause.

SQL Server Three-Valued Logic and ANSI Nulls

SQL Server uses three-valued logic with True, False, and Unknown. And, normally, NULLs in SQL Server are not comparable. Together, that means that this little script:

if Null = Null -- Results in Unknown
print 'Num 1'

if Null != Null -- Results in Unknown
print 'Num 2'

if not (Null = Null) or not (Null != Null) -- Results in Unknown
print 'Num 3'

normally produces no output. It simply prints the standard ‘Command(s) completed successfully.’ message. One significant implication of this is that if the where clause includes something like

[column_name] = null

It will filter out all rows. The best way to look for NULLs is to use

[column_name] is null
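A small demonstration of the difference, using a throwaway table:

create table nullTest (v int null)

insert into nullTest (v)
values (1), (null)

select v from nullTest where v = null --returns no rows
select v from nullTest where v is null --returns the row holding NULL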

But whether or not NULLs are comparable can be controlled through the ANSI_NULLS session setting. The default for ANSI_NULLS is ON, and in fact the documentation warns that in a future version it will always be ON, with no way to turn it off. Until then, this test script:

Set Ansi_nulls off

if Null = Null -- Now Results in True
print 'Num 1'

if Null != Null -- Now False
print 'Num 2'

if not (Null = Null) or not (Null != Null) -- Now True
print 'Num 3'

Dutifully prints out:

Num 1
Num 3

It is generally best to leave ANSI_NULLS at its standards-compliant default setting and use “is null” or “is not null”, but occasionally it can be useful to change it.

[20 Aug 13 – Updated to add tags and change category]

Variable Scope in SQL Server

The way variable scoping works in SQL Server is well documented on MSDN, but I have seen related issues come up on forums recently, so I decided to go over it.  Generally, a variable in T-SQL is available from the point it is declared to the end of the batch, function, or stored procedure it is defined in.  That is fairly straightforward, but it is easy to miss some of the subtleties of what constitutes a batch.

Go creates a new batch.

The most explicit way to end a batch is with the GO command.  So for instance:

declare @var varchar(10)

set @var = 'Hi'

print @var --This works

GO --End the batch

print @var --Generates an error

Gives results of:

Hi

Msg 137, Level 15, State 2, Line 2
Must declare the scalar variable "@var".

Each Execute is in a new batch.

When using SSMS, every call to execute generates a new batch.  So, for instance if you highlight:

declare @var varchar(10)

set @var = 'Hi'

print @var --This works

In SSMS and press F5 (or Execute under the Query menu), then it is all one batch and the print statement will execute.  If you then immediately highlight just the print statement and press F5 again, it will give the “Must declare the scalar variable” error because it is executing in a new batch.

A transaction can include multiple batches.

An explicit transaction can include multiple batches.  So, you can open an explicit transaction, declare a variable within it, and then start a new batch, losing access to the variable before the transaction commits.  To re-use the first example with a slight tweak:

begin transaction

declare @var varchar(10)

set @var = 'Hi'

print @var --This works

GO --End the batch, but not the transaction

print @var --Generates an error since out of batch

--this particular error won't roll back the transaction, but ends the batch

Results in this message:

Hi

Msg 137, Level 15, State 2, Line 2
Must declare the scalar variable "@var".

It got that error because the GO command ended the batch.  The Level 15 error ends execution but not the transaction, so looking at @@trancount at this point will return one, and we could proceed to commit or roll back the transaction (which doesn’t actually matter in this case since it didn’t affect anything).
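That is easy to confirm by continuing in the same session:

select @@trancount --returns 1: the transaction is still open

rollback transaction --or commit; neither matters here since nothing was changed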

A procedure has a different scope even if called within the same batch.

Even if it is called as part of a longer batch, a procedure has its own scope.  So a procedure can’t directly access variables in the batch that aren’t passed to it as parameters, and it can’t affect variables outside itself except through output parameters.  So if we create a procedure just for testing, like:

create procedure SillyProc
as
    declare @sillyvar varchar(5)
    set @sillyvar = 'Hi'
    print @sillyvar

then when it is executed there won’t be an @sillyvar available, even if the call is in the same batch.  Thus this:

exec SillyProc

print @sillyvar --generates an error

Would give the familiar “Must declare” error.
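For completeness, here is a quick sketch of the output-parameter route, which is the one way a value can cross the procedure’s scope boundary (SillyProcOut is just another made-up test name):

create procedure SillyProcOut
@outvar varchar(5) output
as
set @outvar = 'Hi'

GO

declare @batchvar varchar(5)
exec SillyProcOut @outvar = @batchvar output
print @batchvar --prints Hi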

Each execution from an external program is in a new batch.

Generally, each call to execute from a separate program will be in a new batch.  For a Python example:

import pyodbc

sqlConnStr = ('DRIVER={SQL Server};Server=(local);Database=TestDb;'
              'Trusted_Connection=YES')
sqlConn = pyodbc.connect(sqlConnStr)
curs = sqlConn.cursor()

sql = """
declare @pyvar varchar(10)
set @pyvar = 'HI!'"""
curs.execute(sql) #This works

sql = "set @pyvar = 'retry'"
curs.execute(sql) #This will give an error.
curs.commit() #Won't get here.

This will give a “Must declare the scalar variable "@pyvar".” error when it reaches the second execute statement, because that statement runs in a new batch and the new batch does not have the variable declared.
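The straightforward fix, at least in this sketch, is to keep the declaration and every statement that uses the variable in the same execute call, so that they run as a single batch:

sql = """
declare @pyvar varchar(10)
set @pyvar = 'HI!'
set @pyvar = 'retry'"""
curs.execute(sql) #Works: one batch, one scope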

Conclusion

The scope of a variable is generally straightforward, but it can have some subtleties.  Essentially, you can use a variable within the function, procedure, or batch it is declared in.

(Edited 17 Feb. 2013 to correct a typo in a sourcecode tag.)

Some Jargon from the SQL Server World

Not long ago I wrote about how to write well about SQL Server, and one major point was that jargon should be used sparingly and carefully.  Tied to that, I thought it might be beneficial to collect ten of the more commonly used jargon phrases in the world of SQL Server literature.  In doing so, I hoped to provide a concise list of terms that people joining this community might wish to be familiar with.  I also thought it would serve as a reminder to the seasoned database professional that these terms are jargon and might not be understood outside our community; many even have an entirely different meaning in other communities.

To expand slightly on what I said in my previous post, jargon is not always a bad thing.  Sometimes there simply is no non-technical word to express a concept, or the technical term expresses it much more succinctly.  Some in sociology even say that jargon can help with group identity and group cohesion.  Yet, it can make communicating with those outside of the group, or new to the group, difficult.  Even those well versed in the terminology can find reading something which overuses jargon tiring and difficult to follow.  Jargon does not necessarily need to be avoided, especially when the target audience should be familiar with it, but it should be used sparingly and with care.

Here, I am focusing on terms which are relatively common within the SQL community but which would likely be unfamiliar to those outside of it.  I am intentionally avoiding terms like IAM/PFS[1] that are used relatively rarely in the community.  I am also skipping words like table, row, and query that are among the first things anyone working with SQL Server will learn.  Instead, I am focusing on words that are common but not elementary within the SQL Server community, or that are so common people readily forget they are jargon, and I am not trying to be comprehensive even there.

1.    DBA  – The job title held by many who work with SQL Server is largely unknown outside the technical community.  Even spelling it out as “Database Administrator” has earned me quizzical looks when I talk to people outside of the computer professions.  I generally describe myself as a programmer when asked unless I know the other person actually has a technical background.  The abbreviated form also overlaps with multiple other terms, such as “doing business as” in the business/legal communities and “Doctor of Business Administration”.

2.    Normalization – Normalization is a way to organize tables and the relations between them to help reduce redundancy.  Under the right circumstances, normalization can help reduce the size of the data files, guarantee certain types of internal consistency, and make the database more flexible for future changes.  However, it can, under certain circumstances, come at a performance penalty and with some increased complexity in programming, since it frequently involves creating more tables which must be joined together (there is a tiny sketch of this after the links below).  The term normalization is shared with many other fields, such as sociology and statistics, where it carries quite different meanings.

Normalization is a deep topic and is discussed in some detail in articles such as:

The Road to Professional Database Development: Database Normalization by Peter Larsson

Stairway to Database Design Level 9: Normalization by Joe Celko
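As a tiny, hypothetical sketch of the idea: instead of repeating a customer’s city on every order row, the city moves to its own table and is joined back when needed.

create table Customers
(customerId int primary key,
city varchar(50))

create table Orders
(orderId int primary key,
customerId int references Customers(customerId),
orderDate date)

--The join that normalization makes necessary:
select o.orderId, c.city
from Orders o
join Customers c on c.customerId = o.customerId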

3.    3NF – Third Normal Form is the normal form most often sought after in practical database implementations.  It was loosely summed up by E.F. Codd as “Every non-key attribute must provide a fact about the key, the whole key, and nothing but the key.”

4.    Codd – Speaking of E.F. Codd, many serious students of relational databases are likely to be familiar with his name.  He was a researcher for IBM who is credited with inventing the relational model for databases and created much of the foundations of the theory behind relational databases.  The phrase “so help me Codd” is also occasionally, jokingly, added to the description of 3NF.  His name is also attached to the Boyce-Codd Normal Form (BCNF, sometimes called 3.5NF).

5.    RDBMS – Relational Database Management System.  MS SQL Server is a prime example of this, but so are Oracle, MySQL, SQLite and others.  Relational databases are based, although sometimes loosely and with compromises, on the relational model first described by Codd.  Relational Databases can be contrasted with OODBMS[2] like ZODB and Key-Value Stores like BigTable.

6.    ACID – Atomicity, Consistency, Isolation, Durability.  ACID describes the properties of a reliable database transaction.  To perhaps oversimplify, if a database is ACID compliant then either an entire transaction completes and is committed or none of it is (a minimal sketch follows the link below).  In SQL Server, the database engine provides mechanisms to ensure that transactions are ACID compliant.[3]  ACID is often contrasted with the BASE[4] consistency model, which describes some types of NoSQL databases.

More information at:
DBAs on ACID by Tony Davis
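As a minimal sketch of atomicity, using a hypothetical Accounts table, both updates below are committed together; if an error forces a rollback before the commit, neither of them persists:

begin transaction

update dbo.Accounts set balance = balance - 100 where accountId = 1
update dbo.Accounts set balance = balance + 100 where accountId = 2

commit transaction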

7.    PowerShell – PowerShell is a command line shell and scripting tool.  It permits access to the .NET framework including SMO, which permits relatively easy control of SQL Server.  There are many online resources for PowerShell, including:

Introduction to Windows PowerShell for the SQL Server DBA by Edwin Sarmiento

Exceptional PowerShell DBA Pt1 – by Laerte Junior


8.    CRUD – Create, Read, Update, and Delete.  They are four of the basic functions of any persistent storage system, including databases.  This comes up frequently in the context of describing an application that does CRUD work on a database, especially if that application does nothing but provide a convenient front end or GUI for users to perform CRUD operations on the database.
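Mapped onto SQL against a hypothetical Widgets table, the four operations are simply:

insert into dbo.Widgets (name) values ('gear') --Create
select name from dbo.Widgets --Read
update dbo.Widgets set name = 'cog' where name = 'gear' --Update
delete from dbo.Widgets where name = 'cog' --Delete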

9.    GUI – Graphical User Interface.  This of course is a term that is used widely in tech circles, and programming circles in particular.  Most users, especially the technically sophisticated ones will already be familiar with this term, but not all.

10.  ETL – Extract, Transform, and Load.  When discussing databases, this is generally the process of preparing data from a source and then loading it into the database or data warehouse.  I have occasionally seen it used in reference to pulling information out of a database and presenting it in a more user-friendly format, but that seems to be a less common use.

Talking about SQL Server in particular, SSIS is one of the main ETL tools used.  But practically any programming language can be used for the purpose.  I tend to make extensive use of Python for ETL tasks that involve substantial preprocessing of the data.


[1] Index Allocation Map/Page Free Space.  I only really learned this one myself when I was investigating some surprising results on a performance test.

[2] Object Oriented Database Management System

[3] There are some subtleties here though.  Handling Constraint Violations and Errors in SQL Server by Phil Factor has details on some of the factors that affect rolling back transactions.

[4] Basically Available, Soft State, Eventual Consistency.