Maximum Functionality with SQL 2005 Large-Value Data Type and Large Row Support
By Kathi Kellenberger
Introduction
SQL 2005 delivers many new features for DBAs and developers alike. Database Administrators have many new options for security, availability, recoverability and much more. Probably the biggest news on the developer side is the CLR integration. Those shops choosing to use this functionality will be able to write user defined functions, stored procedures, user defined types and triggers with a .Net language. Even if you choose to not enable the CLR, there are still a multitude of enhancements to the T-SQL language that make upgrading to SQL Server 2005 worth seriously considering. Two of the enhancements make development easier when working with wide variable length columns and rows: Large Value Data Types and Large Row Support. To make this discussion simpler, unless stated otherwise VARCHAR will mean VARCHAR and NVARCHAR. TEXT will mean TEXT and NTEXT throughout the remainder of the article.
TEXT versus VARCHAR in SQL Server 2000
VARCHAR and NVARCHAR columns and variables can be defined to hold string data of variable length up to 8000 and 4000 characters respectively, or 8000 bytes each. Most of the time VARCHAR columns will be much smaller. Depending on the application, you may be required to create tables with some wide columns to hold paragraph type data, and it’s possible that 8000 characters may not always be large enough. One option to get around this limitation is to use TEXT columns instead of VARCHAR columns. At first glance, this seems like a great idea. TEXT columns can hold up to 2 GB of data, and SQL Server stores the TEXT data in a separate collection of pages with just a pointer in the data row. There is an option called “text in row” that can be set at the table level to store small TEXT values in the data row. The “text in row” option is turned off by default. Unfortunately, the TEXT data type is just not as flexible as the VARCHAR data type. In the following code example, I used GO to separate each command into a batch so you will see all of the error messages when you run this in SQL Server 2000 Query Analyzer.
CREATE TABLE table1 (col1 TEXT)
GO
INSERT INTO table1 (col1) VALUES (‘a’)
GO
SELECT * FROM table1 WHERE col1 = ‘a’
GO
SELECT * FROM table1 ORDER BY col1
GO
UPDATE table1 SET col1 = col1 + ‘b’
GO
SELECT COUNT(col1) FROM table1
GO
SELECT LEFT(col1,1) FROM table1
GO
DROP TABLE table1
GO
(1 row(s) affected) Server: Msg 306, Level 16, State 1, Line 1 The text, ntext, and image data types cannot be compared or sorted, except when using IS NULL or LIKE operator. Server: Msg 420, Level 16, State 1, Line 1 The text, ntext, and image data types cannot be used in an ORDER BY clause. Server: Msg 403, Level 16, State 1, Line 1 Invalid operator for data type. Operator equals add, type equals text. Server: Msg 279, Level 16, State 2, Line 1 The text, ntext, and image data types are invalid in this subquery or aggregate expression. Server: Msg 8116, Level 16, State 2, Line 1 Argument data type text is invalid for argument 1 of left function. Server: Msg 306, Level 16, State 1, Line 1 A TEXT column can’t be used with the equals (=) operator. It can’t be used in a where clause unless the LIKE operator or IS NULL is used or when used with Full Text Indexing. It can’t be used in an ORDER BY clause, and you can’t use it in an aggregate query. Many T-SQL string functions, like LEFT and REPLACE, will not work with TEXT columns. It is possible to cast the TEXT columns to VARCHAR(8000), but you will lose any characters past the 8000 that can be converted. It is also impossible to declare local variables of type TEXT in stored procedures, UDFs and triggers. While it’s tempting to view TEXT columns as a replacement for large VARCHAR columns, you lose a great deal of functionality.
The new MAX specifier for Large Value Data Types
SQL Server 2005 introduces the new MAX specifier for VARCHAR, NVARCHAR and VARBINARY columns and variables. The VARBINARY(MAX) data type can be used in place of the IMAGE data type and won’t be covered in this article. A VARCHAR column created using MAX can hold up to 2^31 characters. So, instead of using a TEXT column when the data could possibly exceed 8000 characters, you can define the column using MAX. To use the MAX specifier in your table definitions and variable declarations, just type MAX in place of the column size. If you create the table in SQL Management Studio, you’ll see VARCHAR(MAX) listed as a separate data type. The following SQL Server 2005 code demonstrates how to use the MAX specifier. You now have the functionality available with the traditional VARCHAR column and with the size only previously possible with a TEXT column:
CREATE TABLE table2 (col1 VARCHAR(MAX))
GO
— A row with 8000 characters in col1
INSERT INTO table2 (col1) SELECT
REPLICATE(‘a’,8000)
GO
— The next 5 lines would fail if a TEXT column was used
SELECT * FROM table2 WHERE col1 = ‘a’
SELECT * FROM table2 ORDER BY col1
UPDATE table2 SET col1 = col1 + ‘bbbb’
SELECT COUNT(col1) FROM table2
— Returns 8004
SELECT LEN(col1) FROM table2
GO
DROP TABLE table2
This code will run without error in SQL Server 2005, assuming you have permission to create a table in your test database. After appending the literal ‘bbbb’ to col1, it contains 8004 characters. The MAX specifier gives you the best of both worlds: the ability to store a large amount of data without the loss of functionality. You can create local variables using VARCHAR(MAX) in UDFs and stored procedures and manipulate the variables just as you would with traditional VARCHARs. The TEXT data type is still available for now, but SQL Books Online recommends that you begin using the VARCHAR(MAX) going forward. There is an issue to watch out for when working with VARCHAR(MAX). Thanks to Simon Sabin, who discovered the problem and created a blog entry about it. The code example below demonstrates how VARCHAR(MAX) doesn’t always behave as you would expect:
DECLARE @myVariable VARCHAR(MAX)
SET @myVariable = REPLICATE(‘a’,8000) +
‘bcdefghi’
SELECT LEN(@myVariable)
——
8000
The literal values are interpreted as VARCHAR data types and so is the value returned by the REPLICATE function. Adding two VARCHAR(8000) variables produces a VARCHAR(8000) value. Since the first string fills up the variable, the second string can’t be appended. The final length of @myVariable is only 8000 characters. The solution is to cast one or both of the strings to VARCHAR(MAX) to guarantee the proper result as demonstrated below:
DECLARE @myVariable VARCHAR(MAX)
SET @myVariable = REPLICATE(CAST(‘a’ AS
VARCHAR(MAX)),8000) + ‘bcdefghi’
SELECT LEN(@myVariable)
——
8008
Large Row Support
A table row in SQL Server can only contain up to 8060 bytes, including overhead. SQL Server 2000 will allow you to create a table with the potential for more than this limit. In other words, once you add up the fixed length columns and the variable length columns the total hypothetical row could exceed 8060 bytes. SQL Server 2000 will not allow any updates or inserts that would create rows that surpass the maximum row size, however. The following SQL Server 2000 code contains two create table statements. The first statement will fail because it has a row size of 16021 bytes. Any insert into table3, if the table could be created, would exceed the limit. The character data type will take up all of the space allocated, even for NULL values. The second statement succeeds, but returns a warning about the size of the row. — SQL Server 2000 script CREATE TABLE table3 (col1 CHAR(8000), col2 CHAR(8000)) GO CREATE TABLE table4 (col1 VARCHAR(8000), col2 VARCHAR(8000)) Server: Msg 1701, Level 16, State 2, Line 1 Creation of table ‘table3’ failed because the row size would be 16021, including internal overhead. This exceeds the maximum allowable table row size, 8060. Warning: The table ‘table4’ has been created but its maximum row size (16025) exceeds the maximum number of bytes per row (8060). INSERT or UPDATE of a row in this table will fail if the resulting row length exceeds 8060 bytes. If you attempt to insert a row with 8000 characters in each column of table4, the insert fails. INSERT INTO table4 (col1, col2) SELECT REPLICATE(‘a’,8000), REPLICATE(‘b’,8000) GO DROP TABLE table4 Server: Msg 511, Level 16, State 1, Line 1 Cannot create a row of size 16013 which is greater than the allowable maximum of 8060. The statement has been terminated. With Large Row Support, another new feature of SQL Server 2005, you can create a table with varying length columns with a total potential row size that exceeds the 8060 byte limit without a warning as long as the non-varying length columns do not exceed the limit. If a row that exceeds 8060 bytes is inserted or updated, SQL will move the largest overflow column or columns to another page (the ROW_OVERFLOW_DATA allocation unit) so that the row fits the page. The columns where the data has been moved out of the data row store pointers to the overflow data. The statements from the previous code, creating and inserting a row into table4, will run without warnings or errors in SQL Server 2005. SQL Server 2005 Books Online cautions that you may experience performance problems if a large percentage of rows flow over to another page, so be careful when using this new feature. The use of VARCHAR(MAX) is recommended over Large Row Support by SQL Books Online.
Conclusion
Large Value Data Types and Large Row Support are just two of the many great enhancements you will find with SQL Server 2005’s implementation of T-SQL. Developers can now use VARCHAR(MAX) to hold as much data as the old TEXT data type without losing any of the rich functionality of the traditional VARCHAR. As new databases are designed, new code is written, or existing code is updated, VARCHAR(MAX) should be used in place of TEXT columns. With Large Row Support, it is possible to use traditional VARCHAR columns and insert rows that exceed the 8060 byte limit. DBAs and developers should be familiar with both new features and how they work in order to make the best decisions for their database design.
Author Bio
Kathi Kellenberger is a DBA at Bryan Cave LLP, an international law firm headquartered in St. Louis, MO. She is a frequent guest columnist at SQLServerCentral.com and is co-author of Professional SQL Server 2005 Integration Services. She is a PASS volunteer, a speaker at the 2005 PASS Summit and co-chair of the St. Louis SQL Server Users Group. She also holds MCSD (VB6) and MCDBA certificates.