Use INFORMATION_SCHEMA to find data about your database

Have you ever been in this situation?  You’re the new developer for a database powered application and you have to figure out what tables contain a specific column?  Or maybe you’ve got a legacy database that needs it’s stored procedures and user defined functions cleaned up?  Or perhaps you’ve been assigned a unique database inventory task that requires you to report on the structure of your database, rather than the contents of your database.  Besides cracking open your favorite SQL editor and clicking through your database by hand, you might find the data you need from your database’s INFORMATION_SCHEMA System View.

All of the RDMS systems I’ve worked with have built in System Views ( or tables ) that contain data about the structure and contents of your database.  These views are there to help you with non data tasks.  For the sake of this post we’re only going mention the INFORMATION_SCHEMA, which is the system view I use most in SQL Server, and MySQL.  For your reference, here are links to INFORMATION_SCHEMA docs for four popular RDMS systems.

  1. SQL Server – http://msdn.microsoft.com/en-us/library/ms186778.aspx
  2. MySQL – http://dev.mysql.com/doc/refman/5.5/en/information-schema.html
  3. Oracle – http://download.oracle.com/docs/cd/E17952_01/refman-5.0-en/information-schema.html
  4. DB2 – http://www.tar.hu/sqlbible/sqlbible0100.html

If you work with SQL Server, you’ll be happy to know you can see all of the System Views in Management Studio by default.

SQL Server Management Studio shows all System Views

System Views in the AdventureWorks database

If you work with MySQL, you can see the INFORMATION_SCHEMA database in MySQL Workbench, but you’ll have to enable it in Preferences first.

Edit > Preferences > SQL Editor > Check 'Show Metadata Schemata'

Now for the MySQL users, turning on the Metadata Schemata is not required in order to query those system tables.  Assuming your user has appropriate permissions, you can always fire up Workbench and fire a query like this.

[sql]
SELECT *
FROM INFORMATION_SCHEMA.Columns
WHERE Column_Name LIKE ‘%user%’
[/sql]

MySQL has INFORMATION_SCHEMA tables too!

Now that you know about INFORMATION_SCHEMA, you should be happy to know this is part of the SQL-92 standard. Which means whatever sql scripts you write in MySQL, will most likely work with SQL Server as well. So go ahead and start querying your database, you’ll probably find uses for the INFORMATION_SCHEMA in your daily life really quickly.

I’m going to assume you’ve got the general idea here, so here are a few sql scripts that I’ve used over the years.  The first three are informational queries, and the last three are stored procedures that generate .NET or Coldfusion code based off of a table name.

Find all tables with the column EmployeeID

[sql]
SELECT *
FROM INFORMATION_SCHEMA.COLUMNS
WHERE COLUMN_NAME LIKE ‘EmployeeID’
[/sql]

Which tables have an EmployeeID column?

How many tables have the word employee in the name

[sql]
SELECT *
FROM INFORMATION_SCHEMA.TABLES
WHERE LOWER(TABLE_NAME) LIKE ‘%employee%’
[/sql]

How many %employee% tables are in the DB?

How many SPROCs and UDFs does our database contain?

[sql]
SELECT s.SPROCs, f.UDFs
FROM
(
SELECT COUNT(ROUTINE_NAME) AS ‘SPROCs’, NULL AS ‘UDFs’
FROM INFORMATION_SCHEMA.ROUTINES
WHERE ROUTINE_TYPE = ‘PROCEDURE’
) AS s,
(
SELECT NULL AS ‘SPROCs’, COUNT(ROUTINE_NAME) AS ‘UDFs’
FROM INFORMATION_SCHEMA.ROUTINES
WHERE ROUTINE_TYPE = ‘FUNCTION’
) AS f
[/sql]

How many stored procedures and user defined functions are in the DB?

The stored procedures listed below all accept an incoming table name, then reads data from INFORMATION_SCHEMA to generate code for the specified database table.  Instead of listing the raw SQL, I’m just showing a sample result and allowing you to download the raw SQL sprocs.

Generate C# ADO.NET to VO Stored Procedure

Build custom C# ADO.NET to custom VO fillerup code

download stored procedure


Generate .NET VO class Stored Procedure

Generate C# VO class for DB table name

download stored procedure

Generate Coldfusion CFFunction with CFQUERY Stored Procedure

Generate Coldfusion CFFunction with CFQUERY for specified table name

download stored procedure

I use these System Views almost daily, and there are a lot of scenarios where my job would take a lot longer to do if I didn’t know about this information.  The INFORMATION_SCHEMA also feeds my passion to build code that makes code, just love it.  Hopefully this nickel tour was enough information to help somebody out.

Today I took 30 minutes to save hours

UPDATE – 1/6/2011 : Since writing this post I have bundled this entire reset process into a single windows batch file. In addition to restoring my database, I also have to remove temp files from my server, so it was time to go command line with all of this.

Here the contents of my batch file.

### DELETE TEMP FILES
del /q /f /s Z:_FavoriteClient_GIT_webappinvoices* /f /s

### RESTORE DATABASE
sqlcmd -S "localhost" -i Z:_FavoriteClient_GIT_sqlrestoreBackup.sql

To use this in a .bat file, copy and paste the text into Notepad, then save as “restore.bat”.  Remember, you will need to wrap your filename.bat with quotes, otherwise Notepad will save the file as filename.bat.txt.

* The contents of restoreBackup.sql is listed below.


One of my favorite pastimes as a programmer is writing code that writes code, aka workflow automation.  It’s not something I do on every project, but I’ve been doing a lot of SQL Server development recently, and I’ve been having a lot of fun using the INFORMATION_SCHEMA views to build VO classes, forms representing tables, or just finding what tables have a specific column.  Today I took a 30 minute trip to the SYS side and wrote a tsql script that will save me hours.

The web application I’m working on is a payment processor made up of five steps.  In order to test I have to setup multiple sales for multiple clients, then log in as three different admin users to push the transactions through the system.  This gets my test transactions to the proper testable state.  The process of setting up testable transactions takes over 30 clicks, and once I hit step 3 of the wizard, my test transactions are completed in a way that I have to re-setup the test data ( 20 GOTO 10 ).

I thought about a handful of options, and ended up going with this solution.

  1. Setup all test transactions in web application by hand ( get the data ready )
  2. Take a full database backup ( freeze the data )
  3. Use TSQL script to drop all connections to my app’s db, then restore the database to the “testable” state from step 1 ( reset the DB )

Here is my TSQL script

[sql]
DECLARE @sessID int,
@dbName varchar(50),
@userName varchar(50),
@backupFile varchar(200)

SET @dbName = ‘DA413’ — your database name
SET @userName = ‘DA413’ — sql user account to look for
SET @backupFile = ‘D:DBBackupDA413.bak’ — path to SQL backup file

— use a cursor to store all session_ids
DECLARE session_cursor CURSOR
FOR
SELECT session_id
FROM sys.dm_exec_sessions
WHERE original_login_name = @userName

— open cursor and grab first row
OPEN session_cursor
FETCH NEXT FROM session_cursor INTO @sessID

— loop through session_ids
WHILE @@FETCH_STATUS = 0
BEGIN

— kill it
— using EXEC because the sproc kill does not like @variables
EXEC(‘kill ‘ + @sessID)

— get the next session_id
FETCH NEXT FROM session_cursor INTO @sessID
END

— cursor cleanup
CLOSE session_cursor
DEALLOCATE session_cursor

— restore backup
USE master
RESTORE DATABASE DA413
FROM DISK = @backupFile
GO
[/sql]

A few notes about this script :

  • I only need to disconnect my web application’s database user, not any user
  • I’m using the latest full backup, and not a specific database snapshot
  • The SPROC kill doesn’t like @variables as input, so use EXEC
  • Your web server doesn’t know the user was disconnected, so you’ll have to log back into your application.
  • If you use this technique, be sure to RERUN your backup if you add anything to the database ( EX : new table column, stored procedure, etc )
  • If you’re not the only person connected to this database, make sure you don’t disconnect anybody else using the same database name

This solution is perfect for me because I have full control over my code, database, and server.  It’s also great because I can test my application, run a single sql script, and 10 seconds later I can test my application again.  While this solution is perfect for me, it’s probably best used as reference for others.  However, this technique of rolling back the database could be applied to any software application using SQL Server for it’s datasource.

Hope this helps somebody.

Selecting random ids using TOP and a CTE

While testing visualizations in a Flex application, I needed to do some underlying data cleanup in SQL Server.  One of my tasks was to manually update an entity table and set the status column to one of three possibilities.  Status group A and B both needed to be roughly 20% of my tables total record count, and status group C would be the remaining rows that weren’t touched by status A or status B.  Oh and there’s one more thing, the ids in each status group can not be in sequential order, they have to be random.

At first I thought no sweat.  My dataset is still small ( only 2000 rows ), so if we want uber control I could do the math and generate my id lists by hand.  Yes, hand crafting is possible and under a deadline that kind of logic almost makes sense.  However, I already know the table I’m working with will grow in the future, and I’ll probably have to do this data update again, so why not do this right?  While playing around with different select statements I had a “EUREKA!” moment.  SQL Server’s TOP operator supports PERCENT, not just number.  I couldn’t believe it.  I use TOP at least once a week and I always forget about TOP PERCENT.  Since I already know how to select random rows via CTE, it was time to put it all together.

Before giving you the final SQL, here are the important parts to be familiar with.  Also, for the sake of example I’m using the AdventureWorks database so you can play along at home.

TOP PERCENT

If you just need 50% of the rows in a table, but you’re not concerned about the sequence returned, you can fire this query.  This will give you a sequential listing of ProductIDs

[sql]
SELECT TOP 50 PERCENT ProductID
FROM Production.Product
ORDER BY ProductID
[/sql]

Which will look something like this.

SQL's TOP operator returns rows sequentially

COMMON TABLE EXPRESSION

Now let’s say you want to randomly pull all rows from a table.  This can be achieved using this CTE.

[sql]
WITH data( ProductID ) AS (
SELECT ProductID
FROM Production.Product
)
SELECT ProductID
FROM data
ORDER BY NEWID()
[/sql]

Which will look like this

Common Table Expressions in SQLSERVER are super helpful

If you’re looking to randomly select values from a pre-determined list, see my CTE sample here.

So now that you’ve seen TOP PERCENT and CTE in action, it’s time to put these together and solve my initial task of creating randomly selected groups of ids, of a percent size.

RANDOMLY SELECT TOP PERCENT

Putting it all together, here is the query I used to create my first status group.

[sql]
WITH data( ProductID ) AS (
SELECT ProductID
FROM Production.Product
)
SELECT TOP 20 PERCENT ProductID
FROM data
ORDER BY NEWID()
[/sql]

Which gives me a dataset that is 20% of all rows in Production.Product, and the ids are in random order.

And there you have it. Randomly selecting a percent sized data set from a table in SQL Server. The SQL here is really pretty simple, but for some reason I always forget TOP PERCENT. I’m hoping this post will help me remember TOP PERCENT, and maybe even help somebody else with some TSQL.

What if you want to PIVOT against a text column?

If you’ve ever worked with or researched SQL Server’s PIVOT function, you probably noticed most of the samples pivot against an id column.  Typically an int column like EmployeeID, or StoreID.  That’s fine and dandy, but what happens when you want to PIVOT against a varchar column?  If you’ve been in this need you know this is a bit of a task.

I had this need on an app recently and built a little dynamic sql action that does just this.  The example below however, uses the the DatabaseLog table in the AdventureWorks sample database to return a count of Events logged for each Schema.  Before jumping into the PIVOT, here’s a simple query that gives you the same information, all Schemas, Events, and Event counts.

SELECT      [Schema], [Event], COUNT( [Event] ) AS 'event_count'
FROM        DatabaseLog
GROUP BY    [Schema], [Event]
ORDER BY    [Schema]

Running this query should give you a long result looking something like this.

Data is there, format isn't nice like PIVOT

While this query returns the same information to you, I don’t like this format as much as using PIVOT.  This query result is long and requires a bit of manipulation to get into a readable format.

Now let’s have a look at retrieving the same information using the PIVOT function.

/*
Example of a dynamic PIVOT against a varchar column from the Adventureworks database

References :
PIVOT & UNPIVOT function
http://msdn.microsoft.com/en-us/library/ms177410.aspx

AdventureWorks sample Databases
http://msdn.microsoft.com/en-us/library/ms124501(v=SQL.100).aspx

AdventreWorks.DatabaseLog
http://msdn.microsoft.com/en-us/library/ms124872.aspx
*/

USE AdventureWorks

-- populate temp Event table
SELECT DISTINCT [Event] as 'Event'
INTO	#events
FROM	DatabaseLog

-- this var will hold a comma delimited list of [Event]
DECLARE	@eventList nvarchar(max)

-- create a flattened [Event], list for the PIVOT statement
SELECT	@eventList = COALESCE( @eventList + ', ', '') + CAST( QUOTENAME( [Event] ) AS VARCHAR(1000) )
FROM	#events
ORDER BY [Event]

-- drop table var since our data now lives in @eventList
DROP TABLE #events

-- this var will hold the dynamic PIVOT sql
DECLARE @pvt_sql nvarchar(max)

-- NOTE : we're using dynamic sql here because PIVOT
-- does not support sub SELECT in the 'FOR Event IN ( )'
-- part of the query.
-- If we don't use dynamic SQL here, the PIVOT function
-- requires you to hard code each 'Event'
-- Using SELECT * here so the [Event] columns are auto included
SET @pvt_sql = 'SELECT	*
                FROM
                (
                    SELECT	[Event], [Schema]
                    FROM	DatabaseLog
                ) AS data
                PIVOT
                (
                    COUNT( Event )
                    FOR Event IN
                    ( ' + @eventList + ' )
                ) AS pvt'

-- run the query
EXEC sp_executesql @pvt_sql

Assuming you have the AdventureWorks database installed on your server, running this sql should give you a result looking something like this.

Dynamic PIVOT on text column Event

Show all Schemas and count of each Event type

This query result was truncated to fit in this post, but just know the query above creates a column for every Event in the Databaselog table.

A quick explanation of what’s happening in this sql

  1. First you fill a table variable ( #events ) with all Events from DatabaseLog
  2. Next create a comma delimited list of the Events inside of the table variable
  3. Drop the table variable now that we’ve got our delimited list of Events
  4. Build the PIVOT statement as a string so you can inject the Events list
  5. Fire the dynamic SQL via EXEC

Dynamic SQL is something that comes in handy from time to time, but I do my best to only use it if I absolutely have to.  In this case we’re using it because the PIVOT function does not allow sub SELECT statements.  This is also why we create a specially formatted delimited list of Events prior to building the dynamic sql.

So there you have it, one example of using PIVOT against a varchar column instead of an integer column.  Also, this is a pretty good example of a dynamic PIVOT since it’s pretty simple.  I hope this makes sense, and if you have any suggestions of better techniques, I’d love to hear it.

Incorrect syntax near the keyword ‘table’ in TSQL

Ran into something little that I know I’m going to forget if I don’t write down. It appears that when using a TABLE variable in tsql ( SQL Server 2005 ), you must DECLARE that variable on it’s own line, as opposed to inline with your other @variables.

Typically in my sprocs or sql scripts I do my best to have a main DECLARE block and seperate my @variables with a comma like this.

Typically I DECLARE=

If you're using a TABLE variable, put it on it's own DECLARE line

After some mucking around, it turns out moving the TABLE @variable to it’s own DECLARE line fixes this issue.

DECLARE TABLE @variables on their own line

DECLARE TABLE @variables on their own line

I haven’t found this info in SQL BOL, so I hope this helps somebody else.

What happens in EXEC, stays in EXEC. Lifespan of a MSSQL table variable

One of my all time favorite features of MSSQL 2005+ is being able to create table variables on the fly from SELECT statements. This isn’t a lesson in what table variables are, but here is an easy sample in case this is a new concept.

Running this query

SELECT * INTO #myTableVar FROM YourTable

Gives you a new table variable named myTableVar. Table variables are scoped to the active connection, so running this will work.

// make table var
SELECT * INTO #myTableVar FROM YourTable
// show me the data
SELECT * FROM #myTableVar
// you can drop it if you wish
DROP TABLE #myTableVar

However, let’s say you have an aspx page or a sproc that runs this query.

SELECT * INTO #myTableVar FROM YourTable

You can not access myTableVar in a separate connection to the database because as soon as the first query’s connection closes, myTableVar gets dropped.   Here are a few other scenarios that also demonstrate the scoping of a table variable.

-- FAILS
EXEC ('SELECT * INTO #tmp FROM MyTable;');
-- #tmp does not exist
SELECT * FROM #tmp
#tmp only exists inside of EXEC

Table variable #tmp lives inside of EXEC

Here we see that the table variable #tmp only lives for the life of the statement inside of EXEC. The second SELECT * calls is outside of the EXEC statement.

-- #tmp2 works inside of EXEC statement
EXEC ('SELECT * INTO #tmp2 FROM MyTable; SELECT * FROM #tmp2');
table variables in EXEC live in EXEC

What happens in EXEC, stays in EXEC

Here #tmp2 works because it’s being used inside of the EXEC statement. This is worth knowing if you work with dynamic sql statements and exec.

-- works!
SELECT * INTO #tmp FROM MyTable;
-- #tmp exists
SELECT * FROM #tmp
typical sample of using mssql table variable

typical sample of using mssql table variable

This is a typical example that you may use inside a sproc, trigger, script, etc. Both sql calls live in the same space, so #tmp exists.

Select random value from a range of values

Earlier I blogged about creating random numbers using tsql functions.  Here are two techniques for selecting a random value from a pre-defined range of values in a tsql script.  The first technique uses a table variable ( MSSQL 2000 + ), and the second uses a Common Table Expression or CTE ( MSSQL 2005+ ).

Select a random value using a table variable


-- var to hold random integer
declare @field_val int

-- create table var to hold value range [ 0, 512, 1024, 2048, 4096 ]
-- inserting the first value sets the structure for the table variable
SELECT 0 AS 'num'
INTO #temp

-- insert data into table var
INSERT INTO #temp VALUES ( 512 )
INSERT INTO #temp VALUES ( 1024 )
INSERT INTO #temp VALUES ( 2048 )
INSERT INTO #temp VALUES ( 4096 )

-- assign random value
SELECT TOP 1 @field_val = num FROM #temp ORDER BY NEWID()

-- show value
SELECT @field_val

-- drop the table variable
DROP TABLE #temp

Select a random value using a CTE

-- define our data table
WITH data( car )
AS
(
	-- UNION together our range of values
	SELECT 'audi' AS 'car'
	UNION
	SELECT 'bmw' AS 'car'
	UNION
	SELECT 'infinity' AS 'car'
	UNION
	SELECT 'lexus' AS 'car'
	UNION
	SELECT 'porsche' AS 'car'
)
-- select a random value
SELECT TOP 1 car FROM data
ORDER BY NEWID()

Both of these techniques can be used with numbers or text. Just be sure to mind your quotes, and variable datatypes.  Being able to pick a random value in data generation scripts has proven very useful.  I hope this helps somebody else out as well.

Generate random integers using tsql UDFs

Ever need to generate random numbers from the integer family?  I had this need on a project so I whipped up these four tsql User Defined Functions to help with this task.  There are four functions in all, one for tinyint, smallint, int, and bigint.  Additionally, you will need to create one VIEW since you can not fire the tsql function RAND() inside of a udf.

With these functions, you can generate random integers in their native range.

[sql] SELECT dbo.getRandomInt( NULL, NULL ) [/sql]

Or you can restrict your random integers to a range of your liking.

[sql] SELECT dbo.getRandomInt( 1000, 1000000000) [/sql]

Just as a reminder, here are the native ranges for these four integer types as supported by MS SQL Server 2005

bigint -2^63 (-9,223,372,036,854,775,808) to 2^63-1 (9,223,372,036,854,775,807)
int -2^31 (-2,147,483,648) to 2^31-1 (2,147,483,647)
smallint -2^15 (-32,768) to 2^15-1 (32,767)
tinyint 0 to 255

Each of these functions have the same structure and primarily differ only by the integer type’s native range.  Here is the guts of one of the UDFs in case you want just the facts.

[sql]/******************************************************************************
Generate a random int
——————————————————————————-
USAGE :
— Get random int in the default range -2,147,483,648 to 2,147,483,647
SELECT dbo.getRandomInt( NULL, NULL )

— Get random tinyint within a specific range
SELECT dbo.getRandomInt( 1000, 30000 )

REQUIREMENT : Since you can’t call RAND() inside of a UDF,
this function is dependant on the following VIEW vRand :

— BEGIN VIEW
— This is only a helper VIEW since currently you can not use RAND() in a UDF
— DROP VIEW vRand
CREATE VIEW [dbo].[vRand]
AS
SELECT RAND() AS ‘number’
— END VIEW

******************************************************************************/

USE SmartEarth
GO

IF OBJECT_ID (N’getRandomInt’) IS NOT NULL
DROP FUNCTION getRandomInt
GO

CREATE FUNCTION getRandomInt( @min_in int, @max_in int )
RETURNS int
WITH EXECUTE AS CALLER
AS
BEGIN
——————————————————————————-
DECLARE @max int,
@min int,
@rand NUMERIC( 18,10 ),
@max_big NUMERIC( 38, 0 ),
@rand_num NUMERIC( 38, 0 ),
@out int;

— define this datatype’s natural range
SET @min = -2147483648    — -2,147,483,648
SET @max = 2147483647    — 2,147,483,647

— Check to see if a range has been passed in.
— Otherwise, set to default tinyint range
IF( @min_in is not null AND @min_in > @min )
SET @min = @min_in

IF( @max_in is not null AND @max_in < @max )
SET @max = @max_in
— end range check

— get RAND() from VIEW since we can’t use it in UDF
SELECT @rand = number FROM vRand

— CAST @max so the number generation doesn’t overflow
SET @max_big = CAST( @max AS NUMERIC(38,0) )

— make the number
SELECT @rand_num = ( (@max_big + 1) – @min ) * @rand + @min;

— validate rand
IF( @rand_num > @max )
— too big
SET @out = @max
ELSE IF ( @rand_num < @min )
— too small
SET @out = @min
ELSE
— just right, CAST it
SET @out = CAST( @rand_num AS int )

— debug
— SELECT @min_in AS ‘min_in’, @max_in AS ‘max_in’, @min AS ‘min’, @max AS ‘max’, @rand, @rand_num AS ‘rand_num’, @out AS ‘out’

— return appropriate
RETURN @out;

——————————————————————————-

END;
GO[/sql]

So where do you get the code?

You can view all functions and view online at the following gist.github urls:

Or you can just download all the source code in one zip file here.

Hopefully this will help somebody out.  If you’re a DBA or just a tsql wizard, let me know what you think.  Can I do these functions a better way?  Is this already built into SQL2005 and I just didn’t know it?  All of this tsql was written against SQL Server 2005, but I’m pretty sure it would work on SQL2000 and SQL2008 as well.

I built a calendar in a tSQL SPROC

Here’s a blast from the past that I recently found in my archives. It’s a novelty stored procedure I wrote during my MS SQL 2000 DBA days.
Back when I wrote this sproc, I was really big into writing calendar applications. I have written some sort of calendar application in
almost every language I know, so writing one in tSql made sense to me.

While I never used this sproc in an application, or had any practical use for it, I still think it’s cool. It’s primarily an excercise using
tSql’s date functions, and my all time favorite feature of MS SQL 2000+, table variables.

If you use MSSQL 2000 or higher and don’t use table variables, I highly recommend looking into these. In a nutshell, it’s a type of tSql variable
that is a table. You can select, insert, update, and delete the rows in this variable just like it’s a real table. The lifespan of a table variable
is the length of your connection to your db. So if you have a table var #myTable in sprocA, as soon as sprocA completes execution, #myTable is gone.
SprocB can’t access #myTable unless is specifically creates a new table var by this name.

So I wrote an article about this sproc for a database site years ago and I haven’t been able to find it again. This was web1.0 days, so I’m sure the site
is gone by now. The good thing is I still have the sproc, and now you can to.

So here’s the info. The sproc efCalendar accepts a month number and year number, and spits out two recordsets.

  1. Month, Year
  2. Calendar view of that month

Recordset 1 is two columns, month name, and year.

Recordset 2 is a calendar view of the specified month. There is a column for each weekday, starting with Sunday and ending with Saturday.
Then there is a row for each week in the specified month.

Here is what the results look like when run in Query Analyzer.

tSQL calendar sproc

tSQL calendar sproc

Download the efCalendar sproc here.

Create comma seperated list out of a sql query ( tsql )

Surfing the net for a sql answer, I came across something really cool. Below is an example of how to create a comma separated list of values in a single query.

–declare holder var
DECLARE @list VARCHAR(8000)

–build comma separated list
SELECT @list = COALESCE(@list + ‘, ‘, ”) + CAST(track_id AS VARCHAR(5) )
FROM webtool_tracks

–show results
SELECT @list AS ‘list’

The results should look something like this