Blog Hiatus

I apologize for the lack of posts. I am still quite enthused about doing this blog. But on the 25th of October I lost my Dad. As dads go, he was about the best dad a guy could ask for. He was kind, happy, and an all-around good guy. It’s hard to sum up what a father is to you, especially when he was a great one. I could go on, but I will simply say I miss him and always will, and that I will see him in heaven some day.

As for the blog, I have a lot of ideas cooking: what makes a good database code review, thoughts on computing ubiquity, my love of Expression Blend 3, and more. But my heart hasn’t been in putting my thoughts to the blog. Give me a week and I’ll be back posting.

OnStartups Answers

I’m a huge fan of Stack Overflow. It has quickly become a top source for great tech questions and answers. It has a novel way of keeping the most current answer at the top and a way for the community to discuss a solution. Spolsky and Atwood have definitely made a very valuable resource, and they’re expanding it to other areas. A while ago they said they would be turning this into a product, which got me wondering what other sites like this we’d see pop up.

Well, another site based on it has popped up, and I’m quite happy to see that it is about entrepreneurship: OnStartups Answers. I can’t wait to see how this does.

Forced Long Tails

I’ve been thinking about the Long Tail concept lately and the effect of this theory on businesses. Last month Chris Anderson, the creator of the Long Tail, commented about Netflix data and the Long Tail. What prompted his post was a paper written at Wharton about the Long Tail effect.

[Image: The Long Tail. Picture by Hay Kranen]

If you’re not familiar with the Long Tail theory, it is the theory that selling small volumes of niche items can bring in significant revenue. This seems particularly true in the digital realm, where the cost to store digital files on servers is considerably cheaper than storing physical objects in a warehouse. The term Long Tail comes from the shape of the graph of inventory against sales (seen left), where the yellow section is the niche items that result in healthy revenues.

The paper’s findings appear to echo a lot of academia’s thoughts on the Long Tail: that it is a good theory, and one that has been around for a while, but that it does not work out as well when you have to account for physical storage and delivery. I see where they are coming from with this, but I’ll leave this up to the academics to debate.

 

Since I read this article, the question that has kept coming to me is, “What are the negative effects of a business seeking to maximize revenue by implementing the Long Tail?” In fact, what got me thinking about this is the very Netflix data that the paper and Chris reference.

Netflix has been adding more and more movies to its collection for quite some time. Every day more obscure movies, TV shows and self-help videos are added, thousands more than the 3,000 videos Blockbuster puts in its stores. According to the Long Tail theory, the abundance of niche films, documentaries, kids’ shows, foreign films, snooty French films, and cult classics will create quite a revenue stream for Netflix. Furthermore, it may even decrease demand for mega-pictures. Thus, Netflix can purchase fewer of the hot new releases, because theoretically their customer base will be viewing the niche films more often. It’s a win-win for Netflix and their customers.

This would appear not to be the case once you take into account Netflix throttling. When first introduced, throttling of your Netflix queue meant Netflix would simply skip the movies at the top of your queue that were currently on everyone’s queue, instead favoring customers who had received fewer films that month. As this process was refined, Netflix finally let you know you were being throttled by telling you movies in your queue had a “Very Long Wait”, “Long Wait”, or a “Short Wait”.

Thus, customers who used the service to the maximum were punished, and customers who paid the same amount but viewed fewer movies each month were rewarded. The business principle here is obvious: the customers who do not view many movies per month cost less to keep, while the customers Netflix deems “heavy users” cost more per month due to shipping costs.

If a heavy user is upset about this, well, Netflix does have over 100,000 movies in its collection, so why not choose one of those? Effectively, Netflix has artificially forced its customer base down the Long Tail, rather than customers naturally gravitating towards it. In other words, Netflix has a “Forced Long Tail”.

In fact, since the addition of throttling, Netflix has made it harder to even find the newest releases. The New Releases page commonly shows movies that have been available on Netflix for months. Right now the New Releases section includes Mall Cop, a movie that was released to DVD in May, five months ago. That is not even the worst offender I see; Madagascar 2 was released exactly nine months ago and is in the New Releases section. This section once truly held new releases, movies that had been out for only the past few weeks, and even told customers what movies were coming soon. This is another case of how Netflix is pushing its customer base down the Long Tail.

Is pushing your customers down the Long Tail in fact a bad practice? Or is it simply a way to expose customers to movies other than the top hits, a band-aid approach to poor movie suggestion technology?

The positive effect of pushing your customers down the Long Tail is, of course, on your bottom line. You can purchase fewer mega-hits, potentially increase revenue from niche titles and possibly add more customers seeking the niche titles. But at the expense of what?

Customer satisfaction.

The negative effect of this practice is obvious: the customers who love the service the most are punished for it. In fact, upgrading to a higher-priced plan with more movies out at a time appears to have no effect on throttling. You pay more, but are still throttled.

The takeaway here is, of course, that there is always a downside to an upside. The Long Tail may in fact increase revenue as customers explore the increased availability of niche products. But at some point businesses may feel that the margins on the big hits are too low and force their customers down the Long Tail, perhaps at the expense of customer satisfaction.

But then, the airline industry has been sacrificing customer satisfaction for years and it’s doing just fine. Right?

Using Table-Valued Parameters in SQL Server 2008 and C#

Until recently I had not had an opportunity to use Table-Valued Parameters, a new feature in SQL Server 2008. I had looked at them briefly, thought they were a nice addition, and then moved on. Finally, though, I found a chance to use them.

The intended use for Table-Valued Parameters is, of course, to send multiple rows of data to SQL Server from the client. Previous ways to do this involved calling a single stored procedure repeatedly, using XML, using SqlBulkCopy, or, my least favorite, creating a stored procedure with many parameters. Each of these methods had drawbacks. Calling a single stored procedure over and over works just fine (and was my preferred method), but you are creating a lot of traffic for something that should be simple. XML was nice too, but depending on the complexity of what you are sending you will need a lot of XQuery in your stored procedure, making something simple much more complex. SqlBulkCopy works great, and I’ve used it before, but sometimes you may want to do more to your data once it is at the database. Thankfully, table-valued parameters solve many of these shortcomings.

You can use table-valued parameters from your application with DataTable, DbDataReader, or IEnumerable<SqlDataRecord> objects. The majority of examples for table-valued parameters use DataTables. The problem I have with DataTables is that they seem like too much of an ad-hoc method, with more overhead than is needed, whereas going the IEnumerable route lets you create and use objects that you would likely already be using. This is the route I prefer, and the one I will demonstrate.

For this example assume we have a database of people and each of the people may have one or more aliases. They must be rather shady people. Here is the table for storing the people:

CREATE TABLE tbl_Person
(
      [ID] INT IDENTITY(1,1) NOT NULL,
      [FirstName] VARCHAR(20) NOT NULL,
      [LastName] VARCHAR(20) NOT NULL
)

It is quite simple, an identity field called ID to give everyone a unique number and a first and last name.

Now we have a table to store their aliases:

CREATE TABLE tbl_Alias
(
      [ID] INT NOT NULL,
      [FirstName] VARCHAR(20) NOT NULL,
      [LastName] VARCHAR(20) NULL
)

Another simple table to keep our shady people’s aliases; let’s pretend for ease of use there are primary keys and foreign keys shall we?

To add to this table we will use a stored procedure, my favorite mechanism to get data to the database. Why? Because you give the DBA something concrete upon which they can optimize your database, among other reasons. But that’s another post.

Our stored procedure will insert our new shady person and their aliases in one call. Thus we will need to create our user-defined table type first.

CREATE TYPE udt_tbl_Alias AS TABLE
(
      [FirstName] VARCHAR(20) NOT NULL,
      [LastName] VARCHAR(20) NULL
)

Again, nothing complicated. You will see why I left off the ID field in a bit.

You should note that you cannot use ALTER commands on user-defined table types. So if you want to update the table type later you will have to drop it and create it again. But if it is being used by a stored procedure, you will have to temporarily comment it out of your stored procedure first, then drop the UDT, create the UDT again, and uncomment the UDT in your stored procedure. So plan ahead! Having planned ahead, here is our stored procedure for inserting:

CREATE PROCEDURE usp_ins_Person
      @FirstName VARCHAR(20),
      @LastName VARCHAR(20),
      @Aliases udt_tbl_Alias READONLY
AS
BEGIN
      SET NOCOUNT ON

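      INSERT INTO tbl_Person (FirstName, LastName) VALUES (@FirstName, @LastName)

      -- SCOPE_IDENTITY() returns the ID generated by the insert above;
      -- unlike @@IDENTITY it is not affected by inserts made inside triggers
      INSERT INTO tbl_Alias (ID, FirstName, LastName)
            SELECT SCOPE_IDENTITY() [ID], FirstName, LastName FROM @Aliases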
END

The stored procedure requires three parameters: a first name, a last name and the user-defined table-valued parameter. The table-valued parameter must be marked as READONLY, and thus you will not be able to update the table within the stored procedure, only select from it. Otherwise, the stored procedure is rather straightforward. The person is added to tbl_Person and the aliases are added to tbl_Alias. You can see why I chose not to make the ID part of the user-defined table type, since we will not know the ID until the person has been inserted.
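Coming back to the earlier warning that a table type cannot be altered, here is a rough sketch of what a change to udt_tbl_Alias could look like once usp_ins_Person exists. It takes a slightly different route than commenting the parameter out: it drops the procedure and recreates it, which is an assumption on my part about what is acceptable in your environment (dropping a procedure also drops any permissions granted on it). The column list is shown unchanged so the rest of the example keeps working; that is where an edit would go.

```sql
-- Sketch: change a table type that a stored procedure depends on.

-- 1. Drop the procedure that references the type;
--    the type cannot be dropped while it is in use.
DROP PROCEDURE usp_ins_Person
GO

-- 2. Drop the type and create it again. Edit the column list here;
--    it is shown unchanged so the rest of the example keeps working.
DROP TYPE udt_tbl_Alias
GO

CREATE TYPE udt_tbl_Alias AS TABLE
(
      [FirstName] VARCHAR(20) NOT NULL,
      [LastName] VARCHAR(20) NULL
)
GO

-- 3. Recreate the procedure against the new type.
CREATE PROCEDURE usp_ins_Person
      @FirstName VARCHAR(20),
      @LastName VARCHAR(20),
      @Aliases udt_tbl_Alias READONLY
AS
BEGIN
      SET NOCOUNT ON

      INSERT INTO tbl_Person (FirstName, LastName) VALUES (@FirstName, @LastName)

      -- SCOPE_IDENTITY() returns the identity value generated in this scope
      INSERT INTO tbl_Alias (ID, FirstName, LastName)
            SELECT SCOPE_IDENTITY(), FirstName, LastName FROM @Aliases
END
GO
```

If more than one procedure uses the type, each of them would need the same drop-and-recreate treatment, which is one more reason to plan ahead.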

To use the stored procedure in SQL Server Management Studio you can do the following.

DECLARE @Aliases udt_tbl_Alias
DECLARE @FirstName VARCHAR(20) = 'Bryan'
DECLARE @LastName VARCHAR(20) = 'Smith'
 
INSERT INTO @Aliases VALUES ('Database', 'Guy'), ('DBA', NULL)
 
EXEC usp_ins_Person @FirstName, @LastName, @Aliases
 
SELECT * FROM tbl_Person
SELECT * FROM tbl_Alias

The output should look like this.

[Screenshots: result sets returned by the two SELECT statements]

Great, we’re halfway there! The database side of things is taken care of, now we will need to take care of our client. Within the client code we will create a Person class. The class will have a property for the first name, last name, and aliases. This is why I like the IEnumerable route; it makes sense to store the aliases for a person with the person, and doing so as a list makes sense.

Person Class:

namespace TestUDTApplication
{
    class Person
    {
        public string FirstName { get; set; }
        public string LastName { get; set; }
        public AliasCollection Aliases { get; set; }
 
        public Person(string firstName, string lastName)
        {
            FirstName = firstName;
            LastName = lastName;
            Aliases = new AliasCollection();
        }
    }
}

Alias Class:

namespace TestUDTApplication
{
    class Alias
    {
        public string FirstName { get; set; }
        public string LastName { get; set; }
 
        public Alias(string firstName, string lastName)
        {
            FirstName = firstName;
            LastName = lastName;
        }
    }
}

AliasCollection Class:

using System.Collections.Generic;
using System.Data;
using Microsoft.SqlServer.Server;
 
namespace TestUDTApplication
{
    class AliasCollection : List<Alias>, IEnumerable<SqlDataRecord>
    {
        IEnumerator<SqlDataRecord> IEnumerable<SqlDataRecord>.GetEnumerator()
        {
            SqlDataRecord ret = new SqlDataRecord(
                new SqlMetaData("FirstName", SqlDbType.VarChar, 20),
                new SqlMetaData("LastName", SqlDbType.VarChar, 20)
                );
 
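            foreach (Alias alias in this)
            {
                ret.SetString(0, alias.FirstName);
                // LastName is nullable in udt_tbl_Alias, so send DBNull instead of a null string
                if (alias.LastName == null)
                    ret.SetDBNull(1);
                else
                    ret.SetString(1, alias.LastName);
                yield return ret;
            }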
        }
    }
}

Let’s look over this code. First off, the Person class has three public properties: FirstName, LastName, and Aliases. The Aliases property is an instance of the AliasCollection class, which inherits from List<Alias> and also implements the IEnumerable<SqlDataRecord> interface. Inheriting from List<Alias> gives us an ordinary list of Alias objects, which is perfect for handling our aliases within the client code.

The implementation of IEnumerable<SqlDataRecord> is what lets us use our list as the input to our user-defined table type. A SqlDataRecord represents a single row of data and its associated metadata, which is what ADO.NET and SQL Server need to map our list of aliases to the table type. Unfortunately, the documentation is not really clear on this part. It leads you to believe that anything that implements IEnumerable should suffice; however, what you must have is IEnumerable<SqlDataRecord>. This does not come out of the box with List<Alias>, so we create our own; otherwise we’ll get an InvalidCastException.

Finally, from our client, using the Enterprise Library Data Access Application Block, we can load people and aliases using a user-defined table type.

Person person = new Person("Bryan", "Smith");
person.Aliases.Add(new Alias("DBA", "Dude"));
person.Aliases.Add(new Alias("Database", "Guy"));
 
SqlDatabase db = (SqlDatabase)DatabaseFactory.CreateDatabase("UDTTest");
DbCommand cmd = db.GetStoredProcCommand("usp_ins_Person");
 
db.AddInParameter(cmd, "@FirstName", DbType.String, person.FirstName);
db.AddInParameter(cmd, "@LastName", DbType.String, person.LastName);
db.AddInParameter(cmd, "@Aliases", SqlDbType.Structured, person.Aliases);
 
db.ExecuteNonQuery(cmd);

Personally, I know I’ll be using this method from now on whenever possible, rather than filling user-defined table types from ad-hoc DataTables or, heaven forbid, calling the same stored procedure repeatedly.

DBCC CHECKFILEGROUP bug in SQL Server 2008

Update 2009-12-09: Cumulative Update 8 for SQL Server 2008 and a subsequent hotfix after Cumulative Update 5 for SQL Server 2008 SP1 have corrected this issue. So if you haven’t been able to run DBCC CHECKFILEGROUP, go grab these!

I noticed Cumulative Update 4 for SQL Server 2008 is out now. In it I found not one but two issues fixed that affect me. Thankfully, both issues have a relatively minor impact on my systems. 

However, the fix that I want to see is not in this Cumulative Update. It’s a fix for DBCC CHECKFILEGROUP not working on file groups that are part of a partitioned table.

DBCC CHECKFILEGROUP runs the normal DBCC CHECKDB checks, but only on the specified file group. This is quite handy when you’re dealing with a very large database (VLDB), say one in the multi-terabyte range, because a CHECKDB command can take an awfully long time to run on huge files. Thus one common practice, championed by Paul Randal, is to run DBCC CHECKFILEGROUP regularly on each file group. This lets you break up a really long process into manageable chunks.
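As a rough illustration of that practice, the following sketch walks the file groups of the current database and checks them one at a time. It is only a sketch: the cursor and variable names are my own, and it assumes you want every row file group (type 'FG' in sys.filegroups) checked in a single pass. On a real VLDB you would more likely spread the file groups across several maintenance windows.

```sql
-- Sketch: run DBCC CHECKFILEGROUP once per file group in the current database.
DECLARE @fg SYSNAME
DECLARE @sql NVARCHAR(400)

DECLARE fg_cursor CURSOR LOCAL FAST_FORWARD FOR
      SELECT name FROM sys.filegroups WHERE type = 'FG'

OPEN fg_cursor
FETCH NEXT FROM fg_cursor INTO @fg

WHILE @@FETCH_STATUS = 0
BEGIN
      -- QUOTENAME with a single-quote delimiter turns the name into a safely quoted string literal
      SET @sql = N'DBCC CHECKFILEGROUP (' + QUOTENAME(@fg, '''') + N')'
      EXEC sp_executesql @sql

      FETCH NEXT FROM fg_cursor INTO @fg
END

CLOSE fg_cursor
DEALLOCATE fg_cursor
```

The informational messages are deliberately left on here, because they are what show whether a file group was actually checked or skipped.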

Of course, what I did not know until recently is that DBCC CHECKFILEGROUP doesn’t work right in SQL Server 2008. In fact, it just skips the specified file group if it is part of a partition scheme. In my case I have a 13TB database that uses partitioning heavily. After converting from 2005 to 2008 I went to run my DBCC checks, as I normally do, and found that they weren’t doing the check. File groups that would normally take 40 minutes returned in seconds, and upon examination of the output the file group was being skipped.

So blissfully thinking that someone else must have seen this I did some Google Kung-Fu … and found nothing. 

Then I did some posting on forums … and found nothing (well, I did find some nice MVPs).

So then I called up Microsoft Premier Support, gave them my repro script and found out that I had found a bug.

Here’s the script in question, so you can try it out. This is for SQL Server 2008 RTM and SP1, Enterprise and Developer editions, and I’ve tried it out on Windows 7, Server 2003 and Server 2008. 

USE [master]
GO

-- -------------------------------------------------------------
-- Create the test database
-- -------------------------------------------------------------
CREATE DATABASE [TestFileGroup] ON  PRIMARY (
      NAME = N'TestFileGroup',
      FILENAME = N'J:\Data_Staging\MSSQL10.STAGING\MSSQL\Data\TestFileGroup.mdf' , -- Update the path
      SIZE = 2048KB , MAXSIZE = UNLIMITED, FILEGROWTH = 1024KB
),
FILEGROUP [FG_1]
(
      NAME = N'FG_1',
      FILENAME = N'J:\Data_Staging\MSSQL10.STAGING\MSSQL\Data\FG_1.ndf' , -- Update the path
      SIZE = 2048KB , MAXSIZE = UNLIMITED, FILEGROWTH = 1024KB
),
FILEGROUP [FG_2]
(
      NAME = N'FG_2',
      FILENAME = N'J:\Data_Staging\MSSQL10.STAGING\MSSQL\Data\FG_2.ndf' , -- Update the path
      SIZE = 2048KB , MAXSIZE = UNLIMITED, FILEGROWTH = 1024KB
)
LOG ON
(
      NAME = N'TestFileGroup_log',
      FILENAME = N'J:\Logs_Staging\MSSQL10.STAGING\MSSQL\Data\TestFileGroup_log.ldf' , -- Update the path
      SIZE = 2048KB , MAXSIZE = 2048GB , FILEGROWTH = 10%
)
GO

USE [TestFileGroup]
GO

-- -------------------------------------------------------------
-- Create the partition function
-- -------------------------------------------------------------
CREATE PARTITION FUNCTION [myRangePF](int)
AS RANGE LEFT FOR VALUES (5)
GO

-- -------------------------------------------------------------
-- Create the partition scheme
-- -------------------------------------------------------------
CREATE PARTITION SCHEME [myRangePS]
AS PARTITION [myRangePF]
TO ([FG_1], [FG_2])
GO

-- -------------------------------------------------------------
-- Create a partitioned table on myRangePS
-- -------------------------------------------------------------
CREATE TABLE [dbo].[myPartitionTest]
(
      [Id] [int] IDENTITY(1,1) NOT NULL,
      [Data] [varchar](10) NULL
) ON myRangePS (Id)

-- -------------------------------------------------------------
-- Insert Test Data, Enough to cross both partitions
-- -------------------------------------------------------------
INSERT INTO myPartitionTest
VALUES ('Test 1')

INSERT INTO myPartitionTest
VALUES ('Test 2')

INSERT INTO myPartitionTest
VALUES ('Test 3')

INSERT INTO myPartitionTest
VALUES ('Test 4')

INSERT INTO myPartitionTest
VALUES ('Test 5')

INSERT INTO myPartitionTest
VALUES ('Test 6')

INSERT INTO myPartitionTest
VALUES ('Test 7')

INSERT INTO myPartitionTest
VALUES ('Test 8')

SELECT * FROM myPartitionTest

-- -------------------------------------------------------------
-- Run DBCC CHECKFILEGROUP
-- -------------------------------------------------------------
DBCC CHECKFILEGROUP ('FG_1')
/*
      Expect it to show it found 5 rows on the object myPartitionTest and also to say it couldn't check FG_2.
*/

DBCC CHECKFILEGROUP ('FG_2')
/*
      Expect it to show it found 3 rows on the object myPartitionTest and also to say it couldn't check FG_1.
*/

As you can see, the script is fairly simple. It creates a database with a couple of extra file groups, then makes a partition function and scheme and a table that uses them. Then it adds some data to the table, so data will reside on both partitions. Finally it runs the DBCC CHECKFILEGROUP commands, and this is where it gets interesting.
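To confirm that the test rows really do span both partitions before running the checks, a quick count per partition helps. This is a small sketch that uses the built-in $PARTITION function with the myRangePF function and myPartitionTest table from the script; the column aliases are my own. With RANGE LEFT and a boundary value of 5, Ids 1 through 5 land in partition 1 (FG_1) and Ids 6 through 8 in partition 2 (FG_2), which is where the expected counts of 5 and 3 rows come from.

```sql
USE [TestFileGroup]
GO

-- Count the rows that the partition function assigns to each partition
SELECT $PARTITION.myRangePF(Id) AS PartitionNumber,
       COUNT(*) AS RowsInPartition
FROM dbo.myPartitionTest
GROUP BY $PARTITION.myRangePF(Id)
ORDER BY PartitionNumber
```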

What should happen when you run DBCC CHECKFILEGROUP ('FG_1') is that it says it cannot check data on FG_2, PRIMARY, etc., but returns results for the checks it ran on FG_1. What actually happens is that it reports what it can’t check and doesn’t do any checks on FG_1.

So there you have it. I’m hoping that Cumulative Update 5 will have the fix I need. I know a few others who run VLDBs, and this is disappointing news for them as well. Hopefully this will get enough attention to warrant a quick fix.

Do you certify?

I’m currently preparing to take the test to become a Microsoft Certified Technology Specialist. More specifically I’m taking exam 70-432 to achieve the “MCTS: SQL Server 2008, Implementation and Maintenance” certification. The title is quite a mouthful.

I’ve followed the Microsoft database certs for quite a while now and have watched them grow and become more specialized, as many other MS certs have as well. As it currently stands there are six certifications for SQL Server 2008, not counting the Microsoft Certified Master and Microsoft Certified Architect tracks. The six core tests are divided into three tracks: Implementation and Maintenance, Development, and Business Intelligence. Each of these tracks is divided into the Technology Specialist (MCTS) level and the “Pro” level, the Microsoft Certified IT Professional (MCITP).

Thus if you are a pure DBA, only playing in the Implementation and Maintenance realm of SQL Server, you can get your MCTS and then MCITP in just that realm. Or if, like me, you are both a developer and a DBA and also do work in BI, you may end up wanting to run the full gauntlet.

As it stands though, that’s six tests. I’m committed to the MCTS for Implementation and Maintenance as well as 70-433, MCTS in Database Development. After that I plan to evaluate whether to continue on.

In fact, those two certifications will be my first. I toyed with the idea of getting certified in SQL 2005 but never bit the bullet. After some contemplation, I’ve found that I want to become certified in SQL Server 2008.

I’m using the certification process to be sure I’m up to snuff in all areas concerning SQL 2008, as my primary environment is now SQL 2008 and I expect that I’ll have my entire environment upgraded to SQL 2008 by the end of the year. I feel quite lucky in this, as I know many people who are still in SQL 2000, just looking to go to 2005 or have quite a mix of 2000, 2005 and 2008.

Others use certifications as a statement for their resume, which is nice, but I think it is commonly known that certification does not make an expert. Certification is simply another litmus test of someone’s proficiency.

For me though, I’ll become certified, and that’s quite nice I suppose, but more importantly I’ll have a nice focused avenue to be sure I’m fully up to date on the new capabilities of SQL Server 2008. I like nice concise packages, and with how Microsoft certifications are now done this fits the bill.

Hello there.

I’ve debated writing a blog for a while. I tried four years ago and the groove never stuck. But for the past two years I keep saying to myself and those I work with “I should write a blog”. Or at the very least have a place to point out things of interest.

You see, I’m interested in the Art of Software. An awful lot goes into any given piece of software. Software starts as an idea and doesn’t magically morph into the next juggernaut website, desktop app, or iPhone application. Thought goes into it, about a whole host of topics: what language, what database, how will it work, how will it look, how will I sell it, should I even sell it, will anyone use it, how do I let them know it exists, how do I support it. And on and on and on.

So here I go again…
