
A Developer's Perspective on WinFS: Part 1

Shawn Wildermuth
http://adoguy.com

March 2004

Applies to:
   PDC 2003 Developer Preview

Summary: Explains how WinFS defines types of objects that can be stored and examines the WinFS API. By storing data in a cohesive data store, not just a file system, WinFS lets systems and users store rich metadata about a myriad of objects. (15 printed pages)

Contents

Redmond, We Have a Problem . . . Don't We?
The WinFS Type System
Working with the WinFS Objects
Conclusion

There is so much to talk about in the next version of the Windows operating system, code-named "Longhorn," that the new storage subsystem, code-named "WinFS," may have escaped your notice. Of all the changes in Longhorn, WinFS may represent the biggest fundamental change. Richard Grimes does a great job in his recent MSDN Magazine article describing how WinFS will change the user experience. What you need to know next is how it affects you, the developer.

Redmond, We Have a Problem . . . Don't We?

Users create a lot of data. They write documents, they store their photos, they rip their CDs, and they get a lot of e-mail. Many users feel information overload just from their own data creation. A variety of technologists have been thinking about a better way to store and manage data for many years. Storing data is fine, but managing that data is the real trick.

Users are only part of the problem though. Developers have the same sorts of issues with data. They write a myriad of different applications, from simple personal applications to full-fledged enterprise applications. Whenever developers start to gather any sort of data (which is almost always), they must figure out how to store it.

The Data Conundrum

Since developing applications usually involves creating, storing, or making sense of data, the problem of data storage usually rears its ugly head. There are several typical routes that developers take:

  • File System: Application data is serialized into a binary or text file and saved in the file system.
  • Database: Application data is broken up into its parts and stored in a database. Code is often hand crafted to make this data easier to deal with (for example, business objects) and database server code is often required to make the solution work.
  • Registry: Applications take the approach of saving data in the registry where the data is small or only configuration-based.

While all of these solutions have been working fine in Windows application development for years now, building the infrastructure to make them work has always been a grind in most development shops. Many third-party developers have attempted to solve the problem by generating business objects or class libraries to automate this work.

In addition to the problem of building the specific infrastructure, each of these typical solutions implies a storage structure. Database data is usually relational, registry data is hierarchical, and file storage can be anything from relational (serialized tables, for example) or hierarchical (XML, for example) to free-form bits of data (a Microsoft Word document, for example). Integrating an application's specific data with other user data is troublesome at best and further exacerbates the problem of data storage.

The Data Integration Problem

Data integration solutions must first try to make sense of all the different places applications can store information. Let's look at a typical example:

I am writing an application that needs access to the user's Contact List, Documents folder, and Music folder. To access each of these data stores, I have to deal with three different APIs (MAPI to access Contacts, System.IO for the Documents folder, and the Windows Media Player SDK to find all the music for a user). It would be better if we had a consistent API for users' data.

But that is not the worst part of the problem. If my application wants to tie its data and these external pieces together, it will need code that not only establishes the relationship between the objects, but deals with changes to that data and handles data synchronization problems such as orphaned data.

Can WinFS Help?

The WinFS storage system technology will change the way that the user experiences an operating system. The core concept behind WinFS is that by storing data in a cohesive data store, not just a file system, systems and users can store rich metadata about a myriad of objects. In addition, these objects in WinFS will be able to be related to each other in fundamental ways. As a developer you can take advantage of this richness.

WinFS has a rich data model for storing data of all sorts of different structures. It does not require relational storage (even though it utilizes some relational database technology under the covers), or hierarchical storage (like LDAP data stores do), but allows developers to create their data structures and let WinFS store the data without ever requiring them to understand or make sense of the actual storage modality.

The WinFS Type System

WinFS is all about defining types of objects that can be stored. Definition of those types involves several different primitive types of the WinFS type system: Items, NestedTypes, ScalarTypes, and Relationships. These four primitive types represent all the data that can be stored within WinFS. Let's take the example of Bob:

Figure 1. Bob and his WinFS data

In WinFS, Bob and his data are represented by these four primitive types as depicted in Figure 2:

Figure 2. Bob's WinFS elements

Each of these primitive types has a very specific job:

  • Item: A fundamental object within WinFS.
  • ScalarType: An atomic piece of information that can be stored about a particular item.
  • NestedType: A structured set of information that can be stored within an item or another NestedType.
  • Relationship: A link between two items.

Understanding Items

The basic object that can be stored is the Item object (represented in the WinFS API as the Item class). Any object that is represented as an atomic unit (like users, files, folders, images, and so forth) is represented by an Item in WinFS.

Items themselves are not stored, but the Item type is specialized for different types of objects. For example, Figure 3 shows a simplified display of how Files, Documents, and Contacts all inherit from the Item type:

Figure 3. Item specialization

Each of these WinFS types is a specialization of the base Item type. These specializations add new pieces of information specific to the type of object it represents. For example, a File has a timestamp and last accessed time; a Document has the author of the document; and a Person has the first and last name of the Person. This hierarchy continues to specialize as new types of objects are created to be stored in WinFS. This can be seen in Figure 4:

Figure 4. Extending the specializations

For example, the Photo class consists of an Image, a Document, and an Item. While a Photo is a Document, and therefore has an author and a last date changed, it is also an Image. Being an Image, it will need to store data like color depth, image size, and format. Lastly, the Photo needs even more data than the Image type has. It wants to store the type of camera it was taken with, the time the photo was taken, and whether a flash was used.
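To make the specialization chain concrete, here is a minimal sketch in plain C#. The Sketch* classes and their field names are hypothetical stand-ins invented for this illustration; they are not the real WinFS classes, and the real schema may arrange the hierarchy differently. The sketch only shows how each level adds its own data on top of what it inherits.

```csharp
using System;

// Hypothetical stand-ins for illustration only; NOT the real WinFS types.
public class SketchItem
{
    public string DisplayName;
}

// A Document is an Item that also records who wrote it and when it changed.
public class SketchDocument : SketchItem
{
    public string Author;
    public DateTime LastChanged;
}

// An Image is a Document that also records how the picture is encoded.
public class SketchImage : SketchDocument
{
    public int ColorDepth;
    public int Width;
    public int Height;
    public string Format;
}

// A Photo is an Image that also records how the picture was taken.
public class SketchPhoto : SketchImage
{
    public string CameraModel;
    public DateTime TakenAt;
    public bool FlashUsed;
}
```

Because a SketchPhoto is also a SketchImage, a SketchDocument, and a SketchItem, code written against any of the base types can work with a photo without knowing about cameras or flashes.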

Item Structure

Each WinFS object that specializes (or inherits from) the Item primitive defines the actual fields that will be stored as part of the object. For example, a Contact object defines that a contact includes a name, one or more addresses, and other contact information. The base Item object only knows how to take that information and store it in WinFS. WinFS accomplishes this with NestedTypes and ScalarTypes. For example, here is a depiction of a contact in WinFS:

Figure 5. Our friend Bob

Bob contains a number of pieces of information about him. He is male; he has one personal address, a home address, and the data associated with that address. As we saw earlier, these objects are stored as primitive types:

Figure 6. WinFS primitive types

Even though these are based on primitive types, they are specialized for the data that will reside in them:

Figure 7. WinFS storage classes

WinFS stores Bob as a Person and his gender as a ScalarType of the Person class. The Person class has a NestedCollection to hold a set of personal addresses. The Address within that collection is a NestedType that contains the component parts of the address. Each of these component parts is a ScalarType. This is the real structure of a person (though simplified). The use of ScalarTypes and NestedTypes allowed the structure of a Person to be defined.
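The shape described above can be sketched with ordinary C# classes. Everything here is an assumption made for illustration: the Model* names, the field names, and the sample address values are invented, and none of this is the System.Storage API. The point is only to show an Item holding ScalarTypes directly and a collection of NestedTypes that in turn hold ScalarTypes.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model of the structure in Figure 7; NOT the real WinFS classes.
public abstract class ModelItem
{
    public Guid Id = Guid.NewGuid();
}

// Plays the role of a NestedType: structured data that lives inside an Item.
public class ModelAddress
{
    public string Street;      // plays the role of a ScalarType
    public string City;        // plays the role of a ScalarType
    public string PostalCode;  // plays the role of a ScalarType
}

// Plays the role of an Item specialization.
public class ModelPerson : ModelItem
{
    public string DisplayName; // plays the role of a ScalarType
    public string Gender;      // plays the role of a ScalarType

    // Plays the role of the nested collection of personal addresses.
    public List<ModelAddress> PersonalAddresses = new List<ModelAddress>();
}

public static class TypeSystemSketch
{
    // Builds an in-memory "Bob" with one home address (placeholder values).
    public static ModelPerson BuildBob()
    {
        ModelPerson bob = new ModelPerson();
        bob.DisplayName = "Bob";
        bob.Gender = "Male";

        ModelAddress home = new ModelAddress();
        home.Street = "1 Example Street";
        home.City = "Exampleville";
        home.PostalCode = "00000";
        bob.PersonalAddresses.Add(home);

        return bob;
    }
}
```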

ScalarTypes and NestedTypes work together to define the schema for a particular Item in the WinFS store based on the following simple rules:

  • Items may contain ScalarTypes, NestedTypes, and collections of either of these.
  • NestedTypes may contain ScalarTypes, NestedTypes, and collections of either of these.
  • ScalarTypes support only a specific set of types:

    "WinFS" scalar type   Managed SQL type   CLR type
    bigint                SqlInt64           Int64
    int                   SqlInt32           Int32
    smallint              SqlInt16           Int16
    tinyint               SqlByte            Byte
    bit                   SqlBoolean         Boolean
    decimal               SqlDecimal         Decimal
    money                 SqlMoney           Decimal
    float                 SqlDouble          Double
    real                  SqlSingle          Single
    datetime              SqlDateTime        DateTime
    char                  SqlString          String
    varchar               SqlString          String
    nchar                 SqlString          String
    nvarchar              SqlString          String
    varbinary             SqlBinary          Byte[]
    image                 SqlBinary          Byte[]
    uniqueidentifier      SqlGuid            Guid
    text                  SqlString          String
    ntext                 SqlString          String
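
If you need the scalar-to-CLR mapping in code (for example, in a tool that inspects schemas), a simple lookup table is enough. The ScalarTypeMap class below is a hypothetical helper written for this article's table; WinFS itself does not expose it. Names are matched case-insensitively.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper; mirrors the scalar type table, not a WinFS API.
public static class ScalarTypeMap
{
    private static readonly Dictionary<string, Type> ClrTypes =
        new Dictionary<string, Type>(StringComparer.OrdinalIgnoreCase)
        {
            { "bigint", typeof(Int64) },
            { "int", typeof(Int32) },
            { "smallint", typeof(Int16) },
            { "tinyint", typeof(Byte) },
            { "bit", typeof(Boolean) },
            { "decimal", typeof(Decimal) },
            { "money", typeof(Decimal) },
            { "float", typeof(Double) },
            { "real", typeof(Single) },
            { "datetime", typeof(DateTime) },
            { "char", typeof(String) },
            { "varchar", typeof(String) },
            { "nchar", typeof(String) },
            { "nvarchar", typeof(String) },
            { "varbinary", typeof(Byte[]) },
            { "image", typeof(Byte[]) },
            { "uniqueidentifier", typeof(Guid) },
            { "text", typeof(String) },
            { "ntext", typeof(String) }
        };

    // Returns true and sets clrType when the scalar name is in the table.
    public static bool TryGetClrType(string scalarName, out Type clrType)
    {
        return ClrTypes.TryGetValue(scalarName, out clrType);
    }
}
```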


Relationships in WinFS

The final primitive type you need to understand is the Relationship. A relationship is simply a mapping between two Items in WinFS, a Source and a Target. Note that in the PDC 2003 build of Longhorn, Relationships are called Links. In future releases of Longhorn, Links will be called Relationships.

Relationships are bidirectional but imply that there is a hierarchy of sorts, with Sources being above Targets. There are three types of relationships in WinFS:

  • Holding relationships: Relationships that impart a lifetime management role for the Target of the relationship.
  • Reference relationships: Relationships that are a linkage between two Items, without the lifetime management role.
  • Embedding relationships: Relationships that model an object linkage between a parent object that embeds a child object. This is not supported in the PDC 2003 release of WinFS.

Holding relationships

Holding relationships dictate a lifetime on the Target portion of the relationship. For example, a relationship between a File and a Folder is usually a holding relationship. If the Folder is deleted, the File will also be deleted if no other holding relationships exist.

Lifetime management is how WinFS knows when an object can be removed from the data store. Unlike a database where you can orphan objects pretty easily, WinFS will delete an item when all of its holding relationships are deleted. This deserves some clarification.

For example, we have a Folder called One and a file called test.txt and we create a holding relationship between them:

Figure 8. A Folder and a file

At this point if we delete the folder or the relationship, the file will be deleted because there will be no holding relationship to keep it alive. If we add a new folder called Two and add another holding relationship between the new folder and the test.txt file we get a single file with two relationships:

Figure 9. One file, two holding relationships

At this point we can remove the holding relationship between Folder One and the file, and since there is at least one holding relationship, the file will still exist.

Figure 10. One file, one holding relationship left

But if you remove Folder Two's holding relationship to the file, the file will be deleted. This is a powerful tool for storing objects that need to exist in a number of different places.

Reference relationships

Reference relationships differ from holding relationships: they can be navigated in the same way, but they carry no lifetime management.

If we create a third Folder called Three and create a reference relationship from it to the file, our picture will look something like this (note that the dashed line represents a reference relationship):

Figure 11. One holding relationship, one reference relationship, one file

If we were to delete the relationship between Folder Two and the file, the file would be deleted. We would still have a reference relationship, but that is not enough to preserve the lifetime of the file.
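
The lifetime rules for holding and reference relationships can be simulated in a few lines of plain C#. The LifetimeSketch class below is a hypothetical in-memory model written for this walkthrough, not the WinFS API, and it assumes the simplified rule described above: an Item is deleted as soon as its last holding relationship is removed, regardless of how many reference relationships remain.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory model of relationship lifetimes; NOT the WinFS API.
public class LifetimeSketch
{
    private class Link
    {
        public string Source;
        public string Target;
        public bool IsHolding;
    }

    private readonly HashSet<string> items = new HashSet<string>();
    private readonly List<Link> links = new List<Link>();

    public void AddItem(string name) { items.Add(name); }

    public bool Exists(string name) { return items.Contains(name); }

    // isHolding = true models a holding relationship, false a reference one.
    public void AddLink(string source, string target, bool isHolding)
    {
        Link link = new Link();
        link.Source = source;
        link.Target = target;
        link.IsHolding = isHolding;
        links.Add(link);
    }

    // Removes the relationship, then deletes the target if nothing holds it.
    public void RemoveLink(string source, string target)
    {
        links.RemoveAll(l => l.Source == source && l.Target == target);
        bool stillHeld = links.Exists(l => l.Target == target && l.IsHolding);
        if (!stillHeld)
        {
            items.Remove(target);
        }
    }

    // Counts relationships of either kind that still point at the target.
    public int LinkCount(string target)
    {
        return links.FindAll(l => l.Target == target).Count;
    }
}
```

Replaying Figures 9 through 11 against this model (holding links from One and Two, a reference link from Three) shows the file surviving the removal of One's link and disappearing once Two's link is removed, with Three's reference link left pointing at nothing.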

Working with the WinFS Objects

First of all, we need to figure out how to use the WinFS API. The entire WinFS API is in the System.Storage namespace and its sub-namespaces. These namespaces exist within a couple of assemblies that must be referenced: System.Storage.dll and System.Storage.Schemas.dll. The System.Storage.dll assembly is the base assembly that contains all the core functionality. The System.Storage.Schemas.dll assembly contains the classes for all the built-in WinFS schemas. For example: System.Storage.Contact.Contact, System.Storage.Contact.Person, System.Storage.File.Folder, and so on.

In addition to adding the WinFS assemblies, you will need to make a reference to WinCorLib.dll. This assembly contains some base functionality that WinFS takes advantage of.

While all three of these assemblies are registered in the GAC (global assembly cache), they are a bit fickle at this point about showing up correctly in Visual Studio .NET, so it is likely that at some point you'll need to find them by hand. The locations of these assemblies will probably change as the Longhorn project continues, but at this point they exist in the following places:

  • WinFS assemblies: \Windows\Microsoft.NET\Windows\v6.x directory.
  • WinCorLib assembly: \Windows\Microsoft.NET\Avalon directory.

Now that we have the references, our code needs to have access to the namespaces themselves:

using System.Storage;
using System.Storage.Contact;

The System.Storage namespace is almost always needed when you do WinFS development. Each built-in schema has its own namespace set up as a child of the System.Storage namespace. For example, all the File classes exist in the System.Storage.Files namespace and the Contact classes exist in the System.Storage.Contact namespace. You can look at the Longhorn SDK for an exhaustive list, but for this example we want to use the Contact schema.

Now that you have the assemblies referenced, you will need to understand how WinFS deals with storage before you can actually work with WinFS objects:

Figure 12. WinFS structure

To start out, each machine has a single instance of WinFS. This instance is simply a service that manages and maintains the WinFS data. Within a single machine, data is broken up into different logical units called stores. To begin with, each machine has a default store (called the DefaultStore). Even though many stores can exist on a particular machine, I will use only the DefaultStore for the purposes of this article.

Every store has its own schema container. This means that each store can have its own unique schema for Item storage. In most cases there will be a commonality of schemas across stores.

The key concept to understand about stores is that as you manipulate WinFS objects, you must know which store contains them. Much like drives or shares in NTFS, WinFS stores are containers that can hold Items. Before you can get, find, or read WinFS objects, you must know what store you expect them to exist in.

You can access stores by using the UNC syntax:

\\machinename\DefaultStore

Or

\\localhost\DefaultStore

Or

\\MainServer\OurSpecialStore

The first thing we need to do in all WinFS applications is tell WinFS where we want to work. This means we need to tell the WinFS API which store we want to use. To do this, we use an ItemContext object. An ItemContext is used to represent the root for WinFS operations. It represents three different operational ideas:

  • Store and scope: The part of a store to work against. For example, \\machinename\DefaultStore\SomeFolder would work only on objects that exist within SomeFolder in the default store.
  • Transaction boundary: A logical container for a single transactional piece of work. This will be explained later in the text.
  • Working object cache: As objects are opened and manipulated, the ItemContext is the boundary for what objects are cached in memory for performance improvements.
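
To see how a UNC-style path breaks down into machine, store, and scope, here is a small parsing sketch. The StorePath class is hypothetical and exists only for this illustration; WinFS does this work internally, and the real rules for store paths may be stricter than the simple splitting shown here.

```csharp
using System;

// Hypothetical helper that splits \\machine\store\scope; NOT part of WinFS.
public class StorePath
{
    public readonly string Machine;
    public readonly string Store;
    public readonly string Scope; // empty when the whole store is in scope

    private StorePath(string machine, string store, string scope)
    {
        Machine = machine;
        Store = store;
        Scope = scope;
    }

    public static StorePath Parse(string path)
    {
        if (path == null || !path.StartsWith(@"\\", StringComparison.Ordinal))
        {
            throw new ArgumentException(@"Expected a path such as \\machine\store", "path");
        }

        // Drop the leading \\ and split the rest on backslashes.
        string[] parts = path.Substring(2).Split(
            new char[] { '\\' }, StringSplitOptions.RemoveEmptyEntries);
        if (parts.Length < 2)
        {
            throw new ArgumentException("A machine name and a store name are required", "path");
        }

        // Anything after the store name narrows the scope within the store.
        string scope = string.Join(@"\", parts, 2, parts.Length - 2);
        return new StorePath(parts[0], parts[1], scope);
    }
}
```

Parsing \\machinename\DefaultStore\SomeFolder with this sketch yields the machine machinename, the store DefaultStore, and the scope SomeFolder, while \\localhost\DefaultStore yields an empty scope.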

The idea of an ItemContext may seem a bit overwhelming at first, but in the simple case you will simply want to use the default ItemContext (which is the DefaultStore on the local machine):

// Open the default Context (\\machinename\DefaultStore)
ItemContext ctx = ItemContext.Open();

If you simply open an ItemContext by calling its static Open method with no parameters you get the default context, which is the default store on the local machine. Once you have an ItemContext you are ready to start opening objects in the store.

After the ItemContext is set up, you can get objects by finding Items of a particular type. You can think of WinFS objects being stored like rows in a table as seen in Figure 13:

Figure 13. Objects in rows

All objects of a certain WinFS Type are stored together. For example, we can ask the Person class to find all the Person objects (within the ItemContext) by simply calling:

// Find all People in the Default Store
FindResult results = Person.FindAll(ctx);

WinFS Item classes support finding Items in the data store of their type. They do this with static methods that support finding all, some, or just one Item. For example, our code asks the Person class to get all the Persons (People?) in our ItemContext. The result of the FindAll method is a FindResult object. You will get very familiar with this class since it is used for almost all access to objects in WinFS. The FindResult class is a simple container for the results. It supports enumerating through the results. At the present time, this class is a little sparse. Enumeration is about all you will be able to do with the FindResult class. But that is usually all you really want. So this is how we would use the results:

// Add them to the ArrayList
ArrayList personList = new ArrayList();
foreach (Person person in results)
{
  personList.Add(person);
}

For this simple example, we can just iterate through the FindResult to save our Person objects in a simple ArrayList. The Items that WinFS returns to you are strongly typed. In other words, to get at different pieces of information about a specific Item, you simply use the properties of the class. For example:

// Get the current person
Person person = personList[0] as Person;

// Get a Scalar Type (works if the BirthDate exists)
DateTime birthDate = person.BirthDate;

// Get a NestedType (works if the PrimaryAddress and PostalCode exist)
string zip = person.PrimaryAddress.PostalCode;

You can retrieve information about Items by simply calling their properties. Since the classes are strongly typed, it is almost straightforward to get at the data. Unfortunately, many properties of these strongly typed classes return their values in OptionalValue objects.

What are OptionalValue objects? The OptionalValue structure is a new addition to the .NET Framework in the Visual Studio 2005 (formerly code-named "Whidbey") release. This structure is meant to allow objects to store optional values in a meaningful way. The OptionalValue structure is a generic type that simply wraps the .NET type of a ScalarType. The structure allows all the properties and methods of the .NET type to be exposed while keeping type safety in the API.

The OptionalValue structure adds two properties and one method that allow for the data to be optional. Before you attempt to use the data, you need to use one of the following properties or method to see whether the data actually exists:

  • HasValue Property: This returns a Boolean true value if the OptionalValue has an actual value.
  • Value Property: This returns the appropriate value that is stored in the structure, if one exists. You should only call this if HasValue is true.
  • GetValueOrDefault() Method: This returns the value or an appropriate default. This usually means a null for all reference types (including strings) and an appropriate default for value types (int's default value is zero, for example).
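
If you want to experiment with this pattern outside of Longhorn, the System.Nullable<T> structure in the .NET Framework exposes members with the same three names. The sketch below uses DateTime? as a stand-in for an optional BirthDate; treating it as equivalent to OptionalValue is an assumption made for illustration, and the OptionalValueSketch class is hypothetical.

```csharp
using System;
using System.Globalization;

// Uses System.Nullable<DateTime> as a stand-in for an OptionalValue;
// an assumption for illustration, not the WinFS type itself.
public static class OptionalValueSketch
{
    // Checks HasValue before touching Value, as the text recommends.
    public static string Describe(DateTime? birthDate)
    {
        if (birthDate.HasValue)
        {
            return "Born " + birthDate.Value.ToString(
                "yyyy-MM-dd", CultureInfo.InvariantCulture);
        }
        return "Birth date not recorded";
    }

    // Falls back to default(DateTime) when no value is stored.
    public static DateTime OrDefault(DateTime? birthDate)
    {
        return birthDate.GetValueOrDefault();
    }
}
```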

Since the ScalarTypes are really made up of OptionalValue structures, our code would be more robust if we wrote it as:

// Get a Scalar Type (or the default)
DateTime birthDate = person.BirthDate.GetValueOrDefault();

// Or 

// Get the Scalar Type (only if it has a value);
// initialized so it is never read while unassigned
DateTime birthDate = DateTime.MinValue;
if (person.BirthDate.HasValue) birthDate = person.BirthDate.Value;

When working with NestedTypes within an Item, you will often need to determine whether the object exists by checking for a null reference. To be more robust we could have written the zip code property like so:

// Get a NestedType (zip stays null if the address
// or the postal code is missing)
string zip = null;
if (person.PrimaryAddress != null) 
{ 
  if (person.PrimaryAddress.PostalCode.HasValue)
  {
    zip = person.PrimaryAddress.PostalCode.Value;
  }
}

We can look at objects individually using this code, but how do we make connections between the different objects inside of WinFS?

Using Relationships in the WinFS API

We can take our relationship example from earlier and write WinFS API code to show how these relationships work. Let's look at a little example:

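// Assumes "using System.Text;" (for Encoding) and that root is
// the root Folder of our ItemContext, obtained elsewhere

// Create a folder
Folder one = new Folder(root, "FirstFolder");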
one.DisplayName = "First Folder";

// Create an array of bytes that will 
// be the contents of our file
byte[] body = Encoding.UTF8.GetBytes("My Test File");

// Create a new file in the first folder
File file = new File(one, "test.txt");

file.Stream = body;
file.StreamSize = body.Length;

// Save everything
ctx.Update();

This code creates a folder off the root of our ItemContext. It then creates a new file and stores it in the folder. When we create the File object, we specify a parent object (the first folder). When we save the file, WinFS creates a holding relationship between the two Items. Next we can add another folder and add the file to the new folder:

// Create another folder
Folder two = new Folder(root, "SecondFolder");
two.DisplayName = "Second Folder";

// Add a link to the second folder (and name the link with the file name)
two.Members.Add(file, true, "test.txt");

// Save everything
ctx.Update();

All is hunky-dory at this point. We have a file and if we inspect either of the folders the file is shown in both places (but only exists once in WinFS). Now we have one file with two folders. For fun, we can remove the file from the first folder:

// Remove the file from the first folder
one.Members.Remove(file);

// Save Everything
ctx.Update();

The file is still good because Folder Two still has a holding relationship. But what if we remove it from Folder Two's members?

// Remove the file from the other folder
// This will cause the file to be deleted because 
// no one has a holding relationship any longer
two.Members.Remove(file);

// Save Everything
ctx.Update();

WinFS will delete the file for us because it no longer has any holding relationships. This is a powerful tool for storing objects that need to exist in a number of different places. Imagine we want one folder to store annual reports by year, and another folder to categorize them by company. We still only need one copy of each annual report, but we can store it in both of these categorization folders.

Suppose we go back to where we still have one holding relationship for our file (see Figure 10) and add a new Folder, but create a reference relationship instead:

// Create another folder
Folder three = new Folder(root, "ThirdFolder");
three.DisplayName = "Third Folder";

// Add a reference link from the third folder (and name the link with the file name)
three.Members.Add(file, false, "test.txt");

// Save everything
ctx.Update();

When we now remove the file from Folder Two again, the file is still deleted because the only relationship left is a reference (nonholding) relationship, which is not enough to preserve the file's life. Folder Three is left with an orphaned relationship, but that is expected with reference relationships.

Conclusion

I hope you have come to the end of this article with a better understanding of what makes up WinFS. Understanding the differences between Items, NestedTypes, ScalarTypes, and Relationships will put you well on your way to using WinFS in your first Longhorn application. In our next article, A Developer's Perspective on WinFS: Part 2, we will delve into how to change and find WinFS objects.


©2004 Microsoft Corporation. All rights reserved.