
Get more out of your content: content modeling and data stores

In our opinion, content modeling is one of the most important topics in content management, and should be one of the main factors when you're evaluating a CMS. 

Deane Barker (blogs at Gadgetopia, works at Blend Interactive) is very knowledgeable about the CMS business and technology, and he has a range of important and educational blog posts on content management. Deane recently posted a very good blog post called “Chasing the Ideal: Relational Content Modeling in Content Management”. In the blog post he talks about relational content management, or in other words, how different content items relate to each other in content management systems.  


In the blog post, Webnodes is mentioned as one of the exceptions to the rule that CMSs are bad at relational content modeling. Which is cool. :-) But this is not why I’m writing this blog post. Deane’s post is very good, but there are a couple of points I’d like to comment on and elaborate, as well as give you some background on how we think about content modeling.

In the blog post, Deane writes that excellence in content modeling really boils down to two main competencies, each with several sub-competencies. All of the competencies are well thought out, and they all make sense in the context in which they are written. Deane defines the best way to do content modeling in the context of a content tree. In our opinion there are limitations and problems with using a content tree as the basis for your content model. Content isn't naturally structured as a neat hierarchy with parent/child relations. The natural structure of content is more like a network of relations.


You can work with content in Webnodes in the way that Deane describes, but for situations where you have a fairly complex data model, our preferred approach is slightly different. We focus on the content itself, not on how it fits in a content tree or what pages we need in a website.

Our preferred approach to content modeling is to create an ontology. An ontology can be defined in many ways, but one I like defines it as “a model for describing the world that consists of a set of types, properties, and relationship types”. In other words, you create a definition of the content of the business domain in question using terms, data and relations that are commonly used in that domain. For more information on why and how to create an ontology, Stanford has a very good article.

When the ontology has been defined, our CMS framework then translates the ontology definition into an object model, and then it generates the concrete classes based on rules we have built into the system. Developers can then work against the object model, and the built-in ORM automatically creates the database tables and fields needed to store all the data as defined by the ontology. 
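To make the idea concrete, here is a minimal sketch of how an ontology definition could be turned into database tables. It is not how Webnodes does it internally: the `ContentType` class, the `to_ddl` function and the `author`/`article` types are all made up for illustration, and SQLite stands in for whatever database a real system would use.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ContentType:
    """One type in a (hypothetical) ontology: name, properties, relations."""
    name: str
    properties: Dict[str, str]  # property name -> SQL column type
    relations: Dict[str, str] = field(default_factory=dict)  # relation -> target type


def to_ddl(ctype: ContentType) -> str:
    """Translate one ontology type into a CREATE TABLE statement."""
    columns = ["id INTEGER PRIMARY KEY"]
    columns += [f"{prop} {sql_type}" for prop, sql_type in ctype.properties.items()]
    # Each many-to-one relation becomes a foreign-key column.
    columns += [
        f"{rel}_id INTEGER REFERENCES {target}(id)"
        for rel, target in ctype.relations.items()
    ]
    return f"CREATE TABLE {ctype.name} ({', '.join(columns)})"


# Illustrative ontology: an article is written by an author.
ontology = [
    ContentType("author", {"name": "TEXT"}),
    ContentType("article", {"title": "TEXT", "body": "TEXT"}, {"author": "author"}),
]

conn = sqlite3.connect(":memory:")
for ctype in ontology:
    conn.execute(to_ddl(ctype))

# Inspect the generated table: one column per property plus the relation.
article_columns = [row[1] for row in conn.execute("PRAGMA table_info(article)")]
print(article_columns)
```

The point of the sketch is only that the table layout follows mechanically from the types, properties and relations; the content tree plays no part in it.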

By creating an ontology in this way, we can retain all the inherent semantics in the content and the relations between the content. When the main content structure has been defined, we can start to think about how we can make it easy to create and edit the content. This is usually where we make use of the content tree. Our system has a number of methods for finding content, but for users with basic computer skills, the simple file/folder analogy is the easiest to understand, so we usually store the content in the content tree. But that’s only for findability in our edit interface, not as a part of the main content model.

But modeling isn’t everything

Content modeling is an important part of making a good website, and selecting a CMS that can model your content is crucial. But being able to create an advanced content model is just one part of the equation. You also need to be able to take advantage of the content model you create. That can be split into three parts:

1.)    The API must be simple to use, even if the content model is complex. If it’s too hard to use, you’ll never be able to fully exploit the content model.

2.)    Good query capabilities are crucial when the content models get more complex. It’s important to note that many of the systems that are good at modeling complex content are bad at querying the content. To make good use of a complex content model, you should make sure you can make queries containing joins, group by and aggregates. If your selected system doesn’t offer such query capabilities, make sure you understand what you’re missing, or whether there are other ways (map reduce or similar techniques) to solve your problems.

3.)    With more complex content structures and query methods, a CMS runtime that can do processing outside page requests quickly becomes a necessity, and this should not be underestimated. In Webnodes, we have integrated deeply with Windows Workflow Foundation. This gives us a very robust and lightweight runtime for handling long-running processes.
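As a small illustration of the kind of query mentioned in point 2, the sketch below combines a join, a group by and two aggregates. The `author` and `article` tables and their data are invented for the example, and SQLite is used only because it ships with Python; this is not the Webnodes query API.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE article (
        id INTEGER PRIMARY KEY,
        title TEXT,
        word_count INTEGER,
        author_id INTEGER REFERENCES author(id)
    );
    INSERT INTO author VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO article VALUES
        (1, 'Trees', 500, 1),
        (2, 'Graphs', 700, 1),
        (3, 'Tables', 300, 2);
""")

# Join articles to their authors, group per author, and aggregate.
rows = conn.execute("""
    SELECT author.name,
           COUNT(*) AS articles,
           SUM(article.word_count) AS words
    FROM article
    JOIN author ON author.id = article.author_id
    GROUP BY author.id, author.name
    ORDER BY author.name
""").fetchall()

for name, articles, words in rows:
    print(name, articles, words)
```

If a system can't express something like this directly, the same result has to be assembled in application code, one fetch at a time.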

Data model – how to store?

How a CMS chooses to store its data greatly impacts what it’s able to do in terms of modeling and querying. All the different data storage methods have different benefits and drawbacks.

The most common data storage methods in web content management are:

Entity Attribute Value (EAV) database design

This is one of the most common ways of storing data in a CMS. There are multiple variations of this architecture, but the general principle is that you store triples consisting of the entity (class/type), the attribute (property/field) and the value. This has a number of benefits:

  • It’s very flexible as there is no limit on the number of attributes you can have on an entity.
  • It’s very space efficient as you’re only storing actual values, and not null values.
  • You can change your content model without changing the table definitions.

But it also has some drawbacks:

  • It’s relatively slow to fetch multiple entities
  • It’s difficult and slow to make queries against
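Here is a rough sketch of one EAV variation, to show where both the flexibility and the query pain come from. It assumes a single `eav` table in which each entity is identified by an integer id; the table name, the attributes and the data are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE eav (entity_id INTEGER, attribute TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO eav VALUES (?, ?, ?)",
    [
        (1, "title", "Trees"), (1, "author", "Ann"), (1, "year", "2011"),
        # Entity 2 has no year: nothing is stored, not even a NULL.
        (2, "title", "Graphs"), (2, "author", "Ann"),
        (3, "title", "Tables"), (3, "author", "Bob"), (3, "year", "2011"),
    ],
)


def load_entity(entity_id: int) -> dict:
    """Reassemble one entity from its attribute rows."""
    rows = conn.execute(
        "SELECT attribute, value FROM eav WHERE entity_id = ?", (entity_id,)
    )
    return dict(rows)


# Filtering on two attributes needs one self-join per attribute.
titles = [row[0] for row in conn.execute("""
    SELECT t.value
    FROM eav AS t
    JOIN eav AS a ON a.entity_id = t.entity_id AND a.attribute = 'author'
    JOIN eav AS y ON y.entity_id = t.entity_id AND y.attribute = 'year'
    WHERE t.attribute = 'title' AND a.value = 'Ann' AND y.value = '2011'
""")]

print(load_entity(2))
print(titles)
```

Adding a new attribute is just another row, with no change to the table definition. But every extra condition in a query costs another self-join, and every value is stored as text, so comparisons and sorting need casts.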

NoSQL

NoSQL is a relatively new term that groups together a range of data storage systems that work differently from a normal relational database. Within the NoSQL term there are many different types of data stores, and each of them has different pros and cons.

Key/value stores (Redis)
Graph databases (AllegroGraph, Neo4J)
Document databases (CouchDB, MongoDB and RavenDB).

NoSQL has become very important on the Web today. Many of the largest websites in the world are using NoSQL databases. Since the NoSQL term includes very different systems, I’ll look at the benefits and drawbacks of document databases, which seem to be the most popular and versatile type. Benefits:

  • Very fast reads and writes
  • Very scalable
  • Schema-less

Drawbacks:

  • Limited querying compared to relational databases, but getting better
  • Easy to make mistakes in design if you’re used to relational databases
  • Schema-less
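Schema-less appears in both lists on purpose. The sketch below uses plain Python dictionaries as stand-ins for stored documents (no actual document database is involved, and the field names are invented) to show why: nothing stops you from saving documents of different shapes, so the code that reads them has to cope with every shape that was ever written.

```python
# Three "documents" of the same logical type, saved at different times.
documents = [
    {"type": "article", "title": "Trees", "author": {"name": "Ann"}, "tags": ["cms"]},
    {"type": "article", "title": "Graphs", "author": "Ann"},  # author as plain string
    {"type": "article", "title": "Tables"},                   # no author at all
]


def author_name(doc: dict) -> str:
    """Return the author's name, whatever shape the document has."""
    author = doc.get("author")
    if isinstance(author, dict):
        return author.get("name", "unknown")
    if isinstance(author, str):
        return author
    return "unknown"


names = [author_name(doc) for doc in documents]
print(names)
```

The flexibility is real, since the model changed twice without any migration. The cost moves into the reading code, which now carries the schema knowledge the database doesn't.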

XML

XML data storage is used by several content management systems as the main data store.
XML has several benefits:

  • Doesn’t require a database or other external software, as the XML files are usually stored in the file system (there are XML database systems that try to combine the benefits of XML and relational databases).
  • Easy to use XML as the basis for presentation in multiple channels.
  • Easy integrations: XML is often used as the format for data exchange between different systems.
  • XML is widely used, and there’s a big ecosystem around it.
  • XML is more free form than a database with tables and rows.

Drawbacks:

  • Slow
  • Difficult to do complex queries
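A short sketch of what querying XML content can look like, using Python's standard `xml.etree.ElementTree` module. The document structure and element names are invented for the example. Simple path lookups are easy, but anything resembling a group by with an aggregate has to be written by hand.

```python
import xml.etree.ElementTree as ET

xml_source = """
<articles>
  <article author="Ann"><title>Trees</title><words>500</words></article>
  <article author="Ann"><title>Graphs</title><words>700</words></article>
  <article author="Bob"><title>Tables</title><words>300</words></article>
</articles>
"""

root = ET.fromstring(xml_source)

# A simple filter maps nicely onto the limited XPath support in ElementTree.
ann_titles = [
    article.findtext("title")
    for article in root.findall("./article[@author='Ann']")
]

# Total word count per author: no query language helps here, so loop manually.
words_by_author = {}
for article in root.iter("article"):
    author = article.get("author")
    words = int(article.findtext("words"))
    words_by_author[author] = words_by_author.get(author, 0) + words

print(ann_titles)
print(words_by_author)
```

Full XPath or XQuery engines can express more than this, but the whole document still has to be parsed before anything can be queried, which is part of why the approach gets slow as content grows.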

Relational databases

Relational databases are the most common way to store structured data. In this context we are not talking about just using a relational database system, but using it as it was intended, with one row for one entity, one column per field in each entity and relations between tables. This is the method we have chosen for Webnodes CMS.

Benefits:

  • Very good and standardized query capabilities (SQL)
  • Capable of storing complex content structures

Drawbacks:

  • Relatively slow to read 
  • Slow to insert and update
  • Hard to scale 

We chose a relational data store at the core of Webnodes. We believe it offers the most benefits for the vast majority of our potential customers. The main drawback is that we lack the scalability for the top 0.1% of sites on the web. As it is, we can deliver millions of page views per day even without output caching. Very few companies need higher performance than that, so we can live with that tradeoff. We are working on improvements for those who require even higher performance, but they won't be released until later this year.

The benefit of using a relational content model at the core of our system is that we can offer what we believe to be class-leading content modeling and query capabilities. 

What are your thoughts on content modeling and data stores for content management systems? Please comment below!

4/18/2011
Posted by: Vidar Langberget