Healthcare Analytics with Cerner: Part 1 - Data Acquisition

Healthcare Analytics with Cerner: Part 1 - Data Acquisition


I am starting a series of blog posts, dedicated to healthcare analytics with Cerner. My intent is to get someone new to Cerner started with her/his analytics projects whether it is a Business Intelligence (BI) project, ad-hoc research or a quick proof of concept using Cerner's data. I will be focusing on data acquisition options, Cerner Millennium Data Model, core tables and touch on some common challenges.

Cerner is a leader and a top EHR vendor by Market Share as of 2016. Cerner's systems are best-in-class and it will take many pages to list all the products and solutions, offered by Cerner. Cerner also attracts potential clients with an unbelievable degree of customization to support any possible workflow or organizational deployment.


Healthcare Analytics space is an evolving field. US Healthcare was the last industry to embrace computerized systems and consequently data analytics and Healthcare BI is still considered new and "hot" in 2016. EHR/EMR vendors have been largely focusing on the features of their products and concentrating lately on interoperability, data exchange and security. While all the major vendors, including Cerner, offer Analytics and Reporting solutions, they are hardly sufficient to cover fast growing demand of Healthcare leaders and workers. Getting data out of EHR system is still a tough chore, demanding a team of skilled data professionals. Lots of work is needed to make a sense out of EHR data by feeding it to Enterprise Data Warehouse (EDW), BI and Advanced Analytics tools.

One of the selling features of Cerner is a great deal of flexibility and customization. But with great flexibility comes great complexity. I remember one of our clinical IT experts told me one day:

if you've seen one Cerner configuration, you've only seen one Cerner configuration!

Thousands of settings and parameters can be customized at a facility or location level and it that makes
our job harder. A great variety of additional Cerner modules, products and packages can be licensed and installed at one organization and not available or activated at the other one. If you find a working CCL / SQL code at uCern community forum, it does not mean that the same solution will work for your organization because of the differences in Cerner configuration.

With all the challenges we need to tackle, it is a very exciting field to be in right now! No doubt we will see more innovation in this space and very soon.

Cerner Infrastructure Deployment Model.

Cerner offers several deployment models as far as infrastructure goes:

  • Local-hosted (On-premises) model - hardware and software deployed to a customer's data center and managed by customer's staff.
  • Remote-hosted model - hardware and software deployed to Cerner's data center and managed by Cerner.
  • Shared model - hardware and software is deployed to Cerner's data center, managed by Cerner and shared with other Cerner's customers.

Why should you care? Based on your deployment model, you will have various ways to get data out of Cerner and a degree of flexibility to do that. I am personally familiar with Remote-hosted model and my organization is quite influential with Cerner as one of the largest and one of the first clients of Cerner.

While I had very good experience with Cerner's technical staff, keep in mind vendors are not really motivated to help you build your own analytics solution. The reason is very simple - they want to sell you their own! Remember that while Cerner has data, they do not own it - data belongs to your patients and your organization regardless of a deployment model chosen. If a vendor disagrees with that or thinks they know better, it is time to get a strong executive support within your organization and discuss data access options.

Data Acquisition.

Let's see how to get data for your project out of Cerner and to load it to your favorite BI tool. If you wonder, my favorite one is QlikView.

It comes down to 4 options. There are maybe more, but these are the most common ones, listed in order of my preference from the least preferred to the most preferred:

  • Build a CCL script or use a Cerner's reporting tool (more on that later) to extract data, schedule it to run and to drop a file to company's SFTP site. From there, you can ingest files to your Data Warehouse (DW) or BI tool. Often times you will have to rely on other teams and people to do it for you and of course they have their priorities. And every time you need to make a change or bring additional data element, you would have to do it again and change your file format / report.

  • Access Cerner Millennium database (using SQL) to load data directly from Cerner to DW or BI tool. You will need to be very careful as you will be working with a production database. It only takes one bad query to slow down the entire system and impact clinical staff.

  • Better option is to implement a company wide replication process to load Cerner Millennium tables as is (raw data) to a common staging area, which can be another database server or a Hadoop-based Data Lake environment. Normally an incremental process would be implemented to synchronize data and bring only changed/new data. That process can be run once a day or more frequently. In case of Qlik, this layer can be also built using centralized and automated QVD extract process.

More modern approach is to use tools like Oracle Golden Gate to implement real-time, log-based change data capture and stream data changes to Apache Hadoop environment, using distributed messaging system such as very popular Apache Kafka.

  • The last (and really the best option in my opinion for Data Analysts and BI Developers) is to rely on your company's EDW or Cerner EDW (Enterprise Data Warehouse) offering.

Most of the complexity and the hard ETL work (Extract-Transform-Load) will be done for you and validated thoroughly by the subject matter experts. Data in EDW will be modeled around specific subject areas (e.g. Patient Encounters, Patient Experience Surveys, Insurance Claims, Cardiology, Business Volumes etc.) and will be much easier to consume and understand so you can spend more time working on your project and deliver value much quicker, rather than spending hundreds of hours building your own data extracts and processes around it (infamously known as data plumbing!)

Last 3 options allow direct database access (using SQL queries). I do not like to deal with file extracts and always try to get direct database access. If your organization does not have mature EDW or you are simply not allowed to access Cerner Millennium database directly, then file based extracts is your only choice.

Cerner Reporting and Analytics Products.

Discern Explorer CCL (Cerner Command Language).

From Cerner's documentation:

Discern Explorer is a full-featured, fourth-generation tool, patterned after Structured Query Language (SQL) concepts. With Discern Explorer you can extract data from Oracle, SQL Server, RMS, and C-ISAM databases. Discern Explorer is a powerful tool that provides flexibility in the skill set needed to build programs and design reports. There are three options to build Discern Explorer programs, Visual Explorer, Discern Visual Developer, and command-line Discern Explorer.

Visual Explorer is a graphical user interface for data validation, audits, and designing reports. Visual Explorer enables you to extract, analyze, and format data from both Cerner Millennium and client-developed database tables.

Discern Visual Developer is a front-end integrated development environment (IDE) for creating and editing Host queries, programs, and templates.

The Explorer Menu is an application included with Discern Explorer that allows you to create a menu of Discern Explorer programs that you have built with Visual Explorer, Discern Visual Developer, or command-line Discern Explorer. After the menu has been created, you can run your programs by selecting them from the menu.

In other words, CCL Developer creates CCL programs, which can be deployed via Explorer Menu for users to run on demand. OpsView Scheduler is another tool that allows to schedule CCL jobs and monitor their execution.

CCL is also used to customize Cerner applications like PowerChart so it is much more than just a reporting / SQL tool. Experienced CCL developers are in high demand and hard to find. Very likely your organization has experienced CCL developers or even a dedicated CCL team - make some friends with that team! These guys would know how to find data you need or at least point you to the right direction. They normally like chocolate (wink wink).

Discern Visual Developer (DVDev):

Explorer Menu:

OpsView Scheduler:


PowerInsight (PI) is Cerner's operational reporting platform built on top of SAP BusinessObjects and Cerner Millennium. Depending on your agreement, you will get a few BO Universes with some canned reports such as Core Measures, Lighthouse reports etc. Normally every organization would extend or create their own universes. Get access to PowerInsight and get familiar with it.

My little secret is to use PowerInsight to figure out table and column names with data I need for my projects. You can create a new Webi document, drop some dimensions (e.g. Attending Physician or Nursing Unit) and some measures (e.g. Length of Stay) and then see actual SQL query generated against Cerner Millennium database. The query normally will be very ugly and very long but you can reverse engineer it to a something you can use.

If you do not want to deal with SQL, you can schedule file extracts straight out of PI and drop them to SFTP site - ask your PI admins.

Discern Analytics 2.0 (DA2).

DA2 is a newer Cerner's product, similar to SAP BusinessObjects and built on the Eclipse Platform. With DA2 you can design, view and schedule reports. There is a semantic layer to define subject areas (domains), dimensions and measures. It is no SAP BusinessObjects, but I found it really powerful, flexible and feature rich. Strangely enough I do not really know anyone using DA2.

Like PowerInsight, it is a handy tool to pinpoint location of data in Cerner Millennium database - DA2 comes with ready to use domains and a set of dimensions and measures. You can simply drop them to a new document and generate actual database query. Last time I checked though, DA2 had only a few subject domains and Revenue Cycle domain was the most complete one.

Cerner EDW platform.

I do not know much about the new Cerner's Enterprise Data Warehouse platform (HealtheEDW) as the company I work for do not use this product. I heard it is a modern platform, built from the ground up and powered by Apache Hadoop and Vertica. Best Cerner engineers worked on that project.

The legacy PowerInsight EDW product was not really good and it was more like an Operational Data Source than a real Data Warehouse. Cerner used Informatica ETL tool to load data from Cerner Millennium database to another Oracle database server (EDW). Additional sources (such as financial data) could be added to EDW by customer.


HealtheAnalytics and HealtheEDW is an enterprise data warehouse that enables population and enterprise-wide insight through structured use-case focused analytic experiences as well as ad-hoc reporting capabilities.
The population and enterprise data is made available via the HealtheIntent platform, which aggregates data across multiple, disparate sources and normalizes the data into meaningful information. HealtheAnalytics provides prestructured content around a specific analytic focus that includes key performance indicators (KPIs), reports, and data models. Beyond that, users can drill down to isolate specific underlying variables using analytic visualization tools including, Tableau and SAP BusinessObjects.

And Cerner's legacy EDW platform:


PowerInsight Enterprise Data Warehouse is a structured, enterprise-wide data warehouse that combines clinical, management operational, and financial data to facilitate strategic decision making. It allows executives and managers to review data from different business viewpoints. Management information in the warehouse consists of process-related data derived from the clinical, operational, and financial systems from the Cerner target source, Cerner Millennium. Financial information also includes transactions from external general ledger and cost accounting systems.

Cerner Millennium database.

Cerner Millennium database (Oracle) is the one that stores all the EHR data in a central place. It is the pillar of Cerner Millennium 3-Tier Architecture (Database - Middleware - Client). All the tools mentioned above rely and use data from Cerner Millennium database. Every time a nurse makes a change to a patient chart, antibiotics administered, vitals signs taken, or patient discharged from a hospital - changes will be recorded to Cerner Millennium database.

Whether you use CCL or PowerInsight or query Cerner Millennium database directly, you need to understand the basics of Cerner Millennium Data Model.

The second part of this blog series will be focused on Cerner Millennium Data Model.