Oh no, Big Data!

As part of our ongoing blog post series explaining new technical terms, we will address “Big Data” in this post. Due to the new CMS requirements regarding Facility Assessments, Emergency Preparedness, and QAPI, long-term care facilities are being forced to take a more quantitative approach to care delivery. This post discusses how to approach the data management side of that shift.

Although the term “Big Data” suggests that only data volume matters, big data is really about selecting appropriate data stores and appropriate tools to process data. In the data center, for example, the staff, budget, and complexity of operating services motivate choosing a few data stores, such as the relational database, and then trying to get as much from them as possible. In the cloud, the economics shift. If you need one NoSQL table, you don’t need to hire a whole team to operate the service. The Cloud invites “and”: choosing one data store, such as NoSQL, does not mean excluding all others. Because the capital infrastructure is shared, firms can choose the best data store for each specific use case, which is rarely practical in the data center. In the Cloud, we find a multitude of options when deciding where to store and how to process data, and this alone can make a tremendous difference in how we approach data processing.

What motivates the many storage and data processing options we find in the Public Cloud? What tools and data stores are now fundamental to Long Term Care? What are the underlying qualities of these “new” engines, tools, and stores?

Technology in the data center may progress, yet the “escape,” or migration, of infrastructure capital to the Public Cloud has now been ongoing for over ten years. No matter how big your organization is, you will simply not be able to keep pace with that investment. By all means, run applications in the data center, but innovate in the Cloud and don’t pretend there’s an alternative. You may be doing something in the data center that’s important, but it’s not innovation.

Now let’s deconstruct data processing via abstraction from the ground up with the idea of continuous integration, serverless computing, and DevSecOps in mind. Think of DevSecOps as the integration of Development, Security, and Operations, but also a “feedback loop” and pipeline that itself continuously emits information. In the data center, process is fraught with manual approvals based on very little empirical data. In the Cloud, we build automated processes that enable information about code tests, vulnerability scans, and provenance to be captured and reported as and when needed, without human intervention. Finally, the computers are doing some work! It turns out people aren’t actually good at scanning volumes of files. So, what is it people actually do when they approve code releases in the data center? How much evidence can they actually review?
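To make the “feedback loop” idea concrete, here is a minimal Python sketch of a pipeline that records evidence as it runs. The stage names, the EvidenceRecord fields, and the run_stage and release_allowed helpers are hypothetical, invented for illustration; a real pipeline would call actual test runners and scanners rather than the stand-in checks shown here.

```python
import hashlib
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
from typing import Callable, List


@dataclass
class EvidenceRecord:
    """One piece of release evidence (hypothetical schema)."""
    stage: str
    passed: bool
    artifact_sha256: str
    recorded_at: str


def run_stage(stage: str, artifact: bytes,
              check: Callable[[bytes], bool]) -> EvidenceRecord:
    """Run one automated check and capture the result as evidence."""
    return EvidenceRecord(
        stage=stage,
        passed=check(artifact),
        # The hash ties the evidence to the exact artifact that was checked.
        artifact_sha256=hashlib.sha256(artifact).hexdigest(),
        recorded_at=datetime.now(timezone.utc).isoformat(),
    )


def release_allowed(evidence: List[EvidenceRecord]) -> bool:
    """A release proceeds only when there is evidence and every stage passed."""
    return bool(evidence) and all(record.passed for record in evidence)


# Stand-in checks; assumptions for illustration only.
artifact = b"print('hello, resident dashboard')"
evidence = [
    run_stage("unit-tests", artifact, lambda a: len(a) > 0),
    run_stage("vulnerability-scan", artifact, lambda a: b"eval(" not in a),
]

# The report can be stored and produced on request, with no manual collection.
report = json.dumps([asdict(record) for record in evidence], indent=2)
```

The approval decision here is a function of recorded evidence rather than of a person's reading of the files, which is the contrast the paragraph above draws with manual approvals.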

Encapsulating Complexity

We can begin by removing the physical layer, moving our underlying physical resources to the cloud. By introducing GPU or FPGA processing in addition to traditional processing resources, we gain elasticity for storage and compute, providing scale on demand for massive data volumes. We also need to address the long-term availability of our data, so Information Lifecycle Management (ILM) is handled by multiple storage platforms that work together seamlessly throughout the lifecycle of your information, from creation to destruction. In addition to capability, we free up time for our entire storage management team and reduce both processing times for large batch cycles and recovery times after failed processing.
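As a rough illustration of lifecycle management from creation to destruction, the Python sketch below assigns a record to a storage tier by age. The tier names and the age thresholds are assumptions chosen for the example, not recommendations; actual retention periods come from your own policies and regulatory obligations.

```python
from datetime import date

# Hypothetical tiers and maximum ages in days (illustrative values only).
TIERS = [
    (90, "hot"),           # recently created, frequently read
    (365, "warm"),         # occasionally read
    (365 * 7, "archive"),  # retained but rarely read
]


def storage_tier(created: date, today: date) -> str:
    """Return the tier for a record by age, or 'destroy' once past retention."""
    age_days = (today - created).days
    for max_age_days, tier in TIERS:
        if age_days <= max_age_days:
            return tier
    return "destroy"
```

For example, storage_tier(date(2023, 6, 1), date(2024, 2, 1)) returns "warm", because the record is 245 days old. Cloud storage services typically let you declare rules like these once and apply them automatically, rather than running such a function yourself.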

As we approach the data layer, we can bring a broader variety of tools for processing and analysis, in addition to the old friends we know well. We can use an integrated toolset to manage both batch and real-time data processing. We can matrix inputs and outputs into stream processes, with sources and targets such as Message Queues, Data Warehouses, Real Time Streams, or Files. In addition to our traditional tasks, we can rapidly and cost-effectively prototype new approaches and technologies such as containers, serverless computing, the Hadoop ecosystem, and a variety of NoSQL, Graph, and object data stores. Unstructured data can be processed using any number of tools, with the added benefit of not needing to create additional copies of our data. Again, this frees up innovation time for our DBAs, Data Architects, Data Scientists, and many others. Data transformation is more reliable, and the broader toolset allows innovation and optimization efforts to move forward more rapidly.
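The idea of matrixing inputs and outputs can be sketched in a few lines of Python. Everything named here (fan_out, the source names, the sample records, and the in-memory lists standing in for a warehouse and an alerting system) is hypothetical and only meant to show the shape of the routing, not any particular cloud service.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]
Sink = Callable[[Record], None]


def fan_out(sources: Dict[str, Iterable[Record]],
            routes: Dict[str, List[Sink]]) -> int:
    """Send every record from each named source to all sinks routed to it."""
    delivered = 0
    for name, records in sources.items():
        for record in records:
            for sink in routes.get(name, []):
                sink(record)
                delivered += 1
    return delivered


# In-memory stand-ins for real targets (assumptions for illustration).
warehouse: List[Record] = []
alerts: List[Record] = []

sources = {
    "message_queue": [{"resident": "A", "event": "fall"}],
    "file_drop": [{"resident": "B", "event": "weight"},
                  {"resident": "C", "event": "weight"}],
}
routes = {
    "message_queue": [warehouse.append, alerts.append],
    "file_drop": [warehouse.append],
}

count = fan_out(sources, routes)
```

Adding a new target is a change to the routes table rather than to each source, which is what makes it practical to feed the same data to several tools without extra copies.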

To address the veracity of data and our regulatory requirements, we follow our DevSecOps approach to data management and processing. Everything from “Infrastructure as Code” through the Software Development Life Cycle (SDLC) is automated using fully audited and repeatable processes to satisfy regulatory reporting burdens. We can reduce the time spent on evidence production for survey to mere moments. Auditing artifacts are created easily as an integral part of the care delivery process.

Addressing processing in this way allows us to perform continuous integration on our entire stack and master change control. Not only does this eliminate the fear of change that many providers have, it also optimizes cost. Because we can scale horizontally or vertically as needed, we can match capacity to the workload, optimizing costs and time and allowing our operations staff to spend more time innovating.

The Cloud and Big Data processing go well together. Big Data processing techniques and cloud services let us address the challenges of health care data processing, represented by the four “Vs” (volume, velocity, variety, and veracity), easily and flexibly, and they let us keep addressing those challenges over a long time horizon while enabling innovation.


Volume

The amount of data which needs to be stored, analyzed, or reported is staggering. Think of all the tapes, tape infrastructure, SANs, replicas, NAS, and so on that exist just for storing data at the physical level. All the databases, analytics, and document data stores are then stacked on top of those. Over time, we also need to save more data for longer periods. The volume challenge speaks for itself.


Velocity

The speed at which new data becomes available, or needs to be output, can vary from microseconds to a year and anything in between. Creating a system to manage, normalize, and process data arriving at these various speeds is challenging. Many duplicates of a data point are created as it moves through the processing environment, and maintaining the consistency of the data and of the calculations used in processing is also an issue.
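One small part of that consistency problem, reconciling duplicate copies of the same data point, can be sketched in Python as follows. The tuple layout and the latest_per_point helper are assumptions made for this example, and the sketch assumes every timestamp is an ISO-8601 string in the same format and timezone.

```python
from typing import Dict, Iterable, List, Tuple

# (data point id, ISO-8601 UTC timestamp, value); a hypothetical layout.
Observation = Tuple[str, str, float]


def latest_per_point(observations: Iterable[Observation]) -> List[Observation]:
    """Keep one copy of each data point: the one with the newest timestamp."""
    newest: Dict[str, Observation] = {}
    for obs in observations:
        point_id, timestamp, _value = obs
        current = newest.get(point_id)
        # Same-format ISO-8601 strings compare in chronological order.
        if current is None or timestamp > current[1]:
            newest[point_id] = obs
    return sorted(newest.values())


copies = [
    ("resident-17/weight", "2024-03-01T08:00:00Z", 152.0),
    ("resident-17/weight", "2024-03-01T09:30:00Z", 151.5),  # corrected entry
    ("resident-22/weight", "2024-03-01T08:05:00Z", 138.0),
]
deduplicated = latest_per_point(copies)
```

Choosing “newest wins” is only one possible rule; the point is that the rule is written down once and applied the same way everywhere the data flows.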


Variety

Once we include the massive volumes of data stored in our file systems, we find substantial amounts of unstructured data that have typically gone unanalyzed because the work had to be done by humans.


Veracity

When gathering data, we need to trust our sources, and calculations for QAPI need to be done rapidly and accurately. Beyond managing the truthfulness of our data, we must meet strict CMS requirements that cover what data is stored, how it is processed, and how it is reported. Veracity of the data is important from both an input and an output perspective.
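As a small example of checking veracity on both the input and the output side, the Python sketch below validates its inputs before computing a rate. The function name and the choice of metric (events per 1,000 resident days, a common way to express rates such as falls) are assumptions for illustration; they are not a CMS-specified calculation.

```python
def rate_per_1000_resident_days(events: int, resident_days: int) -> float:
    """Return events per 1,000 resident days, rejecting implausible inputs."""
    if events < 0:
        raise ValueError("events cannot be negative")
    if resident_days <= 0:
        raise ValueError("resident_days must be positive")
    return round(events / resident_days * 1000, 2)
```

For instance, 6 events over 2,700 resident days gives rate_per_1000_resident_days(6, 2700) == 2.22. Rejecting bad inputs up front means an error in the source data surfaces as an error, not as a plausible-looking number in a report.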

Data Management Challenges – Legacy vs. Modern

Attribute: Volume
Challenge: More data; desire to keep original data and historical data. Data does not fit into relational database. Relational database does not scale well for complex workflow applications.
Traditional: Physical infrastructure is finite; moreover, tightly coupled patterns make change difficult at best. We build it and leave it alone until we are ready to upgrade it.
Cloud: Elasticity. Scale the capacity to ingest any amount of data. Scale the rate at which we can process requests for objects.
Innovation: Ability to escape 3 year refresh cycle, elastic storage, elimination of archival issues. Multiple data stores and at lower cost than data center managed data stores.

Attribute: Velocity
Challenge: Speed from microseconds to quarters. Shift from Batch to Real-time views is as fundamental as the shift from procedural to object-oriented programming.
Traditional: Various platforms which had to be integrated for final outputs. Objects are stored in a relational database regardless of fit to purpose because often the RDBMS is the only choice.
Cloud: Specific data stores, including streams, which can scale out to ingest any amount of data, and durably buffer that data so that it can be processed and replayed.
Innovation: Faster time to market than traditional ETL, ability to matrix inputs and outputs. Working with streams enables “natural” modeling in changes in the state of complex systems.

Attribute: Veracity
Challenge: Calculations and Models, Regulatory Reporting.
Traditional: Labor intensive testing process. Lean operations teams must shuffle data around and try to serve many applications using a constrained storage capacity. Innovation and Strategy must compete with increasingly complex regulatory data requirements.
Cloud: Integrated DevOps approach to data management, SDLC for models and ability to rapidly prototype.
Innovation: Rapid testing under more scenarios, Rapid Innovation due to low barrier to entry, yet maintaining tight source control and auditing artifacts. Automation. Stream and replay any amount of data at production volumes when testing strategies and new products. Low Communication and Coordination noise as web services are the contract.
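
The stream-and-replay idea that appears in the comparison above can be illustrated with a short Python sketch. The ReplayableStream class is a hypothetical in-memory stand-in, not a real cloud service; a managed stream would store events durably and at far larger scale, but the append and replay pattern is the same.

```python
from typing import Callable, List


class ReplayableStream:
    """In-memory stand-in for a durable, append-only stream (illustrative)."""

    def __init__(self) -> None:
        self._log: List[dict] = []

    def append(self, event: dict) -> int:
        """Store an event and return its offset in the log."""
        self._log.append(event)
        return len(self._log) - 1

    def replay(self, handler: Callable[[dict], None],
               from_offset: int = 0) -> int:
        """Feed stored events, in order, to a handler; return how many ran."""
        events = self._log[from_offset:]
        for event in events:
            handler(event)
        return len(events)


stream = ReplayableStream()
stream.append({"type": "admit", "resident": "A"})
stream.append({"type": "admit", "resident": "B"})
stream.append({"type": "discharge", "resident": "A"})

# Rebuild the current census by replaying every state change from the start.
census = set()


def apply_event(event: dict) -> None:
    if event["type"] == "admit":
        census.add(event["resident"])
    elif event["type"] == "discharge":
        census.discard(event["resident"])


processed = stream.replay(apply_event)
```

Because the events are kept rather than overwritten, a new calculation or a corrected one can be tested against the same history simply by replaying it through a different handler.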


Need help understanding how to manage your healthcare organization’s data for compliance or assistance with technology-related projects? We are here to help – contact CMS Compliance Group today.


© 2011-2024 CMS Compliance Group, Inc. All Rights Reserved. Privacy Policy