The main problem being tackled is how to use patient data collected from health trusts across the country to alert health professionals when a patient may be entering a life-threatening condition, so that timely medical intervention is possible.

The system and architecture that we put into place must be designed to suit the hospital environment, as well as any targets and budgets that the health trust may have in place.

Architecture Overview

The data for the system will fall into two main categories: historical data and realtime streams. Typically, historical data will include statistical data from the NHS and archived records of realtime streams (streams whose patients have long since been discharged), as well as any related static data. Realtime streams will include live data from a series of sensors, including but not limited to patient-side medical equipment such as oxygen saturation and blood pressure (BP) sensors.

Given that Hadoop is a batch-based processing architecture, it is envisaged that the system will initially be populated with historical data, which will then be processed through the cluster to derive certain patterns. This part of the analysis is expected to be the most computationally intensive, and a batch-based approach should initially suffice. The data used for this will ideally be stored on a distributed filesystem suited to Hadoop, such as HDFS.
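As a rough illustration of the batch step, the sketch below derives a per-signal baseline (mean and standard deviation) from historical records using a map and reduce structure. It is plain Python standing in for a Hadoop job, not a Hadoop API. The record layout (`signal`, `value`) and the function names are assumptions made for this example only.

```python
from collections import defaultdict
from statistics import mean, pstdev


def map_record(record):
    """Map step: emit a (signal name, value) pair for one historical record."""
    # The 'signal' and 'value' field names are hypothetical.
    yield record["signal"], record["value"]


def reduce_values(values):
    """Reduce step: summarise all values seen for one signal."""
    return {"mean": mean(values), "stdev": pstdev(values)}


def derive_baselines(records):
    """Group mapped pairs by key, then reduce each group to a baseline."""
    grouped = defaultdict(list)
    for record in records:
        for key, value in map_record(record):
            grouped[key].append(value)
    return {key: reduce_values(vals) for key, vals in grouped.items()}


if __name__ == "__main__":
    history = [
        {"signal": "spo2", "value": 96.0},
        {"signal": "spo2", "value": 98.0},
        {"signal": "hr", "value": 70.0},
    ]
    print(derive_baselines(history))
```

On a real cluster the grouping between the map and reduce steps would be handled by the framework, and the patterns derived would likely be far richer than a simple baseline.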

The second part of the data processing involves continuously analysing the input data streams and matching them against any previously generated patterns and models. Preliminary investigations have concluded that, whilst computationally simple pattern matching can be performed by single machines or servers, more sophisticated matching would require frameworks such as Trident/Storm running over distributed computers for more advanced complex event processing (CEP). The amount of data and the number of sensors to be processed therefore have to be weighed against the benefits, since a completely new architecture will be required once the computational complexity of the tasks reaches a certain point, substantially raising both the technical expertise required and the costs.
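To give a sense of what "computationally simple" matching on a single machine might look like, the sketch below flags a patient once a fixed number of consecutive oxygen saturation readings fall below a threshold. The `Reading` type, the function name, the threshold of 90 and the window of 3 are all illustrative assumptions, not clinical guidance and not part of the proposed system.

```python
from collections import deque
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator


@dataclass
class Reading:
    patient_id: str
    spo2: float  # oxygen saturation in percent


def low_spo2_events(readings: Iterable[Reading],
                    threshold: float = 90.0,   # placeholder value
                    window: int = 3) -> Iterator[str]:
    """Yield a patient id when `window` consecutive readings are below `threshold`."""
    recent: Dict[str, deque] = {}
    for reading in readings:
        # Keep only the last `window` values per patient.
        buf = recent.setdefault(reading.patient_id, deque(maxlen=window))
        buf.append(reading.spo2)
        if len(buf) == window and all(v < threshold for v in buf):
            yield reading.patient_id
            buf.clear()  # avoid re-firing on every subsequent reading


if __name__ == "__main__":
    stream = [Reading("p1", v) for v in (95, 89, 88, 87, 86)]
    for patient in low_spo2_events(stream):
        print("pattern matched for", patient)
```

A rule of this kind needs only constant memory per patient, which is why it fits on a single server. Correlating many signals across many patients is the point at which a distributed CEP framework would start to pay for itself.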

It should be mentioned at this point that, should a combination of Hadoop and Trident/Storm be required, it may be worthwhile to consider a currently emerging framework known as Spark, which is designed to combine batch-based and streaming data into a single dataset for computation, combining model generation with business intelligence. Using a concept called Resilient Distributed Datasets (RDDs), it brings distributed storage, batch-based processing, and streaming data processing together in a single framework. However, the project is relatively immature compared to Hadoop/HDFS and Trident/Storm, and whilst the theoretical basis of the computational paradigm is sound, the implementation has not yet been tested in production on a large scale in industry.

With regard to the consumption of this business intelligence, feedback and results will come primarily in the form of event notifications. Typically, a change in a patient's condition will result in the identification or matching of a particular pattern. If a match is made and the event is deemed worthy of further investigation, the system will notify a member of hospital staff of the patient details, the change in status that prompted the alert, and a recommended course of action based on the predicted outcome if no further action were taken.
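The notification described above could be represented along the following lines. This is a minimal sketch only: the `Alert` fields mirror the three items listed in the text, while the score cut-off, the pattern name and the action strings are hypothetical placeholders invented for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Alert:
    patient_id: str
    status_change: str       # the change in status that prompted the alert
    recommended_action: str  # suggested next step for hospital staff


# Hypothetical lookup from a matched pattern to a suggested action.
RECOMMENDED_ACTIONS = {
    "falling_spo2": "Escalate to the duty clinician",
}


def build_alert(patient_id: str, pattern: str, score: float,
                min_score: float = 0.8) -> Optional[Alert]:
    """Return an Alert if the match is strong enough to investigate, else None."""
    if score < min_score:  # not deemed worthy of further investigation
        return None
    action = RECOMMENDED_ACTIONS.get(pattern, "Review patient at bedside")
    change = f"matched pattern '{pattern}' (score {score:.2f})"
    return Alert(patient_id, change, action)


if __name__ == "__main__":
    print(build_alert("p1", "falling_spo2", 0.91))
```

Keeping the decision to notify separate from the delivery mechanism (pager, dashboard, message queue) would let the delivery channel change without touching the matching logic.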

Big Data Architecture

Problem Classification

The problem domain is healthcare, primarily patient data recorded by the sensor technologies in the Critical Care unit. These data are analysed in real time so that strategies and decisions for medical intervention can be made.

Big Data Type

The type of data generated is human-generated data.

Alerting System