An overview of interactive visual data mining techniques for knowledge discovery

In the past decade, data analysis has faced the challenge of dealing with very large and complex datasets and with data generated in real time. Technologies to store and access these large and complex datasets are in place. However, robust and scalable analysis technologies are needed to extract meaningful information from them. The research field of Information Visualization and Visual Data Analytics addresses this need. Information visualization and data mining are often used to complement each other. Their common goal is the extraction of meaningful information from complex and possibly large data. However, whereas data mining relies on computation on silicon hardware, visualization techniques also aim to access the powerful image‐processing capabilities of the human brain. This article highlights research on data visualization and visual analytics techniques. Furthermore, we review existing visual analytics techniques, systems, and applications, including a perspective on the field from the chemical process industry.


INTRODUCTION
It is predicted that the digital universe in the year 2020 will be 44 times as large as it was in 2009, when it was estimated to be 800,000 petabytes in size. 1 Computer systems record our daily lives: 2 for example, CCTV cameras and GPS-enabled navigation devices such as smartphones record our whereabouts; loyalty card and credit card issuers record our consumer behavior; and social media applications such as Facebook, as well as email, capture our social lives. 2 Large quantities of data are also stored in the scientific arena and need to be analyzed. For example, astronomy databases comprise terabytes of image data (see GSC-II 3 or the Sloan Digital Sky Survey 4). In bioinformatics, the Human Genome Project stores the genetic blueprint of the human body, comprising 20,000-25,000 genes and three billion chemical base pairs. 5 Hardware technology allows us not only to capture but also to store these data. Hence, the commercial demand for developing and improving analysis techniques to extract meaningful information from the digital universe is growing, 6 which drives the development of novel and scalable visual analytics techniques.
According to Xie, 7 data mining researchers have in recent years started to regard visualization techniques as a critical aspect of the decision-making and data analysis process. Visualization can help analysts to visually discover different kinds of patterns such as clusters, relationships, and associations. The visual analysis capabilities of the human brain can be combined with silicon computer hardware to improve the scalability and accuracy of pattern discovery in large and complex data. Keim et al. 8 argue that this combined use of silicon and biological hardware should be augmented with the human expert's knowledge in an interactive way. This is described as visual data analytics or, in short, visual analytics.
The aim of this overview article is to discuss the role of visual analytics within the overall knowledge discovery from data (KDD) process. Furthermore, this article highlights existing visual analytics software systems and applications. One particularly relevant application area for data visualization and visual analytics techniques is the chemical process industry, which applies them in research on new products and production methods as well as in monitoring the current production status and trends of large chemical plants. One of the biggest global players in the chemical industry is Evonik Industries AG, and this article provides an overview of the use of visual analysis techniques in this company.
The article is organized as follows. First, it discusses visual data analytics and how it fits into the overall KDD process. Then, typical components and concepts of visual data analytics software systems are discussed. Next, some actual visual data analytics software systems are highlighted, followed by a review of visualization techniques used in data stream mining. Then, the article describes visual data analytics applications in the chemical process industry, among other applications. The article concludes by highlighting major challenges and providing an outlook on the future of visual data analytics.

THE INTERACTIVE PROCESS OF VISUAL DATA MINING AND KNOWLEDGE DISCOVERY FROM DATA
This section highlights the basic idea of the combination of visualization and data analytics techniques and discusses the basic principles involved.
Information visualization is defined as 'the use of computer-supported, interactive, visual representations of abstract data to amplify cognition'. 9 Information visualization aims to help users understand data through graphical representations that access the powerful image-processing capabilities of the human brain. This is summarized in the information seeking mantra 'overview first, zoom/filter, details on demand'. 10 Visual analytics has been defined as the 'science of analytical reasoning facilitated by interactive visual interfaces'. 11 Visual analytics is based on the same ideas as information visualization. However, visual analytics additionally incorporates automatic analysis methods before, during, and after the use of interactive visual representations. 8

FIGURE 1
The process of visual analytics according to Ref 13.

The Principle of Visual Analytics
According to Ref 12, visual analytics as a field of research started in 2005, with Ref 11 regarded as its founding publication. Keim et al. 13 describe the visual analytics process as a feedback loop comprising different stages and transitions, as illustrated in Figure 1. The first stage is the gathering of data. The data may come from different data sources and hence need to be integrated before further analysis methods can be applied. This step also comprises data preprocessing such as normalization and cleaning. Preprocessing and integration are represented by the transformation arrow in Figure 1.
After this transformation, the analyst applies either visualization techniques or automated analysis methods, such as statistical and data mining techniques. Visualizations are used not only to evaluate the findings but also to refine the automated analysis methods. This alternation between automated analysis and information visualization is the main characteristic of visual analytics and what distinguishes it from information visualization alone. It is also reflected in the visual analytics mantra 'Analyze First-Show the Important-Zoom, Filter and Analyze Further-Details on Demand'. 8 To create an effective visual analytics system, it is important to optimize the visualizations for the human visual system; user interaction methods such as focus and context are also important for scientific visualization. 14
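As a toy illustration of the mantra 'Analyze First-Show the Important-Zoom, Filter and Analyze Further-Details on Demand', the following Python sketch chains the four steps on a single numeric attribute. Using absolute z-scores as the automated analysis step is our assumption for illustration and is not prescribed by Ref 8; all function names are hypothetical, and in a real system each step would drive a view rather than return a list.

```python
from statistics import mean, pstdev

def analyze_first(values):
    """Automated analysis (assumed): score each value by its absolute z-score."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0] * len(values)
    return [abs(v - mu) / sigma for v in values]

def show_the_important(scores, k):
    """Indices of the k highest-scoring records, i.e. what a view would emphasize."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

def zoom_filter_analyze_further(values, lo, hi):
    """User-driven step: keep values in [lo, hi] and re-run the analysis on them.
    Returns {original_index: new_score}."""
    subset = [i for i, v in enumerate(values) if lo <= v <= hi]
    if not subset:
        return {}
    sub_scores = analyze_first([values[i] for i in subset])
    return dict(zip(subset, sub_scores))

def details_on_demand(records, index):
    """Return the full record behind the mark the user selected."""
    return records[index]

records = [{"id": f"r{i}", "value": v} for i, v in enumerate([10, 11, 9, 10, 50])]
values = [r["value"] for r in records]
scores = analyze_first(values)
top = show_the_important(scores, 1)
print(top, details_on_demand(records, top[0]))
print(zoom_filter_analyze_further(values, 8, 12))
```

With these values, the record with value 50 receives the highest score, so show_the_important(scores, 1) returns [4]; zooming into the range 8 to 12 then re-scores the four remaining records relative to each other rather than to the outlier.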

FIGURE 2
The knowledge discovery from data process augmented with the visual analytics process. The dashed components and lines are the augmented elements from visual analytics.
Figure 2 shows the KDD process augmented with the visual analytics process as discussed in the 'The Principle of Visual Analytics' section.
The dashed components are taken from the visual analytics process, whereas the solid elements correspond to the KDD process. Note that none of the KDD steps have been removed. Also note that, as opposed to the KDD process, the visual analytics process is interactive: the user interacts with the system, and the feedback loop feeds changes back to the data input, 15 as shown in Figure 1.
The following steps, extracted from Figure 2, describe the main activities of the KDD process augmented by the visual analytics process depicted in Figure 1:
• Data Integration and Cleaning: Different data sources are integrated into one data store or warehouse.
• Data Selection and Transformation: Before the data mining or analysis task, the data need to be selected from the data store. During selection, the data may be transformed, for example, normalized, cleaned, reduced, or converted into a particular format for the data mining step or for the visualization. In the original KDD process, the data analyst is responsible for this step. After incorporating visual analytics, however, this step may be automated using information from user interactions and feedback. Two versions of the data may be generated, one in a format appropriate for the data mining step and one for visualization purposes.
• Data Mining: Intelligent methods are applied to the preprocessed data to extract useful patterns such as classification or association rules. In the augmented process, the user refines these models or patterns by interacting with their visual representations.
• Evaluation and Interpretation: In the original KDD process, the models derived in the data mining step need to be examined to identify the interesting aspects of the extracted patterns and to find a meaningful representation. With visual analytics, the visualization methods can provide direct insights to the analyst.
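The Data Selection and Transformation step, which produces one version of the data for mining and another for visualization, could look like the following minimal Python sketch. The record layout, the choice of min-max normalization, and the idea that the selected features come from the axes the user last displayed (the hypothetical last_displayed_axes) are assumptions for illustration only.

```python
def clean(records, features):
    """Data cleaning: drop records with a missing value in any selected feature."""
    return [r for r in records if all(r.get(f) is not None for f in features)]

def min_max_normalize(rows, features):
    """Rescale every selected feature to [0, 1] for the data mining step."""
    ranges = {f: (min(r[f] for r in rows), max(r[f] for r in rows)) for f in features}
    normalized = []
    for r in rows:
        out = {}
        for f in features:
            lo, hi = ranges[f]
            out[f] = 0.0 if hi == lo else (r[f] - lo) / (hi - lo)
        normalized.append(out)
    return normalized

def select_and_transform(records, features):
    """Return (mining_version, visualization_version) of the selected data."""
    rows = clean(records, features)
    if not rows:
        return [], []
    mining = min_max_normalize(rows, features)
    # The visualization copy keeps original units so axis labels stay meaningful.
    visual = [{f: r[f] for f in features} for r in rows]
    return mining, visual

# Hypothetical feedback: the selected features are the axes the user last displayed.
last_displayed_axes = ["temperature", "pressure"]
records = [
    {"temperature": 10, "pressure": 1.0},
    {"temperature": 20, "pressure": None},  # removed by cleaning
    {"temperature": 30, "pressure": 3.0},
]
mining, visual = select_and_transform(records, last_displayed_axes)
print(mining)  # normalized copy for the mining step
print(visual)  # raw copy for the views
```

Keeping the two copies separate means that a normalization chosen for the mining algorithm never changes the units the analyst sees on the axes of a view.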
Information visualization and visual analytics are similar approaches, as they both aim to combine the human brain and silicon hardware to analyze data. However, having distinguished between the two terms and put visual analytics in the context of KDD, we must note that the terms are often used interchangeably in the literature, with information visualization actually referring to visual analytics or visual data analytics. This paper is no exception. Another common confusion in the literature is the interchangeable use of the terms visual online analytical processing (visual OLAP) and visual analytics. OLAP refers to the ad hoc exploration of multidimensional data volumes in which each feature represents a dimension. A common task in OLAP is to find, analyze, and visualize intersections and relationships between dimensions. Visual OLAP additionally uses visualizations to interact with the analytical algorithms and to refine them, which is very similar to visual analytics. However, visual analytics is not limited to multidimensional data.
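To make the OLAP notion of finding intersections between dimensions concrete, the following Python sketch aggregates a measure over any chosen combination of dimensions (a roll-up) and restricts the data to fixed dimension values (a slice). The fact layout and the dimension names region, product, and quarter are hypothetical; production OLAP engines typically precompute and index such aggregates rather than scanning all facts on every query.

```python
from collections import defaultdict

def roll_up(facts, dims, measure):
    """Aggregate `measure` over the chosen dimensions; with two dimensions the
    result is their cross-tabulation (the intersections of their values)."""
    totals = defaultdict(float)
    for fact in facts:
        totals[tuple(fact[d] for d in dims)] += fact[measure]
    return dict(totals)

def slice_cube(facts, **fixed):
    """Keep only the facts whose dimension values match all fixed values."""
    return [f for f in facts if all(f[d] == v for d, v in fixed.items())]

facts = [
    {"region": "EU", "product": "A", "quarter": "Q1", "sales": 10},
    {"region": "EU", "product": "B", "quarter": "Q1", "sales": 5},
    {"region": "US", "product": "A", "quarter": "Q2", "sales": 7},
    {"region": "EU", "product": "A", "quarter": "Q2", "sales": 3},
]
print(roll_up(facts, ["region"], "sales"))
print(roll_up(facts, ["region", "product"], "sales"))
print(roll_up(slice_cube(facts, quarter="Q2"), ["region"], "sales"))
```

In a visual OLAP tool, each of these calls would correspond to an interaction on a view, for example, dragging a second dimension onto an axis or clicking a quarter to slice by it.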
The following sections discuss typical concepts and components of visual data analytics systems and then highlight some existing data mining software systems that incorporate the ideas of visual analytics and information visualization. Some of these systems are commercial and some are free.

TYPICAL CONCEPTS AND COMPONENTS OF VISUAL DATA ANALYTICS SYSTEMS
This section discusses typical components and concepts found in software systems that focus on the visual aspects of KDD; many visual data analysis systems share them. As highlighted in the previous section, visual data analysis or visual analytics is more than static data representation: the user manipulates the data and its representations by interacting with multiple and diverse views. By a view, we mean a visual component of the graphical user interface that visualizes some aspect of the system, such as data, configurations, data mining models, or data mining results. When a view displays data, we also call it a data representation, and we use the terms view and representation interchangeably in this article. Some visual and interaction concepts found in many visual data analysis systems are highlighted below.

Typical User Interface Concepts
Visualizations can be as versatile as the possible applications or even the datasets used. In fact, many data mining and visual data analysis systems, such as Refs 17 and 18, are open source or at least allow new views to be integrated as plugins, so that users can add their own application-tailored views. Hence, this section highlights visual concepts and views in a more general sense.
Visual analytics often involves the use of multiple views, which requires a well-designed and intuitive user interface that takes the display and arrangement of the visualizations into consideration and allows the user to parameterize views interactively. 18 Most visual data analysis tools are based on a layout that is the same as or similar to that of the 'Eclipse Rich Client Platform'. 19 This layout, or a similar one, is used in many data mining and data visualization tools, for example, in Refs 17, 18, and 20. This has the advantage that the visual layout of the framework and its basic interactions are already familiar to many potential users. However, some tools, for example, Weka's Explorer, 2 use their own individual visualization techniques, which differ from those used in Eclipse. Figure 3 shows a typical Eclipse-based GUI comprising the basic views, labeled in black letters in the screenshot; additional views are possible. The Project Tree is usually a hierarchical navigator through the project; the Library offers tools, views, data, etc. to select for data mining and visualization projects; the Properties view usually displays information about selections in the project tree or in the main canvas; and the Main Canvas shows a visual representation of the project, usually in the form of a workflow. Other visualizations are possible in the main canvas, such as the graph representation in the CGV tool highlighted in the 'Available Visual Data Analysis Software Systems' section. The Console view produces textual and tabular output such as error messages or warnings.

Interaction Techniques
According to the authors of Ref 21, interaction in the context of Human Computer Interaction can be described as the communication between the user and the system. In the context of Figure 1, user interaction is essential for the visual exploration of data and the adjustment of data mining models. Only well-designed interaction with the views and the data mining models enables the user to browse and select subsets of the data, adjust the visual mapping of the data, visually modify parameter settings, and adjust the visual mapping of the data mining model and its results.
In this sense, the user engages in dynamic interaction, a two-way process of feeding information into the data mining system and retrieving new or modified information through the visualization. This contrasts with passive interaction with static images, which can be observed, rotated, zoomed, and unzoomed to enhance the user's mental model of the data. 22 According to the authors of Ref 23, an interaction technique is a way of using a physical input/output device to perform a task in a human-computer dialogue. Some basic interaction concepts are found in many visual data analysis systems; we discuss them here in general terms rather than looking into application-specific techniques. For more information about data/information visualization, we refer the interested reader to Ref 24. The authors of Ref 25 conducted an exhaustive review of existing software systems and of the literature on information visualization and hence data visualization. In particular, they reviewed 59 papers and 51 systems and identified 311 individual interaction techniques. 25 The authors further point out that different interaction techniques aim to achieve the same or similar goals, and hence they established a categorization of these techniques based on the user's intent in using them. The seven categories, described below, are (a) Select, (b) Explore, (c) Reconfigure, (d) Encode, (e) Abstract/Elaborate, (f) Filter, and (g) Connect. For visual examples of these interactions, we refer to the articles cited within the descriptions of the interaction techniques.
(a) Select: This interaction technique enables the user to mark interesting data items in a view. This allows the user to easily identify and keep track of the data items of interest when the representation of the data changes, especially if many items are represented in the view. An example of Select found in Weka and KNIME is the scatter matrix. 2,17 A scatter matrix is a matrix of plots of each feature against each other feature. Data points selected in one plot can be highlighted across all plots in the scatter matrix. Select can be combined with other techniques, e.g., Reconfigure, to see where data items move when representations are rearranged. A well-known example of Select used together with Reconfigure is the 'placemark' feature of Google Earth. 26 'Placemark' allows the user to mark a geographical location, rotate and zoom the view, and easily return to the marked location.
Volume 3, July/August 2013
(b) Explore: This interaction technique enables the user to view and inspect a different subset of the data. This is especially useful if the data are very large, so that screen size and the processing capabilities of the human brain limit how much of the data can be represented and processed as a whole. The user can explore a subset, gain insights, and move on to explore further data. Yi et al. 25 highlight two commonly used approaches to Explore interaction, namely Panning and Direct-Walk. In photography, panning refers to the movement of a scene in front of a fixed camera. In the context of information visualization or visual analytics, the scene is the data and the camera is the user's eye. The simplest way of moving the scene is via scroll bars. 25 Direct-Walk allows the user to move the focus of the view from one position in the data structure to another, e.g., by following hyperlinks as in web browsers.
(c) Reconfigure: This interaction technique enables the user to change the data representation by changing its spatial arrangement. The user is thus presented with different views of the data, which helps to uncover hidden relationships. For example, in scatter plots some data elements may have similar or identical visualized numerical values, which causes them to overlap in the plot. In this case, jitter can be used to make hidden data elements visible. Jitter refers to shifting the data elements randomly by a small amount in the display space to avoid overlap. 27 The jitter technique is used, for example, by the authors of Ref 27 and in the Spotfire 28 visual analytics software. Rotating a 3D scatter plot can also reveal overlapping data elements.
(d) Encode: This interaction technique enables the user to alter the representation of data elements, for example, their size, color, or shape. Altering the representation is intended to improve human understanding of relationships in the data and of the distribution of data elements. Yi et al. highlight two widely used encoding approaches: changing the type of the data representation, and interactively changing the encoding of the data elements, such as color, size, or shape. 25 Changing the type of the data representation is intended to reveal new aspects of relationships between data elements; Spotfire 28 is an example of a tool that allows this. Changing the encoding of data elements through user interaction is intended to help the user find the color schemes most suitable for discovering the distributions of multiple variables or features. Examples of tools that allow the user to change color-based or similar encodings are described in Refs 29 and 30.
(e) Abstract/Elaborate: This interaction technique enables the user to adjust the level of detail of the data representation. The user can view the data in a wider context or in a more detailed view on demand, often with many levels of detail between the context and the individual data elements. This can be realized in many ways, e.g., with tooltip texts that appear when the user moves the mouse cursor over a particular data element. However, tooltips usually allow only two levels of detail, so more sophisticated approaches, such as graph or tree views, are needed. Expanding and collapsing graph or tree nodes to reveal and hide subtrees or subgraphs is implemented, for example, in the CGV system. 18
(f) Filter: This interaction technique enables the user to define conditions that change the set of data items displayed in the current view. Data items that are 'filtered out' remain unchanged but are hidden or displayed differently. Usually, the filter interaction is complemented by a reset facility that allows the user to recover the hidden or differently displayed data items. A popular way of performing filter operations in visualization tools is the use of dynamic query controls, such as check boxes, to select conditions or ranges. 25,31 More specialized controls exist, for example, in the TimeSearcher tool, 32 which allows the data analyst to define conditions graphically using 'timeboxes' and 'angular queries'. Timeboxes are used to filter multiple univariate time-series profiles according to time and value ranges, and angular queries are used to filter according to the rate of change of the time-series values within a given time frame. 32
(g) Connect: This interaction technique enables the user to highlight relationships between data items, either within the same view or across different data representations. An example of highlighting across views is KNIME's 17 line plot and scatter plot: data items highlighted in either plot are also cross-highlighted in the other active plot(s). An example of highlighting within the same view is CGV's magic eye view. 18 The magic eye projects a graph's hierarchy in the form of a tree onto a 3D hemisphere, and CGV extends this view to visualize cross-edges among selected hierarchy nodes by spanning arcs around the hemisphere.
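Several of the interaction categories above reduce to simple operations on the data behind a view. The following Python sketch, which is not taken from Weka, KNIME, or Spotfire, shows jitter as a Reconfigure operation and linked selection across the panels of a scatter matrix as a combination of Select and Connect. The function names and the row layout are hypothetical.

```python
import random
from itertools import combinations

def jitter(points, amount, seed=None):
    """Reconfigure: shift each point by a random offset in [-amount, amount] per axis
    so that points with identical values no longer overlap exactly."""
    rng = random.Random(seed)
    return [(x + rng.uniform(-amount, amount), y + rng.uniform(-amount, amount))
            for x, y in points]

def scatter_matrix(rows, features):
    """One panel of (x, y) points per unordered feature pair."""
    return {(a, b): [(r[a], r[b]) for r in rows] for a, b in combinations(features, 2)}

def brush(matrix, selected):
    """Select + Connect: the same row indices are highlighted in every panel."""
    return {pair: [points[i] for i in sorted(selected)] for pair, points in matrix.items()}

rows = [{"x": 1, "y": 1, "z": 5}, {"x": 1, "y": 1, "z": 6}, {"x": 4, "y": 2, "z": 0}]
matrix = scatter_matrix(rows, ["x", "y", "z"])
print(jitter(matrix[("x", "y")], amount=0.1, seed=42))
print(brush(matrix, {0, 2}))
```

Rows 0 and 1 share the same x and y values, so they coincide in the x-y panel until jitter is applied; brushing rows 0 and 2 highlights them in all three panels at once.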
Note that not all interaction techniques can be assigned to one of the categories outlined in Ref 25. Some visualizations are highly specialized for their application domain and may therefore not fit well into any of the categories.
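The Filter interaction with TimeSearcher's 32 timeboxes and angular queries can be approximated as predicates over time-series profiles. The Python sketch below is our own simplified reading, not TimeSearcher's implementation: a timebox keeps a profile only if all of its samples inside the time window fall into the value range, and an angular query keeps a profile only if the slope angle between consecutive samples in the window stays within a range. It assumes samples are sorted by time and measures angles in data units, although on screen they would depend on the axis scaling.

```python
import math

def timebox(series, t_start, t_end, v_low, v_high):
    """Keep a series if every sample with t_start <= t <= t_end lies in [v_low, v_high]."""
    window = [v for t, v in series if t_start <= t <= t_end]
    return bool(window) and all(v_low <= v <= v_high for v in window)

def angular_query(series, t_start, t_end, min_deg, max_deg):
    """Keep a series if the slope angle between consecutive samples in the window
    stays within [min_deg, max_deg] (angles measured in data units)."""
    window = [(t, v) for t, v in series if t_start <= t <= t_end]
    if len(window) < 2:
        return False
    for (t0, v0), (t1, v1) in zip(window, window[1:]):
        angle = math.degrees(math.atan2(v1 - v0, t1 - t0))
        if not min_deg <= angle <= max_deg:
            return False
    return True

def apply_filters(profiles, *predicates):
    """Names of the profiles passing all predicates; no predicates acts as a reset."""
    return [name for name, s in profiles.items() if all(p(s) for p in predicates)]

profiles = {
    "steady_rise": [(0, 1.0), (1, 2.0), (2, 3.0)],
    "spike": [(0, 1.0), (1, 5.0), (2, 1.0)],
}
print(apply_filters(profiles, lambda s: timebox(s, 0, 2, 0.0, 4.0)))
print(apply_filters(profiles, lambda s: angular_query(s, 0, 2, 30.0, 60.0)))
print(apply_filters(profiles))  # reset: every profile is shown again
```

Because filtered-out profiles are only excluded from the returned list and never modified, calling apply_filters without predicates plays the role of the reset facility described under (f).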

Use of Animation
According to the authors of Ref 33, animation has become popular in graphical user interfaces because of its engaging nature. Animation can facilitate the perception of changes in the data and in data graphics. 33 However, the same authors also argue that animation must be used with care, as it may be distracting if used inappropriately. An example of inappropriate use is animating irrelevant information or changes that grab the user's attention and thus mislead the user's biological visual analysis system. This potential danger has motivated research on guidelines for the use of animation in general.
In this section, we highlight three general kinds of animation for data and information visualization that can be found in the literature and in visual analytics systems: animation of viewports, animated transitions of graphs, and animated time.
With the term viewport, we refer to a typically rectangular viewing area that is of interest to the observer. Animated viewports visualize the navigation space with respect to changes to the current viewport. These changes can result, for example, from the user interactions discussed in the 'Interaction Techniques' section, such as filtering data items, zooming into the display area, or changing the level of detail. If these changes happen abruptly, they may be difficult for the observer to comprehend. Animation can help the user maintain an overview during interaction. 34 The authors of Ref 34 presented a generic model for the smooth animation of such changes that takes, among other aspects, the optimal animation velocity into account. Smooth viewport animation has been implemented and used, for example, by the authors of Refs 35 and 36.
Animated transitions of graphs animate the switch from one data representation to another, for example, the transition from a bar chart to a pie chart that shows the relative percentages of the data represented in the bar chart. Animating such transitions is intended to help the user identify elements across diverse representations. 33 The authors of Ref 33 investigated animated transitions between statistical data representations and derived guidelines for their design, which they applied in their DynaVis visualization system.
With animated time, we refer to animating changes in the data over time rather than changes of the viewport or the representation.
Time can be represented simply by using the display space, known as static mapping, or through animation using physical time, known as dynamic mapping. 37 Both static and dynamic representations of time are important for visual analytics. For example, the static representation of time allows the user to observe all available information and to compare the data with respect to time, whereas dynamic mapping allows the user to observe the general development of the data over time. 38 A general disadvantage of animation, however, is that the data may simply be too complex for the viewer to perceive, 39 which is also why the authors of Ref 35 refrain from animating more than one view simultaneously.
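Smooth viewport animation can be sketched by interpolating the viewport rectangle over a fixed number of frames with an easing function. The Python sketch below uses simple smoothstep easing; it is not the velocity-optimal model of Ref 34, and the Viewport type and the frame count are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Viewport:
    """Hypothetical viewport: top-left corner plus size, in data coordinates."""
    x: float
    y: float
    width: float
    height: float

def ease_in_out(t):
    """Smoothstep easing on [0, 1]: slow start, faster middle, slow end."""
    return t * t * (3.0 - 2.0 * t)

def lerp(a, b, s):
    """Linear interpolation between a and b for s in [0, 1]."""
    return a + s * (b - a)

def animate_viewport(start, end, frames):
    """Return `frames` viewports from start to end (both included)."""
    if frames < 2:
        return [end]
    path = []
    for i in range(frames):
        s = ease_in_out(i / (frames - 1))
        path.append(Viewport(lerp(start.x, end.x, s), lerp(start.y, end.y, s),
                             lerp(start.width, end.width, s),
                             lerp(start.height, end.height, s)))
    return path

# Zoom from an overview into a detail region in five frames.
for vp in animate_viewport(Viewport(0, 0, 100, 100), Viewport(40, 40, 20, 20), 5):
    print(vp)
```

Interpolating the width linearly is the simplest choice; for large zoom factors it tends to feel uneven, and interpolating the scale logarithmically is a common refinement.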

AVAILABLE VISUAL DATA ANALYSIS SOFTWARE SYSTEMS
Standard data analysis systems such as SPSS 40 and Weka 2 already provide a wide range of interactive visualization techniques and data views. However, their visualization techniques are aimed more at reporting and interpreting data mining results and data statistics than at the actual analysis of the data and the interactive adjustment of data mining models. Below, we highlight some open source and some commercial data mining tools that allow users to analyze data visually, some of which are tailored to a particular kind of data analysis application.
Bak et al. 41 use a combination of interactive visual analysis methods to effectively analyze multivariate datasets with demographic data. The authors use a self-organizing map (SOM) 42 and visualize each cluster as a radial parallel coordinate plot. 43 Opacity bands 44 within this plot illustrate the variance within a cluster, and background color coding correlates the cluster with a target value. User interaction is facilitated at different stages of the analysis as well as within the visualization. 41 For example, the user can modify the color maps and their scaling and can choose different types of scaling, such as logarithmic scaling. The visualization also provides a 'mouse-over' function, which the user can utilize to identify individual cluster members and display their data. The same techniques have also been used to analyze minority or ethnic residential patterns. 45
A complete implementation of an interactive graph visualization system, called coordinated graph visualization (CGV), has been presented in Ref. Figure 4 shows a screenshot of the basic graphical user interface of CGV. The main focus of the CGV system is on user interaction, in particular on how to interact with a graph that is too large to be displayed on a computer screen. CGV is based on hierarchy tree computations and on graph macro-views that fit into the available display space. 46 Multiple views allow the user to explore, interact with, and access the data from different perspectives simultaneously. The main canvas, the main graph view, uses a node-link representation of a macro-view graph. To represent the graph hierarchy, CGV uses several coordinated views. The hierarchy view (at the top of Figure 4) uses a superimposed polyline to visualize the current antichain and color coding to illustrate selected node attributes.
The complementary textual tree view and the magic eye view are shown enlarged in Figure 5. The textual tree view (at the left-hand side of Figure 4 and at the bottom of Figure 5) is collapsible and complements the hierarchy view by providing textual labels; it can also be magnified by a fisheye transformation. The magic eye view (at the top left corner of Figure 4 and at the top of Figure 5) is based on the Walker layout 47 and is extended by cross-edges among nodes, which are visualized as arcs spanning around the eye. CGV contains further views, and we refer the interested reader to Ref 18 for details. CGV distinguishes between coordinated and uncoordinated interactions. Uncoordinated interactions are specific to a particular view, such as changing its color or font size, whereas coordinated interactions change the global perspective on the data and are therefore propagated to all views, so that all views represent the data consistently. CGV supports basic user interactions such as zooming, panning, fisheye magnification, identifying objects (coordinated), locating objects (coordinated), locking/unlocking and brushing the focus (coordinated), expanding/collapsing the graph or tree structure (coordinated), and adjusting visual parameters (uncoordinated). CGV implements filtering strategies that aim to filter out irrelevant nodes and thus enhance the clarity of the graph representation. 18 Furthermore, CGV implements view space and data navigation techniques, visual augmentation techniques, and undo operations, which are not discussed further here; a comprehensive discussion can be found in Ref 18.
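The distinction CGV draws between coordinated and uncoordinated interactions can be illustrated with a small observer-style sketch in Python. It is not CGV's code or API; the class names and the coordinated operations shown (setting the focus, expanding and collapsing a hierarchy node) are hypothetical stand-ins for the interactions listed above.

```python
class View:
    """A view with coordinated (shared) state and uncoordinated (local) settings."""
    def __init__(self, name):
        self.name = name
        self.focus = None       # coordinated: node currently in focus
        self.expanded = set()   # coordinated: expanded hierarchy nodes
        self.settings = {}      # uncoordinated: e.g. color map or font size

class Coordinator:
    """Propagates coordinated interactions to every registered view."""
    def __init__(self):
        self.views = []

    def register(self, view):
        self.views.append(view)

    def set_focus(self, node):
        for view in self.views:
            view.focus = node

    def expand(self, node):
        for view in self.views:
            view.expanded.add(node)

    def collapse(self, node):
        for view in self.views:
            view.expanded.discard(node)

def adjust_locally(view, key, value):
    """Uncoordinated interaction: only the view the user touched changes."""
    view.settings[key] = value

coordinator = Coordinator()
graph_view, tree_view = View("main graph"), View("textual tree")
coordinator.register(graph_view)
coordinator.register(tree_view)
coordinator.set_focus("node-17")             # both views now focus node-17
adjust_locally(graph_view, "font_size", 14)  # the tree view keeps its font size
```

Routing every coordinated interaction through one coordinator object is what keeps all views consistent, while local settings bypass it entirely.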
The NFlowVis system 48 is a visual analytics system that visualizes NetFlow data (IP traffic information) in a computer network to detect network attacks. NFlowVis uses two principal visualization techniques: home-centric flow visualization and graph-based flow visualization. In the home-centric approach, the local hosts that are related to attacking hosts are visualized in a TreeMap, 49 and the attacking hosts are placed at the borders. Flows between attackers and local hosts are visualized as splines. The color and size of the local hosts and the color of the splines can be used to represent various properties, such as the number of packets or bytes transferred. 50 Thresholds can be used to hide splines with low traffic and to highlight splines with high traffic to the attackers, as shown in the screenshot of NFlowVis in Figure 6.
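The threshold interaction described for NFlowVis can be sketched as aggregating flow records per attacker and local host pair and then assigning each spline a display state. This Python sketch is hypothetical: the record fields src, dst, and bytes and the two threshold parameters are our assumptions, not the NFlowVis data model, and the IP addresses come from the ranges reserved for documentation.

```python
from collections import defaultdict

def aggregate_flows(flows, measure="bytes"):
    """Sum a traffic measure per (attacker, local host) pair; one spline per pair."""
    totals = defaultdict(int)
    for flow in flows:
        totals[(flow["src"], flow["dst"])] += flow[measure]
    return dict(totals)

def classify_splines(totals, hide_below, highlight_above):
    """Map each spline to 'hidden', 'normal', or 'highlighted' using two thresholds."""
    states = {}
    for pair, volume in totals.items():
        if volume < hide_below:
            states[pair] = "hidden"
        elif volume > highlight_above:
            states[pair] = "highlighted"
        else:
            states[pair] = "normal"
    return states

flows = [
    {"src": "203.0.113.5", "dst": "10.0.0.2", "bytes": 100},
    {"src": "203.0.113.5", "dst": "10.0.0.2", "bytes": 50},
    {"src": "198.51.100.7", "dst": "10.0.0.3", "bytes": 5},
    {"src": "192.0.2.9", "dst": "10.0.0.2", "bytes": 10_000},
]
print(classify_splines(aggregate_flows(flows), hide_below=10, highlight_above=1_000))
```

Because the thresholds are plain parameters, moving a slider in the interface only requires re-running classify_splines on the cached totals, not re-aggregating the raw flows.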
Several commercial visual analytics tools exist, such as Miner3D 51 for 3D visualization of multidimensional data, Panopticon, 52 which allows the visualization of real-time data streams, and SAS Visual Analytics. 53 SAS Visual Analytics also emphasizes the system's computational scalability to large data volumes, which is achieved through in-memory processing. Another visual data analytics tool specialized for large quantities of data is Spotfire. 28 Long-established data analytics tools such as Matlab 54 have also recognized the importance of data visualization and of model-based visualization components, such as those in Matlab's Simulink extension. 55 As mentioned before, some tools, such as Panopticon, are tailored to the visualization of data streams. The following section addresses the challenges of visualizing data streams and data stream analysis techniques.

VISUAL DATA MINING FOR DATA STREAMS
Mining data streams has attracted a great deal of attention over the last decade. A large number of techniques, such as incremental learning algorithms, 56 have been proposed to address the problem of analyzing high-rate streaming data in real time. Gaber et al. 57 have reviewed clustering, classification, and frequent pattern mining techniques adopted for mining streaming data. In a more recent review, Gaber provided a taxonomy of notable techniques in the area. 58 This taxonomy divides stream mining techniques into (1) two-phase techniques that start with an online stage followed by an offline one; (2) Hoeffding bound techniques that provide statistical guarantees on the approximation of the data mining output; (3) symbolic approximation techniques, which have proved to be the state-of-the-art approach to time series analysis; and (4) granularity-based techniques that provide a framework for resource-adaptive data stream mining.
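The Hoeffding bound behind category (2) can be stated concretely: for a real-valued random variable with range R, after n independent observations the true mean lies within epsilon = sqrt(R^2 ln(1/delta) / (2n)) of the observed mean with probability at least 1 - delta. The Python sketch below computes epsilon and, inversely, the number of observations needed for a target epsilon; the function names are ours.

```python
import math

def hoeffding_bound(value_range, delta, n):
    """Epsilon such that, with probability at least 1 - delta, the true mean of a
    variable with the given range lies within epsilon of the mean of n observations."""
    if n <= 0 or not 0 < delta < 1:
        raise ValueError("need n > 0 and 0 < delta < 1")
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

def samples_needed(value_range, delta, epsilon):
    """Smallest n whose Hoeffding bound is at most epsilon."""
    return math.ceil(value_range ** 2 * math.log(1.0 / delta) / (2.0 * epsilon ** 2))

# Information gain for a two-class problem has range R = log2(2) = 1.
n = samples_needed(value_range=1.0, delta=1e-7, epsilon=0.05)
print(n, hoeffding_bound(1.0, 1e-7, n))
```

In Hoeffding-tree learners such as VFDT, for example, a node is split once the difference between the heuristic scores of the two best attributes exceeds epsilon, so that, with high probability, the split chosen on the stream sample matches the one that would be chosen on unlimited data.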
Little work has been done on visualizing data stream mining results. This is due to the dynamic nature of the results and to the potentially large knowledge structures produced by the process (number of clusters, number of levels in a decision tree, etc.). Massive Online Analysis (MOA) 59 is a comprehensive data stream mining tool that provides, among other functionalities, dynamic visualization of data stream clustering.
Visualizing the results of data stream mining becomes more challenging when the process runs on mobile devices with limited screen real estate, such as smartphones. Mobile data stream mining has been well researched over the last few years. 60 Open Mobile Miner is a stream mining tool tailored to function on mobile platforms. 61 The screen of a mobile device can rapidly become cluttered when visualizing the results of a data stream mining process running onboard the device. Clutter reduction techniques that address this problem have been reviewed in Ref 62, although the reviewed techniques address clutter in general rather than on mobile devices specifically. Ellis and Dix 62 categorized clutter reduction techniques into three categories: (1) changing the representation of data items, (2) changing the distance among data items, and (3) animation, when the data have a temporal dimension. Although these approaches deal with clutter effectively in different cases, they are not self-adaptive.
Adapting visualizations to screen clutter on mobile devices has been proposed by Gaber. 58 The approach is based on a theoretical underpinning that balances perception, clutter, and the amount of information in the visual image, and it rests on two observations: (1) the more information, the greater the clutter of the visualized image; and (2) the higher the clutter, the lower the perception of the image. With the objective of increasing information and reducing clutter to enhance perception, Gaber 58 developed Adaptive Clutter Reduction (ACR). 63 ACR is a generic theory that defines an optimization function which maximizes both information and perception while reducing the amount of clutter.
ACR has been applied to data stream clustering on smartphones; the resulting technique is called clutter-aware cluster visualization (CACV). The process has two steps. The first step applies the data stream clustering technique RA-Cluster. 64 The second step visualizes the evolving clusters. Following ACR theory, the clusters are presented so as to satisfy two requirements: (1) a maximum allowable screen coverage by the clusters and (2) a maximum allowable percentage of cluster overlap. To meet them, a heuristic-based technique applies one of four visualization levels at any point in time. The levels are ranked by the amount of information they convey: the higher a level is ranked, the more information it presents on the screen. When clutter occurs, that is, when at least one of the two requirements is not satisfied, the visualization switches to the next lower level in the ranking. These levels are (1) normal: each cluster is drawn with a size representing the number of points in it; (2) scaling: clusters are scaled down in an attempt to satisfy the two requirements; (3) coloring/shading: all clusters have the same size, and darker shading represents clusters with more points; (4) active mode: only active clusters are presented on the screen. Each visualization level thus corresponds to a level of information. In this way, CACV satisfies the requirements of ACR theory by keeping information and perception as high as possible while minimizing screen clutter.
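The level-selection heuristic of CACV can be sketched as follows. This is a hypothetical reconstruction under stated assumptions, not the published implementation: clusters are drawn as circles whose radius grows with the square root of their point count, screen coverage is approximated by the summed circle areas, the overlap percentage is taken as the fraction of overlapping cluster pairs, and the scale factor and fixed radius are arbitrary illustrative values.

```python
import math
from dataclasses import dataclass


@dataclass
class Cluster:
    x: float       # screen position of the cluster centre (pixels)
    y: float
    n_points: int  # number of points assigned by the stream clusterer
    active: bool   # hypothetical flag: cluster received points recently


LEVELS = ["normal", "scaling", "coloring", "active"]  # most to least information


def layout(clusters, level, base=2.0, scale=0.5, fixed_radius=10.0):
    """Return the (x, y, radius) circles drawn at a given visualization level."""
    if level == "normal":
        return [(c.x, c.y, base * math.sqrt(c.n_points)) for c in clusters]
    if level == "scaling":
        return [(c.x, c.y, scale * base * math.sqrt(c.n_points)) for c in clusters]
    if level == "coloring":
        # Equal sizes; point counts would be conveyed by shading instead.
        return [(c.x, c.y, fixed_radius) for c in clusters]
    return [(c.x, c.y, fixed_radius) for c in clusters if c.active]


def coverage(circles, screen_area):
    """Fraction of the screen covered (overestimates when circles overlap)."""
    return sum(math.pi * r * r for _, _, r in circles) / screen_area


def overlap_ratio(circles):
    """Fraction of circle pairs that overlap."""
    pairs = overlapping = 0
    for i in range(len(circles)):
        for j in range(i + 1, len(circles)):
            (x1, y1, r1), (x2, y2, r2) = circles[i], circles[j]
            pairs += 1
            if math.hypot(x1 - x2, y1 - y2) < r1 + r2:
                overlapping += 1
    return overlapping / pairs if pairs else 0.0


def choose_level(clusters, screen_area, max_coverage, max_overlap):
    """Pick the most informative level that satisfies both clutter limits."""
    for level in LEVELS:
        circles = layout(clusters, level)
        if (coverage(circles, screen_area) <= max_coverage
                and overlap_ratio(circles) <= max_overlap):
            return level
    return LEVELS[-1]  # fall back to the least cluttered level


clusters = [
    Cluster(100, 100, 400, True),
    Cluster(140, 100, 400, True),
    Cluster(250, 300, 100, False),
]
print(choose_level(clusters, 320 * 480, max_coverage=0.3, max_overlap=0.2))
```

With these limits the two large neighbouring clusters overlap at the normal level, so the heuristic steps down to the scaling level, mirroring the fall-back behavior described above.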
Visualization of data stream mining results is still in its infancy. ACR theory represents an attempt to generalize the solution for the small screen real estate of smartphones, although the approach is also applicable to larger screens. However, more work is needed on applying ACR to other data mining techniques.

VISUAL DATA MINING APPLICATIONS IN THE CHEMICAL PROCESS INDUSTRY
The application of visual analytics to commercial as well as scientific problems is becoming popular because of the growing availability of commercial and scientific visual analytics tools, as highlighted in the 'Available Visual Data Analysis Software Systems' section. This section discusses the use of visual data analytics from the perspective of the chemical process industry as an example. Further applications of visual data analytics are highlighted in the 'Further Applications of Visual Data Analysis' section.
In this article, we use the case of Evonik Industries AG as an example; it is one of the biggest global players in the chemical process industry. As of 2012, Evonik Industries AG employed roughly 33,000 people and operated production plants in 24 countries. This and more information on Evonik Industries AG can be found at http://evonik.com.
Evonik Industries AG generates massive amounts of data about production processes and laboratory-scale research. Typical applications in this industry that collect data are Process Control Systems (PCSs), Process Information Management Systems (PIMSs), and Laboratory Information Management Systems (LIMSs). The data collected by these systems are used to define Key Performance Indicators (KPIs). These systems have two main user groups: production, and research and technical support. The aforementioned systems are generally used for reporting and for improving production in terms of reducing cost and improving product quality.
To do so, the production group uses the KPIs but also aims to predict future product quality using data mining techniques. Within the production group, however, the view on visual data mining differs between plant operators and plant managers. For example, the plant operator needs detailed information about the current state of the process in a single active plant and uses this information to supervise the ongoing production. The plant manager needs a less detailed overview of production, but for several plants and over a longer period of time. This condensed information for the manager often needs to be displayed in so-called information dashboards: easy-to-read, condensed data representations that often show only the current status and historic and future trends. The manager uses this information for reporting and for improving the production processes. 65 Yet the data for both the plant operators and the plant managers are the same. Hence, visual data analytics needs to support filtering and other interaction techniques accordingly. The 'Examples of Visualizations in the Production' section gives examples of the use of visual data analytics at Evonik Industries AG within the production user group.
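The point that operators and managers work on the same data, only filtered and aggregated differently, can be shown in a few lines. The sketch below is a hypothetical illustration and does not describe any Evonik system: the `(plant, hour, quality)` tuples are made-up readings, the operator view filters to the latest readings of one plant, and the manager view condenses all readings to one mean value per plant, as a dashboard tile might.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw readings shared by both user groups: (plant, hour, quality).
readings = [
    ("A", 1, 10.0), ("A", 2, 12.0), ("A", 3, 11.0),
    ("B", 1, 20.0), ("B", 2, 22.0),
]


def operator_view(rows, plant, last_n):
    """Detailed view: the most recent readings of one plant, in time order."""
    selected = sorted((r for r in rows if r[0] == plant), key=lambda r: r[1])
    return selected[-last_n:]


def manager_view(rows):
    """Condensed dashboard view: mean quality per plant over the whole period."""
    per_plant = defaultdict(list)
    for plant, _hour, value in rows:
        per_plant[plant].append(value)
    return {plant: mean(values) for plant, values in per_plant.items()}


print(operator_view(readings, "A", last_n=2))
print(manager_view(readings))
```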
The research user group's goal is to improve the products' recipes in order to improve production quality and lower the cost of raw materials. This group aims to automate its laboratory work by using High Throughput Screening (HTS) and robotics to conduct a multitude of tests in a short period of time; examples can be found in Refs 66 and 67. To analyze the experimental data, this group uses information visualization and visual data analytics techniques. The 'Examples of Visualizations in Research' section gives examples of the use of visual data analytics at Evonik Industries AG within the research user group.

Examples of Visualizations in the Production
The visualization of real-time data during a production process aims to identify the process's 'capability index', a measure of how well the process can produce output within its specification limits. There is a balance to strike between changing production parameters (which may destabilize the process) and keeping the process stable. Statistical information and data visualization help in deciding on interventions. At Evonik Industries AG, an in-house application called ChemSPF 68 is used for statistical evaluation in process analysis and for safeguarding process capability. The ChemSPF chart and the histogram in Figure 7 show some of the visualizations used to display quality attributes of an ongoing production. The ChemSPF chart, on the bottom half of Figure 7, also displays the mean value (MV), the upper (USL) and lower (LSL) limits for triggering a warning, and the upper and lower limits for triggering interventions (UAL and LAL). Note that the acronyms used in Figure 7 are based on German, and hence their expansions are omitted here. Likewise, the name 'ChemSPF chart' derives from a German expression; such charts are known in English as Statistical Process Control (SPC) charts.
These are just two examples of standard visualizations used in production; there are many more, including text-based information. Importantly, each chart in Figure 7 displays only one variable, whereas a production process may have hundreds of variables with different value ranges that have to be observed in real time. Hence, these charts have to be redrawn at adequate time intervals and need to convey large amounts of information simultaneously.
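The logic behind an SPC chart with warning and intervention limits, and a capability index, can be sketched as below. This is a generic illustration, not ChemSPF: it assumes the common convention of warning limits at two and intervention (action) limits at three standard deviations around the mean, estimates the standard deviation from individual historic samples (real SPC practice often uses subgroups or moving ranges instead), and computes the widely used Cpk index from hypothetical specification limits.

```python
from statistics import mean, stdev


def control_limits(samples, warn_k=2.0, action_k=3.0):
    """Centre line plus symmetric warning and action (intervention) limits,
    estimated from historic in-control samples."""
    mu, sigma = mean(samples), stdev(samples)
    return {
        "mean": mu,
        "warn": (mu - warn_k * sigma, mu + warn_k * sigma),
        "action": (mu - action_k * sigma, mu + action_k * sigma),
    }


def classify_measurement(value, limits):
    """Label a new measurement as 'ok', 'warning', or 'intervene'."""
    low_action, high_action = limits["action"]
    low_warn, high_warn = limits["warn"]
    if value < low_action or value > high_action:
        return "intervene"
    if value < low_warn or value > high_warn:
        return "warning"
    return "ok"


def cpk(samples, spec_lower, spec_upper):
    """Process capability index: distance from the mean to the nearer
    specification limit, in units of three standard deviations."""
    mu, sigma = mean(samples), stdev(samples)
    return min(spec_upper - mu, mu - spec_lower) / (3 * sigma)


history = [9.8, 10.0, 10.2, 9.9, 10.1]  # hypothetical quality measurements
limits = control_limits(history)
for value in (10.05, 10.4, 10.6):
    print(value, classify_measurement(value, limits))
print("Cpk:", round(cpk(history, spec_lower=9.0, spec_upper=11.0), 2))
```

Running this check for every monitored variable at each refresh interval is cheap; the harder problem noted above is presenting hundreds of such results to a human at once.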

Examples of Visualizations in Research
The production group mostly uses basic visualizations, whereas the research group needs to observe as many variables of historic or laboratory-scale productions as possible within the same visualization. The researcher's task is to extract as much information as possible from the data. Evonik Industries AG uses an in-house application called KinFit, 69 which makes use of visual data analytics techniques and is used by researchers to analyze large quantities of data. One task of visual analytics in research at Evonik Industries AG is to find correlations between variables. Therefore, KinFit gives a data overview by plotting each dimension against every other dimension, a view also known as a scatter matrix (scatterplot matrix) in other data mining tools such as WEKA. 2 Within Evonik Industries AG, the visual analytics techniques in KinFit are used for exploring production data and for general research purposes. Information visualizations in KinFit are also used for reporting.
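A scatterplot matrix shows every pair of variables, but with many variables it helps to know which panels to inspect first. The following sketch, which is a generic illustration rather than KinFit's implementation, computes the Pearson correlation for every cell of the matrix and returns the pairs whose absolute correlation exceeds a hypothetical threshold; the variable names and values are made up.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def strongest_pairs(columns, threshold=0.8):
    """Variable pairs (cells of a scatterplot matrix) whose absolute Pearson
    correlation reaches the threshold, strongest first."""
    hits = []
    for a, b in combinations(sorted(columns), 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            hits.append((a, b, r))
    return sorted(hits, key=lambda t: -abs(t[2]))


columns = {
    "temperature": [1, 2, 3, 4, 5],
    "yield": [2, 4, 6, 8, 10],
    "pressure": [5, 3, 4, 1, 2],
}
for a, b, r in strongest_pairs(columns, threshold=0.9):
    print(a, b, round(r, 3))
```

Pearson correlation only captures linear relationships, which is one reason the visual scatterplot matrix remains useful alongside such a ranking.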
Many more interactive visualization approaches are used. For example, multidimensional charts such as the one displayed in Figure 8 are frequently used to visualize multiple dimensions (variables) in the same chart. The user can change the encoding of the data elements, such as colors, symbols, and symbol size, and can move, zoom, or rotate the graph. KinFit also allows users to animate the graph to show a sixth dimension such as time.
A polar or radar chart (also called a spider chart), such as the one displayed in Figure 9, shows one dimension on each spoke, and each line represents one data record. This chart can be used for detecting outliers and clusters. The user can interact with the graph parameters and set an optimal value and a weight for each dimension. The application then produces the range of optimal records, as displayed in Figure 9.
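The source does not describe how KinFit derives the range of optimal records from the user's optimal values and weights, so the following is only a hypothetical sketch of one plausible approach: each record is scored by its weighted mean absolute deviation from the optimum, with every dimension normalized by its observed range (as radar chart spokes usually are), and records within a user-chosen tolerance are returned, best first. The field names, weights, and tolerance are illustrative.

```python
def deviation_score(record, optimum, weights, ranges):
    """Weighted mean absolute deviation from the user's optimum, with each
    dimension expressed as a fraction of its observed range (0 = optimal)."""
    total_weight = sum(weights[d] for d in optimum)
    return sum(
        weights[d] * abs(record[d] - optimum[d]) / ranges[d] for d in optimum
    ) / total_weight


def optimal_records(records, optimum, weights, tolerance):
    """Records whose deviation score is within `tolerance`, best first."""
    ranges = {}
    for d in optimum:
        values = [r[d] for r in records]
        ranges[d] = (max(values) - min(values)) or 1.0  # guard constant dimensions
    scored = [(deviation_score(r, optimum, weights, ranges), r) for r in records]
    scored.sort(key=lambda pair: pair[0])
    return [r for score, r in scored if score <= tolerance]


records = [
    {"id": 1, "temp": 80, "yield": 0.90},
    {"id": 2, "temp": 100, "yield": 0.70},
    {"id": 3, "temp": 85, "yield": 0.88},
]
best = optimal_records(
    records,
    optimum={"temp": 80, "yield": 0.90},
    weights={"temp": 1.0, "yield": 2.0},
    tolerance=0.2,
)
print([r["id"] for r in best])
```

Normalizing by each dimension's range keeps a variable measured in hundreds from dominating one measured in fractions, so the weights express the user's priorities rather than the variables' units.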
However, one challenge of visual analytics that remains to be solved in the chemical process industry is to simultaneously convey thousands of variables with different and changing value ranges in real time.

Further Applications of Visual Data Analysis
The last two subsections showed that visual data analytics is one of the emerging technologies for data analysis in the chemical process industry. A range of applications has also been mentioned throughout the article, especially in the 'Available Visual Data Analysis Software Systems' section. However, further applications exist. For example, the authors of Ref 70 propose a two-stage visualization approach for fraud detection in stock market trading. In the first stage, they use 3D tree maps to observe real-time stock market performance and detect unusual patterns; in the second stage, they use social network visualization to analyze the behavior behind a suspected pattern and detect imminent attacks. In Ref 71, the authors presented a visual analytics tool that visualizes aggregate risk alongside a traditional wealth-time plot. The system is aimed at nonexpert users and supports personal finance decisions. It was tested in a laboratory environment with 27 volunteers and evaluated using several economics methods. The results show, for example, that users of the system gained greater returns for similar levels of risk. They also show that users of the system explored and learned more than the control group, as reflected in a higher number of investment modifications. Furthermore, 23% of the system's users stated that they felt more confident in understanding financial planning, whereas none of the control group made this claim. In general, the results show that the system improves the decision-making process. The authors of Ref 72 applied visual analytics to multidimensional gene expression datasets; their approach visualizes gene expression together with statistical data derived from the original data. The authors of Ref 73 use visual analysis techniques in physics to gain insights into simulations of magnetically confined burning plasmas. The authors of Ref 74 have developed a visual analytics system that has been used for a variety of tasks in the analysis of hyperspectral images of historical documents.

FIGURE 9
KinFit's polar chart implementation for displaying data with multiple dimensions, typically used for identifying optimal production process states.

Volume 3, July/August 2013

CONCLUSIONS AND MAJOR CHALLENGES
As discussed in the previous sections of this paper, visual analytics as a field of study is only just emerging. We have attempted to explain that this new field combines interactive visualization with advanced data and predictive analytics, including building, verifying, and using predictive models. As shown in Figures 1 and 2, visual analytics is not only the visualization of data and its exploratory analysis (the process that is often mistakenly called visual analytics) but also an interactive method for semiautomated clustering, classification, prediction, dimensionality reduction, information retrieval, feature extraction, etc., which leads to knowledge and actionable decisions.
During such a visual analytics session, one would expect the underlying predictive models to seek out and exploit user feedback in many different forms. Such feedback could take the form of user preferences extracted by monitoring what is used and what is focused on, which the underlying models could in turn translate into modified optimization criteria to adapt to the user's current needs. We would also expect users to be provided with powerful wizard-driven ways of building and exploiting data processing methods without needing to understand or set any of the technical parameters of such advanced methods. This represents the interactive loop of model building and model visualization in Figures 1 and 2.
The visualization of extracted knowledge in turn drives further refinements. This renewed focus on the data, or on how the data or the existing models should be transformed, closes the loop, which can be repeated a number of times. The process becomes even more challenging and interesting when the data arrive continuously (i.e., as a stream) and the predictive models must operate in a (semi-)automated manner. Visualizing the changing knowledge that is brought to the user's attention, and suggesting what may be interesting in the recent data, then also become part of the picture, a picture that may change very quickly.
There is a growing number of visual data analysis software tools, such as those highlighted in the 'Available Visual Data Analysis Software Systems' section, and various components required for building visual analytics systems and tools are also being developed and discussed in related fields, as highlighted in the 'Typical Concepts and Components of Visual Data Analytics Systems' section. What remains a formidable challenge, however, is to bring all of these components together in robust tools that can be used by people with a wide range of technical abilities and know-how. As the snapshot of visualization tools used by one of our partners in the process industry illustrates (see the 'Visual Data Mining Applications in the Chemical Process Industry' section), we are still far from such visual analytics scenarios: current practice very often ends with interactive visualization, for which tools are available. It is then the role of data analysts to explore the data and look for things that may be interesting, unusual, dangerous, or promising. We still rely very heavily on the creativity and curiosity of humans who, in addition, must have formidable technical abilities and knowledge of data and predictive analytics tools to realize their full potential.
In developing the next generation of visual analytics systems, one would hope to relax (if not remove completely) the need for the user's deep knowledge of predictive analytics methods and to enhance visual analytics systems so that they can in part become more autonomous, 'creative', and 'curious' in looking for things that are interesting in some sense. Reaching such a state still requires considerable research and development effort, and one of this publication's goals is to bring this fact to the attention of wider communities with a keen interest in the emerging field of visual analytics.