Data Mining in Pharmacovigilance: A View from the Uppsala Monitoring Centre
DATA MINING
In this chapter we use the term ‘data mining’ for any computational method
used to automatically and continuously extract useful information from large
amounts of data. Data mining is a form of exploratory data analysis (Hand,
Mannila and Smyth, 2001) and a key component of the knowledge discovery
process (Fayyad, Piatetsky-Shapiro and Smyth, 1996). Data mining can be used
on any data set, but the approach seems particularly valuable when the amount
of data is large and the possible relationships within the data set are
numerous and complex. Although data mining of drug utilisation information,
and of other relevant data sets such as those relating to poisoning, medical
error and patient records, will add greatly to pharmacovigilance (Anonymous,
2003; Bate et al., 2004), research in this area is still preliminary and will
not be discussed in detail here.
In principle the WHO Collaborating Centre for International Drug Monitoring
(the Uppsala Monitoring Centre, UMC) has been doing data mining since the
mid-1970s, using an early relational database. As with many automated systems,
the relational database to a very large extent replicated a manual approach. In
this instance it was the Canadian ‘pigeon hole’ system (Napke, 1977), where
reports were physically assigned a slot, which encouraged visual inspection.
An observer could thus notice when certain categories of report were
unexpectedly numerous. From the UMC database, countries in the WHO Programme
for International Drug Monitoring have been provided with information,
reworked by the UMC, on the summarised case data submitted by each national
centre. This information has been presented to them according to categories
and classifications agreed amongst Programme members from time to time. This
kind of system suffers from the following limitations:
· It is prescriptive, the groupings being determined by what is found broadly
useful by experience
· Each category is relatively simple, but the information beneath each heading
is complex and rigidly formatted
· There is no indication of the probability of any relationship other than the
incident numbers in each time period.
This system does not even have all the user-friendliness of the pigeon hole
system, which allowed a user to visually scan the number of reports as they
were filed and so see the rate of build-up in each pigeon hole. Admittedly, the
sorting was relatively coarse, but the continuous visual cue given by the
accumulation of case reports was very useful. In improving on the pigeon hole
system and adapting it for the ever-increasing amounts of data involved, one
can imagine a computer program that surveys all data fields, looking for any
pair of events that stand out as occurring together more frequently than
expected.
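The kind of program imagined above can be sketched in a few lines. The example below is only an illustration under stated assumptions, not the UMC's software: it assumes each report is reduced to a set of drugs and a set of reactions, takes the expected count for a pair from the product of the two marginal counts (that is, under independence), and flags pairs whose observed-to-expected ratio reaches a threshold. The function name `disproportionality_scan`, the `min_ratio` parameter and the toy reports are all hypothetical.

```python
from collections import Counter
from itertools import product


def disproportionality_scan(reports, min_ratio=2.0):
    """Flag drug-reaction pairs reported together more often than expected.

    reports: iterable of (drugs, reactions) pairs, each an iterable of labels.
    Returns tuples (drug, reaction, observed, expected, ratio), highest ratio first.
    """
    n_reports = 0
    drug_counts = Counter()
    reaction_counts = Counter()
    pair_counts = Counter()

    for drugs, reactions in reports:
        drugs, reactions = set(drugs), set(reactions)
        n_reports += 1
        drug_counts.update(drugs)
        reaction_counts.update(reactions)
        pair_counts.update(product(drugs, reactions))

    flagged = []
    for (drug, reaction), observed in pair_counts.items():
        # Expected count if the drug and the reaction were reported independently.
        expected = drug_counts[drug] * reaction_counts[reaction] / n_reports
        ratio = observed / expected
        if ratio >= min_ratio:
            flagged.append((drug, reaction, observed, expected, ratio))

    flagged.sort(key=lambda row: row[4], reverse=True)
    return flagged


# Hypothetical toy data: eight reports, three made-up drugs.
toy_reports = [
    ({"drug_a"}, {"rash"}),
    ({"drug_a"}, {"rash"}),
    ({"drug_a"}, {"rash"}),
    ({"drug_b"}, {"nausea"}),
    ({"drug_b"}, {"nausea"}),
    ({"drug_b"}, {"rash"}),
    ({"drug_c"}, {"nausea"}),
    ({"drug_c"}, {"headache"}),
]

for row in disproportionality_scan(toy_reports):
    print(row)
```

On this toy data the pair with the highest ratio rests on a single report, which shows how unstable a raw observed-to-expected ratio is when counts are small.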
Different measures of association have been proposed for analysing
disproportional reporting of ADR terms with drug substances. The proportional
reporting ratio (PRR), which is akin to a relative risk, and the reporting odds
ratio (ROR) are classical statistical measures of association that can be
combined with, for example, chi-squared tests of association to guard against
spurious findings. Bayesian and empirical Bayesian approaches take this one
step further by providing shrinkage estimates such as the Information Component
(IC) (Bate et al., 1998; Orre et al., 2000) and the Empirical Bayes Geometric
Mean (EBGM) (DuMouchel, 1999). These are typically closer to the value expected
under the null hypothesis of independence than classical estimates, and less
volatile when data are scarce. As such, they provide robust measures of
association that account for both significance and strength. Furthermore, a
Bayesian approach is intuitively correct for a situation where there is a need
to continuously re-assess the probability of relationships as new data are
acquired over time. In Bayesian inference, new data modify the prior
probabilities to give posterior probabilities, and the posterior probabilities
can be used as prior probabilities in subsequent analyses. The process can be
iterated indefinitely.
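A small numerical sketch may help to fix these ideas. It assumes the usual 2x2 table of report counts: `a` reports with the drug and the ADR, `b` with the drug but other ADRs, `c` with the ADR but other drugs, and `d` with neither. The PRR and ROR follow their standard definitions. The shrunk measure is a simplified stand-in, log2((observed + 0.5) / (expected + 0.5)), chosen here only to show the shrinkage effect; it is not the full Bayesian derivation of the IC in Bate et al. (1998), nor the EBGM. The `update_beta` function is likewise an assumed, textbook Beta-binomial model used to show a posterior being reused as a prior. All function names and counts are hypothetical.

```python
import math


def prr(a, b, c, d):
    """Proportional reporting ratio: ADR share for the drug vs. for all other drugs."""
    return (a / (a + b)) / (c / (c + d))


def ror(a, b, c, d):
    """Reporting odds ratio from the same 2x2 table."""
    return (a * d) / (b * c)


def expected_count(a, b, c, d):
    """Expected drug-ADR count if drug and ADR were reported independently."""
    n = a + b + c + d
    return (a + b) * (a + c) / n


def raw_log_ratio(a, b, c, d):
    """Unshrunk log2 observed-to-expected ratio (undefined when a == 0)."""
    return math.log2(a / expected_count(a, b, c, d))


def shrunk_log_ratio(a, b, c, d, k=0.5):
    """Simplified shrinkage measure: k is added to both observed and expected.

    Assumption for illustration only; it pulls estimates towards 0 (independence).
    """
    return math.log2((a + k) / (expected_count(a, b, c, d) + k))


def update_beta(prior, events, non_events):
    """Conjugate Beta update: the returned posterior can serve as the next prior."""
    alpha, beta = prior
    return (alpha + events, beta + non_events)


# Hypothetical counts: 3 reports of the pair among 10,000 reports in total.
table = (3, 97, 30, 9870)
print("PRR:", prr(*table))
print("ROR:", ror(*table))
print("raw log2(O/E):", raw_log_ratio(*table))
print("shrunk log2((O+0.5)/(E+0.5)):", shrunk_log_ratio(*table))

# Sequential updating gives the same posterior as one batch update.
posterior = update_beta((1, 1), events=3, non_events=97)
posterior = update_beta(posterior, events=2, non_events=48)
print("posterior after two batches:", posterior)
```

With these counts the shrunk value is about 2.08 against a raw value of about 3.18, so the estimate based on only three reports is drawn towards independence. Updating in two batches yields the same Beta parameters as a single update with the pooled counts, which is the sense in which the process can be iterated.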
The next level of complexity is to consider the effects of adding other objects
as variables. Complex pattern recognition in spontaneous reporting data may
extract information related to ADR syndromes, patient risk groups, drug
interactions and data quality problems. It typically increases the
computational demands and often requires more sophisticated quantitative
methods. The UMC has chosen the Bayesian Confidence Propagation Neural Network
(BCPNN) as the most favourable framework for development in this area. This is
a statistical neural network consisting of a matrix of interconnected nodes
that represent different data fields. It is trained according to Bayes’ law on
the data provided to it. The use of Bayesian logic seems natural, since the
relationship between each pair of nodes will alter as more data are added. The
network ‘learns’ the new weights between nodes, and can be asked how much those
weights are changed by the addition of new case data or by the consideration of
higher-order associations.
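The idea of asking how much a weight changes when new case data arrive can be illustrated with a toy structure. This is a loose sketch and not the UMC's BCPNN implementation: it assumes that the weight between two nodes is the log2 ratio of their joint frequency to the product of their marginal frequencies, smoothed by adding 0.5 to the observed and expected counts, and it covers only pairwise weights, not higher-order associations. The class `PairwiseWeights` and the node labels are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


class PairwiseWeights:
    """Running counts from which pairwise log-ratio weights are recomputed."""

    def __init__(self, smoothing=0.5):
        self.smoothing = smoothing  # assumed smoothing constant, for illustration
        self.n_reports = 0
        self.node_counts = Counter()
        self.pair_counts = Counter()

    def add_report(self, active_nodes):
        """Register one case report, given the labels of the nodes it activates."""
        nodes = sorted(set(active_nodes))
        self.n_reports += 1
        self.node_counts.update(nodes)
        self.pair_counts.update(combinations(nodes, 2))

    def weight(self, node_i, node_j):
        """Smoothed log2 of observed over expected co-occurrence of two nodes."""
        if self.n_reports == 0:
            return 0.0
        pair = tuple(sorted((node_i, node_j)))
        observed = self.pair_counts[pair]
        expected = (
            self.node_counts[node_i] * self.node_counts[node_j] / self.n_reports
        )
        return math.log2((observed + self.smoothing) / (expected + self.smoothing))


# Hypothetical case data: an initial batch, then two newly received reports.
net = PairwiseWeights()
for report in [
    ("drug:a", "adr:rash"),
    ("drug:a", "adr:rash"),
    ("drug:b", "adr:nausea"),
    ("drug:b", "adr:rash"),
]:
    net.add_report(report)
weight_before = net.weight("drug:a", "adr:rash")

for report in [("drug:a", "adr:rash"), ("drug:c", "adr:nausea")]:
    net.add_report(report)
weight_after = net.weight("drug:a", "adr:rash")

print("weight before:", weight_before)
print("weight after:", weight_after)
print("change:", weight_after - weight_before)
```

Because only counts are stored, adding reports is cheap, and any weight can be recomputed on demand to see how it has moved.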