The first thing to realize is that we can never be absolutely certain that a signal is present in a data train [159, 161]; we can only give confidence levels about its presence, which could be close to 100% for high values of the SNR. The next thing to realize is that, whatever the SNR may be, we cannot be absolutely certain about the true parameters of the signal: at best we can make an estimate, and these estimates are quoted within a certain range. The width of the range depends on the confidence level required, being larger for higher confidence levels [159].
Maximum likelihood estimates have long been used to measure the parameters of a known signal buried in noisy data. The method consists in maximizing the likelihood ratio – the ratio of the probability that a given signal is present in the data to the probability that the signal is absent [188, 159]. Maximum likelihood estimates are not always minimum uncertainty estimates, as was demonstrated in the case of binary inspiral signals by Balasubramanian et al. [66, 67]. Nevertheless, until recently this has been the method most widely followed in the gravitational wave literature. What is important to note is that maximum likelihood estimates are unbiased when the SNR is large: the mean of the distribution of measured values of the parameters is centered on the true parameter values. This property will be useful in our discussion below.
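As a toy illustration of the idea (and not of the actual search pipelines discussed elsewhere in this review), the Python sketch below maximizes the likelihood ratio over a grid of template frequencies for a hypothetical monochromatic signal in white Gaussian noise; the signal model, frequency grid and noise level are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (all values hypothetical): a sinusoid of unknown frequency in white noise.
fs, T = 1024.0, 4.0                      # sampling rate [Hz], duration [s]
t = np.arange(0, T, 1.0 / fs)
f_true, amp, sigma = 123.0, 0.5, 1.0     # true frequency, amplitude, noise std
x = amp * np.sin(2 * np.pi * f_true * t) + rng.normal(0.0, sigma, t.size)

def log_likelihood_ratio(f):
    """ln Lambda(f) = <x, h(f)> - <h(f), h(f)>/2 for white noise (inner product = sum / sigma^2),
    maximized analytically over the unknown amplitude and phase via the two quadratures."""
    hc, hs = np.cos(2 * np.pi * f * t), np.sin(2 * np.pi * f * t)
    # Project the data onto the two (nearly orthogonal) phases and combine them.
    num = (x @ hc) ** 2 / (hc @ hc) + (x @ hs) ** 2 / (hs @ hs)
    return 0.5 * num / sigma**2

freqs = np.linspace(100.0, 150.0, 2001)           # template grid in frequency
lnL = np.array([log_likelihood_ratio(f) for f in freqs])
f_ml = freqs[np.argmax(lnL)]                      # maximum likelihood estimate
print(f"true f = {f_true} Hz, ML estimate = {f_ml:.2f} Hz")
```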
Bayesian estimates, which take into account any prior knowledge that may be available about the
distribution of the source parameters, often give much better estimates and do not rely on the availability of
an ensemble of detector outputs [343, 274]. However, they are computationally a lot more expensive than
maximum likelihood estimates.
In any one measurement, the estimated parameters, however efficient, robust and accurate the estimator, are unlikely to be the actual parameters of the signal, since, at any finite SNR, noise alters the input signal. In the geometric language, the signal vector is altered by the noise vector, and matched filtering aims at computing the projection of this altered vector onto the signal space. The true parameters are expected to lie within an error ellipsoid at a certain confidence level – the volume of the ellipsoid increasing with the confidence level at a given SNR but decreasing with the SNR at a given confidence level.
The ambiguity function, well known in the statistical theory of signal detection [188], is a very powerful
tool in signal analysis: it helps one to assess the number of templates required to span the
parameter space of the signal [324], to make estimates of variances and covariances involved in the
measurement of various parameters, to compute biases introduced in using a family of templates
whose shape is not the same as that of a family of signals intended to be detected, etc. We
will see below how the ambiguity function can be used to compute the required number of
templates. Towards the end of this section we will use the ambiguity function for the estimation of
parameters.
The ambiguity function is defined (see Equation (91) below) as the scalar product of two normalized waveforms maximized over the initial phase of the waveform, in other words, the absolute value of the scalar product. A waveform \(e\) is said to be normalized if \(\langle e, e \rangle = 1\), where the inner product is inversely weighted by the PSD, as in Equation (79). Among other things, normalized waveforms help in defining signal strengths: a signal \(s\) is said to be of strength \(h\) if \(s = h\, e\). Note that the optimal SNR for such a signal of strength \(h\) is \(\langle s, s \rangle^{1/2} = h\).
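The sketch below illustrates one possible discrete implementation of such a PSD-weighted inner product and of waveform normalization, assuming the commonly-used form \(\langle a, b \rangle = 4\,\mathrm{Re}\int \tilde a(f)\,\tilde b^*(f)/S_h(f)\, df\); the precise convention should be taken from Equation (79), and the analytic waveform and flat PSD used here are placeholders.

```python
import numpy as np

def inner_product(a, b, psd, df):
    """Noise-weighted inner product <a, b> = 4 Re sum( a(f) conj(b(f)) / S(f) ) df
    for one-sided frequency series a, b and a one-sided PSD S on the same frequency grid."""
    return 4.0 * df * np.real(np.sum(a * np.conj(b) / psd))

def normalize(h, psd, df):
    """Return the unit-norm waveform e = h / sqrt(<h, h>)."""
    return h / np.sqrt(inner_product(h, h, psd, df))

# Toy example (all numbers hypothetical): a flat PSD and a simple damped sinusoid.
fs, T = 4096.0, 8.0
t = np.arange(0, T, 1.0 / fs)
h_t = np.exp(-t / 2.0) * np.sin(2 * np.pi * 150.0 * t)   # placeholder time-domain signal
h_f = np.fft.rfft(h_t) / fs                               # one-sided spectrum
freqs = np.fft.rfftfreq(t.size, d=1.0 / fs)
df = freqs[1] - freqs[0]
psd = np.full_like(freqs, 1e-2)                           # placeholder flat PSD

e_f = normalize(h_f, psd, df)
print("<e, e> =", inner_product(e_f, e_f, psd, df))       # ~1 by construction
# A signal of strength h0 is s = h0 * e, whose optimal SNR is sqrt(<s, s>) = h0.
h0 = 8.0
s_f = h0 * e_f
print("optimal SNR =", np.sqrt(inner_product(s_f, s_f, psd, df)))  # ~h0
```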
Let \(e(t; \lambda)\), where \(\lambda = \{\lambda^1, \lambda^2, \ldots, \lambda^p\}\) is the parameter vector comprised of \(p\) parameters, denote a normalized waveform. It is conventional to choose the parameter \(\lambda^1\) to be the lag \(\tau\), which simply corresponds to a coordinate time when an event occurs and is therefore called an extrinsic parameter, while the rest of the \(p-1\) parameters are called the intrinsic parameters and characterize the gravitational wave source.
Given two normalized waveforms \(e(t; \lambda)\) and \(e(t; \mu)\), whose parameter vectors are not necessarily the same, the ambiguity \(\mathcal{A}\) is defined as
\[
\mathcal{A}(\mu, \lambda) \equiv \left| \left\langle e(\mu),\, e(\lambda) \right\rangle \right| . \qquad (91)
\]
Since the waveforms are normalized, \(\mathcal{A}(\lambda, \lambda) = 1\) and \(\mathcal{A}(\mu, \lambda) < 1\) for \(\mu \neq \lambda\).
It is clear that the ambiguity function has a local maximum at the “correct” set of parameters, \(\mu = \lambda\). Search methods that vary \(\mu\) to find the best fit to the parameter values make use of this property in one way or another. But the ambiguity function will usually have secondary maxima as a function of \(\mu\) with fixed \(\lambda\). If these secondaries are only slightly smaller than the primary maximum, then noise can lead to confusion: it can, at random, sometimes elevate a secondary and suppress the primary. This can lead to false measurements of the parameters. Search methods need to be designed carefully to avoid this as much as possible. One way would be to fit the known properties of the ambiguity function to an ensemble of maxima. This would effectively average over the noise on individual peaks and point more reliably to the correct one.
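As an illustration of such secondary maxima, the sketch below evaluates the ambiguity function of a toy monochromatic waveform over a one-dimensional grid of frequencies; using a complex (analytic) template makes the absolute value of the scalar product automatically maximized over the initial phase. The waveform family and all numbers are invented for the example.

```python
import numpy as np
from scipy.signal import find_peaks

fs, T = 1024.0, 1.0
t = np.arange(0, T, 1.0 / fs)

def template(f):
    """Complex (analytic) unit-norm monochromatic template."""
    e = np.exp(2j * np.pi * f * t)
    return e / np.linalg.norm(e)

def ambiguity(f1, f2):
    """A(f1, f2) = |<e(f1), e(f2)>| with a flat-PSD (white-noise) inner product."""
    return np.abs(np.vdot(template(f1), template(f2)))

f_signal = 60.0
f_grid = np.linspace(40.0, 80.0, 801)
A = np.array([ambiguity(f_signal, f) for f in f_grid])

print("A at the true parameter:", ambiguity(f_signal, f_signal))   # = 1
# Largest secondary maximum of the ambiguity function away from the true frequency:
peaks, _ = find_peaks(A)
secondary = sorted(A[peaks])[-2] if peaks.size > 1 else None
print("largest secondary maximum:", secondary)
```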
It is important to note that in the definition of the ambiguity function there is no need for the functional forms of the template and signal to be the same; the definition holds true for any signal–template pair of waveforms. Moreover, the number of template parameters need not be identical (and usually is not) to the number of parameters characterizing the signal. For instance, a binary can be characterized by a large number of parameters, such as the masses, spins, eccentricity of the orbit, etc., while we may take as a model waveform one involving only the masses. In the context of inspiral waves, the signal is the exact general relativistic waveform emitted by a binary, whose form we do not know, while the template family is a post-Newtonian, or some other, approximation to it, that will be used to detect the true waveform. Another example would be signals emitted by spinning neutron stars, isolated or in binaries, whose time evolution is unknown, either because we cannot anticipate all the physical effects that affect their spin, or because the parameter space is so large that we cannot possibly take all of them into account in a realistic search.
Of course, in such cases we cannot compute the ambiguity function, since one of its arguments is unknown. These are, indeed, issues where substantial work is called for. What are all the physical effects to be considered so as not to miss out a waveform from our search? How should templates be chosen when their functional form differs from that of the signals? For this review it suffices to assume that the signal and template waveforms have identical shapes and that the number of parameters is the same in the two cases.
The computational cost of a search and the estimation of parameters of a signal afford a lucid geometrical picture, developed by Balasubramanian et al. [67] and Owen [278]. Much of the discussion below is borrowed from their work.
Let \(x^k\), \(k = 1, \ldots, N\), denote the discretely sampled output of a detector. The set of all possible detector outputs satisfies the usual axioms of a vector space. Therefore, \(x\) can be thought of as an \(N\)-dimensional vector. It is more convenient to work in the continuum limit, in which case we have infinite-dimensional vectors and the corresponding vector space. However, all the results are applicable to the realistic case in which detector outputs are treated as finite-dimensional vectors.
Amongst all vectors, of particular interest are those corresponding to gravitational waves from a given astronomical source. While every signal can be thought of as a vector in the infinite-dimensional vector space of the detector outputs, the set of all such signal vectors does not, by itself, form a vector space. However, the set of all normed signal vectors (i.e., signal vectors of unit norm) forms a manifold, the parameters of the signal serving as a coordinate system [66, 67, 278, 280]. Thus, each class of astronomical source forms a \(p\)-dimensional manifold, where \(p\) is the number of independent parameters characterizing the source. For instance, the set of all signals from a binary on a quasi-circular orbit inclined to the line of sight at an angle \(\iota\), consisting of nonspinning black holes of masses \(m_1\) and \(m_2\), located a distance \(D\) from the Earth, initially in the direction \((\theta, \varphi)\) and expected to merge at a time \(t_{\rm C}\) with the phase of the signal at merger \(\varphi_{\rm C}\), forms a nine-dimensional manifold with coordinates \(\{D, \theta, \varphi, m_1, m_2, t_{\rm C}, \varphi_{\rm C}, \iota, \psi\}\), where \(\psi\) is the polarization angle of the signal. In the general case of a signal characterized by \(p\) parameters we shall denote the parameters by \(\lambda^\alpha\), where \(\alpha = 1, \ldots, p\).
The manifold can be endowed with a metric \(g_{\alpha\beta}\) that is induced by the scalar product defined in Equation (79). The components of the metric in a coordinate system \(\lambda^\alpha\) are defined by
\[
g_{\alpha\beta} \equiv \left\langle \partial_\alpha e,\; \partial_\beta e \right\rangle , \qquad \partial_\alpha e \equiv \frac{\partial e}{\partial \lambda^\alpha} .
\]
Now, by Taylor expanding \(e(\lambda + \Delta\lambda)\) around \(\lambda\), and keeping only terms to second order in \(\Delta\lambda\), it is straightforward to see that the overlap of two infinitesimally close signals can be computed using the metric:
\[
\left\langle e(\lambda),\; e(\lambda + \Delta\lambda) \right\rangle \simeq 1 - \tfrac{1}{2}\, g_{\alpha\beta}\, \Delta\lambda^\alpha \Delta\lambda^\beta .
\]
The metric on the signal manifold is nothing but the well-known Fisher information matrix, usually denoted \(\Gamma_{\alpha\beta}\) (see, e.g., [188, 284]), but scaled down by the square of the SNR, i.e., \(g_{\alpha\beta} = \Gamma_{\alpha\beta}/\rho^2\). The information matrix is itself the inverse of the covariance matrix \(C_{\alpha\beta}\) and is a very useful quantity in signal analysis.
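A minimal numerical sketch of these relations is given below, assuming a toy two-parameter waveform and a flat (white-noise) inner product in place of the PSD-weighted product of Equation (79): the metric is estimated by finite differences of the normalized waveform, the Fisher matrix follows by scaling with the square of the SNR, and its inverse gives the covariance matrix.

```python
import numpy as np

fs, T = 1024.0, 1.0
t = np.arange(0, T, 1.0 / fs)

def unit_waveform(params):
    """Toy two-parameter waveform e(t; f, fdot), normalized to <e, e> = 1
    under a flat-PSD (plain dot-product) inner product."""
    f, fdot = params
    h = np.sin(2 * np.pi * (f + 0.5 * fdot * t) * t)
    return h / np.linalg.norm(h)

def metric(params, eps=1e-4):
    """g_ab = <d_a e, d_b e> estimated by central finite differences."""
    p = np.asarray(params, dtype=float)
    derivs = []
    for a in range(p.size):
        dp = np.zeros_like(p)
        dp[a] = eps * max(abs(p[a]), 1.0)
        derivs.append((unit_waveform(p + dp) - unit_waveform(p - dp)) / (2 * dp[a]))
    return np.array([[d1 @ d2 for d2 in derivs] for d1 in derivs])

params = (100.0, 20.0)        # hypothetical frequency [Hz] and frequency derivative [Hz/s]
rho = 10.0                    # assumed optimal SNR
g = metric(params)
gamma = rho**2 * g            # Fisher information matrix, Gamma = rho^2 g
cov = np.linalg.inv(gamma)    # covariance matrix C = Gamma^{-1}
print("1-sigma errors:", np.sqrt(np.diag(cov)))
```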
Having defined the metric, we next consider the application of the geometric formalism in the estimation of
statistical errors involved in the measurement of the parameters. We closely follow the notation of Finn and
Chernoff [159, 161, 115].
Let us suppose a signal of known shape with parameters \(\lambda^\alpha\) is buried in background noise that is Gaussian and stationary. Since the signal shape is known, one can use matched filtering to dig the signal out of the noise. The measured parameters \(\bar{\lambda}^\alpha\) will, in general, differ from the true parameters of the signal.
Geometrically speaking, the noise vector displaces the signal vector and the process of matched filtering
projects the (noise + signal) vector back on to the signal manifold. Thus, any nonzero noise will make it
impossible to measure the true parameters of the signal. The best one can hope for is a proper statistical
estimation of the influence of noise.
The posterior probability density function \(\mathcal{P}\) of the parameters \(\Delta\lambda^\alpha \equiv \bar{\lambda}^\alpha - \lambda^\alpha\) is given by a multivariate Gaussian distribution:
\[
\mathcal{P}(\Delta\lambda)\, d^p\Delta\lambda = \frac{1}{(2\pi)^{p/2}\sqrt{\det C}}\, \exp\!\left[ -\tfrac{1}{2}\, \Gamma_{\alpha\beta}\, \Delta\lambda^\alpha \Delta\lambda^\beta \right] d^p\Delta\lambda ,
\]
where \(\Gamma_{\alpha\beta}\) is the Fisher information matrix and \(C = \Gamma^{-1}\) the covariance matrix.
Let us first specialize to one dimension to illustrate the region of the parameter space with which one should associate an event at a given confidence level. In one dimension the distribution of the deviation \(\Delta\lambda\) of the measured value of the parameter from its mean is given by
\[
\mathcal{P}(\Delta\lambda)\, d\Delta\lambda = \frac{1}{\sqrt{2\pi\sigma^2}}\, \exp\!\left( -\frac{\Delta\lambda^2}{2\sigma^2} \right) d\Delta\lambda ,
\]
where \(\sigma^2 = 1/\Gamma\) is the variance of the parameter. The probability that the measured value lies within a radius \(r\sigma\) of the mean, \(|\Delta\lambda| \le r\sigma\), is \(P(r) = \mathrm{erf}(r/\sqrt{2})\), giving the familiar confidence levels \(P \simeq 0.683\), 0.954 and 0.997 for \(r = 1, 2, 3\).
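These one-dimensional confidence levels follow directly from the error function, as the short check below shows.

```python
from math import erf, sqrt

# Probability that a Gaussian-distributed deviation lies within r standard deviations.
for r in (1, 2, 3):
    P = erf(r / sqrt(2))
    print(f"P(|dlambda| <= {r} sigma) = {P:.3f}")
# Prints ~0.683, 0.954, 0.997 -- the confidence levels used in Table 2 below.
```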
These results generalize to \(p\) dimensions. In \(p\) dimensions the volume \(V(r)\) within which the true parameters lie with probability \(P\) is the ellipsoid defined by
\[
V(r) = \left\{ \Delta\lambda^\alpha \;:\; \Gamma_{\alpha\beta}\, \Delta\lambda^\alpha \Delta\lambda^\beta \le r^2 \rho^2 \right\} ,
\]
where, for Gaussian noise, \(r^2\rho^2\) is the quantile of the \(\chi^2\) distribution with \(p\) degrees of freedom at probability \(P\). When the SNR is large and \(P\) is not close to zero, the triggers are found from the signal with matches greater than or equal to the minimal match \(MM\). Table 2 lists the value of \(r^2\) for several values of \(P\) in one, two and three dimensions and the minimum match \(MM\) for SNRs 5, 10 and 20.
Table 2: The value of \(r^2\) and the corresponding minimal match \(MM\) at confidence levels \(P = 0.683\), 0.954 and 0.997, for signals characterized by \(p = 1\), 2 and 3 parameters and for SNRs \(\rho = 5\), 10 and 20.

| \(\rho\) | \(r^2\) (P = 0.683) | \(MM\) | \(r^2\) (P = 0.954) | \(MM\) | \(r^2\) (P = 0.997) | \(MM\) |
|---|---|---|---|---|---|---|
| p = 1 | | | | | | |
| 5 | 0.04 | 0.9899 | 0.16 | 0.9592 | 0.36 | 0.9055 |
| 10 | 0.01 | 0.9975 | 0.04 | 0.9899 | 0.09 | 0.9772 |
| 20 | 0.0025 | 0.9994 | 0.01 | 0.9975 | 0.0225 | 0.9944 |
| p = 2 | | | | | | |
| 5 | 0.092 | 0.9767 | 0.2470 | 0.9362 | 0.4800 | 0.8718 |
| 10 | 0.023 | 0.9942 | 0.0618 | 0.9844 | 0.1200 | 0.9695 |
| 20 | 0.00575 | 0.9986 | 0.0154 | 0.9961 | 0.0300 | 0.9925 |
| p = 3 | | | | | | |
| 5 | 0.1412 | 0.9641 | 0.32 | 0.9165 | 0.568 | 0.8462 |
| 10 | 0.0353 | 0.9911 | 0.08 | 0.9798 | 0.142 | 0.9638 |
| 20 | 0.00883 | 0.9978 | 0.02 | 0.9950 | 0.0355 | 0.9911 |
Table 2 should be interpreted in light of the fact that triggers come from an analysis pipeline in which the templates are laid out with a certain minimal match and one cannot, therefore, expect the triggers from different detectors to be matched better than the minimal match.
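The entries of Table 2 can be approximately reproduced from chi-square quantiles, assuming that \(r^2\) is the quantile at probability \(P\) with \(p\) degrees of freedom divided by \(\rho^2\), and that the minimal match is related to it by \(MM = (1 - r^2/2)^{1/2}\); this latter relation is inferred here from the tabulated values rather than quoted from the text.

```python
import numpy as np
from scipy.stats import chi2

# Approximately reproduce the structure of Table 2: for confidence level P and p parameters,
# r^2 = chi2.ppf(P, p) / rho^2, and MM is taken here as sqrt(1 - r^2/2), the relation
# that matches the tabulated values (the exact convention should be checked in the text).
for p in (1, 2, 3):
    print(f"p = {p}")
    for rho in (5, 10, 20):
        row = []
        for P in (0.683, 0.954, 0.997):
            r2 = chi2.ppf(P, df=p) / rho**2
            mm = np.sqrt(1.0 - r2 / 2.0)
            row.append(f"r^2={r2:.4f}, MM={mm:.4f}")
        print(f"  rho={rho:2d}: " + " | ".join(row))
```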
From Table 2, we see that, when the SNR is large (say greater than about 10), the dependence of the match on \(P\) is very weak; in other words, irrespective of the number of dimensions, we expect the match between the trigger and the true signal (and, for our purposes, the match between triggers from different instruments) to be pretty close to 1, and mostly larger than a minimal match of about 0.95 that is typically used in a search. Even when the SNR is in the region of 5, for low \(P\) there is again only a weak dependence of \(MM\) on the number of parameters. For large \(P\) and low SNR, however, the dependence of \(MM\) on the number of dimensions becomes important. At an SNR of 5 and \(P = 0.997\), \(MM = 0.9055\), 0.8718 and 0.8462 in one, two and three dimensions, respectively.
Bounds on the errors in parameter estimation computed using the covariance matrix are called Cramér–Rao bounds. Cramér–Rao bounds are based on local analysis and do not take into consideration the effect of distant points in the parameter space on the errors computed at a given point, such as the secondary maxima in the likelihood. Though the Cramér–Rao bounds are in disagreement with maximum likelihood estimates, a global analysis, which takes into account the effect of distant points on the estimation of parameters, does indeed give results in agreement with maximum likelihood estimation, as shown by Balasubramanian and Dhurandhar [65].
A good example of an efficient detection algorithm that is not a reliable estimator is the time-frequency transform of a chirp. For signals that are loud enough, a time-frequency transform of the data would be a very effective way of detecting the signal, but the transform contains hardly any information about the masses, spins and other properties of the source. This is because the time-frequency transform of a chirp is a mapping from the multi-dimensional (17 in the most general case) space of chirps to just the two-dimensional space of time and frequency. Even matched filtering, which would use templates that are defined on the full parameter space of the signal, would not give the parameters to the expected accuracy. This is because the templates are defined only at a certain minimal match and might not resolve the signal well enough, especially for signals that have a high SNR.
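As a rough illustration (with an invented linear chirp standing in for an inspiral signal), a spectrogram readily reveals the chirp track in the time-frequency plane while compressing away most of the source parameters:

```python
import numpy as np
from scipy.signal import chirp, spectrogram

fs = 4096.0
t = np.arange(0, 4.0, 1.0 / fs)
# Toy "chirp + noise" data: a simple frequency sweep standing in for an inspiral signal.
rng = np.random.default_rng(1)
x = chirp(t, f0=30.0, t1=4.0, f1=400.0, method='quadratic') + 0.5 * rng.standard_normal(t.size)

f, tt, Sxx = spectrogram(x, fs=fs, nperseg=256, noverlap=192)
# The track of maximum power in each time slice traces the chirp in the (t, f) plane,
# which is enough to detect it but carries little direct information about masses or spins.
track = f[np.argmax(Sxx, axis=0)]
print("recovered frequency track (Hz):", np.round(track[::10], 1))
```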
In recent times Bayesian inference techniques have been applied with success in many areas in astronomy and cosmology. These techniques are probably the most sensible way of estimating the parameters, and the associated errors, but cannot be used to efficiently search for signals. Bayesian inference is among the simplest of statistical measures to state, but is not easy to compute and is often subject to controversies. Here we shall only discuss the basic tenets of the method and refer the reader for details to an excellent treatise on the subject (see, e.g., Sivia [343]).
To understand the chief ideas behind Bayesian inference, let us begin with some basic concepts in probability theory. Given two hypotheses or statements \(A\) and \(B\) about an observation, let \(P(A, B)\) denote the joint probability of \(A\) and \(B\) being true. For the sake of clarity, let \(A\) denote a statement about the universe and \(B\) some observation that has been made. Now, the joint probability can be expressed in terms of the individual probabilities \(P(A)\) and \(P(B)\) and the conditional probabilities \(P(A|B)\) and \(P(B|A)\) as follows:
\[
P(A, B) = P(A|B)\, P(B) = P(B|A)\, P(A) ,
\]
from which follows Bayes theorem,
\[
P(A|B) = \frac{P(B|A)\, P(A)}{P(B)} .
\]
For instance, if \(A\) denotes the statement it is going to rain and \(B\) the amount of humidity in the air, then the above equation gives us the posterior probability \(P(A|B)\) that it rains when the air contains a certain amount of humidity. Clearly, the posterior depends on the likelihood \(P(B|A)\) of the air having a certain humidity when it rains and on the prior probability \(P(A)\) of rain on a given day. If the prior is very small (as it would be in a desert, for example) then you would need a rather large likelihood for the posterior to be large. Even when the prior is not so small, say a 50% chance of rain on any given day (as it would be if you are in Wales), the likelihood has to be large for the posterior probability to say something about the relationship between the level of humidity and the chance of rain.
As another example, and more relevant to the subject of this review, let \(S\) be the statement the data contain a chirp (signal), \(N\) the statement the data contain an instrumental transient (noise), and let \(T\) be a test that is performed to infer which of the two statements above is true. Let us suppose \(T\) is a very good test, in that it discriminates between \(S\) and \(N\) very well: the detection probability \(P(T|S)\) is high and the false alarm probability \(P(T|N)\) is low (note that \(P(T|S)\) and \(P(T|N)\) need not necessarily add up to 1). Also, the prior probability \(P(S)\) of a chirp occurring during our observation is low, but the prior probability \(P(N)\) of an instrumental transient is relatively large. We are interested in knowing what the posterior probability of the data containing a chirp is, given that the test has been passed. By Bayes theorem this is
\[
P(S|T) = \frac{P(T|S)\, P(S)}{P(T|S)\, P(S) + P(T|N)\, P(N)} . \qquad (106)
\]
Thus, Bayesian inference neatly folds prior knowledge about the sources into the estimation process. One might worry that the outcome of a measurement process would be seriously biased by our preconception of the prior. To understand this better, let us rewrite Equation (106) as follows:
\[
P(S|T) = \left[ 1 + \frac{P(T|N)\, P(N)}{P(T|S)\, P(S)} \right]^{-1} .
\]
The posterior is therefore controlled by the ratio of the prior-weighted false alarm probability to the prior-weighted detection probability: unless \(P(T|N)\,P(N) \ll P(T|S)\,P(S)\), the posterior probability of a chirp remains small, however good the test.
The above example tells us why we have to work with unusually small false-alarm probabilities in the case of gravitational wave searches. For instance, to search for binary coalescences in ground-based detectors we use a (power) SNR threshold of about 30 to 50. This is because the expected event rate is about 0.04 per year.
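A direct numerical evaluation of Equation (106) with purely illustrative numbers (not those used in the text) makes the point:

```python
# Posterior probability that the data contain a chirp given that the test T is passed,
# P(S|T) = P(T|S) P(S) / [ P(T|S) P(S) + P(T|N) P(N) ],
# evaluated with illustrative numbers (all hypothetical).
p_det = 0.95        # detection probability P(T|S)
p_fa = 1e-2         # false alarm probability P(T|N)
p_signal = 1e-5     # prior probability of a chirp being present
p_noise = 1e-2      # prior probability of an instrumental transient

posterior = p_det * p_signal / (p_det * p_signal + p_fa * p_noise)
print(f"P(chirp | test passed) = {posterior:.3f}")
# Even a good test yields a small posterior when the prior odds are this unfavourable,
# which is why searches must operate at very low false alarm probabilities.
```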
Computing the posterior involves multi-dimensional integrals, which are computationally very expensive when the number of parameters involved is large. This is why it is often not possible to apply Bayesian techniques to continuously streaming data; it would be sensible to reserve the application of Bayesian inference only for candidate events that are selected from inexpensive analysis techniques. Thus, although Bayesian analysis is not used in current detection pipelines, there has been a lot of effort in evaluating its ability to search for [116, 351, 123, 121] and measure the parameters of [117, 122, 377] a signal, and in follow-up analysis [378].