An obvious question to ask is what constitutes a “large
number” of synthetic data sets. Based on an analysis
of the expected worst-case performance of the method,
a good rule of thumb turns out to be the following: if
we wish our p-values to be accurate to within about $\varepsilon$ of
the true value, then we should generate at least $\tfrac{1}{4}\varepsilon^{-2}$
synthetic data sets. Thus if, for example, we wish our p-value to be accurate to about 2 decimal digits, we would choose $\varepsilon = 0.01$, which implies we should generate about 2500 synthetic sets.
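
The arithmetic behind this rule of thumb can be checked with a short sketch. The function names below are hypothetical and chosen only for illustration. The second function rests on an assumption that the text does not state: that the p-value is estimated as the fraction of synthetic data sets satisfying some criterion, so that its standard error is at most $1/(2\sqrt{n})$ for $n$ synthetic sets, with the worst case at a true p-value of 0.5.

```python
import math


def min_synthetic_sets(eps: float) -> int:
    """Rule-of-thumb minimum number of synthetic data sets: (1/4) * eps**-2.

    The result is rounded up to a whole number of data sets. A tiny relative
    tolerance keeps floating-point noise from pushing exact values such as
    2500 up to 2501.
    """
    if eps <= 0:
        raise ValueError("eps must be positive")
    n = 0.25 / eps**2
    return math.ceil(n * (1 - 1e-12))


def worst_case_std_error(n: int) -> float:
    """Largest possible standard error of a proportion estimated from n draws.

    Assumption (not taken from the text): the p-value is a simple fraction of
    n independent synthetic sets, so its standard error sqrt(p * (1 - p) / n)
    is maximised at p = 0.5, giving 1 / (2 * sqrt(n)).
    """
    if n <= 0:
        raise ValueError("n must be positive")
    return 0.5 / math.sqrt(n)


if __name__ == "__main__":
    for eps in (0.1, 0.05, 0.01):
        n = min_synthetic_sets(eps)
        print(f"eps = {eps}: at least {n} synthetic sets, "
              f"worst-case standard error {worst_case_std_error(n):.4f}")
```

For $\varepsilon = 0.01$, `min_synthetic_sets` returns 2500, in agreement with the figure quoted above. Because the required number grows as $\varepsilon^{-2}$, each additional decimal digit of accuracy costs a factor of 100 in synthetic data sets.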