How to Make Good Use of Diagnostic or Classification Criteria

for Rheumatoid Arthritis or Other Diseases


(By Branko Soric)

December, 2009


The gist of the problem


      A contingency table is commonly used to show the numbers of patients who do or do not have a certain disease (for example rheumatoid arthritis, RA for short) and who are diagnosed either correctly or incorrectly:

                            True RA         True non-RA       Sum

Diagnosed as "RA":          a               b                 a+b  (=S)
Diagnosed as "non-RA":      c               d                 c+d

                  Sum:      a+c  (=R)       b+d  (=N)         a+b+c+d  (=M)


      S  and  M  are known, while  R  and  N  are unknown. If the sensitivity and specificity of the criteria are known, we can find the values of  R  and  N,  as well as the proportions of true and wrong "RA"-diagnoses  (a/S  and  b/S),  by simply solving a system of two equations with two unknowns  (see the appendix).

      Theoretically, in very large sets, and if the sensitivity and specificity of a test are known exactly, the value  b/S = P  (as well as  a/S = 1-P)  is completely determined by  S/M.



Bayes' postulate


      It seems that the (mis)understanding of how to use diagnostic or classification criteria for rheumatoid arthritis (RA) was for some time based, as in the case of statistical inference, on the incorrect Bayes' postulate (not to be confused with the correct Bayes' theorem!)(1), which says that unknown probabilities (or proportions) may be supposed to be equal to each other.

      For example, if we suppose that in a large set of experiments there are as many true null hypotheses as true alternatives, we may easily obtain a small proportion of false discoveries (among statistically significant results). Likewise, if the sensitivity and specificity of the classification criteria for RA are near 0.9, and if we suppose that there are as many RA patients as non-RA patients in a large group of rheumatic patients, then we may expect about ten percent of misclassifications.  (It should be kept in mind, though, that percentages obtained by applying the classification criteria for RA can only be approximate or expected values. Much larger groups of patients would have to be studied in order to obtain more exact results.)


      However, there are 1.3 million cases of RA among U.S. adults, and 46.4 million cases of all forms of arthritis.(2)  If the classification criteria (with sensitivity and specificity near 0.9) were applied to all these patients, we might easily detect about one million true RA cases (from the RA group) and some 4.6 million false-RA cases (from the other group). As a result, some 82 percent of those classified as RA would be misclassified! (Namely: 4.6/5.6 = 0.82.) Only the remaining 18 percent would indeed have RA.
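
      The arithmetic of the two preceding paragraphs can be checked with a short sketch. It assumes sensitivity and specificity of exactly 0.9, and it assumes that the 46.4 million figure includes the 1.3 million RA cases; the function name is illustrative only. With unrounded counts the share of wrong classifications comes out near 0.79, slightly below the figure of 0.82 obtained from the rounded counts.

```python
def share_false_positives(n_disease, n_other, sensitivity, specificity):
    """Expected share of wrong positives among all positive classifications."""
    true_pos = sensitivity * n_disease          # true cases classified as diseased
    false_pos = (1.0 - specificity) * n_other   # other patients classified as diseased
    return false_pos / (true_pos + false_pos)

# Assumption: both rates are exactly 0.9 ("near 0.9" in the text).
# Equal group sizes, as Bayes' postulate would suggest:
equal_groups = share_false_positives(1.0, 1.0, 0.9, 0.9)

# Prevalence figures from the text, in millions of U.S. adults.
ra_cases = 1.3
other_arthritis = 46.4 - 1.3   # assumption: 46.4 million includes the RA cases
us_estimate = share_false_positives(ra_cases, other_arthritis, 0.9, 0.9)

print(round(equal_groups, 2), round(us_estimate, 2))
```

      The only thing that changes between the two calls is the ratio of the group sizes, which is exactly the quantity that Bayes' postulate takes for granted.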

      I have asked some of the leading experts whether the classification criteria should be applied to the general population, or to the population of patients coming to general practitioners' offices, or only to rheumatic patients, or only to those who come to rheumatologists' offices, etc.  I received this answer: "The criteria were designed for patients who were candidates for the diagnosis of RA, and not for the general population". But who are the candidates?  Could not almost any patient with arthritis or arthralgia (or some other symptom) be considered a candidate?


Diagnosing or classifying?


      The ARA diagnostic criteria for rheumatoid arthritis were proposed in 1956 and revised in 1958.  In 1987 the American College of Rheumatology (ACR) published a new revised set of classification criteria, where the term “diagnostic” criteria was replaced by “classification”.(3)

      According to some distinguished rheumatologists, "the criteria are used for classification in studies, not for individual diagnosis, which is an individual clinical decision for each patient".  - "The ACR criteria were put together so that investigators doing clinical trials would agree as to whether or not a patient had 'rheumatoid arthritis,' as defined by the criteria".  - "The sole purpose of the ACR criteria was to standardize on a patient set that could be entered into clinical trials".  - Etc.

      The 1987 criteria for the classification of rheumatoid arthritis (RA) were formulated from a computerized analysis of 262 patients with RA and 262 control subjects with rheumatic diseases other than RA (i.e. non-RA). The diagnoses had not originally been based on classification criteria; the criteria were applied to these two groups afterwards. Let  M  denote the known total number of patients (M = 524).  Let  R  be the unknown number of true RA cases, and let  N  be the unknown number of true non-RA cases.  M = R+N = 524.  Table 1 shows part of the results, i.e. the numbers of correctly and incorrectly classified cases (taken from the paper by Arnett et al.(3)):

Table 1.   (Classification tree method)

                               Overall       Classified correctly     Classified incorrectly
RA patients:                   262 = R       245 = pR                 17 = (1-p)R
Non-RA (control subjects):     262 = N       234 = p'N                28 = (1-p')N

   Sensitivity =  245 / 262  =  0.935  = p           [ M = R+N = 524 ]
   Specificity =  234 / 262  =  0.893  = p'          [ S = pR+(1-p')N = 273 ]

Classified as RA:        S = 273 = 245 (true RA) +  28 (true non-RA)
Classified as non-RA:        251 =  17 (true RA) + 234 (true non-RA)
               Total:    M = 524 = 262 (true RA) + 262 (true non-RA)

(See also Table 4 in the appendix.)


      So, there were as many true RA cases as true non-RA cases (262+262=524).  Of all the 524 patients, 273 were classified as RA, while 251 were classified as non-RA. The proportions of misclassifications were not large:  28/273 = 0.103 among those classified as RA, and 17/251 = 0.068  in the other group (classified as non-RA). 

      However, if the numbers of true RA and true non-RA cases are very different from each other, there can be many misclassifications.  The unknown proportions of false classifications can be calculated from the observed proportion of cases classified as RA  (or non-RA)  among all the patients. (If we know S/M, we can calculate the unknown values and proportions).


Finding the proportion of wrongly classified cases


      If  p = 0.935 = sensitivity, and if  p' = 0.893 = specificity,  then the known number (S) of cases classified as RA is  S = pR+(1-p')N = 273,  where N and R are not known, but can be calculated as follows.

      A simple derivation (see the appendix) yields:  R = [S - 0.107M] / 0.828 = 262,  and hence we can also calculate:  N = M-R = 524-262 = 262.

      The proportion (P) of misclassified cases among the  S  cases that are classified as RA is:   P = (1-p')N / [pR+(1-p')N] = (1-p')N/S  ;   the proportion (P*) of misclassified cases among the  M-S  cases that are classified as non-RA is:   P* = (1-p)R/(M-S).
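
      As a rough sketch of this calculation, the function below solves the two equations  S = pR+(1-p')N  and  M = R+N  for R and N and then derives P and P*. The function and variable names are illustrative assumptions; the default values of p and p' are the rounded ones quoted in the text.

```python
def solve_true_counts(S, M, p=0.935, p_prime=0.893):
    """Return (R, N, P, P_star) from the observed counts S and M.

    S: number classified as RA, M: total number of patients,
    p: sensitivity, p_prime: specificity (assumed to be known exactly).
    """
    denom = p + p_prime - 1.0                 # 0.828 for the default values
    if denom == 0:
        raise ValueError("p + p' must differ from 1, otherwise R is undetermined")
    R = (S - (1.0 - p_prime) * M) / denom     # estimated true RA cases
    N = M - R                                 # estimated true non-RA cases
    P = (1.0 - p_prime) * N / S               # wrong share among those classified as RA
    P_star = (1.0 - p) * R / (M - S)          # wrong share among those classified as non-RA
    return R, N, P, P_star

# Counts from the Arnett et al. sample quoted in the text.
R, N, P, P_star = solve_true_counts(S=273, M=524)
print(round(R), round(N), round(P, 3), round(P_star, 3))
```

      Because p and p' are rounded to three decimals, R and N come out as 262 only after rounding to whole patients.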

      Some combinations of the values R and N are given in Table 2.  For a given known ratio  S/M  we can find in the table the corresponding proportion of mistakes (P).

Table 2.

   M   =    R   +    N        S/M             (M-S)/M          P           P*
 1000  =  1000  +     0       0.935 = p       0.065 = 1-p      0           1.000
 1000  =   900  +   100       0.852           0.148            0.013       0.395
 1000  =   800  +   200       0.769           0.231            0.028       0.225
 1000  =   700  +   300       0.687           0.313            0.047       0.145
 1000  =   600  +   400       0.604           0.396            0.071       0.098
 1000  =   500  +   500       0.521           0.479            0.103       0.068
 1000  =   400  +   600       0.438           0.562            0.147       0.046
 1000  =   300  +   700       0.355           0.645            0.211       0.030
 1000  =   200  +   800       0.273           0.727            0.314       0.018
 1000  =   100  +   900       0.190           0.810            0.507       0.008
 1000  =     0  +  1000       0.107 = 1-p'    0.893 = p'       1.000       0

          (p = sensitivity = 0.935 ;    p' = specificity = 0.893)
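
      The rows of Table 2 can be regenerated with a few lines of code. This is only a sketch using the rounded values p = 0.935 and p' = 0.893; the function name is an assumption, and the last printed digit may differ from the table by one unit depending on how p and p' are rounded.

```python
def table2_row(R, N, p=0.935, p_prime=0.893):
    """Return (S/M, (M-S)/M, P, P*) for given true counts R and N."""
    M = R + N
    S = p * R + (1.0 - p_prime) * N        # expected number classified as RA
    P = (1.0 - p_prime) * N / S            # wrong share among those classified as RA
    P_star = (1.0 - p) * R / (M - S)       # wrong share among those classified as non-RA
    return S / M, (M - S) / M, P, P_star

# One row for each split of M = 1000 patients, from R = 1000 down to R = 0.
rows = {R: table2_row(R, 1000 - R) for R in range(1000, -1, -100)}

for R, (s_frac, rest, P, P_star) in rows.items():
    print(f"{R:5d} {1000 - R:5d} {s_frac:7.3f} {rest:7.3f} {P:7.3f} {P_star:7.3f}")
```

      Reading the output from top to bottom shows how quickly P grows once the true RA cases become a minority of the group.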


Proportion of true RA diagnoses


      Before the introduction of the new classification criteria of 1987, the diagnostic criteria from 1958 were sometimes used with the intention of diagnosing RA in individual cases.  Perhaps some of the 1987 criteria might occasionally help to estimate the percentage of true RA diagnoses.  For example, let us consider the possibility of applying only three of the seven criteria given in Table 3.

Table 3.                                Sensitivity (p)          Specificity (p')

1. Morning stiffness                    91.2 %   p1=0.912        40.4 %   p'1=0.404   1-p'1=0.596
2. Arthritis of 3 or more joints        90.7 %   p2=0.907        84.0 %   p'2=0.840   1-p'2=0.160
3. Arthritis of hand joints             79.3 %   p3=0.793        84.0 %   p'3=0.840   1-p'3=0.160
4. Symmetric arthritis (any region)     94.3 %   p4=0.943        74.3 %   p'4=0.743   1-p'4=0.257
5. Rheumatoid nodules                   43.4 %   p5=0.434        97.7 %   p'5=0.977   1-p'5=0.023
6. Serum rheumatoid factor              80.4 %   p6=0.804        87.0 %   p'6=0.870   1-p'6=0.130
7. Radiographic changes (ARA)           77.2 %   p7=0.772        93.7 %   p'7=0.937   1-p'7=0.063

(Note:  According to Arnett et al.(3), a patient is said to have rheumatoid arthritis if he/she has satisfied at least 4 of the above 7 criteria).


      [ From the data given in Table 3 we find:  p2 × p5 × p7 = 0.907 × 0.434 × 0.772 = 0.304  and:  (1-p'2)(1-p'5)(1-p'7) = 0.160 × 0.023 × 0.063 = 0.00023 ].


      Let us suppose that criteria No. 2, 5 and 7 are applied to a group of 36,000 patients (=M), and suppose that the unknown number of RA patients in this group is only 1000 (=R).  (So, there are N = 35,000 non-RA patients in this group.)  On the assumption (which is by no means certain!) that symptoms No. 2, 5 and 7 occur in individual patients independently of each other (i.e. in random combinations), and that RA is diagnosed only when all three are present, only about 304 RA patients will be detected  (namely:  p2 × p5 × p7 = 0.304 ;  1000 × 0.304 = 304), but we can expect more than 97 percent of the RA diagnoses to be true. Namely, the expected number of wrong RA diagnoses is 35,000 × 0.00023 ≈ 8 ;  8/(304+8) = 0.026, or 2.6 percent.
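
      The same calculation can be written out as a small sketch. It rests entirely on the independence assumption stated above and on requiring all three criteria to be present at once; the variable names are illustrative, and the single-criterion values are those of Table 3.

```python
from math import prod

# Single-criterion values from Table 3 for criteria No. 2, 5 and 7.
sensitivities = [0.907, 0.434, 0.772]
specificities = [0.840, 0.977, 0.937]

# Assumption: the criteria occur independently within each group, so the
# probability of meeting all three is the product of the single probabilities.
p_all = prod(sensitivities)                      # among true RA patients
q_all = prod(1.0 - s for s in specificities)     # among true non-RA patients

R, N = 1000, 35_000                              # hypothetical group from the text
true_dx = R * p_all                              # expected correct RA diagnoses
false_dx = N * q_all                             # expected wrong RA diagnoses
wrong_share = false_dx / (true_dx + false_dx)

print(round(p_all, 3), round(q_all, 5), round(true_dx), round(false_dx), round(wrong_share, 3))
```

      The price of the high proportion of true diagnoses is visible in p_all: roughly seven out of ten true RA patients are missed by such a strict rule.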


Other diseases


      There are other diagnostic criteria that can be used for diagnosing (or classifying) various other diseases.(7)  Perhaps, in order to determine the achieved proportion of true diagnoses of such a disease, it might be useful to reason in a similar way as above and use the observed proportion of diagnoses made on a large number of patients (S/M).



Diagnoses and discoveries


      The possibility of estimating the proportion of false diagnoses by means of the classification criteria for RA is reminiscent of the statistical verification of the results of scientific experiments by calculating the "false discovery rate" (FDR)(4) or by other similar methods.(5)

      Namely, if we diagnose RA in a single case, or if we obtain a statistically-significant result in a single experiment, we may surmise that our diagnosis - or our "statistical discovery", respectively - may be true.  However, in order to determine the probability of a mistake, we need a large set of patients, or a large set of experiments, respectively. 

      If we know the ratio S/M, we can calculate the proportion P, as described above.  Similarly, if we know the ratio (r/n) of statistically significant results (r) in a large set of experiments (n), we can calculate (for example) the expected maximal proportion of false discoveries (Qmax), from the following formula: 

      Qmax = [(n/r)-1] / [(1/0.05)-1],  where 0.05 is the "5-percent level of statistical significance".(5)(6)  For example, if  r/n = 0.69  (i.e.  n/r = 1.449),  we find  Qmax = 0.024,  which shows that the expected maximal proportion of false discoveries is less than 0.05, i.e. less than 5 percent.
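
      A minimal sketch of this formula follows. The function name and the choice of 69 significant results out of 100 experiments (to represent r/n = 0.69) are assumptions made for illustration only.

```python
def q_max(r, n, alpha=0.05):
    """Expected maximal proportion of false discoveries.

    r: number of statistically significant results,
    n: total number of experiments, alpha: significance level.
    """
    if r <= 0 or r > n:
        raise ValueError("need 0 < r <= n")
    return ((n / r) - 1.0) / ((1.0 / alpha) - 1.0)

example = q_max(r=69, n=100)     # r/n = 0.69, as in the text
boundary = q_max(r=5, n=100)     # r/n equal to the significance level itself
print(round(example, 3), round(boundary, 3))
```

      When r/n falls to the significance level itself, Qmax reaches 1, i.e. all the "discoveries" could then be false.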




Appendix


      Known values:  M,  S,  S/M,  (M-S)/M = 1-(S/M),  p = 0.935,  p' = 0.893,

from which we calculate:  R,  N,  P,  P*

Two equations with two unknowns (R and N):    S = pR+(1-p')N  ;    M = R+N

S = pR + (1-p')(M-R) = pR + (1-p')M - (1-p')R = (p+p'-1)R + (1-p')M

S - (1-p')M = (p+p'-1)R ;    R = [S - (1-p')M] / (p+p'-1)

1-p' = 0.107 ;  p+p'-1 = 0.828 ;    R = [S - 0.107M] / 0.828

R/M = [(S/M) - 0.107] / 0.828 ;    N = M-R

S/M = 0.828R/M + 0.107

P = (1-p')N / [pR+(1-p')N] = (1-p')N/S ;    P* = (1-p)R/(M-S)


Table 4.                    True RA             True non-RA          Total

Classified as RA:           a = 245             b =  28              a+b = 273 = S
Classified as non-RA:       c =  17             d = 234              c+d = 251
               Total:       a+c = 262 = R       b+d = 262 = N        a+b+c+d = 524 = M


Sensitivity = p = a/(a+c) ;      specificity = p' = d/(b+d) 

Positive predictive value = a/(a+b)  = 1-P  ;     P = (1-p')N/S  = b/(a+b)  

P  is the proportion of non-RA cases among the  S  cases that are classified as RA.    
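
      For completeness, the quantities defined above can be computed directly from the four cells of Table 4. The function name and dictionary keys below are illustrative assumptions, not established notation.

```python
def table_metrics(a, b, c, d):
    """Metrics from a 2x2 table laid out as in Table 4.

    a: true RA classified as RA,     b: true non-RA classified as RA,
    c: true RA classified as non-RA, d: true non-RA classified as non-RA.
    """
    return {
        "sensitivity": a / (a + c),
        "specificity": d / (b + d),
        "ppv": a / (a + b),        # positive predictive value = 1 - P
        "P": b / (a + b),          # non-RA cases among those classified as RA
        "P_star": c / (c + d),     # RA cases among those classified as non-RA
    }

m = table_metrics(a=245, b=28, c=17, d=234)
print({name: round(value, 3) for name, value in m.items()})
```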


      See: DISCUSSION (click here!)




1.  Lancelot Hogben: Mathematics in the Making, Rathbone Books Limited, London - Mladinska knjiga, Ljubljana, 1977. p. 269 


2.  Charles G. Helmick, David T. Felson, Reva C. Lawrence, et al.: Estimates of the Prevalence of Arthritis and Other Rheumatic Conditions in the United States, Arthritis & Rheumatism,  Part 1,  Vol. 58,  No. 1,  January 2008,  pp. 15-25



3.  Frank C. Arnett, Steven M. Edworthy, Daniel A. Bloch, et al.: The American Rheumatism Association 1987 Revised Criteria for the Classification of Rheumatoid Arthritis, Arthritis & Rheumatism,  Vol. 31,  No. 3,  March 1988



4.  Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B, 57, 289-300.


5.  Sorić, B. (1989). Statistical 'Discoveries' and Effect-size Estimation, Journal of the American Statistical Association, Vol. 84, No. 406 (Theory and Methods),  608-610


6.  Soric, B (2001).  Statisticko zakljucivanje  (Statistical Inference)




7.   Some Internet addresses of papers dealing with diagnostic criteria for various diseases:


      Diagnostic Criteria for Multiple Sclerosis


      Revised diagnostic criteria for neuromyelitis optica


      An Empirical Study of Different Diagnostic Criteria for Delirium Among Elderly Medical Inpatients


      Diagnostic Criteria for Prader-Willi Syndrome


      DSM-IV-TR criteria for PTSD  -  "In 2000, the American Psychiatric Association revised the PTSD diagnostic criteria in the fourth edition of its Diagnostic and Statistical Manual of Mental Disorders (DSM-IV-TR)".


      Diagnostic Criteria for NF-1 (Neurofibromatosis type 1)


      Validity of diagnostic criteria for chronic inflammatory demyelinating polyneuropathy: a multicentre European study


      Diagnostic Criteria for Atrophic Rhinosinusitis


      The revised World Health Organization diagnostic criteria for polycythemia vera, essential thrombocytosis, and primary myelofibrosis: an alternative proposal


      Diagnostic Criteria for HHT (Hereditary Hemorrhagic Telangiectasia)


      Diagnostic Criteria For Diabetes Mellitus


      Diagnostic criteria for atopic dermatitis: a systematic review


      ICHD-II Diagnostic Criteria for Cluster Headache


      New Diagnostic Criteria For Alzheimer's Disease


December, 2009

Branko Sorić

(doctor of medicine, retired)


Vlaška 84

10000 Zagreb, Croatia

Fax: +385 1 4623 436

