APPENDIX C.
Accuracy of the Data

CONTENTS

Sample Design
Confidentiality of the Data
Errors in the Data
Estimation Procedure
Editing of Unacceptable Data

INTRODUCTION


The data contained in this data product are based on the 1990 census
sample.  The data are estimates of the actual figures that would have
been obtained from a complete count.  Estimates derived from a sample
are expected to be different from the 100-percent figures because they
are subject to sampling and nonsampling errors.  Sampling error in data
arises from the selection of persons and housing units to be included
in the sample.  Nonsampling error affects both sample and 100-percent
data, and is introduced as a result of errors that may occur during the
collection and processing phases of the census.  Provided below is a
detailed discussion of both types of errors and a description of the
estimation procedures.

SAMPLE DESIGN


  Every person and housing unit in the United States was asked certain
basic demographic and housing questions (for example, race, age, marital
status, housing value, or rent).  A sample of these persons and housing
units was asked more detailed questions about such items as income,
occupation, and housing costs in addition to the basic demographic and
housing information.  The primary sampling unit for the 1990 census was
the housing unit, including all occupants.  For persons living in group
quarters, the sampling unit was the person.  Persons in group quarters
were sampled at a 1-in-6 rate.

  The sample designation method depended on the data collection
procedures.  Approximately 95 percent of the population was enumerated
by the mailback procedure.  In these areas, the Bureau of the Census
either purchased a commercial mailing list, which was updated by the
United States Postal Service and Census Bureau field staff, or prepared
a mailing list by canvassing and listing each address in the area prior
to Census Day.  These lists were computerized and the appropriate units
were electronically designated as sample units.  The questionnaires were
either mailed or hand-delivered to the addresses with instructions to
complete and mail back the form.

  Housing units in governmental units with a precensus (1988) estimated
population of fewer than 2,500 persons were sampled at 1-in-2.
Governmental units were defined for sampling purposes as all incorporated
places, all counties, all county equivalents such as parishes in
Louisiana, and all minor civil divisions in Connecticut, Maine,
Massachusetts, Michigan, Minnesota, New Hampshire, New Jersey, New York,
Pennsylvania, Rhode Island, Vermont, and Wisconsin.  Housing units in
census tracts and block numbering areas (BNA's) with a precensus housing
unit count below 2,000 housing units were sampled at 1-in-6 for those
portions not in small governmental units (governmental units with a
population less than 2,500).  Housing units within census tracts and
BNA's with 2,000 or more housing units were sampled at 1-in-8 for those
portions not in small governmental units.

  In list/enumerate areas (about 5 percent of the population), each
enumerator was given a blank address register with designated sample
lines.  Beginning about Census Day, the enumerator systematically
canvassed an assigned area and listed all housing units in the address
register in the order they were encountered.  Completed questionnaires,
including sample information for any housing unit listed on a designated
sample line, were collected.  For all governmental units with fewer than
2,500 persons in list/enumerate areas, a 1-in-2 sampling rate was used.
All other list/enumerate areas were sampled at 1-in-6.
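  The rate rules above can be summarized as a small decision function.
This is a hypothetical sketch, not Census Bureau software: the function
name and parameters are illustrative, it assumes a housing unit counts as
being in a small governmental unit when the smallest governmental unit
containing it had a precensus population under 2,500, and it ignores the
special rules for American Indian and Alaska Native areas described next.

```python
from fractions import Fraction


def sampling_rate(smallest_gov_unit_pop, tract_housing_units, list_enumerate):
    """Hypothetical helper returning the sampling rate for one housing unit.

    smallest_gov_unit_pop: precensus (1988) population of the smallest
        governmental unit containing the housing unit (assumed input).
    tract_housing_units: precensus housing unit count of the tract or BNA.
    list_enumerate: True if the unit lies in a list/enumerate area.
    """
    if smallest_gov_unit_pop < 2500:
        return Fraction(1, 2)      # small governmental unit, either procedure
    if list_enumerate:
        return Fraction(1, 6)      # all other list/enumerate areas
    if tract_housing_units < 2000:
        return Fraction(1, 6)      # mailback, tracts/BNA's under 2,000 units
    return Fraction(1, 8)          # mailback, tracts/BNA's of 2,000+ units


print(sampling_rate(1200, 3000, False))    # 1/2
print(sampling_rate(40000, 2500, False))   # 1/8
```

Using exact fractions keeps the rates identical to the "1-in-k" wording
rather than introducing rounded decimals.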

  Housing units in American Indian reservations, tribal jurisdiction
statistical areas, and Alaska Native villages were sampled according to
the same criteria as other governmental units, except the sampling
rates were based on the size of the American Indian and Alaska Native
population in those areas as measured in the 1980 census.  Trust lands
were sampled at the same rate as their associated American Indian
reservations.  Census designated places in Hawaii were sampled at the
same rate as governmental units because the Census Bureau does not
recognize incorporated places in Hawaii.

  The purpose of using variable sampling rates was to provide
relatively more reliable estimates for small areas and to decrease
respondent burden in more densely populated areas while maintaining
data reliability.  When all sampling rates are taken into account,
approximately one out of every six housing units in the Nation was
included in the 1990 census sample.

CONFIDENTIALITY OF THE DATA


To maintain the confidentiality required by law (Title 13, United
States Code), the Bureau of the Census applies a confidentiality edit
to the 1990 census data to ensure that published data do not disclose
information about specific individuals, households, or housing units.
As a result, a small amount of uncertainty is introduced into the
estimates of census characteristics.  The sample itself provides adequate
protection for most areas for which sample data are published since the
resulting data are estimates of the actual counts; however, small areas
require more protection.  The edit is controlled so that the basic
structure of the data is preserved.

  The confidentiality edit is implemented by selecting a small subset
of individual households from the internal sample data files and blanking
a subset of the data items on these household records.  Responses to
those data items are then imputed using the same imputation procedures
that are used for nonresponse.  A larger subset of households is
selected for the confidentiality edit in small areas to provide greater
protection for these areas.  The editing process is implemented in such
a way that the quality and usefulness of the data are preserved.

ERRORS IN THE DATA


Since statistics in this data product are based on a sample, they
may differ somewhat from 100-percent figures that would have been
obtained if all housing units, persons within those housing units, and
persons living in group quarters had been enumerated using the same
questionnaires, instructions, enumerators, etc.  The sample estimate
would also differ from estimates based on other samples of housing units,
persons within those housing units, and persons living in group
quarters.  The
deviation of a sample estimate from the average of all possible samples
is called the sampling error.  The standard error of a sample estimate
is a measure of the variation among the estimates from all the possible
samples and thus is a measure of the precision with which an estimate
from a particular sample approximates the average result of all
possible samples.  The sample estimate and its estimated standard error
permit the construction of interval estimates with prescribed confidence
that the interval includes the average result of all possible samples.
Described below is the method of calculating standard errors and
confidence intervals for the data in this product.
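  The idea of a standard error as the spread of estimates over all
possible samples can be illustrated with a small simulation.  This sketch
assumes a hypothetical population of 600 housing units, 180 of which have
some characteristic, and draws repeated 1-in-6 simple random samples; it
does not reproduce the actual census design or ratio estimation.  The
standard deviation of the simulated estimates should come out close to
the exact simple-random-sampling standard error, and their mean close to
the true total of 180.

```python
import math
import random
import statistics


def simulate_standard_error(pop_size=600, trait_count=180, sample_size=100,
                            replicates=4000, seed=1):
    """Repeatedly sample a hypothetical population and summarize the estimates.

    Returns (mean of the estimated totals, standard deviation of the
    estimated totals, exact standard error under simple random sampling).
    """
    rng = random.Random(seed)
    population = [1] * trait_count + [0] * (pop_size - trait_count)
    weight = pop_size / sample_size          # each sampled unit stands for 6
    estimates = [weight * sum(rng.sample(population, sample_size))
                 for _ in range(replicates)]
    # Exact SE of an estimated total under simple random sampling without
    # replacement; s2 is the population variance with divisor N - 1
    p = trait_count / pop_size
    s2 = pop_size / (pop_size - 1) * p * (1 - p)
    se = math.sqrt(pop_size ** 2 * (1 - sample_size / pop_size) * s2 / sample_size)
    return statistics.fmean(estimates), statistics.pstdev(estimates), se


mean_est, sd_est, se = simulate_standard_error()
print(round(mean_est, 1), round(sd_est, 1), round(se, 1))
```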

  In addition to the variability which arises from the sampling
procedures, both sample data and 100-percent data are subject to nonsampling
error.  Nonsampling error may be introduced during any of the various
complex operations used to collect and process census data.  For example,
operations such as editing, reviewing, or handling questionnaires may
introduce error into the data.  A detailed discussion of the sources of
nonsampling error is given in the section on "Control of Nonsampling
Error" in this appendix.

  Nonsampling error may affect the data in two ways.  Errors that are
introduced randomly will increase the variability of the data and
should therefore be reflected in the standard error.  Errors that tend
to be consistent in one direction will make both sample and 100-percent
data biased in that direction.  For example, if respondents consistently
tend to under-report their income, then the resulting counts of
households or families by income category will tend to be understated for the
higher income categories and overstated for the lower income categories.
Such biases are not reflected in the standard error.

Calculation of Standard Errors

Totals and Percentages--Tables A through C in this appendix contain
the information necessary to calculate the standard errors of sample
estimates in this data product.  To calculate the standard error, it is
necessary to know the basic standard error for the characteristic (given
in table A or B) that would result under a simple random sample design
(of persons, households, or housing units) and estimation technique;
the design factor for the particular characteristic estimated (given in
table C); and the number of persons or housing units in the tabulation
area and the percent of these in the sample.  For machine-readable
products, the percent-in-sample is included in a data matrix on the file
for each tabulation area.  In printed reports, the percent-in-sample is
provided in data tables at the end of the statistical tables that compose
the report.  The design factors reflect the effects of the actual sample
design and complex ratio estimation procedure used for the 1990 census.
Tape purchasers will receive table C, the table of design factors, as a
supplement to the technical documentation.  Table C is included in this
appendix for printed reports.

  The steps given below should be used to calculate the standard error
of an estimate of a total or a percentage contained in this product.  A
percentage is defined here as a ratio of a numerator to a denominator
where the numerator is a subset of the denominator.  For example, the
proportion of Black teachers is the ratio of Black teachers to all
teachers.

1.  Obtain the basic standard error from table A (for an estimated total)
    or table B (for an estimated percentage), or use the formula given
    below the appropriate table.

2.  Find the geographic area to which the estimate applies in the
    appropriate percent-in-sample table or appropriate matrix, and obtain
    the person or housing unit "percent-in-sample" figure for this area.
    Use the person "percent-in-sample" figure for person and family
    characteristics.  Use the housing unit "percent-in-sample" figure for
    housing unit characteristics.

3.  Use table C to obtain the design factor for the characteristic (for
    example, employment status, school enrollment) and the range that
    contains the percent-in-sample with which you are working.  Multiply
    the basic standard error by this factor.

The unadjusted standard errors of zero estimates or of very small
estimated totals or percentages will approach zero.  This is also the
case for very large percentages or estimated totals that are close to
the size of the tabulation areas to which they correspond. Nevertheless,
these estimated totals and percentages still are subject to sampling and
nonsampling variability, and an estimated standard error of zero (or a
very small standard error) is not appropriate.  For estimated percentages
that are less than 2 or greater than 98, use the basic standard errors in
table B that appear in the "2 or 98" row.  For an estimated total that is
less than 50 or within 50 of the total size of the tabulation area, use a
basic standard error of 16.
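  Steps 1 through 3, together with the floors for very small or very
large estimates, can be sketched as follows.  The table A and table B
formulas are not reproduced in this text version; this sketch assumes
they take the forms SE = sqrt(5Y(1 - Y/N)) for an estimated total Y in an
area of size N, and SE = sqrt((5/B) p(100 - p)) for a percentage p with
base B, which reproduce the basic standard errors of 163 and 0.86 quoted
in the examples later in this appendix.  It also assumes the "2 or 98"
row of table B equals the formula evaluated at 2 percent.  The function
names are hypothetical, and the design factor must still be read from
table C.

```python
import math


def basic_se_total(y, n):
    """Basic standard error of an estimated total y in an area of size n.

    Uses the assumed table A formula; totals under 50 or within 50 of the
    area size get the fixed basic standard error of 16.
    """
    if y < 50 or y > n - 50:
        return 16.0
    return math.sqrt(5 * y * (1 - y / n))


def basic_se_percent(p, base):
    """Basic standard error, in percentage points, of percentage p.

    Uses the assumed table B formula; percentages below 2 or above 98 are
    evaluated at the "2 or 98" row.
    """
    p = min(max(p, 2.0), 98.0)
    return math.sqrt(5 / base * p * (100 - p))


def adjusted_se(basic_se, design_factor):
    """Step 3: multiply the basic standard error by the table C design factor."""
    return basic_se * design_factor


print(round(adjusted_se(basic_se_total(9948, 21220), 1.1)))   # 179
```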

  An illustration of the use of the tables is given in the section
entitled "Use of Tables to Compute Standard Errors."

Sums and Differences--The standard errors estimated from
these tables are not directly applicable to sums of and differences
between two sample estimates.  To estimate the standard error of a sum
or difference, the tables are to be used somewhat differently in the
following three situations:

   1.  For the sum of or difference between a sample estimate and a
       100-percent value, use the standard error of the sample estimate.
       The complete count value is not subject to sampling error.

   2.  For the sum of or difference between two sample estimates X and Y,
       the appropriate standard error is approximately the square root of
       the sum of the two individual standard errors squared; that is, for
       standard errors SE(X) and SE(Y):

          $$ SE(X \pm Y) = \sqrt{SE(X)^2 + SE(Y)^2} $$

         This method, however, will underestimate (overestimate) the
       standard error if the two items in a sum are highly positively
       (negatively) correlated or if the two items in a difference are
       highly negatively (positively) correlated.  This method may also
       be used for the difference between (or sum of) sample estimates
       from two censuses or from a census sample and another survey.  The
       standard error for estimates not based on the 1990 census sample
       must be obtained from an appropriate source outside of this
       appendix.

   3.  For the difference between two estimates, one of which is a
       subclass of the other, use the tables directly, treating the
       calculated difference as the estimate of interest.  For example,
       to determine the estimate of non-Black teachers, one may subtract
       the estimate of Black teachers from the estimate of total teachers.
       To determine the standard error of the estimate of non-Black
       teachers, apply the procedure for totals directly to this
       difference rather than the formula in item 2.
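  Situations 1 and 2 reduce to one calculation, sketched below under the
same independence assumption the text makes (the function name is
hypothetical).  For situation 3, the difference is instead treated as an
ordinary estimated total and run through the procedure for totals.

```python
import math


def se_sum_or_difference(se_x, se_y=0.0):
    """Approximate standard error of X + Y or X - Y.

    Assumes the two estimates are uncorrelated.  Pass se_y=0 when Y is a
    100-percent count, which carries no sampling error (situation 1).
    """
    return math.sqrt(se_x ** 2 + se_y ** 2)


print(se_sum_or_difference(179))                     # 179.0, situation 1
print(round(se_sum_or_difference(0.94, 0.95), 2))    # 1.34, situation 2
```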

Ratios--Frequently, the statistic of interest is the ratio of two
variables, where the numerator is not a subset of the denominator.  An
example is the ratio of teachers to students in public elementary
schools.  The standard error of the ratio between two sample estimates
is estimated as follows:

   1.  If the ratio is a proportion, then follow the procedure outlined
       for "Totals and Percentages."

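   2.  If the ratio is not a proportion, then approximate the standard
       error of the ratio X/Y of two sample estimates using the formula
       below.

          $$ SE\left(\frac{X}{Y}\right) \approx \frac{X}{Y}\sqrt{\left(\frac{SE(X)}{X}\right)^2 + \left(\frac{SE(Y)}{Y}\right)^2} $$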
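  Because the printed ratio formula is not available in this text
version, the sketch below assumes the usual first-order approximation
for the ratio of two uncorrelated estimates, in which the relative
standard errors of the numerator and denominator are combined in
quadrature.  The function name and the teacher and student figures are
hypothetical; the approximation can understate or overstate the error
when the two estimates are strongly correlated.

```python
import math


def ratio_se(x, se_x, y, se_y):
    """Approximate standard error of the ratio x / y of two sample estimates.

    Assumes x and y are uncorrelated and y is not close to zero.
    """
    ratio = x / y
    return ratio * math.sqrt((se_x / x) ** 2 + (se_y / y) ** 2)


# Hypothetical inputs: 120 teachers (SE 15) and 2,400 students (SE 90)
print(round(ratio_se(120, 15, 2400, 90), 4))
```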

Medians--For the standard error of the median of a characteristic, it is
necessary to examine the distribution from which the median is derived,
as the size of the base and the distribution itself affect the standard
error.  An approximate method is given here.  As the first step, compute
one-half of the number on which the median is based (refer to this result
as N/2).  Treat N/2 as if it were an ordinary estimate and obtain its
standard error as instructed above.  Compute the desired confidence
interval about N/2.  Starting with the lowest value of the characteristic,
cumulate the frequencies in each category of the characteristic until the
sum equals or first exceeds the lower limit of the confidence interval
about N/2.  By linear interpolation, obtain a value of the characteristic
corresponding to this sum.  This is the lower limit of the confidence
interval of the median.  In a similar manner, continue cumulating
frequencies until the sum equals or first exceeds the upper limit of the
interval about N/2.  Interpolate as before to obtain the upper limit of
the confidence interval for the estimated median.

  When interpolation is required in the upper open-ended interval of a
distribution to obtain a confidence bound, use 1.5 times the lower limit
of the open-ended interval as its upper limit.
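  The interpolation procedure for medians can be sketched as below.  The
distribution, area size, and design factor in the usage lines are
hypothetical; the basic standard error of N/2 assumes the table A
formula has the form sqrt(5Y(1 - Y/N)), and the small-estimate floor is
not applied.  Categories are (lower limit, upper limit, frequency)
tuples, with an upper limit of None marking the open-ended top category.

```python
import math


def basic_se_total(y, n):
    """Basic SE of an estimated total y in an area of size n (assumed formula)."""
    return math.sqrt(5 * y * (1 - y / n))


def value_at_count(target, bins):
    """Linearly interpolate the value at which the cumulative frequency
    first reaches `target`; bins are sorted in ascending order."""
    cumulative = 0.0
    for lower, upper, freq in bins:
        if freq > 0 and cumulative + freq >= target:
            if upper is None:
                upper = 1.5 * lower          # open-ended interval rule
            share = max(target - cumulative, 0.0) / freq
            return lower + share * (upper - lower)
        cumulative += freq
    raise ValueError("target exceeds the total frequency")


def median_confidence_interval(bins, area_size, design_factor, z=1.645):
    """Approximate confidence interval for a median using the N/2 method."""
    half = sum(freq for _, _, freq in bins) / 2
    se_half = basic_se_total(half, area_size) * design_factor
    return (value_at_count(half - z * se_half, bins),
            value_at_count(half + z * se_half, bins))


# Hypothetical income distribution: (lower, upper, frequency)
incomes = [(0, 10000, 100), (10000, 20000, 200),
           (20000, 30000, 100), (30000, None, 50)]
print(value_at_count(225, incomes))          # median point estimate: 16250.0
print(median_confidence_interval(incomes, area_size=1000, design_factor=1.0))
```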

Confidence Intervals

A sample estimate and its estimated standard error may be
used to construct confidence intervals about the estimate.  These
intervals are ranges that will contain the average value of the
estimated characteristic that results over all possible samples, with a
known probability.  For example, if all possible samples that could
result under the 1990 census sample design were independently selected
and surveyed under the same conditions, and if the estimate and its
estimated standard error were calculated for each of these samples,
then:

   1.  Approximately 68 percent of the intervals from one estimated
       standard error below the estimate to one estimated standard error
       above the estimate would contain the average result from all
       possible samples.

   2.  Approximately 90 percent of the intervals from 1.645 times the
       estimated standard error below the estimate to 1.645 times the
       estimated standard error above the estimate would contain the
       average result from all possible samples.

   3.  Approximately 95 percent of the intervals from two estimated
       standard errors below the estimate to two estimated standard
       errors above the estimate would contain the average result from
       all possible samples.

  The intervals are referred to as 68 percent, 90 percent, and 95 percent
confidence intervals, respectively.
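  The three intervals differ only in the multiplier applied to the
standard error.  A minimal sketch using the multipliers given above (note
that this appendix uses 2 rather than 1.96 for the 95 percent interval);
the names are hypothetical, and the usage line takes the City A figures
from the example later in this appendix.

```python
# Multipliers used in this appendix for each confidence level (percent)
Z_MULTIPLIERS = {68: 1.0, 90: 1.645, 95: 2.0}


def confidence_interval(estimate, se, level=90):
    """Return (lower, upper) limits of estimate +/- multiplier * se."""
    z = Z_MULTIPLIERS[level]
    return estimate - z * se, estimate + z * se


lower, upper = confidence_interval(9948, 179, level=90)
print(round(lower), round(upper))   # 9654 10242
```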

  The average value of the estimated characteristic that could be derived
from all possible samples either is or is not contained in any particular
computed interval.  Thus, one cannot say that the average value has a
certain probability of falling between the limits of a calculated
confidence interval.  Rather, one can say with a specified level of
confidence that the calculated confidence interval includes the average
estimate from all possible samples (approximately the 100-percent value).
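  The statement above can be checked by simulation.  The sketch below
assumes a hypothetical population of 6,000 units, 2,400 of which have the
characteristic, repeatedly draws 1-in-6 simple random samples, builds a
90 percent interval from each sample's own estimated standard error, and
counts how often the interval contains the true total (here equal to the
average over all possible samples).  The fraction should be close to
0.90; this illustrates simple random sampling, not the census design.

```python
import math
import random


def interval_coverage(pop_size=6000, trait_count=2400, sample_size=1000,
                      replicates=1000, z=1.645, seed=7):
    """Fraction of simulated intervals that contain the true total."""
    rng = random.Random(seed)
    population = [1] * trait_count + [0] * (pop_size - trait_count)
    f = sample_size / pop_size               # sampling fraction, 1/6
    hits = 0
    for _ in range(replicates):
        p_hat = sum(rng.sample(population, sample_size)) / sample_size
        estimate = pop_size * p_hat
        # Estimated SE of the total under simple random sampling without replacement
        se = pop_size * math.sqrt((1 - f) * p_hat * (1 - p_hat) / (sample_size - 1))
        if estimate - z * se <= trait_count <= estimate + z * se:
            hits += 1
    return hits / replicates


coverage = interval_coverage()
print(coverage)
```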

  Confidence intervals also may be constructed for the ratio, sum of, or
difference between two sample figures.  This is done by first computing
the ratio, sum, or difference, then obtaining the standard error of the
ratio, sum, or difference (using the formulas given earlier), and
finally forming a confidence interval for this estimated ratio, sum, or
difference as above.  One can then say with specified confidence that
this interval includes the ratio, sum, or difference that would have
been obtained by averaging the results from all possible samples.

  The estimated standard errors given in this appendix do not include all
portions of the variability due to nonsampling error that may be
present in the data.  The standard errors reflect the effect of simple
response variance, but not the effect of correlated errors introduced
by enumerators, coders, or other field or processing personnel.  Thus,
the standard errors calculated represent a lower bound of the total
error.  As a result, confidence intervals formed using these estimated
standard errors may not meet the stated levels of confidence (i.e., 68,
90, or 95 percent).  Consequently, some care must be exercised in
interpreting the data in this product on the basis of the estimated
standard errors.

  A standard sampling theory text should be helpful if the user needs
more information about confidence intervals and nonsampling errors.

Use of Tables to Compute Standard Errors

The following is a hypothetical example of how to compute a standard
error of a total and a percentage.  Suppose a particular data table
shows that for City A 9,948 persons out of all 15,888 persons age 16
years and over were in the civilian labor force.  The percent-in-sample
table lists City A with a percent-in-sample of 16.0 percent (Persons
column).  The column in table C which includes 16.0 percent-in-sample
shows the design factor to be 1.1 for "Employment status."

  The basic standard error for the estimated total 9,948 may be obtained
from table A or from the formula given below table A.  In order to avoid
interpolation, the use of the formula will be demonstrated here.
Suppose that the total population of City A was 21,220.  The formula for
the basic standard error, SE, is

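          $$ SE(Y) = \sqrt{5Y\left(1 - \frac{Y}{N}\right)} $$

where Y is the estimated total and N is the size of the area (here, the
total population of City A, 21,220).  For Y = 9,948:

          $$ SE(9{,}948) = \sqrt{5 \times 9{,}948 \times \left(1 - \frac{9{,}948}{21{,}220}\right)} \approx 163 $$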

  The standard error of the estimated 9,948 persons 16 years and over
who were in the civilian labor force is found by multiplying the basic
standard error 163 by the design factor, 1.1 from table C.  This yields
an estimated standard error of 179 for the total number of persons 16
years and over in City A who were in the civilian labor force.

  The estimated percent of persons 16 years and over who were in the
civilian labor force in City A is 62.6.  From table B, the unadjusted
standard error is found to be approximately 0.85 percentage points.  The
standard error for the estimated 62.6 percent of persons 16 years and
over who were in the civilian labor force is 0.85 x 1.1 = 0.94
percentage points.
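  The City A arithmetic can be reproduced as follows.  This sketch
assumes the table A and table B formulas take the forms
SE = sqrt(5Y(1 - Y/N)) and SE = sqrt((5/B) p(100 - p)), with the base B
of the percentage being the 15,888 persons 16 years and over.  The
formula gives a basic standard error of about 0.86 for the percentage,
slightly above the 0.85 read from table B, but both round to 0.94 once
the design factor is applied.

```python
import math

design_factor = 1.1                       # table C, "Employment status"
labor_force, persons_16_plus, population = 9948, 15888, 21220

# Total: basic SE from the assumed table A formula, then the design factor
basic_total = math.sqrt(5 * labor_force * (1 - labor_force / population))
se_total = basic_total * design_factor

# Percentage: the base is persons 16 years and over, not the total population
pct = 100 * labor_force / persons_16_plus
basic_pct = math.sqrt(5 / persons_16_plus * pct * (100 - pct))
se_pct = basic_pct * design_factor

print(round(basic_total), round(se_total))                 # 163 179
print(round(pct, 1), round(basic_pct, 2), round(se_pct, 2))  # 62.6 0.86 0.94
```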

  A note of caution concerning numerical values is necessary.  Standard
errors of percentages derived in this manner are approximate.
Calculations can be expressed to several decimal places, but to do so would
indicate more precision in the data than is justifiable.  Final results
should contain no more than two decimal places when the estimated
standard error is one percentage point (i.e., 1.00) or more.

  In the previous example, the standard error of the 9,948 persons 16
years and over in City A who were in the civilian labor force was found
to be 179.  Thus, a 90 percent confidence interval for this estimated
total is found to be:

          $$ 9{,}948 \pm 1.645 \times 179 = 9{,}948 \pm 294, \text{ or } 9{,}654 \text{ to } 10{,}242 $$

  One can say, with about 90 percent confidence, that this interval
includes the value that would have been obtained by averaging the
results from all possible samples.

  The following is an illustration of the calculation of standard errors
and confidence intervals when a difference between two sample estimates
is obtained.  For example, suppose the number of persons in City B age
16 years and over who were in the civilian labor force was 9,314 and
the total number of persons 16 years and over was 16,666.  Further
suppose the population of City B was 25,225.  Thus, the estimated
percentage of persons 16 years and over who were in the civilian labor
force is 55.9 percent.  The unadjusted standard error determined using
the formula provided at the bottom of table B is 0.86 percentage
points.  The percent-in-sample table shows that City B had a
percent-in-sample of 15.7.  The range in table C that includes 15.7
percent-in-sample shows the design factor to be 1.1 for "Employment
status."  Thus, the approximate standard error of the percentage (55.9
percent) is 0.86 x 1.1 = 0.95 percentage points.

  Now suppose that one wished to obtain the standard error of the
difference between City A and City B of the percentages of persons who
were 16 years and over and who were in the civilian labor force.  The
difference in the percentages of interest for the two cities is:

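          $$ 62.6 - 55.9 = 6.7 \text{ percentage points} $$

  Using the results of the previous example, the standard error of this
difference is approximately

          $$ SE(6.7) = \sqrt{(0.94)^2 + (0.95)^2} \approx 1.34 \text{ percentage points} $$

  The 90 percent confidence interval for the difference is formed
as before:

          $$ 6.7 \pm 1.645 \times 1.34 = 6.7 \pm 2.20, \text{ or } 4.5 \text{ to } 8.9 $$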

  One can say with 90 percent confidence that the interval
includes the difference that would have been obtained by averaging the
results from all possible samples.
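  The City B figures and the difference can be reproduced as follows.
This sketch assumes the table B formula takes the form
SE = sqrt((5/B) p(100 - p)), which reproduces the 0.86 quoted above, and
it rounds intermediate values to the precision shown in the text.

```python
import math

design_factor = 1.1                                   # table C, "Employment status"
pct_a, se_pct_a = 62.6, 0.94                          # City A, from the text
lf_b, persons_16_plus_b = 9314, 16666                 # City B

pct_b = round(100 * lf_b / persons_16_plus_b, 1)
se_pct_b = round(math.sqrt(5 / persons_16_plus_b * pct_b * (100 - pct_b))
                 * design_factor, 2)

diff = pct_a - pct_b
se_diff = math.sqrt(se_pct_a ** 2 + se_pct_b ** 2)    # uncorrelated estimates
lower, upper = diff - 1.645 * se_diff, diff + 1.645 * se_diff
print(pct_b, se_pct_b, round(diff, 1), round(se_diff, 2),
      round(lower, 1), round(upper, 1))               # 55.9 0.95 6.7 1.34 4.5 8.9
```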

  For reasonably large samples, ratio estimates are approximately
normally distributed, as is generally the case for census sample data.
Therefore, if the standard error of a ratio estimate can be calculated,
a confidence interval can be formed around the ratio.  Suppose that one wished to obtain the
standard error of the ratio of the estimate of persons who were 16
years and over and who were in the civilian labor force in City A to
the estimate of persons who were 16 years and over and who were in the
civilian labor force in City B.  The ratio of the two estimates of
interest is:

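          $$ R = \frac{9{,}948}{9{,}314} \approx 1.068 $$

  The standard error of the City B total is found as in the City A
example: the basic standard error is

          $$ SE(9{,}314) = \sqrt{5 \times 9{,}314 \times \left(1 - \frac{9{,}314}{25{,}225}\right)} \approx 171 $$

and multiplying by the design factor 1.1 gives about 188.  Because this
ratio is not a proportion, its standard error is approximately

          $$ SE(R) \approx 1.068\sqrt{\left(\frac{179}{9{,}948}\right)^2 + \left(\frac{188}{9{,}314}\right)^2} \approx 0.029 $$

  Using these results, the 90 percent confidence interval for
this ratio would be:

          $$ 1.068 \pm 1.645 \times 0.029 = 1.068 \pm 0.048, \text{ or about } 1.02 \text{ to } 1.12 $$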
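  The ratio calculation can be checked the same way.  This sketch
assumes the table A formula takes the form SE = sqrt(5Y(1 - Y/N)) and
that the ratio standard error is the usual first-order approximation for
uncorrelated estimates.  As was done for City A, it rounds the basic
standard error of the City B total to a whole number before applying the
design factor of 1.1.

```python
import math

design_factor = 1.1
lf_a, se_lf_a = 9948, 179                  # City A total and its SE, from the text
lf_b, pop_b = 9314, 25225                  # City B total and population

# SE of the City B total: rounded basic SE (assumed formula) times the design factor
basic_b = round(math.sqrt(5 * lf_b * (1 - lf_b / pop_b)))
se_lf_b = round(basic_b * design_factor)

ratio = lf_a / lf_b
se_ratio = ratio * math.sqrt((se_lf_a / lf_a) ** 2 + (se_lf_b / lf_b) ** 2)
lower, upper = ratio - 1.645 * se_ratio, ratio + 1.645 * se_ratio
print(se_lf_b, round(ratio, 3), round(se_ratio, 3),
      round(lower, 2), round(upper, 2))    # 188 1.068 0.029 1.02 1.12
```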