What is randomization in statistics?



Chance may be blind, but the random principle has method



Random Procedures in the Social Sciences

    Chance is used quite frequently in the social sciences. For example, a representative random sample is drawn from the population in order to estimate certain population parameters (such as the mean) with the help of statistics; this is a random drawing of people from a population. Test subjects, on the other hand, are randomly assigned to certain conditions (randomization). Occasionally it is also determined at random which study group receives which experimental condition. From a number of possible stimuli, some are chosen at random to be presented to the subjects. In a criterion-referenced test, a random decision is made as to which tasks from the pool of all possible tasks are included in the test. In multiple-choice items, the position of the correct alternative is determined at random. These random procedures serve different purposes, and one should be aware of the differences.
What is randomized in randomization?
    Figure 1 is intended to clarify and differentiate between various random processes in the course of an investigation:

    Figure 1: Various random procedures in the context of an investigation


Drawing a random sample
    At the beginning of an experiment it is advisable to think about which population you later want to generalize the results to. If possible, you recruit your subjects by first drawing a random sample from the defined population and treating this sample as your study sample (= those subjects who are available for the study). The sole purpose of this random selection is to obtain results that are representative of the population and thus allow a better generalization of the findings to the population.
    Such representativeness is not required for internal validity.
    The random drawing of a sample from the population has nothing to do with randomization. In research practice one usually cannot select subjects from a defined population, but is glad to get subjects at all (ad hoc sample, convenience sample). However the study sample came about, the subjects in it can be randomized.
Randomization
    The crucial randomization is the randomization of subjects to the groups involved in the experiment. The test subjects of the entire sample are randomly assigned to the individual experimental study groups or directly to the individual experimental conditions.

    However, in addition to randomizing the subjects to the study groups, it is advisable, if possible, to let chance determine which study group receives which experimental condition. Randomizing the experimental conditions across the study groups is not strictly necessary and has nothing to do with the randomization of the subjects, but it prevents a subjective influence by the researcher, who may favor a certain condition for very specific subjects and thus, perhaps unconsciously, produce an investigator effect.

    Special cases:
    Random assignment of subjects to experimental groups is equivalent to random assignment of experimental conditions to test subjects. In a web experiment, for example, the subjects are not all collected first and then randomized; instead, each subject who starts the experiment is randomly assigned an experimental condition.
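
    The web-experiment case can be sketched in a few lines of Python. This is only an illustration under assumptions: the condition labels A, B, C and the function name `assign_on_arrival` are hypothetical. Each arriving subject independently draws one condition with equal probability, so the final group sizes are not guaranteed to be equal.

```python
import random

CONDITIONS = ["A", "B", "C"]  # hypothetical condition labels


def assign_on_arrival(rng=random):
    """Draw one condition with equal probability for a newly arriving subject."""
    return rng.choice(CONDITIONS)


if __name__ == "__main__":
    rng = random.Random()  # unseeded, so the assignment cannot be predicted
    for subject_id in range(1, 7):
        print(subject_id, assign_on_arrival(rng))
```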

    One could also assign the test subjects to the experimental conditions directly from the population by drawing as many random samples as there are conditions, but the method presented here appears more transparent and efficient.

Purpose and efficiency of randomization
    The purpose of randomization is to make the people in the experimental groups comparable with regard to all conceivable personal variables and all influences irrelevant to the study. Randomization is the only method that can control both known and unknown confounding factors. This also means that the groups come from comparable environments and, for example, are exposed to comparable external influences over a longer period of study.
    Because only the comparability of the groups matters, and not their representativeness for any population, randomization can be carried out with any study sample (even with extreme groups, e.g. highly neurotic people, people with intellectual disabilities, professors). What matters then is that the experimental groups contain comparable members of that extreme group, not that these members are representative of the population of the Federal Republic of Germany.

    Since every randomization is subject to random error, randomization does not guarantee comparability outright, but it does guarantee comparability within certain statistical error limits. In other words: if the subjects are randomly divided into 2 groups, any differences between the groups on any variable should be due only to random sampling error, i.e. the groups should not differ significantly except at the rate of the alpha error.

    The efficiency of randomization depends on

    • the homogeneity / heterogeneity of the subjects including their environments and
    • the group size (n per test group)

    The more homogeneous the subjects, the more efficient the randomization. However, the homogeneity of the subjects is difficult to assess with regard to all possible variables, which means that the group size is more important as a directly influenceable variable.
    The efficiency of random allocation is rather low with a small group size (n <= 10) (see the optional excursus "Simulation: Randomization on EG and KG"; EG = experimental group, KG = control group). In line with statistical principles, the comparability of the group means (for each personal variable) increases with the size of the experimental groups. Under certain conditions, the statistically expected difference between the means of the two conditions can be estimated with the standard error of the difference between the means of EG and KG (see the optional excursus "How much do the experimental groups differ immediately after randomization?"). Since it is difficult to specify a binding minimum group size, one should use as many test subjects as possible to be on the safe side (recommendation: at least 30 per group). It is advisable to check the efficiency of the randomization actually carried out on several variables. Information on personal characteristics such as age, gender and school education is relatively easy to obtain.
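
    How quickly comparability improves with group size can be illustrated with a small simulation. The following Python sketch is not the simulation from the excursus; it assumes a hypothetical personal variable with mean 100 and standard deviation 15 (an IQ-like scale chosen only for illustration), randomly splits 2n simulated subjects into two groups, and reports the average absolute difference between the two group means over many repetitions.

```python
import random
import statistics


def mean_abs_difference(n_per_group, repetitions=2000, seed=None):
    """Average |mean(EG) - mean(KG)| after random allocation of 2*n subjects."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(repetitions):
        # Hypothetical personal variable: mean 100, SD 15 (illustrative only).
        scores = [rng.gauss(100, 15) for _ in range(2 * n_per_group)]
        rng.shuffle(scores)  # random allocation to the two groups
        eg, kg = scores[:n_per_group], scores[n_per_group:]
        diffs.append(abs(statistics.mean(eg) - statistics.mean(kg)))
    return statistics.mean(diffs)


if __name__ == "__main__":
    for n in (5, 10, 30, 100):
        print(n, round(mean_abs_difference(n, seed=1), 2))
```

    Under these assumptions the standard error of the difference between the two group means is roughly 15 * sqrt(2 / n), so the typical difference shrinks with the square root of the group size rather than in proportion to it.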
Randomization process: how does randomization work?
    Random assignment of subjects to the experimental groups means that each subject has the same chance of being assigned to each of the conditions. One can picture it like this: if N subjects are to be assigned to 2 conditions, there are x possible ways of assigning them. Chance now decides which of these x possibilities is selected (see the optional excursus "How many possibilities are there to assign N test subjects to k groups?").
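
    The number of possibilities can be made concrete. As a sketch, assume that the group sizes are fixed in advance and that the groups are distinguishable (they receive different conditions); the count is then the multinomial coefficient N! / (n_1! * ... * n_k!). The helper name `count_assignments` is hypothetical.

```python
from math import factorial


def count_assignments(group_sizes):
    """Number of ways to split sum(group_sizes) subjects into labeled groups
    of exactly the given sizes (multinomial coefficient)."""
    total = factorial(sum(group_sizes))
    for size in group_sizes:
        total //= factorial(size)
    return total


print(count_assignments([6, 6]))  # 12 subjects, 2 groups of 6 -> 924
print(count_assignments([2, 2]))  # 4 subjects, 2 groups of 2 -> 6
```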
    The assignment of a test subject must in no way depend on any non-random circumstances. For example, it must not happen that test subjects who arrive a little late can no longer be assigned to the first condition because this condition has already been run. In that case the latecomers themselves, and not chance, would have determined that they did not end up in the first condition. Random allocation is a methodological principle that must be strictly adhered to. Much of what is colloquially called random is not strictly random; it only appears that way.

    With a total of 3 conditions, try to guess which condition each subject receives. In the long run one cannot make significantly more than 33.3% correct predictions. If you nevertheless reliably make significantly more correct predictions, there are only 2 explanations:
    1. The principle of chance was applied incorrectly.
    2. One has the skills of a clairvoyant.
    You can check if you are a clairvoyant.

    Special case: First parallelize and then randomize
    This procedure, occasionally also called restricted randomization, is ultimately comparable to randomization, because each subject has the same chance of being assigned to each of the conditions.
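
    One common reading of "first parallelize, then randomize" is sketched below. The text does not spell out the steps, so the procedure shown is an assumption: subjects are ranked on a matching variable, neighbouring subjects are grouped into blocks with as many members as there are conditions, and within each block chance decides who receives which condition. The subject labels and scores are invented for illustration.

```python
import random


def parallelize_then_randomize(scores, conditions, rng=random):
    """scores: dict mapping subject -> value of the matching variable.
    Returns a dict mapping subject -> condition.
    Assumes the number of subjects is a multiple of the number of conditions."""
    k = len(conditions)
    if len(scores) % k != 0:
        raise ValueError("number of subjects must be a multiple of the number of conditions")
    ranked = sorted(scores, key=scores.get)  # similar subjects end up adjacent
    assignment = {}
    for start in range(0, len(ranked), k):
        block = ranked[start:start + k]      # one matched block
        labels = list(conditions)
        rng.shuffle(labels)                  # chance decides within the block
        assignment.update(zip(block, labels))
    return assignment


if __name__ == "__main__":
    # Invented pretest scores of six hypothetical subjects.
    pretest = {"S1": 12, "S2": 30, "S3": 14, "S4": 28, "S5": 21, "S6": 19}
    print(parallelize_then_randomize(pretest, ["EG", "KG"]))
```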

Illustrative example of correct randomization
    60 subjects are to be randomly assigned to 3 conditions (A, B, C):

    Variant 1
    There is a numbered list with the names of the 60 subjects.
    You take 60 index cards and write the letter A on 20 cards, B on 20 cards and C on 20 cards.
    Then you shuffle the cards thoroughly.

    Now assign the condition on the first card to the first subject in the list, the condition on the second card to the second subject in the list, and so on.
    Instead of index cards, you can of course use slips of paper (all identical), shake them in an urn and determine the order by drawing the slips blindly.
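
    The first variant translates directly into a few lines of Python. This is a minimal sketch, not the script referenced on this page; the function name `deal_condition_cards` and the placeholder subject names are assumptions.

```python
import random


def deal_condition_cards(subjects, conditions, rng=random):
    """Write each condition on an equal number of 'cards', shuffle them,
    and hand the i-th card to the i-th subject in the list."""
    per_condition, remainder = divmod(len(subjects), len(conditions))
    if remainder:
        raise ValueError("subjects cannot be split evenly across conditions")
    cards = [c for c in conditions for _ in range(per_condition)]
    rng.shuffle(cards)
    return dict(zip(subjects, cards))


if __name__ == "__main__":
    subjects = [f"Subject {i}" for i in range(1, 61)]  # placeholder names
    allocation = deal_condition_cards(subjects, ["A", "B", "C"])
    for name, condition in list(allocation.items())[:5]:
        print(name, condition)
```

    Because the deck contains exactly 20 cards per condition, the groups always end up the same size; only the question of who lands in which group is left to chance.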

    Variant 2
    Take 60 index cards and write the name of one subject on each card. Then shuffle the cards and deal them out in turn:
    Card 1 to A
    Card 2 to B
    Card 3 to C
    Card 4 to A
    Card 5 to B
    ..............

    Task:

    12 participants are to be randomly allocated to the conditions EG and KG.
    Calculate the mean intelligence score and the mean Abitur grade for EG and KG.
    The procedure can be repeated several times. If the task was worked on in groups, the individual results can then be displayed together.
    Different runs produce slightly different results.
    Do you know a statistical quantity that allows you to assess the differences in the results?

       
    You can print out an allocation sheet for the task
    You can use the data from Excel
How do I systematically create chance?
    The above examples illustrate the principle, but they are not recommended in every case, because thorough mixing cannot always be guaranteed so easily (compare the specially constructed and closely monitored machine used to draw the lottery numbers). For this reason there are random number tables and programs that generate random numbers.

    With the help of random programs you can save yourself mixing and, for example, have a random sequence created directly. For the above task, in which 12 participants should be divided into 2 groups, a program has suggested the following order, for example:
    1, 2, 12, 3, 9, 7, 6, 4, 5, 10, 11, 8,
    This sequence can be used in different ways to allocate the subjects, e.g.:

    • the first 6 in EG, the second 6 in KG, or
    • the odd positions of the sequence in EG, the even positions in KG. The following subject numbers would then fall into EG: 1, 12, 9, 6, 5, 11
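
    Both rules can be written out in Python. The sketch below takes the sequence quoted above as input; the function names are hypothetical, and `random.sample` shows one way to generate a fresh sequence.

```python
import random


def split_first_half(sequence):
    """First half of the sequence goes to EG, second half to KG."""
    half = len(sequence) // 2
    return sequence[:half], sequence[half:]


def split_odd_even_positions(sequence):
    """Odd positions (1st, 3rd, ...) go to EG, even positions to KG."""
    return sequence[0::2], sequence[1::2]


order = [1, 2, 12, 3, 9, 7, 6, 4, 5, 10, 11, 8]  # sequence from the text
print(split_first_half(order))          # ([1, 2, 12, 3, 9, 7], [6, 4, 5, 10, 11, 8])
print(split_odd_even_positions(order))  # ([1, 12, 9, 6, 5, 11], [2, 3, 7, 4, 10, 8])

# A new random order of the subject numbers 1..12:
new_order = random.sample(range(1, 13), 12)
```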
    With the help of a randomizer you can request any number of sequences (here another 5):
      8, 10, 2, 7, 6, 9, 5, 3, 11, 4, 1, 12,
      4, 9, 5, 2, 3, 8, 6, 1, 12, 7, 11, 10,
      4, 12, 10, 9, 3, 2, 1, 5, 11, 7, 8, 6,
      8, 4, 2, 9, 12, 7, 5, 1, 3, 10, 11, 6,
      10, 6, 1, 8, 7, 11, 3, 5, 4, 9, 2, 12,
    But never give in to the temptation to keep requesting random sequences until you find one you like. If you do draw several, chance should again decide which of them is used.

    The JavaScript on the page "Random assignment of test subjects to the experimental groups" randomizes all subjects to the experimental groups directly. There you enter the number of available subjects (i.e. N of the study sample) and the number of experimental groups (conditions). The JavaScript then randomly divides the subjects among the groups.

What is not randomization?
    You randomly choose 2 school classes from the population of all school classes and treat these two classes as your study sample. Then you toss a coin to determine which class forms the EG and which the KG.
    Here it is classes, not test subjects, that are drawn at random, and the conditions are randomly assigned to the classes, not to the students. The randomly drawn classes can contain very different students (e.g. class A attends an 'elite school favored by the upper class', class B comes from 'a residual school near a socially problematic area').

    2 pages are randomly drawn from a telephone book. The people on one page belong to the EG, those on the other page to the KG.
    Here, too, only 2 groups are drawn at random and not the units that should actually be drawn at random. Group A could come from a different place than group B. Even if both groups came from the same place, it is conceivable that the alphabetical order of the names in the phone book does not list people randomly with regard to all possible characteristics, and that people whose names start with A differ from people whose names start with O.

    You decide that the subjects who happen to show up first for the experiment go into the KG and the rest go into the EG.
    In this case, the first subjects need not have appeared first by chance at all and may differ from the last subjects, e.g. with regard to their punctuality.

    Even if one does not have the slightest suspicion that the groups differ in some way, the strictly random procedure is always preferable to an allocation procedure that merely appears random, because it creates comparability systematically.

Checking the success of a randomization
Since randomization cannot guarantee the comparability of the groups, but only makes it probable, it is advisable to check the comparability of the groups on the available data. Proven confounding variables are primarily suitable as relevant control variables, i.e. those variables which have a strong influence on the dependent variable and which, as a rule, have a very high correlation with it. If the experimental groups do not differ significantly from one another with regard to these control variables, certain indications for the success of the randomization have been provided.

Example
IV (independent variable) = motivation method (A, B) to increase concentration during dictation
DV (dependent variable) = number of errors in the dictation
Potential confounding variables = previous performance in German, spelling skills, age

The test subjects are randomly divided into two groups, and it is then checked whether the two groups differ significantly in German grades, spelling test scores and age. It is crucial that the variables used for this check were measured before the experimental manipulation, or that they cannot be influenced by the IV. In that case, however, it also seems worth considering to ensure comparability on important confounding variables in advance and to randomize in addition (see the recommended excursus "Combine randomization with other control methods!").
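
Such a check can be sketched in Python. The example below uses invented German grades for two small groups and computes Welch's t statistic for the difference in means with the standard library only. A complete significance test would additionally need the degrees of freedom and a p-value, for which a statistics package is the usual choice.

```python
import statistics
from math import sqrt


def welch_t(group_a, group_b):
    """Welch's t statistic for the difference between two group means."""
    var_a = statistics.variance(group_a)  # sample variance (divides by n - 1)
    var_b = statistics.variance(group_b)
    se = sqrt(var_a / len(group_a) + var_b / len(group_b))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / se


# Invented German grades measured before the experiment (1 = best, 6 = worst).
grades_a = [2, 3, 3, 4, 2, 3, 1, 4]
grades_b = [3, 2, 4, 3, 3, 2, 2, 4]
print(round(welch_t(grades_a, grades_b), 2))
```

A value of t close to zero is consistent with comparable groups on this variable; the same calculation would be repeated for each control variable.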



Summary:
  • Randomization (as a control technique) means the random allocation of test subjects to the experimental groups. The representativeness of the sample is irrelevant. Any (even extreme) sample can be randomized.
  • The assignment of subjects to the experimental groups must be strictly random, which must be ensured by a systematic random assignment procedure before the start of the experiment.
  • The purpose of the random allocation is to make the experimental groups comparable with regard to all personal differences as well as all test-irrelevant environmental influences to which they are exposed.
  • The randomization does not guarantee absolute comparability and is not sufficient, especially for small group sizes. The efficiency of the randomization increases with the number of subjects per group. If possible, the success of the randomization should be checked using available control variables.
  • The process of "first parallelize, then randomize" is also a form of randomization.
  • Randomization is the most important control principle in the design of experiments. It is the only method that controls all known and unknown confounding factors.

Suggested exercise: Practical randomization exercise in the seminar
created 4.12. 1997; last update 5/27/2004; Bernhard Jacobs, [email protected]