The proposed ASD diagnostic tool consists of three major steps: VBM, the tBCGA, and the automatic
autism diagnostic tool (see Fig. 1). In the first step, all available ABIDE samples are processed by VBM, and a set
of 989 VBM features for each sample is extracted. In the second step, the tBCGA selects
an optimum subset of significant VBM features for ASD. In the third step, the optimum
VBM features selected by the tBCGA are used to train an ELM model for ASD-patient
and healthy-person classification.
4.1 Tailored Binary Coded Genetic Algorithm
The presented tBCGA is a modification of the well-known genetic algorithm, specifically designed
for an efficient search for the optimum subset of VBM features in the proposed automatic
autism diagnostic tool. The genetic algorithm is widely used for solving extremely
complex optimization problems in industrial and scientific applications. In general,
the genetic algorithm is a heuristic optimization tool that utilizes powerful gene
adaptation mechanisms from nature. In the first and most challenging step, the optimization
problem is translated into a set of meaningful chromosomes. This combination of chromosomes
forms a solution of the genetic algorithm. Each solution may or may not describe the
given optimization problem’s characteristics well. Each solution is numerically evaluated
by using a fitness function. The genetic algorithm iteratively updates chromosomes, forms solutions,
and calculates fitness values. In this way, it may find a solution with high fitness, which
can serve as a near-optimal solution for the examined optimization problem.
The performance of the genetic algorithm can be affected by many factors. The examined problem must be converted
into a set of chromosomes correctly; an incorrect transformation may cause a significant
performance loss. The chromosome-update scheme defined by the genetic algorithm operators
must be chosen carefully based on the characteristics of the problem,
and the operator parameters must be properly adjusted. A poor choice of operators leads
to slow convergence of the optimization process.
In the proposed tBCGA, each solution is a vector of 989 binary coefficients, where
each coefficient defines the status of a corresponding VBM feature. A binary coefficient
of 1 implies that the corresponding VBM feature is selected, and a binary coefficient
of 0 implies that the corresponding feature is skipped. Thus, the proposed solution
forms a set of features for further analysis.
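As a concrete illustration (a sketch, not the authors' code), such a solution can be represented as a NumPy binary mask over the 989 VBM features; the random seed, the number of samples, and the feature matrix below are assumptions for the example.

```python
import numpy as np

N_FEATURES = 989                               # number of VBM features per sample
rng = np.random.default_rng(0)

# A solution is a binary vector: 1 = feature selected, 0 = feature skipped.
solution = np.zeros(N_FEATURES, dtype=np.int8)
k = int(rng.integers(50, 201))                 # 50-200 selected features, as in the tBCGA
on = rng.choice(N_FEATURES, size=k, replace=False)
solution[on] = 1

# Applying the mask to a feature matrix keeps only the selected columns.
X = rng.normal(size=(10, N_FEATURES))          # 10 hypothetical samples
X_selected = X[:, solution == 1]
```

The masked matrix `X_selected` is what would be passed on to classifier training.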
Fig. 2. Framework of the proposed tBCGA.
The framework of the proposed tBCGA is presented in Fig. 2. It starts by creating an initial population of 100 binary vectors $\mathrm{F}^{0}$,
each generated randomly by placing 50-200 binary 1 values at random locations.
Each binary vector $\mathrm{F}_{\mathrm{i}}^{0}$ forms a set of VBM features, which
is used to train an ELM classifier. Each binary vector $\mathrm{F}_{\mathrm{i}}^{0}$
is evaluated by fitness value $\mathrm{f}_{\mathrm{i}}^{0}$, which is the overall
training accuracy of the ELM classifier. Finally, the union of 100 binary vectors
$\mathrm{F}_{\mathrm{i}}^{0}$ and corresponding fitness values $\mathrm{f}_{\mathrm{i}}^{0}$
forms the initial population $(\mathrm{F}_{\mathrm{i}}^{0};\mathrm{f}_{\mathrm{i}}^{0}),\,\,i=1,\ldots
,100$. The proposed method iteratively updates populations until a termination criterion
is triggered. Each iteration (see Fig. 2) consists of three main operations:
· Selection
· Statistical selective crossover and mutation
· Fitness evaluation
The proposed tBCGA settings are a population size n of 100, a crossover/mutation rate
of 0.8, and a termination criterion that stops the process if there is no fitness
improvement over the last 10 populations. In extensive experiments, we found that
the process typically terminates after around 50 populations. The final ASD diagnostic tool was built from
the generated ELM classifier with the best fitness value (i.e., the best overall training accuracy).
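The no-improvement termination criterion can be sketched as follows (a simplified illustration; the `patience` parameter name and the plateauing fitness history are assumptions for the example):

```python
def should_stop(best_fitness_history, patience=10):
    """Stop when the best fitness has not improved over the last `patience` populations."""
    if len(best_fitness_history) <= patience:
        return False
    recent_best = max(best_fitness_history[-patience:])
    earlier_best = max(best_fitness_history[:-patience])
    return recent_best <= earlier_best

# Example: fitness plateaus at 0.73, so the search stops.
history = [0.60, 0.65, 0.70, 0.72, 0.73] + [0.73] * 10
```

Here `should_stop(history)` returns `True` because the last 10 populations brought no improvement over the earlier best.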
Selection
In the genetic algorithm framework, the selection procedure determines the chance
for each solution in the current population to contribute to the next population. The
selection procedure computes a probability for each solution in the population
based on its fitness value, such that solutions with lower fitness values
have a lower probability of being selected, and vice versa. Solutions with a low fitness
value have a small but non-zero chance to contribute to the next generation, whereas
solutions with high fitness have a significant chance to do so.
The selection procedure is one of the basic evolutionary processes in nature, where
individuals with better survival skills have a higher chance to reproduce.
The geometric ranking method [18] was used as a selection procedure in this paper. The geometric ranking method sorts
solutions in descending order based on fitness values and computes selection probability
as follows:
$$\mathrm{P}_{\mathrm{j}}=\mathrm{q}'\,(1-\mathrm{q})^{\mathrm{r}_{\mathrm{j}}-1},\qquad\mathrm{q}'=\frac{\mathrm{q}}{1-(1-\mathrm{q})^{\mathrm{n}}},$$
where
$\mathrm{q}'$ is a normalization factor, $\mathrm{r}_{\mathrm{j}}$ is the rank of
the j-th solution in the sorted set, n is the population size, and q is $10^{-3}$.
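A minimal sketch of geometric-ranking selection probabilities, assuming NumPy (the fitness values below are illustrative, not from the paper):

```python
import numpy as np

def geometric_ranking_probs(fitness, q=1e-3):
    """Selection probability q'(1-q)^(r-1) for rank r (1 = best),
    with normalization q' = q / (1 - (1-q)^n) so the probabilities sum to 1."""
    fitness = np.asarray(fitness, dtype=float)
    n = fitness.size
    ranks = np.empty(n, dtype=int)
    ranks[np.argsort(-fitness)] = np.arange(1, n + 1)   # rank 1 = highest fitness
    q_norm = q / (1.0 - (1.0 - q) ** n)
    return q_norm * (1.0 - q) ** (ranks - 1)

probs = geometric_ranking_probs([0.91, 0.85, 0.88, 0.79])
```

Because the probabilities decay geometrically with rank, the best solution is most likely to be selected, yet every solution keeps a non-zero chance.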
Proposed statistical selective crossover and mutation
Crossover and mutation play a vital role in the genetic algorithm. Crossover is a
basic gene exchange procedure in nature that increases the diversity of individuals.
Crossover randomly mixes genes of two individuals to produce a child. If this gene
mixture produces individuals that have higher chances to survive and reproduce, then
it can increase the average resistance against harsh environments found in nature.
The genetic algorithm’s crossover process mimics that of nature. A crossover picks
genes from two parents (or two solutions from the population) and creates one child
solution, which may have better characteristics. Each new solution generated by crossover
should amplify the positive characteristics of its parents and suppress the negative
ones, so as to increase the fitness value; the design of the crossover therefore largely
defines the efficiency of the genetic algorithm. In this paper, a statistical selective crossover (see Fig. 3) was designed specifically for the proposed tBCGA.
Fig. 3. Proposed statistical selective crossover.
The proposed statistical selective crossover takes the two solutions chosen by the selection
procedure, with binary vectors $F_{1}$ and $F_{2}$, and builds a new binary
vector $F_{\mathrm{new}}$. The statistical selective crossover is designed to balance
the number of binary 1 values, which are the selected VBM features for further analysis,
between populations. In the first step, the statistical selective crossover identifies the
positions of all coefficients equal to 1 in $F_{1}$ and $F_{2}$. In Fig. 3, $I_{1}$ = (1,2,4,5,7,8,11) and $I_{2}$ = (2,4,5,6,9,10). In the second step, all
positions of 1 are merged together, with duplicates kept, as $I=I_{1}\cup I_{2}$ = (1,2,2,4,4,5,5,6,7,8,9,10,11).
In the third step, a random binary vector $R=\left(0,1,0,1,1,0,0,1,1,1,0,0,1\right)$
is generated, with size equal to the length of the set $I$. A subset of $I$ is
created by keeping only the elements of $I$ whose corresponding entry in $R$ is 1, explicitly $I_{new}^{*}$
= (2,4,4,6,7,8,11) in the example above. A new set $I_{\mathrm{new}}$ = (2,4,6,7,8,11)
is created by removing duplicates from $I_{new}^{*}$. A new binary vector $F_{\mathrm{new}}$
is created by placing binary 1 at the positions listed in $I_{\mathrm{new}}$.
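The steps above can be sketched in NumPy as follows (an assumed implementation, not the authors' code; here the random vector $R$ is drawn inside the function rather than fixed as in the Fig. 3 example):

```python
import numpy as np

def statistical_selective_crossover(f1, f2, rng):
    """Combine two parent masks f1, f2 into a child mask, following Fig. 3."""
    i1 = np.flatnonzero(f1) + 1                 # 1-based positions of 1s in parent 1
    i2 = np.flatnonzero(f2) + 1                 # ... and in parent 2
    merged = np.sort(np.concatenate([i1, i2]))  # union with duplicates kept
    r = rng.integers(0, 2, size=merged.size)    # random binary vector R
    kept = np.unique(merged[r == 1])            # select by R, then drop duplicates
    child = np.zeros_like(f1)
    if kept.size:
        child[kept - 1] = 1
    return child

# Parents encoding I1 = {1,2,4,5,7,8,11} and I2 = {2,4,5,6,9,10}:
f1 = np.array([1, 1, 0, 1, 1, 0, 1, 1, 0, 0, 1], dtype=np.int8)
f2 = np.array([0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0], dtype=np.int8)
child = statistical_selective_crossover(f1, f2, np.random.default_rng(1))
```

By construction, every feature selected in the child was selected in at least one parent, and a feature present in both parents appears twice in the merged set, giving it a higher chance of survival.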
Mutation is another basic gene alteration procedure in nature and is as important
as crossover. Mutation randomly modifies genes, which may lead to positive or negative
change in characteristics. A significant portion of mutations causes minimal positive
or negative change. With small probability, mutations may cause a massive and deadly
change in the genome. In the opposite scenario, mutations may significantly improve
the genome, giving individuals new characteristics, which could not be produced via
crossover. Crossover only mixes genes available in past populations, so it is limited
in how it can modify current characteristics and cannot create completely novel genes
not present in older populations. The proposed mutation operator creates a new binary
vector F in which 50-200 coefficients of 1 are randomly allocated; the same procedure
is used to generate the n binary vectors of the initial population (see Fig. 2).
4.3 Extreme Learning Machine
The proposed ASD diagnostic tool based on the tailored binary coded genetic
algorithm solves the problem of binary classification of ASD subjects versus healthy
controls. The proposed tBCGA trains around 100,000 classifiers: approximately 50 populations
× 100 solutions per population × 2 choices by crossover × 10 training attempts. The most popular
machine learning tools such as convolutional neural networks (CNNs), decision trees,
and others are unable to train the necessary number of classifiers in a reasonable
time. Hence, a special machine learning tool with relatively high accuracy and fast
training phase must be used instead. We chose the ELM, which is extremely fast and
has acceptable overall accuracy.
The design of ELM is very simple. ELM has only one hidden layer and an extremely fast
training phase. Most of the machine learning tools based on neural networks must iteratively
update neuron weights for training. ELM computes optimized neuron weights analytically
without expensive iterative training. In a one-hidden-layer ELM neural network, the
input weights and bias are assigned randomly, and output weights are computed analytically
[16]. Because no gradient-based weight update is involved, the ELM may use any type of neuron activation function
fitted to the current data, including non-differentiable ones, which is not possible for most
neural networks with iterative weight-update processes. We have tried various
activation functions and picked the Gaussian activation function as the best choice.
For more details, refer to [16].
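A minimal ELM sketch along these lines, assuming NumPy (the toy data, hidden-layer size, and exact Gaussian form are illustrative assumptions, not the paper's configuration):

```python
import numpy as np

def train_elm(X, y, n_hidden, rng):
    """One-hidden-layer ELM: random input weights/biases, analytic output weights."""
    W = rng.normal(size=(X.shape[1], n_hidden))   # random input weights (never trained)
    b = rng.normal(size=n_hidden)                 # random biases (never trained)
    H = np.exp(-((X @ W + b) ** 2))               # Gaussian activation of the hidden layer
    beta = np.linalg.pinv(H) @ y                  # output weights via least squares
    return W, b, beta

def predict_elm(X, W, b, beta):
    H = np.exp(-((X @ W + b) ** 2))
    return np.sign(H @ beta)                      # binary labels in {-1, +1}

# Toy separable data: two 2-D clusters standing in for the two classes.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 0.2, (50, 2)), rng.normal(1.0, 0.2, (50, 2))])
y = np.array([-1.0] * 50 + [1.0] * 50)
W, b, beta = train_elm(X, y, n_hidden=50, rng=rng)
train_acc = float((predict_elm(X, W, b, beta) == y).mean())
```

The single pseudo-inverse replaces the whole iterative training loop, which is what makes the ELM fast enough for the tBCGA's fitness evaluations.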
The performance of ELM training depends on the randomly assigned parameters. Thus, for
each set of selected VBM features in the fitness calculation procedure, an ELM classifier
is trained 10 times with 10 different sets of random input weights and biases. The
ELM classifier with the best training accuracy is then selected. This 10-restart
procedure avoids weak solutions and efficiently balances the performance of
the ELM.
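The best-of-10 restart scheme can be sketched generically as follows (an illustrative sketch; `train_fn`, the seeding scheme, and the stand-in training function are assumptions, not the paper's code):

```python
import numpy as np

def best_of_restarts(train_fn, n_restarts=10, seed=0):
    """Train n_restarts times with different random initializations; keep the best."""
    best_model, best_acc = None, -np.inf
    for k in range(n_restarts):
        rng = np.random.default_rng(seed + k)   # a fresh random initialization
        model, acc = train_fn(rng)              # e.g. an ELM training run
        if acc > best_acc:
            best_model, best_acc = model, acc
    return best_model, best_acc

# Stand-in for ELM training: the "accuracy" depends only on the random draw.
def dummy_train(rng):
    w = rng.normal()            # stands in for random input weights/biases
    return w, 1.0 - abs(w)      # stands in for training accuracy

model, acc = best_of_restarts(dummy_train, n_restarts=10)
```

In the actual tool, `train_fn` would be one ELM training run on the currently selected VBM features, so each fitness evaluation returns the best of 10 independently initialized classifiers.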