# Definitions

Hidden node
a node that is neither an input nor an output node.

Bias unit
An input that is always set to 1. Can connect to any node other than an input.

Composed of:

1. Selection
2. Crossover
3. Mutation

# Augmenting

We start with the smallest possible set of nodes containing inputs and outputs. This is called minimal initialization, and is a distinct advantage of NEAT.

The exact structure of your initial population is not set in stone, For simple networks, this could mean having a connection between each input node and output node.

The bias can be thought of as a y-intercept to our network: it allows us to shift our output up or down.

# Speciation

The compatibility distance σ between two genomes is given by:

where: N is the number of genes in the larger genome. E is the number of excess genes. D is the number of disjoint genes. is the average difference in weight between all matching connections, including disabled ones. c1, c2, c3 are hard-coded importance coefficients.

TODO It is unclear what “Number of genes” means, so I am doing number of nodes + number of connections.

The paper defines the adjusted fitness of a particular genome as:

Where s**h evaluates to 1 if the two genomes are similar enough to be classified as the same specifies. Thus, this equation simplifies to:

where N is the number of genomes in the species.

# Breeding

For each species in every generation, only the single highest-performing genome survives. The rest of the new population is randomly bred from the other highest performing genomes, but almost all of those parents still die.

The number of offspring allocated to each species k is proportional to its relative performance in the generation.:

where: k is the average adjusted fitness of species k. tot is the sum of the average adjusted fitnesses for each species. |P| is the constant population size.

# Activation

The output of any given node is given by:

Where w i**j is the connection weight from node i to node j, x i is the output of node i, and σ is the activation function, which is typically the sigmoid function $\frac{1}{1+e^{-x}}$.

However, https://neat-python.readthedocs.io/en/latest/glossary.html and other implementations use a different formula, given by:

output = activation(bias+(response∗aggregation(inputs)))

where response is a constant that may be subject to evolution.

# Normalizing Inputs

If the input variables are combined linearly, as in an MLP [multilayer perceptron], then it is rarely strictly necessary to standardize the inputs, at least in theory. The reason is that any rescaling of an input vector can be effectively undone by changing the corresponding weights and biases, leaving you with the exact same outputs as you had before. However, there are a variety of practical reasons why standardizing the inputs can make training faster and reduce the chances of getting stuck in local optima. Also, weight decay and Bayesian estimation can be done more conveniently with standardized inputs.