Given a training data
where
We will use the negative log-likelihood as the loss (error) function.
Here, if
$$ \mathcal{J}(w,b) = - \frac{1}{m} \sum_{i=1}^{m} \left[ y_{i} \log \hat{y}_{i} + (1 - y_{i}) \log (1 - \hat{y}_{i}) \right] $$
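As a rough illustration, the loss $\mathcal{J}$ can be computed directly from labels and predicted probabilities. This is a minimal sketch in plain Python. The function name `binary_cross_entropy` and the clipping constant `eps` are assumptions added for the example, not part of the original derivation.

```python
import math

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean negative log-likelihood over m examples.

    y_true: list of labels in {0, 1}
    y_pred: list of predicted probabilities (the y-hat values)
    eps:    assumed clipping constant, only here to avoid log(0)
    """
    m = len(y_true)
    total = 0.0
    for y, p in zip(y_true, y_pred):
        # Keep p strictly inside (0, 1) so both logarithms are defined.
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / m
```

An uninformative prediction of 0.5 for every example gives a loss of $\log 2$, and the loss approaches 0 as each prediction approaches its true label.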
We want to find
Up to this point, this is a logistic regression problem. A neural network, however, can have many hidden layers, each built from units with this logistic-regression-like architecture.
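The idea of stacking logistic-regression-like units can be sketched as follows. The code assumes every unit applies a sigmoid to a weighted sum plus a bias. The helper names `sigmoid`, `dense_sigmoid`, and `forward`, and the list-of-`(W, b)` layer representation, are hypothetical choices made only for this example.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def dense_sigmoid(x, W, b):
    """One layer: each row of W with its bias acts like one
    logistic regression unit applied to the same input x."""
    return [
        sigmoid(sum(w_j * x_j for w_j, x_j in zip(row, x)) + b_k)
        for row, b_k in zip(W, b)
    ]

def forward(x, layers):
    """Feed x through a list of (W, b) pairs, one pair per layer."""
    a = x
    for W, b in layers:
        a = dense_sigmoid(a, W, b)
    return a
```

With a single layer containing a single unit, `forward` reduces to ordinary logistic regression. Adding more `(W, b)` pairs adds hidden layers.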
Suppose we have
where