
Logistic Regression with a Neural Network

Build the general architecture of a learning algorithm, including:

  • Initializing parameters
  • Calculating the cost function and its gradient
  • Using an optimization algorithm (gradient descent)
  • Gathering all three functions above into a main model function, in the right order

Recommended setup and packages

  • Use a virtual environment to avoid installing packages directly on your host machine
  • The following packages are used: numpy, h5py, matplotlib, and PIL/scipy to test the model (a minimal import sketch follows this list)
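
These imports follow directly from the package list above; treat this as a minimal sketch, since the exact set depends on how you choose to test the model.

```python
import numpy as np                # linear algebra and array handling
import h5py                       # reading the .h5 dataset files
import matplotlib.pyplot as plt   # plotting images and cost curves
from PIL import Image             # optional: loading your own test image
```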

Problem statement

When loading the data, we have:

  1. A training set of shape (209, 64, 64, 3): 209 images labeled as cat (y=1) or non-cat (y=0)
  2. A test set of shape (50, 64, 64, 3): 50 images labeled as cat or non-cat
  3. Each image is of shape (num_px, num_px, 3), where in this case num_px is 64 and the number of channels is 3
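
A sketch of how such a dataset could be loaded and reshaped for training. The file paths and HDF5 key names below (train_catvnoncat.h5, train_set_x, ...) are assumptions for illustration, not something specified in this repository.

```python
import numpy as np
import h5py

def load_dataset():
    # Assumed file and key names; adjust them to match the actual .h5 files.
    with h5py.File("datasets/train_catvnoncat.h5", "r") as f:
        train_x = np.array(f["train_set_x"])   # (209, 64, 64, 3)
        train_y = np.array(f["train_set_y"])   # (209,)
    with h5py.File("datasets/test_catvnoncat.h5", "r") as f:
        test_x = np.array(f["test_set_x"])     # (50, 64, 64, 3)
        test_y = np.array(f["test_set_y"])     # (50,)
    return train_x, train_y.reshape(1, -1), test_x, test_y.reshape(1, -1)

# Flatten each image into a column vector and scale pixel values to [0, 1].
train_x, train_y, test_x, test_y = load_dataset()
train_x_flat = train_x.reshape(train_x.shape[0], -1).T / 255.0   # (12288, 209)
test_x_flat = test_x.reshape(test_x.shape[0], -1).T / 255.0      # (12288, 50)
```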

General Architecture

The goal is to build a logistic regression classifier using a neural network mindset. To do so, we consider the following mathematical expressions for a single example $x^{(i)}$:

$$z^{(i)} = w^T x^{(i)} + b \tag{1}$$

$$\hat{y}^{(i)} = a^{(i)} = \mathrm{sigmoid}(z^{(i)}) \tag{2}$$

$$\mathcal{L}(a^{(i)}, y^{(i)}) = - y^{(i)} \log(a^{(i)}) - (1 - y^{(i)}) \log(1 - a^{(i)}) \tag{3}$$

The cost is then computed by summing over all training examples:

$$J = \frac{1}{m} \sum_{i=1}^m \mathcal{L}(a^{(i)}, y^{(i)}) \tag{4}$$
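
A sketch of how equations (1) to (4) translate into vectorized numpy code. Here X is the flattened data of shape (num_px * num_px * 3, m), Y the labels of shape (1, m), and the returned gradients are the derivatives of the cost with respect to w and b; the function names are illustrative.

```python
import numpy as np

def sigmoid(z):
    # Equation (2): a = sigmoid(z)
    return 1.0 / (1.0 + np.exp(-z))

def propagate(w, b, X, Y):
    """Forward pass: activations and cost (equations 1-4). Backward pass: gradients."""
    m = X.shape[1]
    A = sigmoid(np.dot(w.T, X) + b)                                # (1, m) activations
    cost = -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m    # equation (4)
    dw = np.dot(X, (A - Y).T) / m                                  # dJ/dw, same shape as w
    db = np.sum(A - Y) / m                                         # dJ/db, scalar
    return {"dw": dw, "db": db}, float(cost)
```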

The steps to follow from here are the following:

  • Initialize the parameters of the model, w and b (a zero-initialization sketch follows this list)
  • Learn the parameters of the model by minimizing the cost
  • Use the learned parameters to make predictions
  • Analyze the results and conclude
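
For the first step, zero initialization is enough for logistic regression (there is no symmetry to break as in deeper networks). A minimal sketch:

```python
import numpy as np

def initialize_with_zeros(dim):
    # w: one zero weight per input feature (num_px * num_px * 3); b: scalar zero.
    w = np.zeros((dim, 1))
    b = 0.0
    return w, b
```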

Building the parts of our algorithm

  1. Define the model structure
  2. Initialize the model's parameters
  3. Loop (see the sketch after this list) where:
    • the current loss is computed (forward propagation)
    • the current gradient is computed (backward propagation)
    • the parameters are updated (gradient descent)
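
A sketch of that loop, reusing the propagate function sketched earlier; the default num_iterations and learning_rate values are illustrative, not prescribed by this repository.

```python
def optimize(w, b, X, Y, num_iterations=2000, learning_rate=0.005):
    """Gradient descent: forward pass, backward pass, then parameter update."""
    costs = []
    for i in range(num_iterations):
        grads, cost = propagate(w, b, X, Y)   # forward + backward propagation
        w = w - learning_rate * grads["dw"]   # gradient descent update for w
        b = b - learning_rate * grads["db"]   # gradient descent update for b
        if i % 100 == 0:
            costs.append(cost)                # track the cost every 100 iterations
    return w, b, costs
```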

Learning rate

  • Different learning rates give different costs and thus different prediction results.
  • If the learning rate is too large (e.g. 0.01), the cost may oscillate up and down. It may even diverge (though in this example, using 0.01 still eventually ends up at a good value for the cost).
  • A lower cost doesn't necessarily mean a better model. You have to check whether the model is overfitting, which happens when the training accuracy is much higher than the test accuracy.
  • In deep learning, we usually recommend that you:
    • Choose the learning rate that best minimizes the cost function (a comparison sketch follows this list).
    • If your model overfits, use other techniques to reduce overfitting.
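
A sketch of how such a comparison could be run, reusing the hypothetical initialize_with_zeros and optimize functions and the flattened training data from the earlier sketches; the learning rates and iteration count are illustrative.

```python
import matplotlib.pyplot as plt

# Train with several learning rates and compare their cost curves.
for lr in [0.01, 0.001, 0.0001]:
    w, b = initialize_with_zeros(train_x_flat.shape[0])
    _, _, costs = optimize(w, b, train_x_flat, train_y,
                           num_iterations=1500, learning_rate=lr)
    plt.plot(costs, label=f"lr = {lr}")

plt.xlabel("iterations (per hundreds)")
plt.ylabel("cost")
plt.legend()
plt.show()
```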
