AI 3. Single-Layer Neural Networks

Introduction

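The single-layer neural network is one of the basic models of artificial intelligence. Working through it is a good way to build a solid foundation in machine learning.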

Perceptrons and Single-Layer Neural Networks

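A perceptron consists of two parts, an input layer and an output layer. It multiplies each input value by its own weight, sums the results, and passes that sum through an activation function to determine the output. A perceptron has the form of a single-layer neural network, but it is mainly used for linear binary classification and has only one output neuron.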

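A single-layer neural network is the simplest form of artificial neural network, made up of just two layers: an input layer and an output layer. The input layer receives data from the outside world, and the output layer produces the final decision or prediction based on that data. The network applies weights between the input layer and the output layer and passes the result through an activation function to produce its output.
In short, a perceptron is the simplest artificial neural network unit: it multiplies its inputs by weights and passes the weighted sum through an activation function to determine the output. A single-layer neural network is the basic network structure built from such perceptrons, with only an input layer and an output layer.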
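
The perceptron computation described above can be sketched in a few lines. This is only an illustration: the helper names `step` and `perceptron_forward` and the weight, bias, and input values are assumptions chosen for the example, not values from this post.

```python
import numpy as np

def step(z):
    """Step activation: 1 if the weighted sum is greater than 0, else 0."""
    return np.where(z > 0, 1, 0)

def perceptron_forward(x, w, b):
    """Multiply inputs by weights, add the bias, then apply the step function."""
    return step(np.dot(w, x) + b)

# Hypothetical values, chosen only for illustration.
w = np.array([0.4, -0.2])   # one weight per input
b = -0.1                    # bias
x = np.array([1.0, 0.5])    # a single input example

z = np.dot(w, x) + b        # weighted sum: 0.4*1.0 + (-0.2)*0.5 - 0.1
y = perceptron_forward(x, w, b)
print("weighted sum:", z, "output:", int(y))
```

Because the weighted sum here is positive, the single output neuron fires.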

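Scope of application: Perceptrons are mainly applied to simple binary classification problems, whereas a single-layer neural network extends the perceptron so that it can be designed for a wider range of problems. In other words, a perceptron can be seen as one form of single-layer neural network, but not every single-layer neural network is a perceptron.
Number of output neurons: A perceptron has a single output neuron and therefore makes a single decision. A single-layer neural network can have several output neurons, which makes tasks such as multi-class classification possible.
Activation function: The original perceptron computes the weighted sum of its inputs and activates when that value exceeds a threshold (a step function). A single-layer neural network can use other activation functions (sigmoid, ReLU, and so on). A smooth function such as the sigmoid gives a differentiable output that can be read as a probability and trained with gradient descent. Note that without a hidden layer the decision boundary is still linear, so problems that are not linearly separable, such as XOR, remain out of reach.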
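
To make the point about several output neurons concrete, here is a minimal sketch of one forward pass through a single layer with three sigmoid outputs. The sizes (2 features, 3 outputs, 4 examples) and the random weights are assumptions for illustration only. The data layout (features in rows, examples in columns) follows the reference code in this post.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)          # fixed seed so the sketch is repeatable

n_features, n_outputs, n_examples = 2, 3, 4   # hypothetical sizes
W = rng.normal(size=(n_outputs, n_features))  # one row of weights per output neuron
b = np.zeros((n_outputs, 1))                  # one bias per output neuron
X = rng.normal(size=(n_features, n_examples)) # columns are examples

# Weighted sum for every output neuron and every example, then the activation.
Y = sigmoid(W @ X + b)
print(Y.shape)   # one row per output neuron, one column per example
```

Each row of `Y` is the output of one neuron, so adding neurons only adds rows to `W` and `b`.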

Implementing a Single-Layer Neural Network

The algorithm for implementing a single-layer neural network is as follows.

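Initialization: Set the initial values of the weights and the bias. The weights control how strongly each input signal affects the result, and the bias shifts the threshold at which the output activates.
Input processing: Feed the input data to the network. This example implements the AND operation, so the input is two binary values (0 or 1).
Weighted sum: Multiply each input by its weight, add up all the products, and then add the bias. The resulting value is the weighted sum of the network.
Activation function: Apply the activation function to the weighted sum. This example uses a step function, which decides the output relative to a threshold. For example, with a threshold of 0, the step function outputs 1 if the weighted sum is greater than 0 and 0 otherwise.
Output: The result of the activation function is the final output of the network. For the AND operation, the final output is 1 only when both inputs are 1.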

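Through this process, a single-layer neural network can carry out simple logical operations. The core idea of a neural network is to obtain the desired output from the combination of input data and weights, and the same idea extends to harder tasks such as pattern recognition and data classification. However, because its structure is so simple, a single-layer neural network can only separate classes with a linear boundary, so it is limited on more complex problems.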
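
The AND example can be sketched as follows. The weights `[0.5, 0.5]` and bias `-0.7` are hand-picked assumptions (many other values work as well) and the threshold is 0. Nothing is learned here; the sketch only runs the forward steps.

```python
import numpy as np

def step(z):
    """Step activation with threshold 0."""
    return np.where(z > 0, 1, 0)

# 1. Initialization (hand-picked, hypothetical values that realize AND)
w = np.array([0.5, 0.5])
b = -0.7

# 2. Input: all four combinations of two binary values, one per row
X = np.array([[0, 0],
              [0, 1],
              [1, 0],
              [1, 1]])

# 3. Weighted sum for every input row
z = X @ w + b

# 4-5. Activation and final output
y = step(z)

for x_i, y_i in zip(X, y):
    print(x_i, "->", y_i)
```

Only the input `[1, 1]` gives a positive weighted sum (0.5 + 0.5 - 0.7), so it is the only case where the output is 1.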

Using the Derivative of a Composite Function (Chain Rule)

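The key concept in training a single-layer neural network is the derivative of a composite function, that is, the chain rule. The loss is a composition of the weighted sum, the activation function, and the loss function, so the chain rule lets us compute how much the loss changes with respect to each weight by multiplying the derivatives of these stages. Propagating the error backward in this way gives the gradients used to optimize the weights.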
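
As a worked example of the chain rule for a single example, assume a sigmoid activation and the cross-entropy loss, which is what the reference code in this post uses. The symbols x (input), t (label), y (prediction), and z (weighted sum) are notation chosen here for the derivation.

```latex
\begin{aligned}
z &= w^{\top}x + b, \qquad y = \sigma(z) = \frac{1}{1+e^{-z}}, \qquad L = -\bigl[t\log y + (1-t)\log(1-y)\bigr] \\
\frac{\partial L}{\partial y} &= -\frac{t}{y} + \frac{1-t}{1-y} = \frac{y-t}{y(1-y)} \\
\frac{\partial y}{\partial z} &= y(1-y), \qquad \frac{\partial z}{\partial w} = x, \qquad \frac{\partial z}{\partial b} = 1 \\
\frac{\partial L}{\partial w} &= \frac{\partial L}{\partial y}\,\frac{\partial y}{\partial z}\,\frac{\partial z}{\partial w} = (y-t)\,x, \qquad \frac{\partial L}{\partial b} = y - t
\end{aligned}
```

Averaging these per-example gradients over all examples gives the expressions `dw = np.dot(X,(Y-t).T)/ndata` and `db = np.sum(Y-t)/ndata` used in `propagate`.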

With Two Data Points: The Cost Function as the Sum of the Per-Example Losses

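When there are two data points, the cost function is obtained by adding up the loss function evaluated on each data point (the reference code below divides this sum by the number of examples, giving the average). Computing this cost is a central step in training a machine learning model. The cost function measures how well the model's predictions agree with the actual values and is used to evaluate and optimize the model.
During training, the model's parameters are adjusted to minimize the cost function, which in turn raises the prediction accuracy. The smaller the cost, the better the model fits the data it was evaluated on.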
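
A minimal sketch of the two-example case, assuming the cross-entropy loss used in the reference code. The labels and predictions below are made-up numbers for illustration only.

```python
import numpy as np

def cross_entropy(t, y):
    """Per-example cross-entropy loss for a label t and a prediction y in (0, 1)."""
    return -(t * np.log(y) + (1 - t) * np.log(1 - y))

# Two hypothetical examples: labels and the model's predicted probabilities.
t = np.array([1.0, 0.0])
y = np.array([0.9, 0.2])

losses = cross_entropy(t, y)   # one loss value per example
cost_sum = losses.sum()        # cost as the total of the per-example losses
cost_mean = losses.mean()      # the averaged version, as in the reference code

print("per-example losses:", losses)
print("cost (sum):", cost_sum, "cost (mean):", cost_mean)
```

Both versions have the same minimizer. Dividing by the number of examples only rescales the cost, which keeps the gradient size comparable across dataset sizes.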

Reference

import numpy as np

# Load the data
output=np.load('perceptron_data.npz')
output.files

train_set_X=output['train_set_X']
train_set_Y=output['train_set_Y']

test_set_X=output['test_set_X']
test_set_Y=output['test_set_Y']
print(train_set_X.shape)
print(test_set_X.shape)

train_set_Y

def sigmoid(z):
    return 1/(1+np.exp(-z))

# Convert the y labels produced with the perceptron's sign function to 0 and 1 (negative -> 0, positive -> 1)
train_set_Y=  np.heaviside(train_set_Y, 0)
test_set_Y=  np.heaviside(test_set_Y, 0)

def initialize_with_zeros(dim):
    """
    Create a zero weight vector w of shape (dim, 1), where dim is the number of features, and a zero bias b.
    
    Returns:
    w -- initialized vector of shape (dim, 1)
    b -- initialized scalar (corresponds to the bias)
    """
    
    w = np.zeros((dim,1))
    b = 0

    assert(w.shape == (dim, 1))
    assert(isinstance(b, float) or isinstance(b, int))
    
    return w, b

train_set_Y

def propagate(w, b, X, t):
    """
    Given the data and labels:
    
    1. compute the prediction Y, and
    2. compute the cost from the labels and the predictions.
    
    The returned gradients are then used to update the weights and bias.
    

    Arguments:
    w -- weights, a numpy array of size (number of features, 1)  = (2, 1)
    b -- bias, a scalar
    X -- data of size (number of features, number of examples) = (2, number of examples)
    t -- labels (0 or 1) of size (1, number of examples)

    Return:
    grads -- dictionary with the gradients "dw" and "db"
             dw -- gradient of the cost with respect to w, thus same shape as w
             db -- gradient of the cost with respect to b, a scalar
    cost -- negative log-likelihood cost for logistic regression
    
    """
    
    ndata = X.shape[1]  # number of examples
    
    # FORWARD PROPAGATION (compute the cost)
    Z = np.dot(w.T,X)+b # weighted sum: vectorized, b is broadcast across examples
    Y = sigmoid(Z)   # activation function, applied element-wise
    # cost = -np.sum(np.multiply(t,np.log(Y))+np.multiply((1-t),np.log(1-Y )))/ndata  
    delta = 0.0001  # small offset that keeps log() away from log(0)
    cost = -np.sum(np.multiply(t,np.log(Y + (Y ==0)* delta))+np.multiply((1-t),np.log(1-Y+ (Y==1)* delta )))/ndata # compute cost
    
    
    # BACKWARD PROPAGATION (TO FIND GRAD)
    
    dw = np.dot(X,(Y-t).T)/ndata
    db = np.sum(Y-t)/ndata

    assert(dw.shape == w.shape)
    assert(db.dtype == float)
    
    grads = {"dw": dw,
             "db": db}
    
    return grads, cost

def optimize(w, b, X, t, num_iterations, learning_rate, print_cost = False):
    """
    Minimize the cost function using gradient descent.
    
    Arguments:
    w -- weights, a numpy array of size (number of features, 1)  = (2, 1)
    b -- bias, a scalar
    X -- data of size (number of features, number of examples) = (2, number of examples)
    t -- labels (0 or 1) of size (1, number of examples)
    
    num_iterations -- number of iterations of the optimization loop
    learning_rate -- learning rate of the gradient descent update rule
    print_cost -- True to print the cost every 100 steps
    
    Returns:
    params -- dictionary containing the weights w and the bias b
    grads -- dictionary containing the gradients dw and db of the cost function
    costs -- list of costs recorded during training, used to plot the learning curve
    
    """
    
    costs = []
    
    for i in range(num_iterations):
        
        grads, cost = propagate(w, b, X, t)
        dw = grads["dw"]
        db = grads["db"]
        
        # Update the weights and bias with gradient descent
        w += -learning_rate*dw # w = w + (-learning_rate * dw)
        b += -learning_rate*db # b = b + (-learning_rate * db)
        
        # Record the cost every 100 iterations
        if i % 100 == 0:
            costs.append(cost)
        
        # Optionally print the cost every 100 iterations
        if print_cost and i % 100 == 0:
            print ("Cost after iteration %i: %f" %(i, cost))
    
    params = {"w": w,
              "b": b}
    
    grads = {"dw": dw,
             "db": db}
    
    return params, grads, costs

def predict(w, b, X):
    '''
    Predict whether the label is 0 or 1 using learned logistic regression parameters (w, b)
    
    Arguments:
    w -- weights, a numpy array of size (number of features, 1)  = (2, 1)
    b -- bias, a scalar
    X -- data of size (number of features, number of examples) = (2, number of examples)
    
    Returns:
    Y_prediction -- a numpy array (vector) containing all predictions (0/1) for the examples in X
    '''
    
    ndata = X.shape[1]
    Y_prediction = np.zeros((1,ndata))
    
    Z = np.dot(w.T,X)+b 
    Y = sigmoid(Z)
    
    for i in range(Y.shape[1]):
        
        # Convert probabilities Y[0,i] to actual predictions Y_prediction[0,i]
        if Y[0,i] > 0.5 :
            Y_prediction[0,i]= 1
        else :
            Y_prediction[0,i]= 0
    
    assert(Y_prediction.shape == (1, ndata))
    
    return Y_prediction

def model(X_train, Y_train, X_test, Y_test, num_iterations = 2000, learning_rate = 0.5, print_cost = False):
    """
    Builds the logistic regression model by calling the functions defined above
    
    Arguments:
    X_train -- training input data (2, m_train)
    Y_train -- training labels (1, m_train)
    X_test -- test input data (2, m_test)
    Y_test -- test labels (1, m_test)
    num_iterations -- hyperparameter for the number of optimization iterations
    learning_rate -- hyperparameter for the learning rate
    print_cost -- Set to true to print the cost every 100 iterations
    
    Returns:
    d -- dictionary containing information about the model.
    """
    
    
    # initialize parameters with zeros 
    w, b = initialize_with_zeros(X_train.shape[0])

    # Gradient descent
    parameters, grads, costs = optimize(w, b, X_train, Y_train, num_iterations, learning_rate, print_cost)
    
    # Retrieve parameters w and b from dictionary "parameters"
    w = parameters["w"]
    b = parameters["b"]
    dw = grads["dw"]
    db = grads["db"]
    
    # Predict test/train set examples 
    Y_prediction_test = predict(w, b, X_test)
    Y_prediction_train = predict(w, b, X_train)

    # Print train/test Errors
    print("train accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_train - Y_train)) * 100))
    print("test accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_test - Y_test)) * 100))

    
    d = {"costs": costs,
         "Y_prediction_test": Y_prediction_test, 
         "Y_prediction_train" : Y_prediction_train, 
         "w" : w, 
         "b" : b,
         "dw" : dw,
         "db" : db,
         "learning_rate" : learning_rate,
         "num_iterations": num_iterations}
    
    return d

d = model(train_set_X, train_set_Y, test_set_X, test_set_Y, num_iterations = 10000, learning_rate = 0.1, print_cost = True)

# Plot learning curve (with costs)
import matplotlib.pyplot as plt

costs = np.squeeze(d['costs'])
plt.plot(costs)
plt.ylabel('cost')
plt.xlabel('iterations (per hundreds)')
plt.title("Learning rate =" + str(d["learning_rate"]))
plt.show()

no_points= [] # points with label 0
yes_points=[] # points with label 1

for i in range(train_set_X.shape[1]):
    if train_set_Y[0,i] < 1 :
        no_points = [list(train_set_X[ : ,i])]+ no_points
    else :
        yes_points= [list(train_set_X[ : ,i])]+ yes_points

w= d['w']
b= d['b']

plt.scatter(np.array(no_points)[:,0],np.array(no_points)[:,1], c='red', marker='x')
plt.scatter(np.array(yes_points)[:,0],np.array(yes_points)[:,1], c='blue', marker='o')
x=np.linspace(-20,20,20)

def f(x):
    return -x/2+5

def dec_test(x):
    return -(w[0][0]*x+ b)/ w[1][0] # w1 x1 +w2 x2 + b = 0 -> x2 = -(w1 x1 +b) / w2

plt.plot(x,f(x),color='black',label='$original$')
plt.plot(x,dec_test(x),color='green',linestyle = '--', label='Single layer ')
plt.grid()
plt.legend(loc=(1.04,0))
plt.show()

learning_rates = [2,0.1,0.01, 0.001, 0.0001]
models = {}
for i in learning_rates:
    print ("learning rate is: " + str(i))
    models[str(i)] = model(train_set_X, train_set_Y, test_set_X, test_set_Y, num_iterations = 10000, learning_rate = i, print_cost = False)
    print ('\n' + "-------------------------------------------------------" + '\n')

for i in learning_rates:
    plt.plot(np.squeeze(models[str(i)]["costs"]), label= str(models[str(i)]["learning_rate"]))

plt.ylabel('cost')
plt.xlabel('iterations (hundreds)')

legend = plt.legend(loc='upper center', shadow=True)
frame = legend.get_frame()
frame.set_facecolor('0.90')
plt.show()

# Handling the log(0) problem
np.log(0)  # evaluates to -inf and emits a RuntimeWarning

import numpy as np 
test=np.array([1,2,3,0])
np.log(test)

test == 0

delta = 0.0001
(test == 0 )*delta

np.log(test + (test == 0 )*delta)

# Behavior of the cross-entropy function
def cross_entropy(t, y):
    return -(t*np.log(y)+(1-t)*np.log(1-y))

x=np.linspace(0.001,0.999,100)
plt.plot(x,cross_entropy(1,x),color='Red',label='when t=1, -log(y)')
plt.plot(x,cross_entropy(0,x),color='Blue',linestyle = '--', label='when t=0, -log(1-y)')
plt.grid()
plt.legend(loc=(1.04,0))
plt.ylabel('cross entropy')
plt.xlabel('Y')
plt.show()



정리병 걸린 Jinger
