SW Lab: 5th-Place Solution for the Kaggle Speech Recognition Challenge (Work in Progress)

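Google hosted a speech recognition competition on Kaggle:
https://www.kaggle.com/c/tensorflow-speech-recognition-challenge
In Korea, TensorFlow Korea selected 10 additional teams and awarded separate prizes (25 million won for 1st place, 10 million won for finishing in the top 10). (I was not selected, sadly.)

(I have not had the time to write this up properly, so I am adding to it bit by bit whenever I get a chance.)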

1. Input Data Pipeline

1-1 Normalize
1-2 Augmentation
1-3 1D Input
1-4 2D Input
1-5 1D + 2D Combination Input

2. Model

2-1 1D CNN Simple
2-2 1D CNN Residual
2-3 2D CNN InceptionResnetV2

3. Ensemble

3-1 Probability Average
3-2 Corr. Weight Average

1-1 Normalize
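import numpy as np


def normalize_wav(wav):
    """Remove the DC offset and scale the waveform so its peak magnitude is 1."""
    wav = wav - np.mean(wav)       # remove the DC offset
    wav_max = np.max(np.abs(wav))  # peak magnitude (vectorized, unlike built-in max/abs)
    if wav_max == 0:               # silent clip: avoid division by zero
        wav_max = 0.01
    return wav.astype(np.float32) / wav_max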
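
As a quick sanity check of what the normalization does, here is a small worked example. The sample values are made up for illustration, and the function is repeated in compact form only so that the snippet runs on its own.

```python
import numpy as np


def normalize_wav(wav):
    # Compact copy of the function above so this snippet is self-contained.
    wav = wav - np.mean(wav)
    wav_max = np.max(np.abs(wav))
    if wav_max == 0:
        wav_max = 0.01
    return wav.astype(np.float32) / wav_max


# Hypothetical 16-bit PCM samples with a DC offset of 300.
pcm = np.array([100, 300, 500], dtype=np.int16)

# Centering gives [-200, 0, 200]; dividing by the peak (200) maps it to [-1, 1].
normalized = normalize_wav(pcm)

# An all-zero (silent) clip stays all zeros instead of turning into NaN.
silent = normalize_wav(np.zeros(4, dtype=np.int16))
```

The 0.01 fallback only matters for silent clips, where the numerator is zero anyway, so its exact value does not affect the result.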

1-2 Augmentation

from numpy.fft import irfft  # inverse real FFT used by the colored-noise generators


def white(N, state=None):
    """
    White noise.
    
    :param N: Amount of samples.
    :param state: State of PRNG.
    :type state: :class:`np.random.RandomState`
    
    White noise has a constant power density. Its narrowband spectrum is therefore flat.
    The power in white noise will increase by a factor of two for each octave band,
    and therefore increases with 3 dB per octave.
    """
    state = np.random.RandomState() if state is None else state
    wn = state.randn(N)
    wn = normalize_wav(wn)
    return wn


def pink(N, state=None):
    """
    Pink noise. 
    
    :param N: Amount of samples.
    :param state: State of PRNG.
    :type state: :class:`np.random.RandomState`
    
    Pink noise has equal power in bands that are proportionally wide.
    Power density decreases with 3 dB per octave.
    
    """
    state = np.random.RandomState() if state is None else state
    uneven = N%2
    X = state.randn(N//2+1+uneven) + 1j * state.randn(N//2+1+uneven)
    S = np.sqrt(np.arange(len(X))+1.) # +1 to avoid divide by zero
    y = (irfft(X/S)).real
    if uneven:
        y = y[:-1]
    return normalize_wav(y)


def blue(N, state=None):
    """
    Blue noise. 
    
    :param N: Amount of samples.
    :param state: State of PRNG.
    :type state: :class:`np.random.RandomState`
    
    Power increases with 6 dB per octave.
    Power density increases with 3 dB per octave. 
    
    """
    state = np.random.RandomState() if state is None else state
    uneven = N%2
    X = state.randn(N//2+1+uneven) + 1j * state.randn(N//2+1+uneven)
    S = np.sqrt(np.arange(len(X)))  # filter: amplitude ~ sqrt(f), so power density ~ f
    y = (irfft(X*S)).real
    if uneven:
        y = y[:-1]
    return normalize_wav(y)


def brown(N, state=None):
    """
    Brown noise.
    
    :param N: Amount of samples.
    :param state: State of PRNG.
    :type state: :class:`np.random.RandomState`
    
    Power decreases with 3 dB per octave.
    Power density decreases with 6 dB per octave.
    """
    state = np.random.RandomState() if state is None else state
    uneven = N%2
    X = state.randn(N//2+1+uneven) + 1j * state.randn(N//2+1+uneven)
    S = np.arange(len(X)) + 1  # filter: amplitude ~ 1/f, so power density ~ 1/f^2 (+1 avoids divide by zero)
    y = (irfft(X/S)).real
    if uneven:
        y = y[:-1]
    return normalize_wav(y)


def violet(N, state=None):
    """
    Violet noise.
    
    :param N: Amount of samples.
    :param state: State of PRNG.
    :type state: :class:`np.random.RandomState`
    
    Power increases with +9 dB per octave.
    Power density increases with +6 dB per octave. 
    
    """
    state = np.random.RandomState() if state is None else state
    uneven = N%2
    X = state.randn(N//2+1+uneven) + 1j * state.randn(N//2+1+uneven)
    S = np.arange(len(X))  # filter: amplitude ~ f, so power density ~ f^2
    y = (irfft(X*S)).real
    if uneven:
        y = y[:-1]
    return normalize_wav(y)
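
The post does not show how these noise signals get mixed into the training clips, so the following is only a sketch of one plausible approach, not the code used in the competition. `peak_normalize`, `add_noise`, and `noise_level` are hypothetical names, the 440 Hz tone is a stand-in for a real speech clip, and the 0 to 0.3 range for the noise strength is an arbitrary assumption.

```python
import numpy as np


def peak_normalize(x):
    """Zero-mean, unit-peak scaling (same idea as normalize_wav above)."""
    x = x - np.mean(x)
    peak = np.max(np.abs(x))
    return (x / (peak if peak > 0 else 0.01)).astype(np.float32)


def add_noise(wav, noise, noise_level):
    """Mix `noise` into `wav` at a relative peak level, then re-normalize.

    `noise_level` is a hypothetical knob: 0.1 means the noise peak is
    10% of the speech peak before the final re-normalization.
    """
    if len(wav) != len(noise):
        raise ValueError("wav and noise must have the same length")
    mixed = peak_normalize(wav) + noise_level * peak_normalize(noise)
    return peak_normalize(mixed)


# Demo: a 1-second 440 Hz tone at 16 kHz stands in for a speech clip (assumption).
rng = np.random.RandomState(0)
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
clean = np.sin(2 * np.pi * 440 * t)

noise = rng.randn(sample_rate)   # white noise; a pink or brown array would work the same way
level = rng.uniform(0.0, 0.3)    # draw a random strength for each training example
noisy = add_noise(clean, noise, level)
```

Re-normalizing after the mix keeps the augmented clips on the same scale as the clean ones, so the model cannot pick up on loudness as a hint that noise was added.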

The full code is at https://github.com/ttagu99/speech_recognition


Thursday, March 1, 2018
