Rotary Position Embedding (RoPE)

A novel method to effectively leverage positional information.

- Absolute position embedding: a d-dimensional vector p is added to each token x

- Some models use a trainable vector (BERT), others use a sinusoid function (Transformer)

* sinusoid function?

A sine-wave function

A periodic function that repeats with a fixed period

The sine and cosine functions are called sinusoid functions
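As a rough illustration of the additive, sinusoid-based scheme described above, the following NumPy sketch builds the position vectors p and adds them to the token vectors x. The function name `sinusoidal_position_embedding`, the base 10000, and the toy sizes are assumptions chosen for this example, following the usual Transformer formulation; this is not code from the original post.

```python
import numpy as np


def sinusoidal_position_embedding(num_positions: int, d_model: int) -> np.ndarray:
    """Return a (num_positions, d_model) matrix of sinusoidal position vectors.

    Hypothetical helper for illustration; assumes d_model is even.
    """
    assert d_model % 2 == 0
    positions = np.arange(num_positions)[:, None]                   # (N, 1)
    freqs = 1.0 / (10000 ** (np.arange(0, d_model, 2) / d_model))   # (d/2,)
    angles = positions * freqs[None, :]                             # (N, d/2)
    pe = np.zeros((num_positions, d_model))
    pe[:, 0::2] = np.sin(angles)   # even dimensions use sine
    pe[:, 1::2] = np.cos(angles)   # odd dimensions use cosine
    return pe


# Additive form: each token vector x gets a position vector p added to it.
x = np.random.default_rng(0).normal(size=(6, 8))   # 6 toy tokens, d = 8
p = sinusoidal_position_embedding(6, 8)
x_with_pos = x + p
```

A trainable variant (as in BERT) would replace `p` with a learned (num_positions, d_model) parameter matrix, which is why it cannot extend past the maximum length seen in training.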

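- Relative position embedding (RPE): defines the relative distance between positions m and n

- clipping: distances beyond a certain threshold are clipped, so they all share the same value

- T5 does not hard-clip at a small distance; it maps distances to buckets whose ranges grow on a log scale up to a maximum offset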
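The clipping idea can be sketched in a few lines. This example only covers plain clipping of the relative distance, not the T5 log-scale bucketing; the function name `clipped_relative_positions` and the threshold `max_distance` are hypothetical names chosen for illustration.

```python
import numpy as np


def clipped_relative_positions(seq_len: int, max_distance: int) -> np.ndarray:
    """Return a (seq_len, seq_len) matrix whose entry [m, n] is clip(n - m, -k, k).

    Hypothetical helper for illustration; k is max_distance.
    """
    positions = np.arange(seq_len)
    relative = positions[None, :] - positions[:, None]   # relative[m, n] = n - m
    return np.clip(relative, -max_distance, max_distance)


rel = clipped_relative_positions(seq_len=5, max_distance=2)
# Shifting by max_distance gives indices in [0, 2k], usable as rows of a
# learned table with 2k + 1 relative-position vectors.
table_index = rel + 2
```

With this scheme every pair of positions farther apart than `max_distance` looks up the same vector, so the model cannot tell distant offsets apart.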

- Builds on the RPE approach above, but uses a multiplicative form with sinusoids instead of an additive form

- More interpretable than the additive approaches

1) encodes the absolute position with a rotation matrix

2) meanwhile incorporates the explicit relative position dependency in the self-attention formulation.

* rotation matrix?
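A 2D rotation matrix for angle t is [[cos t, -sin t], [sin t, cos t]]; it turns a vector by t without changing its length. The sketch below applies such a rotation to each consecutive pair of dimensions, with the angle set by the absolute position, and then checks numerically that the query-key dot product depends only on the relative offset m - n. The function name `rope`, the base 10000, and the pairing of adjacent dimensions are assumptions that follow the common RoFormer formulation; this is an illustration, not a reference implementation.

```python
import numpy as np


def rope(x: np.ndarray, position: int, base: float = 10000.0) -> np.ndarray:
    """Rotate each pair (x[2i], x[2i+1]) of a 1-D vector by position * theta_i.

    Hypothetical helper for illustration; assumes len(x) is even.
    """
    d = x.shape[-1]
    assert d % 2 == 0
    theta = base ** (-np.arange(0, d, 2) / d)   # one frequency per pair, (d/2,)
    angle = position * theta
    cos, sin = np.cos(angle), np.sin(angle)
    x_even, x_odd = x[0::2], x[1::2]
    out = np.empty_like(x)
    out[0::2] = x_even * cos - x_odd * sin      # first row of the 2x2 rotation
    out[1::2] = x_even * sin + x_odd * cos      # second row of the 2x2 rotation
    return out


rng = np.random.default_rng(0)
q, k = rng.normal(size=8), rng.normal(size=8)

# Same offset m - n = 3 at two different absolute positions.
score_a = rope(q, 5) @ rope(k, 2)
score_b = rope(q, 103) @ rope(k, 100)
```

`score_a` and `score_b` agree up to floating-point error: the position is encoded absolutely through the rotation, while the attention score only sees the relative position.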

1) flexibility of sequence length

2) decaying inter-token dependency with increasing relative distance

3) capability of equipping linear self-attention with relative position encoding

* linear self-attention?

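Self-attention in which the softmax similarity is replaced by a kernel feature map, so the products can be regrouped and the cost grows linearly with sequence length instead of quadratically. -> lower computational complexity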

