What Next?: Leveraging Surprise in a Recurrent Neural Network to (de)Construct Morphological Complexity in Japanese
The question of how we, as cognitive agents and biological creatures acting within a world, describe, understand, and communicate about this shared world and our motivations within it has long stood at the center of cognitive science. The ability to use language has even been hailed as a hallmark of what it is to have a mind. Language transforms complex, non-linear information about the world and ourselves into linear expressions communicated in real time, and yet every language seems to accomplish this in its own way. Formal language modeling exhibits a bias towards the linguistic features of English and related languages, namely the ability to describe language in terms of word-based units. This bias has left language models ill-equipped for languages with a high degree of morphological complexity, such as the agglutinative morphology of Japanese, and points to a shortcoming in our understanding of the underlying cognitive processes that allow us to be thinkers, speakers, and listeners. This study challenges the traditional scales at which we conceptualize information processing in language. I employ a recurrent neural network (RNN) to explore character-by-character predictability in samples of contemporary Japanese text, asking which properties of agglutinative morphology are salient to neural networks and offering a comparison between meaning construction in Japanese and English. The study further investigates the unique morpho-syntactic role played by orthography in Japanese. I find that an RNN can extract key features of the structure of Japanese sentences, and I offer a path toward a deeper understanding of how the differences in information encoding and processing that we observe across languages arise.
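For concreteness, the "surprise" and character-by-character predictability invoked here can be read as information-theoretic surprisal; the sketch below uses the standard definition, assuming the RNN supplies a conditional distribution over the next character, and is offered as an illustrative gloss rather than the study's exact formulation:

$$ s(c_t) = -\log_2 P\!\left(c_t \mid c_1, \ldots, c_{t-1}\right) $$

Under this reading, characters the model assigns low conditional probability receive high surprisal, and the profile of surprisal across a sentence is the signal used to probe morphological and syntactic structure.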