Inproceedings: Statistical Type Inference for Incomplete Programs [peng-StatisticalTypeInferenceIncompletePrograms-2023]

Abstract: We propose a novel two-stage approach, Stir, for inferring types in incomplete programs that may be ill-formed, where whole-program syntactic analysis often fails. In the first stage, Stir predicts a type tag for each token by using neural networks, and consequently, infers all the simple types in the program. In the second stage, Stir refines the complex types for the tokens with predicted complex type tags. Unlike existing machine-learning-based approaches, which solve type inference as a classification problem, Stir reduces it to a sequence-to-graph parsing problem. According to our experimental results, Stir achieves an accuracy of 97.37% for simple types. By representing complex types as directed graphs (type graphs), Stir achieves a type similarity score of 77.36% and 59.61% for complex types and zero-shot complex types, respectively.
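
To make the "type graph" idea concrete, here is a minimal sketch of how a complex type such as `Map<String, List<Integer>>` could be encoded as a directed graph and compared with another type. Everything in it is an illustrative assumption rather than Stir's actual design: the nested-tuple input format, the function names `build_type_graph` and `graph_similarity`, and above all the Jaccard overlap used for scoring, which merely stands in for the paper's type similarity score (the abstract does not define that metric).

```python
def build_type_graph(type_expr, path=()):
    """Turn a nested type expression into a directed graph.

    Hypothetical input format: a simple type is a string ("String"); a complex
    type is a tuple (constructor, arg1, arg2, ...). Each node is identified by
    its position in the type (path) plus its name; each edge points from a
    type constructor to one of its type arguments.
    """
    if isinstance(type_expr, str):
        name, args = type_expr, ()
    else:
        name, args = type_expr[0], type_expr[1:]

    root = (path, name)
    nodes, edges = {root}, set()
    for index, arg in enumerate(args):
        child_nodes, child_edges, child_root = build_type_graph(arg, path + (index,))
        nodes |= child_nodes
        edges |= child_edges
        edges.add((root, child_root))
    return nodes, edges, root


def graph_similarity(type_a, type_b):
    """Jaccard overlap of nodes and edges (an assumed stand-in metric)."""
    nodes_a, edges_a, _ = build_type_graph(type_a)
    nodes_b, edges_b, _ = build_type_graph(type_b)
    items_a = nodes_a | edges_a
    items_b = nodes_b | edges_b
    return len(items_a & items_b) / len(items_a | items_b)


# Map<String, List<Integer>> versus Map<String, List<String>>
expected = ("Map", "String", ("List", "Integer"))
predicted = ("Map", "String", ("List", "String"))
score = graph_similarity(expected, predicted)
```

A graph encoding like this lets a partially correct prediction earn partial credit: the two types above share the `Map`, `String`, and `List` nodes and two of the three edges, so the score falls between 0 and 1 instead of counting as a plain misclassification.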