Speaker: Prof. Sunita Sarawagi (Professor, Computer Science and Engineering, IIT Bombay)
Large language models shared by several users are becoming the method of choice for a variety of NLP tasks. Our focus is the Text-to-SQL semantic parsing task. When the model is presented with a private relational database for which only a limited amount of supervision is available, it is unclear how to adapt the shared model to generate SQL queries targeted at that private database. A standard approach is to include relevant labeled examples in the prompt for each test example. In this talk we argue for an alternative architecture in which the labeled examples serve as lookup cases that modify the decoding algorithm. In this context, we present a convergent algorithm for conditionally matching relational algebra trees, obtained by modifying the existing Sinkhorn algorithm. Even when paired labeled examples are not available, we show how, given only an SQL workload from a private relational database, we can generate a diverse labeled dataset for adapting large models to natural language queries on unseen databases. We conclude with a discussion of open challenges in this area.
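For readers unfamiliar with the Sinkhorn algorithm mentioned in the abstract, the following is a minimal sketch of the standard entropy-regularized Sinkhorn iteration only: it alternately rescales the rows and columns of a kernel matrix until the resulting transport plan matches the target marginals. It is not the conditional relational-algebra-tree matching variant presented in the talk. The function name `sinkhorn`, the uniform marginals, and the parameters `reg`, `n_iters` and `tol` are illustrative assumptions.

```python
import math


def sinkhorn(cost, reg=0.1, n_iters=200, tol=1e-9):
    """Standard entropy-regularized Sinkhorn scaling between uniform marginals.

    Illustrative sketch only; NOT the talk's conditional tree-matching variant.
    """
    n, m = len(cost), len(cost[0])
    a = [1.0 / n] * n  # target row marginal (assumed uniform)
    b = [1.0 / m] * m  # target column marginal (assumed uniform)
    # Gibbs kernel: low cost -> large affinity
    K = [[math.exp(-c / reg) for c in row] for row in cost]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Rescale rows so that row sums of diag(u) K diag(v) equal a
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        # Rescale columns so that column sums equal b
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
        # Columns now match exactly; stop once rows also match within tol
        row_err = max(
            abs(sum(u[i] * K[i][j] * v[j] for j in range(m)) - a[i])
            for i in range(n)
        )
        if row_err < tol:
            break
    # Transport plan P = diag(u) K diag(v)
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
```

A soft matching can be read off the returned plan by taking, for each row, the column with the largest mass; a smaller `reg` pushes the plan closer to a hard one-to-one assignment, at the cost of slower convergence.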
Slides: TBA
Meeting Recording:
Dr. Sunita Sarawagi is the Institute Chair Professor in Computer Science and Engineering at IIT Bombay. She is an AI researcher with several research papers and patents to her name. She was a visiting AI scientist at Google Research, Mountain View, California, from July 2014 to June 2016. From August 1996 to February 1999, she was a research staff member in the QUEST database mining group at the IBM Almaden Research Center. She received the best paper award at the ACM SIGMOD International Conference on Management of Data in 1998.