The Wild West of NLP Modeling, Evaluation, and Documentation

Building 2, Level 5, Room 5220


Language models trained using transformers dominate the NLP model landscape, making Hugging Face (HF) the de facto hub for sharing, benchmarking, and evaluating NLP models. The HF hub provides a rich resource for understanding how language models have evolved, opening up research questions such as ‘Is there a correlation between a model’s documentation and its usage?’, ‘How have these models evolved?’, and ‘What do users document about their models?’. In the first part of my talk, I’ll give a macro-level view of how the NLP model landscape has evolved, based on our systematic study of 75K HF models. In the second part, I’ll discuss advances, challenges, and opportunities in evaluating and documenting NLP models developed in an industry setting. Are commercial APIs from Microsoft, Google, and Amazon for NLP tasks any better than rule-based systems? Is documentation for LLMs accessible to non-expert users in the industry? Our work on creating a unified toolkit for evaluation (Robustness Gym) and reporting (Interactive Model Cards) attempts to address these questions.

Brief Biography

Nazneen is a Research Lead at Hugging Face, a startup with a mission to democratize ML, where she leads the robust ML research direction. Before HF, she worked at Salesforce Research with Richard Socher and led a team of researchers focused on building robust natural language generation systems based on LLMs. She completed her Ph.D. in CS at UT-Austin with Prof. Ray Mooney.

Nazneen has over 35 papers published at ACL, EMNLP, NAACL, NeurIPS, and ICLR, and her research has been covered by Quanta Magazine, VentureBeat, SiliconAngle, ZDNet, and Datanami. She is also teaching a course on interpreting ML models with Corise. More details about her work can be found here.
