Justin Smith | publications

Journal Papers

How Developers Diagnose Potential Security Vulnerabilities with a Static Analysis Tool. Smith, Justin, Johnson, Brittany, Murphy-Hill, Emerson, Chu, Bei-Tseng, and Richter, Heather. IEEE Transactions on Software Engineering, 2018.
While using security tools to resolve security defects, software developers must apply considerable effort. Success depends on a developer’s ability to interact with tools, ask the right questions, and make strategic decisions. To build better security tools and subsequently help developers resolve defects more accurately and efficiently, we studied the defect resolution process — from the questions developers ask to their strategies for answering them. In this paper, we report on an exploratory study with novice and experienced software developers. We equipped them with Find Security Bugs, a security-oriented static analysis tool, and observed their interactions with security vulnerabilities in an open-source system that they had previously contributed to. We found that they asked questions not only about security vulnerabilities, associated attacks, and fixes, but also questions about the software itself, the social ecosystem that built the software, and related resources and tools. We describe the strategic successes and failures we observed and how future tools can leverage our findings to encourage better strategies.

Conference Papers

Does ACM’s Code of Ethics Change Ethical Decision Making in Software Development? McNamara, Andrew, Smith, Justin, and Murphy-Hill, Emerson. In Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2018.
Ethical decisions in software development can substantially impact end-users, organizations, and our environment, as is evidenced by recent ethics scandals in the news. Organizations, like the ACM, publish codes of ethics to guide software-related ethical decisions. In fact, the ACM has recently demonstrated renewed interest in its code of ethics and made updates for the first time since 1992. To better understand how the ACM code of ethics changes software-related decisions, we replicated a prior behavioral ethics study with 63 software engineering students and 105 professional software developers, measuring their responses to 11 ethical vignettes. We found that explicitly instructing participants to consider the ACM code of ethics in their decision making had no observed effect when compared with a control group. Our findings suggest a challenge to the research community: if not a code of ethics, what techniques can improve ethical decision making in software engineering?
Spreadsheet Practices and Challenges in a Large Multinational Conglomerate. Smith, Justin, Middleton, Justin A., and Kraft, Nicholas A. In Visual Languages and Human-Centric Computing (VL/HCC), 2017.
Spreadsheets are ubiquitous. Thus, it is important to understand the challenges faced by spreadsheet users in practice. To better understand these challenges, we surveyed ABB employees and then interviewed a cross-section of survey respondents. We used a two-phase coding process to classify the challenges they described. Our survey findings demonstrate that practices in our single-company setting are consistent with practices in broader settings. Our interviews revealed both individual and organizational challenges. For instance, individual participants described data pipeline challenges related to importing data from external sources or storing and archiving spreadsheet data. Further, participants’ collective responses revealed challenges pertaining to knowledge distribution within the organization. We outline possible interventions to address these challenges. Our results will help guide researchers and tool designers in addressing the practical challenges facing spreadsheet users.
Flower: Navigating Program Flow in the IDE. Smith, Justin, Brown, Chris, and Murphy-Hill, Emerson. In Visual Languages and Human-Centric Computing (VL/HCC), 2017.
Program navigation is a critical task for software developers. State-of-the-art tools have been shown to support effective program navigation strategies, and do so by adding widgets, secondary views, and visualizations to the screen. In this work, we build on prior work by exploring what types of navigation can be supported with relatively few interface elements. To that end, we designed and implemented a prototype tool, named Flower, that supports structural program navigation while maintaining a minimalistic interface. Flower enables developers to simultaneously navigate control flow and data flow within the Eclipse Integrated Development Environment. Based on a preliminary evaluation with eight programmers, Flower succeeded when call graphs contained relatively few branches, but was strained by complex program structures.
Do Developers Read Compiler Error Messages? Barik, Titus, Smith, Justin, Lubick, Kevin, Holmes, Elisabeth, Feng, Jing, Murphy-Hill, Emerson, and Parnin, Chris. In Proceedings of the 39th International Conference on Software Engineering, 2017.
In integrated development environments, developers receive compiler error messages through a variety of textual and visual mechanisms, such as popups and wavy red underlines. Although error messages are the primary means of communicating defects to developers, researchers have a limited understanding of how developers actually use these messages to resolve defects. To understand how developers use error messages, we conducted an eye tracking study with 56 participants from undergraduate and graduate software engineering courses at our university. The participants attempted to resolve common, yet problematic defects in a Java code base within the Eclipse development environment. We found that: 1) participants read error messages and the difficulty of reading these messages is comparable to the difficulty of reading source code, 2) difficulty reading error messages significantly predicts participants’ task performance, and 3) participants allocate a substantial portion of their total task to reading error messages (13%–25%). The results of our study offer empirical justification for the need to improve compiler error messages for developers.
Just-in-Time Static Analysis. Do, Lisa Nguyen Quang, Ali, Karim, Livshits, Benjamin, Bodden, Eric, Smith, Justin, and Murphy-Hill, Emerson. In Proceedings of the 26th ACM SIGSOFT International Symposium on Software Testing and Analysis, 2017.
We present the concept of Just-In-Time (JIT) static analysis that interleaves code development and bug fixing in an integrated development environment. Unlike traditional batch-style analysis tools, a JIT analysis tool presents warnings to code developers over time, providing the most relevant results quickly, and computing less relevant results incrementally later. In this paper, we describe general guidelines for designing JIT analyses. We also present a general recipe for transforming static data-flow analyses to JIT analyses through a concept of layered analysis execution. We illustrate this transformation through CHEETAH, a JIT taint analysis for Android applications. Our empirical evaluation of CHEETAH on real-world applications shows that our approach returns warnings quickly enough to avoid disrupting the normal workflow of developers. This result is confirmed by our user study, in which developers fixed data leaks twice as fast when using CHEETAH compared to an equivalent batch-style analysis.
A Cross-Tool Communication Study on Program Analysis Tool Notifications. Johnson, Brittany, Pandita, Rahul, Smith, Justin, Ford, Denae, Elder, Sarah, Murphy-Hill, Emerson, Heckman, Sarah, and Sadowski, Caitlin. In Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, 2016.
Program analysis tools use notifications to communicate with developers, but previous research suggests that developers encounter challenges that impede this communication. This paper describes a qualitative study that identifies 10 kinds of challenges that cause notifications to miscommunicate with developers. Our resulting notification communication theory reveals that many challenges span multiple tools and multiple levels of developer experience. Our results suggest that, for example, future tools that model developer experience could improve communication and help developers build more accurate mental models.
Paradise Unplugged: Identifying Barriers for Female Participation on Stack Overflow. Ford, Denae, Smith, Justin, Guo, Philip J, and Parnin, Chris. In Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, 2016.
It is no secret that females engage less in programming fields than males. However, in online communities, such as Stack Overflow, this gender gap is even more extreme: only 5.8% of contributors are female. In this paper, we use a mixed-methods approach to identify contribution barriers females face in online communities. Through 22 semi-structured interviews with a spectrum of female users ranging from non-contributors to a top 100 ranked user of all time, we identified 14 barriers preventing them from contributing to Stack Overflow. We then conducted a survey with 1470 female and male developers to confirm which barriers are gender related or general problems for everyone. Females ranked five barriers significantly higher than males. A few of these include doubts in the level of expertise needed to contribute, feeling overwhelmed when competing with a large number of users, and limited awareness of site features. Still, there were other barriers that equally impacted all Stack Overflow users or affected particular groups, such as industry programmers. Finally, we describe several implications that may encourage increased participation in the Stack Overflow community across genders and other demographics.
FUSE: A Reproducible, Extendable, Internet-Scale Corpus of Spreadsheets. Barik, Titus, Lubick, Kevin, Smith, Justin, Slankas, John, and Murphy-Hill, Emerson. In Proceedings of the 12th Working Conference on Mining Software Repositories, 2015.
Spreadsheets are perhaps the most ubiquitous form of end-user programming software. This paper describes a corpus, called Fuse, containing 2,127,284 URLs that return spreadsheets (and their HTTP server responses), and 249,376 unique spreadsheets, contained within a public web archive of over 26.83 billion pages. Obtained using nearly 60,000 hours of computation, the resulting corpus exhibits several useful properties over prior spreadsheet corpora, including reproducibility and extendability. Our corpus is unencumbered by any license agreements, available to all, and intended for wide usage by end-user software engineering researchers. In this paper, we detail the data and the spreadsheet extraction process, describe the data schema, and discuss the trade-offs of Fuse with other corpora.
Questions Developers Ask While Diagnosing Potential Security Vulnerabilities with Static Analysis. Smith, Justin, Johnson, Brittany, Murphy-Hill, Emerson, Chu, Bill, and Lipford, Heather Richter. In Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering, 2015.
Security tools can help developers answer questions about potential vulnerabilities in their code. A better understanding of the types of questions asked by developers may help toolsmiths design more effective tools. In this paper, we describe how we collected and categorized these questions by conducting an exploratory study with novice and experienced software developers. We equipped them with Find Security Bugs, a security-oriented static analysis tool, and observed their interactions with security vulnerabilities in an open-source system that they had previously contributed to. We found that they asked questions not only about security vulnerabilities, associated attacks, and fixes, but also questions about the software itself, the social ecosystem that built the software, and related resources and tools. For example, when participants asked questions about the source of tainted data, their tools forced them to make imperfect tradeoffs between systematic and ad hoc program navigation strategies.
A Study of Interactive Code Annotation for Access Control Vulnerabilities. Thomas, Tyler, Chu, Bill, Lipford, Heather, Smith, Justin, and Murphy-Hill, Emerson. In Visual Languages and Human-Centric Computing (VL/HCC), 2015 IEEE Symposium on, 2015.
While there are a variety of existing tools to help detect security vulnerabilities in code, they are seldom used by developers due to the time or security expertise required. We are investigating techniques integrated within the IDE to help developers detect and mitigate security vulnerabilities. In this paper, we examine using interactive annotation for access control vulnerabilities. We evaluated whether developers could indicate access control logic using interactive annotation and understand the vulnerabilities reported as a result. Our study indicates that developers can easily find and annotate access control logic but can struggle to use our tool to trace the cause of the vulnerability. Our results provide design guidance for improving the interaction and communication of such security tools with developers.

Workshop Papers

What Questions Remain? An Examination of How Developers Understand an Interactive Static Analysis Tool. Thomas, Tyler, Lipford, Heather, Chu, Bill, Smith, Justin, and Murphy-Hill, Emerson R. In WSIW at SOUPS, 2016.
Security vulnerabilities are often accidentally introduced as developers implement code. While there are a variety of existing tools to help detect security vulnerabilities, they are seldom used by developers due to the time or security expertise required. We are investigating techniques integrated within the IDE to help developers detect and mitigate security vulnerabilities. In previous work, we examined the questions developers ask when investigating security vulnerabilities with static analysis tools. With those questions as a lens, we now investigate our proposed approach of interactive static analysis. We evaluated the interactions and perceptions of professional developers as they interacted with warnings produced by our tool. Our results provide evidence that our approach effectively communicates security vulnerability information to software developers and provides design guidance for such tools.

Other Peer-Reviewed Papers (GC & Tools)

Supporting Effective Strategies for Resolving Vulnerabilities Reported by Static Analysis Tools. Smith, J. In 2018 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC), 2018.
Static analysis tools detect potentially costly security defects early in the software development process. However, these defects can be difficult for developers to accurately and efficiently resolve. The goal of this work is to understand the vulnerability resolution process so that we can build tools that support more effective strategies for resolving vulnerabilities. In this work, I study developers as they resolve security vulnerabilities to identify their information needs and current strategies. Next, I study existing tools to understand how they support developers’ strategies. Finally, I plan to demonstrate how strategy-aware tools can help developers resolve security vulnerabilities more accurately and efficiently.
Cheetah: Just-in-Time Taint Analysis for Android Apps. Do, Lisa Nguyen Quang, Ali, Karim, Livshits, Benjamin, Bodden, Eric, Smith, Justin, and Murphy-Hill, Emerson. In Software Engineering Companion (ICSE-C), 2017 IEEE/ACM 39th International Conference on, 2017.
Current static-analysis tools are often long-running, which causes them to be sidelined into nightly build checks. As a result, developers rarely use such tools to detect bugs when writing code, because they disrupt their workflow. In this paper, we present Cheetah, a static taint analysis tool for Android apps that interleaves bug fixing and code development in the Eclipse integrated development environment. Cheetah is based on the novel concept of Just-in-Time static analysis that discovers and reports the most relevant results to the developer fast, and computes the more complex results incrementally later. Unlike traditional batch-style static-analysis tools, Cheetah causes minimal disruption to the developer’s workflow. This video demo showcases the main features of Cheetah: https://www.youtube.com/watch?v=i_KQD-GTBdA.
Identifying Successful Strategies for Resolving Static Analysis Notifications. Smith, Justin. In Proceedings of the 38th International Conference on Software Engineering Companion, 2016.
Although static analysis tools detect potential code defects early in the development process, they do not fully support developers in resolving those defects. To accurately and efficiently resolve defects, developers must orchestrate several complex tasks, such as determining whether the defect is a false positive and updating the source code without introducing new defects. Without good defect resolution strategies developers may resolve defects erroneously or inefficiently. In this work, I perform a preliminary analysis of the successful and unsuccessful strategies developers use to resolve defects. Based on the successful strategies identified, I then outline a tool to support developers throughout the defect resolution process.
Resolving Input Validation Vulnerabilities by Retracing Taint Flow Through Source Code. Smith, Justin. In Visual Languages and Human-Centric Computing (VL/HCC), 2016 IEEE Symposium on, 2016.
Various security-oriented static analysis tools are designed to detect potential input validation vulnerabilities early in the development process. To verify and resolve these vulnerabilities, developers must retrace problematic data flows through the source code. My thesis proposes that existing tools do not adequately support the navigation of these traces. In this work I will explore the strategies developers use to navigate tainted data flow in source code and work toward solutions that support successful strategies.