Tracy Bannon discusses implementing generative AI in software engineering, with the goal of enhancing speed and quality to meet the demands of end users.
Tracy Bannon, a passionate Software Architect and Change Agent, engages extensively in writing, speaking, teaching, and practicing software architecture. With experience spanning commercial and government sectors, her expertise lies in integrating advanced software practices such as AI/ML and Generative AI into the software development lifecycle.
Software plays a pivotal role in shaping our world. QCon London aims to advance software development by disseminating knowledge and fostering innovation within the developer community. As a conference crafted for practitioner-driven exchanges, QCon targets technical team leads, architects, engineering directors, and project managers who drive innovation in their teams.
Bannon shared insights from her urban exploration, highlighting how reliant we’ve become on navigational technologies. From smartphones to embedded systems in vehicles and wearable tech, navigation is seamlessly integrated into our daily lives, though it wasn’t always this prevalent or accessible.
When I learned to drive, it was with a physical map. In fact, my ability to fold the map back perfectly was even evaluated. Such skills became redundant with the advent of digital technology. During this shift, the wealth of cartographic data was transformed into digital formats. Innovators proposed an interface where users could input their start and end points and receive sequential directions, although initially those directions still had to be printed. The occupant of the passenger’s seat played the role of navigator, announcing, “In 100 meters, take a left, the ramp onto the M4.”
It wasn’t long before devices like Garmin and TomTom entered the scene, combining detailed maps, vocal guidance, and dedicated hardware into a seamless navigational assistant. My children began driving with a TomTom, but I made sure they learned map reading too, valuable for those times when “the signal was lost.” Navigation technology eventually became ubiquitous. In 2008, the launch of the iPhone 3G with built-in GPS changed everything, empowering us to locate ourselves in real time, track orders, and monitor transport arrivals. This progression toward an expectation of immediacy and real-time information parallels what we are witnessing today in AI and software engineering, particularly in generative AI and its applications in software development.
I am Tracy Bannon, a lover of word clouds, a software architect, and a researcher at the MITRE Corporation, a federally funded research organization. There, we prioritize delivering honest technological insights without a sales agenda. Let’s take a moment to reflect on 2023: remember when ChatGPT reached a user base of 100 million? Suddenly, AI-themed content seemed to permeate every digital conversation, a phenomenon I call Chronic FOMO. AI appeared so ubiquitous that it seemed plausible for grocery items to bear AI labels. Yet, amidst this hype, it’s important to stay grounded. At conferences like QCon and on platforms like InfoQ, we discuss the real trajectory of AI adoption beyond the excitement, sometimes referencing models like the Gartner Hype Cycle to visualize this progression.
Words like these invite reflection. Where do we stand with AI in software engineering? Are we navigating Gartner’s Hype Cycle stages – the peak of inflated expectations, the slope of enlightenment? Or have we reached the plateau of productivity? I agree with Gartner that we are at the peak of inflated expectations, though Gartner is not always timely with its observations; it often feels we are further along by the time they report it. Intriguingly, they believe we will reach the plateau of productivity in two to five years.
Many might wonder if this timeline holds up to scrutiny. From my perspective, informed by both experience and research, it aligns with reality. The work of software architects and engineers is intricately complex, requiring decisions that are not straightforward. Grady Booch encapsulates this well: the entire history of software engineering has been a climb toward ever-higher levels of abstraction. The necessity of orchestration platforms and diverse libraries to scaffold and enhance AI, and specifically generative AI, is well discussed within the community.
Privileged to collaborate with around 200 top-tier data scientists and engineers, I’ve explored the vast landscape of AI beyond generative AI. While preparing for this QCon presentation, intense discussions and far too many slides eventually led me to a simpler analogy: Legos spilled across a table, symbolizing the expansive, interlocking domain of AI, of which generative AI is just one fragment.
Diverse AI and machine learning types and algorithms exist, each with specific utilities within various stages of the software development lifecycle, including DevSecOps. Since I first presented this in October last year, several new applications have emerged. It’s crucial to recognize that while AI and machine learning have been integral to our advancements, the journey toward more complex digital twins in cyber-physical systems extends beyond just writing new code. It calls for a blend of deterministic and non-deterministic approaches, where the unpredictable yet potent nature of generative AI both excites and challenges us.
Treat generative AI like an eager young apprentice. Imagine them not as a college graduate but as an energetic 15-year-old. Occasionally, they get something impressively right, which leaves you delighted. However, oftentimes you find yourself puzzled, questioning their logic. This sentiment parallels narratives we’ve encountered about AI and machine learning. It’s critical to monitor these closely.
Let me take a moment to reinforce that these aren’t just personal observations – current research supports these views. Various service providers of AI technologies are now incorporating numerous disclaimers and offering extensive guidance, emphasizing the importance of maintaining human oversight. Have you considered whether generative AI aligns with DevSecOps principles? It doesn’t. Consider traceability: if outputs are produced by an unknowable entity, tracking their origins becomes challenging.
Then there’s auditability, a core component of DevSecOps. How can we audit what we don’t understand? Or take reproducibility – ever tried clicking the ‘regenerate’ button? Does the result replicate the original? Also, think about explainability: can you comprehend what was just created, whether it’s a test, code, script, or something completely different? Security concerns are also paramount and will be extensively discussed.
A survey involving over 500 developers found that 56% were incorporating AI in their work, and all these developers reported encountering security issues with automated code completion or generation. Moreover, the use of generative AI potentially decreases collaboration. Why might this be? If you’re focusing on interacting with an AI rather than a human colleague, then you’re likely neglecting essential human interactions and team collaboration.
The discussion so far has emphasized diminishing collaboration, but who is actually using generative AI to build software today? Our conversations have revolved around delivering capabilities to users, but what about how we ourselves create software and implement those capabilities? As a researcher, I do not overlook industry insights or substantial surveys. Yes, Stack Overflow enthusiasts.
In a recent Stack Overflow survey, 37,000 developers participated, and 44% of them are currently employing AI in their professional roles. Another 25% expressed a keen interest in using AI, which may or may not be attributable to FOMO. What are those 44% using AI for? Let me share some figures: 82% use AI to generate code, 48% for debugging, and 34% for documentation. My personal favorite use is explaining existing code bases. Less than 25% deploy AI for software testing.
Here is a genuine anecdote about how my team and I used AI around January to help analyze our requirements. We engaged with our user base and, with their consent, recorded our conversations and used the transcriptions for analysis with a GPT tool. We also gathered input through crowd-sourced surveys, primarily freeform in nature.
We kept the approach simple, and through detailed prompts we identified sentiments and needs in the requirements that had previously gone unnoticed. This illustrates a powerful application of AI in requirements analysis: you feed in your own language and extract vital information, rather than relying on AI-generated content. It is crucial to check every prompt into your version control system for consistency and tracking.
It’s essential not only to put your prompts under version control but also to be mindful of which models or services are being applied, as each interacts differently with the prompts you set up. Dataset diversity also matters, since models have shown bias. Ensuring diversity among the user groups from whom requirements are gathered is already best practice, and the inherent biases in models make it even more important. That means diversifying the datasets, the interviewees, and the people you consult, alongside consistent, thorough testing with human oversight.
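To make that concrete, here is a minimal sketch in Python of how a prompt might be captured as a versioned artifact that records which model it was written against. The record fields, prompt wording, and model tags are all hypothetical illustrations, not a prescribed format:

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class PromptRecord:
    """A prompt artifact intended to live in version control alongside code."""
    name: str
    text: str
    model: str          # which model/service the prompt was tuned against
    model_version: str  # behavior can shift between model versions
    last_reviewed: date # when a human last validated the prompt's output

# Hypothetical example: the requirements-analysis prompt from the anecdote.
requirements_sentiment = PromptRecord(
    name="requirements-sentiment-v1",
    text=(
        "You are analyzing interview transcripts with end users. "
        "List each stated need, the sentiment behind it, and a verbatim quote."
    ),
    model="gpt-4",            # illustrative choice, not an endorsement
    model_version="2024-05",  # illustrative version tag
    last_reviewed=date(2024, 5, 1),
)
```

Checking a record like this into the same repository as the code keeps prompt changes reviewable and traceable, just like any other change.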
I find generative AI particularly beneficial for test cases. A piece of research released in January gave me pause: only 47% of organizations have automated their testing processes. In industries involving cyber-physical systems, like the military applications I’m engaged with, I believe that percentage should be much higher. It also means 53% are still conducting manual tests. It’s vital to accept the ongoing presence of manual testing and to equip quality assurance professionals with the tools they need, such as chat engines, for effective testing.
Ensuring that QA specialists are equipped with functional requirements, manual test cases, scenarios, user stories, and journey maps is vital. Engaging them in chain-of-thought prompting with tools like GPT can be incredibly supportive; its utility might surprise you. The Stack Overflow survey also highlights that while 55% are interested in using generative AI for testing, only 3% actually trust it, likely due to its non-deterministic nature. Generative AI can also be instrumental in creating synthetic test data, though not without challenges around precision and other potential issues.
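As an illustration of the kind of chain-of-thought prompt a QA specialist might try, here is a minimal sketch assuming the OpenAI Python SDK; the model name, prompt wording, and user story are illustrative only, and any output still goes to a human for review:

```python
from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

user_story = (
    "As a returning customer, I want to reuse a saved address at checkout "
    "so that I can complete my order faster."
)

# Chain-of-thought style prompt: ask the model to reason through the
# scenario step by step before proposing test cases.
prompt = (
    "Think step by step. First list the actors, preconditions, and edge "
    "cases for this user story. Then draft manual test cases, each with "
    "numbered steps and an expected result, for a QA engineer to review:\n\n"
    + user_story
)

response = client.chat.completions.create(
    model="gpt-4o",  # illustrative model choice
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)  # a human reviews before anything is adopted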
One significant concern is privacy. Utilizing parts of your data in third-party subscription models poses risks if not managed in-house, leading to potential data privacy and integrity issues. Awareness of how your data is handled when outsourced is crucial. Moreover, testing can also produce irrelevant outcomes, a phenomenon occasionally observed as ‘hallucinations’ in test generation. This underscores the importance of maintaining transparency and explainability, as the results and code generated may not always align with expectations.
The tech industry’s subtle shift in terminology from ‘code generation’ to ‘code completion’ sheds light on its current evolution; the renaming itself suggests a recalibration of expectations. Adapting to these tools involves a willingness to expose your code base to emerging models, whether hosted or local. The code produced usually appears well-structured and nicely formatted, though it might not always function correctly. According to a Purdue University study, about 52% of responses to software engineering prompts are incorrect, indicating the risk of unreliable output. This highlights not only the capabilities of these tools but also their limitations and the necessity for cautious engagement.
The plethora of choices even for a minor coding request can lead to what is known as decision fatigue, a phenomenon previously studied primarily in high-stakes fields like medicine and the military. However, it’s becoming relevant in software development, particularly with tools integrated within development environments that aim to assist programmers. These tools can alleviate the overwhelming feeling of starting from scratch but might contribute to mental exhaustion over time with considerable use.
Certain considerations need to be taken into account, especially the varied impact on productivity among different users. Novices or less experienced team members might not see productivity boosts as significant as those of seasoned professionals, who can more readily identify and understand issues in the code. Additionally, a company called GitClear tracked code churn across various industries from 2019 to 2023, finding a stable trend that highlights persistent challenges in software development practice.
Code churn is the repeated checking in and reworking of the same code, adjusting and re-examining it frequently because of continuing issues or updates. Heading into 2024, the rate of code churn is predicted to double, and it is uncertain whether this is driven by generative AI or by other factors. The uptick apparently leads to less secure software, despite the common hesitation to accept this. An anecdote I can share: in March, using GitHub Copilot within a Java codebase, despite careful and thoughtful prompts, critical security vulnerabilities like OWASP Top 10s appeared quite easily. A Stanford study supports this as well, suggesting that well-formatted code can deceive us into overlooking embedded security flaws. The situation underscores the need for more thorough testing and human oversight; the example below illustrates the category of flaw.
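This is a constructed Python example, not Copilot’s actual output, but it shows the kind of flaw that well-formatted code can hide: an injection-prone query next to the parameterized version a human review should insist on.

```python
import sqlite3

def find_user_generated_style(conn: sqlite3.Connection, username: str):
    # The kind of code an assistant can plausibly produce: tidy, readable,
    # and vulnerable to SQL injection (OWASP Top 10, A03: Injection).
    query = f"SELECT id, name FROM users WHERE name = '{username}'"
    return conn.execute(query).fetchall()

def find_user_reviewed(conn: sqlite3.Connection, username: str):
    # What review should insist on: a parameterized query, so the driver
    # handles untrusted input safely instead of splicing it into SQL.
    query = "SELECT id, name FROM users WHERE name = ?"
    return conn.execute(query, (username,)).fetchall()
```

Passing `username = "' OR '1'='1"` to the first function returns every row; the second treats it as a literal string. Both versions look equally well-formatted in a diff.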
Moreover, generative AI tools need careful scrutiny for reliability. An interesting point highlighted by a study from North Carolina State University reveals that a significant 58% of professionals tend to “cop out” during code reviews, focusing only on the differences. This becomes problematic when dealing with less familiar coders or new team members. For instance, my colleague Carlton, a technical lead, normally relies on quick diffs for reviewing code from his long-trusted developer, Stephen. However, the introduction of AI tools and the induced haste in development might warrant a more meticulous review approach, potentially altering the trust dynamics and operational methods like pair programming.
This emphasizes the crucial role of comprehensive human involvement in code reviews, particularly as tools and workflows evolve, advocating for a shift in oversight strategies and possibly more adaptive team dynamics.
You might consider incorporating a domain expert at some point and possibly rotating roles more frequently. Additionally, think about assembling a group that can enhance your capabilities in SAST and more comprehensive static analysis. Recently, GitLab announced their acquisition of a company specializing in SAST to enhance the scanning processes within the DevOps pipeline, which emphasizes the necessity to closely monitor our coding practices.
It’s important to maintain a separation between generating code and generating tests. Each process requires independent verification to mitigate bias and avoid potential blind spots, which is a fundamental precaution. Additionally, there’s a phenomenon known as overfitting where a model adjusts too closely or exactly to a particular set of data and ignores broader application, similar to how tests might focus too narrowly on one section of your code base at the expense of others. While using generative AI tools is not discouraged, being aware of their limits and preparing accordingly is vital.
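A tiny, contrived pytest example of that overfitting risk, using a made-up `apply_discount` function: the first test pins a single happy-path example and silently blesses whatever the current code does, while the second probes boundaries.

```python
# test_pricing.py -- run with pytest
def apply_discount(price: float, percent: float) -> float:
    """Apply a percentage discount and round to cents."""
    return round(price * (1 - percent / 100), 2)

def test_overfit():
    # An overfit, generated-style test: one example, no edge cases.
    assert apply_discount(100.0, 10) == 90.0

def test_boundaries():
    # Independent review pushes toward behavior and boundary conditions.
    assert apply_discount(100.0, 0) == 100.0   # no discount
    assert apply_discount(100.0, 100) == 0.0   # full discount
    assert apply_discount(80.0, 25) == 60.0    # typical mid-range case
```

Generating the code and its tests in one pass tends to produce the first style; independent verification pushes toward the second.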
Assessing your organization’s readiness for implementing generative AI in software engineering starts with evaluating the robustness of your SDLC. If your processes are solid, leveraging generative AI might be beneficial. However, if there are foundational issues, it’s advisable not to just add generative AI into the mix without addressing core basics first. When I join a new organization or team, I often start by asking whether the team has ownership over their production path. This question leads to a series of follow-up queries that help diagnose existing challenges and decide if it’s necessary to reassess fundamental practices. During the 2021 lockdowns, while attending the virtual DevOps Enterprise Summit, I explored various interactive tools and discussed these topics with peers, including an introduction to Chris Swan through a friend.
My colleague Bryan Finster and I, along with several others, found ourselves in a heated debate full of frustration. The common gripe was the obstacles to adopting DevSecOps and implementing a CI/CD pipeline. Faced with numerous excuses, we decided to establish a definitive guide, which we host at minimumcd.org. Our role is mainly to curate this open-source documentation focused on the essentials of CI/CD.
So, what exactly are these minimums? Before layering on AI or additional complexity, adhere to the basic practices of continuous integration, such as not leaving uncommitted code on local machines overnight. We encourage team collaboration through regular commits, even when the work is incomplete, using feature flags so that integration stays safe and the workflow is not disrupted, as sketched below.
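Here is a deliberately simple Python sketch of that flag pattern; the flag store, function names, and environment variable are hypothetical, and real teams often use a dedicated flag service instead:

```python
import os

# A minimal flag store: flags resolved once from the environment.
FEATURES = {
    "new_checkout_flow": os.getenv("ENABLE_NEW_CHECKOUT", "false") == "true",
}

def legacy_checkout(cart: list) -> str:
    return f"charged {len(cart)} items via the proven path"

def new_checkout(cart: list) -> str:
    return f"charged {len(cart)} items via the new path (still dark in prod)"

def checkout(cart: list) -> str:
    # Incomplete work can be committed and integrated daily;
    # the flag keeps it dark in production until it is ready to release.
    if FEATURES["new_checkout_flow"]:
        return new_checkout(cart)
    return legacy_checkout(cart)
```

The point is that the trunk always builds and deploys, even while the new path is unfinished.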
After committing your updates, the next big question is how these get into production. That’s where the pipeline comes in, dictating deployability and release standards based on pre-established criteria coded into it. Post-commit, the code is considered an immutable electronic asset; it remains untouched by humans, with the pipeline managing further operations. This automation is a pivotal aspect of DevSecOps principles.
Understanding and maintaining consistency across testing environments, ideally mirroring production settings, is crucial. To get started with operational metrics, explore the DORA metrics, beginning perhaps with Deployment Frequency. In the talk, a QR code pointed to the research site; participating in the surveys available there can help you discern which metrics are most applicable to monitor.
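As a back-of-the-envelope illustration in Python, with made-up deployment dates, Deployment Frequency can start as simply as counting production deployments per week:

```python
from datetime import date

# Timestamps of successful production deployments (illustrative data).
deploys = [
    date(2024, 5, 1), date(2024, 5, 3), date(2024, 5, 3),
    date(2024, 5, 9), date(2024, 5, 17),
]

# Deployment Frequency: deployments per week over the observed window.
days_observed = (max(deploys) - min(deploys)).days + 1
per_week = len(deploys) / (days_observed / 7)
print(f"{per_week:.1f} deployments/week over {days_observed} days")
```

Once even this crude number exists, the team has a baseline against which to watch for the fluctuations that AI adoption introduces.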
When integrating generative AI into business processes, it’s essential to anticipate shifts in workflow, metrics, and overall dynamics. Warn the teams that track metrics that fluctuations will occur and that training will be necessary to adapt. Resistance may arise from established habits, but the change is still worthwhile.
The notion of increased productivity with AI tools often revolves around perceived, rather than tangible, improvements. The highlighted productivity is typically tied to excitement over new technology rather than actual gains in efficiency or quality, so a distinction must be made between perceived short-term gains and genuine long-term improvements. Focusing on team productivity, where collective performance matters more than individual speed, is more beneficial. For measuring productivity effectively, consider the SPACE framework proposed by Dr. Nicole Forsgren and her colleagues at Microsoft in 2021, which emphasizes the human factors in productivity.
Focusing on job satisfaction helps in understanding productivity, suggesting a possible new dimension to productivity frameworks, including ‘trust’. Trust becomes particularly significant in environments using deterministic AI and machine learning, where outcomes are predictable and repeatable. Trust in technology not only supports its adoption but fosters a deeper understanding and integration into existing systems.
Imagine a pilot using a heads-up display, repeatedly trained to rely on AI for accurate readings like altitude and geographical orientation. Generative AI, inherently unpredictable and deceptive, contrasts starkly with the reliable outputs needed in such high-stakes environments. This brings us to the question: as generative AI becomes more integrated into our tools, can it be trusted? This uncertainty might induce anxiety. Understanding how to gauge productivity with evolving AI technologies remains a challenge yet to be fully unraveled.
The significance of context in technology deployment is critical. Consider your library as an analogy for your code base—your intellectual property and crucial organizational assets. Granting a model access to these resources necessitates careful consideration, especially when contrasting in-house model management with external subscription services. I’m not dismissing the use of subscription models but advising to approach them with full awareness and understanding of organizational boundaries’ implications.
Interacting with InfoSec departments often highlights concerns about information security and flow. Proposing to externally share extensive code bases to enhance context in AI training typically faces resistance. It’s essential to diligently read pop-ups, end user license agreements, and to stay informed about what you’re agreeing to. This caution became personally relevant to me during an incident involving training materials of nominal importance, prompting thoughts on security measures for more sensitive information. Always remember the importance of keeping human oversight in technology engagements—a public service announcement worth noting.
Discussing the integration of AI into enterprise strategy, it’s crucial regardless of organizational size—whether a two-person startup or a multinational corporation. Even if a data strategy exists, conducting a thorough needs assessment is vital. It may seem mundane, but gathering stakeholders for a brainstorming session with simple tools like Post-it notes can help clarify valuable AI applications and guide strategic decisions.
Implementing every potential enhancement on a large scale is not always feasible, which leads us to the importance of starting with a focused, limited pilot. This strategic approach allows you to test and learn, uncovering both the potential and the boundaries of new technologies. Pilots help organizations understand the skills they already have, identify whether additional hiring is needed, and address initial governance issues effectively.
Governance should not overlook risks, but it must align with your project’s objectives. Critical to success are monitoring and feedback mechanisms. An often overlooked yet essential aspect is thought leadership within your AI strategy. This doesn’t mean public speaking or publications per se, but nurturing internal leaders to keep pace with rapid tech advancements, such as the overwhelming waves of AI developments like DALL·E’s automated imagery. Know when to begin based on strategic needs, document your decisions, and measure their impacts to ensure alignment with business goals.
It’s time to design your AI-assisted software engineering practices. Remember the principles of software architecture and the importance of trade-off analysis which have guided tech for decades. When selecting tools, assess whether an off-the-shelf solution meets urgent needs versus developing a custom solution. Define your priorities based on necessary trade-offs to achieve desired outcomes efficiently.
It might not be perfectly customized for my specific domain needs. It might also be less secure, but that may be a trade-off we consciously accept. Perhaps I have the resources and capabilities to customize it myself. I might choose to implement a model within my organization or use an external service, coordinating with an internal review group. There are multiple scenarios, but it’s crucial to make informed decisions. Adhere to established best practices. Emphasize keeping human oversight in processes. Ensure that all details, such as source code, prompts, and the specifics of the models used, are managed securely. Protect your vulnerabilities and avoid sharing sensitive information with public platforms and engines.
Consider the analogy of a man carefully walking on a tightrope between mountains. He encounters risks yet manages them well with safety measures like tethers. He’s engaging in a risky activity, but he mitigates the risks effectively. Think of 2023 as a year where we were catching up with necessary regulations. Regulatory frameworks are evolving. There’s a complex issue with intellectual property. Often, if a model is trained using public data, you may not hold copyright over the generated outputs due to the ownership being traced back to another source.
Furthermore, should your data inadvertently become part of a dataset, for instance through conversation logs retained and analyzed by third parties, you risk losing intellectual property rights. In the U.S., copyright law requires a human’s direct involvement in the creation process. This necessitates careful handling of automatically generated content. Consider the types of questions you should be asking your service providers, or questions to consider if you are the service provider. An appendix includes detailed question sheets for further reference.
Here are a few questions to ponder. How does your model prevent the creation of vulnerabilities that could be malicious? What safety measures are implemented when I use your model, or if you are providing the model, how do you ensure safety? And if an issue arises with the model that necessitates changes, how will you inform me so that I can assess the impact on my operational and value chain? These are critical questions to ask.
Let’s project into the future. I’ll spare you the exhaustive details of this topic, which includes generative AI, AI in general, and machine learning. The crucial aspect here is represented by the red arrow on our metaphorical chart – signifying that we are currently at the “peak of inflated expectations” in these fields, which I truly believe. As seen across various social platforms, the growth of AIOps and the evolution in other AI and ML arenas is undeniable. With generative AI still in its early stages and other technologies progressing, what do you foresee in the coming 12 to 24 months? My recent engagements involved discussions with professionals from big names like Microsoft, IT Revolution, Yahoo, the Software Engineering Institute, and peers from MITRE Corporation.
Our collective insights suggest an ongoing trend of increasing data silos. As AI tools diversify, each becomes a unique interaction entity – not shared amongst users, creating isolated experiences, particularly with generative AI tools. Currently, this segmentation may contribute to slower data processing and potential quality declines, an issue expected to temporarily worsen before it resolves. The demand for platform engineering is expected to rise since platforms serve to minimize user errors through codified best practices, be it through low-code or no-code solutions or innovations for customized development needs.
Turning to different perspectives, consider Jensen Huang’s advice against sending your adult children to coding bootcamps. While some fear AI might supplant coders, others believe proficient software engineers will continue to thrive, depending on their mastery and their ability to impart those skills. Listen closely to these subtle but crucial differences.
Then there’s Devin. Have you encountered Devin, or perhaps browsed OpenDevin launched shortly after? Observing it provides intrigue—it’s akin to watching an AI operate as a software engineer through ‘AI swarming,’ where numerous agents interact dynamically, fostering various operational patterns like ‘coder critic.’ We’re witnessing AI’s transition from a solitary tool to integrated agents within the software development life cycle (SDLC), which necessitates adjusting the roles assigned to humans involved, potentially introducing ‘GenAI’ into your teams—a term symbolizing AI’s integration beyond generational categorizations like Gen X or Gen Z.
In 1939, a classic film made its debut – The Wizard of Oz. Interestingly, this black and white movie transitions into vibrant Technicolor as Dorothy ventures from her mundane life into the miraculous world of Oz. This moment when Dorothy steps into a realm filled with unprecedented colors and wonders, much like the technological leap to rich, immersive experiences we encounter today in digital interfaces, emphasizes a parallel to our current advancements in technology.
Looking ahead, the technological landscape is bound to be astounding. The methodologies we currently employ are primarily human-centric. For instance, limiting work-in-progress and tackling one user story at a time are strategies that cater to human limitations. However, with the introduction of sophisticated AI elements into our teams, we must reconsider our optimization strategies to better integrate these new intelligent agents.
Reflecting on Technicolor and its shift from conventional black and white, we recognize that it’s impossible to revert to older, simpler times in technology. The concept of prompt engineering emerges as a crucial discipline, demanding a thorough understanding of its ethics, ownership of outcomes, and the dynamics of machine-human interactions. These points further lead to considerations of trust and reliability within software teams.
Your mission, should you choose to accept it, involves actively engaging with your organization to shed light on covert uses of AI, promoting an open and non-judgmental understanding of its application. Prioritizing cybersecurity and establishing firm protocols, while maintaining open lines of communication with technology providers, are essential steps toward navigating this new era confidently and responsibly.
Use those questions or be ready to answer those questions if you are the provider of generative AI capabilities to your organization. That’s your call to action. I need something from all of you. You’re actually the missing piece of my puzzle. As a researcher, I want to understand, how are you using generative AI? How is your organization preparing? How are you personally focusing on getting ready for this? What are you doing? Share your organization’s lessons learned. Tell me about your stories. Tell me about the challenges that you have. Or tell me about the things that you want to learn about because you haven’t gotten there yet. This is in color. What matters in all of this is the humans. This is what matters.