For the past few days, news of a high-severity vulnerability affecting multiple GNU/Linux distributions has been going around and, as expected, it comes from the CUPS printing stack.
Starting Nmap 7.01 ( https://nmap.org ) at 2024-09-27 11:45 UTC
mass_dns: warning: Unable to determine any DNS servers. Reverse DNS is disabled. Try using --system-dns or specify valid servers with --dns-servers
Nmap scan report for localhost (127.0.0.1)
Host is up (0.000054s latency).
PORT STATE SERVICE
631/tcp closed ipp
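The same check can be scripted when Nmap is not at hand. This is a minimal sketch in Python, assuming the only question is whether anything accepts TCP connections on the IPP port (631) on localhost, as in the scan above. The function name tcp_port_open is illustrative, not part of any tool.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise,
        # so a refused or timed-out connection is reported as False.
        return sock.connect_ex((host, port)) == 0
```

Calling tcp_port_open("127.0.0.1", 631) should return False on a host where the scan reports the port as closed. Like the scan above, it says nothing about UDP.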
Inspect the installed packages:
apt list --installed | egrep '(cups-browsed|libcupsfilters|libppd|cups-filters|ipp)'
Over the last two weeks I found myself in multiple discussions about LLMs, AI, Microsoft Copilot and their use cases, with people ranging from academia to software sales. Last week I ended up investigating the code base of Signal (signal.org), and I used an LLM-based tool to speed up discovery. I thought it would be a good idea to document what I did: it shows a concrete use case amid the hype around the technology, and perhaps it will help someone.
Task at hand
Signal is an open source communication tool and arguably one of the most secure messaging tools available. In this case I wanted to investigate whether Signal could meet a secure private communication requirement where all of the software, including the back-end instant messaging server and the Android and web clients, runs in a self-hosted environment.
Typically, when we investigate software like this, it's possible to read reviews and follow the documentation and instructions. But Signal comes with no documentation whatsoever, as by the looks of it the software is not meant to be used as a self-hosted solution. That means the investigation involves studying the code to understand the underlying technologies, dependent services, dependencies on third-party products and so on. The server software is written in Java, and my expectation was that a FOSS database like PostgreSQL, a caching layer and a message queue would be involved, in addition to Google’s FCM and Apple's push notification service.
I fired up Visual Studio Code with the AI coding assistant Cody. The tool offers a handful of LLMs, but in the free version I had Claude 3.5 Sonnet available. After browsing through the code, I could query the code base with the help of the assistant.
The tool was able to provide a good summary of the code base, including the technologies being used. The biggest surprise was the mention of Amazon DynamoDB instead of a FOSS database, which quickly led to the conclusion that setting up the service in a self-hosted environment may not be very easy. After looking into the code base a little deeper, it was quite clear that the software uses the AWS Java SDK and relies heavily on DynamoDB. As demonstrated here, with the help of an LLM tool we could identify one of the biggest dependencies for setting up the software in a few minutes.
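A finding like this is easy to cross-check without the assistant. This is a minimal sketch, assuming a local checkout of the server code and assuming that grouping Java import statements by package prefix is a good enough proxy for third-party dependencies. The function names are my own. For reference, software.amazon.awssdk is the package prefix of the AWS SDK for Java 2.x, while the 1.x SDK uses com.amazonaws.

```python
import re
from collections import Counter
from pathlib import Path

# Matches "import a.b.C;" and "import static a.b.C.member;" at line start.
IMPORT_RE = re.compile(r"^\s*import\s+(?:static\s+)?([\w.]+)", re.MULTILINE)


def import_prefixes(java_source: str, depth: int = 3) -> Counter:
    """Count imports in one Java source, grouped by their first `depth` name parts."""
    counts = Counter()
    for name in IMPORT_RE.findall(java_source):
        # Wildcard imports ("java.util.*") leave a trailing dot; strip it.
        parts = name.rstrip(".").split(".")
        counts[".".join(parts[:depth])] += 1
    return counts


def scan_tree(root: str, depth: int = 3) -> Counter:
    """Aggregate import prefixes over every .java file below `root`."""
    total = Counter()
    for path in Path(root).rglob("*.java"):
        text = path.read_text(encoding="utf-8", errors="ignore")
        total.update(import_prefixes(text, depth))
    return total
```

Running scan_tree on the checkout directory and looking at most_common(20) gives a rough ranking of the packages the code leans on. It will not say why a package is used, only how often it is imported.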
After getting a high-level idea of the project, I decided to look into the tests, which are a good place to find fine-grained details about this (or any) project. I was able to run the tests quickly and found errors in the logs. Usually such errors take time to decipher and need a deep understanding of the stack involved.
But this time, I turned to the force:
It took two queries against the error log to identify the root cause of the error and the service the source code was expecting.
I could address the dependency in no time. It's important to note that I had never come across this particular dependency (statsd) before, and even then I could find my way around in less than 10 minutes.
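For anyone else meeting statsd for the first time: clients send metrics as small plain-text datagrams of the form name:value|type, conventionally over UDP to port 8125. This is a rough sketch of a local stand-in listener, assuming the code under test only needs something to receive those datagrams. It is one possible way to satisfy such a dependency, not a record of what I ran, and the function names are illustrative.

```python
import socket


def parse_statsd_line(line: str) -> tuple[str, float, str]:
    """Split one 'name:value|type' metric line into (name, value, type).

    Assumes a numeric value, which holds for counters, timers and gauges.
    """
    name, _, rest = line.partition(":")
    value, _, kind = rest.partition("|")
    # Drop optional trailing fields such as a '|@0.1' sample rate.
    kind = kind.split("|", 1)[0]
    return name, float(value), kind


def run_sink(host: str = "127.0.0.1", port: int = 8125) -> None:
    """Receive statsd datagrams and print each metric until interrupted."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((host, port))
        while True:
            data, _addr = sock.recvfrom(65535)
            # One datagram may carry several newline-separated metrics.
            for line in data.decode("utf-8", errors="replace").splitlines():
                if line:
                    print(parse_statsd_line(line))
```

Calling run_sink() in a separate terminal before starting the tests is enough to see which metrics are being emitted.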
Progress
In a matter of a few hours, a high-level listing of the dependencies and an overview of the code base were complete. The LLM was not able to help with queries like finding a FOSS alternative for the Amazon DynamoDB API, but it fared really well when presented with various errors. It also failed to list all the important functionality and dependencies.
(The Cody code assistant was able to help with numerous runtime errors along the way.)
At the end of a few hours the following was achieved:
A high-level understanding of the code base
A list of dependencies
Surprisingly good support when it comes to parsing errors and providing feedback
While the code assistant failed to list details such as the VoIP services and the extensive use of certain technologies like gRPC, overall the support it provided in navigating the code base for the initial assessment was commendable.
Summary
LLM technology, when used effectively, can assist in a myriad of tasks. Code assistants build a queryable environment on top of the code and are very efficient at it. In this example the productivity gain is clearly visible: understanding an error log, a task that can take a long time and often needs several engineers working together, was handled in a matter of minutes. The tool performed really well on tasks where it was asked to analyze existing data such as error logs. Numerous tools are now being introduced that claim to handle end-to-end software development, but a review of such tools is beyond the scope of this post. LLM-based tools are mature enough to support software development and provide a tangible productivity boost when used efficiently.