Context

Over the past two weeks I found myself in multiple discussions about LLMs, AI, Microsoft Copilot and their use cases, with people ranging from academia to software sales. Last week I ended up investigating the Signal.org code base and took the help of an LLM-based tool to speed up discovery. I thought it would be a good idea to document what I was doing, as it helps showcase a concrete use case amid the hype around the technology, and perhaps it will help someone.

Task at hand

Signal is an open-source communication tool which is arguably one of the most secure messaging tools available. In this case I wanted to investigate whether Signal could be used for a secure, private communication requirement where the software, including the back-end instant-messaging server and the Android and web clients, should run in a self-hosted environment.

Typically, when investigating software like this, it is possible to read reviews and follow documentation, instructions and so on. But Signal comes with no self-hosting documentation whatsoever; by the looks of it, the software is not intended to be used as a self-hosted solution. That means the investigation involves studying the code to understand the underlying technologies, dependent services, dependencies on third-party products and so on. The server software is written in Java, and my expectation was that a FOSS database like PostgreSQL, a caching layer and a message queue would be involved, in addition to Google's FCM and Apple's push notification services.

I fired up Visual Studio Code with the AI coding assistant Cody. The tool offers a handful of LLMs, but on the free tier I had Claude 3.5 Sonnet available. After browsing through the code, I could query the code base with the help of the assistant.

The tool was able to provide a good summary of the code base, including the technologies being used. The biggest surprise was the mention of Amazon DynamoDB instead of a FOSS database, and it quickly helped me reach the conclusion that setting up the service in a self-hosted environment may not be easy. After looking into the code base a little deeper, it was quite clear that the software uses the AWS Java SDK and relies heavily on DynamoDB. As demonstrated here, one of the biggest dependencies for setting up the software was identified in a few minutes with the help of an LLM tool.
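As a cross-check of what the assistant reported, a quick manual scan for the dependency works too. A minimal sketch, run from the repository root (the patterns are generic and not tied to any particular file layout):

# Look for DynamoDB references in build files and Java sources
grep -rni "dynamodb" --include="pom.xml" --include="*.gradle" .
grep -rl "software.amazon.awssdk" --include="*.java" . | head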

After getting a high-level idea of the project, I decided to look into the tests, which are a good place to find fine-grained details about any project. I was able to quickly run the tests and found errors in the logs. Usually such errors take time to decipher and need a deep understanding of the stack involved.

But this time, I turned to the force:

It took two queries against the error log to identify the root cause of the error and the service the source code was expecting.

I could address the dependency in no time. It is important to note that I had never come across this particular dependency (statsd) before, and even then I could find my way around in less than 10 minutes.
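For reference, statsd speaks a simple text protocol over UDP, port 8125 by default. A quick way to see what a service is trying to send, without installing statsd at all, is to listen on that port with netcat; this is only a sketch and the exact flags depend on the netcat variant installed:

# Listen for statsd metrics on UDP 8125 (OpenBSD netcat syntax)
nc -u -l 8125

# In another terminal, send a test counter in statsd's name:value|type format
echo "test.metric:1|c" | nc -u -w1 127.0.0.1 8125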

Progress

In a matter of a few hours, a high-level listing of the dependencies and an overview of the code base were completed. The LLM was not able to help with queries like finding a FOSS alternative for the Amazon DynamoDB API, and it also failed to list all the important functionality and dependencies, but it fared really well when presented with various errors.

(The Cody code assistant was able to help with numerous runtime errors along the way.)

At the end of a few hours, the following was achieved:

  • A high-level understanding of the code base
  • A list of dependencies
  • Surprisingly good support in parsing errors and providing feedback

While the code assistant failed to list details like the VoIP services and the extensive use of certain technologies such as gRPC, overall the support it provided in navigating the code base for the initial assessment was commendable.

Summary

LLM technology, when used effectively, can assist with a myriad of tasks. Code assistants generate queryable environments on top of the code and are very efficient. In this example, the productivity gain is clearly visible: a task like understanding an error log, which could otherwise take a long time and often the collaboration of multiple engineers, was handled in a matter of minutes. For tasks where the tool was asked to analyse existing data such as error logs, it performed really well. There are numerous tools being introduced now that claim to assist with, or even perform, end-to-end software development, but a review of such tools is beyond the scope of this post. LLM-based tools are mature enough to support software development and provide a tangible productivity boost when used efficiently.

For the past few days a high-severity vulnerability impacting multiple GNU/Linux distributions has been going around and, as expected, it is in the CUPS printing stack.

Details can be found at www.evilsocket.net

Steps for ensuring your Debian GNU/Linux system is not impacted

Check for cups-browsed with: systemctl status cups-browsed

root@host:~# systemctl status cups-browsed

cups-browsed.service
Loaded: not-found (Reason: No such file or directory)
Active: inactive (dead)


Let's scan the port: sudo nmap localhost -p 631 --script cups-info

One scan gave a core dump 😳

root@host:~# sudo nmap localhost -p 631 --script cups-info

Starting Nmap 7.01 ( https://nmap.org ) at 2024-09-27 11:40 UTC
Stats: 0:00:00 elapsed; 0 hosts completed (0 up), 0 undergoing Script Pre-Scan
nmap: timing.cc:710: bool ScanProgressMeter::printStats(double, const timeval*): Assertion `ltime' failed.
Aborted (core dumped)

But the port itself is closed

Starting Nmap 7.01 ( https://nmap.org ) at 2024-09-27 11:45 UTC
mass_dns: warning: Unable to determine any DNS servers. Reverse DNS is disabled. Try using --system-dns or specify valid servers with --dns-servers
Nmap scan report for localhost (127.0.0.1)
Host is up (0.000054s latency).
PORT    STATE  SERVICE
631/tcp closed ipp

Inspect the installed packages:

apt list --installed | egrep '(cups-browsed|libcupsfilters|libppd|cups-filters|ipp)'

libcupsfilters1/xenial-infra-security,now 1.8.3-2ubuntu3.5+esm1 amd64 [installed,automatic]

Look for CUPS-related packages: apt list --installed | grep cups

libcups2/xenial-infra-security,now 2.1.3-4ubuntu0.11+esm7 amd64 [installed,automatic]
libcupsfilters1/xenial-infra-security,now 1.8.3-2ubuntu3.5+esm1 amd64 [installed,automatic]
libcupsimage2/xenial-infra-security,now 2.1.3-4ubuntu0.11+esm7 amd64 [installed]

Disable & remove the services:

If printing and document management are not used on the server, remove the related packages as follows.

apt remove libcups2 libcupsfilters1 libcupsimage2 cups-browsed
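If printing is still needed on the machine, a lighter mitigation (a sketch, assuming a systemd-based setup) is to stop and disable only the cups-browsed service, the component listening on UDP 631:

systemctl stop cups-browsed
systemctl disable cups-browsed
# optionally prevent it from being pulled in again
systemctl mask cups-browsed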

These steps will make sure that these high-severity (rated 9.1) vulnerabilities are removed from the servers.

This is a peculiar post about a nice little DNS service I came across a few days ago. While reviewing a pull request I came across an address along the lines of https://192.168.1.56.nip.io, and I couldn't find an immediate clarification. Searching led me to the GitHub repo of the project, but I couldn't understand how it worked.

Later our DevOps engineer had to explain to me in detail what this is and how it works!

The nice little utility service has pre-created wildcard DNS entries for the entire private IP address range. Queries like NOTEXISTING.192.168.2.98.nip.io will get resolved to 192.168.2.98

bkb@bkb-dev-server1:~/src/discovery$ dig NOTEXISTING.192.168.2.98.nip.io

; <<>> DiG 9.16.1-Ubuntu <<>> NOTEXISTING.192.168.2.98.nip.io
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 12130
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;NOTEXISTING.192.168.2.98.nip.io. IN    A

;; ANSWER SECTION:
NOTEXISTING.192.168.2.98.nip.io. 86400 IN A     192.168.2.98

;; Query time: 583 msec
;; SERVER: 208.66.16.191#53(208.66.16.191)
;; WHEN: Mon Nov 27 03:50:17 AST 2023
;; MSG SIZE  rcvd: 76

This is a very useful trick when we don't want to edit /etc/hosts (or its equivalent) for tools or software running locally, or for scenarios where a DNS record is required.
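For example, a web application running locally can be reached through a proper hostname without touching /etc/hosts; the hostname label, IP and port below are made up for illustration, and any label in front of the IP works:

curl http://myapp.192.168.1.56.nip.io:8080/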

Learn more at https://nip.io

APIs and applications

When I first started working on software applications nearly two decades ago, the norm was a user interface connecting to a database and presenting the end user with the means to interact with it. If I remember correctly, the first application in the traditional client-server sense that I came across was Mailman. (Looks like the application is heavily rebranded and still around!) The general idea was that any software, desktop or web, could connect to a database using a database driver and work. Things have changed a lot, and various API-like techniques were introduced, enabling faster development cycles and near-infinite scalability.

Modern applications use numerous methods to provide APIs. The shift away from RPC, SOAP and even REST is quite interesting. While RESTful APIs and techniques like OpenAPI specifications are still quite popular, we are moving towards more modern methods. Ideas like PostgREST are around, but GraphQL seems to be the most developer-friendly mechanism available.

GraphQL

GraphQL is a different approach: unlike a typical API, it is more of a language for querying an endpoint. Behind the scenes there can be a database, with a service running in front of it exposing a URL for querying various data points. Unlike traditional APIs provided by RESTful services and the like, the GraphQL approach needs just one endpoint, which can provide different types of responses based on the queries. In addition, there are numerous advantages such as the responses being typed.
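To illustrate the single-endpoint idea, here is a sketch of two different queries sent to the same endpoint with curl; the endpoint URL and field names are made up for illustration:

# Query 1: fetch a user's name
curl -s -X POST -H "Content-Type: application/json" \
  -d '{"query": "{ user(id: \"42\") { name } }"}' \
  https://example.com/graphql

# Query 2: same endpoint, a completely different response shape
curl -s -X POST -H "Content-Type: application/json" \
  -d '{"query": "{ posts(limit: 2) { title author { name } } }"}' \
  https://example.com/graphql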

Querying the Blockchain

The blockchain technology introduced by Bitcoin is now more than a decade old, but blockchain implementations generally struggle to provide an easy way to query their blocks/data. It is quite normal for traditional databases to hold much larger amounts of data than blockchains, yet the databases always perform better when one attempts to query them.

Since it is inherently difficult to query the blockchain (as a database), most projects provide a means to stream data into a database. Applications, often called dApps, essentially call these "centralized" databases to understand the information on the blockchain. Modern blockchains like Ethereum and Polkadot have understood this problem and implemented interfaces to consume the data easily.



(Image from EIP-1767: GraphQL interface to Ethereum node data)


Ethereum, for example, has introduced GraphQL support in addition to JSON-RPC via EIP-1767. In the Polkadot ecosystem there are multiple projects, such as Hydra and SubQuery, implementing indexing and exposing GraphQL endpoints.
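As a concrete sketch, assuming a local go-ethereum node started with the --graphql flag (the GraphQL endpoint is then typically served at /graphql on the node's HTTP port), the latest block's number and hash can be fetched with a single query using field names from the EIP-1767 schema:

curl -s -X POST -H "Content-Type: application/json" \
  -d '{"query": "{ block { number hash } }"}' \
  http://localhost:8545/graphql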

In general, the GraphQL solutions in the blockchain space look as follows: an indexer fetches data from the chain into a database, and a GraphQL server exposes that data to clients.

The Graph

The Graph project (https://thegraph.com/en/) is an attempt to build a SaaS product while bringing in some level of decentralization by providing a method for multiple parties to run indexer nodes. They have a staking system, and people can run servers as indexers, fetching data and exposing GraphQL endpoints. While this might be interesting, I want to focus on the technical aspects.

The Graph project has implemented a "listener" to fetch data from various chains like Ethereum, Polkadot and others. This data is then pushed into a PostgreSQL database. The team is using IPFS too, but just to store the schema and configuration files (looks like something done to justify the decentralization buzzword?). The next step in the equation is a GraphQL server which exposes an endpoint for the external world.

The code for the indexer node is here https://github.com/graphprotocol/graph-node/tree/master/server/index-node

Browsing the code also gave insights into the StreamingFast and Firehose projects used by The Graph. From a quick read, Firehose (from StreamingFast) seems to be a protocol to read the blocks, pack them into a file-like data structure and stream them efficiently across the network. It looks like a fast and efficient method to stream data from the chains to external databases.

Why use The Graph project?

For aspiring blockchain projects that want to provide modern and easy methods for dApp developers, getting associated with the project could be beneficial. It is definitely possible to self-host a data ingestion tool to push the data into a database and then use software like Hasura to provide GraphQL (see the sketch below), but being part of a project which aims at decentralizing the APIs can help with resilience and visibility. There is some information on onboarding here: https://thegraph.com/migration-incentive-program/ [Disclaimer: I am not a user of or part of The Graph project in any way, and this is not a recommendation to use the product, token or any of their programs.]
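For the self-hosted route, a minimal sketch of running Hasura against an existing PostgreSQL database might look like this; the connection string and credentials are placeholders, and a production setup needs considerably more care:

docker run -d -p 8080:8080 \
  -e HASURA_GRAPHQL_DATABASE_URL=postgres://dbuser:dbpass@db-host:5432/chaindata \
  -e HASURA_GRAPHQL_ENABLE_CONSOLE=true \
  hasura/graphql-engine:latest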

Cross-chain (inter-blockchain communication) projects are something I have been working on for the last 2+ years now. The transfer of assets from Bitcoin to Graphene-like chains has been the essential focus. Right now, on the second project on Peerplays, we have a purely decentralized Bitcoin asset transfer nearing completion.

SONs

SONs, aka Sidechain Operator Nodes, are democratically elected, decentralized Bitcoin gateways. The gateways are not just decentralized; we can also extend them to support other chains like EOS, Ethereum, Hive, etc.

We are looking only at the transfer of assets or value. This means records and contracts (smart contracts) will not be transferred.

High Availability

One of the peculiar aspects is the use of the blockchain itself for the heartbeats that ensure uptime. Getting a minimum of 15 nodes working in a decentralized manner and handshaking is our biggest challenge.

Components

Peerplays blockchain, Bitcoin libraries, ZMQ & Bitcoin scripts. Anyone interested in joining this exciting project can find the work-in-progress code here: https://github.com/peerplays-network/peerplays/tree/feature/SONs-base

I had https://pivpn.dev/ running successfully for a while without any issues. Then it suddenly stopped working: the configuration was never received on various devices. Unfortunately there was absolutely no information anywhere; there were no logs, and search results returned big essays on OpenVPN.

There is a handy little command which can actually fix the issues in a moment.

Go to the VPN server and just run pivpn -d

Running the pivpn command with the -d option fixes most of the issues.

Its diagnosis will be printed to the screen.

=============================================
::::        Self check       ::::
:: [OK] IP forwarding is enabled
:: [OK] Ufw is enabled
:: [OK] Iptables MASQUERADE rule set
:: [OK] Ufw input rule set
:: [OK] Ufw forwarding rule set
:: [OK] OpenVPN is running
:: [OK] OpenVPN is enabled (it will automatically start on reboot)
:: [OK] OpenVPN is listening on port 1194/udp
=============================================
::::      Snippet of the server log      ::::
=============================================
::::        Debug complete       ::::
:::
::: Debug output completed above.
::: Copy saved to /tmp/debug.txt

I had set up Pi-hole (a remote one), DNSCrypt (both local and remote), Privoxy for cleaning up the bad web traffic that passes through Pi-hole, and so on. Things were looking good, and that's when the EFF came up with their new https://panopticlick.eff.org/

After all the effort, the Panopticlick reports are not shining with colours. This gives an idea of the extent to which tracking is prevalent.

The funny part is, this is what you get with all the circus!

Useful links & screenshots:

https://github.com/pi-hole/pi-hole/wiki/DNSCrypt-2.0

https://itchy.nl/raspberry-pi-3-with-openvpn-pihole-dnscrypt

root@ubuntu-s-1vcpu-1gb-xyz-01:~# apt install dnscrypt-proxy
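Once dnscrypt-proxy is in place, a quick sanity check (assuming it listens on 127.0.0.1; the listen address and port can vary between package versions, so check the configuration) is to resolve a name directly through it and confirm the service is running:

dig @127.0.0.1 example.com +short
systemctl status dnscrypt-proxy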

(Screenshot captions: "no privacy, as logs are there!", "general DNS pointing to one location", "randomized!")

I was looking at the possibility of staying in a Himalayan village for a while and working. Interestingly, I stumbled upon a training firm called Alt-Campus: https://altcampus.io/

Clockwise from top: Skyline of Dharamsala, Main Street Temple – McLeod Ganj, Gyuto Karmapa, Himachal Pradesh Cricket Association Stadium and St. John church
(Image from https://commons.wikimedia.org/wiki/File:Dharamsala_Montage.png)

Yes, the cool thing is that they have set this up in Dharamsala!

They seem to have hands-on training, which could be useful. Payment is due only after one gets placed, which sounds like a pretty cool thing as well.

I will post more details if I find anything new!

Featured image from: https://en.wikipedia.org/wiki/Dharamshala#/media/File:Cloudy_Triund,_above_Mcleod_Ganj,_Himachal_Pradesh.jpg