Over the last three months or so, I have been receiving emails and LinkedIn messages that generally talk about needing help with a software project. The initial emails simply shared a link to a Bitbucket code base and asked for help with it. The profiles involved were all new, supposedly Harvard-educated individuals and easy to identify as scams.
The recent ones are a bit more creative and come from LinkedIn profiles that are much more believable, with recommendations and a long history.
Unlike the initial contacts, who shared various code bases in Python or NodeJS on the first day itself, the latest messages follow a multi-day approach.
On the third day, I got the code base, and this time they also included a functional specification!
What’s in the code?
The code provided seems to be slightly modified boilerplate (LLM-generated?) in React or JavaScript, with instructions to run it locally. Somewhere in the code base there is an encoded function which is either loaded from an external URL or embedded in one of the source files, as in the first screenshot in the image.
I took pains to use a secure environment to run the first of the lot, which was attempting to access a blockchain wallet. The latest ones look different and appear to attempt to download external code. (Needs further verification.)
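If you do decide to look at such a code base, a quick pass over the sources for common obfuscation markers (eval, atob, Function constructors, hard-coded URLs) can flag suspicious files before anything is executed. A minimal Python sketch; the patterns are illustrative heuristics, not an exhaustive or definitive detector:

import os
import re
import sys

# Patterns that often show up in obfuscated JavaScript loaders.
# These are heuristics only; no hits does not mean the code is safe.
SUSPICIOUS = [
    r"\beval\s*\(",
    r"\batob\s*\(",
    r"\bFunction\s*\(",      # dynamic code via the Function constructor
    r"fromCharCode",
    r"https?://[^\s'\"]+",   # hard-coded external URLs
]

def scan(root: str) -> None:
    for dirpath, _dirnames, filenames in os.walk(root):
        if "node_modules" in dirpath:
            continue
        for name in filenames:
            if not name.endswith((".js", ".jsx", ".ts", ".tsx")):
                continue
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="ignore") as fh:
                    text = fh.read()
            except OSError:
                continue
            for pattern in SUSPICIOUS:
                for match in re.finditer(pattern, text):
                    line_no = text.count("\n", 0, match.start()) + 1
                    print(f"{path}:{line_no}: {match.group(0)[:60]}")

if __name__ == "__main__":
    scan(sys.argv[1] if len(sys.argv) > 1 else ".")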
In any case, if you happen to get emails or messages requesting help with some code base, along with access to the code, be careful and ignore them if you are not sure what it is all about. Do not run the code on your computers!
For the past few days, a high-severity vulnerability impacting multiple GNU/Linux distributions has been going around, and as expected, it is in the CUPS printing stack. A quick nmap scan shows whether the IPP port is open:
Starting Nmap 7.01 ( https://nmap.org ) at 2024-09-27 11:45 UTC
mass_dns: warning: Unable to determine any DNS servers. Reverse DNS is disabled. Try using --system-dns or specify valid servers with --dns-servers
Nmap scan report for localhost (127.0.0.1)
Host is up (0.000054s latency).
PORT STATE SERVICE
631/tcp closed ipp
Inspect the installed packages:
apt list --installed | egrep '(cups-browsed|libcupsfilters|libppd|cups-filters|ipp)'
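Besides the package listing, another quick sanity check from the host itself is whether anything is already bound to the IPP port (631/tcp or 631/udp). A small Python bind-test sketch; it only complements the nmap scan above and needs root for ports below 1024:

import errno
import socket

def port_status(port: int, kind: int) -> str:
    # Try to bind the port: EADDRINUSE means something is already listening.
    with socket.socket(socket.AF_INET, kind) as s:
        try:
            s.bind(("0.0.0.0", port))
        except PermissionError:
            return "unknown (need root to bind ports below 1024)"
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return "in use"
            return f"unknown ({exc})"
    return "free"

print("631/tcp:", port_status(631, socket.SOCK_STREAM))
print("631/udp:", port_status(631, socket.SOCK_DGRAM))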
Over the last two weeks I found myself in multiple discussions about LLMs, AI, Microsoft Copilot and their use cases, with people ranging from academia to software sales. Last week I ended up investigating a code base from Signal.org, and I took the help of an LLM-based tool to speed up discovery. I thought it was a good idea to document what I was doing, as it helps showcase practical use cases amid the hype around the technology and may help someone.
Task at hand
Signal is an open-source communication tool, arguably one of the most secure messaging tools available. In this case I wanted to investigate whether Signal could be used for a secure private communication requirement where the software, including the back-end instant messaging server and the Android and web clients, would run in a self-hosted environment.
Typically, when we investigate software like this, it is possible to read reviews and follow documentation, instructions and so on. But Signal comes with no such documentation whatsoever; by the looks of it, the software is not meant to be used as a self-hosted solution. That means the investigation involves studying the code to understand the underlying technologies, dependent services, dependencies on third-party products and so on. The server software is written in Java, and my expectation was that a FOSS database like PostgreSQL, a caching layer and a message queue would be involved, in addition to Google’s FCM and Apple’s push notification services.
I fired up Visual Studio Code with the AI coding assistant Cody. The tool offers a handful of LLMs, but in the free version I had Claude 3.5 Sonnet available. After browsing through the code, I could query the code base with the help of the assistant.
The tool was able to provide a good summary of the codebase, including the technologies being used. The biggest surprise was the mention of Amazon DynamoDB instead of a FOSS database, and it quickly led to the conclusion that setting up the service in a self-hosted environment may not be easy. After looking into the codebase a little deeper, it was quite clear that the software uses the AWS Java SDK and is heavily reliant on DynamoDB. As demonstrated here, one of the biggest dependencies for setting up the software could be identified in a few minutes with the help of an LLM tool.
After getting a high-level idea of the project, I decided to look into the tests, which are a good place to find fine-grained details about any project. I was able to quickly run the tests and found errors in the logs. Usually such errors take time to decipher and need a deep understanding of the stack involved.
But this time, I turned to the force:
It took two queries against the error log to identify the root cause of the error and the service the source code was expecting.
I could address the dependency in no time. It is important to note that I had never come across this particular dependency (StatsD) before, and even then I could find my way around in less than 10 minutes.
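For context, StatsD is a simple UDP-based metrics protocol, so a throwaway local sink is enough to keep tests happy while exploring. A minimal sketch, not the exact fix used here; 8125 is just the conventional StatsD port:

import socket

# Minimal StatsD-style sink: listen on the default StatsD UDP port and
# print whatever metric lines arrive (e.g. "requests:1|c").
HOST, PORT = "127.0.0.1", 8125

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind((HOST, PORT))
print(f"listening for StatsD packets on {HOST}:{PORT}")

while True:
    data, addr = sock.recvfrom(4096)
    for line in data.decode("utf-8", errors="replace").splitlines():
        print(f"{addr[0]}: {line}")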
Progress
In a matter of a few hours, a high-level listing of the dependencies and an overview of the code base were completed. The LLM was not able to help with queries like finding a FOSS alternative for the Amazon DynamoDB API, but it fared really well when presented with various errors. It also fell short of listing all the important functionality and dependencies.
(The Cody code assistant was able to help with numerous runtime errors along the way.)
At the end of a few hours, the following was achieved:
A high-level understanding of the code base
A list of dependencies
Surprisingly good support when it comes to parsing errors and providing feedback
While the code assistant failed to list details like the VoIP services and the extensive use of certain technologies such as gRPC, overall the support it provided in navigating the code base for the initial assessment was commendable.
Summary
LLM technology, when used effectively, can assist in a myriad of tasks. Code assistants generate queryable environments based on the code and are very efficient. In this example the productivity gain is clearly visible: tasks like understanding an error log, which could otherwise take a long time and often the collaboration of multiple engineers, were handled in a matter of minutes. For tasks where the tool was asked to analyse already existing data, like error logs, it performed really well. There are numerous tools being introduced now that claim to assist with, or even perform, end-to-end software development, but a review of such tools is beyond the scope of this post. LLM-based tools are mature enough to support software development and provide a tangible productivity boost when used efficiently.
This was a trip to the wild after a long time, and with a mirrorless camera (Nikon Z9, 200-500mm f/5.6). While the trip was hectic due to shuttling between multiple locations, we managed to get a glimpse of quite a few animals.
This is a peculiar post about a nice little DNS service I came across a few days ago. While reviewing a pull request, I came across an address along the lines of https://192.168.1.56.nip.io. I couldn’t find an immediate clarification; searching led me to the GitHub repo of the project, but I couldn’t understand how it worked.
Later, our DevOps engineer had to explain to me in detail what it is and how it works!
The nice little utility service has pre-created wildcard DNS entries for the entire private IP address range. Queries like NOTEXISTING.192.168.2.98.nip.io will get resolved to 192.168.2.98.
This is a very useful trick if we don’t want to edit /etc/hosts or its equivalent for tools or software running locally, or for scenarios where a DNS record is required.
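The behaviour is easy to verify: whatever appears before the embedded IP address is ignored, and the name resolves to that IP. A quick check from Python (the host names are just examples):

import socket

# Every label before the embedded IP is ignored; the answer is always the
# IP address that appears inside the name itself.
for name in ("192.168.1.56.nip.io", "anything.192.168.2.98.nip.io", "db.10.0.0.5.nip.io"):
    print(name, "->", socket.gethostbyname(name))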
When I first started working on software applications nearly two decades ago, the norm was a user interface connecting to a database and presenting the end user with a means to interact with it. If I remember correctly, the first application in the traditional client-server sense that I came across was Mailman. (Looks like the application is heavily rebranded and still around!) The general idea was that any software (desktop or web) could connect to a database using a database driver and work. Things have changed a lot, and various API-like techniques were introduced, enabling faster development cycles and near-infinite scalability.
Modern applications use numerous methods to provide APIs. The shift away from RPC, SOAP and even REST is quite interesting. While RESTful APIs and techniques like OpenAPI specifications are still quite popular, we are moving towards more modern methods. Ideas like PostgREST are around, but GraphQL seems to be the most developer-friendly mechanism available.
GraphQL
GraphQL is a different approach; unlike a typical API, it is more of a language that can query an endpoint. Behind the scenes there can be a database, with a service running in front of it exposing a URL for querying various data points. Unlike traditional APIs provided by RESTful services, the GraphQL method needs just one endpoint, and it can provide different types of responses based on the queries. In addition, there are numerous advantages, such as the responses being typed.
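To make the single-endpoint idea concrete, here is a small Python sketch posting two different queries to the same URL and getting differently shaped responses back. The endpoint and the schema (user, posts and their fields) are made up for illustration:

import json
from urllib.request import Request, urlopen

# One endpoint, many shapes: the query itself decides what comes back.
# The URL and the schema (user, posts, fields) are hypothetical.
ENDPOINT = "https://api.example.com/graphql"

def run(query: str) -> dict:
    req = Request(
        ENDPOINT,
        data=json.dumps({"query": query}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urlopen(req) as resp:
        return json.load(resp)

# A lean response: just a user's name.
print(run("{ user(id: 1) { name } }"))

# A richer response from the very same URL: the user plus their post titles.
print(run("{ user(id: 1) { name posts { title } } }"))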
Querying the Blockchain
The blockchain technology introduced by Bitcoin is now more than a decade old, but blockchain implementations generally struggle to provide an easy way to query their blocks and data. It is quite normal for traditional databases to hold much larger amounts of data than blockchains, yet the databases perform far better when one attempts to query them.
Since it is inherently difficult to query the blockchain (as a database), most projects provide a means to stream data to a database. Applications, often called dApps, essentially call these “centralized” databases to understand the information on the blockchain. Modern blockchains like Ethereum and Polkadot have understood this problem and implemented interfaces to easily consume the data.
[ from: EIP-1767: GraphQL interface to Ethereum node data ]
Ethereum, for example, has introduced GraphQL support in addition to JSON-RPC via EIP-1767. In the Polkadot ecosystem there are multiple projects like Hydra and SubQuery implementing indexing and exposing GraphQL endpoints.
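As a small illustration of the EIP-1767 interface, a node with GraphQL enabled can be asked for the latest block through the same single-endpoint pattern as above. A sketch assuming a local geth started with GraphQL enabled, where the endpoint is typically http://localhost:8545/graphql:

import json
from urllib.request import Request, urlopen

# EIP-1767 exposes chain data behind one GraphQL endpoint.
# Assumes a local node with GraphQL enabled (e.g. geth --graphql).
ENDPOINT = "http://localhost:8545/graphql"

query = "{ block { number hash gasUsed } }"  # latest block when no argument is given

req = Request(
    ENDPOINT,
    data=json.dumps({"query": query}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urlopen(req) as resp:
    print(json.load(resp))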
In general, the GraphQL solutions in the blockchain space look as follows:
The Graph
The Graph project, i.e. https://thegraph.com/en/, is an attempt to build a SaaS product and to bring in some level of decentralization by providing a way for multiple parties to run indexer nodes. They have a staking system, and people can run servers as indexers, fetching data and exposing GraphQL endpoints. While this might be interesting, I want to focus on the technical aspects.
The Graph project has implemented a “listener” to fetch data from various chains like Ethereum, Polkadot and others. This data is then pushed into a PostgreSQL database. The team is using IPFS too, but just to store the schema and configuration files (something done to justify the decentralization buzzword?). The next step in the equation is a GraphQL server which exposes an endpoint to the external world.
The code for the indexer node is here https://github.com/graphprotocol/graph-node/tree/master/server/index-node
Browsing the code also gave insights into the StreamingFast and Firehose projects, which are used by The Graph project. From a quick read, StreamingFast seems to be a protocol to read the blocks, pack them into a file-like data structure and efficiently stream them across the network. It looks like a fast and efficient method to stream data from the chains to external databases.
Why use The Graph project?
For aspiring blockchain projects that want to provide modern and easy methods for dApp developers, getting associated with the project could be beneficial. It is definitely possible to self-host a data ingestion tool to push the data into a database and then use software like Hasura to provide GraphQL. But being part of a project which aims at decentralizing the APIs can help with resilience and more visibility. There is some information on onboarding here: https://thegraph.com/migration-incentive-program/ [Disclaimer: I am not a user of or part of The Graph project in any way, and this is not a recommendation to use the product, token or any of their programs.]
Cross-chain (inter-blockchain communication) projects have been something I have been working on for the last 2+ years now. The transfer of assets from Bitcoin to Graphene-like chains has been the essential focus. Right now, on the second project, on Peerplays, we have a purely decentralized Bitcoin asset transfer nearing completion.
SONs
SONs, aka Sidechain Operator Nodes, are democratically elected, decentralized Bitcoin gateways. The gateways are not just decentralized; we can also extend them to support other chains like EOS, Ethereum, Hive, etc.
We are looking only at the transfer of assets or value. This means records and contracts (smart contracts) will not be transferred.
High Availability
One of the peculiar aspects is the use of the blockchain itself to carry the heartbeats that ensure uptime. With a minimum of 15 nodes working in a decentralized manner, the handshaking is our biggest challenge.
I had https://pivpn.dev/ running successfully for a while without any issues. Then suddenly it stopped working: the configuration was never received on various devices. Unfortunately there was absolutely no information anywhere – no logs, and search results returned big essays on OpenVPN.
There is a handy little command which can actually fix the issue in a moment.
Go to the VPN server and just run pivpn -d
Running the pivpn command with the -d option fixes most of the issues.
Its diagnosis will be printed to the screen.
=============================================
:::: Self check ::::
:: [OK] IP forwarding is enabled
:: [OK] Ufw is enabled
:: [OK] Iptables MASQUERADE rule set
:: [OK] Ufw input rule set
:: [OK] Ufw forwarding rule set
:: [OK] OpenVPN is running
:: [OK] OpenVPN is enabled (it will automatically start on reboot)
:: [OK] OpenVPN is listening on port 1194/udp
=============================================
:::: Snippet of the server log ::::
=============================================
:::: Debug complete ::::
:::
::: Debug output completed above.
::: Copy saved to /tmp/debug.txt
I had set up Pi-hole (a remote one), DNSCrypt (both local and remote), Privoxy for cleaning up the bad web traffic that passes through Pi-hole, and so on. Things were looking good, and that’s when the EFF came up with their new https://panopticlick.eff.org/
After all the effort, the Panopticlick reports are not shining with colors. This gives an idea of the extent to which tracking is prevalent.
The funny part is, this is what you get with all the circus!