A Document to Reference

As negative as I am about the impacts that some of the generative AI is going to have on society as it competes against and beats the skillsets of the majority of people, across nearly every skill that we are capable of, there is an inevitability to it. And, it has been heading that way for a very long time already, even though the majority of people don't really pay attention to the way information is handled, and how we as humans interact with it.

For instance, for those of us who work in a typical company, how are you handling your documentation? Typically, it looks something like a SharePoint, Box or Drive repository, using a naming convention and file structures to ensure that a particular document is stored in the right place. Then, there are likely duplicates of that document, for instance, a spreadsheet that gets stored in several locations depending on which stakeholders need it, and that has also been distributed by email and stored locally. On top of this there are edits, so new versions are created, making it near impossible to know which is the latest version. And then on top of this, there are access models for who is able to see what and when. Then there are other repositories, like Teams file storage, to contend with, further splitting information.

But what is interesting is that all of these are stored in a hierarchical structure like a physical filing cabinet, which is logical based on what we have done previously, but doesn't scale when the cost of creating documents has gone from high to free. The number of documents has exploded in the digital era, even if we aren't printing them into a physical form.

And, it is because of this explosion of documentation that we have developed increasingly clever ways to remove the need to dig through folders. Sure, we probably still have them on our desktops, but considering that the entire internet is essentially stored in folders, how do you search for the football scores? Instead of diving through the news folders using a convention unique to that site, then moving on to dig through folders of another convention to see the weekly weather on another site, we have put interfaces over the top so that we get the information we need, in a consumable format, that doesn't require us to know the source it is arriving from. We don't know where it is stored, nor do we care. We just want to know if it will rain or not, and if our team won.

What the Large Language Models (LLMs) are doing is essentially scraping through all of those folders to pull out the bits of information they deem relevant and then sticking them together in a comprehensible way, so we can understand it. It is pretty clever, but this isn't actually the solution to the problem of information integrity, as the source matters.

Firstly, a far better way to store documents is in a timeline, which is what a blockchain does with bits of information. The reason this is so valuable is that it provides time accuracy for the content that can be referenced when needed. But, by itself, this is not very useful for an organization, because finding something by knowing when it was created is harder than using a filing system. So, to be useful, it requires cross-referencing with contextual meaning, for instance, that it is a contract that was written for Customer X. However, there might be multiple versions of that document, as it has moved through drafts and revisions before the final. If every revision is also a "new" document on the timeline, but referencing the one prior, it creates a single lifecycle line for the document, meaning there is only one copy of the document with multiple versions, and like a blockchain transaction, it is always trackable.
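To make the lifecycle line a bit more concrete, here is a minimal Python sketch of it. Every name in it (Entry, Timeline, prev_id, the "contract:Customer X" context label) is an assumption made up for illustration, not how Hive or any real product does it. Each revision is a new entry on the timeline that points back at the one before it.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entry:
    """One revision on the timeline (hypothetical structure)."""
    entry_id: str            # hash of the entry's own fields
    timestamp: int           # when the revision was recorded
    context: str             # contextual meaning, e.g. "contract:Customer X"
    content: str             # the text of this revision
    prev_id: Optional[str]   # the revision this one follows, if any


class Timeline:
    def __init__(self) -> None:
        self.entries = {}  # entry_id -> Entry

    def append(self, timestamp, context, content, prev_id=None) -> Entry:
        # A revision may only reference an entry that is already recorded.
        if prev_id is not None and prev_id not in self.entries:
            raise KeyError("prev_id must reference an existing entry")
        raw = f"{timestamp}|{context}|{content}|{prev_id}".encode()
        entry = Entry(hashlib.sha256(raw).hexdigest(),
                      timestamp, context, content, prev_id)
        self.entries[entry.entry_id] = entry
        return entry

    def lifecycle(self, entry_id):
        """Follow prev_id links back to the first draft; return oldest first."""
        chain = []
        current = self.entries.get(entry_id)
        while current is not None:
            chain.append(current)
            current = self.entries.get(current.prev_id) if current.prev_id else None
        return list(reversed(chain))


timeline = Timeline()
draft = timeline.append(100, "contract:Customer X", "Draft terms")
revision = timeline.append(200, "contract:Customer X", "Revised terms",
                           prev_id=draft.entry_id)
final = timeline.append(300, "contract:Customer X", "Final terms",
                        prev_id=revision.entry_id)

history = timeline.lifecycle(final.entry_id)
print([e.content for e in history])
# ['Draft terms', 'Revised terms', 'Final terms']
```

There is one document here with three versions, and any version can be traced back to the first draft by following the references.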

And then, for this to be useful, we need to be able to visualize it as some kind of final document, or visualize a version of that document in time, like a previous draft. What this is doing is essentially creating a snapshot view of the document in the same way a block explorer can show a particular block.
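A snapshot view can be sketched in a few lines of Python. This is only an illustration under my own assumptions: Revision and snapshot_at are hypothetical names, and timestamps are plain integers. The idea is simply to return the latest revision recorded at or before the moment asked for.

```python
from bisect import bisect_right
from typing import NamedTuple, Optional


class Revision(NamedTuple):
    timestamp: int
    content: str


def snapshot_at(revisions, when) -> Optional[Revision]:
    """Return the document as it stood at time `when`, or None if it did not exist yet."""
    ordered = sorted(revisions, key=lambda r: r.timestamp)
    # Count how many revisions were recorded at or before `when`.
    index = bisect_right([r.timestamp for r in ordered], when)
    return ordered[index - 1] if index > 0 else None


revisions = [
    Revision(100, "Draft terms"),
    Revision(200, "Revised terms"),
    Revision(300, "Final terms"),
]

print(snapshot_at(revisions, 250).content)  # Revised terms
print(snapshot_at(revisions, 300).content)  # Final terms
print(snapshot_at(revisions, 50))           # None
```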

Corporations need blockchains just to manage their information flows.

But, corporations are also going to start changing what they mean by "document", because what the LLMs are able to do is take small slices of larger documents and put them together to create something else. For instance, if one was asked to create a PowerPoint presentation on the financial numbers for the quarter, plus whatever marketing has added into the mix, it would be able to go through Salesforce, scrape reports in SharePoint and take highlights from marketing materials to create a composite document. Once it is tweaked a bit and learns what it needs to do more precisely, it can repeat that report with updated information automatically, rather than having a paid person manually locate and search through the documents, then cut and paste into the PowerPoint.

If your job is creating PowerPoints, start retraining.

The AI tools for finding, extracting and creating views of data are going to get better and better, but the differentiator is going to be the integrity of the information they are using. Right now, the LLMs are largely seen as tools for scraping the internet, but where they are likely going to be the most valuable is in closed corporate environments, making sense of all the information that is coming into the various repositories around the company and across organizations. Because the AI doesn't care about the location, a timeline with context is the most sensible way to handle document and information creation. So, an end user will create something like a contract and save it, without knowing where it is saved, just that it is saved. From there, the next person in the chain doesn't need to know where it is either, they just need to know what they are looking for. And in between, the AI is creating handshakes.

Because information is timelined and contextualized across multiple reference points, like who created it, and can be compared with other similar content, the quality of information goes up, and it gets handed to the right person at the time they need it.

Ever struggled to find a document at work?

Now, for a corporation, immutability isn't something they need (or want), but they do want traceability for as long as they are legally obligated to have a piece of information. So, a pseudo-blockchain suits their purpose for the trackability of the references, not the documents themselves. For instance, it isn't required for the chain to hold all the image data, or videos, it just needs to hold the references, whilst other storage holds the content. This gives them the ability to track all documentation, as well as filter granularly based on an infinite number of search filters to find just what they want, or see it in the way that suits them.
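As a rough Python sketch of that split, the ledger below holds only references, while a separate store holds the content itself. All of the names (blob_store, ledger, save, fetch, expire, retain_until) are assumptions for illustration, and the dictionary stands in for whatever real storage a company would use. Note that records can be dropped once the retention period ends, which is what makes it a pseudo-blockchain rather than an immutable one.

```python
import hashlib

blob_store = {}  # stands in for external storage: reference -> raw content
ledger = []      # ordered reference records; never holds the content itself


def save(content: bytes, context: str, timestamp: int, retain_until: int) -> dict:
    """Store the content elsewhere and put only a reference on the ledger."""
    ref = hashlib.sha256(content).hexdigest()
    blob_store[ref] = content
    record = {
        "timestamp": timestamp,
        "context": context,
        "ref": ref,
        "retain_until": retain_until,
    }
    ledger.append(record)
    return record


def fetch(record: dict) -> bytes:
    """Resolve a ledger reference back to the stored content."""
    return blob_store[record["ref"]]


def expire(now: int) -> int:
    """Drop records whose retention period has ended; return how many were removed."""
    kept = [r for r in ledger if r["retain_until"] > now]
    removed = len(ledger) - len(kept)
    ledger[:] = kept
    # Remove content that no remaining record points at.
    live_refs = {r["ref"] for r in kept}
    for ref in list(blob_store):
        if ref not in live_refs:
            del blob_store[ref]
    return removed


video = save(b"...large video bytes...", "training:onboarding",
             timestamp=100, retain_until=500)
print(sorted(video))                 # the record's keys, no content among them
print(fetch(video) == b"...large video bytes...")
```

Searching and filtering then happens over the small reference records, and the heavy content is only fetched when someone actually needs to see it.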

For instance, https://hiveisbeautiful.com/ is a great site that visualizes Hive transactions that look like this.

Embedded into a single transaction are multiple reference points, so depending on what is required, only slices are used. In that bunch there, you can see some upvotes, some claims, some Splinterlands, some Hive-Engine etcetera. Each relevant interface will use a bit of that information for its usecase, whether it is a transfer or a submission of a team into a battle. It is just a document with random bits of information on it.

This is the future of business documentation.

A future where information floats about in a type of data soup with tags attached to it, and AI filters that information based on its programmed needs. Because everything is on a timeline, it will be able to increase the relevancy, know which is first and last, and ensure that the most correct information goes into the view given to the user. And, relevancy is far higher when there is some control over what kinds of information are in the system already. Since it is all coming from the same corporation, information trust is higher than if it is coming from random internet sources where there may be no known track record of who created it.
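The filtering side of the data soup can be sketched very simply in Python. This is only an assumption-laden illustration: Item, build_view and the tag names are hypothetical, and a real system would have far richer tagging. Items carry tags and a timestamp, a view keeps the items that have all of the wanted tags, and the timeline order tells us which is first and which is last.

```python
from typing import NamedTuple


class Item(NamedTuple):
    timestamp: int
    tags: frozenset
    body: str


def build_view(items, wanted_tags):
    """Keep items carrying every wanted tag, ordered oldest to newest."""
    wanted = frozenset(wanted_tags)
    matches = [item for item in items if wanted <= item.tags]
    return sorted(matches, key=lambda item: item.timestamp)


soup = [
    Item(300, frozenset({"finance", "q3"}), "Q3 revenue, corrected"),
    Item(100, frozenset({"finance", "q3"}), "Q3 revenue, first pass"),
    Item(200, frozenset({"marketing", "q3"}), "Q3 campaign highlights"),
]

finance_view = build_view(soup, {"finance", "q3"})
print(finance_view[0].body)   # Q3 revenue, first pass
print(finance_view[-1].body)  # Q3 revenue, corrected
```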

For years, we have already been moving in this direction, which is why people who have grown up using only mobile phones don't have good folder structure habits and can struggle in the corporate environments that rely on them, which is most. And, there is a massive amount of human error in systems that rely on SOPs and naming conventions to ensure document integrity, because people just aren't consistent enough. Automation is the only way, and because it is also the one that will bring the most profits, it is the way it will go.

As simple as ledger logic is, it is going to fundamentally change the way businesses handle their information, because it aligns so well with the automation processes they want to employ. It is designed to be logical and have integrity, which is what is missing on the internet of information at the moment. However, given some time, the internet will start to reorder "itself" into a more logical structure that the AIs are able to better manage and rely on. This means that it will start to weed out low quality information through "no confidence" voting mechanisms, where content that doesn't have strong enough references is omitted from search results.

This is a form of web of trust.

Hive account @blocktrades has been dabbling with webs of trust for a while now, and at scale it is going to need AI support to really make it useful, because the number of relationships between individual pieces of information is very high, and consolidating them into something useful in a timely fashion takes a lot of processing. Humans can't do it, which is why we use heuristics to judge our world in order to think fast, and the AIs will do the same, except with a much greater amount of information input, and with all the subsequent usecases and view outputs we will demand of them. But, for an individual company, it doesn't take that much effort, because there is already narrow context and known rules that can be applied to the usecase, laws, industry etc.

Document management isn't something most people think about as they search the internet, or even when they have lost that report they need for their meeting in the morning. It just isn't sexy. But, all the information we consume digitally is stored in a document of some kind somewhere, whether it be a news story or a dataset. It is fundamental to the way we live our lives, and the document management industry is evolving to run parallel to blockchains, even though it is still far behind and is yet to really understand what it is looking to do. And even as companies chase the tech, they still don't see the application for blockchains.

Taraz
[ Gen1: Hive ]