Blockchain is a distributed ledger technology for storing transactions in a shared, secure database that is designed to be immutable. Cryptography is used to secure and verify transactions and to control the creation of new blocks in the chain. Its distinguishing feature is decentralization: it does not rely on a central authority to operate. It is used primarily in cryptocurrency applications, but also to record documents, data, and other information that requires certification and authentication.
Figure 1 below illustrates one of the fundamental principles of blockchain: the chain of blocks itself. Each block is a logical structure that stores information such as transactions, documents, or data, depending on the purpose. Each block is assigned an identifier called a “hash,” a fixed-length alphanumeric string generated by applying a cryptographic hash function (e.g., SHA-256) to a given input text. Older functions such as MD5 produce hashes in the same way, but MD5 is no longer considered collision-resistant and is best avoided in new systems. The input could be, for example, the complete date or timestamp of the block’s registration, the first and last ten words of the text to be recorded, or any other element intended for registration. The hash serves as the block’s unique key: with a function such as SHA-256, finding two different inputs that yield the same hash is computationally infeasible, so in practice no two hashes in the system are identical. It also functions as a security mechanism that protects the integrity of the data stored in the block, because any modification to the block’s content after its formation would change the resulting hash value. The original hash would then no longer match, which makes unauthorized changes detectable.
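As a minimal sketch of this integrity check, assuming SHA-256 and Python's standard hashlib module (the sample content and the helper name content_hash are illustrative and do not come from the original):

```python
import hashlib

def content_hash(text: str) -> str:
    """Return the SHA-256 digest of the text as a 64-character hex string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# Hypothetical block content, used only for illustration.
original = "2024-01-15T10:00:00Z|Deposit of document A"
recorded_hash = content_hash(original)

# The same content with a single character changed.
tampered = "2024-01-15T10:00:00Z|Deposit of document B"

print(content_hash(original) == recorded_hash)  # unchanged content still matches
print(content_hash(tampered) == recorded_hash)  # altered content no longer matches
```

Comparing a freshly computed hash with the recorded one is all that is needed to detect the change; the hash does not reveal what was changed.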
Figure 1. Basic schematic of the blockchain. It illustrates how blocks are linked sequentially through their hashes, forming a fixed chain of recorded data while remaining open to the addition of new blocks.
To form the chain, each block must reference the hash of the previous block, linking the blocks in succession. Because a block's hash depends on its content, modifying a block changes its hash, so the reference stored in the following block no longer matches and the break in the chain is easy to detect. This gives the blockchain the security and data integrity properties that make it well suited to the storage and preservation of information.
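The linking mechanism can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions, not a production design: the field names, the "|" separator, and the all-zero reference for the first block are assumptions.

```python
import hashlib
from typing import Dict, List, Optional

def block_hash(index: int, previous_hash: str, data: str) -> str:
    """Hash the block's fields, including the previous block's hash."""
    payload = f"{index}|{previous_hash}|{data}"
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def build_chain(items: List[str]) -> List[Dict]:
    """Create a list of blocks, each referencing the hash of the one before it."""
    chain = []
    previous_hash = "0" * 64  # assumed placeholder reference for the first block
    for index, data in enumerate(items):
        current = block_hash(index, previous_hash, data)
        chain.append({"index": index, "previous_hash": previous_hash,
                      "data": data, "hash": current})
        previous_hash = current
    return chain

def first_broken_block(chain: List[Dict]) -> Optional[int]:
    """Return the position of the first block that fails verification, or None."""
    for position, block in enumerate(chain):
        expected = block_hash(block["index"], block["previous_hash"], block["data"])
        if block["hash"] != expected:
            return position  # content no longer matches the recorded hash
        if position > 0 and block["previous_hash"] != chain[position - 1]["hash"]:
            return position  # link to the preceding block is broken
    return None

chain = build_chain(["record A", "record B", "record C"])
print(first_broken_block(chain))  # None: the chain is intact

chain[1]["data"] = "record B (edited)"
print(first_broken_block(chain))  # 1: the edited block no longer matches its hash
```

If the edited block's hash were recomputed to hide the change, the check would fail one block later instead, because the following block still references the old hash.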
It is also essential to understand how the information in the blockchain is stored. The blockchain resides on a distributed network, conceptually very similar to peer-to-peer (P2P) networks, whose nodes share identical records of the chain. In other words, each node holds an exact copy of the entire chain as it is generated. Any attempt to modify the chain or its blocks on one node can therefore be detected by comparison with the copies held by the other nodes, and the altered copy can be corrected. To compromise the information, an attacker would need to alter the blocks on all nodes in the network (or, in practice, on at least a majority of them) in unison. This strengthens the security of the blockchain further, not only because of the number of static and dynamic nodes, but also because the system continuously verifies the consistency of hashes across all nodes in the network.
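The comparison between copies can be illustrated with a deliberately simplified Python sketch. It assumes that the most common copy is the correct one; real networks rely on consensus protocols that are considerably more involved, and the node names and placeholder hashes below are hypothetical.

```python
import hashlib
from collections import Counter
from typing import Dict, List

def chain_fingerprint(block_hashes: List[str]) -> str:
    """Summarize a node's copy of the chain as a single SHA-256 digest."""
    return hashlib.sha256("".join(block_hashes).encode("utf-8")).hexdigest()

def divergent_nodes(copies: Dict[str, List[str]]) -> List[str]:
    """Return the names of nodes whose copy differs from the most common one."""
    fingerprints = {name: chain_fingerprint(hashes) for name, hashes in copies.items()}
    majority, _ = Counter(fingerprints.values()).most_common(1)[0]
    return sorted(name for name, fp in fingerprints.items() if fp != majority)

# Hypothetical network of three nodes; node_c holds an altered second block.
copies = {
    "node_a": ["h0", "h1", "h2"],
    "node_b": ["h0", "h1", "h2"],
    "node_c": ["h0", "hX", "h2"],
}
print(divergent_nodes(copies))  # ['node_c']
```

A node flagged in this way could then replace its copy with the one held by the rest of the network.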
Understanding blockchain technology also requires a look at mining software. In proof-of-work systems such as Bitcoin, this is a computer program that repeatedly computes a hash of the candidate block, varying a value in its input on each attempt, until it finds a hash that the network accepts as valid for the block to be added to the chain. The difficulty of this task is adjusted according to the size of the network (in practice, its combined computing power), so that blocks are published at a consistent cadence within a reasonable time frame (typically a few minutes). The first miner (a node running the mining software) to compute a valid hash usually receives a reward in the form of cryptocurrency. Tying issuance to mining limits the introduction of liquidity into the system and, in theory, decouples cryptocurrency issuance from speculation. Once the hash and the responsible miner are recognized and validated, the information propagates across the entire network of nodes, consolidating the new block. Blockchains are therefore closely linked to cryptocurrencies: their design implies a computational cost for recording data, transactions, or documents on the network, so incentives are needed to sustain the node network, which in turn enables a self-managed economic ecosystem.
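A toy version of this search can be written in Python. The sketch assumes a simple validity rule (the hexadecimal digest must begin with a given number of zeros) and a counter, here called nonce, as the value that changes on each attempt; real networks use a numeric target and a richer block format, and the block data below is hypothetical.

```python
import hashlib
from typing import Tuple

def mine(block_data: str, difficulty: int) -> Tuple[int, str]:
    """Try successive nonces until the SHA-256 hex digest starts with
    `difficulty` zeros; return the winning nonce and digest."""
    prefix = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{block_data}|{nonce}".encode("utf-8")).hexdigest()
        if digest.startswith(prefix):
            return nonce, digest
        nonce += 1

# Hypothetical candidate block, serialized as a single string.
block_data = "index=2|previous_hash=abc123|data=record B"

# Each extra leading zero multiplies the expected work by 16:
# difficulty 4 needs about 16**4 = 65,536 attempts on average.
nonce, digest = mine(block_data, difficulty=4)
print(nonce, digest)
```

Finding the nonce is costly, but any node can confirm the result with a single hash computation, which is what makes the scheme practical to verify.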
Limitations
As with any complex system, blockchain has inherent limitations, including scalability, operational cost, transaction processing speed, regulation (when the blockchain is oriented toward cryptocurrency use), and interoperability with other currency systems.
- Scalability. As the network of nodes and the volume of data and transactions to be processed expand, the resources required to compute hashes and keep blocks under constant verification grow accordingly.
- Cost. Mining and data storage on a blockchain can be expensive because of the energy and computational resources they require, which is not always efficient or profitable at a small scale.
- Speed. Transactions on a blockchain can be slow because nodes in the network must verify hashes. Speed can be tuned through the difficulty level that the hashing task sets for miners: reducing the difficulty of computing a valid hash increases the processing speed of blocks, accelerating chain growth and reducing consolidation time. The trade-off is that miner rewards on the network are reduced.
It is nevertheless possible to create a blockchain whose existence is not tied to a cryptocurrency. This requires a stable node network that complies with the security, identification, and immutability procedures already described. This is probably the most appropriate model for the majority of uses likely to arise in library and information science.
File Storage
Hashing textual or alphanumeric content is the typical procedure for recording information on the blockchain. This method is very limited, however, for larger files such as PDFs, office documents, and multimedia content of all kinds. In these cases, two options are possible: a) encoding the file in a text encoding scheme (e.g., Base64), which yields a very long character string to be stored in the block; or b) storing only a hash of the file, computed from elements such as its name, timestamps, author, metadata, specific textual markers at designated positions within the document, and its permanent storage link. The second option keeps the stored value small (a hash has a fixed length regardless of file size) without compromising the ability to identify the document and verify its integrity. The file itself is then not contained in the block; it resides in a separate storage repository and is referenced by its hash. If the storage location is part of the hashed input, a copy of the document placed in a different repository will not produce a matching hash, revealing any change in the document's location or integrity. Documentation referenced from the blockchain can thus be securely linked to servers or networks distinct from the blockchain itself, and network nodes can verify a file by comparing the hash stored in the blocks with the actual file and its location.
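Option b) might be sketched as follows in Python. The choice of inputs (the file's own digest, a metadata dictionary, and the storage URL), the JSON serialization, and the example.org addresses are all assumptions made for illustration.

```python
import hashlib
import json

def document_fingerprint(content: bytes, metadata: dict, location: str) -> str:
    """Combine the file's own digest with its metadata and storage location."""
    content_digest = hashlib.sha256(content).hexdigest()
    # sort_keys gives the same serialization regardless of dictionary order.
    record = json.dumps(
        {"content_sha256": content_digest, "metadata": metadata, "location": location},
        sort_keys=True,
    )
    return hashlib.sha256(record.encode("utf-8")).hexdigest()

# Hypothetical document and repository URL, for illustration only.
pdf_bytes = b"%PDF-1.7 example content"
metadata = {"title": "Annual report", "author": "J. Doe", "created": "2024-01-15"}
location = "https://repository.example.org/docs/annual-report.pdf"

on_chain = document_fingerprint(pdf_bytes, metadata, location)

# The same file stored elsewhere, or a modified file at the same location,
# produces a different fingerprint.
moved = document_fingerprint(pdf_bytes, metadata, "https://mirror.example.net/annual-report.pdf")
edited = document_fingerprint(pdf_bytes + b" ", metadata, location)
print(on_chain == moved, on_chain == edited)  # False False
```

Only the 64-character fingerprint needs to be written to the block, however large the file is.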
Hash Encoding
Computing a hash is a straightforward task, as most programming languages provide functions that automate the process. In PHP, for instance, the "hash" function supports many algorithms (sha, md, ripemd, tiger, crc, gost, snefru, fnv, haval). Of these, a cryptographic hash function such as SHA-256 is the appropriate choice for a blockchain; crc and fnv are non-cryptographic checksums, and md5 and sha1 are no longer considered collision-resistant.
The following example shows a PHP function named "createBlockHash()", which takes the variables $index (block number), $timestamp (timestamp), $previousHash (hash of the previous block), $data (which may contain transaction metadata, text, or content to be recorded), and $permalink (permanent link to the content) and generates the block hash with SHA-256. Other relevant or representative data can also be incorporated into the hash input, such as the author's name, email address, specific words located at certain positions in the text, or the author's public keys.
Table 1. Example of Hash encoding using a program written in PHP
The values of these variables are then supplied and the hash function is executed, producing the $hash variable of the block to be added to the chain.
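A rough Python equivalent of the createBlockHash() function described above is sketched below. It is not the author's PHP code: the "|" field separator, the sample values, and the snake_case name create_block_hash are assumptions.

```python
import hashlib

def create_block_hash(index: int, timestamp: str, previous_hash: str,
                      data: str, permalink: str) -> str:
    """Concatenate the block fields and return their SHA-256 hex digest."""
    # The "|" separator is an assumption; it keeps adjacent fields from merging.
    payload = "|".join([str(index), timestamp, previous_hash, data, permalink])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

# Hypothetical input values for a second block.
block_hash = create_block_hash(
    index=2,
    timestamp="2024-01-15T10:00:00Z",
    previous_hash="0" * 64,
    data="Registration of document B",
    permalink="https://repository.example.org/docs/b",
)
print(block_hash)  # 64 hexadecimal characters
```

Additional fields such as an author name or public key could be appended to the list before joining, provided every node uses the same fields in the same order.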
Example of a Blockchain in XML Format
One way to understand how blocks are linked is through a blockchain in XML format. In the following table, the basic block tags are shown: <index>, which indicates the block number; <timestamp>, the timestamp of the block's registration; <data>, which contains the transaction data; <document_text>, which contains the text of the document to be registered; <previous_hash>, which is the hash key of the previous block; and <hash>, which contains the hash key of the current block.
Table 2. In this example, block 2 takes the hash of block 1 as its previous hash, thereby forming a link in the chain.
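To make the structure concrete, the following Python sketch generates a two-block chain with the tags listed above, using the standard xml.etree.ElementTree module. The wrapper elements <blockchain> and <block>, the sample values, and the rule that the hash covers all other fields joined by "|" are assumptions.

```python
import hashlib
import xml.etree.ElementTree as ET

def make_block(index, timestamp, data, document_text, previous_hash):
    """Build a <block> element using the tags described above."""
    block = ET.Element("block")
    fields = {
        "index": str(index),
        "timestamp": timestamp,
        "data": data,
        "document_text": document_text,
        "previous_hash": previous_hash,
    }
    for tag, value in fields.items():
        ET.SubElement(block, tag).text = value
    # The hash covers every other field of the block (an assumed convention).
    payload = "|".join(fields.values())
    ET.SubElement(block, "hash").text = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return block

chain = ET.Element("blockchain")
block1 = make_block(1, "2024-01-15T10:00:00Z", "deposit", "Text of document A", "0" * 64)
chain.append(block1)
# Block 2 copies block 1's <hash> into its own <previous_hash>.
block2 = make_block(2, "2024-01-15T10:05:00Z", "deposit", "Text of document B",
                    block1.findtext("hash"))
chain.append(block2)

ET.indent(chain)  # pretty-printing; requires Python 3.9 or later
print(ET.tostring(chain, encoding="unicode"))
```

In the printed XML, the <previous_hash> of the second block is the same string as the <hash> of the first.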
PHP Program to Create Blocks
In addition to programs for verifying blocks and confirming transactions on the chain, an essential program is the one that appends new blocks. It can be backed by a database, as suggested in the following example, in which the hash of the previous block is queried before the next block is added.
Table 3. Sample program for generating blocks in a blockchain
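The same idea (query the previous hash, then append) can be sketched in Python with the standard sqlite3 module. This is an independent illustration rather than the PHP program referred to in the text; the table layout, the in-memory database, and the all-zero hash for an empty chain are assumptions.

```python
import hashlib
import sqlite3
from datetime import datetime, timezone

GENESIS_HASH = "0" * 64  # assumed placeholder used when the chain is empty

def open_chain(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create the blocks table if it does not exist."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS blocks ("
        "idx INTEGER PRIMARY KEY, timestamp TEXT, data TEXT, "
        "previous_hash TEXT, hash TEXT)"
    )
    return conn

def add_block(conn: sqlite3.Connection, data: str) -> str:
    """Look up the last block's hash, then insert a new block that references it."""
    row = conn.execute(
        "SELECT idx, hash FROM blocks ORDER BY idx DESC LIMIT 1"
    ).fetchone()
    index, previous_hash = (row[0] + 1, row[1]) if row else (1, GENESIS_HASH)
    timestamp = datetime.now(timezone.utc).isoformat()
    payload = f"{index}|{timestamp}|{previous_hash}|{data}"
    block_hash = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    with conn:  # commit on success, roll back on error
        conn.execute(
            "INSERT INTO blocks VALUES (?, ?, ?, ?, ?)",
            (index, timestamp, data, previous_hash, block_hash),
        )
    return block_hash

conn = open_chain()
first = add_block(conn, "Registration of document A")
second = add_block(conn, "Registration of document B")
print(conn.execute("SELECT idx, previous_hash, hash FROM blocks").fetchall())
```

A database of this kind is only local storage for one node; it does not by itself provide the distribution across nodes on which the security of the chain depends.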
The function that verifies the integrity of the blockchain should be added to this code and run before, during, and after the addition of a new block. The new block's data must also be propagated across the entire network of nodes, while confirmation is obtained from all nodes that the blockchain continues to maintain the integrity and immutability of its hashes and contents. In other words, the verification effort is not negligible in this type of system, which is designed to provide maximum security.
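The verify-then-propagate cycle can be sketched in Python with nodes simulated as in-memory lists. This is a simplified illustration: there is no real networking or consensus protocol, and the function names, block fields, and node names are hypothetical.

```python
import hashlib
from typing import Dict, List

def block_hash(index: int, previous_hash: str, data: str) -> str:
    payload = f"{index}|{previous_hash}|{data}"
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def make_block(index: int, previous_hash: str, data: str) -> dict:
    return {"index": index, "previous_hash": previous_hash, "data": data,
            "hash": block_hash(index, previous_hash, data)}

def is_valid_chain(chain: List[dict]) -> bool:
    """Check every block's own hash and its link to the preceding block."""
    for position, block in enumerate(chain):
        if block["hash"] != block_hash(block["index"], block["previous_hash"], block["data"]):
            return False
        if position > 0 and block["previous_hash"] != chain[position - 1]["hash"]:
            return False
    return True

def propagate(nodes: Dict[str, List[dict]], new_block: dict) -> Dict[str, bool]:
    """Offer the block to every node; a node accepts it only if its chain
    is valid before the addition and still valid afterwards."""
    confirmations = {}
    for name, chain in nodes.items():
        candidate = chain + [new_block]
        accepted = is_valid_chain(chain) and is_valid_chain(candidate)
        if accepted:
            chain.append(new_block)
        confirmations[name] = accepted
    return confirmations

genesis = make_block(0, "0" * 64, "genesis")
nodes = {"node_a": [genesis], "node_b": [dict(genesis)], "node_c": [dict(genesis)]}
nodes["node_c"][0]["data"] = "altered"  # node_c holds a tampered copy

new_block = make_block(1, genesis["hash"], "Registration of document A")
confirmations = propagate(nodes, new_block)
print(confirmations)
# {'node_a': True, 'node_b': True, 'node_c': False}
```

Each node here re-verifies its whole chain on every addition, which illustrates why the verification effort grows with the length of the chain and the number of nodes.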
Blockchain in Documentation
As explained, blockchain technology offers advantages that can be leveraged in Documentation to ensure that information is secure, immutable, and protected against any form of alteration. This supports the implementation of anti-fraud information systems, digital libraries, archival systems, and secure information processing. Some relevant applications include the following:
- a) Information retrieval. One of the key problems of the Web is the ease with which content can be changed, which calls for special security measures to guarantee its integrity. A distributed record of information makes it possible to verify that the content available through search engines has not been altered, and to present the different versions of a web page stored in the cache, each associated with a distinct hash that verifies its authenticity. Registered information sources can thus be traced easily, making the historical record of information and documents on the blockchain transparent and enabling the recovery of historical, relevant, or important documents, irrespective of the use of permalinks.
Conclusions
- Blockchain technology can serve as an effective solution for archival applications, digital libraries, scientific repositories, and even professional collaboration networks, owing to its properties of immutability, security, transparency, and traceability of information. Content is recorded in blocks that cannot be altered without detection once finalized, which is an advantage in preventing fraud or unauthorized post-editing of content.
- Another beneficial property of blockchain is its decentralized management across networks of nodes. All nodes hold copies of the transactions carried out, which provides greater security: any attempt to alter the content is detected by the other nodes through inconsistencies in the hashes, prompting restoration from the intact copies. Because integrity is guaranteed in this way, information, datasets, and scientific evidence (that is, scientific documentation) can be stored under appropriate security conditions.
- User access control to applications and within documentary chains and processes is another key aspect, and one amenable to automation, since the authenticity of user credentials can be verified. The same mechanism can also underpin a payment system linked to cryptocurrencies, which could serve as the basis for a credit or reward system for authors and content creators, providing a remuneration method that safeguards intellectual property and supports fair compensation.