Until now, the techniques used to protect intellectual property (IP) and prevent illegal use, such as digital rights management (DRM) and technical protection measures (TPM), have also prevented legal or permitted uses of copyrighted digital content [3, 4], strongly limiting the user's freedom of information and expression.
In [5], Lai and Graber point out how complex it is to reach a fair balance between IP rights and freedom of expression and information. In particular, the authors compare the needs of IP owners with the privacy and freedom of choice of users. These classic digital rights protection techniques rely on hardware or firmware support and proprietary encodings that prevent the user from making copies, reading unauthorized copies, or reproducing the content on unauthorized media.
The total prevention of copying through cryptography and dedicated media, such as the content scrambling system for DVD protection [6], reduces the ability to share and distribute creative content. To overcome the limits of the classic digital rights protection techniques and meet the various needs of the IP protection field, different approaches have been developed [7]. For instance, while steganography provides techniques to hide new information in the original digital content, cryptography produces an unreadable version of the document by applying a kind of permutation or substitution to the original information.
Watermarking is the most balanced technique for sharing non-obfuscated information while preserving copyright [8, 9]. In particular, it ensures copyright protection by applying a mark to the original digital content without showing that mark to readers. Watermarking methods can be applied in many contexts, such as identifying unauthorized users, establishing the authorship of digital content, monitoring the broadcasting process, and detecting tampering with digital content.
Up to an acceptable distortion, watermarking can also be adopted to protect content dynamically generated from databases [10]. Watermarking intellectual property allows the free sharing of digital content while binding the artifact to its original author. In this scenario, the author can extract and show the digital watermark as irrefutable proof of authorship, avoiding the cost and effort of more elaborate, timestamped evidence.
At the same time, the watermark rules out the defense of unintentional plagiarism, in which a malicious user appeals to the work's lack of originality, which may have led to the independent creation of the same or very similar content. Among all digital content watermarking techniques, we focus on text watermarking. The reason for our choice is that textual information represents one of the largest classes of digital content that people share and explore online every day: for instance, online newspaper articles, manuals and guides, and social media and microblogging posts, to name a few.
Furthermore, text messages increase daily and are more often used for commerce, mobile banking, and government communications. Compared with watermarking techniques for other digital content, text watermarking is the most difficult task, presenting several challenges mainly because text is not noise-tolerant. In particular, a text watermarking algorithm must work under additional constraints, such as short message length, a limited set of transformations that preserve readability, and a restricted number of alternative syntactic and semantic permutations [11].
Essay: Digital watermarks
Another peculiarity of text in the context of unauthorized copying is that, unlike images, any meaningful excerpt, such as a paragraph, could be copied, and it is difficult to predict which one. While it is true that images are often partially cropped before unauthorized re-sharing, the unauthorized copy will still account for a significant percentage of the original image, with some exceptions, for example in aerial photography. With text, instead, it is very common to copy only a few sentences, which may not be consecutive in the original document and may account for a very small percentage of it e.
This can be seen as a special case of a deletion attack, in which most of the watermarked document is deleted and only some paragraphs or sentences are left, motivating the need for a fine-grain approach able to embed the watermark in as many sub-portions of the text as possible. The concept of fine-grain protection of text content is well known in copyright law: it is common to claim intellectual property rights on small portions of larger works, and there is a vast literature, involving several trials and studies [12], that tries to define at which fine-grain level an intellectual work can be copyrighted.
This known scenario, however, has not been addressed so far in the text watermarking literature. It also makes the text length constraint even stricter, because the watermark has to be embedded in smaller parts of the text. Additional issues arise if we must be able to verify a copied text that partially straddles two watermarked portions. In this paper, we propose a structural text watermarking method for intellectual property protection 1.
The method protects the whole document as well as smaller excerpts of it, down to a minimum excerpt size that depends on the specific characters of the text. Nevertheless, it is fair with respect to concerns about the communicative freedom and privacy of users, because it neither alters the content of the text nor embeds explicit author-related data. More precisely, the proposed method is invisible and content-preserving and belongs to the fragile and non-blind classes.
In practice, it can embed a password-based watermark without altering the content and while preserving the length, protecting against the copy and paste of even small excerpts of text. The embedding process consists of two phases.
In the first one, the watermark is generated by applying a hash function that combines the user (author) password and the structural characteristics of the text. In the second phase, which is the core of our methodology, the watermark is embedded into the original text by exploiting homoglyph characters. Homoglyph characters, such as symbols, numbers, and letters, look very similar on screen and in print; nevertheless, their low-level encoding is completely different. More precisely, the Unicode confusable characters, namely the homoglyph characters, are listed by the Unicode Consortium and look confusingly similar to each other [13].
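The two ideas in this paragraph can be illustrated with a minimal sketch. It is not the authors' algorithm: HMAC-SHA256 stands in for the unspecified hash function, the raw UTF-8 bytes of the text stand in for the "structural characteristics", the 64-bit length is arbitrary, and the Latin/Cyrillic pairs are only familiar examples of confusable characters. The names `generate_watermark` and `describe` are hypothetical.

```python
import hashlib
import hmac
import unicodedata


def generate_watermark(password, text, n_bits=64):
    """Derive watermark bits from a password and a text.

    Assumptions (not from the paper): HMAC-SHA256 is the hash, and the
    UTF-8 bytes of the text replace the 'structural characteristics'.
    """
    if not 0 < n_bits <= 256:
        raise ValueError("n_bits must be between 1 and 256")
    digest = hmac.new(
        password.encode("utf-8"), text.encode("utf-8"), hashlib.sha256
    ).digest()
    # Unpack every byte of the digest into bits, most significant bit first.
    bits = [(byte >> shift) & 1 for byte in digest for shift in range(7, -1, -1)]
    return bits[:n_bits]


def describe(ch):
    """Return the code point and the Unicode name of a character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


# Characters that look alike on screen but have different encodings.
for latin, lookalike in [("a", "\u0430"), ("e", "\u0435"), ("o", "\u043e")]:
    print(describe(latin), "|", describe(lookalike))
```

Because the bits depend on the password, someone who does not know it cannot regenerate the same sequence for a given text, which is the property the authorship check relies on.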
In practice, we replace a subset of the characters of the original text with indistinguishable Latin homoglyph symbols, in a substitution process driven by the watermark bit sequence. The password makes it possible to verify authorship, since only the actual author of the text can correctly regenerate the watermark. The substitution leaves the text visually indistinguishable from the original; in other words, the watermark is not noticeable by the user.
The method can be applied repeatedly to small excerpts of a longer text, protecting a document at a fine-grain level against the copy and paste of text portions. It makes it possible to cryptographically bind each text excerpt to the original source document.
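A rough sketch of bit-driven substitution and of repeated embedding follows. Everything specific in it is an assumption made for illustration: the carrier table (five Latin letters mapped to Cyrillic lookalikes), the one-bit-per-character encoding, the cyclic repetition of the bits, and the rotation-based excerpt check. The paper's actual encoding and verification procedure may differ.

```python
# Illustrative carrier table (an assumption, not the authors' list):
# Latin letter -> visually similar Cyrillic code point.
HOMOGLYPHS = {"a": "\u0430", "c": "\u0441", "e": "\u0435", "o": "\u043e", "p": "\u0440"}
REVERSE = {v: k for k, v in HOMOGLYPHS.items()}


def embed(text, bits):
    """Write the bits cyclically into carrier characters: 1 = swap, 0 = keep."""
    out, i = [], 0
    for ch in text:
        if ch in HOMOGLYPHS:
            out.append(HOMOGLYPHS[ch] if bits[i % len(bits)] else ch)
            i += 1
        else:
            out.append(ch)
    return "".join(out)


def extract(text):
    """Read one bit for every carrier character found in the text."""
    bits = []
    for ch in text:
        if ch in HOMOGLYPHS:
            bits.append(0)
        elif ch in REVERSE:
            bits.append(1)
    return bits


def excerpt_matches(excerpt, bits):
    """True if the excerpt carries at least one full period of the bit pattern.

    An excerpt can start anywhere in the cycle, so every rotation is tried.
    """
    found = extract(excerpt)
    n = len(bits)
    if len(found) < n:
        return False
    rotations = [bits[r:] + bits[:r] for r in range(n)]
    return any(
        all(found[j] == rot[j % n] for j in range(len(found))) for rot in rotations
    )


bits = [1, 0, 1, 1]
original = "a peace accord opened a process of cooperation"
marked = embed(original, bits)
print(len(marked) == len(original))       # the length is preserved
print(excerpt_matches(marked[12:], bits)) # a pasted excerpt still verifies
```

Repeating the bits across the document is what lets an excerpt be checked on its own, provided it contains enough carrier characters; with a realistic watermark length that requirement is what sets the minimum excerpt size.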
How visually indistinguishable the result is depends strongly on the font used. However, we will show in the evaluation section that the homoglyph characters cover the most used font families. Length preservation is quite complex to ensure when the algorithm operates on short texts.
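One way to see why short texts are hard is to count how many characters of a text can carry data at all. The sketch below assumes a hypothetical carrier set of five letters and one bit per carrier; neither value comes from the paper, and the functions `capacity` and `shortest_prefix` are illustrative names.

```python
# Hypothetical set of characters that have a usable homoglyph.
CARRIERS = set("aceop")


def capacity(text):
    """Number of bits the text can hold, assuming one bit per carrier."""
    return sum(ch in CARRIERS for ch in text)


def shortest_prefix(text, n_bits):
    """Length of the shortest prefix that can hold n_bits, or None."""
    count = 0
    for position, ch in enumerate(text, start=1):
        if ch in CARRIERS:
            count += 1
            if count == n_bits:
                return position
    return None


sample = "hello world"
print(capacity(sample), shortest_prefix(sample, 2))
```

Under these assumptions the minimum length is a property of the content, not a fixed number: a text poor in carrier characters needs to be longer before a complete watermark fits.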
The proposed method is able to embed a watermark while preserving the text length even with very short texts (theoretically, a minimum of 22 symbols). The minimum length depends on the text content, as only a subset of characters can be substituted to embed the data. In order to establish the minimum length requirement on real text examples, we provide the results of an extensive experiment on 1.
The results show that, on average, characters are sufficient to embed the watermark while preserving the length and visible aspects of the original text. Although paragraphs can be very short or contain few confusable symbols that can be replaced, the method can watermark very short excerpts, shorter than a single paragraph of New York Times articles for the The combination of these two features makes it possible to use our text watermarking method in several new contexts, for instance, Word and PDF documents, online newspaper articles, short message communications, e-mails, microblogging platforms, and social network posts.
The fine-grain watermarking of the proposed approach makes it possible, for the first time, to protect small excerpts of text by repeatedly embedding the watermark across the document. To evaluate the fine-grain property of the method and compare it with current methods, we also propose a novel measure of robustness to partial copy and paste.
The rest of the paper is organized as follows. In Section 2, we provide a brief background on watermarking, in order to classify the methods and show the features usually required of a watermarking algorithm. In Section 3, we review the literature related to text watermarking methods. In Section 4, we describe our text watermarking method, including watermark generation, embedding, extraction, and authorship verification. We discuss the evaluation of our method in Section 5.
Some concluding remarks are made in Section 6.
In this section, we provide a brief background on watermarking methods. This is important as it helps in understanding the reasons behind the design of our method.
Readable or detectable: the watermark is readable if the user can clearly read it. It is instead detectable if a detection function can be used to check whether a watermark exists, but it cannot be read.
Visible or invisible: a visible watermark is visually perceptible by the user. Conversely, the watermark is invisible if it is hidden in the original digital content and is not noticeable by the user. A visible watermark may not be readable; that is, a user can visually detect it but cannot read its content.
Blind or non-blind: if the original digital content is not needed in the extraction process, the watermarking is blind. Otherwise, it belongs to the non-blind category.
Simple or multiple: if a watermark can be applied only once, the watermarking is simple. Otherwise, a multiple watermark can be embedded more than once without affecting the whole process.
Fragile, semi-fragile, and robust: a fragile watermark is detectable and can be altered or erased; thus, it is used for integrity authentication. Conversely, a robust watermark is detectable and not erasable, and it is most suitable for copyright protection. A semi-fragile watermark is suited for content authentication.
In [16], the researchers identify several features usually required of a watermarking method.
Verifiability represents the ability to irrefutably prove the ownership of the digital content. Data payload represents the maximum number of bits of extra information that can be embedded in the original digital content. Robustness represents the ability to resist processing operations and attacks, while security is the capacity not to be altered or removed without full knowledge of the watermark or the embedding process. Finally, computational cost is the cost required by the embedding and extraction processes.
Zero-watermarking techniques: instead of watermarking the text, some characterizing features of the text are stored on a third-party authority server, such as an Intellectual Property Rights (IPR) database.
Image-based techniques: first, the text is transformed into an image; then the watermark is embedded into the image. Obviously, this approach modifies the nature of the original document; in other words, it cannot be considered a pure text watermarking method. However, it has some interesting features, such as length preservation and language independence.
Syntactic techniques: these methods transform language-dependent structures in order to hide the watermark. Typically, the sentences have different language-dependent structures that make the process easier.
Semantic techniques: these methods use verbs, nouns, prepositions, and even spelling and grammar rules to permute the contents and embed the watermark.
Structural techniques: these methods exploit double letter occurrences, word-shift and line-shift encoding, and the Unicode standard to embed the watermark. They are among the most recent methodologies, and they do not alter the original text.
The text watermarking approaches with actual watermark embedding are usually classified into three main categories [15, 17]: image-based, syntactic, and semantic. In this categorization, the zero-watermarking approaches are often not considered, as no watermark is actually applied; however, this alternative solution has been getting more attention lately, and it is important to understand the difference between zero-watermarking and content-preserving methods.
A recent survey [18] instead considers structural, linguistic, and statistical methods as the three main categories. After highlighting the core ideas, advantages, and disadvantages of the mentioned approaches, we will focus on the structural methods. Unicode-based methods, such as the proposed method, belong to the structural class.