Not only text - Image reuse detection

Scholars, teachers, and editors have long been accustomed to thinking of plagiarism mostly as improper reuse of text. Such cases really are the most widespread, but they are only the tip of the plagiarism iceberg. Improper reuse can involve not only text but also data, tables, images, and even ideas.
While plagiarized ideas can barely be detected by computer so far, improper reuse of other elements of research can be. A modern, comprehensive plagiarism detection solution must address at least some of these cases. The most sought-after software tools are those capable of detecting image reuse.

In modern science, images are the main supporting evidence in an article. Unless readers study the supplementary materials (if they even exist), they rely on photos, graphical models, and data visualizations to be sure that the research was actually performed. Western blot images convince readers that the experiment was indeed carried out and supports the conclusions stated in the text; micrographs prove that the researchers really saw the described cells under the microscope; and plots convey the volume of data and the method of analysis, which can be intuitively compared with the text.

In Galileo’s “Sidereus Nuncius” and other works of that era, images were mostly illustrative. Even the “animalculi” drawn by Antonie van Leeuwenhoek only gave an idea of what the newly discovered creatures might look like. Before the invention of photography and modern methods of data analysis, images were just an extension of the text. Now images are needed to show readers that the conclusions are accurate and reliable. But what if this tool is misused?
This happens astonishingly often. Elisabeth M. Bik (Stanford, California, USA) and her colleagues have been analyzing image manipulations in scientific papers for a long time, and their results could sow distrust of science in anyone who becomes acquainted with them. They show that image manipulations can be detected in at least 3.8% of the papers examined, most often in the form of duplications.

While checking papers, the Advacheck system has already found cases where the same image was presented in one scientific article as a magnesium composite, in another as a zinc composite, and in a third as a sodium alginate composite. In another example, the same image appears in several scientific papers, described in one as a CT scan of a boy and in another as a CT scan of an adult woman.

Universities and research institutes need software that can verify image integrity and help them avoid publishing papers with image misconduct, which can lead to retraction and unwanted publicity. Moreover, improper reuse of images creates a legal problem. In contrast to text quotation, which can be considered “fair use” given an appropriate reference, reusing an image requires explicit written permission from the copyright holder. Not only the author of the research paper but also the university and the publisher can be held liable for publishing pictures without that permission. Programs that can detect “image plagiarism” or misuse of images are badly needed.
Nowadays, almost all major web search engines, such as Google and Bing, can search for images that are similar to each other. But they are practically useless for detecting plagiarized images. They are designed to return visually similar pictures and are far from ideal at finding exact matches, which is exactly the challenge in image reuse detection. Moreover, web search engines can only search a database of previously indexed image files: search robots (also known as “spiders”) constantly crawl web pages looking for image files in common formats such as *.JPG, *.PNG, or *.WEBP and load them into their search caches if technically possible.

A typical web search service looks only for conventionally formatted images already present in its databases. Scientific articles, meanwhile, are often published as PDFs, which “search spiders” cannot fully index. Thus, when a Google image search finds nothing, that does not mean the image has not been reused: the search engine is simply unable to detect improper reuse in such sources.

Advacheck, our software solution, has its own search spiders, but they index databases of full-text scientific publications. The key difference from conventional search engines is that Advacheck downloads full-text PDF files and extracts the images from them. Then, if it finds a picture in the examined text that is similar to one in its database, it provides a link to the source article of the original image. Users can follow the link and verify, directly from the program, that the images are identical (or very similar).
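The index-and-lookup idea can be illustrated with a minimal sketch. It is not Advacheck’s implementation: the ImageIndex class, the average_hash fingerprint, and the example.org links are all hypothetical, and the images are assumed to have been extracted from the PDFs already and reduced to small grayscale grids of equal size.

```python
from dataclasses import dataclass, field

Pixels = list[list[int]]  # grayscale image as rows of 0-255 values


def average_hash(pixels: Pixels) -> int:
    """One bit per pixel: 1 if the pixel is brighter than the image mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Count the bits in which two fingerprints differ."""
    return bin(a ^ b).count("1")


@dataclass
class ImageIndex:
    """Hypothetical in-memory index: fingerprint -> link to the source article."""
    entries: list[tuple[int, str]] = field(default_factory=list)

    def add(self, pixels: Pixels, source_url: str) -> None:
        """Store the fingerprint of an image extracted from an indexed article."""
        self.entries.append((average_hash(pixels), source_url))

    def lookup(self, pixels: Pixels, max_distance: int = 0) -> list[str]:
        """Return links to articles whose image is within max_distance bits."""
        query = average_hash(pixels)
        return [url for fp, url in self.entries if hamming(query, fp) <= max_distance]


index = ImageIndex()
index.add([[10, 200], [220, 30]], "https://example.org/article-a")
index.add([[200, 10], [30, 220]], "https://example.org/article-b")

# The same picture as in article-a, slightly brightened, as it might
# appear in a manuscript submitted for checking.
matches = index.lookup([[20, 210], [230, 40]])
print(matches)
```

Keeping max_distance at 0 reflects the focus on exact matches; raising it would tolerate small differences at the cost of more false positives. The final decision still rests with the person who follows the returned link and compares the images.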

The second feature of Advacheck image search is that it targets exact matches and, unlike Google, does not return pictures that merely show a similar cell or have a similar color palette. So that it can still find manipulated images within these strict boundaries, we have programmed it to search for specific types of transformations that may be introduced by dishonest researchers and paper mills:

rotation;
flipping (horizontal or vertical);
cropping;
color balance change.
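
How a checker might account for the transformations listed above can be sketched in a few lines. This is an illustration under simplifying assumptions, not Advacheck’s actual algorithm: images are small grayscale grids, rotations are multiples of 90 degrees, a color change is a uniform brightness or contrast shift, and a crop preserves the original pixel values. All function names are hypothetical.

```python
Pixels = list[list[int]]  # grayscale image as rows of 0-255 values


def rotate90(p: Pixels) -> Pixels:
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*p[::-1])]


def flip_horizontal(p: Pixels) -> Pixels:
    """Mirror the image left to right."""
    return [row[::-1] for row in p]


def orientations(p: Pixels) -> list[Pixels]:
    """All 8 combinations of 90-degree rotations and mirror flips.

    A vertical flip is a 180-degree rotation plus a horizontal flip,
    so it is covered as well.
    """
    result = []
    current = p
    for _ in range(4):
        result.append(current)
        result.append(flip_horizontal(current))
        current = rotate90(current)
    return result


def binarize(p: Pixels) -> tuple[tuple[int, ...], ...]:
    """Threshold at the mean so uniform brightness/contrast shifts cancel out."""
    flat = [v for row in p for v in row]
    mean = sum(flat) / len(flat)
    return tuple(tuple(1 if v > mean else 0 for v in row) for row in p)


def contains_crop(source: Pixels, candidate: Pixels) -> bool:
    """True if candidate appears pixel-for-pixel as a rectangle inside source."""
    sh, sw = len(source), len(source[0])
    ch, cw = len(candidate), len(candidate[0])
    for top in range(sh - ch + 1):
        for left in range(sw - cw + 1):
            if all(source[top + r][left:left + cw] == candidate[r] for r in range(ch)):
                return True
    return False


def is_reused(source: Pixels, candidate: Pixels) -> bool:
    """Check every orientation of candidate against source."""
    source_bits = binarize(source)
    return any(
        binarize(variant) == source_bits or contains_crop(source, variant)
        for variant in orientations(candidate)
    )


original = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]

# Rotated by 90 degrees and uniformly brightened by 25.
rotated_brightened = [[v + 25 for v in row] for row in rotate90(original)]

# Bottom-right 2x2 region, mirrored left to right.
cropped_flipped = flip_horizontal([[50, 60], [80, 90]])

print(is_reused(original, rotated_brightened))
print(is_reused(original, cropped_flipped))
print(is_reused(original, [[5, 99], [99, 5]]))
```

The sketch does not handle a crop combined with a color change, and mean thresholding discards so much detail that unrelated images can collide; a production system would need more robust fingerprints and a minimum crop size.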

This set of features makes Advacheck powerful at finding image reuse. It goes without saying that the program only establishes the fact of reuse; a human operator must then decide whether the reuse was legitimate. However, the technical ability to list all detected instances of reuse (if any) can be valuable to all people and institutions whose businesses and activities rely on publishing, even beyond science. The current implementation of Advacheck’s image search engine is already established in biology and medicine but can also be useful in architecture, engineering, and science consulting, an emerging area of science-related business.

As recently as the beginning of the 20th century, science publishing was predominantly text-based, but images now play an increasingly crucial role in science communication. Modern science is going visual, so the integrity and uniqueness of images are emerging topics that have already given rise to plenty of technical solutions. Advacheck keeps pace with new demands and addresses new challenges.

