Summer internships 2016

Antiplagiat Research invites you to join our summer internship program this year.

Application deadline - 01/07/2016.

See also Academic Cooperation.

Search for unnatural and fiction texts in eLIBRARY.RU

We conducted research to find out whether the collection of our partner eLIBRARY.RU, the largest Russian scientific digital library, contains machine-generated and non-scientific papers. We looked for papers like Rooter, machine-translated papers, and texts produced with tools like Yandex Summaries. We also searched for fiction texts in scientific journals included in the RINC and RSCI citation indexes.

We presented our results at the SCIENCE ONLINE XX conference in May 2016.

The slides are available here.

Participation in the Dialogue Conference

1-4 June 2016, Moscow, Russia

Our team is presenting a study on machine-translated text detection at the Dialogue 2016 Conference (Moscow, RSUH, June '16). We propose a solution focused on detecting phrase salad, that is, sentences lacking semantic and syntactic correctness. We developed a classifier and propose several novel features to distinguish machine-translated sentences from authentic ones.

Dialogue is the largest international conference on computational linguistics and intellectual technologies in Russia and a world-leading forum on problems of computational analysis of the Russian language. The conference covers a broad range of problems and practical tasks in computational linguistics, from linguistic research to solutions implemented in end products. Particular attention is paid to modeling, analysis and creation of computational resources for the Russian language.

NLDB 2016

22-24 June 2016, at the University of Salford, MediaCityUK Campus

A survey of existing approaches to computer-generated text detection will be presented at the 21st International Conference on Applications of Natural Language to Information Systems (June '16, University of Salford, MediaCityUK Campus). Computer-generated texts appear on the Web increasingly often, and detecting them is becoming important because they are passed off as written by humans. The survey reviews several methods for revealing such texts.

Since 1995, the NLDB conference has aimed to bring together researchers, industry practitioners and potential users interested in various applications of natural language in the database and information systems field. The topics covered include theoretical aspects, algorithms, applications, architectures for applied and integrated NLP, resources for applied NLP, and other aspects of NLP.

The event is the first visit of our team to the UK!

Talks at the AINL-ISMW FRUCT Conference

Two of our papers have been accepted for the conference as full papers.

We gave our talks in the Plagiarism Detection section. Here are the slides:

A Monolingual Approach to Detection of Text Reuse in Russian-English Collection

Discovering Text Reuse in Large Collections of Documents: a Study of Theses in History Sciences

Antiplagiat Research to present a tutorial at the AINL-ISMW FRUCT Conference

Antiplagiat Research runs a tutorial on plagiarism detection at the AINL-ISMW FRUCT conference (

The tutorial will cover the following topics:

Intrinsic plagiarism detection: methods and approaches

Lecturers: Alexey Romanov, Daria Beresneva, Antiplagiat Research

Intrinsic plagiarism detection is the problem of finding reused text when no reference corpus is given, so text from the document being checked cannot be compared with other texts to find matches. In this talk we will present current machine learning methods for intrinsic plagiarism detection and discuss their performance and the results achieved to date.
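The talk covers machine learning methods; as a much simpler illustration of the core idea (comparing parts of a document only with the document itself), here is a minimal Python sketch of a classic style-deviation heuristic. Character trigram profiles, the non-overlapping sentence windows, the mean + k·std threshold and all function names are illustrative assumptions, not the methods presented in the tutorial.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Character n-gram frequency profile of a text (a common stylometric feature)."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def style_outliers(sentences: list[str], window: int = 3, k: float = 1.0) -> list[int]:
    """Flag windows whose style deviates from the document as a whole.

    No reference corpus is used: each non-overlapping window of `window`
    sentences is compared only with the profile of the full document.
    Returns the start index of every window whose distance exceeds
    mean + k * standard deviation of all window distances.
    """
    if len(sentences) < window:
        return []
    doc_profile = char_ngrams(" ".join(sentences))
    starts = list(range(0, len(sentences) - window + 1, window))
    dists = [
        1.0 - cosine(char_ngrams(" ".join(sentences[s:s + window])), doc_profile)
        for s in starts
    ]
    threshold = mean(dists) + k * pstdev(dists)
    return [s for s, d in zip(starts, dists) if d > threshold]


doc = (["the cat sat on the mat."] * 6
       + ["zq xv jw kq zq xv jw kq."] * 3
       + ["the cat sat on the mat."] * 3)
print(style_outliers(doc))  # -> [6]: the window written in a different "style"
```

A flagged window is only a candidate for reuse; real systems would use richer features and a trained model rather than a fixed threshold.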


Detection of Text Re-Use and Plagiarism: the External Approach

Lecturer: Alberto Barron Cedeno, Scientist at QCRI

The best evidence to support a case of text re-use (plagiarism, if no proper citation is provided) is to show a chunk of text together with the source it was borrowed from. Given a suspicious document, external plagiarism detection consists precisely of retrieving and spotting potential cases of re-used text together with their claimed source.

This talk kicks off with some basic information retrieval and natural language processing concepts. Later on, the most representative models for external plagiarism detection are discussed. PAN, probably the most important initiative in plagiarism detection research, is then overviewed. Finally, directions for starting to work on this topic are provided, and proposals are made towards pushing the state of the art in the field.
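The retrieve-and-spot idea described above can be illustrated with a minimal Python sketch based on exact word n-gram (shingle) matching against an in-memory inverted index. The shingle length, the `min_shared` cut-off and the function names are illustrative assumptions; this is neither one of the models discussed in the talk nor how the Antiplagiat engine works.

```python
import re
from collections import defaultdict


def shingles(text: str, n: int = 4) -> list[tuple[int, tuple[str, ...]]]:
    """Word n-grams of a text, each paired with its starting word position."""
    words = re.findall(r"\w+", text.lower())
    return [(i, tuple(words[i:i + n])) for i in range(len(words) - n + 1)]


def build_index(sources: dict[str, str], n: int = 4) -> dict[tuple[str, ...], set[str]]:
    """Inverted index from each word n-gram to the ids of sources containing it."""
    index: dict[tuple[str, ...], set[str]] = defaultdict(set)
    for doc_id, text in sources.items():
        for _, gram in shingles(text, n):
            index[gram].add(doc_id)
    return index


def detect_reuse(suspicious: str, sources: dict[str, str], n: int = 4,
                 min_shared: int = 2) -> dict[str, list[int]]:
    """Candidate sources and the positions of the n-grams they share with `suspicious`."""
    index = build_index(sources, n)
    hits: dict[str, list[int]] = defaultdict(list)
    for pos, gram in shingles(suspicious, n):
        for doc_id in index.get(gram, ()):
            hits[doc_id].append(pos)
    # Keep only sources sharing enough n-grams to count as a potential reuse case.
    return {doc_id: pos_list for doc_id, pos_list in hits.items()
            if len(pos_list) >= min_shared}


sources = {
    "a": "the quick brown fox jumps over the lazy dog",
    "b": "completely unrelated text about plagiarism detection systems",
}
print(detect_reuse("as noted, the quick brown fox jumps over a fence", sources))
# -> {'a': [2, 3, 4]}
```

Consecutive positions such as 2, 3, 4 can then be merged into one reused passage and shown side by side with the matching fragment of the source, which is exactly the kind of evidence the abstract describes.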


How to detect deception through stylometric analyses

Lecturer: Tommaso Fornaciari, Italian National Police

The tutorial will follow, step by step, a research activity in the field of deception detection. The dataset employed is DeCour (DEception in COURts), a corpus of transcripts of hearings held in four Italian courts. The case shown is a typical example of text classification carried out through stylometric techniques. Here the task is to train models to distinguish false from truthful statements; however, the methodological approach is very similar to those applied in computational linguistics for other forensic tasks, such as author profiling, author attribution and plagiarism analysis. The process will be examined from data collection through preprocessing and feature selection. Finally, the data analysis and the results will be discussed, taking into account possible perspectives for future research.


Computer-generated text detection

Lecturer: Rita Kuznetsova, Antiplagiat Research

Computers can now generate well-formed texts and even scientific papers, and such generated papers get published or submitted as part of a diploma or thesis. In this lecture we will discuss what these texts are, how they are created, and contemporary approaches to detecting them. Several methods will be covered in detail and their results presented.

Systems for revealing plagiarism

Lecturers: Andrey Ivahnenko, Alexey Romanov, Antiplagiat Research

In this lecture we will discuss existing systems and tools that detect text reuse and machine-generated texts. An overall introduction will be given to the Antiplagiat software, a state-of-the-art engine for exact extrinsic plagiarism detection. We will show how it can be applied to detect text reuse in separate documents as well as in a whole text corpus, a feature used for deep analysis of text reuse. Several tools for producing and detecting machine-generated scientific papers will be discussed, including SciGen and SciDetect, as well as other tools that use the methods discussed earlier in the tutorial.

Slides 1

Slides 2


See you there!

SCIENCE ONLINE + Antiplagiat Conference

Antiplagiat Research and eLIBRARY.RU, a leading Russian scientific digital library, are holding the annual Scientific and Research Conference «Science Online + Anti-Plagiat 2015» on November 14-21 in Egypt.

This year the conference unifies the 20th International Scientific and Research Conference «SCIENCE ONLINE: Electronic Information Resources for Science and Education» and the 1st Scientific and Research Conference «Text Similarity Detection - 2015». The main topic of the forum is a comprehensive analysis of the quality of scientific information using various assessment tools.

The main topics of the conference include:

    • RSCI in the Web of Science system: preliminary results of the integration of the Russian Science Citation Index in the international information space;
    • Methodological issues in bibliometric and expert review of scientific journals, identifying fraudulent schemes for artificially inflating bibliometric indicators;
    • "SCIENCE INDEX for publishers": a complex solution for scientific editorial offices;
    • Mapping and visualization of scientific information to work out a science strategy, develop scientific partnerships and look for prospective lines of development;
    • Semantic text analysis: methods of evaluating essential aspects of text;
    • Algorithms and technologies, methods and tools for detection of text reuse, search and word processing;
    • Methods of using technical means to detect text reuse: the specifics of detection in educational institutions, research organizations and large companies;
    • Legal and ethical problems of detecting text reuse in student qualification works and scientific research;
    • Project "Science Archive": quality evaluation of educational and scientific works, publication of qualification works by Higher Educational Institutions, topic search.


Antiplagiat Research at RUSSIR'2015

In August, the Research department of Anti-Plagiat JSC took part in the poster session of the 9th Russian Summer School in Information Retrieval (RuSSIR 2015). The poster's topic was "Explicit semantic analysis for cross-language retrieval in case of Russian-English translation". The paper proposes a method for measuring cross-lingual text similarity using explicit semantic analysis on a small collection of Russian and English documents.
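The general idea of explicit semantic analysis (ESA) for the cross-lingual case can be sketched in a few lines of Python: a text in either language is mapped to a weighted vector over shared concepts, and the two vectors are compared by cosine similarity. The two-concept toy space, the TF-IDF weighting, the missing lemmatization and all names below are illustrative assumptions, not the setup used in the poster.

```python
import math
import re
from collections import Counter
from functools import lru_cache

# Hypothetical toy concept space. In ESA the concepts are typically Wikipedia
# articles; for the cross-lingual case each concept needs a text in both
# languages (e.g. articles joined by interlanguage links).
CONCEPTS = {
    "Plagiarism": {
        "ru": "плагиат заимствование текста без ссылки на источник",
        "en": "plagiarism copying text without citing the source",
    },
    "Cat": {
        "ru": "кошка домашнее животное",
        "en": "cat domestic animal",
    },
}


def tokens(text: str) -> list[str]:
    # No lemmatization: real Russian text would need it to match word forms.
    return re.findall(r"\w+", text.lower())


@lru_cache(maxsize=None)
def concept_index(lang: str) -> dict[str, dict[str, float]]:
    """word -> {concept: tf-idf weight}, built from the concept texts in `lang`."""
    tfs = {c: Counter(tokens(texts[lang])) for c, texts in CONCEPTS.items()}
    df = Counter(word for tf in tfs.values() for word in tf)
    n = len(CONCEPTS)
    index: dict[str, dict[str, float]] = {}
    for concept, tf in tfs.items():
        for word, count in tf.items():
            index.setdefault(word, {})[concept] = count * math.log(1 + n / df[word])
    return index


def esa_vector(text: str, lang: str) -> Counter:
    """Interpret a text as a weighted vector over the shared concepts."""
    index = concept_index(lang)
    vec: Counter = Counter()
    for word in tokens(text):
        for concept, weight in index.get(word, {}).items():
            vec[concept] += weight
    return vec


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(w * b[c] for c, w in a.items() if c in b)
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def cross_lingual_similarity(ru_text: str, en_text: str) -> float:
    # Both vectors live in the same concept space, so they are directly comparable.
    return cosine(esa_vector(ru_text, "ru"), esa_vector(en_text, "en"))


print(cross_lingual_similarity("плагиат текста", "plagiarism of text"))  # close to 1.0
print(cross_lingual_similarity("плагиат текста", "domestic cat"))        # 0.0
```

The design choice that makes ESA work across languages is that no translation is performed: only the concept space is shared, so the quality of the result depends on how many aligned concepts are available.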


Results on historical dissertations presented at RCDL'2014

We investigated graphs of text reuse cases in scientific degree theses in history sciences (topic codes 07.xx.xx of the Russian Higher Attestation Committee). Using algorithmic and statistical methods, we discovered groups of highly connected theses with a large amount of text reuse between them. In addition, we located works compiled from several other theses and pointed out the sources of reuse.
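The graph side of such an analysis can be sketched in Python, assuming pairwise reuse shares between theses have already been computed by a text reuse detector (each edge says what fraction of one thesis was found in another). Grouping by connected components of a thresholded graph, the threshold values and the "compilation" rule are illustrative assumptions, not the algorithmic and statistical methods of the paper.

```python
from collections import defaultdict

# Hypothetical input: (borrower, source, share), where `share` is the fraction
# of the borrower's text found in the source thesis.
Edge = tuple[str, str, float]


def reuse_groups(edges: list[Edge], min_share: float = 0.2) -> list[set[str]]:
    """Connected components of the undirected graph of strong reuse links."""
    adj: dict[str, set[str]] = defaultdict(set)
    for borrower, source, share in edges:
        if share >= min_share:
            adj[borrower].add(source)
            adj[source].add(borrower)
    seen: set[str] = set()
    groups: list[set[str]] = []
    for start in adj:
        if start in seen:
            continue
        # Iterative depth-first search collecting one component.
        stack, group = [start], set()
        seen.add(start)
        while stack:
            node = stack.pop()
            group.add(node)
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        groups.append(group)
    return groups


def compilations(edges: list[Edge], min_sources: int = 2,
                 min_total: float = 0.5) -> dict[str, list[str]]:
    """Theses that borrow from several sources which together cover much of the text.

    The total is approximate: overlapping fragments may be counted twice.
    """
    borrowed: dict[str, dict[str, float]] = defaultdict(dict)
    for borrower, source, share in edges:
        borrowed[borrower][source] = share
    return {b: sorted(srcs) for b, srcs in borrowed.items()
            if len(srcs) >= min_sources and sum(srcs.values()) >= min_total}


edges = [("T1", "T2", 0.4), ("T2", "T3", 0.3), ("T4", "T5", 0.25), ("T6", "T7", 0.05)]
print(sorted(sorted(g) for g in reuse_groups(edges)))  # [['T1', 'T2', 'T3'], ['T4', 'T5']]
print(compilations([("T9", "T1", 0.3), ("T9", "T4", 0.35), ("T2", "T1", 0.1)]))
# {'T9': ['T1', 'T4']}
```

Connected components are the simplest notion of a "group"; denser structures (cliques or communities) would better match "highly connected" in a real study.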

Antiplagiat Research presented results at the Russian Conference on Digital Libraries'2014, Oct. 2014, Dubna, Moscow Region.

The paper Structures of text paraphrasing and plagiarism in dissertations on historical sciences (in Russian) is also available online at CEUR-WS website.

Here are the slides.

