Showing 2 results for Fellegi-Sunter Model
Dr Zahra Rezaei Ghahroodi, Zhina Aghamohamadi, Volume 16, Issue 1 (9-2022)
Abstract
With the advent of big data over the last two decades, the need to integrate databases, both to exploit this type of data and to build a stronger evidence base for policy and service development, is felt more than ever. Familiarity with the methodology of data linkage, one of the methods of data integration, and with machine learning methods that facilitate the linkage of records is therefore essential. In this paper, in addition to introducing the record linkage process and some related methods, we describe the machine learning algorithms needed to increase the speed of database integration, reduce costs, and improve record linkage performance. Two databases, from the Statistical Center of Iran and the Social Security Organization, are linked.
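To make the record linkage process concrete, here is a minimal Python sketch of the classic Fellegi-Sunter decision step that machine learning methods aim to speed up and improve. It is not the paper's algorithm: the field names, the m- and u-probabilities (`M_PROBS`, `U_PROBS`), and the thresholds in `classify` are hypothetical values chosen for illustration, and fields are compared by exact agreement only.

```python
import math

# Hypothetical parameters (assumed, not estimated from any real data):
# m = P(field agrees | pair is a match), u = P(field agrees | pair is a non-match).
M_PROBS = {"surname": 0.95, "birth_year": 0.90, "sex": 0.98}
U_PROBS = {"surname": 0.01, "birth_year": 0.05, "sex": 0.50}


def comparison_vector(a, b):
    """Binary agreement pattern: 1 if the two records' values agree exactly, else 0."""
    return {field: int(a[field] == b[field]) for field in M_PROBS}


def match_weight(gamma):
    """Fellegi-Sunter composite weight: sum of per-field log2 likelihood ratios."""
    weight = 0.0
    for field, agree in gamma.items():
        m, u = M_PROBS[field], U_PROBS[field]
        if agree:
            weight += math.log2(m / u)  # agreement adds evidence for a match
        else:
            weight += math.log2((1 - m) / (1 - u))  # disagreement adds evidence against
    return weight


def classify(weight, upper=6.0, lower=0.0):
    """Three-way decision using hypothetical upper and lower thresholds."""
    if weight >= upper:
        return "link"
    if weight <= lower:
        return "non-link"
    return "possible link"  # left for clerical review
```

With these assumed parameters, two records agreeing on all three fields receive a weight of about 11.7 and are classified as a link, while a pair that agrees on surname and sex but not on birth year (weight about 4.3) falls into the clerical-review zone.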
Alireza Movaffaghi Ardestani, Dr. Zahra Rezaei Ghahroodi, Volume 17, Issue 1 (9-2023)
Abstract
Today, with increasing access to administrative databases and the high volume of data registered in organizations, traditional methods of data collection and analysis are no longer effective because of the response burden they impose. Accordingly, the transition from traditional survey methods to modern methods of data collection and analysis based on the register-based statistics approach has received growing attention from statistical data analysts. In register-based methods, it is especially important to create an integrated database by linking the records of databases held by different organizations. Many record linkage algorithms have been developed using the Fellegi-Sunter model. However, the Fellegi-Sunter model does not leverage the information contained in field values: it ignores which specific value of a string variable two records agree on, even though agreement on a rare value is stronger evidence of a match than agreement on a common one. This article presents a method that incorporates these differences between specific values of a string variable into the Fellegi-Sunter model. On the other hand, both the Fellegi-Sunter model and the method for adjusting the matching weights in frequency-based record linkage considered in this paper rely on the assumption of conditional independence. In some record linkage applications, this assumption does not hold for the agreement or disagreement of the common variables used for matching. One solution in such cases is to use a log-linear model that allows interactions between the matching variables.
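One common way to let specific values influence the weights is the frequency-based adjustment sketched below in Python. It assumes, as a simplification that may differ from the paper's own correction, that for a value v with relative frequency f_v the matched-pair probability is m_v ≈ m·f_v and the non-matched-pair probability is u_v ≈ f_v², so the agreement weight reduces to log2(m / f_v). The surname list and m = 0.95 are hypothetical.

```python
import math
from collections import Counter


def value_specific_weights(values, m=0.95):
    """Agreement weight for each specific value, adjusted by its frequency.

    Assumption: matched pairs agree on v in proportion to f_v (m_v = m * f_v),
    and two independent random records both take value v with prob f_v ** 2.
    """
    counts = Counter(values)
    n = len(values)
    weights = {}
    for value, count in counts.items():
        f = count / n  # relative frequency of this value
        m_v = m * f    # P(agree on this value | match)
        u_v = f * f    # P(agree on this value | non-match)
        weights[value] = math.log2(m_v / u_v)  # equals log2(m / f)
    return weights


def uniform_agreement_weight(values, m=0.95):
    """Value-blind Fellegi-Sunter agreement weight, with u = sum of f_v ** 2."""
    counts = Counter(values)
    n = len(values)
    u = sum((count / n) ** 2 for count in counts.values())
    return math.log2(m / u)


# Hypothetical surname column: one common, one moderate, one rare value.
SURNAMES = ["Mohammadi"] * 6 + ["Hosseini"] * 3 + ["Tavakoli"]
```

On this toy column, agreement on the rare surname Tavakoli earns about 3.25 bits and agreement on the common Mohammadi about 0.66, while the value-blind weight is about 1.05 for any agreeing surname.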
In this article, we deal with two generalizations of the Fellegi-Sunter model: one that corrects the matching weights, and another that uses a log-linear model with interactions when conditional independence does not hold. The proposed methods are implemented in R on the labour force data set of the Statistical Centre of Iran.
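The role of interactions can be illustrated with the smallest decomposable log-linear model. The Python sketch below assumes three binary matching fields and a hypothetical set of agreement patterns for pairs already known to be matches; real applications usually estimate these distributions with EM on unlabeled pairs, so this is a simplification, not the paper's procedure. It compares the conditional-independence model [1][2][3] with the model [12][3], in which fields 1 and 2 interact; for [12][3] the maximum likelihood estimate has the closed form n(γ1, γ2)·n(γ3)/n².

```python
import math
from collections import Counter
from itertools import product


def fit_independence(patterns):
    """Log-linear model [1][2][3]: P(g) = P(g1) * P(g2) * P(g3)."""
    n = len(patterns)
    margins = [Counter(p[k] for p in patterns) for k in range(3)]
    return {
        g: math.prod(margins[k][g[k]] / n for k in range(3))
        for g in product((0, 1), repeat=3)
    }


def fit_interaction_12(patterns):
    """Log-linear model [12][3]: fields 1 and 2 interact, field 3 is independent.

    This model is decomposable, so its MLE is closed-form: n(g1, g2) * n(g3) / n**2.
    """
    n = len(patterns)
    pair_counts = Counter((p[0], p[1]) for p in patterns)
    third_counts = Counter(p[2] for p in patterns)
    return {
        g: pair_counts[(g[0], g[1])] * third_counts[g[2]] / n**2
        for g in product((0, 1), repeat=3)
    }


# Hypothetical agreement patterns (1 = agree) for 100 known matched pairs,
# in which fields 1 and 2 always agree or disagree together.
PATTERNS = [(1, 1, 1)] * 80 + [(0, 0, 1)] * 10 + [(1, 1, 0)] * 5 + [(0, 0, 0)] * 5
```

On these hypothetical patterns, the independence model assigns (0, 0, 1) a probability of about 0.020, while the [12][3] model gives 0.135, much closer to the observed 0.10. A Fellegi-Sunter weight log2(P(γ|M)/P(γ|U)) built on the independence estimate would therefore overstate the evidence against a match for this pattern.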