Adding noise to genome data to protect privacy (Science and Technology)

Original title: Add some noise to protect privacy
Large genome databases are essential for scientists searching for genetic variants associated with disease. For the people who contributed their DNA, however, these databases carry privacy risks. A 2013 study showed that hackers can use information available on the Internet to identify people from anonymous genomic data.
To address these concerns, a system developed by Bonnie Berger and Sean Simmons, computer scientists at the Massachusetts Institute of Technology, takes advantage of a technique known as differential privacy. It masks donors' identities by adding a small amount of noise, or random variation, to the results of a user's query. The researchers published their findings in the latest issue of the journal Cell Systems.
The system calculates the statistic the researcher wants, such as the probability that a genetic variant is associated with a particular disease, or the five genetic variants most strongly associated with a disease. It then adds random variation to the result and returns an answer that is slightly off. For example, in a search for the top five genetic variants associated with a disease, the system may return the top four variants plus the sixth or seventh. The user does not know which answers are exact but can still use the information, while it becomes harder for anyone trying to figure out who is behind the data.
"When you add a little bit of noise to the system, in many ways it's not much different from the noise that comes with the data," said Bradley Malin, a computer scientist at Vanderbilt University in Tennessee. "To some extent, it is still reliable." For decades, the U.S. Census Bureau and the Department of Labor have added noise to their data in this way.
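
The noise-adding step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the system Simmons and Berger built: it uses the Laplace mechanism, a standard textbook way to achieve differential privacy, and every name in it (`laplace_noise`, `noisy_count`, `noisy_top_k`, the variant labels and scores, the `epsilon` values) is hypothetical. The ranking routine picks the noisy maximum k times, spending an equal share of `epsilon` in each round, and the noise scale chosen for it is a conservative assumption.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a Laplace distribution centred on zero."""
    # The difference of two independent exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count, epsilon, rng, sensitivity=1.0):
    """Return a count with Laplace noise of scale sensitivity / epsilon."""
    return true_count + laplace_noise(sensitivity / epsilon, rng)


def noisy_top_k(scores, k, epsilon, rng, sensitivity=1.0):
    """Choose k variant names by repeatedly taking the noisy maximum."""
    remaining = dict(scores)
    # Assumed calibration: each of the k rounds spends epsilon / k.
    scale = 2.0 * sensitivity * k / epsilon
    chosen = []
    for _ in range(min(k, len(remaining))):
        # Fresh noise is drawn for every candidate in every round.
        best = max(remaining,
                   key=lambda name: remaining[name] + laplace_noise(scale, rng))
        chosen.append(best)
        del remaining[best]
    return chosen


rng = random.Random(0)  # fixed seed only so the demo is repeatable

# Hypothetical association scores for seven variants.
scores = {"v1": 90, "v2": 85, "v3": 80, "v4": 75, "v5": 70, "v6": 69, "v7": 68}

print(noisy_count(120, epsilon=0.5, rng=rng))
print(noisy_top_k(scores, k=5, epsilon=1.0, rng=rng))
```

Because the fifth, sixth and seventh scores are close together, the noisy ranking can easily return the sixth or seventh variant in place of the fifth, which is the behaviour the article describes. A smaller `epsilon` means more noise and stronger privacy; a larger one means more accurate answers.
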
As long as the database is large enough, containing thousands of people or more, and researchers limit the number of questions they ask so as to stay within a "privacy budget", the technique ensures that privacy will not be violated. Users would not be able to query hundreds or thousands of loci in a single genome.
Databases protected by the technique could be searched immediately, whereas it can currently take months to be granted access to databases managed by various institutions, including the National Institutes of Health. Simmons and Berger said that even with the noise, the system still gives useful answers when researchers ask a small number of specific questions. "It is primarily useful for getting access to data sets that you could not otherwise access," Simmons said.
For example, if researchers' analysis of a small data set suggests that a genetic variant is associated with a disease, the system lets them confirm the association in a much larger data set that they could not otherwise access. It also lets researchers preview a data set to decide whether it is worth the time to go through the full access application process.
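
The "privacy budget" mentioned above can be pictured as a running total that every query draws down. The sketch below is only an assumption about how such bookkeeping might look, based on the basic composition rule of differential privacy (the privacy costs of successive queries add up); the class and method names (`PrivacyBudget`, `spend`, `BudgetExceeded`) and the numbers are hypothetical and do not come from the researchers' system.

```python
class BudgetExceeded(Exception):
    """Raised when a query would push spending past the total budget."""


class PrivacyBudget:
    """Track cumulative privacy cost (epsilon) across queries."""

    def __init__(self, total_epsilon):
        if total_epsilon <= 0:
            raise ValueError("total_epsilon must be positive")
        self.total = total_epsilon
        self.spent = 0.0

    @property
    def remaining(self):
        return max(0.0, self.total - self.spent)

    def spend(self, epsilon):
        """Charge one query; refuse it if the budget cannot cover it."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        # A tiny tolerance guards against floating-point rounding.
        if self.spent + epsilon > self.total + 1e-12:
            raise BudgetExceeded("privacy budget exhausted")
        self.spent += epsilon


budget = PrivacyBudget(total_epsilon=1.0)

# Five hypothetical queries, each costing 0.25; only four fit the budget.
for query_number in range(1, 6):
    try:
        budget.spend(0.25)
        print(f"query {query_number}: answered, remaining {budget.remaining}")
    except BudgetExceeded:
        print(f"query {query_number}: refused, budget exhausted")
```

This is why a user can ask a handful of specific questions but cannot sweep through hundreds or thousands of loci: each answer uses up part of a fixed allowance, and once it is gone the system stops answering.
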