Uniquely identifying people in an electronic database can be a very hard problem. I ran into it on my first project with TCG, and it has come up a number of times since then. Given a large enough sample size, you can find multiple people who share both a name and a birthdate. That produces false matches: you think two records describe the same person when they really do not. You can add more information to your matching algorithm, but as the pool of data grows you will continue to find incorrect matches. I have even read about people sharing the same name, birthday, and Social Security Number, which is supposed to be a unique government identifier in the United States.
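To make the false-match problem concrete, here is a minimal Python sketch of the kind of naive matching described above. The records, field names, and the helper functions `match_key` and `find_candidate_matches` are all invented for illustration; they do not come from any real project or dataset.

```python
from collections import defaultdict

# Hypothetical records: every id, name, date, and city here is invented.
records = [
    {"id": 1, "name": "John Smith", "birthdate": "1970-03-14", "city": "Boston"},
    {"id": 2, "name": "john  smith ", "birthdate": "1970-03-14", "city": "Denver"},
    {"id": 3, "name": "Maria Garcia", "birthdate": "1982-11-02", "city": "Austin"},
]


def match_key(record):
    """Naive matching key: lowercased, whitespace-normalized name plus birthdate."""
    normalized_name = " ".join(record["name"].lower().split())
    return (normalized_name, record["birthdate"])


def find_candidate_matches(records):
    """Group record ids by matching key and return the groups with more than one id."""
    groups = defaultdict(list)
    for record in records:
        groups[match_key(record)].append(record["id"])
    return [ids for ids in groups.values() if len(ids) > 1]


candidate_matches = find_candidate_matches(records)
print(candidate_matches)  # [[1, 2]]
```

Records 1 and 2 are flagged as the same person, yet nothing in the key can tell whether this is one John Smith who moved or two different people born on the same day. Adding the city to the key would separate them, but it would also split a single person who really did move, which is why piling on more fields never fully solves the problem.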
Nature has an article online describing the Open Researcher and Contributor ID (ORCID) system, which aims to provide a unique identifier for scholarly authors. It should help you identify exactly which person wrote an article and reliably find all of that author's other publications without getting incorrect matches. I don’t know if it is part of the plan right now, but I hope it can also be used to simplify data entry and reduce the errors made there.
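One way an identifier like this can cut down on data entry errors is a built-in check digit. The detail below is not from the Nature article: it is based on ORCID's public documentation, which describes the last character of a 16-character iD as a check digit computed with ISO 7064 MOD 11-2. The function names `orcid_check_digit` and `is_valid_orcid` are my own, and this is only a rough sketch of the validation step, not an official implementation.

```python
def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    # A result of 10 is written as the letter X.
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Return True if the iD has 16 characters and a matching check character."""
    compact = orcid.replace("-", "").upper()
    if len(compact) != 16:
        return False
    body, check = compact[:15], compact[15]
    if not (body.isascii() and body.isdigit()):
        return False
    return orcid_check_digit(body) == check


# 0000-0002-1825-0097 is the sample iD used in ORCID's own documentation.
print(is_valid_orcid("0000-0002-1825-0097"))  # True
print(is_valid_orcid("0000-0002-1825-0098"))  # False (last character mistyped)
```

A check like this only catches typing mistakes. It says nothing about whether the iD belongs to the person you think it does, so it complements a lookup against the registry rather than replacing one.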
Of course, only time will tell if ORCID becomes a success. As an IT professional working with grant management and scientific collaboration, I hope it works out. This is a real problem, and it will only become harder to deal with as our electronic databases and the human population grow.