This project examines the importance of maintaining accurate healthcare provider directories. Inaccuracies in such directories can lead to decreased patient satisfaction and potential fines from regulatory bodies. Existing research has highlighted Natural Language Processing (NLP) and data science techniques as potential ways to improve accuracy, and this project proposes to build on that work by implementing a scoring algorithm. Four datasets were used for analysis: NPPES, CMS, Healthgrades, and Zocdoc. The proposed methods include using the Google Places API or locality sensitive hashing for address string matching. Based on how consistently a provider's address matches across datasets, we can then assign a score to each address and rank our confidence in the accuracy of the data. An initial manual comparison of the data revealed several issues, including discrepancies between mailing and practice addresses, inconsistencies in names, and the handling of professionals with multiple practice locations. Note that we were unable to implement every step of the proposed method, namely the step for scoring and weighting addresses. Future work will include implementing and testing the proposed scoring model and further analyzing the reasons behind inconsistencies.
Healthcare provider directories keep track of the information needed for patient care access, transactional exchange, insurance coverage, and other actions that require up-to-date records. Provider information can change quickly as people relocate or rebrand. A review by the Centers for Medicare & Medicaid Services (CMS) of 54 Medicare Advantage Organizations (MAOs) found that about 45.1% of the provider directory locations listed were inaccurate [1]. If databases are not updated, patient satisfaction may decrease as the search for care becomes more difficult, and CMS may fine organizations for listing inaccurate information. An accurate provider directory is therefore important for increasing efficiency and reducing costs, so that clients can easily find providers.
About Availity
This semester we were tasked with a specific problem that concerned Availity. Availity is one of the biggest healthcare clearinghouses, offering multiple advanced solutions to different providers and vendors nationwide. Solutions include free portal connection to payers, submission of electronic transactions, revenue cycle optimization, and patient access management. The main task that our team focused on was the exchange of transactions between healthcare providers and patient payers. Through Availity, providers can simply access a platform and get connected to a network of real-time information exchange with payers. To do this, Availity must rely on up-to-date provider directories. Since Availity cannot change the databases, inconsistency in directories becomes a problem that needs to be addressed.
Datasets Used
Much of our time this semester was spent finding datasets across which we could verify a provider's practice location. We ultimately narrowed our analysis to four datasets: NPPES, CMS, Healthgrades, and Zocdoc. For the first two, we downloaded CSV files from their websites; for the latter two, we used web scraping to obtain the data. An overview of each is given below.
- NPPES: The NPPES database is maintained by the Centers for Medicare and Medicaid Services (CMS), which is a federal agency within the U.S. Department of Health and Human Services. The system assigns a unique National Provider Identifier (NPI) to each healthcare provider or organization.
- CMS: CMS is the federal agency that runs the Medicare, Medicaid, and Children's Health Insurance Programs, and the federally facilitated Marketplace.
- Healthgrades: Healthgrades is an online platform which offers a comprehensive database of healthcare professionals, facilities, and patient reviews to help individuals make informed decisions about their healthcare.
- Zocdoc: Zocdoc is an online platform that allows users to find and schedule appointments with healthcare providers in their area.
This project underscores the significance of maintaining accurate provider directories in healthcare, a process that is notably challenging due to providers' changing circumstances, inconsistencies in reporting, and differences in data collection methods across databases. Through our initial data analysis and our exploration of methods such as the Google Places API and locality sensitive hashing for string matching, we have taken initial steps toward improving data integrity in this sector. Our preliminary manual matching of records highlighted critical areas for improvement, including differentiating between mailing and practice addresses, managing inconsistencies in provider names, and accounting for professionals with multiple practice locations. While we were unable to fully implement our proposed scoring model due to time constraints, the research and preliminary work provide a foundation for future exploration. Continued research would address the pressing need for accuracy in healthcare provider directories and could also contribute to improved patient satisfaction, regulatory compliance, and overall healthcare service efficiency.

Going forward, we hope to implement the scoring model, conduct additional rounds of data matching, and perform a deeper analysis of the reasons behind discrepancies in provider location data. After several rounds of matching, we want to sort the records into buckets based on characteristics of the providers to identify why inconsistencies in practice locations occur. Whether the cause is geographic region, the taxonomy of the healthcare professional, or even the professional's name, we believe that further analysis of provider directories could suggest which factors are influential. For address matching, locality sensitive hashing (LSH) is a possible alternative to the Google Places API, and an actual implementation of LSH would make it possible to compare the two. The next step would then be to score and weight each provider address. The formula for computing the score and the method for assigning the weights will also require further research.
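To make the locality sensitive hashing idea discussed above more concrete, the following is a minimal sketch in Python. It is not the project's implementation, since none was completed. It assumes character 3-gram shingles, a 32-value MinHash signature split into 16 bands, and a simple, hypothetical agreement score (the fraction of other sources whose address for the same provider matches). All function names, parameter values, and the 0.8 threshold are assumptions made for illustration, not results of this project.

```python
import hashlib
import re
from collections import defaultdict
from itertools import combinations

NUM_HASHES = 32  # length of each MinHash signature (assumed value)
BANDS = 16       # signature is split into 16 bands of 2 rows (assumed value)


def normalize(address: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", address.lower())
    return " ".join(cleaned.split())


def shingles(text: str, k: int = 3) -> set:
    """Return the set of overlapping character k-grams of the text."""
    if len(text) <= k:
        return {text}
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def _hash(seed: int, token: str) -> int:
    """Deterministic 64-bit hash of a token under a given seed."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash(tokens: set) -> tuple:
    """MinHash signature: the minimum hash of the set under each seed."""
    return tuple(min(_hash(seed, t) for t in tokens) for seed in range(NUM_HASHES))


def candidate_pairs(records: dict) -> set:
    """Return pairs of sources whose signatures agree on at least one band.

    `records` maps a source name to that source's address for one provider.
    """
    rows = NUM_HASHES // BANDS
    buckets = defaultdict(set)
    for source, address in records.items():
        signature = minhash(shingles(normalize(address)))
        for band in range(BANDS):
            chunk = signature[band * rows:(band + 1) * rows]
            buckets[(band, chunk)].add(source)
    pairs = set()
    for sources in buckets.values():
        pairs.update(combinations(sorted(sources), 2))
    return pairs


def jaccard(a: str, b: str) -> float:
    """Exact Jaccard similarity of the shingle sets of two addresses."""
    sa, sb = shingles(normalize(a)), shingles(normalize(b))
    return len(sa & sb) / len(sa | sb)


def agreement_scores(records: dict, threshold: float = 0.8) -> dict:
    """Hypothetical score: share of other sources that agree with each source."""
    matches = defaultdict(int)
    for a, b in candidate_pairs(records):
        # LSH only proposes candidates; confirm each with the exact similarity.
        if jaccard(records[a], records[b]) >= threshold:
            matches[a] += 1
            matches[b] += 1
    others = max(len(records) - 1, 1)
    return {source: matches[source] / others for source in records}
```

Calling `agreement_scores` with a dictionary that maps each source name to its address string for a single NPI returns one score per source. Banding keeps the number of exact comparisons small when many records are bucketed together, and the exact Jaccard check filters out false candidates. How such scores should be weighted across datasets remains an open question for the proposed scoring model.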