Building a great search engine is still a hard research problem, but building a search engine good enough to solve that problem is within the skills of any good fresh CS grad with a month to work on it.
If the Library of Congress is registering copyrights as proposed, then the LOC already has the database, and the search terms are simply the work you propose to copy. It can be a simple service: you upload a few pages of text, and the LOC responds in a few seconds with the name of anyone who has registered a copyright on it.
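To give a rough idea of how little machinery that lookup needs, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not how the LOC does or would do it: the `RegistryIndex` class, the `register` and `lookup` names, the 5-word shingle size, and the 50% overlap threshold are all made up for the example, and the "database" is just an in-memory dictionary. The technique is plain word-shingle matching against an inverted index.

```python
import re
from collections import defaultdict

SHINGLE_SIZE = 5  # words per shingle; an arbitrary choice for this sketch


def shingles(text, k=SHINGLE_SIZE):
    """Return the set of k-word sequences in text (lowercased, punctuation dropped)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


class RegistryIndex:
    """Hypothetical in-memory stand-in for the registration database."""

    def __init__(self):
        # Inverted index: shingle -> set of registrant names
        self._index = defaultdict(set)

    def register(self, registrant, text):
        """Record every shingle of a registered work under the registrant's name."""
        for s in shingles(text):
            self._index[s].add(registrant)

    def lookup(self, excerpt, min_overlap=0.5):
        """Return registrants whose works contain at least min_overlap
        (as a fraction) of the excerpt's shingles."""
        query = shingles(excerpt)
        if not query:
            return []  # excerpt shorter than one shingle: nothing to match on
        hits = defaultdict(int)
        for s in query:
            for registrant in self._index.get(s, ()):
                hits[registrant] += 1
        return sorted(r for r, n in hits.items() if n / len(query) >= min_overlap)


# Made-up registrant and text, purely for illustration.
registry = RegistryIndex()
registry.register(
    "A. Author",
    "the quick brown fox jumps over the lazy dog near the quiet river bank at dawn",
)
match = registry.lookup("quick brown fox jumps over the lazy dog near the quiet river")
no_match = registry.lookup(
    "an entirely different passage that nobody has registered anywhere yet"
)
```

A real service would need persistent storage, handling for paraphrase and OCR noise, and some thought about scale, but the core query is about this simple: the uploaded pages are the search terms.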