How author disambiguation works on LT, for @ka_weaver

Basically:

1. LibraryThing doesn't have a single, unified catalog the way library authority control does (or rather, pretends to). LibraryThing's "global level" floats on a sea of individual members' data, which is sacrosanct. So we're trying to find order in the chaos without actually changing the chaos.

Although your books are your own, the "global level" can be organized by any LibraryThing member. All of the literally millions of disambiguating edits (more than 2 million for work-combinations alone) were done by members.

The system is also in constant motion. Any member can change it at any time. New authors enter the system literally every minute or two.

2. LibraryThing allows you to combine "names." So, for example, Mark Twain and Samuel Clemens can be combined as the same author. We work out which of the combined names is most popular and use it, and members can override that choice.

3. LibraryThing then allows you to split homonymous authors into "divisions" or "splits." Splits are not based on some external piece of data, the way library cataloging generally relies on birth and death dates.

We aren't chasing ambulances. Instead, authors are divided according to the <i>works that they are composed of</i>. That is, Steve Martin (1) is not "Steve Martin 1945-" but the Steve Martin who wrote Shopgirl and various other works. Dates and other information can, of course, be attached to Steve Martin (1), but the principle of separation is the works they wrote, not attributes external to them.

4. Today we introduced a clean-up feature that allows you to alias a particular author split into another author. For example, John Crawford (1), author of two books on Roman history, is the same fellow as John H. Crawford, who wrote a bunch more books on the subject. Rather than "correcting" the data (which is user data and, besides, accurately reflects how the books themselves name their authors), we allow the division to be combined into another author.
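For anyone who thinks in code, here's a rough sketch of the three mechanisms above (combining names, splitting by works, aliasing a split) as a layer sitting on top of member data. It's a hypothetical Python model, not LibraryThing's actual code or schema: the class and method names, the author key "twain", and the work IDs are all made up for illustration. The point it tries to show is that member records are frozen and never edited; every disambiguation lives in a separate overlay that is consulted when a record is resolved to an author.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MemberRecord:
    """One member's book, exactly as they cataloged it. Frozen: never edited."""
    member: str
    author_name: str
    work_id: str


@dataclass
class GlobalLevel:
    """Hypothetical overlay of member edits, kept separate from member data."""
    combined: dict[str, str] = field(default_factory=dict)  # name -> author key
    splits: dict[str, list[frozenset[str]]] = field(default_factory=dict)  # key -> work sets
    aliases: dict[tuple[str, int], str] = field(default_factory=dict)  # (key, split no.) -> key
    overrides: dict[str, str] = field(default_factory=dict)  # key -> member-chosen name

    def combine(self, key: str, *names: str) -> None:
        """Treat several names (e.g. Mark Twain, Samuel Clemens) as one author key."""
        for name in names:
            self.combined[name] = key

    def split(self, key: str, *work_sets: set[str]) -> None:
        """Divide a homonymous author by the works each person wrote."""
        self.splits[key] = [frozenset(ws) for ws in work_sets]

    def alias(self, key: str, split_no: int, target: str) -> None:
        """Point split split_no (1-based, as in "John Crawford (1)") at another author key."""
        self.aliases[(key, split_no)] = target

    def resolve(self, rec: MemberRecord) -> str:
        """Map a member record to a global author key without modifying the record."""
        key = self.combined.get(rec.author_name, rec.author_name)
        for i, works in enumerate(self.splits.get(key, []), start=1):
            if rec.work_id in works:
                # An aliased split resolves straight to its target author key.
                return self.aliases.get((key, i), f"{key} ({i})")
        return key  # unsplit author, or a work not yet assigned to any split

    def display_name(self, key: str, records: list[MemberRecord]) -> str:
        """Most popular name among records resolving to key, unless a member overrode it."""
        if key in self.overrides:
            return self.overrides[key]
        names = Counter(r.author_name for r in records if self.resolve(r) == key)
        return names.most_common(1)[0][0] if names else key


if __name__ == "__main__":
    records = [
        MemberRecord("alice", "Mark Twain", "huck-finn"),
        MemberRecord("bob", "Mark Twain", "tom-sawyer"),
        MemberRecord("carol", "Samuel Clemens", "huck-finn"),
        MemberRecord("dave", "John Crawford", "roman-history-1"),
    ]
    g = GlobalLevel()
    g.combine("twain", "Mark Twain", "Samuel Clemens")
    g.split("John Crawford", {"roman-history-1", "roman-history-2"}, {"poems-1"})
    g.alias("John Crawford", 1, "John H. Crawford")
    print(g.display_name("twain", records))  # Mark Twain (2 records vs. 1)
    print(g.resolve(records[3]))  # John H. Crawford
```

The design choice worth noticing: aliasing John Crawford (1) into John H. Crawford changes only the overlay, so dave's record still says "John Crawford", which is what his copy of the book actually says.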
