We've had requests for that behavior, not only due to sym links, but also
because there are many copies of the same document within an enterprise
network, and even more so when you're indexing large parts of the Internet.
(Imagine how many copies of FAQs are out there, for example.)
I think there are two main reasons it hasn't happened yet. One is just
that it hasn't risen high enough in the priority list, at least for those
of us who have commercial spider tools. For the most part, people are
still happy just to get a spider *working* in a convenient, maintainable
manner. Thus, most haven't even realized that sym links and duplicates are
an issue.
Second, the problem of duplicates is a slippery slope. It's probably not
hard to find 80 or 90 percent of them, but catching the last bunch, which
aren't *exact* duplicates, is going to take some real cleverness, since
brute force will probably be slow, at best.
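To make the distinction concrete, here's a minimal sketch (names and documents are illustrative, not from any particular spider): exact copies fall out of a simple content hash, which catches sym links and byte-identical files, while the remaining near-duplicates need a fuzzier similarity measure, such as comparing word "shingles."

```python
import hashlib

def content_fingerprint(text):
    """Exact-duplicate check: hash the document body.
    Catches sym links and byte-identical copies, nothing else."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def shingles(text, k=4):
    """Break text into a set of overlapping k-word 'shingles'."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Similarity of two shingle sets; values near 1.0 suggest near-duplicates."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Two copies of a FAQ answer that differ only in trailing wording.
doc1 = "How do I unsubscribe from this list? Send mail to the admin."
doc2 = "How do I unsubscribe from this list? Mail the list admin."

# The hash sees them as different documents...
print(content_fingerprint(doc1) == content_fingerprint(doc2))  # False
# ...but the shingle overlap reveals they're largely the same text.
print(jaccard(shingles(doc1), shingles(doc2)))
```

Even this toy version hints at the cost problem: comparing every pair of shingle sets is quadratic in the number of documents, which is why production systems reach for tricks like MinHash rather than brute force.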
Nick