It doesn't know. I imagine that some robots make assumptions and equate
index.html or default.html with a resource ending in '/', but there's
nothing in the HTTP spec that guarantees it. The robots I write don't assume
this, nor do most of the other HTTP-related tools I use. It may be
irritating to have different entries in your database for '/' and
'/index.html', but it's safer. A given server may have several file names
that it uses as defaults. For instance, given two files '/index.cgi' and
'/index.html', the server may give you the .cgi when you ask for '/', so
assuming .html would be incorrect even though that resource exists and is
published.
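
As a rough illustration of the conservative approach, here is a minimal Python sketch of how a robot might build its database key for a URL. The function name url_key is hypothetical, and the sketch assumes URLs without a user:password@ part. It applies only normalizations that are safe for http URLs (lowercasing the scheme and host, treating an empty path as '/', dropping the fragment) and deliberately leaves the path alone, so '/' and '/index.html' remain separate entries.

```python
from urllib.parse import urlsplit, urlunsplit


def url_key(url):
    """Return a database key for url without guessing at default documents.

    Hypothetical helper; assumes the URL has no user:password@ component.
    """
    parts = urlsplit(url)
    # An empty path and '/' refer to the same resource for http URLs.
    path = parts.path or "/"
    # Scheme and host are case-insensitive, so lowercasing them is safe.
    # The path is kept as-is: '/' is NOT rewritten to '/index.html' or
    # vice versa, because the server's default document is unknown.
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, parts.query, "")
    )
```

With this key, 'http://example.com/' and 'http://example.com/index.html' are stored as two entries, which costs a duplicate row in the common case but never records the wrong resource.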