[Pharr 2004] describes the optimal grid size in his physically based ray tracer as a scalar multiple of the cube root of the number of polygons in the scene.
Specifically, [Pharr 2004] gives the grid size as:

$$\text{grid size} = 3\sqrt[3]{N}$$

where N is the number of polygonal faces present in the imported Mesh objects.
[Pharr 2004] says that a good place to "start testing" grid size is the cube root of N, which is what I had been using for my optimal grid size.
After some experimentation, I found that applying Pharr's suggested scalar factor of 3 had another large effect on my render time.
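As a rough illustration of the calculation described above, here is a minimal Python sketch. The function name `grid_resolution`, the `scale` parameter, the rounding to the nearest integer, and the clamp to at least one voxel per axis are all my own assumptions for illustration; they are not taken from [Pharr 2004] or from my renderer's actual code.

```python
def grid_resolution(num_faces: int, scale: float = 3.0) -> int:
    """Return a voxel count per grid axis from the polygon count.

    Sketch of the heuristic: scale * cube_root(N).
    scale=1.0 gives the plain cube-root starting point;
    scale=3.0 applies the suggested factor of 3.
    Rounding and the lower clamp of 1 are assumptions.
    """
    if num_faces < 0:
        raise ValueError("num_faces must be non-negative")
    cube_root = num_faces ** (1.0 / 3.0)
    # Never return fewer than one voxel per axis, even for an empty scene.
    return max(1, round(scale * cube_root))


if __name__ == "__main__":
    for n in (8, 1000, 50000):
        print(n, grid_resolution(n, scale=1.0), grid_resolution(n))
```

With `scale=1.0` this reproduces the cube-root starting point I had been using; the default applies the factor of 3.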
| Accumulated Optimization Status | Runtime |
| --- | --- |
| Initial Voxel Traversal with voxel size "guess" 12x12x12 | 2752 seconds (~46 minutes) |
| VoxelPolygon SubMesh Optimization | 143 seconds (~2 and a half minutes) |
| Shadow Ray Voxel Optimization | 35 seconds |
| Optimal Grid Size Calculation | 24 seconds |
I now have another image to show the effect of the optimizations up to this point.
| Accumulated Optimization Status | Runtime |
| --- | --- |
| Initial Voxel Traversal with voxel size "guess" 2x2x2 | 19295 seconds (~5.4 hours) |
| VoxelPolygon SubMesh Optimization | not calculated (sorry) |
| Shadow Ray Voxel Optimization | not calculated (sorry) |
| Optimal Grid Size Calculation | 1044 seconds (~17 minutes) |
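
To put the runtimes above in perspective, the cumulative speedup of each stage over the initial voxel traversal can be worked out with a few lines of Python. This is only a sketch of the arithmetic: the runtimes are the ones reported above, while the helper name `cumulative_speedups` and the list layout are hypothetical, and the stages that were not measured for the second image are simply left out.

```python
# Runtimes in seconds, as reported in the two tables above.
first_image = [
    ("Initial voxel traversal (12x12x12 guess)", 2752),
    ("VoxelPolygon SubMesh optimization", 143),
    ("Shadow ray voxel optimization", 35),
    ("Optimal grid size calculation", 24),
]
second_image = [
    ("Initial voxel traversal (2x2x2 guess)", 19295),
    ("Optimal grid size calculation", 1044),
]


def cumulative_speedups(stages):
    """Return (stage name, speedup relative to the first stage) pairs."""
    baseline = stages[0][1]
    return [(name, baseline / seconds) for name, seconds in stages]


if __name__ == "__main__":
    for label, stages in (("first", first_image), ("second", second_image)):
        print(f"{label} image:")
        for name, factor in cumulative_speedups(stages):
            print(f"  {name}: {factor:.1f}x")
```

By this measure the accumulated optimizations come to roughly a 115x speedup on the first image and roughly 18x on the second, although the two starting grid guesses differ, so the figures are not directly comparable.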
