Here’s a video showing the UE4 guy flying around the Galaxia Universe:

Probably the most noticeable recent change is the galaxy dust. I’m still using the same basic technique as in my previous posts, except that now there’s a texture being used to give a higher detail to the clouds. It’s making full use of mipmaps and anisotropic filtering to deal with aliasing issues, since it’s still an essentially rasterisation-based approach.

I’ve also started using textures with anisotropic filtering for the planet rings, which allows for finer details. The technique draws a line through a grayscale noise texture when rendering the rings, defined by inner and outer UV coordinates. The pixel shader for the rings just determines how far between the inner and outer radii the current pixel is, and samples the noise texture at the corresponding point along that line. A second noise sample with larger UVs is added for extra detail, and a separate colour noise texture is also sampled to add colour variation.
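To make the ring lookup concrete, here is a minimal CPU-side sketch in Python (the real thing is a pixel shader). The function name and the clamping at the ring edges are my own assumptions for illustration, not code from the engine:

```python
def ring_sample_uv(pixel_radius, inner_radius, outer_radius, uv_inner, uv_outer):
    """Return the UV at which to sample the ring noise texture (hypothetical helper)."""
    # Fraction of the way from the inner edge (0.0) to the outer edge (1.0).
    t = (pixel_radius - inner_radius) / (outer_radius - inner_radius)
    t = min(max(t, 0.0), 1.0)  # clamp: an assumption, keeps edge pixels on the line
    # Walk along the straight line between the two UV endpoints.
    u = uv_inner[0] + (uv_outer[0] - uv_inner[0]) * t
    v = uv_inner[1] + (uv_outer[1] - uv_inner[1]) * t
    return (u, v)
```

A pixel halfway between the inner and outer radii ends up sampling the midpoint of the UV line.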

I’ve also added binary star systems, and completely rebuilt the star system generating code to support those. Both inner and circumbinary orbits are being generated, which now allows for quite an interesting variety of star systems to appear. Binary planets are now possible, and due to the new recursive nature of the algorithm, under the right circumstances it’s even possible for moons to be orbiting other moons. The sphere of influence for each mass is used to determine how big the moons’ orbits could be, and how many moons a planet could have.
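As a rough illustration of the sphere of influence idea, the sketch below uses Laplace's common approximation, r = a * (m / M)^(2/5). Which definition the generator actually uses isn't stated here, so treat the formula choice, the function names and the `safety` factor as assumptions:

```python
def sphere_of_influence(semi_major_axis, body_mass, parent_mass):
    """Laplace's approximation for the radius in which body_mass dominates."""
    return semi_major_axis * (body_mass / parent_mass) ** 0.4

def max_moon_orbit(semi_major_axis, body_mass, parent_mass, safety=0.5):
    """Largest moon orbit allowed around the body.

    `safety` is a hypothetical margin so moons stay well inside the sphere.
    """
    return safety * sphere_of_influence(semi_major_axis, body_mass, parent_mass)
```

With Earth and Sun values (1.496e11 m, 5.972e24 kg, 1.989e30 kg) this gives a sphere of influence a little over 900,000 km. Because the same function can be applied to a moon orbiting a planet, it fits the recursive generation naturally.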

Lighting for planet atmospheres and rings also had to be upgraded to support the multiple light sources, and shadows from those. This does unfortunately mean the shader complexity for those has increased, but thankfully there haven’t been any performance issues yet.

Terrain has also been an area of active development, since I need to do things a little bit differently in UE. Since the UE world needs to interact with the terrain for things like collisions and object placement, I need to take a different approach to at least the highest detail (i.e. nearby) terrain, to be able to use UE materials and deferred shading to their maximum extent. I’ll try to write more about that in the future, when I have something in more of a working state.

As shown in the above screenshot, I’ve been playing around with using textures and bicubic sampling for generating the terrain, instead of just noise functions. Not sure how I feel about this approach yet, but it’s yielded some interesting results so far. Texture aliasing on the terrain is still quite bad due to the generated normalmaps not having any mipmaps, but the atmosphere tends to mask the issue. It becomes quite ugly for smaller planets/moons which don’t have an atmosphere. I think the solution to the problem is to not use generated normalmaps, but rather use a more conventional approach with vertex normals and texture blending.

Another recent addition is the simple anamorphic lens flares, which make the nearby stars more obvious. I think some more radial flares are still required, and possibly some textures as well, so I’ll be playing around with those more in the future.

That’s about it for now! If you made it this far, you might be interested in the Galaxia Discord server. I try to post more frequent updates and screenshots in there, and you can chat about Galaxia with me in *real time*!

Well if you’ve seen the movie Interstellar, or if you studied any astrophysics at all, you’ll know that black holes look weird because of their huge mass and gravity. The effect could be partly explained just using Newton’s law of gravity, by assuming the mass-energy equivalence as discovered by Einstein (E=mc², i.e. by assuming the photons are attracted by the gravity), but the correct observed value can only be obtained by the results of Einstein’s general relativity, accounting for the curvature of space-time around the mass.

Unfortunately though, evaluating the full integral to find the true path of the ray for each pixel seems like it would be computationally prohibitive (i.e. it would need a lot of computing time). So as always in real-time computer graphics, an approximation needs to be made.

A fast, accurate approximation already exists for the gravitational lens at relatively small deflection angles, as a result of finding the limits of the integral. Thanks to Einstein, the deflection angle is approximated as (formula as given on Wikipedia):

$$\theta = \frac{4GM}{c^2 b}$$

where G is the gravitational constant, M is the mass of the object (black hole in this case), c is the speed of light and b is the impact parameter, or the perpendicular distance to the center of the black hole from the viewing ray. Interestingly, this equation was first shown to be correct by Arthur Eddington and Frank Dyson in 1919 during a solar eclipse by observing stars appearing very near the Sun, and measuring how much their light had been deflected by the Sun’s mass, thus providing evidence to support Einstein’s general relativity.

To draw the image of the gravitational lens, really just a skybox texture is required. Since Galaxia already renders the skybox for star systems, this is reused here. However, although a star system is quite small compared to the distance to nearby stars, the closest stars might appear to be in the wrong locations when drawn in the skybox since it is static over the entire star system. To counter this effect, usually Galaxia does not render nearby stars into the skybox, but instead renders them as normal objects afterwards. This would mean that some stars might appear to be missing in the gravitational lens, so for the black hole systems the engine must render all the stars into the skybox. This unfortunately does reintroduce the star position “popping” effect as the skybox transitions to the full rendering of the stars, when entering or leaving the system. But these are unfortunately the sorts of prices we pay for real-time rendering on consumer hardware. The solution might be to render a separate skybox for the gravitational lens, but this would require a lot more graphics memory. For the moment at least, I’ll put up with the popping effect. It’s not really that noticeable anyway if you’re not paying close attention.

With the skybox texture in place, some geometry needs to be rendered to generate some pixels. As with some other volumetric effects I’ve been working on, I start with an inverted cube, which is 100 times the Schwarzschild radius (in the case of a black hole, the Schwarzschild radius is the proper radius of the event horizon, or “shadow”). It is also worth noting that the Schwarzschild radius equation is closely related to the gravitational lens equation (formula as given on Wikipedia):

$$r_s = \frac{2GM}{c^2}$$

So, if the black hole’s Schwarzschild radius is calculated as part of the procedural generation system, the deflection of each pixel’s ray can be easily calculated as 2*rs/b, since 4GM/c² is simply 2*rs:

float rs = SchwarzchildRadius;          //constant for the shader. depends on the mass
float3 p = BlackHolePosition;           //relative to the camera. constant for the shader
float3 rd = normalize(input.Direction); //ray direction for the pixel.
float b = PointRayDist(p, rd);          //impact parameter.
float a = 2 * rs / b;                   //deflection angle - thanks Albert!

Where b is the impact parameter, calculated using the point-ray distance function:

float PointRayDist(float3 p, float3 rd)
{
    return length(cross(rd, p));
}

Where p is the position of the black hole relative to the camera, and rd is the ray direction. Note that this function assumes that the ray passes through the origin (the camera), and that rd is normalised.

I also subtract a small constant value from the calculated deflection angle so that the edges of the lens can blend smoothly with the existing background skybox. The deflection would actually be visible out to many hundreds or even thousands of times the Schwarzschild radius, but for real-time rendering purposes this is not really practical. I chose the rendered inverted cube size to be 100 times the Schwarzschild radius to try and keep the result as close to reality as practical. The small offset constant was chosen to work well with that size cube (in case you’re wondering, the value I use is 0.02).
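For anyone wanting to check the numbers on the CPU, here is a small Python sketch of the impact parameter and the offset deflection angle. The clamp to zero and the handling of a ray aimed exactly at the black hole are assumptions added for the sketch; the shader itself isn't shown doing either:

```python
import math

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def point_ray_dist(p, rd):
    """Distance from point p to a ray through the origin with unit direction rd."""
    c = cross(rd, p)
    return math.sqrt(c[0] * c[0] + c[1] * c[1] + c[2] * c[2])

def deflection_angle(p, rd, rs, edge_offset=0.02):
    """2*rs/b minus the edge offset, clamped at zero (the clamp is an assumption)."""
    b = point_ray_dist(p, rd)
    if b == 0.0:
        return math.inf  # ray aimed straight at the centre (assumed handling)
    return max(2.0 * rs / b - edge_offset, 0.0)
```

Note that at an impact parameter of 100*rs the raw angle is 2/100 = 0.02, so the 0.02 offset brings the deflection to zero at that distance, which fits the cube size mentioned above.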

Now that the deflection angle is known, the deflected ray direction can be calculated. I have done this by using a quaternion:

float3 bhdir = normalize(p);               //the direction to the center of the black hole
float3 axis = normalize(cross(rd, bhdir)); //the axis of deflection
float4 qdef = qaxang(axis, a);             //quaternion representing the deflection
float3 ddir = mulvq(rd, qdef);             //apply the rotation to the original ray dir

(See my galaxy rendering post for the definitions of qaxang and mulvq functions)

The deflected direction vector is then used to sample the skybox as normal.
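The rotation step can be tried out in Python as well. This sketch re-implements qaxang and uses the compact "v + w*t + q×t" form for the vector rotation, which should be equivalent to the expanded mulvq in the galaxy post. The `deflect_ray` wrapper and its guard for a ray pointing straight at the black hole are assumptions for illustration:

```python
import math

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(v):
    n = math.sqrt(v[0] * v[0] + v[1] * v[1] + v[2] * v[2])
    return (v[0] / n, v[1] / n, v[2] / n)

def qaxang(axis, angle):
    """Quaternion (x, y, z, w) from a unit axis and an angle in radians."""
    s = math.sin(angle * 0.5)
    return (axis[0] * s, axis[1] * s, axis[2] * s, math.cos(angle * 0.5))

def mulvq(v, q):
    """Rotate vector v by unit quaternion q: v + w*t + cross(q.xyz, t), t = 2*cross(q.xyz, v)."""
    qv = (q[0], q[1], q[2])
    t = tuple(2.0 * c for c in cross(qv, v))
    u = cross(qv, t)
    return tuple(v[i] + q[3] * t[i] + u[i] for i in range(3))

def deflect_ray(rd, p, a):
    """Bend unit ray direction rd towards the black hole at p by angle a."""
    bhdir = normalize(p)
    axis = cross(rd, bhdir)
    if axis == (0.0, 0.0, 0.0):
        return rd  # looking straight at (or away from) the hole: assumed no-op
    return mulvq(rd, qaxang(normalize(axis), a))
```

Rotating about cross(rd, bhdir) turns the ray towards the black hole, so a 90 degree deflection of a ray perpendicular to bhdir ends up pointing directly at it.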

This still won’t produce an image of a black hole though, just a gravitational lens approximation. To get the shadow to appear, a simple solution is to choose a maximum deflection value, and output black for any angle higher than that. On comparison with the actual Schwarzschild radius, I found that a maximum deflection angle of pi/2 (90 degrees) comes close to being correct, while preserving some of the stronger distortions. To reduce aliasing along the shadow’s edge, I blend the black shadow depending on the deflection angle as it approaches the limit. While not strictly realistic, it’s a sacrifice made in the name of image quality.
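A possible shape for that shadow blend, sketched in Python. The smoothstep curve and the `blend_width` value are guesses for illustration only; the post only says the shadow is blended as the angle approaches the pi/2 limit:

```python
import math

MAX_DEFLECTION = math.pi / 2  # maximum deflection angle quoted above

def smoothstep(edge0, edge1, x):
    t = min(max((x - edge0) / (edge1 - edge0), 0.0), 1.0)
    return t * t * (3.0 - 2.0 * t)

def shadow_amount(a, blend_width=0.3):
    """0.0 = fully lensed sky, 1.0 = fully black. blend_width is hypothetical."""
    return smoothstep(MAX_DEFLECTION - blend_width, MAX_DEFLECTION, a)

def shade(sky_colour, a):
    """Darken the lensed sky sample according to the deflection angle a."""
    k = 1.0 - shadow_amount(a)
    return tuple(c * k for c in sky_colour)
```

A wider `blend_width` gives a softer shadow edge at the cost of a slightly smaller apparent shadow.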

Even though this is a very cheap and quite inaccurate approximation for a black hole’s gravitational lens and shadow, the results are fairly convincing – showing the distinct Einstein ring effect, and the upward curvature of the shadow as the camera passes the event horizon, resulting in the “engulfment” effect:

Some special considerations need to be taken into account when looking out from within the event horizon. Since the full field of view is compressed into a tiny point, the intensity of light is magnified, so I multiply the colour output by an arbitrary factor based on the deflection angle to provide this aesthetic effect. This works well since I am using floating-point render targets and HDR post processing.

Most good black holes apparently have accretion discs around them, so this can’t be overlooked. Neutron stars will often have them too, as well as the visible gravitational lens due to their incredible density, so I should be able to re-use the technique again later.

Since I’m already doing lensing calculations in this shader, I thought it would be simplest just to add the sampling of an accretion disc texture based off the same deflection calculations. The first step with this approach is to generate a texture. I wanted the accretion disc to be able to spin, and to have the inner areas appearing to be spinning faster than the outer ones. Unfortunately the standard sort of Perlin noise can’t really do this – if we sample Perlin noise with increasing rotation in some domain warping (think of “twisting” the noise), the noise function will “wind up” over time, to a point where the noise would end up just looking like a set of concentric circles.

Taking a similar approach to the dust clouds in version 2 of my galaxy rendering, my “spinning noise” works by randomly positioning a series of circles within a known range and summing each one’s contribution depending on its distance from the sample point. From this point of view, it is a variant of cell noise (aka Worley noise). The major difference, however, is that the grouping of the circles is performed in polar coordinates. This is perhaps best explained with an image:

Each concentric ring is divided into segments depending on the radius, to get approximately equal-sized segments. In each segment, *n* data points are randomly positioned. The concentric rings are then made to rotate at varying speeds, with the inner rings spinning faster. The main advantage of this method is that for each pixel in the final image, only 4 segments need to be sampled, regardless of how many concentric rings or segments there are. For any given pixel, the polar coordinates and therefore the corresponding ring and segment are easily calculated. If we assume the circles can overlap the ring and segment borders only in one direction, only the current cell and the 3 others providing a possible overlap are the cells that need to be sampled for each pixel.
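The cell lookup can be sketched in Python like this. The segment count rule, the spin speed falloff, and the choice of which neighbours can overlap (the inner ring and the previous segment) are all assumptions made for the sketch, not the actual parameters:

```python
import math

def ring_segment_count(ring, base_segments=6):
    """Segments grow with ring index so they stay roughly equal in size."""
    return base_segments * (ring + 1)

def cells_to_sample(x, y, time, ring_width=1.0, base_speed=1.0):
    """Return the (ring, segment) cells that can contribute to point (x, y)."""
    r = math.hypot(x, y)
    ring = int(r // ring_width)
    cells = []
    for rg in (ring, ring - 1):  # current ring, then its inner neighbour
        if rg < 0:
            continue  # the innermost ring has no inner neighbour
        n = ring_segment_count(rg)
        spin = time * base_speed / (rg + 1)  # inner rings spin faster
        ang = (math.atan2(y, x) - spin) % (2.0 * math.pi)
        seg = int(ang / (2.0 * math.pi) * n) % n
        cells.append((rg, seg))
        cells.append((rg, (seg - 1) % n))  # neighbour that may overlap into seg
    return cells
```

However many rings and segments exist, the lookup never returns more than four cells, which is what keeps the per-pixel cost constant.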

Here’s the animated final effect – if you look hard enough you will be able to just notice the concentric rings pattern. But don’t stare at it too long or too closely because when you look away from it everything will look like it’s spinning!! Don’t say I didn’t warn you!

As mesmerising as it is to watch, it doesn’t look much like an accretion disc yet. But with some tweaks to the density and opacity of circles produced based on the radius, some adjustment of the segmenting parameters and some basic colouring, I ended up with this:

Which is a bit more accretion disc-like. Still not perfect obviously, but I can always spend more time improving it later!

Now comes the most difficult part – calculating where on the accretion disc any given ray might hit. I do this in two parts – first find any hits behind the event horizon’s shadow, rendering those before the black shadow itself, and secondly add any contribution from a hit in front of the shadow.

Hopefully the diagram explains clearly what is going on in this implementation, and makes clear why this is such a wild approximation. It should be obvious for starters that the photon would take a curved path rather than the angled one in the diagram. So if I feel the need to improve on this rendering, I now know that I should start with some hyperbolae.

Regarding the overall effect, this article seems to do a good job of explaining what the Interstellar movie got wrong, and what the black hole and accretion disc should actually look like. Most notable are the red/blue Doppler shifting effects due to the rotation of the accretion disc, and the black hole itself. I do plan on implementing the red/blue shift for the accretion discs, but I need to do some more maths for that first. But for now, I’m fairly satisfied with the result.


Download from https://1drv.ms/u/s!Arp4D6YSlcg0jGFY1hZ2Wr-E7gWB

New things in this version:

- Different galaxy types and improved galaxy distribution
- New galaxy rendering method
- Black holes and star classification system
- Bug fixes and performance improvements

Enjoy!

Most engines that need to render galaxies these days seem to take a particle-based approach, by drawing hundreds of sprites with additive blending, much like how fire effects are usually done in games. The technique does have some drawbacks though, for example the additive blending means that you can’t have light “blocking” particles like smoke or dust interleaved with the light source. It’s not really a problem for rendering fire, but in a galaxy, the dust can appear both in front of and behind areas that are emitting light. But perhaps this problem can be worked around by rendering in multiple passes.

So I decided I would try this particle approach for the galaxy rendering. Instead of drawing the particles as billboards though, I’ve decided to try drawing them as inverted cubes. Each pixel drawn for the particle cube will find the integral of the light density for the ray passing through the cube, allowing some simple noise addition. In the screenshot, 16384 randomly positioned solid shaded cubes arranged in a plane are blended into a 256×256 texture to produce the “cloud” effect. Looking at it closely, the cube edges are just visible.

Next step is to remove the visible cube edges with the ray casting, starting by finding the intersection of the ray with a sphere the same size and position as the cube. Once that is done, the integral of the light density along the ray needs to be found. There would be a number of ways to do that, but simple is best in this scenario. Some maths might need to be done…

I found a neat approach to how to arrange particles in a galaxy-like way which I think will be a good starting point for this new approach. Placing the particles on ellipses which are slightly offset from each other I think may allow for the possibility of galaxy collisions, which are something I would really like my new method to be able to do.

That’s all for now. Time to write some more code!

(A few hours later…)

Well I think the elliptical spirals work pretty well! The performance is still much better than the old version, but there’s still a lot more details to add. I do really like how that barred spiral pattern appears as if by magic when just incrementally rotating the concentric ellipses. The even cooler thing about this is that this galaxy could animate in realtime and look somewhat realistic! My stars system won’t be able to handle that sort of thing very well, but it’s definitely something I am going to play with.

(…Later…)

As I suspected, incrementally shifting the positions of the ellipses results in some nice asymmetry. Tweaking the axis of the ellipse rotation can also produce a “warping” effect, and I’m thinking that if I can create a second set of ellipses and align them all correctly, that might be able to produce something resembling a galaxy collision. Since galactic collisions are an important feature of the universe, they are something noticeably missing from Galaxia. Hopefully it can be done this way!

(…)

Incrementing the ellipse rotation angle by a much larger amount results in denser spirals, producing a whirlpool effect… And delaying the angle increments results in much more prominent bars.
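The effect of those two knobs is easy to try in a simplified 2D Python sketch. It borrows the parameter names `ecc`, `rot` and `bar` and their default values from the compute shader in this post, but flattens everything into a plane and skips the warping and offsets, so it is only an approximation of what the shader does:

```python
import math

def spiral_point(erad, theta, ecc=0.7, rot=23.5, bar=0.3):
    """2D position of a particle on a rotated ellipse.

    erad:  ellipse size in [0, 1]; theta: angle around the ellipse.
    rot:   larger values wind the arms tighter (whirlpool look).
    bar:   ellipses smaller than this are not rotated, forming the bar.
    """
    angl = min(max(erad - bar, 0.0), 1.0) * rot  # same as saturate(erad-bar)*rot
    x = math.cos(theta) * erad * ecc             # squash the circle into an ellipse
    y = math.sin(theta) * erad
    ca, sa = math.cos(angl), math.sin(angl)
    return (x * ca - y * sa, x * sa + y * ca)    # rotate the ellipse by angl
```

Plotting a few thousand points with random `erad` and `theta` should show the bar growing as `bar` increases and the arms tightening as `rot` increases.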

(…Playing around with “warping” parameters…)

Some misplaced particles in their cubic form are seen very faintly after warping the outer ellipses. Warping is done by changing the ellipse rotation axis such that it no longer aligns with the galaxy’s rotation axis.

(…)

Some interesting spiral arm patterns resulting from warping.

Here’s the DX11 compute shader that produced this pattern:

RWStructuredBuffer<Galaxy2Particle> Outputs : register(u0);

cbuffer Galaxy2CSVars : register(b0)
{
    uint Index; //galaxy cache index.
    uint ParticleCount; //number of particles (not yet used)
    float Pad0; //not used
    float Pad1; //not used
    GalaxyData Galaxy; //the galaxy data, not really used yet
}

[numthreads(128, 1, 1)]
void main(uint3 gtid : SV_GroupThreadID, uint3 gid : SV_GroupID)
{
    uint pid = gid.x * 128 + gtid.x; //the particle index in this galaxy.
    uint oid = Index + pid; //the output data index in the particle buffer
    float4 rnd = RandVec4(pid);
    float size = Galaxy.Size.x * 0.025 * (rnd.w*rnd.w+0.5); //randomize the particle size.
    float ecc = 0.7; //inverse ellipse eccentricity (circle scaling factor)
    float rot = 23.5; //spiral rotation amount / "twistiness"
    float off = 0.03; //linear z offset for ellipses - asymmetry factor
    float bar = 0.3f; //bar size
    float2 warp1 = 0.3; //pre-rotation warp vector
    float2 warp2 = 0.01; //post-rotation warp vector
    float erad = rnd.x; //size of the galactic ellipse
    float theta = rnd.y * 6.283185307f; //angle around the ellipse
    float height = rnd.z; //distance from the galactic plane
    float2 pxy = float2(cos(theta), sin(theta))*Galaxy.Size.x*erad; //make a circle.
    float3 pos = float3(pxy.x, height*Galaxy.Size.y*0.1, pxy.y); //place the circle.
    float3 gax = float3(0, 1, 0); //rotation axis for ellipses
    gax.xz += erad*warp1; //warp the rotation axis
    float angl = saturate(erad-bar)*rot; //angle to rotate the ellipse
    gax.xz += erad*angl*warp2; //also warp the axis by the angle amount
    float4 q = qaxang(gax, angl); //using quaternions here to orient the ellipses
    float4 qi = qaxang(gax, -angl);
    pos = mulvq(pos, q); //this line isn't really necessary since it's just rotating a circle
    pos.x *= ecc; //turn the circle into an ellipse
    pos.z += erad * Galaxy.Size.x * off; //offset ellipses to make some asymmetry
    pos = mulvq(pos, qi); //rotate the ellipse to where it should be to make spirals
    Galaxy2Particle particle;
    particle.Position = pos;
    particle.Size = size;
    particle.Colour = float4(1,1,2,1)*0.01; //it's just blue for now
    Outputs[oid] = particle;
}

And the functions used:

Texture2D g_txRandomByte : register(t0); // permutation random byte tex
SamplerState g_samWrap : register(s0); //permutation sampler (wrap)

static const float g_TexDims = 256.0f;
static const uint g_TexDimu = 256;
static const uint g_TexDimSq = 65536;
static const float g_InvTexDims = 1.0f / g_TexDims;
static const float g_HalfTexel = g_InvTexDims * 0.5f;

float GetPermutation(float2 texcoord)
{
    return g_txRandomByte.SampleLevel( g_samWrap, texcoord, 0 );
}

float4 RandVec4(uint i)
{
    //This has quite a few perhaps unnecessary calculations but it's like this
    // to exactly match the CPU version. I'll rewrite both versions some time.
    uint i4 = i*4;
    uint nx = i4%g_TexDimSq;
    uint ny = (i4+1)%g_TexDimSq;
    uint nz = (i4+2)%g_TexDimSq;
    uint nw = (i4+3)%g_TexDimSq;
    float x = GetPermutation(float2((float)(nx%g_TexDimu)*g_InvTexDims, (float)(nx/g_TexDimu)*g_InvTexDims)+g_HalfTexel);
    float y = GetPermutation(float2((float)(ny%g_TexDimu)*g_InvTexDims, (float)(ny/g_TexDimu)*g_InvTexDims)+g_HalfTexel);
    float z = GetPermutation(float2((float)(nz%g_TexDimu)*g_InvTexDims, (float)(nz/g_TexDimu)*g_InvTexDims)+g_HalfTexel);
    float w = GetPermutation(float2((float)(nw%g_TexDimu)*g_InvTexDims, (float)(nw/g_TexDimu)*g_InvTexDims)+g_HalfTexel);
    return float4(x,y,z,w);
}

float3 mulvq(float3 v, float4 q) //rotates a 3D vector by a quaternion.
{
    float3 result;
    float axx = q.x * 2.0;
    float ayy = q.y * 2.0;
    float azz = q.z * 2.0;
    float awxx = q.a * axx;
    float awyy = q.a * ayy;
    float awzz = q.a * azz;
    float axxx = q.x * axx;
    float axyy = q.x * ayy;
    float axzz = q.x * azz;
    float ayyy = q.y * ayy;
    float ayzz = q.y * azz;
    float azzz = q.z * azz;
    result.x = ((v.x * ((1.0 - ayyy) - azzz)) + (v.y * (axyy - awzz))) + (v.z * (axzz + awyy));
    result.y = ((v.x * (axyy + awzz)) + (v.y * ((1.0 - axxx) - azzz))) + (v.z * (ayzz - awxx));
    result.z = ((v.x * (axzz - awyy)) + (v.y * (ayzz + awxx))) + (v.z * ((1.0 - axxx) - ayyy));
    return result;
}

float4 qaxang(float3 ax, float ang) //creates a quaternion from axis and angle.
{
    float ha = ang * 0.5f;
    float sha = sin(ha);
    return float4(ax.x * sha, ax.y * sha, ax.z * sha, cos(ha));
}

(Perhaps quite a few optimisations could be made! And sorry about the formatting, maybe I should have chosen a more code-friendly blog host…)

… Smoothing out those cubic particles:

The sharp noisy particle cube edges are eliminated by some ray maths. Here’s the vertex shader for the particle cubes that generates the per-vertex ray data for the pixel shader:

StructuredBuffer<Galaxy2Particle> Inputs : register(t0); //from the CS

cbuffer Galaxy2VSVars : register(b0)
{
    float4x4 ViewInv; //(not used in this version)
    float4x4 ViewProj; //camera view-projection matrix
    float4 GalaxyPosition; //relative to the camera
    float4 GalaxySize; //in universe units
    float4 GalaxyOrientation; //(quaternion)
    float4 GalaxyOrientationInv; //(quaternion)
    float2 OutputOffset; //offset for render to impostor
    float2 OutputScale; //scaling for render to impostor
    float Pad0;
    float Pad1;
    uint Index; //galaxy cache index
    uint ParticleCount; //number of particles
};

struct VSOutput
{
    float4 Position : SV_POSITION;
    float4 Colour : COLOR0;
    float4 Data : TEXCOORD0;
    float3 Direction: TEXCOORD1;
    float3 RayEnd : TEXCOORD2;
    float3 RayStart : TEXCOORD3;
};

VSOutput main(uint iid : SV_InstanceID, float4 pos : POSITION)
{
    VSOutput output;
    Galaxy2Particle particle = Inputs[Index + iid];
    float size = particle.Size;
    float fade = 1.0;
    float shine = 2.0;
    float3 re = pos.xyz * size;
    float3 fpos = particle.Position + re;
    float3 cpos = GalaxyPosition.xyz + mulvq(fpos, GalaxyOrientation);
    float3 dir = mulvq(cpos, GalaxyOrientationInv);
    output.Position = mul(float4(cpos, 1.0), ViewProj);
    output.Position.z = output.Position.z*output.Position.w*0.0000000000001;
    output.Position.xy += (OutputOffset*output.Position.w);
    output.Position.xy *= OutputScale;
    output.Colour = particle.Colour;
    output.Data = float4(size, 0, fade, shine);
    output.Direction = dir;
    output.RayEnd = re;
    output.RayStart = cpos;
    return output;
}

The vertex shader is run with the inverted cube geometry, with position coordinates ranging from -1 to +1. DrawIndexedInstanced is used, with the instance count equal to the number of particles, in this case 16384.

And here’s the pixel shader that goes with it:

cbuffer Galaxy2PSVars : register(b0)
{
    float4x4 ViewProj;
    float4 Magnitude;
};

struct PS_INPUT
{
    float4 Position : SV_POSITION;
    float4 Colour : COLOR0;
    float4 Data : TEXCOORD0;
    float3 Direction: TEXCOORD1;
    float3 RayEnd : TEXCOORD2;
    float3 RayStart : TEXCOORD3;
};

float4 main(PS_INPUT input) : SV_TARGET
{
    float4 c = input.Colour;
    float size = input.Data.x;
    float fade = input.Data.z;
    float shine = input.Data.w;
    float3 dir = normalize(input.Direction);
    float3 re = input.RayEnd;
    float3 cpos = input.RayStart;
    float tx = (dir.x>0.0?size+re.x:size-re.x)/abs(dir.x);
    float ty = (dir.y>0.0?size+re.y:size-re.y)/abs(dir.y);
    float tz = (dir.z>0.0?size+re.z:size-re.z)/abs(dir.z);
    float t = min(min(tx, ty), tz);
    float dist = length(cpos);
    float dt = (dist<t?dist:t);
    float3 rs = re - (dir * dt);
    float hts = 0;
    float hte = 0;
    bool hit = RaySphereIntersect(rs, dir, size, hts, hte);
    if (!hit) discard; //didn't hit the sphere!
    float rlen = hte - hts;
    float rlenn = 0.5 * rlen / size;
    //scale by the square of the relative ray length to approximate
    //the light density integral over the ray through the sphere
    c *= saturate(rlenn*rlenn);
    return c;
}

And the RaySphereIntersect method used here (probably could be improved):

// Returns the near intersection point of a line and a sphere
float NearIntersection(float3 pos, float3 ray, float distance2, float radius2)
{
    float B = 2.0 * dot(pos, ray);
    float C = distance2 - radius2;
    float det = max(0.0, B*B - 4.0 * C);
    return 0.5 * (-B - sqrt(det));
}

// Returns the far intersection point of a line and a sphere
float FarIntersection(float3 pos, float3 ray, float distance2, float radius2)
{
    float B = 2.0 * dot(pos, ray);
    float C = distance2 - radius2;
    float det = max(0.0, B*B - 4.0 * C);
    return 0.5 * (-B + sqrt(det));
}

//finds whether a ray hits a sphere, and the start and end hit distances
bool RaySphereIntersect(float3 s, float3 d, float r, out float ts, out float te)
{
    float r2 = r*r;
    float s2 = dot(s,s);
    if(s2<=r2)
    {
        ts = 0.0;
        te = FarIntersection(s, d, s2, r2);
        return true;
    }
    ts = NearIntersection(s, d, s2, r2);
    te = FarIntersection(s, d, s2, r2);
    return te>ts && ts>0;
}

Since this method finds the distance that the ray has travelled through the sphere (accounting for the camera position!), it produces a nice smooth effect as the camera moves through it, without the aliasing issue that plagues the old implementation. However, dust clouds are still somewhat problematic with this method, and some “fakery” will need to be done to make them look good.
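The chord-length weighting is easy to sanity check on the CPU. Below is a Python port of the intersection logic plus the squared relative ray length, with the sphere centred at the origin and `d` assumed to be a unit vector. The `density_weight` name is mine:

```python
import math

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def ray_sphere_intersect(s, d, r):
    """Return (hit, ts, te): entry and exit distances along the ray from s."""
    b = 2.0 * dot(s, d)
    c = dot(s, s) - r * r
    root = math.sqrt(max(0.0, b * b - 4.0 * c))
    te = 0.5 * (-b + root)
    if c <= 0.0:
        return True, 0.0, te  # ray starts inside the sphere
    ts = 0.5 * (-b - root)
    return (te > ts and ts > 0.0), ts, te

def density_weight(s, d, r):
    """Squared relative chord length: the light density approximation."""
    hit, ts, te = ray_sphere_intersect(s, d, r)
    if not hit:
        return 0.0
    rlenn = 0.5 * (te - ts) / r
    return min(max(rlenn * rlenn, 0.0), 1.0)
```

A ray through the centre from outside gets the full weight of 1.0, while a ray starting at the centre only crosses half the sphere and gets 0.25, which is why the effect fades smoothly as the camera moves through a particle.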

The video shows this galaxy effect from different angles. I think it’s probably just about time for some better colours…

(… The next day…)

Added some colouring using the same code I used to colour stars, based on temperature. I assumed that the temperature of stars is lower near the centre bulge due to the stars being older there, and observations show that the bulge tends to be more yellow/red in colour.

I also was playing with making the spiral arms branch out as can be seen in many spiral galaxies. It probably needs to be a fractal thing though, since the arms could split many times along their length.

Here’s some pics of the colour distribution from planet surfaces, I think it looks similar to what we see the Milky Way Galaxy as from here on Earth (minus the dust clouds because I haven’t tried implementing those yet).

The galaxy shape variation is turning out to be a challenging thing, especially to get them shaped like real galaxies. The Hubble Tuning Fork, or the Hubble Sequence is used by astronomers to classify galaxies, so I think I will start there.

Still figuring out the best way to implement that. Probably with variables such as “bulge size” and “bar size”. My original implementation had those but in a different scale and distribution. But either way, I still need to work on getting the whole range of galaxy shapes out of the algorithm. And I’ll also be working on the dust!

(Another day later)

So I was playing around with adding some dust. I figured the easiest starting point is to just add some particles in with subtractive blending. But for some reason I was getting ridiculous results with the D3D11_BLEND_OP_SUBTRACT blend state, so I decided to just use additive blending and add negative colours. I also had to make sure the final output has the colour value above zero, and alpha between 0 and 1. Since the galaxy is being rendered into a low resolution texture currently, when finally outputting that texture to the screen the pixel shader just makes sure values are in the correct range. Note this would only work with the floating-point render target and texture.
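In other words, the blend itself stays additive and the dust simply contributes negative values, with the range fix applied once at the end. A tiny Python sketch of that idea (the function names and the example colour values are made up):

```python
def blend_additive(dst, src):
    """Additive blend of two RGBA tuples; src may hold negative 'dust' colours."""
    return tuple(d + s for d, s in zip(dst, src))

def resolve(pixel):
    """Final range fix when the low-res texture is drawn to the screen."""
    r, g, b, a = pixel
    return (max(r, 0.0), max(g, 0.0), max(b, 0.0), min(max(a, 0.0), 1.0))

# Example: one light particle followed by one dust particle.
pixel = (0.0, 0.0, 0.0, 0.0)
pixel = blend_additive(pixel, (0.5, 0.5, 1.0, 0.5))      # emitted light
pixel = blend_additive(pixel, (-0.75, -0.25, -0.5, 0.75))  # dust as negative colour
final = resolve(pixel)
```

The intermediate value goes negative in the red channel, which is exactly what an 8-bit render target could not store.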

So the dust I think is looking fairly good already, it reminds me a lot of the Elite: Dangerous galaxy. I suspect they are using a similar method.

I’ll be playing with the dust a fair bit more yet. I still want those high detail “wisp” effects that are often seen in galactic dust. The implementation is not currently using any LOD scheme, so I think this first low-res version can serve as a basis where higher detail dust can be “layered” on top when the camera is closer to it.

(Some much darker dust)

(More dust testing and fiddling – planet with moon)

Final update:

It was suggested that I try using a technique called weighted blended order independent transparency (WBOIT) to solve the render order problem for the dust clouds. This now requires that I use two render targets when rendering the low-res galaxy image, and then use both those textures to composite the final image. I based my implementation off some code I found here. It took a fair while of playing with it before I had a good understanding of its limitations; perhaps I’ll write more about WBOIT in a future article. I still have some slight blending issues with dust using this method though, mainly that at the moment it makes the spheres much more obvious than they were, due to each particle now making a larger contribution to the end result.
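For reference, the general WBOIT idea can be written down in a few lines of Python. This is a sketch of the technique as commonly described (one accumulation target, one "revealage" target, then a composite), not my actual shader code. The per-fragment weights are taken as inputs because the weight function is a tuning choice:

```python
def wboit_composite(fragments, background, eps=1e-5):
    """Composite transparent fragments over a background, order independently.

    fragments: iterable of ((r, g, b), alpha, weight) tuples.
    """
    accum = [0.0, 0.0, 0.0, 0.0]  # render target 1: weighted premultiplied sum
    revealage = 1.0               # render target 2: product of (1 - alpha)
    for (r, g, b), alpha, weight in fragments:
        accum[0] += r * alpha * weight
        accum[1] += g * alpha * weight
        accum[2] += b * alpha * weight
        accum[3] += alpha * weight
        revealage *= (1.0 - alpha)
    denom = max(accum[3], eps)
    avg = (accum[0] / denom, accum[1] / denom, accum[2] / denom)
    # Weighted average colour over the background, by total coverage.
    return tuple(avg[i] * (1.0 - revealage) + background[i] * revealage
                 for i in range(3))
```

Because both targets are built from sums and products, the result doesn't depend on the order the particles are drawn in, which is the whole point for the dust.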

I think the results are quite good but still more work needs to be done to get the higher detail dust lanes to appear. It’s a particularly difficult problem though due to the complex volumetric nature of the dust. I have noted that in engines that render galaxies in high detail, usually billboards are used with a wispy dust lanes type texture to provide the detail. Unfortunately though the illusion breaks when leaving the plane of the galaxy, since the orientation of billboards has a singularity in the calculations leading to the effect of billboards appearing to “spin”. I still need to do more research on this.

I have also implemented elliptical and irregular galaxies, and adjusted the galaxy sizes to be more realistic. Irregular galaxies were achieved by simply using some high warping factors when positioning the particles. Elliptical galaxies have much less dense dust on average and their bulge size parameter is much larger. The temperature range for the glow effect is also set much lower on ellipticals, resulting in a much more yellow colour overall.

Watch the video about the new galaxy types:


Download from https://1drv.ms/u/s!Arp4D6YSlcg0jGDe5NtGbe5wR5F3

New things in this version:

- Selected/hovered item name and distance info
- XInput support and better control configuration
- Voxel collision (WIP!)
- Bug fixes and code improvements

I had a realisation that the functionality of my voxel level-of-detail blending technique might not have really been documented anywhere. Others have probably already implemented the same idea (this technique may even already have a name, please tell me if you know), but it’s something I didn’t read about anywhere – I went googling for things like “smooth voxel LOD blending” etc, and didn’t come up with much technical info on how any sort of smooth voxel LOD transitions could be done, except for in a ray casting approach found here, which doesn’t really apply. So I just went with my instincts and came up with something that worked surprisingly well. The end result looks much like what Miguel did, presumably for the first time in Voxel Farm. Although as far as I know he hasn’t really elaborated on exactly how it works.

So here’s an article about how I’ve done it for my voxel asteroids! The implementation’s still not quite complete (some small gaps appearing occasionally, and the odd “pop”), but it’s good enough for this explanation. I plan to use this technique on my planet terrain as well once it’s finished. For info about how I’m extracting the voxel surface on the GPU in the first place, see my previous article on voxels.

Firstly, see the first half of the above video to get a feel for the vertex movements that occur during blending between detail levels (you may have to watch in fullscreen HD to be able to see the edges clearly). If you are familiar with adaptive tessellation you may be seeing some familiar patterns. To someone who hasn’t seen a wireframe rendering of adaptive tessellation before, this is often what it looks like – edges in the mesh looking like they split into two, sideways.

The edge splitting effect is noticeable in these images as well:

So, what’s really going on here? The simple answer is that the vertex positions and normals are blended with the vertex at the equivalent location in the parent node, such that at the exact point that a node splits, the new child node vertices are completely blended into the parent position, giving the visual appearance that the node hasn’t split at all. Much like how the LOD blending works for my planet terrain, as the new child nodes get closer to the camera their vertices “morph” back to their original positions (and normals). It’s a straightforward idea, but as always in graphics development, quite a few complexities appear during the implementation. So for the long answer, keep reading.

To start with, the main advancement from my previous voxels post is the LOD implementation itself, which is octree based. The octree nodes are culled against the viewing frustum, and split with the standard 1/d technique, which in my implementation looks more like:

    ApparentSize = NodeSize / NodeDistance
    SplitValue = ApparentSize - NodeRenderingDetail
    if ( SplitValue >= 0.0 ) { /* split this node (recurse) */ }
    else { /* queue this node for rendering */ }

Where NodeSize is the radius of the node in the world space, NodeDistance is the node’s center distance from the camera, and NodeRenderingDetail is an adjustable setting. Which is all fairly standard, except maybe for the SplitValue, which is the value that will end up being used to calculate the LOD blending factors. ApparentSize and SplitValue are stored in the node object for later use. At this point, some probing of the voxel density function is also done on the CPU prior to rendering to discard nodes that are far away from the surface. Then once the list of nodes to render is obtained, they are sorted from near to far to reduce overdraw.
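As a rough illustration of the split test and the near-to-far sort (not the actual Galaxia code), here is a minimal C++ sketch. The `Node` struct, `evaluateSplit` and `sortNearToFar` are hypothetical names; only the two formulas come from the pseudocode above.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical node record; the field names mirror the pseudocode above.
struct Node {
    float size;          // node radius in world space
    float distance;      // distance from the camera to the node's center
    float apparentSize;  // stored for later use
    float splitValue;    // stored for later use as the basis of the LOD blend factor
};

// Evaluates the 1/d split test and stores the intermediate values on the node.
// Returns true when the node should be split (recursed into) rather than rendered.
bool evaluateSplit(Node& node, float renderingDetail) {
    node.apparentSize = node.size / node.distance;
    node.splitValue = node.apparentSize - renderingDetail;
    return node.splitValue >= 0.0f;
}

// Sorts the nodes queued for rendering from near to far to reduce overdraw.
void sortNearToFar(std::vector<Node>& nodes) {
    std::sort(nodes.begin(), nodes.end(),
              [](const Node& a, const Node& b) { return a.distance < b.distance; });
}
```

Note that a node exactly at the camera (distance 0) divides by zero here, which with IEEE floats gives an infinite apparent size and therefore always splits.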

The relevant node data (each node representing a chunk of voxels) is packed in the sorted order into a graphics resource (i.e. a buffer), which determines the order in which the chunks are finally rendered.

A couple of changes were required to the update phase described in my previous post on voxels. The voxel data generated in step 1 needed to be enlarged to support a wider range of sampling outside the current chunk. To also accommodate ambient occlusion sampling, a few extra data points were added to bring the size to 39x39x39. Since 39 divides evenly by 3, the compute shader is run in thread groups of size 39x3x1 (117 threads each). Therefore N x 13 x 39 thread groups need to be dispatched (where N is the number of chunks being updated).

The other main change in the update phase was added to step 10, where the cached vertex data is updated. To accommodate the LOD blending, each vertex no longer needs just its own position and normal, but also those of the parent cell’s vertex. Essentially this means just performing the same set of calculations twice for the vertex, but with different samplings of the voxel data.

In step 10, the current cell’s position within the voxel data is a 3-component uint, obtained from the group and group thread IDs, and is the primary input to the shader. It covers the range [0,0,0] to [32,32,32], representing the 33x33x33 vertices that make up a 32x32x32 cell chunk. This means we can determine the parent cell’s coordinates simply by subtracting each component’s remainder modulo 2 (excerpt from the step 10 compute shader):

    [numthreads(33, 11, 1)] //need to run (N,3,33) groups, where N is the chunk count.
    void main(uint3 gtid : SV_GroupThreadID, uint3 gid : SV_GroupID)
    {
        uint3 v = uint3(gtid.x, gid.y*11 + gtid.y, gid.z); //the current cell data indices
        uint3 bv = v + 4; //the chunk edges are inset by 4 in the input data (39x39x39)
        ... (this cell's position and normal calc's using bv)
        uint3 pv = bv - (v % 2); //the computed parent cell data indices
        ... (parent cell's position and normal calc's using pv)
        ... (finally, output the calculated data into the vertex cache)
    }

Obviously when doing the sampling for the parent cell, the data points to sample will need to increment by 2 instead of 1 since the parent data is represented by every second sample from the child data.
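To make the index arithmetic concrete, here is a small CPU-side C++ sketch of the same calculation. `Index3`, `cellDataIndex` and `parentDataIndex` are illustrative names only; the inset of 4 and the `v % 2` subtraction are taken from the shader excerpt above.

```cpp
#include <array>
#include <cstdint>

using Index3 = std::array<std::uint32_t, 3>;

// The chunk edges are inset by 4 in the 39x39x39 input data.
constexpr std::uint32_t kInset = 4;

// Index of the cell's own data point in the padded voxel grid (bv in the shader).
Index3 cellDataIndex(const Index3& v) {
    return { v[0] + kInset, v[1] + kInset, v[2] + kInset };
}

// Index of the parent cell's data point (pv in the shader): each component of v
// is effectively snapped down to the nearest even value before the inset is added.
Index3 parentDataIndex(const Index3& v) {
    const Index3 bv = cellDataIndex(v);
    return { bv[0] - (v[0] % 2), bv[1] - (v[1] % 2), bv[2] - (v[2] % 2) };
}
```

For example, the cell at (5, 6, 32) reads its own data at (9, 10, 36) and its parent's data at (8, 10, 36): only the odd component moves.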

So on to the rendering phase, where the “magic” happens.

Before the rendering phase begins, the correct LOD blending factors for each node need to be calculated. The SplitValue previously calculated for each node is used as the LOD blending factor for that node’s *child nodes*. This is because at the exact point that a node splits, its SplitValue will be equal to zero. The SplitValue will also increase as the child nodes get closer to splitting, thereby being the ideal basis for those child blending factors. The only problem with using SplitValue directly is that it tends not to increase fast enough, and it can go above 1 (which should be the maximum value of a blending factor). So as a shortcut the blending factor is calculated by multiplying by a constant (saturate() just clamps the value to 0..1 range as in HLSL) :

BlendFactor = saturate(ParentNode.SplitValue * 10.0f)

Ideally the blend factor would be based on not only the parent node’s SplitValue, but the child’s as well, ranging from 0 when the parent just split, to 1 when the child node is *about* to split. This would result in a more constant blending over the full LOD range, and may be done in a future upgrade. If anyone knows a simple method of doing so, please let me know! (I think the maths should be fairly straightforward, I just haven’t put any time into it)

The node center camera relative positions and the blending factors are packed along with the other node information in the sorted node info buffer for use by the render shaders (primarily the geometry shader).

So now in the geometry shader, for each vertex being output the corresponding parent position and normal values are available as well as the vertex’s original position and normal. Finally, the chunk’s LOD blend factor is then used to simply blend the parent values with the original values, resulting in the smooth LOD transitions.
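The blend itself is just a linear interpolation. The sketch below shows it on the CPU in C++ for clarity; in the real thing this runs in the geometry shader. The `Vec3` type and the function names are hypothetical, and the constant 10 is the one quoted above. Re-normalising a blended normal afterwards is an assumption, not something stated here.

```cpp
#include <algorithm>

struct Vec3 { float x, y, z; };

// Clamps to the 0..1 range, like saturate() in HLSL.
float saturate(float v) { return std::min(std::max(v, 0.0f), 1.0f); }

// Blend factor for a node's children, derived from the parent's SplitValue.
float blendFactor(float parentSplitValue) {
    return saturate(parentSplitValue * 10.0f);
}

// Linear interpolation from a (t = 0) to b (t = 1).
Vec3 lerp(const Vec3& a, const Vec3& b, float t) {
    return { a.x + (b.x - a.x) * t, a.y + (b.y - a.y) * t, a.z + (b.z - a.z) * t };
}

// blend = 0: the parent has only just split, so the vertex sits at the parent position.
// blend = 1: the vertex has fully morphed back to its own position.
Vec3 blendVertex(const Vec3& own, const Vec3& parent, float blend) {
    return lerp(parent, own, blend);
}
```

The same `blendVertex` call would be applied to the normal pair, presumably followed by a normalise since interpolated unit vectors are generally shorter than unit length.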

But there’s still a major problem! At the join between two adjacent chunks, the vertices will only line up perfectly if both chunks are at the same LOD and have exactly the same blend factors. This means that there will be unsightly gaps appearing along all the joins that need to be filled. Luckily, the LOD blending scheme provides an elegant solution to this issue.

Much like in my planetary terrain implementation, node adjacency information is calculated before rendering. This is done by first adding all the nodes to be rendered into an STL map (roughly the C++ equivalent of C#’s Dictionary<TKey, TValue>), keyed by the node’s center position (note: integer position vector – may not work for floats!). Then a second pass tests for the visibility of all the sibling nodes by calculating the sibling positions from the node’s position and size, and checking if they exist in the map. If a sibling was found, the sibling’s LOD blend factor is written into an edge blending array for the node (otherwise 0.0). This array is also included in the node info buffer to be accessed by the shaders.
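A minimal C++ sketch of that two-pass lookup might look like the following. It assumes six face-adjacent neighbours per node and that `size` is the node's full width in the same integer units as its center; neither detail is spelled out above, and `Key`, `NodeInfo` and `edgeBlendFactors` are made-up names.

```cpp
#include <array>
#include <cstdint>
#include <map>
#include <tuple>

// Integer node center, used as the map key (exact comparison is safe for integers).
using Key = std::tuple<std::int64_t, std::int64_t, std::int64_t>;

struct NodeInfo {
    std::int64_t size;   // assumed: node width in the same integer units as the center
    float blendFactor;   // this node's LOD blend factor
};

// Assumed neighbour order: +X, -X, +Y, -Y, +Z, -Z.
constexpr int kDirections[6][3] = {
    {1, 0, 0}, {-1, 0, 0}, {0, 1, 0}, {0, -1, 0}, {0, 0, 1}, {0, 0, -1}};

// Second pass: for one queued node, look up each same-sized neighbour by its
// computed center and record its blend factor, or 0.0 when it is not in the map.
std::array<float, 6> edgeBlendFactors(const std::map<Key, NodeInfo>& nodes, const Key& center) {
    std::array<float, 6> result{};  // zero-initialised
    const auto self = nodes.find(center);
    if (self == nodes.end()) return result;
    const std::int64_t size = self->second.size;
    for (int i = 0; i < 6; ++i) {
        const Key neighbour{std::get<0>(center) + kDirections[i][0] * size,
                            std::get<1>(center) + kDirections[i][1] * size,
                            std::get<2>(center) + kDirections[i][2] * size};
        const auto found = nodes.find(neighbour);
        if (found != nodes.end()) result[i] = found->second.blendFactor;
    }
    return result;
}
```

Keying on an integer center is what makes the exact-match lookup reliable; with floating-point centers two computations of the "same" position could differ in the last bit and miss.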

Now in the geometry shader, when processing an edge vertex, the appropriate blending factor is selected out of the edge blending array. If the node’s own blending factor is smaller than the value obtained from the array, the node’s own blending factor is used instead. This ensures siblings sharing an edge will both use the same blend factor along that edge. And finally, if the matching edge sibling is split, a value of 1 is used. (This is how I currently have it implemented, but I realise while writing this that these extra checks could be done on the CPU beforehand and the appropriate values written into the edge blending array. I will definitely fix that because it will improve performance).
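The selection rule can be written as one small function, which is also roughly what moving the checks to the CPU would involve. This is a hedged C++ sketch: the `Neighbour` enum is an assumption, since the text does not say how a split sibling is distinguished from a missing one in the edge blending array.

```cpp
#include <algorithm>

// Assumed encoding of what lies across an edge (not specified in the text).
enum class Neighbour {
    Missing,    // no sibling found: the array holds 0.0
    SameLevel,  // sibling is rendered at the same LOD: the array holds its blend factor
    Split       // sibling has itself been split into children
};

// Blend factor to use for vertices on an edge shared with a sibling.
float edgeBlend(float ownBlend, Neighbour state, float neighbourBlend) {
    switch (state) {
        case Neighbour::Split:
            return 1.0f;                                 // match the sibling's full-detail surface
        case Neighbour::SameLevel:
            return std::min(ownBlend, neighbourBlend);   // both sides pick the same (smaller) value
        case Neighbour::Missing:
            return 0.0f;                                 // min(ownBlend, 0.0) for a non-negative factor
    }
    return ownBlend;  // unreachable for valid enum values
}
```

Because `std::min` is symmetric, two same-level siblings evaluating this for their shared edge always arrive at the same factor, which is what closes the gap.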

And that’s it! There shouldn’t be any gaps appearing any more. The only exceptions are now corner cases (i.e. diagonally adjacent nodes), which do appear but are quite infrequent. To account for this, the edge blending array described above will have to be expanded to include the corners. But I think it’s OK for now, at least until the occasional small gap really annoys me.

There’s also an issue when a cell is determined to contain the isosurface but the parent cell doesn’t. This may occur if the surface passes through one face of the parent node, but none of the edges. In this case, the vertices end up blending into an appropriate parent position, but then when the LOD switches to the parent, that vertex doesn’t exist, resulting in a visible “pop” when the faces attached to that vertex suddenly disappear. I have a couple of ideas to solve this problem, but I haven’t had a chance to play around with it yet. The solution will probably involve blending with a sibling of the parent, rather than the parent itself.

Well, that turned into a lot more than I was planning to write… 1800 words!! Hopefully it all makes sense because I wrote it over a period of a few days. I’ll finish off with a few images obtained during the process of implementing all this. Have fun!

So, let’s start at the beginning.

All virtual worlds need some kind of a coordinate system to represent positions in. Most 3D games and simulations represent a position with 3 floating-point numbers representing the X, Y and Z distance from the world’s origin (which is located at X,Y,Z=0,0,0). These numbers are usually in a meaningful unit such as meters, so a value of 27.4 might mean 27.4m. Which is all fine, for most games and simulations.

But any experienced programmer will tell you that there’s a bit of a problem when the numbers start to get big. *Floating point precision* is a limiting factor, because the numbers are made up of a limited number of bits. Most engines currently use 32-bit floating point numbers (aka single precision, or “float”) to represent almost everything, and currently available consumer GPUs are only really efficient with 32-bit operations. These single precision floats generally have about 7 significant figures to work with. For example, I should be able to fairly safely use numbers with as much detail as 123.4567, or 4.567891×10^34 (single precision tops out at about 3.4×10^38). Note how the position of the final digit of precision depends on the exponent value – so in the second case, the difference between incremental numbers is on the order of 10^27 or 10^28! This can become a major problem when trying to do calculations involving both big and small numbers.
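To see how the gap between neighbouring floats grows with magnitude, here is a tiny C++ helper (the name `spacingAbove` is made up) built on the standard `std::nextafter`:

```cpp
#include <cmath>
#include <limits>

// Gap between x and the next representable 32-bit float above it.
float spacingAbove(float x) {
    return std::nextafter(x, std::numeric_limits<float>::infinity()) - x;
}
```

Near 1.0 the gap is about 0.00000012, but at 16,777,216 it is already 2, and at one billion it is 64. So a position stored in meters as a float cannot resolve anything finer than 64 m once it is a million kilometers from the origin.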

The problem can be somewhat reduced by using more bits. 64-bit floating point numbers (aka double precision or “double”) are becoming more widely used due to native support from 64-bit processors. Obviously, using doubles to store position information will allow for much larger distances from the world origin before the calculations start to go wrong due to precision errors. But the problem is still there – and in doubles you just have doubled the amount of significant figures you have to work with. There’s still that issue of having to be close to the world origin, with loss of numerical detail further out. And when a value *is* close to the origin, there’s *too much* precision, being a waste of those precious bits.

Galaxia instead uses 64-bit integers. This allows all possible coordinate values to have the same resolution throughout a volume of space, meaning the least significant bit of each number always represents the same distance. A scaling factor just needs to be chosen to correspond integer units to real-world units, for example 10000 units per meter (=0.0001 meters per unit) might be used, meaning we could store positions accurate to 0.1mm. Doing the calculations on that to find the largest possible distance that can be stored:

2^63 / 10000 = 922,337,203,685,477.5808 meters

= ~1 trillion km (at 0.1mm resolution).

Which seems pretty big…
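The same arithmetic in C++, as a quick sanity check. The scale of 10000 units per meter is just the example figure from above, and the names are illustrative:

```cpp
#include <cstdint>
#include <limits>

// Example scale from the text: 10000 units per meter, i.e. 0.1 mm per unit.
constexpr std::int64_t kUnitsPerMeter = 10000;

// Largest whole number of meters that fits along one axis of a signed 64-bit coordinate.
constexpr std::int64_t maxMeters() {
    return std::numeric_limits<std::int64_t>::max() / kUnitsPerMeter;
}

// Converts whole meters to integer units; every unit is the same 0.1 mm everywhere.
constexpr std::int64_t metersToUnits(std::int64_t meters) {
    return meters * kUnitsPerMeter;
}

static_assert(maxMeters() == 922337203685477LL, "about 9.22e14 m, or roughly 1 trillion km");
```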

But still, calculations need to be done on objects at any point in the space, for example rendering a 3D object generally involves transforming object/vertex positions with a camera view-projection matrix to determine positions on the screen. The solution is to use *relative* positions when doing calculations. In the case of rendering in Galaxia, the camera is always assumed to be at position (0,0,0), meaning the camera’s view matrix does not have a translation component. Before any object’s screen position is calculated with the camera matrix, its position is calculated relative to the camera. In maths terms, this is:

rp = op - cp

where rp is the position relative to the camera, op is the object’s position in the world, and cp is the camera’s position. In programming terms, since op and cp are both 64-bit integer vectors, subtracting them as such will give an exact answer, and if the two positions are close to each other, the resulting numbers will be small. This means we can safely convert the relative position to a floating point format for use in the rendering calculations.
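In C++ terms that could look something like the sketch below. The vector types and the `metersPerUnit` parameter are assumptions for illustration; the key point is that the subtraction happens in integers before anything is converted to float.

```cpp
#include <cstdint>

struct Int64Vec3 { std::int64_t x, y, z; };
struct FloatVec3 { float x, y, z; };

// rp = op - cp, computed exactly in 64-bit integers, then converted to meters
// as 32-bit floats for the rendering calculations.
FloatVec3 cameraRelative(const Int64Vec3& op, const Int64Vec3& cp, double metersPerUnit) {
    const Int64Vec3 rp{op.x - cp.x, op.y - cp.y, op.z - cp.z};
    return {static_cast<float>(rp.x * metersPerUnit),
            static_cast<float>(rp.y * metersPerUnit),
            static_cast<float>(rp.z * metersPerUnit)};
}
```

Converting `op` and `cp` to float first and subtracting afterwards would throw away exactly the low-order bits that matter. The integer subtraction can still overflow if the two positions are nearly the full coordinate range apart, so this sketch assumes the object is reasonably near the camera.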

1 trillion kilometers might sound like a lot but it really is nothing on a galactic scale. So we still need some further method to be able to store those really gigantic values that will allow us to fly in between stars and galaxies. There’s also a problem if we think about using one huge coordinate system to store positions of things on a planet’s surface, for example. When the planet rotates and moves along in its orbit, we don’t want to have to recalculate the position in “intergalactic coordinates” of everything on the surface every time, because there just wouldn’t ever be enough computing power.

So in the tradition of killing two birds with one stone these two problems are solved with one solution – nested coordinate systems. Starting from the largest scale, we need intergalactic coordinates. Since truly vast distances have to be covered, a large scaling factor is required. 10^16 meters per unit is used in Galaxia, which allows for quite smooth motion relative to galaxies and results in a universe that stretches out to what might seem like infinity to the player (okay if you really want to know it’s 9.22×10^34 meters, or 9.75×10^18 light years – many orders of magnitude bigger than our observable universe).

Galaxies are positioned in the intergalactic universe space (obviously) by means of a pseudo-random number generator (but that’s a different subject for another day). Along with the position, a size value and a random orientation quaternion are also generated (a quaternion is basically 4 floats). These values are then used to form the first “child” coordinate system – the galactic, or interstellar coordinates, which have a coordinate scale of 10000 meters per unit. That scale allows for a galaxy to be up to 9.75 million light years (mly) in radius, where the biggest known galaxy at this time is 5.8 mly in radius. For comparison, the Milky Way galaxy is only about 100,000 light years across (0.05 mly radius).

When the player flies close to a galaxy, a coordinate transform needs to occur. The player’s position in the galactic (interstellar) coordinate system is calculated from the position in universal (intergalactic) coordinates, by first calculating the player position relative to the galaxy. Since this will be a small number in universe scale, we can then convert it to a floating point number. A unit conversion also needs to be performed, since galactic coordinates are on a different scale to universal coordinates. The easiest way to think about that is to first convert the number from universal units into meters (multiply by the universe meters per unit), and then convert back into galactic units (divide by the galaxy meters per unit). Once the relative position is found in galactic units, to find the final position in the galactic coordinates we need to rotate the vector by the inverse of the galaxy’s orientation (using an inverse of the galaxy’s quaternion), to account for the orientation of the galaxy relative to the universal coordinates. Essentially the same conversion is also done with the player’s velocity and orientation, to make the transition seamless.
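The position part of that conversion can be sketched in C++ as follows. This is only an illustration of the three steps (relative position, unit conversion, inverse rotation): the types and function names are invented, the quaternion is assumed to be unit length, and doubles are used for the intermediate values.

```cpp
#include <cmath>
#include <cstdint>

struct Int64Vec3 { std::int64_t x, y, z; };
struct Vec3d { double x, y, z; };
struct Quat { double w, x, y, z; };  // assumed to be unit length

constexpr double kUniverseMetersPerUnit = 1e16;
constexpr double kGalaxyMetersPerUnit = 1e4;

// The inverse of a unit quaternion is its conjugate.
Quat inverse(const Quat& q) { return {q.w, -q.x, -q.y, -q.z}; }

// Rotates v by unit quaternion q: v' = v + w*t + cross(u, t), with u = (x, y, z), t = 2*cross(u, v).
Vec3d rotate(const Quat& q, const Vec3d& v) {
    const Vec3d t{2.0 * (q.y * v.z - q.z * v.y),
                  2.0 * (q.z * v.x - q.x * v.z),
                  2.0 * (q.x * v.y - q.y * v.x)};
    return {v.x + q.w * t.x + (q.y * t.z - q.z * t.y),
            v.y + q.w * t.y + (q.z * t.x - q.x * t.z),
            v.z + q.w * t.z + (q.x * t.y - q.y * t.x)};
}

Int64Vec3 universeToGalaxy(const Int64Vec3& player, const Int64Vec3& galaxy,
                           const Quat& galaxyOrientation) {
    // 1. Exact relative position in universe units (a small number near the galaxy).
    const Vec3d rel{static_cast<double>(player.x - galaxy.x),
                    static_cast<double>(player.y - galaxy.y),
                    static_cast<double>(player.z - galaxy.z)};
    // 2. Universe units -> meters -> galaxy units.
    const double scale = kUniverseMetersPerUnit / kGalaxyMetersPerUnit;
    const Vec3d scaled{rel.x * scale, rel.y * scale, rel.z * scale};
    // 3. Undo the galaxy's orientation relative to the universe axes.
    const Vec3d local = rotate(inverse(galaxyOrientation), scaled);
    return {static_cast<std::int64_t>(std::llround(local.x)),
            static_cast<std::int64_t>(std::llround(local.y)),
            static_cast<std::int64_t>(std::llround(local.z))};
}
```

With these scales one universe unit is 10^12 galactic units, so a player entering a galaxy starts on a very coarse grid and only gains resolution once they are moving in galactic coordinates.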

Once the position in galactic coordinates is obtained, the player’s character object (which currently has no mesh in Galaxia) is moved out of the universal coordinate system’s container and into the galactic one. Basically that means from now on, the player is moving around within the galaxy simulation rather than moving around in the outer universe simulation. In some senses this may sort of equate to an “instance” as the term is understood in online gaming – it’s an isolated coordinate system that isn’t really affected by others (except that you probably wouldn’t need to ever have two of the same galaxy “instance” for example). The only main difference here is that even though the player moves around in the galaxy’s coordinate system, they are still actually within the bounds of the universe coordinates, therefore the player now has two sets of coordinates – the primary one that is being controlled (the galactic) and the secondary (universal), which is calculated from that, using the reverse process as before (rotate by the galaxy quaternion and then adjust the scale).

The coordinate system nesting is continued in this manner for star systems. Objects like planets inside star systems again have their own coordinate systems within that. Currently the maximum coordinate system hierarchy depth in Galaxia is 5 (asteroids in asteroid belts), but there is really no limit to how far it could go, except for the potential increase in the amount of rendering needing to be done at each level.

When it comes to drawing everything, rendering happens from the outer coordinate systems to the inner ones, and since the player’s position and orientation is calculated for each outer system, they are each rendered as normal. The primary depth buffer is also cleared after rendering each level if it’s been used. To help with the problem of increasing rendering complexity at each level, a performance improvement is made at the star level by rendering everything that is seen outside the system to a skybox texture (i.e. the other galaxies, the current galaxy, and other stars). This does unfortunately cause a slight delay when entering a star system for the first time, since rendering the skybox requires rendering everything 6 times with different view orientations, for the 6 faces of the skybox. But once that skybox is rendered, we no longer have to worry about rendering all the galaxies etc. every frame while inside that star system.

When we get to the planet coordinate system, objects on the planet surface will have positions represented relative to the center of the planet. So when the planet moves and rotates, their positions can stay the same, meaning we don’t have to re-position anything but the planet itself (or rather its coordinate system).

The main drawback to these nested coordinate systems is obviously that the bigger ones have lower unit resolution than the smaller ones. This may not be as big an issue as it first seems, since generally space is extremely empty. The main problem case might arise in a multiplayer scenario, if two players approach each other in the intergalactic space, for example. Since units are each 10^16 meters in that space, it would be impossible to have one player “standing” next to the other in that coordinate system. But the simple solution here is just to create a new coordinate system or set of coordinate systems as they approach each other, allowing the resolution of the distance between the two players to increase as they get closer to each other. In that multiplayer scenario that would actually correspond to creating an “instance” for that area of space where the two players can interact.

In addition to all this, Galaxia also has a permanent “local” coordinate system centered on the player character, which remains isolated from the other coordinate systems. This system is the one used for the camera controls – arc rotation around the player, zooming, and will allow for VR head tracking. I also find it interesting that due to the nature of floating point numbers, the full camera zoom range is quite adequately represented by a single 32-bit float.

Hopefully this all helps in understanding how object positioning is done on such a large range of scales in a program like Galaxia when the numbers we have to work with are limited in size. Feel free to leave questions and comments below!

[Fuzzy blobs: spheres with added ridged noise. Header image: spheres with negative ridged noise. And a planet.]

Comparing the filled and the wireframe views of the image below can help reveal how the dual contouring is done. For each voxel “cell” (a cube with a voxel point at each corner) in the grid, between 0 and 3 quads can be generated, depending on which of the corner points are above/below a threshold value. If all corners are on the same side of the threshold, no quads are output.

The output vertex points also need to be moved to an estimate of where the surface would be, based on the voxel gradients along each cell edge. This results in the perfectly round spheres above. Probably the easiest way to visualise what is happening is to think of the terrain in Minecraft. Basically we start out with all those orthogonal cubic faces and then “flatten” the vertices into the smooth surface. If this last step weren’t done, the output would actually look just like Minecraft.

Note that the colours used here are the vertex coordinates in the grid, so some “blockiness” is visible in the colours due to them not being offset in the same manner as the vertex positions.

To get it running at an interactive rate, I’ve devised a method of generating up to 128 voxel “chunks” (each 32x32x32 voxels) at a time all on the GPU. The results of the chunk generation are put into larger vertex and index buffer caches, holding 256+ chunks (depending on available graphics memory). The updating is currently done immediately before rendering. I’m hesitant to introduce a chunk streaming approach due to the potential few frames that there would be nothing to display – so I’ve sacrificed some “fluidity” here, meaning that as the camera moves/rotates and uncached chunks need to be generated, they are done in that frame, there and then, potentially causing the frame to take much longer to render. Currently my terrain system does the same thing, in order to be able to ensure bits of the terrain don’t suddenly blink out of existence before a higher or lower LOD is generated. But it’s not actually causing a noticeable frame drop yet. In particular if the camera’s velocity is kept small, generally only a small number of chunks need to be updated each frame, which is quite fast, and the cache saves having to regenerate everything if the camera is just rotated in circles.

An outline of the chunk update process is as follows: (CS: compute shader; GS: geometry shader – both using Shader Model 4.0)

1. (CS) Fill a 35x35x35 uint grid of the data, with 16 bits each of isovalue and material data. Currently I’m just using static functions but eventually I want to have it generate a surface based off my terrain’s heightmaps.
2. (CS) “March” the voxel cells and determine the “case index”. I used a method based off this paper (see p46). The output data is well bitsmashed, with 4 voxel cells in each uint value that the compute shader outputs. Also packed in there for each cell is a value for whether the vertex is required to generate the surface, and the quad count for the cell.
3. (CS) “Reduce” the vertex and quad counts along the Z axis, so for each chunk a 32×32 grid of values is output, each “pixel” containing the sum of vertex and index counts along each row of voxels.
4. (CS) Perform the same again, except collapsing the 32×32 grid into a 1×32 row of sums.
5. (CS) Perform the same yet again, to finally end up with 1 value for each chunk containing the number of vertices and quads for each chunk. (16 bits each).
6. (CS) Perform the reverse procedure as the above, but outputting the vertex and quad offsets for each row into a 1×32 row of uints.
7. (CS) Continue the “re-expansion” step to get a 32×32 grid of offsets.
8. (CS) Final re-expansion step, resulting in a 32x32x32 grid of vertex indices and quad offsets.
9. (CS) Update the cached “counts” buffer, which contains how many vertices and quads are to be rendered for each cached chunk.
10. (CS) Update the cached vertices buffer. The CS is run for each cell in each updated chunk. If an index is available for output, the vertex position is generated using the “contouring” part of the dual contouring algorithm. The vertex must always remain within its cell, and is found simply by linearly interpolating all the cell’s estimated edge intersection points. The index found in the reduction/re-expansion phase is used here to perform a scatter write of the vertex data.
11. (CS) Update the cached indices buffer. A similar thing is done here as for the vertex buffer above, except for the quads in each cell. The only difference here is that each cell could output up to 3 quads, as opposed to the single vertex above. But that’s OK, because the “quad index” generated previously will have accounted for that.
12. (GS) At this stage, all the data required for rendering has been prepared. To render the quads, a geometry shader is run with 8192 input primitives. The vertex ID is simply passed from the vertex shader to the GS, and the GS runs using points as input. If the vertex ID arriving in the GS is higher than the number of vertices in the chunk (read from the “cached counts” buffer), nothing is output. Otherwise, between 1 and 3 quads can be output here. The vertex and index data generated in the previous steps are used here to determine the vertices and quads to be output.
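The reduce and re-expand passes in the outline above amount to a hierarchical exclusive prefix sum: sum the counts up the hierarchy, then push running offsets back down. The C++ sketch below does the same thing serially on the CPU for a single chunk, tracking only one count per cell (the GPU version packs vertex and quad counts together). All names here are illustrative.

```cpp
#include <cstdint>
#include <vector>

constexpr int kChunkSize = 32;

struct ScanResult {
    std::uint32_t total;                 // total count for the chunk
    std::vector<std::uint32_t> offsets;  // per-cell output index, kChunkSize^3 entries
};

// counts is indexed as (x * kChunkSize + y) * kChunkSize + z, one entry per cell.
ScanResult scanChunk(const std::vector<std::uint32_t>& counts) {
    const int n = kChunkSize;
    std::vector<std::uint32_t> rowSums(n * n, 0), planeSums(n, 0);

    // Reduce: sum along Z into an n x n grid, then collapse that into n plane sums.
    for (int x = 0; x < n; ++x)
        for (int y = 0; y < n; ++y)
            for (int z = 0; z < n; ++z)
                rowSums[x * n + y] += counts[(x * n + y) * n + z];
    for (int x = 0; x < n; ++x)
        for (int y = 0; y < n; ++y)
            planeSums[x] += rowSums[x * n + y];

    // Re-expand: running offsets per plane, then per row, then per cell.
    ScanResult result{0, std::vector<std::uint32_t>(n * n * n, 0)};
    std::vector<std::uint32_t> planeOffsets(n, 0), rowOffsets(n * n, 0);
    for (int x = 0; x < n; ++x) {
        planeOffsets[x] = result.total;
        result.total += planeSums[x];
    }
    for (int x = 0; x < n; ++x) {
        std::uint32_t running = planeOffsets[x];
        for (int y = 0; y < n; ++y) {
            rowOffsets[x * n + y] = running;
            running += rowSums[x * n + y];
        }
    }
    for (int x = 0; x < n; ++x)
        for (int y = 0; y < n; ++y) {
            std::uint32_t running = rowOffsets[x * n + y];
            for (int z = 0; z < n; ++z) {
                result.offsets[(x * n + y) * n + z] = running;
                running += counts[(x * n + y) * n + z];
            }
        }
    return result;
}
```

Each cell ends up with the index at which its output should be written, which is what makes the scatter writes into the vertex and index caches possible without any cell needing to know about the others.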

Unfortunately this method is slightly wasteful in that potentially many GS invocations are not outputting anything, and the total number of quads for each chunk does have a limit (albeit an arbitrary one designed to reduce this GS wastefulness). But with the shader exiting without doing unnecessary calculations, this impact can be almost unnoticeable on a modern GPU, considering they are designed to run many millions of invocations of shader programs in each frame. This technique also means that nothing, such as the number of quads required to draw each cell, has to be communicated back from the GPU to the CPU. This is important because it avoids stalling the pipeline. If Shader Model 5.0 is used, this issue can be removed completely by use of the DrawInstancedIndirect() function, although even that still leaves a little to be desired (a draw call would be required for each chunk). With the current method, only a single draw call is needed to draw all the chunks.

There is plenty more to say on this subject, but honestly this is too long already and probably no-one will read it anyway – but congrats if you got here! :D. Obviously still on the to-do list is to have different data in each chunk (terrain data, asteroid data, etc), and LODs, which will likely be quite fiddly, especially the LOD fading.

But anyways, I’ll finish off with pictures of some spectacular disasters that have occurred during this development process. It’s all art, right?

The biggest change was that I now had to create a “scene” space dynamically as the camera moves around, to use as a basis for calculating the shadow map projections. Ideally the camera should be close to the origin in that space, so that calculations can be done in 32-bit precision on the GPU. So for the scene origin, I decided to snap the camera coords to 100 meter increments. That way, the camera would never be more than 100m away from the scene’s origin along any axis.
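Snapping like this is a one-liner per axis. The C++ sketch below assumes integer camera coordinates and rounds down (floor) rather than to nearest; which of the two is actually used isn't stated, and the names are made up. `step` would be 100 m expressed in whatever the coordinate units are.

```cpp
#include <cassert>
#include <cstdint>

struct Int64Vec3 { std::int64_t x, y, z; };

// Rounds value down to a multiple of step (floor, so negative coordinates behave
// the same way as positive ones instead of snapping towards zero).
std::int64_t snapDown(std::int64_t value, std::int64_t step) {
    assert(step > 0);
    std::int64_t q = value / step;
    if (value % step != 0 && value < 0) --q;
    return q * step;
}

// Scene origin for the current camera position: each axis snapped independently.
Int64Vec3 sceneOrigin(const Int64Vec3& camera, std::int64_t step) {
    return {snapDown(camera.x, step), snapDown(camera.y, step), snapDown(camera.z, step)};
}
```

With floor snapping, the camera's offset from the scene origin is always in the range 0 to just under one step on each axis, so it stays small enough for 32-bit floats.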

Secondly, all the shadow casters (only rocks at the moment) need to be translated into the new scene space before rendering into the shadow maps. I have to generate a scene bounding box dynamically to best fit the shadow maps to the casters. The visible rocks are used for this, and very distant rocks are discarded when performing the max/min tests, resulting in a reasonable depth range for each shadow texel, but limiting the shadow range to 2km. The rocks are also culled against the shadow cascade frustums before rendering them into the shadow maps, to avoid wasting the GPU’s time.

I also decided to use a texture array for the shadow maps. Unfortunately though the SampleCmpLevelZero function in HLSL 4 apparently won’t work on a texture array, which is quite annoying. This means I have to use the SampleLevel function instead when sampling the shadow maps, and then do the depth comparison manually. This results in aliased shadow edges, although I have noticed that many commercial games display the same artifact. If it annoys me enough, I guess I’ll change the code back to how the original sample worked in that regard – by having one big shadow map texture and dividing it up into pieces for the different cascades.

But for now I’m quite happy with the results and I’ve been quite impressed with how much “depth” the shadows give to the final image.

A really hot planet. Note shadows interacting with the bloom!

The rocks are also all generated on the fly, and each one is different. I think they’re still a bit boring though, I’ll definitely be doing more work on those at some point.

The terrain doesn’t cast shadows yet. I am a bit worried about how that’s going to go from a performance point of view. I’m thinking of also implementing a “whole planet” shadow map to handle things like rings casting shadows, and eclipses.

Planning on releasing a new video of the project soon…


The first approach I tried was a projective grid, where a fixed grid is spread over the terrain in the screen space, as described somewhere in this thesis. This results in the vertices “sliding” over the terrain, causing quite a bit of wibbly-wobbliness in areas where the terrain is undersampled. The effect can be quite distracting when the camera is moving, and I spent a considerable amount of time earlier this year playing with my terrain generation to avoid the undersampling. In the end I had something that was acceptable for relatively distant terrain. The main problem with this method appeared when trying to get the grid to have enough resolution in the foreground. The issue is due to the possible height range of the terrain – on my test planet at about 10x the diameter of Earth, the terrain height range is something like 300km. So to project the grid, it has to assume that the nearest grid row could be 300km away at all times since the actual terrain height is not known until it is evaluated on the grid. And the field of view at 300km is quite large, meaning that the nearest grid row has to be spread very wide, leaving nowhere near enough resolution up close. The method could possibly be improved by precomputing the nearest distances or tweaking projection matrices but I concluded that the best use for this grid method was for ocean rendering, where wave motion somewhat negates the vertex sliding effect and the min/max height/tidal range is small (maybe 100m for this huge planet).

I went googling for a few days and this approach using quadtrees is what I decided to implement. Using quadtrees is pretty much standard for terrain rendering, but special considerations need to be made when they are used on a spherical surface. For example, since the simplest way to completely cover the sphere surface is to project a cube onto the sphere, nodes near the cube corners tend to become quite stretched and can be much smaller than nodes of the same level in the middle of a cube face. But that’s not so much of a problem if the nodes are split depending on their size.
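The corner stretching is easy to see with a small sketch. This is an illustration under assumptions, not the real implementation: the cube is the unit cube with face coordinates in [-1, 1], points are pushed onto the sphere by plain normalisation, node size is measured as a straight-line chord, and `should_split` with its `threshold` is a hypothetical size-over-distance rule.

```python
import math

def cube_to_sphere(x, y, z):
    """Project a point on the cube onto the unit sphere by normalising it."""
    length = math.sqrt(x * x + y * y + z * z)
    return (x / length, y / length, z / length)

def node_edge_length(u0, u1, v, radius):
    """Chord length on the sphere of a node edge running from (u0, v)
    to (u1, v) on the +Z cube face."""
    a = cube_to_sphere(u0, v, 1.0)
    b = cube_to_sphere(u1, v, 1.0)
    return radius * math.dist(a, b)

def should_split(edge_length, distance_to_camera, threshold=0.5):
    """Split a node when it looks large from the camera's point of view."""
    return edge_length / distance_to_camera > threshold

PLANET_RADIUS_M = 10 * 6.371e6  # assumption: 10x Earth's mean radius
step = 1.0 / 8.0                # node width in cube-face coordinates

centre = node_edge_length(0.0, step, 0.0, PLANET_RADIUS_M)
corner = node_edge_length(1.0 - step, 1.0, 1.0, PLANET_RADIUS_M)
print(f"centre node edge: {centre / 1000:.0f} km")
print(f"corner node edge: {corner / 1000:.0f} km")
```

Two nodes of the same quadtree level come out at quite different physical sizes, so splitting on the measured size rather than on the level keeps the detail roughly even across the face.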

Floating-point precision is always an issue when dealing with such a large range of scales. Since a consumer GPU can only efficiently work with 32-bit floats, terrain generation on the GPU is limited in precision. Using 3D noise functions with floats, for this 10x Earth-sized planet the resolution is limited to something like 100m in the horizontal plane, and more than 1m vertically. This isn’t really ideal for walking around on as everything becomes big noisy blocks (maybe for Minecraft…)
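As a sanity check on the scale of the problem, the sketch below computes the gap between adjacent 32-bit floats at a given magnitude. The planet radius used is an assumption (10x Earth's mean radius), and `float32_spacing` is a hypothetical helper. This only gives the best case for storing a single coordinate; the coarser figures quoted above are what I see after the noise functions have done their arithmetic, which this calculation does not try to model.

```python
import struct

def float32_spacing(value):
    """Gap between `value` (rounded to float32) and the next larger float32.
    Valid for positive, finite values."""
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    nearest = struct.unpack("<f", struct.pack("<I", bits))[0]
    following = struct.unpack("<f", struct.pack("<I", bits + 1))[0]
    return following - nearest

planet_radius_m = 10 * 6.371e6  # assumption: 10x Earth's mean radius

print(float32_spacing(planet_radius_m))  # metres between representable positions at the surface
print(float32_spacing(1.0))              # for comparison, the gap near 1.0
```

At that radius a planet-centred coordinate stored in metres cannot even resolve individual metres, before any noise evaluation is involved.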

So to smooth things out, I decided to use a bicubic patch system, where quadtree nodes smaller than a certain size are represented by a bicubic surface, which is computed from the surrounding 16 data points. I’ve done this in 2 passes on the GPU – first to compute a height grid for the patch and second to compute the bicubic patch grid from the height data. Then when a higher detail node is generated, it uses the bicubic surface to smoothly place its vertices. A second level of terrain detail is then added on top of the smooth bicubic surface, allowing for the smallest terrain details with no blockiness.
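For anyone unfamiliar with bicubic surfaces, here is a minimal CPU-side sketch of evaluating one from a 4x4 block of heights. It assumes Catmull-Rom weights, which is just one common choice of bicubic and not necessarily what the GPU passes use; `catmull_rom` and `bicubic_height` are hypothetical names.

```python
def catmull_rom(p0, p1, p2, p3, t):
    """Cubic interpolation between p1 and p2 for t in [0, 1]."""
    return 0.5 * (
        2.0 * p1
        + (p2 - p0) * t
        + (2.0 * p0 - 5.0 * p1 + 4.0 * p2 - p3) * t * t
        + (3.0 * p1 - p0 - 3.0 * p2 + p3) * t * t * t
    )

def bicubic_height(grid, u, v):
    """Interpolate a 4x4 grid of heights (grid[row][column]).
    (u, v) in [0, 1] spans the centre cell, between the samples
    grid[1][1] and grid[2][2]."""
    rows = [catmull_rom(*row, u) for row in grid]  # one cubic per row
    return catmull_rom(*rows, v)                   # then one across the rows

# 16 surrounding height samples for one cell (made-up values, in metres).
heights = [
    [10.0, 12.0, 13.0, 12.0],
    [11.0, 14.0, 16.0, 13.0],
    [12.0, 15.0, 18.0, 14.0],
    [11.0, 13.0, 15.0, 13.0],
]
print(bicubic_height(heights, 0.5, 0.5))
```

The surface passes exactly through the four centre samples and curves smoothly between them, which is what lets a higher detail node place its vertices without inheriting the blocky steps of the coarse data.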

Implementation-wise, I cache the computed height data in VRAM for each patch for re-use in subsequent frames. Each patch also has a normal map generated, which is the source of all the surface normals. The normal maps are higher in resolution than the terrain grids, adding extra visible details. This also means that more data points need to be computed and temporarily stored in VRAM. But in this case, the floating-point precision is not so much of an issue, and just 16 bits per data point are used. The normal map is in R8G8B8A8 format; the alpha channel is currently unused. All the patch normal maps are 128×128 and cached in a single normal map texture which is 4096×4096, allowing 1024 normal maps to be cached in a single 64MB texture. In the future, this will probably have to be modified to double up a pixel around the edge of each normal map, to stop adjacent data bleeding through at the edges (it produces a noticeable artifact).
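The atlas arithmetic can be written out as a short sketch. The sizes come straight from the numbers above; the row-major slot numbering, the function names and the `border` parameter (standing in for the possible future edge padding) are assumptions for illustration only.

```python
ATLAS_SIZE = 4096      # atlas width and height in pixels
TILE_SIZE = 128        # one patch normal map, in pixels
BYTES_PER_PIXEL = 4    # R8G8B8A8

TILES_PER_ROW = ATLAS_SIZE // TILE_SIZE                  # 32
TILE_COUNT = TILES_PER_ROW * TILES_PER_ROW               # 1024
ATLAS_BYTES = ATLAS_SIZE * ATLAS_SIZE * BYTES_PER_PIXEL  # 64 * 1024 * 1024

def tile_origin(slot):
    """Top-left pixel of a cache slot, assuming row-major slot numbering."""
    if not 0 <= slot < TILE_COUNT:
        raise IndexError(f"slot {slot} outside 0..{TILE_COUNT - 1}")
    row, col = divmod(slot, TILES_PER_ROW)
    return col * TILE_SIZE, row * TILE_SIZE

def tile_uv_rect(slot, border=0):
    """(u0, v0, u1, v1) of a slot in atlas UV space. A non-zero border
    insets the rectangle, leaving that many pixels around the edge free
    for duplicated edge data."""
    x, y = tile_origin(slot)
    return (
        (x + border) / ATLAS_SIZE,
        (y + border) / ATLAS_SIZE,
        (x + TILE_SIZE - border) / ATLAS_SIZE,
        (y + TILE_SIZE - border) / ATLAS_SIZE,
    )

print(TILE_COUNT, ATLAS_BYTES // (1024 * 1024))
print(tile_origin(33), tile_uv_rect(33, border=1))
```

Insetting the sampled rectangle by a pixel is one simple way to keep bilinear filtering from reaching into the neighbouring tile, at the cost of slightly fewer usable pixels per patch.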

On my system (i7, GTX780), drawing 500 patches (using instancing to reduce draw calls) doesn’t seem to be a problem. Frustum culling is already in use, but the performance could probably be vastly improved by adding occlusion culling (it’s on the to-do list!). Note that efficient culling means that the approximate patch centers and radii have to be precomputed on the CPU, so there need to be two versions of the terrain generation algorithm: one for CPU and one for GPU. This is quite annoying since they both need to produce exactly the same output. But it’s not a new problem in this program, since many procedurally generated, GPU-rendered things in the universe also need some sort of matching CPU representation. So I have spent a fair bit of time creating many different noise functions with matching CPU and GPU versions.
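The reason the patch centers and radii are needed on the CPU is that the culling test itself is just a bounding sphere against the frustum planes. A minimal sketch, assuming planes stored as `(nx, ny, nz, d)` with unit normals pointing into the frustum (the plane layout and the names are assumptions, and a simple box stands in for a real camera frustum):

```python
def sphere_in_frustum(center, radius, planes):
    """Return False only when the sphere lies completely outside at least
    one plane. Each plane is (nx, ny, nz, d) with a unit normal pointing
    inward, so a point p is on the inside when n.p + d >= 0."""
    cx, cy, cz = center
    for nx, ny, nz, d in planes:
        signed_distance = nx * cx + ny * cy + nz * cz + d
        if signed_distance < -radius:
            return False  # fully behind this plane: cull the patch
    return True           # inside or intersecting: keep it

# Stand-in "frustum": an axis-aligned box from -1 to +1 on every axis.
box_planes = [
    (1.0, 0.0, 0.0, 1.0), (-1.0, 0.0, 0.0, 1.0),
    (0.0, 1.0, 0.0, 1.0), (0.0, -1.0, 0.0, 1.0),
    (0.0, 0.0, 1.0, 1.0), (0.0, 0.0, -1.0, 1.0),
]

patches = [((0.0, 0.0, 0.0), 0.1), ((1.3, 0.0, 0.0), 0.5), ((3.0, 0.0, 0.0), 0.5)]
visible = [p for p in patches if sphere_in_frustum(p[0], p[1], box_planes)]
print(len(visible), "of", len(patches), "patches kept")
```

The test is conservative: a patch that only touches the frustum is kept, so an approximate radius is fine as long as it never underestimates the patch.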

Oh, and I’ve implemented some rudimentary cross-fading between detail levels, so there’s not much popping of LODs. There’s still the odd hole in the terrain appearing and the odd LOD pop along node edges, but I’m quite satisfied with how it looks for now. Will upload a new video soon.
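Cross-fading can be done in several ways, and the sketch below is only one generic option, not a description of what the renderer currently does. It assumes the fade is driven by camera distance and uses a smoothstep curve; `fade_start` and `fade_end` are hypothetical tuning distances.

```python
def smoothstep(edge0, edge1, x):
    """Smooth 0..1 ramp between edge0 and edge1, clamped outside."""
    t = min(max((x - edge0) / (edge1 - edge0), 0.0), 1.0)
    return t * t * (3.0 - 2.0 * t)

def lod_blend_weights(distance, fade_start, fade_end):
    """Return (fine_weight, coarse_weight) for a node at this distance.
    Closer than fade_start only the fine level shows, beyond fade_end
    only the coarse one, with a smooth mix in between."""
    coarse = smoothstep(fade_start, fade_end, distance)
    return 1.0 - coarse, coarse

for distance in (50.0, 150.0, 250.0):
    print(distance, lod_blend_weights(distance, fade_start=100.0, fade_end=200.0))
```

Because the two weights always sum to one, the blended surface never gets brighter or darker during the transition; it just moves gradually from one level to the other.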

So, next on the list is to generate surface colours and more surface diversity. Many ideas on how…

And I’ll finish this with a planet that I found while flying around the universe during testing, one that really deserves to have procedurally generated unicorns. Sadly I did not make a note of which galaxy or star system this was in, so I won’t be going back there any time soon…

]]>