Aim of the project:

Thanks to this project, Ogre users will benefit from faster rendering of particle systems. Particles hidden behind solid objects will not be painted.
Low-frequency particle systems can be rendered at a downsampled resolution without observable loss of quality (among other things thanks to blur at the centre of effects and full resolution in the vicinity of solid object edges). The saved computational power can then be spent, for example, on improving particle system granularity.

General Information:

The project covers an implementation of "GPU Gems 3. Chapter 23. High-Speed, Off-Screen Particles" from: .

It is being carried out during "Google Summer of Code 2012".

Student assigned to the project: Karol Badowski.
Mentor of the project: Assaf Raman.

The project repository is a fork of the OGRE Mercurial repository (revision 8d3b960cce3e), available at: . (new fork)

Project proposal on Google Summer of Code website:


  1. preparation before implementation: - done

  2. depth acquisition: - done

  3. downsampled rendering: - done

  4. render target merging & shaders: - done

  5. mixed resolution rendering: - done

Tasks for each milestone:

- requirements specification
- UML diagrams
- reading tutorials
- understanding documentation

- two pass depth acquisition
- multi render target depth acquisition
- z-buffer depth acquisition
- alpha channel depth acquisition
- automation of depth acquisition technique choice

- downsampling with variable scale
- point sampling depth renderer
- maximum of depth samples renderer

- alpha-blending
- binary depth test
- soft particles

- edge detection
- stenciling

(TODO): This section will be replaced; the program aim changed at the very beginning to implementing a SampleBrowser application and moving all Off-Screen Particles operations to a compositor script. The current model will be uploaded: a class inheriting the properties of the sample browser application class, methods to enable and disable the compositor, and a listener that swaps a material when the rendering tactic changes (between the standard render_scene and one that calculates depth; the aim is to preserve the original material properties of background objects). There will also be a detailed description of the compositor scripts, material scripts, vertex programs/shaders, pixel programs/shaders, and a diagram of the communication flow between off-screen targets. A description will be added of the self-invented method of rendering both additive and blended particles off-screen at the same time while preserving their original final output (a split into two render targets is needed to preserve 100% accuracy; the solution is scaled). Results of the efficiency tests will be copied from the forum.

Initial proposition of the class diagram:


Short description:

OffScreenParticles: (perhaps could be a CompositorChain)

The class controls the appropriate order of operations.

In the constructor, it transparently chooses the appropriate subclasses (for example, a subclass of DepthAcquisitioner with appropriate parameters).
It acquires input data, calls methods of the following classes, and propagates the output.

Even if the initial choice of some subclasses and their parameters is transparent, getters and setters are still provided.


The class could be abstract (or even an interface), because the methods will vary depending on the version of the RenderSystem used and the GPU series.

It would be convenient if the choice of subclass were automatic and transparent to the developer. The decision process will need to "ask" classes that provide information about the render system. It could also check whether the alpha channel is being used (semi-transparent solid objects).

-ZBuff_DepthAcquisitioner: Z-buffer
(provided in DirectX 10)
-MRT_DepthAcquisitioner: Multiple Render Target
(if older than DirectX 10 and we do not use MSAA, i.e. multisample antialiasing)
-Alpha_DepthAcquisitioner: depth saved in the alpha channel
(if transparency is not used and (we do not need MSAA or the GPU series is different than GeForce 6 or GeForce 7))
-TwoPass_DepthAcquisitioner: writing to a single render target in an extra pass
(if we do not use MRT; MSAA is still possible even in DirectX 9, and the alpha channel remains available)

The class should also provide a method to read the acquired depth data transparently, no matter which subclass was used.
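The selection conditions listed above can be sketched as a small decision function. This is a conceptual sketch in Python, not Ogre API code; the function name and capability flags are hypothetical, chosen only to mirror the conditions stated for each subclass:

```python
def choose_depth_acquisitioner(dx10_or_newer, use_msaa, uses_alpha_transparency,
                               gpu_is_geforce6_or_7, mrt_supported):
    """Pick a depth-acquisition strategy from render-system capabilities.

    Mirrors the per-subclass conditions listed above (hypothetical helper,
    not part of the Ogre API).
    """
    if dx10_or_newer:
        return "ZBuff_DepthAcquisitioner"    # read the Z-buffer directly
    if mrt_supported and not use_msaa:
        return "MRT_DepthAcquisitioner"      # write depth to a second render target
    if not uses_alpha_transparency and \
            (not use_msaa or not gpu_is_geforce6_or_7):
        return "Alpha_DepthAcquisitioner"    # stash depth in the alpha channel
    return "TwoPass_DepthAcquisitioner"      # extra pass; works even on D3D9 + MSAA
```

In the real class this decision would live in the OffScreenParticles constructor, queried from the render-system capability classes rather than passed in as booleans.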


This class is used when there is a need for downsampling the depth buffer.

It is not obligatory when the Two-Pass DepthAcquisitioner is used.
It provides fields, and setters allowing these fields to be changed: especially the downsampling scale.

It could be extended with a subclass providing downsampling that is not proportional to powers of 2, or not square. Rectangular blocks would require an additional field. Subclass: RectDepthDownsampler


Abstract class.

The important abstract method is silhouette().
Support will be provided for at least the two overrides suggested in the article, which use private methods:

-pointSamplingDepth(): creates halos
-maximumOfDepthSamples(): minor halo artifacts, reduced after linear blending

(Game developers can provide their own implementations of the silhouette() method, for example: median, average, dependent on the derivative, i.e. how rapidly depth changes, etc.)
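As a conceptual sketch (plain Python rather than shader code, with a hypothetical function name), the two suggested strategies differ only in how one depth value is chosen per block. Point sampling just picks one texel; taking the maximum keeps the downsampled depth conservative, so halo artifacts shrink after linear blending:

```python
def downsample_depth(depth, k, mode="max"):
    """Downsample a 2-D depth grid by an integer factor k.

    mode="point": take the top-left sample of each k*k block (fast, halo-prone).
    mode="max":   take the farthest sample of each block (halo artifacts are
                  minor and further reduced after linear blending).
    Conceptual sketch only; the real work happens in a pixel shader.
    """
    h, w = len(depth), len(depth[0])
    out = []
    for by in range(0, h, k):
        row = []
        for bx in range(0, w, k):
            block = [depth[y][x] for y in range(by, by + k)
                                 for x in range(bx, bx + k)]
            row.append(block[0] if mode == "point" else max(block))
        out.append(row)
    return out
```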


Abstract class.

The class merges data from the MRT, preparing the final output.

The most important abstract method is concatenate().

The class should support the usage of already implemented pixel shaders and their concatenations, as well as user-defined, application-dependent pixel shaders.

The method alphaBlending() uses the abstract method zShade().

Implementations of the zShade() overrides will be provided, using:

If a specified particle effect does not support transparency, we can use only zShade() without alphaBlending().
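The relationship between zShade() and alphaBlending() can be illustrated with a small sketch. This is illustrative Python arithmetic with a hypothetical signature, not the actual class API: a zShade-style depth test either passes or rejects a particle fragment outright, or (in a "soft particles" variant) fades its alpha near solid geometry:

```python
def z_shade(particle_alpha, particle_depth, solid_depth, fade_distance=None):
    """Depth-test a particle fragment against the (possibly downsampled)
    solid-scene depth.

    fade_distance=None gives a binary depth test; otherwise alpha is faded
    linearly over fade_distance for a "soft particles" look.
    Hypothetical helper, shown as plain arithmetic rather than shader code.
    """
    if fade_distance is None:                          # binary depth test
        return particle_alpha if particle_depth < solid_depth else 0.0
    fade = (solid_depth - particle_depth) / fade_distance
    return particle_alpha * min(max(fade, 0.0), 1.0)   # saturate(fade)
```

alphaBlending() would then combine the fragments that survive this test into the off-screen colour target; for opaque effects the test alone suffices.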


This class is not mandatory. The game developer can define in the constructor of the OffScreenParticles class whether to use it or not.

It provides an edge detection method...
-detectEdges(): provides a Laplace or Sobel edge-detection matrix filter

...and a second pass of depth acquisition at full resolution
-stencilEdgesParticularisation(): a fraction of pixels repeat the process without downsampling

(acquisition depends on the DepthAcquisitioner subclass; second pass of depth acquisition at full resolution)
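The edge-detection step can be sketched as a Sobel gradient threshold over the depth texture, producing a mask of pixels that should be re-rendered at full resolution. This is a simplified Python sketch (interior pixels only, no mirrored passes); the actual filter runs in a pixel shader and, per the notes below, also merges mirrored X/Y passes and scales the sample distance with the downsample level:

```python
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def sobel_edges(depth, threshold):
    """Mark pixels whose depth gradient exceeds threshold (1 = redo at full
    resolution). Simplified sketch of the detectEdges() idea."""
    h, w = len(depth), len(depth[0])
    mask = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(SOBEL_X[j][i] * depth[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(SOBEL_Y[j][i] * depth[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            if abs(gx) + abs(gy) > threshold:
                mask[y][x] = 1
    return mask
```

The stenciling step would then restrict the full-resolution particle pass to the masked pixels (via the stencil buffer, or via discard in the shader).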

Used strategies:

MAXIMUM-OF-DEPTH DOWNSAMPLING (displayed debug difference between max and min value)
EDGE DETECTION (Sobel filter: X, Y, X-mirrored, Y-mirrored)
COPING WITH DirectX / OpenGL TRANSFORMATIONS
HANDLING HALO EFFECTS (linear filter + DEPTH texture / MAX-OF-DEPTH texture)
DETECTING INTERSECTIONS (depth test, Sobel filter)
BLENDING OFF-SCREEN TARGET PARTICLES (only additive / only blended)
MIXED TECHNIQUE (own idea for 2-target off-screen rendering, coping with mixed alpha + additive at the same time)


"Off-Screen Particles" project timeline:

Before May 21 - Reading documentation of the OGRE engine, especially particle systems

Before Jun 4 - communication with the project supervisor about possible difficulties or questions in understanding the already implemented classes

Before Jun 16 - continuation of Ogre tutorials; exam session at one university

Jun 17 - analysing OGRE example applications in search of useful solutions

Jun 18-19 - creating a basic scene for presentation and future tests

Jun 20 - defining an animation for the scene to distribute particles; creating a fumes particle system

Jun 21-25 - reading tutorials and articles on depth testing + OGRE tutorials; exam session at the second university; tutorials about Cg, HLSL, and OGRE tutorials about shaders

Jun 26 - exam in DNA sequencing; starting attempts to write depth acquisition. FINISHED EXAMS (summer always begins a month later in my country).

Jun 27 - trying to implement depth acquisition; examining the code of the "soft particles" effect implementation by ahmedismaiel

Jun 28-29 - still trying to write depth acquisition; examining all forum topics about it.

Jun 30 - hosting family

July 1 - implementing passing of the already achieved depth texture between compositors in the compositor chain

July 1-4 - eliminating bugs in the code (difference in displaying the depth of semi-transparent objects between Direct3D and OpenGL); downsampling is based on the SSAO example and does not inherit the texture; analysing the possibility of handling semi-transparent objects together with particles.

July 5-6 - using own SolidMaterialsBuffer; entities are given the material manually. Achieved concatenation of compositors into a chain. Decided that there is no other solution to preserve the colour material than using listeners at the application level. Passing a mocked texture created from depth in the R, G, B, A channels instead. Using post-filters in the compositor chain; reading script tutorials; figuring out how to use compositors selectively (only for solid materials). Modification of particle materials.

July 7 - figured out the difference in material execution between DirectX 9 and OpenGL. Passing depth from the vertex to the pixel shader instead of calculating it in the pixel shader. The texture is then passed to the next compositor, which displays it. Using PF_FLOAT32_R for passing depth.

July 8 - writing downsampling in the compositor; editing views from the prepared movies to show a summary; universal downsampling shader for rectangular downsampling; debugging of the downsampler

July 9 - writing an application-level proof-of-concept basic version of off-screen particles, using the backbuffer for depth testing; creating and displaying viewports with views of solid objects only and particles only; writing concatenation. Handling recognition of render targets (setting them to invisible before rendering and visible after, to avoid infinite mirrors); screen target for downsampling; handling multiple mini-screens; rendering separate groups of objects with masks connected to viewports

July 10 - turned off antialiasing. Separated display of solid objects and particles in viewports; comparison of the depth of the downsampled texture (done in the backbuffer) and the particles; controlling the auto-update level of render targets. Handling the render target in post/preRenderTargetListener; mocking the setting of a transparent colour.

July 11 - looking for a solution to handle colour transparency differently. Started implementing a compositor for joining textures

July 12 - calculations for the off-screen alpha blending algorithm (found a mistake in the NVIDIA article); continuation of writing the concatenation compositor

July 11-14 - reading tutorials, figuring out how to write it; solved dynamic swap of texture

July 14 - finished the texture-merging compositor for the proof of concept; halo effects finally visible (downsampled texture put on the solid-object background); taking part in a charity event

July 15-17 - coming back to the compositor version from 8 July; looking for the reason for the difference between DirectX 9 and OpenGL display of depth (in one, particles/objects without the SolidBuffer material are omitted; in the other, they generate a white-noise-like texture); learning from Ogre wiki examples

July 18-20 - continued transformation to a compositor chain; reading examples of post-filters; managed to implement downsampling and max-of-depth downsampling, and to display the downsampled depth texture

July 21 - Saturday, spent with friends

July 22-23 - continued transformation to a compositor chain; writing a universal version of max-of-depth downsampling (controlling the referenced texture address based on the downsample level), with the pixel size passed as an automatic named parameter for transformations

July 24 - trying again to figure out how to preserve the original material while rendering depth; reading forums and tutorials

July 25-26 - the DownsampledDepth rendering material no longer has to define the material by calling another compositor from the chain; it was transferred to the SolidMaterialsBuffer compositor, and everything works faster; the input render target is given from the compositor. Finding and removing mistakes in the configuration. Slowly starting to transfer ready compositors into the first one to merge them.

July 26-27 - finally figured out how the first_render_queue and last_render_queue attributes work in selective rendering; rebuilt the earlier method of dividing (masks were used before). Continued trying to figure out how to render one object with two materials in a compositor (an original-colours-generating pass and a depth-texture-generating one) and how to merge them.

July 24-28 - trying to ask on the forum, and to find in forums, tutorials, and other applications, how to create a new compositor pass inheriting all previous operations of the original material (to preserve the number of textures, passes, depth maps, normal maps) and just extending it without replacing it (to generate the scene depth map in the same pass). Did not find the answer yet. It seems to be a complicated issue (it is not achieved in the SSAO example either, on which the construction of the compositor chain is based).

till July 28 - uploading archives of changes in scripts and the application to the repository (forgot to upload the older backed-up archives earlier). Figured out that the program stopped working in DirectX 9 (it has worked only in OpenGL for some time); the problem is not strictly defined in the *.log file. The colour texture is still mocked, not the original one, but generated from the depth texture. The particles' red channel is still displayed (in their case the material is different than SolidMaterialsBuffer, so the red channel of their colour is taken instead of the "red channel only" depth texture). More details of the current state and changes here:
trying to merge the compositor to concatenate textures in a way that would use the difference in RenderQueueID
and here:

July 29 - continued transferring shaders to the recently created "OSP_AllInOne" compositor; in every transfer trying to expand the shaders line by line from the working version to the previously created one, to see where the difference (the mistake between the OpenGL and DirectX behaviour of the same script) lies.

July 29-31 - still trying to figure out how to universally inherit the original tactic of rendering a texture and change only a little fragment of the shader, or alternatively whether to transfer some of the original definitions of passes/textures from the original material to the new one, to preserve at least the colour and alpha value. Asked on the forum about that; trying to find the solution in examples simpler than DeferredShading, in forums, and in application code. It is a bottleneck problem. Reading the script tutorial every day to find a solution.

August 1-4 - reading forums in search of a solution to swap materials (especially the DOF (Depth of Field) forum, in search of a solution simpler than the one used in Deferred Shading); asked on the forum again; searched the documentation and script tutorials; it seems nobody knows how to solve it. Without this step, subsequent passes can't be displayed the original way (an own vertex program and fragment program have to be used, so the original material is replaced manually with a material that uses these shaders).

August 4 - still no advice found on that problem. The problem was precisely described on the forum. Splitting the task: in parallel, continuing to search for the answer and completing the missing compositor passes to finish them before the deadline (they will have to work on mocked data, not on the original colour texture).

August 5-7 - work without checking the project thread. Due to problematic family-related circumstances I had to work off-line most of the time between 5-16 August (still reporting between 8-12 August):

Loaded forums and tutorials on the details of every script attribute; hurrying to complete the missing compositor passes, shaders, and materials. In parallel, looking for a solution to the problem of inheriting the original material. Managed to create and improve a Sobel filter: reading and analysing the scripts and shader code of existing filters, I managed to write a SobelFilter based on them, debug it, and make it work in both DirectX and OpenGL. For now, tests run on mocked data (the depth texture or downsampled textures), on the RGB channels. Extended the Sobel filter with ideas from articles about the Sobel and Laplacian filters (decided on additively merging the standard and mirrored versions in both the X and Y passes of the filter) to detect edges from all sides. Optimised the shader code to fetch texture samples fewer times. Reading wiki articles to find solutions for the missing shaders. Reduced the calculations for additive mixing of the Sobel filters. Sample fetches for the mirrored versions were replaced with reversing the output and one-sided clamping. Rebuilt the shader after compilation errors. Modified the sample distance (based on how much the texture is downsampled), operating on matrices of position deltas relative to the pixel size (that way there is no risk of missing edges on small textures).

August 8 - entered the forum to report the completed compositor. Reported again not finding a solution for modifying the original material from the script level. No answer found. The only perspective seems to be an own parser of materials; the only example found is complicated (the big material-attribute parser and generator from Deferred Shading).

August 7,8-12, now - (earlier and later too, but condensed in this period) I was looking for a way to use the stencil in the script language to improve halo-effect reduction: not as a mask for separating objects, but for creating a map of Sobel filter values that differ per pixel of the same rendered quad. This also seems to be a problem: most forum threads end without an answer, or with changing the idea and using a viewport mask instead of stenciling. The only forum thread on the topic treats it at the application level; most threads end with changing the solution (people wanted the stencil for selective rendering there). The idea is to test whether the "discard" operation could be used as in the article.

August 9 - describing the detailed problem with stenciling on the forum, because there seems to be a misunderstanding. Got to know that stenciling is either not an obligatory part of the article or .

August 10-11 - research on how to cope with stenciling at the compositor level.

Still trying to solve the material-attribute inheriting problem: how to do it without constantly substituting the original material with a buffer (preserving both the colour from the original rendering and adding the depth test to the same instance of the object, in one compositor). No answer on how to do that. In parallel, still trying to solve stenciling to avoid idling: analysing post-filter scripts that use multiple texture inputs, and learning how to identify them while writing a StencilFilter.

August 12 - the deadline is approaching with big steps (report on the forum). Changed priority to using a solution outside of the compositor script to alter the original materials in the passes of one compositor: use listeners, so that both a simple colour texture and a depth texture can be produced. Still looking for an easier example than Deferred Shading (it contains parts that are difficult to understand). Hurrying to find a clearer solution, or to understand this one and use it in my example.

TODO: describe the changes to the DeferredShading compositor classes during their conversion for the purposes of the OffScreenParticles project, from 13-17 August

August 13-14 - a certain problem intensified; I could not work for 70% of the day, which caused a big unexpected slowdown at a crucial time, but I worked and researched the rest of the time.
Trying again to understand how the attributes first_render_queue and last_render_queue work, on any possibly simpler example.

August 15-17 (morning) - work without checking the project thread. Due to problematic family-related circumstances I had to work off-line most of the time between 5-16 August (still reporting between 8-12 August):
Hurried to solve the bottleneck issue of rendering depth and the original colour at the same time (so that the prepared post-effect shaders could also be tested in a potentially altered situation). Cleaning the code structure; completing and debugging the missing shaders; checking why, under DX9, data passed between the vertex shader and the pixel shader for the depth test is lost; analysing and understanding the Deferred Shading code, while also trying to find a clearer solution. The main point is that I could not define two materials for solid objects in the same rendering without a listener that would swap them: a) they were different for each object, b) the depth test was the same for all, but could not be set in a render_scene pass without giving the objects this material permanently. The only material swap I could use without listeners was in render_quad, where alternative materials can be defined. The only tactic I could see was using a scheme listener; I did not find any better solution.

August 17 (afternoon) - able to report again. Reported the progress level and updated the repository. Reported the idea to use swapping of materials instead of generation (it solves the off-screen particle problem, and a generator can be a later, nicer extension). The swapped material will use the needed solutions and be clearer to read (like the SSAO example compared with Deferred Shading). However, the material will still be prepared and swapped in a listener. Still putting the previously prepared shaders together.
Finally found a much better example (GlowMaterialListener.h from the Cookbook Glow tutorial), a material_scheme listener that helped much more than Deferred Shading or Depth of Field; it was simpler. It needed only loading the depth material and applying it; a new scheme was registered for the object. Finally, I'm able to swap materials and move forward (put the prepared shaders inside).

August 18 - cleaning the code into class definitions in the header; completed the listener that swaps materials.
There was a problem that not every material could be swapped: a material without own shaders was applied, but a material with own shaders did not work. Lost some time recreating the shaders line by line to check the reason. It turned out that some other material with own shaders finally worked, even if I applied my shaders instead of the original ones.
Hurrying to put the prepared materials with shaders together (one shader also needed to be copied and very slightly modified: the depth test adds a comparison with the previous depth map). The issue of material swapping was finally solved after all these problems.
Putting the prepared components together was still possible until pencils-down:

August 19 - found the real reason for the previous problem with swapping materials and solved it. Used the calculated parameters (ready since July) to render alpha-blended particles over the original colour target. Found a command that reads the backbuffer in a post-shader modification of the final target pixel, and used it. It turned out that my parameters were calculated correctly. Also applied the designed version of my own mixed-technique rendering idea (an upgrade over the minimal version; forgot for a moment that the transfer to SampleBrowser was still needed, and could have saved some time by finishing that).
Converted the application to a SampleBrowser version with the needed UI controls and wired actions to them (swap of compositor and downsample level). Moved all methods from the original application to a header file (the preferred practice for the Sample Browser), although the day before I had spent time splitting definitions from the CPP into the header. Made the file structure proper for CMake compilation, modified the CMake files, and uploaded to the new repository. Spent some time with family.

August 20 - corrected the functionality of the UI controls (after a change of properties the compositor was not updated; after some time I found a solution to reload them). Spent some time with family.
Made the depth test work in OpenGL. There was a mistake only in DX9; I lost time rebuilding it all day again and again. It turned out later that there was no mistake in the shader, but in the profile chooser (the 2_x parameter instead of 2_0 in first place). Finished the plan for the minimal solution 4 hours after the pencils-down deadline (12:00-13:00 Pacific Time). Added the prepared element to the compositor. Trip to a copy point for printing, signing the Contributor License, and sending it.
After coming back, put together and described the minimal-expectations version, ready 4 hours after the pencils-down hour.

August 21-23 - swapped the rendering profiles. Presenting the effects of the compositor and how the prepared passes work together with it. Presenting ideas for future extensions. Doing efficiency tests and some upgrades. Presented that the Sobel filter correctly detects alpha edges, for an upgrade of halo reduction (currently halos were handled by the already prepared max-of-depth test and linear filtering). Rested after the work, applying some upgrades several times a day.
Many more details starting from this page:

August 21 - depth testing is uploaded to the new repository.

August 22 - the two-pass off-screen method inverted and described. Corrected the profiles to execute the depth test in DX9. Many more details on the forum.

August 23 - adjusted the Sample Browser for the efficiency test. The Y axis is calculated properly in both Direct3D and OpenGL (despite their difference in matrix operations). Generating a new particle material: colour and transparency depend on . The temporarily turned-off swap of compositors was turned on again.

August 24 - reduced every rendering to off-screen targets that could instead be rendered straight to the screen; the efficiency test results improved a lot. Separated the two materials that use the prepared shaders for off-screen rendering (own solutions for the "mixed technique"). Reduced the compositor for the new efficiency tests. Turned off the simple rendering mode for the duration of the efficiency tests (efficiency tests for the depth-test shader for particles). Tidied up the alternative compositors, reduced clear passes, and disabled depth downsampling (halo reduction will now be handled by the mixed-resolution upgrade). Switched the application thumbnail to a more representative one.

August 25 - a scalable (optimised pass, downsampled at every source) solution for the first pass of mixed resolution: comparing with a mask texture, discarding pixels with value == 1, and alpha-blend rendering the others. The Sobel mask texture is now based only on the alpha channel (it finds only the edges where mixed resolution would be needed for low-frequency effects). The stenciling script configuration is ready (it turns out to be similar to the one in StencilGlow); the check parameter is set on, but stenciling is not executed. Writing an appeal to reconsider the evaluation decision.

August 26 - copied the light source from the non-SampleBrowser version of the application (changing the name solved the problem; it turns out that in SampleBrowser this name was already registered/not allowed). Executing the (full resolution) second pass of mixed resolution in the current solution with a simple check of the Sobel texture, using discard: nearly the same shader as for the downsampled pass, but based on depth testing (it renders only particles in render_scene with a matrix transformation of the screen-view parameters); for fullscreen, a stencil buffer is a much faster solution. Writing an appeal to the decision.

August 27 - cleaned unused passes created during the new tests. Quickly reversed the Y axis in mixed resolution (change of profile for OpenGL); trying to transform the StencilGlow example to drop the compositor-script solution and move to a listener (different than in the mentioned example: executed only for 2 passes). Still figuring out how to use the stencil{} script section properly for my purpose. Writing an appeal to the decision.

August 28 - selecting the most representative archive screens for the wiki article. Recreating the timeline based on the forum and the code changes from some archives. Writing an appeal to the decision. Trying to transform the StencilGlow example to accelerate the stenciling of the current "discard" solution based on comparing values with a texture. It is worth it, because the solution is scalable.

August 29 - improving my own physics engine for a presentation at the WGK conference. In the evening I'll continue recreating the timeline based on the rest of the archives, comparing the changes in the code and the history of tutorials read. Then I'll update the class model (the project changed at the very beginning to writing a Sample Browser application, then to using a compositor script only; handling alterations of render_scene could not avoid using a material scheme listener).

August 30 - the calculation of off-screen alpha was changed to the calculation of inverted alpha. This is the debug view: PICTURE URL The final rendering to the screen still needs correction.
Here is a set of debug pictures:
I finally found the solution: the multiplier of the inverted alpha channels, stored in the backbuffer, was blending the background colour; that is still right, but the off-screen colour output is added without blending (just added). My solution (different than the one proposed by NVIDIA: without inversions, counting direct alpha) also worked after applying the same change. Here is the difference between the last blendings: former method, new method