After building your software, do you check in your generated binary files? How about the output from test runs? If your software runs on multiple platforms, or your test runs take hours or days to execute, you may want to consider storing the output, especially if binary reproducibility is critical.
Example. Consider shipping an application to a customer who, two years later, reports a defect. Can you reproduce their build today? Surely you have the exact versions of the source files. But are you using the exact build file? Probably. How about the original version of the compiler? Maybe, but probably not. Don’t forget that your compilers get upgraded too: changes to their optimization algorithms, or bug fixes, can change the binary that your application compiles to. Thus, compiling source from two years ago may produce an application that behaves the same at the user level, while at the byte level things may have changed dramatically, and that is the level where runtime defects (performance, memory) rear their ugly heads.
Myth #1: Committing generated files results in longer checkout times. No developer wants to check out source code and wait for, or be inundated with, megabytes of .o, .class, .jar, and .war files that are either never used or are going to be rebuilt anyway. The AccuRev Truth: Use include/exclude rules on streams and workspaces to control which streams have access to generated objects and who will receive them during checkout.
Myth #2: Committing binary files slows down your CM system. Traditional SCM systems combine metadata and content, resulting in slower performance over time as the number of files increases (think labeling). The AccuRev Truth: AccuRev stores metadata separately from file contents and uses indexes to look up and retrieve contents. For example, transactions are labeled, not files. Using a card catalog (index lookup) to find your books is always faster than walking the aisles (linear scan).
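To make the "transactions are labeled, not files" idea concrete, here is a minimal Python sketch. It is a toy model under my own assumptions, not AccuRev's actual storage format: the `history` layout, the label name, and the helper functions are all hypothetical. A label is a single pointer to a transaction number, and each file's version at that label is found with a binary search over its own sorted history.

```python
from bisect import bisect_right

# Toy model (an assumption, not AccuRev's real storage): for each path, a
# sorted list of transaction numbers and the content id stored by each one.
history = {
    "src/main.c": ([1, 4, 9], ["a1", "a2", "a3"]),
    "src/util.c": ([2, 7], ["b1", "b2"]),
    "build/app.bin": ([5, 10], ["c1", "c2"]),
}

# A label is one pointer to a transaction; nothing is written per file.
labels = {"release-1.0": 7}


def version_at(path, txn):
    """Return the content id of `path` as of transaction `txn`, or None."""
    txns, contents = history[path]
    # Binary search over the sorted transaction numbers (the index lookup).
    pos = bisect_right(txns, txn)
    return contents[pos - 1] if pos else None


def snapshot(label):
    """Resolve a label into the file versions visible at its transaction."""
    txn = labels[label]
    resolved = {path: version_at(path, txn) for path in history}
    return {path: cid for path, cid in resolved.items() if cid is not None}


print(snapshot("release-1.0"))
```

Creating the label costs the same no matter how many files exist, because only one transaction number is recorded; the per-file work happens only when a version is actually requested.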
Myth #3: Storing generated artifacts will bloat the repository. Back in the days of wild-west coding, there was little rhyme or reason for where files were saved in the source tree. The build system would simply compile the files it found, save the generated output right next to the source file, and as long as everything compiled and linked, it worked. But in today’s complex world of multi-layer software architectures, tiered deployments, mixed technologies, and sophisticated build tools, following a convention is almost a necessity (think Ruby on Rails, Maven, etc.). The AccuRev Truth: Organizing the top-level source tree and configuring your build tool can make it very easy to carve out source vs. binary vs. tests vs. scripts, etc. Using include/exclude rules, end users can decide at the stream or workspace level what parts of the file tree need to be visible.
The Pattern. In this pattern for versioning generated artifacts, I’ll show how streams can be used to store generated files only in the appropriate stage of development and to keep them out of developers’ way. Two options are presented; they can also be used in combination.
Option #1: Store and track generated artifacts as sub-configurations isolated from the mainline. From a baseline snapshot such as a test build or release candidate, create a new child stream to store the generated artifacts. Then create a second snapshot that represents both source code and generated artifacts. For a single “configuration” you now have two snapshots: one for source only and a second for source + binary. Furthermore, you can diff these two snapshots to know exactly how the binary configuration differs from the source configuration. You might also consider storing compiler files, debugging output, test output, the compilers themselves (!), etc.
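As a rough illustration of what diffing the two snapshots tells you, here is a small Python sketch. The file paths, version identifiers, and the `diff_snapshots` helper are all made up for the example; in practice you would use AccuRev's own diff tooling against the real snapshots.

```python
# Hypothetical snapshot contents: path -> version identifier.
source_snapshot = {
    "src/main.c": "v12",
    "src/util.c": "v7",
    "build.xml": "v3",
}

# The second snapshot is taken on the child stream after the generated
# artifacts have been added there, so it holds source + binary.
binary_snapshot = dict(source_snapshot)
binary_snapshot.update({
    "build/app.bin": "v1",
    "build/test-report.txt": "v1",
})


def diff_snapshots(base, other):
    """Report paths added, removed, or changed going from base to other."""
    added = sorted(p for p in other if p not in base)
    removed = sorted(p for p in base if p not in other)
    changed = sorted(p for p in base if p in other and base[p] != other[p])
    return {"added": added, "removed": removed, "changed": changed}


delta = diff_snapshots(source_snapshot, binary_snapshot)
print(delta)
```

If the build behaved itself, only the "added" list is populated. Anything under "changed" or "removed" means the build touched source files, which is worth knowing before you call the snapshot a release candidate.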
Option #2: Store and track generated artifacts directly in the mainline, but exclude them from downstream access using stream-level exclude rules. The top-most streams that need access to both source and binary will include most or all of the filesystem footprint in their configurations. The first stream that does not need access to generated objects is the likely candidate for an exclude rule on the folder(s) that contain those files. The exclude rule is inherited by all children and grandchildren.
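The inheritance behavior can be pictured with a short Python sketch. This is a simplified model of my own, not AccuRev's implementation: the `Stream` class, the stream names, and the prefix-matching rule are assumptions, and include rules are left out entirely.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Stream:
    """Hypothetical stand-in for a stream or workspace in the hierarchy."""
    name: str
    parent: Optional["Stream"] = None
    excludes: set = field(default_factory=set)  # folder paths excluded here

    def effective_excludes(self):
        # Rules set on any ancestor apply here too.
        inherited = self.parent.effective_excludes() if self.parent else set()
        return inherited | self.excludes

    def is_visible(self, path):
        # A folder rule hides the folder itself and everything beneath it.
        for rule in self.effective_excludes():
            if path == rule or path.startswith(rule + "/"):
                return False
        return True


# Upper streams see everything; "dev" is the first stream that does not
# need generated objects, so the single exclude rule is set there.
release = Stream("release")
integration = Stream("integration", parent=release)
dev = Stream("dev", parent=integration, excludes={"build"})
workspace = Stream("alice_workspace", parent=dev)

files = ["src/main.c", "build/app.bin", "build/reports/tests.txt"]
visible_in_integration = [f for f in files if integration.is_visible(f)]
visible_in_workspace = [f for f in files if workspace.is_visible(f)]
print(visible_in_integration)
print(visible_in_workspace)
```

One rule on the `build` folder in `dev` is enough: the workspace below it never sees the generated files, while the integration and release streams above it still do.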
When using exclude rules, it is easiest to set a single rule on a top-level ‘./build’ or ‘./generated’ folder rather than creating a rule for each sub-folder in a large source tree. Traditionally, make-based build systems generated the compiled files in-line with the source code. More recently, Ant-based build systems package all generated artifacts in a separate sub-tree off the root. Regardless of your build tool, it’s best to have all generated artifacts in their own tree: it makes them easier to exclude as well as safer to clean!
In practice I see both patterns in use, and which one is the better fit depends simply on the situation at hand. Option #1 is commonly used when generated artifacts are not to be included in the official release, for example transient or secondary artifacts such as test cases, debugging output, reports, etc. These files are not promoted up to the release stream. Option #2 is usually used when the generated artifacts are expected to be included in the official release snapshot. Thus, they are promoted up through the test/build/release streams. The build system for these types of ‘uber’ configurations may have multiple release targets creating different levels of release packages such as ‘minimal’, ‘app’, ‘app-with-tests’ and ‘full’. That is to say, the CM system may have all possible files, but you can choose what actually gets deployed. Ultimately, storing everything in the CM system is likely the right choice for audit and reproducibility.
/Happy Coding/

