Salesforce can be a really interesting platform to work on for many reasons. One reason in particular is that, because there are varying skillsets and levels of technicality in teams or businesses, there are often many different ways to tackle the same problem.

Let's take the following example: You've just gotten a requirement to update a flag on Accounts where at least 1 child Contact record has the same flag set. Bubbling this up to the Account level will make work easier for reps who prioritize fewer clicks over everything. You've got the Flow nailed! Whenever a Contact is updated so that this flag is true, you check whether the parent Account record already has the flag turned on. If the flag is false, you set it to true. Otherwise, you do nothing. Similarly, you check on the insertion of Contacts with the flag on and on deletions of flagged records, keeping everything in sync. All the Flow contexts you have built out work perfectly! Your deployment is on the horizon when you realize you haven't thought about backfilling existing Accounts in Production!

You reach out to your team, which consists of 4 members of varying levels of technicality:

  1. Roger, an Excel whiz, pitches the following: "Let me take a crack at it! I can get a report written to give me all Account Ids with a Contact record with that field populated. I can pop that in Excel and should be able to get a data load ready in no time!"
  2. Nina, an admin, pitches: "Well, I know you've already done a lot of hard work on these Flows, and they work great! Why don't you create a field that qualifies records for your Flow, or create dummy Contacts to be deleted? That way, you can reuse as much of what you've created!"
  3. Abby, a Developer/Admin, pitches: "There are a LOT of Accounts in Production! I'd recommend you write a Scheduled Flow that runs against all Accounts to put your logic in play. That way, we could even reuse it if we ever needed to!"
  4. Peter, a Developer, pitches: "I'd personally write a batch job in Apex for this, and call it from Anonymous Apex. That's the fastest way I'd be able to help here!"

Those are 4 options worth considering from your team! Let's walk through the pros and cons of each recommendation:

Nina's recommendation: While I do love reuse, editing production records carries many uncertainties. In addition, inserting dummy Contacts is not an ideal way to prompt automation to run. Inserting these Contacts could immediately introduce data skew, and creating millions of them could strain your ability to keep track of the new versus the old.

Peter's recommendation: While I believe this is a great option if you have many devs on hand, many organizations will not. This process will warrant a test class, and will also increase the level of technical debt in your Org. However, Apex is likely to be one of the most optimized ways to check millions of Contacts and Accounts.

Roger's recommendation: Roger is really pushing standard Salesforce to its limits here! Anyone who has worked on millions of records at one time within Salesforce or Excel knows how burdensome the process can really be. Roger will likely have to either take multiple exports, or highly manipulate the data before getting it in a format that allows him to work it. With these manual steps, it will be hard for someone to check his work, and they may be forced to simply 'spot check' the data load Roger plans to do. In many Orgs, this simply isn't good enough!

Abby's recommendation: This one is probably my personal favorite from the bunch! It combines many great elements from the previous sets of options.

  1. It embodies reuse, as the Flow can reuse the setup or elements of what you've already created.
  2. It uses async strategies, where Salesforce bulkifies Flow elements and allows for batched transactions, allowing millions of record updates to take place.
  3. It uses standard Salesforce, within the limits Salesforce recommends as best practice.
  4. Data doesn't have to leave the platform.
  5. It is low in technical debt, as Flows have a far more forgiving learning curve than Apex code, meaning more folks can read and maintain your solution.

So let's build it as described!

We start by running ALL Accounts through the process.

💡 Unlike some other Flow types, Scheduled Flows offer automatic bulkification. If I have 450 Accounts that qualify for my scheduled Flow, Salesforce will split them into three batches automatically, 200/200/50.
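To make the 200/200/50 split concrete, here is a minimal Python sketch of the arithmetic. It is only an illustration of the batching described above, not how Salesforce implements it; the `split_into_batches` helper and the made-up Account Ids are assumptions for the example.

```python
def split_into_batches(record_ids, batch_size=200):
    """Split a list of record Ids into consecutive batches of at most batch_size."""
    return [
        record_ids[i:i + batch_size]
        for i in range(0, len(record_ids), batch_size)
    ]


# Hypothetical Account Ids, purely for illustration.
account_ids = [f"001{n:012d}" for n in range(450)]

batches = split_into_batches(account_ids)
batch_sizes = [len(batch) for batch in batches]
print(batch_sizes)  # [200, 200, 50]
```

Every Account lands in exactly one batch, and only the last batch is smaller than 200.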

You might be wondering how the Flow can run without hitting a SOQL limit error, given what I've just said. If this Flow runs 200 records in a single batch, won't that mean 200 SOQL queries per batch, well over the 100 allowed per transaction? Good news for us: Salesforce also gives us auto-bulkification on certain elements in Flow. It's stated best in their Flow Bulkification article, but effectively the Flow will 'wait' at a data (pink) element and process all 200 records in the batch together. This counts as a single SOQL query against your limits!
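The difference between querying once per record and querying once per batch can be sketched with a toy model in Python. This is not Salesforce code: `FakeOrg`, its `flagged_account_ids` method, and the sample data are all hypothetical stand-ins that simply count how many "queries" each approach issues.

```python
class FakeOrg:
    """Toy stand-in for the database that counts how many queries are issued."""

    def __init__(self, contacts):
        self.contacts = contacts  # list of dicts: {"account_id": ..., "flag": ...}
        self.query_count = 0

    def flagged_account_ids(self, account_ids):
        """One 'query': which of the given Accounts have a flagged Contact?"""
        self.query_count += 1
        wanted = set(account_ids)
        return {
            c["account_id"]
            for c in self.contacts
            if c["flag"] and c["account_id"] in wanted
        }


# One batch of 200 hypothetical Accounts; every other one has a flagged Contact.
account_ids = list(range(200))
contacts = [{"account_id": a, "flag": a % 2 == 0} for a in account_ids]

# Naive approach: one query per Account in the batch.
naive = FakeOrg(contacts)
naive_result = set()
for account_id in account_ids:
    naive_result |= naive.flagged_account_ids([account_id])

# Bulkified approach: the whole batch shares a single query.
bulk = FakeOrg(contacts)
bulk_result = bulk.flagged_account_ids(account_ids)

print(naive.query_count, bulk.query_count)  # 200 1
```

Both approaches find the same Accounts; only the number of queries counted against the limit changes.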

We then attempt to get a single qualifying Contact for each Account.

Then, we check to see if we located a contact with our query. If yes, we check if the Account is already flagged. If the Account is not flagged, we make an update.
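The decision logic above can be sketched in a few lines of Python. This is a simplified model of the Flow's branches, not the Flow itself: the `backfill_accounts` function, the dictionary shapes, and the `flag` key are assumptions chosen for illustration.

```python
def backfill_accounts(accounts, contacts):
    """Flag each Account that has a flagged Contact and is not flagged yet.

    Returns the Ids of the Accounts that were actually updated.
    """
    # Parent Account Ids that have at least one flagged Contact.
    flagged_parents = {c["account_id"] for c in contacts if c["flag"]}

    updated = []
    for account in accounts:
        found_contact = account["id"] in flagged_parents
        # Only update when a qualifying Contact exists and the flag is still off.
        if found_contact and not account["flag"]:
            account["flag"] = True
            updated.append(account["id"])
    return updated


# Hypothetical sample data covering each branch of the decision.
accounts = [
    {"id": "A1", "flag": False},  # flagged Contact, not yet flagged: update
    {"id": "A2", "flag": True},   # already flagged: do nothing
    {"id": "A3", "flag": False},  # no flagged Contact: do nothing
]
contacts = [
    {"account_id": "A1", "flag": True},
    {"account_id": "A2", "flag": True},
    {"account_id": "A3", "flag": False},
]

updated_ids = backfill_accounts(accounts, contacts)
print(updated_ids)  # ['A1']
```

Because already-flagged Accounts are skipped, running the same logic a second time makes no further updates, which is what makes a backfill like this safe to re-run.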

And that's it! You've just written a process that is able to handle tons of volume without putting production data at risk!

Backfilling large sets of Salesforce records: A world of options.