feat!: use actor.json as source of truth to support single actor and broader monorepo structures#96
feat!: use actor.json as source of truth to support single actor and broader monorepo structures#96ruocco-l wants to merge 18 commits into
Conversation
…eric builder token
metalwarrior665
left a comment
There was a problem hiding this comment.
I think this is good and obviously needed change. It won't be that big of a hassle for other teams since adding new Actors is quite rare. We need to polish this though.
| '.actor/', | ||
| // In root .actor/ mode, .actor/ changes must trigger builds | ||
| ...(isSingleActorRepo ? [] : ['.actor/']), |
There was a problem hiding this comment.
We can remove this completely, there is no reason we should commit changes to top level .actor in multiactor
|
|
||
| let cachedBuilderUsername: string | undefined; | ||
|
|
||
| export const resolveBuilderTokenUsername = async (): Promise<string> => { |
There was a problem hiding this comment.
What is the use-case to need this? Just so you don't have to think about the username?
| }; | ||
|
|
||
| const readActorName = async (actorJsonPath: string): Promise<string> => { | ||
| const actorJson: { name?: string } = JSON.parse(await fs.readFile(actorJsonPath, 'utf-8')); |
There was a problem hiding this comment.
I would rather introduce a new JSON file for our own custom needs than stitch non-spec fields to actor.json.
There was a problem hiding this comment.
that would also enable us to cover miniactors, monorepos or single actor stuff without doing the
user-name_actor-name convention.
It could all be config stuff
| actorName: fullName, | ||
| folder: actorDir, | ||
| isStandalone: folderType === 'standalone-actors', | ||
| tokenEnvVar, |
There was a problem hiding this comment.
I'm not a big fan of passing the token around because it is more likely to leak, we should be able to just resolve it in place like we did no?
There was a problem hiding this comment.
This doesn't pass the token, just the name of the env variable where it's stored
|
We should also sync with @gullmar about other repos. The monorepo where each Actor has its own src is also probably not ideally supported. |
| const resolveOwner = async (folderName: string): Promise<string> => { | ||
| const ownerMatch = folderName.match(/^(.+)_[^_]+$/); | ||
| if (ownerMatch) return ownerMatch[1]; | ||
| return resolveBuilderTokenUsername(); | ||
| }; |
There was a problem hiding this comment.
So the owner still has to be defined in the folder name 😅 That kind of defeats the purpose of this entire effort 🫠
I would either want the conventionally named folder names only (owner + name in folder name), or to have both the owner and actor name in actor.json, but not both the conventionally named folder names for owners and actor.json for names at once 🙏
There was a problem hiding this comment.
No. IF the owner is there we automatically assign and look for the appropriate token. If not we fallback to whatever is the owner of the BUILDER token. This is still useful if you have a default account where there are all the miniactors (for which is ok to fallback the BUILDER token) and have one miniactor that is build under a different account (for whatever reason), in which case you will specify it with the folder name owner_whatever-actor-name-since-it's-not-used.
But I agree that just adding the owner to the actor.json is much cleaner
There was a problem hiding this comment.
I agree that for most repos, the single BUILDER token would work fine, since we usually have everything under one account. So, for the most part, this would only have to cover edge cases. Even then it seems like there should be a better way to define the edge case then by folder name - be it actor.json or a global config file 🤔
| }; | ||
|
|
||
| const readActorName = async (actorJsonPath: string): Promise<string> => { | ||
| const actorJson: { name?: string } = JSON.parse(await fs.readFile(actorJsonPath, 'utf-8')); |
|
Alright alright, let's do the config |
|
One more related long-term thing. When we designed this library, the core idea was "convention over configuration". We wanted everything to be at hardcoded places since that is always better than having to config and it worked well for that use-case. Now if we want many (or arbitrary) different setups, we cannot keep patching it like this because we would have to keep adding On the other hand, the library has to remain somewhat opinionated, like what file changes trigger build etc, otherwise it has basically no value, it would just be some configs + few API calls. I think for now, this PR is ok because top-level .actor is the template thing, but there probably already are things in the "changed Actors" that break. But for anything more, I would wait for th full rewrite. That should also make it truly open-source usable. |
metalwarrior665
left a comment
There was a problem hiding this comment.
I gave it another thought and I think it wouldn't be too hard to go all the way for really configurable config. Otherwise, we would have to potentially do another breaking change.
We should support (let's look of we can find more) at least 3 reference repos:
- typical Store monorepo
- template-like repo (e.g. WCC)
- monorepo with packages and actors importing them - e-commerce
I'm thinking it could work like this:
- Each Actor config points to the folder where .actor is (like now) or directly to the .actor folder.
- Then from actor.json, we read
dockerContextDirand that contains all folders/files that can influence the changes for that Actor. We can allow config to haveoverrideActorContextif they want to e.g. narrow it down. - Then instead of the current logic that if "code changes" we add mark all Actors as changed, we just do the changed files logic for each mini Actor independently, that cleans up the logic as well.
This way, we handle current Store monorepo (Actor folder + top-level context), template (just top-level), e-commerce (actor folder with its code, packages, shared)
What do you think?
| { | ||
| "actors": [ | ||
| { | ||
| "folder": "actors/web-scraper", |
There was a problem hiding this comment.
I would add actorName too rather than deriving it from the folder
There was a problem hiding this comment.
I thought about it, but I think can cause some confusion when you (usually when you publish the actor) play a little with the naming for SEO reasons (or similar). I think whatever is in the actor.json should be respected as the truth and not be overridden by some obscure logic from the testing package.
There was a problem hiding this comment.
The testing lib will not set or change the name, it just check that it exists. The name.in actor.json is not a single source of truth so I would not use it, better to have all here
| }); | ||
| }, | ||
| ) | ||
| .command( |
There was a problem hiding this comment.
I would remove this, Claude will one-shot this and you will want to manually check it anyway.
There was a problem hiding this comment.
I think this sort of CLI initialization utility perfectly fits into a library like here 🤔
However unless we make this a completely seamless end to end migration command, then I'm also against it. Right now as far as I can tell, it doesn't really migrate the ENV vars. I will open up a separate comment for that.
There was a problem hiding this comment.
| if (!Array.isArray(config.actors)) { | ||
| throw new Error(`Config file "${CONFIG_FILE_NAME}" must have an "actors" array at the top level.`); | ||
| } |
There was a problem hiding this comment.
I would recommend using Zod for parsing the JSON config 👀
| }); | ||
| }, | ||
| ) | ||
| .command( |
There was a problem hiding this comment.
I think this sort of CLI initialization utility perfectly fits into a library like here 🤔
However unless we make this a completely seamless end to end migration command, then I'm also against it. Right now as far as I can tell, it doesn't really migrate the ENV vars. I will open up a separate comment for that.
| owner: options.defaultOwner ?? '<OWNER>', | ||
| tokenEnvVar: options.defaultTokenVar ?? '<TOKEN_ENV_VAR>', |
There was a problem hiding this comment.
We could make this be extracted from the current folder structure we use everywhere, so we would have a seamless end to end migration command 🤔
E.g. take the owner username and translate it to the current env var name: APIFY_TOKEN_<OWNER>
| }); | ||
| }, | ||
| ) | ||
| .command( |
There was a problem hiding this comment.
Closes #84
The CI tooling previously derived actor names and ownership entirely from folder naming conventions (
owner_actor-name), which was fragile — usernames with multiple underscores caused mismatches, and it was impossible to have a folder name that didn't encode the owner. This PR makes three interconnected changes to fix that:actor.jsonis now the source of truth for actor names. Each actor folder must contain.actor/actor.jsonwith a"name"field. The old convention of reverse-engineering the actor name from the folder name (splitting on the last underscore) is gone.maybeParseActorFoldernow returns the folder path directly (e.g.actors/shopify) instead of attempting to reconstructowner/actor-namefrom the folder. This eliminates the class of bugs where folder names didn't round-trip cleanly to actor names.getRepoActorsnow resolves which env var holds the correct token for each actor (APIFY_TOKEN_<OWNER>orBUILDER_APIFY_TOKENas fallback) and validates the actor exists on the platform. The resolvedtokenEnvVaris carried inActorConfigand used byApifyBuilder.fromActorConfig— replacing the oldfromActorNamewhich re-derived the token independently. This catches misconfigured folder names or missing actors before any build/delete work starts, with a clear error message.Additionally:
actors/orstandalone-actors/directories can now be built if they have a root.actor/actor.json. The owner is resolved fromBUILDER_APIFY_TOKEN. In this mode,.actor/changes correctly trigger builds (they're ignored in multi-actor repos where.actor/is just forapify pushCLI).BUILDER_APIFY_TOKENfallback. Both token resolution andApifyBuilderfall back toBUILDER_APIFY_TOKENwhen the per-owner env var is missing, instead of failing immediately.Breaking change
Actor folders must now contain
.actor/actor.jsonwith a"name"field. Teams already using this tooling will need to add this file to every actor folder. The folder name is still used to resolve the owner (everything before the last underscore), but the actor name itself comes exclusively fromactor.json.Note oncirc_leactorscirc_lematching now depends onactor.json's"name"field matching what comes afterapify-managed---in thecirc_leaccount. As long as the"name"inactor.jsonis correct (i.e. matches what the old folder convention would have produced), the fragility is the same as before — no better, no worse. If a repo's builds were already working correctly, adding the right"name"toactor.jsonwill not change behavior.circ_le logic has been removed from main.
Tests
diff-changestests for ownerless folder matching, single-actor.actor/build triggers, and multi-actor.actor/ignore behavior.utilstest suite (10 cases) covering token resolution priority,BUILDER_APIFY_TOKENfallback, validation failure on missing/unreachable actors, missingactor.jsonname field, single-actor repo mode, ownerless folders, standalone actors, and special characters in owner names (e.g.luigi.ruocco->APIFY_TOKEN_LUIGI_RUOCCO).Update: mandatory config file replaces all discovery logic
The discovery/guessing logic from the earlier commits has been fully replaced with a mandatory root-level config file (
.test-tools-actors-config.json). Combined with each actor's.actor/actor.json, these are now the only sources of truth.Config file shape:
{ "actors": [ { "folder": "actors/web-scraper", "owner": "myteam", "tokenEnvVar": "APIFY_TOKEN_MYTEAM", "isStandalone": false } ] }What this replaces:
resolveBuilderTokenUsername,resolveOwner,resolveTokenEnvVar,validateActorExists,readActorName— all deleted.apify-clientimport removed fromutils.ts(onlybuild.tsneeds it now).BUILDER_APIFY_TOKENfallback — removed.tokenEnvVarin the config is final; if the env var is not set at build time, it fails hard.isSingleActorRepobranching in diff detection — removed..actor/changes at root level now always trigger builds for all non-standalone actors.New
init-configcommand:Scans the repo via
git ls-filesfor.actor/actor.jsonfiles, generates the config with placeholders. Warns if a root-level.actor/actor.jsoncoexists with subfolder actors.Updated tests:
.actor/changes now trigger builds in multi-actor repos (updated indiff-changes.test.tsandshould-built-and-test.test.ts).utils.test.ts(18 tests) coveringreadConfigFileandgenerateConfigFile.Update 2: dockerContextDir-based change detection
Config file shape:
{ "actors": [ { "folder": "actors/web-scraper", "actorName": "myteam/web-scraper", "tokenEnvVar": "APIFY_TOKEN_MYTEAM", "overrideActorContext": ["actors/web-scraper", "packages/shared"] } ] }What changed:
ownerreplaced byactorName— fullowner/nameformat (e.g."apify/web-scraper"). The actor name is no longer derived fromactor.json'snamefield —actorNamein the config is the single source of truth.isStandaloneremoved — replaced by sibling exclusion. Each actor'sfolderdefines its ownership boundary. Files inside another actor's folder are automatically excluded, achieving the same effect without an explicit flag.dockerContextDir-based scoping — each actor's.actor/actor.jsondockerContextDirfield defines which files can affect its build. Change detection uses this boundary instead of hardcodedactors/andstandalone-actors/regex matching.overrideActorContext— optional config field. When set, replacesdockerContextDirfor change detection. Useful for actors that depend on shared packages outside their Docker build context..dockerignorefiltering — if a.dockerignoreexists at the root of an actor'sdockerContextDir, matching files are ignored during change detection. Patterns are resolved relative todockerContextDir, matching Docker's own behavior. Added as a self-contained commit for easy revert if it gets pushback.init-configcommand removed — the config must be written manually.Change detection algorithm (per file, per actor):
dockerContextDiroroverrideActorContext) → skip if no match.dockerignorefiltering → ignored if matchedfolder) → skip