Skip to content

feat!: use actor.json as source of truth to support single actor and broader monorepo structures#96

Open
ruocco-l wants to merge 18 commits into
masterfrom
feat/extend-support
Open

feat!: use actor.json as source of truth to support single actor and broader monorepo structures#96
ruocco-l wants to merge 18 commits into
masterfrom
feat/extend-support

Conversation

@ruocco-l

@ruocco-l ruocco-l commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

Closes #84

The CI tooling previously derived actor names and ownership entirely from folder naming conventions (owner_actor-name), which was fragile — usernames with multiple underscores caused mismatches, and it was impossible to have a folder name that didn't encode the owner. This PR makes three interconnected changes to fix that:

  • actor.json is now the source of truth for actor names. Each actor folder must contain .actor/actor.json with a "name" field. The old convention of reverse-engineering the actor name from the folder name (splitting on the last underscore) is gone.
  • Changed-file detection matches by folder path, not reconstructed actor name. maybeParseActorFolder now returns the folder path directly (e.g. actors/shopify) instead of attempting to reconstruct owner/actor-name from the folder. This eliminates the class of bugs where folder names didn't round-trip cleanly to actor names.
  • Token resolution happens once, at discovery time. getRepoActors now resolves which env var holds the correct token for each actor (APIFY_TOKEN_<OWNER> or BUILDER_APIFY_TOKEN as fallback) and validates the actor exists on the platform. The resolved tokenEnvVar is carried in ActorConfig and used by ApifyBuilder.fromActorConfig — replacing the old fromActorName which re-derived the token independently. This catches misconfigured folder names or missing actors before any build/delete work starts, with a clear error message.

Additionally:

  • Single-actor repo support. Repos without actors/ or standalone-actors/ directories can now be built if they have a root .actor/actor.json. The owner is resolved from BUILDER_APIFY_TOKEN. In this mode, .actor/ changes correctly trigger builds (they're ignored in multi-actor repos where .actor/ is just for apify push CLI).
  • BUILDER_APIFY_TOKEN fallback. Both token resolution and ApifyBuilder fall back to BUILDER_APIFY_TOKEN when the per-owner env var is missing, instead of failing immediately.

Breaking change

Actor folders must now contain .actor/actor.json with a "name" field. Teams already using this tooling will need to add this file to every actor folder. The folder name is still used to resolve the owner (everything before the last underscore), but the actor name itself comes exclusively from actor.json.

Note on circ_le actors

circ_le matching now depends on actor.json's "name" field matching what comes after apify-managed--- in the circ_le account. As long as the "name" in actor.json is correct (i.e. matches what the old folder convention would have produced), the fragility is the same as before — no better, no worse. If a repo's builds were already working correctly, adding the right "name" to actor.json will not change behavior.

circ_le logic has been removed from main.

Tests

  • New diff-changes tests for ownerless folder matching, single-actor .actor/ build triggers, and multi-actor .actor/ ignore behavior.
  • New utils test suite (10 cases) covering token resolution priority, BUILDER_APIFY_TOKEN fallback, validation failure on missing/unreachable actors, missing actor.json name field, single-actor repo mode, ownerless folders, standalone actors, and special characters in owner names (e.g. luigi.ruocco -> APIFY_TOKEN_LUIGI_RUOCCO).

Update: mandatory config file replaces all discovery logic

Note: Some earlier commits in this PR are now stale or partially overwritten by the changes below. The final state of the code is what matters — the earlier commits were intermediate steps toward the current design.

The discovery/guessing logic from the earlier commits has been fully replaced with a mandatory root-level config file (.test-tools-actors-config.json). Combined with each actor's .actor/actor.json, these are now the only sources of truth.

Config file shape:

{
    "actors": [
        {
            "folder": "actors/web-scraper",
            "owner": "myteam",
            "tokenEnvVar": "APIFY_TOKEN_MYTEAM",
            "isStandalone": false
        }
    ]
}

What this replaces:

  • resolveBuilderTokenUsername, resolveOwner, resolveTokenEnvVar, validateActorExists, readActorName — all deleted.
  • apify-client import removed from utils.ts (only build.ts needs it now).
  • BUILDER_APIFY_TOKEN fallback — removed. tokenEnvVar in the config is final; if the env var is not set at build time, it fails hard.
  • isSingleActorRepo branching in diff detection — removed. .actor/ changes at root level now always trigger builds for all non-standalone actors.

New init-config command:

npx apify-test-tools init-config
npx apify-test-tools init-config --default-owner myteam --default-token-var APIFY_TOKEN_MYTEAM

Scans the repo via git ls-files for .actor/actor.json files, generates the config with placeholders. Warns if a root-level .actor/actor.json coexists with subfolder actors.

Updated tests:

  • .actor/ changes now trigger builds in multi-actor repos (updated in diff-changes.test.ts and should-built-and-test.test.ts).
  • New utils.test.ts (18 tests) covering readConfigFile and generateConfigFile.

Update 2: dockerContextDir-based change detection

Note: The config file shape and change detection logic from the previous update have been rewritten again. The earlier owner field and isStandalone flag are gone.

Config file shape:

{
    "actors": [
        {
            "folder": "actors/web-scraper",
            "actorName": "myteam/web-scraper",
            "tokenEnvVar": "APIFY_TOKEN_MYTEAM",
            "overrideActorContext": ["actors/web-scraper", "packages/shared"]
        }
    ]
}

What changed:

  • owner replaced by actorName — full owner/name format (e.g. "apify/web-scraper"). The actor name is no longer derived from actor.json's name field — actorName in the config is the single source of truth.
  • isStandalone removed — replaced by sibling exclusion. Each actor's folder defines its ownership boundary. Files inside another actor's folder are automatically excluded, achieving the same effect without an explicit flag.
  • dockerContextDir-based scoping — each actor's .actor/actor.json dockerContextDir field defines which files can affect its build. Change detection uses this boundary instead of hardcoded actors/ and standalone-actors/ regex matching.
  • overrideActorContext — optional config field. When set, replaces dockerContextDir for change detection. Useful for actors that depend on shared packages outside their Docker build context.
  • .dockerignore filtering — if a .dockerignore exists at the root of an actor's dockerContextDir, matching files are ignored during change detection. Patterns are resolved relative to dockerContextDir, matching Docker's own behavior. Added as a self-contained commit for easy revert if it gets pushback.
  • init-config command removed — the config must be written manually.

Change detection algorithm (per file, per actor):

  1. Hardcoded ignore list (repo-level dev files) → ignored
  2. Context matching (dockerContextDir or overrideActorContext) → skip if no match
  3. .dockerignore filtering → ignored if matched
  4. Sibling exclusion (file in another actor's folder) → skip
  5. Cosmetic classification (README/CHANGELOG, cosmetic-only JSON schema changes) → only triggers release build
  6. Everything else → functional (triggers build + tests)

@metalwarrior665 metalwarrior665 left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is good and obviously needed change. It won't be that big of a hassle for other teams since adding new Actors is quite rare. We need to polish this though.

Comment thread bin/diff-changes.ts Outdated
Comment on lines +36 to +35
'.actor/',
// In root .actor/ mode, .actor/ changes must trigger builds
...(isSingleActorRepo ? [] : ['.actor/']),

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can remove this completely, there is no reason we should commit changes to top level .actor in multiactor

Comment thread bin/utils.ts Outdated

let cachedBuilderUsername: string | undefined;

export const resolveBuilderTokenUsername = async (): Promise<string> => {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the use-case to need this? Just so you don't have to think about the username?

Comment thread bin/utils.ts Outdated
};

const readActorName = async (actorJsonPath: string): Promise<string> => {
const actorJson: { name?: string } = JSON.parse(await fs.readFile(actorJsonPath, 'utf-8'));

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rather introduce a new JSON file for our own custom needs than stitch non-spec fields to actor.json.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that would also enable us to cover miniactors, monorepos or single actor stuff without doing the
user-name_actor-name convention.
It could all be config stuff

Comment thread bin/utils.ts Outdated
Comment thread bin/utils.ts Outdated
actorName: fullName,
folder: actorDir,
isStandalone: folderType === 'standalone-actors',
tokenEnvVar,

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a big fan of passing the token around because it is more likely to leak, we should be able to just resolve it in place like we did no?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't pass the token, just the name of the env variable where it's stored

@metalwarrior665

Copy link
Copy Markdown
Member

We should also sync with @gullmar about other repos. The monorepo where each Actor has its own src is also probably not ideally supported.

Comment thread bin/utils.ts Outdated
Comment on lines +62 to +66
const resolveOwner = async (folderName: string): Promise<string> => {
const ownerMatch = folderName.match(/^(.+)_[^_]+$/);
if (ownerMatch) return ownerMatch[1];
return resolveBuilderTokenUsername();
};

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the owner still has to be defined in the folder name 😅 That kind of defeats the purpose of this entire effort 🫠

I would either want the conventionally named folder names only (owner + name in folder name), or to have both the owner and actor name in actor.json, but not both the conventionally named folder names for owners and actor.json for names at once 🙏

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. IF the owner is there we automatically assign and look for the appropriate token. If not we fallback to whatever is the owner of the BUILDER token. This is still useful if you have a default account where there are all the miniactors (for which is ok to fallback the BUILDER token) and have one miniactor that is build under a different account (for whatever reason), in which case you will specify it with the folder name owner_whatever-actor-name-since-it's-not-used.

But I agree that just adding the owner to the actor.json is much cleaner

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that for most repos, the single BUILDER token would work fine, since we usually have everything under one account. So, for the most part, this would only have to cover edge cases. Even then it seems like there should be a better way to define the edge case then by folder name - be it actor.json or a global config file 🤔

Comment thread bin/utils.ts Outdated
Comment thread bin/utils.ts Outdated
};

const readActorName = async (actorJsonPath: string): Promise<string> => {
const actorJson: { name?: string } = JSON.parse(await fs.readFile(actorJsonPath, 'utf-8'));

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Comment thread bin/utils.ts Outdated
@ruocco-l ruocco-l marked this pull request as draft June 25, 2026 12:24
@ruocco-l

Copy link
Copy Markdown
Contributor Author

Alright alright, let's do the config

@ruocco-l ruocco-l marked this pull request as ready for review June 25, 2026 14:59
@metalwarrior665

Copy link
Copy Markdown
Member

One more related long-term thing. When we designed this library, the core idea was "convention over configuration". We wanted everything to be at hardcoded places since that is always better than having to config and it worked well for that use-case.

Now if we want many (or arbitrary) different setups, we cannot keep patching it like this because we would have to keep adding ifs to already complex logic. We have to rewrite most of the core logic to be truly configurable, e.g. each Actor has points to its "workspace" of path dirs (that can be shared).

On the other hand, the library has to remain somewhat opinionated, like what file changes trigger build etc, otherwise it has basically no value, it would just be some configs + few API calls.

I think for now, this PR is ok because top-level .actor is the template thing, but there probably already are things in the "changed Actors" that break. But for anything more, I would wait for th full rewrite. That should also make it truly open-source usable.

@metalwarrior665 metalwarrior665 left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I gave it another thought and I think it wouldn't be too hard to go all the way for really configurable config. Otherwise, we would have to potentially do another breaking change.

We should support (let's look of we can find more) at least 3 reference repos:

  • typical Store monorepo
  • template-like repo (e.g. WCC)
  • monorepo with packages and actors importing them - e-commerce

I'm thinking it could work like this:

  1. Each Actor config points to the folder where .actor is (like now) or directly to the .actor folder.
  2. Then from actor.json, we read dockerContextDir and that contains all folders/files that can influence the changes for that Actor. We can allow config to have overrideActorContext if they want to e.g. narrow it down.
  3. Then instead of the current logic that if "code changes" we add mark all Actors as changed, we just do the changed files logic for each mini Actor independently, that cleans up the logic as well.

This way, we handle current Store monorepo (Actor folder + top-level context), template (just top-level), e-commerce (actor folder with its code, packages, shared)

What do you think?

Comment thread README.md
{
"actors": [
{
"folder": "actors/web-scraper",

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would add actorName too rather than deriving it from the folder

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about it, but I think can cause some confusion when you (usually when you publish the actor) play a little with the naming for SEO reasons (or similar). I think whatever is in the actor.json should be respected as the truth and not be overridden by some obscure logic from the testing package.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The testing lib will not set or change the name, it just check that it exists. The name.in actor.json is not a single source of truth so I would not use it, better to have all here

Comment thread bin/main.ts Outdated
});
},
)
.command(

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would remove this, Claude will one-shot this and you will want to manually check it anyway.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this sort of CLI initialization utility perfectly fits into a library like here 🤔

However unless we make this a completely seamless end to end migration command, then I'm also against it. Right now as far as I can tell, it doesn't really migrate the ENV vars. I will open up a separate comment for that.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment thread bin/utils.ts
Comment on lines +57 to +59
if (!Array.isArray(config.actors)) {
throw new Error(`Config file "${CONFIG_FILE_NAME}" must have an "actors" array at the top level.`);
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would recommend using Zod for parsing the JSON config 👀

Comment thread bin/main.ts Outdated
});
},
)
.command(

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this sort of CLI initialization utility perfectly fits into a library like here 🤔

However unless we make this a completely seamless end to end migration command, then I'm also against it. Right now as far as I can tell, it doesn't really migrate the ENV vars. I will open up a separate comment for that.

Comment thread bin/utils.ts Outdated
Comment on lines +145 to +146
owner: options.defaultOwner ?? '<OWNER>',
tokenEnvVar: options.defaultTokenVar ?? '<TOKEN_ENV_VAR>',

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could make this be extracted from the current folder structure we use everywhere, so we would have a seamless end to end migration command 🤔

E.g. take the owner username and translate it to the current env var name: APIFY_TOKEN_<OWNER>

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed it.

Comment thread bin/main.ts Outdated
});
},
)
.command(

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support .actor folder as source to build actor and test

5 participants