Migrate Leaderboard & Stats Scraping to Backend API Cache #1

Merged
sanjay-kv merged 9 commits into recodehive:main from SamXop123:backend on Jun 18, 2026

@SamXop123

Contributor

What's implemented:

  1. Backend Cron Scraper: The Express backend runs a background scheduled cron job (every 2 hours) that scrapes public, unarchived organization repositories, filters for merged PRs matching the "recode" label, aggregates repository statistics, and counts discussions.
  2. Point & Streak Aggregation: Contributors are scored according to Recode rules (Level 1 = 10, Level 2 = 30, Level 3 = 50 points). Streaks are calculated from consecutive daily PR merge dates.
  3. Atomic Cache Storage: Scraper data is written atomically to leaderboard.json and stats.json inside the cache directory (compatible with persistent volume mounts).
  4. Backend API: Serves the cached datasets instantly via /api/leaderboard and /api/stats.
  5. Frontend Context Hook: The frontend React context (CommunityStatsProvider) fetches from these two backend API routes in parallel on page load. Client-side sorting and date filtering (week/month/year/all-time) are preserved in memory for instant, responsive UI transitions.
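
The scoring and streak rules in item 2 can be sketched roughly as follows. This is an illustration only, not the code in functions/generateLeaderboard.js: the label strings (`"level 1"` etc.) and both function names are assumptions, and the sketch counts the run of consecutive UTC days ending at the most recent merge. Whether the real scraper also resets a streak that does not reach the current day is not shown here.

```javascript
// Point weights from the PR description; the exact label text is an assumption.
const LEVEL_POINTS = { "level 1": 10, "level 2": 30, "level 3": 50 };

const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical helper: return the points for the first level label found on a PR.
function pointsForLabels(labels) {
  for (const label of labels) {
    const points = LEVEL_POINTS[label.toLowerCase()];
    if (points !== undefined) return points;
  }
  return 0; // no level label, no points
}

// Hypothetical helper: count consecutive UTC days with at least one merge,
// walking backwards from the most recent merge day.
function streakFromMergeDates(mergedAtList) {
  const dayNumbers = mergedAtList.map((iso) => Math.floor(Date.parse(iso) / DAY_MS));
  const days = [...new Set(dayNumbers)].sort((a, b) => b - a);
  if (days.length === 0) return 0;

  let streak = 1;
  for (let i = 1; i < days.length; i++) {
    if (days[i - 1] - days[i] !== 1) break; // a gap ends the run
    streak++;
  }
  return streak;
}
```

Several merges on the same day collapse into a single day, so they extend the streak by one at most.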

Files Changed:

Backend (this repository)

  • server.js: Sets up API routes and CORS configuration, and initializes startup checks.
  • utils/cacheHelper.js [NEW]: Implements atomic file writes (writeJsonAtomic) to prevent corruption, file-existence checks, and fallback JSON models.
  • functions/generateLeaderboard.js: Replaced legacy scraper. Implements GraphQL repository traversal, label scoring weight filters, active streak checks, and discussions count queries.
  • functions/generateCALeaderboard.js: Integrates atomic writes and generation locks.
  • jobs/updateOSLeaderboard.js: Schedules OS leaderboard cron jobs.
  • jobs/updateCALeaderboard.js: Standardizes cron patterns.
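
For reviewers unfamiliar with the atomic-write pattern mentioned for utils/cacheHelper.js, a minimal sketch is below. It assumes the usual write-to-temp-then-rename approach; the actual body of writeJsonAtomic in this PR may differ, and `readJsonOrFallback` is a hypothetical name used only for illustration.

```javascript
const fs = require("fs");
const path = require("path");

// Sketch of an atomic JSON write: write a temp file in the same directory,
// then rename it over the target. A rename within one filesystem replaces the
// target in a single step, so readers never see a half-written file.
function writeJsonAtomic(filePath, data) {
  const dir = path.dirname(filePath);
  fs.mkdirSync(dir, { recursive: true });
  const tmpPath = path.join(
    dir,
    `.${path.basename(filePath)}.${process.pid}.${Date.now()}.tmp`
  );
  try {
    fs.writeFileSync(tmpPath, JSON.stringify(data, null, 2), "utf8");
    fs.renameSync(tmpPath, filePath);
  } catch (err) {
    fs.rmSync(tmpPath, { force: true }); // clean up the temp file on failure
    throw err;
  }
}

// Hypothetical helper: return the cached JSON, or a fallback model when the
// file is missing or unreadable (for example before the first scrape finishes).
function readJsonOrFallback(filePath, fallback) {
  try {
    return JSON.parse(fs.readFileSync(filePath, "utf8"));
  } catch {
    return fallback;
  }
}
```

Keeping the temp file in the same directory as the target matters: a rename across filesystems (for example from /tmp onto a mounted volume) is not atomic.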

Frontend (recode-website repository)

  • docusaurus.config.ts: Declares the backendApiUrl customField and removes the deprecated client-side exposure of gitToken.
  • src/lib/statsProvider.tsx: Removes client-side scraping pipelines. Fetches and parses backend API endpoints and binds to context states.
  • src/services/githubService.ts: Removes dead-code helper methods.
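
The parallel fetch described for src/lib/statsProvider.tsx might look roughly like the sketch below. It is written in plain JavaScript rather than the provider's TypeScript, and `loadCommunityData`, `baseUrl`, and the injectable `fetchImpl` parameter are illustrative assumptions, not names from the PR.

```javascript
// Sketch: fetch both cached datasets at once. `baseUrl` stands in for the
// backendApiUrl custom field; `fetchImpl` is injectable so the function can be
// exercised without a network.
async function loadCommunityData(baseUrl, fetchImpl = globalThis.fetch) {
  const getJson = async (route) => {
    const res = await fetchImpl(`${baseUrl}${route}`);
    if (!res.ok) {
      throw new Error(`${route} responded with HTTP ${res.status}`);
    }
    return res.json();
  };

  // Promise.all starts both requests before awaiting either one.
  const [leaderboard, stats] = await Promise.all([
    getJson("/api/leaderboard"),
    getJson("/api/stats"),
  ]);

  return { contributors: leaderboard.contributors ?? [], stats };
}
```

With Promise.all, a failure of either route rejects the whole load; Promise.allSettled would be the alternative if the UI should render one dataset while the other is unavailable.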

API Contract

A. GET /api/leaderboard

Returns the contributors array and legacy compatibility aliases.

{
  "success": true,
  "updatedAt": 1781548394726,
  "lastUpdated": 1781548394726,
  "generated": true,
  "contributors": [
    {
      "username": "contributor1",
      "avatar": "https://github.com/contributor1.png",
      "profile": "https://github.com/contributor1",
      "points": 60,
      "prs": 2,
      "prDetails": [
        {
          "title": "PR 2 - Docs",
          "url": "https://github.com/recodehive/repo1/pull/2",
          "mergedAt": "2026-06-15T12:00:00Z",
          "repoName": "repo1",
          "number": 2,
          "points": 50
        }
      ],
      "streak": 2
    }
  ]
}
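
Because each contributor carries `prDetails` with `mergedAt` and `points`, the week/month/year views can be derived on the client without another request. A minimal sketch follows, assuming rolling windows of 7, 30, and 365 days and a hypothetical `filterByPeriod` helper; the frontend's actual window definitions are not shown in this PR description.

```javascript
// Assumed rolling windows, in days. Any other period value means "all-time".
const PERIOD_DAYS = { week: 7, month: 30, year: 365 };

// Hypothetical helper: recompute points and PR counts for the given period
// from prDetails, drop contributors with no PRs in range, and sort by points.
function filterByPeriod(contributors, period, now = Date.now()) {
  const byPointsDesc = (a, b) => b.points - a.points;
  const days = PERIOD_DAYS[period];

  // All-time keeps the server-provided totals untouched.
  if (days === undefined) return [...contributors].sort(byPointsDesc);

  const cutoff = now - days * 24 * 60 * 60 * 1000;
  return contributors
    .map((c) => {
      const prDetails = c.prDetails.filter(
        (pr) => Date.parse(pr.mergedAt) >= cutoff
      );
      const points = prDetails.reduce((sum, pr) => sum + pr.points, 0);
      return { ...c, prDetails, prs: prDetails.length, points };
    })
    .filter((c) => c.prs > 0)
    .sort(byPointsDesc);
}
```

The all-time branch deliberately returns the server totals rather than re-summing `prDetails`, so it stays correct even if a response lists fewer PR details than the `prs` count.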

B. GET /api/stats

{
  "success": true,
  "updatedAt": 1781548394731,
  "lastUpdated": 1781548394731,
  "totalStars": 984,
  "totalForks": 1107,
  "totalContributors": 467,
  "totalRepositories": 10,
  "publicRepositories": 9,
  "discussionsCount": 42
}
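
As a rough illustration of how the repository totals above could be aggregated, the sketch below sums over a list of repository objects. The field names (`stargazerCount`, `forkCount`, `isArchived`, `isPrivate`) follow GitHub's GraphQL Repository type; the function name and the assumption that archived repositories are excluded from every total are illustrative and may not match functions/generateLeaderboard.js.

```javascript
// Hypothetical helper: aggregate stats over unarchived repositories.
function aggregateRepoStats(repos) {
  const active = repos.filter((repo) => !repo.isArchived);
  return {
    totalStars: active.reduce((sum, repo) => sum + repo.stargazerCount, 0),
    totalForks: active.reduce((sum, repo) => sum + repo.forkCount, 0),
    totalRepositories: active.length,
    publicRepositories: active.filter((repo) => !repo.isPrivate).length,
  };
}
```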

Migration Strategy

  1. Backend Deploy (Railway): Deploy backend and map CACHE_DIR to a persistent volume (e.g. /data) to preserve cached listings across restarts. Set PORT (default 5000) and the private GIT_TOKEN.
  2. Frontend Config: Expose BACKEND_API_URL pointing to the deployed backend server domain.
  3. Authentication: Clerk login gate remains unchanged. Discussions panel continues fetching via restricted token until a discussions backend proxy endpoint is introduced in a future PR.
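
For the deploy step, a startup check along these lines can catch a missing variable before the first cron run. This is a sketch under assumptions: `resolveConfig` is a hypothetical name, and the `./cache` default for CACHE_DIR is a placeholder (only the PORT default of 5000 comes from this PR).

```javascript
// Hypothetical startup check: read deployment settings from the environment.
function resolveConfig(env = process.env) {
  const port = Number.parseInt(env.PORT ?? "5000", 10); // default from the PR
  if (!Number.isInteger(port) || port <= 0) {
    throw new Error(`Invalid PORT: ${env.PORT}`);
  }
  if (!env.GIT_TOKEN) {
    throw new Error("GIT_TOKEN is required for the scraper");
  }
  return {
    port,
    cacheDir: env.CACHE_DIR ?? "./cache", // placeholder default; /data on Railway
    gitToken: env.GIT_TOKEN,
  };
}
```

Failing fast on a missing GIT_TOKEN avoids serving fallback data indefinitely because every scrape is rejected by the GitHub API.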

Testing Performed

  • Backend Unit Tests: Verified repository discovery, unarchived filters, point weights, and active streak resets using mock datasets.
  • E2E Headless Browser Testing: Used Puppeteer to verify:
    • Podium: Rendered 1st, 2nd, and 3rd rank performer cards correctly.
    • Table: Populated ranking rows, usernames, avatars, and badges.
    • Stats Cards: Displayed correct community values.
    • PR Modal: Clicking a row count badge successfully launched the PR List Modal with correct titles, URLs, and points.

@SamXop123

SamXop123 commented Jun 17, 2026

Contributor Author

Maintainers, please verify whether all the API contracts are fine. I tested this using a custom leaderboard test file, which worked completely fine; only a test in an actual production environment is still needed.

recode-website pr : recodehive/recode-website#1907

@Adez017 Adez017 left a comment

Member

The rest looks fine to me @sanjay-kv

Comment thread server.js
Comment on lines 39 to 42
app.get("/", (req, res) => {
  res.send("Hello World");
});

Member

I don't think we need to return "Hello World" anywhere.

@Adez017

Adez017 commented Jun 18, 2026

Member

What's implemented:
Backend Cron Scraper: The Express backend runs a background scheduled cron job (every 2 hours) that scrapes public, unarchived organization repositories, filters for merged PRs matching the "recode" label, aggregates repository statistics, and counts discussions.

@SamXop123 can you point to the file that contains the implementation details? One more thing: we could drill down into the recodehive organization's repositories instead of searching across GitHub for the recode label. That way we can make it much more compatible and API-friendly.
CC: @sanjay-kv

@sanjay-kv sanjay-kv merged commit c8e46f0 into recodehive:main Jun 18, 2026

3 participants