🧠 Context
Judge0 is an open-source code execution service that runs submissions in isolated containers on a server. Unlike the Pyodide and Java spikes, this approach requires infrastructure that the society would need to set up and maintain. The upside is that it can run any language without browser compatibility concerns. The downside is operational complexity for a volunteer-run project.
This spike figures out what it actually takes to get Judge0 running, how it handles Python and Java questions, and whether the operational burden is realistic. The repo has example questions and sample submissions at questions/ and submissions/. See SCHEMA.md for the question format and test file conventions.
🎯 What we're hoping to do
- Self-host a Judge0 instance.
- Submit a student's Python or Java submission alongside a test file.
- Get back pass/fail results matching the JSON format in SCHEMA.md.
- Understand what it takes to keep this running reliably.
🔍 What to look into
Work through the following questions and record your findings.
- Setup complexity. What does self-hosting Judge0 actually involve? Walk through what you did: prerequisites, how long it took, and what was confusing or poorly documented. Also note whether the public hosted API is a realistic alternative, covering any rate limits, cost, or data concerns.
- Infrastructure requirements. What kind of server does Judge0 need to run comfortably? What happens during a busy period, say 100 students submitting simultaneously during a lab? What does it mean for a volunteer society to own and operate this long-term: keeping it running, handling updates, dealing with abuse, cost of hosting?
- End-to-end latency. From submission sent to result received, how long does it take? Break it down: queue wait, execution time, API round-trip. What's the experience like for a student waiting on feedback?
- Question and submission workflow. How does the full loop work: we write a test file, a student submits code, and the result comes back as structured pass/fail? Judge0 accepts a single source file per submission, so how do you get both the student's code and the test file in there? Note any friction with the SCHEMA.md conventions.
- Time and memory limits. How are limits configured: per submission at request time, or platform-wide? How does Judge0 surface timeouts and memory errors back through the API?
- Python and Java support. Confirm both languages work end-to-end. Run submissions/add-two-numbers/correct.py and submissions/add-two-numbers-java/correct.java through your instance and verify you get the expected output. Note any differences in how the two languages behave.
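As a starting point for the workflow and limits questions, here is a minimal sketch of one way to fit the student's code and the test file into a single source file for Python, and to attach limits to the request. It rests on several assumptions the spike should verify: that the tests can run when appended after the student's code (this may not match the SCHEMA.md conventions), that language ID 71 is Python on your instance (IDs vary between Judge0 versions; check GET /languages), and that cpu_time_limit and memory_limit are accepted per submission. The helper names combine_sources and build_payload are hypothetical.

```python
import base64
import json

# Assumption: language IDs differ between Judge0 versions; confirm with GET /languages.
PYTHON_LANGUAGE_ID = 71


def combine_sources(student_code: str, test_code: str) -> str:
    """Join the student's code and the test file into one source string.

    Assumes the tests can call the student's functions directly once both
    live in the same module. Tests that import the submission by module
    name would need rewriting first.
    """
    return student_code.rstrip("\n") + "\n\n# --- tests ---\n" + test_code.lstrip("\n")


def build_payload(source: str, language_id: int,
                  cpu_time_limit: float = 2.0,
                  memory_limit_kb: int = 128_000) -> dict:
    """Build a request body for POST /submissions?base64_encoded=true.

    The limit values are illustrative, not recommendations.
    """
    encoded = base64.b64encode(source.encode("utf-8")).decode("ascii")
    return {
        "source_code": encoded,
        "language_id": language_id,
        "cpu_time_limit": cpu_time_limit,  # seconds
        "memory_limit": memory_limit_kb,   # kilobytes
    }


if __name__ == "__main__":
    student = "def add(a, b):\n    return a + b\n"
    tests = "assert add(1, 2) == 3\nprint('PASS test_add')\n"
    payload = build_payload(combine_sources(student, tests), PYTHON_LANGUAGE_ID)
    print(json.dumps(payload, indent=2))
```

Concatenation is the simplest option but probably not the only one, and it does not carry over to Java directly, since a single Java file can declare only one public class. Whether per-request limits are capped by instance-level configuration is something to confirm against your own deployment.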
🖥 How to demonstrate it
Get a Judge0 instance running locally (Docker is fine) or use the public API for the demo.
The demo should:
- Submit submissions/add-two-numbers/correct.py with the test file questions/add-two-numbers/tests.py to Judge0.
- Poll for or receive the result.
- Print the JSON output and a plain-text pass/fail verdict.
- Repeat with wrong-answer.py and broken.py to confirm the failure and error cases surface correctly.
- Repeat with the Java question (submissions/add-two-numbers-java/) to confirm Java works end-to-end.
A simple script (Python, shell, or Node.js) hitting the Judge0 API is fine. No UI needed.
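A rough shape for such a script, using only the Python standard library, might look like the sketch below. It assumes a self-hosted instance at http://localhost:2358 with no authentication (the public API would need extra headers), that status IDs 1 and 2 mean the submission is still queued or processing, and that ID 3 means the program ran and exited cleanly. The printed verdict is a placeholder and does not follow the SCHEMA.md format, and how a wrong answer surfaces depends on how tests.py reports failures. All function names are hypothetical.

```python
import base64
import json
import time
import urllib.request
from typing import Callable, Optional

# Assumption: a local self-hosted instance without authentication.
JUDGE0_URL = "http://localhost:2358"


def http_json(method: str, url: str, body: Optional[dict] = None) -> dict:
    """Send a JSON request and decode the JSON response."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(url, data=data, method=method,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


def submit(source: str, language_id: int, request: Callable = http_json) -> str:
    """Create a submission and return its token."""
    body = {
        "source_code": base64.b64encode(source.encode("utf-8")).decode("ascii"),
        "language_id": language_id,
    }
    reply = request("POST", f"{JUDGE0_URL}/submissions?base64_encoded=true", body)
    return reply["token"]


def poll(token: str, request: Callable = http_json,
         interval: float = 0.5, max_attempts: int = 60) -> dict:
    """Fetch the submission until it leaves the queued/processing states."""
    url = f"{JUDGE0_URL}/submissions/{token}?base64_encoded=true"
    for _ in range(max_attempts):
        result = request("GET", url)
        if result["status"]["id"] > 2:  # 1 = In Queue, 2 = Processing
            return result
        time.sleep(interval)
    raise TimeoutError(f"submission {token} did not finish in time")


def decode_field(result: dict, name: str) -> str:
    """Decode a base64 text field such as stdout, stderr or compile_output."""
    value = result.get(name)
    return base64.b64decode(value).decode("utf-8", errors="replace") if value else ""


def verdict(result: dict) -> str:
    """Plain-text verdict; anything other than status 3 counts as a failure."""
    status = result["status"]
    return "PASS" if status["id"] == 3 else f"FAIL ({status['description']})"


def run_demo(submission_path: str, tests_path: str, language_id: int) -> None:
    """Combine the two files naively, submit, poll, and print the outcome."""
    with open(submission_path, encoding="utf-8") as f:
        student = f.read()
    with open(tests_path, encoding="utf-8") as f:
        tests = f.read()
    result = poll(submit(student + "\n\n" + tests, language_id))
    print(json.dumps(result, indent=2))
    print(decode_field(result, "stdout") or decode_field(result, "stderr"))
    print(verdict(result))
```

With an instance running, the Python case could be tried with run_demo("submissions/add-two-numbers/correct.py", "questions/add-two-numbers/tests.py", 71), where 71 is an assumed language ID. The request parameter on submit and poll exists so the flow can be exercised with a stub before any server is available.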
Include a short written document covering your answers to the questions above, the actual latency numbers you measured, and a frank assessment of what it would take to keep this running as a volunteer-run service.
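For the latency numbers, one simple approach is to time the whole submit-and-poll call on the client and compare it with the execution time Judge0 reports. The sketch below assumes the result carries a time field holding seconds (as a string, or null if the program never ran). It lumps queue wait and API round-trip together as overhead, so separating those two would need extra measurements. The names timed and latency_breakdown are hypothetical.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn and return (its result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    value = fn(*args, **kwargs)
    return value, time.perf_counter() - start


def latency_breakdown(total_seconds: float, result: dict) -> dict:
    """Split client-side total latency into execution time and everything else.

    Assumes result["time"] is the execution time in seconds reported by
    Judge0, possibly as a string or None.
    """
    execution = float(result.get("time") or 0.0)
    overhead = max(total_seconds - execution, 0.0)
    return {
        "total_s": round(total_seconds, 3),
        "execution_s": round(execution, 3),
        "overhead_s": round(overhead, 3),  # queue wait + API round-trip + polling
    }
```

Note that the polling interval itself adds to the measured total, so it is worth recording the interval alongside the numbers.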
✅ Is it usable for our case?
Determine yes, no, or yes with caveats. If caveats, list the important ones. This is the main thing the team needs from this ticket.