I mixed a lot of automated experimentation with my early instincts to figure out the best way to solve the problem. I have a second repo where I mainly tried to recreate the legacy formula (which did not work) and a third where I had a lot of experiments running. I spent a lot of time trying to recreate the bugs the legacy code had and found some promise in bitwise information / bit flips, but kept getting stuck on cases where receipts-per-mile was very high.
Overall, I think I could've squeezed out a second place. On 5-fold CV I had roughly a 6.25K score, and I most certainly overfit on the final set. The final model had 35 features (some scrambled together at the last second), which was most likely not the best move. I had GPT generate a summary of the top 20 features by importance, which you can see below. I found some interesting ones in addition to the more basic ones that I think most people found. If you tinker with or explore these further, you might find some results.
If you want to tinker with/run mine, it's in predict_single_naive.py. The rest of the code is from the experimentation and can be largely ignored (sorry for the mess lol).
If you wanna talk about the comp, just hit me up on Twitter @alexmaxxing
(GPT SUMMARY)
Your feature engineering captures the complex business logic of this 60-year-old legacy system. Here's what each feature represents:
- Core Foundation Features (Ranks 1, 5, 11)
- days (#1): Raw trip duration - the most important input
- miles (#5): Raw miles traveled
- receipts (#11): Raw receipt amount
These are your base inputs, with days being most critical because it drives per-diem calculations.
- Business Logic Reconstruction (Ranks 2, 14)
- cap_est (#2): Your estimate of the system's internal cap formula: 80 * days + 0.5 * miles
- over_cap_amt (#14): How much receipts exceed this estimated cap
This reveals you discovered the legacy system has an internal reimbursement ceiling based on both trip length and distance.
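A minimal sketch of these two features, using the cap formula quoted above. The function name `cap_features` is hypothetical, and clipping the overage at zero is an assumption, since the summary does not say whether negative values are kept.

```python
def cap_features(days: int, miles: float, receipts: float) -> dict:
    """Estimated cap and how far receipts exceed it."""
    # Formula as stated in the summary: 80 * days + 0.5 * miles
    cap_est = 80 * days + 0.5 * miles
    # Assumption: receipts below the cap give an overage of 0, not a negative number
    over_cap_amt = max(0.0, receipts - cap_est)
    return {"cap_est": cap_est, "over_cap_amt": over_cap_amt}
```

For a 1-day, 100-mile trip the estimated cap is 130, so $500 in receipts would be $370 over it.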
- Non-Linear Receipt Processing (Ranks 3, 4, 6, 12)
- log_receipts (#3): Captures diminishing returns on high receipt amounts
- price_ending_49_99 (#4): Binary flag for receipts ending in .49/.99 - suggests legacy pricing patterns
- receipt_bins (#6): Categorical receipt tiers (thresholds at $300, $500, $700, $1000, $1500, $2000, $2500)
- sqrt_receipts (#12): Another non-linear transformation for receipt processing
The high importance of price_ending_49_99 suggests the old system had special handling for common retail price endings.
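The four receipt features could be computed roughly like this. It is a sketch under a few assumptions: `log(1 + receipts)` is used so that zero receipts do not break the log, bin edges fall into the higher bin, and the helper name `receipt_features` is made up for illustration.

```python
import math
from bisect import bisect_right

# Tier thresholds as listed in the summary
RECEIPT_THRESHOLDS = [300, 500, 700, 1000, 1500, 2000, 2500]

def receipt_features(receipts: float) -> dict:
    # Work in whole cents to avoid float noise when checking the price ending
    cents = round(receipts * 100) % 100
    return {
        # Assumption: log1p rather than a plain log, to handle receipts == 0
        "log_receipts": math.log1p(receipts),
        "sqrt_receipts": math.sqrt(receipts),
        "price_ending_49_99": int(cents in (49, 99)),
        # Bin index 0..7; assumption: a value equal to a threshold goes to the higher bin
        "receipt_bins": bisect_right(RECEIPT_THRESHOLDS, receipts),
    }
```

Rounding to cents before taking the remainder matters here: a value like 499.99 is not exactly representable as a float, so testing `receipts % 1` directly would be fragile.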
- High-Value Trip Detection (Rank 10)
- receipts_mask (#10): Receipts with high receipts-per-mile cases masked to 0
This is your "router flag" - when receipts/mile > 20, you force the model to ignore the raw receipt amount, suggesting these cases follow completely different logic.
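A sketch of the masking rule, assuming the threshold of 20 quoted above. How zero-mile trips are handled is not stated in the summary, so treating them as an infinitely high ratio is an assumption.

```python
def receipts_mask(miles: float, receipts: float, threshold: float = 20.0) -> float:
    """Return receipts, or 0.0 when receipts-per-mile exceeds the threshold."""
    if miles <= 0:
        # Assumption: any spending on a zero-mile trip counts as a very high ratio
        return 0.0
    return 0.0 if receipts / miles > threshold else receipts
```

With this, `receipts_mask(10, 500.0)` is masked to 0 (50 dollars per mile), while `receipts_mask(250, 150.75)` passes through unchanged.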
- Complex Interaction Terms (Ranks 7, 8)
- days_miles_receipts (#7): Three-way interaction capturing complex interdependencies
- days_receipts (#8): Two-way interaction between trip length and spending
These capture the non-additive nature of the legacy calculations.
- Custom Travel Metrics (Ranks 9, 15)
- travel_impedance_index (#9): Your custom composite score: 0.54 * days + 0.33 * (miles / 100) + 0.13 * log(receipts + 1)
- miles_per_day (#15): Travel intensity metric
The Travel Impedance Index is particularly clever - it weights trip characteristics in a way that apparently mirrors the legacy system's internal scoring.
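Both metrics are simple to write down. This sketch assumes the log is a natural log and that `miles_per_day` guards against a zero-day trip; neither detail is confirmed by the summary.

```python
import math

def travel_impedance_index(days: int, miles: float, receipts: float) -> float:
    # Weights as quoted in the summary; natural log is an assumption
    return 0.54 * days + 0.33 * (miles / 100) + 0.13 * math.log(receipts + 1)

def miles_per_day(days: int, miles: float) -> float:
    # Assumption: treat a 0-day trip as 1 day to avoid dividing by zero
    return miles / max(days, 1)
```

For a 5-day, 250-mile trip with no receipts, the index is 2.7 + 0.825 + 0 = 3.525.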
- Regime Classification (Ranks 16-20)
- log_mileage_ratio (#16): Log of receipts/(0.655 * miles) - mileage reimbursement rate
- log_perdiem_ratio (#17): Log of receipts/(80 * days) - per-diem rate
- mileage_ratio (#18): Raw receipts-to-mileage ratio
- perdiem_ratio (#19): Raw receipts-to-per-diem ratio
- regime_0 (#20): One-hot encoding for A-perdiem regime
These reveal you identified the system processes different trip types through separate calculation paths:
- A-perdiem: Low mileage, high receipts (luxury accommodations)
- B-mileage: High mileage, low receipts (road trips)
- C-mixed: Everything else
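The ratio features behind the regimes could look roughly like this. The summary does not give the rule that assigns a trip to regime A, B, or C, so this sketch stops at the ratios. The `EPS` guard and the choice to reuse 0.655 * miles in the raw mileage ratio are assumptions.

```python
import math

EPS = 1e-9  # hypothetical guard against division by zero and log(0)

def regime_ratios(days: int, miles: float, receipts: float) -> dict:
    # Receipts relative to a 0.655-per-mile mileage allowance
    mileage_ratio = receipts / max(0.655 * miles, EPS)
    # Receipts relative to an 80-per-day per-diem allowance
    perdiem_ratio = receipts / max(80 * days, EPS)
    return {
        "mileage_ratio": mileage_ratio,
        "perdiem_ratio": perdiem_ratio,
        "log_mileage_ratio": math.log(max(mileage_ratio, EPS)),
        "log_perdiem_ratio": math.log(max(perdiem_ratio, EPS)),
    }
```

A per-diem ratio near 1 (log near 0) means receipts roughly match 80 dollars per day; large values on either ratio are what the regime split would key on.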
- Categorical Encoding (Rank 13)
- day_bins (#13): Trip duration categories with thresholds at [1, 4, 8, 12, 16, 20, 24, 28] days
This suggests the legacy system treats trip lengths categorically rather than continuously.
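One way to turn those thresholds into a categorical feature is a sorted-list lookup. The function name `day_bin` is hypothetical, and which side of each threshold a trip falls on is an assumption.

```python
from bisect import bisect_right

# Thresholds as listed in the summary
DAY_THRESHOLDS = [1, 4, 8, 12, 16, 20, 24, 28]

def day_bin(days: int) -> int:
    """Index of the duration category (0..8)."""
    # Assumption: a trip of exactly N days, where N is a threshold, lands in the higher bin
    return bisect_right(DAY_THRESHOLDS, days)
```

Under this convention, trips of 1 to 3 days share bin 1 and trips of 4 to 7 days share bin 2.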
Reverse-engineer a 60-year-old travel reimbursement system using only historical data and employee interviews.
ACME Corp's legacy reimbursement system has been running for 60 years. No one knows how it works, but it's still used daily.
8090 has built them a new system, but ACME Corp is confused by the differences in results. Your mission is to figure out the original business logic so we can explain why ours is different and better.
Your job: create a perfect replica of the legacy system by reverse-engineering its behavior from 1,000 historical input/output examples and employee interviews.
The system takes three inputs:
- trip_duration_days - Number of days spent traveling (integer)
- miles_traveled - Total miles traveled (integer)
- total_receipts_amount - Total dollar amount of receipts (float)
- A PRD (Product Requirements Document)
- Employee interviews with system hints
- Output: a single numeric reimbursement amount (float, rounded to 2 decimal places)
- public_cases.json - 1,000 historical input/output examples
- Analyze the data:
  - Look at public_cases.json to understand patterns
  - Look at PRD.md to understand the business problem
  - Look at INTERVIEWS.md to understand the business logic
- Create your implementation:
  - Copy run.sh.template to run.sh
  - Implement your calculation logic
  - Make sure it outputs just the reimbursement amount
- Test your solution:
  - Run ./eval.sh to see how you're doing
  - Use the feedback to improve your algorithm
- Submit:
  - Run ./generate_results.sh to get your final results.
  - Add arjun-krishna1 to your repo.
  - Complete the submission form.
Your run.sh script must:
- Take exactly 3 parameters: trip_duration_days, miles_traveled, total_receipts_amount
- Output a single number (the reimbursement amount)
- Run in under 5 seconds per test case
- Work without external dependencies (no network calls, databases, etc.)
Example:
./run.sh 5 250 150.75
# Should output something like: 487.25
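If run.sh hands off to Python, the entry point might look like the sketch below. Everything here is illustrative: `calculate_reimbursement` is a hypothetical name and its body is a placeholder, not the legacy formula. The only behavior taken from the requirements is three positional arguments in and one number out.

```python
import sys

def calculate_reimbursement(days: int, miles: float, receipts: float) -> float:
    # Placeholder logic only (NOT the legacy formula); swap in your own model here
    return 100.0 * days + 0.5 * miles + 0.5 * receipts

def main(argv: list) -> str:
    """Parse the three arguments and return the amount formatted to 2 decimals."""
    days = int(argv[0])
    miles = float(argv[1])      # float() also accepts integer strings
    receipts = float(argv[2])
    return f"{calculate_reimbursement(days, miles, receipts):.2f}"

if __name__ == "__main__":
    if len(sys.argv) == 4:
        # Print only the number so the evaluator can parse the output
        print(main(sys.argv[1:]))
    else:
        print("usage: <script> <trip_duration_days> <miles_traveled> <total_receipts_amount>",
              file=sys.stderr)
```

Keeping diagnostics on stderr leaves stdout clean, which matters because the output is expected to be just the reimbursement amount.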
Run ./eval.sh to test your solution against all 1,000 cases. The script will show:
- Exact matches: Cases within ±$0.01 of the expected output
- Close matches: Cases within ±$1.00 of the expected output
- Average error: Mean absolute difference from expected outputs
- Score: Lower is better (combines accuracy and precision)
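To track the first three numbers locally without shelling out to eval.sh, something like this could work. It is a sketch based only on the definitions above; the score formula is not spelled out in this text, so it is left out. The small tolerance added to each threshold is an assumption to absorb float rounding.

```python
def summarize(predictions: list, expected: list) -> dict:
    """Exact matches, close matches, and mean absolute error."""
    errors = [abs(p - e) for p, e in zip(predictions, expected)]
    tol = 1e-9  # assumption: absorb float rounding right at the boundaries
    return {
        "exact_matches": sum(err <= 0.01 + tol for err in errors),
        # Note: every exact match also counts as a close match here
        "close_matches": sum(err <= 1.00 + tol for err in errors),
        "average_error": sum(errors) / len(errors) if errors else 0.0,
    }
```

For example, predictions of 100.0, 200.5, and 310.0 against expected values of 100.0, 200.0, and 300.0 give one exact match, two close matches, and an average error of 3.5.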
Your submission will be tested against private_cases.json, which does not include the outputs.
When you're ready to submit:
- Push your solution to a GitHub repository
- Add arjun-krishna1 to your repository
- Submit via the submission form.
- When you submit the form you will submit your private_results.txt, which will be used for your final score.
Good luck and Bon Voyage!