Rewrote Contributing guide

crmne · crmne · commit 3c401bec4a04 · 2025-05-21T17:25:41.000+02:00
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -1,177 +1,108 @@
 # Contributing to RubyLLM
 
-First off, thank you for considering contributing to RubyLLM! It's people like you that make RubyLLM such a great tool.
+Thank you for considering contributing to RubyLLM! We're aiming to build a high-quality, robust library, and thoughtful contributions are welcome.
+
+## Development Setup and Workflow
+
+Getting started and contributing follow a typical GitHub-based workflow:
+
+1.  **Fork & Clone**: Fork the repository to your own GitHub account and then clone it locally.
+    ```bash
+    gh repo fork crmne/ruby_llm --clone
+    cd ruby_llm
+    ```
+2.  **Install Dependencies**:
+    ```bash
+    bundle install
+    ```
+3.  **Set Up Git Hooks**: Required.
+    ```bash
+    overcommit --install
+    ```
+4.  **Branch**: Create a new branch for your feature or bugfix. If it relates to an existing issue, you can use the `gh` CLI to help:
+    ```bash
+    gh issue develop 123 --checkout # Substitute 123 with the relevant issue number
+    ```
+5.  **Code & Test**: Make your changes and ensure they are well-tested. (See the "Running Tests" section for more details.)
+6.  **Commit**: Write clear and concise commit messages.
+7.  **Pull Request**: Create a Pull Request (PR) against the `main` branch of the `crmne/ruby_llm` repository.
+    * **Thoroughly review your own PR before submitting.** Check for any "vibe coding" – unnecessary files, experimental code that doesn't belong, or incomplete work.
+    * Write a **clear and detailed PR description** explaining the "what" and "why" of your changes. Link to any relevant issues.
+    * Poorly written or vibe-coded PRs with minimal descriptions will likely be closed or receive extensive review comments, slowing things down for everyone. Follow the existing conventions of RubyLLM. Aim for quality.
+    ```bash
+    gh pr create --web
+    ```
+
+## Model Registry (`models.json`) & Aliases (`aliases.json`)
+
+These files are critical for how RubyLLM identifies and uses AI models. **Both are auto-generated by rake tasks. Do not edit them manually or include manual changes to them in PRs.**
+
+### `models.json`: The Model Catalog
+
+* **How it's made**: The `rake models:update` task builds this file. It fetches model data directly from configured provider APIs (processing these details via each provider's `capabilities.rb` file) and also from the [Parsera LLM Specs API](https://api.parsera.org/v1/llm-specs). These lists are then merged, with Parsera's standardized data generally taking precedence for common models, augmented by provider-specific metadata. Models unique to a provider's API (and not yet in Parsera) are also included.
+* **Updating Model Information**:
+    * **Incorrect public specs (pricing, context size, etc.)?** Parsera scrapes public provider documentation. If data for a publicly documented model is wrong or missing on Parsera, please [file an issue with Parsera](https://github.com/parsera-labs/api-llm-specs/issues). Once they update, `rake models:update` will fetch the corrections.
+    * **Models not in public docs / Provider-specifics**: If a model isn't well-documented publicly by the provider (e.g., older or preview models) or needs provider-specific handling within RubyLLM, update the relevant `lib/ruby_llm/providers/<provider>/capabilities.rb` and potentially `models.rb`. Then run `bundle exec rake models:update`.
+    * **New Provider Support**: This involves more in-depth work to create the provider-specific modules and ensure integration with the `models:update` task.
+
+### `aliases.json`: User-Friendly Shortcuts
+
+* **Purpose**: Maps common names (e.g., `claude-3-5-sonnet`) to precise, versioned model IDs.
+* **How it's made**: Generated by `rake aliases:generate` using the current `models.json`. Run this task *after* `models.json` is updated.
 
-## Development Setup
+## Running Tests
 
-Here's how to get started:
+Tests are crucial. We use RSpec and VCR.
 
 ```bash
-# Clone the repository
-gh repo clone crmne/ruby_llm
-cd ruby_llm
-
-# Install dependencies
-bundle install
-
-# Set up git hooks
-overcommit --install
-
-# Run the tests (uses VCR cassettes)
+# Run all tests (uses existing VCR cassettes)
 bundle exec rspec
-```
-
-## Development Workflow
-
-We recommend using GitHub CLI to simplify the workflow:
-
-```bash
-# Create a new branch for your feature
-gh repo fork crmne/ruby_llm --clone
-cd ruby_llm
-
-# Find or make an issue for the feature on GitHub and then:
-gh issue develop 123 --checkout  # Substitute 123 with the issue number
-
-# Make your changes and test them
-# ...
-
-# Commit your changes
-git commit
-
-# Create a PR
-gh pr create --web
-```
 
-## Model Naming Convention & Provider Strategy
-
-When adding new providers to RubyLLM, please follow these guidelines:
-
-### Normalized Model IDs
-
-We use a consistent approach separating **what** (model) from **where** (provider):
-
-```ruby
-# Default way (from the native provider)
-chat = RubyLLM.chat(model: "claude-3-5-sonnet")
-
-# Same model via different provider
-chat = RubyLLM.chat(model: "claude-3-5-sonnet", provider: :bedrock)
-```
-
-### Implementing a Provider
-
-If you're adding a new provider:
-
-1. **Use normalized model IDs** - Don't include provider prefixes in the model ID itself
-2. **Add provider mapping** - Map the normalized IDs to your provider's specific format internally
-3. **Preserve capabilities** - Ensure models accessed through your provider report the same capabilities as their native counterparts
-4. **Update models.json** - Include your provider's models in models.json
-5. **Update aliases.json** - Add entries to aliases.json for models accessible through your provider
-6. **Implement refresh mechanism** - Ensure your provider supports the `list_models` method for refreshing
-
-### Model Registry (`models.json`)
-
-The `models.json` file is autogenerated. PRs shouldn't change the file manually, instead either refresh it with `rake models:update` or change the relevant `capabilities.rb` files and then refresh it.
-
-### Model Aliases
+# Run a specific test file
+bundle exec rspec spec/ruby_llm/chat_spec.rb
 
-For providers that use complex model identifiers (like Bedrock's `anthropic.claude-3-5-sonnet-20241022-v2:0:200k`), add mappings to the global aliases.json file:
+# To re-record a specific test's cassette, first remove its .yml file:
+rm spec/fixtures/vcr_cassettes/chat_vision_models_*_can_understand_local_images.yml # Adjust file name as needed
+# Then run the specific test or test file that uses this cassette.
 
-```json
-{
-  "claude-3-5-sonnet": {
-    "anthropic": "claude-3-5-sonnet-20241022",
-    "bedrock": "anthropic.claude-3-5-sonnet-20241022-v2:0:200k",
-    "openrouter": "anthropic/claude-3.5-sonnet"
-  },
-  "gpt-4o": {
-    "openai": "gpt-4o-2024-05-13",
-    "bedrock": "anthropic.gpt-4o-2024-05-13",
-    "openrouter": "openai/gpt-4o"
-  }
-}
+# Run a specific test by its description string (or part of it)
+bundle exec rspec -e "can understand local images"
 ```
 
-If a model can't be found with the provided ID and provider, a `ModelNotFoundError` will be raised with an informative message. Your implementation should make this error helpful by suggesting available alternatives.
-
-When the same model has multiple versions and context windows e.g.
+### Testing Philosophy & VCR
 
-```
-anthropic.claude-3-5-sonnet-20240620-v1:0
-anthropic.claude-3-5-sonnet-20240620-v1:0:18k
-anthropic.claude-3-5-sonnet-20240620-v1:0:200k
-anthropic.claude-3-5-sonnet-20240620-v1:0:51k
-anthropic.claude-3-5-sonnet-20241022-v2:0
-anthropic.claude-3-5-sonnet-20241022-v2:0:18k
-anthropic.claude-3-5-sonnet-20241022-v2:0:200k
-anthropic.claude-3-5-sonnet-20241022-v2:0:51k
-```
+* New tests should generally be **end-to-end** to verify integration with actual provider APIs (via VCR).
+* Keep tests **minimal and focused**. We don't need to test every single model variant for every feature if the underlying API mechanism is the same. One or two representative models per provider for a given feature are usually sufficient.
+* **API Call Costs**: VCR cassettes are used to avoid hitting live APIs on every test run. However, recording these cassettes costs real money for API calls. Please be mindful of this when adding tests that would require new recordings. If you're adding extensive tests that significantly increase API usage for VCR recording, consider [sponsoring the project on GitHub](https://github.com/sponsors/crmne) to help offset these costs.
 
-We default all aliases to the biggest context window, and the main alias (without date) to the latest version:
-
-```json
-  "claude-3-5-sonnet": {
-    "anthropic": "claude-3-5-sonnet-20241022",
-    "bedrock": "anthropic.claude-3-5-sonnet-20241022-v2:0:200k",
-    "openrouter": "anthropic/claude-3.5-sonnet"
-  },
-  "claude-3-5-sonnet-20241022": {
-    "anthropic": "claude-3-5-sonnet-20241022",
-    "bedrock": "anthropic.claude-3-5-sonnet-20241022-v2:0:200k",
-    "openrouter": "anthropic/claude-3.5-sonnet"
-  },
-  "claude-3-5-sonnet-20240620": {
-    "anthropic": "claude-3-5-sonnet-20240620",
-    "bedrock": "anthropic.claude-3-5-sonnet-20240620-v1:0:200k"
-  },
-```
+### Recording VCR Cassettes
 
-## Running Tests
+If your changes affect API interactions, you'll need to re-record the VCR cassettes.
 
-Tests automatically use VCR to record and replay HTTP interactions, so you don't need real API keys for testing:
+To re-record cassettes for specific providers (e.g., OpenAI and Anthropic):
 
 ```bash
-# Run all tests (using existing VCR cassettes)
-bundle exec rspec
+# Set necessary API keys as environment variables
+export OPENAI_API_KEY="your_openai_key"
+export ANTHROPIC_API_KEY="your_anthropic_key"
 
-# Run a specific test file
-bundle exec rspec spec/ruby_llm/chat_spec.rb
+# Run the rake task, specifying providers
+bundle exec rake vcr:record[openai,anthropic]
 ```
 
-### Recording VCR Cassettes
-
-When you make changes that affect API interactions, you can record new VCR cassettes.
-
-If you have keys for all providers:
+To re-record all cassettes (requires all relevant API keys to be set):
 
 ```bash
-# Re-record all cassettes
 bundle exec rake vcr:record[all]
 ```
 
-If you only have keys for specific providers (e.g., just OpenAI):
+The rake task will delete the relevant existing cassettes and re-run the tests to record fresh interactions.
 
-```bash
-# Set the API keys you have
-export OPENAI_API_KEY=your_openai_key
-
-# Find and remove only cassettes for OpenAI, then run tests to re-record them
-bundle exec rake vcr:record[openai]
-
-# You can also specify multiple providers
-bundle exec rake vcr:record[openai,anthropic]
-```
-
-Important: After recording new cassettes, please **manually check** them for any sensitive information that might have been missed by the automatic filters.
-
-## Adding New Tests
-
-Tests automatically create VCR cassettes based on their descriptions, so make sure your test descriptions are unique and descriptive.
+**CRITICAL**: After recording new or updated VCR cassettes, **manually inspect the YAML files in `spec/fixtures/vcr_cassettes/`**. Ensure that no sensitive information (API keys, personal data, etc.) has accidentally been recorded. The VCR configuration has filters for common keys, but diligence is required.
 
 ## Coding Style
 
-We follow the [Standard Ruby](https://github.com/testdouble/standard) style. Please ensure your contributions adhere to this style.
+We follow the [Standard Ruby](https://github.com/testdouble/standard) style guide.
 
 ```bash
 # Check your code style
@@ -181,27 +112,27 @@ bundle exec rubocop
 bundle exec rubocop -A
 ```
 
-## Documentation
-
-When adding new features, please include documentation updates:
+The Overcommit pre-commit hook should help enforce this.
 
-- Update relevant guides in the `docs/guides/` directory
-- Add inline documentation using YARD comments
-- Keep the README clean and focused on helping new users get started quickly
+## Documentation
 
-## Discussions and Issues
+If you add new features or change existing behavior, please update the documentation:
 
-- For questions and discussions, please use [GitHub Discussions](https://github.com/crmne/ruby_llm/discussions)
-- For bugs and feature requests, please use [GitHub Issues](https://github.com/crmne/ruby_llm/issues)
+* Update relevant guides in the `docs/guides/` directory.
+* Ensure the `README.md` remains a concise and helpful entry point for new users.
 
 ## Release Process
 
-Gem versioning follows [Semantic Versioning](https://semver.org/):
+Gem versioning follows [Semantic Versioning (SemVer)](https://semver.org/):
 
-1. MAJOR version for incompatible API changes
-2. MINOR version for backwards-compatible functionality
-3. PATCH version for backwards-compatible bug fixes
+1.  **MAJOR** version for incompatible API changes.
+2.  **MINOR** version for adding functionality in a backward-compatible manner.
+3.  **PATCH** version for backward-compatible bug fixes.
 
 Releases are handled by the maintainers through the CI/CD pipeline.
 
-Thanks for helping make RubyLLM better!
+---
+
+Thanks for contributing to RubyLLM,
+
+Carmine