A machine learning service to classify inbound user messages for MomConnect. This service identifies user intents such as service feedback and sensitive topics like baby loss, enabling the platform to provide appropriate and timely responses.
This classifier uses a two-stage process combined with endpoint-specific logic to interpret user messages accurately. The design ensures that broad categories for sensitive topics are handled robustly by a data-driven model, while solicited feedback is analyzed directly for sentiment.
General Classification (e.g., for /nlu/babyloss/)
- Broad Understanding (Machine Learning): First, the user's message is converted into a numerical representation (an "embedding") using a SentenceTransformer model (`BAAI/bge-m3`). A trained machine learning model (`clf_parent.pkl`) then reads this embedding to classify the message into one of four broad parent categories: `FEEDBACK`, `SENSITIVE_EXIT`, `NOISE_SPAM`, or `OTHER`. Classification confidence is determined by tuned thresholds.
- Specific Details (Enrichment): Once the main category is known, the system determines the specific sub-intent.
  - If the parent model predicts `SENSITIVE_EXIT` with sufficient confidence, a second, specialized machine learning model (`clf_sensitive_exit.pkl`) determines whether it is about `BABY_LOSS` or an `OPTOUT` request.
  - If the parent model predicts `FEEDBACK`, the general classifier also uses a sentiment model, but this is secondary to the dedicated feedback endpoint's logic.
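To make the two-stage flow concrete, here is a minimal, hypothetical sketch. The artifact names come from this README, but the loading code, the `classify` helper, and the assumed per-intent layout of `thresholds.json` are illustrative only and may not match the repository's actual modules.

```python
# Hypothetical sketch of the two-stage flow described above; actual module,
# function, and artifact-loading details in this repo may differ.
import json

import joblib
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("BAAI/bge-m3")
clf_parent = joblib.load("src/artifacts/clf_parent.pkl")
clf_sensitive = joblib.load("src/artifacts/clf_sensitive_exit.pkl")
with open("src/artifacts/thresholds.json") as f:
    thresholds = json.load(f)  # assumed: {parent_label: confidence_threshold, ...}

def classify(message: str) -> dict:
    """Stage 1: parent intent from the embedding. Stage 2: sub-intent enrichment."""
    emb = embedder.encode([message])
    probs = clf_parent.predict_proba(emb)[0]
    parent = clf_parent.classes_[probs.argmax()]
    confidence = float(probs.max())

    result = {"parent_label": parent, "probability": confidence, "sub_intent": None}
    # Only enrich when the parent prediction clears its tuned threshold.
    if parent == "SENSITIVE_EXIT" and confidence >= thresholds.get(parent, 0.5):
        result["sub_intent"] = clf_sensitive.predict(emb)[0]  # BABY_LOSS or OPTOUT
    return result
```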
Dedicated Feedback Analysis (for /nlu/feedback/)
This endpoint assumes the chatbot has already solicited feedback and bypasses the parent classification model entirely.
- Direct Sentiment Analysis: The user's message is fed directly into a pre-trained multilingual sentiment analysis model (`cardiffnlp/twitter-xlm-roberta-base-sentiment`).
- Sentiment Mapping: The sentiment model's output (`positive`, `negative`, `neutral`) is mapped to `COMPLIMENT`, `COMPLAINT`, or `None` respectively.
- Review Logic: Predictions are flagged as `NEEDS_REVIEW` if the sentiment model output is `neutral` OR if the model's confidence score falls below a data-driven threshold (`sentiment_review_band`) tuned specifically for this sentiment model.
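A minimal sketch of how that mapping and review logic might look, assuming the Hugging Face `transformers` pipeline API; the `review_band` default is a placeholder, since the real value is read from `thresholds.json`, and the actual service code may be structured differently.

```python
# Illustrative sketch of the feedback mapping described above; the real
# service code and the exact sentiment_review_band value will differ.
from transformers import pipeline

sentiment = pipeline(
    "sentiment-analysis",
    model="cardiffnlp/twitter-xlm-roberta-base-sentiment",
)

SENTIMENT_TO_INTENT = {"positive": "COMPLIMENT", "negative": "COMPLAINT", "neutral": None}

def analyse_feedback(message: str, review_band: float = 0.7) -> dict:
    out = sentiment(message)[0]  # e.g. {"label": "positive", "score": 0.99}
    label, score = out["label"].lower(), float(out["score"])
    # Flag for human review on neutral sentiment or low confidence.
    needs_review = label == "neutral" or score < review_band
    return {
        "intent": SENTIMENT_TO_INTENT.get(label),
        "sentiment_label": label,
        "probability": score,
        "review_status": "NEEDS_REVIEW" if needs_review else "CLASSIFIED",
    }
```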
This section is for anyone involved in managing the data, training the model, or evaluating its performance.
The primary source of truth for this model lives in the consolidated YAML files within the `src/mapped_data/` directory, for example:

- `src/mapped_data/nlu.yaml`
- `src/mapped_data/validation.yaml`
- `src/mapped_data/test.yaml`
All new examples and changes should be made directly to these files. The original legacy files in src/data/ are kept for historical purposes and are only used for migration.
This is the standard, safe workflow for improving the model with new data.
- Edit the Mapped YAML Files: Manually add or modify examples in the YAML files under the `mapped_data` folder. You will work directly with the four parent intents (`FEEDBACK`, `SENSITIVE_EXIT`, etc.) and their sub-intents.
- Build the JSONL Files: After saving your YAML changes, run the following command:

  ```bash
  make build-jsonl
  ```

  This safe command reads your updated YAML files and generates the corresponding `.jsonl` files that the model training scripts consume. It will never overwrite your YAML files. (A hedged sketch of this conversion appears after this workflow.)
- Train and Evaluate: Once the JSONL files are up-to-date, you can run the full pipeline or individual steps as needed.

  ```bash
  # Run the entire process: build, train, tune, and evaluate
  make all

  # Or run individual steps
  make train
  make tune-thresholds
  ```
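The exact schema of the mapped YAML files is defined by the repository's build script and is not documented here. Purely as an illustration, `make build-jsonl` conceptually performs a conversion along these lines; the `intents`/`examples` layout and field names below are hypothetical.

```python
# Hypothetical illustration only: the real schema of src/mapped_data/*.yaml and
# the real build script may differ. This shows the general YAML -> JSONL idea.
import json

import yaml

def yaml_to_jsonl(yaml_path: str, jsonl_path: str) -> None:
    with open(yaml_path) as f:
        data = yaml.safe_load(f)
    with open(jsonl_path, "w") as out:
        # Assumed layout: a list of {intent, sub_intent, examples} entries.
        for entry in data.get("intents", []):
            for text in entry.get("examples", []):
                record = {
                    "text": text,
                    "parent_label": entry["intent"],
                    "sub_intent": entry.get("sub_intent"),
                }
                out.write(json.dumps(record, ensure_ascii=False) + "\n")
```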
CAUTION: This is a destructive operation. Only run this command if you want to discard all manual changes in `src/mapped_data` and regenerate them from the original legacy data in `src/data`.
To perform a full migration from the legacy files, run:
```bash
make migrate-legacy
```

This is a two-step process to find the optimal confidence thresholds for the parent model and the sentiment model, and then get an unbiased measure of the final system's performance.
- Tune Thresholds on the Validation Set

  Run the evaluation script in "tune" mode. This uses the `samples.validation.jsonl` dataset to find the best confidence threshold for each parent intent category and a separate `sentiment_review_band` threshold for the sentiment model.

  ```bash
  make tune-thresholds
  ```

  This generates the `src/artifacts/thresholds.json` file, which contains both parent thresholds and the sentiment review band. This file is required for the model to run. It also produces a detailed text report and performance plots in the `src/evaluations/` directory. (The general idea behind this tuning is sketched after this list.)

- Evaluate Final Performance on the Test Set

  Once the models are trained and the thresholds are tuned, run the final performance report. This uses the hold-out `samples.test.jsonl` dataset to provide an unbiased measure of how the model will perform on new, unseen data.

  ```bash
  make evaluate
  ```

  The output of this command is the definitive performance report for the model version. It evaluates the parent model performance and the sub-intent performance (simulating the separate logic used by each API endpoint).
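To make the tuning step concrete, here is a minimal, hypothetical sketch of picking a per-intent confidence cutoff on the validation set. The choice of F1 as the selection metric, the candidate grid, and the array layout are assumptions for illustration; the repository's actual tuning script may work differently.

```python
# Hypothetical sketch of per-intent threshold tuning; the real script in this
# repo may use different metrics, candidate grids, and file layouts.
import numpy as np
from sklearn.metrics import f1_score

def tune_parent_thresholds(y_true, probas, classes, grid=np.arange(0.3, 0.96, 0.05)):
    """For each class, pick the confidence cutoff that maximises F1 on validation.

    y_true:  array of gold parent labels
    probas:  (n_samples, n_classes) predicted probabilities from the parent model
    classes: label order matching the columns of `probas`
    """
    thresholds = {}
    for i, label in enumerate(classes):
        gold = np.asarray(y_true) == label
        best_t, best_f1 = 0.5, -1.0
        for t in grid:
            pred = probas[:, i] >= t  # accept the class only above the cutoff
            f1 = f1_score(gold, pred, zero_division=0)
            if f1 > best_f1:
                best_t, best_f1 = float(t), f1
        thresholds[label] = best_t
    return thresholds
```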
This section is for engineers responsible for deploying and integrating the service, and for QA who need to test the API.
Setup and Installation
- Install dependencies:

  ```bash
  make install  # Which runs: poetry install
  ```

- Activate the virtual environment:

  ```bash
  poetry shell
  ```
The application is a standard Flask service. Do not use the built-in Flask development server for production.
- For production or staging, use a WSGI server like Gunicorn:

  ```bash
  gunicorn --workers 2 --bind 0.0.0.0:5001 src.application:app
  ```

- For local development, you can use the Flask dev server:

  ```bash
  poetry run flask --app src.application run
  ```

You will need to set the `NLU_USERNAME` and `NLU_PASSWORD` environment variables for authentication.
The service provides two specific, authenticated endpoints. Authentication is handled via HTTP Basic Auth.
- Baby Loss Detection (`/nlu/babyloss/`)

  This endpoint analyzes a message using the full classification pipeline to determine if it relates to baby loss. It is optimized for high recall on the `SENSITIVE_EXIT` parent intent to ensure sensitive cases are not missed.

  - Request: `GET /nlu/babyloss/`
  - Query Parameters:
    - `question` (string, required): The user's message text.
  - Responses:
    - `200 OK`: Returns whether baby loss was detected, along with model details. The `babyloss` key is `true` only if the parent intent is `SENSITIVE_EXIT` AND the sub-intent is `BABY_LOSS`.

      ```json
      {
        "babyloss": true,
        "model_version": "2025-10-28-v1",
        "parent_label": "SENSITIVE_EXIT",
        "sub_intent": "BABY_LOSS",
        "probability": 0.98,
        "review_status": "CLASSIFIED"
      }
      ```

    - `400 Bad Request`: If the `question` parameter is missing.
    - `401 Unauthorized`: If authentication fails.
    - `503 Service Unavailable`: If the classifier failed to load.
- Feedback Analysis (`/nlu/feedback/`)

  This endpoint analyzes a message assuming it is solicited feedback. It bypasses the parent model and directly uses the sentiment model to determine if it is a `COMPLIMENT`, `COMPLAINT`, or `None` (for neutral sentiment).

  - Request: `GET /nlu/feedback/`
  - Query Parameters:
    - `question` (string, required): The user's message text.
  - Responses:
    - `200 OK`: Returns the detected sentiment intent (`COMPLIMENT`/`COMPLAINT`/`None`), along with model details. `parent_label` will always be `"FEEDBACK"` for this endpoint. `review_status` becomes `NEEDS_REVIEW` if sentiment is `neutral` or confidence is below `sentiment_review_band`.

      ```json
      {
        "intent": "COMPLIMENT",
        "model_version": "2025-10-28-v1",
        "parent_label": "FEEDBACK",
        "probability": 0.99,
        "review_status": "CLASSIFIED",
        "sentiment_label": "positive"
      }
      ```

      ```json
      {
        "intent": "None",
        "model_version": "2025-10-28-v1",
        "parent_label": "FEEDBACK",
        "probability": 0.65,
        "review_status": "NEEDS_REVIEW",
        "sentiment_label": "neutral"
      }
      ```

    - `400 Bad Request`: If the `question` parameter is missing.
    - `401 Unauthorized`: If authentication fails.
    - `503 Service Unavailable`: If the classifier failed to load.
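For integration or QA testing, both endpoints can be exercised with any HTTP client that supports Basic Auth. The sketch below uses Python's `requests` library; the base URL and example messages are placeholders, and credentials are assumed to come from the same `NLU_USERNAME`/`NLU_PASSWORD` environment variables configured for the service.

```python
# Example QA calls; base URL and example messages are placeholders.
import os

import requests

AUTH = (os.environ["NLU_USERNAME"], os.environ["NLU_PASSWORD"])
BASE_URL = "http://localhost:5001"  # matches the gunicorn bind shown earlier

# Baby loss detection (full classification pipeline)
resp = requests.get(
    f"{BASE_URL}/nlu/babyloss/",
    params={"question": "I lost my baby last week"},
    auth=AUTH,
)
print(resp.status_code, resp.json())

# Feedback analysis (sentiment model only)
resp = requests.get(
    f"{BASE_URL}/nlu/feedback/",
    params={"question": "Thank you, the service was very helpful"},
    auth=AUTH,
)
print(resp.status_code, resp.json())
```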
Run these commands to ensure code quality and correctness.
- Run Unit Tests:

  ```bash
  make test  # Which runs: poetry run pytest -vv
  ```

- Run Static Type Checking:

  ```bash
  make typecheck  # Which runs: poetry run mypy .
  ```

- Run Linter/Formatter:

  ```bash
  make lint  # Which runs: poetry run ruff check --fix . && poetry run ruff format .
  ```
For production, the Dockerfile should be configured to run the application using the Gunicorn command. The number of workers (e.g., `--workers 4`) should be adjusted based on the resources of the environment. The `src/artifacts` directory, which contains the trained model, must be included in the final container image.
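One way to keep these settings adjustable without editing the Dockerfile command line is a Gunicorn configuration file. The values below are placeholders to tune per environment, not the project's actual deployment settings; the module path matches the command shown earlier.

```python
# gunicorn.conf.py -- illustrative only; worker count and timeout are
# placeholders that should be tuned to the deployment environment.
import multiprocessing

bind = "0.0.0.0:5001"
workers = min(4, multiprocessing.cpu_count())  # adjust to available CPU and RAM
timeout = 120       # allow time for transformer-based inference
preload_app = True  # load model artifacts once, before forking workers
```

The container's start command would then be `gunicorn -c gunicorn.conf.py src.application:app`.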