Date and Time: 2025-09-16 12:30p ET

Observability best practices in AI applications -- Traces and OpenTelemetry (OTel)
- Measure numerical latency PLUS traces when using LLMs to evaluate other models
- Maybe LangSmith? Patterns like one master trace for all agents with subtraces per agent or workflow.
- Perhaps this is so new that it is all experimental.
- Tacoma SciPy keynote emphasized Observability, which requires evaluating prompts. E.g., connect some test data to the response output and evaluate the model at a point in time on some dimensions.
- LLMs are non-deterministic, so use a different LLM to evaluate test output and avoid confirmation bias from the model itself.
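
A minimal sketch of the ideas above: one parent trace for an evaluation run, a subtrace per test case and per agent, latency recorded on every span, and a separate judge scoring the candidate's output. Everything here is an assumption for illustration: `Tracer`, `Span`, `candidate_model`, `judge_model`, and the test cases are hypothetical stand-ins written with the standard library only, not the OpenTelemetry or LangSmith API. In a real setup an OpenTelemetry tracer (for example `start_as_current_span` in the Python API) would likely take the place of the `Tracer` class.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional


@dataclass
class Span:
    """One timed unit of work (hypothetical stand-in for a real tracing span)."""
    name: str
    attributes: Dict[str, object] = field(default_factory=dict)
    children: List["Span"] = field(default_factory=list)
    duration_ms: float = 0.0


class Tracer:
    """Tracks the active span so nested span() calls become subtraces."""

    def __init__(self) -> None:
        self.current: Optional[Span] = None
        self.roots: List[Span] = []

    @contextmanager
    def span(self, name: str, **attributes: object) -> Iterator[Span]:
        s = Span(name=name, attributes=dict(attributes))
        # Attach to the active span, or register as a new root trace.
        (self.current.children if self.current else self.roots).append(s)
        previous, self.current = self.current, s
        start = time.perf_counter()
        try:
            yield s
        finally:
            s.duration_ms = (time.perf_counter() - start) * 1000.0
            self.current = previous


def candidate_model(prompt: str) -> str:
    """Placeholder for the model under test."""
    return f"echo: {prompt}"


def judge_model(prompt: str, response: str) -> float:
    """Placeholder for a *different* LLM that scores the response."""
    return 1.0 if prompt in response else 0.0


# Hypothetical test data tied to each response.
TEST_CASES = [{"id": "t1", "prompt": "ping"}, {"id": "t2", "prompt": "hello"}]

tracer = Tracer()
with tracer.span("eval-run") as root:  # one parent trace for the whole run
    for case in TEST_CASES:
        with tracer.span(f"case:{case['id']}", prompt=case["prompt"]):
            with tracer.span("agent:candidate") as gen:
                response = candidate_model(case["prompt"])
                gen.attributes["response"] = response
            with tracer.span("agent:judge") as judge:
                judge.attributes["score"] = judge_model(case["prompt"], response)


def render(span: Span, depth: int = 0) -> List[str]:
    """Flatten the span tree into indented lines with latency."""
    lines = [f"{'  ' * depth}{span.name} ({span.duration_ms:.3f} ms)"]
    for child in span.children:
        lines.extend(render(child, depth + 1))
    return lines


print("\n".join(render(root)))
```

Keeping the judge in its own span means its latency and score can be inspected separately from the candidate's, which is the point of pairing numeric latency with traces.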

Alert inventory: coverage and adding new alerts? B2B + infrastructure
- How do others keep track of alert coverage? Does anyone have a matrix of everything that needs to be alerted on?
- How do you surface new alerts vs. the many production exceptions that are not prioritized to be fixed?
- Tracecat: open-source workflow tool for alerts (similar to Tines). Pivot from security to SRE.
- Suggestion: make alerts usable and context-sensitive to each app's business needs.
- Challenge: "obviousness" is hindsight bias. Conduct a recurring operational review of recent alerts to understand brittle system areas. [Paige Cruz conference talk: SREcon23 Americas Alert Triage Hour of Power](https://www.youtube.com/watch?v=c8uRsQPeg_g)
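
One way the coverage matrix asked about above could be sketched: list services and signals, mark which (service, signal) pairs already have an alert, and report the gaps. The service names, signal names, and existing alerts below are made up for illustration; a real inventory would be exported from whatever alerting tool is in use.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical inventory; replace with your own services and signals.
SERVICES = ["checkout-api", "billing-worker", "postgres"]
SIGNALS = ["latency", "errors", "saturation"]

# Hypothetical set of alerts that exist today, as (service, signal) pairs.
EXISTING_ALERTS: Set[Tuple[str, str]] = {
    ("checkout-api", "latency"),
    ("checkout-api", "errors"),
    ("billing-worker", "errors"),
    ("postgres", "saturation"),
}

Matrix = Dict[str, Dict[str, bool]]


def coverage_matrix(services: List[str], signals: List[str],
                    alerts: Set[Tuple[str, str]]) -> Matrix:
    """Map each service to {signal: True if an alert exists}."""
    return {svc: {sig: (svc, sig) in alerts for sig in signals}
            for svc in services}


def gaps(matrix: Matrix) -> List[Tuple[str, str]]:
    """Return (service, signal) pairs with no alert, in matrix order."""
    return [(svc, sig)
            for svc, row in matrix.items()
            for sig, covered in row.items() if not covered]


def render(matrix: Matrix, signals: List[str]) -> str:
    """Plain-text table: 'x' means covered, '-' means missing."""
    width = max(len(svc) for svc in matrix)
    header = " " * width + "  " + "  ".join(signals)
    rows = [
        svc.ljust(width) + "  " + "  ".join(
            ("x" if row[sig] else "-").center(len(sig)) for sig in signals)
        for svc, row in matrix.items()
    ]
    return "\n".join([header] + rows)


matrix = coverage_matrix(SERVICES, SIGNALS, EXISTING_ALERTS)
print(render(matrix, SIGNALS))
print("Missing alerts:", gaps(matrix))
```

The gap list is a natural input for the recurring alert review: each missing pair is either a new alert to add or a deliberate decision not to alert.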

New Relic Integrations and Practices: metrics scraping, log collection/alerting
- How to turn metrics across containers and cloud into analyzable or actionable tooling?
- New Relic grew up in Rails and moved to Java. Datadog started in containerized microservices and is probably ahead in tracing. Market view: Datadog was the winner.
- David Woods thought the most interesting capability in the New Relic demo was not AI, but NRQL (its query language).
- Azure's KQL is similar for extracting business relevance.
- It took time for developers to parse the alerts.
- LogRocket has video playback that shows mobile app exceptions.
- Caveat emptor: beware impersonator Anthropologie apps!

AWS Managed Grafana in 2025
- Limited APM procurement options, so Grafana is easiest. Any user experience?
- The OTel piece is in the logging product, while Grafana is only the dashboard part.
- Is Prometheus still necessary for minimal AWS usage? Or can you get some data for free through EKS?