Analyze your logs in Python

Use the phospho Python package to run custom analytics jobs on your logs.

Setup

Install the package, then set your API key and project ID as environment variables.

pip install phospho pandas
export PHOSPHO_API_KEY=your_api_key
export PHOSPHO_PROJECT_ID=your_project_id
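If you prefer not to rely on environment variables, the credentials can usually be passed directly when initializing the client. A minimal sketch, assuming your version of phospho.init accepts api_key and project_id keyword arguments (check against your installed release):

import phospho

# Sketch: explicit credentials instead of environment variables.
# The api_key and project_id keyword arguments are assumed here;
# verify they exist in your installed phospho version.
phospho.init(
    api_key="your_api_key",
    project_id="your_project_id",
)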

Load logs as a DataFrame

The best way to analyze your logs is to load them into a pandas DataFrame. This format is compatible with most analytics libraries.

One row = one (task, event) pair

Phospho provides a tasks_df function to load the logs into a flattened DataFrame. Note that you need to have the pandas package installed to use this function.

import phospho 

phospho.init()
phospho.tasks_df(limit=1000) # Load the latest 1000 tasks

This will return a DataFrame where one row is one (task, event) pair.

Example:

| task_id | task_input | task_output | task_metadata | task_eval | task_eval_source | task_eval_at | task_created_at | session_id | session_length | event_name | event_created_at |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| b58aacc6102f4a5e9d2364202ce23bf2 | Some input | Some output | {'client_created_at': 1709925970, 'last_update... | success | owner | 2024-03-08 19:27:49 | 2024-03-09 15:09:31 | 71ee278ab2874666ae157c28a69c1679 | 2 | correction by user | 2024-03-08 19:27:43 |
| b58aacc6102f4a5e9d2364202ce23bf2 | Some input | Some output | {'client_created_at': 1709925970, 'last_update... | success | owner | 2024-03-08 19:27:49 | 2024-03-09 15:09:31 | 71ee278ab2874666ae157c28a69c1679 | 2 | user frustration indication | 2024-03-08 19:27:43 |
| b58aacc6102f4a5e9d2364202ce23bf2 | Some input | Some output | {'client_created_at': 1709925970, 'last_update... | success | owner | 2024-03-08 19:27:49 | 2024-03-09 15:09:31 | 71ee278ab2874666ae157c28a69c1679 | 2 | follow-up question | 2024-03-08 19:27:43 |

This means that:
- If a task has multiple events, there will be multiple rows with the same task_id and different event_name values.
- If a task has no events, it will have a single row with event_name set to None.
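Because events are exploded into rows, standard pandas operations cover most aggregations. For example, a minimal sketch counting event frequencies and the number of events attached to each task:

import phospho

phospho.init()
df = phospho.tasks_df(limit=1000)

# How often each event type fires across the loaded tasks
# (value_counts drops the None rows of tasks without events)
print(df["event_name"].value_counts())

# Number of events per task; tasks with no events count as 0
# because groupby().count() ignores None values
print(df.groupby("task_id")["event_name"].count().describe())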

One row = one task

If you want one row to be one task, pass the parameter with_events=False.

phospho.tasks_df(limit=1000, with_events=False)

Result:

| task_id | task_input | task_output | task_metadata | task_eval | task_eval_source | task_eval_at | task_created_at | session_id | session_length |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21f3b21e8646402d930f1a02159e942f | Some input | Some output | {'client_created_at': ... | failure | owner | 2024-03-08 19:53:59 | 2024-03-09 16:45:18 | a6b1b4224f874608b6037d41d582286a | 2 |
| 64382c6093b04a028a97a14131a4ab32 | Some input | Some output | {'client_created_at': ... | success | owner | 2024-03-08 19:27:48 | 2024-03-09 15:51:07 | 9d13562051a84d6c806d4e6f6a58fb37 | 1 |
| b58aacc6102f4a5e9d2364202ce23bf2 | Some input | Some output | {'client_created_at': ... | success | owner | 2024-03-08 19:27:49 | 2024-03-09 15:09:31 | 71ee278ab2874666ae157c28a69c1679 | 3 |
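With one row per task, per-task metrics become direct column operations. A sketch computing the overall success rate and the average session length from the columns shown above:

import phospho

phospho.init()
df = phospho.tasks_df(limit=1000, with_events=False)

# Share of tasks marked "success"
# (tasks without an eval count as not successful here)
success_rate = (df["task_eval"] == "success").mean()

# Average number of tasks per session
avg_session_length = df["session_length"].mean()

print(f"Success rate: {success_rate:.1%}")
print(f"Average session length: {avg_session_length:.1f}")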

Ignore session features

To ignore the session features, pass the parameter with_sessions=False.

phospho.tasks_df(limit=1000, with_sessions=False)

Run custom analytics jobs

To run custom analytics jobs, you can leverage all the power of the Python ecosystem.
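For instance, a simple keyword-based job can flag tasks whose output contains an apology, a rough proxy for the assistant failing to answer. A minimal sketch with pandas; the marker list is illustrative:

import phospho

phospho.init()
df = phospho.tasks_df(limit=1000, with_events=False)

# Illustrative markers; tune them to your own failure modes
apology_markers = "sorry|apologize|unfortunately"
df["contains_apology"] = (
    df["task_output"].str.lower().str.contains(apology_markers, na=False)
)

print(df["contains_apology"].value_counts())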

If you have many complex ML models to run and LLM calls to make, consider the phospho lab, which streamlines some of this work for you.

Set up the phospho lab to run custom analytics jobs on your logs

Update logs from a DataFrame

After running your analytics jobs, you might want to update the logs with the results.

You can use the push_tasks_df function to push the updated data back to Phospho. This overwrites the specified fields in the logs.

# Fetch the 3 latest tasks
tasks_df = phospho.tasks_df(limit=3)

Update columns

Make changes to the columns. Not all columns are updatable; this restriction prevents accidental data loss.

Here is the list of updatable columns:
- task_eval: Literal["success", "failure"]
- task_eval_source: str
- task_eval_at: datetime
- task_metadata: Dict[str, object] (Note: this overrides the whole metadata object, not just the specified keys)

If you need to update more fields, feel free to open an issue on the GitHub repository, submit a PR, or directly reach out.

# Make some changes
tasks_df["task_eval"] = "success"
tasks_df["task_metadata"] = tasks_df["task_metadata"].apply(
    # To avoid overriding the whole metadata object, use **x to unpack the existing metadata
    lambda x: {**x, "new_key": "new_value", "stuff": 44}
)

Push updated data

To push the updated data back to Phospho, use the push_tasks_df function.
- You need to pass the task_id.
- As a best practice, pass only the columns you want to update.

# Select only the columns you want to update
phospho.push_tasks_df(tasks_df[["task_id", "task_eval"]])

# To check that the data has been updated
phospho.tasks_df(limit=3)

You're all set. Your custom analytics are now also available in the Phospho UI.