This application takes AnythingLLM and a selection of abstracts and asks a local LLM (ideally Granite)
whether each abstract was written by an AI and/or reads like a sales pitch.
It produces a file called overview.csv with a confidence score of up to 100 indicating how likely each
abstract is to be AI-written or too "sales-y."
You can also feed it a CSV instead of reading from an API; for the time being either Pretalx or Sessionize is supported.
CSV notes
Take a look at test_data/testing.csv as an example. You run the tool via python main.py -c CSV_FILE; check python main.py -h for help.
NOTE: The file is comma-separated for the time being, so you'll need to remove all commas from the actual abstracts so they can be parsed correctly.
The columns the CSV needs are as follows (a minimal read example follows the list):
code
title
abstract
description
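For reference, here is a minimal Python sketch (not part of the project) that reads such a file with the standard library, assuming the four names above are the CSV headers:

```python
# Minimal sketch: iterate over the expected CSV columns.
import csv

with open("test_data/testing.csv", newline="", encoding="utf-8") as fh:
    for row in csv.DictReader(fh):
        # Each row is expected to expose these four fields:
        print(row["code"], row["title"])
        print(row["abstract"][:80], "...")
        print(row["description"][:80], "...")
```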
Configuration
Everything is configured in the config.toml file; copy it to
the working directory and do something like the following:
First install AnythingLLM, here, and configure it with something along the lines of this.
Note: As of this release you will need to configure the model you want this to use via the
"default" AnythingLLM configuration. It seems that, for now, you can't programmatically change the workspace
for different models, so this is the workaround.
Check out testing_notes.md for some of the numbers from runs with other
models on the same data.
Copyright:: 2025- IBM, Inc
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
BRD4001A Rev A01 SiLabs WSTK, referred to as Mainboard
BRD4308A Rev A01 (MGM210P032JIA), referred to as Radio Board
Software Used
Gecko SDK v2.7.9 -> Bluetooth SDK v2.13.9.0
Simplicity Studio v4
GNU ARM GCC v7.2.1
About the Project
The project provides an example of accessing peripherals on the SiLabs MGM210P032JIA module while the BLE stack is running. Radio Board BRD4308A carries the MGM210P032JIA module, which is based on the EFR32MG21 SoC.
Important Files
Executable binaries are in ./GNU ARM v7.2.1 - Default
The files ./app.c and ./app.h contain the event handler for BLE.
Files ./peripheral_utils.c and ./peripheral_utils.h contain functions that deal with peripherals on SoC.
Gatt Profile
When testing with a mobile app, you will see the following list of custom services and characteristics. You might not see the names mentioned below, but you will see the {UUID}. This can also be seen in the Visual GATT Editor by opening ./soc-peripherals-on-MGM210P032.isc
Peripheral Test {686d7b33-129f-4532-89c1-c502c6159bb3}
LED0 {abe6b815-d38c-476e-ae7e-dd1d62e209de}
Type: USER
Size: 1 byte
BUTTON1 {ebff5ca7-0398-422a-a0ea-63fefb0765ec}
Type: HEX
Size: 1 byte
ADC DATA {bb7b889f-587e-421f-a3b4-c3654998a742}
Type: USER
Size: 5 bytes
UART2 Data {9b475432-881f-418a-98ca-003c65339261}
Type: HEX
Size: 80 bytes
Peripherals Used
GPIO
1. LED
The Mainboard has two on-board LEDs, of which LED0 is used. This LED is connected to GPIO PB0, so GPIO PB0 is configured as an output. The LED0 characteristic accepts the following values:

| Value | Action |
|-------|--------|
| 0 | Turn OFF LED |
| 1 | Turn ON LED |
| 2 | Toggle LED |
2. BUTTON
The Mainboard has two on-board push buttons, of which Button1 is used. This button is connected to GPIO PB1, so GPIO PB1 is configured as an input. The BUTTON1 characteristic is used to convey the number of button presses.
Every button press generates a GPIO interrupt. The ISR keeps a counter of how many times the button has been pressed and generates a signal (a software interrupt to the BLE stack). This signal raises an event in the BLE stack, and using this event the client is notified of the counter value every time the button is pressed.
IADC
ADC Configurations

| Configuration | Value |
|---------------|-------|
| Mode | Single input |
| Port and Pin | PC02 |
| Trigger Action | Once |
| Over Sampling Ratio | 2x |
| CLK_ADC_FREQ | 1,000,000 -> 1 MHz |
| CLK_SRC_ADC_FREQ | 1,000,000 -> 1 MHz |
If Trigger Action were set to Continuous, then according to the formula on page 14 of AN1189: Incremental Analog to Digital Converter (IADC), the conversion time would be
Conversion Time = 10 / 1,000,000 s = 10 µs
This means samples per second = 1,000,000 / 10 = 100,000 = 100 ksps.
Since we are using Trigger Action Once and invoking the IADC every second using the BLE stack soft timer, the effective sample rate is 1 sps.
Enabling the notification will start the soft timer and also trigger IADC.
UART
UART Configuration

| Configuration | Value |
|---------------|-------|
| Tx Port and Pin | PC00 |
| Rx Port and Pin | PC01 |
| Baud Rate | 115200 |
| Flow Control | No |
| Data Size | 8 bit |
| Parity | None |
The MGM210P has 3 USARTs; USART2 has been configured as an asynchronous USART (i.e. UART). Since Rx can only take place in EM0 and EM1, EM2 sleep must be blocked while the MCU needs to receive data.
Enabling the notification will allow the MCU to receive the data.
We believe in a future in which the web is a preferred environment for numerical computation. To help realize this future, we’ve built stdlib. stdlib is a standard library, with an emphasis on numerical and scientific computation, written in JavaScript (and C) for execution in browsers and in Node.js.
The library is fully decomposable, being architected in such a way that you can swap out and mix and match APIs and functionality to cater to your exact preferences and use cases.
When you use stdlib, you can be absolutely certain that you are using the most thorough, rigorous, well-written, studied, documented, tested, measured, and high-quality code out there.
To join us in bringing numerical computing to the web, get started by checking us out on GitHub, and please consider financially supporting stdlib. We greatly appreciate your continued support!
dispatch
Dispatch to a native add-on applying a unary function to an input strided array.
The branches.md file summarizes the available branches and displays a diagram illustrating their relationships.
To view installation and usage instructions specific to each branch build, be sure to explicitly navigate to the respective README files on each branch, as linked to above.
dispatch( addon, fallback )
Returns a function which dispatches to a native add-on applying a unary function to an input strided array.
```javascript
function addon( N, dtypeX, x, strideX, dtypeY, y, strideY ) {
    // Call into native add-on...
}

function fallback( N, dtypeX, x, strideX, dtypeY, y, strideY ) {
    // Fallback JavaScript implementation...
}

// Create a dispatch function:
var f = dispatch( addon, fallback );

// ...

// Invoke the dispatch function with strided array arguments:
f( 2, 'generic', [ 1, 2 ], 1, 'generic', [ 0, 0 ], 1 );
```
The returned function has the following signature:
f( N, dtypeX, x, strideX, dtypeY, y, strideY )
where
N: number of indexed elements.
dtypeX: x data type.
x: input array.
strideX: x stride length.
dtypeY: y data type.
y: output array.
strideY: y stride length.
The addon function should have the following signature:
f( N, dtypeX, x, strideX, dtypeY, y, strideY )
where
N: number of indexed elements.
dtypeX: x data type (enumeration constant).
x: input array.
strideX: x stride length.
dtypeY: y data type (enumeration constant).
y: output array.
strideY: y stride length.
The fallback function should have the following signature:
f( N, dtypeX, x, strideX, dtypeY, y, strideY )
where
N: number of indexed elements.
dtypeX: x data type.
x: input array.
strideX: x stride length.
dtypeY: y data type.
y: output array.
strideY: y stride length.
dispatch.ndarray( addon, fallback )
Returns a function which dispatches to a native add-on applying a unary function to an input strided array using alternative indexing semantics.
```javascript
function addon( N, dtypeX, x, strideX, dtypeY, y, strideY ) {
    // Call into native add-on...
}

function fallback( N, dtypeX, x, strideX, offsetX, dtypeY, y, strideY, offsetY ) {
    // Fallback JavaScript implementation...
}

// Create a dispatch function:
var f = dispatch.ndarray( addon, fallback );

// ...

// Invoke the dispatch function with strided array arguments:
f( 2, 'generic', [ 1, 2 ], 1, 0, 'generic', [ 0, 0 ], 1, 0 );
```
The returned function has the following signature:
f( N, dtypeX, x, strideX, offsetX, dtypeY, y, strideY, offsetY )
where
N: number of indexed elements.
dtypeX: x data type.
x: input array.
strideX: x stride length.
offsetX: starting x index.
dtypeY: y data type.
y: output array.
strideY: y stride length.
offsetY: starting y index.
The addon function should have the following signature:
f( N, dtypeX, x, strideX, dtypeY, y, strideY )
where
N: number of indexed elements.
dtypeX: x data type (enumeration constant).
x: input array.
strideX: x stride length.
dtypeY: y data type (enumeration constant).
y: output array.
strideY: y stride length.
The fallback function should have the following signature:
f( N, dtypeX, x, strideX, offsetX, dtypeY, y, strideY, offsetY )
where
N: number of indexed elements.
dtypeX: x data type.
x: input array.
strideX: x stride length.
offsetX: starting x index.
dtypeY: y data type.
y: output array.
strideY: y stride length.
offsetY: starting y index.
Notes
To determine whether to dispatch to the addon function, the returned dispatch function checks whether the provided arrays are typed arrays. If the provided arrays are typed arrays, the dispatch function invokes the addon function; otherwise, the dispatch function invokes the fallback function.
Examples
```javascript
var Float64Array = require( '@stdlib/array-float64' );
var dispatch = require( '@stdlib/strided-base-unary-addon-dispatch' );

function addon( N, dtypeX, x, strideX, dtypeY, y, strideY ) {
    console.log( x );
    // => <Float64Array>[ 3, 4 ]

    console.log( y );
    // => <Float64Array>[ 7, 8 ]
}

function fallback( N, dtypeX, x, strideX, offsetX, dtypeY, y, strideY, offsetY ) {
    console.log( x );
    // => [ 1, 2, 3, 4 ]

    console.log( y );
    // => [ 5, 6, 7, 8 ]
}

// Create a dispatch function:
var f = dispatch.ndarray( addon, fallback );

// Create strided arrays:
var x = new Float64Array( [ 1, 2, 3, 4 ] );
var y = new Float64Array( [ 5, 6, 7, 8 ] );

// Dispatch to the add-on function:
f( 2, 'float64', x, 1, 2, 'float64', y, 1, 2 );

// Define new strided arrays:
x = [ 1, 2, 3, 4 ];
y = [ 5, 6, 7, 8 ];

// Dispatch to the fallback function:
f( 2, 'generic', x, 1, 2, 'generic', y, 1, 2 );
```
Notice
This package is part of stdlib, a standard library for JavaScript and Node.js, with an emphasis on numerical and scientific computing. The library provides a collection of robust, high performance libraries for mathematics, statistics, streams, utilities, and more.
For more information on the project, filing bug reports and feature requests, and guidance on how to develop stdlib, see the main project repository.
I actually started learning CUDA for GPGPU first, but since I do
my work on a MacBook Air (late 2012 model), I quickly realized
I couldn't run CUDA code. My machine has an Intel HD Graphics 4000 (I know, it sucks,
but still usable!). My search for the best way to make use of it led me to OpenCL.
My interest in OpenCL is primarily motivated by my interest in deep learning.
I want a better understanding of how these frameworks make use
of GPGPU to blaze through model training.
Here we are now: a repo of OpenCL examples. I'll be adding more
examples here as I pick up more OpenCL. I'm thinking each example will
get a bit more complex.
Setup I am using
Mac OSX
OpenCL 1.2
C++ 11
cmake 3.7
How to Build and Run
Clone this repo and cd into it.
Run mkdir build && cd build
Run cmake .. && make
If everything has been correctly installed, you should be able to build
the examples with no problems. Check out the CMakeLists.txt file for info
on how the examples are built.
Note: I have already added the C++ header for OpenCL 1.x in the libs directory.
However, if you are, for example, working with OpenCL 2, you can generate your own
header file. Head over to the KhronosGroup OpenCL-CLHPP repo
and do the following:
Run git clone https://github.com/KhronosGroup/OpenCL-CLHPP
Run cd OpenCL-CLHPP
Run python gen_cl_hpp.py -i input_cl2.hpp -o cl2.hpp
Move the generated header file cl2.hpp into the libs directory.
Profit!
Quick Introduction and OpenCL Terminology
You're here, so I don't need to convince you that parallel computing is awesome
and the future. I don't expect you to become an expert after you've gone through this repo,
but I do hope you at least get an overview of how to think in OpenCL.
OpenCL™ (Open Computing Language) is the open,
royalty-free standard for cross-platform,
parallel programming of diverse processors
found in personal computers, servers,
mobile devices and embedded platforms. – khronos site
The following are terms to know (a short code illustration follows the list):
Platform: Vendor specific OpenCL implementation.
Host: The client code that is running on the CPU. Basically your application.
Device: The physical devices you have that support OpenCL (CPU/GPU/FPGA, etc.).
Context: Devices you select to work together.
Kernel: The function that is run on the device and does the work.
Work Item: A unit of work that executes a kernel.
Work Group: A collection of work items.
Command Queue: The only way to tell a device what to do.
Buffer: A chunk of memory on the device.
Memory: Can be global/local/private/constant (more on this later).
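To see those terms in action, here is a small sketch using pyopencl. It is purely illustrative: the examples in this repo are C++, and the snippet assumes pyopencl and numpy are installed.

```python
# Illustration only: maps the OpenCL terms above onto pyopencl objects.
import numpy as np
import pyopencl as cl

platform = cl.get_platforms()[0]      # Platform: a vendor's OpenCL implementation
device = platform.get_devices()[0]    # Device: a CPU/GPU/FPGA that supports OpenCL
ctx = cl.Context([device])            # Context: the devices you select to work together
queue = cl.CommandQueue(ctx)          # Command Queue: the only way to tell the device what to do

kernel_src = """
__kernel void square(__global const float *in, __global float *out) {
    int gid = get_global_id(0);   /* each Work Item handles one element */
    out[gid] = in[gid] * in[gid];
}
"""
program = cl.Program(ctx, kernel_src).build()   # Kernel: the function run on the device

host_in = np.arange(8, dtype=np.float32)
mf = cl.mem_flags
buf_in = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=host_in)  # Buffer on the device
buf_out = cl.Buffer(ctx, mf.WRITE_ONLY, host_in.nbytes)

program.square(queue, host_in.shape, None, buf_in, buf_out)  # 8 work items, default work groups
host_out = np.empty_like(host_in)
cl.enqueue_copy(queue, host_out, buf_out)                    # read the result back to the Host
print(host_out)
```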
Design and Implementation of an MLOps CI/CD Pipeline for a Music Recommendation System
This project demonstrates the design and implementation of a Continuous Integration and Continuous Deployment (CI/CD) pipeline for a Music Recommendation System leveraging Spotify playlists.
Key components of the pipeline include:
CI/CD Pipeline Development: Designed an automated pipeline for building, testing, and deploying the music recommendation system.
Pipeline Orchestration with Apache Airflow: Configured and managed the orchestration of the pipeline using Apache Airflow, enabling automated, scalable, and efficient workflows.
Experiment Monitoring with MLflow & DagsHub: Integrated MLflow for tracking machine learning experiments, model parameters, and performance metrics, with the MLflow server hosted on DagsHub.
Containerization with Docker: Dockerized both the recommendation application and the pipeline orchestration to ensure portability, scalability, and consistency across environments.
Unzip playlist.json.zip into the directory: mlops_msr/data/raw/ and remove the zip file afterward.
Ensure that you have the dataset file dataset.zip in mlops_air_msr/mlops_msr/data/raw/ (this is the original dataset of Spotify songs used to initialize the pipeline).
Starting the Pipeline
Once everything is set up, you can start the pipeline using Docker Compose:
docker-compose up --build
After the containers are up and running:
Music Recommender Application: Access the Flask API interface at http://localhost:5000.
Welcome to the setup guide! Here, we’ll outline the steps needed to configure and implement the various first stages of the MLOps pipeline. Follow along and fill in the details as you proceed through each step in the workflow_steps.ipynb notebook.
You can start by getting familiar with the architecture of the project:
Throughout this project we'll work with a songs dataset. The goal is to implement a recommendation system that recommends a number of songs given a Spotify music playlist, all while adhering to MLOps best practices in terms of version control, pipelines, and the most commonly used tools.
The recommender application is a Flask API.
Model training and evaluation are monitored with MLflow, with the MLflow server hosted on DagsHub.
Configuration Files 📘
Let’s have a quick look at the three yaml files in our src folder.
You can start by having a look at the config.yaml 📂 You will see that it sets the paths to the different files that will be used and created in each of the steps we’ll put in place.
Next, inside the data_module_def folder we have the schema.yaml 🗃️ If you have a look at it you’ll see it defines the data types for each column in the dataset we’ll work with.
Finally, inside the models_module_def folder you can have a look at params.yaml 📊 What this file does is set the hyperparameters of the model we’ll put in place.
⚠️ The file src/config.py defines the global variables containing the paths to these yaml files to facilitate their access.
Common Utilities 🛠️
In src/common_utils.py we have reusable functions:
read_yaml(filepath: str) -> dict
create_directories(paths: List[str])
save_json(path: str, data: dict)
load_json
These utilities will streamline the loading of configurations and ensure necessary directories are created.
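If it helps to picture them, here is a rough sketch of these helpers; the real file may differ in details.

```python
# Rough sketch of src/common_utils.py-style helpers (illustrative only).
import json
import os
from typing import List

import yaml


def read_yaml(filepath: str) -> dict:
    """Load a YAML file and return its contents as a dict."""
    with open(filepath, "r") as fh:
        return yaml.safe_load(fh)


def create_directories(paths: List[str]) -> None:
    """Create each directory in `paths` if it does not already exist."""
    for path in paths:
        os.makedirs(path, exist_ok=True)


def save_json(path: str, data: dict) -> None:
    """Write `data` to `path` as pretty-printed JSON."""
    with open(path, "w") as fh:
        json.dump(data, fh, indent=4)


def load_json(path: str) -> dict:
    """Read a JSON file back into a dict."""
    with open(path, "r") as fh:
        return json.load(fh)
```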
Let's get to work!
The task
For the next steps you can use the notebook workflow_steps.ipynb to guide you through the code you'll need to write in each of the corresponding files 🧑💻 The task consists of five steps that will help you implement a modularized workflow for an MLOps project.
Step 1: Define Configuration Classes 🧩
Start by writing the configuration objects in src/entity.py. These configurations will help in managing the settings and parameters required for each stage in a clean and organized manner. Using the Step 1 section in the notebook, define dataclasses for the configuration objects (a sketch of one follows the list):
DataIngestionConfig
DataValidationConfig
DataTransformationConfig
ModelTrainerConfig
ModelEvaluationConfig
UnsModelFitConfig
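For illustration only, the ingestion configuration could be a frozen dataclass along these lines; the field names below are placeholders, and the actual ones come from Step 1 of the notebook.

```python
# Hypothetical example of one entry in src/entity.py.
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class DataIngestionConfig:
    root_dir: Path          # folder where ingestion artifacts are written
    source_url: str         # URL the raw dataset is downloaded from
    local_data_file: Path   # path of the downloaded archive
    unzip_dir: Path         # directory the archive is extracted into
```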
Step 2: Configuration Manager 🗄️
Create the class ConfigurationManager in src/config_manager.py using Step 2 of the notebook. This class will:
Read paths from config.yaml.
Read hyperparameters from params.yaml.
Read the data types from schema.yaml.
Create configuration objects for each of the stages with the help of the objects defined in the previous step: DataIngestionConfig, DataValidationConfig, ModelTrainerConfig, and ModelEvaluationConfig.
Create necessary folders.
⚠️ Pay attention to the mlflow_uri in get_model_evaluation_config; make sure you adapt it with your own DagsHub credentials.
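A rough skeleton of what this class could look like is shown below. The YAML key names and the config-path constants are assumptions for illustration, not the project's exact code.

```python
# Hypothetical skeleton of src/config_manager.py; key names and constants
# (CONFIG_FILE_PATH, etc.) are assumptions based on the description above.
from pathlib import Path

from src.common_utils import create_directories, read_yaml
from src.config import CONFIG_FILE_PATH, PARAMS_FILE_PATH, SCHEMA_FILE_PATH
from src.entity import DataIngestionConfig


class ConfigurationManager:
    def __init__(self):
        self.config = read_yaml(CONFIG_FILE_PATH)   # file paths for each stage
        self.params = read_yaml(PARAMS_FILE_PATH)   # model hyperparameters
        self.schema = read_yaml(SCHEMA_FILE_PATH)   # expected column types

    def get_data_ingestion_config(self) -> DataIngestionConfig:
        cfg = self.config["data_ingestion"]
        create_directories([cfg["root_dir"]])
        return DataIngestionConfig(
            root_dir=Path(cfg["root_dir"]),
            source_url=cfg["source_URL"],
            local_data_file=Path(cfg["local_data_file"]),
            unzip_dir=Path(cfg["unzip_dir"]),
        )
```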
Step 3: Data module definition and model module definition
Using the Step 3 section of the notebook, in the corresponding files of the src/data_module_def folder, create:
Data Ingestion module 📥
This class will:
Download the dataset into the appropriate folder.
Unzip the dataset into the appropriate folder.
Data Validation module ✅
This class will:
Validate columns against the schema. Optional: you can also verify the data types.
Issue a text file saying if the data is valid.
Data Transformation module 🔄
This class will:
Split the data into training and test sets.
Save the corresponding csv files into the appropriate folder.
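As a rough sketch (class and attribute names are assumptions), the core of the transformation step could be:

```python
# Sketch of the data transformation step: a simple train/test split saved as
# CSV files. Paths come from a DataTransformationConfig-style object; the
# split ratio is an assumption.
import pandas as pd
from sklearn.model_selection import train_test_split


class DataTransformation:
    def __init__(self, config):
        self.config = config  # holds data_path and root_dir, per config.yaml

    def split_data(self):
        data = pd.read_csv(self.config.data_path)
        train, test = train_test_split(data, test_size=0.25, random_state=42)
        train.to_csv(f"{self.config.root_dir}/train.csv", index=False)
        test.to_csv(f"{self.config.root_dir}/test.csv", index=False)
```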
Similarly, in the corresponding files of the src/models_module_def folder, create:
Model trainer module 🏋️♂️
This class will:
Train the model using the hyperparameters specified in params.yaml.
Save the trained model into the appropriate folder.
Model Evaluation module 📝
This class will:
Evaluate the model and log metrics using MLflow.
Unsupervised Model Fit and Evaluation module 📝
This class will:
Fit a clustering model for each music genre.
Evaluate each fitted model and log metrics using MLflow.
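Here is a hedged sketch of the trainer and evaluation pair, assuming a scikit-learn GradientBoostingClassifier (the classifier named later in this README) and accuracy as the logged metric; the project's actual classes, metrics, and file layout may differ.

```python
# Illustrative sketch: train with hyperparameters from params.yaml, then log
# metrics to the DagsHub-hosted MLflow server.
import joblib
import mlflow
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score


def train_and_evaluate(train_csv, test_csv, target_col, params, mlflow_uri, model_path):
    train = pd.read_csv(train_csv)
    test = pd.read_csv(test_csv)

    model = GradientBoostingClassifier(**params)  # hyperparameters from params.yaml
    model.fit(train.drop(columns=[target_col]), train[target_col])
    joblib.dump(model, model_path)                # save into the appropriate folder

    mlflow.set_tracking_uri(mlflow_uri)           # your DagsHub MLflow URI
    with mlflow.start_run():
        preds = model.predict(test.drop(columns=[target_col]))
        mlflow.log_params(params)
        mlflow.log_metric("accuracy", accuracy_score(test[target_col], preds))
```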
Step 4: Pipeline Steps 🚀
Using Step 4 of the notebook, in src/pipeline_steps create scripts for each stage of the pipeline to instantiate and run the processes:
stage01_data_ingestion.py
stage02_data_validation.py
stage03_data_transformation.py
stage04_model_trainer.py
stage05_model_evaluation.py
stage06_uns_model_fit_eval.py
In each script you have to complete the class with two methods: an __init__ that doesn't do anything, and a main where you implement the code from the corresponding section of Step 4 of the notebook.
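For example, stage01_data_ingestion.py could follow the shape below. The module paths and method names are assumptions; the actual calls are given in Step 4 of the notebook.

```python
# Hypothetical skeleton of a pipeline step script.
from src.config_manager import ConfigurationManager
from src.data_module_def.data_ingestion import DataIngestion


class DataIngestionPipeline:
    def __init__(self):
        pass

    def main(self):
        # Wire the stage configuration to the module and run it.
        config = ConfigurationManager().get_data_ingestion_config()
        ingestion = DataIngestion(config)
        ingestion.download_file()      # method names are placeholders
        ingestion.extract_zip_file()


if __name__ == "__main__":
    DataIngestionPipeline().main()
```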
Step 5: Use DVC to connect the different stages of your pipeline 🦉
Start by setting up DagsHub as your remote storage through DVC.
Add the corresponding stages for data transformation, data validation, model training, and model evaluation.
You can run the pipeline through the command dvc repro.
Congratulations! 🎉 Now that you have a structured and well-defined MLOps project you’re ready for the next step which is the creation of the API.
Each step is modularized, making it easy to maintain, extend, and scale your Machine Learning pipeline.
Phase 2. Pipeline with Spotify data feed and Orchestration with Apache Airflow
The Airflow DAG is defined in the "dag_data_feeding.py" script.
General Objective
This Python script defines a DAG in Airflow called data_feeding_dag, which orchestrates a pipeline to extract, transform, and update Spotify data. The pipeline integrates Spotify API data into a machine learning workflow. Key steps include:
Verifying and collecting Spotify data.
Updating the song database.
Processing the data.
Training and evaluating machine learning models (classification and clustering).
Key Highlights of the Script
Libraries and Configurations
Main Imports:
Airflow for task management (DAG, Operators, TaskGroups).
Spotipy for interacting with Spotify’s API.
pandas for data manipulation.
Custom modules (dag_utils) for reusable utility functions.
Global Variables:
CLIENT_ID and CLIENT_SECRET: API credentials for Spotify.
Data directories (URIS_DIR, INTERIM_DIR) for organizing processed files.
API Rate Limiting:
Implements a delay between requests to comply with Spotify’s rate limits.
Key Functions
Spotify Data Management
check_songs_base_init:
Checks if a local song database exists. If not, it triggers a data processing phase starting with the initial dataset ingestion.
get_songs_from_spotify:
Retrieves song features from Spotify’s API.
update_songs_base:
Merges newly fetched data with the existing database, removes duplicates, and archives older versions if significant changes are detected.
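The merge-and-deduplicate core of update_songs_base could, for instance, boil down to the pandas sketch below; the track-id column name is an assumption, and the archiving of older versions is left out.

```python
# Illustrative merge of newly fetched tracks into the existing song base.
import pandas as pd


def merge_song_bases(existing: pd.DataFrame, fetched: pd.DataFrame) -> pd.DataFrame:
    merged = pd.concat([existing, fetched], ignore_index=True)
    return merged.drop_duplicates(subset="track_id", keep="last")
```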
Pipeline Stages
data_ingestion: Loads initial data.
data_validation: Validates data against the expected schema.
data_transformation: Prepares data for machine learning models.
Model Training and Evaluation:
classification_model_training and classification_model_evaluation handle classification models.
The DAG limits execution to one active instance at a time.
Task Structure:
Initial Branching: check_songs_base_init decides whether to collect data from Spotify or, if the song base does not yet exist, to initialize it.
Task Groups:
spotify_data_feed: Handles song collection and database updates.
data_processing: Handles ingestion, validation, and transformation of data.
classification_model: Handles classification model training and evaluation.
Final Task: end signals the end of the DAG.
Dependencies:
The DAG organizes tasks into branches and sequences (e.g., spotify_data_feed must finish before data processing can proceed).
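Below is a condensed sketch of how such a DAG could be wired. This is not the real dag_data_feeding.py: the task callables are placeholders and the classification_model group is omitted for brevity.

```python
# Illustrative Airflow DAG mirroring the structure described above.
from datetime import datetime

from airflow import DAG
from airflow.operators.empty import EmptyOperator
from airflow.operators.python import BranchPythonOperator, PythonOperator
from airflow.utils.task_group import TaskGroup


def choose_start_path():
    # Placeholder for check_songs_base_init: return the task_id to follow.
    return "spotify_data_feed.get_songs_from_spotify"


with DAG(
    dag_id="data_feeding_dag",
    start_date=datetime(2024, 1, 1),
    schedule=None,
    max_active_runs=1,   # one active instance at a time
    catchup=False,
) as dag:
    check = BranchPythonOperator(
        task_id="check_songs_base_init",
        python_callable=choose_start_path,
    )
    init_songs_base = PythonOperator(
        task_id="init_songs_base", python_callable=lambda: None
    )

    with TaskGroup("spotify_data_feed") as spotify_data_feed:
        get_songs = PythonOperator(
            task_id="get_songs_from_spotify", python_callable=lambda: None
        )
        update_base = PythonOperator(
            task_id="update_songs_base", python_callable=lambda: None
        )
        get_songs >> update_base

    with TaskGroup("data_processing") as data_processing:
        ingestion = PythonOperator(
            task_id="data_ingestion",
            python_callable=lambda: None,
            # run after whichever branch actually executed
            trigger_rule="none_failed_min_one_success",
        )
        validation = PythonOperator(
            task_id="data_validation", python_callable=lambda: None
        )
        transformation = PythonOperator(
            task_id="data_transformation", python_callable=lambda: None
        )
        ingestion >> validation >> transformation

    end = EmptyOperator(task_id="end", trigger_rule="none_failed_min_one_success")

    check >> [init_songs_base, spotify_data_feed]
    init_songs_base >> data_processing
    spotify_data_feed >> data_processing
    data_processing >> end
```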
Summary
This Airflow DAG automates a pipeline to:
Sync Spotify data.
Update a local song database.
Prepare the data for machine learning models.
Train and evaluate these models.
It provides a flexible integration with Spotify’s API while maintaining a structured and modular workflow for managing data and models.
The recommendation application
The primary objective of this project is to create an application that recommends at least 10 songs based on a submitted Spotify playlist.
Our application is developed using the Flask framework, ensuring scalability and ease of integration.
The main codebase for the Flask application is located in mlops_msr/src/app/app.py.
Synthetic Description of app.py
Overview
This script implements a Flask web application with user authentication, admin functionalities, and a music recommendation system. The app incorporates robust security measures, user session management, and role-based access control.
Key Functionalities
1. User Authentication and Management
Registration: Users can register with a secure password validation system.
Login/Logout: Users can log in with rate-limited attempts to prevent brute force attacks.
Role Management: Admin and regular users have distinct access rights.
User Storage: User data is stored in a TinyDB database (users.json).
2. Admin-Specific Features
Parameter Update: Admins can update model parameters (e.g., GradientBoostingClassifier and GaussianMixture) stored in a YAML file.
Model Retraining: Admins can trigger model retraining through a button.
Monitoring: Admins can view a cosine similarity trend via a Plotly-generated graph.
User Deletion: Admins can remove users from the database.
3. Music Recommendation
Utilizes the predict_song function to recommend songs based on a reference playlist and a machine learning pipeline.
Rate-limited to ensure fair usage (5 recommendations/minute).
4. Security Measures
Password Validation: Ensures secure passwords with length, uppercase, digit, and special character requirements.
Rate Limiting: Limits global app requests and specific actions (e.g., login, recommendations) to mitigate abuse.
Session Configuration: Implements secure cookie settings and a session timeout.
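For illustration, the per-route limit described above could be expressed with Flask-Limiter roughly as follows; the actual app.py may configure rate limiting differently.

```python
# Illustrative only: a 5-per-minute limit on recommendations, Flask-Limiter 3.x style.
from flask import Flask
from flask_limiter import Limiter
from flask_limiter.util import get_remote_address

app = Flask(__name__)

# Global rate limiting keyed on the client address.
limiter = Limiter(get_remote_address, app=app, default_limits=["200 per hour"])


@app.route("/recommend", methods=["POST"])
@limiter.limit("5 per minute")   # fair-usage limit described above
def recommend():
    # Placeholder body; the real route calls predict_song for the logged-in user.
    return {"songs": []}
```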
5. Routes
/register & /login: Handle user registration and login with validation.
/logout: Logs out the current user.
/welcome: Displays a logged-in user’s welcome page.
/recommend: Processes song recommendations for logged-in users.
/update_params: Allows admins to update ML parameters.
/train: Triggers model retraining.
/monitoring: Displays a trend graph for monitoring recommendation system performance.
/delete_user: Provides admin functionality to delete a user.
The application is designed to provide a secure and feature-rich platform for music recommendations with customizable ML parameters, robust user authentication, and monitoring capabilities.
NOTE: The ones with * after the links are the vetted ones. If you use your UC email and tell them you're a cyber student, they will most likely give you access.
commercetools-pino-middleware is a library that provides a seamless integration of Pino logger with the commercetools SDK. It allows you to easily log SDK requests, responses, and other relevant information with Pino, a fast and minimalist Node.js logger. The middleware is designed to be flexible and can be set up with either an auto-generated Pino instance or a custom Pino logger with specific configurations.
If you prefer a hassle-free setup, you can let the middleware create and configure the Pino instance for you. Simply pass an empty object when setting up the middleware:
```typescript
import { createPinoMiddleware } from '@composable-software/commercetools-pino-middleware';
import { ClientBuilder } from '@commercetools/sdk-client-v2';

/**
 * Middleware with automatic Pino factory
 */
const client = new ClientBuilder()
    .withMiddleware(createPinoMiddleware({}))
    .build();
```
In this method, the middleware will handle the instantiation and configuration of the Pino logger automatically.
2. Using a Custom Pino Instance
If you need a more customized Pino logger, you can pass your own Pino instance through options:
```typescript
import pino from 'pino';
import { createPinoMiddleware } from '@composable-software/commercetools-pino-middleware';
import { ClientBuilder } from '@commercetools/sdk-client-v2';

/**
 * Custom Pino logger instance that can be passed to the middleware
 */
const logger = pino({
    name: 'custom-logger',
    level: 'info',
});

const options = {
    logger: logger,
};

const client = new ClientBuilder()
    .withMiddleware(createPinoMiddleware(options))
    .build();
```
In this case, the middleware will use the provided Pino instance to log the information, allowing you to have full control over the logger’s configuration.
Logging Details
The commercetools-pino-middleware logs essential details related to SDK requests and responses, including:
Request method and URL
Request headers
Request body (if applicable)
Response status code
Response headers
Response body (if applicable)
All logs are output in a structured JSON format, which makes it easy to parse and analyze the logged data.
Examples
Here are some examples of how the middleware logs different scenarios:
{"level":"error","time":1678376463922,"msg":"Request failed","method":"GET","url":"https://api.commercetools.com/products/xyz789","headers":{"Authorization":"Bearer <access_token>"}}
{"level":"error","time":1678376473923,"msg":"Response received with error","status":404,"headers":{"x-request-id":"xyz789"},"body":{"statusCode":404,"message":"Not Found"}}
Contributing
We welcome contributions from the community! If you encounter any issues or have suggestions for improvements, please feel free to open an issue or submit a pull request on our GitHub repository.
License
This library is licensed under the MIT License. See the LICENSE file for more details.