Infrastructure and server diagnostics with the SimpleOne app

The commands below are provided for reference only. We recommend trying them out in a test environment first.

:sparkle: Docker Container Diagnostics

  • List all containers and their statuses
    docker ps -a --format "table {{.Names}}\t{{.Status}}"

  • Display containers that are NOT in “Up” status (if all containers are running, only the table header row is printed).
    docker ps -a --format "table {{.Names}}\t{{.Status}}\t{{.State}}" | grep -v "Up"
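
Since `grep -v "Up"` matches on text anywhere in the line, filtering on the State column can be more precise. The sketch below is only an illustration: the function name `filter_not_running` is made up, and it assumes a Docker version that supports the `{{.State}}` placeholder (which the command above already relies on).

```shell
# Print only the containers whose state is not "running".
# Input: lines of "<name><TAB><state><TAB><status>" on stdin.
# Output: "<name><TAB><status>" for every non-running container.
filter_not_running() {
    awk -F '\t' '$2 != "running" { printf "%s\t%s\n", $1, $3 }'
}

# Usage on a Docker host (no "table" prefix, so no header row is printed):
#   docker ps -a --format '{{.Names}}\t{{.State}}\t{{.Status}}' | filter_not_running
```

With this variant the output really is empty when every container is running.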

  • Container resource usage (CPU, RAM, network I/O, block I/O)
    watch docker stats --no-stream

  • Display container logs for a recent period
    (container names include, e.g., proxy, haproxy, backend-api, postgres, minio, kafka, rabbitmq, and others)
    Last hour:
    docker logs --since 1h backend-api
    Last 10 minutes:
    docker logs --since 10m backend-api

  • View container logs in real time starting from the current moment
    docker logs -fn 0 postgres

  • Output logs between two timestamps
    docker logs --since "2024-07-26T11:30:00" --until "2024-07-26T12:30:00" postgres

  • Docker container disk usage (OverlayFS), show top 5 largest
    docker ps -as --format "{{.Names}}: {{.Size}}" | sort -hr -k2 | head -n 5

  • Save logs from the last hour:
    Create a directory to store logs
    mkdir logs-"$(hostname -s)-$(date +%Y-%m-%d)"
    Change into the new directory with cd, then run the following command to write one log file per container
    docker ps -a --format '{{.Names}}' | xargs -I {} sh -c 'docker logs --since "1h" {} > {}.log 2>&1'
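
The two steps above can also be combined into a single function. This is a minimal sketch, not a tested production script: the name `save_recent_logs` and its optional time-window argument are illustrative, and it uses a plain `while read` loop instead of `xargs`.

```shell
# Save recent logs of every container into a dated directory.
# $1: time window accepted by `docker logs --since` (default: 1h).
# Prints the name of the directory that was created.
save_recent_logs() {
    since="${1:-1h}"
    # Fall back to a fixed label if `hostname -s` is unavailable.
    host="$(hostname -s 2>/dev/null || echo host)"
    dir="logs-${host}-$(date +%Y-%m-%d)"
    mkdir -p "$dir" || return 1
    docker ps -a --format '{{.Names}}' | while IFS= read -r name; do
        # Container output may arrive on stdout or stderr, so capture both.
        docker logs --since "$since" "$name" > "$dir/$name.log" 2>&1 </dev/null
    done
    printf '%s\n' "$dir"
}

# Usage on a Docker host:
#   save_recent_logs        # last hour
#   save_recent_logs 10m    # last 10 minutes
```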

  • Save all available logs for a container
    docker logs -t indication-calculator > logs.txt 2>&1

:sparkle: Server Computing Resources (CPU/RAM)

  • Interactive monitoring of processes and load
    htop

  • RAM and SWAP size and utilization
    free -hw

  • Information about the processor and the number of cores
    lscpu

:sparkle: Server Disks

  • Information about server block devices (hard drives, storage devices, partitions, logical volumes)
    lsblk

  • Display filesystem usage and type information
    df -Th

  • Size of specific directories
    du -sh /data/postgres/pgdata/pgroot/data/pg_wal/
    du -sh /data/kafka*

  • Size of all files and directories in the current ./ directory, sorted in descending order
    du -sh * | sort -hr

  • Interactive disk usage analyzer (ncdu); the -r flag runs it in read-only mode so that critical data cannot be deleted by accident
    ncdu -r /

  • Docker container disk usage (OverlayFS), show top 5 largest
    docker ps -as --format "{{.Names}}: {{.Size}}" | sort -hr -k2 | head -n 5

  • Extended per-second I/O statistics for disks

    iostat -x 1

    (Install with: sudo apt install sysstat or sudo dnf install sysstat)

    Key columns:
    w_await (Write Await) - Average time for write requests to be served, in milliseconds
    r_await (Read Await) - Average time for read requests to be served, in milliseconds
    %iowait (I/O Wait) - Percentage of time the CPU was idle while waiting for outstanding I/O, shown in the avg-cpu block (critical if > 15%)
    %util (Utilization) - Percentage of time the device was busy (critical if > 80%)
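
The %util threshold above can be checked automatically. The following is a rough sketch under a few assumptions: the function name `flag_busy_devices` is made up, the column position is looked up from the header line because the `iostat -x` layout differs between sysstat versions, and `LC_ALL=C` is used so that numbers are printed with a decimal point.

```shell
# Print devices whose %util exceeds a threshold.
# Input: `iostat -dx` output on stdin.
# $1: threshold in percent (default: 80, as in the list above).
flag_busy_devices() {
    threshold="${1:-80}"
    awk -v limit="$threshold" '
        # A blank line ends a report, so forget the column position.
        NF == 0 { col = 0; next }
        # Header line: find which column holds %util.
        $1 ~ /^Device/ {
            col = 0
            for (i = 1; i <= NF; i++) if ($i == "%util") col = i
            next
        }
        # Data line: print the device name and its %util if over the limit.
        col && NF >= col && $col + 0 > limit + 0 { print $1, $col }
    '
}

# Usage (the first report shows averages since boot, the second the last second):
#   LC_ALL=C iostat -dx 1 2 | flag_busy_devices 80
```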

  • Monitor processes performing I/O

    sudo iotop -o -d 1 -a -k

    (Install with: sudo apt install -y iotop or sudo dnf install -y iotop)

  • Replication slot lag in the DB cluster

    docker exec -i postgres psql -U postgres -c "SELECT slot_name, pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS replicationSlotLag, active, database, restart_lsn, confirmed_flush_lsn FROM pg_replication_slots;"

  • Size of pg_wal (database write-ahead log)

    docker exec -i $(docker ps -aq --filter name=postgres --filter expose=5432) psql -U postgres -c "select pg_size_pretty(sum(size)) as \"SIZE_WAL\" from pg_ls_waldir();"

  • Check size of specific databases in the cluster

    docker exec -it $(docker ps -aq --filter name=postgres --filter expose=5432) psql -U postgres -c "\\l+"

  • Show the 10 largest tables in the current database (run the query inside psql):

    SELECT 
        schemaname || '.' || relname AS table_name,
        pg_size_pretty(pg_total_relation_size(relid)) AS total_size,
        pg_total_relation_size(relid) AS bytes
    FROM 
        pg_catalog.pg_statio_user_tables 
    ORDER BY 
        pg_total_relation_size(relid) DESC 
    LIMIT 10;
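
The query can also be run from the host without opening an interactive psql session. This is a sketch under stated assumptions: the container name `postgres` and the user `postgres` are taken from the commands above, while the function name `top_tables`, its arguments, and the default database are illustrative only.

```shell
# Show the largest tables of a database through the postgres container.
# $1: database name (assumed default: postgres)
# $2: number of rows to return (default: 10)
top_tables() {
    db="${1:-postgres}"
    limit="${2:-10}"
    # Accept digits only, because the value is placed into the SQL text.
    case "$limit" in
        ''|*[!0-9]*) echo "limit must be a whole number" >&2; return 2 ;;
    esac
    docker exec -i postgres psql -U postgres -d "$db" <<SQL
SELECT schemaname || '.' || relname AS table_name,
       pg_size_pretty(pg_total_relation_size(relid)) AS total_size,
       pg_total_relation_size(relid) AS bytes
FROM pg_catalog.pg_statio_user_tables
ORDER BY pg_total_relation_size(relid) DESC
LIMIT $limit;
SQL
}

# Usage ("mydb" is a placeholder database name):
#   top_tables mydb 5
```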
    

:sparkle: Website Security Certificates

  • Check server response and certificate by domain name from outside:
    curl -v https://stage001-rusakov.simpleone.ru/

  • Show proxy container volumes (where certificates are stored on the server and in the container)
    docker inspect proxy | jq '.[].Mounts'

  • Certificate details (domain, validity dates); run from the directory that contains the certificates (the proxy container's bind-mounted volume):
    openssl x509 -in public.crt -text -noout | grep -E "Subject:|Not Before|Not After"
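
To get a yes/no answer instead of reading the dates by eye, `openssl x509 -checkend` can be used; it exits with a non-zero status if the certificate expires within the given number of seconds. The wrapper below is a minimal sketch; the function name `cert_valid_for_days` and the 30-day default are assumptions made for illustration.

```shell
# Check whether a certificate stays valid for at least N more days.
# $1: path to the certificate file
# $2: number of days (assumed default: 30)
cert_valid_for_days() {
    cert="$1"
    days="${2:-30}"
    # -checkend takes seconds; 86400 seconds = 1 day.
    if openssl x509 -in "$cert" -noout -checkend "$((days * 86400))" >/dev/null; then
        echo "OK: $cert is valid for at least $days more days"
    else
        echo "WARNING: $cert expires within $days days"
        return 1
    fi
}

# Usage, from the directory containing the certificates:
#   cert_valid_for_days public.crt 30
```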

  • Which certificate is currently being served by proxy on the server:
    openssl s_client -connect localhost:443 -servername localhost </dev/null 2>/dev/null | openssl x509 -noout -dates -subject

:sparkle: Network Settings and DNS

  • Check internet connectivity from the server
    ping 1.1.1.1

  • Server network interfaces and routes
    ip a
    ip r

  • Public IP address the server uses to reach the internet (this also confirms that domain resolution works on the server)
    curl 2ip.ru

  • Configuration files determining how domain names are resolved from the server
    cat /etc/resolv.conf
    cat /etc/hosts

  • Check availability and packet loss from outside by pinging the domain
    ping stage001-rusakov.simpleone.ru

  • Network trace by domain
    traceroute stage001-rusakov.simpleone.ru
    mtr stage001-rusakov.simpleone.ru

  • External domain resolution showing the server IP or load balancer IP behind it
    host stage001-rusakov.simpleone.ru
    nslookup stage001-rusakov.simpleone.ru
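
The output of `host` can be compared against the IP address you expect (the server itself or the load balancer in front of it). This is a small sketch: the function names `resolved_ipv4` and `resolves_to` are made up, the address 203.0.113.10 is a placeholder, and the parsing assumes the usual "<name> has address <ip>" lines printed by `host`.

```shell
# Extract the IPv4 addresses from `host` output on stdin.
resolved_ipv4() {
    awk '/ has address / { print $NF }'
}

# Succeed if the expected address is among the resolved ones.
# $1: expected IPv4 address
resolves_to() {
    expected="$1"
    resolved_ipv4 | grep -Fxq "$expected"
}

# Usage (203.0.113.10 stands in for your server or load balancer IP):
#   host stage001-rusakov.simpleone.ru | resolves_to 203.0.113.10 && echo "DNS OK"
```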

* We would greatly appreciate it if you shared your own tools for server diagnostics or gave feedback on the commands above!
