Brain Dump

PowerShell and SQL.

So my most recent focus has been a huge set of LinkedIn leads I purchased. It’s a monstrous scrape of LinkedIn from August of 2024, so slightly dated, but still a good baseline: 31 million USA-location LinkedIn records with email addresses, plus 93 million USA-location LinkedIn records without email addresses to harvest for, say, company names that a user mentions but whose email isn’t listed.

And not just the US! For every country whose members mention their country, I have records for them, also separated into two groups…those with emails to reference, and other records from the same country that do not include an email.

What I purchased was access to all the flat-file dumps, and the leads with email addresses are obviously the most valuable as far as marketing goes.

It turned into a fascinating project: I wanted to bring that data into a proper database and normalize it to make it useful. What’s really interesting is the exception records…like a field that should be an email address but imports as NVARCHAR(MAX) and turns out to be a bunch of narrative text from whoever that record belongs to. 99% of the columns are what they “should” be, but that 1% is what every DBA has to work through and clean up.
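
Finding those exception rows is mostly about looking for values that can’t possibly be what the column claims to be. Below is a minimal sketch, assuming the raw import landed in [dbo].[USAEmails_Original] with the email list in an [Emails] column; the patterns are illustrative, not the full cleanup logic.

-- Surface "email" values that are clearly something else: no @ sign,
-- absurd length, or the repeating-garbage pattern mentioned below.
SELECT LEFT([Emails], 200) AS SuspectValue,
       LEN([Emails])       AS ValueLength
FROM dbo.USAEmails_Original
WHERE [Emails] IS NOT NULL
  AND ( [Emails] NOT LIKE '%@%'
     OR LEN([Emails]) > 360
     OR [Emails] LIKE '%zzz%' );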

So far, this has been my strategy. The code below is parameterized so I can apply it to every country.

Looking at the data, I want to normalize it and limit the column sizes to VARCHAR(128), except for true descriptions. The data will require some AI or human review and intervention to eliminate the garbage values that are preventing the resizing. I imported each file into SQL Server 2022, each file as its own table.

  • Kept the original 31M-row table as [dbo].[USAEmails_Original] for reference and cross-checks.
  • Added a BinaryCheckSum column to help find duplicates later.
  • Added a BIGINT IDENTITY() column as a unique primary key so each row can be referenced properly.
  • Compressed every table with DATA_COMPRESSION = PAGE.
  • Cleaned up noise and garbage repeating characters, like values that were all “zzz”.
  • Converted my [LinkedInLeads] table to VARCHAR instead of NVARCHAR.
  • Properly sized columns down from VARCHAR(MAX) to a calculated size just slightly larger than the current longest value (e.g. [FullName] from NVARCHAR(MAX) to VARCHAR(630) NULL); see the sketch after this list.
  • Created a prototype LinkedInLeads table with the VARCHAR(128) sizing I envisioned, and just arbitrarily truncated everything but descriptions to VARCHAR(128).
  • Created the LinkedInLeads table as memory-optimized to see whether it was faster to query, and watched its size shrink to 1/3 of the original 34 GB.
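
The sizing step looks roughly like this in practice; a trimmed sketch with example column names, where the real version walks sys.columns and builds the statements dynamically for each country table.

-- Find the longest current value in each oversized column...
SELECT MAX(LEN([FullName])) AS FullNameMax,
       MAX(LEN([JobTitle])) AS JobTitleMax,
       MAX(LEN([Location])) AS LocationMax
FROM dbo.USAEmails_Original;

-- ...then shrink each column to just above the observed maximum, for example:
ALTER TABLE dbo.USAEmails ALTER COLUMN [FullName] VARCHAR(630)  NULL;
ALTER TABLE dbo.USAEmails ALTER COLUMN [JobTitle] VARCHAR(2980) NULL;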

From there, it was time to start normalizing and making the data usable.

There are a few columns that lend themselves to leveraging vectors to find similar data:

  • “industry”
  • “jobtitle”
  • “subrole”
  • “location”
  • “skills”
  • “company”
  • “interests”

Using a Python script I created, I grabbed each field; for the fields that were comma-delimited, I reordered the values alphabetically and lower-cased the data, then created a vector for each unique value. Then I associated each unique value back to the original records in an M:1 relationship to my distinct vector tables.
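
The distinct-value tables end up looking roughly like the sketch below. The names are hypothetical, and the embedding is shown as a serialized VARBINARY(MAX) blob because SQL Server 2022 has no native vector type (elsewhere in this dump the vectors live in platforms that do).

-- One row per distinct, normalized value; the wide leads table carries only the key (M:1).
CREATE TABLE dbo.DistinctJobTitleVector (
    JobTitleVectorId BIGINT IDENTITY(1,1) NOT NULL,
    JobTitle         VARCHAR(128)         NOT NULL,  -- lower-cased, comma-delimited parts alphabetized
    Embedding        VARBINARY(MAX)           NULL,  -- 768-dimension nomic-embed-text vector, serialized
    CONSTRAINT PK_DistinctJobTitleVector PRIMARY KEY CLUSTERED (JobTitleVectorId),
    CONSTRAINT UQ_DistinctJobTitleVector_JobTitle UNIQUE (JobTitle)
);

-- Back-reference from the leads table: many leads point at one distinct value.
ALTER TABLE dbo.LinkedInLeads ADD JobTitleVectorId BIGINT NULL;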

Things like emails, phone numbers, and skills were comma-delimited lists, so I pulled them out to separate tables as one-to-many relationships. The Python script is slow, as it queries my local LM Studio service for each value to vectorize, using the text-embedding-nomic-embed-text-v1 model; even at six records a second, five million distinct values is going to take about ten days to populate!
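
Splitting a comma-delimited column out to a child table is straightforward in SQL Server 2022 with STRING_SPLIT; here is a sketch for the email list, with hypothetical table and column names.

-- One-to-many child table for the individual email addresses.
CREATE TABLE dbo.LeadEmail (
    LeadEmailId BIGINT IDENTITY(1,1) NOT NULL,
    USAEmailsId BIGINT               NOT NULL,  -- key of the parent lead row
    Email       VARCHAR(128)         NOT NULL,
    CONSTRAINT PK_LeadEmail PRIMARY KEY CLUSTERED (LeadEmailId)
);

INSERT INTO dbo.LeadEmail (USAEmailsId, Email)
SELECT u.USAEmailsId,
       LOWER(LTRIM(RTRIM(s.value)))
FROM dbo.USAEmails AS u
CROSS APPLY STRING_SPLIT(u.Emails, ',') AS s
WHERE u.Emails IS NOT NULL
  AND LTRIM(RTRIM(s.value)) <> '';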

Designed and implemented reverse log shipping from AWS RDS SQL Server to EC2 SQL Server, including automating download, decryption with AWS KMS, and restoring transaction logs in correct LSN order.

Created PowerShell scripts that:

  • Import CSV files into SQL Server dynamically, generating tables named after the filenames.
  • Clean up SSRS log and dump files across any drive by detecting \Reporting Services\LogFiles directories automatically.
  • Monitor folders for new transaction log backups and process them continuously.
  • Parse and reformat ChatGPT conversation archives into text files.
  • Create dynamic robocopy jobs for hourly report synchronization.
  • Generate HTML reports from network scans and display devices not currently online.

Developed hash validation workflows to support SQL Server to MySQL migration:

  • Generated per-row SHA1 hashes.
  • Created hash-of-hashes across batches of 100,000 rows.
  • Compared results between platforms to confirm data consistency.
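
On the SQL Server side that validation looks roughly like the sketch below (the column list is a placeholder); the MySQL side does the equivalent with SHA1() and GROUP_CONCAT, and the two sets of batch hashes are then compared.

-- Per-row SHA1 over a delimited projection of the columns, bucketed into 100,000-row batches.
SELECT USAEmailsId,
       USAEmailsId / 100000 AS BatchNo,
       HASHBYTES('SHA1', CONCAT_WS('|', FullName, Emails, CompanyName)) AS RowHash
INTO #RowHash
FROM dbo.USAEmails;

-- Hash-of-hashes: concatenate the row hashes in a deterministic order and hash again.
SELECT BatchNo,
       HASHBYTES('SHA1',
           STRING_AGG(CAST(CONVERT(CHAR(40), RowHash, 2) AS VARCHAR(MAX)), '')
               WITHIN GROUP (ORDER BY USAEmailsId)) AS BatchHash
FROM #RowHash
GROUP BY BatchNo
ORDER BY BatchNo;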

Converted SQL Server CREATE TABLE statements to be compatible with PostgreSQL:

  • Addressed data type mapping (e.g., uniqueidentifier to uuid).
  • Ensured constraints and defaults were translated correctly.
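
A typical before/after for one table, showing the kind of mapping involved; the table itself is made up, and the uuid default assumes PostgreSQL 13+ where gen_random_uuid() is built in.

-- SQL Server original:
--   CREATE TABLE dbo.Orders (
--       OrderId   UNIQUEIDENTIFIER NOT NULL DEFAULT NEWID(),
--       CreatedAt DATETIME2(3)     NOT NULL DEFAULT SYSUTCDATETIME(),
--       IsActive  BIT              NOT NULL DEFAULT 1
--   );

-- PostgreSQL translation:
CREATE TABLE orders (
    order_id   uuid         NOT NULL DEFAULT gen_random_uuid(),
    created_at timestamptz  NOT NULL DEFAULT now(),
    is_active  boolean      NOT NULL DEFAULT true
);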

Investigated SQL Server decryption failures after restoring databases:

  • Troubleshot DECRYPTBYKEYAUTOCERT issues.
  • Confirmed certificate and master key restoration steps.
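
The usual culprit after restoring to a different server is that the database master key is still protected only by the source server's service master key. A hedged sketch of the common fix (database name and password are placeholders):

USE RestoredDb;
-- Open the database master key with its original password, then re-protect it
-- with this server's service master key so DECRYPTBYKEYAUTOCERT can open keys
-- automatically again.
OPEN MASTER KEY DECRYPTION BY PASSWORD = '<original DMK password>';
ALTER MASTER KEY ADD ENCRYPTION BY SERVICE MASTER KEY;
CLOSE MASTER KEY;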

Built SSRS subscription management scripts:

  • Queried ExecutionLog2, Catalog, and Subscriptions to find report executions.
  • Identified reports not using shared data sources.
  • Extracted subscription details and schedule configurations.
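
The ReportServer catalog queries behind those bullets look roughly like this (the view and table names are the SSRS defaults):

USE ReportServer;

-- Recent report executions.
SELECT TOP (100) e.TimeStart, e.UserName, e.ReportPath, e.Status
FROM dbo.ExecutionLog2 AS e
ORDER BY e.TimeStart DESC;

-- Reports whose data source is embedded rather than shared.
SELECT c.[Path], c.[Name]
FROM dbo.[Catalog]  AS c
JOIN dbo.DataSource AS ds ON ds.ItemID = c.ItemID
WHERE c.[Type] = 2        -- 2 = report
  AND ds.Link IS NULL;    -- no reference to a shared data source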

Created AI DBA PowerShell framework:

  • Parameterized connection strings.
  • Ran diagnostic or remediation queries across multiple servers.
  • Logged results to an audit table.
  • Supported both Windows authentication and SQL authentication.

Researched SQL Server Extended Events:

  • Captured login encryption details (is_encrypted_transport).
  • Monitored stored procedure calls using wildcard matches in sql_batch_completed.
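
For the stored procedure piece, the wildcard match is a like_i_sql_unicode_string predicate on batch_text; a minimal session sketch, with the procedure name and target file as placeholders:

CREATE EVENT SESSION TraceProcCalls ON SERVER
ADD EVENT sqlserver.sql_batch_completed (
    ACTION (sqlserver.client_app_name, sqlserver.username)
    WHERE (sqlserver.like_i_sql_unicode_string(batch_text, N'%usp_SomeProc%'))
)
ADD TARGET package0.event_file (SET filename = N'TraceProcCalls.xel')
WITH (STARTUP_STATE = OFF);

ALTER EVENT SESSION TraceProcCalls ON SERVER STATE = START;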

Migrated large tables from varchar to nvarchar columns:

  • Identified that in-place ALTER TABLE operations took 9+ hours.
  • Recommended staging tables and SWITCH/RENAME operations to maintain column order.
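
Once the staging table (same column order, NVARCHAR types) is loaded, the cutover itself is just a metadata swap; a sketch with hypothetical table names:

-- dbo.BigTable_staging has already been loaded with the NVARCHAR column definitions.
BEGIN TRAN;
    EXEC sp_rename 'dbo.BigTable',         'BigTable_old';
    EXEC sp_rename 'dbo.BigTable_staging', 'BigTable';
COMMIT;
-- Verify, then drop dbo.BigTable_old once everything checks out.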

Tuned index rebuild scripts:

  • Added ONLINE=ON, RESUMABLE=ON, DATA_COMPRESSION=PAGE.
  • Moved large indexes to separate filegroups.
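
The rebuild pattern, roughly (index, table, and filegroup names are examples):

-- Resumable online rebuild with page compression.
ALTER INDEX PK__USAEmails__USAEmailsId ON dbo.USAEmails
REBUILD WITH (ONLINE = ON, RESUMABLE = ON, DATA_COMPRESSION = PAGE);

-- Move a large nonclustered index to its own filegroup by recreating it
-- with DROP_EXISTING, so the move happens in a single operation.
CREATE NONCLUSTERED INDEX IX_USAEmails_CompanyName
ON dbo.USAEmails (CompanyName)
WITH (DROP_EXISTING = ON, ONLINE = ON, DATA_COMPRESSION = PAGE)
ON [INDEX_FG];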

Implemented Resource Governor workload groups to limit CPU and memory for specific AD groups.
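
A condensed version of that setup, with placeholder pool, group, and AD group names (the classifier function lives in master and must be schema-bound):

USE master;
GO
CREATE RESOURCE POOL LeadsPool WITH (MAX_CPU_PERCENT = 30, MAX_MEMORY_PERCENT = 25);
CREATE WORKLOAD GROUP LeadsGroup USING LeadsPool;
GO
CREATE FUNCTION dbo.fnRgClassifier() RETURNS SYSNAME WITH SCHEMABINDING
AS
BEGIN
    -- Route members of the AD group into the constrained workload group.
    IF IS_MEMBER(N'CONTOSO\LeadProcessing') = 1
        RETURN N'LeadsGroup';
    RETURN N'default';
END;
GO
ALTER RESOURCE GOVERNOR WITH (CLASSIFIER_FUNCTION = dbo.fnRgClassifier);
ALTER RESOURCE GOVERNOR RECONFIGURE;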

Built Python utilities to:

  • Process AWS S3 files and restore SQL Server logs.
  • Export SQL Server tables to Parquet with configurable batch sizes.
  • Scan network devices and generate HTML reports.

Configured Proxmox networking on Wi-Fi only hardware:

  • Wrote a script to detect the Wi-Fi adapter.
  • Applied static IP or DHCP configuration in the interfaces file.

Set up Docker Compose for Grafana on Linux Mint, including persistent volumes.

Addressed PostgreSQL schema scoping:

  • Clarified why custom schemas appeared across databases.
  • Ensured correct search_path usage in Docker deployments.

Converted UTF-16 SQL scripts to UTF-8 to import AdventureWorks into PostgreSQL without encoding errors.

Managed AWS RDS parameter groups:

  • Calculated max_connections based on instance memory.
  • Enabled slow query logs to table storage.

Developed vector embeddings workflows:

  • Stored embeddings in VECTOR columns (e.g., VECTOR(1536)).
  • Integrated LM Studio endpoints for embedding generation.
  • Load balanced embedding generation across multiple local model instances.

Tested BCP commands in PowerShell:

  • Solved issue where -Wait combined with RedirectStandardOutput caused PowerShell to hang after completion.

Built PowerShell file deletion scripts:

  • Filtered .out files and files without extensions.
  • Handled empty arrays safely to avoid method invocation errors.

Created PowerShell routines for SSISDB internal queries:

  • Tracked validation operation statuses (operation_type = 200).
  • Added QuickReview columns to surface error text.

Automated robocopy sync jobs:

  • Designed schedules to mirror folders hourly.
  • Captured logs and error handling for intermittent failures.

Rewrote PowerShell import processes:

  • Supported checkpointing at the folder level.
  • Ensured partial progress could resume without rescanning already imported data.

Monitored RDS PostgreSQL maintenance events:

  • Queried logs to see smart shutdown requests.
  • Investigated cause of restarts during maintenance windows.

Designed and tested PowerShell AI agent:

  • Automatically fixed detected SQL Server issues.
  • Logged all actions to an msdb table.

Investigated and resolved issues with:

  • VECTOR_FROM_JSON function not available in Azure SQL.
  • Differences in vector embedding dimensionality (1536 vs. 768) and compatibility.

Developed routines to process pip package lists:

  • In PowerShell, converted pip output to structured objects.
  • In Python, generated install commands for portability.

Created Proxmox Wi-Fi auto-discovery script:

  • Pre-filled credentials for connecting to a Netgear access point.
  • Assigned a static IP of 10.0.0.200.

Worked on Snowflake queries:

  • Investigated RESULT_SCAN() binding issues.
  • Searched for alternatives to SHOW USERS by querying INFORMATION_SCHEMA.

Evaluated strategies for monetizing idle hardware (excluding crypto mining).

Built and tested data enrichment processes:

  • Loaded lead data from USAEmails and enriched social media links.
  • Integrated plans for Make.com to handle contact workflows.

Debugged Flyway deployment scripts against AWS RDS SQL Server.

Supported columnstore index creation on large tables:

  • Clarified limitations with varchar(max) columns.
  • Tested update performance against columnstore indexes.
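
The varchar(max) limitation mostly affects nonclustered columnstore indexes, which can't include LOB columns; the practical workaround is to leave the wide description-style columns out of the column list. A sketch against the leads table:

-- Analytics-friendly nonclustered columnstore over the narrow columns only;
-- the (MAX) columns such as Industry and Skills are simply excluded.
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_USAEmails
ON dbo.USAEmails (CompanyName, CompanyIndustry, LocationCountry, InferredSalary, YearsExperience);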

Created scripts to move internal Service Broker tables to other filegroups.

Validated encoding conversion workflows:

  • Used vim to re-encode large SQL scripts.

Automated bulk updates in PowerShell pipelines:

  • Managed large batches and retry logic.

Worked through Python FastAPI app deployment in Docker:

  • Managed dependencies and debugging output visibility.

Explored SSL certificates for multiple subdomains:

  • Planned separate certificates for stormrage.com and home.stormrage.com.

Developed automated cleanup of old S3 files via AWS CLI.

Researched PowerShell + Python hybrid workflows for decrypting transaction logs.

Configured Snowflake roles and access control for team workflows.

Addressed MySQL 9.3 vector search implementation:

  • Imported vector data.
  • Planned joins to enrich email datasets.
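
For reference, here is the resized [dbo].[USAEmails] table that came out of the column-sizing pass described above: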

IF OBJECT_ID('[dbo].[USAEmails]') IS NOT NULL 
DROP TABLE [dbo].[USAEmails] 
GO
CREATE TABLE [dbo].[USAEmails] ( 
[USAEmailsId]                   BIGINT                     IDENTITY(1,1)          NOT NULL,
[FullName]                      VARCHAR(630)                                          NULL,
[Industry]                      VARCHAR(max)                                          NULL,
[JobTitle]                      VARCHAR(2980)                                         NULL,
[SubRole]                       VARCHAR(2850)                                         NULL,
[Emails]                        VARCHAR(360)                                          NULL,
[Mobile]                        VARCHAR(560)                                          NULL,
[PhoneNumbers]                  VARCHAR(680)                                          NULL,
[CompanyName]                   VARCHAR(230)                                          NULL,
[CompanyIndustry]               VARCHAR(290)                                          NULL,
[CompanyWebsite]                VARCHAR(250)                                          NULL,
[CompanySize]                   VARCHAR(1110)                                         NULL,
[Location]                      VARCHAR(2350)                                         NULL,
[Skills]                        VARCHAR(max)                                          NULL,
[FirstName]                     VARCHAR(1090)                                         NULL,
[LastName]                      VARCHAR(470)                                          NULL,
[BirthYear]                     VARCHAR(470)                                          NULL,
[BirthDate]                     VARCHAR(190)                                          NULL,
[Gender]                        VARCHAR(810)                                          NULL,
[LinkedInUrl]                   VARCHAR(720)                                          NULL,
[FacebookUrl]                   VARCHAR(120)                                          NULL,
[TwitterUrl]                    VARCHAR(830)                                          NULL,
[GithubUrl]                     VARCHAR(780)                                          NULL,
[CompanyLinkedinUrl]            VARCHAR(290)                                          NULL,
[CompanyFacebookUrl]            VARCHAR(240)                                          NULL,
[CompanyTwitterUrl]             VARCHAR(100)                                          NULL,
[CompanyLocationName]           VARCHAR(90)                                           NULL,
[CompanyLocationStreetAddress]  VARCHAR(130)                                          NULL,
[CompanyLocationAddressLine2]   VARCHAR(90)                                           NULL,
[CompanyLocationPostalCode]     VARCHAR(100)                                          NULL,
[LocationCountry]               VARCHAR(1100)                                         NULL,
[LocationContinent]             VARCHAR(2090)                                         NULL,
[LinkedinConnections]           VARCHAR(850)                                          NULL,
[InferredSalary]                VARCHAR(990)                                          NULL,
[YearsExperience]               VARCHAR(1090)                                         NULL,
[Countries]                     VARCHAR(1110)                                         NULL,
[Interests]                     VARCHAR(3620)                                         NULL,
[OldLocation]                   VARCHAR(2350)                                         NULL,
[BinaryCheckSum]                INT                                                   NULL,
CONSTRAINT   [PK__USAEmails__USAEmailsId]  PRIMARY KEY CLUSTERED    ([USAEmailsId] asc)  WITH (FILLFACTOR = 100,DATA_COMPRESSION = PAGE ))

GO
