Cybersecurity researcher Jeremiah Fowler discovered a non-password-protected database containing 86,341 records, including users' PII, belonging to ESHYFT, a New Jersey-based HealthTech company that operates in 29 states. It's a discovery that I previously covered here.
Erich Kron, Security Awareness Advocate at KnowBe4, had this to say:
“Breaches like this are indicative of the problem with collecting sensitive data without controls to protect it. Not only is the information that has been stolen extremely useful if a bad actor wants to steal one of these individuals’ identity, but it also contains a lot of information that could easily be used in an even more damaging social engineering attack. By having access to information about past jobs, shifts, or similar private life events, a bad actor could easily use it to convince a potential victim that they are from a previous employer, or a potential future employer trying to recruit them. Scams related to employment opportunities are common and can be used to fleece the victims out of money and even more sensitive information.”
“Organizations that handle information such as this have a duty to protect their customers’ information. While it is a temporary inconvenience for an organization to suffer a data breach, the implications of information such as this being lost can impact the victims for a lifetime. Organizations need to address not only technical security controls, but also human risk, which can include misconfiguring security and permissions related to information storage and access, poor software coding practices, or even making unapproved copies of data, among others.”
Paul Bischoff, Consumer Privacy Advocate at Comparitech, follows with this:
“There is no excuse for leaving such a sensitive database unprotected, and it has almost certainly been found and copied already by cybercriminals. Our honeypot studies show it takes just a few hours for hackers to find and target exposed databases like this one. Thankfully, none of the data poses a direct threat to data subjects or their finances.”
“Hospitals, clinics, and other healthcare companies are frequently targeted by ransomware gangs and other cybercriminals. Comparitech researchers logged 146 confirmed ransomware attacks on US healthcare companies in 2024, compromising more than 24.8 million records. The average ransom was $1.05 million.”
Chris Hauk, Consumer Privacy Champion at Pixel Privacy, adds this:
“Unfortunately, it seems like lately it’s another day, another data breach made easy by a misconfigured AWS S3 data bucket. There is simply no excuse for this happening. We’ve seen enough of these data breaches that are enabled by misconfigured data buckets that every database professional should be aware of the issue and they should have educated themselves as to how to better secure these data buckets. Until we see more educational efforts and efforts on the parts of IT professionals, we’ll continue to see these on a regular basis.”
Organizations need to make protecting PII a priority. And if they don't take that responsibility seriously, then I say fine them, and make the fines so expensive that they are forced to do the right thing. Because events like this are simply not acceptable.
UPDATE: Martin Jartelius, CISO at Outpost24, had this to say:
“Do your attack surface management and track data leakage in it – otherwise someone else will. In this case someone who responsibly disclosed it later thankfully.”
Jim Routh, Chief Trust Officer at Saviynt, follows with this:
“Thanks to Cybersecurity Researcher, Jeremiah Fowler for pointing out the obvious. Customer information for healthcare or any other sector must apply the right level of control to the appropriate data classification. Data classified as restricted or at the highest level must include encryption of data at rest and advanced multi-factor authentication at a minimum.”
Guest Post: What is Web Scraping?
Posted in Commentary with tags Geonode on March 17, 2025 by itnerd

By Geonode
Ever wondered how companies gather huge amounts of data from the internet without breaking a sweat? That’s where web scraping comes into play. Imagine having a digital assistant that tirelessly scours websites, picking up the information you need and organizing it into neat spreadsheets or databases. That’s essentially what web scraping does.
Web scraping involves two main players: the crawler and the scraper. Picture the crawler as a curious explorer, navigating the vast internet landscape, while the scraper is the diligent collector, picking up the data gems. Together, they turn chaotic web data into structured, usable insights.
While you can technically scrape data manually, it’s usually an automated game—think bots or scripts doing the heavy lifting. This automation is a game-changer in today’s data-driven world, empowering businesses to stay competitive. Companies use web scraping for a variety of reasons, like monitoring prices, generating leads, conducting market research, and aggregating content. However, it’s crucial to remember that web scraping isn’t a free-for-all; there are legal and ethical boundaries to respect.
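The crawler-plus-scraper pattern described above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library's `html.parser`: the `PAGE` string, the `ProductScraper` class, and the CSS class names `product` and `price` are all hypothetical stand-ins (a real crawler would fetch pages over HTTP and follow the collected links).

```python
from html.parser import HTMLParser

# Hypothetical page content; a real crawler would fetch this over HTTP.
PAGE = """
<html><body>
<h2 class="product">Widget A</h2><span class="price">$19.99</span>
<h2 class="product">Widget B</h2><span class="price">$24.50</span>
<a href="/page2">next</a>
</body></html>
"""

class ProductScraper(HTMLParser):
    """Collects product names and prices (the 'scraper' half)
    and links to visit next (the 'crawler' half)."""
    def __init__(self):
        super().__init__()
        self._field = None   # which field the current text chunk belongs to
        self.products = []   # [name, price] rows
        self.links = []      # hrefs the crawler would follow next

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "href" in attrs:
            self.links.append(attrs["href"])
        elif attrs.get("class") == "product":
            self._field = "name"
        elif attrs.get("class") == "price":
            self._field = "price"

    def handle_data(self, data):
        if self._field == "name":
            self.products.append([data, None])
        elif self._field == "price":
            self.products[-1][1] = data
        self._field = None

scraper = ProductScraper()
scraper.feed(PAGE)
print(scraper.products)  # structured rows, ready for a spreadsheet or database
print(scraper.links)     # URLs the crawler half would visit next
```

The point is the division of labor: the link collection is the "curious explorer," and the field extraction is the "diligent collector" turning chaotic HTML into structured rows.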
The Legal Landscape of Web Scraping
Web scraping, though incredibly useful, can be a legal minefield. You could stumble into issues like copyright infringement, violating terms of service, breaching data privacy laws, or misusing scraped content. Staying on the right side of the law is key, and understanding the legal frameworks that govern web scraping is crucial.
Key Laws and Regulations
The Computer Fraud and Abuse Act (CFAA)
The CFAA is a cornerstone law in the U.S. that governs web scraping. Established in 1986, it criminalizes “intentionally accessing a computer without authorization” or “exceeding authorized access.” Some landmark cases have helped shape its interpretation.
Van Buren v. United States
In 2021, the Supreme Court ruled in Van Buren v. United States that “exceeds authorized access” should only apply when someone accesses parts of a computer system they’re not supposed to. This narrows the scope of what counts as unauthorized access under the CFAA, offering some relief for web scrapers.
hiQ Labs, Inc. v. LinkedIn Corp.
In another pivotal case, the Ninth Circuit Court ruled that hiQ’s scraping of publicly accessible LinkedIn profiles did not constitute unauthorized access under the CFAA. LinkedIn couldn’t restrict public access to the data, making this a significant decision for the scraping community.
Data Protection Laws
When it comes to personal data, regulations like the GDPR in Europe and the CCPA in the U.S. mandate businesses to obtain proper consent. Ignoring these laws can lead to hefty fines and legal troubles.
Digital Millennium Copyright Act (DMCA)
The DMCA prohibits circumventing technological measures designed to control access to copyrighted works. So, if you’re thinking about bypassing some tech barrier to scrape data, you might want to think twice.
Ethical Best Practices
To navigate these legal complexities, ethical web scraping is the way to go.
Ethical Concerns in Web Scraping
Web scraping isn’t just about legality; it’s also about ethics. You wouldn’t want to end up on the wrong side of a moral dilemma, right?
Privacy and Data Protection
Collecting personal data without consent is a major no-no. Ethical web scraping means obtaining necessary consents and complying with data protection laws.
Respect for Terms of Service
Web scraping often clashes with the terms of service of the targeted websites. Ignoring these terms can lead to legal battles and a loss of trust. Ethical scraping involves playing by the rules set by website owners.
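One concrete way to play by a site's rules is to honor its robots.txt file before fetching anything. Here is a small sketch using Python's standard `urllib.robotparser`; the robots.txt content, the `example.com` URLs, and the `MyScraperBot` user agent are hypothetical (in practice you would load the live file with `set_url()` and `read()`).

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; in practice you would fetch the real file,
# e.g. rp.set_url("https://example.com/robots.txt"); rp.read()
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# Check each URL before requesting it.
print(rp.can_fetch("MyScraperBot", "https://example.com/products"))      # allowed
print(rp.can_fetch("MyScraperBot", "https://example.com/private/data"))  # disallowed
```

robots.txt is not a legal contract by itself, but ignoring it is a fast way to end up on the wrong side of both a site's terms of service and its goodwill.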
Intellectual Property and Copyright
Scraping content without permission can lead to copyright issues. The DMCA and CFAA are pretty clear about this, and violations can have serious repercussions. For example, copying entire web pages or extracting data behind login credentials without authorization can breach proprietary rights.
Responsible Data Use
Misusing scraped data can lead to misinformation, spam, or other harmful activities. Responsible data usage means being transparent about your data collection practices and using the data ethically.
Best Practices for Ethical Web Scraping
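One widely recommended practice, alongside honoring robots.txt and terms of service, is rate limiting: spacing out requests so your scraper never hammers a site. A minimal sketch, with a hypothetical `Throttle` helper and a made-up politeness interval:

```python
import time

class Throttle:
    """Enforce a minimum delay between requests to the same host."""
    def __init__(self, delay_seconds):
        self.delay = delay_seconds
        self.last_request = {}   # host -> timestamp of its last request

    def wait(self, host):
        last = self.last_request.get(host)
        if last is not None:
            remaining = self.delay - (time.monotonic() - last)
            if remaining > 0:
                time.sleep(remaining)   # politely pause before the next hit
        self.last_request[host] = time.monotonic()

throttle = Throttle(delay_seconds=0.2)   # hypothetical politeness interval
start = time.monotonic()
for _ in range(3):
    throttle.wait("example.com")         # stand-in for an actual HTTP request
elapsed = time.monotonic() - start
print(f"3 requests took {elapsed:.2f}s") # at least two enforced 0.2s gaps
```

Keying the delay per host means you can crawl several sites concurrently while still being gentle with each one individually.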
Case Studies and Precedents
Learning from real-world cases can help you avoid potential pitfalls.
Van Buren v. United States (2021)
This Supreme Court decision reshaped how we interpret the CFAA by narrowing its scope. It ruled that the CFAA’s definition of “exceeds authorized access” only applies when someone breaches a technical barrier.
hiQ Labs, Inc. v. LinkedIn Corp.
In this case, the Ninth Circuit Court ruled that scraping data from a public website likely doesn’t violate the CFAA, even if the website owner objects. This decision emphasizes a more restrained interpretation of “unauthorized access.”
By studying these cases, businesses can better navigate the complex web of laws governing web scraping, ensuring their activities are both ethical and legal.
Actionable Takeaways
Here’s how you can practice ethical and legal web scraping: respect the applicable laws, honor each website’s terms of service, obtain proper consent before collecting personal data, and use the data you gather responsibly.
So, the next time you think about web scraping, remember to do it the right way—both legally and ethically. Happy scraping!
“Web scraping, if done ethically and legally, can be incredibly beneficial,” notes Josh Gordon, a technology infrastructure expert at Geonode. “With Geonode’s secure and reliable proxy solutions, businesses can access data without barriers, ensuring privacy and security.”
By following these guidelines, you can make the most out of web scraping while staying on the right side of both legal and ethical considerations.