Cybersecurity researcher Jeremiah Fowler discovered a non-password-protected database containing 86,341 records, including users' PII, belonging to ESHYFT, a New Jersey-based HealthTech company that operates in 29 states. It's a discovery that I previously covered here.
Erich Kron, Security Awareness Advocate at KnowBe4, had this to say:
“Breaches like this are indicative of the problem with collecting sensitive data without controls to protect it. Not only is the information that has been stolen extremely useful if a bad actor wants to steal one of these individuals’ identity, but it also contains a lot of information that could easily be used in an even more damaging social engineering attack. By having access to information about past jobs, shifts, or similar private life events, a bad actor could easily use it to convince a potential victim that they are from a previous employer, or a potential future employer trying to recruit them. Scams related to employment opportunities are common and can be used to fleece the victims out of money and even more sensitive information.”
“Organizations that handle information such as this have a duty to protect their customers’ information. While it is a temporary inconvenience for an organization to suffer a data breach, the implications of information such as this being lost can impact the victims for a lifetime. Organizations need to address not only technical security controls, but also human risk, which can include misconfiguring security and permissions related to information storage and access, poor software coding practices, or even making unapproved copies of data, among others.”
Paul Bischoff, Consumer Privacy Advocate at Comparitech, follows with this:
“There is no excuse for leaving such a sensitive database unprotected, and it has almost certainly been found and copied already by cybercriminals. Our honeypot studies show it takes just a few hours for hackers to find and target exposed databases like this one. Thankfully, none of the data poses a direct threat to data subjects or their finances.”
“Hospitals, clinics, and other healthcare companies are frequently targeted by ransomware gangs and other cybercriminals. Comparitech researchers logged 146 confirmed ransomware attacks on US healthcare companies in 2024, compromising more than 24.8 million records. The average ransom was $1.05 million.”
Chris Hauk, Consumer Privacy Champion at Pixel Privacy, adds this:
“Unfortunately, it seems like lately it’s another day, another data breach made easy by a misconfigured AWS S3 data bucket. There is simply no excuse for this happening. We’ve seen enough of these data breaches that are enabled by misconfigured data buckets that every database professional should be aware of the issue and they should have educated themselves as to how to better secure these data buckets. Until we see more educational efforts and efforts on the parts of IT professionals, we’ll continue to see these on a regular basis.”
Organizations need to make protecting PII a priority. And if they don't take that responsibility seriously, then I say fine them, and make the fines so expensive that they are forced to do the right thing. Because events like this are simply not acceptable.
UPDATE: Martin Jartelius, CISO at Outpost24, had this to say:
“Do your attack surface management and track data leakage in it – otherwise someone else will. In this case someone who responsibly disclosed it later thankfully.”
Jim Routh, Chief Trust Officer at Saviynt, follows with this:
“Thanks to Cybersecurity Researcher, Jeremiah Fowler for pointing out the obvious. Customer information for healthcare or any other sector must apply the right level of control to the appropriate data classification. Data classified as restricted or at the highest level must include encryption of data at rest and advanced multi-factor authentication at a minimum.”
Guest Post: What is Web Scraping?
Posted in Commentary with tags Geonode on March 17, 2025 by itnerd

By Geonode
Ever wondered how companies gather huge amounts of data from the internet without breaking a sweat? That’s where web scraping comes into play. Imagine having a digital assistant that tirelessly scours websites, picking up the information you need and organizing it into neat spreadsheets or databases. That’s essentially what web scraping does.
Web scraping involves two main players: the crawler and the scraper. Picture the crawler as a curious explorer, navigating the vast internet landscape, while the scraper is the diligent collector, picking up the data gems. Together, they turn chaotic web data into structured, usable insights.
While you can technically scrape data manually, it’s usually an automated game—think bots or scripts doing the heavy lifting. This automation is a game-changer in today’s data-driven world, empowering businesses to stay competitive. Companies use web scraping for a variety of reasons, like monitoring prices, generating leads, conducting market research, and aggregating content. However, it’s crucial to remember that web scraping isn’t a free-for-all; there are legal and ethical boundaries to respect.
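The crawler-plus-scraper pattern described above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library's `html.parser`: the `PAGE` string, the `ProductScraper` class, and the CSS class names `product` and `price` are all hypothetical stand-ins (a real crawler would fetch pages over HTTP and follow the collected links).

```python
from html.parser import HTMLParser

# Hypothetical page content; a real crawler would fetch this over HTTP.
PAGE = """
<html><body>
<h2 class="product">Widget A</h2><span class="price">$19.99</span>
<h2 class="product">Widget B</h2><span class="price">$24.50</span>
<a href="/page2">next</a>
</body></html>
"""

class ProductScraper(HTMLParser):
    """Collects product names and prices (the 'scraper' half)
    and links to visit next (the 'crawler' half)."""
    def __init__(self):
        super().__init__()
        self._field = None   # which field the current text chunk belongs to
        self.products = []   # [name, price] rows
        self.links = []      # hrefs the crawler would follow next

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "href" in attrs:
            self.links.append(attrs["href"])
        elif attrs.get("class") == "product":
            self._field = "name"
        elif attrs.get("class") == "price":
            self._field = "price"

    def handle_data(self, data):
        if self._field == "name":
            self.products.append([data, None])
        elif self._field == "price":
            self.products[-1][1] = data
        self._field = None

scraper = ProductScraper()
scraper.feed(PAGE)
print(scraper.products)  # structured rows, ready for a spreadsheet or database
print(scraper.links)     # URLs the crawler half would visit next
```

The point is the division of labor: the link collection is the "curious explorer," and the field extraction is the "diligent collector" turning chaotic HTML into structured rows.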
The Legal Landscape of Web Scraping
Web scraping, though incredibly useful, can be a legal minefield. You could stumble into issues like copyright infringement, violating terms of service, breaching data privacy laws, or misusing scraped content. Staying on the right side of the law is key, and understanding the legal frameworks that govern web scraping is crucial.
Key Laws and Regulations
The Computer Fraud and Abuse Act (CFAA)
The CFAA is a cornerstone law in the U.S. that governs web scraping. Established in 1986, it criminalizes “intentionally accessing a computer without authorization” or “exceeding authorized access.” Some landmark cases have helped shape its interpretation.
Van Buren v. United States
In 2021, the Supreme Court ruled in Van Buren v. United States that “exceeds authorized access” should only apply when someone accesses parts of a computer system they’re not supposed to. This narrows the scope of what counts as unauthorized access under the CFAA, offering some relief for web scrapers.
hiQ Labs, Inc. v. LinkedIn Corp.
In another pivotal case, the Ninth Circuit Court ruled that hiQ’s scraping of publicly accessible LinkedIn profiles did not constitute unauthorized access under the CFAA. LinkedIn couldn’t restrict public access to the data, making this a significant decision for the scraping community.
Data Protection Laws
When it comes to personal data, regulations like the GDPR in Europe and the CCPA in the U.S. mandate businesses to obtain proper consent. Ignoring these laws can lead to hefty fines and legal troubles.
Digital Millennium Copyright Act (DMCA)
The DMCA prohibits circumventing technological measures designed to control access to copyrighted works. So, if you’re thinking about bypassing some tech barrier to scrape data, you might want to think twice.
Ethical Best Practices
To navigate these legal complexities, ethical web scraping is the way to go.
Ethical Concerns in Web Scraping
Web scraping isn’t just about legality; it’s also about ethics. You wouldn’t want to end up on the wrong side of a moral dilemma, right?
Privacy and Data Protection
Collecting personal data without consent is a major no-no. Ethical web scraping means obtaining necessary consents and complying with data protection laws.
Respect for Terms of Service
Web scraping often clashes with the terms of service of the targeted websites. Ignoring these terms can lead to legal battles and a loss of trust. Ethical scraping involves playing by the rules set by website owners.
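One concrete way to play by a site's rules is to honor its robots.txt file before fetching anything. Here is a small sketch using Python's standard `urllib.robotparser`; the robots.txt content, the `example.com` URLs, and the `MyScraperBot` user agent are hypothetical (in practice you would load the live file with `set_url()` and `read()`).

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; in practice you would fetch the real file,
# e.g. rp.set_url("https://example.com/robots.txt"); rp.read()
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# Check each URL before requesting it.
print(rp.can_fetch("MyScraperBot", "https://example.com/products"))      # allowed
print(rp.can_fetch("MyScraperBot", "https://example.com/private/data"))  # disallowed
```

robots.txt is not a legal contract by itself, but ignoring it is a fast way to end up on the wrong side of both a site's terms of service and its goodwill.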
Intellectual Property and Copyright
Scraping content without permission can lead to copyright issues. The DMCA and CFAA are pretty clear about this, and violations can have serious repercussions. For example, copying entire web pages or extracting data behind login credentials without authorization can breach proprietary rights.
Responsible Data Use
Misusing scraped data can lead to misinformation, spam, or other harmful activities. Responsible data usage means being transparent about your data collection practices and using the data ethically.
Best Practices for Ethical Web Scraping
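One widely recommended practice, alongside honoring robots.txt and terms of service, is rate limiting: spacing out requests so your scraper never hammers a site. A minimal sketch, with a hypothetical `Throttle` helper and a made-up politeness interval:

```python
import time

class Throttle:
    """Enforce a minimum delay between requests to the same host."""
    def __init__(self, delay_seconds):
        self.delay = delay_seconds
        self.last_request = {}   # host -> timestamp of its last request

    def wait(self, host):
        last = self.last_request.get(host)
        if last is not None:
            remaining = self.delay - (time.monotonic() - last)
            if remaining > 0:
                time.sleep(remaining)   # politely pause before the next hit
        self.last_request[host] = time.monotonic()

throttle = Throttle(delay_seconds=0.2)   # hypothetical politeness interval
start = time.monotonic()
for _ in range(3):
    throttle.wait("example.com")         # stand-in for an actual HTTP request
elapsed = time.monotonic() - start
print(f"3 requests took {elapsed:.2f}s") # at least two enforced 0.2s gaps
```

Keying the delay per host means you can crawl several sites concurrently while still being gentle with each one individually.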
Case Studies and Precedents
Learning from real-world cases can help you avoid potential pitfalls.
Van Buren v. United States (2021)
This Supreme Court decision reshaped how we interpret the CFAA by narrowing its scope. It ruled that the CFAA’s definition of “exceeds authorized access” only applies when someone breaches a technical barrier.
hiQ Labs, Inc. v. LinkedIn Corp.
In this case, the Ninth Circuit Court ruled that scraping data from a public website likely doesn’t violate the CFAA, even if the website owner objects. This decision emphasizes a more restrained interpretation of “unauthorized access.”
By studying these cases, businesses can better navigate the complex web of laws governing web scraping, ensuring their activities are both ethical and legal.
Actionable Takeaways
Here’s how you can practice ethical and legal web scraping: respect the applicable laws, honor each website’s terms of service, obtain proper consent before collecting personal data, and use the data you gather responsibly.
So, the next time you think about web scraping, remember to do it the right way—both legally and ethically. Happy scraping!
“Web scraping, if done ethically and legally, can be incredibly beneficial,” notes Josh Gordon, a technology infrastructure expert at Geonode. “With Geonode’s secure and reliable proxy solutions, businesses can access data without barriers, ensuring privacy and security.”
By following these guidelines, you can make the most out of web scraping while staying on the right side of both legal and ethical considerations.