Program to Find the Duplicate Words in a String & File

This Java program counts how often each word occurs in a given input string and displays the results. It splits the input string into individual words using the space character as a delimiter, then uses nested loops to compare each word with the words that follow it to find repeated occurrences.

Find the Duplicate Words in a String

package com.softwaretestingo.interviewprograms;
public class InterviewPrograms90 
{
	//Java Program to Count repeated words in String
	public static void main(String[] args) 
	{
		String input="Welcome to Java Session Session Session";  
		String[] words=input.split(" ");
		int wrc=1;

		for(int i=0;i<words.length;i++)   
		{
			for(int j=i+1;j<words.length;j++)
			{

				if(words[i].equals(words[j]))
				{
					wrc=wrc+1; 
					words[j]="0";
				}
			}
			if(!"0".equals(words[i])) // compare string contents with equals(), not references with !=
				System.out.println(words[i]+"--"+wrc);
			wrc=1;

		}  
	}
}

Output

Welcome--1
to--1
Java--1
Session--3

Step-by-Step Explanation:

  • The program starts with the definition of the public class named InterviewPrograms90.
  • The main method serves as the entry point of the program.
  • A String variable named input is initialized with the input string "Welcome to Java Session Session Session".
  • The input string is split into individual words using the split(" ") method, which creates a String array named words.
  • A variable wrc (word count) is initialized with the value 1 to keep track of the frequency of each word.
  • The program uses two nested loops to compare each word with the rest of the words in the words array.
  • If a repeated word is found, wrc is incremented by 1, and the matching later occurrence is overwritten with "0" so that it is not counted again.
  • After checking all occurrences of a word, if the word itself has not been overwritten with "0", the program prints it along with its frequency using System.out.println(words[i]+"--"+wrc);.
  • The wrc is reset to 1 for the next word, and the process continues until all words have been processed.
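
The steps above can also be sketched with a separate boolean array that records which positions have already been counted, instead of overwriting array entries with "0". This is only a minimal sketch, not the program above: the class name WordFrequencySketch, the method countWords, and the counted array are assumptions introduced for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class name, used only for this sketch.
class WordFrequencySketch {

    // Returns one "word--count" line per distinct word, in order of first appearance.
    static List<String> countWords(String input) {
        String[] words = input.split(" ");
        boolean[] counted = new boolean[words.length]; // true once a position was counted as a repeat
        List<String> lines = new ArrayList<>();

        for (int i = 0; i < words.length; i++) {
            if (counted[i]) {
                continue; // this position repeats an earlier word
            }
            int count = 1;
            for (int j = i + 1; j < words.length; j++) {
                if (words[i].equals(words[j])) {
                    count++;
                    counted[j] = true;
                }
            }
            lines.add(words[i] + "--" + count);
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : countWords("Welcome to Java Session Session Session")) {
            System.out.println(line);
        }
    }
}
```

Keeping the marker in its own array means the input words are never modified, so an input that happens to contain the word "0" is still counted like any other word.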

Summary: This Java program counts the frequency of each word in a given input string. It uses nested loops to compare each word with the words that follow it. Each time a later match is found, the count is incremented and the matching entry is overwritten with "0" so that it is not counted again. The program then displays each distinct word with its frequency.

This program is helpful for beginners to understand how to split strings into words, use nested loops for word comparison, and count occurrences to find the frequency of repeated words in a string in Java. It showcases a simple and straightforward approach that needs no additional data structures such as maps, at the cost of comparing every word with every word after it.
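
One detail worth knowing when splitting on a single space: two or more consecutive spaces produce empty strings in the resulting array, and those would then be counted as words. The sketch below shows one possible way around this, assuming that splitting on any run of whitespace is acceptable for the input. The class name SplitSketch and the helper tokenize are hypothetical and not part of the program above.

```java
import java.util.Arrays;

// Hypothetical class name, used only for this sketch.
class SplitSketch {

    // Splits on any run of whitespace after trimming both ends.
    static String[] tokenize(String input) {
        String trimmed = input.trim();
        if (trimmed.isEmpty()) {
            return new String[0]; // avoid the single empty string that "".split(...) returns
        }
        return trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        String input = "  Welcome  to   Java ";
        // Splitting on one space keeps empty strings between consecutive spaces.
        System.out.println(Arrays.toString(input.split(" ")));
        // Splitting on runs of whitespace yields only the words.
        System.out.println(Arrays.toString(tokenize(input)));
    }
}
```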

Alternative Way 1:

This Java program identifies duplicated or repeated words in a given input string and displays them. It uses the Scanner class to read user input from the console and two HashMap objects: one to record the words seen so far and one to record the duplicated words that have already been printed. The program performs a case-insensitive comparison to handle words in different letter cases.

package com.softwaretestingo.interviewprograms;
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;
public class InterviewPrograms90_1 
{
	//Java Program to Count repeated words in String
	public static void main(String[] args) 
	{
		System.out.print("Enter string to analyse:");
		Scanner sn = new Scanner(System.in);
		String input = sn.nextLine();

		// first let us split string into words
		String[] words = input.split(" ");

		// adds all words into a map
		// we also check whether the word is already in map!
		Map<String,String> wordMap = new HashMap<String,String>();                
		Map<String,String> printedMap = new HashMap<String,String>();

		for(int i=0;i<words.length;i++) 
		{
			String word = words[i].toUpperCase(); // for case insensitive comparison
			if(wordMap.get(word)!=null) 
			{
				// we found a duplicated word!
				if(printedMap.get(word)==null) { // first check if it is printed already!
					System.out.println("Duplicated/Repeated word:"+word);
					printedMap.put(word, word); 
				}
			}else 
			{
				wordMap.put(word, word);
			}
		}
	}
}

Output

Enter string to analyse:Welcome to software testing blog to blog
Duplicated/Repeated word:TO
Duplicated/Repeated word:BLOG

Step-by-Step Explanation:

  • The program starts with the definition of the public class named InterviewPrograms90_1.
  • The main method serves as the entry point of the program.
  • The program prompts the user to enter a string using System.out.print("Enter string to analyse:");.
  • The input string is read from the user using Scanner and sn.nextLine(), and it is stored in the String variable input.
  • The input string is then split into individual words using the split(" ") method, creating a String array named words.
  • Two HashMap objects, wordMap and printedMap, are created to store words and track printed words, respectively.
  • The program uses a loop to iterate through each word in the words array.
  • Each word is converted to uppercase (words[i].toUpperCase()) to handle case-insensitive comparisons.
  • The program checks whether the word is already in the wordMap. If so, it is a duplicated word.
  • If the word is duplicated and not yet printed (printedMap.get(word)==null), the program prints it as a duplicated/repeated word using System.out.println("Duplicated/Repeated word:"+word);.
  • After printing a duplicated word, it is added to the printedMap to avoid printing it again.
  • If the word is not duplicated, it is added to the wordMap to keep track of unique words.
  • The process continues until all words have been processed.
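
The same steps can be sketched with sets instead of maps, since both maps above store each word as its own value. This is a hedged sketch rather than the program above: the class name DuplicateWordsSketch and the method findDuplicates are assumptions, and Locale.ROOT is passed to toUpperCase here so the result does not depend on the machine's default locale.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical class name, used only for this sketch.
class DuplicateWordsSketch {

    // Returns the duplicated words in upper case, in the order their first repeat is seen.
    static Set<String> findDuplicates(String input) {
        Set<String> seen = new HashSet<>();             // every word encountered so far
        Set<String> duplicates = new LinkedHashSet<>(); // repeated words, insertion order kept
        for (String raw : input.split(" ")) {
            String word = raw.toUpperCase(Locale.ROOT); // case-insensitive comparison
            if (!seen.add(word)) { // add returns false when the word was already present
                duplicates.add(word);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        for (String word : findDuplicates("Welcome to software testing blog to blog")) {
            System.out.println("Duplicated/Repeated word:" + word);
        }
    }
}
```

Because Set.add reports whether the element was new, a single call both records the word and detects the duplicate, and the LinkedHashSet takes over the role of printedMap.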

Summary: This Java program identifies duplicated or repeated words in a given input string. It uses one HashMap to store the words seen so far and a second HashMap to track the duplicated words that have already been printed. The program performs a case-insensitive comparison to handle words with different letter cases.

This program is helpful for beginners to understand how to read user input, split strings into words, use HashMap to keep track of words, and identify duplicated words. It demonstrates an efficient approach to solving this problem by utilizing the built-in data structure HashMap and performing a case-insensitive comparison for word handling.

Alternative Way 2:

This Java program aims to read a text file, count the occurrences of each word, and then display the repeated words along with their occurrences in descending order of frequency. It utilizes various Java classes like BufferedReader, FileReader, HashMap, ArrayList, and Entry to achieve this.

Find the Duplicate Words in a File

package com.softwaretestingo.interviewprograms;
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map.Entry;
import java.util.Set;
public class InterviewPrograms91 
{
	//Java Program to Count repeated words in Text File
	public static void main(String[] args) 
	{
		HashMap<String, Integer> wordCountMap = new HashMap<String, Integer>();
		BufferedReader reader = null;

		try
		{
			//Creating BufferedReader object

			reader = new BufferedReader(new FileReader("C:\\Users\\XXXXX\\git\\javapgms\\Java_Programs\\testdata.txt"));

			//Reading the first line into currentLine
			String currentLine = reader.readLine();

			while (currentLine != null)
			{   
				//splitting the currentLine into words
				String[] words = currentLine.toLowerCase().split(" ");

				//Iterating each word
				for (String word : words)
				{
					//if word is already present in wordCountMap, updating its count

					if(wordCountMap.containsKey(word))
					{   
						wordCountMap.put(word, wordCountMap.get(word)+1);
					}

					//otherwise inserting the word as key and 1 as its value
					else
					{
						wordCountMap.put(word, 1);
					}
				}

				//Reading next line into currentLine

				currentLine = reader.readLine();
			}

			//Getting all the entries of wordCountMap in the form of Set

			Set<Entry<String, Integer>> entrySet = wordCountMap.entrySet();

			//Creating a List by passing the entrySet

			List<Entry<String, Integer>> list = new ArrayList<Entry<String,Integer>>(entrySet);

			//Sorting the list in the decreasing order of values 

			Collections.sort(list, new Comparator<Entry<String, Integer>>() 
			{
				@Override
				public int compare(Entry<String, Integer> e1, Entry<String, Integer> e2) 
				{
					return (e2.getValue().compareTo(e1.getValue()));
				}
			});

			//Printing the repeated words in input file along with their occurrences

			System.out.println("Repeated Words In Input File Are :");

			for (Entry<String, Integer> entry : list) 
			{
				if (entry.getValue() > 1)
				{
					System.out.println(entry.getKey() + " : "+ entry.getValue());
				}
			}
		} 
		catch (IOException e) 
		{
			e.printStackTrace();
		}
		finally
		{
			try
			{
				if (reader != null)       //reader is still null if the file could not be opened
				{
					reader.close();       //Closing the reader
				}
			}
			catch (IOException e) 
			{
				e.printStackTrace();
			}
		}
	}
}

Output

Repeated Words In Input File Are :
java : 5
jdbc : 3
jsf : 3
hibernate : 2
spring : 2

Step-by-Step Explanation:

  • The program starts with the definition of the public class named InterviewPrograms91.
  • The main method serves as the entry point of the program.
  • A HashMap named wordCountMap is initialized to store word occurrences.
  • A BufferedReader object (reader) is created to read the text file testdata.txt.
  • The program reads each line of the file using reader.readLine() and processes it until the end of the file is reached.
  • The current line is converted to lowercase (currentLine.toLowerCase()) to ensure case-insensitive word comparisons.
  • The program splits the lowercased line into individual words using the space character as a delimiter (currentLine.toLowerCase().split(" ")).
  • It iterates through each word and checks if it is already present in wordCountMap.
  • If the word is already present, its count is updated by incrementing the corresponding value in wordCountMap.
  • If the word is not present, it is added to wordCountMap with a count of 1.
  • After processing all lines in the file, the program gets all the entries from wordCountMap as a set.
  • It converts the entry set into an ArrayList named list for sorting and displaying.
  • The program sorts the list in descending order of word occurrences using a custom comparator.
  • It prints the repeated words and their occurrences where the occurrences are greater than 1.
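
The steps above can be sketched more compactly with try-with-resources, which closes the reader automatically, and with Map.merge for the counting. This is a minimal sketch under stated assumptions, not the program above: the class name FileWordCountSketch, the method repeatedWords, the UTF-8 encoding, and the placeholder file name testdata.txt are all introduced for illustration. It also skips empty tokens, which the program above does not do.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical class name, used only for this sketch.
class FileWordCountSketch {

    // Returns the words that occur more than once, most frequent first.
    static List<Map.Entry<String, Integer>> repeatedWords(Path file) throws IOException {
        Map<String, Integer> counts = new HashMap<>();

        // try-with-resources closes the reader even if reading fails
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                for (String word : line.toLowerCase().split(" ")) {
                    if (!word.isEmpty()) {                    // assumption: ignore empty tokens
                        counts.merge(word, 1, Integer::sum); // insert 1 or add 1 to the old count
                    }
                }
            }
        }

        List<Map.Entry<String, Integer>> repeated = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() > 1) {
                repeated.add(entry);
            }
        }
        // Sort in decreasing order of occurrences
        repeated.sort(Map.Entry.<String, Integer>comparingByValue().reversed());
        return repeated;
    }

    public static void main(String[] args) throws IOException {
        // "testdata.txt" is a placeholder; pass the real path as the first argument
        Path file = Paths.get(args.length > 0 ? args[0] : "testdata.txt");
        System.out.println("Repeated Words In Input File Are :");
        for (Map.Entry<String, Integer> entry : repeatedWords(file)) {
            System.out.println(entry.getKey() + " : " + entry.getValue());
        }
    }
}
```

Words with the same count may appear in any order relative to each other, because HashMap does not define an iteration order.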

Summary: This Java program reads a text file, counts the occurrences of each word, and displays the repeated words along with their occurrences. It demonstrates file handling, reading lines from a file, word counting using a HashMap, and sorting the entries in an ArrayList with a custom Comparator.

This program is helpful for beginners to understand how to read files, process text data, count word occurrences, and sort data using built-in data structures and Java classes. It provides a comprehensive example of working with file I/O and collections in Java, and it is useful for text analysis tasks, such as identifying repeated words and their frequencies in a file.

I love open-source technologies and am very passionate about software development. I like to share my knowledge with others, especially on technology, which is why I have kept all the examples as simple as possible for beginners to understand. All the code posted on my blog is developed, compiled, and tested in my development environment. If you find any mistakes or bugs, please drop an email to softwaretestingo.com@gmail.com, or you can join me on LinkedIn.
